Re: [heka] Reply: Some question about HEKA

Rob Miller Thu, 11 Dec 2014 10:35:26 -0800

In a sandbox filter you typically implement two functions. The first, 
`process_message`, is called for every incoming message, and it's where you 
perform the aggregation. The second, `timer_event`, is called every ticker 
interval, and it's where you emit the data that has been aggregated since the 
last timer_event call.
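
As a rough illustration of that two-callback pattern, here is a minimal sketch in plain Python. It only models the control flow; it is not the Heka sandbox API (real sandbox filters are written in Lua), and the `CountingFilter` class, its `emitted` list, and the message dict are hypothetical names used for illustration.

```python
# Plain-Python model of the two-callback pattern; NOT the Heka sandbox API.
class CountingFilter:
    def __init__(self):
        self.count = 0      # messages seen since the last timer_event
        self.emitted = []   # hypothetical stand-in for data emitted by the filter

    def process_message(self, message):
        # Called once per incoming message: this is where aggregation happens.
        self.count += 1

    def timer_event(self, now_ns):
        # Called once per ticker interval: emit the aggregate, then reset it.
        self.emitted.append((now_ns, self.count))
        self.count = 0


f = CountingFilter()
for _ in range(3):
    f.process_message({"Type": "apache.access"})  # hypothetical message
f.timer_event(60 * 10**9)
f.process_message({"Type": "apache.access"})
f.timer_event(120 * 10**9)
print(f.emitted)  # [(60000000000, 3), (120000000000, 1)]
```

Each emitted value covers only the messages seen between two consecutive timer calls.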


If you don't want to include data for the most recent interval because that 
data is still incomplete, there's nothing stopping you from holding it back. It 
would likely just take some extra logic (and maybe some copying) in your 
`timer_event` function.
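
One way that extra logic might look, sketched in plain Python rather than as a real Lua sandbox filter: bucket counts by the minute of each message's timestamp, and on each timer call emit only the minutes that ended before the current one. The `MinuteCounter` class and its `emitted` list are hypothetical, and the sketch assumes messages arrive roughly in timestamp order.

```python
from collections import defaultdict

NS_PER_MINUTE = 60 * 10**9


# Plain-Python model only; names here are illustrative, not Heka API.
class MinuteCounter:
    def __init__(self):
        self.counts = defaultdict(int)  # minute index -> page views
        self.emitted = []               # hypothetical stand-in for emitted data

    def process_message(self, timestamp_ns):
        # Aggregate into the bucket for the message's own minute.
        self.counts[timestamp_ns // NS_PER_MINUTE] += 1

    def timer_event(self, now_ns):
        current = now_ns // NS_PER_MINUTE
        # Emit only minutes strictly before the current one; the current
        # minute may still receive messages, so it is held back.
        for minute in sorted(m for m in self.counts if m < current):
            self.emitted.append((minute, self.counts.pop(minute)))


SEC = 10**9
c = MinuteCounter()
for ts in (10 * SEC, 50 * SEC, 70 * SEC):
    c.process_message(ts)
c.timer_event(90 * SEC)    # minute 1 is still open, so only minute 0 is emitted
c.process_message(100 * SEC)
c.timer_event(125 * SEC)   # minute 1 is now closed
print(c.emitted)           # [(0, 2), (1, 2)]
```

A message that arrives after its minute has already been emitted would start a fresh bucket for that minute and be emitted again later, so this only gives complete numbers if delivery is in order and not badly delayed.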

-r


On 12/10/2014 11:38 PM, 储晓颖(章邯) wrote:

Thanks a lot for the reply. I think my problem occurs while "periodically
emitting the circular buffer data which will show up as a graph":


Here is a slide from

http://slides.seld.be/?file=2013-12-13+Application+monitoring+with+Heka+and+statsd.html

I notice that the tail of the graph is falling. I guess it's the same
problem as mine: I don't want to emit the real-time data until it's
completely correct. The difficulty is: how does HEKA know when a "60 second
aggregation" is fully complete?

I think the key to the solution is periodic data collection. We must
collect the data at the source periodically, and run the aggregation task
periodically as well. Then we can emit the correct data when the whole
task is finished (like a real-time Hadoop MapReduce job, but the source is
not HDFS). If the data flows as a stream (using Storm, for example), we
cannot achieve this easily.



    ------------------------------------------------------------------
    From: Rob Miller <[email protected]>
    Sent: Thursday, 11 Dec 2014 01:41
    To: 储晓颖(章邯) <[email protected]>
    Cc: heka <[email protected]>
    Subject: Re: [heka] Some question about HEKA

    I'm not entirely sure what "the PV (of some minute) from an apache's
    log in some server" means (page views, probably), but the answer to
    your question in general is that you'd use a filter to perform any
    aggregation that you need.

    Heka exposes a circular buffer library in its Lua sandbox,
    specifically intended for handling time series data (see
    https://github.com/mozilla-services/lua_sandbox/blob/dev/docs/circular_buffer.md).
    To track by the minute, you'd initialize a cbuf with 60 seconds per
    row, adding values to the column in question (page views, in your
    case) as they come in, periodically emitting the circular buffer
    data which will show up as a graph on the dashboard or otherwise
    converted and processed as you see fit. The cbuf library also
    supports simple anomaly detection and alerting, if you want to do
    monitoring of the data.

    Heka ships with a filter that uses the cbuf library to track HTTP
    status codes that have been parsed out of a web server's logs; see
    https://hekad.readthedocs.org/en/latest/config/filters/index.html#http-status-graph
    (or
    https://github.com/mozilla-services/heka/blob/dev/sandbox/lua/filters/http_status.lua
    for source code).

    You don't have to use a circular buffer, of course; you can handle
    the aggregation yourself, and you can emit data in any format you
    desire, but then you lose the built in interoperability with the
    dashboard and the anomaly detection.

    Hope this helps,

    -r


    On 12/09/2014 09:21 AM, 储晓颖(章邯) wrote:
     > Hi All,
     > I am a software engineer. I recently learned about the brilliant HEKA
     > project, and I am wondering whether it has solved the problems that I
     > used to deal with. The most important one is consistency in term-based
     > (per-period) data calculation. For example, if I want to calculate the
     > PV (of some minute) from an apache's log in some server, I have to
     > feed the log's content into HEKA and wait for its output. Assuming the
     > minute is M, how does HEKA know when all of the log entries for M have
     > arrived and been included in the result? In my situation, I cannot
     > show the PV of M until it has been calculated completely. I used to
     > rely on a data-driven approach: if the first log entry of M+1 has
     > arrived and the log transfer is in sequence, I can release the data of
     > M. The solution becomes more complicated when merging the PV from
     > distributed Apache logs on many servers.
     > Should I still be concerned about this problem if I use HEKA? And how
     > does it handle it?
     >
     > Thanks a lot.
     > zhanghan
     >
     >
     > _______________________________________________
     > Heka mailing list
     > [email protected]
     > https://mail.mozilla.org/listinfo/heka
     >


_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
