On 08/05/2015 06:55 AM, Dieter Plaetinck wrote:
On Tue, 04 Aug 2015 11:49:38 -0700
Rob Miller <[email protected]> wrote:

>>> I have started work on a heka output and encoder plugin for kairosdb's
>>> rest endpoint.
>> Is a custom output really needed? Ideally you'd just use the standard
>> HttpOutput.

> i'll try it out. i presume it honors response codes, and when combined with
> the disk buffer, updates the cursor upon a 2xx response?

Close. Currently the cursor is updated when the output's `request` method
(https://github.com/mozilla-services/heka/blob/dev/plugins/http/http_output.go#L125)
doesn't return an error. An error is returned if the request can't be created,
or if the response status is >= 400. Not sure if this is optimal for all
cases; open to feedback.
>> There's currently no support for delaying the RabbitMQ ack until the
>> received message has been written to disk. All AMQP interactions happen in
>> the AMQPInput. Disk buffering, if it happens, doesn't occur until between
>> the message matcher and the filter or output plugin.

> oh, i seemed to recall acks could be postponed until further processing down
> the pipe. I thought this came up as a differentiator compared to logstash in
> one of the older videos on vimeo. but maybe i'm mistaken. or did this change?
It might have been mentioned as on the wish list, but it was never presented
as already in place. Current thinking is that coordinating between multiple
steps in the pipeline is not the best approach; instead, we can use disk
buffers in more places, so that each step in the pipeline only ever has one
message in memory at a time, like Hindsight.
>>> I couldn't find any config option to the disk buffer that controls how
>>> often/after how much data sync() is called.
>> There's no relationship between filter/output disk buffering and the
>> details of any particular input plugin.

> I was just curious how often the disk buffer syncs data to disk, what
> semantics it uses to decide when incoming data is safely stored on disk,
> and whether there are any knobs to control the behavior, like sync_every x
> ms or sync_after x messages.
Disk buffer code is here: 
https://github.com/mozilla-services/heka/blob/dev/pipeline/queue_buffer.go

Currently we're just writing every message that is fed into the disk buffer 
directly to the currently open file handle. No explicit syncing, leaving that 
to the OS.

-r
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
