On 01/10/2015 06:57 PM, Denis Shashkov wrote:

For this one case, though, an easier fix would be to have the ES output retry,
and/or, if it fails, to have the data that *would* have been sent to ES
written out to disk instead.

I agree with Rob that it's easier to patch the ES output and add a retry.

I've patched heka to add buffered output (like in the TCPOutput plugin). It
works perfectly, and we have simple monitoring of buffering and sending
performance. But... we completely lost the ability to send bulk POSTs to ES,
because BufferedOutput works only with single messages. Yes, we might use the
flush_* parameters, but it's pointless to keep messages in the preceding buffer
only to lose the whole bulk of them in the ES output because of network errors.

Interesting. I'm assuming you mean you've added buffering support just to the ES output, and not that you've made it generally available to *all* outputs. Am I correct?

BTW, Rob, did you consider in your feature plans how to handle this situation
(single-message processing in BufferedOutput vs. bulk-sending outputs)?

We haven't worked out the details of the new API yet, no. But adding simple batch support to the current BufferedOutput implementation would be pretty straightforward. Currently there's a `QueueRecord(*PipelinePack) error` method that serializes the pack to a byte slice (using the specified encoder), adds framing (if necessary), and writes the output bytes to the buffer. If you add alongside this a `QueueBytes([]byte) error` method that expects to receive an already serialized byte slice, then you can pass in an entire batch of encoded records and the buffer will treat it as a single record. Then SendRecord will be passed a batch at a time, and if it returns an error the entire batch will stay in the buffer for retry.

Note that, with this approach, QueueRecord would just end up encoding the pack and then calling QueueBytes. Also, we'd have to take care with nested framing. If a batch contains multiple records that are framed using the same Heka framing that the buffer uses to demarcate record boundaries, we might have issues.
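
One way to sidestep the nested framing worry, sketched below, is for the outer layer to record an explicit length instead of scanning for delimiters, so whatever framing bytes appear inside a batch are never interpreted. This is an assumption for illustration only, not how Heka's stream framing or the current buffer works; `frame` and `unframe` are hypothetical helpers using a 4-byte big-endian length prefix.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// frame prefixes payload with its length as a 4-byte big-endian integer.
func frame(payload []byte) []byte {
	out := make([]byte, 4+len(payload))
	binary.BigEndian.PutUint32(out[:4], uint32(len(payload)))
	copy(out[4:], payload)
	return out
}

// unframe reads one length-prefixed record from the front of data and
// returns its payload plus whatever bytes follow it.
func unframe(data []byte) (payload, rest []byte, err error) {
	if len(data) < 4 {
		return nil, nil, errors.New("short header")
	}
	n := int(binary.BigEndian.Uint32(data[:4]))
	if len(data)-4 < n {
		return nil, nil, errors.New("truncated record")
	}
	return data[4 : 4+n], data[4+n:], nil
}

func main() {
	// Two inner records, each already framed, concatenated into one batch.
	inner := append(frame([]byte("rec-1")), frame([]byte("rec-2"))...)

	// The batch is framed again as a single outer record, followed by
	// an unrelated record in the same stream.
	stream := append(frame(inner), frame([]byte("next"))...)

	batch, rest, err := unframe(stream)
	if err != nil {
		panic(err)
	}
	// The outer read returns the whole batch untouched; the inner
	// frames are only parsed if the consumer chooses to unframe again.
	first, _, err := unframe(batch)
	if err != nil {
		panic(err)
	}
	fmt.Printf("batch=%d bytes, first inner=%q, remaining stream=%d bytes\n",
		len(batch), first, len(rest))
}
```

Because the outer reader trusts the length and never looks inside the payload, inner frames can use any scheme at all, including the same one, without confusing record boundaries.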

That's how we'd get there with the current code. What we end up with in the long run is TBD, but it will probably be along similar lines, where an entire batch would be buffered as a single record.

Hope this helps,

-r
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka