This is true, but it's true no matter how the buffering is engineered. There might be messages sitting on the router's channel, or the output matcher's channel, or the output's channel, all of which would be lost if Heka crashes. I've imagined the buffering as protection against downtime and/or slowness of the downstream destination, not as a delivery guarantee mechanism. I wish I could tell you Heka guarantees message delivery, but it doesn't. I'd love to add support, but it would need to be considered within the context of the whole pipeline, not just a single point.
That being said, harm reduction is reasonable, and I'm not dead set against being able to buffer partial batches. But I still shy away from the idea of having a separate SendRecords function for bulk sending, if possible. Maybe a buffer that was set up for batching could accumulate individual records in a short-term buffer, adding them to the end of the send queue when a full batch is acquired? I'm open to further discussion.

Also, our message header contains a message length, so with care it should be possible to nest the framing; it's just that confusing the inner framing for the outer framing is a pitfall to beware of.

-r

On 01/11/2015 11:03 PM, Tiru Srikantha wrote:
The simple approach means you can lose messages if the Heka process dies before the ticker expires or the max count is hit, on top of the nested framing issues you raised. I'm probably mentally over-engineering this, though, and if it's good enough for you, then that's what you get. I'll send a PR with an implementation based on what you recommended; it should be fairly simple given those limitations. Thanks for the guidance.

On Sun, Jan 11, 2015 at 6:52 PM, Rob Miller <[email protected]> wrote:

On 01/11/2015 12:34 AM, Tiru Srikantha wrote:

Yeah, I'm looking at this as well. I don't like losing the bulk support, because that means it gets slow when your load gets high, due to HTTP overhead. If I didn't care about bulk it'd just be a quick rewrite.

The other problem I ran into around bulk operations on a buffered queue is that there are 4 points when you want to flush to the output:

1. Queued message count equals the count to send.
2. Queued message size with the new message added would exceed the max bulk message size, even if the count is less than the max, due to large messages.
3. A timer expires, forcing a flush to the output target even if the count or max size hasn't been hit yet, so the data reaches the output target in a timely manner.
4. The plugin is shutting down.

I'm actually rewriting a lot of the plugins/buffered_output.go file in my local fork because of this, teasing out the "read the next record from the file pile" operation from the "send the next record" operation. What I'm aiming for is something like a BulkBufferedOutput interface with a SendRecords(records [][]byte) method that must be implemented in lieu of BufferedOutput's SendRecord(record []byte), plus some code shared between them to buffer new messages and read buffered messages. SendRecords would not advance the cursor until the bulk operation succeeded, as you might expect.
I recommend reading my other message in this thread (http://is.gd/kzuhxs) for an alternate approach. I think you can achieve what you want with less effort, and better separation of concerns, by doing the batching *before* you pass data in to the BufferedOutput. Ultimately each buffer "record" is just a slice of bytes; the buffer doesn't need to know or care whether that slice contains a single serialized message or an accumulated batch of them.

I'll submit a PR once I finish.

I always look forward to PRs, but I'll warn you that one taking the approach described above will likely be rejected. I'm not very keen on introducing a separate buffering interface specifically for batches when a simpler change can solve the same problems.

-r
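The batch-before-the-buffer idea above can be sketched as follows. This is an illustration under assumptions, not Heka's implementation: `sendRecord` stands in for BufferedOutput's `SendRecord(record []byte)`, the `accumulate` helper is hypothetical, and the newline-joined serialization is a placeholder (e.g. an Elasticsearch-style bulk body), not Heka's wire format.

```go
package main

import (
	"bytes"
	"fmt"
)

// sent captures what the stand-in buffer was handed, for demonstration.
var sent [][]byte

// sendRecord stands in for BufferedOutput's SendRecord(record []byte).
// The buffer never learns whether the record is one message or a batch.
func sendRecord(record []byte) error {
	sent = append(sent, record)
	return nil
}

// accumulate batches *before* the buffer ever sees the data: serialized
// messages are joined into one opaque byte slice per batch, and each
// batch is handed to the buffer as a single record.
func accumulate(msgs [][]byte, batchSize int) error {
	var buf bytes.Buffer
	count := 0
	for _, m := range msgs {
		buf.Write(m)
		buf.WriteByte('\n') // placeholder separator, not Heka framing
		count++
		if count == batchSize {
			// Copy the bytes out, since buf will be reused.
			if err := sendRecord(append([]byte(nil), buf.Bytes()...)); err != nil {
				return err
			}
			buf.Reset()
			count = 0
		}
	}
	if count > 0 { // partial batch, e.g. on a timer tick or shutdown
		return sendRecord(append([]byte(nil), buf.Bytes()...))
	}
	return nil
}

func main() {
	msgs := [][]byte{[]byte("m1"), []byte("m2"), []byte("m3")}
	accumulate(msgs, 2)
	fmt.Println(len(sent)) // prints "2": one full batch, one partial
}
```

Because each batch is written to the queue as one record, the existing cursor semantics hold unchanged: the cursor only advances once the whole record, i.e. the whole batch, has been sent.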
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka

