This is true, but it's true no matter how the buffering is engineered. There might be messages sitting on the router's channel, or the output matcher's channel, or the output's channel, all of which would be lost if Heka crashes. I've imagined the buffering as protection against downtime and/or slowness of the downstream destination, not as a delivery guarantee mechanism. I wish I could tell you Heka guarantees message delivery, but it doesn't. I'd love to add support, but it would need to be considered within the context of the whole pipeline, not just a single point.
That being said, harm reduction is reasonable, and I'm not dead set against being able to buffer partial batches. But I still shy away from the idea of having a separate SendRecords function for bulk sending, if possible. Maybe a buffer that was set up for batching could accumulate individual records in a short-term buffer, adding them to the end of the send queue when a full batch is acquired? I'm open to further discussion.

Also, our message header contains a message length, so with care it should be possible to nest the framing; it's just that confusing the inner framing for the outer framing is a pitfall to beware of.

-r

On 01/11/2015 11:03 PM, Tiru Srikantha wrote:
The simple approach means you can lose messages if the Heka process dies before the ticker expires or the max count is hit, on top of the nested framing issues you raised. I'm probably mentally over-engineering this, though, and if it's good enough for you, then that's what you get. I'll send a PR with an implementation based on what you recommended; it should be fairly simple given those limitations. Thanks for the guidance.

On Sun, Jan 11, 2015 at 6:52 PM, Rob Miller <[email protected]> wrote:

On 01/11/2015 12:34 AM, Tiru Srikantha wrote:

Yeah, I'm looking at this as well. I don't like losing the bulk support, because that means it gets slow when your load gets high, due to HTTP overhead. If I didn't care about bulk it'd just be a quick rewrite.

The other problem I ran into around bulk operations on a buffered queue is that there are 4 points when you want to flush to the output:

1. Queued message count equals the count to send.
2. Queued message size with the new message added would exceed the max bulk message size, even if the count is less than the max, due to large messages.
3. A timer expires, forcing a flush to the output target even if the count or max size hasn't been hit yet, so the data reaches the output target in a timely manner.
4. The plugin is shutting down.

I'm actually rewriting a lot of the plugins/buffered_output.go file in my local fork because of this, teasing out the "read the next record from the file pile" operation from the "send the next record" operation. What I'm aiming for is something like a BulkBufferedOutput interface with a SendRecords(records [][]byte) method that must be implemented in lieu of BufferedOutput's SendRecord(record []byte), plus some code shared between them to buffer new messages and read buffered messages. SendRecords would not advance the cursor until the bulk operation succeeded, as you might expect.
I recommend reading my other message in this thread (http://is.gd/kzuhxs) for an alternate approach. I think you can achieve what you want with less effort, and better separation of concerns, by doing the batching *before* you pass data in to the BufferedOutput. Ultimately each buffer "record" is just a slice of bytes; the buffer doesn't need to know or care whether that slice contains a single serialized message or an accumulated batch of them.

I'll submit a PR once I finish.

I always look forward to PRs, but I'll warn you that one taking the approach described above will likely be rejected. I'm not very keen on introducing a separate buffering interface specifically for batches when a simpler change can solve the same problems.

-r
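The batch-before-the-buffer idea above can be sketched as follows. This is an illustration under assumptions, not Heka's implementation: `sendRecord` stands in for BufferedOutput's `SendRecord(record []byte)`, the `accumulate` helper is hypothetical, and the newline-joined serialization is a placeholder (e.g. an Elasticsearch-style bulk body), not Heka's wire format.

```go
package main

import (
	"bytes"
	"fmt"
)

// sent captures what the stand-in buffer was handed, for demonstration.
var sent [][]byte

// sendRecord stands in for BufferedOutput's SendRecord(record []byte).
// The buffer never learns whether the record is one message or a batch.
func sendRecord(record []byte) error {
	sent = append(sent, record)
	return nil
}

// accumulate batches *before* the buffer ever sees the data: serialized
// messages are joined into one opaque byte slice per batch, and each
// batch is handed to the buffer as a single record.
func accumulate(msgs [][]byte, batchSize int) error {
	var buf bytes.Buffer
	count := 0
	for _, m := range msgs {
		buf.Write(m)
		buf.WriteByte('\n') // placeholder separator, not Heka framing
		count++
		if count == batchSize {
			// Copy the bytes out, since buf will be reused.
			if err := sendRecord(append([]byte(nil), buf.Bytes()...)); err != nil {
				return err
			}
			buf.Reset()
			count = 0
		}
	}
	if count > 0 { // partial batch, e.g. on a timer tick or shutdown
		return sendRecord(append([]byte(nil), buf.Bytes()...))
	}
	return nil
}

func main() {
	msgs := [][]byte{[]byte("m1"), []byte("m2"), []byte("m3")}
	accumulate(msgs, 2)
	fmt.Println(len(sent)) // prints "2": one full batch, one partial
}
```

Because each batch is written to the queue as one record, the existing cursor semantics hold unchanged: the cursor only advances once the whole record, i.e. the whole batch, has been sent.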
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka

