On 10/15/2014 11:58 PM, Denis Shashkov wrote:

(Sorry, I hadn't noticed these issues when I browsed through them.)

Thank you, that's great news about buffering!

Glad you think so.

Have you considered making the Heka pipeline synchronous or transactional?

We've never considered making the pipeline synchronous or transactional, no. We have considered taking other methods to improve reliability, such as writing data to a disk queue in the input layer, and tracking what messages have been successfully processed by the entire pipeline (for some definition of "successfully processed"), so that we'd be able to re-process any messages not marked as such upon restart after a crash. We've also considered adding disk queues to more places in Heka's pipeline, so that any place that's processing messages has its own queue that could be reprocessed rather than thinking of the whole pipeline as a single entity. But, alas, our resources are limited, and tackling this particular issue isn't on our short list, beyond the output buffering that I already mentioned.
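To make the "disk queue in the input layer" idea concrete, here is a minimal sketch in Python. It is not Heka code (Heka is written in Go) and nothing like it exists in Heka today; the `InputJournal` class, its method names, and the JSON-lines file format are all hypothetical. It shows only the core idea: record every message on receipt, record when the pipeline reports it processed, and on restart replay whatever was received but never marked done.

```python
import json
import os
import tempfile


class InputJournal:
    """Hypothetical append-only journal of received and processed messages."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the record to disk before continuing

    def record_received(self, msg_id, payload):
        self._append({"op": "recv", "id": msg_id, "payload": payload})

    def record_processed(self, msg_id):
        self._append({"op": "done", "id": msg_id})

    def unprocessed(self):
        """Return (id, payload) pairs that were received but never marked done."""
        pending = {}
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                if rec["op"] == "recv":
                    pending[rec["id"]] = rec["payload"]
                else:
                    pending.pop(rec["id"], None)
        return list(pending.items())


# Three messages arrive, two make it through the pipeline, then we "crash".
path = os.path.join(tempfile.mkdtemp(), "input.journal")
journal = InputJournal(path)
journal.record_received(1, "a")
journal.record_received(2, "b")
journal.record_received(3, "c")
journal.record_processed(1)
journal.record_processed(3)

# After a restart, a fresh object reading the same file finds what to replay.
replay = InputJournal(path).unprocessed()
```

Replaying this way gives at-least-once behavior: a message that was fully processed just before the crash, but not yet marked done, would be processed again. The per-record `fsync` is also the kind of cost that makes this a trade-off against throughput.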

I mean something like this:
- if an output plugin cannot write a batch of messages (more than one) to
its destination, or is in the middle of writing, it just blocks

Currently our output disk buffer will continue to grow indefinitely. We have an issue open to improve this:

https://github.com/mozilla-services/heka/issues/1110

An obvious question this brings up is "what happens when the buffer grows to the maximum size?" I see three choices: shut down Heka, drop data on the floor, or stop pulling from the input channel, which will cause back pressure to be applied to the rest of the pipeline. The third choice is what you're describing, and once the hooks for that are in place it could be used whether or not a disk buffer was actually in play; without one, just think of it as the max buffer size being set to 0.

This would need to be configurable, obviously, b/c in many cases blocking the rest of Heka will not be desired behavior, but having this as a possibility is on our radar.
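The three choices could be modeled as a configurable policy on a bounded buffer. The following is a toy sketch in Python, not Heka's actual Go code; `BoundedBuffer`, `BufferFull`, and the `on_full` values are hypothetical names invented for illustration, and an in-memory deque stands in for the disk buffer.

```python
from collections import deque


class BufferFull(Exception):
    """Raised under the 'shutdown' policy when the buffer is at max size."""


class BoundedBuffer:
    """Toy output buffer with a configurable reaction to being full."""

    def __init__(self, max_size, on_full):
        if on_full not in ("shutdown", "drop", "block"):
            raise ValueError("unknown policy: %r" % (on_full,))
        self.max_size = max_size
        self.on_full = on_full
        self.items = deque()
        self.dropped = 0

    def offer(self, msg):
        """Try to queue msg; return True if the caller may keep reading input."""
        if len(self.items) < self.max_size:
            self.items.append(msg)
            return True
        if self.on_full == "shutdown":
            raise BufferFull("buffer reached its maximum size")
        if self.on_full == "drop":
            self.dropped += 1  # the message is discarded
            return True
        # "block": the message is not accepted, so the caller has to stop
        # pulling from its input channel until there is room again.
        return False


drop_buf = BoundedBuffer(2, "drop")
drop_results = [drop_buf.offer(m) for m in ("a", "b", "c")]

block_buf = BoundedBuffer(2, "block")
block_results = [block_buf.offer(m) for m in ("a", "b", "c")]
```

With the "drop" policy the third message is silently counted and discarded, while with "block" the caller is told to stop reading, which is where the back pressure on the rest of the pipeline would come from.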

- all tied modules (encoders, filters, pipeline, decoders, inputs) also
eventually block

It's a bit tricky to say exactly what defines a "tied module" here. One of the reasons Heka is so versatile is that the inputs, filters, and outputs are all loosely coupled, with the router and the message_matchers being the glue that ties things together. Currently, if any of the outputs or filters block, then that back pressure will flow to the router, which will in turn block *all* of the inputs, so the whole pipeline stops. This is already true; it's just not very obvious b/c hardly any of the outputs ever block, except when there are bugs, so the behavior doesn't show up very often.
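The way back pressure travels upstream can be illustrated with two bounded queues. This is a simplified single-threaded model in Python, not Heka's Go implementation: Python's `queue.Queue` stands in for a buffered channel, there is one output and no matchers or filters, the output never drains, and `CHAN_SIZE`, `router_in`, `output_in`, and `router_step` are hypothetical names.

```python
import queue

CHAN_SIZE = 3  # deliberately tiny stand-in for the channel depth

router_in = queue.Queue(maxsize=CHAN_SIZE)  # inputs deliver messages here
output_in = queue.Queue(maxsize=CHAN_SIZE)  # the stalled output never reads this


def router_step():
    """Move one message from the router to the output if the output has room."""
    if router_in.empty() or output_in.full():
        return False
    output_in.put_nowait(router_in.get_nowait())
    return True


accepted = 0
for i in range(100):
    try:
        router_in.put_nowait(i)  # a real input would block here instead
    except queue.Full:
        break
    accepted += 1
    router_step()
```

In this model the input gets exactly as many messages in as the two queues can hold, and then it is stuck, even though it never talks to the output directly.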

- while an input plugin is blocked, it doesn't acknowledge input data (e.g.
LogStreamer doesn't write its position to the journal, or HttpListen doesn't
respond).

This is already true in most cases. If the router is backed up, the input (or the decoder that an input is using) will eventually block on dropping messages on the router's channel, which will prevent the input from continuing to process incoming data. LogstreamerInput will stop reading from the input files, TcpInput will stop accepting data, etc. I haven't looked into what HttpListenInput will do; it's possible that it will continue to accept incoming HTTP requests, accumulating a growing set of goroutines, each blocked on the stuck router. I'd say that would be a bug, and each input should be looked at on a case-by-case basis to make sure that when Heka is backed up the failure modes are reasonable.

This mode could prevent message loss entirely. (But it might decrease
performance.)
Right now Heka isn't protected from crashes and the like. If the OOM killer
kills Heka, I'll lose at least 3 messages (because of the channels in the
pipeline plus the input and output plugins).

You'll lose many more than 3 in the default configuration. Every decoder has an in channel, the router has an in channel, every message matcher has an in channel, as well as every filter and output. By default each of these channels is 50 deep, so a busy Heka that crashes could actually be losing hundreds of messages. This channel size is configurable; you could even set it to 0 if you want unbuffered channels. We could probably stand to lower the default a bit. But once you get to a channel size of less than about 20, you might see a slight performance drop, and when you get down to very low numbers (less than 3), you'll see considerable loss of throughput as blocking increases significantly.
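As a back-of-the-envelope check on "hundreds of messages", the channel count described above can be turned into a small calculation. This is an illustrative Python sketch; the function name `max_in_flight` and the example plugin counts are made up, and the result is only an upper bound on messages sitting in channels (it ignores messages that plugins are holding while they work on them).

```python
def max_in_flight(decoders, matchers, filters, outputs, chan_size=50):
    """Upper bound on messages queued in channels when every channel is full."""
    channels = decoders + 1 + matchers + filters + outputs  # +1 for the router
    return channels * chan_size


# A hypothetical small deployment: 2 decoders, 1 filter, 2 outputs,
# and one message matcher per filter and output (3 in total).
default_bound = max_in_flight(decoders=2, matchers=3, filters=1, outputs=2)

# The same deployment with unbuffered channels.
unbuffered_bound = max_in_flight(
    decoders=2, matchers=3, filters=1, outputs=2, chan_size=0
)
```

For this made-up deployment that is 9 channels, so up to 450 queued messages at a depth of 50, and none with unbuffered channels.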

We've been very clear from the beginning that a) Heka isn't making any guarantees w.r.t. message delivery and b) if you absolutely *can't* afford to lose any data, you should use Heka in connection w/ additional tools and processes to make sure you're not relying on Heka itself to never drop anything, or you should maybe not use Heka at all. We'd love to support a much higher level of reliability, we have ideas about how to do so, and some of them we're planning to implement, but getting it to a rock-solid "we promise we won't ever lose a message" is not our highest priority, unfortunately. If any individuals or companies out there are interested in supporting such an undertaking, I'd be thrilled to work with them providing guidance to help make it happen, but that's all I can offer.

That being said, our experience actually using Heka is that it's generally pretty reliable, and message loss hasn't been a huge issue. And there are tons of use cases (most of them?) where a bit of loss is perfectly acceptable. We have Heka aggregators processing around 500 million messages per day; at that volume, and with what we're doing with our data, losing a few hundred messages here and there isn't a big deal. Every case is different, though, and we do plan on continuing to incrementally work towards becoming more and more reliable, being up front about our limitations along the way.

Hope this helps clarify,

-r


---- On Wed, 15 Oct 2014 22:58:07 +0700 Rob Miller wrote ----

    Nimi is right that the TcpOutput actually does buffer messages to disk.
    Originally that functionality was built directly in to the TcpOutput,
    but we later abstracted it out so it could be used by other output
    plugins.

    What hasn't been mentioned so far is that we have plans to change the
    interaction btn encoders and outputs and, as part of that, we plan on
    making the output buffering automatically available as a configuration
    option for *every* output. There are already a couple of issues open to
    capture this:

    https://github.com/mozilla-services/heka/issues/930

    and:

    https://github.com/mozilla-services/heka/issues/1103

    which contains this relevant comment:

    https://github.com/mozilla-services/heka/issues/1103#issuecomment-58548339


    This will be a fair amount of work. It is all intended to land before we
    release Heka 1.0, which is targeted for January 2015.

    -r

    On 10/15/2014 03:48 AM, Denis Shashkov wrote:
     >
     > Hello!
     >
     > AFAIK, Heka currently doesn't guarantee message delivery when an output
     > plugin can't write a message:
     > - output plugins don't buffer or retry write operations,
     > - there is no internal buffer between the pipeline and output plugins (I
     > know about channels, but they have finite length and live in
     > memory),
     > - you cannot write a buffering filter (because you cannot write all
     > messages back into the pipeline).
     >
     > (Please correct me if I'm wrong.)
     >
     > I've thought a lot about how not to lose messages if my storage (e.g. an
     > HTTP server) is unavailable. I decided it would be great if there were a
     > special output plugin: if other outputs cannot write messages, this
     > fallback output would write them instead.
     >
     > Can I do this without touching pipeline code?
     >





_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
