On 10/15/2014 11:58 PM, Denis Shashkov wrote:
> (Sorry, I hadn't notice this issues when I browsed all them)
> Thank you, it's a great news about buffering!
Glad you think so.
> Have you consider making heka pipeline synchronous or transactional?
We've never considered making the pipeline synchronous or transactional,
no. We have considered taking other methods to improve reliability, such
as writing data to a disk queue in the input layer, and tracking what
messages have been successfully processed by the entire pipeline (for
some definition of "successfully processed"), so that we'd be able to
re-process any messages not marked as such upon restart after a crash.
We've also considered adding disk queues to more places in Heka's
pipeline, so that any place that's processing messages has its own queue
that could be reprocessed rather than thinking of the whole pipeline as
a single entity. But, alas, our resources are limited, and tackling this
particular issue isn't on our short list, beyond the output buffering
that I already mentioned.
> I mean something like that:
> - if output plugin cannot write the message pack (more than one) outside
> or do writing, it just blocks
Currently our output disk buffer will continue to grow indefinitely. We
have an issue open to improve this:
https://github.com/mozilla-services/heka/issues/1110
An obvious question this brings up is "what happens when the buffer
grows to the maximum size?" I see three choices: shut down Heka, drop
data on the floor, or stop pulling from the input channel which will
cause back pressure to be applied to the rest of the pipeline. The third
choice is what you're describing, and once the hooks for that are in
place it could be used whether or not a disk buffer was actually in
play; just think of the no-buffer case as a max buffer size of 0.
This would need to be configurable, obviously, b/c in many cases
blocking the rest of Heka will not be desired behavior, but having this
as a possibility is on our radar.
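The three choices above could be sketched as a policy on a bounded buffer. This is purely illustrative (the type names and API are mine, not Heka's); the `Block` branch is the back-pressure behavior, and a max size of 0 models the no-disk-buffer case:

```go
package main

import (
	"errors"
	"fmt"
)

// OverflowPolicy is what a bounded output buffer does when full.
type OverflowPolicy int

const (
	Shutdown OverflowPolicy = iota // stop Heka entirely
	Drop                           // discard the new message
	Block                          // apply back pressure upstream
)

var ErrShutdown = errors.New("buffer full: shutting down")

// Buffer is a hypothetical bounded output buffer; max == 0 models the
// "no disk buffer, pure back pressure" case.
type Buffer struct {
	ch     chan string
	policy OverflowPolicy
}

func NewBuffer(max int, p OverflowPolicy) *Buffer {
	return &Buffer{ch: make(chan string, max), policy: p}
}

// Push reports whether the message was accepted.
func (b *Buffer) Push(msg string) (bool, error) {
	select {
	case b.ch <- msg:
		return true, nil
	default: // buffer is at max size; fall through to the policy
	}
	switch b.policy {
	case Shutdown:
		return false, ErrShutdown
	case Drop:
		return false, nil // dropped on the floor
	default: // Block: park until the output drains a slot --
		// this blocking is what propagates back pressure.
		b.ch <- msg
		return true, nil
	}
}

func main() {
	b := NewBuffer(1, Drop)
	ok1, _ := b.Push("a")
	ok2, _ := b.Push("b") // buffer full: silently dropped
	fmt.Println(ok1, ok2) // prints: true false
}
```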
> - all tied modules (encoders, filters, pipeline, decoders, inputs) also
> eventually blocks
It's a bit tricky to say exactly what defines a "tied module" here. One
of the reasons Heka is so versatile is that the inputs, filters, and
outputs are all loosely coupled, with the router and the
message_matchers being the glue that ties things together. Currently, if
any of the outputs or filters block, then that back pressure will flow
to the router, which will in turn block *all* of the inputs, so the
whole pipeline stops. This is already true, it's just not very obvious
b/c hardly any of the outputs ever block, except when there are bugs, so
the behavior doesn't show up very often.
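The mechanism is just Go's buffered-channel semantics: once a downstream channel fills, the sender's next send blocks, and that stall ripples backward. A tiny self-contained demo (channel depth shrunk to 2 for clarity; Heka's default is 50):

```go
package main

import "fmt"

func main() {
	// Stand-in for the router's in channel.
	routerIn := make(chan string, 2)

	// An input delivering messages while the router has stopped
	// draining. Once the buffer fills, a real input would block on
	// `routerIn <- msg` -- that blocking *is* the back pressure.
	delivered := 0
	for _, msg := range []string{"m1", "m2", "m3", "m4"} {
		select {
		case routerIn <- msg:
			delivered++
		default:
			// Channel full: here the demo skips, but a real
			// input's goroutine would park on the send,
			// stalling the whole pipeline behind it.
		}
	}
	fmt.Println(delivered, len(routerIn)) // prints: 2 2
}
```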
> - while input plugin is blocked, it doesn't acknowledge input data (e.g.
> LogStreamer doesn't write position to journal or HttpListen doesn't
> response).
This is already true in most cases. If the router is backed up, the
input (or the decoder that an input is using) will eventually block on
dropping messages on the router's channel, which will prevent the input
from continuing to process incoming data. LogstreamerInput will stop
reading from the input files, TcpInput will stop accepting data, etc. I
haven't looked into what HttpListenInput will do; it's possible that it
will continue to accept incoming HTTP requests, accumulating a growing
set of goroutines, each blocked on the stuck router. I'd say this is a
bug, and each input should be looked at on a case by case basis to make
sure that when Heka is backed up the failure modes are reasonable.
> This mode may prevent losing messages at all. (But it might decrease
> performance.)
> Now heka doesn't protected from crashes or something else. If OOM killer
> kills heka, I'll lose 3 messages at minimum (because of channels in the
> pipeline + input and output plugins).
You'll lose many more than 3 in the default configuration. Every decoder
has an in channel, the router has an in channel, every message matcher
has an in channel, as well as every filter and output. By default each
of these channels is 50 deep, so a busy Heka that crashes could actually
be losing hundreds of messages. This channel size is configurable, you
could even set it to 0 if you want unbuffered channels. We could
probably stand to lower the default a bit. But once you get to a channel
size of less than about 20, you might see a slight performance drop, and
when you get down to very low numbers (less than 3), you'll see
considerable loss of throughput as blocking increases significantly.
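To make the arithmetic concrete, here's a back-of-the-envelope estimate of messages in flight at crash time. The function and the plugin counts are illustrative (my own, not from Heka); the point is that the channel count multiplies quickly:

```go
package main

import "fmt"

// maxInFlight estimates how many messages could be lost in a crash:
// every decoder, message matcher, filter, and output has a buffered in
// channel, plus one for the router. Rough sketch only.
func maxInFlight(decoders, filters, outputs, chansize int) int {
	matchers := filters + outputs // one matcher per filter/output
	channels := decoders + 1 /* router */ + matchers + filters + outputs
	return channels * chansize
}

func main() {
	// e.g. 2 decoders, 3 filters, 2 outputs, default channel depth 50:
	// 13 channels * 50 = 650 messages potentially in flight.
	fmt.Println(maxInFlight(2, 3, 2, 50)) // prints 650
}
```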
We've been very clear from the beginning that a) Heka isn't making any
guarantees w.r.t. message delivery and b) if you absolutely *can't*
afford to lose any data, you should use Heka in connection w/ additional
tools and processes to make sure you're not relying on Heka itself to
never drop anything, or you should maybe not use Heka at all. We'd love
to support a much higher level of reliability, we have ideas about how
to do so, and some of them we're planning to implement, but getting it
to a rock-solid "we promise we won't ever lose a message" is not our
highest priority, unfortunately. If any individuals or companies out
there are interested in supporting such an undertaking, I'd be thrilled
to work with them providing guidance to help make it happen, but that's
all I can offer.
That being said, our experience actually using Heka is that it's
generally pretty reliable, and message loss hasn't been a huge issue.
And there are tons of use cases (most of them?) where a bit of loss is
perfectly acceptable. We have Heka aggregators processing around 500
million messages per day; at that volume, and with what we're doing with
our data, losing a few hundred messages here and there isn't a big deal.
Every case is different, though, and we do plan on continuing to
incrementally work towards becoming more and more reliable, being up
front about our limitations along the way.
Hope this helps clarify,
-r
---- On Wed, 15 Oct 2014 22:58:07 +0700 Rob Miller <[email protected]> wrote ----
Nimi is right that the TcpOutput actually does buffer messages to disk.
Originally that functionality was built directly in to the TcpOutput,
but we later abstracted it out so it could be used by other output
plugins.
What hasn't been mentioned so far is that we have plans to change the
interaction between encoders and outputs and, as part of that, we plan on
making the output buffering automatically available as a configuration
option for *every* output. There are already a couple of issues open to
capture this:
https://github.com/mozilla-services/heka/issues/930
and:
https://github.com/mozilla-services/heka/issues/1103
which contains this relevant comment:
https://github.com/mozilla-services/heka/issues/1103#issuecomment-58548339
This will be a fair amount of work. It is all intended to land before we
release Heka 1.0, which is targeted for January 2015.
-r
On 10/15/2014 03:48 AM, Denis Shashkov wrote:
>
> Hello!
>
> AFAIK, now heka doesn't guarantee message delivering in case of output
> plugin couldn't write message:
> - output plugins didn't buffer or re-try write operations,
> - there is no interior buffer between pipeline and output plugins (I
> know about channels, but they have finite length and they located in
> memory),
> - you cannot write a buffering filter (because you cannot write all
> messages back into pipeline).
>
> (Please, correct me if I wrong).
>
> I thought a lot about how not to lost messages if my storage (e.g. HTTP
> server) is unavailable. I decided it will be great if some output plugin
> will be special: if other outputs cannot write messages, this fallback
> output will write instead of them.
>
> Can I do this without touching pipeline code?
>
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka