On 10/15/2014 11:58 PM, Denis Shashkov wrote:
> (Sorry, I hadn't notice this issues when I browsed all them)
> Thank you, it's a great news about buffering!
Glad you think so.
> Have you consider making heka pipeline synchronous or transactional?
We've never considered making the pipeline synchronous or transactional,
no. We have considered taking other methods to improve reliability, such
as writing data to a disk queue in the input layer, and tracking what
messages have been successfully processed by the entire pipeline (for
some definition of "successfully processed"), so that we'd be able to
re-process any messages not marked as such upon restart after a crash.
We've also considered adding disk queues to more places in Heka's
pipeline, so that any place that's processing messages has its own queue
that could be reprocessed rather than thinking of the whole pipeline as
a single entity. But, alas, our resources are limited, and tackling this
particular issue isn't on our short list, beyond the output buffering
that I already mentioned.
> I mean something like that:
> - if output plugin cannot write the message pack (more than one) outside
> or do writing, it just blocks
Currently our output disk buffer will continue to grow indefinitely. We
have an issue open to improve this:
https://github.com/mozilla-services/heka/issues/1110
An obvious question this brings up is "what happens when the buffer
grows to the maximum size?" I see three choices: shut down Heka, drop
data on the floor, or stop pulling from the input channel which will
cause back pressure to be applied to the rest of the pipeline. The third
choice is what you're describing, and once the hooks for that are in
place it could be used whether or not a disk buffer was actually in
play; just think of the no-buffer case as a max buffer size of 0.
This would need to be configurable, obviously, b/c in many cases
blocking the rest of Heka will not be desired behavior, but having this
as a possibility is on our radar.
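The three choices above could be sketched as a policy on a bounded buffer. This is purely illustrative (the type names and API are mine, not Heka's); the `Block` branch is the back-pressure behavior, and a max size of 0 models the no-disk-buffer case:

```go
package main

import (
	"errors"
	"fmt"
)

// OverflowPolicy is what a bounded output buffer does when full.
type OverflowPolicy int

const (
	Shutdown OverflowPolicy = iota // stop Heka entirely
	Drop                           // discard the new message
	Block                          // apply back pressure upstream
)

var ErrShutdown = errors.New("buffer full: shutting down")

// Buffer is a hypothetical bounded output buffer; max == 0 models the
// "no disk buffer, pure back pressure" case.
type Buffer struct {
	ch     chan string
	policy OverflowPolicy
}

func NewBuffer(max int, p OverflowPolicy) *Buffer {
	return &Buffer{ch: make(chan string, max), policy: p}
}

// Push reports whether the message was accepted.
func (b *Buffer) Push(msg string) (bool, error) {
	select {
	case b.ch <- msg:
		return true, nil
	default: // buffer is at max size; fall through to the policy
	}
	switch b.policy {
	case Shutdown:
		return false, ErrShutdown
	case Drop:
		return false, nil // dropped on the floor
	default: // Block: park until the output drains a slot --
		// this blocking is what propagates back pressure.
		b.ch <- msg
		return true, nil
	}
}

func main() {
	b := NewBuffer(1, Drop)
	ok1, _ := b.Push("a")
	ok2, _ := b.Push("b") // buffer full: silently dropped
	fmt.Println(ok1, ok2) // prints: true false
}
```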
> - all tied modules (encoders, filters, pipeline, decoders, inputs) also
> eventually blocks
It's a bit tricky to say exactly what defines a "tied module" here. One
of the reasons Heka is so versatile is that the inputs, filters, and
outputs are all loosely coupled, with the router and the
message_matchers being the glue that ties things together. Currently, if
any of the outputs or filters block, then that back pressure will flow
to the router, which will in turn block *all* of the inputs, so the
whole pipeline stops. This is already true, it's just not very obvious
b/c hardly any of the outputs ever block, except when there are bugs, so
the behavior doesn't show up very often.
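The mechanism is just Go's buffered-channel semantics: once a downstream channel fills, the sender's next send blocks, and that stall ripples backward. A tiny self-contained demo (channel depth shrunk to 2 for clarity; Heka's default is 50):

```go
package main

import "fmt"

func main() {
	// Stand-in for the router's in channel.
	routerIn := make(chan string, 2)

	// An input delivering messages while the router has stopped
	// draining. Once the buffer fills, a real input would block on
	// `routerIn <- msg` -- that blocking *is* the back pressure.
	delivered := 0
	for _, msg := range []string{"m1", "m2", "m3", "m4"} {
		select {
		case routerIn <- msg:
			delivered++
		default:
			// Channel full: here the demo skips, but a real
			// input's goroutine would park on the send,
			// stalling the whole pipeline behind it.
		}
	}
	fmt.Println(delivered, len(routerIn)) // prints: 2 2
}
```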
> - while input plugin is blocked, it doesn't acknowledge input data (e.g.
> LogStreamer doesn't write position to journal or HttpListen doesn't
> response).
This is already true in most cases. If the router is backed up, the
input (or the decoder that an input is using) will eventually block on
dropping messages on the router's channel, which will prevent the input
from continuing to process incoming data. LogstreamerInput will stop
reading from the input files, TcpInput will stop accepting data, etc. I
haven't looked into what HttpListenInput will do; it's possible that it
will continue to accept incoming HTTP requests, accumulating a growing
set of goroutines, each blocked on the stuck router. I'd say this is a
bug, and each input should be looked at on a case by case basis to make
sure that when Heka is backed up the failure modes are reasonable.
> This mode may prevent losing messages at all. (But it might decrease
> performance.)
> Now heka doesn't protected from crashes or something else. If OOM killer
> kills heka, I'll lose 3 messages at minimum (because of channels in the
> pipeline + input and output plugins).
You'll lose many more than 3 in the default configuration. Every decoder
has an in channel, the router has an in channel, every message matcher
has an in channel, as well as every filter and output. By default each
of these channels is 50 deep, so a busy Heka that crashes could actually
be losing hundreds of messages. This channel size is configurable, you
could even set it to 0 if you want unbuffered channels. We could
probably stand to lower the default a bit. But once you get to a channel
size of less than about 20, you might see a slight performance drop, and
when you get down to very low numbers (less than 3), you'll see
considerable loss of throughput as blocking increases significantly.
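To make the arithmetic concrete, here's a back-of-the-envelope estimate of messages in flight at crash time. The function and the plugin counts are illustrative (my own, not from Heka); the point is that the channel count multiplies quickly:

```go
package main

import "fmt"

// maxInFlight estimates how many messages could be lost in a crash:
// every decoder, message matcher, filter, and output has a buffered in
// channel, plus one for the router. Rough sketch only.
func maxInFlight(decoders, filters, outputs, chansize int) int {
	matchers := filters + outputs // one matcher per filter/output
	channels := decoders + 1 /* router */ + matchers + filters + outputs
	return channels * chansize
}

func main() {
	// e.g. 2 decoders, 3 filters, 2 outputs, default channel depth 50:
	// 13 channels * 50 = 650 messages potentially in flight.
	fmt.Println(maxInFlight(2, 3, 2, 50)) // prints 650
}
```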
We've been very clear from the beginning that a) Heka isn't making any
guarantees w.r.t. message delivery and b) if you absolutely *can't*
afford to lose any data, you should use Heka in connection w/ additional
tools and processes to make sure you're not relying on Heka itself to
never drop anything, or you should maybe not use Heka at all. We'd love
to support a much higher level of reliability, we have ideas about how
to do so, and some of them we're planning to implement, but getting it
to a rock-solid "we promise we won't ever lose a message" is not our
highest priority, unfortunately. If any individuals or companies out
there are interested in supporting such an undertaking, I'd be thrilled
to work with them providing guidance to help make it happen, but that's
all I can offer.
That being said, our experience actually using Heka is that it's
generally pretty reliable, and message loss hasn't been a huge issue.
And there are tons of use cases (most of them?) where a bit of loss is
perfectly acceptable. We have Heka aggregators processing around 500
million messages per day; at that volume, and with what we're doing with
our data, losing a few hundred messages here and there isn't a big deal.
Every case is different, though, and we do plan on continuing to
incrementally work towards becoming more and more reliable, being up
front about our limitations along the way.
Hope this helps clarify,
-r
---- On Wed, 15 Oct 2014 22:58:07 +0700 Rob Miller <[email protected]> wrote ----
Nimi is right that the TcpOutput actually does buffer messages to disk.
Originally that functionality was built directly in to the TcpOutput,
but we later abstracted it out so it could be used by other output
plugins.
What hasn't been mentioned so far is that we have plans to change the
interaction between encoders and outputs and, as part of that, we plan on
making the output buffering automatically available as a configuration
option for *every* output. There are already a couple of issues open to
capture this:
https://github.com/mozilla-services/heka/issues/930
and:
https://github.com/mozilla-services/heka/issues/1103
which contains this relevant comment:
https://github.com/mozilla-services/heka/issues/1103#issuecomment-58548339
This will be a fair amount of work. It is all intended to land before we
release Heka 1.0, which is targeted for January 2015.
-r
On 10/15/2014 03:48 AM, Denis Shashkov wrote:
>
> Hello!
>
> AFAIK, now heka doesn't guarantee message delivering in case of output
> plugin couldn't write message:
> - output plugins didn't buffer or re-try write operations,
> - there is no interior buffer between pipeline and output plugins (I
> know about channels, but they have finite length and they located in
> memory),
> - you cannot write a buffering filter (because you cannot write all
> messages back into pipeline).
>
> (Please, correct me if I wrong).
>
> I thought a lot about how not to lost messages if my storage (e.g. HTTP
> server) is unavailable. I decided it will be great if some output plugin
> will be special: if other outputs cannot write messages, this fallback
> output will write instead of them.
>
> Can I do this without touching pipeline code?
>
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka