Great, thanks for reporting back. As you may have noticed in another thread on this list, there *is* a bug where the TcpOutput is not recovering from the queue filling up, whether in 'block' or 'drop' mode. I've fixed that and opened a PR with the fix (https://github.com/mozilla-services/heka/pull/1487), and am currently working on a similar fix for the ElasticSearchOutput. Both of these fixes will land in the 0.9.2 release, which is being pushed back to early next week so I can get these in.
-r

On 04/16/2015 03:19 PM, Giordano, J C. wrote:
Rob,

With respect to why the queue file has been filling up so quickly: this was due to my configuration, in which the LogstreamerInput responsible for harvesting Apache log entries was processing a total of 38 Apache logs. On startup, Heka was no doubt experiencing an initial surge of log entries, leading to full queue files. Once I configured the input to use just a single Apache log, Heka performed fine in more of a steady-state operation. Moreover, I've been able to confirm that the queue files are, indeed, being automatically deleted. In fact, I ran this configuration continuously for several hours with no issue.

FWIW, based on an earlier comment you made about reproducing an issue with the queue_full_action=block setting, I've done all my testing with queue_full_action=drop. I'll look forward to an update that addresses the 'block' issue not properly recovering.

Again, greatly appreciate your support! Thanks.

Chris

Thanks so much for this explanation.

On Apr 15, 2015, at 6:08 PM, Rob Miller wrote:

> I'll explain how things are supposed to work; hopefully that'll help
> provide some context.
>
> When buffering is in play, whether in a TcpOutput or an
> ElasticSearchOutput, *every* message goes through the buffer.
> Messages are received via the plugin's input channel and then (when
> things are flowing smoothly) immediately written to disk at the end
> of the buffer. Another goroutine is constantly pulling records from
> earlier in the buffer, where the cursor points, and trying to send
> them. If a send is successful, the queue cursor is advanced to the
> next record and the process repeats. If the send fails, the cursor
> doesn't advance, and the same record is retried until the send
> succeeds.
> When the sending goroutine clears out one queue file and moves on to
> the next one, it is supposed to advance the cursor to the next file
> and *delete* the file that was just finished. You didn't mention
> explicitly whether the queue files are being automatically deleted
> as they're drained. Are they?
>
> While this is happening, Heka keeps track of the size of the disk
> queue. When a message is added to the queue, the size increases;
> when a file is drained and deleted, the size goes down. This is all
> fine unless and until the size of the queue hits the specified
> maximum, at which point the behavior is determined by
> `queue_full_action`. The "shutdown" option is self-explanatory: Heka
> shuts down. The "drop" option means the intake goroutine just drops
> the message on the floor; messages keep flowing, but any that aren't
> added to the queue never will be. The "block" option means the
> plugin stops pulling from its input channel altogether. The channel
> backs up, eventually blocking the router, and traffic stops flowing
> through Heka until there's room for the queue to grow again.
>
> In both the "drop" and "block" cases, correct recovery depends on
> Heka being able to continue processing the buffer. As records are
> processed and queue files are drained, the files are deleted. This
> pushes the queue size below the maximum, which in turn means the
> intake goroutine can once again start appending to the end of the
> queue.
>
> If you delete queue files out from under the output, things will get
> weird. The output goroutine will probably get confused, because the
> file handle it's holding no longer points to a valid file. Also,
> Heka won't know to subtract that file's size from the queue size, so
> until you do a restart (which causes Heka to scan through the queue
> and recalculate its total size) the queue will always seem bigger
> than it actually is.
> One thing that strikes me is how quickly your queue is filling up.
> Why is that happening? The buffer is meant to let Heka survive small
> amounts of downtime or disconnection without losing any data, or to
> absorb short burst spikes. It's not magic; if data is continually
> coming in faster than it can go out, you're going to have a problem
> no matter what, and disk queuing will only delay the inevitable.
>
> Does this help at all?
>
> -r
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka

