Re: [heka] Any plans or thoughts on 1.0?

Rob Miller Fri, 24 Apr 2015 10:50:54 -0700

Thanks for asking, I'm definitely overdue to provide an update. While some of 
you might want to know the details of what's going on, I'm sure most of you 
don't, so I'll provide two versions of the update:


TL;DR VERSION:

1.0 will be released after generic buffering for all output plugins has landed 
and had some time to burn in, so we feel comfortable that it's stable. 
Unfortunately, performance issues related to the current filter and 
output plugin APIs have caused the implementation of generic buffering to be a 
*lot* more work than expected. That work is happening alongside lots of other 
stuff that needs to get done, so it has taken (and will continue to take) a 
while to be completed.

EXCRUCIATING DETAIL VERSION:

There is no current schedule for a 1.0 release. Originally we were aiming to 
have one out by now, but other work (including the work of actually *using* 
Heka inside of Mozilla) has gotten in the way. Buffering for all outputs is 
the last major change on deck before 1.0, but it's a big enough change that 
I'm planning to put out a 0.10 release so it gets some break-in time before 
bumping the version to 1.0.

A great deal of work has been done on the buffering front, but unfortunately 
it's ballooned into a bigger job than expected, so there's still a lot more to 
do. We've put it together so that the buffering happens directly off of the 
router. Whenever a message_matcher match is found, Heka will check whether the 
plugin is using buffering. If not, the message will be placed on the plugin's 
input channel, as before. If so, the message will instead be placed in the 
plugin's disk queue. For each buffered plugin, a separate 
goroutine will run to pull messages off of the queue and feed them to the 
plugin.
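
To make the routing decision concrete, here is a minimal sketch in Go. It is 
not Heka's actual code: `Message`, `diskQueue`, `pluginRunner`, and `route` are 
hypothetical names, and the "disk queue" is just an in-memory slice standing in 
for the real on-disk structure.

```go
package main

import "fmt"

// Message is a hypothetical stand-in for a Heka message.
type Message struct{ Payload string }

// diskQueue is an in-memory placeholder for a plugin's on-disk queue.
type diskQueue struct{ records []Message }

// Append adds a record to the end of the queue.
func (q *diskQueue) Append(m Message) { q.records = append(q.records, m) }

// pluginRunner holds what the router needs to know about one plugin.
// All fields are assumptions made for this illustration.
type pluginRunner struct {
	matches  func(Message) bool // stand-in for the message_matcher
	buffered bool               // whether the plugin uses buffering
	inChan   chan Message       // used when buffering is off
	queue    *diskQueue         // used when buffering is on
}

// route hands msg to every plugin whose matcher accepts it: buffered
// plugins get it in their queue, unbuffered ones on their input channel.
func route(msg Message, runners []*pluginRunner) {
	for _, r := range runners {
		if !r.matches(msg) {
			continue
		}
		if r.buffered {
			r.queue.Append(msg)
		} else {
			r.inChan <- msg
		}
	}
}

func main() {
	all := func(Message) bool { return true }
	direct := &pluginRunner{matches: all, inChan: make(chan Message, 1)}
	buffered := &pluginRunner{matches: all, buffered: true, queue: &diskQueue{}}

	route(Message{Payload: "hello"}, []*pluginRunner{direct, buffered})
	fmt.Println("on channel:", len(direct.inChan), "in queue:", len(buffered.queue.records))
}
```

The per-plugin goroutine that drains the queue is left out here; it is the 
part that needs to know whether delivery succeeded.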

So far things are pretty straightforward, but here's where they take a turn. In 
order to support retries, Heka needs to know whether or not a message was 
successfully delivered. Not only that, it needs to know synchronously, before 
it advances to the next record in the disk queue. Unfortunately, Heka's current 
design makes this difficult: each output runs in its own goroutine and expects 
to get all of its messages from the input channel. This means that telling Heka 
whether or not the message delivery was successful involves bi-directional 
synchronization between the goroutine that is pulling from the disk queue and 
the output plugin's main goroutine.
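
The shape of that bi-directional handshake can be sketched roughly as follows. 
This is an illustration only, not the scheme actually implemented in Heka: the 
`delivery` type, `runOutput`, `feed`, and `failOnce` are all made-up names, the 
queue is a plain slice, and retries are unbounded for simplicity.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical stand-in for a Heka message.
type Message struct{ Payload string }

// delivery pairs a message with the channel on which the output
// reports whether it was delivered.
type delivery struct {
	msg    Message
	result chan error
}

// runOutput plays the role of an output plugin's main goroutine: it
// receives messages over a channel and reports each result back.
func runOutput(in <-chan delivery, send func(Message) error) {
	for d := range in {
		d.result <- send(d.msg)
	}
}

// feed plays the role of the goroutine pulling from the disk queue. It
// blocks on the result of every send and only advances the cursor on
// success, so each record costs two channel operations. It returns the
// total number of delivery attempts.
func feed(queue []Message, in chan<- delivery) int {
	result := make(chan error)
	attempts := 0
	for cursor := 0; cursor < len(queue); {
		in <- delivery{msg: queue[cursor], result: result}
		attempts++
		if err := <-result; err == nil {
			cursor++
		}
	}
	return attempts
}

// failOnce returns a send function that fails the first attempt for
// each distinct payload, to exercise the retry path.
func failOnce() func(Message) error {
	seen := map[string]bool{}
	return func(m Message) error {
		if !seen[m.Payload] {
			seen[m.Payload] = true
			return errors.New("temporary failure")
		}
		return nil
	}
}

func main() {
	in := make(chan delivery)
	go runOutput(in, failOnce())
	attempts := feed([]Message{{Payload: "a"}, {Payload: "b"}}, in)
	close(in)
	fmt.Println("attempts:", attempts)
}
```

Every record has to cross between the two goroutines and back before the 
feeder can move on.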

I've come up with a way to get this all working while making minimal changes to 
existing plugin code, but the cross-goroutine synchronization slows everything 
down. With this scheme, I'm seeing TcpOutput throughput of about 50% of what we 
get with the existing, entirely-inside-the-plugin disk buffering. That's a 
non-starter. We can get rid of the problem by getting rid of the separate 
goroutines, but that involves a complete overhaul of the output plugin API so 
that the Go API looks more like the existing sandbox API. That means plugin 
authors would implement `ProcessMessage` and `TimerEvent` methods instead of 
receiving messages and ticker notifications over channels like they do now.
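
As a rough sketch of what a method-based API could look like: only the method 
names `ProcessMessage` and `TimerEvent` come from the description above. The 
signatures, the `Output` interface, the `drain` helper, and the toy 
`collectOutput` plugin are assumptions for illustration, since the details of 
the new API haven't been worked out yet.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical stand-in for a Heka message.
type Message struct{ Payload string }

// Output is an assumed shape for a method-based plugin interface.
type Output interface {
	ProcessMessage(msg Message) error // called once per message
	TimerEvent() error                // called on each ticker interval
}

// drain feeds queued records to the plugin from the caller's own
// goroutine. The return value of ProcessMessage is the delivery result,
// so no cross-goroutine handshake is needed. It stops at the first
// failure and returns the index of the next undelivered record.
func drain(queue []Message, out Output) (int, error) {
	for i, msg := range queue {
		if err := out.ProcessMessage(msg); err != nil {
			return i, err
		}
	}
	return len(queue), nil
}

// collectOutput is a toy plugin that batches payloads and flushes them
// when the timer fires.
type collectOutput struct {
	pending []string
	flushed []string
}

func (o *collectOutput) ProcessMessage(msg Message) error {
	if msg.Payload == "" {
		return errors.New("empty payload")
	}
	o.pending = append(o.pending, msg.Payload)
	return nil
}

func (o *collectOutput) TimerEvent() error {
	o.flushed = append(o.flushed, o.pending...)
	o.pending = nil
	return nil
}

func main() {
	out := &collectOutput{}
	next, err := drain([]Message{{Payload: "a"}, {Payload: ""}, {Payload: "c"}}, out)
	fmt.Println("next undelivered index:", next, "error:", err)
	_ = out.TimerEvent()
	fmt.Println("flushed:", out.flushed)
}
```

Because the caller invokes the plugin directly, it knows the outcome before 
touching the next record, and a failed record can simply be retried from the 
returned index.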

Needless to say, rewriting every output (and filter, since with this approach 
they'll support buffering too) to use a fundamentally different API is a lot of 
work. I'm not excited about that, but I'm not sure how else to approach it. My 
current thinking is that we'll let both of the APIs exist side by side for a 
while. This would mean existing plugins could support buffering, albeit with a 
performance hit, with only trivial code changes. Much improved performance 
would be possible, but it would require reimplementing the plugin using the 
newer API. (Note that the performance penalty only applies when buffering is 
used... the unbuffered performance would match what we have now regardless of 
which API was used.)

So my next steps are to hammer out the details of the new API, implement 
support for it in the Heka core, and update the TcpOutput to support it. I'll 
keep plugging away at it, but that work is happening alongside a number of 
other important initiatives, so it will likely take a while longer.

Hope this answers your questions. Ideas, comments, feedback welcome as always.

-r

On 04/23/2015 10:52 AM, Tom Davis wrote:

The recent 0.9.2 release prompted me to wonder again about what might be planned
for 1.0. The milestone on GitHub seems quite outdated, aside from the generic
buffering for output plugins (which sounds great). Rob, Mike, do you guys have
any big goals for 1.0? Any timeline?

(I don't personally have a burning desire for a particular number, it's just
commonly seen as the time when most of the big, breaking stuff has been
finished and I'm wondering what's next for Heka)

Thanks!
_______________________________________________
Heka mailing list
Heka@mozilla.org
https://mail.mozilla.org/listinfo/heka
