Re: [heka] How to handle multiple types of log data from same input

Tom Davis Mon, 06 Apr 2015 14:59:29 -0700

Rob Miller writes:

> On 04/06/2015 11:49 AM, Tom Davis wrote:
>> Rob Miller writes:
>> >
>> > We've considered adding some sort of routing to the MultiDecoder, which 
>> > would allow you to look at input data and decide which decoder should 
>> > receive it based on arbitrary conditions, but that's not yet in place.
>>
>> This would be really cool for my use case, too. Currently all of my
>> decoders in the MultiDecoder chain are implemented in Go, so I'm not as
>> worried about the performance implications (and load is quite low), but
>> more advanced routing would help because there are dependencies between
>> the decoders. If the first fails, trying any others is pointless because
>> the message is already missing vital data; however, since
>> cascade_strategy=all has no way to short-circuit, it is up to the
>> subsequent decoders to identify the failure.


> What you're describing here is pretty much exactly why we recommend doing
> this sort of thing entirely in Lua instead of in Go. Trying to express
> complex conditional relationships using a declarative config format like
> TOML is never going to be fun. A Turing-complete language is a much better
> choice. If you implemented your decoding logic in Lua instead of Go, it
> would be easy to write a small amount of glue code to make sure that the
> decoding is handled correctly, and that the correct actions are taken if a
> requisite step along the way fails. Having us continue to add more and more
> complexity to the MultiDecoder as more and more complicated use cases
> surface seems like a losing battle.
>

Yeah, I definitely see your point. I started off wanting to do the decoding
in Lua, but for various boring reasons it would have created a bunch more
work than doing it in Go. At the moment the logic is simple enough that I'm
willing to deal with making a couple of assertions as data goes through the
multi-decoder.
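For readers following along, here is a minimal Go sketch of the short-circuit behaviour discussed above: run the decoding steps in order and stop at the first failure, instead of letting every later step rediscover the failure as it must under cascade_strategy=all. The Message, Decoder, stepDecoder, decodeChain, and newChain names are hypothetical and are not Heka's plugin API; in Heka itself this glue would live in a Lua sandbox decoder (as Rob suggests) or inside a single custom Go decoder.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical stand-in for a pipeline message (payload plus
// metadata fields). It is NOT Heka's message type.
type Message struct {
	Payload string
	Fields  map[string]string
}

// Decoder is a hypothetical single decoding step.
type Decoder interface {
	Name() string
	Decode(msg *Message) error
}

// stepDecoder adapts a named function to the Decoder interface.
type stepDecoder struct {
	name string
	fn   func(*Message) error
}

func (d stepDecoder) Name() string              { return d.name }
func (d stepDecoder) Decode(msg *Message) error { return d.fn(msg) }

// decodeChain runs decoders in order and stops at the first failure, so
// later steps never see a message that is missing data they depend on.
func decodeChain(msg *Message, decoders []Decoder) error {
	for _, d := range decoders {
		if err := d.Decode(msg); err != nil {
			return fmt.Errorf("decoder %q failed: %w", d.Name(), err)
		}
	}
	return nil
}

// newChain builds a hypothetical two-step chain in which the second step
// depends on a field set by the first.
func newChain() []Decoder {
	parseURL := stepDecoder{"parse_url", func(m *Message) error {
		if m.Payload == "" {
			return errors.New("empty payload")
		}
		m.Fields["url"] = m.Payload
		return nil
	}}
	fetchSource := stepDecoder{"fetch_source", func(m *Message) error {
		if _, ok := m.Fields["url"]; !ok {
			return errors.New("missing url field")
		}
		m.Fields["fetched"] = "true" // a real step would download here
		return nil
	}}
	return []Decoder{parseURL, fetchSource}
}

func main() {
	chain := newChain()

	good := &Message{Payload: "https://example.com/pkg.tar.gz", Fields: map[string]string{}}
	err := decodeChain(good, chain)
	fmt.Println(err, good.Fields["fetched"]) // <nil> true

	bad := &Message{Fields: map[string]string{}}
	fmt.Println(decodeChain(bad, chain)) // decoder "parse_url" failed: empty payload
}
```

Because the chain returns on the first error, fetch_source never runs for the empty message, so it needs no assertion of its own about whether parse_url succeeded.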

>> Then again, I'm using Heka pretty outside its core use case of processing log
>> and analytics data.
> I wouldn't say those are Heka's core use cases; they're just the paths that
> have been most traveled so far. Ideally Heka is useful in any situation
> where you want to collect and ship data, possibly transforming it along the
> way.
>
> I'm curious what your use cases are... mind sharing?

I'm using Heka as a generic pipeline for source code packaging. For
instance, one input is a URL that gets "decoded" to source code (tarballs
are downloaded and extracted, git repos cloned, etc.). The source is
"filtered" in Lua based on matches (name, source language, whatever); this
may involve invoking external processes, adding new metadata files to the
source tree, etc. When everything is in order, the new tree is "encoded" in
some way (packaged as RPM, Docker container, etc.) and output.

For obvious reasons I'm not pushing the raw bytes of arbitrary source trees
through messages, so a lot of the transforming happens on the file system.
The messages carry the metadata that filters need in order to match and/or
decide what to do. If a particular project needs custom logic (and they
often do), I can easily load custom-built filters through the sandbox
manager. Messages are also an easy way to communicate indirectly between
parts of the pipeline. An abuse, perhaps, but there's no sense bolting on
more channels.

My steps don't always map super cleanly to parts of the Heka pipeline, but
this beats writing and instrumenting my own scriptable state machine. Thanks
for the interest!

Cheers,

Tom

>
>
> -r
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
