On Mon, Apr 6, 2015 at 1:31 PM Rob Miller <[email protected]> wrote:
> This would work, and may be a way to get started, but it is suboptimal
> for a few reasons:
>
> * PayloadRegexDecoder is a convenient way to get started for folks who
> are unfamiliar with using grammars, but it is generally slower, less
> flexible, and less composable / reusable than LPEG. I think the time
> spent writing regular expressions to parse your logs would be better
> spent learning to use grammars.
>
> * MultiDecoder only supports running all of the registered decoders in
> sequence (not at all suitable for this use case) or cascading through
> them all such that the first successful decoder wins. The latter choice
> can be made to work, but clearly it's pretty inefficient when there are
> more than 2 or 3 decoders to choose from.
>
> * Due to the memory copies required when transferring data between Go
> and C, there is a small performance cost whenever you cross a sandbox
> boundary. This is small enough to still allow for reasonably good
> throughput in most cases, but if you have a MultiDecoder chaining
> multiple SandboxDecoders together you'll end up crossing that boundary
> many times in rapid succession, which certainly will burn cycles
> unnecessarily, and might slow things down more than is acceptable.
>
> We've considered adding some sort of routing to the MultiDecoder, which
> would allow you to look at input data and decide which decoder should
> receive it based on arbitrary conditions, but that's not yet in place.
>
> The best solution for this right now would be to do all of the work in
> a single SandboxDecoder. If you look at the various sandbox-based
> decoders that Heka provides, you'll see that most of the heavy lifting
> isn't done in the decoder code itself, but is delegated to Lua modules
> that we provide. Similarly, custom grammars can be added to an existing
> Heka installation as Lua modules.
> That way the main decoder code could use `read_message` calls to
> examine the input data, decide what type of message has been received,
> and invoke the appropriate parsing grammar for each one.
>
> Whether it's worth it to you to set this up probably depends on the
> amount of data you need to process. If the MultiDecoder solution works,
> great, but keep in mind that if you start to need more throughput you
> can evolve your system to meet the need.
>
> Hope this helps!

Yep! Helps a lot. Thanks, Rob (and Tom). I'll be working on Lua tomorrow.

Cheers,
Ali
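
[Editor's note: a rough sketch of the single-SandboxDecoder approach Rob
describes, assuming Heka's Lua sandbox API (`process_message`,
`read_message`, `inject_message`) and the `lpeg` module. The two grammars,
the input formats they match, and the message `Type` values are invented
purely for illustration:]

```lua
-- Hypothetical single SandboxDecoder: inspect the payload, then dispatch
-- to the appropriate LPEG grammar instead of cascading through a
-- MultiDecoder. Grammar names and formats here are made up for the example.
local l = require "lpeg"
l.locale(l)

-- Grammar 1: "key=value key2=value2 ..." lines, folded into a table.
local key        = l.C((l.alnum + "_")^1)
local value      = l.C((1 - l.space)^1)
local pair       = l.Cg(key * "=" * value)
local kv_grammar = l.Cf(l.Ct("") * pair * (l.space^1 * pair)^0, rawset)

-- Grammar 2: "LEVEL: free-form message" lines.
local level_grammar = l.Ct(
    l.Cg(l.upper^1, "level") * ": " * l.Cg(l.P(1)^1, "msg"))

local msg = { Type = nil, Fields = nil }

function process_message()
    local payload = read_message("Payload")
    local fields

    -- This is the "routing" step: look at the input once, pick a grammar,
    -- and stay inside a single sandbox the whole time.
    if payload:match("^%u+:") then
        fields = level_grammar:match(payload)
        msg.Type = "level.line"
    else
        fields = kv_grammar:match(payload)
        msg.Type = "kv.line"
    end

    if not fields then return -1 end  -- no grammar matched; fail the message
    msg.Fields = fields
    inject_message(msg)
    return 0
end
```

In a larger decoder the grammars themselves would live in separate Lua
modules (as Rob suggests), with only the dispatch logic in the decoder
itself.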
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka

