Re: [heka] How to handle multiple types of log data from same input

Rob Miller Mon, 06 Apr 2015 11:31:38 -0700

This would work, and may be a way to get started, but it is suboptimal for a 
few reasons:


* PayloadRegexDecoder is a convenient way to get started for folks who are 
unfamiliar with using grammars, but it is generally slower, less flexible, and 
less composable / reusable than LPEG. I think the time spent writing regular 
expressions to parse your logs would be better spent learning to use grammars.

* MultiDecoder only supports running all of the registered decoders in sequence 
(not at all suitable for this use case) or cascading through them all such that 
the first successful decoder wins. The latter choice can be made to work, but 
clearly it's pretty inefficient when there are more than 2 or 3 decoders to 
choose from.

* Due to the memory copies required when transferring data between Go and C, 
there is a small performance cost whenever you cross a sandbox boundary. This 
is small enough to still allow for reasonably good throughput in most cases, 
but if you have a MultiDecoder chaining multiple SandboxDecoders together 
you'll end up crossing that boundary many times in rapid succession, which will 
certainly burn cycles unnecessarily, and might slow things down more than is 
acceptable.
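
To make the cascading cost concrete, here is a rough sketch in Python. It is 
not Heka code (real decoders are Go plugins or Lua sandboxes), and the decoder 
names and patterns are made up for illustration. It only counts how many 
decoders have to run before one succeeds:

```python
import re

# Hypothetical stand-ins for three registered decoders. The patterns are
# illustrative only and are not taken from any real Heka configuration.
DECODERS = [
    ("syslog", re.compile(r"^<\d+>")),
    ("apache", re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "')),
    ("tomcat", re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")),
]


def cascade(payload):
    """Try each decoder in order; the first one that matches wins.

    Returns (decoder_name, attempts). decoder_name is None when no
    decoder matched, in which case every decoder was attempted.
    """
    attempts = 0
    for name, pattern in DECODERS:
        attempts += 1
        if pattern.search(payload):
            return name, attempts
    return None, attempts
```

A message that only the last decoder understands pays for every failed attempt 
before it, and if each of those were a SandboxDecoder, each attempt would also 
be a boundary crossing.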

We've considered adding some sort of routing to the MultiDecoder, which would 
allow you to look at input data and decide which decoder should receive it 
based on arbitrary conditions, but that's not yet in place.

The best solution for this right now would be to do all of the work in a single 
SandboxDecoder. If you look at the various sandbox-based decoders that Heka 
provides, you'll see that most of the heavy lifting isn't done in the decoder 
code itself, but is delegated to Lua modules that we provide. Similarly, custom 
grammars can be added to an existing Heka installation as Lua modules. That way 
the main decoder code could use `read_message` calls to examine the input data, 
decide what type of message has been received, and invoke the appropriate 
parsing grammar for each one.
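
The shape of that single-decoder approach might look something like the 
following. This is a Python sketch of the control flow only, not sandbox code: 
in Heka the decoder would be Lua, the parsers would be LPEG grammars, and the 
input would come from `read_message`. The `classify` checks, the parser names, 
and the regexes are all assumptions made for the example:

```python
import re

# Illustrative patterns, one per log type. Regexes are used here only to
# keep the sketch short; they stand in for reusable grammar modules.
SYSLOG = re.compile(r"^<(?P<pri>\d+)>(?P<rest>.*)$")
APACHE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3})'
)


def parse_syslog(payload):
    m = SYSLOG.match(payload)
    return {"type": "syslog", **m.groupdict()} if m else None


def parse_apache(payload):
    m = APACHE.match(payload)
    return {"type": "apache", **m.groupdict()} if m else None


# Hypothetical routing table: message type -> parser.
PARSERS = {"syslog": parse_syslog, "apache": parse_apache}


def classify(payload):
    """Cheap checks, assumed to be enough to tell these formats apart."""
    if payload.startswith("<"):
        return "syslog"
    if '] "' in payload:
        return "apache"
    return None


def decode(payload):
    """Pick one parser up front and run only that one."""
    parser = PARSERS.get(classify(payload))
    return parser(payload) if parser else None
```

Each message runs through exactly one parser, so adding more log types grows 
the routing table without adding failed parse attempts.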

Whether it's worth it to you to set this up probably depends on the amount of 
data you need to process. If the MultiDecoder solution works, great, but keep 
in mind that if you start to need more throughput, you can evolve your system 
to meet the need.

Hope this helps!

-r


On 04/06/2015 09:56 AM, Ali wrote:

Ah-ha!  Should I use a combination of MultiDecoder and
PayloadRegexDecoder (for custom formats)?  And just assign the
MultiDecoder to the TcpInput?

-Ali

On Mon, Apr 6, 2015 at 11:49 AM Ali wrote:

    Morning, all!

    I'm trying out nxlog on remote hosts and having nxlog send logs to
    my Heka host's TcpInput.  However, I'm starting to add multiple
    types of log data (syslog files, Apache logs, Tomcat logs) to the
    nxlog forwarder and I'm wondering how best to handle this.  Should I
    configure Heka to use a single TcpInput for all of these different
    message types?  Should I configure a separate TcpInput for each
    distinct message type?  Something else?

    TIA,
    Ali



_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka


