Re: [heka] Need advice on choose of best tools

Rob Miller Mon, 16 Nov 2015 13:08:12 -0800

On 11/16/2015 12:31 PM, web user wrote:

Thanks again for the quick reply:


    Gob is not supported. Heka's native, most efficient serialization
    mechanism is protocol buffers. The simplest way to achieve what you
    want is to use TcpOutputs (with `use_framing` set to true and a
    ProtobufEncoder, which are the TcpOutput default settings) on the
    edge nodes, and a TcpInput (with a HekaFramingSplitter and a
    ProtobufDecoder, which again are the defaults) on the aggregator. If
    you'd like a more robust transport, you could consider switching to
    AMQP, Kafka, or NSQ, but those of course require running an
    additional service.


Protocol buffers should be fine for now, and TcpInput would also work.
What happens if the network is down for a day and the log files have
rotated:

syslog.log.1.gz
syslog.log.2.gz
syslog.log.3.gz
syslog.log.4.gz
syslog.log

Is the Heka agent smart enough to figure out where it left off and push
the new messages out?

There are two parts to this. First there's the LogstreamerInput parsing
side. If the Heka that's loading the log files goes down, and the files
rotate underneath Heka while it's down, Heka should notice this, scan
through the files to find the actual place it left off, and pick up from
there.

The second part is the TcpOutput part, which does actually run all
output messages through a disk buffer, maintaining a checkpoint into the
buffer to know which messages have been delivered. If the connection is
broken, messages will accumulate in the buffer until the connection
comes back, at which point it will pick up where it left off and start
draining the buffer.

It's important to realize that these two parts are not tightly coupled.
The LogstreamerInput keeps track of where it is in the log stream. The
TcpOutput keeps track of the messages that it has sent. The TcpOutput
knows nothing about the log stream... it neither knows nor cares much
about where the messages that it's sending came from. If the TCP
connection goes down, the LogstreamerInput will continue processing the
log files as they're written, but the TcpOutput will be buffering the
messages until the uplink comes back.
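
A minimal sketch of an edge-node config along these lines, to show that
the two plugins are configured independently of each other. The section
name, log directory, file_match pattern, and aggregator address are
placeholders that do not come from this thread, and the option names
should be checked against the docs for your Heka version.

```toml
# Edge node (sketch; names, paths, and addresses are placeholders).

# Reads syslog.log plus its rotated siblings and keeps track of its own
# position in the log stream, independent of any output.
[syslog_stream]
type = "LogstreamerInput"
log_directory = "/var/log"
# Assumed pattern for syslog.log, syslog.log.1.gz, syslog.log.2.gz, ...
file_match = 'syslog\.log\.?(?P<Seq>\d+)?(\.gz)?'
# The leading caret is meant to sort Seq descending (higher number = older).
priority = ["^Seq"]

# Ships every message to the aggregator and tracks delivery on its own.
# Framing and the ProtobufEncoder are left at their defaults.
[TcpOutput]
address = "aggregator.example.com:5565"
message_matcher = "TRUE"
```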

What happens if the TCP input cannot connect? Will it
time out and then keep retrying?

Yes, it will keep trying until the link comes back, or until the buffer
hits a configurable max buffer size, at which point it will either start
dropping messages, apply back pressure to the entire pipeline by
blocking the router, or cause Heka to shut down, depending on your
configuration.
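
As a rough illustration of the three behaviors, the buffer settings for
a TcpOutput might look like the fragment below. The address and the size
value are made up for the example, and the option names are given as I
recall them from the buffering docs, so verify them for your Heka
version before relying on this.

```toml
# Sketch of TcpOutput buffering settings (values are illustrative only).
[TcpOutput]
address = "aggregator.example.com:5565"
message_matcher = "TRUE"
use_buffering = true

[TcpOutput.buffering]
# Cap on the disk space the buffer may use, in bytes (here 1 GiB).
max_buffer_size = 1073741824
# What happens once the cap is reached:
#   "drop"     start dropping messages
#   "block"    apply back pressure to the whole pipeline
#   "shutdown" cause Heka to shut down
full_action = "block"
```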

Is round robin between servers
supported? Or a backup Heka server if it cannot connect to the primary one?

No, neither of these is supported at this time. Currently you'd need to
either do this at the network level, or use an alternate transport.

        Absolutely. If you set up the appropriate decoders on the edge
        nodes, as a part of the LogstreamerInput config, then the Heka
        messages passed from the edge nodes to the aggregator will
        contain the parsed data encoded in the message fields. If you
        don't do the decoding on the edge, then the messages will
        contain the unparsed data in the message payload, and you'll
        need to parse them on the aggregator. Note that this will
        require a MultiDecoder, because you'll first need to decode from
        protobuf, and *then* you'll need to parse the payload of the
        decoded message.
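
For the case where parsing happens on the aggregator, a sketch of the
MultiDecoder arrangement could look like this. The section names and the
choice of the rsyslog Lua decoder are assumptions made for illustration;
the Lua decoder will likely need its own config (such as a template) to
match your log format.

```toml
# Aggregator (sketch; section names and the Lua decoder are placeholders).

[TcpInput]
address = ":5565"
# Override the default decoder only when edge nodes send unparsed payloads.
decoder = "edge_multi"

[edge_multi]
type = "MultiDecoder"
# First decode from protobuf, then parse the decoded message's payload.
subs = ["ProtobufDecoder", "syslog_parser"]
# Run every sub-decoder in order rather than stopping at the first success.
cascade_strategy = "all"

[syslog_parser]
type = "SandboxDecoder"
filename = "lua_decoders/rsyslog.lua"
```

If the decoding is done on the edge nodes instead, the same kind of
SandboxDecoder would be referenced from the LogstreamerInput's decoder
setting, and the aggregator's TcpInput could stay on its defaults.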


Great. It would be great to do this at the agents.

        Great, just making sure you know the overall sitch. Although I
        should clarify that, while it's possible to push new
        SandboxFilters to a correctly configured Heka instance without
        needing a restart, deploying any other sandboxed plugin type, or
        changing the code underneath a filter that came from the config
        (rather than being dynamically injected) *will* require a
        restart. You're correct that you won't need to redeploy Heka
        itself, however.



I guess the customization I would need to add is making the agent
smarter, so that it can figure out what the user/host should be watching
and get the Lua scripts for those files out to the Heka agent. It's just
nice to know that this is possible. When we get that far along, I'll
reach out to this list again with more detailed questions.



Good luck,

-r
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
