Hi!

Just wanted to give everyone a heads up that a significant change will be landing on the dev branch of Heka soon, to be included in the eventual 0.6 release.

This change will introduce a new "Encoder" plugin type. Encoders are the inverse of the already existing Decoder plugin type: Decoders are used by Input plugins to convert arbitrary input data into a Heka message struct, while Encoders are used by Output plugins to convert Heka message structs into arbitrary output data.

We realized this was a necessary abstraction when we saw that several Output plugins were each implementing their own way of managing serialization. For instance, currently:

- FileOutput supports a 'format' config option which can be one of 'text', 'json', or 'protobufstream'. The 'text' option includes only the message payload. The 'json' format contains all of the message fields, but is an inflexible format.

- LogOutput supports a 'payload_only' boolean config option. If true, then the message payload will be written to stdout. If false, then a custom, inflexible text rendering of the message data will be generated.

- ElasticSearchOutput supports a 'format' config option which can be one of 'clean', 'logstash_v0', 'payload', or 'raw'. All of them generate JSON in a specific format, except 'payload', which presumes that the message payload already contains the JSON that you want to send to ElasticSearch.

- TcpOutput has no flexibility; it only generates Protocol Buffer encoded message streams.

The introduction of Encoder plugins means all of these one-off serialization strategies can go away. Instead, you'll add an Encoder config section to your TOML config, and then you'll refer to configured Encoder sections from your Output config sections. So what would have been this:

    [LogOutput]
    payload_only = true

Instead will be this:

    [PayloadEncoder]

    [LogOutput]
    encoder = "PayloadEncoder"

The initial code will include three encoders: ProtobufEncoder (generates Heka's native protocol buffer streams), PayloadEncoder (extracts message payload), and SandboxEncoder (lets you use Lua code to extract data from a message and generate whatever output you want). There may be more coming in the future, but really our hope is that the SandboxEncoder will meet most of your needs.

To start with, TcpOutput, LogOutput, and FileOutput have been modified to use Encoders instead of their previous mechanisms. TcpOutput defaults to using ProtobufEncoder, which exactly matches the previous behavior, so no config changes should be necessary. If you're using LogOutput or FileOutput, however, you'll need to modify your config when you upgrade so that it includes the appropriate Encoder plugins and your outputs refer to them.
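
For FileOutput users, the change should look much like the LogOutput example above. This is only a sketch: it assumes FileOutput takes the same 'encoder' setting shown for LogOutput, the path is a made-up placeholder, and any other settings you already have are left out. A config that used the 'text' format:

    [FileOutput]
    path = "/var/log/heka/output.log"  # placeholder path
    format = "text"

would presumably become:

    [PayloadEncoder]

    [FileOutput]
    path = "/var/log/heka/output.log"  # placeholder path
    encoder = "PayloadEncoder"

PayloadEncoder is the match here because, as noted above, the 'text' format wrote only the message payload.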

Anyone interested in digging into the code can take a look at the open pull request at https://github.com/mozilla-services/heka/pull/838. It's currently awaiting code review, which might result in further small revisions, but we definitely expect to land this on dev over the next few days. I'll send another note out when it lands.

If you made it this far, wow, I'm impressed! Thanks for your attention, hope the Encoder plugins work for you, and please let us know if you have any questions or issues.

Thanks!

-r
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka