[heka] more changes to encoding, now and going forward

Rob Miller Wed, 25 Jun 2014 16:44:01 -0700

Hi all,

Apologies in advance for what will be a long post. This will be of interest to you if you care about the details of Heka's design with regard to serialization and deserialization. In particular it deals with the interactions and divisions of responsibility between encoder and output plugins. It introduces some small further changes that are landing very soon, and describes some bigger changes that we are considering going forward, on which we'd appreciate feedback from anyone who might have thoughts.

Over the last couple of days trink and I have been trying to deal with the issue of stream framing in Heka's encoding layer, and digging in has led to some changes. First, some background:

Heka uses protocol buffers as its primary serialization format. Our message objects are defined by a protobuf schema (see https://github.com/mozilla-services/heka/blob/dev/message/message.proto). Protocol buffers does not have any built-in support for streaming, however; it's up to the user to implement framing to delimit the messages. Heka does this with a simple header format, also specified in the linked protobuf schema and documented here: http://hekad.readthedocs.org/en/latest/message/index.html#protobuf-stream-framing
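
To make the framing problem concrete, here is a minimal Go sketch of delimiting serialized messages on a stream. Note that it does NOT reproduce Heka's actual header format (that is described in the docs linked above); it uses a plain uvarint length prefix as a stand-in, and the names `frame` and `readFrame` are made up for illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// frame prepends a uvarint length prefix to an already serialized message.
// Assumption: this generic prefix stands in for Heka's real framing header.
func frame(msg []byte) []byte {
	buf := make([]byte, binary.MaxVarintLen64, binary.MaxVarintLen64+len(msg))
	n := binary.PutUvarint(buf, uint64(len(msg)))
	return append(buf[:n], msg...)
}

// readFrame reads one length-prefixed message back off the stream.
func readFrame(r *bufio.Reader) ([]byte, error) {
	size, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	msg := make([]byte, size)
	if _, err := io.ReadFull(r, msg); err != nil {
		return nil, err
	}
	return msg, nil
}

func main() {
	// Two messages written back to back; without the prefixes a reader
	// could not tell where one ends and the next begins.
	var stream bytes.Buffer
	stream.Write(frame([]byte("first")))
	stream.Write(frame([]byte("second")))

	r := bufio.NewReader(&stream)
	for {
		msg, err := readFrame(r)
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", msg)
	}
}
```

A transport that already delivers discrete messages has no use for the prefix, which is why framing ends up being a property of the transport rather than of the serialization.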

Heka depends on this framing in a number of cases, such as when sending messages from one Heka server to another over TCP or AMQP, or when queuing messages to disk. Before the introduction of encoder plugins, certain outputs would use framing in certain cases. We had a loose (but ultimately false) assumption that whenever protobuf serialization was used the framing would be desired.

When encoders were introduced, it seemed reasonable to have the encoder handle the framing. The ProtobufEncoder would always include it, and the SandboxEncoder would include it whenever it was emitting protobuf encoded data. This quickly proved ineffective, however. There were cases where people wanted to use protobuf encoding but didn't need the framing, such as Ian Neubert's plugins for using Amazon's SQS as a transport (https://github.com/ianneub/heka-sqs). We started by adding knobs to the encoders to turn off the framing, but digging in we realized that a) the options and code were getting more complicated than we wanted and b) there was an inherent asymmetry in the fact that by default a ProtobufEncoder generated binary data that a ProtobufDecoder could not parse (since the decoder assumed that the framing had already been removed).

This finally brings me to describing the current small change. When my latest pull request (https://github.com/mozilla-services/heka/pull/931) is merged, message framing will no longer be handled by encoder plugins at all. Instead, every output will support a 'use_framing' config option that, if set to true, will mean that Heka's stream framing should be used by that output.

It is not necessary for each individual output to specify, check for, or react to this config option. Heka itself will check if the option is there. The catch is that instead of output code calling 'OutputRunner.Encoder()' to get the encoder and then 'Encoder.Encode(pack)' to do the encoding, you will just call the newly added 'OutputRunner.Encode(pack)' method. The OutputRunner will use the encoder to perform the initial serialization, and then will add the framing header if 'use_framing' was set to true.
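
Roughly, the division of labor described above could look like the following Go sketch. Everything in it is a simplified stand-in rather than Heka's real code: `PipelinePack`, `Encoder`, `payloadEncoder`, and `outputRunner` are cut-down hypothetical types, and the framing header is a plain uvarint length prefix, not Heka's actual header.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// PipelinePack is a cut-down stand-in for Heka's pack type (assumption).
type PipelinePack struct {
	Payload string
}

// Encoder is a simplified stand-in for an encoder plugin. It only
// serializes and knows nothing about framing.
type Encoder interface {
	Encode(pack *PipelinePack) ([]byte, error)
}

// payloadEncoder is a hypothetical encoder that emits the raw payload.
type payloadEncoder struct{}

func (payloadEncoder) Encode(pack *PipelinePack) ([]byte, error) {
	return []byte(pack.Payload), nil
}

// outputRunner holds the encoder and the value of the output's
// use_framing option.
type outputRunner struct {
	encoder    Encoder
	useFraming bool
}

// Encode serializes with the encoder, then adds a framing header only if
// use_framing was set. The output plugin never inspects the option itself.
func (r *outputRunner) Encode(pack *PipelinePack) ([]byte, error) {
	data, err := r.encoder.Encode(pack)
	if err != nil {
		return nil, err
	}
	if !r.useFraming {
		return data, nil
	}
	// Assumption: a uvarint length prefix stands in for the real header.
	header := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(header, uint64(len(data)))
	return append(header[:n], data...), nil
}

func main() {
	pack := &PipelinePack{Payload: "hello"}
	for _, framing := range []bool{false, true} {
		runner := &outputRunner{encoder: payloadEncoder{}, useFraming: framing}
		out, err := runner.Encode(pack)
		if err != nil {
			panic(err)
		}
		fmt.Printf("use_framing=%v -> %d bytes\n", framing, len(out))
	}
}
```

The point of the shape is that the encoder's output is always something the matching decoder can parse, and the framing becomes a per-output decision made in one place.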

That is how things will stand for the 0.6 release, but for 0.7 and beyond we're thinking of making an even bigger change. Since we're now at the point where the OutputRunner is handling most of the encoding details, it seems like it might make sense to go ahead and finish the job so that the encoding (and any desired framing) happens before the output gets involved at all. This means that an output plugin would no longer be pulling `*PipelinePack` objects off of the input channel, but would instead receive already serialized `[]byte` blobs. Then output code would really focus entirely on i/o, with no need (in most cases) to think about or interact with the encoding process.
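
As a rough illustration of what that might mean for plugin authors, here is a speculative Go sketch of an output that only does i/o. Nothing here is a settled or existing Heka API: `runOutput` and `memorySink` are hypothetical names, and the channel of `[]byte` is just one guess at how already serialized data could be delivered.

```go
package main

import (
	"fmt"
	"io"
)

// memorySink is a hypothetical in-memory destination used only for the
// demo; a real output would write to a socket, file, queue, and so on.
type memorySink struct {
	chunks [][]byte
}

func (s *memorySink) Write(p []byte) (int, error) {
	// Copy, because an io.Writer must not retain the caller's slice.
	s.chunks = append(s.chunks, append([]byte(nil), p...))
	return len(p), nil
}

// runOutput drains already encoded (and, if requested, already framed)
// blobs and writes them out. It never touches a pack or an encoder.
func runOutput(in <-chan []byte, w io.Writer) error {
	for data := range in {
		if _, err := w.Write(data); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	in := make(chan []byte, 2)
	in <- []byte("encoded message one")
	in <- []byte("encoded message two")
	close(in)

	sink := &memorySink{}
	if err := runOutput(in, sink); err != nil {
		panic(err)
	}
	fmt.Println("chunks written:", len(sink.chunks))
}
```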

This seems to make sense to us, and we've opened up an issue on it in our tracker (https://github.com/mozilla-services/heka/issues/930). We're interested in feedback, though, especially from anyone who has written (or plans to write) Heka output plugins. If you have any thoughts or opinions, please let us know. :)

And if you made it this far and are still reading I'm not sure whether to congratulate or apologize to you.


Cheers,

-r
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
