Hi all,
Apologies in advance for what will be a long post. This will be of
interest to you if you care about the details of Heka's design with
regard to serialization and deserialization. In particular it deals with
the interactions and divisions of responsibility between encoder and
output plugins. It introduces some small further changes that are
landing very soon, and describes some bigger changes that we are
considering going forward, on which we'd appreciate feedback from anyone
who might have thoughts.
Over the last couple of days trink and I have been trying to deal with
the issue of stream framing in Heka's encoding layer, and digging in
has led to some changes. First, some background:
Heka uses protocol buffers as its primary serialization format. Our
message objects are defined by a protobuf schema (see
https://github.com/mozilla-services/heka/blob/dev/message/message.proto). Protocol
buffers do not have any built-in support for streaming, however; it's
up to the user to implement framing to delimit the messages. Heka does
this with a simple header format, also specified in the linked protobuf
schema and documented here:
http://hekad.readthedocs.org/en/latest/message/index.html#protobuf-stream-framing
Heka depends on this framing in a number of cases, such as when sending
messages from one Heka server to another over TCP or AMQP, or when
queuing messages to disk. Before the introduction of encoder plugins,
certain outputs would use framing in certain cases. We had a loose (but
ultimately false) assumption that whenever protobuf serialization was
used the framing would be desired.
When encoders were introduced, it seemed reasonable to have the encoder
handle the framing. The ProtobufEncoder would always include it, and the
SandboxEncoder would include it whenever it was emitting protobuf
encoded data. This quickly proved ineffective, however. There were
cases where people wanted to use protobuf encoding but didn't need the
framing, such as Ian Neubert's plugins for using Amazon's SQS as a
transport (https://github.com/ianneub/heka-sqs). We started by adding
knobs to the encoders to turn off the framing, but digging in we
realized that a) the options and code were getting more complicated than
we wanted and b) there was an inherent asymmetry in the fact that by
default a ProtobufEncoder generated binary data that a ProtobufDecoder
could not parse (since the decoder assumed that the framing had already
been removed).
This finally brings me to describing the current small change. When my
latest pull request (https://github.com/mozilla-services/heka/pull/931)
is merged, message framing will no longer be handled by encoder plugins
at all. Instead, every output will support a 'use_framing' config option
that, when set to true, tells Heka to apply its stream framing to the
data that output emits.
It is not necessary for each individual output to specify, check for, or
react to this config option. Heka itself will check if the option is
there. The catch is that instead of output code calling
'OutputRunner.Encoder()' to get the encoder and then
'Encoder.Encode(pack)' to do the encoding, you will just call the newly
added 'OutputRunner.Encode(pack)' method. The OutputRunner will use the
encoder to perform the initial serialization, and then will add the
framing header if 'use_framing' was set to true.
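To make the division of labor concrete, here is a rough Go sketch of the
flow just described. Everything in it is a simplified stand-in, not the
code from the pull request: the `PipelinePack`, `Encoder`, and
`outputRunner` types are hypothetical reductions of the real ones, the
`Encode` signatures are assumed, and `addFraming` uses a placeholder
length prefix rather than Heka's real framing header.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// PipelinePack is a hypothetical stand-in for Heka's pack type.
type PipelinePack struct {
	Payload string
}

// Encoder approximates the shape of an encoder plugin (signature assumed).
type Encoder interface {
	Encode(pack *PipelinePack) ([]byte, error)
}

// payloadEncoder is a toy encoder that emits the payload bytes as-is.
type payloadEncoder struct{}

func (payloadEncoder) Encode(pack *PipelinePack) ([]byte, error) {
	return []byte(pack.Payload), nil
}

// outputRunner is a hypothetical, stripped-down runner. useFraming holds
// the value of the output's 'use_framing' config option.
type outputRunner struct {
	encoder    Encoder
	useFraming bool
}

// Encode serializes the pack with the configured encoder, then adds
// framing only if the output asked for it.
func (r *outputRunner) Encode(pack *PipelinePack) ([]byte, error) {
	data, err := r.encoder.Encode(pack)
	if err != nil {
		return nil, err
	}
	if !r.useFraming {
		return data, nil
	}
	return addFraming(data), nil
}

// addFraming is a placeholder: a varint length prefix, NOT Heka's header.
func addFraming(data []byte) []byte {
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(data)))
	framed := make([]byte, 0, n+len(data))
	framed = append(framed, hdr[:n]...)
	return append(framed, data...)
}

func main() {
	pack := &PipelinePack{Payload: "hello"}
	for _, framing := range []bool{false, true} {
		runner := &outputRunner{encoder: payloadEncoder{}, useFraming: framing}
		out, err := runner.Encode(pack)
		if err != nil {
			panic(err)
		}
		fmt.Printf("use_framing=%v -> % x\n", framing, out)
	}
}
```

The point is that the output's own code makes a single call and never
needs to know whether framing was applied.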
That is how things will stand for the 0.6 release, but for 0.7 and
beyond we're thinking of making an even bigger change. Since we're now
at the point where the OutputRunner is handling most of the encoding
details, it seems like it might make sense to go ahead and finish the
job so that the encoding (and any desired framing) happens before the
output gets involved at all. This means that an output plugin would no
longer be pulling `*PipelinePack` objects off of the input channel, but
would instead receive already serialized `[]byte` blobs. Then output
code would really focus entirely on i/o, with no need (in most cases) to
think about or interact with the encoding process.
This seems to make sense to us, and we've opened up an issue on it in
our tracker (https://github.com/mozilla-services/heka/issues/930). We're
interested in feedback, though, especially from anyone who has written
(or plans to write) Heka output plugins. If you have any thoughts or
opinions, please let us know. :)
And if you made it this far and are still reading I'm not sure whether
to congratulate or apologize to you.
Cheers,
-r