Hey, thanks! Responses inline.

> 1. Please, please, open tickets for anything you need. Your pain on the
> offset management is something I've been mulling over. I don't have a
> really good idea on how to solve this, but I agree we should figure it out.

Yes, absolutely. I'd been holding off opening them until I had a
better idea of exactly what I was missing, but you can expect a ticket
or two soon.

> 2. "But if our task is the only producer for that partition, it¹s easy to
> calculate the value of nŠ and by starting from the commit, and then
> dropping the first n messages of output we create, we can get ourselves
> back to a consistent state." This is true only in cases where the
> processing is deterministic. There are a shocking number of cases where
> this is not true, including some which are very subtle. If you rely on
> wall-clock time, for example, your logic will change between processing,
> so simply dropping N messages could lead to data loss or duplication.
> Other, more nuanced, non-determinism includes deploying a different
> (newer) version of code, dependency on external systems, etc.

You're completely right; determinism is a huge consideration here. I
had a note on that in an earlier draft, but I seem to have edited it
out entirely -- thanks for pointing this out.

The short version: if you can have zero or more outputs for every
input, it's definitely possible to get the sort of loss or duplication
you describe. If there's
precisely one output for every input message, though, you typically
*don't* have the same issues. It's also frequently possible to split a
nondeterministic job that has multiple outputs into two: one
nondeterministic half that batches its outputs as a single message,
and one deterministic one that splits that message up into multiple
outputs for some downstream job.
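
To make that split concrete, here's a toy sketch in plain Java -- it
has nothing to do with the real Samza task API, and all the names are
made up for the example:

    import java.util.Arrays;
    import java.util.List;

    public class SplitJobSketch {

        // Stage 1: nondeterministic (it reads the wall clock), but it
        // emits exactly ONE output message per input, with the whole
        // "batch" of results packed inside. The strict 1:1 mapping from
        // input offsets to output messages is what makes replaying from
        // the commit and dropping the first n outputs safe.
        static String batchingStage(String input) {
            long now = System.currentTimeMillis(); // the nondeterministic part
            return input + "|" + now + "|" + input.toUpperCase();
        }

        // Stage 2: deterministic fan-out. Given the same batch message it
        // always produces the same outputs, so it can be replayed freely
        // even though one input becomes many outputs.
        static List<String> splittingStage(String batchMessage) {
            return Arrays.asList(batchMessage.split("\\|"));
        }

        public static void main(String[] args) {
            String batch = batchingStage("hello");
            System.out.println("single batch message: " + batch);
            System.out.println("fanned-out outputs:   " + splittingStage(batch));
        }
    }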

Again, there's a lot to say here... I'll save it for the next post.

> 3. You might want to consider the case where a downstream consumer is
> reading from a changelog, as well. This is a use case we've seen pop up
> once in a while.

This is a great optimization -- but as far as coordination goes, I
think it's similar to the general state/changelog situation. (Though
you do get to skip the downstream offset.)

> 4. One variation of merge log that we'd considered was to batch reads from
> an SSP together. This would allow you to shrink your merge log, so you can
> say, "Read 100 messages from SSP1, 100 messages from SSP2, etc." This
> allows you to drastically shrink the merge log's message count, which
> should improve performance. Samza's DefaultMessageChooser supports this
> style of batching, but it's off by default.

I've thought a bit about this; it does indeed save a lot of messages
(though repeating the same topic names over and over does gzip quite
well). I'm not sure how well it fits behind the existing
MessageChooser interface, though -- you need to know that there are
100 messages coming for that particular stream, but the message
chooser only sees them one at a time.
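
For what it's worth, the shape I'd imagine for a batched entry is
something like the sketch below (plain Java; the class and field names
are mine, not anything that exists in Samza):

    // Sketch only: one entry stands in for a whole run of messages --
    // "read `count` consecutive messages from this
    // system/stream/partition" -- rather than one entry per message;
    // the starting position is implied by the preceding entries.
    public class BatchedMergeLogEntry {
        public final String system;   // e.g. "kafka"
        public final String stream;   // topic name
        public final int partition;
        public final int count;       // how many consecutive messages to read

        public BatchedMergeLogEntry(String system, String stream,
                                    int partition, int count) {
            this.system = system;
            this.stream = stream;
            this.partition = partition;
            this.count = count;
        }
    }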

> 5. You should have a look at Kafka's transactionality proposal, as well
> (https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+
> in+Kafka). This is the only way that I'm aware of that will support all
> use cases (including non-deterministic ones).

The transactional-messaging work is very cool, but it raises some
complexity / scalability flags for me. (All transactions serialized
through a single coordinator, unbounded buffering in the client, etc.)
It still seems like a useful pattern, but I think there's also space
for these other tools.

I'd love to see a project that does for Kafka what Curator does for
Zookeeper -- packaging up some of these patterns as a library and
decoupling them from the core project. But that's another
conversation...

-- 
Ben Kirwin
http://ben.kirw.in/
