Hi Jay,

Thanks for your answer.

> However a few things have changed since that original design:
> 1. We now have the additional use cases of copycat and Samza
> 2. We now realize that the assignment strategies don't actually necessarily
> ensure each partition is assigned to only one consumer--there are really
> valid use cases for broadcast or multiple replica assignment schemes--so we
> can't actually make the a hard assertion on the server.
>
> So it may make sense to revist this, I don't think it is necessarily a
> massive change and would give more flexibility for the variety of cases.
>
> -Jay

I totally agree: the 1-partition-1-task mapping is too restrictive.
However, I think the fundamental operation that Samza, Copycat, and Kafka consumers should agree upon is "how can I specify, in a simple and transparent way, which partitions I want to consume, and how?"
This means providing a mapping from partitions to consumer tasks, possibly in a transparent way, so as to allow for optimizations in placement, co-partitioning, etc.

This issue has the potential to once again generate a lot of duplicate work, and I think it should be solved at the Kafka level.
Given that Copycat and normal consumers are already inside Kafka, I think having Samza there as well would simplify things a lot.

The result is that Kafka would be a complete package for handling streams:
- Messaging, partitioning, and fault tolerance (Kafka core)
- Ingestion (Copycat)
- Lightweight processing (Samza)
- Coupling with other systems (Kafka consumers)

Cheers,

--
Gianmarco