Hi, Gianmarco,

{quote}
However, I think the fundamental operation that Samza, Copycat, and Kafka consumers should agree upon is "how can I specify in a simple and transparent way which partitions I want to consume, and how?".
{quote}

I agree that a basic partition distribution mechanism can be common, and that those common usage patterns should be provided and solved at the Kafka level. I would still argue that pluggable client-side logic is needed, for the following two reasons:

1. The broker side has no view of client-side resources or state (host affinity of local state is a good example). When partition distribution/assignment needs to take client-side resources or state into consideration, we need client-side logic.
2. When we run Samza as a service, there might be additional resource/quota-related policies that require an application-level decision, and the information needed for that decision is not visible at the Kafka level. In that case, pluggable client-side logic is useful.

Thanks!

On Fri, Jul 3, 2015 at 1:40 AM, Gianmarco De Francisci Morales <g...@apache.org> wrote:

> Hi Jay,
>
> Thanks for your answer.
>
> > However a few things have changed since that original design:
> > 1. We now have the additional use cases of copycat and Samza
> > 2. We now realize that the assignment strategies don't actually necessarily
> > ensure each partition is assigned to only one consumer--there are really
> > valid use cases for broadcast or multiple replica assignment schemes--so we
> > can't actually make the a hard assertion on the server.
> >
> > So it may make sense to revist this, I don't think it is necessarily a
> > massive change and would give more flexibility for the variety of cases.
> >
> > -Jay
>
> I totally agree, the 1-partition-1-task mapping is too restrictive.
> However, I think the fundamental operation that Samza, Copycat, and Kafka
> consumers should agree upon is "how can I specify in a simple and
> transparent way which partitions I want to consume, and how?".
> This means providing a mapping from partitions to consumer tasks, possibly
> in a transparent way so as to allow for optimizations in placement,
> co-partitioning, etc...
> This issue has the potential of generating again a lot of duplicate work,
> and I think it should be solved at the Kafka level.
> Given that Copycat and normal consumers are already inside Kafka, I think
> having Samza there as well would simplify things a lot.
> The result is that Kafka would be a complete package for handling streams:
> - Messaging, partitioning, and fault tolerance (Kafka core)
> - Ingestion (Copycat)
> - Lightweight processing (Samza)
> - Coupling with other systems (Kafka consumers)
>
> Cheers,
>
> --
> Gianmarco
>
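
To make the host-affinity case concrete, here is a rough sketch of what I mean by pluggable client-side logic. It is only an illustration, not a proposal for the actual Kafka or Samza API: the names `round_robin`, `host_affinity`, and `previous_owner` are all made up for this example, and it assumes the consumer list is non-empty. The default strategy needs no client-side knowledge, while the host-affinity strategy needs a map of which consumer already holds each partition's local state, which is exactly the information the broker does not have.

```python
from typing import Callable, Dict, List

# Hypothetical plug-in shape: (partitions, consumers) -> {consumer: [partitions]}.
Assignor = Callable[[List[int], List[str]], Dict[str, List[int]]]


def round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Default strategy: needs no client-side state, so it could live anywhere."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


def host_affinity(previous_owner: Dict[int, str]) -> Assignor:
    """Build an assignor that keeps a partition on the consumer that already
    holds its local state, when that consumer is still in the group.

    previous_owner is client-side knowledge (partition -> consumer); it is an
    assumption of this sketch that the client tracks it somewhere.
    """

    def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
        assignment: Dict[str, List[int]] = {c: [] for c in consumers}
        orphans: List[int] = []
        for p in sorted(partitions):
            owner = previous_owner.get(p)
            if owner in assignment:
                assignment[owner].append(p)  # state is already local here
            else:
                orphans.append(p)  # owner is gone or was never recorded
        for p in orphans:
            # Place leftover partitions on the currently least-loaded consumer.
            target = min(consumers, key=lambda c: len(assignment[c]))
            assignment[target].append(p)
        return assignment

    return assign


if __name__ == "__main__":
    previous = {0: "host-a", 1: "host-a", 2: "host-b", 3: "host-c"}
    live = ["host-a", "host-b"]  # host-c has left the group
    print(round_robin([0, 1, 2, 3], live))
    print(host_affinity(previous)([0, 1, 2, 3], live))
```

With the same inputs, the round-robin strategy moves partitions 1 and 2 away from the hosts that hold their state, while the host-affinity one only relocates partition 3, whose previous owner is gone.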