Hi Jay,

Thanks for your answer.


> However a few things have changed since that original design:
> 1. We now have the additional use cases of copycat and Samza
> 2. We now realize that the assignment strategies don't actually necessarily
> ensure each partition is assigned to only one consumer--there are really
> valid use cases for broadcast or multiple replica assignment schemes--so we
> can't actually make the a hard assertion on the server.
>
> So it may make sense to revist this, I don't think it is necessarily a
> massive change and would give more flexibility for the variety of cases.
>
> -Jay


I totally agree: the 1-partition-1-task mapping is too restrictive.
However, I think the fundamental question that Samza, Copycat, and Kafka
consumers should answer in the same way is "how can I specify, simply and
transparently, which partitions I want to consume, and how?".
This means providing a mapping from partitions to consumer tasks, possibly
in a transparent way so as to allow for optimizations in placement,
co-partitioning, etc.
This issue could once again generate a lot of duplicate work,
and I think it should be solved at the Kafka level.
Given that Copycat and normal consumers are already inside Kafka, I think
having Samza there as well would simplify things a lot.
The result is that Kafka would be a complete package for handling streams:
- Messaging, partitioning, and fault tolerance (Kafka core)
- Ingestion (Copycat)
- Lightweight processing (Samza)
- Coupling with other systems (Kafka consumers)
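
To make the partition-to-task mapping mentioned above a bit more concrete,
here is a rough sketch in Python. It is only an illustration, not an existing
Kafka, Samza, or Copycat API: the names (Assignment, round_robin, broadcast,
is_exclusive) are all hypothetical. It shows an exclusive scheme next to a
broadcast scheme, and why "each partition has exactly one owner" cannot be
asserted for every strategy.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical types, for illustration only (not a Kafka API).
Partition = Tuple[str, int]              # (topic, partition number)
Assignment = Dict[str, List[Partition]]  # task id -> partitions it consumes


def round_robin(partitions: List[Partition], tasks: List[str]) -> Assignment:
    """Exclusive scheme: each partition goes to exactly one task."""
    assignment: Assignment = {task: [] for task in tasks}
    for i, partition in enumerate(sorted(partitions)):
        assignment[tasks[i % len(tasks)]].append(partition)
    return assignment


def broadcast(partitions: List[Partition], tasks: List[str]) -> Assignment:
    """Broadcast scheme: every task consumes every partition."""
    return {task: sorted(partitions) for task in tasks}


def owners(assignment: Assignment) -> Dict[Partition, List[str]]:
    """Invert the mapping: partition -> tasks that consume it."""
    result: Dict[Partition, List[str]] = defaultdict(list)
    for task, parts in assignment.items():
        for partition in parts:
            result[partition].append(task)
    return dict(result)


def is_exclusive(assignment: Assignment) -> bool:
    """True only if no partition is consumed by more than one task."""
    return all(len(tasks) == 1 for tasks in owners(assignment).values())


if __name__ == "__main__":
    partitions = [("events", 0), ("events", 1), ("events", 2)]
    tasks = ["task-0", "task-1"]
    print(round_robin(partitions, tasks))
    print(is_exclusive(round_robin(partitions, tasks)))
    print(is_exclusive(broadcast(partitions, tasks)))
```

Because the mapping is explicit data, a coordinator could inspect it for
placement or co-partitioning decisions, while the exclusivity check stays an
optional property of a given strategy and not a server-side invariant.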

Cheers,

--
Gianmarco
