1. I am neutral on modifying the consumer rebalance protocol to move the
pluggable logic to the client side, but I think if we decide to go this
route we'd better do it now rather than later, as the protocol is not
officially "released" yet. This may delay the first release of the new consumer.

2. I like the idea of rebranding Samza as Kafka Messaging to keep the same
API / project structure. But I think the Samza PMC / committers will have
more say in this matter.

Guozhang



On Fri, Jul 3, 2015 at 12:11 PM, Jay Kreps <j...@confluent.io> wrote:

> Hey Gianmarco,
>
> To your broader point, I agree that having a close alignment with Kafka
> would be a great thing in terms of adoption/discoverability/etc. The
> areas where I think this matters a lot are:
> 1. Website and docs: ideally when reading about Kafka you should be able to
> find out about Samza.
> 2. API style and naming: ideally the various interfaces should feel similar
> and use similar concepts and names. This is a bunch of little things
> (calling topics and partitions in the same way, sharing metrics, sharing
> partitioning strategies, etc).
> 3. Release alignment--i.e. this set of versions all work together.
> 4. Branding--I actually think if we go down that route it would be worth
> considering just calling Samza something like "Kafka Streams" or "Kafka
> Streaming" which I think would help a lot of people to understand what it is
> and since Kafka is heavily adopted would help with adoption. It always
> seems silly to bother with naming, but I actually think this ends up
> mattering a ton in how people understand the system (I guess as programmers
> we kind of all intuitively understand the importance of good naming).
>
> WRT partition mapping, yeah I totally agree. I think in all proposals this
> is left pluggable. And I think ideally the same set of assignment
> strategies should be available either in the Kafka consumer or in Samza. I
> think at this point the only debate is whether this is controlled client
> side or server side.
>
> -Jay
>
> On Fri, Jul 3, 2015 at 1:40 AM, Gianmarco De Francisci Morales <
> g...@apache.org> wrote:
>
> > Hi Jay,
> >
> > Thanks for your answer.
> >
> >
> > > However a few things have changed since that original design:
> > > 1. We now have the additional use cases of Copycat and Samza
> > > 2. We now realize that the assignment strategies don't actually
> > > necessarily ensure each partition is assigned to only one consumer--there
> > > are really valid use cases for broadcast or multiple replica assignment
> > > schemes--so we can't actually make a hard assertion on the server.
> > >
> > > So it may make sense to revisit this; I don't think it is necessarily a
> > > massive change and would give more flexibility for the variety of cases.
> > >
> > > -Jay
> >
> >
> > I totally agree, the 1-partition-1-task mapping is too restrictive.
> > However, I think the fundamental operation that Samza, Copycat, and Kafka
> > consumers should agree upon is "how can I specify in a simple and
> > transparent way which partitions I want to consume, and how?".
> > This means providing a mapping from partitions to consumer tasks, possibly
> > in a transparent way so as to allow for optimizations in placement,
> > co-partitioning, etc.
> > This issue has the potential of again generating a lot of duplicate work,
> > and I think it should be solved at the Kafka level.
> > Given that Copycat and normal consumers are already inside Kafka, I think
> > having Samza there as well would simplify things a lot.
> > The result is that Kafka would be a complete package for handling streams:
> > - Messaging, partitioning, and fault tolerance (Kafka core)
> > - Ingestion (Copycat)
> > - Lightweight processing (Samza)
> > - Coupling with other systems (Kafka consumers)
> >
> > Cheers,
> >
> > --
> > Gianmarco
> >
>



