Hello Boyang,

I've made another thorough pass over this KIP and I'd like to spilt it into
two parts: the first part, covered in KIP-429 would be touching on Consumer
Coordinator only to have incremental rebalance protocol in place. The
second part (for now I've reserved KIP number 444 for it) would contain all
the changes on StreamsPartitionAssginor to allow warming up new members.

I think the first part, a.k.a. the current updated KIP-429 is ready for
review and discussions again. Would love to hear people's feedbacks and
ideas.

Guozhang



On Mon, Mar 4, 2019 at 10:09 AM Boyang Chen <bche...@outlook.com> wrote:

> Thanks Guozhang for the great questions. Answers are inlined:
>
> 1. I'm still not sure if it's worthwhile to add a new type of "learner
> task" in addition to "standby task": if the only difference is that for the
> latter, we would consider workload balance while for the former we would
> not, I think we can just adjust the logic of StickyTaskAssignor a bit to
> break that difference. Adding a new type of task would be adding a lot of
> code complexity, so if we can still piggy-back the logic on a standby-task
> I would prefer to do so.
> In the proposal we stated that we are not adding a new type of task
> implementation. The
> learner task shall share the same implementation with normal standby task,
> only that we
> shall tag the standby task with learner and prioritize the learner tasks
> replay effort.
> 2. One thing that's still not clear from the KIP wiki itself is which layer
> would the logic be implemented at. Although for most KIPs we would not
> require internal implementation details but only public facing API updates,
> for a KIP like this I think it still requires to flesh out details on the
> implementation design. More specifically: today Streams embed a full
> fledged Consumer client, which hard-code a ConsumerCoordinator inside,
> Streams then injects a StreamsPartitionAssignor to its pluggable
> PartitionAssignor interface and inside the StreamsPartitionAssignor we also
> have a TaskAssignor interface whose default implementation is
> StickyPartitionAssignor. Streams partition assignor logic today sites in
> the latter two classes. Hence the hierarchy today is:
>
> KafkaConsumer -> ConsumerCoordinator -> StreamsPartitionAssignor ->
> StickyTaskAssignor.
>
> We need to think about where the proposed implementation would take place
> at, and personally I think it is not the best option to inject all of them
> into the StreamsPartitionAssignor / StickyTaskAssignor since the logic of
> "triggering another rebalance" etc would require some coordinator logic
> which is hard to mimic at PartitionAssignor level. On the other hand, since
> we are embedding a KafkaConsumer client as a whole we cannot just replace
> ConsumerCoordinator with a specialized StreamsCoordinator like Connect does
> in KIP-415. So I'd like to maybe split the current proposal in both
> consumer layer and streams-assignor layer like we did in KIP-98/KIP-129.
> And then the key thing to consider is how to cut off the boundary so that
> the modifications we push to ConsumerCoordinator would be beneficial
> universally for any consumers, while keep the Streams-specific logic at the
> assignor level.
> Yes, that's also my ideal plan. The details for the implementation are
> depicted
> in this doc<
> https://docs.google.com/document/d/1me2a5wvxAZT1QE6HkwyDl7C2TiBQlKN3Dpw_I1ro91U/edit#heading=h.qix74qdmekae>,
> and I have explained the reasoning on why we want to push a
> global change of replacing ConsumerCoordinator with StreamCoordinator. The
> motivation
> is that KIP space is usually used for public & algorithm level change, not
> for internal
> implementation details.
>
> 3. Depending on which design direction we choose, our migration plan would
> also be quite different. For example, if we stay with ConsumerCoordinator
> whose protocol type is "consumer" still, and we can manage to make all
> changes agnostic to brokers as well as to old versioned consumers, then our
> migration plan could be much easier.
> Yes, the upgrade plan was designed to take the new StreamCoordinator
> approach
> which means we shall define a new protocol type. For existing application
> we could only
> maintain the same `consumer` protocol type is because current broker only
> allows
> change of protocol type when the consumer group is empty. It is of course
> user-unfriendly to force
> a wipe-out for the entire application, and I don't think maintaining old
> protocol type would greatly
> impact ongoing services using new stream coordinator. WDYT?
>
> 4. I think one major issue related to this KIP is that today, in the
> StickyPartitionAssignor, we always try to honor stickiness over workload
> balance, and hence "learner task" is needed to break this priority, but I'm
> wondering if we can have a better solution within sticky task assignor that
> accommodate this?
> Great question! That's what I explained in the proposal, which is that we
> should breakdown our
> delivery into different stages. At very beginning, our goal is to trigger
> learner task assignment only on
> `new` hosts, where we shall leverage leader's knowledge of previous round
> of rebalance to figure out. After
> stage one, our goal is to have a smooth scaling up experience, but the
> task balance problem is kind of orthogonal.
> The load balance problem is a much broader topic than auto scaling, which
> I figure worth discussing within
> this KIP's context since it's a naturally next-step, but wouldn't be the
> main topic.
> Learner task or auto scaling support should be treated as `a helpful
> mechanism to reach load balance`, but not `an algorithm defining load
> balance`. It would be great if you could share some insights of the stream
> task balance, which eventually helps us to break out of the KIP-429's scope
> and even define a separate KIP to focus on task weight & assignment logic
> improvement.
>
> Also thank you for making improvement on the KIP context and organization!
>
> Best,
> Boyang
> ________________________________
> From: Guozhang Wang <wangg...@gmail.com>
> Sent: Saturday, March 2, 2019 6:00 AM
> To: dev
> Subject: Re: [DISCUSS] KIP-429 : Smooth Auto-Scaling for Kafka Streams
>
> Hello Boyang,
>
> I've just made a quick pass on the KIP and here are some thoughts.
>
> Meta:
>
> 1. I'm still not sure if it's worthwhile to add a new type of "learner
> task" in addition to "standby task": if the only difference is that for the
> latter, we would consider workload balance while for the former we would
> not, I think we can just adjust the logic of StickyTaskAssignor a bit to
> break that difference. Adding a new type of task would be adding a lot of
> code complexity, so if we can still piggy-back the logic on a standby-task
> I would prefer to do so.
>
> 2. One thing that's still not clear from the KIP wiki itself is which layer
> would the logic be implemented at. Although for most KIPs we would not
> require internal implementation details but only public facing API updates,
> for a KIP like this I think it still requires to flesh out details on the
> implementation design. More specifically: today Streams embed a full
> fledged Consumer client, which hard-code a ConsumerCoordinator inside,
> Streams then injects a StreamsPartitionAssignor to its plugable
> PartitionAssignor interface and inside the StreamsPartitionAssignor we also
> have a TaskAssignor interface whose default implementation is
> StickyPartitionAssignor. Streams partition assignor logic today sites in
> the latter two classes. Hence the hierarchy today is:
>
> KafkaConsumer -> ConsumerCoordinator -> StreamsPartitionAssignor ->
> StickyTaskAssignor.
>
> We need to think about where the proposed implementation would take place
> at, and personally I think it is not the best option to inject all of them
> into the StreamsPartitionAssignor / StickyTaskAssignor since the logic of
> "triggering another rebalance" etc would require some coordinator logic
> which is hard to mimic at PartitionAssignor level. On the other hand, since
> we are embedding a KafkaConsumer client as a whole we cannot just replace
> ConsumerCoordinator with a specialized StreamsCoordinator like Connect does
> in KIP-415. So I'd like to maybe split the current proposal in both
> consumer layer and streams-assignor layer like we did in KIP-98/KIP-129.
> And then the key thing to consider is how to cut off the boundary so that
> the modifications we push to ConsumerCoordinator would be beneficial
> universally for any consumers, while keep the Streams-specific logic at the
> assignor level.
>
> 3. Depending on which design direction we choose, our migration plan would
> also be quite different. For example, if we stay with ConsumerCoordinator
> whose protocol type is "consumer" still, and we can manage to make all
> changes agnostic to brokers as well as to old versioned consumers, then our
> migration plan could be much easier.
>
> 4. I think one major issue related to this KIP is that today, in the
> StickyPartitionAssignor, we always try to honor stickiness over workload
> balance, and hence "learner task" is needed to break this priority, but I'm
> wondering if we can have a better solution within sticky task assignor that
> accommodate this?
>
> Minor:
>
> 1. The idea of two rebalances have also been discussed in
> https://issues.apache.org/jira/browse/KAFKA-6145. So we should add the
> reference on the wiki page as well.
> 2. Could you also add a section describing how the subscription /
> assignment metadata will be re-formatted? Without this information it is
> hard to get to the bottom of your idea. For example in the "Leader Transfer
> Before Scaling" section, I'm not sure why "S2 doesn't know S4 is new
> member"
> and hence would blindly obey stickiness over workload balance requirement.
>
> Guozhang
>
>
> On Thu, Feb 28, 2019 at 11:05 AM Boyang Chen <bche...@outlook.com> wrote:
>
> > Hey community friends,
> >
> > I'm gladly inviting you to have a look at the proposal to add incremental
> > rebalancing to Kafka Streams, A.K.A auto-scaling support.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Smooth+Auto-Scaling+for+Kafka+Streams
> >
> > Special thanks to Guozhang for giving great guidances and important
> > feedbacks while making this KIP!
> >
> > Best,
> > Boyang
> >
>
>
> --
> -- Guozhang
>


-- 
-- Guozhang

Reply via email to