Hello Boyang,

I've just made a quick pass on the KIP and here are some thoughts.

Meta:

1. I'm still not sure it's worthwhile to add a new "learner task" type in
addition to "standby task": if the only difference is that for the latter
we would consider workload balance while for the former we would not, I
think we can just adjust the logic of StickyTaskAssignor a bit to remove
that difference. Adding a new type of task would add a lot of code
complexity, so if we can still piggy-back the logic on standby tasks I
would prefer to do so.

2. One thing that's still not clear from the KIP wiki itself is at which
layer the logic would be implemented. Although for most KIPs we do not
require internal implementation details, only public-facing API updates,
for a KIP like this I think the implementation design still needs to be
fleshed out. More specifically: today Streams embeds a full-fledged
Consumer client, which hard-codes a ConsumerCoordinator inside. Streams
then injects a StreamsPartitionAssignor into its pluggable
PartitionAssignor interface, and inside the StreamsPartitionAssignor we
also have a TaskAssignor interface whose default implementation is
StickyTaskAssignor. The Streams partition assignment logic today sits in
the latter two classes. Hence the hierarchy today is:

KafkaConsumer -> ConsumerCoordinator -> StreamsPartitionAssignor ->
StickyTaskAssignor.

We need to think about where the proposed implementation would take place.
Personally I think it is not the best option to put all of it into the
StreamsPartitionAssignor / StickyTaskAssignor, since logic such as
"triggering another rebalance" would require some coordinator logic that
is hard to mimic at the PartitionAssignor level. On the other hand, since
we are embedding a KafkaConsumer client as a whole, we cannot just replace
ConsumerCoordinator with a specialized StreamsCoordinator like Connect does
in KIP-415. So I'd like to maybe split the current proposal into a
consumer layer and a streams-assignor layer, like we did in KIP-98/KIP-129.
The key thing to consider then is where to draw the boundary so that the
modifications we push to ConsumerCoordinator would be universally
beneficial for any consumer, while keeping the Streams-specific logic at
the assignor level.

3. Depending on which design direction we choose, our migration plan would
also be quite different. For example, if we stay with ConsumerCoordinator,
whose protocol type is still "consumer", and we manage to make all changes
agnostic to brokers as well as to older-versioned consumers, then our
migration plan could be much easier.

4. I think one major issue related to this KIP is that today, in the
StickyTaskAssignor, we always try to honor stickiness over workload
balance, and hence a "learner task" is needed to break this priority. But
I'm wondering if we can have a better solution within the sticky task
assignor that accommodates this?
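
To make the stickiness-versus-balance trade-off in point 4 concrete, here
is a minimal sketch in plain Java. It is a toy model and not the actual
StickyTaskAssignor code: the class name, method names, and the per-member
"cap" rule are all hypothetical and only illustrate one way an assignor
could stay sticky while still letting a new member take over tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, not Kafka code: tasks and members are plain strings.
public class StickyVsBalancedSketch {

    // Stickiness first: a task stays with its previous owner whenever that
    // owner is still in the group, no matter how loaded the owner already is.
    static Map<String, List<String>> stickyFirst(Map<String, String> previousOwner,
                                                 List<String> members) {
        Map<String, List<String>> assignment = emptyAssignment(members);
        for (Map.Entry<String, String> e : new TreeMap<>(previousOwner).entrySet()) {
            List<String> ownerTasks = assignment.get(e.getValue());
            if (ownerTasks != null) {
                ownerTasks.add(e.getKey());
            } else {
                leastLoaded(assignment).add(e.getKey());
            }
        }
        return assignment;
    }

    // Assumed alternative: stay sticky only up to ceil(tasks / members) tasks
    // per member, then hand the overflow to the least-loaded members.
    // Assumes members is non-empty.
    static Map<String, List<String>> stickyWithCap(Map<String, String> previousOwner,
                                                   List<String> members) {
        Map<String, List<String>> assignment = emptyAssignment(members);
        int cap = (previousOwner.size() + members.size() - 1) / members.size();
        List<String> unassigned = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(previousOwner).entrySet()) {
            List<String> ownerTasks = assignment.get(e.getValue());
            if (ownerTasks != null && ownerTasks.size() < cap) {
                ownerTasks.add(e.getKey());
            } else {
                unassigned.add(e.getKey());
            }
        }
        for (String task : unassigned) {
            leastLoaded(assignment).add(task);
        }
        return assignment;
    }

    static Map<String, List<String>> emptyAssignment(List<String> members) {
        Map<String, List<String>> assignment = new TreeMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        return assignment;
    }

    // Returns the task list of the member with the fewest tasks; ties go to
    // the member that sorts first by name.
    static List<String> leastLoaded(Map<String, List<String>> assignment) {
        List<String> best = null;
        for (List<String> tasks : assignment.values()) {
            if (best == null || tasks.size() < best.size()) {
                best = tasks;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // S1 and S2 each own three tasks; S4 has just joined with none.
        Map<String, String> previousOwner = new TreeMap<>();
        for (String t : Arrays.asList("t0", "t1", "t2")) previousOwner.put(t, "S1");
        for (String t : Arrays.asList("t3", "t4", "t5")) previousOwner.put(t, "S2");
        List<String> members = Arrays.asList("S1", "S2", "S4");

        // prints {S1=[t0, t1, t2], S2=[t3, t4, t5], S4=[]}
        System.out.println(stickyFirst(previousOwner, members));
        // prints {S1=[t0, t1], S2=[t3, t4], S4=[t2, t5]}
        System.out.println(stickyWithCap(previousOwner, members));
    }
}
```

With pure stickiness the new member S4 receives nothing, which is the
behavior the "learner task" is meant to work around. The capped variant
moves only the overflow tasks, but it ignores state restoration cost, so it
is a starting point for discussion rather than a proposal.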

Minor:

1. The idea of two rebalances has also been discussed in
https://issues.apache.org/jira/browse/KAFKA-6145, so we should add the
reference on the wiki page as well.
2. Could you also add a section describing how the subscription /
assignment metadata will be re-formatted? Without this information it is
hard to get to the bottom of your idea. For example in the "Leader Transfer
Before Scaling" section, I'm not sure why "S2 doesn't know S4 is new member"
and hence would blindly favor stickiness over the workload balance
requirement.

Guozhang


On Thu, Feb 28, 2019 at 11:05 AM Boyang Chen <bche...@outlook.com> wrote:

> Hey community friends,
>
> I'm gladly inviting you to have a look at the proposal to add incremental
> rebalancing to Kafka Streams, A.K.A auto-scaling support.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Smooth+Auto-Scaling+for+Kafka+Streams
>
> Special thanks to Guozhang for giving great guidances and important
> feedbacks while making this KIP!
>
> Best,
> Boyang
>


-- 
-- Guozhang
