Hello Sophie,

Thanks for the proposed KIP. I left some comments on the wiki itself, and I
think I'm still not very clear on a couple of those:

1. With this proposal, does that mean that with num.standby.replicas == 0,
we may sometimes still have some standby tasks, which would violate the
config?

2. I think I understand the rationale for considering lags that are below
the specified threshold to be equal, rather than still considering 5000 to
be better than 5001 -- we do not want to "over-optimize" and potentially
fall into endless rebalances back and forth.
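
To make my understanding concrete, what I have in mind is something like
the comparator below (just a sketch; `acceptableRecoveryLag` is a
placeholder name I made up, not from the KIP):

    import java.util.Comparator;

    // Sketch: collapse every lag at or below the threshold into one bucket,
    // so that e.g. 5000 and 5001 compare as equal and cannot flip the
    // assignment back and forth between consecutive rebalances.
    final class LagComparator implements Comparator<Long> {
        private final long acceptableRecoveryLag;  // placeholder config name

        LagComparator(final long acceptableRecoveryLag) {
            this.acceptableRecoveryLag = acceptableRecoveryLag;
        }

        @Override
        public int compare(final Long lagA, final Long lagB) {
            return Long.compare(bucket(lagA), bucket(lagB));
        }

        private long bucket(final long lag) {
            return lag <= acceptableRecoveryLag ? 0L : lag;
        }
    }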

But I'm not clear about the rationale for the second parameter of
constrainedBalancedAssignment(StatefulTasksToRankedCandidates,
balance_factor):

Does that mean, e.g. with a balance_factor of 3, we'd consider two
assignments, one resulting in a balance_factor of 0 and the other in a
balance_factor of 3, to be equally optimized and therefore may "stop
early"? That was not very convincing to me :P
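
In other words, my reading of the stop condition is roughly the following
(a hypothetical sketch; the names are mine, not the KIP's):

    import java.util.Collection;
    import java.util.Collections;

    // Sketch: with balance_factor == 3, an assignment whose task-count skew
    // is 3 already passes this check, even if an assignment with skew 0
    // exists -- which is the part I find unconvincing.
    static boolean isBalancedEnough(final Collection<Integer> tasksPerClient,
                                    final int balanceFactor) {
        final int max = Collections.max(tasksPerClient);
        final int min = Collections.min(tasksPerClient);
        return max - min <= balanceFactor;
    }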

3. There are a couple of minor comments about the algorithm itself, left on
the wiki page since they need to refer to exact lines and are better
displayed there.

3.a Another wild thought about the threshold itself: today the assignment
itself is memoryless, so we would not know if the reported `TaskLag` is
increasing or decreasing even if the current value is under the threshold.
I wonder if it is worth making the assignor a bit more complicated in order
to track the task lag trend? Practically it may not be very uncommon that
standby tasks are not keeping up due to the fact that other active tasks
hosted on the same thread are starving the standby tasks.
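
E.g. the leader could remember the lag reported in the previous rebalance
and treat a standby as caught-up only if its lag is under the threshold
and not growing. A rough sketch of what I mean (all names here are made
up):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: track the lag trend per "client + task" key across rebalances.
    final class LagTrendTracker {
        private final Map<String, Long> previousLags = new HashMap<>();

        // A standby under the threshold but falling further behind (e.g.
        // starved by active tasks on the same thread) is not really caught up.
        boolean isCaughtUp(final String clientTaskKey,
                           final long reportedLag,
                           final long threshold) {
            final Long previous = previousLags.put(clientTaskKey, reportedLag);
            final boolean notGrowing = previous == null || reportedLag <= previous;
            return reportedLag <= threshold && notGrowing;
        }
    }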

4. There's a potential race condition risk when reporting `TaskLags` in the
subscription: right after reporting them to the leader, the cleanup thread
kicks in and deletes the state directory. If the task was then assigned to
that host, it would have to restore from the beginning, effectively making
the seemingly optimized assignment very sub-optimal.

To be on the safer side we should consider either pruning out of the
subscription those tasks that are "close to being cleaned up", or delaying
the cleanup right after we've included them in the subscription, in case
they get selected as assigned tasks by the assignor.
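
For the latter, I'm thinking of something along these lines (a rough
sketch only; the method names are made up and do not reflect the actual
StateDirectory code):

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.kafka.streams.processor.TaskId;

    // Sketch: once a task's lag has been reported in a subscription, protect
    // its local state directory from the cleanup thread until the rebalance
    // completes and we know whether the task was assigned to this host.
    final class CleanupGuard {
        private final Set<TaskId> reportedInSubscription =
            ConcurrentHashMap.newKeySet();

        void onSubscriptionSent(final Set<TaskId> reportedTasks) {
            reportedInSubscription.addAll(reportedTasks);
        }

        boolean maySafelyClean(final TaskId task) {
            return !reportedInSubscription.contains(task);
        }

        void onAssignmentReceived() {
            reportedInSubscription.clear();  // resume normal cleanup
        }
    }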

5. This is a meta comment: I think it would be helpful to add some user
visibility on the standby tasks' lag as well, via metrics for example.
Today it is hard for us to observe how far our current standby tasks are
behind the active tasks, and whether that lag is increasing or decreasing.
As a follow-up task, for example, a rebalance should also be triggered if
we realize that some standby task's lag is increasing indefinitely, meaning
that it cannot keep up (which is another indicator that either you need to
add more resources for the num.standbys or you are still not balanced
enough).
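
As a strawman, even a simple per-task gauge would go a long way (the
metric name and the two offset suppliers below are made up, just to
illustrate; only the Metrics registry itself is the existing common class):

    import java.util.Collections;
    import java.util.function.LongSupplier;
    import org.apache.kafka.common.metrics.Metrics;

    // Sketch: expose the standby's restore lag so users can watch its trend
    // over time; "standby-records-lag" is a made-up metric name.
    static void registerStandbyLagMetric(final Metrics metrics,
                                         final String taskId,
                                         final LongSupplier activeEndOffset,
                                         final LongSupplier restoredOffset) {
        metrics.addMetric(
            metrics.metricName("standby-records-lag",
                               "stream-task-metrics",
                               Collections.singletonMap("task-id", taskId)),
            (config, now) -> activeEndOffset.getAsLong()
                                 - restoredOffset.getAsLong());
    }

Users (or the follow-up rebalance trigger) could then alert on this gauge
when it grows monotonically over some window.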


On Tue, Aug 6, 2019 at 1:32 PM Sophie Blee-Goldman <sop...@confluent.io>
wrote:

> Hey all,
>
> I'd like to kick off discussion on KIP-441, aimed at the long restore times
> in Streams during which further active processing and IQ are blocked.
> Please give it a read and let us know your thoughts
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-441:+Smooth+Scaling+Out+for+Kafka+Streams
>
> Cheers,
> Sophie
>


-- 
-- Guozhang
