Boyang Chen created KAFKA-7728:
----------------------------------
Summary: Add JoinReason to the join group request for better
rebalance handling
Key: KAFKA-7728
URL: https://issues.apache.org/jira/browse/KAFKA-7728
Project: Kafka
Issue Type: Improvement
Reporter: Boyang Chen
Recently [~mgharat] and I discussed the current rebalance logic for handling a
join group request from the leader. So far we blindly trigger a rebalance
whenever the leader rejoins. The caveat is that KIP-345 does not cover this
case, and if a consumer group is not using sticky assignment but another
strategy such as round robin, the redundant rebalance could still shuffle topic
partitions across consumers (for example, in a mirror maker application).
I checked the broker side, and here is what we currently do:
{code:java}
if (group.isLeader(memberId) || !member.matches(protocols))
  // force a rebalance if a member has changed metadata or if the leader sends JoinGroup.
  // The latter allows the leader to trigger rebalances for changes affecting assignment
  // which do not affect the member metadata (such as topic metadata changes for the consumer)
{code}
Based on the broker logic, a leader rejoin only needs to trigger a rebalance
when the topic metadata has changed. I also looked at the ConsumerCoordinator
code on the client side and found the metadata monitoring logic here:
{code:java}
public boolean rejoinNeededOrPending() {
    ...
    // we need to rejoin if we performed the assignment and metadata has changed
    if (assignmentSnapshot != null && !assignmentSnapshot.equals(metadataSnapshot))
        return true;
}
{code}
Instead of just returning true, we could introduce a new enum field called
JoinReason that indicates the purpose of the rejoin. That way we don't need to
do a full rebalance when the leader is merely going through a rolling bounce.
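To make the client-side idea concrete, here is a minimal sketch of how the
snapshot check could yield a reason instead of a boolean. Everything in it is
hypothetical: the class RejoinReasonSketch, the enum values, the method
rejoinReason, and the use of a topic-to-partition-count map as a stand-in for
the real snapshot type are all assumptions for illustration, not existing Kafka
code.

```java
import java.util.Map;

// Hypothetical sketch; none of these names exist in Kafka today.
final class RejoinReasonSketch {

    // Hypothetical enum along the lines proposed in this ticket.
    enum JoinReason {
        NONE,                   // no rejoin is needed
        TOPIC_METADATA_CHANGE,  // metadata changed since the assignment was performed
        OTHER                   // any other cause, e.g. a restart during a rolling bounce
    }

    // Mirrors the snapshot comparison in rejoinNeededOrPending(), but reports why.
    // The maps (topic name -> partition count) are an assumed stand-in for the snapshots.
    static JoinReason rejoinReason(Map<String, Integer> assignmentSnapshot,
                                   Map<String, Integer> metadataSnapshot,
                                   boolean otherRejoinNeeded) {
        // Same condition as the existing check: we performed the assignment
        // and the metadata no longer matches what we assigned against.
        if (assignmentSnapshot != null && !assignmentSnapshot.equals(metadataSnapshot)) {
            return JoinReason.TOPIC_METADATA_CHANGE;
        }
        return otherRejoinNeeded ? JoinReason.OTHER : JoinReason.NONE;
    }
}
```

The existing boolean could then be derived as "reason is not NONE", so current
callers would not need to change.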
Concretely, we could add the JoinReason enum field to the join group request so
that the broker knows whether the leader is rejoining due to a topic metadata
change. If yes, we trigger a rebalance; if no, we skip it.
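The broker-side decision described above could look roughly like the sketch
below. It is only an illustration: LeaderRejoinSketch, shouldRebalance, and the
enum values are hypothetical names, and the two boolean parameters stand in for
group.isLeader(memberId) and member.matches(protocols) from the snippet quoted
earlier. It is not the actual coordinator code.

```java
// Hypothetical sketch of the broker-side check; not the actual coordinator code.
final class LeaderRejoinSketch {

    // Hypothetical reason carried in the join group request.
    enum JoinReason { TOPIC_METADATA_CHANGE, OTHER }

    // isLeader stands in for group.isLeader(memberId);
    // protocolsMatch stands in for member.matches(protocols).
    static boolean shouldRebalance(boolean isLeader,
                                   boolean protocolsMatch,
                                   JoinReason reason) {
        // A member whose metadata changed still forces a rebalance, as today.
        if (!protocolsMatch) {
            return true;
        }
        // A rejoining leader forces a rebalance only for a topic metadata change,
        // so a leader restart during a rolling bounce does not reshuffle partitions.
        return isLeader && reason == JoinReason.TOPIC_METADATA_CHANGE;
    }
}
```

With this shape, a leader rejoining with OTHER and unchanged protocols would
not trigger a rebalance, while the existing metadata-mismatch path is unchanged.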