showuon commented on PR #20859: URL: https://github.com/apache/kafka/pull/20859#issuecomment-3584117480
> Voter set is (A,B) with leader A. Node C is a caught up observer and tries to add itself. The new voter set (A,B,C) is committed by A and B, but C is partitioned and never learns its "auto-join" succeeded. The user removes C, and the voter set is back to (A,B). When the partition goes away, if updateVoterSet timer is expired, node C's shouldSendAddOrRemoveVoter evaluates to true and it will try to auto-join again. > I think in AddVoterHandler we check that the node being added is "caught up," but that check currently returns true if the node has fetched within the last hour (we can make this more strict obviously). Nice find! And I agree we can make the "caught up" criteria more strict to make sure the new added voter already knows the voters change, and then adopt the voter sets track in partitionState.updateState() solution below, we can resolve this issue because the state will move to `HAS_JOINED` when finding the voter contains itself. But we can handle the issue separately since it's not directly related to this issue. Opened [KAFKA-19933](https://issues.apache.org/jira/browse/KAFKA-19933). > Voter set is (A,B) with leader A. Node C is a caught up observer and tries to add itself. The new voter set (A,B,C) is committed by A and B, but C is partitioned and never learns its "auto-join" succeeded. The user removes C, and the voter set is back to (A,B). This is similar to the case above, but when the partition goes away, suppose node C's shouldSendAddOrRemoveVoter evaluates to false, and node C fetches instead of trying to auto-join. If this fetch contains both voter set changes in the same response (the latest voter set is (A,B)), the current code will still believe node C is in the HAS_NOT_JOINED state, and will attempt to auto-join again because we are only checking partitionState.lastVoterSet(). > I still think we can handle this properly in the code, and it means we need transition HAS_NOT_JOINED -> HAS_JOINED if any VotersRecord from the FETCH_RESPONSE contains the local ReplicaKey. We go through the new voter sets in partitionState.updateState(), and remember we do not need the HWM here because to get to the HAS_NOT_JOINED state we had to be "caught up" at that point in time. Another good find! (How could you come up these edge cases :) ) Yes, I think tracking the voters set change in `partitionState.updateState()` should work. @TaiJuWu , So in short, we have to do: 1. When a node is the leader and the state is `UNKNOWN`, we make it to `HAS_JOINED` 2. When a node is the follower (both voter and observer) and state is `UNKNOWN`, wait until the HWM updated, then check if the state is `HAS_JOINED` or `HAS_NOT_JOINED`. 3. Implement the tracking the voters set change in `partitionState.updateState()` solution 4. If in (3), we found the state is `HAS_NOT_JOINED` and local ReplicaKey appear in the voters set, we should change the state to `HAS_JOINED`. What do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
