Jason Gustafson created KAFKA-9089:
--------------------------------------

             Summary: Reassignment should be resilient to unexpected errors
                 Key: KAFKA-9089
                 URL: https://issues.apache.org/jira/browse/KAFKA-9089
             Project: Kafka
          Issue Type: Bug
            Reporter: Jason Gustafson
            Assignee: Jason Gustafson
             Fix For: 2.4.0


Reassignment changes typically involve both an update to the assignment state 
in zookeeper and an update to the in-memory representation of that state (in 
the ControllerContext). We can run into trouble when these states get 
inconsistent with each other, so the reassignment logic attempts to follow some 
rules to reduce the impact from this:
 * When creating a new reassignment, we update the state in zookeeper first 
before updating memory. Until the reassignment is known to be persisted, we do 
not begin executing any reassignment logic.
 * When completing a reassignment, all of the completion steps are executed 
before the state is updated in zookeeper. In the event of a failure, the new 
controller can retry reassignment completion.

However, the current logic does not follow these rules strictly which can lead 
to state inconsistencies in the case of an unexpected error.
 # When we override or cancel an existing assignment, we currently use an 
intermediate assignment state which is only reflected in memory. It is 
basically a mix of the previous assignment state and the overlapping parts of 
the new reassignment. The purpose of this is to shutdown unneeded replicas from 
the existing reassignment. Since the intermediate state is not persisted, a 
controller failure will revert to the old reassignment. Any exception which 
does not cause a controller failure will result in state divergence.
 # The target replicas of a reassignment are represented both in the existing 
assignment (PartitionReplicaAssignment) and in a separate context object 
(ReassignedPartitionContext). The reassignment context is updated before a 
reassignment has been accepted and persisted. The intent is to remove this 
context object in the event of a submission failure, but an unexpected error 
will leave it around.

We can make reassignment more resilient to unexpected errors by using 
consistent update invariants. Specifically we can remove the intermediate 
assignment state and enforce the invariant that any active reassignment must be 
persisted before being reflected in memory. Additionally, we can make the 
assignment state the source of truth for the target replicas and eliminate the 
possibility of inconsistency. Doing so simplifies the reassignment logic and 
makes it more resilient.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to