[ https://issues.apache.org/jira/browse/KAFKA-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802108#comment-13802108 ]

Neha Narkhede commented on KAFKA-1097:
--------------------------------------

This is a timing issue between when the controller executes related state 
changes and when those state changes actually take effect on the broker. In 
this case, the right time to remove the replica from the ISR is on receiving 
the StopReplicaResponse from the broker, since at that point the broker is 
guaranteed to have stopped sending fetch requests to the leader and can no 
longer add itself back to the ISR. This is the right fix, but it is a 
non-trivial change to the controller and depends on KAFKA-1099. 
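A minimal sketch of that ordering, as a simplified simulation (all class and method names here are hypothetical, not Kafka's actual controller code): the controller waits for the StopReplicaResponse before shrinking the ISR, so a fetch from the old replica can no longer re-expand it.

```python
# Simplified model of the proposed fix: shrink the ISR only after the
# broker acknowledges StopReplica. Hypothetical names; not Kafka code.

class Leader:
    def __init__(self, isr):
        self.isr = set(isr)

    def handle_fetch(self, replica_id):
        # A fetching, fully caught-up replica gets (re-)added to the ISR.
        self.isr.add(replica_id)

class Broker:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.fetching = True

    def handle_stop_replica(self):
        # The broker stops its fetcher, then acknowledges.
        self.fetching = False
        return "StopReplicaResponse"

class Controller:
    def reassign_away(self, leader, old_replica):
        # Wait for the StopReplicaResponse *before* shrinking the ISR,
        # so the old replica cannot send a fetch request afterwards.
        ack = old_replica.handle_stop_replica()
        assert ack == "StopReplicaResponse"
        leader.isr.discard(old_replica.broker_id)

leader = Leader(isr={1, 2, 3})
old = Broker(3)
Controller().reassign_away(leader, old)
# The old replica is no longer fetching, so it cannot re-enter the ISR.
if old.fetching:
    leader.handle_fetch(old.broker_id)
print(sorted(leader.isr))  # → [1, 2]
```

The point of the ordering is that the ISR shrink happens strictly after the broker has stopped fetching, which closes the race window entirely.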

A short-term fix, which alleviates the issue but does not completely fix it, 
is to send the stop replica request to the broker on the OfflineReplica state 
change and then shrink the ISR on the controller. In addition, send the stop 
replica (with delete) request on the NonExistentReplica state change and 
shrink the ISR again. There is a small chance that the broker has not yet 
acted on the two stop replica requests and still adds itself back to the ISR 
after the controller shrinks the ISR during the NonExistentReplica state 
change, but this is unlikely. 

So the options for this bug fix are:

1. Take the short-term fix for 0.8 and leave the larger change for trunk
2. Skip the short-term fix and fix it properly on trunk
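The short-term sequence and its residual race window can be sketched as a toy model (hypothetical function, not Kafka code): the controller shrinks the ISR immediately after each fire-and-forget stop replica request, so a narrow window remains in which an un-stopped replica can fetch and be re-added.

```python
# Simplified model of the short-term fix. Hypothetical names; not Kafka code.
def short_term_fix(leader_isr, old_replica, broker_acted=True):
    # OfflineReplica state change: send StopReplica (fire-and-forget),
    # then shrink the ISR on the controller.
    leader_isr = leader_isr - {old_replica}
    # NonExistentReplica state change: send StopReplica with delete,
    # then shrink the ISR again.
    leader_isr = leader_isr - {old_replica}
    if not broker_acted:
        # Residual race: the broker has processed neither request yet and
        # issues one more fetch, so the leader re-adds it to the ISR.
        leader_isr = leader_isr | {old_replica}
    return leader_isr

print(sorted(short_term_fix({1, 2, 3}, 3)))                      # → [1, 2]
print(sorted(short_term_fix({1, 2, 3}, 3, broker_acted=False)))  # → [1, 2, 3]
```

The second call illustrates why this is only an alleviation: if the broker lags behind both requests, the stale replica can still land back in the ISR.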

> Race condition while reassigning low throughput partition leads to incorrect 
> ISR information in zookeeper 
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1097
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1097
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>             Fix For: 0.8
>
>
> While moving partitions, the controller moves the old replicas through the 
> following state changes -
> ONLINE -> OFFLINE -> NON_EXISTENT
> During the offline state change, the controller removes the old replica and 
> writes the updated ISR to zookeeper and notifies the leader. Note that it 
> doesn't notify the old replicas to stop fetching from the leader (to be fixed 
> in KAFKA-1032). During the non-existent state change, the controller does not 
> write the updated ISR or replica list to zookeeper. Right after the 
> non-existent state change, the controller writes the new replica list to 
> zookeeper, but does not update the ISR. So an old replica can send a fetch 
> request after the offline state change, essentially letting the leader add it 
> back to the ISR. The problem is that if there is no new data coming in for 
> the partition and the old replica is fully caught up, the leader cannot 
> remove it from the ISR. That lets a non-existent replica live in the ISR at 
> least until new data comes in to the partition.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
