[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860712#comment-15860712 ]

Mathieu Filotto commented on KAFKA-2143:

Is upgrading the only option? Does upgrading to 0.10 work as well?

> Replicas get ahead of leader and fail
> -------------------------------------
>
>          Key: KAFKA-2143
>          URL: https://issues.apache.org/jira/browse/KAFKA-2143
>      Project: Kafka
>   Issue Type: Bug
>   Components: replication
> Affects Versions: 0.8.2.1
>     Reporter: Evan Huus
>     Assignee: Jiangjie Qin
>      Fix For: 0.9.0.1
>
> On a cluster of 6 nodes, we recently saw a case where a single
> under-replicated partition suddenly appeared, replication lag spiked, and
> network IO spiked. The cluster appeared to recover eventually on its own.
> Looking at the logs, the thing which failed was partition 7 of the topic
> {{background_queue}}. It had an ISR of 1,4,3 and its leader at the time was
> 3. Here are the interesting log lines:
> On node 3 (the leader):
> {noformat}
> [2015-04-23 16:50:05,879] ERROR [Replica Manager on Broker 3]: Error when
> processing fetch request for partition [background_queue,7] offset 3722949957
> from follower with correlation id 148185816. Possible cause: Request for
> offset 3722949957 but we only have log segments in the range 3648049863 to
> 3722949955. (kafka.server.ReplicaManager)
> [2015-04-23 16:50:05,879] ERROR [Replica Manager on Broker 3]: Error when
> processing fetch request for partition [background_queue,7] offset 3722949957
> from follower with correlation id 156007054. Possible cause: Request for
> offset 3722949957 but we only have log segments in the range 3648049863 to
> 3722949955. (kafka.server.ReplicaManager)
> [2015-04-23 16:50:13,960] INFO Partition [background_queue,7] on broker 3:
> Shrinking ISR for partition [background_queue,7] from 1,4,3 to 3
> (kafka.cluster.Partition)
> {noformat}
> Note that both replicas suddenly asked for an offset *ahead* of the available
> offsets.
> And on nodes 1 and 4 (the replicas), many occurrences of the following:
> {noformat}
> [2015-04-23 16:50:05,935] INFO Scheduling log segment 3648049863 for log
> background_queue-7 for deletion. (kafka.log.Log) (edited)
> {noformat}
> Based on my reading, this looks like the replicas somehow got *ahead* of the
> leader, asked for an invalid offset, got confused, and re-replicated the
> entire topic from scratch to recover (this matches our network graphs, which
> show 3 sending a bunch of data to 1 and 4).
> Taking a stab in the dark at the cause, there appears to be a race condition
> where replicas can receive a new offset before the leader has committed it
> and is ready to replicate?

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
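The leader-side rejection in the logs above can be illustrated with a minimal sketch. This is illustrative Python, not Kafka's actual Scala implementation; the function name and parameters are assumptions made for clarity:

```python
# Minimal sketch of the leader-side bounds check that produces the
# "Request for offset ... but we only have log segments in the range ..."
# error. Illustrative only; not kafka.server.ReplicaManager.

def validate_fetch_offset(fetch_offset, log_start_offset, log_end_offset):
    """Return True if a follower's requested offset lies within the
    leader's available log.

    Requesting exactly log_end_offset is valid: it means the follower
    is fully caught up and is waiting for whatever arrives next.
    """
    return log_start_offset <= fetch_offset <= log_end_offset

# The situation from the logs: followers asked for 3722949957, but the
# leader only had segments covering 3648049863..3722949955.
assert not validate_fetch_offset(3722949957, 3648049863, 3722949955)
assert validate_fetch_offset(3722949955, 3648049863, 3722949955)
```

The puzzle in this ticket is how a healthy follower could ever end up on the wrong side of this check, since a follower's fetch offset is normally derived from data the leader itself served.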
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242428#comment-15242428 ]

Jason Rosenberg commented on KAFKA-2143:

After looking at KAFKA-2477, I think that is the actual issue reported in this ticket (as opposed to the unrelated issue the first commenter raised), and I see that KAFKA-2477 seems to have been fixed. So upgrading to 0.9.X is the thing to do, it would appear.
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242412#comment-15242412 ]

Jason Rosenberg commented on KAFKA-2143:

Hello, I recently experienced this issue, as described in the original description above. We are running 0.8.2.2. Note, in the original description above (and in my case as well), there was no evidence of a network glitch, and no evidence of any partition leadership change. So, I'm concerned whether the discussion that follows in this ticket (which describes network glitches and leadership changes almost exclusively) really addressed the original issue as reported (and my recent observation of this as well). Can someone comment on this please? [~becket_qin]
RE: [jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
I am hitting the same issue. The error log:

Error when processing fetch request for partition [To_S3_comm_V0_10,2] offset 456234794 from follower with correlation id 254117341. Possible cause: Request for offset 456234794 but we only have log segments in the range 432322850 to 456234793. (kafka.server.ReplicaManager)

In handleOffsetOutOfRange, the follower only checks (leaderEndOffset < replica.logEndOffset.messageOffset); if that check fails, it deletes its entire log:

INFO Scheduling log segment 432322850 for log To_S3_comm_V0_10-2 for deletion. (kafka.log.Log)
...
INFO Deleting segment 434379909 from log To_S3_comm_V0_10-2. (kafka.log.Log)

I think it must also check (log.logEndOffset < leaderStartOffset) when leaderEndOffset is not smaller than replica.logEndOffset.messageOffset.

Thanks,
Lax

> Date: Fri, 4 Sep 2015 00:41:47 +
> From: j...@apache.org
> To: dev@kafka.apache.org
> Subject: [jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
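Lax's proposal can be sketched as follows. This is illustrative Python pseudocode for the follower's out-of-range handling, not the actual Scala in Kafka's replica fetcher; the function and variable names are assumptions:

```python
# Hedged sketch of the follower-side handleOffsetOutOfRange decision,
# with the extra check Lax proposes. Illustrative Python only; the real
# logic lives in Kafka's Scala replica fetcher code.

def next_fetch_offset(follower_end, leader_start, leader_end):
    """Decide where the follower should resume fetching after an
    OffsetOutOfRangeException, instead of unconditionally deleting
    its whole log."""
    if leader_end < follower_end:
        # Follower is ahead of the leader (e.g. after an unclean leader
        # election): truncate back to the leader's end offset.
        return leader_end
    if follower_end < leader_start:
        # Proposed extra check: the follower's entire log is older than
        # anything the leader retains, so resume from the leader's
        # earliest retained offset.
        return leader_start
    # Otherwise the follower's offset is within the leader's range;
    # resume from where it was.
    return follower_end

# Numbers from Lax's log excerpt: follower at 456234794, leader holding
# 432322850..456234793 -- truncating to the leader's end offset suffices,
# rather than dropping the whole log.
assert next_fetch_offset(456234794, 432322850, 456234793) == 456234793
```

Under this sketch, full re-replication would only happen when the follower's log genuinely falls entirely outside the leader's retained range, which matches the intent of the comment above.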
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123866#comment-15123866 ]

Jiangjie Qin commented on KAFKA-2143:

Thanks, [~guozhang]. I almost forgot I had this ticket :)
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122919#comment-15122919 ]

ASF GitHub Bot commented on KAFKA-2143:

GitHub user asfgit closed the pull request at: https://github.com/apache/kafka/pull/129
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056999#comment-15056999 ]

Jiangjie Qin commented on KAFKA-2143:

[~ottomata], KAFKA-2477 is a more likely cause. You may want to take a look at that.
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056672#comment-15056672 ]

Andrew Otto commented on KAFKA-2143:

I'd like to add that Wikimedia is experiencing this as well. It only started happening to us recently, when we started producing to Kafka from a PHP process serving a web request. This process is short-lived, so it only connects to Kafka in order to produce a single message, and then disconnects. The only topic this happens to is the one that this PHP process produces to. Everything recovers as it should, but something is definitely wrong here. Why is a follower trying to consume an offset ahead of what the leader has? We're running 0.8.2.1 (with the snappy compression bugfix backported).
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730117#comment-14730117 ]

Jiangjie Qin commented on KAFKA-2143:

[~junrao], you are right. I realized there must have been an unclean leader election in this case. I'll check the code further to see if there is any finding for KAFKA-2477.
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730109#comment-14730109 ]

Jun Rao commented on KAFKA-2143:

[~becket_qin], since before step (3) both A and B are in ISR, the last committed offset in A can't be larger than 3000. So, in step (3), if A becomes a follower, it has to first truncate its log to the last committed offset before fetching. At that point, A's fetch offset can't be larger than 3000 and therefore won't be out of range.

The following is an alternative scenario that can cause this:

1) Broker A (leader) receives messages to 5000.
2) Broker B (follower) receives messages to 3000 (it is still in ISR because of replica.lag.max.messages).
3) For some reason, B is dropped out of ISR.
4) Broker A (the only one in ISR) commits messages to 5000.
5) For some reason, Broker A is considered dead and Broker B is live.
6) Broker B is selected as the new leader (unclean leader election) and is the only one in ISR.
7) Broker A is considered live again and starts fetching from 5000 (its last committed offset) and gets OffsetOutOfRangeException.
8) In the meantime, B receives more messages, up to offset 6000.
9) Broker A tries to handle OffsetOutOfRangeException, finds that leader B's log end offset is now larger than its own log end offset, and truncates all of its log.

Your patch reduces the amount of data that Broker A needs to replicate in step 9, which is probably fine. However, we should probably first verify that this is indeed what's happening, since it seems it should happen rarely. Also, KAFKA-2477 reports a similar issue without any leadership change, so maybe there is something else that can cause this.
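Jun Rao's nine-step scenario can be traced with a small walk-through. This is illustrative Python that reduces each broker to a log-end offset; real Kafka brokers involve far more state, so treat it only as a trace of the numbers in the steps above:

```python
# Illustrative trace of Jun Rao's unclean-leader-election scenario,
# with broker state reduced to log-end offsets. Not real Kafka code.

a_end = 5000   # step 1: broker A (leader) has messages up to 5000
b_end = 3000   # step 2: broker B (follower) lags at 3000 but is in ISR

# steps 3-4: B drops out of ISR; A, alone in ISR, commits up to 5000
committed = a_end

# steps 5-6: A is considered dead; B becomes leader (unclean election)
leader_end = b_end          # new leader B's log ends at 3000

# step 7: A returns as follower and fetches from its last committed
# offset, 5000 -- beyond anything B has, hence out of range
fetch_offset = committed
assert fetch_offset > leader_end   # triggers OffsetOutOfRangeException

# step 8: meanwhile B keeps accepting produce requests up to 6000
leader_end = 6000

# step 9: by the time A handles the exception, the leader's end offset
# (6000) exceeds A's own end offset (5000), so the pre-patch logic
# concludes A is simply "behind" and truncates A's entire log.
assert leader_end > a_end
```

The key timing detail is between steps 7 and 9: the out-of-range condition and its handling observe the leader at different moments, which is why A misclassifies its own position.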
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680467#comment-14680467 ]

ASF GitHub Bot commented on KAFKA-2143:
---------------------------------------

GitHub user becketqin opened a pull request:

    https://github.com/apache/kafka/pull/129

    KAFKA-2143: fix replica offset truncate to beginning during leader migration.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/becketqin/kafka KAFKA-2143

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/129.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #129

commit 71f8a4716e1f0b4fc2bd88aa30fe38aef8a9f92e
Author: Jiangjie Qin <becket@gmail.com>
Date:   2015-08-03T02:22:02Z

    Fix for KAFKA-2134, fix replica offset truncate to beginning during leader migration.
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660629#comment-14660629 ]

Ewen Cheslack-Postava commented on KAFKA-2143:
----------------------------------------------

[~wushujames] I think one easy way for a broker to be in ISR for one topic and out of ISR for another is a network partition -- if topic A has broker 1 as leader and topic B has broker 2 as leader, then broker 3, replicating both A and B, could be in ISR for A but unable to replicate data for B if there is a partition between brokers 2 and 3.
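That per-topic ISR behavior can be sketched with a toy model (the broker and topic numbers mirror the example above; this is an illustration, not actual Kafka code):

```python
# Toy model: ISR membership is evaluated per partition, so a broker's
# connectivity to *each* topic's leader decides membership independently.
leaders = {"A": 1, "B": 2}   # topic -> leader broker (hypothetical layout)
severed = {(2, 3)}           # network partition between brokers 2 and 3

def can_reach(x, y):
    """Symmetric link check against the set of severed links."""
    return (min(x, y), max(x, y)) not in severed

def isr(topic, followers):
    """A follower stays in the ISR only while it can reach that topic's leader."""
    leader = leaders[topic]
    return [leader] + [f for f in followers if can_reach(f, leader)]

print(isr("A", [3]))  # broker 3 stays in ISR for topic A
print(isr("B", [3]))  # broker 3 falls out of ISR for topic B
```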
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659477#comment-14659477 ]

Jiangjie Qin commented on KAFKA-2143:
-------------------------------------

In that case, yes, messages 3000-5000 will be lost. If the producer is not producing with acks=-1, that is the expected behavior. If acks=-1, then as long as the producer got the callback from the broker, the message has been appended to all the brokers in the ISR (NOT all the replicas!). Here is a slide deck describing how to avoid data loss, if you are interested: http://www.slideshare.net/JiangjieQin/no-data-loss-pipeline-with-apache-kafka-49753844
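The no-data-loss recipe in that deck reduces to a handful of settings. A hedged sketch — the keys below are real Kafka config names, but the values are illustrative, not prescriptive:

```python
# Illustrative no-data-loss settings (values are examples, not mandates).
# Producer side: require acknowledgement from every in-sync replica.
producer_config = {
    "acks": "all",          # same as -1: wait for all ISR members
    "retries": 2147483647,  # retry instead of dropping on transient errors
}

# Broker/topic side: keep enough in-sync copies and forbid unclean election.
topic_config = {
    "replication.factor": 3,
    "min.insync.replicas": 2,                 # acks=all fails fast below this
    "unclean.leader.election.enable": False,  # never elect an out-of-sync leader
}

def can_lose_acked_messages(cfg):
    """A write acked under acks=all survives a single-broker failure only
    if min.insync.replicas >= 2 and unclean leader election is disabled."""
    return (cfg["min.insync.replicas"] < 2
            or cfg["unclean.leader.election.enable"])

print(can_lose_acked_messages(topic_config))  # False
```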
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659503#comment-14659503 ]

James Cheng commented on KAFKA-2143:
------------------------------------

Thanks [~becket_qin]. [~ewencp] sent me that slide deck of yours. I added all your suggested settings to my team's recommended Kafka settings wiki. :)

Even with acks=-1, I think it's still possible for a replica to lag behind the leader by 2000 messages, right? It would be lagging behind by 2000 messages that were produced but not yet acknowledged. Not sure if losing those would count as message loss, since they weren't actually ack'd?

replica.lag.max.messages is a broker-level config setting, not a topic-level config setting. I think that means it's hard to find a setting that works for both high-volume topics and low-volume topics, right? And... a bit of Googling around, and I think my observation is the exact thing that inspired https://cwiki.apache.org/confluence/display/KAFKA/KIP-16+-+Automated+Replica+Lag+Tuning and http://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/, right?

Is it possible for a broker to be in the ISR for one topic, and out of the ISR for another topic?
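The broker-wide threshold problem can be put in numbers. A rough sketch with made-up rates (this mismatch is what KIP-16's time-based lag criterion addresses):

```python
# Why a single replica.lag.max.messages is hard to tune (toy numbers).
LAG_THRESHOLD = 4000  # messages, broker-wide in 0.8.x

def lag_in_messages(produce_rate, seconds_behind):
    """A follower a fixed *time* behind accrues lag proportional to rate."""
    return produce_rate * seconds_behind

# High-volume topic: half a second behind already exceeds the threshold.
assert lag_in_messages(10_000, 0.5) > LAG_THRESHOLD
# Low-volume topic: an hour behind still looks "in sync".
assert lag_in_messages(1, 3600) < LAG_THRESHOLD
```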
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659277#comment-14659277 ]

James Cheng commented on KAFKA-2143:
------------------------------------

I'm trying to understand part of [~bugzmanov]'s description. If unclean.leader.election.enable=false, then in step 5, as you say, Broker A is kicked out of the ISR list. What happens, then, to the messages 3000-5000 that were originally in Broker A? Are they just lost?

What was the producer setting for request.required.acks (old producer) or acks (new producer) in this scenario? Was it 0 or 1? If it was -1 (old producer) or all (new producer), is it possible for the replica lag to get that large?
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651344#comment-14651344 ]

Jiangjie Qin commented on KAFKA-2143:
-------------------------------------

Actually, I realized KAFKA-2334 is orthogonal to this issue. I submitted a patch for this ticket and will leave KAFKA-2334 to be solved separately.
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650146#comment-14650146 ]

Jiangjie Qin commented on KAFKA-2143:
-------------------------------------

Hmm, with the current design, when the follower gets an OffsetOutOfRangeException during a fetch and *the leader's log end offset is smaller than the follower's log end offset*, the follower will not truncate to zero but to the current leader's log end offset. That means in step (5), broker A should start fetching from offset 3000 instead of dropping to 0, unless broker B really has nothing.

However, if the follower's log end offset is smaller than the leader's log end offset but we still received an offset-out-of-range exception, we assume it was because the follower's log end offset is too small and behind the earliest available offset of the leader. That might not necessarily be true, because we fetch the leader's latest offset again only *after* we receive the exception. So it is possible that when the exception was returned the follower was ahead of the leader, but by the time it handles the exception the new leader has had some new messages appended, and the follower is now behind the leader. In that case, the follower will truncate its log to the log start offset of the leader, which is the problem you described.

I think what happened is as below:
1) Broker A (leader) has committed offsets up to 5000.
2) Broker B (follower) has committed offsets up to 3000 (it is still in ISR because of replica.lag.max.messages).
***network glitch happens***
3) Broker B becomes the leader, Broker A becomes a follower.
4) Broker A fetches from Broker B and gets an OffsetOutOfRangeException.
4.1) Broker B gets some new messages and its log end offset becomes 6000.
4.2) Broker A tries to handle the exception, so it checks the log end offset on Broker B, finds it is greater than Broker A's log end offset, and truncates itself to the starting offset of Broker B.
And the rest is pretty much what you described.

By design, at step (3) when Broker A becomes a follower, it will truncate its log to its high watermark, which should be 3000. There is a possibly related issue, KAFKA-2334, which might make 3000 not the real high watermark on Broker B, and that issue triggered the offset out of range in step (4). So it looks to me like KAFKA-2334 and the ReplicaFetcherThread bug together caused the issue. I'll take both of the issues and see if they can be solved together.
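The handling logic and the race described above can be sketched as follows (a simplified model, not the actual ReplicaFetcherThread code; the offsets match the numbered steps):

```python
# Toy model of the follower's OffsetOutOfRangeException handling. The race:
# the leader's log end offset (LEO) is re-fetched *after* the exception, so
# a leader that appended in the meantime flips the comparison and triggers
# a truncate to the leader's log start offset.

def handle_out_of_range(follower_leo, leader_leo_now, leader_log_start):
    """Return the offset the follower truncates/resets to."""
    if leader_leo_now < follower_leo:
        # Follower is ahead of the (new) leader: truncate to leader's LEO.
        return leader_leo_now
    # Follower assumed to be behind the leader's earliest retained offset.
    return leader_log_start

# Step 4 as intended: A (LEO 5000) checks new leader B (LEO 3000).
assert handle_out_of_range(5000, 3000, 0) == 3000
# Steps 4.1/4.2: B appends to 6000 before A re-checks, so A wrongly
# truncates to B's log start offset instead of to B's earlier LEO.
assert handle_out_of_range(5000, 6000, 0) == 0
```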
[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail
[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648444#comment-14648444 ]

Rafael Bagmanov commented on KAFKA-2143:
----------------------------------------

I would like to add that we are experiencing this issue quite often, and the situation gets worse when we have configs like this:

replication factor: 2
unclean.leader.election.enable: true

This leads to the loss of a whole partition in case of two network glitches happening in a row. The failure scenario looks like this (we have replica.lag.max.messages: 4000, so I'm guessing the follower is always a little behind the leader for a high-volume topic, which means the leader has a larger offset value than the follower):

1) Broker A (leader) has committed offsets up to 5000.
2) Broker B (follower) has committed offsets up to 3000 (it is still in ISR because of replica.lag.max.messages).
***network glitch happens***
3) Broker B becomes the leader, Broker A becomes a follower.
4) Broker A (follower) asks the leader for messages starting from 5000.
5) Broker A (follower) receives a message that this is an invalid offset (Broker B has only 3000) and drops the partition to 0.
***network glitch happens***
6) Broker A becomes the leader (unclean election), Broker B becomes a follower.
7) Broker B (follower) asks the leader for messages starting from 3000.
8) Broker B (follower) receives a message that this is an invalid offset (Broker A has only 0) and drops the partition to 0.

As a result, we lost the partition because of 2 network glitches. And if the configs are:

replication factor: 2
unclean.leader.election.enable: false (!)

the scenario repeats up to the 5th step, but then Broker A gets kicked out of ISR and the unclean election does not happen.
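The eight steps above can be replayed with toy offsets (a simplified model assuming the 0.8.2-era behavior described in this thread, where an out-of-range fetch drops the follower to the leader's log start offset):

```python
# Replay of the two-glitch scenario with toy log end offsets (LEOs).

def follower_fetch(follower_leo, leader_leo, leader_log_start=0):
    """Fetching at an offset past the leader's LEO resets the follower."""
    if follower_leo > leader_leo:
        return leader_log_start  # out of range -> drop to log start
    return follower_leo          # normal fetch position

a_leo, b_leo = 5000, 3000             # steps 1-2: A leads, B lags in ISR
a_leo = follower_fetch(a_leo, b_leo)  # steps 3-5: B leads, A drops to 0
assert a_leo == 0
b_leo = follower_fetch(b_leo, a_leo)  # steps 6-8: unclean election, B drops too
assert b_leo == 0                     # the whole partition is lost
```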