[ https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241314#comment-15241314 ]
Flavio Junqueira commented on KAFKA-3042:
-----------------------------------------

[~junrao] In this comment: https://issues.apache.org/jira/browse/KAFKA-3042?focusedCommentId=15236055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15236055 I showed that broker 5 is the one that sent the LeaderAndIsr request to broker 1, and here: https://issues.apache.org/jira/browse/KAFKA-3042?focusedCommentId=15237383&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15237383 that broker 5 also didn't have broker 4 as a live broker when it sent the request to broker 1. It does sound right that the controller, on failover, should update the list of live brokers on other brokers before sending requests that make them followers; or at least the problem should be transient, in the sense that it could be corrected by a later message. However, for the partition we are analyzing there is the additional problem that controller 5 also didn't have broker 4 in its own list of live brokers.

Interestingly, I also caught an instance of this:

{noformat}
[2016-04-09 00:37:54,111] DEBUG Sending MetadataRequest to Brokers:ArrayBuffer(2, 5)...
[2016-04-09 00:37:54,111] ERROR Haven't been able to send metadata update requests...
[2016-04-09 00:37:54,112] ERROR [Controller 5]: Forcing the controller to resign (kafka.controller.KafkaController)
{noformat}

I don't think this is related, but in another issue we have been wondering about the possible causes of batches in {{ControllerBrokerRequestBatch}} not being empty, and there are a few occurrences of it in these logs. It happens, however, right after the controller resigns, so I'm guessing it is related to the controller shutting down:

{noformat}
[2016-04-09 00:37:54,064] INFO [Controller 5]: Broker 5 resigned as the controller (kafka.controller.KafkaController)
{noformat}

In any case, for this last issue I'll create a JIRA to make sure that we log enough information to identify the problem when it happens. Currently, the exception is propagated, but we never log the cause.

> updateIsr should stop after failed several times due to zkVersion issue
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-3042
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3042
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>         Environment: jdk 1.7
>                      centos 6.4
>            Reporter: Jiahongchao
>             Fix For: 0.10.0.0
>
>         Attachments: controller.log, server.log.2016-03-23-01, state-change.log
>
>
> Sometimes a broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact it is a follower.
> So after several failed tries, it needs to find out who the leader is.
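To make the suggestion in the description concrete, below is a minimal sketch, not Kafka's actual code, of what "stop after several failed tries and find out who the leader is" could look like. The helpers {{readLeaderAndIsrFromZk}} and {{conditionalUpdateIsrInZk}} are hypothetical stand-ins for the real ZooKeeper read and zkVersion-checked write; only the retry/give-up structure is the point.

{noformat}
object UpdateIsrSketch {

  // Partition state as stored in ZooKeeper, including the version used for conditional updates.
  case class LeaderAndIsr(leader: Int, isr: List[Int], zkVersion: Int)

  // Hypothetical stand-in: read the current partition state (and its zkVersion) from ZooKeeper.
  def readLeaderAndIsrFromZk(): LeaderAndIsr =
    LeaderAndIsr(leader = 4, isr = List(4, 2), zkVersion = 55)

  // Hypothetical stand-in: conditional write keyed on the expected zkVersion, mimicking a
  // ZooKeeper setData with a version check; returns (succeeded, newZkVersion).
  def conditionalUpdateIsrInZk(expectedZkVersion: Int, newIsr: List[Int]): (Boolean, Int) =
    (false, expectedZkVersion) // always fails here, to exercise the give-up path

  val maxRetries = 3

  def updateIsr(localBrokerId: Int, cachedZkVersion: Int, newIsr: List[Int]): Unit = {
    var zkVersion = cachedZkVersion
    var attempts  = 0
    var done      = false

    while (!done && attempts < maxRetries) {
      val (succeeded, newVersion) = conditionalUpdateIsrInZk(zkVersion, newIsr)
      if (succeeded) {
        zkVersion = newVersion
        done = true
      } else {
        attempts += 1
        println(s"Cached zkVersion $zkVersion not equal to that in zookeeper, skip updating ISR")
      }
    }

    if (!done) {
      // After repeated conditional-update failures, re-read the partition state instead of
      // retrying forever: the most likely cause is that this broker is no longer the leader.
      val current = readLeaderAndIsrFromZk()
      if (current.leader != localBrokerId)
        println(s"Broker $localBrokerId is no longer the leader (current leader: ${current.leader}); giving up on the ISR update")
      else
        println(s"Still the leader; refreshing cached zkVersion $zkVersion -> ${current.zkVersion} for the next attempt")
    }
  }

  def main(args: Array[String]): Unit =
    updateIsr(localBrokerId = 1, cachedZkVersion = 54, newIsr = List(1, 2))
}
{noformat}

The intuition is that the conditional write can only keep failing if someone else, typically the controller, has bumped the zkVersion, which usually means leadership has moved; re-reading the partition state bounds the retries and lets a stale leader step aside instead of logging the same message indefinitely.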