[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

James Cheng (JIRA) Fri, 25 Aug 2017 15:32:21 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142355#comment-16142355
 ]


James Cheng commented on KAFKA-1120:
------------------------------------

Do you mean this mbean?

kafka.network:type=RequestMetrics,name=TotalTimeMs,request=ControlledShutdown

[~junrao], I think you've mentioned before that controller.socket.timeout.ms 
applies to *all* broker-controller communication, so not just 
ControlledShutdown requests but also LeaderAndIsr updates and the like. 
That makes me hesitant to touch that setting. Although, with high partition 
counts, would raising it be recommended? Most of those other requests are 
fairly quick, so I don't think they would benefit from a longer socket 
timeout. But then again, a longer timeout wouldn't hurt them either.
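
For reference, a minimal sketch of what raising that timeout would look like in 
a broker's server.properties. The 60000 value is only an assumed figure for 
illustration, not a recommendation; the default is 30000 ms.

```properties
# Broker config (server.properties). The value below is illustrative only.
# Default: 30000 (30 seconds).
controller.socket.timeout.ms=60000
```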

I think ControlledShutdown is one of the few synchronous operations, right?


> Controller could miss a broker state change 
> --------------------------------------------
>
>                 Key: KAFKA-1120
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1120
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Jun Rao
>              Labels: reliability
>             Fix For: 1.0.0
>
>
> When the controller is in the middle of processing a task (e.g., preferred 
> leader election, broker change), it holds a controller lock. During this 
> time, a broker could have de-registered and re-registered itself in ZK. After 
> the controller finishes processing the current task, it will start processing 
> the logic in the broker change listener. However, it will see no broker 
> change and therefore won't do anything to the restarted broker. This broker 
> will be in a weird state since the controller doesn't inform it to become the 
> leader of any partition. Yet, the cached metadata in other brokers could 
> still list that broker as the leader for some partitions. Client requests 
> routed to that broker will then get a TopicOrPartitionNotExistException. This 
> broker will continue to be in this bad state until it's restarted again.
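
The race in the description above can be sketched with a toy model. This is 
not Kafka code: FakeZk, broker_change, and controller_cache are hypothetical 
names, and the only assumption is that the listener detects changes by diffing 
its cached set of broker ids against the ids currently registered in ZK.

```python
# Toy model of the missed state change; all names here are hypothetical.

class FakeZk:
    """Stands in for the set of broker ids registered in ZooKeeper."""

    def __init__(self, broker_ids):
        self.broker_ids = set(broker_ids)

    def deregister(self, broker_id):
        self.broker_ids.discard(broker_id)

    def register(self, broker_id):
        self.broker_ids.add(broker_id)


def broker_change(cached_ids, zk):
    """Diff the controller's cached ids against ZK, by id only.

    Returns (new_brokers, dead_brokers) as sorted lists.
    """
    live = set(zk.broker_ids)
    return sorted(live - cached_ids), sorted(cached_ids - live)


zk = FakeZk([1, 2, 3])
controller_cache = set(zk.broker_ids)

# The controller is busy with another task and holds its lock, so the
# listener does not run while broker 2 bounces.
zk.deregister(2)
zk.register(2)

# The listener runs only now, after the lock is released.
new_brokers, dead_brokers = broker_change(controller_cache, zk)
print(new_brokers, dead_brokers)
```

Both lists come back empty, so in this model the restarted broker is never 
told to become leader for anything, which matches the behaviour described in 
the issue.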



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
