[ 
https://issues.apache.org/jira/browse/KAFKA-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745816#comment-16745816
 ] 

Dong Lin commented on KAFKA-7836:
---------------------------------

[~junrao] This solution sounds good to me.

> The propagation of log dir failure can be delayed due to slowness in closing 
> the file handles
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7836
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7836
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jun Rao
>            Priority: Major
>
> In ReplicaManager.handleLogDirFailure(), we call 
> zkClient.propagateLogDirEvent after logManager.handleLogDirFailure. The 
> latter closes the file handles of the offline replicas, which could take time 
> when the disk is bad. This will delay the new leader election by the 
> controller. In one incident, we have seen the closing of file handles of 
> multiple replicas taking more than 20 seconds.
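
The ordering issue described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: the class, method, and constant names below are hypothetical stand-ins and not Kafka's actual code, the 20-second figure is taken from the incident mentioned in the description, and the reordering shown is one possible arrangement rather than necessarily the solution discussed on the ticket.

```java
public class LogDirFailureOrdering {
    // Assumed cost of closing file handles on a bad disk (from the reported incident).
    static final long SLOW_CLOSE_MS = 20_000;

    /**
     * Returns the simulated time, in milliseconds, that elapses before the
     * controller is notified of the log dir failure. No real I/O or sleeping
     * happens here; a counter stands in for the clock.
     */
    static long notificationDelayMs(boolean notifyBeforeClose) {
        long clockMs = 0;
        long notifiedAtMs;
        if (notifyBeforeClose) {
            // Hypothetical reordering: propagate the event first, then close handles.
            notifiedAtMs = clockMs;
            clockMs += SLOW_CLOSE_MS;
        } else {
            // Ordering described in the issue: close handles first, then propagate.
            clockMs += SLOW_CLOSE_MS;
            notifiedAtMs = clockMs;
        }
        return notifiedAtMs;
    }

    public static void main(String[] args) {
        System.out.println("close then notify: " + notificationDelayMs(false) + " ms");
        System.out.println("notify then close: " + notificationDelayMs(true) + " ms");
    }
}
```

In this toy model the notification delay equals the time spent closing file handles whenever the close happens first, which is why a slow disk postpones leader election by the controller.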



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
