[ https://issues.apache.org/jira/browse/KAFKA-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745816#comment-16745816 ]
Dong Lin commented on KAFKA-7836:
---------------------------------

[~junrao] This solution sounds good to me.

> The propagation of log dir failure can be delayed due to slowness in closing the file handles
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7836
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7836
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jun Rao
>            Priority: Major
>
> In ReplicaManager.handleLogDirFailure(), we call
> zkClient.propagateLogDirEvent after logManager.handleLogDirFailure. The
> latter closes the file handles of the offline replicas, which could take time
> when the disk is bad. This will delay the new leader election by the
> controller. In one incident, we have seen the closing of file handles of
> multiple replicas taking more than 20 seconds.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
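
For readers following the ordering problem described in the issue, here is a minimal sketch of one possible reordering: notify the controller first, then close the file handles. This is an illustration only. The quoted text does not spell out the agreed solution, and the interfaces FailureNotifier and LogCloser, the class name, the broker id, and the directory path below are hypothetical stand-ins, not Kafka's actual ReplicaManager, KafkaZkClient, or LogManager code.

```java
import java.util.ArrayList;
import java.util.List;

public class LogDirFailureOrdering {

    /** Hypothetical stand-in for the step that propagates the log dir event. */
    interface FailureNotifier {
        void propagateLogDirEvent(int brokerId);
    }

    /** Hypothetical stand-in for the step that closes the replicas' file handles. */
    interface LogCloser {
        void handleLogDirFailure(String dir);
    }

    /**
     * Assumed ordering: send the cheap notification first so that leader
     * election is not blocked by the potentially slow close on a bad disk.
     */
    static void handleLogDirFailure(int brokerId, String dir,
                                    FailureNotifier notifier, LogCloser closer) {
        notifier.propagateLogDirEvent(brokerId); // fast: lets the controller react
        closer.handleLogDirFailure(dir);         // slow: may take many seconds
    }

    /** Records the order in which the two steps run. */
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        handleLogDirFailure(1, "/data/kafka-logs-1",
                id -> events.add("notify broker " + id),
                d -> events.add("close handles in " + d));
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the notification is recorded before the close step, so a slow close no longer delays the point at which the controller learns about the failure. Whether the real change also needs to guard against the close step throwing is not covered here.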