[jira] [Commented] (KAFKA-7836) The propagation of log dir failure can be delayed due to slowness in closing the file handles

Dong Lin (JIRA) Thu, 17 Jan 2019 22:16:59 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745816#comment-16745816
 ]


Dong Lin commented on KAFKA-7836:
---------------------------------

[~junrao] This solution sounds good to me.

> The propagation of log dir failure can be delayed due to slowness in closing 
> the file handles
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7836
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7836
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jun Rao
>            Priority: Major
>
> In ReplicaManager.handleLogDirFailure(), we call 
> zkClient.propagateLogDirEvent after logManager.handleLogDirFailure. The 
> latter closes the file handles of the offline replicas, which could take time 
> when the disk is bad. This will delay the new leader election by the 
> controller. In one incident, we have seen the closing of file handles of 
> multiple replicas taking more than 20 seconds.
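To make the ordering problem in the description concrete, here is a minimal Java sketch. It is not Kafka code: the class and method names are hypothetical stand-ins for logManager.handleLogDirFailure and zkClient.propagateLogDirEvent, the clock is simulated, and the 20-second close cost is borrowed from the incident mentioned above. This message does not spell out which solution was agreed on, so the reordered variant is only one possible illustration.

```java
// Hypothetical sketch, not Kafka's real classes or signatures.
final class LogDirFailureSketch {
    private long clockMs = 0;        // simulated clock; nothing actually sleeps
    private long notifiedAtMs = -1;  // simulated time at which the controller is told
    private final long closeCostMs;  // assumed cost of closing handles on a bad disk

    LogDirFailureSketch(long closeCostMs) {
        this.closeCostMs = closeCostMs;
    }

    // Stand-in for logManager.handleLogDirFailure: closing file handles is slow.
    private void closeFileHandles() {
        clockMs += closeCostMs;
    }

    // Stand-in for zkClient.propagateLogDirEvent: record when the event goes out.
    private void propagateLogDirEvent() {
        notifiedAtMs = clockMs;
    }

    // Order described in the issue: close first, then notify.
    long closeThenPropagate() {
        closeFileHandles();
        propagateLogDirEvent();
        return notifiedAtMs;
    }

    // Assumed alternative order: notify first, then close.
    long propagateThenClose() {
        propagateLogDirEvent();
        closeFileHandles();
        return notifiedAtMs;
    }

    public static void main(String[] args) {
        long cost = 20_000; // 20 seconds, as in the reported incident
        System.out.println("close first:     notified after "
                + new LogDirFailureSketch(cost).closeThenPropagate() + " ms");
        System.out.println("propagate first: notified after "
                + new LogDirFailureSketch(cost).propagateThenClose() + " ms");
    }
}
```

In this toy model the controller hears about the failure only after the full close cost when the handles are closed first, and immediately when the event is propagated first. A real change would also have to check that notifying early is safe with respect to the rest of the failure handling.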



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
