[ https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397263#comment-13397263 ]
Vinay commented on HDFS-3541:
-----------------------------

Hi Lee,

I agree with your suggestion. But as of now there is no proposal for stopping the PacketResponder. The following can happen with the proposed solution:
# The initReplicaRecovery(..) call will interrupt the receiver thread, but it will then set the replica state to RUR and release the fsdataset lock.
# The PacketResponder may now finalize the block, i.e. the replica state will be changed to FINALIZED.
# The updateReplicaUnderRecovery(..) call will then fail because the replica is no longer in the RUR state.

I think we can prevent the PacketResponder from finalizing a block that is in the RUR state by throwing an exception. In that case updateReplicaUnderRecovery(..) will not fail, and recovery will succeed.

> Deadlock between recovery, xceiver and packet responder
> -------------------------------------------------------
>
>                 Key: HDFS-3541
>                 URL: https://issues.apache.org/jira/browse/HDFS-3541
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.23.3, 2.0.1-alpha
>            Reporter: suja s
>            Assignee: Vinay
>         Attachments: DN_dump.rar
>
>
> Block recovery was initiated while a write was in progress on the DataNode side. Found a
> deadlock between recovery, xceiver and packet responder.
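The guard proposed in the comment above can be illustrated with a minimal Java sketch. This is not the actual HDFS code: the class ReplicaRecoverySketch, its State enum, the datasetLock field and the method bodies are all hypothetical stand-ins, and only the method names initReplicaRecovery and updateReplicaUnderRecovery and the states RUR and FINALIZED come from the comment. The sketch assumes the responder's finalize step and both recovery steps each take the same dataset lock.

```java
import java.io.IOException;

/** Simplified model of a single replica; NOT the real HDFS dataset code. */
public class ReplicaRecoverySketch {

    /** Hypothetical subset of replica states, enough for this illustration. */
    enum State { RBW, RUR, FINALIZED }

    // Stands in for the fsdataset lock mentioned in the comment (assumption).
    private final Object datasetLock = new Object();
    private State state = State.RBW;

    /** Recovery step 1: mark the replica as under recovery, then release the lock. */
    void initReplicaRecovery() {
        synchronized (datasetLock) {
            state = State.RUR;
        }
    }

    /** Responder path: refuses to finalize a replica that is under recovery. */
    void finalizeBlock() throws IOException {
        synchronized (datasetLock) {
            if (state == State.RUR) {
                throw new IOException("Cannot finalize: replica is under recovery (RUR)");
            }
            state = State.FINALIZED;
        }
    }

    /** Recovery step 2: valid only while the replica is still in RUR. */
    void updateReplicaUnderRecovery() throws IOException {
        synchronized (datasetLock) {
            if (state != State.RUR) {
                throw new IOException("Replica is not in RUR state: " + state);
            }
            // Simplification: the sketch just marks the replica FINALIZED here.
            state = State.FINALIZED;
        }
    }

    State getState() {
        synchronized (datasetLock) {
            return state;
        }
    }

    public static void main(String[] args) throws IOException {
        ReplicaRecoverySketch replica = new ReplicaRecoverySketch();
        replica.initReplicaRecovery();
        try {
            replica.finalizeBlock(); // a late responder tries to finalize
        } catch (IOException expected) {
            System.out.println("responder rejected: " + expected.getMessage());
        }
        replica.updateReplicaUnderRecovery(); // still in RUR, so this succeeds
        System.out.println("state after recovery: " + replica.getState());
    }
}
```

Without the RUR check in finalizeBlock, the late responder would move the replica to FINALIZED between the two recovery steps, and updateReplicaUnderRecovery would throw, which is the failure described in the three steps of the comment.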