[ https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252859#comment-16252859 ]
Jiandan Yang edited comment on HDFS-12754 at 11/15/17 2:35 AM:
----------------------------------------------------------------

[~xiaochen] Thank you for reviewing.

@ The fix here is to close the output streams out of the lease renewer lock

I think you may be wrong. The fix is that {{LeaseRenewer#run}} no longer holds the {{LeaseRenewer}} object lock and the {{DFSOutputStream}} object lock at the same time: it moves {{dfsClient.closeAllFilesBeingWritten}} out of the synchronized block. {{LeaseRenewer#run}} acquires the {{LeaseRenewer}} object lock and releases it, and only then acquires the {{DFSOutputStream}} object lock and releases it.

{code:java}
synchronized (this) {
  DFSClientFaultInjector.get().sleepWhenRenewLeaseTimeout();
  // Snapshot the clients while holding only the LeaseRenewer lock.
  dfsclientsCopy = new ArrayList<>(dfsclients);
  dfsclients.clear();
  //Expire the current LeaseRenewer thread.
  emptyTime = 0;
  Factory.INSTANCE.remove(LeaseRenewer.this);
}
// The LeaseRenewer lock has been released before any stream is closed.
for (DFSClient dfsClient : dfsclientsCopy) {
  dfsClient.closeAllFilesBeingWritten(true);
}
{code}

> Lease renewal can hit a deadlock
> ---------------------------------
>
>                 Key: HDFS-12754
>                 URL: https://issues.apache.org/jira/browse/HDFS-12754
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.8.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: HDFS-12754.001.patch, HDFS-12754.002.patch,
> HDFS-12754.003.patch, HDFS-12754.004.patch, HDFS-12754.005.patch,
> HDFS-12754.006.patch, HDFS-12754.007.patch
>
>
> The Client and the renewer can hit a deadlock during the close operation since
> closeFile() reaches back to DFSClient#removeFileBeingWritten. This is
> possible if the client calls close while the renewer is renewing a lease.
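
The lock ordering described in the comment above can be illustrated with a small standalone sketch. It is not Hadoop code: {{Renewer}} and {{OutStream}} are hypothetical stand-ins for {{LeaseRenewer}} and {{DFSOutputStream}}, and the sketch assumes that closing a stream calls back into a synchronized method on the renewer, as the issue description suggests. The point it shows is only the pattern: copy the list under the renewer lock, release that lock, then close each stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for LeaseRenewer and DFSOutputStream; not Hadoop code.
class LockOrderSketch {

  static class Renewer {
    private final List<OutStream> streams = new ArrayList<>();

    synchronized void add(OutStream s) { streams.add(s); }

    // Called back from OutStream.close(), so it needs the Renewer lock.
    synchronized void remove(OutStream s) { streams.remove(s); }

    synchronized int size() { return streams.size(); }

    // Expire: snapshot under the Renewer lock, close only after releasing it.
    void expire() {
      List<OutStream> copy;
      synchronized (this) {
        copy = new ArrayList<>(streams);
        streams.clear();
      }
      for (OutStream s : copy) {
        s.close(); // takes the OutStream lock; the Renewer lock is not held here
      }
    }
  }

  static class OutStream {
    private final Renewer renewer;
    boolean closed;
    boolean renewerLockHeldOnEntry;

    OutStream(Renewer renewer) { this.renewer = renewer; }

    synchronized void close() {
      // Record whether this thread still held the Renewer lock on entry.
      renewerLockHeldOnEntry = Thread.holdsLock(renewer);
      renewer.remove(this); // reaches back into the renewer
      closed = true;
    }
  }

  public static void main(String[] args) {
    Renewer renewer = new Renewer();
    OutStream a = new OutStream(renewer);
    OutStream b = new OutStream(renewer);
    renewer.add(a);
    renewer.add(b);
    renewer.expire();
    System.out.println("all closed: " + (a.closed && b.closed));
    System.out.println("renewer lock held during close: "
        + (a.renewerLockHeldOnEntry || b.renewerLockHeldOnEntry));
  }
}
```

If the loop in {{expire()}} were moved inside the synchronized block, the expiring thread would hold the renewer lock while waiting for a stream lock. A second thread already inside {{close()}} holds that stream lock while waiting for the renewer lock, which is the cycle the issue describes.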