[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213992#comment-14213992 ]
Yongjun Zhang commented on HDFS-4882:
-------------------------------------

Hi,

Thanks Zesheng for reporting the issue, Ravi for working on the solution, and the other folks for reviewing.

I was looking into an infinite-loop case in checkLeases myself, and figured out that the logic in FSNamesystem#internalReleaseLease

{code}
    switch(lastBlockState) {
    case COMPLETE:
      assert false : "Already checked that the last block is incomplete";
      break;
{code}

doesn't take care of the case where the penultimate block is COMMITTED and the final block is COMPLETE, which causes the infinite loop. Looking at the history of this jira, I found [~jingzhao] suggested the same at https://issues.apache.org/jira/browse/HDFS-4882?focusedCommentId=14207202&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14207202

I did some analysis to share here (sorry for the long post). When the final block is COMMITTED, the current implementation does the following:

{code}
    case COMMITTED:
      // Close file if committed blocks are minimally replicated  <=================== scenario#1
      if(penultimateBlockMinReplication &&
          blockManager.checkMinReplication(lastBlock)) {
        finalizeINodeFileUnderConstruction(src, pendingFile,
            iip.getLatestSnapshotId());
        NameNode.stateChangeLog.warn("BLOCK*"
          + " internalReleaseLease: Committed blocks are minimally replicated,"
          + " lease removed, file closed.");
        return true;  // closed!
      }
      // Cannot close file right now, since some blocks  <======================== scenario#2
      // are not yet minimally replicated.
      // This may potentially cause infinite loop in lease recovery
      // if there are no valid replicas on data-nodes.
      String message = "DIR* NameSystem.internalReleaseLease: " +
          "Failed to release lease for file " + src +
          ". Committed blocks are waiting to be minimally replicated." +
          " Try again later.";
      NameNode.stateChangeLog.warn(message);
      throw new AlreadyBeingCreatedException(message);
{code}

What it does:
* For scenario#1, check minReplication for both the penultimate and the last block; if satisfied, finalize the file (recover the lease, close the file).
* For scenario#2, throw AlreadyBeingCreatedException, which is derived from IOException (the name of this exception appears to be a misnomer; maybe we should fix that later).

To solve the case where the penultimate block is COMMITTED and the final block is COMPLETE, I'd suggest making some changes on top of the submitted patch (for further discussion):

For scenario#1, we can do the same as when the last block is COMMITTED, as described above.

For scenario#2, I think we have two options:
# option A, drop the existing code that handles scenario#2 (don't throw the exception) and let checkLeases check back again later (2 seconds is the current interval), waiting for block reports to change the minimal-replication situation before recovering the lease. The infinite loop could still happen if minimal replication is never satisfied, but that should be rare, assuming minimal replication can eventually be satisfied.
# option B, apply logic similar to the existing code (throw AlreadyBeingCreatedException). There is an issue with this option too, as described below.

With option B, looking at the caller side (LeaseManager#checkLeases): whenever an IOException is caught, it just goes ahead and removes the lease. So the possible infinite loop described in the scenario#2 comment will not happen, because of the lease removal (lease recovered).
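To make the trade-off between the two options concrete, here is a small, self-contained toy sketch (plain Java, not the actual HDFS source; the class, enum, field, and parameter names are made up for illustration) of how each option interacts with the caller-side behavior just described: with option A the lease stays in place and the next checkLeases() pass retries, while with option B the exception path removes the lease even though the file may still be under-replicated.

{code}
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the lease-recovery loop discussed above. Method names mirror
// the ones mentioned in this comment, but the bodies are hypothetical.
public class LeaseRecoveryToyModel {

  // Stand-in for the replication state of a file's committed blocks.
  enum ReplicationState { MINIMALLY_REPLICATED, UNDER_REPLICATED }

  // path -> replication state of the file held by an expired lease
  private final Map<String, ReplicationState> expiredLeases = new LinkedHashMap<>();

  // Rough analogue of the LeaseManager#checkLeases caller loop.
  void checkLeases(boolean optionB) {
    Iterator<Map.Entry<String, ReplicationState>> it =
        expiredLeases.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, ReplicationState> lease = it.next();
      try {
        if (internalReleaseLease(lease.getValue(), optionB)) {
          it.remove();   // file finalized and closed, lease recovered
        }
        // Option A: nothing thrown and nothing removed; the lease stays, and
        // the next checkLeases() pass (every ~2 seconds) retries once block
        // reports may have improved replication.
      } catch (IOException e) {
        // Option B: mirror the current caller behavior, i.e. swallow the
        // exception and remove the lease anyway, even though the file may
        // still have blocks below minimal replication.
        it.remove();
      }
    }
  }

  // Rough analogue of FSNamesystem#internalReleaseLease for the
  // penultimate-COMMITTED / last-COMPLETE case.
  private boolean internalReleaseLease(ReplicationState state, boolean optionB)
      throws IOException {
    if (state == ReplicationState.MINIMALLY_REPLICATED) {
      return true;   // scenario#1: finalize the file and close it
    }
    if (optionB) {
      // scenario#2 under option B: signal the caller, as the COMMITTED case does today
      throw new IOException("Committed blocks are waiting to be minimally replicated.");
    }
    return false;    // scenario#2 under option A: not closed yet, try again later
  }
}
{code}

In this toy version, option A simply reports "not closed yet" and relies on the periodic re-check, while option B leans on the caller's catch block; the real change would of course operate on the actual block states rather than this simplified enum.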
But the problem with option B is that, after the lease removal, the file may still have blocks not satisfying minimal replication (scenario#2), which would be a potential issue. This situation already exists in the current implementation when handling the case where the last block is COMMITTED. I think we should wait for minimal replication to be satisfied before recovering the lease, so option A looks preferable. But the original code tries to recover the lease immediately, and I'm not sure whether there is any catch here.

Comments, thoughts? Thanks again.

> Namenode LeaseManager checkLeases() runs into infinite loop
> -----------------------------------------------------------
>
>                 Key: HDFS-4882
>                 URL: https://issues.apache.org/jira/browse/HDFS-4882
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client, namenode
>    Affects Versions: 2.0.0-alpha, 2.5.1
>            Reporter: Zesheng Wu
>            Assignee: Ravi Prakash
>            Priority: Critical
>         Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.patch
>
>
> Scenario:
> 1. cluster with 4 DNs
> 2. the size of the file to be written is a little more than one block
> 3. write the first block to 3 DNs, DN1->DN2->DN3
> 4. all the data packets of the first block are successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out
> 5. DN2 and DN3 are down
> 6. the client recovers the pipeline, but no new DN is added to the pipeline because the current pipeline stage is PIPELINE_CLOSE
> 7. the client continues writing the last block, and tries to close the file after writing all the data
> 8. NN finds that the penultimate block doesn't have enough replicas (our dfs.namenode.replication.min=2), the client's close runs into an indefinite loop (HDFS-2936), and at the same time the NN sets the last block's state to COMPLETE
> 9. shut down the client
> 10. the file's lease exceeds the hard limit
> 11. LeaseManager realizes that and begins to do lease recovery by calling fsnamesystem.internalReleaseLease()
> 12. but the last block's state is COMPLETE, and this triggers the lease manager's infinite loop and prints massive logs like this:
> {noformat}
> 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit
> 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src=/user/h_wuzesheng/test.dat
> 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE
> 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1]
> {noformat}
> (the 3rd line of the log is a debug log added by us)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)