[ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224407#comment-14224407 ]
Vinayakumar B commented on HDFS-7342:
-------------------------------------

{quote}But the block has to be COMMITTED to be made COMPLETE. If it's not COMMITTED yet (changing to COMMITTED is a request from client and it's asynchronous) , even if it has min replication number of replications, it won't be changed to COMPLETE. So I think we may still need to take care of changing block's state to COMPLETE in FSNamesystem#internalReleaseLease. Right?{quote}

I agree that the client request and the DataNode's IBRs are asynchronous, but both update the block state under the write lock. The penultimate block is COMMITTED by the client's {{getAdditionalBlock()}} request. There are 3 possibilities here:

1. All IBRs arrive before the block is even COMMITTED. In this case, if the replica is FINALIZED on the DN, it is accepted:
{code}
if (ucBlock.reportedState == ReplicaState.FINALIZED &&
    !block.findDatanode(storageInfo.getDatanodeDescriptor())) {
  addStoredBlock(block, storageInfo, null, true);
}
{code}

2. The client request arrives after 2 (= minReplication) IBRs have been received. The client request itself then moves the block to COMPLETE immediately after making it COMMITTED, in the following code of {{BlockManager#commitOrCompleteLastBlock()}}:
{code}
final boolean b = commitBlock((BlockInfoUnderConstruction)lastBlock, commitBlock);
if(countNodes(lastBlock).liveReplicas() >= minReplication)
  completeBlock(bc, bc.numBlocks()-1, false);
return b;
{code}
If not enough IBRs have been received at this point, the block is only COMMITTED.

3. The IBRs arrive after the client request, i.e. after the block is COMMITTED. The block is then made COMPLETE while processing the second IBR, in the code below:
{code}
if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
    numLiveReplicas >= minReplication) {
  storedBlock = completeBlock(bc, storedBlock, false);
}
{code}

So I couldn't find any way for a block to be in COMMITTED state with minReplication already met.
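To double-check the three cases above, here is a minimal, self-contained sketch of the argument. It is not the actual {{BlockManager}} code: {{BlockStateSketch}}, {{Block}}, {{onFinalizedReplica()}}, {{onClientCommit()}} and {{MIN_REPLICATION}} are hypothetical names, and {{synchronized}} only stands in for the namesystem write lock. It replays every ordering of one client commit and N IBRs and checks that the block is never left COMMITTED once minReplication is met.

```java
// Toy model of the block state transitions discussed above (assumption:
// simplified to one counter and three states; not real HDFS code).
class BlockStateSketch {
    enum UCState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    static final int MIN_REPLICATION = 2; // assumed value, as in the comment

    static final class Block {
        UCState state = UCState.UNDER_CONSTRUCTION;
        int liveReplicas = 0;

        // Models processing an IBR for a FINALIZED replica (cases 1 and 3).
        synchronized void onFinalizedReplica() {
            liveReplicas++;
            if (state == UCState.COMMITTED && liveReplicas >= MIN_REPLICATION) {
                state = UCState.COMPLETE;
            }
        }

        // Models the client request that commits the block (case 2).
        synchronized void onClientCommit() {
            if (state == UCState.UNDER_CONSTRUCTION) {
                state = UCState.COMMITTED;
            }
            if (state == UCState.COMMITTED && liveReplicas >= MIN_REPLICATION) {
                state = UCState.COMPLETE;
            }
        }

        // The claim: never COMMITTED while minReplication is already met.
        synchronized boolean invariantHolds() {
            return !(state == UCState.COMMITTED && liveReplicas >= MIN_REPLICATION);
        }
    }

    // Tries every position of the client commit among ibrCount IBRs.
    static boolean checkAllOrderings(int ibrCount) {
        for (int commitPos = 0; commitPos <= ibrCount; commitPos++) {
            Block b = new Block();
            for (int i = 0; i <= ibrCount; i++) {
                if (i == commitPos) {
                    b.onClientCommit();
                    if (!b.invariantHolds()) return false;
                }
                if (i < ibrCount) {
                    b.onFinalizedReplica();
                    if (!b.invariantHolds()) return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("invariant holds: " + checkAllOrderings(3));
    }
}
```

The model only covers the normal paths where both events run to completion under the lock; it says nothing about replicas that are lost or reported in a non-FINALIZED state.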

{quote}{{recoverLeaseInternal()}} and {{internalReleaseLease()}} will need to be made to distinguish the on-demand recovery from normal lease expiration. For on-demand recovery, we might want it to fail if there is no live replicas, as a file lease is normally recovered for subsequent append or copy(read). If there is no data, they will fail.{quote}

I understood [~kihwal]'s suggestion as follows. The {{recoverLease()}} call from the client passes a {{force}} flag to {{recoverLeaseInternal()}}. Based on this flag, we can check the states of the blocks (excluding the last block) and their number of replicas, and decide whether to go ahead with recovery before even sending a request to the DataNode. So we need not worry about this case in {{commitBlockSynchronization()}}; there we can directly complete all blocks and close the file.

Am I right, [~kihwal]?

> Lease Recovery doesn't happen some times
> ----------------------------------------
>
>                 Key: HDFS-7342
>                 URL: https://issues.apache.org/jira/browse/HDFS-7342
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch
>
>
> In some cases, LeaseManager tries to recover a lease, but is not able to.
> HDFS-4882 describes a possibility of that. We should fix this

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)