[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen some times

Vinayakumar B (JIRA) Tue, 25 Nov 2014 03:19:38 -0800

    [ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224407#comment-14224407 ]


Vinayakumar B commented on HDFS-7342:
-------------------------------------

{quote}But the block has to be COMMITTED to be made COMPLETE. If it's not 
COMMITTED yet (changing to COMMITTED is a request from client and it's 
asynchronous) , even if it has min replication number of replications, it won't 
be changed to COMPLETE. So I think we may still need to take care of changing 
block's state to COMPLETE in FSNamesystem#internalReleaseLease. Right?{quote}
I agree that the client's request and the DataNodes' IBRs (incremental block 
reports) are asynchronous, but both update the block state under the write lock.
The penultimate block is COMMITTED during the client's 
{{getAdditionalBlock()}} request.

There are 3 possibilities:
1. All IBRs arrive before the block is even COMMITTED. In this case, if the 
block is FINALIZED on the DN, the replica is accepted.
{code}    if (ucBlock.reportedState == ReplicaState.FINALIZED &&
        !block.findDatanode(storageInfo.getDatanodeDescriptor())) {
      addStoredBlock(block, storageInfo, null, true);
    }{code}
2. If the client request arrives after 2 (=minReplication) IBRs have been 
received, the client request itself moves the block to COMPLETE immediately 
after making it COMMITTED, in the following code of 
{{BlockManager#commitOrCompleteLastBlock()}}:
{code}    final boolean b = commitBlock((BlockInfoUnderConstruction)lastBlock, commitBlock);
    if(countNodes(lastBlock).liveReplicas() >= minReplication)
      completeBlock(bc, bc.numBlocks()-1, false);
    return b;{code}
  If not enough IBRs have been received at that point, the block just stays 
COMMITTED.

3. If the IBRs arrive after the client request, i.e. after the block is 
COMMITTED, the block is made COMPLETE while processing the second IBR, in the 
code below:
{code}    if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
        numLiveReplicas >= minReplication) {
      storedBlock = completeBlock(bc, storedBlock, false);{code}

So I could not find any way for the block to remain in the COMMITTED state 
once minReplication is met.
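
To make the three orderings above concrete, here is a toy model of the 
transitions. It is only a sketch, not the actual {{BlockManager}} code: the 
class {{BlockStateSketch}}, its methods and the datanode names are invented for 
illustration, and a {{synchronized}} method stands in for the namesystem write 
lock. It assumes minReplication = 2, as in the cases above.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model (hypothetical, not Hadoop code) of the block state transitions. */
class BlockStateSketch {
    enum State { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    static final class Block {
        private final int minReplication;
        // Datanodes that have reported a FINALIZED replica; a Set ignores repeats.
        private final Set<String> finalizedReplicas = new HashSet<>();
        private State state = State.UNDER_CONSTRUCTION;

        Block(int minReplication) {
            this.minReplication = minReplication;
        }

        /** Client-side commit; synchronized stands in for the write lock. */
        synchronized void clientCommit() {
            if (state == State.UNDER_CONSTRUCTION) {
                state = State.COMMITTED;
            }
            completeIfPossible();
        }

        /** IBR for a FINALIZED replica; also handled under the "write lock". */
        synchronized void finalizedReplicaReported(String datanode) {
            finalizedReplicas.add(datanode);
            completeIfPossible();
        }

        // Both paths re-check the condition, so COMMITTED never persists
        // once enough replicas are known.
        private void completeIfPossible() {
            if (state == State.COMMITTED
                    && finalizedReplicas.size() >= minReplication) {
                state = State.COMPLETE;
            }
        }

        synchronized State state() {
            return state;
        }
    }

    /** Replays events in order: "commit" is the client, anything else a DN IBR. */
    static State replay(int minReplication, String... events) {
        Block block = new Block(minReplication);
        for (String event : events) {
            if (event.equals("commit")) {
                block.clientCommit();
            } else {
                block.finalizedReplicaReported(event);
            }
        }
        return block.state();
    }

    public static void main(String[] args) {
        System.out.println(replay(2, "dn1", "dn2", "commit")); // IBRs first
        System.out.println(replay(2, "commit", "dn1", "dn2")); // commit first
        System.out.println(replay(2, "dn1", "commit"));        // too few IBRs
    }
}
```

In this model the first two orderings end in COMPLETE and the third stays 
COMMITTED only because a single replica has been reported, which matches the 
reasoning above.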

{quote}{{recoverLeaseInternal()}} and {{internalReleaseLease()}} will need to 
be made to distinguish the on-demand recovery from normal lease expiration. For 
on-demand recovery, we might want it to fail if there is no live replicas, as a 
file lease is normally recovered for subsequent append or copy(read). If there 
is no data, they will fail.{quote}
I understood [~kihwal]'s suggestion as follows.
The {{recoverLease()}} call from the client passes a {{force}} flag to 
{{recoverLeaseInternal()}}. Based on this flag, we can check the states and 
replica counts of the blocks (excluding the last block) and decide whether to 
go ahead with recovery, or to fail without even sending a recovery request to 
the DataNode. 
So we need not worry about this case in {{commitBlockSynchronization()}}; 
there we can directly complete all blocks and close the file.
Am I right, [~kihwal]?
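
A rough sketch of how I read the pre-check, under stated assumptions: the class 
{{RecoveryPrecheckSketch}} and the method {{shouldStartRecovery()}} are 
hypothetical, each block is reduced to its live replica count, {{force}} is 
taken to mean an on-demand {{recoverLease()}} call from the client, and the 
per-block state check is left out. None of this is existing HDFS code.

```java
/** Hypothetical pre-check run before any recovery request goes to a DataNode. */
class RecoveryPrecheckSketch {

    /**
     * @param liveReplicasPerBlock live replica count of each block, in file order
     * @param force true for on-demand recovery requested by the client
     *              (assumed meaning of the flag), false for lease expiration
     * @return true if recovery should be initiated
     */
    static boolean shouldStartRecovery(int[] liveReplicasPerBlock, boolean force) {
        if (!force) {
            // Normal lease expiration: keep the existing behaviour.
            return true;
        }
        // On-demand recovery: every block except the last must have data,
        // otherwise a later append or read would fail anyway.
        for (int i = 0; i < liveReplicasPerBlock.length - 1; i++) {
            if (liveReplicasPerBlock[i] == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The last block (index 2) is excluded from the check.
        System.out.println(shouldStartRecovery(new int[] {3, 2, 0}, true));
        // A non-last block has no live replica, so on-demand recovery is refused.
        System.out.println(shouldStartRecovery(new int[] {3, 0, 1}, true));
    }
}
```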

> Lease Recovery doesn't happen some times
> ----------------------------------------
>
>                 Key: HDFS-7342
>                 URL: https://issues.apache.org/jira/browse/HDFS-7342
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch
>
>
> In some cases, LeaseManager tries to recover a lease, but is not able to. 
> HDFS-4882 describes a possibility of that. We should fix this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
