[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen some times

Yongjun Zhang (JIRA) Tue, 25 Nov 2014 00:22:37 -0800

    [ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224213#comment-14224213 ]


Yongjun Zhang commented on HDFS-7342:
-------------------------------------

Hi guys,

Thanks a lot for the comments and the new rev. Please see my comments below, 
one for each of you :-)

{quote}
If any COMMITTED blocks reaches minReplication, state will be automatically 
changed to COMPLETE while processing that IBR itself. Need not be user call. So 
there is no chance of COMMITTED block state with minReplication met. right?
{quote}
Hi [~vinayrpet], indeed, the following code in {{BlockManager#addStoredBlock}} 
may be called when an IBR is processed, which matches what you were saying:
{code}
  if (storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
      numLiveReplicas >= minReplication) {
    storedBlock = completeBlock(bc, storedBlock, false);
  }
{code}
But the block has to be COMMITTED before it can be made COMPLETE. If it is not 
COMMITTED yet (the change to COMMITTED is requested by the client and is 
asynchronous), it won't be changed to COMPLETE even if it already has the 
minimum number of replicas. So I think we may still need to take care of 
changing the block's state to COMPLETE in 
{{FSNamesystem#internalReleaseLease}}. Right?
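
To make the gap concrete, here is a minimal sketch of the transition rule, assuming a simplified model. {{BlockStateSketch}}, its {{State}} enum and {{onReplicaReported}} are illustrative names, not the real Hadoop classes; only the condition itself is taken from the snippet above.

```java
// Illustrative model only: these types are hypothetical, not Hadoop's.
final class BlockStateSketch {
    enum State { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    /**
     * Mirrors the condition quoted from addStoredBlock: a block is promoted
     * to COMPLETE only if it is already COMMITTED and has enough live replicas.
     */
    static State onReplicaReported(State current, int liveReplicas, int minReplication) {
        if (current == State.COMMITTED && liveReplicas >= minReplication) {
            return State.COMPLETE;
        }
        // A block that is not yet COMMITTED stays as it is, no matter how
        // many replicas have been reported.
        return current;
    }
}
```

In this model a block that is still UNDER_CONSTRUCTION with three reported replicas is left unchanged, which is the case the lease recovery path would still have to handle.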

Hi [~kihwal], 

To summarize my understanding of your comment: there are two paths, one is the 
regular write and the other is recovery. 
* for the regular write path, we need to enforce minimal replication
* for the recovery path, we just need to enforce 1 replica and let the 
replication monitor take care of the rest.
* we can have commitBlockSynchronization() change a block to COMMITTED when 
there is at least one replica, ignoring min-replication. Currently only the 
client can inform the NN, asynchronously, to make a block COMMITTED.

I think it makes sense. Am I understanding you correctly?
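
As a rough sketch of the two-path rule as I read it (the class, enum and method names here are hypothetical and do not exist in HDFS):

```java
// Illustrative only: a toy policy, not code from FSNamesystem or BlockManager.
final class ReplicaPolicySketch {
    enum Path { REGULAR_WRITE, RECOVERY }

    /** Replicas required before a block may be finalized on the given path. */
    static int requiredReplicas(Path path, int minReplication) {
        // Recovery only insists on one replica; the replication monitor is
        // expected to bring the block up to the configured level afterwards.
        return path == Path.RECOVERY ? 1 : minReplication;
    }

    static boolean mayFinalize(Path path, int liveReplicas, int minReplication) {
        return liveReplicas >= requiredReplicas(path, minReplication);
    }
}
```

With minReplication set to 2 and a single live replica, the recovery path would be allowed to proceed while the regular write path would not.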

Hi Ravi,
Thanks for the new rev. While we are still discussing the final solution, I 
noticed a couple of things in your rev3 relative to my originally suggested 
solution:
 
1. Change 
{code}
       * <li>If the penultimate/last block is COMMITTED or COMPLETE -> force the
       * block to be COMPLETE even if it is not minimally replicated</li>
{code}
To
{code}
       * <li>If the penultimate/last block is COMMITTED -> force the
       * block to be COMPLETE if it is minimally replicated</li>
{code}

2. You forgot to add {{setBlockCollection(blk.getBlockCollection());}} in the 
BlockInfoDesired constructor, so a NullPointerException will occur. 

Let's not rush into addressing those, but rather see if we can work out a 
solution in the direction Kihwal described.

Thank you all again.


> Lease Recovery doesn't happen some times
> ----------------------------------------
>
>                 Key: HDFS-7342
>                 URL: https://issues.apache.org/jira/browse/HDFS-7342
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch
>
>
> In some cases, LeaseManager tries to recover a lease, but is not able to. 
> HDFS-4882 describes a possibility of that. We should fix this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
