Daryn Sharp created HDFS-13465:
----------------------------------

             Summary: Overlapping lease recoveries cause NPE in NN
                 Key: HDFS-13465
                 URL: https://issues.apache.org/jira/browse/HDFS-13465
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.8.0
            Reporter: Daryn Sharp


Overlapping lease recoveries for the same file cause an NPE in the
DatanodeManager while it creates the lease recovery commands, possibly losing
other queued recovery commands.
 * client1 calls recoverLease, file is added to DN1's recovery queue
 * client2 calls recoverLease, file is added to DN2's recovery queue
 * one DN heartbeats, gets the block recovery command, and completes the 
synchronization before the other DN heartbeats; i.e., the file is closed.
 * the other DN heartbeats, the NN takes the block from that DN's recovery 
queue, assumes it's still under construction, and gets an NPE calling 
getExpectedStorageLocations
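
The sequence above can be replayed in a small standalone model. This is only a sketch under stated assumptions: `Block`, `Datanode`, and `heartbeat` are hypothetical stand-ins rather than Hadoop classes, and the nullable `expectedLocations` field plays the role of the under-construction feature that disappears once the block is complete.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class OverlappingRecoveryDemo {

    /** Hypothetical stand-in for BlockInfo (not a Hadoop class). */
    static final class Block {
        // Models the under-construction feature: null once the block is complete.
        private String[] expectedLocations;

        Block(String... expectedLocations) {
            this.expectedLocations = expectedLocations;
        }

        String[] getExpectedLocations() {
            return expectedLocations;
        }

        void complete() {
            expectedLocations = null;
        }
    }

    /** Hypothetical stand-in for the NN's per-datanode recovery queue. */
    static final class Datanode {
        final Queue<Block> recoveryQueue = new ArrayDeque<>();
    }

    /** Drains the queue without a null check; returns the number of commands built. */
    static int heartbeat(Datanode dn) {
        int commands = 0;
        Block b;
        while ((b = dn.recoveryQueue.poll()) != null) {
            String[] locations = b.getExpectedLocations();
            if (locations.length > 0) { // NullPointerException for a completed block
                commands++;
            }
        }
        return commands;
    }

    /** Replays the four steps; returns true if the second heartbeat throws an NPE. */
    static boolean secondHeartbeatThrows() {
        Block block = new Block("DN1", "DN2");
        Datanode dn1 = new Datanode();
        Datanode dn2 = new Datanode();

        dn1.recoveryQueue.add(block); // client1 calls recoverLease
        dn2.recoveryQueue.add(block); // client2 calls recoverLease

        heartbeat(dn1);   // DN1 heartbeats and receives the recovery command
        block.complete(); // synchronization finishes; the file is closed

        try {
            heartbeat(dn2); // DN2 still has the stale entry queued
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second heartbeat threw NPE: " + secondHeartbeatThrows());
    }
}
```

The model keeps the same block object in both queues, which is the essential ingredient: completing it through one path leaves a stale entry in the other queue.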

{code:java}
//check lease recovery
BlockInfo[] blocks = nodeinfo.getLeaseRecoveryCommand(Integer.MAX_VALUE);
if (blocks != null) {
  BlockRecoveryCommand brCommand = new BlockRecoveryCommand(
      blocks.length);
  for (BlockInfo b : blocks) {
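    BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
    // uc is null if the other recovery already completed the block; the
    // assert below only guards this when assertions are enabled (-ea)
    assert uc != null;
    // the NPE is thrown here when uc is null
    final DatanodeStorageInfo[] storages = uc.getExpectedStorageLocations();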
{code}
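This is "ok" for the NN state if only one block was queued, since that block no
longer needs recovery.  If multiple blocks were queued, all of their recoveries
are lost, because the blocks have already been taken off the queue when the
exception aborts the loop.  Recovery will not occur until the client
explicitly retries or the lease monitor recovers the lease.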
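
To illustrate why one stale entry costs the whole batch, and how a null check would contain the damage, here is a minimal sketch. It is an assumption-laden model, not the fix adopted for this issue: `Block`, `buildCommandsUnguarded`, `buildCommandsGuarded`, and the `blk_N` names are hypothetical, and the input list stands for blocks that have already been removed from the recovery queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GuardedRecoverySketch {

    /** Hypothetical stand-in for BlockInfo; locations is null once the block is complete. */
    static final class Block {
        final String name;
        final String[] locations;

        Block(String name, String[] locations) {
            this.name = name;
            this.locations = locations;
        }
    }

    /** No null check: one completed block aborts the batch and nothing is returned. */
    static List<String> buildCommandsUnguarded(List<Block> drained) {
        List<String> commands = new ArrayList<>();
        for (Block b : drained) {
            commands.add(b.name + ":" + b.locations.length); // NPE on a completed block
        }
        return commands;
    }

    /** Skips completed blocks so the remaining recoveries are still issued. */
    static List<String> buildCommandsGuarded(List<Block> drained) {
        List<String> commands = new ArrayList<>();
        for (Block b : drained) {
            if (b.locations == null) {
                continue; // already recovered through another path; nothing to do
            }
            commands.add(b.name + ":" + b.locations.length);
        }
        return commands;
    }

    /** Three drained blocks; the middle one was completed by an overlapping recovery. */
    static List<Block> sampleQueue() {
        return Arrays.asList(
            new Block("blk_1", new String[] {"DN1", "DN2"}),
            new Block("blk_2", null),
            new Block("blk_3", new String[] {"DN3"}));
    }

    public static void main(String[] args) {
        System.out.println("guarded: " + buildCommandsGuarded(sampleQueue()));
        try {
            buildCommandsUnguarded(sampleQueue());
        } catch (NullPointerException e) {
            System.out.println("unguarded: all commands lost to an NPE");
        }
    }
}
```

In the unguarded variant the commands for `blk_1` and `blk_3` are discarded along with the stale `blk_2`, which mirrors the loss described above; the guarded variant still issues both.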



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)