[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859246#comment-16859246 ]

star commented on HDFS-12914:
-----------------------------

[~hexiaoqiao], I also wrote a unit test for this issue, mostly similar to
yours. Pasting it here for reference.

Besides the test, I changed one piece of code: BlockManager#processReport
now throws an IOException when the lease id is invalid, so the client
receives the exception instead of a silent {{false}}.
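{code:java}
// Reject the report explicitly instead of returning false, which the RPC
// layer would otherwise read as the noStaleStorages flag.
if (context != null) {
  if (!blockReportLeaseManager.checkLease(node, startTime,
      context.getLeaseId())) {
    throw new IOException("Invalid block report lease id '"
        + context.getLeaseId() + "'");
  }
}{code}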
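To make the difference concrete, here is a toy model of why returning {{false}} is ambiguous. It is not HDFS code: {{LeaseRejectionSketch}}, {{processReportOld}}, {{processReportNew}} and {{sendNew}} are hypothetical names, and a single boolean {{leaseValid}} stands in for the result of {{checkLease}}. It only assumes what the description says: a rejected lease and a report with stale storages both come back as {{false}}.

```java
import java.io.IOException;

// Toy model, not HDFS code: every name here is hypothetical.
class LeaseRejectionSketch {

  // Old behaviour: a rejected lease returns false, the same value that means
  // "some storages are still stale", so the two cases cannot be told apart.
  static boolean processReportOld(boolean leaseValid, boolean hasStaleStorages) {
    if (!leaseValid) {
      return false;  // report silently dropped
    }
    return !hasStaleStorages;  // read by the RPC layer as noStaleStorages
  }

  // Proposed behaviour: a rejected lease becomes an error the caller cannot miss.
  static boolean processReportNew(boolean leaseValid, boolean hasStaleStorages)
      throws IOException {
    if (!leaseValid) {
      throw new IOException("Invalid block report lease id");
    }
    return !hasStaleStorages;
  }

  // What the sending DataNode learns about one report under the new behaviour.
  static String sendNew(boolean leaseValid) {
    try {
      processReportNew(leaseValid, false);
      return "accepted";
    } catch (IOException e) {
      return "rejected: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    // A rejected report and an accepted report with stale storages: both false.
    System.out.println(processReportOld(false, false));
    System.out.println(processReportOld(true, true));
    System.out.println(sendNew(false));
  }
}
```

With the old shape, the DataNode has no signal that its report was dropped and waits for its next scheduled FBR. With an exception, the rejection reaches the caller, which can then react, for example by requesting a new lease.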
{code:java}
@Test
public void testDelayedBlockReport() throws IOException {
  FSNamesystem namesystem = cluster.getNameNode(0).getNamesystem();

  // Spy on the real BlockManager so that processReport can be delayed.
  BlockManager testBlockManager = Mockito.spy(namesystem.getBlockManager());

  Mockito.doAnswer(new Answer<Boolean>() {
    @Override
    public Boolean answer(InvocationOnMock invocationOnMock) throws Throwable {
      // Sleep 1000 ms to delay processing of the current report,
      // then run the real processReport.
      Thread.sleep(1000);
      return (Boolean) invocationOnMock.callRealMethod();
    }
  }).when(testBlockManager).processReport(
      Mockito.any(DatanodeID.class), Mockito.any(DatanodeStorage.class),
      Mockito.any(BlockListAsLongs.class),
      Mockito.any(BlockReportContext.class));
  namesystem.setBlockManagerForTesting(testBlockManager);

  String bpid = namesystem.getBlockPoolId();
  DataNode dn = cluster.getDataNodes().get(0);
  DatanodeRegistration dnReg = dn.getDNRegistrationForBP(bpid);

  // Request a block report lease for this DataNode, as the DN itself would.
  long leaseId;
  namesystem.readLock();
  try {
    leaseId = testBlockManager.requestBlockReportLeaseId(dnReg);
  } finally {
    namesystem.readUnlock();
  }

  Map<DatanodeStorage, BlockListAsLongs> report =
      cluster.getBlockReport(bpid, 0);

  List<StorageBlockReport> reportList = new ArrayList<>();
  for (Map.Entry<DatanodeStorage, BlockListAsLongs> en : report.entrySet()) {
    reportList.add(new StorageBlockReport(en.getKey(), en.getValue()));
  }

  // With the processReport change above, this throws IOException
  // if the lease id is invalid.
  cluster.getNameNode().getRpcServer().blockReport(
      dnReg, bpid, reportList.toArray(new StorageBlockReport[]{}),
      new BlockReportContext(1, 0, System.nanoTime(), leaseId, true));
}
{code}
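The test relies on a Mockito spy whose answer sleeps and then calls the real {{processReport}}. For readers less familiar with {{doAnswer}}/{{callRealMethod}}, the same delay-then-delegate idea can be sketched with a plain JDK dynamic proxy. This is only an illustration: {{ReportProcessor}} and {{DelayingProxySketch}} are hypothetical, and unlike a Mockito spy, {{java.lang.reflect.Proxy}} can only wrap interfaces, so it could not wrap the concrete {{BlockManager}} directly.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical interface standing in for the method the test delays.
interface ReportProcessor {
  boolean processReport(String storageId);
}

class DelayingProxySketch {

  // Wraps `real` so every call sleeps `delayMs` and then runs the real method,
  // roughly what doAnswer(sleep; callRealMethod()) does on a Mockito spy.
  static ReportProcessor withDelay(ReportProcessor real, long delayMs) {
    InvocationHandler handler = (proxy, method, args) -> {
      Thread.sleep(delayMs);
      try {
        return method.invoke(real, args);
      } catch (InvocationTargetException e) {
        throw e.getCause();  // rethrow the real method's exception unchanged
      }
    };
    return (ReportProcessor) Proxy.newProxyInstance(
        ReportProcessor.class.getClassLoader(),
        new Class<?>[] {ReportProcessor.class},
        handler);
  }

  public static void main(String[] args) {
    ReportProcessor real = storageId -> true;
    ReportProcessor slow = withDelay(real, 100);
    long start = System.nanoTime();
    boolean result = slow.processReport("DS-1");
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println(result + " after ~" + elapsedMs + " ms");
  }
}
```

The delay only widens the window between requesting the lease and checking it; whether that alone makes the lease invalid depends on how the test cluster configures lease length, which is not shown in the test.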

> Block report leases cause missing blocks until next report
> ----------------------------------------------------------
>
>                 Key: HDFS-12914
>                 URL: https://issues.apache.org/jira/browse/HDFS-12914
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0, 2.9.2
>            Reporter: Daryn Sharp
>            Assignee: Santosh Marella
>            Priority: Critical
>         Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and
> is interpreted as the {{noStaleStorages}} flag.
> A re-registering node whose FBR is rejected due to an invalid lease becomes
> active with _no blocks_. A replication storm ensues, possibly causing DNs to
> temporarily go dead (HDFS-12645), which leads to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs'
> next FBR is sent and/or forced.
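The rejection conditions listed in the description can be modelled as distinct outcomes rather than a bare boolean. The sketch below is a simplified toy, not {{BlockReportLeaseManager}}: {{ToyLeaseManager}}, its {{Result}} enum, and the millisecond clock passed in by the caller are all hypothetical, and the order of checks is an assumption for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model, not HDFS code: a lease table keyed by DataNode id.
class ToyLeaseManager {

  enum Result { OK, UNKNOWN_DATANODE, NOT_IN_PENDING_SET, EXPIRED, WRONG_LEASE_ID }

  private final long leaseLengthMs;
  private final Set<String> knownNodes = new HashSet<>();
  // node -> {leaseId, grantedAtMs}
  private final Map<String, long[]> pending = new HashMap<>();

  ToyLeaseManager(long leaseLengthMs) {
    this.leaseLengthMs = leaseLengthMs;
  }

  void register(String node) {
    knownNodes.add(node);
  }

  // Grants (or replaces) the pending lease for a node at time nowMs.
  long grant(String node, long leaseId, long nowMs) {
    pending.put(node, new long[] {leaseId, nowMs});
    return leaseId;
  }

  // Returns a reason instead of a bare boolean, so a caller can turn any
  // non-OK result into an explicit error rather than a silent false.
  Result check(String node, long nowMs, long leaseId) {
    if (!knownNodes.contains(node)) return Result.UNKNOWN_DATANODE;
    long[] lease = pending.get(node);
    if (lease == null) return Result.NOT_IN_PENDING_SET;
    if (nowMs - lease[1] > leaseLengthMs) return Result.EXPIRED;
    if (lease[0] != leaseId) return Result.WRONG_LEASE_ID;
    return Result.OK;
  }

  public static void main(String[] args) {
    ToyLeaseManager leases = new ToyLeaseManager(1000);
    leases.register("dn-1");
    long id = leases.grant("dn-1", 42L, 0L);
    System.out.println(leases.check("dn-1", 200L, id));   // OK
    System.out.println(leases.check("dn-1", 200L, 43L));  // WRONG_LEASE_ID
    System.out.println(leases.check("dn-1", 1500L, id));  // EXPIRED
    System.out.println(leases.check("dn-2", 200L, id));   // UNKNOWN_DATANODE
  }
}
```

Any non-{{OK}} result here corresponds to a case where, per the description, the real code returns {{false}} and the DataNode ends up registered with no blocks until its next FBR.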



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)