[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843647#comment-16843647 ]
Santosh Marella commented on HDFS-12914:
----------------------------------------

[~jojochuang] - We might have hit this issue recently, and some initial investigation seems to lead to the issue [~daryn] describes in the description. I'm new to this area, but from what I've learnt so far, it seems that HDFS-14314 fixes a related but slightly different scenario. Please see below.

We have a DN that has 12 disks. When we restarted a standby NN, the DN registered itself, got a lease for a FBR, and sent a FBR containing 12 reports, one for each disk. However, only 3 of them were processed; the remaining 9 were not, because the lease had expired before the 4th report was processed. This essentially meant the FBR was only partially processed, and in our case this *might* be one of the reasons it's taking so long for the NN to come out of safe mode (the safe block count climbs too slowly toward the threshold when FBRs are only partially processed).

Below are some raw notes I've collected while investigating this issue. Sorry for being verbose, but I hope it helps everyone. We are on Hadoop 2.9.2.

----

Dug through the logs and observed this for a DN for which only 3 out of 12 reports were processed. The DN registered itself with the NN, then sent a FBR that contained 12 reports (one for each disk). The NN processed 3 of them (indicated by the *Processing first storage report* and *processing time* entries in the log statements). However, for the 9 remaining reports, it printed *lease xxx is not valid for DN* messages.
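As a quick aside on why partial FBR processing delays safe-mode exit: the NN only leaves safe mode once the count of safely reported blocks reaches dfs.namenode.safemode.threshold-pct of the total block count. A toy illustration of that check (not HDFS code; the 0.999 default threshold is an assumption about our config, and the block counts are taken from this DN's logs purely for illustration):

{code:java}
// Toy model of the safe-mode threshold check (illustrative only, not HDFS code).
// With partial FBR processing, the reported block count stays low and the
// threshold is never reached, so the NN stays in safe mode.
public class SafeModeSketch {
    // NN may leave safe mode once safeBlocks >= thresholdPct * totalBlocks
    // (dfs.namenode.safemode.threshold-pct, default 0.999f).
    static boolean canLeaveSafeMode(long safeBlocks, long totalBlocks, float thresholdPct) {
        return safeBlocks >= (long) (thresholdPct * totalBlocks);
    }

    public static void main(String[] args) {
        long totalBlocks = 151730;                   // all blocks on this DN
        long reported = 12690 + 12670 + 12698;       // only 3 of 12 disks were processed
        System.out.println(canLeaveSafeMode(reported, totalBlocks, 0.999f)); // false
    }
}
{code}

With only 3 of 12 storage reports applied, the reported count stays far below the threshold, so safe mode persists until the next (full) FBR gets through.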
{code:java}
2019-05-16 15:15:35,028 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(10.54.63.120:50010, datanodeUuid=7493442a-c552-43f4-b6bd-728be292f66d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-f4a0a2ae-9e3d-41d5-b98a-e0e77ed0249b;nsid=682930173;c=1406912757005) storage 7493442a-c552-43f4-b6bd-728be292f66d
2019-05-16 15:15:35,028 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: Registered DN 7493442a-c552-43f4-b6bd-728be292f66d (10.54.63.120:50010).
2019-05-16 15:31:11,578 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: Processing first storage report for DS-3e8d8352-ecc9-45cb-a39b-86f10d8aa386 from datanode 7493442a-c552-43f4-b6bd-728be292f66d
2019-05-16 15:31:11,941 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: from storage DS-3e8d8352-ecc9-45cb-a39b-86f10d8aa386 node DatanodeRegistration(10.54.63.120:50010, datanodeUuid=7493442a-c552-43f4-b6bd-728be292f66d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-f4a0a2ae-9e3d-41d5-b98a-e0e77ed0249b;nsid=682930173;c=1406912757005), blocks: 12690, hasStaleStorage: true, processing time: 363 msecs, invalidatedBlocks: 0
2019-05-16 15:31:17,496 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: Processing first storage report for DS-600c8ca1-3f99-41fc-a784-f663b928fe21 from datanode 7493442a-c552-43f4-b6bd-728be292f66d
2019-05-16 15:31:17,851 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: from storage DS-600c8ca1-3f99-41fc-a784-f663b928fe21 node DatanodeRegistration(10.54.63.120:50010, datanodeUuid=7493442a-c552-43f4-b6bd-728be292f66d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-f4a0a2ae-9e3d-41d5-b98a-e0e77ed0249b;nsid=682930173;c=1406912757005), blocks: 12670, hasStaleStorage: true, processing time: 355 msecs, invalidatedBlocks: 0
2019-05-16 15:31:23,465 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: Processing first storage report for DS-3e7dc4c5-ab4e-40d1-8f32-64fe28081f94 from datanode 7493442a-c552-43f4-b6bd-728be292f66d
2019-05-16 15:31:23,821 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: from storage DS-3e7dc4c5-ab4e-40d1-8f32-64fe28081f94 node DatanodeRegistration(10.54.63.120:50010, datanodeUuid=7493442a-c552-43f4-b6bd-728be292f66d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-f4a0a2ae-9e3d-41d5-b98a-e0e77ed0249b;nsid=682930173;c=1406912757005), blocks: 12698, hasStaleStorage: true, processing time: 356 msecs, invalidatedBlocks: 0
2019-05-16 15:31:29,419 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: Removing expired block report lease 0xfd013f0084d0ed2d for DN 7493442a-c552-43f4-b6bd-728be292f66d.
2019-05-16 15:31:29,419 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the lease has expired.
2019-05-16 15:31:35,891 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:31:46,775 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:31:57,719 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:32:08,791 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:32:19,706 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:32:30,660 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:32:42,248 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:32:52,921 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
{code}

On the DN side, we can see that it sent all 12 reports in the FBR in a single RPC:

{code:java}
smarella[7]smf1-cjn-22-sr1[hadoop-tst-worker] hadoop $ grep "0xb4fb52822c9e3f03" hadoop-hdfs-datanode-smf1-cjn-22-sr1.prod.twitter.com.log*
hadoop-hdfs-datanode-smf1-cjn-22-sr1.prod.twitter.com.log:2019-05-16 15:33:03,692 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0xb4fb52822c9e3f03, containing 12 storage report(s), of which we sent 12. The reports had 151730 total blocks and used 1 RPC(s). This took 37 msec to generate and 208787 msecs for RPC and NN processing. Got back no commands.
{code}

Looked at the code, and it indeed seems to be the case that the lease expired before all 12 reports were processed.
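To make the failure mode concrete, here is a toy sketch of the per-report lease check (plain Java; all names are hypothetical, this is not HDFS code): when the lease expires mid-loop, the remaining reports are silently skipped, leaving the FBR partially applied.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy model (not HDFS code) of per-report lease checking: lease validity is
// tested before EACH storage report, so a lease that expires mid-loop leaves
// the full block report only partially applied.
public class LeaseLoopSketch {
    // Returns the indices of the reports that were actually processed.
    static List<Integer> processFullBlockReport(int numReports,
                                                long perReportMillis,
                                                long leaseRemainingMillis) {
        List<Integer> processed = new ArrayList<>();
        long elapsed = 0;
        for (int r = 0; r < numReports; r++) {
            if (elapsed >= leaseRemainingMillis) {
                // Mirrors checkLease() returning false: the report is skipped,
                // no exception is thrown, and the caller only sees a boolean.
                continue;
            }
            processed.add(r);
            elapsed += perReportMillis;  // time spent queueing for / holding the write lock
        }
        return processed;
    }

    public static void main(String[] args) {
        // 12 disks, ~6 s wall time per report (roughly what the log above shows),
        // and a lease with only ~18 s of validity left: 3 of 12 reports applied.
        System.out.println(processFullBlockReport(12, 6000, 18000).size()); // 3
    }
}
{code}

The actual server-side loop in NameNodeRpcServer#blockReport and the lease check inside BlockManager#processReport, quoted from our 2.9.2 source, follow.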
{code:java}
@Override // DatanodeProtocol
public DatanodeCommand blockReport(final DatanodeRegistration nodeReg,
      String poolId, final StorageBlockReport[] reports,
      final BlockReportContext context) throws IOException {
  checkNNStartup();
  verifyRequest(nodeReg);
  if (blockStateChangeLog.isDebugEnabled()) {
    blockStateChangeLog.debug("*BLOCK* NameNode.blockReport: "
        + "from " + nodeReg + ", reports.length=" + reports.length);
  }
  final BlockManager bm = namesystem.getBlockManager();
  boolean noStaleStorages = false;
  for (int r = 0; r < reports.length; r++) {
    final BlockListAsLongs blocks = reports[r].getBlocks();
    //
    // BlockManager.processReport accumulates information of prior calls
    // for the same node and storage, so the value returned by the last
    // call of this loop is the final updated value for noStaleStorage.
    //
    final int index = r;
    noStaleStorages = bm.runBlockOp(new Callable<Boolean>() {
      @Override
      public Boolean call() throws IOException {
        return bm.processReport(nodeReg, reports[index].getStorage(),
            blocks, context);
      }
    });
    metrics.incrStorageBlockReportOps();
  }
  bm.removeBRLeaseIfNeeded(nodeReg, context);
{code}

The current behavior is that the 12 reports are processed in a loop, and the validity of the lease is checked before each report is processed. If the lease has expired, that report is silently not processed.

{code:java}
public boolean processReport(final DatanodeID nodeID,
    final DatanodeStorage storage, final BlockListAsLongs newReport,
    BlockReportContext context) throws IOException {
  namesystem.writeLock();
  final long startTime = Time.monotonicNow(); //after acquiring write lock
  final long endTime;
  DatanodeDescriptor node;
  Collection<Block> invalidatedBlocks = Collections.emptyList();
  String strBlockReportId =
      context != null ? Long.toHexString(context.getReportId()) : "";
  .....
  if (context != null) {
    if (!blockReportLeaseManager.checkLease(node, startTime,
        context.getLeaseId())) {
      return false;  // <------- ** returns here without processing the FBR below **
    }
  }

  if (storageInfo.getBlockReportCount() == 0) {
    // The first block report can be processed a lot more efficiently than
    // ordinary block reports. This shortens restart times.
    blockLog.info("BLOCK* processReport 0x{}: Processing first "
            + "storage report for {} from datanode {}",
        strBlockReportId, storageInfo.getStorageID(),
        nodeID.getDatanodeUuid());
    processFirstBlockReport(storageInfo, newReport);
  } else {
    invalidatedBlocks = processReport(storageInfo, newReport, context);
  }
{code}

> Block report leases cause missing blocks until next report
> ----------------------------------------------------------
>
>                 Key: HDFS-12914
>                 URL: https://issues.apache.org/jira/browse/HDFS-12914
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and
> is interpreted as {{noStaleStorages}}.
>
> A re-registering node whose FBR is rejected due to an invalid lease becomes
> active with _no blocks_. A replication storm ensues, possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DN's
> next FBR is sent and/or forced.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org