[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843647#comment-16843647 ]
Santosh Marella commented on HDFS-12914:
----------------------------------------

[~jojochuang] - We might have hit this issue recently, and some initial investigation seems to lead to the issue [~daryn] describes in the description. I'm new to this area, but from what I've learnt so far, it seems that HDFS-14314 fixes a related but slightly different scenario. Please see below.

We have a DN that has 12 disks. When we restarted a standby NN, the DN registered itself, got a lease for a FBR, and sent a FBR containing 12 reports, one for each disk. However, only 3 of them were processed; the remaining 9 were not, because the lease had expired before the 4th report was processed. This essentially meant the FBR was only partially processed, and in our case this *might* be one of the reasons it's taking so long for the NN to come out of safe mode (the safe block count climbs too slowly toward the threshold when FBRs are only partially processed).

Below are some raw notes I've collected while investigating this issue. Sorry for being verbose, but I hope it helps everyone. We are on Hadoop 2.9.2.

----

Dug through the logs and observed this for a DN for which only 3 out of 12 reports were processed. The DN registered itself with the NN, then sent a FBR that contained 12 reports (one for each disk). The NN processed 3 of them (indicated by the *Processing first storage report* and *processing time* entries in the log statements). However, for the 9 remaining reports, it printed *lease xxx is not valid for DN* messages.
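As a quick aside on why partial FBR processing delays safe-mode exit: the NN only leaves safe mode once the count of safely reported blocks reaches dfs.namenode.safemode.threshold-pct of the total block count. A toy illustration of that check (not HDFS code; the 0.999 default threshold is an assumption about our config, and the block counts are taken from this DN's logs purely for illustration):

{code:java}
// Toy model of the safe-mode threshold check (illustrative only, not HDFS code).
// With partial FBR processing, the reported block count stays low and the
// threshold is never reached, so the NN stays in safe mode.
public class SafeModeSketch {
    // NN may leave safe mode once safeBlocks >= thresholdPct * totalBlocks
    // (dfs.namenode.safemode.threshold-pct, default 0.999f).
    static boolean canLeaveSafeMode(long safeBlocks, long totalBlocks, float thresholdPct) {
        return safeBlocks >= (long) (thresholdPct * totalBlocks);
    }

    public static void main(String[] args) {
        long totalBlocks = 151730;                   // all blocks on this DN
        long reported = 12690 + 12670 + 12698;       // only 3 of 12 disks were processed
        System.out.println(canLeaveSafeMode(reported, totalBlocks, 0.999f)); // false
    }
}
{code}

With only 3 of 12 storage reports applied, the reported count stays far below the threshold, so safe mode persists until the next (full) FBR gets through.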
{code:java}
2019-05-16 15:15:35,028 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(10.54.63.120:50010, datanodeUuid=7493442a-c552-43f4-b6bd-728be292f66d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-f4a0a2ae-9e3d-41d5-b98a-e0e77ed0249b;nsid=682930173;c=1406912757005) storage 7493442a-c552-43f4-b6bd-728be292f66d
2019-05-16 15:15:35,028 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: Registered DN 7493442a-c552-43f4-b6bd-728be292f66d (10.54.63.120:50010).
2019-05-16 15:31:11,578 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: Processing first storage report for DS-3e8d8352-ecc9-45cb-a39b-86f10d8aa386 from datanode 7493442a-c552-43f4-b6bd-728be292f66d
2019-05-16 15:31:11,941 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: from storage DS-3e8d8352-ecc9-45cb-a39b-86f10d8aa386 node DatanodeRegistration(10.54.63.120:50010, datanodeUuid=7493442a-c552-43f4-b6bd-728be292f66d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-f4a0a2ae-9e3d-41d5-b98a-e0e77ed0249b;nsid=682930173;c=1406912757005), blocks: 12690, hasStaleStorage: true, processing time: 363 msecs, invalidatedBlocks: 0
2019-05-16 15:31:17,496 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: Processing first storage report for DS-600c8ca1-3f99-41fc-a784-f663b928fe21 from datanode 7493442a-c552-43f4-b6bd-728be292f66d
2019-05-16 15:31:17,851 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: from storage DS-600c8ca1-3f99-41fc-a784-f663b928fe21 node DatanodeRegistration(10.54.63.120:50010, datanodeUuid=7493442a-c552-43f4-b6bd-728be292f66d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-f4a0a2ae-9e3d-41d5-b98a-e0e77ed0249b;nsid=682930173;c=1406912757005), blocks: 12670, hasStaleStorage: true, processing time: 355 msecs, invalidatedBlocks: 0
2019-05-16 15:31:23,465 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: Processing first storage report for DS-3e7dc4c5-ab4e-40d1-8f32-64fe28081f94 from datanode 7493442a-c552-43f4-b6bd-728be292f66d
2019-05-16 15:31:23,821 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: from storage DS-3e7dc4c5-ab4e-40d1-8f32-64fe28081f94 node DatanodeRegistration(10.54.63.120:50010, datanodeUuid=7493442a-c552-43f4-b6bd-728be292f66d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-f4a0a2ae-9e3d-41d5-b98a-e0e77ed0249b;nsid=682930173;c=1406912757005), blocks: 12698, hasStaleStorage: true, processing time: 356 msecs, invalidatedBlocks: 0
2019-05-16 15:31:29,419 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: Removing expired block report lease 0xfd013f0084d0ed2d for DN 7493442a-c552-43f4-b6bd-728be292f66d.
2019-05-16 15:31:29,419 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the lease has expired.
2019-05-16 15:31:35,891 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:31:46,775 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:31:57,719 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:32:08,791 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:32:19,706 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:32:30,660 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:32:42,248 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
2019-05-16 15:32:52,921 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because the DN is not in the pending set.
{code}

On the DN side, we can see that it sent all 12 reports in the FBR in a single RPC:

{code:java}
smarella[7]smf1-cjn-22-sr1[hadoop-tst-worker] hadoop $ grep "0xb4fb52822c9e3f03" hadoop-hdfs-datanode-smf1-cjn-22-sr1.prod.twitter.com.log*
hadoop-hdfs-datanode-smf1-cjn-22-sr1.prod.twitter.com.log:2019-05-16 15:33:03,692 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0xb4fb52822c9e3f03, containing 12 storage report(s), of which we sent 12. The reports had 151730 total blocks and used 1 RPC(s). This took 37 msec to generate and 208787 msecs for RPC and NN processing. Got back no commands.
{code}

Looked at the code, and it indeed seems to be the case that the lease expired before all 12 reports were processed.
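To make the failure mode concrete, here is a toy sketch of the per-report lease check (plain Java; all names are hypothetical, this is not HDFS code): when the lease expires mid-loop, the remaining reports are silently skipped, leaving the FBR partially applied.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy model (not HDFS code) of per-report lease checking: lease validity is
// tested before EACH storage report, so a lease that expires mid-loop leaves
// the full block report only partially applied.
public class LeaseLoopSketch {
    // Returns the indices of the reports that were actually processed.
    static List<Integer> processFullBlockReport(int numReports,
                                                long perReportMillis,
                                                long leaseRemainingMillis) {
        List<Integer> processed = new ArrayList<>();
        long elapsed = 0;
        for (int r = 0; r < numReports; r++) {
            if (elapsed >= leaseRemainingMillis) {
                // Mirrors checkLease() returning false: the report is skipped,
                // no exception is thrown, and the caller only sees a boolean.
                continue;
            }
            processed.add(r);
            elapsed += perReportMillis;  // time spent queueing for / holding the write lock
        }
        return processed;
    }

    public static void main(String[] args) {
        // 12 disks, ~6 s wall time per report (roughly what the log above shows),
        // and a lease with only ~18 s of validity left: 3 of 12 reports applied.
        System.out.println(processFullBlockReport(12, 6000, 18000).size()); // 3
    }
}
{code}

The actual server-side loop in NameNodeRpcServer#blockReport and the lease check inside BlockManager#processReport, quoted from our 2.9.2 source, follow.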
{code:java}
@Override // DatanodeProtocol
public DatanodeCommand blockReport(final DatanodeRegistration nodeReg,
      String poolId, final StorageBlockReport[] reports,
      final BlockReportContext context) throws IOException {
  checkNNStartup();
  verifyRequest(nodeReg);
  if (blockStateChangeLog.isDebugEnabled()) {
    blockStateChangeLog.debug("*BLOCK* NameNode.blockReport: "
        + "from " + nodeReg + ", reports.length=" + reports.length);
  }
  final BlockManager bm = namesystem.getBlockManager();
  boolean noStaleStorages = false;
  for (int r = 0; r < reports.length; r++) {
    final BlockListAsLongs blocks = reports[r].getBlocks();
    //
    // BlockManager.processReport accumulates information of prior calls
    // for the same node and storage, so the value returned by the last
    // call of this loop is the final updated value for noStaleStorage.
    //
    final int index = r;
    noStaleStorages = bm.runBlockOp(new Callable<Boolean>() {
      @Override
      public Boolean call() throws IOException {
        return bm.processReport(nodeReg, reports[index].getStorage(),
            blocks, context);
      }
    });
    metrics.incrStorageBlockReportOps();
  }
  bm.removeBRLeaseIfNeeded(nodeReg, context);
{code}

The current behavior is that the 12 reports are processed in a loop, and the validity of the lease is checked before each report is processed. If the lease has expired, that report is silently not processed.

{code:java}
public boolean processReport(final DatanodeID nodeID,
    final DatanodeStorage storage, final BlockListAsLongs newReport,
    BlockReportContext context) throws IOException {
  namesystem.writeLock();
  final long startTime = Time.monotonicNow(); //after acquiring write lock
  final long endTime;
  DatanodeDescriptor node;
  Collection<Block> invalidatedBlocks = Collections.emptyList();
  String strBlockReportId =
      context != null ? Long.toHexString(context.getReportId()) : "";
  .....
  if (context != null) {
    if (!blockReportLeaseManager.checkLease(node, startTime,
        context.getLeaseId())) {
      return false;  // <------- ** returns here without processing the FBR below **
    }
  }

  if (storageInfo.getBlockReportCount() == 0) {
    // The first block report can be processed a lot more efficiently than
    // ordinary block reports. This shortens restart times.
    blockLog.info("BLOCK* processReport 0x{}: Processing first "
            + "storage report for {} from datanode {}",
        strBlockReportId, storageInfo.getStorageID(),
        nodeID.getDatanodeUuid());
    processFirstBlockReport(storageInfo, newReport);
  } else {
    invalidatedBlocks = processReport(storageInfo, newReport, context);
  }
{code}

> Block report leases cause missing blocks until next report
> ----------------------------------------------------------
>
>                 Key: HDFS-12914
>                 URL: https://issues.apache.org/jira/browse/HDFS-12914
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and
> is interpreted as {{noStaleStorages}}.
>
> A re-registering node whose FBR is rejected due to an invalid lease becomes
> active with _no blocks_. A replication storm ensues, possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DN's
> next FBR is sent and/or forced.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org