[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843829#comment-16843829 ]
star edited comment on HDFS-12914 at 5/20/19 10:07 AM:
-------------------------------------------------------

[~smarella] how many DNs do you have?

Based on the limited logs, I think it is caused by the following case: high CPU load on the SNN delayed the processing of the full block report.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Report| |
|...|request Lease|
|process Report|{color:#707070}_more than 5 minutes_{color}|
|...|process Report|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs unrelated to 10.54.63.120:50010 were filtered out, right [~smarella]? During that time, I think the SNN was processing block reports from other DNs. Only at 2019-05-16 15:31:11 did the SNN begin to process block reports from that DN. That is 6 minutes after the full block report lease ID was requested, beyond the default expiry of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT).

I don't know when the full block report lease ID was obtained from the server, because there is no info log for it. My guess is about 5 minutes before the first failed report, say 15:26:29.

was (Author: starphin):
[~smarella] how many DNs do you have?

Based on the limited logs, I think it is caused by the following case: high CPU load on the SNN delayed the processing of the full block report.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Request| |
|...|request Lease|
|process Request|{color:#707070}_more than 5 minutes_{color}|
|...|process Request|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs unrelated to 10.54.63.120:50010 were filtered out, right [~smarella]? During that time, I think the SNN was processing block reports from other DNs. Only at 2019-05-16 15:31:11 did the SNN begin to process block reports from that DN. That is 6 minutes after the full block report lease ID was requested, beyond the default expiry of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT).
I don't know when the full block report lease ID was obtained from the server, because there is no info log for it. My guess is about 5 minutes before the first failed report, say 15:26:29.

> Block report leases cause missing blocks until next report
> ----------------------------------------------------------
>
>                 Key: HDFS-12914
>                 URL: https://issues.apache.org/jira/browse/HDFS-12914
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and
> is interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected because of an invalid lease becomes
> active with _no blocks_. A replication storm ensues, possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs'
> next FBR is sent and/or forced.
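
To make the timing argument concrete, here is a minimal Java sketch of the expiry arithmetic described in the comment. It is not the actual {{BlockReportLeaseManager}} code: the class name {{LeaseExpirySketch}}, the method {{leaseStillValid}}, and the lease issue time are all hypothetical, and the 5-minute lease length is taken from the default mentioned above. Like the behavior described in the issue, the check returns false on expiry rather than throwing.

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class LeaseExpirySketch {
    // Assumption: mirrors the 5-minute default lease length cited in the comment.
    static final Duration LEASE_LENGTH = Duration.ofMinutes(5);

    /**
     * Hypothetical helper: returns true if the report is processed no later
     * than LEASE_LENGTH after the lease was issued, false otherwise.
     * It signals expiry by returning false and does not throw.
     */
    static boolean leaseStillValid(LocalDateTime issued, LocalDateTime processed) {
        Duration age = Duration.between(issued, processed);
        return age.compareTo(LEASE_LENGTH) <= 0;
    }

    public static void main(String[] args) {
        // Time at which the SNN began processing the DN's block report (from the logs).
        LocalDateTime processed = LocalDateTime.of(2019, 5, 16, 15, 31, 11);

        // Hypothetical issue time: the comment estimates a 6-minute gap.
        LocalDateTime issuedSixMinutesEarlier = processed.minusMinutes(6);
        System.out.println(leaseStillValid(issuedSixMinutesEarlier, processed)); // false

        // For contrast, a report processed 4 minutes after issue is still within the lease.
        System.out.println(leaseStillValid(processed.minusMinutes(4), processed)); // true
    }
}
```

Under these assumptions, any processing delay longer than the lease length is enough for the full block report to be rejected, even though the DN requested its lease correctly.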