[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845764#comment-16845764 ]
He Xiaoqiao edited comment on HDFS-12914 at 5/22/19 10:48 AM:
--------------------------------------------------------------

[~smarella], some minor comments about [^HDFS-12914-trunk.01.patch]:
a. we need to check whether #context is null when checking the lease;
b. maybe we should also catch #UnregisteredNodeException and return {{RegisterCommand.REGISTER}};
c. {{datanodeManager.getDatanode(nodeId)}} may return null, so we should check for {{null}} before passing the result as a parameter to BlockReportLeaseManager#checkLease;
d. it would be better to add some unit tests, as [~jojochuang] and [~starphin] mentioned above.

> Block report leases cause missing blocks until next report
> ----------------------------------------------------------
>
>                 Key: HDFS-12914
>                 URL: https://issues.apache.org/jira/browse/HDFS-12914
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0, 2.9.2
>            Reporter: Daryn Sharp
>            Assignee: Santosh Marella
>            Priority: Critical
>         Attachments: HDFS-12914-branch-2.001.patch,
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and
> is interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected because of an invalid lease becomes
> active with _no blocks_. A replication storm ensues, possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs'
> next FBR is sent and/or forced.
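
For illustration only, here is a minimal, self-contained sketch of how review items a. to c. could fit together. It is not the attached patch and uses no Hadoop code: {{LeaseCheckSketch}}, {{Reply}}, {{Context}}, {{Node}} and the local {{UnregisteredNodeException}} are hypothetical stand-ins, and the handling chosen for a null context (skip the lease check) and for a null node (ask the DN to re-register) are assumptions of the sketch. It also returns a distinct value for a rejected lease, so that a caller cannot mistake the rejection for a boolean such as {{noStaleStorages}}.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Simplified model of a lease-checked block report. Not Hadoop code. */
class LeaseCheckSketch {

    /** Hypothetical reply type; the real NameNode returns a DatanodeCommand. */
    enum Reply { ACCEPTED, REGISTER, LEASE_REJECTED }

    /** Hypothetical stand-in for the block report context; only the lease id is modeled. */
    static final class Context {
        final long leaseId;
        Context(long leaseId) { this.leaseId = leaseId; }
    }

    /** Hypothetical stand-in for a datanode descriptor holding its granted lease id. */
    static final class Node {
        final long grantedLeaseId;
        Node(long grantedLeaseId) { this.grantedLeaseId = grantedLeaseId; }
    }

    /** Local stand-in for Hadoop's UnregisteredNodeException. */
    static final class UnregisteredNodeException extends Exception {
        UnregisteredNodeException(String nodeId) { super("unregistered node: " + nodeId); }
    }

    private final Map<String, Node> nodes = new HashMap<>();
    private final Set<String> unregistered = new HashSet<>();

    void addNode(String nodeId, long grantedLeaseId) {
        nodes.put(nodeId, new Node(grantedLeaseId));
    }

    void markUnregistered(String nodeId) {
        unregistered.add(nodeId);
    }

    /** Models a lookup that may return null (unknown node) or throw. */
    Node getDatanode(String nodeId) throws UnregisteredNodeException {
        if (unregistered.contains(nodeId)) {
            throw new UnregisteredNodeException(nodeId);
        }
        return nodes.get(nodeId);
    }

    /** Models the lease check; callers must pass a non-null node. */
    static boolean checkLease(Node node, long leaseId) {
        return leaseId != 0 && node.grantedLeaseId == leaseId;
    }

    Reply processReport(String nodeId, Context context) {
        final Node node;
        try {
            node = getDatanode(nodeId);
        } catch (UnregisteredNodeException e) {
            return Reply.REGISTER;        // item b: ask the DN to re-register
        }
        if (node == null) {
            return Reply.REGISTER;        // item c: never pass null to checkLease (assumed handling)
        }
        // item a: only consult the lease when a context was actually supplied
        if (context != null && !checkLease(node, context.leaseId)) {
            return Reply.LEASE_REJECTED;  // explicit outcome instead of a bare false
        }
        return Reply.ACCEPTED;
    }
}
```

In this model a report carrying the wrong lease id yields {{LEASE_REJECTED}}, while an unknown or unregistered node yields {{REGISTER}}; neither case can be read as "report processed, no stale storages".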