[ https://issues.apache.org/jira/browse/HDFS-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697630#comment-17697630 ]
ASF GitHub Bot commented on HDFS-16942:
---------------------------------------

sodonnel commented on PR #5460:
URL: https://github.com/apache/hadoop/pull/5460#issuecomment-1458896438

Not sure what to do about this checkstyle error:

```
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/InvalidBlockReportLeaseException.java:1:/**: Missing package-info.java file. [JavadocPackage]
```

If I add the file:

```
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/package-info.java
```

I get an enforcer error:

```
[INFO] -------< org.apache.hadoop:hadoop-client-check-test-invariants >--------
[INFO] Building Apache Hadoop Client Packaging Invariants for Test 3.4.0-SNAPSHOT [106/113]
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] -
```

> Send error to datanode if FBR is rejected due to bad lease
> ----------------------------------------------------------
>
>                 Key: HDFS-16942
>                 URL: https://issues.apache.org/jira/browse/HDFS-16942
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>
> When a datanode sends a FBR to the namenode, it requires a lease to send it.
> On a couple of busy clusters, we have seen an issue where the DN is somehow
> delayed in sending the FBR after requesting the lease. Then the NN rejects
> the FBR and logs a message to that effect, but from the Datanode's point of
> view, the report was successful, so it does not try to send another
> report until the 6-hour default interval has passed.
> If this happens to a few DNs, there can be missing and under-replicated
> blocks, further adding to the cluster load. Even worse, I have seen the DNs
> join the cluster with zero blocks, so it is not obvious the under-replication
> is caused by a lost FBR, as all DNs appear to be up and running.
> I believe we should propagate an error back to the DN if the FBR is rejected;
> that way, the DN can request a new lease and try again.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
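
For illustration only, the behaviour proposed in the issue description might be sketched roughly as below. This is not the code from PR #5460: `FbrLeaseSketch`, `NameNode`, `DataNode`, `requestLease`, `expireLease`, `blockReport` and `sendFullBlockReport` are hypothetical stand-ins, and the exception only borrows the name `InvalidBlockReportLeaseException` from the file mentioned in the comment. The sketch assumes the NN signals a rejected FBR by throwing, and that the DN reacts by asking for a fresh lease and sending the report again.

```java
/** Toy model of the lease/FBR exchange; none of these are the real Hadoop classes. */
class FbrLeaseSketch {

    /** Stand-in for the exception named in the PR; the real class may differ. */
    static class InvalidBlockReportLeaseException extends Exception {
        InvalidBlockReportLeaseException(long leaseId) {
            super("Block report lease " + leaseId + " is not valid");
        }
    }

    /** Hypothetical namenode: tracks a single outstanding lease. */
    static class NameNode {
        private long nextLeaseId = 1;
        private long currentLease = 0; // 0 means "no valid lease"
        int acceptedReports = 0;

        long requestLease() {
            currentLease = nextLeaseId++;
            return currentLease;
        }

        /** Simulates the lease becoming invalid while the DN is delayed. */
        void expireLease() {
            currentLease = 0;
        }

        /** Rejects the report loudly instead of only logging on the NN side. */
        void blockReport(long leaseId) throws InvalidBlockReportLeaseException {
            if (leaseId == 0 || leaseId != currentLease) {
                throw new InvalidBlockReportLeaseException(leaseId);
            }
            currentLease = 0; // the lease is consumed by a successful report
            acceptedReports++;
        }
    }

    /** Hypothetical datanode: retries with a new lease when the FBR is rejected. */
    static class DataNode {
        private final NameNode nn;

        DataNode(NameNode nn) {
            this.nn = nn;
        }

        /** Returns the attempt number that succeeded, or -1 if all attempts failed. */
        int sendFullBlockReport(int maxAttempts, Runnable delayBeforeFirstSend) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                long lease = nn.requestLease();
                if (attempt == 1) {
                    delayBeforeFirstSend.run(); // models the DN stalling after getting the lease
                }
                try {
                    nn.blockReport(lease);
                    return attempt;
                } catch (InvalidBlockReportLeaseException e) {
                    // The DN now knows the report was rejected, so it loops
                    // and requests a new lease rather than waiting for the
                    // next scheduled report.
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        NameNode nn = new NameNode();
        DataNode dn = new DataNode(nn);
        // The first lease expires during the simulated delay; the retry succeeds.
        int attempts = dn.sendFullBlockReport(3, nn::expireLease);
        System.out.println("FBR accepted after " + attempts + " attempt(s)");
    }
}
```

In this toy model the first report is rejected because the lease expired during the delay, and the second attempt goes through with a new lease. Without the exception, the DN would have no way to tell the first attempt apart from a success.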