tomscut created HDFS-16379:
------------------------------

             Summary: Reset fullBlockReportLeaseId after any exceptions
                 Key: HDFS-16379
                 URL: https://issues.apache.org/jira/browse/HDFS-16379
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: tomscut
            Assignee: tomscut

Recently we encountered problems related to full block reports (FBR) in our production environment, which were solved by applying HDFS-12914 and HDFS-14314. But the following situation can still occur:

1. The DN gets a *fullBlockReportLeaseId* via heartbeat.
2. The DN triggers a blockReport, but some exception occurs (this may be rare, but it can happen), and the DN then retries multiple times *without resetting the lease ID*, because the lease ID is currently reset only when the report succeeds.
3. After a while the exception clears, but the lease ID has expired. *Since the NN does not throw an exception after the lease expires, the DN considers the blockReport successful.* So the blockReport is not actually processed this time and has to wait until the next one.

Therefore, *should we consider resetting the fullBlockReportLeaseId in the finally block*? The advantage is that lease expiration can be avoided. The downside is that each heartbeat will request a new fullBlockReportLeaseId while the exception persists, but I think this cost is negligible.
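
To make the scenario concrete, here is a minimal, self-contained simulation of the three steps and of the proposed change. It is only a sketch: FakeNameNode, FakeDataNode, runScenario and the resetInFinally flag are hypothetical names invented for illustration, not the actual DataNode or NameNode code, and lease expiry is modelled as an explicit call rather than a timeout.

```java
import java.io.IOException;

public class LeaseResetSketch {

    /** Hypothetical in-memory NN: grants leases and silently ignores expired ones. */
    static final class FakeNameNode {
        private long nextLeaseId = 1;
        private long validLeaseId = 0;   // 0 means no lease is outstanding
        boolean failing = false;         // simulates a transient failure
        int reportsProcessed = 0;

        long requestLease() {
            validLeaseId = nextLeaseId++;
            return validLeaseId;
        }

        void expireLease() {
            validLeaseId = 0;
        }

        /** Returns normally even for an expired lease, as described in the issue. */
        void blockReport(long leaseId) throws IOException {
            if (failing) {
                throw new IOException("simulated transient failure");
            }
            if (leaseId != 0 && leaseId == validLeaseId) {
                reportsProcessed++;
                validLeaseId = 0;        // the lease is consumed by the report
            }
            // Otherwise the report is dropped without an exception.
        }
    }

    /** Hypothetical DN that keeps only the lease-handling logic. */
    static final class FakeDataNode {
        private final boolean resetInFinally;
        long fullBlockReportLeaseId = 0; // 0 means no lease held

        FakeDataNode(boolean resetInFinally) {
            this.resetInFinally = resetInFinally;
        }

        /** A heartbeat requests a lease only when none is held. */
        void heartbeat(FakeNameNode nn) {
            if (fullBlockReportLeaseId == 0) {
                fullBlockReportLeaseId = nn.requestLease();
            }
        }

        void blockReport(FakeNameNode nn) {
            if (fullBlockReportLeaseId == 0) {
                return;                  // nothing to do without a lease
            }
            boolean succeeded = false;
            try {
                nn.blockReport(fullBlockReportLeaseId);
                succeeded = true;
            } catch (IOException e) {
                // Swallowed here: the DN retries on a later cycle.
            } finally {
                // Current behaviour: reset only on success.
                // Proposed behaviour: reset unconditionally.
                if (succeeded || resetInFinally) {
                    fullBlockReportLeaseId = 0;
                }
            }
        }
    }

    /** Replays the scenario and returns how many reports the NN processed. */
    static int runScenario(boolean resetInFinally) {
        FakeNameNode nn = new FakeNameNode();
        FakeDataNode dn = new FakeDataNode(resetInFinally);

        dn.heartbeat(nn);     // step 1: lease obtained via heartbeat
        nn.failing = true;
        dn.blockReport(nn);   // step 2: report fails
        dn.heartbeat(nn);     // with the reset, a fresh lease is requested here
        dn.blockReport(nn);   // retry fails again
        nn.expireLease();     // step 3: the lease expires during the outage
        nn.failing = false;
        dn.heartbeat(nn);
        dn.blockReport(nn);   // first call that returns without an exception
        return nn.reportsProcessed;
    }

    public static void main(String[] args) {
        System.out.println("reset on success only: " + runScenario(false));
        System.out.println("reset in finally:      " + runScenario(true));
    }
}
```

In this model the success-only variant ends with 0 processed reports, because the DN keeps presenting the stale lease and the NN drops the report without an error. The reset-in-finally variant ends with 1, at the cost of one extra lease request per failed attempt.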