[ 
https://issues.apache.org/jira/browse/HDFS-16379?focusedWorklogId=694359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-694359
 ]

ASF GitHub Bot logged work on HDFS-16379:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Dec/21 02:50
            Start Date: 11/Dec/21 02:50
    Worklog Time Spent: 10m 
      Work Description: tomscut opened a new pull request #3787:
URL: https://github.com/apache/hadoop/pull/3787


   JIRA: [HDFS-16379](https://issues.apache.org/jira/browse/HDFS-16379).
   
   Recently we encountered FBR (full block report) problems in our production 
environment, which we solved by applying HDFS-12914 and HDFS-14314.
   
   However, the following situation can still occur:
   1. The DN gets a `fullBlockReportLeaseId` via heartbeat.
   
   2. The DN triggers a blockReport, but an exception occurs (this may be rare, 
but it can happen). The DN then retries multiple times without resetting 
`fullBlockReportLeaseId`, because currently the ID is reset only when the 
blockReport succeeds.
   
   3. After a while the exception clears, but by then the 
`fullBlockReportLeaseId` has expired. Since the NN does not throw an exception 
when the lease has expired, the DN considers the blockReport successful. The 
blockReport was therefore not actually processed this time, and has to wait 
until the next one.
   
   Therefore, should we consider resetting the `fullBlockReportLeaseId` in the 
finally block? The advantage is that retrying with an expired lease can be 
avoided. The downside is that, while the exception persists, each heartbeat 
will request a new `fullBlockReportLeaseId`, but I think this cost is 
negligible.
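
The proposal above can be illustrated with a minimal sketch. This is not the actual DataNode code: only the field name `fullBlockReportLeaseId` comes from the description, while the class `LeaseResetSketch`, the interface `ReportSender`, and all method names are hypothetical stand-ins. The sketch also assumes that a lease ID of 0 means "no lease held".

```java
import java.io.IOException;

/** Hypothetical stand-in for the DN-side reporting logic; not real HDFS code. */
class LeaseResetSketch {

    /** Hypothetical stand-in for the RPC that sends a full block report to the NN. */
    interface ReportSender {
        void send(long leaseId) throws IOException;
    }

    // Assumption for this sketch: 0 means "no lease held", so the next
    // heartbeat should ask the NN for a fresh lease.
    private long fullBlockReportLeaseId = 0;

    /** Records a lease ID handed out by the NN in a heartbeat response. */
    void onHeartbeatResponse(long leaseIdFromNameNode) {
        if (leaseIdFromNameNode != 0) {
            fullBlockReportLeaseId = leaseIdFromNameNode;
        }
    }

    /** True when the next heartbeat should request a new lease. */
    boolean needsNewLease() {
        return fullBlockReportLeaseId == 0;
    }

    /** Attempts one full block report; returns true if the send did not throw. */
    boolean tryBlockReport(ReportSender sender) {
        try {
            sender.send(fullBlockReportLeaseId);
            return true;
        } catch (IOException e) {
            // A real DN would log the failure and retry later.
            return false;
        } finally {
            // The proposed change: drop the lease on success AND on failure,
            // so a retry never reuses a lease that may have expired.
            fullBlockReportLeaseId = 0;
        }
    }
}
```

With this shape, a failed attempt leaves the DN without a lease, so the retry only goes out after a later heartbeat has supplied a new one. The cost is the extra lease request per heartbeat mentioned above.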


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 694359)
    Remaining Estimate: 0h
            Time Spent: 10m

> Reset fullBlockReportLeaseId after any exceptions
> -------------------------------------------------
>
>                 Key: HDFS-16379
>                 URL: https://issues.apache.org/jira/browse/HDFS-16379
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: tomscut
>            Assignee: tomscut
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Recently we encountered FBR (full block report) problems in our production 
> environment, which we solved by applying HDFS-12914 and HDFS-14314.
> However, the following situation can still occur:
> 1. The DN gets a *fullBlockReportLeaseId* via heartbeat.
> 2. The DN triggers a blockReport, but an exception occurs (this may be rare, 
> but it can happen). The DN then retries multiple times *without resetting 
> the lease ID*, because currently the lease ID is reset only when the 
> blockReport succeeds.
> 3. After a while the exception clears, but by then the lease ID has expired. 
> *Since the NN does not throw an exception when the lease has expired, the DN 
> considers the blockReport successful.* The blockReport was therefore not 
> actually processed this time, and has to wait until the next one.
> Therefore, *should we consider resetting the fullBlockReportLeaseId in the 
> finally block*? The advantage is that retrying with an expired lease can be 
> avoided. The downside is that, while the exception persists, each heartbeat 
> will request a new fullBlockReportLeaseId, but I think this cost is 
> negligible.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
