[ https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140607#comment-17140607 ]

Kihwal Lee commented on HDFS-15421:
-----------------------------------

An example of the leak itself (a single replica is shown for simplicity):

1) IBRs are queued. The file was created, written to, and closed. It was then
reopened for append, more data was written, and it was closed again.
{noformat}
2020-06-19 02:38:27,423 [Block report processor] INFO blockmanagement.BlockManager: Queueing reported block blk_1521788462_1099975774416 in state FINALIZED from datanode 1.2.3.4:1004 for later processing because generation stamp is in the future.
2020-06-19 02:38:28,190 [Block report processor] INFO blockmanagement.BlockManager: Queueing reported block blk_1521788462_1099975774420 in state FINALIZED from datanode 1.2.3.4:1004 for later processing because generation stamp is in the future.
{noformat}
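The queueing decision in step 1 can be modeled roughly as below. This is a simplified, hypothetical sketch, not the real BlockManager code: the class `FutureGenStampQueue`, the record `ReportedBlock`, and the starting generation stamp `1099975774400L` are all assumptions chosen for illustration. It only shows that a report whose generation stamp is newer than anything the standby has replayed gets parked instead of processed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical model of step 1: a standby NameNode defers an incremental
 * block report (IBR) whose generation stamp is newer than the highest one
 * it has learned from replayed edits. Names are illustrative only.
 */
public class FutureGenStampQueue {

    /** One reported replica; fields mirror the log lines above. */
    record ReportedBlock(long blockId, long genStamp, String datanode) {}

    /** Highest generation stamp known from edits replayed so far. */
    private final long currentGenStamp;

    /** Hypothetical stand-in for the pending DN message queue. */
    private final Deque<ReportedBlock> pending = new ArrayDeque<>();

    FutureGenStampQueue(long currentGenStamp) {
        this.currentGenStamp = currentGenStamp;
    }

    /** Returns true if the report was queued for later instead of processed now. */
    boolean onIncrementalReport(ReportedBlock rb) {
        if (rb.genStamp() > currentGenStamp) {
            // "generation stamp is in the future": the edit carrying this gen
            // stamp has not been replayed yet, so the report is parked.
            pending.addLast(rb);
            return true;
        }
        return false; // would be processed immediately
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        // Assumed value: the standby has not replayed any edit for this block yet.
        FutureGenStampQueue q = new FutureGenStampQueue(1099975774400L);
        // Create/write/close, then append/write/close: two FINALIZED IBRs.
        q.onIncrementalReport(new ReportedBlock(1521788462L, 1099975774416L, "1.2.3.4:1004"));
        q.onIncrementalReport(new ReportedBlock(1521788462L, 1099975774420L, "1.2.3.4:1004"));
        System.out.println("queued IBRs: " + q.pendingCount()); // queued IBRs: 2
    }
}
```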

2) The queued IBRs are processed as edits are replayed. The IBR carrying the
first gen stamp (from the initial file creation) is processed. The one from the
append is not, because its gen stamp is still in the future, so it is re-queued.
{noformat}
2020-06-19 02:42:22,774 [Edit log tailer] INFO blockmanagement.BlockManager: Processing previouly queued message ReportedBlockInfo [block=blk_1521788462_1099975774416, dn=1.2.3.4:1004, reportedState=FINALIZED]
2020-06-19 02:42:22,774 [Edit log tailer] INFO blockmanagement.BlockManager: Processing previouly queued message ReportedBlockInfo [block=blk_1521788462_1099975774420, dn=1.2.3.4:1004, reportedState=FINALIZED]
2020-06-19 02:42:22,774 [Edit log tailer] INFO blockmanagement.BlockManager: Queueing reported block blk_1521788462_1099975774420 in state FINALIZED from datanode 1.2.3.4:1004 for later processing because generation stamp is in the future.
{noformat}
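Step 2 can be sketched as a retry pass over the queue that runs when an edit advances the generation stamp. This is a hypothetical simplification: `ReplayQueuedIbrs`, `replayEdit`, and the starting value `1099975774400L` are invented for illustration, and unlike the real NameNode it retries every queued report rather than only those for the block touched by the edit. It reproduces the log above: the report with gen stamp ...416 is processed, and the one with ...420 is re-queued.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical model of step 2: when the edit log tailer replays an edit
 * that advances the generation stamp, each queued IBR is retried once.
 * Reports that are still "from the future" are queued again.
 */
public class ReplayQueuedIbrs {

    record ReportedBlock(long blockId, long genStamp) {}

    private long currentGenStamp;
    private final Deque<ReportedBlock> pending = new ArrayDeque<>();
    private final List<ReportedBlock> processed = new ArrayList<>();

    ReplayQueuedIbrs(long currentGenStamp) {
        this.currentGenStamp = currentGenStamp;
    }

    void queue(ReportedBlock rb) {
        pending.addLast(rb);
    }

    /** Applies an edit carrying editGenStamp, then retries each queued report once. */
    void replayEdit(long editGenStamp) {
        currentGenStamp = Math.max(currentGenStamp, editGenStamp);
        int toRetry = pending.size(); // bound the loop so re-queued reports are not retried again
        for (int i = 0; i < toRetry; i++) {
            ReportedBlock rb = pending.pollFirst();
            if (rb.genStamp() > currentGenStamp) {
                pending.addLast(rb); // still in the future: re-queued
            } else {
                processed.add(rb);   // gen stamp now known: processed
            }
        }
    }

    int pendingCount() { return pending.size(); }

    int processedCount() { return processed.size(); }

    public static void main(String[] args) {
        ReplayQueuedIbrs nn = new ReplayQueuedIbrs(1099975774400L); // assumed start value
        nn.queue(new ReportedBlock(1521788462L, 1099975774416L));
        nn.queue(new ReportedBlock(1521788462L, 1099975774420L));
        // Replaying the edits for the initial create/close brings the gen stamp to ...416.
        nn.replayEdit(1099975774416L);
        // prints "1 processed, 1 re-queued"
        System.out.println(nn.processedCount() + " processed, " + nn.pendingCount() + " re-queued");
    }
}
```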

3) When the edits for the append are replayed, the IBR is still identified as
being from the future and is re-queued. Since there are no more edits regarding
this file, the IBR is leaked.
{noformat}
2020-06-19 02:42:22,776 [Edit log tailer] INFO blockmanagement.BlockManager: Processing previouly queued message ReportedBlockInfo [block=blk_1521788462_1099975774420, dn=1.2.3.4:1004, reportedState=FINALIZED]
2020-06-19 02:42:22,776 [Edit log tailer] INFO blockmanagement.BlockManager: Queueing reported block blk_1521788462_1099975774420 in state FINALIZED from datanode 1.2.3.4:1004 for later processing because generation stamp is in the future.
{noformat}

With HDFS-14941 reverted, the last IBR is processed as expected and the leak
no longer happens.
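Steps 3 and the revert result can be contrasted with a small ordering model. This is a hypothetical sketch under one stated assumption: it treats "update of the global gen stamp is delayed" (from the issue description) as the global gen stamp being advanced only after the queued reports for the block have been retried. `AppendIbrOrdering` and `replayAppend` are invented names, and the starting value is taken from the state after step 2. With the delayed update, the append's IBR stays queued, and since no later edit touches the file, nothing retries it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical model of step 3: whether the append's IBR leaks depends on
 * whether the global gen stamp is advanced before or after the queued
 * reports for the block are retried. Not the real NameNode code.
 */
public class AppendIbrOrdering {

    /** Replays the append's edit and returns how many IBRs are left queued. */
    static int replayAppend(boolean updateGenStampFirst) {
        long currentGenStamp = 1099975774416L;      // state after step 2
        final long appendGenStamp = 1099975774420L; // gen stamp from the append
        Deque<Long> pending = new ArrayDeque<>();
        pending.addLast(appendGenStamp);            // the IBR re-queued in step 2

        if (updateGenStampFirst) {
            currentGenStamp = appendGenStamp;       // gen stamp known before the retry
        }
        // Retry the queued reports for this block once, as the edit is applied.
        int toRetry = pending.size();
        for (int i = 0; i < toRetry; i++) {
            long gs = pending.pollFirst();
            if (gs > currentGenStamp) {
                pending.addLast(gs); // "generation stamp is in the future": re-queued
            }
        }
        if (!updateGenStampFirst) {
            currentGenStamp = appendGenStamp; // delayed update: too late for the retry above
        }
        // No further edits touch this file, so nothing retries the queue again.
        return pending.size();
    }

    public static void main(String[] args) {
        System.out.println("delayed update, left queued: " + replayAppend(false)); // 1 (leaked)
        System.out.println("update first, left queued: " + replayAppend(true));    // 0
    }
}
```

Under this model, every append whose final IBR arrives early leaves one more entry behind, which is consistent with the steady growth of the pending queue described in the issue.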


> IBR leak causes standby NN to be stuck in safe mode
> ---------------------------------------------------
>
>                 Key: HDFS-15421
>                 URL: https://issues.apache.org/jira/browse/HDFS-15421
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Kihwal Lee
>            Priority: Critical
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain
> situations. This makes the last set of incremental block reports from an
> append appear to be "from the future", so they are simply re-queued in the
> pending DN message queue rather than processed to complete the block. This
> last set of IBRs leaks and is never cleaned up until the namenode transitions
> to active; until then, the size of {{pendingDNMessages}} grows constantly.
> If a leak happens during startup safe mode, the namenode will never be able
> to leave safe mode on its own.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
