[ https://issues.apache.org/jira/browse/HDFS-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133910#comment-17133910 ]

huhaiyang edited comment on HDFS-15391 at 6/12/20, 4:24 AM:
------------------------------------------------------------

[~ayushtkn] Thank you for the reply!

Will try to reproduce. However, the problem has not yet been reproduced in the 
test environment. I will follow up and see if I can reproduce it.

{quote}
{quote}
    The block used by CloseOp twice is the same instance, which causes the 
first CloseOp has wrong block size.
{quote}
didn't quite understood this.
{quote}
In the first CloseOp (TXID=126060942290), block_11382080753 has block size 
63154347 and GENSTAMP 10354157480, but in fact the first CloseOp should have 
recorded block size 108764672 and GENSTAMP 10354071495.

And in the second CloseOp (TXID=126060943585), block_11382080753 has block size 
63154347 and GENSTAMP 10354157480.

The block block_11382080753 referenced by both CloseOps is the same instance, 
so the first CloseOp ends up with the wrong block information.
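The suspected aliasing can be sketched as follows. This is a minimal 
illustration with hypothetical class names (`Block`, `CloseOp`, 
`SharedBlockDemo` here are simplified stand-ins, not Hadoop's actual 
implementation): because both CloseOp records hold a reference to the same 
mutable block instance, a later truncate/append rewrites what the first 
CloseOp appears to have logged.

```java
// Minimal sketch of the suspected aliasing bug (hypothetical names, not
// Hadoop's real classes): two edit-log ops share one mutable Block instance.
class Block {
    long blockId;
    long numBytes;
    long genStamp;

    Block(long blockId, long numBytes, long genStamp) {
        this.blockId = blockId;
        this.numBytes = numBytes;
        this.genStamp = genStamp;
    }
}

class CloseOp {
    final long txid;
    final Block block; // stores the reference, not a defensive copy

    CloseOp(long txid, Block block) {
        this.txid = txid;
        this.block = block;
    }
}

public class SharedBlockDemo {
    static CloseOp first;
    static CloseOp second;

    public static void main(String[] args) {
        // State at the first close: the correct size/genstamp of the block.
        Block shared = new Block(11382080753L, 108764672L, 10354071495L);
        first = new CloseOp(126060942290L, shared);

        // A later truncate/append mutates the very same Block instance...
        shared.numBytes = 63154347L;
        shared.genStamp = 10354157480L;

        // ...and the second close logs it again.
        second = new CloseOp(126060943585L, shared);

        // Both ops now show the post-truncate values; the first op's original
        // size (108764672) and genstamp (10354071495) are lost.
        System.out.println("first:  " + first.block.numBytes + " " + first.block.genStamp);
        System.out.println("second: " + second.block.numBytes + " " + second.block.genStamp);
        // prints:
        // first:  63154347 10354157480
        // second: 63154347 10354157480
    }
}
```

If this is indeed the mechanism, a defensive copy of the block when building 
each CloseOp would keep the first op's recorded size and genstamp intact.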




> Standby NameNode loads a corrupted edit log; the service exits and cannot be 
> restarted
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15391
>                 URL: https://issues.apache.org/jira/browse/HDFS-15391
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.2.0
>            Reporter: huhaiyang
>            Priority: Critical
>
> In our version 3.2.0 production cluster,
>  we found that, due to edit log corruption, the Standby NameNode could not 
> properly load the edit log, resulting in abnormal exit of the service and 
> failure to restart.
> {noformat}
> The specific scenario is that Flink writes to HDFS (a replicated file), and 
> when an exception occurs while writing the file, the following operations 
> are performed:
> 1. close file
> 2. open file
> 3. truncate file
> 4. append file
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
