[jira] [Commented] (HDFS-16950) Gap in edits after -initializeSharedEdits

2023-10-26 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780001#comment-17780001
 ] 

Wei-Chiu Chuang commented on HDFS-16950:


Karthik said the missing edit logs caused data loss, and that the problem is
reproducible.

A workaround is to put the NN in safe mode and take a checkpoint before
proceeding with the migration.


> Gap in edits after -initializeSharedEdits
> -
>
> Key: HDFS-16950
> URL: https://issues.apache.org/jira/browse/HDFS-16950
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node, namenode
>Reporter: Karthik Palanisamy
>Priority: Critical
>
> The NameNode failed to start in a production cluster after the JN role was migrated.
> {code:java}
> ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> java.io.IOException: There appears to be a gap in the edit log.  We expected txid xx, but got txid xx. {code}
> -initializeSharedEdits was run as part of the role migration. Note that no
> checkpoint had been performed in the preceding few hours.
> -initializeSharedEdits created a new log segment from the edits_inprogress
> transactions and deleted all older transactions.
> My ask is to delete only edit transactions older than the fsimage
> transaction. Currently it deletes all transactions, and no such check is
> enforced in JNStorage#format().
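The retention rule asked for above (delete only finalized segments whose transactions are already covered by the latest fsimage) could look roughly like the following sketch. It is hypothetical: `SafeEditPurgeSketch` and `purgeable` are illustrative names, not part of HDFS or of JNStorage#format(), and the file names are taken from the JN listing in Karthik's example later in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the requested retention rule: only delete
 * finalized edit segments whose last txid is <= the latest fsimage txid.
 * Not the HDFS JNStorage API; names are illustrative.
 */
public class SafeEditPurgeSketch {
    // Matches finalized segment names such as "edits_0003494-0003670".
    private static final Pattern FINALIZED =
        Pattern.compile("edits_(\\d+)-(\\d+)");

    /** Returns the segment names that are safe to delete. */
    static List<String> purgeable(List<String> fileNames, long imageTxId) {
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            Matcher m = FINALIZED.matcher(name);
            // In-progress segments, VERSION and seen_txid never match,
            // so they are never selected for deletion here.
            if (m.matches() && Long.parseLong(m.group(2)) <= imageTxId) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // JN files from the example; fsimage_0003493 is the latest image.
        List<String> jnFiles = List.of(
            "edits_0003337-0003487", "seen_txid", "edits_0003488-0003489",
            "VERSION", "edits_0003490-0003491", "edits_0003492-0003493",
            "edits_0003494-0003670", "edits_0003697-0003718",
            "edits_inprogress_0003719");
        System.out.println(purgeable(jnFiles, 3493));
    }
}
```

With those inputs the sketch selects only edits_0003337-0003487 through edits_0003492-0003493, and keeps edits_0003494-0003670 onward, which the NN still needs to replay on top of fsimage_0003493.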



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16950) Gap in edits after -initializeSharedEdits

2023-07-04 Thread Srinivasu Majeti (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739929#comment-17739929
 ] 

Srinivasu Majeti commented on HDFS-16950:
-

Hi [~kpalanisamy], could we make this a bug instead of an improvement?

> Gap in edits after -initializeSharedEdits
> -
>
> Key: HDFS-16950
> URL: https://issues.apache.org/jira/browse/HDFS-16950
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node, namenode
>Reporter: Karthik Palanisamy
>Priority: Major
>






[jira] [Commented] (HDFS-16950) Gap in edits after -initializeSharedEdits

2023-03-15 Thread Karthik Palanisamy (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700860#comment-17700860
 ] 

Karthik Palanisamy commented on HDFS-16950:
---

For example: 

 

NN meta dir:
{code:java}
-rw-r--r-- 1 hdfs hdfs  18K Mar 14 23:51 fsimage_0003493
-rw-r--r-- 1 hdfs hdfs   62 Mar 14 23:51 fsimage_0003493.md5
-rw-r--r-- 1 hdfs hdfs  193 Mar 14 23:51 VERSION
-rw-r--r-- 1 hdfs hdfs 1.0M Mar 15 00:06 edits_0003494-0003670
-rw-r--r-- 1 hdfs hdfs 2.3K Mar 15 00:13 edits_0003671-0003689
-rw-r--r-- 1 hdfs hdfs 1.0M Mar 15 00:14 edits_0003690-0003696
-rw-r--r-- 1 hdfs hdfs 2.3K Mar 15 00:18 edits_0003697-0003718
-rw-r--r-- 1 hdfs hdfs    5 Mar 15 00:18 seen_txid
-rw-r--r-- 1 hdfs hdfs 1.0M Mar 15 00:18 edits_inprogress_0003719
{code}
A JN format was then issued, which removed all the edits in the JN meta dir:
{code:java}
2023-03-15 00:22:02,321 INFO  [main] common.Storage (Storage.java:clearDirectory(442)) - Will remove files: 
[/data/dfs/jn/current/edits_0003337-0003487, 
/data/dfs/jn/current/seen_txid, 
/data/dfs/jn/current/edits_0003488-0003489, 
/data/dfs/jn/current/VERSION, 
/data/dfs/jn/current/edits_0003490-0003491, 
/data/dfs/jn/current/edits_0003492-0003493, 
/data/dfs/jn/current/edits_0003494-0003670, 
/data/dfs/jn/current/edits_0003697-0003718, 
/data/dfs/jn/current/edits_inprogress_0003719] {code}
Finally, it finalized edits_inprogress_0003719 into a new log segment:
{code:java}
(FileJournalManager.java:finalizeLogSegment(145)) - Finalizing edits file 
/data/dfs/jn/current/edits_inprogress_0003719 -> 
/data/dfs/jn/current/edits_0003719-0003736 {code}
So the transactions between the fsimage (txid 3493) and edits_inprogress (txid 3719), i.e. txids 3494 through 3718, were lost, resulting in a gap in the edit log.
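To make the failure concrete, the following hypothetical Java sketch replays the continuity check implied by the NameNode error: starting from the fsimage txid, each edit segment must begin at the next expected txid. `EditGapCheckSketch` and `firstGap` are illustrative names; this is an assumption about the shape of the check, not the NameNode's actual implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: report the first gap between the latest fsimage
 * txid and the available edit segments. Not the HDFS implementation.
 */
public class EditGapCheckSketch {
    // Finalized: edits_<first>-<last>; in progress: edits_inprogress_<first>.
    private static final Pattern SEGMENT =
        Pattern.compile("edits_(?:inprogress_(\\d+)|(\\d+)-(\\d+))");

    /** Returns null if the edits are contiguous, else a gap description. */
    static String firstGap(long imageTxId, List<String> fileNames) {
        List<long[]> segs = new ArrayList<>(); // {firstTxId, lastTxId}
        for (String name : fileNames) {
            Matcher m = SEGMENT.matcher(name);
            if (!m.matches()) continue; // skip VERSION, seen_txid, fsimage
            if (m.group(1) != null) {
                // In-progress segment: open-ended last txid.
                segs.add(new long[] {Long.parseLong(m.group(1)), Long.MAX_VALUE});
            } else {
                segs.add(new long[] {Long.parseLong(m.group(2)),
                                     Long.parseLong(m.group(3))});
            }
        }
        segs.sort(Comparator.comparingLong(s -> s[0]));
        long expected = imageTxId + 1;
        for (long[] s : segs) {
            if (s[1] < expected) continue; // already covered by the image
            if (s[0] > expected) {
                return "expected txid " + expected + ", but got txid " + s[0];
            }
            if (s[1] == Long.MAX_VALUE) return null; // reached in-progress tail
            expected = s[1] + 1;
        }
        return null;
    }

    public static void main(String[] args) {
        // Only this segment remained on the JN after the format.
        System.out.println(firstGap(3493, List.of("edits_0003719-0003736")));
    }
}
```

Run against what remained on the JN after the format, it prints "expected txid 3494, but got txid 3719". Run against the NN meta dir listing above (fsimage_0003493 followed by edits_0003494-0003670 through edits_inprogress_0003719), it reports no gap.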

> Gap in edits after -initializeSharedEdits
> -
>
> Key: HDFS-16950
> URL: https://issues.apache.org/jira/browse/HDFS-16950
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node, namenode
>Reporter: Karthik Palanisamy
>Priority: Major
>


