[ https://issues.apache.org/jira/browse/HDFS-15468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219593#comment-17219593 ]

Ayush Saxena commented on HDFS-15468:
-------------------------------------

Got some time to think about this. I think this is worth chasing. It won't be 
a very common scenario, but it can still occur in some specific cases. So I 
feel we can add a dfsAdmin API which, when the namenode is in safe mode and 
the synced txn id and the current txn id are the same, recovers the JNs.
If there are no objections to the approach, I will try this out :-)
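
Roughly, the check I have in mind (a minimal sketch for discussion only; the 
interface and method names below are made up, the one real counterpart being 
{{QuorumJournalManager#recoverUnfinalizedSegments()}}, which the NN today runs 
only when it opens the log for write, i.e. on a roll or a failover):
{code:java}
// Hypothetical sketch for HDFS-15468 -- not actual Hadoop code.
public class RecoverJournalsSketch {

  /** Minimal view of the state the check needs; all names are made up. */
  interface EditLogView {
    boolean isInSafeMode();     // NN safe-mode flag
    long getLastSyncedTxId();   // highest txid acked by a JN quorum
    long getCurrentTxId();      // highest txid the NN has handed out
    void recoverUnfinalizedSegments() throws java.io.IOException;
  }

  /**
   * Recover in-progress JN segments on demand, instead of waiting for
   * the next log roll or failover (epoch bump) to do it.
   */
  static void recoverJournals(EditLogView log) throws java.io.IOException {
    if (!log.isInSafeMode()) {
      throw new IllegalStateException("NameNode must be in safe mode");
    }
    if (log.getLastSyncedTxId() != log.getCurrentTxId()) {
      throw new IllegalStateException(
          "Unsynced edits outstanding; recovery could lose transactions");
    }
    // Same recovery the NN runs on failover: finalize or discard the
    // open edits_inprogress_* segments on the JNs.
    log.recoverUnfinalizedSegments();
  }
}
{code}
The safe-mode plus synced-txid precondition is what would make re-running the 
failover-style recovery safe here: there can be no in-flight edits for it to 
throw away.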


> Active namenode crashed when no edit recover
> --------------------------------------------
>
>                 Key: HDFS-15468
>                 URL: https://issues.apache.org/jira/browse/HDFS-15468
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, journal-node, namenode
>    Affects Versions: 3.0.0
>            Reporter: Karthik Palanisamy
>            Priority: Critical
>
> If the namenode is in safe mode and we restart two journal nodes for a 
> maintenance activity, the journal nodes will not finalize the last edit 
> segment, which is still in progress. 
>  This last segment is normally finalized or recovered on an edit-roll 
> operation, or on an epoch change due to a namenode failover.
>  But in the current scenario there is no failover; the namenode is just in 
> safe mode. If we leave safe mode, the active namenode will crash.
>  I.e.
>  the current open segment is edits_inprogress_0000000010356376710, but it is 
> not recovered or finalized after the JN2 restart. I think we need to recover 
> the edits after a JN restart. 
> {code:java}
> Journal node 
> 2020-06-20 16:11:53,458 INFO  server.Journal 
> (Journal.java:scanStorageForLatestEdits(193)) - Latest log is 
> EditLogFile(file=/hadoop/hdfs/journal/xxx/current/edits_inprogress_0000000010356376710,first=0000000010356376710,last=0000000010356376710,inProgress=true,hasCorruptHeader=false)
> 2020-06-20 16:19:06,397 INFO  ipc.Server (Server.java:logException(2435)) - 
> IPC Server handler 3 on 8485, call 
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from 
> 10.x.x.x:28444 Call#49083225 Retry#0
> org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't 
> write, no segment open
>         at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:484)
> {code}
> {code:java}
> Namenode log:
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many 
> exceptions to achieve quorum size 2/3. 1 successful responses:
> 10.x.x.x:8485: null [success]
> 2 exceptions thrown:
> 10.y.y.y:8485: Can't write, no segment open
> {code}


