[ https://issues.apache.org/jira/browse/HDFS-15468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156980#comment-17156980 ]
Karthik Palanisamy commented on HDFS-15468:
-------------------------------------------

The repro steps:
1. Keep the NameNode in safe mode.
2. Restart JN2 and JN3. Make sure the latest edits_inprogress* file is present under the JN directory.
3. Leave safe mode and write some new data.
4. The NameNode crashes with "Can't write, no segment open".

> Active namenode crashed when no edit recover
> --------------------------------------------
>
>                 Key: HDFS-15468
>                 URL: https://issues.apache.org/jira/browse/HDFS-15468
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Karthik Palanisamy
>            Priority: Critical
>
> If the NameNode is in safe mode and two JournalNodes are restarted for
> maintenance, the restarted JournalNodes do not finalize the last edit
> segment, which is still in progress.
> That segment is finalized or recovered only when the edit log is rolled, or
> when the epoch changes because of a NameNode failover.
> In this scenario there is no failover; the NameNode is simply in safe mode.
> Once safe mode is left, the active NameNode crashes.
> For example, the currently open segment is
> edits_inprogress_0000000010356376710, but it is not recovered or finalized
> after the JN2 restart. I think we need to recover the edits after a JN restart.
> {code:java}
> Journal node
> 2020-06-20 16:11:53,458 INFO server.Journal (Journal.java:scanStorageForLatestEdits(193)) - Latest log is EditLogFile(file=/hadoop/hdfs/journal/xxx/current/edits_inprogress_0000000010356376710,first=0000000010356376710,last=0000000010356376710,inProgress=true,hasCorruptHeader=false)
> 2020-06-20 16:19:06,397 INFO ipc.Server (Server.java:logException(2435)) - IPC Server handler 3 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from 10.x.x.x:28444 Call#49083225 Retry#0
> org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write, no segment open
>   at org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:484)
> {code}
> {code:java}
> Namenode log:
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 1 successful responses:
> 10.x.x.x:8485: null [success]
> 2 exceptions thrown:
> 10.y.y.y:8485: Can't write, no segment open
> {code}
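To make the failure mode easier to follow, here is a minimal toy model of it. It is only a sketch and does not use any Hadoop classes: `ToyJournalNode`, `restart()`, `reopenSegment()` and `writeToQuorum()` are hypothetical names invented for illustration, and the real recovery path in the QJM is more involved. It assumes three journal nodes and a simple majority rule, and shows that once two of them have no open segment, a write can no longer reach quorum.

```java
import java.util.ArrayList;
import java.util.List;

public class JournalQuorumSketch {

    /** Toy stand-in for a journal node; NOT the real Hadoop Journal class. */
    static final class ToyJournalNode {
        private final String name;
        private boolean segmentOpen = true;

        ToyJournalNode(String name) {
            this.name = name;
        }

        /** After a restart the in-progress file is still on disk, but no segment is open. */
        void restart() {
            segmentOpen = false;
        }

        /** Hypothetical stand-in for an edit roll or recovery that opens a segment again. */
        void reopenSegment() {
            segmentOpen = true;
        }

        /** Rejects the write when no segment is open, mirroring the logged error text. */
        void journal(long txId) {
            if (!segmentOpen) {
                throw new IllegalStateException(name + ": Can't write, no segment open");
            }
        }
    }

    /** Returns true when a strict majority of the nodes accepted the write. */
    static boolean writeToQuorum(List<ToyJournalNode> nodes, long txId) {
        int successes = 0;
        for (ToyJournalNode jn : nodes) {
            try {
                jn.journal(txId);
                successes++;
            } catch (IllegalStateException e) {
                System.out.println(e.getMessage());
            }
        }
        int majority = nodes.size() / 2 + 1;
        return successes >= majority;
    }

    /** Builds a three-node toy cluster with every segment open. */
    static List<ToyJournalNode> cluster() {
        List<ToyJournalNode> nodes = new ArrayList<>();
        nodes.add(new ToyJournalNode("JN1"));
        nodes.add(new ToyJournalNode("JN2"));
        nodes.add(new ToyJournalNode("JN3"));
        return nodes;
    }

    public static void main(String[] args) {
        List<ToyJournalNode> nodes = cluster();
        System.out.println("before restart: " + writeToQuorum(nodes, 1L));

        // Repro step 2: restart JN2 and JN3 while the NameNode is in safe mode.
        nodes.get(1).restart();
        nodes.get(2).restart();

        // Repro step 3: the first write after leaving safe mode.
        System.out.println("after restart: " + writeToQuorum(nodes, 2L));
    }
}
```

In this model, reopening the segment on just one of the restarted nodes is enough for writes to reach a majority again, which is the intuition behind recovering the edits after a JN restart.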