[ https://issues.apache.org/jira/browse/HDFS-14075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695235#comment-16695235 ]

Rong Tang commented on HDFS-14075:
----------------------------------

[~goiri] [~ayushtkn]  Here is my understanding after reading the code.

First, the active NN gets an RPC call to roll the edit log, i.e. start a new 
segment. Here it throws an exception because of "too few journals successfully 
started." That exception is swallowed, leaving editLogStream null and the state 
incorrectly set to IN_SEGMENT.
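To make that sequence concrete, here is a standalone toy model of the first step 
(the class and method names are only stand-ins, not the real FSEditLog/JournalSet 
code):

{code:java}
import java.io.IOException;

// Toy model only: shows how a swallowed segment-start failure can leave the
// stream field null while the in-memory state already claims IN_SEGMENT.
public class RollSegmentSketch {
  enum State { BETWEEN_LOG_SEGMENTS, IN_SEGMENT }

  static Object editLogStream = null;               // stand-in for EditLogOutputStream
  static State state = State.BETWEEN_LOG_SEGMENTS;

  static void startLogSegment(long txId) throws IOException {
    // In the failing run, the journal call threw before the stream was assigned:
    // "Unable to start log segment 7964819: too few journals successfully started."
    throw new IOException("too few journals successfully started.");
  }

  public static void main(String[] args) {
    try {
      state = State.IN_SEGMENT;                     // ordering here is illustrative only
      startLogSegment(7964819L);
      editLogStream = new Object();                 // never reached on failure
    } catch (IOException swallowed) {
      // Failure swallowed: editLogStream stays null, state still says IN_SEGMENT.
    }
    System.out.println("stream=" + editLogStream + ", state=" + state);
  }
}
{code}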

Second, the active NN checks in a loop whether the edit log wants to sync.

The current behavior is that it throws an NPE because editLogStream is null. I 
would prefer to return false when editLogStream is null, meaning that it is not 
ready to sync.
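
Roughly what I have in mind, as a minimal sketch (the field and method names here 
are illustrative, not the exact ones in FSEditLog):

{code:java}
// Illustrative guard only; in FSEditLog the field is the real
// EditLogOutputStream and the check sits on the sync path that currently NPEs.
class EditLogSyncGuardSketch {
  private Object editLogStream;   // null when the last segment start failed

  /** Answer "does the edit log want to sync?" without dereferencing null. */
  synchronized boolean wantsSync() {
    if (editLogStream == null) {
      // Segment start failed ("too few journals successfully started"),
      // so there is nothing to sync yet; report not-ready instead of NPE.
      return false;
    }
    // ... existing "should this stream be synced" logic would go here ...
    return true;
  }
}
{code}

The intent is that the edit-log sync path just keeps reporting "not ready" until 
a segment actually gets started, instead of dying on the NPE.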

I think the question is: when the active NN fails to start a new segment on the 
JNs, should it terminate, or should it continue without syncing until the JNs 
come back?

 

[~ayushtkn] What triggers it? And do you have a way to repro it?

 

One related check-in: 
[https://github.com/apache/hadoop/commit/3b00eaea256d252be3361a7d9106b88756fcb9ba]

 

 

> NPE while Edit Logging
> ----------------------
>
>                 Key: HDFS-14075
>                 URL: https://issues.apache.org/jira/browse/HDFS-14075
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Critical
>         Attachments: HDFS-14075-01.patch, HDFS-14075-02.patch, 
> HDFS-14075-03.patch, HDFS-14075-04.patch, HDFS-14075-04.patch, 
> HDFS-14075-04.patch, HDFS-14075-05.patch, HDFS-14075-06.patch
>
>
> {noformat}
> 2018-11-10 18:59:38,427 FATAL 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Exception while edit 
> logging: null
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:481)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:288)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:232)
>  at java.lang.Thread.run(Thread.java:745)
> 2018-11-10 18:59:38,532 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1: Exception while edit logging: null
> 2018-11-10 18:59:38,552 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> SHUTDOWN_MSG:
> {noformat}
> Before NPE Received the following Exception
> {noformat}
> INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 65110, call 
> Call#23241 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog from 
> XXXXXXXX
> java.io.IOException: Unable to start log segment 7964819: too few journals 
> successfully started.
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:1385)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegmentAndWriteHeaderTxn(FSEditLog.java:1395)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1319)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1352)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4669)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1293)
>       at 
> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:146)
>       at 
> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12974)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:824)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2684)
> Caused by: java.io.IOException: starting log segment 7964819 failed for too 
> many journals
>       at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:412)
>       at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:207)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:1383)
>       ... 15 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
