[ 
https://issues.apache.org/jira/browse/HDFS-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697582#comment-17697582
 ] 

Karthik Palanisamy commented on HDFS-16849:
-------------------------------------------

Yes Arpit, the SNN keeps retrying, but every attempt fails until we reboot 
the namenode:
local exception: org.apache.hadoop.security.KerberosAuthException: Login 
failure for user: hdfs/xxxx  javax.security.auth.login.LoginException: Client 
not found in Kerberos database (6)]
The real problem is that our checkpoint did not run. 

Customers assume the checkpoint is working because the SNN is up. But in 
reality, the SNN is in a dead state. 

> Terminate SNN when failing to perform EditLogTailing
> ----------------------------------------------------
>
>                 Key: HDFS-16849
>                 URL: https://issues.apache.org/jira/browse/HDFS-16849
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Karthik Palanisamy
>            Priority: Major
>
> We should terminate the SNN if edit log tailing fails because too few JNs 
> respond. We found this after a Kerberos error. 
> {code:java}
> 2022-10-14 10:53:16,796 INFO 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms 
> (timeout=20000 ms) for a response for selectStreamingInputStreams. Exceptions 
> so far: [xxxx:8485:  DestHost:destPort xxxx:8485 , LocalHost:localPort 
> xxxx/xxxx:0. Failed on local exception: 
> org.apache.hadoop.security.KerberosAuthException: Login failure for user: 
> hdfs/xxxx  javax.security.auth.login.LoginException: Client not found in 
> Kerberos database (6)]
> 2022-10-14 10:53:30,796 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input 
> streams from QJM to [xxxx:8485, yyyy:8485, zzzz:8485]. Skipping.
> java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to 
> respond.
>         at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:138)
>         at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectStreamingInputStreams(QuorumJournalManager.java:605)
>         at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:523)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:269)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1673)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1706)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:311)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:464)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:414)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:431)
>         at java.base/java.security.AccessController.doPrivileged(Native 
> Method)
>         at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>         at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:427)
>  {code}
>  
> We have no check for whether a sufficient number of JNs responded: 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java#L280]
> So we should implement a check similar to this one:
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java#L395]
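
The check proposed in the description could look roughly like the sketch 
below. This is only an illustration, not Hadoop code: the class and method 
names (QuorumCheckSketch, quorumSize, shouldTerminate) are hypothetical, and 
it assumes a simple majority quorum over the configured JNs.

```java
// Hypothetical sketch, not part of Hadoop: decides whether a standby
// should give up after a tailing attempt, assuming a majority quorum.
final class QuorumCheckSketch {

    private QuorumCheckSketch() {
    }

    /** Smallest number of JNs that forms a majority of totalJournals. */
    static int quorumSize(int totalJournals) {
        return totalJournals / 2 + 1;
    }

    /**
     * Returns true when the JNs that did respond are too few to form a
     * quorum, which is the condition under which the issue proposes
     * terminating the SNN instead of retrying forever.
     */
    static boolean shouldTerminate(int totalJournals, int failedJournals) {
        int responded = totalJournals - failedJournals;
        return responded < quorumSize(totalJournals);
    }

    public static void main(String[] args) {
        // Three JNs, as in the log above (xxxx, yyyy, zzzz).
        System.out.println(shouldTerminate(3, 1));
        System.out.println(shouldTerminate(3, 3));
    }
}
```

With three JNs, one failure still leaves a quorum of two, so the sketch 
keeps the SNN running. When all three fail, as in the Kerberos case, it 
reports that the SNN should terminate.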


