[ https://issues.apache.org/jira/browse/HDFS-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812538#comment-15812538 ]
Vinitha Reddy Gankidi commented on HDFS-10733:
----------------------------------------------

[~kihwal] Thanks for the great suggestion. I have attached a patch that extends the end time (timeout) when the NN has a long pause due to a full GC. The included unit test asserts that, when the journal nodes genuinely do not respond, a timeout exception is still thrown rather than the timeout being extended as it is after a full GC. Please take a look.

> NameNode terminated after full GC thinking QJM is unresponsive.
> ---------------------------------------------------------------
>
>                 Key: HDFS-10733
>                 URL: https://issues.apache.org/jira/browse/HDFS-10733
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, qjm
>    Affects Versions: 2.6.4
>            Reporter: Konstantin Shvachko
>            Assignee: Vinitha Reddy Gankidi
>         Attachments: HDFS-10733.001.patch
>
>
> NameNode went into full GC while in {{AsyncLoggerSet.waitForWriteQuorum()}}.
> After completing GC it checks if the timeout for quorum is reached. If the GC
> was long enough the timeout can expire, and {{QuorumCall.waitFor()}} will
> throw {{TimeoutException}}. Finally {{FSEditLog.logSync()}} catches the
> exception and terminates NameNode.
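
For readers following the thread, the idea of extending the timeout after a long pause can be sketched roughly as below. This is a minimal illustration and not the code in HDFS-10733.001.patch: the class name PauseAwareDeadline, its methods, and the pause threshold are all hypothetical. It assumes the waiting thread polls at short, regular intervals, so a gap between two polls that exceeds the threshold is treated as a local stall (such as a full GC) and credited back to the deadline.

```java
// Hypothetical sketch, NOT the HDFS-10733 patch: a deadline that is pushed
// out when the caller itself was stalled (for example by a full GC).
final class PauseAwareDeadline {
    private final long pauseThresholdMs; // gaps longer than this count as a pause
    private long deadlineMs;             // absolute time at which the wait expires
    private long lastCheckMs;            // time of the previous poll

    PauseAwareDeadline(long nowMs, long timeoutMs, long pauseThresholdMs) {
        this.deadlineMs = nowMs + timeoutMs;
        this.lastCheckMs = nowMs;
        this.pauseThresholdMs = pauseThresholdMs;
    }

    /**
     * Called on every poll with the current time. If the gap since the
     * previous poll exceeds the threshold, the deadline is extended by
     * that gap before the expiry check is made.
     */
    boolean expired(long nowMs) {
        long gapMs = nowMs - lastCheckMs;
        lastCheckMs = nowMs;
        if (gapMs > pauseThresholdMs) {
            deadlineMs += gapMs; // credit the stalled time back to the waiter
        }
        return nowMs >= deadlineMs;
    }

    long deadlineMs() {
        return deadlineMs;
    }

    public static void main(String[] args) {
        // Simulated clock values stand in for real timestamps.
        PauseAwareDeadline d = new PauseAwareDeadline(0L, 1000L, 500L);
        System.out.println("after normal poll: expired=" + d.expired(100L));
        System.out.println("after long pause:  expired=" + d.expired(5100L));
        System.out.println("deadline is now " + d.deadlineMs() + " ms");
    }
}
```

With regular polling and no long gaps, the deadline is never extended, so unresponsive peers still cause an expiry at the original timeout. That matches the behavior the unit test in the comment is described as checking. A real implementation would also need to decide how to cap repeated extensions, which this sketch does not address.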