[ https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246282#comment-15246282 ]
Daniel Templeton commented on MAPREDUCE-6657:
---------------------------------------------

Thanks for the patch, [~haibochen].

I hate that HDFS expects you to parse the text of their exceptions to figure out what's going on. Wanna look into whether the API would allow you to throw a properly typed exception? Maybe just file a followup JIRA?

In your test code, it would be nice to add a javadoc header that explains what you're testing.

I don't love that you're running two mini-clusters and ignoring one of them. Is there any way to do the test with the existing mini-cluster without disrupting the other tests? If not, I'd consider creating a new test class so that you don't have two mini-clusters running.

Is 2000ms the shortest reasonable duration for the timeout? Seems long to me...

{code}
Assert.assertEquals("Job History Server is expected to time out.",
{code}

Your assert message is misleading. It should instead say that it didn't get the expected error message.

> job history server can fail on startup when NameNode is in start phase
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6657
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6657
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: mapreduce6657.001.patch, mapreduce6657.002.patch
>
>
> Job history server will try to create a history directory in HDFS on startup.
> When NameNode is in safe mode, it will keep retrying for a configurable time
> period. However, it should also keep retrying if the NameNode is still in its
> startup phase. Safe mode does not happen until the NN is out of the startup phase.
> A RetriableException with the text "NameNode still not started" is thrown
> when the NN is in its internal service startup phase. We should add the check
> for this specific exception in isBecauseSafeMode() to account for that.
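
For illustration only, here is a rough sketch of the kind of check the description proposes. It is not the code from the attached patches. The class and method names (StartupRetryCheck, isRetriableStartupFailure) are hypothetical, the local RetriableException is a stand-in for the Hadoop class so the snippet compiles on its own, and matching safe mode by the text "SafeModeException" is an assumption. Only the "NameNode still not started" text comes from the issue description.

```java
// Hypothetical stand-in for the Hadoop RetriableException, so that this
// sketch compiles without any Hadoop dependency.
class RetriableException extends java.io.IOException {
    RetriableException(String message) {
        super(message);
    }
}

// Hypothetical helper; the description proposes extending isBecauseSafeMode().
final class StartupRetryCheck {
    // Text quoted in the issue description.
    static final String NN_NOT_STARTED = "NameNode still not started";
    // Assumption: safe mode is recognized by this marker in the exception text.
    static final String SAFE_MODE = "SafeModeException";

    private StartupRetryCheck() {
    }

    // Returns true if the failure, or any exception in its cause chain,
    // suggests the NameNode is in safe mode or has not finished starting.
    static boolean isRetriableStartupFailure(Throwable ex) {
        for (Throwable t = ex; t != null; t = t.getCause()) {
            // toString() yields "<class name>: <message>".
            String text = t.toString();
            if (text.contains(SAFE_MODE)) {
                return true;
            }
            // Requiring the type as well as the text keeps the string
            // match from firing on unrelated exceptions.
            if (t instanceof RetriableException && text.contains(NN_NOT_STARTED)) {
                return true;
            }
        }
        return false;
    }
}
```

The text match is as brittle as the review comment says: if the message wording changes, the check silently stops working. A properly typed exception would avoid that.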