[jira] [Commented] (MAPREDUCE-6657) job history server can fail on startup when NameNode is in start phase

Daniel Templeton (JIRA) Mon, 18 Apr 2016 11:38:18 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246282#comment-15246282 ]


Daniel Templeton commented on MAPREDUCE-6657:
---------------------------------------------

Thanks for the patch, [~haibochen].

I hate that HDFS expects you to parse the text of their exceptions to figure 
out what's going on.  Wanna look into whether the API would allow you to throw 
a properly typed exception?  Maybe just file a followup JIRA?
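To illustrate the difference, here is a minimal, hypothetical Java sketch. It compares matching on exception text with checking a dedicated exception type. `NameNodeNotStartedException` does NOT exist in HDFS; it is invented here only to show what a typed exception would buy. Over RPC, a real typed exception would also have to be unwrapped on the client side, and this sketch ignores that.

```java
import java.io.IOException;

/**
 * Hypothetical sketch: text matching vs. a typed exception.
 * NameNodeNotStartedException is invented for illustration; HDFS has no such class.
 */
public final class TypedExceptionSketch {

    /** Hypothetical typed exception for the NameNode startup phase. */
    static class NameNodeNotStartedException extends IOException {
        private static final long serialVersionUID = 1L;

        NameNodeNotStartedException(String msg) {
            super(msg);
        }
    }

    private TypedExceptionSketch() {}

    // Fragile: breaks silently if HDFS ever rewords the message.
    static boolean isStartupByText(Throwable ex) {
        return ex.toString().contains("NameNode still not started");
    }

    // Robust: depends only on the exception type, searched along the cause chain.
    static boolean isStartupByType(Throwable ex) {
        for (Throwable t = ex; t != null; t = t.getCause()) {
            if (t instanceof NameNodeNotStartedException) {
                return true;
            }
        }
        return false;
    }
}
```

With a reworded message, the text check misses the condition but the type check still catches it.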

In your test code, it would be nice to add a javadoc header that explains what 
you're testing.

I don't love that you're running two mini-clusters and ignoring one of them.  
Is there any way to do the test with the existing mini-cluster without 
disrupting the other tests?  If not, I'd consider creating a new test class so 
that you don't have two mini-clusters running.

Is 2000ms the shortest reasonable duration for the timeout?  Seems long to me...

{code}
      Assert.assertEquals("Job History Server is expected to time out.",
{code}

Your assert message is misleading.  It should instead say that it didn't get 
the expected error message.

> job history server can fail on startup when NameNode is in start phase
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6657
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6657
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: mapreduce6657.001.patch, mapreduce6657.002.patch
>
>
> The job history server tries to create a history directory in HDFS on startup. 
> When the NameNode is in safe mode, it keeps retrying for a configurable time 
> period.  However, it should also keep retrying if the NameNode is in its 
> startup phase. Safe mode does not happen until the NN is out of the startup 
> phase. A RetriableException with the text "NameNode still not started" is 
> thrown when the NN is in its internal service startup phase. We should add a 
> check for this specific exception in isBecauseSafeMode() to account for that.
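A minimal Java sketch of the behaviour the description asks for. The job history server's actual isBecauseSafeMode() and retry loop may differ. The class and method names below are hypothetical, and the "SafeModeException" marker string is an assumption. The sketch keeps retrying while the failure looks like either safe mode or the "NameNode still not started" startup phase, until a configurable timeout expires. Any other failure is rethrown immediately.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch of the MAPREDUCE-6657 idea, not the actual Hadoop code:
 * retry while the NameNode is in safe mode OR still starting up.
 */
public final class HistoryDirRetrySketch {

    // Message the issue says the RetriableException carries during NN startup.
    static final String NN_NOT_STARTED_MSG = "NameNode still not started";

    private HistoryDirRetrySketch() {}

    /** Walks the cause chain looking for safe-mode or NN-startup markers (assumed strings). */
    static boolean isBecauseSafeMode(Throwable ex) {
        for (Throwable t = ex; t != null; t = t.getCause()) {
            String text = t.toString();
            if (text.contains("SafeModeException") || text.contains(NN_NOT_STARTED_MSG)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the action, sleeping sleepMs between attempts while the failure is
     * transient, until timeoutMs has elapsed; other failures propagate at once.
     */
    static <T> T retryWhileStarting(Callable<T> action, long timeoutMs, long sleepMs)
            throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            try {
                return action.call();
            } catch (Exception e) {
                if (!isBecauseSafeMode(e) || System.currentTimeMillis() >= deadline) {
                    throw e;
                }
                Thread.sleep(sleepMs);
            }
        }
    }
}
```

Consider an action that fails twice with the startup message and then succeeds. It returns after the third attempt. An IOException such as "Permission denied" propagates without any retry.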



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
