[jira] [Commented] (AMBARI-2617) History server should be managed as separate component

Jaimin D Jetly (JIRA) Thu, 20 Feb 2014 15:46:36 -0800

    [ https://issues.apache.org/jira/browse/AMBARI-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907717#comment-13907717 ]


Jaimin D Jetly commented on AMBARI-2617:
----------------------------------------

This was fixed by the AMBARI-4222 commit.

> History server should be managed as separate component
> ------------------------------------------------------
>
>                 Key: AMBARI-2617
>                 URL: https://issues.apache.org/jira/browse/AMBARI-2617
>             Project: Ambari
>          Issue Type: Improvement
>    Affects Versions: 1.2.4
>            Reporter: Jeff Sposetti
>            Assignee: Arsen Babych
>
> Ambari does not currently track the History Server as a separate master 
> component of the MapReduce service. This makes it hard to diagnose problems 
> starting MapReduce unless you already know to go onto the host and check the 
> History Server logs.
> The History Server should be a separate component, similar to the JobTracker. 
> I think it is OK to always place the History Server on the same machine as 
> the JobTracker, but it needs to be handled just like the JobTracker, with 
> distinct and clear start/stop operation results and host component 
> start/stop controls.
> The problem with not having a separate History Server component is easy to 
> see:
> 1) Stop HDFS and MapReduce
> 2) Start only MapReduce
> 3) The start MapReduce operation fails because the MapReduce Check 
> execution fails
> 4) Nothing indicates that a component failed to start (JobTracker shows 
> started OK, which is true)
> 5) MapReduce shows a green dot, as if it started OK
> 6) Go to the Hosts > Host page; the JobTracker is running
> 7) You conclude that everything started fine and begin to suspect that 
> something is wrong with the MapReduce configs or something...
> Problem: the Hosts > Host page doesn't list the History Server, so you don't 
> know it failed to start. The operations also didn't show a distinct History 
> Server start failure, so the user wasn't aware of it.
> Once you figure out that the History Server didn't start, you go onto the 
> machine and see that the historyserver process isn't running. Then you figure 
> out how to check the logs and see that it failed to start completely (because 
> the NameNode (NN) isn't up).
> Note: we do have a Nagios alert watching the History Server web UI, so there 
> is an alert for it. But that alert alone is not enough to help people 
> troubleshoot what is wrong in their cluster related to the History Server.
> 2013-06-06 07:43:38,930 FATAL org.apache.hadoop.mapred.JobHistoryServer: java.net.ConnectException: Call to xx-xx-xx-xx/xx-xx-xx-xx:8020 failed on connection exception: java.net.ConnectException: Connection refused
> at org.apache.hadoop.ipc.Client.wrapException(Client.java:1147)
> at org.apache.hadoop.ipc.Client.call(Client.java:1123)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
> at $Proxy5.getProtocolVersion(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
