Jeff Sposetti created AMBARI-2617:
-------------------------------------

             Summary: History server should be managed as separate component
                 Key: AMBARI-2617
                 URL: https://issues.apache.org/jira/browse/AMBARI-2617
             Project: Ambari
          Issue Type: Improvement
    Affects Versions: 1.2.4
            Reporter: Jeff Sposetti


Ambari does not currently track the history server as a separate master
component of the MapReduce service. This makes it hard to troubleshoot problems
starting MapReduce unless you already know to go onto the host and check the
history server logs.

The history server should be a separate component, similar to the JobTracker. I
think it is OK to always place the history server on the same machine as the
JobTracker, but it needs to be handled just like the JobTracker, with distinct
and clear start/stop operation results and host component start/stop controls.

The problem with not having the history server separate is easy to see:

1) Stop HDFS and MapReduce
2) Start only MapReduce
3) The start MapReduce operation fails because the MapReduce Check execute
fails
4) Nothing indicates that a component failed to start (JobTracker shows
started OK, which is true)
5) MapReduce shows a green dot, as if it started OK
6) Go to the Hosts > Host page and JobTracker is running
7) You conclude that everything started fine, so you begin to suspect
something is wrong with the MapReduce configs or something else...

Problem: the Hosts > Host page doesn't list the history server, so you don't
know it failed to start. And the operations didn't show a distinct "history
server failed to start" result, so the user wasn't aware of the failure.

Once you figure out that the history server didn't start, you go onto the
machine and see that the historyserver process isn't running. Then you figure
out how to check the logs and see that it failed to start (because the
NameNode, NN, isn't up).

Note: we do have a Nagios alert watching the history server web UI, so that
does raise an alert. But that alert alone is not enough to help people
troubleshoot what is wrong in their cluster related to the history server.

2013-06-06 07:43:38,930 FATAL org.apache.hadoop.mapred.JobHistoryServer: 
java.net.ConnectException: Call to xx-xx-xx-xx/xx-xx-xx-xx:8020 failed on 
connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1147)
at org.apache.hadoop.ipc.Client.call(Client.java:1123)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at $Proxy5.getProtocolVersion(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
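
The trace above shows the history server giving up because the connection to
port 8020 was refused. Until Ambari reports this as a distinct failure, a quick
manual check of that port can save a trip through the logs. The following is
only a minimal sketch, assuming Python 3 is available on the host; the function
name port_is_open and the 3-second timeout are illustrative choices, not
anything provided by Ambari or Hadoop.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for manual troubleshooting; not part of Ambari.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For example, port_is_open("nn-host.example.com", 8020) (the host name here is a
placeholder) returning False would be consistent with the "Connection refused"
error in the log. It only tests TCP reachability, so a True result does not
prove the service behind the port is healthy.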


