Jeff Sposetti created AMBARI-2617:
-------------------------------------
Summary: History server should be managed as separate component
Key: AMBARI-2617
URL: https://issues.apache.org/jira/browse/AMBARI-2617
Project: Ambari
Issue Type: Improvement
Affects Versions: 1.2.4
Reporter: Jeff Sposetti
Ambari does not currently track the History Server as a separate master
component of the MapReduce service. This makes it hard to diagnose problems
starting MapReduce unless you know to go onto the host and check the History
Server logs. The History Server should be a separate component, similar to the
JobTracker. I think it will be OK if the History Server is always placed on the
same machine as the JobTracker, but it needs to be handled just like the
JobTracker: with distinct and clear start/stop operation results, and with host
component start/stop controls.
You can easily see the problem caused by not having the History Server as a
separate component:
1) Stop HDFS and MapReduce.
2) Start only MapReduce.
3) You'll see that the Start MapReduce operation fails because the MapReduce
Check execution fails.
4) There is no indication anywhere that something failed to start (the
JobTracker shows as started OK, which is true).
5) MapReduce shows a green dot, indicating it started OK.
6) Go to the Hosts > Host page: the JobTracker is running.
7) So you think everything started fine, and you start suspecting that
something might be wrong with the MapReduce configs or something else...
Problem: the Hosts > Host page doesn't list the History Server, so you don't
know it failed to start. And the operations didn't show a distinct "History
Server failed to start" result, so the user wasn't aware of the failure.
Once you figure out that the History Server didn't start, you go onto the
machine and see that the historyserver process isn't running. Then you figure
out how to check the logs and see that it failed to start entirely (because the
NameNode isn't up).
Note: we do have a Nagios alert watching the History Server web UI, so that
failure does raise an alert. But that alert alone is not enough to help people
troubleshoot what is wrong in their cluster related to the History Server.
History Server log from the failed start (NameNode not running):
2013-06-06 07:43:38,930 FATAL org.apache.hadoop.mapred.JobHistoryServer:
java.net.ConnectException: Call to xx-xx-xx-xx/xx-xx-xx-xx:8020 failed on
connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1147)
at org.apache.hadoop.ipc.Client.call(Client.java:1123)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at $Proxy5.getProtocolVersion(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
--
This message is automatically generated by JIRA.