[ https://issues.apache.org/jira/browse/AMBARI-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907717#comment-13907717 ]
Jaimin D Jetly commented on AMBARI-2617:
----------------------------------------
This was fixed via AMBARI-4222 commit.
> History server should be managed as separate component
> ------------------------------------------------------
>
> Key: AMBARI-2617
> URL: https://issues.apache.org/jira/browse/AMBARI-2617
> Project: Ambari
> Issue Type: Improvement
> Affects Versions: 1.2.4
> Reporter: Jeff Sposetti
> Assignee: Arsen Babych
>
> Ambari currently does not track the history server as a separate master
> component of the MapReduce service. This makes it hard to diagnose problems
> starting MapReduce unless you know to go onto the host and check the
> history server logs.
> The history server should be a separate component, similar to the JobTracker.
> I think it would be OK to always place the history server on the same machine
> as the JobTracker, but it needs to be handled just like the JobTracker, with
> distinct and clear start/stop operation results and host component start/stop
> controls.
> The problem is easy to see when the history server is not a separate component:
> 1) Stop HDFS and MapReduce.
> 2) Start only MapReduce.
> 3) The start MapReduce operation fails because the MapReduce Check execution
> fails.
> 4) Nothing indicates that anything failed to start (the JobTracker shows as
> started OK, which is true).
> 5) MapReduce shows a green dot, indicating it started OK.
> 6) On the Hosts > Host page, the JobTracker is running.
> 7) So you assume everything started fine and begin to suspect something is
> wrong with the MapReduce configuration or something else...
> Problem: the Hosts > Host page doesn't list the history server, so you don't
> know it failed to start. The operations also didn't show a distinct history
> server start failure, so the user wasn't aware of it.
> Once you figure out that the history server didn't start, you go onto the
> machine and see that the historyserver process isn't running. Then you figure
> out how to check the logs and see that it failed to start completely (because
> the NameNode isn't up).
> Note: we do have a Nagios alert watching the history server web UI, so there
> is an alert. But that alert alone is not enough to help people troubleshoot
> what is wrong with the history server in their cluster.
> 2013-06-06 07:43:38,930 FATAL org.apache.hadoop.mapred.JobHistoryServer: java.net.ConnectException: Call to xx-xx-xx-xx/xx-xx-xx-xx:8020 failed on connection exception: java.net.ConnectException: Connection refused
> at org.apache.hadoop.ipc.Client.wrapException(Client.java:1147)
> at org.apache.hadoop.ipc.Client.call(Client.java:1123)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
> at $Proxy5.getProtocolVersion(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)