Siddharth Wagle created AMBARI-2041:
---------------------------------------
Summary: If a host that has a service client installed is down, service
start will fail
Key: AMBARI-2041
URL: https://issues.apache.org/jira/browse/AMBARI-2041
Project: Ambari
Issue Type: Bug
Components: controller
Affects Versions: 1.3.0
Reporter: Siddharth Wagle
Assignee: Siddharth Wagle
Fix For: 1.3.0
In condor, a service start may include a client install on some hosts. If a
host where a client is being installed is down (heartbeat lost), the service
start fails. This is because the success factor for clients (tested with
MAPREDUCE_CLIENT) is 1, so a single failure fails the stage. A service start
has three stages: one for installs, one for starts, and one for the service
check. When the install stage fails, the later stages are aborted.
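To make the failure mode concrete, here is a minimal Python sketch of ordered stage execution with per-role success factors. It is not Ambari's controller code: the names `Stage`, `Task`, `Status` and `run_request`, the rule that a role passes when its completed fraction is at least its success factor, and the 1.0 default for roles without an explicit factor are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    TIMEDOUT = "TIMEDOUT"
    ABORTED = "ABORTED"


@dataclass
class Task:
    host: str
    role: str  # e.g. "MAPREDUCE_CLIENT", "JOBTRACKER"
    status: Status = Status.PENDING


@dataclass
class Stage:
    name: str
    tasks: list
    # Hypothetical per-role success factors; roles not listed default to 1.0,
    # i.e. every task for that role must complete (as reported for clients).
    success_factors: dict = field(default_factory=dict)

    def succeeded(self) -> bool:
        by_role = {}
        for t in self.tasks:
            by_role.setdefault(t.role, []).append(t)
        for role, tasks in by_role.items():
            done = sum(1 for t in tasks if t.status is Status.COMPLETED)
            if done / len(tasks) < self.success_factors.get(role, 1.0):
                return False
        return True


def run_request(stages, execute):
    """Run stages in order; once a stage fails, abort every later stage."""
    failed = False
    for stage in stages:
        if failed:
            for t in stage.tasks:
                t.status = Status.ABORTED
            continue
        for t in stage.tasks:
            t.status = execute(t)
        failed = not stage.succeeded()
    return not failed


def execute(task):
    # Simulate a lost heartbeat on h2: any task sent there times out.
    return Status.TIMEDOUT if task.host == "h2" else Status.COMPLETED


# A service start: install stage, start stage, service-check stage.
install = Stage("install", [Task("h1", "MAPREDUCE_CLIENT"),
                            Task("h2", "MAPREDUCE_CLIENT")])
start = Stage("start", [Task("h1", "JOBTRACKER")])
check = Stage("check", [Task("h1", "MAPREDUCE_SERVICE_CHECK")])
ok = run_request([install, start, check], execute)
```

In this sketch the single timed-out client install on `h2` fails the install stage, and the start and check tasks on the healthy host `h1` end up ABORTED without ever running. Lowering the hypothetical `MAPREDUCE_CLIENT` factor to 0.5 would let the same install stage pass with one of two installs completed.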
A few observations:
- The client goes to the INSTALL_FAILED state, so a second attempt skips
  installing the client and therefore succeeds in starting the service. (This
  is a bug: we should retry installing a component that is in the
  INSTALL_FAILED state. At this point, however, the bug is what saves us.)
- The service check can be scheduled on a host that is in the UNHEALTHY or
  UNKNOWN state, and can then fail.
- The service can no longer be stopped, because:
  - The stop command sees the INSTALL_FAILED state and schedules an INSTALL
    task for the client, which fails.
  - The STOP commands for the other components are in a later stage and are
    aborted.
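The stop failure can be sketched the same way. This is a hypothetical model, not Ambari code: `plan_stop`, `run`, the `State` enum and the fixed INSTALL-before-STOP stage order are assumptions chosen to mirror the sequence described above, and every stage is treated as having a success factor of 1.

```python
from enum import Enum


class State(Enum):
    INSTALLED = "INSTALLED"
    STARTED = "STARTED"
    INSTALL_FAILED = "INSTALL_FAILED"


# Hypothetical stage ordering: install commands are staged before stops.
STAGE_ORDER = ["INSTALL", "STOP"]


def plan_stop(components):
    """components: list of (host, component, current_state).

    Returns ordered stages, each a list of (host, component, command).
    """
    stages = {cmd: [] for cmd in STAGE_ORDER}
    for host, comp, state in components:
        if state is State.INSTALL_FAILED:
            # Behaviour reported in the issue: a component whose install
            # failed gets an INSTALL task instead of being skipped.
            stages["INSTALL"].append((host, comp, "INSTALL"))
        elif state is State.STARTED:
            stages["STOP"].append((host, comp, "STOP"))
    return [stages[cmd] for cmd in STAGE_ORDER if stages[cmd]]


def run(stages, host_up):
    """Execute stages in order; any failure aborts all later stages."""
    results = []
    failed = False
    for stage in stages:
        stage_failed = False
        for host, comp, cmd in stage:
            if failed:
                status = "ABORTED"
            else:
                status = "COMPLETED" if host_up(host) else "FAILED"
                stage_failed = stage_failed or status == "FAILED"
            results.append((host, comp, cmd, status))
        failed = failed or stage_failed
    return results


components = [
    ("h1", "JOBTRACKER", State.STARTED),
    ("h1", "TASKTRACKER", State.STARTED),
    ("h2", "MAPREDUCE_CLIENT", State.INSTALL_FAILED),  # h2 is down
]
stages = plan_stop(components)
results = run(stages, lambda host: host != "h2")
```

Here the INSTALL for the client on the unreachable `h2` fails in the first stage, so both STOP commands on `h1` are ABORTED and the service stays up.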