Siddharth Wagle created AMBARI-2041:
---------------------------------------

             Summary: If a host that has a service client installed is down, service start will fail
                 Key: AMBARI-2041
                 URL: https://issues.apache.org/jira/browse/AMBARI-2041
             Project: Ambari
          Issue Type: Bug
          Components: controller
    Affects Versions: 1.3.0
            Reporter: Siddharth Wagle
            Assignee: Siddharth Wagle
             Fix For: 1.3.0


In condor, service start may include a client install on some hosts. If the host
where a client is being installed is down (heartbeat lost), service start fails.
This is because the success factor for clients (tested with MAPREDUCE_CLIENT) is
1, so a single failure fails the stage. During service start there are three
stages: one each for installs, starts, and the service check. When the install
stage fails, the later stages are aborted.
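The failure cascade above can be illustrated with a small model. This is a minimal sketch, assuming that a stage passes only if, for each component, the fraction of successful tasks is at least that component's success factor, that a task on a host with a lost heartbeat always fails, and that a failed stage aborts every later stage. The classes `Task` and `Stage`, the functions `stage_passes` and `run_request`, and the host names are hypothetical illustrations, not Ambari's actual classes or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Task:
    host: str
    component: str
    command: str  # e.g. "INSTALL", "START", "SERVICE_CHECK"


@dataclass
class Stage:
    name: str
    tasks: List[Task]
    # Hypothetical per-component success factor: the fraction of that
    # component's tasks that must succeed for the stage to pass (default 1.0).
    success_factor: Dict[str, float] = field(default_factory=dict)


def stage_passes(stage: Stage, down_hosts: Set[str]) -> bool:
    """Return True if every component in the stage meets its success factor.

    Assumption: any task on a host whose heartbeat is lost fails.
    """
    by_component: Dict[str, List[bool]] = {}
    for task in stage.tasks:
        by_component.setdefault(task.component, []).append(task.host not in down_hosts)
    for component, results in by_component.items():
        required = stage.success_factor.get(component, 1.0)
        if sum(results) / len(results) < required:
            return False
    return True


def run_request(stages: List[Stage], down_hosts: Set[str]) -> Dict[str, str]:
    """Run stages in order; once one stage fails, all later stages are aborted."""
    status: Dict[str, str] = {}
    failed = False
    for stage in stages:
        if failed:
            status[stage.name] = "ABORTED"
        elif stage_passes(stage, down_hosts):
            status[stage.name] = "COMPLETED"
        else:
            status[stage.name] = "FAILED"
            failed = True
    return status


# Scenario from the report: the client install targets two hosts, one of which is down.
stages = [
    Stage("install",
          [Task("host1", "MAPREDUCE_CLIENT", "INSTALL"),
           Task("host2", "MAPREDUCE_CLIENT", "INSTALL")],
          success_factor={"MAPREDUCE_CLIENT": 1.0}),
    Stage("start", [Task("host1", "JOBTRACKER", "START")]),
    Stage("check", [Task("host1", "MAPREDUCE_SERVICE_CHECK", "SERVICE_CHECK")]),
]

print(run_request(stages, down_hosts={"host2"}))
# {'install': 'FAILED', 'start': 'ABORTED', 'check': 'ABORTED'}
```

In this model, lowering the client success factor (for example to 0.5) lets the same scenario pass the install stage; whether relaxing the success factor is the right fix is left open by the report.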

A few observations:

    The client goes to the INSTALL_FAILED state, so a second attempt skips installing the client and therefore succeeds in starting the service. (This is a bug, as we should retry installing a component that is in the INSTALL_FAILED state; at this point, however, this bug is what saves us.)
    A service check can be scheduled on a host that is in the UNHEALTHY/UNKNOWN state, and the check can then fail.
    The service now cannot be stopped because:
        The STOP command sees the INSTALL_FAILED state and schedules an INSTALL task for the client, which fails.
        The STOP commands for the other components are in a later stage and are therefore aborted.

