[ 
https://issues.apache.org/jira/browse/AMBARI-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated AMBARI-2041:
------------------------------------

    Attachment: AMBARI-2041.patch

[~sumitmohanty] Fix, without unit test. Incremental patch to follow.
                
> If a host that has a service client installed is down, service start will 
> fail
> -------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-2041
>                 URL: https://issues.apache.org/jira/browse/AMBARI-2041
>             Project: Ambari
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 1.3.0
>            Reporter: Siddharth Wagle
>            Assignee: Siddharth Wagle
>             Fix For: 1.3.0
>
>         Attachments: AMBARI-2041.patch
>
>
> In condor, service start may include a client install on some hosts. If the 
> host where a client is being installed is down (heartbeat lost), then service 
> start fails. This is because the success factor for clients (tested with 
> MAPREDUCE_CLIENT) is 1, so a single failure fails the stage. During service 
> start there are three stages, one each for installs, starts, and checks. When 
> the install stage fails, the later stages are aborted.
> A few observations:
>     The client goes to the INSTALL_FAILED state, so the second attempt skips 
> installing the client and thereby succeeds in starting the service. (This is 
> a bug, as we should retry installing a component that is in the 
> INSTALL_FAILED state. However, at this point we are saved by this bug.)
>     A service check can be scheduled on a host that is in the 
> UNHEALTHY/UNKNOWN state and can fail.
>     Now the service cannot be stopped because:
>         The stop command sees the INSTALL_FAILED state and schedules an 
> INSTALL task for the client, which fails.
>         The STOP commands for other components are at a later stage and are 
> aborted.
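
The failure mode described above can be sketched roughly as follows. This is only an illustration of the success-factor and stage-abort behaviour as the description states it, not Ambari's actual scheduler code: the names `Stage`, `stage_failed`, and `run_stages` are hypothetical, and the rule that a stage fails when its fraction of successful tasks drops below the success factor is an assumption drawn from the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stage:
    """Hypothetical stand-in for one stage of a service start request."""
    name: str
    success_factor: float  # fraction of tasks that must succeed (assumed semantics)
    results: List[bool]    # one entry per host task: True = succeeded, False = failed


def stage_failed(stage: Stage) -> bool:
    """A stage fails when its success ratio falls below the success factor."""
    if not stage.results:
        return False
    succeeded = sum(stage.results)
    return succeeded / len(stage.results) < stage.success_factor


def run_stages(stages: List[Stage]) -> Dict[str, str]:
    """Evaluate stages in order; once one fails, every later stage is aborted."""
    outcome: Dict[str, str] = {}
    aborted = False
    for stage in stages:
        if aborted:
            outcome[stage.name] = "ABORTED"
        elif stage_failed(stage):
            outcome[stage.name] = "FAILED"
            aborted = True
        else:
            outcome[stage.name] = "COMPLETED"
    return outcome


if __name__ == "__main__":
    # One of two client installs fails because its host lost heartbeat.
    stages = [
        Stage("install", 1.0, [True, False]),
        Stage("start", 1.0, [True]),
        Stage("check", 1.0, [True]),
    ]
    print(run_stages(stages))
```

With a success factor of 1, the single failed client install fails the install stage, and the start and check stages are aborted. Under the same assumed rule, a lower factor such as 0.5 would let that stage complete.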

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
