[ 
https://issues.apache.org/jira/browse/WHIRR-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146545#comment-13146545
 ] 

David Alves edited comment on WHIRR-414 at 11/8/11 8:45 PM:
------------------------------------------------------------

I'm not arguing against killing all machines; that is a reasonable behavior, 
and it could even be the default.

I'm just saying that it should be configurable. I can easily see cases where 
not killing all machines would be advantageous (transient provider errors, 
testing, development). For instance, in testing/development/debugging you might 
want to log into the machines to see what went wrong. If you have idempotent 
bootstrap/configure, you might be able to add machines without wasting those 
that started successfully. And if the machines failed in the config phase, you 
might decide to use them for some other purpose (since you are paying for 
them).
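
To make the idea concrete, here is a minimal sketch in Python of what a 
configurable failure policy could look like. This is not Whirr code: the 
policy names (destroy-all, destroy-failed, keep-all), the Node type, and the 
nodes_to_destroy function are all hypothetical and exist only for 
illustration. The default stays "kill everything", but the other choices 
remain available.

```python
from dataclasses import dataclass
from enum import Enum


class OnFailure(Enum):
    """Hypothetical policy names; not actual Whirr configuration values."""
    DESTROY_ALL = "destroy-all"        # kill every machine (possible default)
    DESTROY_FAILED = "destroy-failed"  # keep machines that started correctly
    KEEP_ALL = "keep-all"              # leave everything up, e.g. for debugging


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a launched machine."""
    node_id: str
    failed: bool


def nodes_to_destroy(nodes, policy=OnFailure.DESTROY_ALL):
    """Return the nodes that should be terminated after a failed launch."""
    if policy is OnFailure.DESTROY_ALL:
        return list(nodes)
    if policy is OnFailure.DESTROY_FAILED:
        return [n for n in nodes if n.failed]
    if policy is OnFailure.KEEP_ALL:
        return []
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    cluster = [Node("node-1", failed=False), Node("node-2", failed=True)]
    for policy in OnFailure:
        doomed = [n.node_id for n in nodes_to_destroy(cluster, policy)]
        print(policy.value, doomed)
```

With something like destroy-failed, the machines that started correctly 
survive a partial failure and can be inspected or reused; with the default, a 
failed launch leaves nothing running.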





                
                  
> whirr can have a non-zero return code and unterminated (orphaned) host 
> instances
> --------------------------------------------------------------------------------
>
>                 Key: WHIRR-414
>                 URL: https://issues.apache.org/jira/browse/WHIRR-414
>             Project: Whirr
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.6.0
>         Environment: EC2, commandline whirr
>            Reporter: Paul Baclace
>            Assignee: Andrei Savu
>            Priority: Critical
>             Fix For: 0.7.0
>
>         Attachments: WHIRR-414.patch
>
>
> Whirr can fail to completely start a cluster and indicates this with a 
> non-zero return code. In many (currently intermittent) partial failure 
> scenarios, there are resources still active (EC2 machine instances, in my 
> experience) that are not cleaned up. 
> The log contains "IOException: Too many instance failed while bootstrapping!" 
> when I have seen orphaned nodes.
> A non-zero return code should guarantee that all resources are cleaned up.  
> Without this post-condition, these failures require manual inspection and 
> cleanup to stop useless expenses (which is why I marked this bug critical; it 
> needs to be addressed for any kind of cron job triggered whirr).

        
