I occasionally run into an issue with the ec2_elb module that I think needs 
to be addressed.  This module is used to add and remove Amazon EC2 
instances to/from an Amazon load balancer.  The API into Amazon is 
asynchronous, so when you use the module to add an instance to a load balancer,
it issues an "insert" command and then goes into a loop where it sleeps for a
second, checks the status, and continues to loop until the instance reaches
the "InService" state in the load balancer. There's also a check to see if
the instance enters an error state, and if it does then the module 
immediately returns an error. Here's a bit of pseudocode to demonstrate the 
behavior:

while True:
    state = get_instance_state()
    if state == InService:
        return success
    elif state == instance_error:
        return error
    sleep(1)

So when an instance is added to the load balancer, the module waits until 
it reaches the InService state.  If at any point along that way it enters 
an error state then it immediately fails.

The problem I have is that it's not terribly uncommon for an instance to 
enter a transient unhealthy state for a couple of seconds prior to being 
successfully put into service.  I have on a number of occasions had my 
Ansible playbook fail because the ec2_elb module throws an error and yet 
the EC2 instance is successfully put into service in the load balancer.  If 
the module had simply waited a few more seconds to check on the health of 
the instance then my playbook would have run successfully.

I would like to propose making a change to the ec2_elb module to address 
these sorts of transient errors.  There really should be a timeout 
associated with this while loop in the module.  It should only fail if the 
instance is not put into service during that period of time, and any errors 
that occur within that time period should be ignored.  To preserve the
current behavior by default, it shouldn't be too difficult to add an optional
timeout parameter that changes the behavior only if it is set.  So if a timeout
parameter is added then the above loop would look something like this:

while not timeout_exceeded:
    state = get_instance_state()
    if state == InService:
        return success
    # an instance_error seen before the timeout is ignored
    sleep(1)

return error  # timed out without reaching InService
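
To make the idea concrete, here is a minimal Python sketch of the proposed loop. It is not the module's actual code: the function name wait_for_in_service, the get_state callable, the "Error" state string, and the clock/sleep parameters are all hypothetical stand-ins so the logic can be run without touching AWS. With timeout left unset the sketch keeps the fail-fast behavior described above; with a timeout, error states are ignored until the deadline passes.

```python
import time


def wait_for_in_service(get_state, timeout=None, poll_interval=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it reports "InService".

    get_state is a hypothetical callable standing in for the real
    load balancer health query; the state strings are placeholders.
    timeout=None: an error state fails immediately (current behavior).
    timeout=N:    error states are ignored for up to N seconds.
    Returns True if the instance went into service, False otherwise.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        state = get_state()
        if state == "InService":
            return True
        if deadline is None:
            if state == "Error":
                return False  # no timeout set: fail fast as before
        elif clock() >= deadline:
            return False  # still not in service when the timeout expired
        sleep(poll_interval)
```

For example, an instance that reports an error on the first poll and goes into service two polls later would return True with timeout=10 but False with no timeout, which is exactly the transient failure I keep hitting.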

Any comments/suggestions about this, especially from other folks using the 
ec2_elb module?

-Bruce

