Hi all,

Sorry for resurrecting an old thread, but I wanted to share my experience 
so far using ec2_asg and ec2_lc for code deploys.

I'm more or less following the methods described in this helpful repo:

https://github.com/ansible/immutablish-deploys

I believe the dual_asg role is accepted as the more reliable method for 
deployments. With two ASGs, rolling back a bad deploy is as simple as 
deleting the new ASG, after which everything goes back to normal. This is 
the "Netflix" style of releasing updates.
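
The rollback step really is just deleting the new ASG. A minimal sketch of 
that with boto 2 (rollback_new_asg, new_asg_name, and the region are 
placeholders of mine, not names from the repo above):

import boto.ec2.autoscale


def rollback_new_asg(new_asg_name, region='us-east-1'):
    """Force-delete the freshly created ASG; the old ASG keeps serving."""
    conn = boto.ec2.autoscale.connect_to_region(region)
    # force_delete also terminates the group's instances
    conn.delete_auto_scaling_group(new_asg_name, force_delete=True)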

The thing I'm finding, though, is that instances become "viable" well 
before they're actually InService in the ELB. From the ec2_asg code, and 
from running Ansible in verbose mode, it's clear that Ansible considers an 
instance viable once AWS reports it as Healthy and InService. Checking via 
the AWS CLI, I can see that the ASG shows instances as Healthy and 
InService while the ELB still shows them as OutOfService.
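
For now I'm working around it by polling the ELB directly before treating 
the new ASG as live. Roughly this (an untested sketch, boto 2; the 
function name, region, and timeout are placeholders of mine):

import time

import boto.ec2.elb


def wait_for_elb_in_service(elb_name, region='us-east-1', timeout=300):
    """Block until every instance registered with the ELB is InService."""
    conn = boto.ec2.elb.connect_to_region(region)
    deadline = time.time() + timeout
    while time.time() < deadline:
        states = conn.describe_instance_health(elb_name)
        if states and all(s.state == 'InService' for s in states):
            return True
        time.sleep(10)
    return False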

The AWS docs are clear about the behavior of Auto Scaling instances with 
health check type ELB: "For each call, if the Elastic Load Balancing action 
returns any state other than InService, the instance is marked as 
unhealthy." But in practice this does not appear to be the case.

Has anyone else encountered this? Any suggested workarounds or fixes?

Thanks,
Ben


On Thursday, September 11, 2014 12:54:25 PM UTC-7, Scott Anderson wrote:
>
> On Sep 11, 2014, at 3:26 PM, James Martin <jma...@ansible.com> wrote:
>
>> I think we’re probably going to move to a system that uses a tier of 
>> proxies and two ELBs. That way we can update the idle ELB, change out the 
>> AMIs, and bring the updated ELB up behind an alternate domain for the 
>> blue-green testing. Then when everything checks out, switch the proxies to 
>> the updated ELB and take down the remaining, now idle ELB.
>>
>>
> Not following this exactly -- what's your tier of proxies?  You have a 
> group of proxies (haproxy, nginx) behind a load balancer that point to your 
> application?
>
>
> Yes, nginx or some other HA-ish thing. If it’s nginx then you can maintain 
> a brochure site even if something horrible happens to the application.
>
>  
>
>> Amazon would suggest using Route53 to point to the new ELB, but there’s 
>> too great a chance of faulty DNS caching breaking a switch to a new ELB. 
>> Plus there’s a 60s TTL to start with regardless, even in the absence of 
>> caching.
>>
>
> Quite right.  There are some interesting things you can do with tools you 
> could run on the hosts that would redirect traffic from blue hosts to the 
> green LB, socat being one.  After you notice no more traffic coming to 
> blue, you can terminate it.
>
>
> That’s an interesting idea, but it fails if people are behind a caching 
> DNS and they visit after you’ve terminated the blue traffic but before 
> their caching DNS lets go of the record.
>
> You're right, I did miss that.  By checking the AMI, you're only updating 
> the instance if the AMI changes.  If you are checking the launch config, you 
> are updating the instances if any component of the launch config has 
> changed -- AMI, instance type, address type, etc.
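>
> In code the distinction is tiny; a sketch of the launch-config variant 
> (boto 2, with current_lc_name standing in for however you track the 
> current config -- a placeholder, not a real variable from the module):
>
> import boto.ec2.autoscale
>
>
> def instances_needing_update(region, current_lc_name):
>     """Return ASG instance IDs whose launch config is out of date."""
>     conn = boto.ec2.autoscale.connect_to_region(region)
>     return [i.instance_id
>             for i in conn.get_all_autoscaling_instances()
>             if i.launch_config_name != current_lc_name]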
>
>
> That’s true, but if I’m changing instance types I’ll generally just 
> cycle_all. Because of the connection draining and parallelism of the 
> instance creation, it’s just as quick to do all of them instead of the ones 
> that need changing. That said, it’s an obvious optimization for sure.
>
>
>> Using the ASG to do the provisioning might be preferable if it’s reliable. 
>> At first I went that route, but I was having problems with the ASG’s 
>> provisioning being non-deterministic. Manually creating the instances seems 
>> to ensure that things happen in a particular order and with predictable 
>> speed. As mentioned, the manual method definitely works every time, 
>> although I need to add some more timeout and error checking (like what 
>> happens if I ask for 3 new instances and only get 2).
>>
>>
> I didn't have any issues with the ASG doing the provisioning, but I would 
> say nothing is predictable with AWS :).  
>
>
> Very true. Over the past few months I’ve had several working processes 
> just fail with no warning. The most recent is AWS sometimes refusing to 
> return the current list of AMIs. Prior to that it was the Available status 
> on an AMI not really meaning available. Now I check the list of returned 
> AMIs in a loop until the one I’m looking for shows up, Available status 
> notwithstanding. Very frustrating. Things could be worse, however: the API 
> could be run by Facebook...
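>
> That wait loop is roughly the following (a sketch, boto 2; the retry 
> counts and the wait_for_ami name are mine, not from the real task):
>
> import time
>
> import boto.exception
>
>
> def wait_for_ami(ec2_connection, ami_id, retries=30, delay=10):
>     """Poll DescribeImages until the AMI actually shows up in the list."""
>     for _ in range(retries):
>         try:
>             images = ec2_connection.get_all_images(image_ids=[ami_id])
>         except boto.exception.EC2ResponseError:
>             # a brand-new AMI can briefly 400 with InvalidAMIID.NotFound
>             images = []
>         if images:
>             return images[0]
>         time.sleep(delay)
>     raise RuntimeError('AMI %s never appeared' % ami_id)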
>
>
>> I have a separate task that cleans up the old AMIs and LCs, incidentally. 
>> I keep the most recent around as a backup for quick rollbacks.
>>
>
> That's cool, care to share?
>  
>
>
> I think I’ve posted it before, but here’s the important bit. After 
> deleting everything but the most recent backup AMI (determined by naming 
> convention or tags), delete any LC that doesn’t have an associated AMI:
>
> import boto.exception
>
>
> def delete_launch_configs(asg_connection, ec2_connection, module):
>     """Delete every launch config whose AMI no longer exists."""
>     changed = False
>
>     launch_configs = asg_connection.get_all_launch_configurations()
>
>     for config in launch_configs:
>         image_id = config.image_id
>         try:
>             images = ec2_connection.get_all_images(image_ids=[image_id])
>         except boto.exception.EC2ResponseError:
>             # DescribeImages can raise for AMI IDs that no longer exist
>             images = []
>
>         # no AMI backs this launch config any more, so it's orphaned
>         if not images:
>             config.delete()
>             changed = True
>
>     module.exit_json(changed=changed)
>
>
> -scott
>
>
>
