My setup is a little different from what it sounds like yours is, but hope this helps:
I had Ansible building the boxes and I used AWS CodeDeploy for deployments (would not suggest using it, though). I ended up separating the process into several components.

High-level:

1. CodeDeploy had a verification step (you can do that with Ansible, more on that below)
2. The Puma server was managed with Upstart scripts
3. The ELB had health checks set up for all the member instances
4. Each application instance had a health check tied to an alert

Details:

1. Post-deploy check

As part of my deploy, I had a verification step that made sure my Puma workers started, though I waited a few seconds to give Ruby enough time to load. I found that if I ran the check immediately, it would always succeed, but after 2-5 seconds the workers would crash, because it took a bit for the Ruby runtime to get loaded, plus all the Rails stuff, the DB connection, the Redis connection, etc.

You can do the same with Ansible's modules. Depending on how you're deploying, you can hack it together with the __command__ and __assert__ modules to make sure your service(s) is/are running after some timeout, or get more creative and use some other module that's specific to your process. (There's a rough sketch of what this could look like a little further down.)

2. Puma server managed by Upstart

I set up my Upstart scripts to auto-restart the Puma service if it failed and notify me via sendmail. I did write wrapper scripts around starting/stopping the Puma services to make sure that, if the service/workers are already running, I don't try to start things again, and ditto for when the workers are stopped. It's a hack, though. I need to look deeper into systemd to see how I'd layer this in there better, and probably integrate it with Consul/etcd instead. I did not go with bluepill or monit because I had all kinds of problems with those two on previous projects. I need to give Inspeqtor another look and see where in the stack it would fit. I will say that I'm leaning very heavily on Consul in my current projects.

3. ELB health checks

Health checks made sure all the instances behind an ELB were up and that I could hit them on port 8080 with an HTTP request (that's where Puma was listening). If a health check failed, that instance was taken out of service. (There's a sketch of that setup further down as well.)

4. App instance health checks

Then there was an alert set up for each application instance, "HTTP pinging" it on port 8080. If it failed, it would post to a Slack channel via SNS, and Slack would send out notifications as needed.

This gives me a layered approach with only occasional false positives. To improve it, I would add a Consul cluster with more notifications and integrate PagerDuty or something similar. This may seem like a lot of work, but I spent about a week setting it up, by myself, across an AWS cluster running 4 environments with roughly 3 app instances per environment. I have a slightly modified setup for worker instances.

One more note: while I love Ansible and prefer it to Chef/Puppet (Salt looks nice too, I just went down the Ansible rabbit hole first), it's a poor choice for deployments. Having the same tool that does orchestration and infra provisioning also do deploys couples things too closely for me. I tend to use something else for deployments; there are plenty of options out there. The reason is that deployment flows really need multiple "life-cycle" steps, like deploy verification, that are a lot harder to do with something like Ansible. Obviously, balance this advice with the "working code always wins" mantra; something to consider for your roadmap, perhaps.
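To make the verification step in (1) concrete, here's a rough sketch of the kind of post-deploy check I mean, using stock Ansible modules (pause, command, assert, wait_for). The grace period, port, and pgrep pattern are placeholders for illustration; adjust them to your app:

    # Post-deploy verification tasks (sketch; the grace period, port and
    # pgrep pattern below are assumptions, substitute your own)

    - name: give Puma time to load the Ruby runtime, Rails, DB and Redis connections
      pause:
        seconds: 10

    - name: check whether the Puma workers are still alive after the grace period
      command: pgrep -f puma
      register: puma_check
      changed_when: false
      failed_when: false

    - name: fail the deploy if the workers died after starting
      assert:
        that:
          - puma_check.rc == 0
        msg: "Puma is not running; deploy verification failed"

    - name: confirm the app actually answers on the port the ELB will check
      wait_for:
        port: 8080
        delay: 2
        timeout: 30

If you have a /health endpoint, the uri module is a nicer check than a bare port probe, since it exercises the full Rails stack rather than just confirming something is listening.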
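And for the ELB health check in (3): if you happen to manage the load balancer from Ansible as well, the classic-ELB module (ec2_elb_lb) can define the same kind of check. Something along these lines, where the name, region, zones, path, and thresholds are all made up for illustration:

    # Classic ELB that health-checks Puma over HTTP on 8080 (sketch)

    - name: ensure the app ELB exists and health-checks the Puma port
      ec2_elb_lb:
        name: myapp-elb
        state: present
        region: us-west-2
        zones:
          - us-west-2a
          - us-west-2b
        listeners:
          - protocol: http
            load_balancer_port: 80
            instance_protocol: http
            instance_port: 8080
        health_check:
          ping_protocol: http
          ping_port: 8080
          ping_path: "/health"
          response_timeout: 5
          interval: 30
          unhealthy_threshold: 2
          healthy_threshold: 3

Any instance that fails the check the configured number of times in a row gets pulled out of rotation until it starts passing again.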
One note on AWS CodeDeploy: while I love the idea of the service, it's beta software. You need at least a t2.medium to run it, due to CPU constraints. It doesn't handle full-disk issues well and spews stuff all over the filesystem. There is no way to configure the thing either, so I ended up having to use 20GB of storage instead of something like 5GB. Logging on that thing is horrendous as well, so yeah, not a great idea. I wish it were a lot better, and I started down the road of building my own clone, but got pulled into a project that's taking all my time at the moment.

Misha

On Monday, March 7, 2016 at 10:36:04 PM UTC-8, Chris McCann wrote:
>
> All,
>
> We use Ansible to deploy our Rails app onto EC2 servers on Amazon Web
> Services.
>
> An issue with a missing environment variable caused the Rails process to
> fail on restart but that wasn't communicated through Ansible. Only after
> running `bundle exec rails c` on the server did the error become apparent
> due to a Rails initializer that verifies all required env vars are present.
>
> Does anyone here have a mechanism in their deployment process that
> verifies the Rails process restarts cleanly, in particular, via Ansible?
>
> Cheers,
>
> Chris
