My setup is a little different from what it sounds like yours is, but hope this helps:
I had Ansible building the boxes and I used AWS CodeDeploy for deployments (would not suggest using it, though). I ended up separating the process into several components.

High-level:

1. CodeDeploy had a verification step (you can do that with Ansible, more on that below)
2. The Puma server was managed with Upstart scripts
3. The ELB had health checks set up for all the member instances
4. Each application instance had a health check tied to an alert

Details:

1. Post-deploy check

As part of my deploy, I had a verification step that made sure my Puma workers started, though I waited a few seconds to give Ruby enough time to load. I found that if I ran the check immediately, it would always succeed, but after 2-5 seconds the workers would crash, because it took a bit for the Ruby runtime to get loaded, plus all the Rails stuff, the DB connection, the Redis connection, etc.

You can do the same with Ansible's modules. Depending on how you're deploying, you can hack it together with the __command__ and __assert__ modules to make sure your service(s) is/are running after some timeout, or get more creative and use some other module that's specific to your process. (There's a rough sketch of what this could look like a little further down.)

2. Puma server managed by Upstart

I set up my Upstart scripts to auto-restart the Puma service if it failed and notify me via sendmail. I did write wrapper scripts around starting/stopping the Puma services to make sure that, if the service/workers are already running, I don't try to start things again, and ditto for when the workers are stopped. It's a hack, though. I need to look deeper into systemd to see how I'd layer this in there better, and probably integrate it with Consul/etcd instead. I did not go with bluepill or monit because I had all kinds of problems with those two on previous projects. I need to give Inspeqtor another look and see where in the stack it would fit. I will say that I'm leaning very heavily on Consul in my current projects.

3. ELB health checks

Health checks made sure all the instances behind an ELB were up and that I could hit them on port 8080 with an HTTP request (that's where Puma was listening). If a health check failed, that instance was taken out of service. (There's a sketch of that setup further down as well.)

4. App instance health checks

Then there was an alert set up for each application instance, "HTTP pinging" it on port 8080. If it failed, it would post to a Slack channel via SNS, and Slack would send out notifications as needed.

This gives me a layered approach with only occasional false positives. To improve it, I would add a Consul cluster with more notifications and integrate PagerDuty or something similar. This may seem like a lot of work, but I spent about a week setting it up, by myself, across an AWS cluster running 4 environments with roughly 3 app instances per environment. I have a slightly modified setup for worker instances.

One more note: while I love Ansible and prefer it to Chef/Puppet (Salt looks nice too, I just went down the Ansible rabbit hole first), it's a poor choice for deployments. Having the same tool that does orchestration and infra provisioning also do deploys couples things too closely for me. I tend to use something else for deployments; there are plenty of options out there. The reason is that deployment flows really need multiple "life-cycle" steps, like deploy verification, that are a lot harder to do with something like Ansible. Obviously, balance this advice with the "working code always wins" mantra; something to consider for your roadmap, perhaps.
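To make the verification step in (1) concrete, here's a rough sketch of the kind of post-deploy check I mean, using stock Ansible modules (pause, command, assert, wait_for). The grace period, port, and pgrep pattern are placeholders for illustration; adjust them to your app:

    # Post-deploy verification tasks (sketch; the grace period, port and
    # pgrep pattern below are assumptions, substitute your own)

    - name: give Puma time to load the Ruby runtime, Rails, DB and Redis connections
      pause:
        seconds: 10

    - name: check whether the Puma workers are still alive after the grace period
      command: pgrep -f puma
      register: puma_check
      changed_when: false
      failed_when: false

    - name: fail the deploy if the workers died after starting
      assert:
        that:
          - puma_check.rc == 0
        msg: "Puma is not running; deploy verification failed"

    - name: confirm the app actually answers on the port the ELB will check
      wait_for:
        port: 8080
        delay: 2
        timeout: 30

If you have a /health endpoint, the uri module is a nicer check than a bare port probe, since it exercises the full Rails stack rather than just confirming something is listening.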
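And for the ELB health check in (3): if you happen to manage the load balancer from Ansible as well, the classic-ELB module (ec2_elb_lb) can define the same kind of check. Something along these lines, where the name, region, zones, path, and thresholds are all made up for illustration:

    # Classic ELB that health-checks Puma over HTTP on 8080 (sketch)

    - name: ensure the app ELB exists and health-checks the Puma port
      ec2_elb_lb:
        name: myapp-elb
        state: present
        region: us-west-2
        zones:
          - us-west-2a
          - us-west-2b
        listeners:
          - protocol: http
            load_balancer_port: 80
            instance_protocol: http
            instance_port: 8080
        health_check:
          ping_protocol: http
          ping_port: 8080
          ping_path: "/health"
          response_timeout: 5
          interval: 30
          unhealthy_threshold: 2
          healthy_threshold: 3

Any instance that fails the check the configured number of times in a row gets pulled out of rotation until it starts passing again.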
One note on AWS CodeDeploy: while I love the idea of the service, it's beta software. You need at least a t2.medium to run it, due to CPU constraints. It doesn't handle full-disk issues well and spews stuff all over the filesystem. There is no way to configure the thing either, so I ended up having to use 20GB of storage instead of something like 5GB. Logging on that thing is horrendous as well, so yeah, not a great idea. I wish it were a lot better, and I started down the road of building my own clone, but got pulled into a project that's taking all my time at the moment.

Misha

On Monday, March 7, 2016 at 10:36:04 PM UTC-8, Chris McCann wrote:
>
> All,
>
> We use Ansible to deploy our Rails app onto EC2 servers on Amazon Web
> Services.
>
> An issue with a missing environment variable caused the Rails process to
> fail on restart but that wasn't communicated through Ansible. Only after
> running `bundle exec rails c` on the server did the error become apparent
> due to a Rails initializer that verifies all required env vars are present.
>
> Does anyone here have a mechanism in their deployment process that
> verifies the Rails process restarts cleanly, in particular, via Ansible?
>
> Cheers,
>
> Chris
