[Puppet Users] EC2 autoscaling reusing hostnames

Bad Tux Sat, 24 May 2014 05:42:08 -0700

So I'm using Amazon's amazing EC2 autoscaling service and hey, this is 
pretty cool. Traffic on the web site constellation goes up, Amazon slowly 
spawns new instances of our web application to handle the traffic and 
attaches them to the load balancer for our site. Puppet runs, pulls in the 
application from the PuppetMaster (which was designated at scaling group 
creation time), spins it up, load balancer asks it "hey are you there", the 
application says "yep", and traffic starts getting split out to the new 
instance. Traffic goes back down, after a while Amazon slowly spins the 
excess instances back down.

So I sit there for a few weeks watching traffic yoyo up and down and
watching the scaling notifications crawl across my inbox, then suddenly my
Nagios alarms go off telling me that the application is offline. WTF?
There are instances up there! I attach an Elastic IP to the ssh gateway
instance and log into a couple of the application instances via ssh and
sure enough, no Tomcat is installed or running, never mind the web app that
Tomcat is supposed to be running. Okay, is my puppetmaster offline? Nope,
it's online and listening. So I manually run puppet on one of the instances
and... "invalid certificate for this hostname".

Wha?

Then I realize: Amazon gave this instance the same IP address and hostname
as a prior instance that'd been part of the constellation! Which is
inevitable when you're running inside a VPC (Virtual Private Cloud),
because you have only a /16 to play with, which must be divided between
multiple availability zones and multiple security zones. And the
puppetmaster's SSL sez, "nope, no way, I seen you before and you had a
different certificate, go away."
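
To make that concrete, here's a toy sketch of what seems to be going on. It
is NOT Puppet's actual code, just a model of a master that remembers one
certificate fingerprint per hostname; the class name, the hostname and the
fingerprints are all made up for illustration.

```python
# Toy model only: this is not how Puppet is implemented internally.
# It mimics a master that remembers one certificate fingerprint per
# hostname and turns away any later client presenting a different one.

class ToyMaster:
    def __init__(self):
        # hostname -> fingerprint of the first certificate seen for it
        self.known = {}

    def check(self, hostname, fingerprint):
        seen = self.known.get(hostname)
        if seen is None:
            # First time this hostname shows up: remember its certificate.
            self.known[hostname] = fingerprint
            return "accepted"
        if seen == fingerprint:
            return "accepted"
        # Same hostname, different certificate: the case that bit me.
        return "rejected"


master = ToyMaster()
# Hypothetical hostname; the first instance registers its certificate.
first = master.check("ip-10-0-1-23", "AA:AA")
# That instance is terminated, and a new one later comes up with the same
# hostname but a freshly generated certificate.
second = master.check("ip-10-0-1-23", "BB:BB")
print(first, second)
```

The second call is the replacement instance: same name, new certificate, so
the toy master turns it away, which matches what I'm seeing.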

Uhm, okay. So I need to solve this problem so that my new instances can get
deployed. Only thing I can think of is to trash the ssl directories on both
the puppet master and all of the clients, and then run puppet again. Note
that all the instances and the puppet master are in a "puppet" network
security group that was created by CloudFormation, and instances not part
of the "puppet" security group cannot connect to the puppet master. So we
*know* that we're talking to the puppet master, and the puppet master
*knows* we're actual hosts that are allowed to talk to it. Besides, all of
these instances are inside a virtual private cloud that is inaccessible to
the wider Internet except via port 8080 between the load balancer and the
application instances (again enforced by the security groups mechanism), so
there's no way an outsider could talk to the puppet server anyhow. But
puppet insists on validating these SSL certificates before letting the
instances talk to it, even though that's a totally useless exercise given
that Amazon's enforcing the ACLs at the virtual network (firewall) layer to
prevent anybody unauthorized from getting anywhere near that puppet port or
puppet IP address.
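
For the record, the "trash the ssl directories and run puppet again" idea
would look roughly like the sketch below. It only builds the list of
commands and executes nothing. The `wipe_plan` helper is made up, the
`puppetmaster` service name and the `/var/lib/puppet/ssl` path are
assumptions about my setup (`puppet config print ssldir` should report the
real path), and I'm not claiming this is the right way to do it.

```python
import shlex


def wipe_plan(role, ssldir):
    """Return the shell commands for one host as strings; nothing is run."""
    quoted = shlex.quote(ssldir)
    if role == "master":
        # Assumed service name; adjust to however the master is started.
        return [
            "service puppetmaster stop",
            f"rm -rf {quoted}",
            "service puppetmaster start",
        ]
    if role == "agent":
        # After the wipe, the next agent run requests a new certificate.
        return [f"rm -rf {quoted}", "puppet agent --test"]
    raise ValueError(f"unknown role: {role!r}")


# Assumed ssldir, for illustration only.
master_cmds = wipe_plan("master", "/var/lib/puppet/ssl")
agent_cmds = wipe_plan("agent", "/var/lib/puppet/ssl")
for cmd in master_cmds + agent_cmds:
    print(cmd)
```

Doing this on the master throws away every certificate it has ever issued,
not just the one for the reused hostname, which is why it feels like a
sledgehammer.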

Am I missing a configuration option in the manual to somehow disable SSL
certificate validation? Does everybody add a cron job to their puppet
master to stop the puppetmaster daemon and blow away its SSL directory then
restart it at exactly 12:00AM every day, and the same on the instances at
exactly 12:02AM every day? Or are we the only people on the planet who
actually use Amazon's auto-scaling feature *plus* use Puppet at the same
time? Curious penguins are... curious!

