So I'm using Amazon's amazing EC2 autoscaling service and hey, this is 
pretty cool. Traffic on the web site constellation goes up, Amazon slowly 
spawns new instances of our web application to handle the traffic and 
attaches them to the load balancer for our site. Puppet runs, pulls in the 
application from the PuppetMaster (which was designated at scaling group 
creation time), spins it up, load balancer asks it "hey are you there", the 
application says "yep", and traffic starts getting split out to the new 
instance. Traffic goes back down, after a while Amazon slowly spins the 
excess instances back down. 

So I sit there for a few weeks watching traffic yoyo up and down and 
watching the scaling notifications crawl across my inbox, then suddenly my 
Nagios alarms go off telling me that the application is offline. WTF? 
There's instances up there! I attach an elastic IP to the ssh gateway 
instance and log into a couple of the application instances via ssh and 
sure enough, no Tomcat is installed or running, never mind the web app that 
Tomcat is supposed to be running. Okay, is my puppetmaster offline? Nope, 
it's online and listening. So I manually run puppet on one of the instances 
and... "invalid certificate for this hostname".

Wha?

Then I realize: Amazon gave this instance the same IP address and hostname 
as a prior instance that'd been part of the constellation! Which is 
inevitable when you're running inside a VPC (Virtual Private Cloud), 
because you have only a /16 to play with, which must be divided between 
multiple availability zones and multiple security zones. And the 
puppetmaster's SSL sez, "nope, no way, I seen you before and you had a 
different certificate, go away." 
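
To make sure I understand what's biting me, here is a toy model of the 
behaviour as I see it from the outside. It is only a sketch, not Puppet's 
actual code: the ToyCA class, the hostname and the fingerprints are all made 
up for illustration, and the only assumption is that the master remembers 
one certificate per client name and refuses a different one.

```python
"""Toy model of a CA that pins one certificate per client name (not Puppet code)."""


class ToyCA:
    def __init__(self):
        # certname -> fingerprint of the certificate signed for that name
        self.signed = {}

    def check_in(self, certname, fingerprint):
        known = self.signed.get(certname)
        if known is None:
            # First time this name is seen: remember its certificate.
            self.signed[certname] = fingerprint
            return "signed"
        if known == fingerprint:
            return "ok"
        # Same name, different certificate: refused until the old entry is removed.
        return "rejected"

    def clean(self, certname):
        # Forget the stored certificate for this name (no-op if unknown).
        self.signed.pop(certname, None)


ca = ToyCA()
print(ca.check_in("ip-10-0-1-23", "AA:11"))  # original instance
print(ca.check_in("ip-10-0-1-23", "BB:22"))  # new instance on a recycled hostname
ca.clean("ip-10-0-1-23")
print(ca.check_in("ip-10-0-1-23", "BB:22"))  # same new instance after cleanup
```

If that model is right, the new instance is refused only because the old 
entry for the recycled name is still on file.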

Uhm, okay. So I need to solve this problem so that my new instances can get 
deployed. Only thing I can think of is to trash the ssl directories on both 
the puppet master and all of the clients, and then run puppet again. Note 
that all the instances and puppet are in a "puppet" network security group 
that was created by CloudFormation, and instances not part of the "puppet" 
security group cannot connect to the puppet master. So we *know* that we're 
talking to the puppet master, and the puppet master *knows* we're actual 
hosts that can talk to it. Besides, all of these instances are inside a 
virtual private cloud that is inaccessible to the wider Internet except via 
port 8080 between the load balancer and the application instances (again 
enforced by the security groups mechanism), so there's no way an outsider 
could talk to the puppet server anyhow. But... puppet insists on validating 
these SSL certificates before letting the instances talk to it, even though 
that's a totally useless exercise given that Amazon's enforcing the ACLs 
at the virtual network (firewall) layer to prevent anybody unauthorized 
from getting anywhere near that puppet port or puppet IP address.

Am I missing a configuration option in the manual to somehow disable SSL 
certificate validation? Does everybody add a cron job to their puppet 
master to stop the puppetmaster daemon and blow away its SSL directory then 
restart it at exactly 12:00AM every day, and the same on the instances at 
exactly 12:02AM every day? Or are we the only people on the planet who 
actually use Amazon's auto-scaling feature *plus* use Puppet at the same 
time? Curious penguins are... curious!
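
For what it's worth, a less drastic version of that cron job might only 
throw away the certificates of instances that no longer exist. A minimal 
sketch, under assumptions: the two input lists are hypothetical (one would 
come from the master's list of signed certs, the other from whatever knows 
which instances are currently running), the hostnames are invented, and I'm 
assuming `puppet cert clean <name>` is the right cleanup command for the 
Puppet version on the master. The script only builds the commands, it 
doesn't run them.

```python
def stale_certnames(signed, live):
    """Return signed certnames that no longer belong to a running instance."""
    live_set = set(live)
    return sorted(name for name in signed if name not in live_set)


def clean_commands(stale):
    """Build one cleanup command per stale certname (nothing is executed here)."""
    # Assumption: `puppet cert clean <name>` is the cleanup command on this master.
    return [["puppet", "cert", "clean", name] for name in stale]


# Hypothetical data: names the master has signed vs. instances still running.
signed = ["ip-10-0-1-23", "ip-10-0-1-24", "ip-10-0-2-7"]
live = ["ip-10-0-1-24"]

for cmd in clean_commands(stale_certnames(signed, live)):
    print(" ".join(cmd))
```

That would leave the certificates of running instances alone, so nothing 
has to be stopped or re-signed at midnight.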


