Re: [Puppet Users] Puppet Agent Hang when PuppetServer Crashes...

2018-01-01 Thread R.I.Pienaar


On Mon, 1 Jan 2018, at 23:51, Matt Wise wrote:
> *Puppet Agent: 5.3.2*
> *Puppet Server: 5.1.4 - Packaged in Docker, running on Amazon ECS*
> 
> So we've recently started rolling over from our ancient Puppet 3.x system
> to a new Puppet 5.x service. The new service consists of a PuppetServer
> Docker Image (5.1.4) running in Amazon ECS, and our hosts booting up and
> running Puppet Agent 5.3.2. At this point in the migration, we're running
> ~150-200 hosts on the new Puppet5 system and we replace ~30-80 of them
> daily.
> 
> We are currently tracking down a problem with our PuppetServers and their
> memory usage, which is causing the containers to be OOM'd a few times a day
> (~10 OOMs a day across ~20 containers). While we know that we need to fix
> this, we've seen a scary behavior on the Puppet Agent side that we could
> use some advice with.
> 
> It seems that at least a few times a day now we will get a server hung in
> the boot process. The `puppet agent -t ...` process will just hang midway
> through the run. It seems that these hangs happen when the backend
> underlying PuppetServer process that they were connected to gets OOMed and
> goes away. Obviously the OOM is a problem.. but frankly I am more concerned
> with the Puppet Agent getting wedged for hours and hours without making any
> progress.
> 
> It seems that when this failure happens, the puppet agent does not ever
> time out. It never fails, or throws an error. It just hangs. We've had
> these hangs last upwards of 4-5 hours before our systems are automatically
> terminated.
> 
> We've enabled debug logging, but haven't caught one of these failures yet
> with debug mode turned on. In the meantime, are there any known
> regressions, or configuration tweaks we need to make so that Puppet Agent
> 5.x is quicker to fail or more resilient in this case? I could obviously
> try to build some wrapper around Puppet to catch this behavior, but I am
> hoping that there are just some settings we need to tweak.

I see this often with other kinds of interruptions too, such as network
interruptions.

I do recall a number of bugs filed around making this more robust; you might
want to try searching the Puppet Jira.
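
In the meantime, the wrapper idea from the original post could look roughly
like the sketch below. This is only a minimal illustration, not anything taken
from the Puppet docs: the 30-minute limit, the names PUPPET_CMD and
RUN_TIMEOUT_S, and the exit code 124 (borrowed from the coreutils `timeout`
convention) are all assumptions to adjust for your environment.

```python
import subprocess
import sys

# Hypothetical values: pick a limit comfortably above your slowest healthy run.
PUPPET_CMD = ["puppet", "agent", "-t"]
RUN_TIMEOUT_S = 30 * 60


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it ran longer than timeout_s."""
    try:
        # subprocess.run kills the child and waits for it when the timeout expires.
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


def main():
    code = run_with_timeout(PUPPET_CMD, RUN_TIMEOUT_S)
    if code is None:
        print("puppet agent exceeded the time limit and was killed", file=sys.stderr)
        return 124  # assumed convention, same as coreutils `timeout`
    return code
```

To use it as a script, end the file with `sys.exit(main())` and call that from
your boot process instead of calling `puppet agent -t` directly. Note that only
the agent process itself is killed; anything it spawned may be left running, so
treat this as a stopgap rather than a fix.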


-- 
R.I.Pienaar / www.devco.net / @ripienaar

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to puppet-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-users/1514847264.1185405.1221159992.28D2AE6B%40webmail.messagingengine.com.
For more options, visit https://groups.google.com/d/optout.


[Puppet Users] Puppet Agent Hang when PuppetServer Crashes...

2018-01-01 Thread Matt Wise
*Puppet Agent: 5.3.2*
*Puppet Server: 5.1.4 - Packaged in Docker, running on Amazon ECS*

So we've recently started rolling over from our ancient Puppet 3.x system
to a new Puppet 5.x service. The new service consists of a PuppetServer
Docker Image (5.1.4) running in Amazon ECS, and our hosts booting up and
running Puppet Agent 5.3.2. At this point in the migration, we're running
~150-200 hosts on the new Puppet5 system and we replace ~30-80 of them
daily.

We are currently tracking down a problem with our PuppetServers and their
memory usage, which is causing the containers to be OOM'd a few times a day
(~10 OOMs a day across ~20 containers). While we know that we need to fix
this, we've seen a scary behavior on the Puppet Agent side that we could
use some advice with.

It seems that at least a few times a day now we will get a server hung in
the boot process. The `puppet agent -t ...` process will just hang midway
through the run. It seems that these hangs happen when the backend
underlying PuppetServer process that they were connected to gets OOMed and
goes away. Obviously the OOM is a problem.. but frankly I am more concerned
with the Puppet Agent getting wedged for hours and hours without making any
progress.

It seems that when this failure happens, the puppet agent does not ever
time out. It never fails, or throws an error. It just hangs. We've had
these hangs last upwards of 4-5 hours before our systems are automatically
terminated.

We've enabled debug logging, but haven't caught one of these failures yet
with debug mode turned on. In the meantime, are there any known
regressions, or configuration tweaks we need to make so that Puppet Agent
5.x is quicker to fail or more resilient in this case? I could obviously
try to build some wrapper around Puppet to catch this behavior, but I am
hoping that there are just some settings we need to tweak.

Any thoughts?
