Thank you for your response, John.

You made some good suggestions, especially running Puppet under strace to 
gather more information. 

Regarding your question about whether we have actually observed puppet 
spawning multiple processes: I can definitely confirm this.  When some of 
our nodes suffer from this, we are alerted that memory usage has hit a 
threshold, and when we look at the processes on these nodes, we clearly see 
multiple "applying configuration" entries in the process list.  At that 
point, we stop the puppet daemon and kill all the "applying configuration" 
processes.
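
For anyone wanting to check for the same symptom, here is a minimal sketch 
of how the duplicate runs could be spotted.  It assumes the process title 
contains "applying configuration", as it does in our process lists; the 
function names are made up for illustration and are not part of Puppet.

```shell
# Hypothetical helpers; the names are illustrative, not part of Puppet.

# Count lines on stdin that mention "applying configuration".
# The [a] bracket keeps this grep from matching its own command line
# when the input comes straight from ps.
count_applying() {
    grep -c '[a]pplying configuration' || true
}

# Show PID, resident memory in KiB, elapsed time, and command line
# of each matching process.
list_applying() {
    ps -eo pid,rss,etime,args | grep '[a]pplying configuration'
}
```

Running "ps -eo args | count_applying" prints the number of matching 
processes; a count above 1 would suggest overlapping runs.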

We'll try to gather more information this week to see if we can narrow 
down the possibilities.  This seems to affect a few random nodes, but on a 
consistent basis, so we can definitely reproduce it.
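
As a starting point for the package comparison you suggested, something 
like the following could work.  This is only a sketch: it assumes the 
package list from each host has been saved to a file with "rpm -qa", and 
both pkg_diff and the file names are hypothetical.

```shell
# Hypothetical helper: print packages present on only one of two hosts.
# $1, $2: files holding `rpm -qa` output from an affected and a healthy host.
pkg_diff() {
    _a=$(mktemp) && _b=$(mktemp) || return 1
    # comm requires both inputs to be sorted.
    sort "$1" > "$_a"
    sort "$2" > "$_b"
    # -3 drops lines common to both files: unindented lines are only in $1,
    # tab-indented lines are only in $2.
    comm -3 "$_a" "$_b"
    rm -f "$_a" "$_b"
}
```

For example: run "rpm -qa > affected.pkgs" on a bad host and 
"rpm -qa > healthy.pkgs" on a good one, copy both files to one machine, 
then run "pkg_diff affected.pkgs healthy.pkgs".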

Thanks again for your response.

Franck 

On Wednesday, June 17, 2015 at 9:19:40 AM UTC-4, jcbollinger wrote:
>
>
>
> On Monday, June 15, 2015 at 9:12:03 PM UTC-5, Franck wrote:
>>
>> We've been experiencing a lot of "Command exceeded timeout" errors on 
>> basic shell commands using the "exec" type, for tasks that should execute 
>> fairly fast: 
>>
>> Jun 15 15:45:44 host1 puppet-agent[57648]: (/Stage[main]/Timezone::Utc/Exec[/bin/rm -f /etc/localtime && /bin/ln -s /usr/share/zoneinfo/UTC /etc/localtime]) Command exceeded timeout
>> Jun 10 21:15:24 host1 puppet-agent[57081]: (/Stage[main]/Open-vm-tools::Package/Exec[/usr/bin/vmware-uninstall-tools.pl]/onlyif) Check "/usr/bin/test -f /usr/bin/vmware-uninstall-tools.pl" exceeded timeout
>> Jun 10 23:56:02 host1 puppet-agent[40286]: (/Stage[main]/Open-vm-tools::Package/Exec[/usr/bin/yum install -y open-vm-tools.x86_64]/unless) Check "/bin/rpm -q open-vm-tools" exceeded timeout
>>
>> All these commands can be run locally on the host and return fairly 
>> quickly, but when puppet executes them they time out.
>>
>
>
> Very strange.
>
>  
>
>> Extending the timeout is an option, but a ridiculous one, since the default 
>> is 300 seconds and none of these commands should take 5 minutes or more to 
>> return.
>>
>
>
> No, probably not a viable option.  If these particular commands are not 
> completing within the standard timeout, then there's no particular reason 
> to think that they would *ever* complete, no matter what timeout you set.
>  
>
>>
>> One thing we have observed is that this only affects CentOS 6.x hosts; we 
>> also have Ubuntu 14.x hosts, and they do not experience these problems.  
>> We've also tried different versions of the puppet agent along with 
>> different versions of Ruby, and none of them had any effect: the condition 
>> persists regardless.  Finally, this does not seem to affect all of our 
>> CentOS 6.x hosts, only certain ones, apparently at random.
>>
>
>
> There is surely some pattern to which machines are affected and which 
> not.  Discovering that pattern would be a big step in solving the problem.
>
>  
>
>> Running puppet agent in debug mode does not seem to uncover what's going 
>> on, as it just hangs when it gets to the "exec".
>>
>>
>
> You could try running Puppet under strace to get a low-level view of 
> exactly what Puppet gets stuck on.  Nevertheless, if the problem sticks to 
> particular computers across different Puppet versions and different Ruby 
> versions, then the root of the problem must be outside Puppet itself.
>
> You could compare the lists of installed packages between an affected 
> machine and a non-affected one.  Perhaps the problem is caused by a 
> specific package or package version.
>
> You should compare the catalogs applied to the machines that suffer from 
> this problem with those for the machines that are not affected.  It may 
> help to narrow down the problem if you find that it is associated with a 
> small number of specific resources.
>
> You should check how Puppet is running on affected machines vs. 
> non-affected ones.  Is it running as a privileged user?  The same one?
>
>  
>
>> It's very annoying and actually dangerous in some cases as the puppet 
>> agent will continue spawning multiple "applying configuration" processes 
>> which will cause hosts to swap memory as each takes up more and more memory 
>> and in some instances will hose them entirely.  
>>
>
>
> Have you actually observed that behavior?  If so, then something is 
> dreadfully wrong.  Puppet should never start a new catalog run when one is 
> already underway.  It has safeguards in place to prevent that.  If you have 
> stumbled across a way in which those can be circumvented, then I'm sure 
> PuppetLabs would appreciate a bug report.
>
>  
>
>> We've had to remove these manifests that cause these conditions in the 
>> interim but right now we have a lot of hosts we need to manage with puppet 
>> so we need to be able to use this.
>>
>> Basic info on the hosts in question:
>>
>>    - Puppet: 3.7.5
>>    - Ruby: 2.1.2
>>    - CentOS 6.6
>>
>> Anyone have any ideas as to what could be causing this?
>>
>>
>
> You haven't given us much to work with, and I, at least, have never before 
> heard of such an issue.  I do not know what is causing it, but I suggest 
> you try narrowing down the configuration being applied to one of the 
> affected nodes to find a minimal set that is sufficient to trigger the 
> issue.  For example, if you apply only class Timezone::Utc, is that 
> sufficient to cause puppet to exhibit the problem?  Please provide the 
> actual manifests involved.
>
>
> John
>
>
