[Puppet Users] Re: Command Exceeded Timeout on Basic Shell Commands

2015-06-29 Thread Franck
In case anyone is interested, we figured out what was causing this issue. 

Being that we run a Ruby on Rails application and naturally Puppet runs on 
Ruby as well, we actually load TC Malloc memory allocator for our 
application.  By running one of the "applying configuration" processes in 
gstack, we were able to discover that it was getting stuck on 
libtcmalloc.so.  

The version of TC Malloc we were running that caused this issue was 4.2.0 
but since upgrading to latest 4.2.6 we have not experienced another 
occurrence of this issue.  

On Monday, June 15, 2015 at 10:12:03 PM UTC-4, Franck wrote:
>
> We've been experiencing a lot of "Command exceeded timeouts" on basic 
> shell commands using the "exec" type for tasks that should execute fairly 
> fast: 
>
> Jun 15 15:45:44 host1 puppet-agent[57648]: 
> (/Stage[main]/Timezone::Utc/Exec[/bin/rm -f /etc/localtime && /bin/ln -s 
> /usr/share/zoneinfo/UTC /etc/localtime]) Command exceeded timeout
> Jun 10 21:15:24 host1 puppet-agent[57081]: 
> (/Stage[main]/Open-vm-tools::Package/Exec[/usr/bin/
> vmware-uninstall-tools.pl]/onlyif) Check "/usr/bin/test -f /usr/bin/
> vmware-uninstall-tools.pl" exceeded timeout
> Jun 10 23:56:02 host1 puppet-agent[40286]: 
> (/Stage[main]/Open-vm-tools::Package/Exec[/usr/bin/yum install -y 
> open-vm-tools.x86_64]/unless) Check "/bin/rpm -q open-vm-tools" exceeded 
> timeout
>
> All these commands can be run locally to the host and return fairly 
> quickly, but when puppet executes them they time out. Extending the timeout 
> is an option but ridiculous since default is 300 seconds and none of these 
> commands should take 5 minutes or more to return.  
>
> Some of the things observed is that this only affects CentOS 6.x hosts as 
> we also have Ubuntu 14.x hosts and they do not experience these problems. 
>  Also, we've played around with different versions of the puppet agent 
> along with different versions of Ruby and none of them had any effect as 
> this condition persists regardless.  Also, this does not seem to affect all 
> of our CentOS 6.x hosts but only certain ones -- randomly.  Running puppet 
> agent in debug mode does not seem to uncover what's going on as it just 
> hangs when it gets to the "exec".
>
> It's very annoying and actually dangerous in some cases as the puppet 
> agent will continue spawning multiple "applying configuration" processes 
> which will cause hosts to swap memory as each takes up more and more memory 
> and in some instances will hose them entirely.  
> We've had to remove these manifests that cause these conditions in the 
> interim but right now we have a lot of hosts we need to manage with puppet 
> so we need to be able to use this.
>
> Basic info on the hosts in question:
>
>- Puppet: 3.7.5
>- Ruby: 2.1.2
>- CentOS 6.6
>
> Anyone have any ideas as to what could be causing this?
>
> Thanks.
> Franck
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to puppet-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-users/64526946-314f-46c2-8ba5-d7fbf1ee09e8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[Puppet Users] Re: Command Exceeded Timeout on Basic Shell Commands

2015-06-23 Thread 'Dan Finn' via Puppet Users
Just noticed something else interesting.  When running puppet apply or 
puppet agent and it doesn't hang, puppet only spawns 1 process and there is 
no puppet process left running afterwards, it seems to clean itself up and 
die off.  When puppet runs and hangs, it spawns 2 processes each looking 
identical with the same name and even though the output from puppet apply 
or agent says it finished running (after letting it timeout on the hang) it 
leaves 1 process behind still running.  I attached strace to this running 
process and the output looks like so:

pid 61271] futex(0x7f5e2454b6a0, FUTEX_WAIT_PRIVATE, 2, {0, 263395792}) = 
-1 ETIMEDOUT (Connection timed out)
[pid 61271] futex(0x7f5e2454b6a0, FUTEX_WAIT_PRIVATE, 2, {0, 248281920}) = 
-1 ETIMEDOUT (Connection timed out)
[pid 61271] futex(0x7f5e2454b6a0, FUTEX_WAIT_PRIVATE, 2, {0, 170310944}) = 
-1 ETIMEDOUT (Connection timed out)
[pid 61271] futex(0x7f5e2454b6a0, FUTEX_WAIT_PRIVATE, 2, {0, 16576}) = 
-1 ETIMEDOUT (Connection timed out)
[pid 61271] futex(0x7f5e2454b6a0, FUTEX_WAIT_PRIVATE, 2, {0, 244956432}) = 
-1 ETIMEDOUT (Connection timed out)
[pid 61271] futex(0x7f5e2454b6a0, FUTEX_WAIT_PRIVATE, 2, {0, 141958480}) = 
-1 ETIMEDOUT (Connection timed out)
[pid 61271] futex(0x7f5e2454b6a0, FUTEX_WAIT_PRIVATE, 2, {0, 22737840}) = 
-1 ETIMEDOUT (Connection timed out)
[pid 61271] futex(0x7f5e2454b6a0, FUTEX_WAIT_PRIVATE, 2, {0, 78315952}) = 
-1 ETIMEDOUT (Connection timed out)

and as best as I can tell it would just keep repeating that indefinitely.

This seems quite odd to me and has to be related in some way but I'm not 
sure how.

On Tuesday, June 23, 2015 at 3:45:20 PM UTC-6, Dan Finn wrote:
>
> Hello,
>>
>
> I am Franck's coworker.  I spent some time looking into this today and 
> have some more info but unfortunately am still really lost as to what is 
> going on or what is causing this.
>
> Here is the output captured from a puppet agent -t --debug run where a 
> node hangs:
>
>
> https://github.com/danfinn/errors/blob/master/puppet_debug_output_during_hang.txt
>
> And here is the output from strace captured during the same host hanging 
> at the same point in the manifest (hangs at line 3441):
>
>
> https://github.com/danfinn/errors/blob/master/puppet_hang_strace_no_follow_forks.txt
>  
>
> I understand that this strace would be more helpful with -ff enabled but 
> unfortunately I couldn't get puppet to run like that, I'm not sure why but 
> it generated a bunch of errors and never ran the manifest.
>
> One thing I noticed is that while puppet is hanging running /usr/bin/test, 
> or at least it is telling me that it's running that and waiting, that 
> process does not show up in the command list.  Also, running that identical 
> /usr/bin/test command at the same time that puppet is hanging on it works 
> fine, it returns very quickly.
>
> I was able to get this test system to hang both via puppet agent and 
> running the manifest locally with puppet apply.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to puppet-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-users/5949b475-91fb-463a-9ed8-4e4e672aaee6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[Puppet Users] Re: Command Exceeded Timeout on Basic Shell Commands

2015-06-23 Thread 'Dan Finn' via Puppet Users

>
> Hello,
>

I am Franck's coworker.  I spent some time looking into this today and have 
some more info but unfortunately am still really lost as to what is going 
on or what is causing this.

Here is the output captured from a puppet agent -t --debug run where a node 
hangs:

https://github.com/danfinn/errors/blob/master/puppet_debug_output_during_hang.txt

And here is the output from strace captured during the same host hanging at 
the same point in the manifest (hangs at line 3441):

https://github.com/danfinn/errors/blob/master/puppet_hang_strace_no_follow_forks.txt
 

I understand that this strace would be more helpful with -ff enabled but 
unfortunately I couldn't get puppet to run like that, I'm not sure why but 
it generated a bunch of errors and never ran the manifest.

One thing I noticed is that while puppet is hanging running /usr/bin/test, 
or at least it is telling me that it's running that and waiting, that 
process does not show up in the command list.  Also, running that identical 
/usr/bin/test command at the same time that puppet is hanging on it works 
fine, it returns very quickly.

I was able to get this test system to hang both via puppet agent and 
running the manifest locally with puppet apply.

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to puppet-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-users/aa1469c2-6b6d-4091-8c7a-6605dbec86d1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[Puppet Users] Re: Command Exceeded Timeout on Basic Shell Commands

2015-06-20 Thread Franck
Thank you for your response John.

You made some good suggestions especially with running it in strace to 
retrieve more information. 

In regards to your question, whether we have observed the behavior of 
puppet spawning multiple processes, I can definitely confirm this.  When 
some of our nodes suffer from this, we are alerted to memory hitting a 
threshold and when we look at the process on these nodes, we clearly see 
multiple "applying configurations" in the list of processes.  At that 
point, we stop the puppet daemon and kill all the "applying configurations".

We'll try retrieving some more information this week to see if we can 
narrow down some more possibilities.  This does seem to affect a few random 
nodes but on a consistent basis so we can definitely reproduce this.  

Thanks again for your response.

Franck 

On Wednesday, June 17, 2015 at 9:19:40 AM UTC-4, jcbollinger wrote:
>
>
>
> On Monday, June 15, 2015 at 9:12:03 PM UTC-5, Franck wrote:
>>
>> We've been experiencing a lot of "Command exceeded timeouts" on basic 
>> shell commands using the "exec" type for tasks that should execute fairly 
>> fast: 
>>
>> Jun 15 15:45:44 host1 puppet-agent[57648]: 
>> (/Stage[main]/Timezone::Utc/Exec[/bin/rm -f /etc/localtime && /bin/ln -s 
>> /usr/share/zoneinfo/UTC /etc/localtime]) Command exceeded timeout
>> Jun 10 21:15:24 host1 puppet-agent[57081]: 
>> (/Stage[main]/Open-vm-tools::Package/Exec[/usr/bin/
>> vmware-uninstall-tools.pl]/onlyif) Check "/usr/bin/test -f /usr/bin/
>> vmware-uninstall-tools.pl" exceeded timeout
>> Jun 10 23:56:02 host1 puppet-agent[40286]: 
>> (/Stage[main]/Open-vm-tools::Package/Exec[/usr/bin/yum install -y 
>> open-vm-tools.x86_64]/unless) Check "/bin/rpm -q open-vm-tools" exceeded 
>> timeout
>>
>> All these commands can be run locally to the host and return fairly 
>> quickly, but when puppet executes them they time out.
>>
>
>
> Very strange.
>
>  
>
>> Extending the timeout is an option but ridiculous since default is 300 
>> seconds and none of these commands should take 5 minutes or more to return. 
>>  
>>
>
>
> No, probably not a viable option.  If these particular commands are not 
> completing within the standard timeout, then there's no particular reason 
> to think that they would *ever* complete, no matter what timeout you set.
>  
>
>>
>> Some of the things observed is that this only affects CentOS 6.x hosts as 
>> we also have Ubuntu 14.x hosts and they do not experience these problems. 
>>  Also, we've played around with different versions of the puppet agent 
>> along with different versions of Ruby and none of them had any effect as 
>> this condition persists regardless.  Also, this does not seem to affect all 
>> of our CentOS 6.x hosts but only certain ones -- randomly.
>>
>
>
> There is surely some pattern to which machines are affected and which 
> not.  Discovering that pattern would be a big step in solving the problem.
>
>  
>
>>  Running puppet agent in debug mode does not seem to uncover what's going 
>> on as it just hangs when it gets to the "exec".
>>
>>
>
> You could try running Puppet under strace to get a low-level view of 
> exactly what Puppet gets stuck on.  Nevertheless, if the problem sticks to 
> particular computers across different Puppet versions and different Ruby 
> versions, then the root of the problem must be outside Puppet itself.
>
> You could compare the lists of installed packages between an affected 
> machine and a non-affected one.  Perhaps the problem is caused by a 
> specific package or package version.
>
> You should compare the catalogs applied to the machines that suffer from 
> this problem with those for the machines that are not affected.  It may 
> help to narrow down the problem if you find that it is associated with a 
> small number of specific resources.
>
> You should check how Puppet is running on affected machines vs. 
> non-affected ones.  Is it running as a privileged user?  The same one?
>
>  
>
>> It's very annoying and actually dangerous in some cases as the puppet 
>> agent will continue spawning multiple "applying configuration" processes 
>> which will cause hosts to swap memory as each takes up more and more memory 
>> and in some instances will hose them entirely.  
>>
>
>
> Have you actually observed that behavior?  If so, then something is 
> dreadfully wrong.  Puppet should never start a new catalog run when one is 
> already underway.  It has safeguards in place to prevent that.  If you have 
> stumbled across a way in which those can be circumvented, then I'm sure 
> PuppetLabs would appreciate a bug report.
>
>  
>
>> We've had to remove these manifests that cause these conditions in the 
>> interim but right now we have a lot of hosts we need to manage with puppet 
>> so we need to be able to use this.
>>
>> Basic info on the hosts in question:
>>
>>- Puppet: 3.7.5
>>- Ruby: 2.1.2
>>- CentOS 6.6
>>
>> Anyone have any ideas as to what could be causing this?
>>
>>
>
>

[Puppet Users] Re: Command Exceeded Timeout on Basic Shell Commands

2015-06-17 Thread jcbollinger


On Monday, June 15, 2015 at 9:12:03 PM UTC-5, Franck wrote:
>
> We've been experiencing a lot of "Command exceeded timeouts" on basic 
> shell commands using the "exec" type for tasks that should execute fairly 
> fast: 
>
> Jun 15 15:45:44 host1 puppet-agent[57648]: 
> (/Stage[main]/Timezone::Utc/Exec[/bin/rm -f /etc/localtime && /bin/ln -s 
> /usr/share/zoneinfo/UTC /etc/localtime]) Command exceeded timeout
> Jun 10 21:15:24 host1 puppet-agent[57081]: 
> (/Stage[main]/Open-vm-tools::Package/Exec[/usr/bin/
> vmware-uninstall-tools.pl]/onlyif) Check "/usr/bin/test -f /usr/bin/
> vmware-uninstall-tools.pl" exceeded timeout
> Jun 10 23:56:02 host1 puppet-agent[40286]: 
> (/Stage[main]/Open-vm-tools::Package/Exec[/usr/bin/yum install -y 
> open-vm-tools.x86_64]/unless) Check "/bin/rpm -q open-vm-tools" exceeded 
> timeout
>
> All these commands can be run locally to the host and return fairly 
> quickly, but when puppet executes them they time out.
>


Very strange.

 

> Extending the timeout is an option but ridiculous since default is 300 
> seconds and none of these commands should take 5 minutes or more to return. 
>  
>


No, probably not a viable option.  If these particular commands are not 
completing within the standard timeout, then there's no particular reason 
to think that they would *ever* complete, no matter what timeout you set.
 

>
> Some of the things observed is that this only affects CentOS 6.x hosts as 
> we also have Ubuntu 14.x hosts and they do not experience these problems. 
>  Also, we've played around with different versions of the puppet agent 
> along with different versions of Ruby and none of them had any effect as 
> this condition persists regardless.  Also, this does not seem to affect all 
> of our CentOS 6.x hosts but only certain ones -- randomly.
>


There is surely some pattern to which machines are affected and which not.  
Discovering that pattern would be a big step in solving the problem.

 

>  Running puppet agent in debug mode does not seem to uncover what's going 
> on as it just hangs when it gets to the "exec".
>
>

You could try running Puppet under strace to get a low-level view of 
exactly what Puppet gets stuck on.  Nevertheless, if the problem sticks to 
particular computers across different Puppet versions and different Ruby 
versions, then the root of the problem must be outside Puppet itself.

You could compare the lists of installed packages between an affected 
machine and a non-affected one.  Perhaps the problem is caused by a 
specific package or package version.

You should compare the catalogs applied to the machines that suffer from 
this problem with those for the machines that are not affected.  It may 
help to narrow down the problem if you find that it is associated with a 
small number of specific resources.

You should check how Puppet is running on affected machines vs. 
non-affected ones.  Is it running as a privileged user?  The same one?

 

> It's very annoying and actually dangerous in some cases as the puppet 
> agent will continue spawning multiple "applying configuration" processes 
> which will cause hosts to swap memory as each takes up more and more memory 
> and in some instances will hose them entirely.  
>


Have you actually observed that behavior?  If so, then something is 
dreadfully wrong.  Puppet should never start a new catalog run when one is 
already underway.  It has safeguards in place to prevent that.  If you have 
stumbled across a way in which those can be circumvented, then I'm sure 
PuppetLabs would appreciate a bug report.

 

> We've had to remove these manifests that cause these conditions in the 
> interim but right now we have a lot of hosts we need to manage with puppet 
> so we need to be able to use this.
>
> Basic info on the hosts in question:
>
>- Puppet: 3.7.5
>- Ruby: 2.1.2
>- CentOS 6.6
>
> Anyone have any ideas as to what could be causing this?
>
>

You haven't given us much to work with, and I, at least, have never before 
heard of such an issue.  I do not know what is causing it, but I suggest 
you try narrowing down the configuration being applied to one of the 
affected nodes to find a minimal set that is sufficient to trigger the 
issue.  For example, if you apply only class Timezone::Utc, is that 
sufficient to cause puppet to exhibit the problem?  Please provide the 
actual manifests involved.


John

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to puppet-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-users/2b617318-fe37-44d7-8461-b36ceadccde9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.