On Friday, January 6, 2012 5:31:34 PM UTC+1, jcbollinger wrote:
>
>
> Nothing in your log suggests that the Puppet agent is doing any work
> when it fails. It appears to apply a catalog successfully, then
> create a report successfully, then nothing else. That doesn't seem
> like a problem in a module. Nevertheless, you could try removing
> classes from the affected node's configuration and testing whether
> Puppet still freezes.
>
John, thanks for your reply. I'll deploy a node that includes no modules
at all and see whether a zombie process appears again.
> You said the agent runs for several hours before it hangs. Does it
> perform multiple successful runs during that time? That also would
> tend to counterindicate a problem in your manifests.
>
Yes, the agents perform several runs (with no changes to the catalog) and
then simply freeze up, waiting for the defunct sh process to return.
> I'm suspicious that something else on your systems is interfering with
> the Puppet process; some kind of service manager, for example. You'll
> have to say whether that's a reasonable guess. Alternatively, you may
> have a system-level bug; there have been a few Ruby bugs and kernel
> regressions that interfered with Puppet operation.
>
Those are all pretty plain Ubuntu 10.04.3 server installations (both i386
and x86_64), especially the ones I deployed this week, which aren't in
production yet. What kind of service manager could there even be that
interferes?
> You could try using strace to determine where the failure happens,
> though that's not as simple as it may sound.
>
Simply trying to strace the zombie process only results in an "Operation
not permitted" error. Stracing the agent process instead shows these lines
repeatedly:
Process 3741 attached - interrupt to quit
select(8, [7], NULL, NULL, {1, 723393}) = 0 (Timeout)
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
select(8, [7], NULL, NULL, {2, 0}) = 0 (Timeout)
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
...
That doesn't tell me anything other than that the puppet agent is blocking
on select() with a timeout of two seconds.
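
Since strace cannot attach to a process that has already exited, one way to at least pin down which process owns the defunct sh is to read the state and parent PID out of /proc. The following is only a rough sketch under my own assumptions: the helper names parse_stat and find_zombies are made up for illustration, and it relies on the Linux /proc/<pid>/stat layout ("pid (comm) state ppid ...").

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state, ppid) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    fields = stat_line[close_paren + 1:].split()
    return pid, comm, fields[0], int(fields[1])


def find_zombies(proc_root="/proc"):
    """List (pid, comm, ppid) for every process in state 'Z' (defunct)."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                pid, comm, state, ppid = parse_stat(stat_file.read())
        except (IOError, OSError, ValueError, IndexError):
            continue  # process vanished while scanning, or unreadable
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies


if __name__ == "__main__":
    for pid, comm, ppid in find_zombies():
        print("zombie pid=%d (%s) parent=%d" % (pid, comm, ppid))
```

Run on an affected node, this should print one line per defunct process together with its parent PID, which can then be compared with the PID of the hanging agent.
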
> You could also try just sidestepping the problem by using cron to
> launch puppetd --runonce at your desired intervals, instead of leaving
> puppetd running in daemon mode. A fair number of people seem to run
> Puppet that way, and it has some advantages.
>
Thanks, that's a good idea that I will probably have to resort to if the
problem doesn't go away.
Andreas