Issue #2211 has been updated by John Florian.
Great news! I now know how you can reproduce this and exactly what the problem
is. Please disregard the previous patch.
The fault actually lies with Facter::Util::Resolution, in the "value" function,
when a sub-process times out. The code tries to reap the zombie sub-process
that timed out, but doesn't necessarily reap only its own (see
facter/util/resolution.rb:128 and the Process.waitall). What I see is
effectively a race condition between the puppet thread that is looking for a
package provider (spawns a thread for 'rpm --version') and this zombie reaper
in facter (spawns a thread to Process.waitall). When facter launches a
sub-process for 'host #{hostname}' _and_ the network interface is down, the
host command takes about 10 seconds to time out. Facter sees that this is
taking too long and goes to reap it as a zombie. Unfortunately, Process.waitall
also happens to reap the sub-process for 'rpm --version' and thus steals that
exit code away from puppet, and BOOM, the whole thing goes ugly quickly after
that.
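To make the race concrete, here is a minimal stand-alone sketch of the
effect. It is not the Facter or puppet code: the method name is made up for
illustration, and the two forked children are only stand-ins for the
'host' lookup and the 'rpm --version' probe. It assumes a platform where
fork is available.

```ruby
# Illustrative only: the children below are stand-ins, not the real
# 'host' or 'rpm --version' invocations, and this method is hypothetical.

# Returns true when a blanket Process.waitall leaves nothing for the
# original owner of a child to collect.
def waitall_steals_status?
  # Stand-in for puppet's 'rpm --version' probe; exits with status 3.
  bystander = fork { exit!(3) }
  # Stand-in for the slow 'host' lookup that facter gives up on.
  slow = fork { sleep 0.2; exit!(0) }

  # The blanket reaper: collects every child of this process,
  # including the bystander it knows nothing about.
  reaped = Process.waitall.map { |pid, _status| pid }

  begin
    # The owner of the bystander now tries to read its exit code.
    Process.wait2(bystander)
    false
  rescue Errno::ECHILD
    # The status was already consumed by waitall.
    reaped.include?(bystander) && reaped.include?(slow)
  end
end
```

Once Process.waitall has run, the later wait on the bystander PID fails with
Errno::ECHILD, which is presumably the kind of failure puppet runs into.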
Please let me know if you need more info on how to reproduce this, but I
suspect you should have no difficulty at this point. It looks like the proper
solution might involve having Facter::Util::Resolution.exec() also return the
child PID so that the zombie reaper can wait for that PID specifically rather
than all PIDs.
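A rough sketch of that idea follows, under the assumption that the reaper
is handed the PID of the child it launched. The helper names are
hypothetical, and Facter::Util::Resolution.exec() does not currently return
a PID, so this shows the shape of the fix rather than a patch.

```ruby
# Hypothetical helpers for illustration; none of these exist in Facter.

# Reap exactly one child and return its exit status.
def reap_child(pid)
  _pid, status = Process.wait2(pid)
  status.exitstatus
end

# Returns the exit status the bystander's owner can still read after the
# slow child has been reaped by PID.
def targeted_reap_leaves_bystander
  # Stand-in for puppet's 'rpm --version' probe; exits with status 3.
  bystander = fork { exit!(3) }
  # Stand-in for the slow 'host' lookup that facter gives up on.
  slow = fork { sleep 0.2; exit!(0) }

  # The reaper waits only for the child it was told about.
  reap_child(slow)

  # The bystander's status is untouched and still available to its owner.
  reap_child(bystander)
end
```

Note that waiting on a specific PID blocks until that child exits, so a real
fix would probably also need to kill the timed-out child or keep the wait in
its own thread, as the existing reaper does.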
----------------------------------------
Bug #2211: puppet won't install packages if network interface does not have an
IP address bound
http://projects.reductivelabs.com/issues/2211
Author: John Florian
Status: Accepted
Priority: High
Assigned to: Luke Kanies
Category:
Target version: 0.25.0
Complexity: Unknown
Affected version: 0.24.8
Keywords:
It is no longer possible to have puppet install packages via yum/rpm if the
network interface is not bound to an IP address. Our use case requires using
puppet in the non-daemon mode and this is possible for us because the system
will have all necessary manifests and other necessary files locally. This
worked just fine with 0.24.6 on Fedora 10, but began failing upon the upgrade
to 0.24.8.
See the attachments for failure messages and a code diff that seems to have
introduced the regression. If I revert this one change, things work nicely
once again. It would look like a very simple fix if it weren't for the
ominous-looking comment in the code. :-)