Issue #4681 has been updated by Markus Roberts.
Status changed from Investigating to Accepted
Looking at the code in question:
<pre>
def timeout(sec, klass = nil)
return yield if sec == nil or sec.zero?
raise ThreadError, "timeout within critical session" if Thread.critical
exception = klass || Class.new(ExitException)
begin
x = Thread.current
y = Thread.start {
begin
sleep sec
rescue => e
x.raise e
else
x.raise exception, "execution expired" if x.alive?
end
}
yield sec
# return true
rescue exception => e
[snip]
ensure
if y and y.alive?
y.kill
y.join # make sure y is dead.
end
end
</pre>
it appears ruby isn't being sufficiently paranoid with regard to thread
critical and we (or another library) aren't being sufficiently careful. It's
pretty easy to construct a case in which the time-out doesn't occur:
<pre>
require "timeout"
begin
Timeout::timeout(1) {
p "foo"
sleep 200
}
rescue Timeout::Error
puts "timed-out"
end
begin
Timeout::timeout(1) {
Thread.critical = true
p "bar"
sleep 200
}
rescue Timeout::Error
puts "timed-out"
end
</pre>
but that's still short of explaining the hang as described, in that it would
still require the underlying code to hang and add the additional complication
(clue?) that this happen in a critical region.
The next question is probably **how can run_and_capture return while still
leaving things in an odd state (e.g. Thread.critical = true, dangling joins,
or...)?**
----------------------------------------
Bug #4681: Puppet 2.6.1.r2 randomly hangs on Exec in FreeBSD
http://projects.puppetlabs.com/issues/4681
Author: Fredrik Eriksson
Status: Accepted
Priority: Normal
Assignee: Markus Roberts
Category: FreeBSD
Target version:
Affected version: 0.25.5
Keywords:
Branch:
relevant info from pkg_info:
* ruby-1.8.7.248_3,1 (compiled with debug)
* puppet-2.6.1.r2
* facter-1.5.7_1
<pre>
uname -a
FreeBSD localhost 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Wed Jul 28 11:44:21 UTC
2010 r...@localhost:/usr/obj/usr/src/sys/GENERIC amd64
</pre>
I did also see this problem in FreeBSD 8.0 and puppet 0.25. Puppet does not
hang every time it's run, only on random Exec statements. When debugging this I
simply created an endless loop which ran the puppet configuration until it
hang. Sometimes it would only run a few times before it hang and sometimes it
run for hours. The manifest is quite big, but contains some sensitive stuff so
I can't upload it here, but this is probably reproducable with any manifest
that uses a great deal of Exec statements.
A few examples of output when it hang with command 'ruby18 --debug
/usr/local/bin/puppet agent --test --debug --trace':
<pre>
/usr/local/lib/ruby/site_ruby/1.8/puppet/type.rb:1881: warning: instance
variable @virtual not initialized
/usr/local/lib/ruby/site_ruby/1.8/puppet/type.rb:1881: warning: instance
variable @virtual not initialized
/usr/local/lib/ruby/site_ruby/1.8/puppet/type.rb:1881: warning: instance
variable @virtual not initialized
debug: /Stage[main]/Webbase::Backup/Line[sudo_backup]/Exec[echo 'backup ALL= (
root ) NOPASSWD: /usr/sbin/backup' >> '/usr/local/etc/sudoers']: Executing
check '/usr/bin/grep -qFx 'backup ALL= ( root ) NOPASSWD:
/usr/sbin/backup' '/usr/local/etc/sudoers''
debug: Executing '/usr/bin/grep -qFx 'backup ALL= ( root ) NOPASSWD:
/usr/sbin/backup' '/usr/local/etc/sudoers''
removing /tmp/puppet20100901-36732-16mgs1p-0...done
[here puppet hangs]
</pre>
<pre>
debug: /Stage[main]/Webbase::Apache/Exec[apache_purge]: Executing check
'/bin/test -e /usr/local/etc/apache/ssl.crt'
debug: Executing '/bin/test -e /usr/local/etc/apache/ssl.crt'
removing /tmp/puppet20100901-50535-1p6j13j-0...done
[here puppet hangs]
</pre>
<pre>
/usr/local/lib/ruby/site_ruby/1.8/puppet/type.rb:1881: warning: instance
variable @virtual not initialized
debug: /Stage[main]/Webbase::Nas/Exec[/sbin/mount /proc]: Executing check
'/sbin/mount | /usr/bin/egrep ^procfs'
debug: Executing '/sbin/mount | /usr/bin/egrep ^procfs'
removing /tmp/puppet20100901-95974-1n5jg13-0...done
[here puppet hangs]
</pre>
Since I have very little experience debugging stuff, and no previous experience
with ruby I just hacked in a few debug statements in type/exec.rb, and the
relevant code appears to be this code from the run function:
<pre>
withenv environment do
Timeout::timeout(self[:timeout]) do
output, status = Puppet::Util::SUIDManager.run_and_capture(
[command], self[:user], self[:group]
)
Puppet.debug "returned from run_and_capture #{self[:timeout]}"
end
Puppet.debug "timeout ended"
# The shell returns 127 if the command is missing.
if status.exitstatus == 127
Puppet.debug "argument error"
raise ArgumentError, output
end
Puppet.debug "no argument error"
end
</pre>
When puppet hangs it is after
<pre>Puppet.debug "returned from run_and_capture #{self[:timeout]}"</pre>
but before
<pre>Puppet.debug "timeout ended"</pre>
Since I don't really have time to figure out how the timeout -> do -> end thing
works I'll just file this bug here so hopefully someone can say if this is a
bug in puppet or an upstream bug in ruby. Let me know if you need any more info.
--
You have received this notification because you have either subscribed to it,
or are involved in it.
To change your notification preferences, please click here:
http://projects.puppetlabs.com/my/account
--
You received this message because you are subscribed to the Google Groups
"Puppet Bugs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/puppet-bugs?hl=en.