Issue #4681 has been updated by Markus Roberts.

Status changed from Investigating to Accepted

Looking at the code in question:

<pre>
  def timeout(sec, klass = nil)
    return yield if sec == nil or sec.zero?
    raise ThreadError, "timeout within critical session" if Thread.critical
    exception = klass || Class.new(ExitException)
    begin
      x = Thread.current
      y = Thread.start {
        begin
          sleep sec
        rescue => e
          x.raise e
        else
          x.raise exception, "execution expired" if x.alive?
        end
      }
      yield sec
      #    return true
    rescue exception => e
      [snip]
    ensure
      if y and y.alive?
        y.kill
        y.join # make sure y is dead.
      end
    end
</pre>

it appears ruby isn't being sufficiently paranoid with regard to thread 
critical and we (or another library) aren't being sufficiently careful.  It's 
pretty easy to construct a case in which the time-out doesn't occur:

<pre>
require "timeout"

begin
  Timeout::timeout(1) {
    p "foo"
    sleep 200
  }
rescue Timeout::Error
  puts "timed-out"
end

begin
  Timeout::timeout(1) {
    Thread.critical = true
    p "bar"
    sleep 200
  }
rescue Timeout::Error
  puts "timed-out"
end
</pre>

but that's still short of explaining the hang as described, in that it would 
still require the underlying code to hang and add the additional complication 
(clue?) that this happen in a critical region.

The next question is probably **how can run_and_capture return while still 
leaving things in an odd state (e.g. Thread.critical = true, dangling joins, 
or...)?**

----------------------------------------
Bug #4681: Puppet  2.6.1.r2 randomly hangs on Exec in FreeBSD
http://projects.puppetlabs.com/issues/4681

Author: Fredrik Eriksson
Status: Accepted
Priority: Normal
Assignee: Markus Roberts
Category: FreeBSD
Target version: 
Affected version: 0.25.5
Keywords: 
Branch: 


relevant info from pkg_info:
 
* ruby-1.8.7.248_3,1 (compiled with debug)
* puppet-2.6.1.r2
* facter-1.5.7_1

<pre>
uname -a
FreeBSD localhost 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Wed Jul 28 11:44:21 UTC 
2010     r...@localhost:/usr/obj/usr/src/sys/GENERIC  amd64
</pre>

I did also see this problem in FreeBSD 8.0 and puppet 0.25. Puppet does not 
hang every time it's run, only on random Exec statements. When debugging this I 
simply created an endless loop which ran the puppet configuration until it 
hang. Sometimes it would only run a few times before it hang and sometimes it 
run for hours. The manifest is quite big, but contains some sensitive stuff so 
I can't upload it here, but this is probably reproducable with any manifest 
that uses a great deal of Exec statements.

A few examples of output when it hang with command 'ruby18 --debug 
/usr/local/bin/puppet agent --test --debug --trace':

<pre>
/usr/local/lib/ruby/site_ruby/1.8/puppet/type.rb:1881: warning: instance 
variable @virtual not initialized
/usr/local/lib/ruby/site_ruby/1.8/puppet/type.rb:1881: warning: instance 
variable @virtual not initialized
/usr/local/lib/ruby/site_ruby/1.8/puppet/type.rb:1881: warning: instance 
variable @virtual not initialized
debug: /Stage[main]/Webbase::Backup/Line[sudo_backup]/Exec[echo 'backup  ALL= ( 
root )   NOPASSWD: /usr/sbin/backup' >> '/usr/local/etc/sudoers']: Executing 
check '/usr/bin/grep -qFx 'backup     ALL= ( root )   NOPASSWD: 
/usr/sbin/backup' '/usr/local/etc/sudoers''
debug: Executing '/usr/bin/grep -qFx 'backup     ALL= ( root )   NOPASSWD: 
/usr/sbin/backup' '/usr/local/etc/sudoers''
removing /tmp/puppet20100901-36732-16mgs1p-0...done
[here puppet hangs]
</pre>

<pre>
debug: /Stage[main]/Webbase::Apache/Exec[apache_purge]: Executing check 
'/bin/test -e /usr/local/etc/apache/ssl.crt'
debug: Executing '/bin/test -e /usr/local/etc/apache/ssl.crt'
removing /tmp/puppet20100901-50535-1p6j13j-0...done
[here puppet hangs]
</pre>

<pre>
/usr/local/lib/ruby/site_ruby/1.8/puppet/type.rb:1881: warning: instance 
variable @virtual not initialized
debug: /Stage[main]/Webbase::Nas/Exec[/sbin/mount /proc]: Executing check 
'/sbin/mount | /usr/bin/egrep ^procfs'
debug: Executing '/sbin/mount | /usr/bin/egrep ^procfs'
removing /tmp/puppet20100901-95974-1n5jg13-0...done 
[here puppet hangs]
</pre>

Since I have very little experience debugging stuff, and no previous experience 
with ruby I just hacked in a few debug statements in type/exec.rb, and the 
relevant code appears to be this code from the run function:

<pre>
          withenv environment do
            Timeout::timeout(self[:timeout]) do
              output, status = Puppet::Util::SUIDManager.run_and_capture(
                [command], self[:user], self[:group]
              )
              Puppet.debug "returned from run_and_capture #{self[:timeout]}"
            end
            Puppet.debug "timeout ended"
            # The shell returns 127 if the command is missing.
            if status.exitstatus == 127
              Puppet.debug "argument error"
              raise ArgumentError, output
            end
            Puppet.debug "no argument error"
          end
</pre>

When puppet hangs it is after 
<pre>Puppet.debug "returned from run_and_capture #{self[:timeout]}"</pre>
but before 
<pre>Puppet.debug "timeout ended"</pre>

Since I don't really have time to figure out how the timeout -> do -> end thing 
works I'll just file this bug here so hopefully someone can say if this is a 
bug in puppet or an upstream bug in ruby. Let me know if you need any more info.


-- 
You have received this notification because you have either subscribed to it, 
or are involved in it.
To change your notification preferences, please click here: 
http://projects.puppetlabs.com/my/account

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Bugs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-bugs?hl=en.

Reply via email to