On Thu, Sep 30, 2010 at 7:37 AM, Brice Figureau
<[email protected]> wrote:
> On Thu, 2010-09-30 at 06:01 -0700, Nigel Kersten wrote:
>> On Thu, Sep 30, 2010 at 1:21 AM, Brice Figureau
>> <[email protected]> wrote:
>> > On Wed, 2010-09-29 at 17:32 -0700, Jason Wright wrote:
>> >> On Wed, Sep 29, 2010 at 1:54 PM, Brice Figureau
>> >> <[email protected]> wrote:
>> >> > It would be great if you could add some debug statements to the
>> >> > lib/puppet/indirector/yaml.rb file around line 22 to show what the YAML
>> >> > look like, and/or what cache it was trying to load.
>> >>
>> >> I added
>> >>
>> >> Puppet.debug("FOO: failed to read YAML from #{file}") if yaml.nil?
>> >> or yaml.to_s == ""
>> >>
>> >> at line 19 of puppet/indirector/yaml.rb and it's logging when I run
>> >> puppet-load so it looks like something is failing in readlock().
>> >
>> > Yes that was my gut feeling too.
>> > I think part of the issue is that puppet-load asks always for the same
>> > node. In real world setups it is improbable that the master has to
>> > answer the same question at exactly the same time.
>> > So I think there is a race in the indirector yaml caching subsystem. It
>> > looks like readlock and writelock are not doing their job.
>
> I found several issues that are worth looking into:
>
> 1) Puppet::Util.sync doesn't seem thread-safe
> Two threads can enter this method at the same time for the same
> resource. Thus it might be possible to exit with two different Sync
> instance for the same resource. There are low chance with MRI
> green-threading, but this can happen under JRuby. Which means a thread
> can write the file at the same time another can read it (flock is per
> process and shouldn't lock a given thread).
>
> 2) lib/puppet/external/lock.rb seems incomplete
> Notice how the lock_shared part does flock(LOCK_UN) only based on
> $reader_count which is never incremented (you can compare with the
> original version linked in the comment).
> So basically we never unlock our read locks :)
> I suppose that closing the file is enough to remove the lock
> (hopefully).
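To make the two quoted points concrete, here is a minimal sketch of the
pattern they describe. It is not the actual Puppet code: LockSketch, sync
and read_locked are hypothetical names, and it assumes that
Puppet::Util.sync hands out one lock object per resource from a shared
table. A plain Mutex stands in for the Sync object, so readers are
serialized here, which is a simplification.

```ruby
require 'thread' # Mutex; needed on Ruby 1.8, harmless on later versions

# Hypothetical stand-in for the per-resource lock table; not Puppet's code.
module LockSketch
  @locks = {}
  @guard = Mutex.new

  # Return the single lock object for +resource+, creating it on first use.
  # The guard makes lookup-or-create atomic, so two threads cannot end up
  # with two different lock objects for the same resource (point 1).
  def self.sync(resource)
    @guard.synchronize { @locks[resource] ||= Mutex.new }
  end

  # Read +path+ under the in-process lock and a shared flock.
  # The flock is always released in the ensure clause instead of depending
  # on a reader counter that may never be incremented (point 2).
  def self.read_locked(path)
    sync(path).synchronize do
      File.open(path, 'r') do |f|
        f.flock(File::LOCK_SH)
        begin
          yield f
        ensure
          f.flock(File::LOCK_UN)
        end
      end
    end
  end
end
```

Usage would look like LockSketch.read_locked(file) { |f| YAML.load(f) },
with a matching writer that takes the same in-process lock plus
File::LOCK_EX.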
Nice catch. We should summarize this into a bug report...

>
> I think if someone beside Jason, Nigel and me could have a look to this
> issue, that would be great (this is a hint for the PL team) :)
>
> I'll try to reproduce it on my side if I can achieve the same
> concurrency as you have (I don't have any powerful test machines, nor
> any load balancers :)).

I'm pretty sure Jason is running his tests on a VMware instance on his
desktop before we extrapolate to either our Xen production servers or
bare-metal tests, so I don't think it's *that* powerful a machine.

>
>> > Can you summarize on what os/filesystems type/ruby versions you are
>> > running your master?
>> >
>> > Hmm, could it be that the node yaml (ie $yamldir) is on NFS or any
>> > filesystem that have issues with file locks?
>>
>> Just to avoid the timezone round trip because I woke up early :) Jason
>> will either be benchmarking on Ubuntu Hardy or Lucid, and I think he's
>> just on the standard Ruby versions there at the moment.
>>
>> Probably 1.8.6.111-2ubuntu1.3 or 1.8.7.249-2
>
> OK, nothing fancy, then.
>
>> They're definitely not on NFS.
>
> But can $vardir be on NFS or any unlockable filesystem?

Nope. We don't even enable NFS mounts on our puppet masters.

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/puppet-dev?hl=en.
