On Thu, Sep 30, 2010 at 7:37 AM, Brice Figureau
<[email protected]> wrote:
> On Thu, 2010-09-30 at 06:01 -0700, Nigel Kersten wrote:
>> On Thu, Sep 30, 2010 at 1:21 AM, Brice Figureau
>> <[email protected]> wrote:
>> > On Wed, 2010-09-29 at 17:32 -0700, Jason Wright wrote:
>> >> On Wed, Sep 29, 2010 at 1:54 PM, Brice Figureau
>> >> <[email protected]> wrote:
>> >> > It would be great if you could add some debug statements to the
>> >> > lib/puppet/indirector/yaml.rb file around line 22 to show what the YAML
>> >> > looks like, and/or which cache it was trying to load.
>> >>
>> >> I added
>> >>
>> >>     Puppet.debug("FOO: failed to read YAML from #{file}") if yaml.nil? or yaml.to_s == ""
>> >>
>> >> at line 19 of puppet/indirector/yaml.rb and it's logging when I run
>> >> puppet-load so it looks like something is failing in readlock().
>> >
>> > Yes that was my gut feeling too.
>> > I think part of the issue is that puppet-load always asks for the same
>> > node. In real-world setups it is improbable that the master has to
>> > answer the same question at exactly the same time.
>> > So I think there is a race in the indirector yaml caching subsystem. It
>> > looks like readlock and writelock are not doing their job.
>
> I found several issues that are worth looking into:
>
> 1) Puppet::Util.sync doesn't seem thread-safe
> Two threads can enter this method at the same time for the same
> resource. Thus it might be possible to exit with two different Sync
> instances for the same resource. The chance is low with MRI
> green threading, but this can happen under JRuby. This means one thread
> can write the file at the same time another reads it (flock is per
> process and shouldn't lock a given thread).
>
> 2) lib/puppet/external/lock.rb seems incomplete
> Notice how the lock_shared part does flock(LOCK_UN) only based on
> $reader_count which is never incremented (you can compare with the
> original version linked in the comment).
> So basically we never unlock our read locks :)
> I suppose that closing the file is enough to remove the lock
> (hopefully).

Nice catch. We should summarize this in a bug report...
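
For the bug report, here is a minimal sketch of the check-then-create race described in point 1, and one way to close it. Everything in it is made up for illustration and is not Puppet's actual code: the LockRegistry class, the lock_for method and the example path are hypothetical, and plain Mutex objects stand in for Sync instances.

```ruby
# Hypothetical per-resource lock registry (illustration only, not Puppet code).
class LockRegistry
  def initialize
    @guard = Mutex.new   # protects the @locks hash itself
    @locks = {}
  end

  # Return the single lock object for +resource+.
  # A bare `@locks[resource] ||= Mutex.new` is a check followed by a set,
  # so two threads could each create their own lock for the same resource.
  # Doing the lookup and the creation under @guard makes it one atomic step.
  def lock_for(resource)
    @guard.synchronize { @locks[resource] ||= Mutex.new }
  end
end

registry = LockRegistry.new
path = "/tmp/example-node.yaml"   # hypothetical resource name

# Many threads ask for the lock of the same resource at once.
locks = Array.new(20) { Thread.new { registry.lock_for(path) } }.map(&:value)

puts "distinct lock objects handed out: #{locks.map(&:object_id).uniq.size}"
```

If every caller gets the same object back, a writer and a reader of the same file in one process are at least contending on the same lock.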

>
> I think if someone besides Jason, Nigel and me could have a look at this
> issue, that would be great (this is a hint for the PL team) :)
>
> I'll try to reproduce it on my side if I can achieve the same
> concurrency as you have (I don't have any powerful test machines, nor
> any load balancers :)).

I'm pretty sure Jason is running his tests on a vmware instance on his
desktop before we extrapolate to either our xen production servers or
bare metal tests, so I don't think it's *that* powerful a machine.

>
>> > Can you summarize which OS, filesystem type and Ruby version you are
>> > running your master on?
>> >
>> > Hmm, could it be that the node yaml (ie $yamldir) is on NFS or any
>> > filesystem that has issues with file locks?
>>
>> Just to avoid the timezone round trip because I woke up early :) Jason
>> will either be benchmarking on Ubuntu Hardy or Lucid, and I think he's
>> just on the standard Ruby versions there at the moment.
>>
>> Probably 1.8.6.111-2ubuntu1.3 or 1.8.7.249-2
>
> OK, nothing fancy, then.
>
>> They're definitely not on NFS.
>
> But can $vardir be on NFS or any unlockable filesystem?

Nope. We don't even enable NFS mounts on our puppet masters.
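
For anyone who wants to double-check a given directory anyway, here is a rough sketch of a local sanity test. The flock_enforced? helper and the scratch file name are hypothetical, and the check only covers two handles inside one process on one host, so it says nothing about cross-host behaviour on a network filesystem.

```ruby
require 'tmpdir'

# Hypothetical helper: returns true if an exclusive flock held on one handle
# makes a non-blocking flock on a second, independently opened handle fail.
def flock_enforced?(dir)
  path = File.join(dir, "flock-check-#{Process.pid}")   # scratch file, name is arbitrary
  File.open(path, File::RDWR | File::CREAT, 0600) do |holder|
    holder.flock(File::LOCK_EX)
    begin
      File.open(path, File::RDWR) do |contender|
        # With LOCK_NB, File#flock returns false instead of blocking
        # when the lock is already held through another handle.
        contender.flock(File::LOCK_EX | File::LOCK_NB) == false
      end
    ensure
      # Release explicitly rather than relying on close to drop the lock.
      holder.flock(File::LOCK_UN)
    end
  end
ensure
  File.delete(path) if path && File.exist?(path)
end

# Point this at the directory you care about; Dir.tmpdir is only a stand-in.
puts "flock enforced: #{flock_enforced?(Dir.tmpdir)}"
```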

