Issue #1683 has been updated by mpalmer.

File 0003-Lock-all-accesses-to-the-values-instance-variable.patch added

luke wrote:
> mpalmer wrote:
> > luke wrote:
> > > mpalmer wrote:
> > > > luke wrote:
> > > > > In looking at the code, I see two things I can do:  Either wrap value 
> > > > > retrieval in the mutex, or don't replace the in-memory values until 
> > > > > the whole file has been parsed.  The latter is the more sensible 
> > > > > option, I think.
> > > > 
> > > > More sensible, but not practical.  There are at least three instance 
> > > > variables in a Settings object that need updating, and I can't see a 
> > > > way of doing that atomically, so all variable accesses will have to be 
> > > > locked anyway.
> > > > 
> > > > It might be easier to wrap all accesses to @@settings in the Puppet 
> > > > class and do the ol' switcheroo there.
> > > 
> > > What are the three instance variables that need updating?  Seems like you 
> > > could just swap out the @values hash in the 'parse' method; what am I 
> > > missing?
> > 
> > There's also @cache that needs dealing with, and I can't find the third 
> > variable now, but when I started adding locks everywhere there was 
> > definitely something else that needed love.
> 
> Actually, you shouldn't have to touch @cache, at least until you replace the 
> @values hash.
> 
> All I'm talking about is building up the replacement hash in a separate 
> variable, and only reassigning @values once the whole parsing or whatever has 
> succeeded.  This is a technique I've used in many other parts of the system 
> to avoid race conditions, including in the language parser.  Once you replace 
> @values, of course, you'd want to clear the cache.

I have this sneaking suspicion that "avoid" might actually mean "make less 
likely" rather than "make impossible"...

The problem with trying to replace @values and clear @cache without a mutex is 
that the race condition still exists; it's just a smaller target.  When you 
update @values you also need to clear @cache inside the same lock, which means 
we still need explicit locking on every access to @values.  Since this bug is 
causing pain in the day job, I've whipped up the attached patch to do just 
that.  It doesn't break anything else, but it hasn't been tested enough to give 
a reasonable guarantee that it definitely fixes the bug, since the bug can't be 
reproduced with 100% reliability.  The patch is a bit of a monster, as there 
are a few bits of dodgy code I gutted while I was in there.
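To make the argument concrete, here's a minimal sketch of the locking approach 
(class and method names are illustrative, not the actual Puppet::Util::Settings 
API): the key point is that swapping @values and clearing @cache must happen 
under one mutex, and reads must take the same mutex, or a reader can still pair 
a fresh @values with a stale @cache.

```ruby
require 'thread'

# Illustrative sketch only -- not the real Puppet code.
class Settings
  def initialize
    @mutex  = Mutex.new
    @values = {}
    @cache  = {}
  end

  # Reads lock too: swapping @values without locking reads only shrinks
  # the race window, it does not close it.
  def value(name)
    @mutex.synchronize do
      @cache[name] ||= @values[name]
    end
  end

  # Build the replacement hash outside the lock, then swap it in and
  # clear the cache inside the same critical section.
  def parse(new_values)
    @mutex.synchronize do
      @values = new_values
      @cache.clear
    end
  end
end
```

The alternative discussed above (reassigning @values without a lock) leaves a 
window between the assignment and the cache clear in which another thread can 
read a cached value derived from the old hash.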

Also, I found the other place that (I thought) needed locking -- @config.  
Turns out that @config is just diving back into the Settings object, so it 
kinda-sorta doesn't need locking (although I get the feeling there's still 
going to be pain and suffering down the line).

- Matt
----------------------------------------
Bug #1683: Settings should not replace current values until new values are 
available
http://projects.reductivelabs.com/issues/show/1683

Author: ohookins
Status: Accepted
Priority: High
Assigned to: luke
Category: settings
Target version: 0.24.7
Complexity: Unknown
Affected version: 0.24.6
Keywords: 


So far I haven't been able to figure out what causes this, so bear with me if I 
don't have a complete picture of what is going on.

Background:
 * my manifests are working - I test them quite thoroughly on a staging 
puppetmaster
 * the staging puppetmaster never has any issues and is quite similar in 
environment to the live puppetmaster
 * this problem most often happens after I've rolled out some changes to the 
manifests, e.g. a new module or reasonably large change.
 * after the next run by the client puppetd on all servers, usually the issue 
goes away, so it is transient and not related to the manifests (as far as I can 
see).

On the puppetd side I usually get messages like this:
<pre>
Fri Oct 24 18:02:09 +1100 2008 
//Node[testclient]/managed/ntp/File[/etc/xinetd.d/time] (err): Failed to 
retrieve current state of resource: No such file or
directory - /var/puppet/yaml/node Could not describe /ntp/etc/xinetd.d/time: No 
such file or directory - /var/puppet/yaml/node at
/var/lib/puppet/modules/ntp/manifests/init.pp:164
Fri Oct 24 18:02:09 +1100 2008 //Node[testclient]/managed/ntp/Service[xinetd] 
(notice): Dependency file[/etc/xinetd.d/time] has 1 failures
Fri Oct 24 18:02:09 +1100 2008 //Node[testclient]/managed/ntp/Service[xinetd] 
(warning): Skipping because of failed dependencies
Fri Oct 24 18:02:21 +1100 2008 
//Node[testclient]/managed/ntp/File[/etc/ntp.conf] (err): Failed to retrieve 
current state of resource: No such file or
directory - /var/puppet/yaml/node Could not describe /host/etc/ntp.conf: No 
such file or directory - /var/puppet/yaml/node at
/var/lib/puppet/modules/ntp/manifests/init.pp:84
</pre>

I do NOT have /var/puppet/ configured as a path ANYWHERE in the client or 
server configs. I have double and triple checked this.

On the puppetmasterd side I was able to capture this when running briefly with 
--debug and --verbose from the console:
<pre>
info: Caching node for client1
err: Could not call: No such file or directory - /var/puppet/yaml/node
err: Could not call: No such file or directory - /var/puppet/yaml/node
warning: Could not find facts for client2; you probably have a discrepancy 
between the node and fact names
info: Processing reports tagmail, store for client3
warning: Could not find facts for client4; you probably have a discrepancy 
between the node and fact names
warning: Could not find facts for client5; you probably have a discrepancy 
between the node and fact names
warning: Could not find facts for client6; you probably have a discrepancy 
between the node and fact names
info: Processing reports tagmail, store for client7
warning: Could not find facts for client7; you probably have a discrepancy 
between the node and fact names
/usr/lib/ruby/1.8/sync.rb:57:in `Fail'
/usr/lib/ruby/1.8/sync.rb:63:in `Fail'
/usr/lib/ruby/1.8/sync.rb:183:in `sync_unlock'
/usr/lib/ruby/1.8/sync.rb:231:in `synchronize'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:385:in `threadlock'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:437:in `newmessage'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:436:in `each'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:436:in `newmessage'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:505:in `initialize'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:126:in `new'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:126:in `create'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:59:in `info'
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:152:in `send'
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:151:in `each'
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:151:in `send'
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `fork'
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `send'
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:121:in `process'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:66:in `process'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:59:in `each'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:59:in `process'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:33:in `report'
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `to_proc'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in 
`protect_service'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:85:in 
`setup_processor'
/usr/lib/ruby/1.8/xmlrpc/server.rb:336:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:336:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:323:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:323:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:366:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:378:in `handle'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
/usr/lib/ruby/site_ruby/1.8/puppet/network/http_server/mongrel.rb:111:in 
`process'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel/http_response.rb:65:in 
`start'
/usr/lib/ruby/site_ruby/1.8/puppet/network/http_server/mongrel.rb:108:in 
`process'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:159:in `process_client'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:158:in `each'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:158:in `process_client'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:285:in `run'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:285:in `initialize'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:285:in `new'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:285:in `run'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in `initialize'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in `new'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in `run'
/usr/sbin/puppetmasterd:287
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:166:in `send': Could not 
send report emails via sendmail: Thread(#<Thread:0xb6c898c4 run>) not locked. 
(Puppet::Error)
        from /usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `fork'
        from /usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `send'
        from /usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:121:in 
`process'
        from /usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:66:in 
`process'
        from /usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:59:in 
`each'
        from /usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:59:in 
`process'
        from /usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:33:in 
`report'
        from /usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in 
`to_proc'
         ... 20 levels...
        from /usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in 
`initialize'
        from /usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in 
`new'
        from /usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in 
`run'
        from /usr/sbin/puppetmasterd:287

</pre>

I'm sorry if I have the traceback and other output mixed up; I'm not sure what 
causes what.  This traceback and the error messages are printed ad infinitum, 
as I have about 120 nodes connecting at about the same time.  Hostnames have 
been changed to protect the innocent.

Any ideas?


----------------------------------------
You have received this notification because you have either subscribed to it, 
or are involved in it.
To change your notification preferences, please click here: 
http://reductivelabs.com/redmine/my/account
