Issue #1683 has been updated by mpalmer.

File 0003-Lock-all-accesses-to-the-values-instance-variable.patch added
luke wrote:
> mpalmer wrote:
> > luke wrote:
> > > mpalmer wrote:
> > > > luke wrote:
> > > > > In looking at the code, I see two things I can do: Either wrap value
> > > > > retrieval in the mutex, or don't replace the in-memory values until
> > > > > the whole file has been parsed. The latter is the more sensible
> > > > > option, I think.
> > > >
> > > > More sensible, but not practical. There's at least three instance
> > > > variables in a Settings object that need updating, and I can't see a
> > > > way of doing that atomically, so all variable accesses will have to be
> > > > locked anyway.
> > > >
> > > > It might be easier to wrap all accesses to @@settings in the Puppet
> > > > class and do the ol' switcheroo there.
> > >
> > > What are the three instance variables that need updating? Seems like you
> > > could just swap out the @values hash in the 'parse' method; what am I
> > > missing?
> >
> > There's also @cache that needs dealing with, and I can't find the third
> > variable now, but when I started adding locks everywhere there was
> > definitely something else that needed love.
>
> Actually, you shouldn't have to touch @cache, at least until you replace the
> @values hash.
>
> All I'm talking about is building up the replacement hash in a separate
> variable, and only reassigning @values once the whole parsing or whatever has
> succeeded. This is a technique I've used in many other parts of the system
> to avoid race conditions, including in the language parser. Once you replace
> @values, of course, you'd want to clear the cache.

I have this sneaking suspicion that "avoid" might actually mean "make less
likely" rather than "make impossible"... The problem with trying to replace
@values and clear @cache without a mutex is that the race condition still
exists; it's just a smaller target. So when you're updating @values you need
to also clear @cache in the same lock -- which means that we still need
explicit locking on every access to @values.
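The argument above can be sketched in a few lines of Ruby. This is a hypothetical, simplified model, not the actual Puppet::Util::Settings code: the class and method names here are invented for illustration. The point it demonstrates is that the @values swap and the @cache clear have to sit in one critical section, and reads have to take the same lock, otherwise a reader can populate @cache from the old @values just after the swap.

```ruby
# Hypothetical, simplified model of a Settings-like object (NOT the real
# Puppet::Util::Settings API). Every access to @values and @cache goes
# through one Mutex, so no reader can ever observe the new @values
# alongside a stale @cache entry.
# (On Ruby 1.8 this would also need: require 'thread')
class LockedSettings
  def initialize(values = {})
    @mutex  = Mutex.new
    @values = values
    @cache  = {}
  end

  # Reparse path: build the replacement hash outside the lock if you
  # like, but swap it in and clear the cache in the SAME critical
  # section. Doing the swap alone leaves a window where another thread
  # reads the new @values yet still hits (or writes) the old @cache.
  def replace(new_values)
    @mutex.synchronize do
      @values = new_values
      @cache.clear
    end
  end

  # Read path: the lookup and the cache write happen under the same
  # lock as the swap, which is what closes the race.
  def value(name)
    @mutex.synchronize { @cache[name] ||= @values[name] }
  end
end
```

With only the unlocked "switcheroo" (reassigning @values and then clearing @cache as two separate steps), a reader scheduled between those two steps caches a value derived from the old hash; the lock makes that interleaving impossible rather than merely unlikely.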
Since this bug is causing pain in the day job, I've whipped up the attached
patch to do just that. It doesn't break anything else, but it hasn't been
tested enough to give a reasonable guarantee that it definitely fixes the bug,
since the bug can't be reproduced with 100% reliability. The patch is a bit of
a monster, as there were a few bits of dodgy code I gutted while I was in
there.

Also, I found the other place that (I thought) needed locking -- @config.
Turns out that @config just dives back into the Settings object, so it
kinda-sorta doesn't need locking (although I get the feeling there's still
going to be pain and suffering down the line).

- Matt

----------------------------------------
Bug #1683: Settings should not replace current values until new values are available
http://projects.reductivelabs.com/issues/show/1683

Author: ohookins
Status: Accepted
Priority: High
Assigned to: luke
Category: settings
Target version: 0.24.7
Complexity: Unknown
Affected version: 0.24.6
Keywords:

So far I haven't been able to figure out what causes this, so indulge me if I
don't have a complete picture of what is going on.

Background:
* my manifests are working - I test them quite thoroughly on a staging puppetmaster
* the staging puppetmaster never has any issues and is quite similar in environment to the live puppetmaster
* this problem most often happens after I've rolled out some changes to the manifests, e.g. a new module or a reasonably large change
* after the next run by the client puppetd on all servers, the issue usually goes away, so it is transient and not related to the manifests (as far as I can see)
On the puppetd side I usually get messages like this:

<pre>
Fri Oct 24 18:02:09 +1100 2008 //Node[testclient]/managed/ntp/File[/etc/xinetd.d/time] (err): Failed to retrieve current state of resource: No such file or directory - /var/puppet/yaml/node Could not describe /ntp/etc/xinetd.d/time: No such file or directory - /var/puppet/yaml/node at /var/lib/puppet/modules/ntp/manifests/init.pp:164
Fri Oct 24 18:02:09 +1100 2008 //Node[testclient]/managed/ntp/Service[xinetd] (notice): Dependency file[/etc/xinetd.d/time] has 1 failures
Fri Oct 24 18:02:09 +1100 2008 //Node[testclient]/managed/ntp/Service[xinetd] (warning): Skipping because of failed dependencies
Fri Oct 24 18:02:21 +1100 2008 //Node[testclient]/managed/ntp/File[/etc/ntp.conf] (err): Failed to retrieve current state of resource: No such file or directory - /var/puppet/yaml/node Could not describe /host/etc/ntp.conf: No such file or directory - /var/puppet/yaml/node at /var/lib/puppet/modules/ntp/manifests/init.pp:84
</pre>

I do NOT have /var/puppet/ configured as a path ANYWHERE in the client or
server configs. I have double and triple checked this.
On the puppetmasterd side I was able to capture this when running briefly with
--debug and --verbose from the console:

<pre>
info: Caching node for client1
err: Could not call: No such file or directory - /var/puppet/yaml/node
err: Could not call: No such file or directory - /var/puppet/yaml/node
warning: Could not find facts for client2; you probably have a discrepancy between the node and fact names
info: Processing reports tagmail, store for client3
warning: Could not find facts for client4; you probably have a discrepancy between the node and fact names
warning: Could not find facts for client5; you probably have a discrepancy between the node and fact names
warning: Could not find facts for client6; you probably have a discrepancy between the node and fact names
info: Processing reports tagmail, store for client7
warning: Could not find facts for client7; you probably have a discrepancy between the node and fact names
/usr/lib/ruby/1.8/sync.rb:57:in `Fail'
/usr/lib/ruby/1.8/sync.rb:63:in `Fail'
/usr/lib/ruby/1.8/sync.rb:183:in `sync_unlock'
/usr/lib/ruby/1.8/sync.rb:231:in `synchronize'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:385:in `threadlock'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:437:in `newmessage'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:436:in `each'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:436:in `newmessage'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:505:in `initialize'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:126:in `new'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:126:in `create'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:59:in `info'
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:152:in `send'
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:151:in `each'
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:151:in `send'
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `fork'
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `send'
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:121:in `process'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:66:in `process'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:59:in `each'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:59:in `process'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:33:in `report'
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `to_proc'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:85:in `setup_processor'
/usr/lib/ruby/1.8/xmlrpc/server.rb:336:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:336:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:323:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:323:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:366:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:378:in `handle'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
/usr/lib/ruby/site_ruby/1.8/puppet/network/http_server/mongrel.rb:111:in `process'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel/http_response.rb:65:in `start'
/usr/lib/ruby/site_ruby/1.8/puppet/network/http_server/mongrel.rb:108:in `process'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:159:in `process_client'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:158:in `each'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:158:in `process_client'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:285:in `run'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:285:in `initialize'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:285:in `new'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:285:in `run'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in `initialize'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in `new'
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in `run'
/usr/sbin/puppetmasterd:287
/usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:166:in `send': Could not send report emails via sendmail: Thread(#<Thread:0xb6c898c4 run>) not locked. (Puppet::Error)
        from /usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `fork'
        from /usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `send'
        from /usr/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:121:in `process'
        from /usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:66:in `process'
        from /usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:59:in `each'
        from /usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:59:in `process'
        from /usr/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:33:in `report'
        from /usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `to_proc'
         ... 20 levels...
        from /usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in `initialize'
        from /usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in `new'
        from /usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in `run'
        from /usr/sbin/puppetmasterd:287
</pre>

I'm sorry if I have the traceback and other output mixed up; I'm not sure what
causes what. This traceback and error messages are printed ad infinitum as I
have about 120 nodes connecting at about the same time. Hostnames have been
changed to protect the innocent.

Any ideas?

----------------------------------------
You have received this notification because you have either subscribed to it,
or are involved in it. To change your notification preferences, please click
here: http://reductivelabs.com/redmine/my/account
