Hi, First, I'm very glad and pleased someone is using puppet-load :) The codebase is still very new and rough and any fix or new feature (or even comments/ideas) are welcome.
On Thu, 2010-09-23 at 16:15 -0700, Jason Wright wrote: > I'm trying to use a slightly modified (to take --environment and > --masterport and to output to CSV) puppet-load.rb to perform > benchmarking on various puppetmaster configurations but when the > connection rate becomes high enough my puppetmasters start throwing > 400s on some fraction of the requests. I'm using a wrapper script to > repeatedly call puppet-load.rb and ramp the concurrency up from 1 to > N, collect the CSV output and using that to compare different possible > configurations. Unless noted otherwise, all of the results in this > email are from 0.25.5 running under Passenger 2.2.5 on Ubuntu Hardy > using the system Ruby. I'm running puppet-load.rb from an Ubuntu > Lucid workstation which has puppet 0.25.4 installed. We fixed a bunch of concurrency issues (spotted with puppet-load) in 2.6. Any chance you can try with a 2.6.1 to see if you get the same issue? > For example, the external node classifier which we were originally > using turned out to be very inefficient, holding the TPS rate as > reported by puppet-load.rb down to 1.5-2. At that TPS, puppet-load.rb > runs just fine and there are no 400s. We refactored the external node > classifier and were able to bring the TPS up into the 16-20 range, but > 400s began creeping into the results. For example: > > saltmine: /var/tmp ] sudo /var/tmp/puppet-load --facts > /etc/puppet/health-check/thiscert-isss-forr-thee-healthchecks.yaml > --key /etc/puppet/health-check/thiscert-isss-forr-thee-healthchecks_key.pem > --cert /etc/puppet/health-check/thiscert-isss-forr-thee-healthchecks_cert.pem > --node thiscert-isss-forr-thee-healthchecks --masterport 9140 --server > [SERVER] --repeat 100 --concurrency 15 --environment monitoring > --debug | egrep -v "starting client|finished successfully" > [sudo] password for jwright: > debug: Client 128 finished with HTTP code 400 > debug: Client 136 finished with HTTP code 400 > debug: Client 603 finished with HTTP code 400 > debug: Client 614 finished with HTTP code 400 > debug: Client 1136 finished with HTTP code 400 > debug: Client 1138 finished with HTTP code 400 > debug: Client 1143 finished with HTTP code 400 > debug: Client 1144 finished with HTTP code 400 > debug: Client 1156 finished with HTTP code 400 > debug: Client 1157 finished with HTTP code 400 > debug: Client 1158 finished with HTTP code 400 > debug: Client 1485 finished with HTTP code 400 > debug: Client 1488 finished with HTTP code 400 > 1500 requests finished in 79.527451 s > 13 requests failed > Availability: 99.13 > > Time (s): > min: 0.064414 s > max: 18.993093 s > average: 0.781170390666667 s > median: 0.5736555 s > > Concurrency: 14.73 > Transaction Rate (tps): 18.86 t/s > > Received bytes: 3.69 MiB > Throughput: 0.04639 MiB/s > > The puppetmaster seems to log two of the following for each failed > request, making it appear that it did not use our custom configuration > file for a small fraction of the total requests: > > Sep 23 15:23:38 [SERVER] puppetmasterd[13578]: Could not parse for > environment production: Could not find file > /etc/puppet/manifests/site.pp This is indeed a problem. I'd blame some threading issue in the master when trying to load/find manifests. Puppet-load is really good to impose an high concurrency to the master. It is possible to modify puppet-load to spit the error text when it encounter an HTTP 400 error code, but I'm afraid it would just print what your master printed. It would be interesting to run your master with --trace so that we can see where this error is generated (and possibly with --debug). In 2.6 we fixed several threading issues, like the one in commit:316393 (which can be backported to 0.25 easily). Some were present in 0.25, but some were new to 2.6 (or present in 0.25 but the code was completely refactored so it's difficult to say). > Interestingly, with the puppetmaster running puppet 0.25.4, I got also > got YAML parse errors on my fact data: > > Sep 14 11:44:24 [SERVER] puppetmasterd[432]: Cached node for > thiscert-isss-forr-thee-healthchecks failed: Could not parse YAML data > for node thiscert-isss-forr-thee-healthchecks: syntax error on line > 10, col 2: ` domain: thiscert-isss-forr-thee-healthchecks' If I understand correctly, are you saying that this error is not present with a master running 0.25.5? If yes, then I'm not surprised since we stopped using the ruby lib YAML extension in favor of a pure ruby internal one (ZAML) in 0.25.5, which IMHO is way better :) > When I removed the 'domain' fact from my fact file, the parse error > recurred with a different fact on the next puppet-load.rb run but it > was always the same fact within a given run. I remember I had the same issue with puppet-load testing of a JRuby based master. I don't exactly remember how I fixed it though :(. You can debug it by adding some puts statements in your master around where we uncompress/render the facts. I suggest you try 0.25.5 to see if this problem disappear. > If I turn off the external node classifier entirely TPS goes way up, > but the error rate becomes high enough that I cannot extract useful > inferences from the data. Here is a snippet of CSV output from a run > without the external node classifier. Notice that the error rate does > not actually appear to be related to the number of concurrent clients: > > # concurrency,average,median,tps,failed > 24,0.454190943333333,0.4245255,51.63,14 > 25,0.424848656,0.3672415,57.30,2500 > 26,0.527298157692307,0.4559425,48.13,606 > 27,0.469057628148148,0.4262985,56.12,1868 > 28,0.499156777857143,0.4557345,54.88,1728 > 29,0.597320855862069,0.4679375,47.42,1388 > 30,0.672104640333335,0.5515795,43.62,248 > 31,0.709958197741934,0.5676315,42.42,560 > 32,0.61151450625,0.5922085,51.32,13 > 33,0.665883167575756,0.592762,48.32,25 Those numbers might be coherent with how the MRI handles threads. I really think this is a threading issue somewhere in the master code. I even think it is fixed in 2.6. What would be needed is to isolate it in 0.25 and fix it. > The REST URIs that puppet-load.rb is requesting appear to be identical > between a 400 and a 200 result: > > 192.168.1.1 - - [23/Sep/2010:15:23:53 -0700] "GET > /monitoring/catalog/thiscert-isss-forr-thee-healthchecks?facts_format=b64_zlib_yaml&facts=eNqVjsENwjAMAP%252BZwgxg8fcAPBErpImLA41dYrdSt4cRyvvupENEuIxtOq42%250AvbgEPbZ15SC6W2WiWy7hkDR3JghpXngENnfH2cbAEGYUzktIES5vT3teNnaC%250ABDB%252Fqp6sAKr13M7rrHsbpp01CLppCxtNnz8i5vHH7xcaHVz%252B%250A > HTTP/1.1" 400 93 "-" "EventMachine HttpClient" > > 192.168.1.1 - - [23/Sep/2010:15:23:53 -0700] "GET > /monitoring/catalog/thiscert-isss-forr-thee-healthchecks?facts_format=b64_zlib_yaml&facts=eNqVjsENwjAMAP%252BZwgxg8fcAPBErpImLA41dYrdSt4cRyvvupENEuIxtOq42%250AvbgEPbZ15SC6W2WiWy7hkDR3JghpXngENnfH2cbAEGYUzktIES5vT3teNnaC%250ABDB%252Fqp6sAKr13M7rrHsbpp01CLppCxtNnz8i5vHH7xcaHVz%252B%250A > HTTP/1.1" 200 2601 "-" "EventMachine HttpClient" > > For completeness, here is the fact YAML I'm using: > > --- !ruby/object:Puppet::Node::Facts > name: thiscert-isss-forr-thee-healthchecks > values: > domain: healtcheck.dummyvalue > hostname: thiscert-isss-forr-thee-healthchecks Those facts are overriden by puppet-load if I remember correctly. > environment: monitoring > > Any help would be appreciated, Will you be at puppetcamp? Maybe we can discuss/debug this lively there? -- Brice Figureau Follow the latest Puppet Community evolutions on www.planetpuppet.org! -- You received this message because you are subscribed to the Google Groups "Puppet Developers" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/puppet-dev?hl=en.
