On 2014-03-11 17:39, Georgi Todorov wrote:
On Friday, October 31, 2014 9:50:41 AM UTC-4, Georgi Todorov wrote:
Actually, sometime last night something happened and puppet
stopped processing requests altogether. Stopping and starting httpd
fixed this, but this could be just some bug in one of the new
versions of software I upgraded to. I'll keep monitoring.
So, unfortunately the issue is not fixed :(. For whatever reason, everything
ran great for a day. Catalog compiles were taking around 7 seconds and
client runs finished in about 20s - happy days. Then overnight, the
catalog compile times jumped to 20-30 seconds and client runs were now
taking 200+ seconds. A few hours later, no requests were arriving at the
puppet master at all. Is my http server flaking out?
Running the agent with --trace and --evaltrace, plus strace on the master,
it looks like most of the time is spent stat-ing:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
83.01 5.743474 9 673606 612864 stat
7.72 0.534393 7 72102 71510 lstat
6.76 0.467930 77988 6 wait4
That's a pretty poor "hit" rate - per the summary above, roughly 684k of
the ~746k stat/lstat calls returned errors...
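For what it's worth, you can get that ratio without eyeballing the table by letting awk total the calls/errors columns of strace's `-c` summary. A throwaway sketch (the two input lines are just the stat/lstat rows pasted from above):

```shell
# Sum the calls (column 4) and errors (column 5) of strace -c output
# for the stat/lstat rows; input lines copied from the summary above.
printf '%s\n' \
  '83.01 5.743474 9 673606 612864 stat' \
  ' 7.72 0.534393 7  72102  71510 lstat' |
awk '{calls += $4; errs += $5}
     END {printf "%d/%d stats failed (%.0f%%)\n", errs, calls, 100*errs/calls}'
# → 684374/745708 stats failed (92%)
```

Most of those failures are likely plain ENOENT (e.g. Ruby probing its load path and Puppet checking fileserver mount points), but the sheer volume is what matters.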
I've increased the check interval to 1 hour on all clients, and the master
seems to be keeping up for now - catalog compile avg 8 seconds, client
run avg 15 seconds, queue size 0.
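In case it helps anyone reading later, that interval change can be made in puppet.conf on the agents; enabling splay as well spreads the runs out so they don't all hit the master at once. A sketch (these are standard agent settings; the interval value is just the one from this thread):

```ini
# puppet.conf on each agent
[agent]
runinterval = 3600   # check in hourly instead of the default 30 minutes
splay       = true   # randomize run start times so agents don't stack up
```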
Here is what a client run looks like when the server is keeping up:
Notice: Finished catalog run in *11.93* seconds
Changes:
Events:
Resources:
Total: 522
Time:
Filebucket: 0.00
Cron: 0.00
Schedule: 0.00
Package: 0.00
Service: 0.68
Exec: 1.07
*File: 1.72*
Config retrieval: 13.35
Last run: 1415032387
Total: 16.82
Version:
Config: 1415031292
Puppet: 3.7.2
And when the server is just about dead:
Notice: Finished catalog run in 214.21 seconds
Changes:
Events:
Resources:
Total: 522
Time:
Cron: 0.00
Filebucket: 0.00
Schedule: 0.01
Package: 0.02
Service: 1.19
Exec: 2.25
File: 128.94
Config retrieval: 26.80
Last run: 1415027092
Total: 159.21
Version:
Config: 1415025705
Puppet: 3.7.2
Probably 500 of the "Resources" are autofs maps
using https://github.com/pdxcat/puppet-module-autofs/commits/master
So there is definitely some bottleneck on the system; the problem is I
can't figure out what it is. Is it disk IO (iostat doesn't seem to think
so)? Is it CPU (top looks fine)? Memory (ditto)? Is the httpd/passenger
combo not up to the task, or is the postgres server not keeping up? There
are so many components that it is hard for me to do a proper profile to
find where the bottleneck is. Any ideas?
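One cheap agent-side data point is to rank the Time section of the run summary and see which resource type eats the wall clock. A throwaway sketch using the slow-run numbers pasted above (in practice you would feed it the agent's last_run_summary.yaml from the state directory - that path is an assumption about the layout):

```shell
# Rank per-type timings, largest first; input lines are copied from
# the slow run above. sort splits on ':' and compares field 2 numerically.
printf '%s\n' \
  'Cron: 0.00' 'Filebucket: 0.00' 'Schedule: 0.01' 'Package: 0.02' \
  'Service: 1.19' 'Exec: 2.25' 'File: 128.94' 'Config retrieval: 26.80' |
sort -t: -k2 -nr | head -n 3
# → File: 128.94
#   Config retrieval: 26.80
#   Exec: 2.25
```

Here that just confirms File resources (presumably the ~500 autofs maps) dominate, which fits the stat-heavy strace output.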
So far I've timed the ENC script that pulls the classes for a node - it
takes less than 1 second.
From the messages log, catalog compiles range from 7 seconds to 25
seconds (worst case, on an overloaded server).
Anyway, figured I'd share that, unfortunately ruby was not the issue.
Back to poking around and testing.
Your move away from Ruby 1.8.7 was a good move. That is essentially the
same as installing more hardware, since Ruby versions after 1.8.7 are faster.
It may be worth trying Ruby 1.9.3 (p448 or later) just to ensure it is
not a Ruby 2.x issue; 1.9.3 should be roughly on par with Ruby 2.x in
terms of performance.
That is, I am thinking that with the slow Ruby 1.8.7 you were simply
running out of compute resources, and then on Ruby 2.x you may have hit
something else.
- henrik
--
Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/