On Tue, 2009-07-28 at 21:09 -0400, Mark Plaksin wrote:
> Brice Figureau <[email protected]> writes:
>
> >> Our mysql questions, com_select and com_insert stats spike first.
> >> com_select and com_update are normally at around 5 and spike to 40;
> >> questions is normally around 150 and spikes to 600. Threads connected
> >> goes from around 15 to 30. After that it looks like everything queues
> >> up behind MySQL and we start getting timeouts on our ~450 clients.
>
> ...
>
> > The only reason for a storeconfig storm is that Puppet deletes all the
> > resources/tags belonging to a particular host and then recreates them,
> > so you see a lots of Inserts.
> >
> > Now the real question is why Puppet thinks there are such discrepancies
> > between the database and the live compilation.
> >
> > Are you sure you're not removing hosts from the database?
>
> Yes. Hosts that no longer exist are still in the database :)
One thing I noticed: I had the following pattern in two separate
places (modules):

if ! defined(File["a"]) {
  file { "a":
    ...
  }
}

When a host came to fetch its configuration from the puppetmaster, it
could get the File["a"] defined in place1. If it was connected to
another master process, the same host could get the File["a"] from
place2. That isn't an issue per se, but it means this resource could
end up with two different tag sets (one mentioning place1, the other
place2). So each time, the tags for this resource were deleted and
recreated, generating database load.
Maybe you have such a pattern in your manifests?
> > What would be interesting is to activate the mysql general query log
> > (warning it will increase your load), and dig in the large log around
> > the timeframe you see the storm (you can also activate the rails log for
> > the same effect).
>
> I meant to ask whether some MySQL expert could look at our binary logs
> and figure out what happened :) Oh, I see there's a mysqlbinlog command!
> Who knew? Some quick greps of its output say the total number of
> updates and inserts from yesterday is about the same as any other day.
> Same for various hours yesterday--the hour that we got slammed doesn't
> seem to have more updates or inserts than other hours when we didn't get
> slammed.
The binlog contains only write queries (i.e. INSERT, UPDATE, DELETE), so
you're only seeing part of the story.
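The grep tallies you describe can also be sketched in a few lines of
Python. This is only an illustration -- it assumes the "#YYMMDD HH:MM:SS"
event-header lines of mysqlbinlog's text output, so adjust the regex for
your version:

```python
import re
from collections import Counter

# Sketch: bucket the write statements in mysqlbinlog text output by hour,
# to see whether a "storm" hour really carries more writes than a quiet
# one. Assumes "#YYMMDD HH:MM:SS ..." event-header lines.
HEADER = re.compile(r"^#(\d{6})\s+(\d{1,2}):\d{2}:\d{2}")
WRITE = re.compile(r"^(INSERT|UPDATE|DELETE)\b", re.IGNORECASE)

def writes_per_hour(lines):
    """Return a Counter mapping (date, hour) -> number of write statements."""
    counts = Counter()
    current = None
    for line in lines:
        m = HEADER.match(line)
        if m:
            # Remember which (date, hour) bucket subsequent statements belong to.
            current = (m.group(1), int(m.group(2)))
        elif current and WRITE.match(line):
            counts[current] += 1
    return counts
```

Feed it the output of `mysqlbinlog binlog.0000NN > binlog.txt`, e.g.
`writes_per_hour(open("binlog.txt"))`, and compare the storm hour against
the others.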
It's easy to trigger a "storm" if you have a few queries that take a
long time.
But maybe the cause is external to MySQL. I have seen this kind of stuff
happen when:
* I/O degrades because another process is eating all the available
disk throughput (usually backup or snapshotting processes)
* I/O degrades because the machine is swapping (something is eating
memory), either because the swap area is on the same disks as the MySQL
data, or simply because the InnoDB buffer pool is being swapped in and
out.
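The swapping case is easy to check for on Linux. A rough sketch,
assuming the pswpin/pswpout counters in /proc/vmstat (Linux-specific;
sample the file twice, a few seconds apart):

```python
# Sketch: detect swap activity that could be paging the InnoDB buffer
# pool in and out. Assumes the Linux /proc/vmstat format with pswpin
# and pswpout cumulative counters.
def read_swap_counters(text):
    """Parse pswpin/pswpout out of /proc/vmstat-style text."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name in ("pswpin", "pswpout"):
            counters[name] = int(value)
    return counters

def swapping(before, after):
    """True if any pages were swapped in or out between two samples."""
    return any(after[k] > before[k] for k in ("pswpin", "pswpout"))
```

Take one sample from open("/proc/vmstat").read(), wait a few seconds,
take another, and compare; any movement during a "storm" window points
at memory pressure rather than MySQL itself.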
What I suggest, if it happens next time, is to use innotop and have a
look at the live running queries. You might find that you have one or
more "slow" queries.
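If innotop isn't handy, you can approximate the same check by parsing
SHOW FULL PROCESSLIST. A sketch, assuming the tab-separated output of
`mysql -B -e 'SHOW FULL PROCESSLIST'` and an illustrative 10-second
threshold:

```python
# Sketch: a poor man's innotop -- flag long-running queries in the
# tab-separated output of `mysql -B -e 'SHOW FULL PROCESSLIST'`.
# Assumes the usual column layout (Id, User, Host, db, Command, Time,
# State, Info); the 10-second threshold is arbitrary for illustration.
def slow_queries(processlist_text, threshold=10):
    """Return [(seconds, query)] for queries running >= threshold seconds."""
    rows = [line.split("\t") for line in processlist_text.splitlines()]
    header, body = rows[0], rows[1:]
    time_col = header.index("Time")
    cmd_col = header.index("Command")
    info_col = header.index("Info")
    return [(int(r[time_col]), r[info_col])
            for r in body
            if r[cmd_col] == "Query" and int(r[time_col]) >= threshold]
```

Run it during the storm window; the Sleep rows (idle connections) are
deliberately skipped, since only active queries hold you up.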
> > Or I remember reading that maatkit now contains a query log extractor
> > from tcpdump captures files; it is worth capturing the traffic between
> > Puppet and mysql and analyze the queries performed. Maybe you'll find
> > the issue.
>
> Maybe there's a tool which reads binary logs and tells you what caused
> the storm :)
If only :-)
--
Brice Figureau
My Blog: http://www.masterzen.fr/