Hi Calvin,

Yes, I have sar data on all systems going back for years.
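In case it helps anyone double-check me, I'm reading those counters out of
the saved sysstat archives, along these lines (assuming a stock sysstat
install; the /var/log/sa path and the day-of-month suffix are the defaults
on our boxes, so adjust for your distro):

    # CPU breakdown (user/sys/iowait) for the 14th, from the daily archive
    sar -u -f /var/log/sa/sa14

    # swap-out rate (pswpout/s) from the same archive
    sar -W -f /var/log/sa/sa14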
Since others will probably want to be assured that I am really reading the
data right:

- This is 92% user CPU time, 5% sys, and 1% soft
- During some of the incidents I _do_ see a short spike of pswpout/s
  (memory pressure), but again, not enough to account for much system time
- The database disks are idle (all of the data in use is in RAM), and they
  are SSDs; average service times are barely measurable in ms

If I had to guess, I'd say it is spinlock misbehavior. I cannot understand
what else about a transaction blocking other things would drive the CPUs so
hard into the ground with user time.

Tony

Tony Kay
TeamUnify, LLC
TU Corporate Website <http://www.teamunify.com/>
TU Facebook <http://www.facebook.com/teamunify> | Free OnDeck Mobile Apps
<http://www.teamunify.com/__corp__/ondeck/>


On Mon, Oct 14, 2013 at 4:05 PM, Calvin Dodge <caldo...@gmail.com> wrote:

> Have you tried running "vmstat 1" during these times? If so, what is
> the percentage of WAIT time? Given that IIRC shared_buffers should be
> no more than 25% of installed memory, I wonder if too little is
> available for system caching of disk reads. A high WAIT percentage
> would indicate excessive I/O (especially random seeks).
>
> Calvin Dodge
>
> On Mon, Oct 14, 2013 at 6:00 PM, Tony Kay <t...@teamunify.com> wrote:
> > Hi,
> >
> > I'm running 9.1.6 with 22GB of shared_buffers and 32GB of RAM overall
> > on a 16-core Opteron 6276 box. We limit connections to roughly 120,
> > but our webapp is configured to allocate a thread-local connection,
> > so those connections are idle more than half the time.
> >
> > We have been running smoothly for over a year on this configuration,
> > and recently started having huge CPU spikes that bring the system to
> > its knees. Given that it is a multiuser system, it has been quite hard
> > to pinpoint the exact cause, but I think we've narrowed it down to two
> > data import jobs that were running in semi-long transactions (clusters
> > of row inserts).
> >
> > The tables affected by these inserts are used in common queries.
> >
> > The imports bring in perhaps 10k rows on average, covering 4 tables.
> >
> > The insert transactions run at isolation level read committed (the
> > default for the JDBC driver).
> >
> > When an import runs (again, this is a theory...we have not been able
> > to reproduce it), we end up maxed out on CPU, with a load average of
> > 50 for 16 CPUs (our normal busy usage is a load average of 5 out of
> > 16 CPUs).
> >
> > When looking at the active queries, most of them are against the
> > tables that are affected by these imports.
> >
> > Our workaround (which is holding at present) was to drop the
> > transactions on those imports (not optimal, but fortunately acceptable
> > for this particular data). This workaround has prevented any further
> > incidents, but is of course inconclusive.
> >
> > Does this sound familiar to anyone, and if so, please advise.
> >
> > Thanks in advance,
> >
> > Tony Kay
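P.S. If it recurs, I'll grab a snapshot of the active queries and their
transaction ages with something along these lines (a rough sketch using the
9.1 pg_stat_activity column names, which were renamed in 9.2; "ourdb" is a
stand-in for the real database name):

    psql -d ourdb -c "
      SELECT procpid, usename, waiting,
             now() - xact_start AS xact_age,
             current_query
        FROM pg_stat_activity
       WHERE current_query <> '<IDLE>'
       ORDER BY xact_start;"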