Re: [GENERAL] High SYS CPU - need advise

2012-12-03 Thread Merlin Moncure
On Sun, Dec 2, 2012 at 9:08 AM, rahul143 rk204...@gmail.com wrote:
 Hello everyone,

 I'm seeking help in diagnosing / figuring out the issue that we have with
 our DB server:

 Under some (relatively non-heavy) load: 300...400 TPS, every 10-30 seconds the
 server drops into high CPU system usage (90%+ SYSTEM across all CPUs - it's
 pure SYS cpu, i.e. it's not io wait, not irq, not user). Postgresql is
 taking 10-15% at the same time. Those periods last from a few seconds
 to minutes, or until Postgresql is restarted. Needless to say the system is
 barely responsive, with load average hitting over 100. We have mostly select
 statements (joins across a few tables), using indexes and resulting in a small
 number of records returned. Should the number of requests per second drop
 a bit, the server does not fall into those HIGH-SYS-CPU periods. It all seems
 like postgres runs out of some resources or is fighting for some locks, and
 that causes the kernel to go into la-la land trying to manage it.


 So far we've checked:
 - disk and nic delays / errors / utilization
 - WAL files (created rarely)
 - tables are vacuumed OK. Periods of high SYS are not tied to the vacuum process.
 - kernel resources utilization (sufficient FS handles, shared MEM/SEM, VM)
 - increased log level, but nothing suspicious/different (to me) is reported
 there during periods of high sys-cpu
 - ran pgbench (could not reproduce the issue, even though it was producing
 over 40,000 TPS for a prolonged period of time)

 Basically, our symptoms are exactly as was reported here over a year ago
 (though for postgres 8.3, we ran 9.1):
 http://archives.postgresql.org/pgsql-general/2011-10/msg00998.php

 I will be grateful for any ideas helping to resolve or diagnose this
 problem.

Didn't we just discuss this exact problem on the identically named
thread?  
http://postgresql.1045698.n5.nabble.com/High-SYS-CPU-need-advise-td5732045.html

If you're the same poster, it's good to reference the thread and any
conclusions made in order to save everyone's time.  As it happens, I
have been working an angle that may help solve this problem.  Are you
willing/able to run patched postgres and what's your tolerance for
risk?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-12-03 Thread Alvaro Herrera
Merlin Moncure wrote:

 Didn't we just discuss this exact problem on the identically named
 thread?  
 http://postgresql.1045698.n5.nabble.com/High-SYS-CPU-need-advise-td5732045.html

Ignore this guy.  It's a bot reinjecting old messages, or something like
that, probably because of some bug in mail list scrubbing software.  My
impression is that it's eventually going to publish every email on a
blog somewhere, or something like that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-25 Thread Vlad

RAID10

-- vlad

On 11/24/2012 3:17 PM, Gavin Flower wrote:


Curious,  what is your RAID configuration?





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Jeff Janes
On Tue, Nov 20, 2012 at 9:05 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Nov 20, 2012 at 10:50 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 I wouldn't expect so.  Increasing shared_buffers should either fix
 free list lock contention, or leave it unchanged, not make it worse.

 AIUI, that is simply not true (unless you raise it to the point you're
 not churning them).   I'm looking at StrategyGetBuffer() for non-scan
 cases.  It locks BufFreelistLock then loops the free list, and, if
 it finds nothing, engages a clock sweep.

The freelist should never loop.  It is written as a loop, but I think
there is currently no code path which ends up with valid buffers being
on the freelist, so that loop will never, or at least rarely, execute
more than once.

 Both of those operations are
 dependent on the number of buffers being managed and so it's
 reasonable to expect some workloads to increase contention with more
 buffers.

The clock sweep can depend on the number of buffers being managed in a
worst-case sense, but I've never seen any evidence (nor analysis) that
this worst case can be obtained in reality on an ongoing basis.  By
constructing two pathological workloads which get switched between, I
can get the worst-case to happen, but when it does happen the
consequences are mild compared to the amount of time needed to set up
the necessary transition.  In other words, the worst-case can't be
triggered often enough to make a meaningful difference.

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Merlin Moncure
On Tue, Nov 20, 2012 at 12:16 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Tue, Nov 20, 2012 at 9:05 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Nov 20, 2012 at 10:50 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 I wouldn't expect so.  Increasing shared_buffers should either fix
 free list lock contention, or leave it unchanged, not make it worse.

 AIUI, that is simply not true (unless you raise it to the point you're
 not churning them).   I'm looking at StrategyGetBuffer() for non-scan
 cases.  It locks BufFreelistLock then loops the free list, and, if
 it finds nothing, engages a clock sweep.

 The freelist should never loop.  It is written as a loop, but I think
 there is currently no code path which ends up with valid buffers being
 on the freelist, so that loop will never, or at least rarely, execute
 more than once.

 Both of those operations are
 dependent on the number of buffers being managed and so it's
 reasonable to expect some workloads to increase contention with more
 buffers.

 The clock sweep can depend on the number of buffers being managed in a
 worst-case sense, but I've never seen any evidence (nor analysis) that
 this worst case can be obtained in reality on an ongoing basis.  By
 constructing two pathological workloads which get switched between, I
 can get the worst-case to happen, but when it does happen the
 consequences are mild compared to the amount of time needed to set up
 the necessary transition.  In other words, the worst-case can't be
 triggered often enough to make a meaningful difference.

Yeah, good points;  but (getting off topic here) : there have been
several documented cases of lowering shared buffers improving
performance under contention...the  'worst case' might be happening
more than expected.  In particular, what happens when a substantial
percentage of the buffer pool is set with a non-zero usage count?
This seems unlikely, but possible?  Take note:

if (buf->refcount == 0)
{
    if (buf->usage_count > 0)
    {
        buf->usage_count--;
        trycounter = NBuffers;  /* emphasis */
    }

ISTM time spent here isn't bounded except that as more time is spent
sweeping (more backends are thus waiting and not marking pages) the
usage counts decrease faster until you hit steady state.  Smaller
buffer pool naturally would help in that scenario as your usage counts
would drop faster.  It strikes me as cavalier to be resetting
trycounter while sitting under the #1 known contention point for read
only workloads.  Shouldn't SBF() work on an advisory basis and try to
force a buffer after N failed usage count attempts?
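
For reference, the surrounding loop looks roughly like this -- a trimmed
paraphrase of the 9.1-era StrategyGetBuffer(), not the exact source; the
whole sweep runs while BufFreelistLock is held:

    trycounter = NBuffers;
    for (;;)
    {
        buf = &BufferDescriptors[StrategyControl->nextVictimBuffer];
        if (++StrategyControl->nextVictimBuffer >= NBuffers)
            StrategyControl->nextVictimBuffer = 0;

        LockBufHdr(buf);
        if (buf->refcount == 0)
        {
            if (buf->usage_count > 0)
            {
                buf->usage_count--;
                trycounter = NBuffers;      /* the reset in question */
            }
            else
                return buf;                 /* victim found, header still locked */
        }
        else if (--trycounter == 0)
            elog(ERROR, "no unpinned buffers available");
        UnlockBufHdr(buf);
    }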

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Jeff Janes
On Wed, Nov 21, 2012 at 9:05 AM, Vlad marche...@gmail.com wrote:
 it's session mode and the pool size is 1200 (cause I need to guarantee that in
 the worst case we have enough slots for all possible clients),

Wouldn't the clients prefer to wait 100ms to get a connection if that
means their query finishes in 100ms,
rather than waiting 0ms to get a connection but have the query take
1000ms to finish?

 however even
 at the times preceding the high-cpu-sys-stall, the number of postmasters is like
 15-20. When a stall happens, it starts to rise but that's the result of the DB
 not answering in a timely manner.

That is like the "beatings will continue until morale improves" joke.
The theory about why pgbouncer would be beneficial is that it prevents
hundreds of connections from clogging up the server.  By allowing
hundreds of connections to be made when the server gets into distress,
you are increasing that distress and making it harder to recover from
it.  That is the theory, anyway.

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Merlin Moncure
On Wed, Nov 21, 2012 at 10:43 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Wed, Nov 21, 2012 at 7:29 AM, Vlad Marchenko marche...@gmail.com wrote:

 update on my problem: despite pgbouncer, the problem still occurs on my
 end.

 As Merlin asked, how big is the pool?  Maybe you are using a large
 enough pool so as to defeat the purpose of restricting the number of
 connections.

also, what mode.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Vlad
nothing changes if I increase number of threads.
pgbouncer doesn't change much.

also, I think the nature of the high-sys-cpu during a stall and when I run
pgbench is different.
During pgbench it's constantly at 30-40%, while during a stall it sits at a low
5-15% and then spikes to 90% after a while, with no apparent reason (i.e.:
no increased load or anything), then goes down in a few secs, or stays
until postgres is restarted.

at this point the only thing that does help on the app side is when I force
frequent re-connection to postgresql (i.e. not letting a postmaster live too
long)

-- Vlad


On Wed, Nov 21, 2012 at 10:18 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Wed, Nov 21, 2012 at 8:14 AM, Vlad marche...@gmail.com wrote:
 
 
  '-M prepared' produces normal results, while '-M simple' results in 40%
 sys
  cpu.  '-M extended' is somewhere in between.
  I'm running it as 60 clients, 2 threads.

 2 threads is pretty low for 60 clients.  What happens if you increase
 -j to either 30 or 60?  (What I'm wondering here is if perhaps in this
 case it is pgbench which is triggering the scheduling issue)

 Also, what happens if you channel pgbench through pgbouncer?

 Cheers,

 Jeff



Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Kevin Grittner
Vlad wrote:

 it's session mode and the pool size is 1200 (cause I need to
 guarantee that in the worst case we have enough slots for all
 possible clients),

We found that the real-world production performance of a web
application servicing millions of web hits per day with thousands of
concurrent users improved when we reconfigured our database
connection pool to be about 35 instead of 55, on a 16 core box with a
40 drive RAID.

http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

-Kevin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Gavin Flower

On 25/11/12 09:30, Kevin Grittner wrote:

Vlad wrote:


it's session mode and the pool size is 1200 (cause I need to
guarantee that in the worst case we have enough slots for all
possible clients),

We found that the real-world production performance of a web
application servicing millions of web hits per day with thousands of
concurrent users improved when we reconfigured our database
connection pool to be about 35 instead of 55, on a 16 core box with a
40 drive RAID.

http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

-Kevin



Curious,  what is your RAID configuration?


Cheers,
Gavin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Merlin Moncure
On Wed, Nov 21, 2012 at 11:05 AM, Vlad marche...@gmail.com wrote:
 it's session mode and the pool size is 1200 (cause I need to guarantee that in
 the worst case we have enough slots for all possible clients), however even
 at the times preceding the high-cpu-sys-stall, the number of postmasters is like
 15-20. When a stall happens, it starts to rise but that's the result of the DB
 not answering in a timely manner.

That is completely defeating the point.  Mode should be transaction
mode and somewhere between 1-2x the number of cores for the pool size.
 Note the # of connected clients does not have to equal the size of
the pool -- that's the whole point of pgbouncer -- it forces the
queries through a fixed number of backend processes.
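
Roughly what I have in mind in pgbouncer.ini -- the numbers are illustrative
for an 8-core box, not a recommendation, and the database name is a placeholder:

    [databases]
    mydb = host=127.0.0.1 port=5432 dbname=mydb

    [pgbouncer]
    listen_port = 6432
    pool_mode = transaction
    default_pool_size = 16    ; ~2x cores
    max_client_conn = 1200    ; all clients can still connect; only
                              ; default_pool_size server connections get used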

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Vlad
 what pgbouncer mode, and how large is your pool.


'-M prepared' produces normal results, while '-M simple' results in 40% sys
cpu.  '-M extended' is somewhere in between.
I'm running it as 60 clients, 2 threads.
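
For reference, the runs were along these lines (a sketch; the script file
name and duration are placeholders):

    pgbench -n -c 60 -j 2 -M simple   -T 300 -f test_query.sql mydb
    pgbench -n -c 60 -j 2 -M prepared -T 300 -f test_query.sql mydb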


-- Vlad


Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Kevin Grittner
Gavin Flower wrote:

 We found that the real-world production performance of a web
 application servicing millions of web hits per day with thousands
 of concurrent users improved when we reconfigured our database
 connection pool to be about 35 instead of 55, on a 16 core box
 with a 40 drive RAID.

 Curious, what is your RAID configuration?

We have the OS on RAID 1 on one RAID controller, WAL on another RAID
1 array on its own controller, and the PostgreSQL data directory on
RAID 5 on a pair of controllers which each has a path to every drive.
(Don't ask me how that works, I just trust the hardware guys to get
that part right.) The active portion of the database is cached, the
read-to-write ratio is about 10 to 1, and the controllers all have
BBU cache with write-back. It is actually one pool of about 30 for
the read-only web app with so many hits, and a separate pool of 6
database connections for logical replication from about 80 source
databases handling an OLTP load of 3000 directly connected users.

We were seeing some performance degradation at peak loads, and
improved both user response time and overall throughput by reducing
the database connections used by the connection pool. Peak load was
handled much better that way.

I cringe every time I see someone say they need a large number of
database connections in order to support that many users. Having one
database connection for each user is, in my experience, guaranteed to
make things fall apart under load. 

-Kevin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Vlad
it's session mode and the pool size is 1200 (cause I need to guarantee that
in the worst case we have enough slots for all possible clients), however
even at the times preceding the high-cpu-sys-stall, the number of postmasters is
like 15-20. When a stall happens, it starts to rise but that's the result of
the DB not answering in a timely manner.

as far as data - yes, I can probably clean up names and provide some data
with the same queries, though - we are not able to cause a stall just with
those queries plus pgbench; we are able to make the situation worse and a stall
much more likely when it's overlapped with regular queries coming from our app.

-- Vlad


On Wed, Nov 21, 2012 at 10:45 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Wed, Nov 21, 2012 at 10:43 AM, Jeff Janes jeff.ja...@gmail.com wrote:
  On Wed, Nov 21, 2012 at 7:29 AM, Vlad Marchenko marche...@gmail.com
 wrote:
 
  update on my problem: despite pgbouncer, the problem still occurs on my
  end.
 
  As Merlin asked, how big is the pool?  Maybe you are using a large
  enough pool so as to defeat the purpose of restricting the number of
  connections.

 also, what mode.

 merlin



Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Gavin Flower

On 25/11/12 11:11, Kevin Grittner wrote:

Gavin Flower wrote:


We found that the real-world production performance of a web
application servicing millions of web hits per day with thousands
of concurrent users improved when we reconfigured our database
connection pool to be about 35 instead of 55, on a 16 core box
with a 40 drive RAID.

Curious, what is your RAID configuration?

We have the OS on RAID 1 on one RAID controller, WAL on another RAID
1 array on its own controller, and the PostgreSQL data directory on
RAID 5 on a pair of controllers which each has a path to every drive.
(Don't ask me how that works, I just trust the hardware guys to get
that part right.) The active portion of the database is cached, the
read-to-write ratio is about 10 to 1, and the controllers all have
BBU cache with write-back. It is actually one pool of about 30 for
the read-only web app with so many hits, and a separate pool of 6
database connections for logical replication from about 80 source
databases handling an OLTP load of 3000 directly connected users.

We were seeing some performance degradation at peak loads, and
improved both user response time and overall throughput by reducing
the database connections used by the connection pool. Peak load was
handled much better that way.

I cringe every time I see someone say they need a large number of
database connections in order to support that many users. Having one
database connection for each user is, in my experience, guaranteed to
make things fall apart under load.

-Kevin


Thanks for a very comprehensive answer! (I almost simply asked what 'type'
of RAID you had - I am glad I rephrased that!) I learnt far more
than I expected, which is good - more than merely satisfied my curiosity!


I use RAID-6; but in development, reliability (for me, I've had 5 hard 
disk crashes in 20 years) is orders of magnitude more important than 
performance, with rare exceptions!


I can understand the mentality of wanting one connection per user, 
though I fully know why it is the wrong approach - I would also want to 
use connection pooling in your situation.


Back in the 1980's I was a Mainframe COBOL analyst/programmer and just 
starting to use databases. Initially I thought of a transaction in terms 
of the user wanting to ensure their display and update were in the same 
transaction. Took me a while to fully appreciate and accept that 
transactions should be short to protect the database while not blocking 
other users too much (multi-user access was also a whole new concept), 
and that keeping a transaction open for user actions was not the 'done 
thing'! It requires a whole new change of mindset – something a lot of 
people find difficult, as it conflicts with part of their implicit 
belief system (just try discussing even the most obvious flaws in 
Creationism with a Believer!).


Now I'm quite comfortable with the idea of multi-user and am much more 
database centric while still appreciating the needs of users. Now my 
weapons of choice are Java & Linux, with Postgres being my preferred 
database.



Cheers,
Gavin







Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Merlin Moncure
On Tue, Nov 20, 2012 at 4:08 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 It strikes me as cavalier to be resetting
 trycounter while sitting under the #1 known contention point for read
 only workloads.

 The only use for the trycounter is to know when to ERROR out with no
 unpinned buffers available, so not resetting that seems entirely
 wrong.

 I would contest the #1 known contention point claim.  We know that
 the freelist lock is a point of contention under certain conditions,
 but we (or at least I) also know that it is the mere acquisition of
 this lock, and not the work done while it is held, that is important.

Well, I'm speculating (without evidence) that that might not in fact
always be true.  It certainly is hard to demonstrate with synthetic
testing though.  I guess the first step would be to whip up a
diagnostic patch to confirm malfeasance in suspicious cases.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Jeff Janes
On Tue, Nov 20, 2012 at 12:00 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Nov 20, 2012 at 12:16 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 The freelist should never loop.  It is written as a loop, but I think
 there is currently no code path which ends up with valid buffers being
 on the freelist, so that loop will never, or at least rarely, execute
 more than once.

 Both of those operations are
 dependent on the number of buffers being managed and so it's
 reasonable to expect some workloads to increase contention with more
 buffers.

 The clock sweep can depend on the number of buffers being managed in a
 worst-case sense, but I've never seen any evidence (nor analysis) that
 this worst case can be obtained in reality on an ongoing basis.  By
 constructing two pathological workloads which get switched between, I
 can get the worst-case to happen, but when it does happen the
 consequences are mild compared to the amount of time needed to set up
 the necessary transition.  In other words, the worst-case can't be
 triggered often enough to make a meaningful difference.

 Yeah, good points;  but (getting off topic here) : there have been
 several documented cases of lowering shared buffers improving
 performance under contention...the  'worst case' might be happening
 more than expected.

The ones that I am aware of (mostly Greg Smith's case studies) have
been for write-intensive workloads and are related to
writes/fsyncs getting gummed up.

Shaun Thomas reports one that is (I assume) not read intensive, but
his diagnosis is that this is a kernel bug where a larger
shared_buffers for no good reason causes the kernel to kill off its
page cache.  From the kernel's perspective, the freelist lock doesn't
look any different from any other lwlock, so I doubt that issue is
related to the freelist lock.

 In particular, what happens when a substantial
 percentage of the buffer pool is set with a non-zero usage count?

The current clock sweep algorithm is an extraordinary usagecount
decrementing machine.  From what I know, the only way to get much more
than half of the buffers to be non-zero usage count is for the
clock-sweep to rarely run (in which case, it is hard for it to be the
bottleneck if it rarely runs), or for most of the buffer-cache to be
pinned simultaneously.

 This seems unlikely, but possible?  Take note:

 if (buf->refcount == 0)
 {
     if (buf->usage_count > 0)
     {
         buf->usage_count--;
         trycounter = NBuffers;  /* emphasis */
     }

 ISTM time spent here isn't bounded except that as more time is spent
 sweeping (more backends are thus waiting and not marking pages) the
 usage counts decrease faster until you hit steady state.

But that is a one time thing.  Once you hit the steady state, how do
you get away from it again, such that a large amount of work is needed
again?

 Smaller
 buffer pool naturally would help in that scenario as your usage counts
 would drop faster.

They would drop at the same rate in absolute numbers, barring the
smaller buffer_cache fitting entirely in the on-board CPU cache.

They would drop faster in percentage terms, but they would also
increase faster in percentage terms once a candidate is found and a
new page read into it.

 It strikes me as cavalier to be resetting
 trycounter while sitting under the #1 known contention point for read
 only workloads.

The only use for the trycounter is to know when to ERROR out with no
unpinned buffers available, so not resetting that seems entirely
wrong.

I would contest the #1 known contention point claim.  We know that
the freelist lock is a point of contention under certain conditions,
but we (or at least I) also know that it is the mere acquisition of
this lock, and not the work done while it is held, that is important.

If I add a spurious
LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE); LWLockRelease(BufFreelistLock);
to each execution of StrategyGetBuffer, then contention kicks in twice as fast.

But if I instead hack the clock sweep to run twice as far (ignore the
first eligible buffer it finds, and go find another one) but all under
the cover of a single BufFreelistLock acquisition, there was no
meaningful increase in contention.

This was all on a 4 socket x 2 core/socket opteron machine which I no
longer have access to.  Using a more modern 8 core on a single socket,
I can't get it to collapse on BufFreelistLock at all, presumably
because the cache coherence mechanisms are so much faster.

  Shouldn't SBF() work on advisory basis and try to
 force a buffer after N failed usage count attempts?

I believe Simon tried that a couple commit-fests ago, and no one could
show that it made a difference.

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:

Re: [GENERAL] High SYS CPU - need advise

2012-11-24 Thread Jeff Janes
On Wed, Nov 21, 2012 at 8:14 AM, Vlad marche...@gmail.com wrote:


 '-M prepared' produces normal results, while '-M simple' results in 40% sys
 cpu.  '-M extended' is somewhere in between.
 I'm running it as 60 clients, 2 threads.

2 threads is pretty low for 60 clients.  What happens if you increase
-j to either 30 or 60?  (What I'm wondering here is if perhaps in this
case it is pgbench which is triggering the scheduling issue)

Also, what happens if you channel pgbench through pgbouncer?

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-23 Thread Merlin Moncure
On Wed, Nov 21, 2012 at 12:17 PM, Vlad marche...@gmail.com wrote:
 It turned out we can't use transaction mode, cause there are prepared
 statement used a lot within code, while processing a single http request.

prepared statements can be fudged within some constraints.  if prepared
statements are explicitly named via PREPARE, you can simply prepare
them all on server connection via the connect_query setting and disable
the manual preparation.  you then change the server_reset_query so
that they are not discarded.  some basic experimentation might confirm
if this is a viable strategy.  automatic protocol level statements can
be an issue though.
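
as a sketch of what I mean (the statement name and query below are made up,
and you'd want to verify the reset behavior on your pgbouncer version):

    [pgbouncer]
    pool_mode = transaction
    ; run once on every new server connection
    connect_query = PREPARE get_user(int) AS SELECT * FROM users WHERE id = $1
    ; left empty so the prepared statements are not discarded between clients
    server_reset_query =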

 Also, I can't 100% rule out that there won't be any long running
 (statistical) queries launched (even though such requests should not come to
 this database), which would occupy connection for longer time, but do not
 create any race condition... So having pool size at 8 may be too slim .

there are a number of simple tricks to deal with this:
1) move long running queries to their own pool (by changing login user
or connection string)
2) bypass pgbouncer in those cases
3) increase pool size

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-23 Thread Vlad
Merlin,

On Wed, Nov 21, 2012 at 2:17 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Wed, Nov 21, 2012 at 12:17 PM, Vlad marche...@gmail.com wrote:
  It turned out we can't use transaction mode, cause there are prepared
  statement used a lot within code, while processing a single http request.

 prepare statements can be fudged within some constraints.  if prepared
 statements are explicitly named via PREPARE, you can simply prepare
 them all on server connection via connect_query setting and disable
 the manual preparation.  you then change the server_reset_query so
 that they are not discarded.  some basic experimentation might confirm
 if this is viable strategy.  automatic protocol level statements can
 be an issue though.


We have 350k+ lines of code in our app, so this is not quite as feasible as
I'd wish.


  Also, I can't 100% rule out that there won't be any long running
  (statistical) queries launched (even though such requests should not
 come to
  this database), which would occupy connection for longer time, but do not
  create any race condition... So having pool size at 8 may be too slim .

 there are a number of simple tricks to deal with this:
 1) move long running queries to their own pool (by changing login user
 or connection string)
 2) bypass pgbouncer in those cases
 3) increase pool size


It's pretty much already set up so that long-running queries should not hit
the same DB cluster as those with (potentially) high connection/query
rates, but I still can't rule out that long-running queries will be
issued via pgbouncer.

In either case - it seems that the combination of pool size = 200 / pool mode
= session / server_lifetime = 30 makes things stable for now.

I'm planning to repeat my case on a 2.6.x kernel, but it will be a while
before I have a chance to do that.


Thanks.

-- vlad


Re: [GENERAL] High SYS CPU - need advise

2012-11-23 Thread Vlad
It turned out we can't use transaction mode, because there are prepared
statements used a lot within the code while processing a single http request.

Also, I can't 100% rule out that there won't be any long running
(statistical) queries launched (even though such requests should not come
to this database), which would occupy a connection for a longer time, but do
not create any race condition... So having the pool size at 8 may be too slim.

It's a single database.

-- Vlad


On Wed, Nov 21, 2012 at 12:04 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Wed, Nov 21, 2012 at 11:56 AM, Vlad marche...@gmail.com wrote:
  ok, understood.
  I need to give some more thoughts to if it's possible for us to switch to
  transaction mode from app standpoint of view.
 
  if yes, then setting pool size to 20 (for 8 cores)  sounds OK?

 If it was me, I would be starting with exactly 8 and be tuning from there.

 Don't forget: each database role gets its own pool.

 merlin



Re: [GENERAL] High SYS CPU - need advise

2012-11-23 Thread Merlin Moncure
On Wed, Nov 21, 2012 at 11:56 AM, Vlad marche...@gmail.com wrote:
 ok, understood.
 I need to give some more thoughts to if it's possible for us to switch to
 transaction mode from app standpoint of view.

 if yes, then setting pool size to 20 (for 8 cores)  sounds OK?

If it was me, I would be starting with exactly 8 and be tuning from there.

Don't forget: each database role gets its own pool.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-23 Thread Vlad
ok, understood.
I need to give some more thought to whether it's possible for us to switch to
transaction mode from app standpoint of view.

if yes, then setting pool size to 20 (for 8 cores)  sounds OK?

-- Vlad


Re: [GENERAL] High SYS CPU - need advise

2012-11-21 Thread Vlad Marchenko
update on my problem: despite pgbouncer, the problem still occurs on my 
end.


Also, interesting observation - I ran several tests with pgbench, using 
queries that I think are prone to trigger high-sys-cpu-stall. What I 
noticed is when pgbench is started with prepared mode, the system 
behaves fine during stress-test (high user cpu - 85-90%, low sys cpu - 
5-7%), high TPS. Though when I used non-prepared modes, the sys cpu 
portion jumps to 40% (and tps drops dramatically as well, but this is 
understandable).  The test queries are pretty long (10kb+), with a couple 
of outer joins across 1000-record tables with indexes.


Maybe we are looking in the wrong place and the issue is somewhere within 
the planner/parser? Is there some extensive locking used in there?


Another observation - it's harder to trigger high-sys-cpu stall on a 
freshly restarted postgresql. Though if it was running for a while, then 
it's much easier to do.


-- vlad

On 11/19/12 8:33 AM, Merlin Moncure wrote:

On Sun, Nov 18, 2012 at 4:24 PM, Jeff Janes jeff.ja...@gmail.com wrote:

On Fri, Nov 16, 2012 at 12:13 PM, Vlad marche...@gmail.com wrote:

ok, I've applied that patch and ran. The stall started around 13:50:45...50
and lasted until the end

https://dl.dropbox.com/u/109778/postgresql-2012-11-16_134904-stripped.log

That isn't as much log as I expected.  But I guess only the tip of the
iceberg gets logged, it could be that most spinlocks are contended but
then get acquired just before the sleep (and log) would have happened.

I'm not sure how to figure out just how much spinning is going on
below the surface, but my gut feeling is there isn't enough of it to
explain the amount of slowdown you are seeing (and it would probably
be accounted for as user time rather than system time)


yes. classic spinlock contention symptoms are high user cpu and almost
no work getting done.   something else is going on here?


Maybe we could whip up a dummy standalone  program that uses PG's
spinlock code in an intentionally contended way and see how your
system reports resource usage for that.  Maybe your system reports
fighting for cache lines or whatever is going on behind the scenes as
system time rather than user time, or something.

I don't think this is the case.  It's looking more and more like
scheduler issues, although it's unclear if that is due to some kind of
regression or just symptoms of backends piling up.  I'm also starting
to wonder if the spinlock contention we are seeing (which according to
your patch really isn't all that much) is just symptomatic of
scheduler issues.


It would be a little easier to figure out what was going on there if
the log included the pid (%p) so we can see if it is the same process
doing a bunch of consecutive sleeps, or a bunch of different
processes.  Also, logging the millisecond (%m rather than %t) would be
helpful.

But the way that the point of contention jumps around a lot points to
a scheduler issue rather than a postgres-specific issue.  A backend
gets pre-empted while holding a spinlock, and then the kernel is
resistant to rescheduling it even though all other processes are piled
up behind it.  But again, that theory would lead to more log lines
than we see, I would think.

yeah.  I asked for some longer strace logs but then everyone went home
for the weekend.


Anyway, pgbouncer might help by ensuring that a connection gets
pre-empted by pgbouncer at a point where it has no locks, rather than
by the kernel at a point where it does hold locks.

yes -- I think pgbouncer is the key here because it's going to tell us
if the scheduler is the culprit here.  If you look carefully at the
strace logs, you'll see that system call times become unpredictable as
they are getting scheduled out...particularly yielding calls like
select().  So we need to determine if this is normal high load
behavior or if there is an underlying mechanic.


Any chance you could try an older kernel and see if that reduces the problem?

This is one thing to try.  Personally I think pgbouncer/transaction
mode is low hanging fruit if it 'drops in' to OP's app. If problem
goes away this will pretty much point the finger at Linux AFAICT (in
which case we can start messing around with the kernel).

merlin




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-21 Thread Merlin Moncure
On Wed, Nov 21, 2012 at 9:29 AM, Vlad Marchenko marche...@gmail.com wrote:
 update on my problem: despite pgbouncer, the problem still occurs on my
 end.

 Also, interesting observation - I ran several tests with pgbench, using
 queries that I think are prone to trigger high-sys-cpu-stall. What I noticed
 is when pgbench is started with prepared mode, the system behaves fine
 during stress-test (high user cpu - 85-90%, low sys cpu - 5-7%), high TPS.
 Though when I used non-prepared modes, the sys cpu portion jumps to 40% (and
 tps drops dramatically as well, but this is understandable).  The test
 queries are pretty long (10kb+), with couple of outer joins across
 1000-record tables with indexes.

 Maybe, we are looking in the wrong place and the issue is somewhere within
 the planner/parser? Is there some extensive locking used in there?

 Another observation - it's harder to trigger high-sys-cpu stall on a freshly
 restarted postgresql. Though if it was running for a while, then it's much
 easier to do.

what pgbouncer mode, and how large is your pool.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-21 Thread Jeff Janes
On Wed, Nov 21, 2012 at 7:29 AM, Vlad Marchenko marche...@gmail.com wrote:

 update on my problem: despite pgbouncer, the problem still occurs on my
 end.

As Merlin asked, how big is the pool?  Maybe you are using a large
enough pool so as to defeat the purpose of restricting the number of
connections.


 Also, interesting observation - I ran several tests with pgbench, using
 queries that I think are prone to trigger high-sys-cpu-stall. What I noticed
 is when pgbench is started with prepared mode, the system behaves fine
 during stress-test (high user cpu - 85-90%, low sys cpu - 5-7%), high TPS.
 Though when I used non-prepared modes, the sys cpu portion jumps to 40% (and
 tps drops dramatically as well, but this is understandable).  The test
 queries are pretty long (10kb+), with couple of outer joins across
 1000-record tables with indexes.

Could you sanitize the queries (and some statements to generate dummy
data) enough to share?


 Maybe, we are looking in the wrong place and the issue is somewhere within
 the planner/parser? Is there some extensive locking used in there?

I don't think the locking is particularly extensive, but it could be
enough extra to drive something over the edge.

But it would be the same nature of locking as elsewhere (spinlocks and
lwlocks), so it doesn't really change the nature of the problem, which
is still "Why do these user-space locks turn into high SYS cpu?"

 Another observation - it's harder to trigger high-sys-cpu stall on a freshly
 restarted postgresql. Though if it was running for a while, then it's much
 easier to do.

Maybe the long running time has built up enough resource usage to
cause the kernel scheduler to get into a snit, so it decides to
preempt the process while it holds a spinlock, and then refuses to run
it again for a while.

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Vlad
Some additional observations and food for thought. Our app uses connection
caching (Apache::DBI). By disabling Apache::DBI and forcing
client re-connection for every (http) request processed, I eliminated the
stall. The user cpu usage jumped (mostly because prepared sql queries are no
longer available, plus some additional overhead on re-connection), but not a
single case of a high-sys-cpu stall.

I can not completely rule out the possibility that some left-overs
(an unfinished transaction?) remain after serving an http request, which, in
the absence of connection caching, are discarded for sure.

-- Vlad


On Mon, Nov 19, 2012 at 11:19 AM, Merlin Moncure mmonc...@gmail.com wrote:


 yeah.  interesting -- contention was much higher this time and that
 changes things.  strange how it was missed earlier.

 you're getting bounced around a lot in lwlock especially
 (unfortunately we don't know which one).  I'm going to hazard another
 guess:  maybe the trigger here is when the number of contending
 backends exceeds some critical number (probably based on the number of
 cores) you see a quick cpu spike (causing more backends to lock and
 pile up) as cache line bouncing sets in.  That spike doesn't last
 long, because the spinlocks quickly accumulate delay counts then punt
 to the scheduler which is unable to cope.  The exact reason why this
 is happening to you in exactly this way (I've never seen it) is
 unclear.  Also the line between symptom and cause is difficult to
 draw.

 unfortunately, in your case spinlock re-scheduling isn't helping.  log
 entries like this one:
 18764 [2012-11-19 10:43:50.124 CST] LOG:  JJ spin delay from file
 sinvaladt.c line 512 delay 212, pointer 0x7f514959a394 at character 29

 are suggesting major problems.  you're dangerously close to a stuck
 spinlock which is lights out for the database.

 merlin



Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Merlin Moncure
On Fri, Nov 16, 2012 at 3:21 PM, Vlad marche...@gmail.com wrote:
 what would pgbouncer do in my case? Number of connections will decrease, but
 number of active clients won't be smaller. As I understand the latter ones
 are that important.

Well, one thing that struck me was how little spinlock contention
there actually was.   Yeah, backends are delaying here and there,
which isn't great, but a few dozen delays per second across
several hundred backends doesn't seem like it should be pegging sys
cpu%.  This is all pointing to the problem not being in postgres, but
in linux.

pgbouncer would do two things:
1) perhaps guard you against some o/s issue
2) keep system more responsive during stall (since by controlling the
pool you control the number of queries piling up).

of course, this comes at the expense of some overhead.

But, before going through all that, how about timing strace recorded
calls (strace -T) both in stall and non-stall conditions.   I was
assuming select(), but maybe it's something else... for example the
recently fixed lseek.
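
Something like this against one of the busy backends, once during a stall and
once outside of one (the pid and output path are placeholders):

    strace -T -tt -f -p 12345 -o /tmp/backend-12345.trace
    # -T = time spent inside each syscall, -tt = microsecond timestamps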

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Vlad
ok, I've applied that patch and ran. The stall started around 13:50:45...50
and lasted until the end

https://dl.dropbox.com/u/109778/postgresql-2012-11-16_134904-stripped.log

the actual log has more data (including statement following each 'spin
delay' record), but there is some sensitive info inside which I can't share
with public.


-- Vlad


yup.  this problem doesn't smell like lwlock issues.  typically there
  the problem manifests as low cpu performance, everybody waits.
 classic spinlock contention (at least from what i've seen) is very
 high *userspace* cpu utilization and low work output.  this time it's
 different -- OP is bogging in the kernel so it's not impossible we're
 troubleshooting the symptom, not the cause.

  In 9.3 there is a new field that tells how many spin delays there were
  on the mutex that is behind each lock.  That was  commit
  b79ab00144e64217d41, maybe he can port that back to his version.
 
  But that only tells you about LWLock mutexes, not about all the other
  ones in PG.
 
  The attached patch logs every spin delay with where in the source it
 comes from.

 yeah, OP should fire this off. good stuff.  I'll bet lunch (if we ever
 happen to meet) it's on buffer pin.

 merlin



Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Merlin Moncure
On Mon, Nov 19, 2012 at 12:02 PM, Vlad marche...@gmail.com wrote:

 Some additional observation and food for thoughts. Our app uses connection
 caching (Apache::DBI). By disabling Apache::DBI and forcing client
 re-connection for every (http) request processed I eliminated the stall. The
 user cpu usage jumped (mostly cause prepared sql queries are no longer
 available, and some additional overhead on re-connection), but no single
 case of high-sys-cpu stall.

 I can not completely rule out the possibility of some left-overs (unfinished
 transaction?) remain after serving http request, which, in the absence of
 connection caching, are discarded for sure

well, that is good.  note for pgbouncer transaction mode you would
have had to quit using prepared statements anyways (see list of
features here: http://wiki.postgresql.org/wiki/PgBouncer).  you can
use session mode though, and that would be a good test.  faster speed
could be because of application issues or maybe just spreading out the
queries reduced contention.  hard to say.

well, what happens next is up to you -- we can spend more time chasing
this down if you want, or you can hold back and see how things run for
a while.   what is your user load up to in the worst case?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Merlin Moncure
On Fri, Nov 16, 2012 at 2:13 PM, Vlad marche...@gmail.com wrote:
 ok, I've applied that patch and ran. The stall started around 13:50:45...50
 and lasted until the end

 https://dl.dropbox.com/u/109778/postgresql-2012-11-16_134904-stripped.log

 the actual log has more data (including statement following each 'spin
 delay' record), but there is some sensitive info inside which I can't share
 with public.

first thoughts:
no single thing really stands out -- contention is all over the place.
lwlock, pinbuffer, dynahash (especially).  I am again suspicious of
bad scheduler interaction.  any chance we can fire up pgbouncer?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Shaun Thomas

On 11/16/2012 02:31 PM, Merlin Moncure wrote:


no single thing really stands out -- contention is all over the place.
lwlock, pinbuffer, dynahash (especially).  I am again suspicious of
bad scheduler interaction.  any chance we can fire up pgbouncer?


Just want to throw it out there, but we've been having really bad luck 
with the scheduler recently. But only when we use 8GB (on our 72GB 
system) for shared_buffers. Cut that down to 4GB, and everything is fine 
and dandy.


I think the kernel devs have added in some overzealous scheduler code on us.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Jeff Janes
On Fri, Nov 16, 2012 at 12:13 PM, Vlad marche...@gmail.com wrote:
 ok, I've applied that patch and ran. The stall started around 13:50:45...50
 and lasted until the end

 https://dl.dropbox.com/u/109778/postgresql-2012-11-16_134904-stripped.log

That isn't as much log as I expected.  But I guess only the tip of the
iceberg gets logged, it could be that most spinlocks are contended but
then get acquired just before the sleep (and log) would have happened.

I'm not sure how to figure out just how much spinning is going on
below the surface, but my gut feeling is there isn't enough of it to
explain the amount of slowdown you are seeing (and it would probably
be accounted for as user time rather than system time)

Maybe we could whip up a dummy standalone  program that uses PG's
spinlock code in an intentionally contended way and see how your
system reports resource usage for that.  Maybe your system reports
fighting for cache lines or whatever is going on behind the scenes as
system time rather than user time, or something.

It would be a little easier to figure out what was going on there if
the log included the pid (%p) so we can see if it is the same process
doing a bunch of consecutive sleeps, or a bunch of different
processes.  Also, logging the millisecond (%m rather than %t) would be
helpful.
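
For example, something like this in postgresql.conf (illustrative; any prefix
that includes %m and %p would do):

    log_line_prefix = '%m [%p] '   # %m = timestamp with milliseconds, %p = backend pid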

But the way that the point of contention jumps around a lot points to
a scheduler issue rather than a postgres-specific issue.  A backend
gets pre-empted while holding a spinlock, and then the kernel is
resistant to rescheduling it even though all other processes are piled
up behind it.  But again, that theory would lead to more log lines
than we see, I would think.

Anyway, pgbouncer might help by ensuring that a connection gets
pre-empted by pgbouncer at a point where it has no locks, rather than
by the kernel at a point where it does hold locks.

Any chance you could try an older kernel and see if that reduces the problem?

One thing that does stand out is that it looks like partitioned hash
tables in dynahash.c could benefit by having their freelist
partitioned as well, assuming we can give up the hash_get_num_entries
function.
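
Roughly the idea -- just a sketch of the concept, not actual PostgreSQL code,
and the names are made up:

    /* today: conceptually one shared, spinlock-protected freelist */
    typedef struct
    {
        slock_t      mutex;       /* guards freeList and nentries */
        HASHELEMENT *freeList;
        long         nentries;
    } FreeListData;

    /* sketch: one FreeListData per hash partition, so backends touching
     * different partitions no longer fight over a single mutex.  The cost
     * is that hash_get_num_entries() would have to sum nentries across
     * all partitions (or be given up, as noted above). */
    FreeListData freeLists[NUM_PARTITIONS];   /* NUM_PARTITIONS is illustrative */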

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Merlin Moncure
On Tue, Nov 20, 2012 at 9:02 AM, Shaun Thomas stho...@optionshouse.com wrote:
 On 11/16/2012 02:31 PM, Merlin Moncure wrote:

 no single thing really stands out -- contention is all over the place.
 lwlock, pinbuffer, dynahash (especially).  I am again suspicious of
 bad scheduler interaction.  any chance we can fire up pgbouncer?


 Just want to throw it out there, but we've been having really bad luck with
 the scheduler recently. But only when we use 8GB (on our 72GB system) for
 shared_buffers. Cut that down to 4GB, and everything is fine and dandy.

 I think the kernel devs have added in some overzealous scheduler code on us.

Shared buffer manipulation changing contention is suggesting you're
running into free list lock issues.  How many active backends/cores?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Shaun Thomas

On 11/20/2012 10:03 AM, Merlin Moncure wrote:


Shared buffer manipulation changing contention is suggesting you're
running into free list lock issues.  How many active backends/cores?


Oh, the reason I wanted to point it out was that we see a lot more than 
CPU contention with higher shared_buffer settings. The scheduler starts 
aggressively clearing out the file cache *and* freeing memory, driving 
load way higher than it should be. That doesn't result in a very high 
iowait, but all that page swapping sure isn't free.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Merlin Moncure
On Tue, Nov 20, 2012 at 10:12 AM, Shaun Thomas stho...@optionshouse.com wrote:
 On 11/20/2012 10:03 AM, Merlin Moncure wrote:

 Shared buffer manipulation changing contention is suggesting you're
 running into free list lock issues.  How many active backends/cores?


 Oh, the reason I wanted to point it out was that we see a lot more than CPU
 contention with higher shared_buffer settings. The scheduler starts
 aggressively clearing out the file cache *and* freeing memory, driving load
 way higher than it should be. That doesn't result in a very high iowait, but
 all that page swapping sure isn't free.

have you ruled out numa issues?
(http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html)

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Shaun Thomas

On 11/20/2012 10:13 AM, Merlin Moncure wrote:


have you ruled out numa issues?
(http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html)


Haha. Yeah. Our zone reclaim mode is off, and node distance is 10 or 20. 
ZCM is only enabled by default if distance is > 20, unless there's some 
kernel bug triggering it even when zone reclaim reports as off.
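
For anyone following along, the checks in question are roughly these (a
sketch; output obviously varies by box):

    cat /proc/sys/vm/zone_reclaim_mode    # 0 = zone reclaim off
    numactl --hardware                    # shows the node distance matrix
    sysctl vm.zone_reclaim_mode=0         # force it off if it isn't already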


I'll also note that my tests with interleave made no difference at all. 
At least not with pgbench. There was a small amount of flux with TPS, 
but the page swap storms came regardless of NUMA tweaks. The formula 
worked like this:


High connection count + high shared_buffers = page swap storm. I'll note 
that 8GB -> 4GB immediately stopped the paging everywhere, and the OS 
went from using 13GB for active file cache, to 45GB. I can't see how PG 
would cause something like that by itself.


I only piped in because it's very adversely affecting our CPU load in a 
similarly inexplicable-but-seemingly-the-scheduler way.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Merlin Moncure
On Thu, Nov 15, 2012 at 4:29 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
Merlin Moncure wrote:

 ok, excellent.   reviewing the log, this immediately caught my eye:

 recvfrom(8, \27\3\1\0@, 5, 0, NULL, NULL) = 5
 recvfrom(8, 
 \327\327\nl\231LD\211\346\243@WW\254\244\363C\326\247\341\177\255\263~\327HDv-\3466\353...,
 64, 0, NULL, NULL) = 64
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 3000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 6000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 7000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 8000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 9000})  = 0 (Timeout)
 semop(41713721, {{2, 1, 0}}, 1) = 0
 lseek(295, 0, SEEK_END) = 0
 lseek(296, 0, SEEK_END) = 8192

 this is definitely pointing to spinlock issue.

 I met Rik van Riel (Linux kernel hacker) recently and we chatted about
 this briefly.  He strongly suggested that we should consider using
 futexes on Linux instead of spinlocks; the big advantage being that
 futexes sleep instead of spinning when contention is high.  That would
 reduce the system load in this scenario.

By the way, I think you (and Rik) are 100% right.  We're slamming select()
where futex has a much smarter wait mechanism.  There are reports from
users who have switched to pthreads and report better scalability.
(https://groups.google.com/forum/?fromgroups=#!topic/pgsql.hackers/v8V82gznxs8)
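
To make the comparison concrete, this is roughly what a futex-backed wait
looks like -- a minimal, deliberately simplified sketch of the primitive
only, not a drop-in replacement for s_lock():

    #include <stdatomic.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/futex.h>

    static atomic_int lock_word;    /* 0 = free, 1 = held */

    static void lock_slow(void)
    {
        int expected = 0;
        while (!atomic_compare_exchange_strong(&lock_word, &expected, 1))
        {
            /* sleep in the kernel until lock_word stops being 1, instead of
             * spinning and then blindly sleeping in select() */
            syscall(SYS_futex, &lock_word, FUTEX_WAIT, 1, NULL, NULL, 0);
            expected = 0;
        }
    }

    static void unlock(void)
    {
        atomic_store(&lock_word, 0);
        /* wake one waiter (naive: issues the syscall even with no waiters) */
        syscall(SYS_futex, &lock_word, FUTEX_WAKE, 1, NULL, NULL, 0);
    }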

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Jeff Janes
On Tue, Nov 20, 2012 at 8:03 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Nov 20, 2012 at 9:02 AM, Shaun Thomas stho...@optionshouse.com 
 wrote:
 On 11/16/2012 02:31 PM, Merlin Moncure wrote:

 no single thing really stands out -- contention is all over the place.
 lwlock, pinbuffer, dynahash (especially).  I am again suspicious of
 bad scheduler interaction.  any chance we can fire up pgbouncer?


 Just want to throw it out there, but we've been having really bad luck with
 the scheduler recently. But only when we use 8GB (on our 72GB system) for
 shared_buffers. Cut that down to 4GB, and everything is fine and dandy.

 I think the kernel devs have added in some overzealous scheduler code on us.

 Shared buffer manipulation changing contention is suggesting you're
 running into free list lock issues.

I wouldn't expect so.  Increasing shared_buffers should either fix
free list lock contention, or leave it unchanged, not make it worse.
Sounds more like that NUMA/interleave/zone reclaim stuff I could never
wrap my head around.

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Merlin Moncure
On Tue, Nov 20, 2012 at 10:50 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Tue, Nov 20, 2012 at 8:03 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Nov 20, 2012 at 9:02 AM, Shaun Thomas stho...@optionshouse.com 
 wrote:
 On 11/16/2012 02:31 PM, Merlin Moncure wrote:

 no single thing really stands out -- contention is all over the place.
 lwlock, pinbuffer, dynahash (especially).  I am again suspicious of
 bad scheduler interaction.  any chance we can fire up pgbouncer?


 Just want to throw it out there, but we've been having really bad luck with
 the scheduler recently. But only when we use 8GB (on our 72GB system) for
 shared_buffers. Cut that down to 4GB, and everything is fine and dandy.

 I think the kernel devs have added in some overzealous scheduler code on us.

 Shared buffer manipulation changing contention is suggesting you're
 running into free list lock issues.

 I wouldn't expect so.  Increasing shared_buffers should either fix
 free list lock contention, or leave it unchanged, not make it worse.

AIUI, that is simply not true (unless you raise it to the point you're
not churning them).   I'm looking at StrategyGetBuffer() for non-scan
cases.  It locks BufFreelistLock then loops the free list, and, if
it finds nothing, engages a clock sweep.  Both of those operations are
dependent on the number of buffers being managed and so it's
reasonable to expect some workloads to increase contention with more
buffers.
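
To make that concrete, here is a toy sketch of the pattern (illustration only, not the real bufmgr code; NBUFFERS and the struct are stand-ins): the free-list walk and the clock sweep both happen while one lock is held, so the cost of finding a victim buffer, and the time that lock stays held, scales with the number of buffers being managed.

/* toy model of the StrategyGetBuffer() pattern described above --
   illustration only, not PostgreSQL source */
#include <stdio.h>

#define NBUFFERS 16                 /* stand-in for shared_buffers */

typedef struct
{
    int refcount;                   /* nonzero = pinned by a backend */
    int usage_count;                /* bumped on access, decayed by the sweep */
} BufDesc;

static BufDesc buffers[NBUFFERS];
static int     next_victim = 0;     /* the clock hand */

static int
get_victim_buffer(void)
{
    /* the real code holds BufFreelistLock (exclusive) from here ... */

    /* free-list walk omitted; assume it is empty, as on a busy server */

    /* clock sweep: decay usage counts until an unpinned, unused buffer appears */
    for (;;)
    {
        BufDesc *buf = &buffers[next_victim];

        next_victim = (next_victim + 1) % NBUFFERS;

        if (buf->refcount == 0)
        {
            if (buf->usage_count == 0)
                return (int) (buf - buffers);   /* victim found */
            buf->usage_count--;                 /* give it another lap */
        }
    }
    /* ... to here, so every backend needing a victim serializes on that lock */
}

int
main(void)
{
    buffers[0].usage_count = 3;     /* recently used: takes extra laps to evict */
    buffers[1].refcount = 1;        /* pinned: never chosen */
    printf("victim buffer: %d\n", get_victim_buffer());
    return 0;
}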

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Vlad
what would pgbouncer do in my case? Number of connections will decrease,
but number of active clients won't be smaller. As I understand the latter
ones are that important.

-- Vlad


On Fri, Nov 16, 2012 at 2:31 PM, Merlin Moncure mmonc...@gmail.com wrote:


 first thoughts:
 no single thing really stands out -- contention is all over the place.
 lwlock, pinbuffer, dynahash (especially).  I am again suspicious of
 bad scheduler interaction.  any chance we can fire up pgbouncer?

 merlin



Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Shaun Thomas

On 11/20/2012 04:08 PM, Jeff Janes wrote:


Shaun Thomas reports one that is (I assume) not read intensive, but
his diagnosis is that this is a kernel bug where a larger
shared_buffers for no good reason causes the kernel to kill off its
page cache.


We're actually very read intensive. According to pg_stat_statements, we 
regularly top out at 42k queries per second, and pg_stat_database says 
we're pushing 7k TPS.


But I'm still sure this is a kernel bug. Moving from 4GB to 6GB or 8GB 
causes the kernel to cut the active page cache in half, in addition to 
freeing 1/4 of RAM to just sit around doing nothing. That in turn causes 
kswapd to work constantly, while our IO drivers work to undo the damage. 
It's a positive feedback loop that I can reliably drive the load up to 
800+ on an 800-client pgbench with two threads, all while having 0% CPU 
free.


Make that 4GB, and not only does the problem completely disappear, but 
the load settles down to around 9, and the machine becomes about 60% 
idle. Something in there is fantastically broken, but I can't point a 
finger at what.


I was just piping in because, in absence of an obvious PG-related 
culprit, the problem could be the OS itself. It certainly was in our case.


That, or PG has a memory leak that only appears at > 4GB of shared buffers.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Jeff Janes
On Tue, Nov 20, 2012 at 2:26 PM, Shaun Thomas stho...@optionshouse.com wrote:
 On 11/20/2012 04:08 PM, Jeff Janes wrote:

 Shaun Thomas reports one that is (I assume) not read intensive, but
 his diagnosis is that this is a kernel bug where a larger
 shared_buffers for no good reason causes the kernel to kill off its
 page cache.


 We're actually very read intensive.

Sorry, that is what I meant.  I wrote "not write intensive", then
decided it would be clearer to change "not write" to "read" and then
forgot to remove the "not".  I hate it when that happens.

Atomic update commit failure in the meatware :)

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Shaun Thomas

On 11/20/2012 04:35 PM, Jeff Janes wrote:


Atomic update commit failure in the meatware :)


Ha.

What's actually funny is that one of the affected machines started 
*swapping* earlier today. With 15GB free, and 12GB of inactive cache, 
and vm.swappiness set to 0, it somehow decided there was enough memory 
pressure to push 1.5GB of *something* into swap.


Frankly, the kernel we're using is clearly deranged. There's really no 
other way to explain this. But we were blaming various elements of PG 
for several weeks following our upgrade. After further examination, our 
query profile isn't any different, and the load spikes were effectively 
random.


All I can say is... be wary of the 3.2 kernel. It's... unique.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-20 Thread Gavin Flower

On 21/11/12 11:41, Shaun Thomas wrote:

On 11/20/2012 04:35 PM, Jeff Janes wrote:


Atomic update commit failure in the meatware :)


Ha.

What's actually funny is that one of the affected machines started 
*swapping* earlier today. With 15GB free, and 12GB of inactive cache, 
and vm.swappiness set to 0, it somehow decided there was enough memory 
pressure to push 1.5GB of *something* into swap.


Frankly, the kernel we're using is clearly deranged. There's really no 
other way to explain this. But we were blaming various elements of PG 
for several weeks following our upgrade. After further examination, 
our query profile isn't any different, and the load spikes were 
effectively random.


All I can say is... be wary of the 3.2 kernel. It's... unique.


You'll just have to upgrade to Microsoft!
Ducks, and runs away _VERY_ quickly!



Could you not install a later kernel?


Cheers,
Gavin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-19 Thread Vlad Marchenko
OK, so far I have settled on excluding connection caching on the app side 
(Apache::DBI and prepare_cached) from the equation and adding pgbouncer as a 
counter-measure. This seems to stabilize the situation - at least I'm 
not able to push the server into a high-sys-cpu stall the way I used to.


I'm still interested in getting to the bottom of the issue rather than just 
alleviating it with a connection pooler, so I might set up a server on 
CentOS 6.3 (kernel ver 2.6.32) and give it a try.


Alternatively, if anyone has ideas on playing with sysctl parameters 
related to the scheduler - please let me know.


-- vlad





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-19 Thread Merlin Moncure
On Fri, Nov 16, 2012 at 11:19 AM, Vlad marche...@gmail.com wrote:

 We're looking for spikes in 'blk' which represents when lwlocks bump.
 If you're not seeing any then this is suggesting a buffer pin related
 issue -- this is also supported by the fact that raising shared
 buffers didn't help.   If you're not seeing 'bk's, go ahead and
 disable the stats macro.


 most blk comes with 0, some with 1, few hitting 100. I can't say that during
 stall times the number of blk 0 vs blk non-0 are very different.

right.  this is feeling more and more like a buffer pin issue.  but
even then we can't be certain -- it could be symptom, not the cause.
to prove it we need to demonstrate that everyone is spinning and
waiting, which we haven't done.  classic spinlock contention manifests
in high user cpu. we are binding in kernel, so I wonder if it's all
the select() calls.  we haven't yet ruled out kernel regression.

If I were you, I'd be investigating pgbouncer to see if your app is
compliant with transaction mode processing, if for no other reason
than it will mitigate high load issues.
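
For reference, a minimal transaction-pooling setup is sketched below (the database name, paths and pool sizes are placeholders, not recommendations for this particular server):

; pgbouncer.ini -- illustrative sketch only; names and sizes are placeholders
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; release the server connection back to the pool at every commit/rollback
pool_mode = transaction
; hundreds of app connections funnel into a small number of real backends
max_client_conn = 500
default_pool_size = 16

The "compliant" part matters because transaction pooling hands the server connection back at every transaction end, so anything that assumes session state survives between transactions (server-side prepared statements kept by prepare_cached, session GUCs, session advisory locks, LISTEN) has to be avoided or reworked.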

 *) How many specific query plans are needed to introduce the
 condition,  Hopefully, it's not too many.  If so, let's start
 gathering the plans.  If you have a lot of plans to sift through, one
 thing we can attempt to eliminate noise is to tweak
 log_min_duration_statement so that during stall times (only) it logs
 offending queries that are unexpectedly blocking.


 unfortunately, there are quite a few query plans... also, I don't think
 setting log_min_duration_statement will help us, cause when server is
 hitting high load average, it reacts slowly even on a key press. So even
 non-offending queries will be taking long to execute. I see all sorts of
 queries a being executed long during stall: spanning from simple
 LOG:  duration: 1131.041 ms  statement: SELECT 'DBD::Pg ping test'
 to complex ones, joining multiple tables.
 We are still looking into all the logged queries in attempt to find the ones
 that are causing the problem, I'll report if we find any clues.

right.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-19 Thread Merlin Moncure
On Fri, Nov 16, 2012 at 12:26 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Fri, Nov 16, 2012 at 8:21 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, Nov 16, 2012 at 9:52 AM, Vlad marche...@gmail.com wrote:

 *) failing that, LWLOCK_STATS macro can be compiled in to give us some
 information about the particular lock(s) we're binding on.  Hopefully
 it's a lwlock -- this will make diagnosing the problem easier.


 I've enabled that macro, seeing flying lwlock messages in the log (see
 below), even when there is no high-sys-cpu stall observed at the moment.
 Should I be looking for something in particular?

 We're looking for spikes in 'blk' which represents when lwlocks bump.

 Unfortunately LWLock contention is only a weak indication of spinlock
 contention.  For example, if backends are furiously acquiring and
 releasing the same LWLock in shared mode and no one is getting an
 exclusive mode, then there can be high spinlock contention but there
 will never be a blk.

yup.  this problem doesn't smell like lwlock issues.  typically there
the problem manifests as low cpu performance, everybody waits.
classic spinlock contention (at least from what i've seen) is very
high *userspace* cpu utilization and low work output.  this time it's
different -- OP is bogging in the kernel so it's not impossible we're
troubleshooting the symptom, not the cause.

 In 9.3 there is a new field that tells how many spin delays there were
 on the mutex that is behind each lock.  That was  commit
 b79ab00144e64217d41, maybe he can port that back to his version.

 But that only tells you about LWLock mutexes, not about all the other
 ones in PG.

 The attached patch logs every spin delay with where in the source it comes 
 from.

yeah, OP should fire this off. good stuff.  I'll bet lunch (if we ever
happen to meet) it's on buffer pin.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-19 Thread Vlad
I just did a little experiment: extracted the top four queries that were
executed the longest during stall times and launched a pgbench test with 240
clients. Yet I wasn't able to put the server into a stall with that. Also,
load average was hitting 120+, it was all user cpu, single digit % system.
The system remained pretty responsive (on a keypress), in contrast to when
a high-sys-cpu stall happens, showing similar LA numbers.

This makes me think that there is probably a different (than the ones I tried)
query or condition that is responsible for creating the high-sys-cpu. Also,
from my limited knowledge of postgresql and kernel internals, this
experiment calls into question the idea that the linux scheduler might be the
cornerstone of the problem.

Here is a longer log, first column is PID. High-sys-cpu stall happened
around 10:42:37
https://dl.dropbox.com/u/109778/postgresql-2012-11-19_103425-stripped.log

I'll try to put pgbouncer in place and repeat, although this will take
a longer time to set up.


-- Vlad


Re: [GENERAL] High SYS CPU - need advise

2012-11-19 Thread Merlin Moncure
On Sun, Nov 18, 2012 at 4:24 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Fri, Nov 16, 2012 at 12:13 PM, Vlad marche...@gmail.com wrote:
 ok, I've applied that patch and ran. The stall started around 13:50:45...50
 and lasted until the end

 https://dl.dropbox.com/u/109778/postgresql-2012-11-16_134904-stripped.log

 That isn't as much log as I expected.  But I guess only the tip of the
 iceberg gets logged, it could be that most spinlocks are contended but
 then get acquired just before the sleep (and log) would have happened.

 I'm not sure how to figure out just how much spinning is going on
 below the surface, but my gut feeling is there isn't enough of it to
 explain the amount of slowdown you are seeing (and it would probably
 be accounted for as user time rather than system time)


yes. classic spinlock contention symptoms are high user cpu and almost
no work getting done.   something else is going on here?

 Maybe we could whip up a dummy standalone  program that uses PG's
 spinlock code in a intentionally contended  way and see how your
 system reports resource usage for that.  Maybe your system reports
 fighting for cache lines or whatever is going on behind the scenes as
 system time rather than user time, or something.

I don't think this is the case.  It's looking more and more like
scheduler issues, although it's unclear if that is due to some kind of
regression or just symptoms of backends piling up.  I'm also starting
to wonder if the spinlock contention we are seeing (which according to
your patch really isn't all that much really) is just symptomatic of
scheduler issues.

 It would be a little easier to figure out what was going on there if
 the log included the pid (%p) so we can see if it is the same process
 doing a bunch of consecutive sleeps, or a bunch of different
 processes.  Also, logging the millisecond (%m rather than %t) would be
 helpful.

 But the way that the point of contention jumps around a lot points to
 a scheduler issue rather than a postgres-specific issue.  A backend
 gets pre-empted while holding a spinlock, and then the kernel is
 resistant to rescheduling it even though all other processes are piled
 up behind it.  But again, that theory would lead to more log lines
 than we see, I would think.

yeah.  I asked for some longer strace logs but then everyone went home
for the weekend.

 Anyway, pgbouncer might help by ensuring that a connection gets
 pre-empted by pgbouncer at a point where it has no locks, rather than
 by the kernel at a point where it does hold locks.

yes -- I think pgbouncer is the key here because it's going to tell us
if the scheduler is the culprit here.  If you look carefully at the
strace logs, you'll see that system call times become unpredictable as
they are getting scheduled out...particularly yielding calls like
select().  So we need to determine if this is normal high load
behavior or if there is an underlying mechanic.

 Any chance you could try an older kernel and see if that reduces the problem?

This is one thing to try.  Personally I think pgbouncer/transaction
mode is low hanging fruit if it 'drops in' to OP's app. If problem
goes away this will pretty much point the finger at Linux AFAICT (in
which case we can start messing around with the kernel).

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-19 Thread Merlin Moncure
On Mon, Nov 19, 2012 at 10:50 AM, Vlad marche...@gmail.com wrote:
 I just did a little experiment: extracted top four queries that were
 executed the longest during stall times and launched  pgbench test with 240
 clients. Yet I wasn't able to put the server into a stall with that. Also
 load average was hitting 120+, it was all user cpu, single digit % system.
 The system remained pretty responsive (on a keypress), in contrary to when
 high-sys-cpu stall happens, showing similar LA numbers.

 This makes me think that there is probably a different (than ones I tried)
 query or condition that is responsible for creating high-sys-cpu. Also, from
 my limited knowledge of postgresql and kernel internals, this experiment
 puts under question idea that linux scheduler might be the corner stone

 Here is a longer log, first column is PID. High-sys-cpu stall happened
 around 10:42:37
 https://dl.dropbox.com/u/109778/postgresql-2012-11-19_103425-stripped.log

 I'll try to put pgbouncer in place and repeat, also though this will take
 longer time to setup.


yeah.  interesting -- contention was much higher this time and that
changes things.  strange how it was missed earlier.

you're getting bounced around a lot in lwlock especially
(unfortunately we don't know which one).  I'm going to hazard another
guess:  maybe the trigger here is when the number of contending
backends exceeds some critical number (probably based on the number of
cores) you see a quick cpu spike (causing more backends to lock and
pile up) as cache line bouncing sets in.  That spike doesn't last
long, because the spinlocks quickly accumulate delay counts then punt
to the scheduler which is unable to cope.  The exact reason why this
is happening to you in exactly this way (I've never seen it) is
unclear.  Also the line between symptom and cause is difficult to
draw.

unfortunately, in your case spinlock re-scheduling isn't helping.  log
entries like this one:
18764 [2012-11-19 10:43:50.124 CST] LOG:  JJ spin delay from file
sinvaladt.c line 512 delay 212, pointer 0x7f514959a394 at character 29

are suggesting major problems.  you're dangerously close to a stuck
spinlock which is lights out for the database.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-16 Thread Merlin Moncure
On Thu, Nov 15, 2012 at 6:07 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Thu, Nov 15, 2012 at 2:44 PM, Merlin Moncure mmonc...@gmail.com wrote:

 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 3000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 6000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 7000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 8000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 9000})  = 0 (Timeout)

 This is not entirely inconsistent with the spinlock.  Note that 1000
 is repeated 3 times, and 5000 is missing.

 This might just be a misleading random sample we got here.  I've seen
 similar close spacing in some simulations I've run.

 It is not clear to me why we use a resolution of 1 msec here.  If the
 OS's implementation of select() eventually rounds to the nearest msec,
 that is its business.  But why do we have to lose intermediate
 precision due to its decision?

Yeah -- you're right, this is definitely spinlock issue.  Next steps:

*) in mostly read workloads, we have a couple of known frequent
offenders.  In particular the 'BufFreelistLock'.  One way we can
influence that guy is to try and significantly lower/raise shared
buffers.  So this is one thing to try.

*) failing that, LWLOCK_STATS macro can be compiled in to give us some
information about the particular lock(s) we're binding on.  Hopefully
it's a lwlock -- this will make diagnosing the problem easier.

*) if we're not blocking on lwlock, it's possibly a buffer pin related
issue? I've seen this before, for example on an index scan that is
dependent on an seq scan.  This long thread:
http://postgresql.1045698.n5.nabble.com/9-2beta1-parallel-queries-ReleasePredicateLocks-CheckForSerializableConflictIn-in-the-oprofile-td5709812i100.html;
has a lot information about that case and deserves a review.

*) we can consider experimenting with futex
(http://archives.postgresql.org/pgsql-hackers/2012-06/msg01588.php)
to see if things improve.  This is dangerous, and could crash your
server/eat your data, so fair warning.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-16 Thread Vlad

Merlin,


Yeah -- you're right, this is definitely spinlock issue.  Next steps:

*) in mostly read workloads, we have a couple of known frequent
offenders.  In particular the 'BufFreelistLock'.  One way we can
influence that guy is to try and significantly lower/raise shared
buffers.  So this is one thing to try.


server has 32gb with 3.2gb dedicated for share buffers. I've increased 
it to 13.2gb, the stall still happened (I have a way of controlling 
number of queries hitting postgresql by shutting down own memcache-based 
app cache, so to test for stall I temporary shut down few memcached 
servers).



*) failing that, LWLOCK_STATS macro can be compiled in to give us some
information about the particular lock(s) we're binding on.  Hopefully
it's a lwlock -- this will make diagnosing the problem easier.


I've enabled that macro, seeing flying lwlock messages in the log (see 
below), even when there is no high-sys-cpu stall observed at the moment. 
Should I be looking for something in particular?



PID 17293 lwlock 5906: shacq 1 exacq 0 blk 0
PID 17293 lwlock 5932: shacq 1 exacq 0 blk 0
PID 17293 lwlock 5934: shacq 1 exacq 0 blk 0
PID 17293 lwlock 10854: shacq 1 exacq 0 blk 0
PID 17293 lwlock 10856: shacq 4 exacq 0 blk 0
PID 17293 lwlock 10858: shacq 15 exacq 0 blk 0
PID 17293 lwlock 16442: shacq 4 exacq 0 blk 0
PID 17293 lwlock 16596: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20458: shacq 3 exacq 0 blk 0
PID 17293 lwlock 20460: shacq 10 exacq 0 blk 0
PID 17293 lwlock 20464: shacq 2 exacq 0 blk 0
PID 17293 lwlock 20466: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20480: shacq 2 exacq 0 blk 0
PID 17293 lwlock 20482: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20484: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20508: shacq 12 exacq 0 blk 0
PID 17293 lwlock 20510: shacq 6 exacq 0 blk 0
PID 17293 lwlock 20938: shacq 11 exacq 0 blk 0
PID 17293 lwlock 20940: shacq 22 exacq 0 blk 0
PID 17293 lwlock 20942: shacq 28 exacq 0 blk 0
PID 17293 lwlock 20944: shacq 56 exacq 0 blk 0
PID 17293 lwlock 20946: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20948: shacq 2 exacq 0 blk 0
PID 17293 lwlock 20950: shacq 11 exacq 0 blk 0
PID 17293 lwlock 20952: shacq 3 exacq 0 blk 0
PID 17293 lwlock 20954: shacq 7 exacq 0 blk 0
PID 17293 lwlock 20956: shacq 2 exacq 0 blk 0
PID 17293 lwlock 20958: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20960: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20962: shacq 2 exacq 0 blk 0
PID 17293 lwlock 20964: shacq 2 exacq 0 blk 0
PID 17293 lwlock 46624: shacq 24 exacq 0 blk 0
PID 17293 lwlock 68126: shacq 6 exacq 0 blk 0



*) if we're not blocking on lwlock, it's possibly a buffer pin related
issue? I've seen this before, for example on an index scan that is
dependent on an seq scan.  This long thread:
http://postgresql.1045698.n5.nabble.com/9-2beta1-parallel-queries-ReleasePredicateLocks-CheckForSerializableConflictIn-in-the-oprofile-td5709812i100.html;
has a lot information about that case and deserves a review.

*) we can consider experimenting with futex
(http://archives.postgresql.org/pgsql-hackers/2012-06/msg01588.php)
to see if things improve.  This is dangerous, and could crash your
server/eat your data, so fair warning.

merlin



-- vlad


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-16 Thread Merlin Moncure
On Fri, Nov 16, 2012 at 9:52 AM, Vlad marche...@gmail.com wrote:
 Merlin,


 Yeah -- you're right, this is definitely spinlock issue.  Next steps:

 *) in mostly read workloads, we have a couple of known frequent
 offenders.  In particular the 'BufFreelistLock'.  One way we can
 influence that guy is to try and significantly lower/raise shared
 buffers.  So this is one thing to try.


 server has 32gb with 3.2gb dedicated for share buffers. I've increased it to
 13.2gb, the stall still happened (I have a way of controlling number of
 queries hitting postgresql by shutting down own memcache-based app cache, so
 to test for stall I temporary shut down few memcached servers).


 *) failing that, LWLOCK_STATS macro can be compiled in to give us some
 information about the particular lock(s) we're binding on.  Hopefully
 it's a lwlock -- this will make diagnosing the problem easier.


 I've enabled that macro, seeing flying lwlock messages in the log (see
 below), even when there is no high-sys-cpu stall observed at the moment.
 Should I be looking for something in particular?

We're looking for spikes in 'blk' which represents when lwlocks bump.
If you're not seeing any then this is suggesting a buffer pin related
issue -- this is also supported by the fact that raising shared
buffers didn't help.   If you're not seeing 'blk's, go ahead and
disable the stats macro.

So, what we need to know now is:
*) What happens when you drastically *lower* shared buffers?   Say, to
64mb?  Note, you may experience higher load for unrelated reasons and
have to scuttle the test.  Also, if you have to crank higher to handle
internal server structures, do that.  This is a hail mary, but maybe
something interesting spits out.

*) How many specific query plans are needed to introduce the
condition?  Hopefully it's not too many.  If so, let's start
gathering the plans.  If you have a lot of plans to sift through, one
thing we can attempt to eliminate noise is to tweak
log_min_duration_statement so that during stall times (only) it logs
offending queries that are unexpectedly blocking.

*) Approximately how big is your 'working set' -- the data your
queries are routinely hitting?

*) Is the distribution of the *types* of queries uniform?  Or do you
have special processes that occur on intervals?

Thanks for your patience.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-16 Thread Vlad
 We're looking for spikes in 'blk' which represents when lwlocks bump.
 If you're not seeing any then this is suggesting a buffer pin related
 issue -- this is also supported by the fact that raising shared
 buffers didn't help.   If you're not seeing 'bk's, go ahead and
 disable the stats macro.


most blks come with 0, some with 1, a few hitting 100. I can't say that
during stall times the mix of blk 0 vs blk non-0 is very different.



 So, what we need to know now is:
 *) What happens when you drastically *lower* shared buffers?   Say, to
 64mb?  Note, you may experience higher load for unrelated reasons and
 have to scuttle the test.  Also, if you have to crank higher to handle
 internal server structures, do that.  This is a hail mary, but maybe
 something interesting spits out.


lowering shared_buffers didn't help.


 *) How many specific query plans are needed to introduce the
 condition,  Hopefully, it's not too many.  If so, let's start
 gathering the plans.  If you have a lot of plans to sift through, one
 thing we can attempt to eliminate noise is to tweak
 log_min_duration_statement so that during stall times (only) it logs
 offending queries that are unexpectedly blocking.


unfortunately, there are quite a few query plans... also, I don't think
setting log_min_duration_statement will help us, because when the server is
hitting a high load average, it reacts slowly even to a key press. So even
non-offending queries will be taking long to execute. I see all sorts of
queries being executed for a long time during a stall: spanning from the simple
LOG:  duration: 1131.041 ms  statement: SELECT 'DBD::Pg ping test'
to complex ones, joining multiple tables.
We are still looking into all the logged queries in an attempt to find the
ones that are causing the problem; I'll report if we find any clues.



 *) Approximately how big is your 'working set' -- the data your
 queries are routinely hitting?


I *think* it's within the few-hundred-MB range.



 *) Is the distribution of the *types* of queries uniform?  Or do you
 have special processes that occur on intervals?


it's pretty uniform.



 Thanks for your patience.


oh no, thank you for trying to help me to resolve this issue.

-- vlad


Re: [GENERAL] High SYS CPU - need advise

2012-11-16 Thread Jeff Janes
On Fri, Nov 16, 2012 at 8:21 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, Nov 16, 2012 at 9:52 AM, Vlad marche...@gmail.com wrote:

 *) failing that, LWLOCK_STATS macro can be compiled in to give us some
 information about the particular lock(s) we're binding on.  Hopefully
 it's a lwlock -- this will make diagnosing the problem easier.


 I've enabled that macro, seeing flying lwlock messages in the log (see
 below), even when there is no high-sys-cpu stall observed at the moment.
 Should I be looking for something in particular?

 We're looking for spikes in 'blk' which represents when lwlocks bump.

Unfortunately LWLock contention is only a weak indication of spinlock
contention.  For example, if backends are furiously acquiring and
releasing the same LWLock in shared mode and no one is getting an
exclusive mode, then there can be high spinlock contention but there
will never be a blk.

In 9.3 there is a new field that tells how many spin delays there were
on the mutex that is behind each lock.  That was  commit
b79ab00144e64217d41, maybe he can port that back to his version.

But that only tells you about LWLock mutexes, not about all the other
ones in PG.

The attached patch logs every spin delay with where in the source it comes from.

I don't think it will have too much of a performance impact as the
logs only happen when a sleep of at least 1ms is going to happen
anyway.  But it could generate massive amounts of logs.


Cheers,

Jeff


spin_log.patch
Description: Binary data

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Merlin Moncure
On Wed, Nov 14, 2012 at 4:08 PM, John R Pierce pie...@hogranch.com wrote:
 On 11/14/12 1:34 PM, Vlad wrote:

 thanks for your feedback. While implementing connection pooling would make
 resources utilization more efficient, I don't think it's the root of my
 problem. Most of the connected clients are at IDLE. When I do

 select * from pg_stat_activity where current_query not like '%IDLE%';

 I only see several active queries at any given time.


 what about during these spikes?

Yeah -- if you are seeing a lot of queries pile up during these times,
it's time to explore connection pooler because it will keep system
load manageable during these situations.

things to check:
*) blocked queries (pg_locks/pg_stat_activity)
*) i/o wait. in particular, is linux page cache flushing.
*) query stampede: we will want to measure TPS leading into the stall
and out of it.
*) anything else running on the box?
*) you have a large number of tables? maybe this is statistics file related?
*) let's log checkpoints to see if there is correlation with stall
*) nice to have vmstat/iostat during/before/after stall.  for example,
massive spike of context switches during stall could point to o/s
scheduler issue.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Vlad
there is no big spike of queries that causes that; queries come in at a
relatively stable pace. It's just that the higher the rate of incoming queries,
the more likely this is to happen.  yes, when a stall happens, the active
queries pile up - but that's the result of the stall (the server reacts slowly
even to a keypress, not to mention query execution), not the cause.


things to check:
 *) blocked queries (pg_locks/pg_stat_activity)


nada


 *) i/o wait. in particular, is linux page cache flushing.


no i/o wait, no IRQ (see below)


 *) query stampede: we will want to measure TPS leading into the stall
 and out of it.
 *) anything else running on the box?


just bare linux + postgresql.


 *) you have a large amount of table? maybe this statistics file related?


over 1000 tables across 4 or 5 schemas in a single database.

*) let's log checkpoints to see if there is correlation with stall


checked; checkpoints happen much more rarely and without relation to the high-sys
periods


 *) nice to have vmstat/iostat during/before/after stall.  for example,
 massive spike of context switches during stall could point to o/s
 scheduler issue.



checked that as well - nothing. CS even lower.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.94    0.00    9.28    0.38    0.00   73.40

Device:            tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
sda               6.00       48.00        0.00        48         0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.06    0.00   18.43    0.25    0.00   63.26

Device:            tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
sda              10.00      104.00        0.00       104         0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.12    0.00  *86.74*    0.12    0.00    4.01

Device:            tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
sda               0.97        7.77        0.00         8         0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.44    0.00  *96.58*    0.00    0.00    1.98

Device:            tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
sda               3.28       78.69        0.00       144         0



procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff    cache   si   so    bi    bo     in    cs us  sy id wa st
 1  0      0 279240  12016 14431964    0    0    32     0 197852  4299 15   9 76  0  0
 4  0      0 225984  12024 14419696    0    0     0    64 197711  5158 11   9 79  1  0
 0  0      0 260112  12024 14413636    0    0    48     0 196708  4618 17  10 73  0  0
 6  0      0 233936  12024 14375784    0    0   104     0 179861  4884 19  17 64  0  0
30  0      0 224904  12024 14354812    0    0     8     0  51088  1205  9 *86*  5  0  0
72  0      0 239144  12024 14333852    0    0   144     0  45601   542  2 *98*  0  0  0
78  0      0 224840  12024 14328536    0    0     0     0  38732   481  2 *94*  5  0  0
22  1      0 219072  12032 14250652    0    0   136   100  47323  1231  9 *90*  1  0  0


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Merlin Moncure
On Thu, Nov 15, 2012 at 11:50 AM, Vlad marche...@gmail.com wrote:
 there is no big spike of queries that cause that, queries come in relatively
 stable pace. It's just when the higher rate of queries coming, the more
 likely this to happen.  yes, when stall happens , the active queries pile up
 - but that's the result of a stall (the server reacts slow on a keypress,
 not to mention queries execution), not the cause.

 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
  r  b   swpd   free   buff    cache   si   so    bi    bo     in    cs us  sy id wa st
  1  0      0 279240  12016 14431964    0    0    32     0 197852  4299 15   9 76  0  0
  4  0      0 225984  12024 14419696    0    0     0    64 197711  5158 11   9 79  1  0
  0  0      0 260112  12024 14413636    0    0    48     0 196708  4618 17  10 73  0  0
  6  0      0 233936  12024 14375784    0    0   104     0 179861  4884 19  17 64  0  0
 30  0      0 224904  12024 14354812    0    0     8     0  51088  1205  9  86  5  0  0
 72  0      0 239144  12024 14333852    0    0   144     0  45601   542  2  98  0  0  0
 78  0      0 224840  12024 14328536    0    0     0     0  38732   481  2  94  5  0  0
 22  1      0 219072  12032 14250652    0    0   136   100  47323  1231  9  90  1  0  0

hm.  well, we can definitely rule out i/o.   I reviewed your last
posting, and you said:
"Out of the top 50 processes in top, 48 of them are postmasters, one
is syslog, and one is psql. Each of the postmasters have a high %CPU,
the top ones being 80% and higher, the rest being anywhere between 30%
- 60%. Would postmaster 'queries' that are running attribute to the
sys CPU usage, or should they be under the 'us' CPU usage?"

Is this still true?  Can we capture strace from one of the high %
postmasters to see if there's any clues there.   Maybe we've uncovered
some type of weird spinlock contention issue.  How large is your
database (or at least the portion of it that's commonly used)?  Would
you characterize your queries as mostly small lookups, scans, or a
mix?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Vlad
Merlin,

this is not my report; it is probably from a thread that I've referenced as
having common symptoms. Here is info about my db:


Postgresql 9.1.6.
Postgres usually has 400-500 connected clients, most of them are idle.
Database is over 1000 tables (across 5 namespaces), taking ~150Gb on disk.
We have mostly select statements (joins across few tables), using indexes
and resulting in a small number of records returned.

So mostly small lookups across few tables joining by a primary key / index.
Also, I just performed an experiment - I switched our app over to using the hot
PITR standby instead of the master and it experienced the same problem. Since the PITR
db can only perform read-only queries, there are no write locks (except
maybe when the standby is replaying wal records from the master?), nevertheless SYS
CPU jumped sky-high.

-- Vlad




 Is this still true?  Can we capture strace from one of the high %
 postmasters to see if there's any clues there.   Maybe we've uncovered
 some type of weird spinlock contention issue.  How large is your
 database (or at least the portion of it that's commonly used)?  Would
 you characterize your queries as mostly small lookups, scans, or a
 mix?

 merlin



Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Merlin Moncure
On Thu, Nov 15, 2012 at 2:20 PM, Vlad marche...@gmail.com wrote:
 Merlin,

 this is not my report, probably from a thread that I've referenced as having
 a common symptoms. Here is info about my db:


 Postgresql 9.1.6.
 Postgres usually has 400-500 connected clients, most of them are idle.
 Database is over 1000 tables (across 5 namespaces), taking ~150Gb on disk.
 We have mostly select statements (joins across few tables), using indexes
 and resulting in a small number of records returned.

 So mostly small lookups across few tables joining by a primary key / index.
 Also, I just perform an experiment - I switched our app over to using hot
 PITR instead of master and it experienced the same problem. So since PITR db
 can only perform read-only queries, there is no write-locks (except maybe
 when pitr is playing wal records from the master?), nevertheless SYS CPU
 jumped sky.

yeah.  ok, next steps:
*) can you confirm that postgres process is using high cpu (according
to top) during stall time
*) if, so, please strace that process and save some of the log
*) you're using a 'bleeding edge' kernel.  so we must be suspicious of
a regression there, particularly in the scheduler.
*) I am suspicious of a spinlock issue. so, if we can't isolate the
problem, is running a hand compiled postgres a possibility (for lock
stats)?
*) what is the output of this:
echo /proc/sys/vm/zone_reclaim_mode

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Vlad

 yeah.  ok, nest steps:
 *) can you confirm that postgres process is using high cpu (according
 to top) during stall time


yes, CPU is spread across a lot of postmasters

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29863 pgsql 20   0 3636m 102m  36m R 19.1  0.3   0:01.33 postmaster
30277 pgsql 20   0 3645m 111m  37m R 16.8  0.3   0:01.27 postmaster
11966 pgsql 20   0 3568m  22m  15m R 15.1  0.1   0:00.66 postmaster
 8073 pgsql 20   0 3602m  60m  26m S 13.6  0.2   0:00.77 postmaster
29780 pgsql 20   0 3646m 115m  43m R 13.6  0.4   0:01.13 postmaster
11865 pgsql 20   0 3606m  61m  23m S 12.8  0.2   0:01.87 postmaster
29379 pgsql 20   0 3603m  70m  30m R 12.8  0.2   0:00.80 postmaster
29727 pgsql 20   0 3616m  77m  31m R 12.5  0.2   0:00.81 postmaster




 *) if, so, please strace that process and save some of the log


https://dl.dropbox.com/u/109778/stall_postmaster.log


 *) you're using a 'bleeding edge' kernel.  so we must be suspicious of
 a regression there, particularly in the scheduler.


this was observed for a while, during which period system went from using
3.4.* kernels to 3.5.*... but I do not deny such a possibility.


 *) I am suspicious of spinlock issue. so, if we can't isolate the
 problem, is running a hand complied postgres a possibility (for lock
 stats)?



Yes, definitely possible. we run manually compiled postgresql anyway. Pls,
provide instructions.




 *) what is the output of this:
 echo /proc/sys/vm/zone_reclaim_mode


I presume you wanted cat instead of echo, and it shows 0.


-- vlad


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Merlin Moncure
On Thu, Nov 15, 2012 at 2:44 PM, Vlad marche...@gmail.com wrote:


 yeah.  ok, nest steps:
 *) can you confirm that postgres process is using high cpu (according
 to top) during stall time


 yes, CPU is spread across a lot of postmasters

   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 29863 pgsql 20   0 3636m 102m  36m R 19.1  0.3   0:01.33 postmaster
 30277 pgsql 20   0 3645m 111m  37m R 16.8  0.3   0:01.27 postmaster
 11966 pgsql 20   0 3568m  22m  15m R 15.1  0.1   0:00.66 postmaster
  8073 pgsql 20   0 3602m  60m  26m S 13.6  0.2   0:00.77 postmaster
 29780 pgsql 20   0 3646m 115m  43m R 13.6  0.4   0:01.13 postmaster
 11865 pgsql 20   0 3606m  61m  23m S 12.8  0.2   0:01.87 postmaster
 29379 pgsql 20   0 3603m  70m  30m R 12.8  0.2   0:00.80 postmaster
 29727 pgsql 20   0 3616m  77m  31m R 12.5  0.2   0:00.81 postmaster


 *) if, so, please strace that process and save some of the log


 https://dl.dropbox.com/u/109778/stall_postmaster.log

ok, excellent.   reviewing the log, this immediately caught my eye:

recvfrom(8, \27\3\1\0@, 5, 0, NULL, NULL) = 5
recvfrom(8, 
\327\327\nl\231LD\211\346\243@WW\254\244\363C\326\247\341\177\255\263~\327HDv-\3466\353...,
64, 0, NULL, NULL) = 64
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 3000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 6000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 7000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 9000})  = 0 (Timeout)
semop(41713721, {{2, 1, 0}}, 1) = 0
lseek(295, 0, SEEK_END) = 0
lseek(296, 0, SEEK_END) = 8192

this is definitely pointing to a spinlock issue. see s_lock.c: what you
are seeing here is a backend getting caught in a spin loop via a stuck
spinlock. 0-9 = 10 times.  did you by any chance check the logs? did
any backends restart? we are looking for this:

elog(PANIC, "stuck spinlock (%p) detected at %s:%d",
 lock, file, line);

Anything else going on in the log?   The way to enable lock stats is
via the LWLOCK_STATS macro.  But before doing anything plz check logs for
error/panic.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Merlin Moncure
On Thu, Nov 15, 2012 at 3:49 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, Nov 15, 2012 at 2:44 PM, Vlad marche...@gmail.com wrote:


 yeah.  ok, nest steps:
 *) can you confirm that postgres process is using high cpu (according
 to top) during stall time


 yes, CPU is spread across a lot of postmasters

   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 29863 pgsql 20   0 3636m 102m  36m R 19.1  0.3   0:01.33 postmaster
 30277 pgsql 20   0 3645m 111m  37m R 16.8  0.3   0:01.27 postmaster
 11966 pgsql 20   0 3568m  22m  15m R 15.1  0.1   0:00.66 postmaster
  8073 pgsql 20   0 3602m  60m  26m S 13.6  0.2   0:00.77 postmaster
 29780 pgsql 20   0 3646m 115m  43m R 13.6  0.4   0:01.13 postmaster
 11865 pgsql 20   0 3606m  61m  23m S 12.8  0.2   0:01.87 postmaster
 29379 pgsql 20   0 3603m  70m  30m R 12.8  0.2   0:00.80 postmaster
 29727 pgsql 20   0 3616m  77m  31m R 12.5  0.2   0:00.81 postmaster


 *) if, so, please strace that process and save some of the log


 https://dl.dropbox.com/u/109778/stall_postmaster.log

 ok, excellent.   reviewing the log, this immediately caught my eye:

 recvfrom(8, \27\3\1\0@, 5, 0, NULL, NULL) = 5
 recvfrom(8, 
 \327\327\nl\231LD\211\346\243@WW\254\244\363C\326\247\341\177\255\263~\327HDv-\3466\353...,
 64, 0, NULL, NULL) = 64
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 3000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 6000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 7000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 8000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 9000})  = 0 (Timeout)
 semop(41713721, {{2, 1, 0}}, 1) = 0
 lseek(295, 0, SEEK_END) = 0
 lseek(296, 0, SEEK_END) = 8192

 this is definitely pointing to spinlock issue. see: slock.c what you
 are seeing here is a backend getting caught in a spin loop via a stuck
 spinlock. 0-9 = 10 times.  did you by any chance check the logs? did
 any backends restart? we are looking for this:

 elog(PANIC, "stuck spinlock (%p) detected at %s:%d",
  lock, file, line);

 Anything else going on in the log?   The way to enable locks status is
 via LWLOCK_STATS macro.  But before doing any thing plz check logs for
 error/panic.

ah, scratch that.  reading the code again, s_lock.c allows for up
to 1 second (that is, you can delay 1000 times).  so the panic is
unlikely to be in the log.  also spinlock waits shouldn't increase by
exactly 1000 us. investigating...

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Alvaro Herrera
Merlin Moncure escribió:

 ok, excellent.   reviewing the log, this immediately caught my eye:
 
 recvfrom(8, \27\3\1\0@, 5, 0, NULL, NULL) = 5
 recvfrom(8, 
 \327\327\nl\231LD\211\346\243@WW\254\244\363C\326\247\341\177\255\263~\327HDv-\3466\353...,
 64, 0, NULL, NULL) = 64
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 3000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 6000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 7000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 8000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 9000})  = 0 (Timeout)
 semop(41713721, {{2, 1, 0}}, 1) = 0
 lseek(295, 0, SEEK_END) = 0
 lseek(296, 0, SEEK_END) = 8192
 
 this is definitely pointing to spinlock issue.

I met Rik van Riel (Linux kernel hacker) recently and we chatted about
this briefly.  He strongly suggested that we should consider using
futexes on Linux instead of spinlocks; the big advantage being that
futexes sleep instead of spinning when contention is high.  That would
reduce the system load in this scenario.
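
To make the idea concrete, a minimal futex-backed lock on Linux looks roughly like the sketch below (illustration only -- this is not PostgreSQL code and it glosses over what s_lock.c actually does): the waiter spins briefly in userspace and then asks the kernel to put it to sleep until the holder wakes it, instead of cycling through select() timeouts.

/* hedged sketch of a spin-then-futex lock; not PostgreSQL source */
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static long
futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* lock states: 0 = free, 1 = locked, 2 = locked with (possible) waiters */

static void
futex_lock(atomic_int *lock)
{
    /* fast path: a short userspace spin, in the spirit of spins_per_delay */
    for (int i = 0; i < 100; i++)
    {
        int expected = 0;
        if (atomic_compare_exchange_strong(lock, &expected, 1))
            return;
    }
    /* slow path: mark the lock contended and sleep until woken */
    while (atomic_exchange(lock, 2) != 0)
        futex(lock, FUTEX_WAIT, 2);
}

static void
futex_unlock(atomic_int *lock)
{
    /* if anyone might be sleeping, wake exactly one waiter */
    if (atomic_exchange(lock, 0) == 2)
        futex(lock, FUTEX_WAKE, 1);
}

int
main(void)
{
    static atomic_int lock;
    futex_lock(&lock);
    futex_unlock(&lock);
    return 0;
}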

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Vlad
sorry - no panics / errors in the log...

-- Vlad


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Merlin Moncure
On Thu, Nov 15, 2012 at 4:29 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Merlin Moncure escribió:

 ok, excellent.   reviewing the log, this immediately caught my eye:

 recvfrom(8, \27\3\1\0@, 5, 0, NULL, NULL) = 5
 recvfrom(8, 
 \327\327\nl\231LD\211\346\243@WW\254\244\363C\326\247\341\177\255\263~\327HDv-\3466\353...,
 64, 0, NULL, NULL) = 64
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 3000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 6000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 7000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 8000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 9000})  = 0 (Timeout)
 semop(41713721, {{2, 1, 0}}, 1) = 0
 lseek(295, 0, SEEK_END) = 0
 lseek(296, 0, SEEK_END) = 8192

 this is definitely pointing to spinlock issue.

 I met Rik van Riel (Linux kernel hacker) recently and we chatted about
 this briefly.  He strongly suggested that we should consider using
 futexes on Linux instead of spinlocks; the big advantage being that
 futexes sleep instead of spinning when contention is high.  That would
 reduce the system load in this scenario.

Well, so do postgres spinlocks right?  When we overflow
spins_per_delay we go to pg_usleep which proxies to select() --
postgres spinlocks are a hybrid implementation.  Moving to futex is
possible improvement (that's another discussion) in degenerate cases
but I'm not sure that I've exactly zeroed in on the problem.  Or am I
missing something?

What I've been scratching my head over is what code exactly would
cause an iterative sleep like the above.  The code is here:

  pg_usleep(cur_delay * 1000L);

  /* increase delay by a random fraction between 1X and 2X */
  cur_delay += (int) (cur_delay *
              ((double) random() / (double) MAX_RANDOM_VALUE) + 0.5);
  /* wrap back to minimum delay when max is exceeded */
  if (cur_delay > MAX_DELAY_MSEC)
      cur_delay = MIN_DELAY_MSEC;

...so cur_delay is supposed to increase in non linear fashion.  I've
looked around the sleep, usleep, and latch calls as of yet haven't
found anything that advances timeout just like that (yet, need to do
another pass). And we don't know for sure if this is directly related
to OP's problem.
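
For what it's worth, here's a throwaway standalone program (a sketch; the MIN/MAX constants are assumptions matching 9.1-era s_lock.c) that just replays the backoff arithmetic quoted above.  Because the increment is rounded, small delays can repeat and larger steps can skip values, so runs like 1, 1, 1, 2, 3, 4, 6 ms are a perfectly plausible output of this code path.

/* standalone sketch, not PostgreSQL source: replays the s_lock.c-style backoff */
#include <stdio.h>
#include <stdlib.h>

#define MIN_DELAY_MSEC   1
#define MAX_DELAY_MSEC   1000
#define MAX_RANDOM_VALUE 0x7FFFFFFF

int
main(void)
{
    int cur_delay = MIN_DELAY_MSEC;

    for (int i = 0; i < 15; i++)
    {
        /* the real code calls pg_usleep(cur_delay * 1000L) here */
        printf("would sleep %d ms\n", cur_delay);

        /* increase delay by a random fraction between 1X and 2X */
        cur_delay += (int) (cur_delay *
                            ((double) random() / (double) MAX_RANDOM_VALUE) + 0.5);
        if (cur_delay > MAX_DELAY_MSEC)
            cur_delay = MIN_DELAY_MSEC;
    }
    return 0;
}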

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 What I've been scratching my head over is what code exactly would
 cause an iterative sleep like the above.  The code is here:

   pg_usleep(cur_delay * 1000L);

   /* increase delay by a random fraction between 1X and 2X */
   cur_delay += (int) (cur_delay *
 ((double) random() / (double) MAX_RANDOM_VALUE) + 0.5);
   /* wrap back to minimum delay when max is exceeded */
    if (cur_delay > MAX_DELAY_MSEC)
  cur_delay = MIN_DELAY_MSEC;

IIRC that coding is of relatively recent vintage.  The OP is probably
running some older version that increased cur_delay differently.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Vlad Marchenko

Tom,

I just checked the version I'm running (9.1.6), and the code is quite 
similar (src/backend/storage/lmgr/s_lock.c)


pg_usleep(cur_delay * 1000L);

#if defined(S_LOCK_TEST)
fprintf(stdout, "*");
fflush(stdout);
#endif

/* increase delay by a random fraction between 1X and 2X */
cur_delay += (int) (cur_delay *
        ((double) random() / (double) MAX_RANDOM_VALUE) + 0.5);

/* wrap back to minimum delay when max is exceeded */
if (cur_delay > MAX_DELAY_MSEC)
    cur_delay = MIN_DELAY_MSEC;

-- vlad




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-15 Thread Jeff Janes
On Thu, Nov 15, 2012 at 2:44 PM, Merlin Moncure mmonc...@gmail.com wrote:

 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 3000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 6000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 7000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 8000})  = 0 (Timeout)
 select(0, NULL, NULL, NULL, {0, 9000})  = 0 (Timeout)

This is not entirely inconsistent with the spinlock.  Note that 1000
is repeated 3 times, and 5000 is missing.

This might just be a misleading random sample we got here.  I've seen
similar close spacing in some simulations I've run.

It is not clear to me why we use a resolution of 1 msec here.  If the
OS's implementation of select() eventually rounds to the nearest msec,
that is its business.  But why do we have to lose intermediate
precision due to its decision?
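
For reference, a microsecond sleep built on select() looks roughly like this (a sketch in the spirit of pg_usleep(), not a copy of it).  select() itself accepts microsecond timeouts, so the whole-millisecond steps in the strace come from the caller tracking cur_delay in milliseconds, which is exactly the precision loss being questioned here.

/* hedged sketch of a select()-based sleep, similar in spirit to pg_usleep() */
#include <stdio.h>
#include <sys/select.h>

static void
sleep_microseconds(long microsec)
{
    struct timeval delay;

    delay.tv_sec  = microsec / 1000000L;
    delay.tv_usec = microsec % 1000000L;
    /* shows up as the select(0, NULL, NULL, NULL, {...}) calls in strace */
    (void) select(0, NULL, NULL, NULL, &delay);
}

int
main(void)
{
    sleep_microseconds(2500);   /* 2.5 ms is expressible at this layer... */
    printf("slept\n");          /* ...it is the ms-granular caller that never asks for it */
    return 0;
}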

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-14 Thread John R Pierce

On 11/14/12 1:13 PM, Vlad wrote:

Postgresql 9.1.6.
Postgres usually has 400-500 connected clients, most of them are idle.
Database is over 1000 tables (across 5 namespaces), taking ~150Gb on disk.


that's a really high client connection count for an 8 core system.

I'd consider implementing a connection pool (like pgbouncer), and 
rewriting your client applications to connect, do a transaction, 
disconnect, so the actual number of postgres connections is much lower, 
say in the 16-48 range.



--
john r pierceN 37, W 122
santa cruz ca mid-left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] High SYS CPU - need advise

2012-11-14 Thread Vlad
John,

thanks for your feedback. While implementing connection pooling would make
resources utilization more efficient, I don't think it's the root of my
problem. Most of the connected clients are at IDLE. When I do

select * from pg_stat_activity where current_query not like '%IDLE%';

I only see several active queries at any given time.


-- Vlad


On Wed, Nov 14, 2012 at 3:23 PM, John R Pierce pie...@hogranch.com wrote:

 On 11/14/12 1:13 PM, Vlad wrote:

 Postgresql 9.1.6.
 Postgres usually has 400-500 connected clients, most of them are idle.
 Database is over 1000 tables (across 5 namespaces), taking ~150Gb on disk.


 thats a really high client connection count for a 8 core system.

 I'd consider implementing a connection pool (like pgbouncer), and
 rewriting your client applications to connect, do a transaction,
 disconnect, so the actual number of postgres connections is much lower, say
 in the 16-48 range.




Re: [GENERAL] High SYS CPU - need advise

2012-11-14 Thread John R Pierce

On 11/14/12 1:34 PM, Vlad wrote:
thanks for your feedback. While implementing connection pooling would 
make resources utilization more efficient, I don't think it's the root 
of my problem. Most of the connected clients are at IDLE. When I do


select * from pg_stat_activity where current_query not like '%IDLE%';

I only see several active queries at any given time.


what about during these spikes?

--
john r pierceN 37, W 122
santa cruz ca mid-left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general