Re: [GENERAL] High SYS CPU - need advise
On Sun, Dec 2, 2012 at 9:08 AM, rahul143 rk204...@gmail.com wrote: Hello everyone, I'm seeking help in diagnosing / figuring out an issue that we have with our DB server: under some (relatively non-heavy) load, 300-400 TPS, every 10-30 seconds the server drops into high system CPU usage (90%+ SYSTEM across all CPUs - it's pure SYS cpu, i.e. it's not io wait, not irq, not user). Postgresql is taking 10-15% at the same time. Those periods last from a few seconds to minutes, or until Postgresql is restarted. Needless to say, the system is barely responsive, with load average hitting over 100. We have mostly select statements (joins across a few tables), using indexes and resulting in a small number of records returned. Should the number of incoming requests per second drop a bit, the server does not fall into those HIGH-SYS-CPU periods. It all seems like postgres runs out of some resource or is fighting for some locks, and that causes the kernel to go into la-la land trying to manage it.

So far we've checked:
- disk and nic delays / errors / utilization
- WAL files (created rarely)
- tables are vacuumed OK; periods of high SYS are not tied to the vacuum process
- kernel resource utilization (sufficient FS handles, shared MEM/SEM, VM)
- increased log level, but nothing suspicious/different (to me) is reported there during periods of high sys-cpu
- ran pgbench (could not reproduce the issue, even though it was producing over 40,000 TPS for a prolonged period of time)

Basically, our symptoms are exactly as was reported here over a year ago (though for postgres 8.3; we run 9.1): http://archives.postgresql.org/pgsql-general/2011-10/msg00998.php I will be grateful for any ideas helping to resolve or diagnose this problem.

Didn't we just discuss this exact problem on the identically named thread? http://postgresql.1045698.n5.nabble.com/High-SYS-CPU-need-advise-td5732045.html If you're the same poster, it's good to reference the thread and any conclusions made in order to save everyone's time.
As it happens, I have been working an angle that may help solve this problem. Are you willing/able to run a patched postgres, and what's your tolerance for risk? merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
Merlin Moncure escribió: Didn't we just discuss this exact problem on the identically named thread? http://postgresql.1045698.n5.nabble.com/High-SYS-CPU-need-advise-td5732045.html Ignore this guy. It's a bot reinjecting old messages, or something like that, probably because of some bug in mailing list scrubbing software. My impression is that it's eventually going to publish every email on a blog somewhere, or something like that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [GENERAL] High SYS CPU - need advise
RAID10 -- vlad On 11/24/2012 3:17 PM, Gavin Flower wrote: Curious, what is your RAID configuration?
Re: [GENERAL] High SYS CPU - need advise
On Tue, Nov 20, 2012 at 9:05 AM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Nov 20, 2012 at 10:50 AM, Jeff Janes jeff.ja...@gmail.com wrote: I wouldn't expect so. Increasing shared_buffers should either fix free list lock contention, or leave it unchanged, not make it worse. AIUI, that is simply not true (unless you raise it to the point you're not churning them). I'm looking at StrategyGetBuffer() for non-scan cases. It locks BufFreelistLock then loops the free list, and, if it finds nothing, engages a clock sweep. The freelist should never loop. It is written as a loop, but I think there is currently no code path which ends up with valid buffers being on the freelist, so that loop will never, or at least rarely, execute more than once. Both of those operations are dependent on the number of buffers being managed, and so it's reasonable to expect some workloads to increase contention with more buffers. The clock sweep can depend on the number of buffers being managed in a worst-case sense, but I've never seen any evidence (nor analysis) that this worst case can be obtained in reality on an ongoing basis. By constructing two pathological workloads which get switched between, I can get the worst case to happen, but when it does happen the consequences are mild compared to the amount of time needed to set up the necessary transition. In other words, the worst case can't be triggered often enough to make a meaningful difference. Cheers, Jeff
Re: [GENERAL] High SYS CPU - need advise
On Tue, Nov 20, 2012 at 12:16 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Tue, Nov 20, 2012 at 9:05 AM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Nov 20, 2012 at 10:50 AM, Jeff Janes jeff.ja...@gmail.com wrote: I wouldn't expect so. Increasing shared_buffers should either fix free list lock contention, or leave it unchanged, not make it worse. AIUI, that is simply not true (unless you raise it to the point you're not churning them). I'm looking at StrategyGetBuffer() for non-scan cases. It locks BufFreelistLock then loops the free list, and, if it finds nothing, engages a clock sweep. The freelist should never loop. It is written as a loop, but I think there is currently no code path which ends up with valid buffers being on the freelist, so that loop will never, or at least rarely, execute more than once. Both of those operations are dependent on the number of buffers being managed, and so it's reasonable to expect some workloads to increase contention with more buffers. The clock sweep can depend on the number of buffers being managed in a worst-case sense, but I've never seen any evidence (nor analysis) that this worst case can be obtained in reality on an ongoing basis. By constructing two pathological workloads which get switched between, I can get the worst case to happen, but when it does happen the consequences are mild compared to the amount of time needed to set up the necessary transition. In other words, the worst case can't be triggered often enough to make a meaningful difference. Yeah, good points; but (getting off topic here): there have been several documented cases of lowering shared buffers improving performance under contention... the 'worst case' might be happening more than expected. In particular, what happens when a substantial percentage of the buffer pool is set with a non-zero usage count? This seems unlikely, but possible?
Take note: if (buf->refcount == 0) { if (buf->usage_count > 0) { buf->usage_count--; trycounter = NBuffers; /* emphasis */ } ISTM time spent here isn't bounded, except that as more time is spent sweeping (more backends are thus waiting and not marking pages) the usage counts decrease faster until you hit steady state. A smaller buffer pool naturally would help in that scenario, as your usage counts would drop faster. It strikes me as cavalier to be resetting trycounter while sitting under the #1 known contention point for read-only workloads. Shouldn't StrategyGetBuffer() work on an advisory basis and try to force a buffer after N failed usage-count attempts? merlin
Re: [GENERAL] High SYS CPU - need advise
On Wed, Nov 21, 2012 at 9:05 AM, Vlad marche...@gmail.com wrote: it's session mode and the pool size is 1200 (cause I need to guarantee that in the worst case we have enough slots for all possible clients), Wouldn't the clients prefer to wait 100ms to get a connection if that means their query finishes in 100ms, rather than waiting 0ms to get a connection but have the query take 1000ms to finish? however even at the times preceding the high-cpu-sys stall, the number of postmasters is like 15-20. When the stall happens, it starts to rise, but that's the result of the DB not answering in a timely manner. That is like the "the beatings will continue until morale improves" joke. The theory about why pgbouncer would be beneficial is that it prevents hundreds of connections from clogging up the server. By allowing hundreds of connections to be made when the server gets into distress, you are increasing that distress and making it harder to recover from it. That is the theory, anyway. Cheers, Jeff
Re: [GENERAL] High SYS CPU - need advise
On Wed, Nov 21, 2012 at 10:43 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Wed, Nov 21, 2012 at 7:29 AM, Vlad Marchenko marche...@gmail.com wrote: update on my problem: despite pgbouncer, the problem still occurs on my end. As Merlin asked, how big is the pool? Maybe you are using a large enough pool so as to defeat the purpose of restricting the number of connections. also, what mode. merlin
Re: [GENERAL] High SYS CPU - need advise
nothing changes if I increase the number of threads. pgbouncer doesn't change much. also, I think the nature of high-sys-cpu during a stall and when I run pgbench is different. During pgbench it's constantly at 30-40%, while during a stall it sits at a low 5-15% and then spikes to 90% after a while, with no apparent reason (i.e. no increased load or anything), then goes down in a few secs, or stays until postgres is restarted. at this point the only thing that does help on the app side is when I force frequent re-connection to postgresql (i.e. not letting a postmaster live too long) -- Vlad On Wed, Nov 21, 2012 at 10:18 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Wed, Nov 21, 2012 at 8:14 AM, Vlad marche...@gmail.com wrote: '-M prepared' produces normal results, while '-M simple' results in 40% sys cpu. '-M extended' is somewhere in between. I'm running it as 60 clients, 2 threads. 2 threads is pretty low for 60 clients. What happens if you increase -j to either 30 or 60? (What I'm wondering here is if perhaps in this case it is pgbench which is triggering the scheduling issue) Also, what happens if you channel pgbench through pgbouncer? Cheers, Jeff
Re: [GENERAL] High SYS CPU - need advise
Vlad wrote: it's session mode and the pool size is 1200 (cause I need to guarantee that in the worst case we have enough slots for all possible clients), We found that the real-world production performance of a web application servicing millions of web hits per day with thousands of concurrent users improved when we reconfigured our database connection pool to be about 35 instead of 55, on a 16 core box with a 40 drive RAID. http://wiki.postgresql.org/wiki/Number_Of_Database_Connections -Kevin
Re: [GENERAL] High SYS CPU - need advise
On 25/11/12 09:30, Kevin Grittner wrote: Vlad wrote: it's session mode and the pool size is 1200 (cause I need to guarantee that in the worst case we have enough slots for all possible clients), We found that the real-world production performance of a web application servicing millions of web hits per day with thousands of concurrent users improved when we reconfigured our database connection pool to be about 35 instead of 55, on a 16 core box with a 40 drive RAID. http://wiki.postgresql.org/wiki/Number_Of_Database_Connections -Kevin Curious, what is your RAID configuration? Cheers, Gavin
Re: [GENERAL] High SYS CPU - need advise
On Wed, Nov 21, 2012 at 11:05 AM, Vlad marche...@gmail.com wrote: it's session mode and the pool size is 1200 (cause I need to guarantee that in the worst case we have enough slots for all possible clients), however even at the times preceding the high-cpu-sys stall, the number of postmasters is like 15-20. When the stall happens, it starts to rise, but that's the result of the DB not answering in a timely manner. That is completely defeating the point. Mode should be transaction mode, and somewhere between 1-2x the number of cores for the pool size. Note the # of connected clients does not have to equal the size of the pool -- that's the whole point of pgbouncer -- it forces the queries through a fixed number of backend processes. merlin
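A minimal sketch of what that looks like in pgbouncer.ini, with hypothetical host and database names (exact option names and defaults may vary slightly by pgbouncer version):

```ini
; pgbouncer.ini -- illustrative sketch only; appdb/host are hypothetical
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction   ; not session mode
default_pool_size = 16    ; ~1-2x cores on an 8-core box
max_client_conn = 1200    ; all 1200 clients can still connect,
                          ; but only 16 backends do the work
```

The point of the split between max_client_conn and default_pool_size is exactly what Merlin describes: clients queue in pgbouncer instead of each getting a postmaster backend.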
Re: [GENERAL] High SYS CPU - need advise
what pgbouncer mode, and how large is your pool. '-M prepared' produces normal results, while '-M simple' results in 40% sys cpu. '-M extended' is somewhere in between. I'm running it as 60 clients, 2 threads. -- Vlad
Re: [GENERAL] High SYS CPU - need advise
Gavin Flower wrote: We found that the real-world production performance of a web application servicing millions of web hits per day with thousands of concurrent users improved when we reconfigured our database connection pool to be about 35 instead of 55, on a 16 core box with a 40 drive RAID. Curious, what is your RAID configuration? We have the OS on RAID 1 on one RAID controller, WAL on another RAID 1 array on its own controller, and the PostgreSQL data directory on RAID 5 on a pair of controllers which each has a path to every drive. (Don't ask me how that works, I just trust the hardware guys to get that part right.) The active portion of the database is cached, the read-to-write ratio is about 10 to 1, and the controllers all have BBU cache with write-back. It is actually one pool of about 30 for the read-only web app with so many hits, and a separate pool of 6 database connections for logical replication from about 80 source databases handling an OLTP load of 3000 directly connected users. We were seeing some performance degradation at peak loads, and improved both user response time and overall throughput by reducing the database connections used by the connection pool. Peak load was handled much better that way. I cringe every time I see someone say they need a large number of database connections in order to support that many users. Having one database connection for each user is, in my experience, guaranteed to make things fall apart under load. -Kevin
Re: [GENERAL] High SYS CPU - need advise
it's session mode and the pool size is 1200 (cause I need to guarantee that in the worst case we have enough slots for all possible clients), however even at the times preceding the high-cpu-sys stall, the number of postmasters is like 15-20. When the stall happens, it starts to rise, but that's the result of the DB not answering in a timely manner. as far as data - yes, I can probably clean up names and provide some data with the same queries, though - we are not able to cause a stall with just those queries in pgbench; we are able to make the situation worse and a stall much more likely when it's overlapped with regular queries coming from our app. -- Vlad On Wed, Nov 21, 2012 at 10:45 AM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Nov 21, 2012 at 10:43 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Wed, Nov 21, 2012 at 7:29 AM, Vlad Marchenko marche...@gmail.com wrote: update on my problem: despite pgbouncer, the problem still occurs on my end. As Merlin asked, how big is the pool? Maybe you are using a large enough pool so as to defeat the purpose of restricting the number of connections. also, what mode. merlin
Re: [GENERAL] High SYS CPU - need advise
On 25/11/12 11:11, Kevin Grittner wrote: Gavin Flower wrote: We found that the real-world production performance of a web application servicing millions of web hits per day with thousands of concurrent users improved when we reconfigured our database connection pool to be about 35 instead of 55, on a 16 core box with a 40 drive RAID. Curious, what is your RAID configuration? We have the OS on RAID 1 on one RAID controller, WAL on another RAID 1 array on its own controller, and the PostgreSQL data directory on RAID 5 on a pair of controllers which each has a path to every drive. (Don't ask me how that works, I just trust the hardware guys to get that part right.) The active portion of the database is cached, the read-to-write ratio is about 10 to 1, and the controllers all have BBU cache with write-back. It is actually one pool of about 30 for the read-only web app with so many hits, and a separate pool of 6 database connections for logical replication from about 80 source databases handling an OLTP load of 3000 directly connected users. We were seeing some performance degradation at peak loads, and improved both user response time and overall throughput by reducing the database connections used by the connection pool. Peak load was handled much better that way. I cringe every time I see someone say they need a large number of database connections in order to support that many users. Having one database connection for each user is, in my experience, guaranteed to make things fall apart under load. -Kevin Thanks for a very comprehensive answer! (I almost simply asked what 'type' of RAID you had; I am glad I rephrased that! I learnt far more than I expected, which is good - more than merely satisfied my curiosity!) I use RAID-6; but in development, reliability (for me - I've had 5 hard disk crashes in 20 years) is orders of magnitude more important than performance, with rare exceptions!
I can understand the mentality of wanting one connection per user, though I fully know why it is the wrong approach - I would also want to use connection pooling in your situation. Back in the 1980s I was a mainframe COBOL analyst/programmer just starting to use databases. Initially I thought of a transaction in terms of the user wanting to ensure their display and update were in the same transaction. It took me a while to fully appreciate and accept that transactions should be short to protect the database while not blocking other users too much (multi-user access was also a whole new concept), and that keeping a transaction open for user actions was not the 'done thing'! It requires a whole new change of mindset - something a lot of people find difficult, as it conflicts with part of their implicit belief system (just try discussing even the most obvious flaws in Creationism with a Believer!). Now I'm quite comfortable with the idea of multi-user and am much more database centric, while still appreciating the needs of users. Now my weapons of choice are Java and Linux, with Postgres being my preferred database. Cheers, Gavin
Re: [GENERAL] High SYS CPU - need advise
On Tue, Nov 20, 2012 at 4:08 PM, Jeff Janes jeff.ja...@gmail.com wrote: It strikes me as cavalier to be resetting trycounter while sitting under the #1 known contention point for read only workloads. The only use for the trycounter is to know when to ERROR out with no unpinned buffers available, so not resetting that seems entirely wrong. I would contest the #1 known contention point claim. We know that the freelist lock is a point of contention under certain conditions, but we (or at least I) also know that it is the mere acquisition of this lock, and not the work done while it is held, that is important. Well, I'm speculating (without evidence) that this might not in fact always be true. It certainly is hard to demonstrate with synthetic testing, though. I guess the first step would be to whip up a diagnostic patch to confirm malfeasance in suspicious cases. merlin
Re: [GENERAL] High SYS CPU - need advise
On Tue, Nov 20, 2012 at 12:00 PM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Nov 20, 2012 at 12:16 PM, Jeff Janes jeff.ja...@gmail.com wrote: The freelist should never loop. It is written as a loop, but I think there is currently no code path which ends up with valid buffers being on the freelist, so that loop will never, or at least rarely, execute more than once. Both of those operations are dependent on the number of buffers being managed, and so it's reasonable to expect some workloads to increase contention with more buffers. The clock sweep can depend on the number of buffers being managed in a worst-case sense, but I've never seen any evidence (nor analysis) that this worst case can be obtained in reality on an ongoing basis. By constructing two pathological workloads which get switched between, I can get the worst case to happen, but when it does happen the consequences are mild compared to the amount of time needed to set up the necessary transition. In other words, the worst case can't be triggered often enough to make a meaningful difference. Yeah, good points; but (getting off topic here): there have been several documented cases of lowering shared buffers improving performance under contention... the 'worst case' might be happening more than expected. The ones that I am aware of (mostly Greg Smith's case studies) this has been for write-intensive workloads and are related to writes/fsyncs getting gummed up. Shaun Thomas reports one that is (I assume) not read intensive, but his diagnosis is that this is a kernel bug where a larger shared_buffers for no good reason causes the kernel to kill off its page cache. From the kernel's perspective, the freelist lock doesn't look any different from any other lwlock, so I doubt that this issue is related to the freelist lock. In particular, what happens when a substantial percentage of the buffer pool is set with a non-zero usage count?
The current clock sweep algorithm is an extraordinary usage_count decrementing machine. From what I know, the only way to get much more than half of the buffers to a non-zero usage count is for the clock sweep to rarely run (in which case, it is hard for it to be the bottleneck, if it rarely runs), or for most of the buffer cache to be pinned simultaneously. This seems unlikely, but possible? Take note: if (buf->refcount == 0) { if (buf->usage_count > 0) { buf->usage_count--; trycounter = NBuffers; /* emphasis */ } ISTM time spent here isn't bounded, except that as more time is spent sweeping (more backends are thus waiting and not marking pages) the usage counts decrease faster until you hit steady state. But that is a one time thing. Once you hit the steady state, how do you get away from it again, such that a large amount of work is needed again? A smaller buffer pool naturally would help in that scenario, as your usage counts would drop faster. They would drop at the same rate in absolute numbers, barring the smaller buffer cache fitting entirely in the on-board CPU cache. They would drop faster in percentage terms, but they would also increase faster in percentage terms once a candidate is found and a new page read into it. It strikes me as cavalier to be resetting trycounter while sitting under the #1 known contention point for read only workloads. The only use for the trycounter is to know when to ERROR out with no unpinned buffers available, so not resetting that seems entirely wrong. I would contest the #1 known contention point claim. We know that the freelist lock is a point of contention under certain conditions, but we (or at least I) also know that it is the mere acquisition of this lock, and not the work done while it is held, that is important. If I add a spurious LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE); LWLockRelease(BufFreelistLock); to each execution of StrategyGetBuffer, then contention kicks in twice as fast.
But if I instead hack the clock sweep to run twice as far (ignore the first eligible buffer it finds, and go find another one) but all under the cover of a single BufFreelistLock acquisition, there was no meaningful increase in contention. This was all on a 4 socket x 2 core/socket opteron machine which I no longer have access to. Using a more modern 8 core on a single socket, I can't get it to collapse on BufFreelistLock at all, presumably because the cache coherence mechanisms are so much faster. Shouldn't SBF() work on advisory basis and try to force a buffer after N failed usage count attempts? I believe Simon tried that a couple commit-fests ago, and no one could show that it made a difference. Cheers, Jeff
Re: [GENERAL] High SYS CPU - need advise
On Wed, Nov 21, 2012 at 8:14 AM, Vlad marche...@gmail.com wrote: '-M prepared' produces normal results, while '-M simple' results in 40% sys cpu. '-M extended' is somewhere in between. I'm running it as 60 clients, 2 threads. 2 threads is pretty low for 60 clients. What happens if you increase -j to either 30 or 60? (What I'm wondering here is if perhaps in this case it is pgbench which is triggering the scheduling issue) Also, what happens if you channel pgbench through pgbouncer? Cheers, Jeff
Re: [GENERAL] High SYS CPU - need advise
On Wed, Nov 21, 2012 at 12:17 PM, Vlad marche...@gmail.com wrote: It turned out we can't use transaction mode, cause there are prepared statements used a lot within the code, while processing a single http request. prepared statements can be fudged within some constraints. if prepared statements are explicitly named via PREPARE, you can simply prepare them all on server connection via the connect_query setting and disable the manual preparation. you then change the server_reset_query so that they are not discarded. some basic experimentation might confirm whether this is a viable strategy. automatic protocol-level statements can be an issue though. Also, I can't 100% rule out that there won't be any long running (statistical) queries launched (even though such requests should not come to this database), which would occupy a connection for a longer time, but do not create any race condition... So having pool size at 8 may be too slim. there are a number of simple tricks to deal with this: 1) move long running queries to their own pool (by changing login user or connection string) 2) bypass pgbouncer in those cases 3) increase pool size merlin
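A sketch of the fudge described above in pgbouncer.ini terms; the statement name and SQL are hypothetical examples, and behavior around reset queries can differ between pgbouncer versions, so this needs the "basic experimentation" mentioned:

```ini
; Prepare named statements once per fresh server connection, and keep
; them alive by not issuing DISCARD ALL when a client disconnects.
; get_user and its SQL below are hypothetical.
[databases]
appdb = host=127.0.0.1 dbname=appdb

[pgbouncer]
pool_mode = transaction
connect_query = PREPARE get_user(int) AS SELECT * FROM users WHERE id = $1
server_reset_query =      ; empty: prepared statements survive reset
```

The application would then EXECUTE get_user(...) instead of preparing it itself, since every pooled server connection already carries the statement.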
Re: [GENERAL] High SYS CPU - need advise
Merlin, On Wed, Nov 21, 2012 at 2:17 PM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Nov 21, 2012 at 12:17 PM, Vlad marche...@gmail.com wrote: It turned out we can't use transaction mode, cause there are prepared statements used a lot within the code, while processing a single http request. prepared statements can be fudged within some constraints. if prepared statements are explicitly named via PREPARE, you can simply prepare them all on server connection via the connect_query setting and disable the manual preparation. you then change the server_reset_query so that they are not discarded. some basic experimentation might confirm whether this is a viable strategy. automatic protocol-level statements can be an issue though. We have 350k+ lines of code in our app, so this is not quite as feasible as I'd wish. Also, I can't 100% rule out that there won't be any long running (statistical) queries launched (even though such requests should not come to this database), which would occupy a connection for a longer time, but do not create any race condition... So having pool size at 8 may be too slim. there are a number of simple tricks to deal with this: 1) move long running queries to their own pool (by changing login user or connection string) 2) bypass pgbouncer in those cases 3) increase pool size It's pretty much already set up so that long running queries should not hit the same DB cluster as those with (potentially) high connection/query rates, but I still can't rule out that no long-running queries will be issued via pgbouncer. In either case - it seems that the combination of pool size = 200 / pool mode = session / server_lifetime = 30 makes things stable for now. I'm planning to repeat my case on a 2.6.x kernel, but it will be a while before I have a chance to do that. Thanks. -- vlad
Re: [GENERAL] High SYS CPU - need advise
It turned out we can't use transaction mode, cause there are prepared statements used a lot within the code, while processing a single http request. Also, I can't 100% rule out that there won't be any long running (statistical) queries launched (even though such requests should not come to this database), which would occupy a connection for a longer time, but do not create any race condition... So having pool size at 8 may be too slim. It's a single database. -- Vlad On Wed, Nov 21, 2012 at 12:04 PM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Nov 21, 2012 at 11:56 AM, Vlad marche...@gmail.com wrote: ok, understood. I need to give some more thoughts to whether it's possible for us to switch to transaction mode from the app standpoint. if yes, then setting pool size to 20 (for 8 cores) sounds OK? If it was me, I would be starting with exactly 8 and be tuning from there. Don't forget: each database role gets its own pool. merlin
Re: [GENERAL] High SYS CPU - need advise
On Wed, Nov 21, 2012 at 11:56 AM, Vlad marche...@gmail.com wrote: ok, understood. I need to give some more thoughts to whether it's possible for us to switch to transaction mode from the app standpoint. if yes, then setting pool size to 20 (for 8 cores) sounds OK? If it was me, I would be starting with exactly 8 and be tuning from there. Don't forget: each database role gets its own pool. merlin
Re: [GENERAL] High SYS CPU - need advise
ok, understood. I need to give some more thoughts to whether it's possible for us to switch to transaction mode from the app standpoint. if yes, then setting pool size to 20 (for 8 cores) sounds OK? -- Vlad
Re: [GENERAL] High SYS CPU - need advise
update on my problem: despite pgbouncer, the problem still occurs on my end. Also, an interesting observation - I ran several tests with pgbench, using queries that I think are prone to triggering the high-sys-cpu stall. What I noticed is when pgbench is started with prepared mode, the system behaves fine during the stress test (high user cpu - 85-90%, low sys cpu - 5-7%), high TPS. Though when I used non-prepared modes, the sys cpu portion jumps to 40% (and tps drops dramatically as well, but this is understandable). The test queries are pretty long (10kb+), with a couple of outer joins across 1000-record tables with indexes. Maybe we are looking in the wrong place and the issue is somewhere within the planner/parser? Is there some extensive locking used in there? Another observation - it's harder to trigger the high-sys-cpu stall on a freshly restarted postgresql. Though if it has been running for a while, then it's much easier to do. -- vlad On 11/19/12 8:33 AM, Merlin Moncure wrote: On Sun, Nov 18, 2012 at 4:24 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Nov 16, 2012 at 12:13 PM, Vlad marche...@gmail.com wrote: ok, I've applied that patch and ran. The stall started around 13:50:45...50 and lasted until the end https://dl.dropbox.com/u/109778/postgresql-2012-11-16_134904-stripped.log That isn't as much log as I expected. But I guess only the tip of the iceberg gets logged; it could be that most spinlocks are contended but then get acquired just before the sleep (and log) would have happened. I'm not sure how to figure out just how much spinning is going on below the surface, but my gut feeling is there isn't enough of it to explain the amount of slowdown you are seeing (and it would probably be accounted for as user time rather than system time) yes. classic spinlock contention symptoms are high user cpu and almost no work getting done. something else is going on here?
Maybe we could whip up a dummy standalone program that uses PG's spinlock code in an intentionally contended way and see how your system reports resource usage for that. Maybe your system reports fighting for cache lines or whatever is going on behind the scenes as system time rather than user time, or something. I don't think this is the case. It's looking more and more like scheduler issues, although it's unclear if that is due to some kind of regression or just symptoms of backends piling up. I'm also starting to wonder if the spinlock contention we are seeing (which according to your patch isn't all that much, really) is just symptomatic of scheduler issues. It would be a little easier to figure out what was going on there if the log included the pid (%p) so we can see if it is the same process doing a bunch of consecutive sleeps, or a bunch of different processes. Also, logging the millisecond (%m rather than %t) would be helpful. But the way that the point of contention jumps around a lot points to a scheduler issue rather than a postgres-specific issue. A backend gets pre-empted while holding a spinlock, and then the kernel is resistant to rescheduling it even though all other processes are piled up behind it. But again, that theory would lead to more log lines than we see, I would think. yeah. I asked for some longer strace logs but then everyone went home for the weekend. Anyway, pgbouncer might help by ensuring that a connection gets pre-empted by pgbouncer at a point where it has no locks, rather than by the kernel at a point where it does hold locks. yes -- I think pgbouncer is the key here because it's going to tell us if the scheduler is the culprit. If you look carefully at the strace logs, you'll see that system call times become unpredictable as they are getting scheduled out...particularly yielding calls like select(). So we need to determine if this is normal high load behavior or if there is an underlying mechanic.
Any chance you could try an older kernel and see if that reduces the problem? This is one thing to try. Personally I think pgbouncer/transaction mode is low hanging fruit if it 'drops in' to OP's app. If problem goes away this will pretty much point the finger at Linux AFAICT (in which case we can start messing around with the kernel). merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
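Jeff's two logging tweaks above amount to a one-line postgresql.conf change (a sketch; %m and %p are standard log_line_prefix escapes, and a config reload is enough to apply it):

```conf
# postgresql.conf fragment: tag each log line with a millisecond timestamp
# and the backend pid, so consecutive spin-delay sleeps can be attributed
log_line_prefix = '%m [%p] '   # %m = timestamp with milliseconds, %p = process id
```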
Re: [GENERAL] High SYS CPU - need advise
On Wed, Nov 21, 2012 at 9:29 AM, Vlad Marchenko marche...@gmail.com wrote: update on my problem: despite pgbouncer, the problem still occurs on my end. Also, an interesting observation - I ran several tests with pgbench, using queries that I think are prone to trigger the high-sys-cpu stall. What I noticed is that when pgbench is started in prepared mode, the system behaves fine during the stress test (high user cpu - 85-90%, low sys cpu - 5-7%), high TPS. Though when I used non-prepared modes, the sys cpu portion jumps to 40% (and tps drops dramatically as well, but this is understandable). The test queries are pretty long (10kb+), with a couple of outer joins across 1000-record tables with indexes. Maybe we are looking in the wrong place and the issue is somewhere within the planner/parser? Is there some extensive locking used in there? Another observation - it's harder to trigger the high-sys-cpu stall on a freshly restarted postgresql. Though if it has been running for a while, then it's much easier to do. What pgbouncer mode, and how large is your pool? merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
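The prepared vs. non-prepared comparison described above can be reproduced with pgbench's protocol-mode switch. A sketch of the invocations (database name "bench" and client counts are hypothetical; adjust to the box under test):

```shell
# Compare protocol modes under identical load. -M prepared parses and plans
# each query once per session; -M simple re-parses and re-plans every time,
# which is what appears to drive the sys-cpu spike in this thread.
PREPARED="pgbench -M prepared -c 16 -j 4 -T 60 bench"
SIMPLE="pgbench -M simple -c 16 -j 4 -T 60 bench"
if command -v pgbench >/dev/null 2>&1; then
  echo "run during test window: $PREPARED"
  echo "then for comparison:    $SIMPLE"
else
  echo "pgbench not installed; commands shown for reference only"
fi
```

Watching `vmstat 1` alongside each run shows whether the user/sys split differs between the two modes.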
Re: [GENERAL] High SYS CPU - need advise
On Wed, Nov 21, 2012 at 7:29 AM, Vlad Marchenko marche...@gmail.com wrote: update on my problem: despite pgbouncer, the problem still occurs on my end. As Merlin asked, how big is the pool? Maybe you are using a large enough pool so as to defeat the purpose of restricting the number of connections. Also, an interesting observation - I ran several tests with pgbench, using queries that I think are prone to trigger the high-sys-cpu stall. What I noticed is that when pgbench is started in prepared mode, the system behaves fine during the stress test (high user cpu - 85-90%, low sys cpu - 5-7%), high TPS. Though when I used non-prepared modes, the sys cpu portion jumps to 40% (and tps drops dramatically as well, but this is understandable). The test queries are pretty long (10kb+), with a couple of outer joins across 1000-record tables with indexes. Could you sanitize the queries (and some statements to generate dummy data) enough to share? Maybe we are looking in the wrong place and the issue is somewhere within the planner/parser? Is there some extensive locking used in there? I don't think the locking is particularly extensive, but it could be enough extra to drive something over the edge. But it would be the same nature of locking as elsewhere (spinlocks and lwlocks), so it doesn't really change the nature of the problem, which is still: why do these user-space locks turn into high SYS cpu? Another observation - it's harder to trigger the high-sys-cpu stall on a freshly restarted postgresql. Though if it has been running for a while, then it's much easier to do. Maybe the long running time has built up enough resource usage to cause the kernel scheduler to get into a snit, so it decides to preempt the process while it holds a spinlock, and then refuses to run it again for a while. Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
Some additional observations and food for thought. Our app uses connection caching (Apache::DBI). By disabling Apache::DBI and forcing client re-connection for every (http) request processed, I eliminated the stall. The user cpu usage jumped (mostly because prepared sql queries are no longer available, plus some additional overhead on re-connection), but not a single case of a high-sys-cpu stall. I cannot completely rule out the possibility that some left-overs (an unfinished transaction?) remain after serving an http request, which, in the absence of connection caching, are discarded for sure. -- Vlad

On Mon, Nov 19, 2012 at 11:19 AM, Merlin Moncure mmonc...@gmail.com wrote: yeah. interesting -- contention was much higher this time and that changes things. strange how it was missed earlier. you're getting bounced around a lot in lwlock especially (unfortunately we don't know which one). I'm going to hazard another guess: maybe the trigger here is that when the number of contending backends exceeds some critical number (probably based on the number of cores), you see a quick cpu spike (causing more backends to lock and pile up) as cache line bouncing sets in. That spike doesn't last long, because the spinlocks quickly accumulate delay counts then punt to the scheduler, which is unable to cope. The exact reason why this is happening to you in exactly this way (I've never seen it) is unclear. Also the line between symptom and cause is difficult to draw. unfortunately, in your case spinlock re-scheduling isn't helping. log entries like this one:

18764 [2012-11-19 10:43:50.124 CST] LOG: JJ spin delay from file sinvaladt.c line 512 delay 212, pointer 0x7f514959a394 at character 29

are suggesting major problems. you're dangerously close to a stuck spinlock, which is lights out for the database. merlin
Re: [GENERAL] High SYS CPU - need advise
On Fri, Nov 16, 2012 at 3:21 PM, Vlad marche...@gmail.com wrote: what would pgbouncer do in my case? The number of connections will decrease, but the number of active clients won't be smaller. As I understand it, the latter is what matters. Well, one thing that struck me was how little spinlock contention there actually was. Yeah, backends are delaying here and there, which isn't great, but a few dozen delays per second across several hundred backends doesn't seem like it should be pegging sys cpu%. This is all pointing to the problem not being in postgres, but in linux. pgbouncer would do two things: 1) perhaps guard you against some o/s issue 2) keep the system more responsive during a stall (since by controlling the pool you control the number of queries piling up). of course, this comes at the expense of some overhead. But, before going through all that, how about timing strace-recorded calls (strace -T) in both stall and non-stall conditions? I was assuming select(), but maybe it's something else, for example the recently fixed lseek. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
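Merlin's strace suggestion as a concrete invocation sketch (the pid is a placeholder for a busy backend picked out of pg_stat_activity or top):

```shell
# strace -T appends each syscall's wall-clock duration in angle brackets;
# -tt adds microsecond timestamps so stall and non-stall windows can be aligned.
PID=12345    # hypothetical backend pid
CMD="strace -T -tt -p $PID -o /tmp/backend.strace"
echo "capture during a stall: $CMD"
echo "then inspect, e.g.:     grep -E 'select|semop|lseek' /tmp/backend.strace"
# A select(0, NULL, NULL, NULL, {0, 1000}) that shows <0.300000> wall time
# instead of ~1ms means the kernel sat on the process, not postgres.
```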
Re: [GENERAL] High SYS CPU - need advise
ok, I've applied that patch and ran. The stall started around 13:50:45...50 and lasted until the end https://dl.dropbox.com/u/109778/postgresql-2012-11-16_134904-stripped.log the actual log has more data (including the statement following each 'spin delay' record), but there is some sensitive info inside which I can't share with the public. -- Vlad yup. this problem doesn't smell like lwlock issues. typically there the problem manifests as low cpu performance, everybody waits. classic spinlock contention (at least from what i've seen) is very high *userspace* cpu utilization and low work output. this time it's different -- OP is bogging in the kernel, so it's not impossible that we're troubleshooting the symptom, not the cause. In 9.3 there is a new field that tells how many spin delays there were on the mutex that is behind each lock. That was commit b79ab00144e64217d41, maybe he can port that back to his version. But that only tells you about LWLock mutexes, not about all the other ones in PG. The attached patch logs every spin delay with where in the source it comes from. yeah, OP should fire this off. good stuff. I'll bet lunch (if we ever happen to meet) it's on buffer pin. merlin
Re: [GENERAL] High SYS CPU - need advise
On Mon, Nov 19, 2012 at 12:02 PM, Vlad marche...@gmail.com wrote: Some additional observation and food for thoughts. Our app uses connection caching (Apache::DBI). By disabling Apache::DBI and forcing client re-connection for every (http) request processed I eliminated the stall. The user cpu usage jumped (mostly cause prepared sql queries are no longer available, and some additional overhead on re-connection), but no single case of high-sys-cpu stall. I can not completely rule out the possibility of some left-overs (unfinished transaction?) remain after serving http request, which, in the absence of connection caching, are discarded for sure well, that is good. note for pgbouncer transaction mode you would have had to quit using prepared statements anyways (see list of features here: http://wiki.postgresql.org/wiki/PgBouncer). you can use session mode though, and that would be a good test. faster speed could be because of application issues or maybe just spreading out the queries reduced contention. hard to say. well, what happens next is up to you -- we can spend more time chasing this down if you want, or you can hold back and see how things run for a while. what is your user load up to in the worst case? merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
On Fri, Nov 16, 2012 at 2:13 PM, Vlad marche...@gmail.com wrote: ok, I've applied that patch and ran. The stall started around 13:50:45...50 and lasted until the end https://dl.dropbox.com/u/109778/postgresql-2012-11-16_134904-stripped.log the actual log has more data (including statement following each 'spin delay' record), but there is some sensitive info inside which I can't share with public. first thoughts: no single thing really stands out -- contention is all over the place. lwlock, pinbuffer, dynahash (especially). I am again suspicious of bad scheduler interaction. any chance we can fire up pgbouncer? merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
On 11/16/2012 02:31 PM, Merlin Moncure wrote: no single thing really stands out -- contention is all over the place. lwlock, pinbuffer, dynahash (especially). I am again suspicious of bad scheduler interaction. any chance we can fire up pgbouncer? Just want to throw it out there, but we've been having really bad luck with the scheduler recently. But only when we use 8GB (on our 72GB system) for shared_buffers. Cut that down to 4GB, and everything is fine and dandy. I think the kernel devs have added in some overzealous scheduler code on us. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
On Fri, Nov 16, 2012 at 12:13 PM, Vlad marche...@gmail.com wrote: ok, I've applied that patch and ran. The stall started around 13:50:45...50 and lasted until the end https://dl.dropbox.com/u/109778/postgresql-2012-11-16_134904-stripped.log That isn't as much log as I expected. But I guess only the tip of the iceberg gets logged; it could be that most spinlocks are contended but then get acquired just before the sleep (and log) would have happened. I'm not sure how to figure out just how much spinning is going on below the surface, but my gut feeling is there isn't enough of it to explain the amount of slowdown you are seeing (and it would probably be accounted for as user time rather than system time). Maybe we could whip up a dummy standalone program that uses PG's spinlock code in an intentionally contended way and see how your system reports resource usage for that. Maybe your system reports fighting for cache lines or whatever is going on behind the scenes as system time rather than user time, or something. It would be a little easier to figure out what was going on there if the log included the pid (%p) so we can see if it is the same process doing a bunch of consecutive sleeps, or a bunch of different processes. Also, logging the millisecond (%m rather than %t) would be helpful. But the way that the point of contention jumps around a lot points to a scheduler issue rather than a postgres-specific issue. A backend gets pre-empted while holding a spinlock, and then the kernel is resistant to rescheduling it even though all other processes are piled up behind it. But again, that theory would lead to more log lines than we see, I would think. Anyway, pgbouncer might help by ensuring that a connection gets pre-empted by pgbouncer at a point where it has no locks, rather than by the kernel at a point where it does hold locks. Any chance you could try an older kernel and see if that reduces the problem?
One thing that does stand out is that it looks like partitioned hash tables in dynahash.c could benefit by having their freelist partitioned as well, assuming we can give up the hash_get_num_entries function. Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
On Tue, Nov 20, 2012 at 9:02 AM, Shaun Thomas stho...@optionshouse.com wrote: On 11/16/2012 02:31 PM, Merlin Moncure wrote: no single thing really stands out -- contention is all over the place. lwlock, pinbuffer, dynahash (especially). I am again suspicious of bad scheduler interaction. any chance we can fire up pgbouncer? Just want to throw it out there, but we've been having really bad luck with the scheduler recently. But only when we use 8GB (on our 72GB system) for shared_buffers. Cut that down to 4GB, and everything is fine and dandy. I think the kernel devs have added in some overzealous scheduler code on us. Shared buffer manipulation changing contention is suggesting you're running into free list lock issues. How many active backends/cores? merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
On 11/20/2012 10:03 AM, Merlin Moncure wrote: Shared buffer manipulation changing contention is suggesting you're running into free list lock issues. How many active backends/cores? Oh, the reason I wanted to point it out was that we see a lot more than CPU contention with higher shared_buffer settings. The scheduler starts aggressively clearing out the file cache *and* freeing memory, driving load way higher than it should be. That doesn't result in a very high iowait, but all that page swapping sure isn't free. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
On Tue, Nov 20, 2012 at 10:12 AM, Shaun Thomas stho...@optionshouse.com wrote: On 11/20/2012 10:03 AM, Merlin Moncure wrote: Shared buffer manipulation changing contention is suggesting you're running into free list lock issues. How many active backends/cores? Oh, the reason I wanted to point it out was that we see a lot more than CPU contention with higher shared_buffer settings. The scheduler starts aggressively clearing out the file cache *and* freeing memory, driving load way higher than it should be. That doesn't result in a very high iowait, but all that page swapping sure isn't free. have you ruled out numa issues? (http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html) merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
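Checking for the NUMA/zone-reclaim condition described in the linked post can be sketched with a couple of Linux-only commands:

```shell
# zone_reclaim_mode: 0 = off; nonzero makes the kernel aggressively reclaim
# local-node page cache, which can look like inexplicable paging storms.
if [ -r /proc/sys/vm/zone_reclaim_mode ]; then
  zrm=$(cat /proc/sys/vm/zone_reclaim_mode)
  echo "zone_reclaim_mode=$zrm"
else
  zrm=0
  echo "no /proc/sys/vm/zone_reclaim_mode (not Linux?)"
fi
# Node distances: the kernel enables zone reclaim by default when the
# reported inter-node distance is large (commonly 20+).
numactl --hardware 2>/dev/null || true
```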
Re: [GENERAL] High SYS CPU - need advise
On 11/20/2012 10:13 AM, Merlin Moncure wrote: have you ruled out numa issues? (http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html) Haha. Yeah. Our zone reclaim mode is off, and node distance is 10 or 20. Zone reclaim is only enabled by default if distance is 20, unless there's some kernel bug triggering it even when zone reclaim reports as off. I'll also note that my tests with interleave made no difference at all. At least not with pgbench. There was a small amount of flux with TPS, but the page swap storms came regardless of NUMA tweaks. The formula worked like this: high connection count + high shared_buffers = page swap storm. I'll note that going from 8GB to 4GB immediately stopped the paging everywhere, and the OS went from using 13GB for active file cache to 45GB. I can't see how PG would cause something like that by itself. I only piped in because it's very adversely affecting our CPU load in a similarly inexplicable-but-seemingly-the-scheduler way. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
On Thu, Nov 15, 2012 at 4:29 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Merlin Moncure escribió: ok, excellent. reviewing the log, this immediately caught my eye:

recvfrom(8, \27\3\1\0@, 5, 0, NULL, NULL) = 5
recvfrom(8, \327\327\nl\231LD\211\346\243@WW\254\244\363C\326\247\341\177\255\263~\327HDv-\3466\353..., 64, 0, NULL, NULL) = 64
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 3000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 6000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 7000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 9000}) = 0 (Timeout)
semop(41713721, {{2, 1, 0}}, 1) = 0
lseek(295, 0, SEEK_END) = 0
lseek(296, 0, SEEK_END) = 8192

this is definitely pointing to a spinlock issue. I met Rik van Riel (Linux kernel hacker) recently and we chatted about this briefly. He strongly suggested that we should consider using futexes on Linux instead of spinlocks; the big advantage being that futexes sleep instead of spinning when contention is high. That would reduce the system load in this scenario. By the way, I think you (and Rik) are 100% right. We're slamming select() where futex has a much smarter wait mechanism. There are reports from users who have switched to pthreads and reported better scalability. (https://groups.google.com/forum/?fromgroups=#!topic/pgsql.hackers/v8V82gznxs8) merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
On Tue, Nov 20, 2012 at 8:03 AM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Nov 20, 2012 at 9:02 AM, Shaun Thomas stho...@optionshouse.com wrote: On 11/16/2012 02:31 PM, Merlin Moncure wrote: no single thing really stands out -- contention is all over the place. lwlock, pinbuffer, dynahash (especially). I am again suspicious of bad scheduler interaction. any chance we can fire up pgbouncer? Just want to throw it out there, but we've been having really bad luck with the scheduler recently. But only when we use 8GB (on our 72GB system) for shared_buffers. Cut that down to 4GB, and everything is fine and dandy. I think the kernel devs have added in some overzealous scheduler code on us. Shared buffer manipulation changing contention is suggesting you're running into free list lock issues. I wouldn't expect so. Increasing shared_buffers should either fix free list lock contention, or leave it unchanged, not make it worse. Sounds more like that NUMA/interleave/zone reclaim stuff I could never wrap my head around. Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
On Tue, Nov 20, 2012 at 10:50 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Tue, Nov 20, 2012 at 8:03 AM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Nov 20, 2012 at 9:02 AM, Shaun Thomas stho...@optionshouse.com wrote: On 11/16/2012 02:31 PM, Merlin Moncure wrote: no single thing really stands out -- contention is all over the place. lwlock, pinbuffer, dynahash (especially). I am again suspicious of bad scheduler interaction. any chance we can fire up pgbouncer? Just want to throw it out there, but we've been having really bad luck with the scheduler recently. But only when we use 8GB (on our 72GB system) for shared_buffers. Cut that down to 4GB, and everything is fine and dandy. I think the kernel devs have added in some overzealous scheduler code on us. Shared buffer manipulation changing contention is suggesting you're running into free list lock issues. I wouldn't expect so. Increasing shared_buffers should either fix free list lock contention, or leave it unchanged, not make it worse. AIUI, that is simply not true (unless you raise it to the point you're not churning them). I'm looking at StrategyGetBuffer() for non-scan cases. It locks BufFreelistLock then loops the free list, and, if it finds nothing, engages a clock sweep. Both of those operations are dependent on the number of buffers being managed and so it's reasonable to expect some workloads to increase contention with more buffers. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
what would pgbouncer do in my case? The number of connections will decrease, but the number of active clients won't be smaller. As I understand it, the latter is what matters. -- Vlad On Fri, Nov 16, 2012 at 2:31 PM, Merlin Moncure mmonc...@gmail.com wrote: first thoughts: no single thing really stands out -- contention is all over the place. lwlock, pinbuffer, dynahash (especially). I am again suspicious of bad scheduler interaction. any chance we can fire up pgbouncer? merlin
Re: [GENERAL] High SYS CPU - need advise
On 11/20/2012 04:08 PM, Jeff Janes wrote: Shaun Thomas reports one that is (I assume) not read intensive, but his diagnosis is that this is a kernel bug where a larger shared_buffers for no good reason causes the kernel to kill off its page cache. We're actually very read intensive. According to pg_stat_statements, we regularly top out at 42k queries per second, and pg_stat_database says we're pushing 7k TPS. But I'm still sure this is a kernel bug. Moving from 4GB to 6GB or 8GB causes the kernel to cut the active page cache in half, in addition to freeing 1/4 of RAM to just sit around doing nothing. That in turn causes kswapd to work constantly, while our IO drivers work to undo the damage. It's a positive feedback loop: I can reliably drive the load up to 800+ on an 800-client pgbench with two threads, all while having 0% CPU free. Make that 4GB, and not only does the problem completely disappear, but the load settles down to around 9, and the machine becomes about 60% idle. Something in there is fantastically broken, but I can't point a finger at what. I was just piping in because, in the absence of an obvious PG-related culprit, the problem could be the OS itself. It certainly was in our case. That, or PG has a memory leak that only appears at 4GB of shared buffers. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
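One way to watch for the behavior Shaun describes is to snapshot the page-cache counters while varying shared_buffers (a sketch; the field names come from Linux's /proc/meminfo):

```shell
# Snapshot the counters that collapse during the reported page swap storms:
# a kernel-side problem shows up as Cached/Active(file) falling while
# MemFree grows and kswapd (visible in top) stays busy.
if [ -r /proc/meminfo ]; then
  grep -E '^(MemFree|Cached|Active\(file\)|Inactive\(file\))' /proc/meminfo
  snapshot_ok=1
else
  echo "no /proc/meminfo (not Linux?)"
  snapshot_ok=0
fi
```

Sampling this every few seconds during a pgbench run at 4GB vs. 8GB shared_buffers makes the before/after comparison concrete.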
Re: [GENERAL] High SYS CPU - need advise
On Tue, Nov 20, 2012 at 2:26 PM, Shaun Thomas stho...@optionshouse.com wrote: On 11/20/2012 04:08 PM, Jeff Janes wrote: Shaun Thomas reports one that is (I assume) not read intensive, but his diagnosis is that this is a kernel bug where a larger shared_buffers for no good reason causes the kernel to kill off its page cache. We're actually very read intensive. Sorry, that is what I meant. I wrote not write intensive, then decided it would be clearer to change not write to read and then forgot to remove the not. I hate it when that happens. Atomic update commit failure in the meatware :) Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
On 11/20/2012 04:35 PM, Jeff Janes wrote: Atomic update commit failure in the meatware :) Ha. What's actually funny is that one of the affected machines started *swapping* earlier today. With 15GB free, and 12GB of inactive cache, and vm.swappiness set to 0, it somehow decided there was enough memory pressure to push 1.5GB of *something* into swap. Frankly, the kernel we're using is clearly deranged. There's really no other way to explain this. But we were blaming various elements of PG for several weeks following our upgrade. After further examination, our query profile isn't any different, and the load spikes were effectively random. All I can say is... be wary of the 3.2 kernel. It's... unique. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
On 21/11/12 11:41, Shaun Thomas wrote: On 11/20/2012 04:35 PM, Jeff Janes wrote: Atomic update commit failure in the meatware :) Ha. What's actually funny is that one of the affected machines started *swapping* earlier today. With 15GB free, and 12GB of inactive cache, and vm.swappiness set to 0, it somehow decided there was enough memory pressure to push 1.5GB of *something* into swap. Frankly, the kernel we're using is clearly deranged. There's really no other way to explain this. But we were blaming various elements of PG for several weeks following our upgrade. After further examination, our query profile isn't any different, and the load spikes were effectively random. All I can say is... be wary of the 3.2 kernel. It's... unique. You'll just have to upgrade to Microsoft! Ducks, and runs away _VERY_ quickly! Could you not install a later kernel? Cheers, Gavin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
OK, so far I settled on excluding connection caching on the app side (Apache::DBI and prepare_cached) from the equation and adding pgbouncer as a counter-measure. This seems to stabilize the situation - at least I'm not able to push the server into a high-sys-cpu stall the way I used to. I'm still interested in getting to the bottom of the issue vs alleviating it with a connection pooler, so I might set up a server on Centos 6.3 (kernel ver 2.6.32) and give it a try. Alternatively, if anyone has ideas on playing with sysctl parameters related to the scheduler - please let me know. -- vlad -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High SYS CPU - need advise
On Fri, Nov 16, 2012 at 11:19 AM, Vlad marche...@gmail.com wrote: We're looking for spikes in 'blk', which represents when lwlocks bump. If you're not seeing any, then this is suggesting a buffer pin related issue -- this is also supported by the fact that raising shared buffers didn't help. If you're not seeing 'blk's, go ahead and disable the stats macro. most blk comes with 0, some with 1, few hitting 100. I can't say that during stall times the number of blk 0 vs blk non-0 are very different. right. this is feeling more and more like a buffer pin issue. but even then we can't be certain -- it could be a symptom, not the cause. to prove it we need to demonstrate that everyone is spinning and waiting, which we haven't done. classic spinlock contention manifests in high user cpu. we are bogging down in the kernel, so I wonder if it's all the select() calls. we haven't yet ruled out a kernel regression. If I were you, I'd be investigating pgbouncer to see if your app is compliant with transaction mode processing, if for no other reason than it will mitigate high load issues. *) How many specific query plans are needed to introduce the condition? Hopefully it's not too many. If so, let's start gathering the plans. If you have a lot of plans to sift through, one thing we can attempt in order to eliminate noise is to tweak log_min_duration_statement so that during stall times (only) it logs offending queries that are unexpectedly blocking. unfortunately, there are quite a few query plans... also, I don't think setting log_min_duration_statement will help us, because when the server is hitting a high load average, it reacts slowly even to a key press. So even non-offending queries will take long to execute. I see all sorts of queries being executed slowly during a stall: spanning from a simple LOG: duration: 1131.041 ms statement: SELECT 'DBD::Pg ping test' to complex ones joining multiple tables.
We are still looking into all the logged queries in an attempt to find the ones that are causing the problem, I'll report if we find any clues. right. merlin
Re: [GENERAL] High SYS CPU - need advise
On Fri, Nov 16, 2012 at 12:26 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Nov 16, 2012 at 8:21 AM, Merlin Moncure mmonc...@gmail.com wrote: On Fri, Nov 16, 2012 at 9:52 AM, Vlad marche...@gmail.com wrote: *) failing that, LWLOCK_STATS macro can be compiled in to give us some information about the particular lock(s) we're binding on. Hopefully it's a lwlock -- this will make diagnosing the problem easier. I've enabled that macro, seeing flying lwlock messages in the log (see below), even when there is no high-sys-cpu stall observed at the moment. Should I be looking for something in particular? We're looking for spikes in 'blk' which represents when lwlocks bump. Unfortunately LWLock contention is only a weak indication of spinlock contention. For example, if backends are furiously acquiring and releasing the same LWLock in shared mode and no one is getting an exclusive mode, then there can be high spinlock contention but there will never be a blk. yup. this problem doesn't smell like lwlock issues. typically there the problem manifests as low cpu performance, everybody waits. classic spinlock contention (at least from what i've seen) is very high *userspace* cpu utilization and low work output. this time it's different -- OP is bogging in the kernel so it's not impossible we're troubleshooting the symptom, not the cause. In 9.3 there is a new field that tells how many spin delays there were on the mutex that is behind each lock. That was commit b79ab00144e64217d41, maybe he can port that back to his version. But that only tells you about LWLock mutexes, not about all the other ones in PG. The attached patch logs every spin delay with where in the source it comes from. yeah, OP should fire this off. good stuff. I'll bet lunch (if we ever happen to meet) it's on buffer pin. merlin
Re: [GENERAL] High SYS CPU - need advise
I just did a little experiment: extracted the top four queries that were executed the longest during stall times and launched a pgbench test with 240 clients. Yet I wasn't able to put the server into a stall with that. Although load average was hitting 120+, it was all user CPU, with single-digit % system. The system remained pretty responsive (on a keypress), in contrast to when a high-sys-cpu stall happens showing similar LA numbers. This makes me think that there is probably a different (than the ones I tried) query or condition that is responsible for creating high-sys-cpu. Also, from my limited knowledge of postgresql and kernel internals, this experiment calls into question the idea that the linux scheduler might be the cornerstone. Here is a longer log, first column is PID. High-sys-cpu stall happened around 10:42:37 https://dl.dropbox.com/u/109778/postgresql-2012-11-19_103425-stripped.log I'll try to put pgbouncer in place and repeat, though this will take a longer time to set up. -- Vlad
Re: [GENERAL] High SYS CPU - need advise
On Sun, Nov 18, 2012 at 4:24 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Nov 16, 2012 at 12:13 PM, Vlad marche...@gmail.com wrote: ok, I've applied that patch and ran. The stall started around 13:50:45...50 and lasted until the end https://dl.dropbox.com/u/109778/postgresql-2012-11-16_134904-stripped.log That isn't as much log as I expected. But I guess only the tip of the iceberg gets logged; it could be that most spinlocks are contended but then get acquired just before the sleep (and log) would have happened. I'm not sure how to figure out just how much spinning is going on below the surface, but my gut feeling is there isn't enough of it to explain the amount of slowdown you are seeing (and it would probably be accounted for as user time rather than system time). yes. classic spinlock contention symptoms are high user cpu and almost no work getting done. something else is going on here. Maybe we could whip up a dummy standalone program that uses PG's spinlock code in an intentionally contended way and see how your system reports resource usage for that. Maybe your system reports fighting for cache lines or whatever is going on behind the scenes as system time rather than user time, or something. I don't think this is the case. It's looking more and more like scheduler issues, although it's unclear if that is due to some kind of regression or just symptoms of backends piling up. I'm also starting to wonder if the spinlock contention we are seeing (which according to your patch really isn't all that much) is just symptomatic of scheduler issues. It would be a little easier to figure out what was going on there if the log included the pid (%p) so we can see if it is the same process doing a bunch of consecutive sleeps, or a bunch of different processes. Also, logging the millisecond (%m rather than %t) would be helpful. But the way that the point of contention jumps around a lot points to a scheduler issue rather than a postgres-specific issue. 
A backend gets pre-empted while holding a spinlock, and then the kernel is resistant to rescheduling it even though all other processes are piled up behind it. But again, that theory would lead to more log lines than we see, I would think. yeah. I asked for some longer strace logs but then everyone went home for the weekend. Anyway, pgbouncer might help by ensuring that a connection gets pre-empted by pgbouncer at a point where it has no locks, rather than by the kernel at a point where it does hold locks. yes -- I think pgbouncer is the key here because it's going to tell us if the scheduler is the culprit. If you look carefully at the strace logs, you'll see that system call times become unpredictable as they are getting scheduled out...particularly yielding calls like select(). So we need to determine if this is normal high load behavior or if there is an underlying mechanism. Any chance you could try an older kernel and see if that reduces the problem? This is one thing to try. Personally I think pgbouncer/transaction mode is low hanging fruit if it 'drops in' to OP's app. If the problem goes away this will pretty much point the finger at Linux AFAICT (in which case we can start messing around with the kernel). merlin
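[Editor's note: since transaction-mode pgbouncer keeps coming up as the mitigation to try, here is a minimal sketch of the relevant configuration. The database name, paths, and pool sizes are illustrative assumptions, not values from this thread:]

```ini
; hypothetical pgbouncer.ini fragment -- names and sizes are examples only
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction    ; server connection is released at commit/rollback
default_pool_size = 32     ; cap concurrent backends near the core count
max_client_conn = 500      ; the 400-500 app connections queue here instead
```

One caveat relevant to this app: transaction pooling is incompatible with session-level state such as server-side prepared statements, so a DBD::Pg client using prepare_cached may need pg_server_prepare disabled to be "compliant with transaction mode processing" as Merlin puts it.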
Re: [GENERAL] High SYS CPU - need advise
On Mon, Nov 19, 2012 at 10:50 AM, Vlad marche...@gmail.com wrote: I just did a little experiment: extracted the top four queries that were executed the longest during stall times and launched a pgbench test with 240 clients. Yet I wasn't able to put the server into a stall with that. Although load average was hitting 120+, it was all user CPU, with single-digit % system. The system remained pretty responsive (on a keypress), in contrast to when a high-sys-cpu stall happens showing similar LA numbers. This makes me think that there is probably a different (than the ones I tried) query or condition that is responsible for creating high-sys-cpu. Also, from my limited knowledge of postgresql and kernel internals, this experiment calls into question the idea that the linux scheduler might be the cornerstone. Here is a longer log, first column is PID. High-sys-cpu stall happened around 10:42:37 https://dl.dropbox.com/u/109778/postgresql-2012-11-19_103425-stripped.log I'll try to put pgbouncer in place and repeat, though this will take a longer time to set up. yeah. interesting -- contention was much higher this time and that changes things. strange how it was missed earlier. you're getting bounced around a lot in lwlock especially (unfortunately we don't know which one). I'm going to hazard another guess: maybe the trigger here is when the number of contending backends exceeds some critical number (probably based on the number of cores) you see a quick cpu spike (causing more backends to lock and pile up) as cache line bouncing sets in. That spike doesn't last long, because the spinlocks quickly accumulate delay counts then punt to the scheduler, which is unable to cope. The exact reason why this is happening to you in exactly this way (I've never seen it) is unclear. Also the line between symptom and cause is difficult to draw. unfortunately, in your case spinlock re-scheduling isn't helping. 
log entries like this one: 18764 [2012-11-19 10:43:50.124 CST] LOG: JJ spin delay from file sinvaladt.c line 512 delay 212, pointer 0x7f514959a394 at character 29 are suggesting major problems. you're dangerously close to a stuck spinlock, which is lights out for the database. merlin
Re: [GENERAL] High SYS CPU - need advise
On Thu, Nov 15, 2012 at 6:07 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Thu, Nov 15, 2012 at 2:44 PM, Merlin Moncure mmonc...@gmail.com wrote:

select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 3000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 6000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 7000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 9000}) = 0 (Timeout)

This is not entirely inconsistent with the spinlock. Note that 1000 is repeated 3 times, and 5000 is missing. This might just be a misleading random sample we got here. I've seen similar close spacing in some simulations I've run. It is not clear to me why we use a resolution of 1 msec here. If the OS's implementation of select() eventually rounds to the nearest msec, that is its business. But why do we have to lose intermediate precision due to its decision? Yeah -- you're right, this is definitely a spinlock issue. Next steps: *) in mostly read workloads, we have a couple of known frequent offenders. In particular the 'BufFreelistLock'. One way we can influence that guy is to try and significantly lower/raise shared buffers. So this is one thing to try. *) failing that, the LWLOCK_STATS macro can be compiled in to give us some information about the particular lock(s) we're binding on. Hopefully it's a lwlock -- this will make diagnosing the problem easier. *) if we're not blocking on lwlock, it's possibly a buffer pin related issue? I've seen this before, for example on an index scan that is dependent on a seq scan. 
This long thread: http://postgresql.1045698.n5.nabble.com/9-2beta1-parallel-queries-ReleasePredicateLocks-CheckForSerializableConflictIn-in-the-oprofile-td5709812i100.html has a lot of information about that case and deserves a review. *) we can consider experimenting with futex (http://archives.postgresql.org/pgsql-hackers/2012-06/msg01588.php) to see if things improve. This is dangerous, and could crash your server/eat your data, so fair warning. merlin
Re: [GENERAL] High SYS CPU - need advise
Merlin, Yeah -- you're right, this is definitely a spinlock issue. Next steps: *) in mostly read workloads, we have a couple of known frequent offenders. In particular the 'BufFreelistLock'. One way we can influence that guy is to try and significantly lower/raise shared buffers. So this is one thing to try. server has 32gb with 3.2gb dedicated to shared buffers. I've increased it to 13.2gb, the stall still happened (I have a way of controlling the number of queries hitting postgresql by shutting down our own memcache-based app cache, so to test for a stall I temporarily shut down a few memcached servers). *) failing that, LWLOCK_STATS macro can be compiled in to give us some information about the particular lock(s) we're binding on. Hopefully it's a lwlock -- this will make diagnosing the problem easier. I've enabled that macro, seeing flying lwlock messages in the log (see below), even when there is no high-sys-cpu stall observed at the moment. Should I be looking for something in particular?

PID 17293 lwlock 5906: shacq 1 exacq 0 blk 0
PID 17293 lwlock 5932: shacq 1 exacq 0 blk 0
PID 17293 lwlock 5934: shacq 1 exacq 0 blk 0
PID 17293 lwlock 10854: shacq 1 exacq 0 blk 0
PID 17293 lwlock 10856: shacq 4 exacq 0 blk 0
PID 17293 lwlock 10858: shacq 15 exacq 0 blk 0
PID 17293 lwlock 16442: shacq 4 exacq 0 blk 0
PID 17293 lwlock 16596: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20458: shacq 3 exacq 0 blk 0
PID 17293 lwlock 20460: shacq 10 exacq 0 blk 0
PID 17293 lwlock 20464: shacq 2 exacq 0 blk 0
PID 17293 lwlock 20466: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20480: shacq 2 exacq 0 blk 0
PID 17293 lwlock 20482: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20484: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20508: shacq 12 exacq 0 blk 0
PID 17293 lwlock 20510: shacq 6 exacq 0 blk 0
PID 17293 lwlock 20938: shacq 11 exacq 0 blk 0
PID 17293 lwlock 20940: shacq 22 exacq 0 blk 0
PID 17293 lwlock 20942: shacq 28 exacq 0 blk 0
PID 17293 lwlock 20944: shacq 56 exacq 0 blk 0
PID 17293 lwlock 20946: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20948: shacq 2 exacq 0 blk 0
PID 17293 lwlock 20950: shacq 11 exacq 0 blk 0
PID 17293 lwlock 20952: shacq 3 exacq 0 blk 0
PID 17293 lwlock 20954: shacq 7 exacq 0 blk 0
PID 17293 lwlock 20956: shacq 2 exacq 0 blk 0
PID 17293 lwlock 20958: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20960: shacq 1 exacq 0 blk 0
PID 17293 lwlock 20962: shacq 2 exacq 0 blk 0
PID 17293 lwlock 20964: shacq 2 exacq 0 blk 0
PID 17293 lwlock 46624: shacq 24 exacq 0 blk 0
PID 17293 lwlock 68126: shacq 6 exacq 0 blk 0

*) if we're not blocking on lwlock, it's possibly a buffer pin related issue? I've seen this before, for example on an index scan that is dependent on a seq scan. This long thread: http://postgresql.1045698.n5.nabble.com/9-2beta1-parallel-queries-ReleasePredicateLocks-CheckForSerializableConflictIn-in-the-oprofile-td5709812i100.html has a lot of information about that case and deserves a review. *) we can consider experimenting with futex (http://archives.postgresql.org/pgsql-hackers/2012-06/msg01588.php) to see if things improve. This is dangerous, and could crash your server/eat your data, so fair warning. merlin -- vlad
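[Editor's note: to spot the 'blk' spikes Merlin asks about, the flood of LWLOCK_STATS lines can be aggregated mechanically. The line format below is taken from the output quoted above; the script itself is an illustrative sketch, not from the thread:]

```python
import re
from collections import Counter

# Format of LWLOCK_STATS lines as shown in the thread's log excerpt.
LINE_RE = re.compile(
    r"PID (?P<pid>\d+) lwlock (?P<lock>\d+): "
    r"shacq (?P<shacq>\d+) exacq (?P<exacq>\d+) blk (?P<blk>\d+)"
)

def top_blocked_locks(lines, n=5):
    """Sum 'blk' (times a backend had to sleep on the lwlock) per lock id
    and return the worst offenders, highest first."""
    blocked = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            blocked[int(m.group("lock"))] += int(m.group("blk"))
    return blocked.most_common(n)

if __name__ == "__main__":
    sample = [
        "PID 17293 lwlock 5906: shacq 1 exacq 0 blk 0",
        "PID 17293 lwlock 20944: shacq 56 exacq 0 blk 3",
        "PID 17294 lwlock 20944: shacq 12 exacq 1 blk 2",
    ]
    print(top_blocked_locks(sample))  # -> [(20944, 5), (5906, 0)]
```

Feeding it a log slice captured during a stall versus one captured at idle would show whether any lock's blk count actually spikes, which is the signal being looked for here.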
Re: [GENERAL] High SYS CPU - need advise
On Fri, Nov 16, 2012 at 9:52 AM, Vlad marche...@gmail.com wrote: Merlin, Yeah -- you're right, this is definitely a spinlock issue. Next steps: *) in mostly read workloads, we have a couple of known frequent offenders. In particular the 'BufFreelistLock'. One way we can influence that guy is to try and significantly lower/raise shared buffers. So this is one thing to try. server has 32gb with 3.2gb dedicated to shared buffers. I've increased it to 13.2gb, the stall still happened (I have a way of controlling the number of queries hitting postgresql by shutting down our own memcache-based app cache, so to test for a stall I temporarily shut down a few memcached servers). *) failing that, LWLOCK_STATS macro can be compiled in to give us some information about the particular lock(s) we're binding on. Hopefully it's a lwlock -- this will make diagnosing the problem easier. I've enabled that macro, seeing flying lwlock messages in the log (see below), even when there is no high-sys-cpu stall observed at the moment. Should I be looking for something in particular? We're looking for spikes in 'blk' which represents when lwlocks bump. If you're not seeing any then this is suggesting a buffer pin related issue -- this is also supported by the fact that raising shared buffers didn't help. If you're not seeing 'blk's, go ahead and disable the stats macro. So, what we need to know now is: *) What happens when you drastically *lower* shared buffers? Say, to 64mb? Note, you may experience higher load for unrelated reasons and have to scuttle the test. Also, if you have to crank higher to handle internal server structures, do that. This is a hail mary, but maybe something interesting spits out. *) How many specific query plans are needed to introduce the condition? Hopefully, it's not too many. If so, let's start gathering the plans. 
If you have a lot of plans to sift through, one thing we can attempt in order to eliminate noise is to tweak log_min_duration_statement so that during stall times (only) it logs offending queries that are unexpectedly blocking. *) Approximately how big is your 'working set' -- the data your queries are routinely hitting? *) Is the distribution of the *types* of queries uniform? Or do you have special processes that occur on intervals? Thanks for your patience. merlin
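[Editor's note: Merlin's log_min_duration_statement suggestion and Jeff's request elsewhere in the thread for pid and millisecond timestamps can be combined in one configuration change. A hedged sketch; the 200 ms threshold is an arbitrary example, not a value from the thread:]

```ini
# hypothetical postgresql.conf fragment -- threshold is illustrative
log_min_duration_statement = 200ms   # log only statements slower than 200 ms
log_line_prefix = '%p %m '           # pid + millisecond-resolution timestamp
log_checkpoints = on                 # rule out checkpoint correlation with stalls
```

Raising the threshold back up (or to -1) once the stall window has been captured keeps the log volume manageable.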
Re: [GENERAL] High SYS CPU - need advise
We're looking for spikes in 'blk' which represents when lwlocks bump. If you're not seeing any then this is suggesting a buffer pin related issue -- this is also supported by the fact that raising shared buffers didn't help. If you're not seeing 'blk's, go ahead and disable the stats macro. most blk values come in at 0, some with 1, a few hitting 100. I can't say that during stall times the number of blk 0 vs blk non-0 are very different. So, what we need to know now is: *) What happens when you drastically *lower* shared buffers? Say, to 64mb? Note, you may experience higher load for unrelated reasons and have to scuttle the test. Also, if you have to crank higher to handle internal server structures, do that. This is a hail mary, but maybe something interesting spits out. lowering shared_buffers didn't help. *) How many specific query plans are needed to introduce the condition? Hopefully, it's not too many. If so, let's start gathering the plans. If you have a lot of plans to sift through, one thing we can attempt in order to eliminate noise is to tweak log_min_duration_statement so that during stall times (only) it logs offending queries that are unexpectedly blocking. unfortunately, there are quite a few query plans... also, I don't think setting log_min_duration_statement will help us, because when the server is hitting a high load average, it reacts slowly even to a keypress. So even non-offending queries will take long to execute. I see all sorts of queries being executed slowly during a stall: spanning from the simple LOG: duration: 1131.041 ms statement: SELECT 'DBD::Pg ping test' to complex ones, joining multiple tables. We are still looking into all the logged queries in an attempt to find the ones that are causing the problem, I'll report if we find any clues. *) Approximately how big is your 'working set' -- the data your queries are routinely hitting? I *think* it's within the few hundred MB range. *) Is the distribution of the *types* of queries uniform? 
Or do you have special processes that occur on intervals? it's pretty uniform. Thanks for your patience. oh no, thank you for trying to help me to resolve this issue. -- vlad
Re: [GENERAL] High SYS CPU - need advise
On Fri, Nov 16, 2012 at 8:21 AM, Merlin Moncure mmonc...@gmail.com wrote: On Fri, Nov 16, 2012 at 9:52 AM, Vlad marche...@gmail.com wrote: *) failing that, LWLOCK_STATS macro can be compiled in to give us some information about the particular lock(s) we're binding on. Hopefully it's a lwlock -- this will make diagnosing the problem easier. I've enabled that macro, seeing flying lwlock messages in the log (see below), even when there is no high-sys-cpu stall observed at the moment. Should I be looking for something in particular? We're looking for spikes in 'blk' which represents when lwlocks bump. Unfortunately LWLock contention is only a weak indication of spinlock contention. For example, if backends are furiously acquiring and releasing the same LWLock in shared mode and no one is getting an exclusive mode, then there can be high spinlock contention but there will never be a blk. In 9.3 there is a new field that tells how many spin delays there were on the mutex that is behind each lock. That was commit b79ab00144e64217d41, maybe he can port that back to his version. But that only tells you about LWLock mutexes, not about all the other ones in PG. The attached patch logs every spin delay with where in the source it comes from. I don't think it will have too much of a performance impact as the logs only happen when a sleep of at least 1ms is going to happen anyway. But it could generate massive amounts of logs. Cheers, Jeff spin_log.patch Description: Binary data
Re: [GENERAL] High SYS CPU - need advise
On Wed, Nov 14, 2012 at 4:08 PM, John R Pierce pie...@hogranch.com wrote: On 11/14/12 1:34 PM, Vlad wrote: thanks for your feedback. While implementing connection pooling would make resource utilization more efficient, I don't think it's the root of my problem. Most of the connected clients are IDLE. When I do select * from pg_stat_activity where current_query not like '%IDLE%'; I only see several active queries at any given time. what about during these spikes? Yeah -- if you are seeing a lot of queries pile up during these times, it's time to explore a connection pooler because it will keep system load manageable during these situations. things to check:
*) blocked queries (pg_locks/pg_stat_activity)
*) i/o wait. in particular, is the linux page cache flushing?
*) query stampede: we will want to measure TPS leading into the stall and out of it.
*) anything else running on the box?
*) do you have a large number of tables? maybe this is statistics-file related?
*) let's log checkpoints to see if there is correlation with the stall
*) nice to have vmstat/iostat during/before/after the stall. for example, a massive spike of context switches during a stall could point to an o/s scheduler issue.
merlin
Re: [GENERAL] High SYS CPU - need advise
there is no big spike of queries that causes this; queries come in at a relatively stable pace. It's just that the higher the rate of incoming queries, the more likely this is to happen. yes, when a stall happens, the active queries pile up - but that's the result of a stall (the server reacts slowly to a keypress, not to mention query execution), not the cause.
*) blocked queries (pg_locks/pg_stat_activity): nada
*) i/o wait. in particular, is linux page cache flushing: no i/o wait, no IRQ (see below)
*) query stampede: we will want to measure TPS leading into the stall and out of it.
*) anything else running on the box? just bare linux + postgresql.
*) you have a large number of tables? maybe this is statistics-file related? over 1000 tables across 4 or 5 schemas in a single database.
*) let's log checkpoints to see if there is correlation with stall: checked, checkpoints happen much more rarely and without relation to the high-sys periods
*) nice to have vmstat/iostat during/before/after stall. for example, massive spike of context switches during stall could point to o/s scheduler issue: checked that as well - nothing. CS even lower. 
avg-cpu:  %user %nice %system %iowait %steal %idle
          16.94  0.00    9.28    0.38   0.00 73.40
Device:   tps  kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda      6.00      48.00       0.00       48        0

avg-cpu:  %user %nice %system %iowait %steal %idle
          18.06  0.00   18.43    0.25   0.00 63.26
Device:   tps  kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda     10.00     104.00       0.00      104        0

avg-cpu:  %user %nice %system %iowait %steal %idle
           9.12  0.00  *86.74*   0.12   0.00  4.01
Device:   tps  kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda      0.97       7.77       0.00        8        0

avg-cpu:  %user %nice %system %iowait %steal %idle
           1.44  0.00  *96.58*   0.00   0.00  1.98
Device:   tps  kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda      3.28      78.69       0.00      144        0

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b swpd   free   buff    cache  si so  bi  bo     in   cs us  sy id wa st
 1  0    0 279240  12016 14431964   0  0  32   0 197852 4299 15   9 76  0  0
 4  0    0 225984  12024 14419696   0  0   0  64 197711 5158 11   9 79  1  0
 0  0    0 260112  12024 14413636   0  0  48   0 196708 4618 17  10 73  0  0
 6  0    0 233936  12024 14375784   0  0 104   0 179861 4884 19  17 64  0  0
30  0    0 224904  12024 14354812   0  0   8   0  51088 1205  9 *86* 5  0  0
72  0    0 239144  12024 14333852   0  0 144   0  45601  542  2 *98* 0  0  0
78  0    0 224840  12024 14328536   0  0   0   0  38732  481  2 *94* 5  0  0
22  1    0 219072  12032 14250652   0  0 136 100  47323 1231  9 *90* 1  0  0
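[Editor's note: the stall signature in the vmstat capture above, system CPU ('sy') jumping to 86-98% while runnable processes ('r') pile up, can be screened for mechanically. A small sketch; the column list follows the standard vmstat header and the thresholds are illustrative assumptions:]

```python
# Flag vmstat samples that match the high-SYS stall pattern seen in the thread.
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                  "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]

def parse_vmstat(line):
    """Turn one vmstat data line into a dict keyed by column name."""
    fields = line.split()
    return dict(zip(VMSTAT_COLUMNS, (int(f) for f in fields)))

def stalled(sample, sy_threshold=80, runnable_threshold=10):
    """A sample looks like the stall when system CPU dominates while
    runnable processes pile up (both thresholds are arbitrary cutoffs)."""
    return sample["sy"] >= sy_threshold and sample["r"] > runnable_threshold

if __name__ == "__main__":
    # Two rows transcribed from the capture above: one normal, one stalled.
    normal = parse_vmstat("1 0 0 279240 12016 14431964 0 0 32 0 197852 4299 15 9 76 0 0")
    stall = parse_vmstat("72 0 0 239144 12024 14333852 0 0 144 0 45601 542 2 98 0 0 0")
    print(stalled(normal), stalled(stall))  # -> False True
```

Note also the context-switch column: it drops from ~4300 to ~500 during the stall, consistent with Vlad's "CS even lower" observation rather than a context-switch storm.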
Re: [GENERAL] High SYS CPU - need advise
On Thu, Nov 15, 2012 at 11:50 AM, Vlad marche...@gmail.com wrote: there is no big spike of queries that causes this; queries come in at a relatively stable pace. It's just that the higher the rate of incoming queries, the more likely this is to happen. yes, when a stall happens, the active queries pile up - but that's the result of a stall (the server reacts slowly to a keypress, not to mention query execution), not the cause.

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b swpd   free   buff    cache  si so  bi  bo     in   cs us sy id wa st
 1  0    0 279240  12016 14431964   0  0  32   0 197852 4299 15  9 76  0  0
 4  0    0 225984  12024 14419696   0  0   0  64 197711 5158 11  9 79  1  0
 0  0    0 260112  12024 14413636   0  0  48   0 196708 4618 17 10 73  0  0
 6  0    0 233936  12024 14375784   0  0 104   0 179861 4884 19 17 64  0  0
30  0    0 224904  12024 14354812   0  0   8   0  51088 1205  9 86  5  0  0
72  0    0 239144  12024 14333852   0  0 144   0  45601  542  2 98  0  0  0
78  0    0 224840  12024 14328536   0  0   0   0  38732  481  2 94  5  0  0
22  1    0 219072  12032 14250652   0  0 136 100  47323 1231  9 90  1  0  0

hm. well, we can definitely rule out i/o. I reviewed your last posting, and you said: Out of the top 50 processes in top, 48 of them are postmasters, one is syslog, and one is psql. Each of the postmasters has a high %CPU, the top ones being 80% and higher, the rest being anywhere between 30% - 60%. Would postmaster 'queries' that are running attribute to the sys CPU usage, or should they be under the 'us' CPU usage? Is this still true? Can we capture strace from one of the high % postmasters to see if there's any clues there. Maybe we've uncovered some type of weird spinlock contention issue. How large is your database (or at least the portion of it that's commonly used)? Would you characterize your queries as mostly small lookups, scans, or a mix? merlin
Re: [GENERAL] High SYS CPU - need advise
Merlin, this is not my report; it is probably from a thread that I referenced as having common symptoms. Here is info about my db: Postgresql 9.1.6. Postgres usually has 400-500 connected clients, most of them idle. The database is over 1000 tables (across 5 namespaces), taking ~150Gb on disk. We have mostly select statements (joins across a few tables), using indexes and resulting in a small number of records returned. So mostly small lookups across a few tables joining by a primary key / index. Also, I just performed an experiment - I switched our app over to using a hot PITR standby instead of the master and it experienced the same problem. So since the PITR db can only perform read-only queries, there are no write-locks (except maybe when PITR is replaying wal records from the master?), nevertheless SYS CPU jumped sky-high. -- Vlad Is this still true? Can we capture strace from one of the high % postmasters to see if there's any clues there. Maybe we've uncovered some type of weird spinlock contention issue. How large is your database (or at least the portion of it that's commonly used)? Would you characterize your queries as mostly small lookups, scans, or a mix? merlin
Re: [GENERAL] High SYS CPU - need advise
On Thu, Nov 15, 2012 at 2:20 PM, Vlad marche...@gmail.com wrote: Merlin, this is not my report; it is probably from a thread that I referenced as having common symptoms. Here is info about my db: Postgresql 9.1.6. Postgres usually has 400-500 connected clients, most of them idle. The database is over 1000 tables (across 5 namespaces), taking ~150Gb on disk. We have mostly select statements (joins across a few tables), using indexes and resulting in a small number of records returned. So mostly small lookups across a few tables joining by a primary key / index. Also, I just performed an experiment - I switched our app over to using a hot PITR standby instead of the master and it experienced the same problem. So since the PITR db can only perform read-only queries, there are no write-locks (except maybe when PITR is replaying wal records from the master?), nevertheless SYS CPU jumped sky-high. yeah. ok, next steps:
*) can you confirm that a postgres process is using high cpu (according to top) during stall time
*) if so, please strace that process and save some of the log
*) you're using a 'bleeding edge' kernel, so we must be suspicious of a regression there, particularly in the scheduler.
*) I am suspicious of a spinlock issue. so, if we can't isolate the problem, is running a hand-compiled postgres a possibility (for lock stats)?
*) what is the output of this: echo /proc/sys/vm/zone_reclaim_mode
merlin
Re: [GENERAL] High SYS CPU - need advise
yeah. ok, next steps: *) can you confirm that a postgres process is using high cpu (according to top) during stall time yes, CPU is spread across a lot of postmasters

  PID USER  PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
29863 pgsql 20  0 3636m 102m 36m R 19.1  0.3 0:01.33 postmaster
30277 pgsql 20  0 3645m 111m 37m R 16.8  0.3 0:01.27 postmaster
11966 pgsql 20  0 3568m  22m 15m R 15.1  0.1 0:00.66 postmaster
 8073 pgsql 20  0 3602m  60m 26m S 13.6  0.2 0:00.77 postmaster
29780 pgsql 20  0 3646m 115m 43m R 13.6  0.4 0:01.13 postmaster
11865 pgsql 20  0 3606m  61m 23m S 12.8  0.2 0:01.87 postmaster
29379 pgsql 20  0 3603m  70m 30m R 12.8  0.2 0:00.80 postmaster
29727 pgsql 20  0 3616m  77m 31m R 12.5  0.2 0:00.81 postmaster

*) if so, please strace that process and save some of the log https://dl.dropbox.com/u/109778/stall_postmaster.log *) you're using a 'bleeding edge' kernel, so we must be suspicious of a regression there, particularly in the scheduler. this was observed for a while, during which period the system went from using 3.4.* kernels to 3.5.*... but I do not deny such a possibility. *) I am suspicious of a spinlock issue. so, if we can't isolate the problem, is running a hand-compiled postgres a possibility (for lock stats)? Yes, definitely possible. we run manually compiled postgresql anyway. Pls provide instructions. *) what is the output of this: echo /proc/sys/vm/zone_reclaim_mode I presume you wanted cat instead of echo, and it shows 0. -- vlad
Re: [GENERAL] High SYS CPU - need advise
On Thu, Nov 15, 2012 at 2:44 PM, Vlad marche...@gmail.com wrote: yeah. ok, next steps: *) can you confirm that a postgres process is using high cpu (according to top) during stall time yes, CPU is spread across a lot of postmasters

  PID USER  PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
29863 pgsql 20  0 3636m 102m 36m R 19.1  0.3 0:01.33 postmaster
30277 pgsql 20  0 3645m 111m 37m R 16.8  0.3 0:01.27 postmaster
11966 pgsql 20  0 3568m  22m 15m R 15.1  0.1 0:00.66 postmaster
 8073 pgsql 20  0 3602m  60m 26m S 13.6  0.2 0:00.77 postmaster
29780 pgsql 20  0 3646m 115m 43m R 13.6  0.4 0:01.13 postmaster
11865 pgsql 20  0 3606m  61m 23m S 12.8  0.2 0:01.87 postmaster
29379 pgsql 20  0 3603m  70m 30m R 12.8  0.2 0:00.80 postmaster
29727 pgsql 20  0 3616m  77m 31m R 12.5  0.2 0:00.81 postmaster

*) if so, please strace that process and save some of the log https://dl.dropbox.com/u/109778/stall_postmaster.log ok, excellent. reviewing the log, this immediately caught my eye:

recvfrom(8, \27\3\1\0@, 5, 0, NULL, NULL) = 5
recvfrom(8, \327\327\nl\231LD\211\346\243@WW\254\244\363C\326\247\341\177\255\263~\327HDv-\3466\353..., 64, 0, NULL, NULL) = 64
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 3000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 6000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 7000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 9000}) = 0 (Timeout)
semop(41713721, {{2, 1, 0}}, 1) = 0
lseek(295, 0, SEEK_END) = 0
lseek(296, 0, SEEK_END) = 8192

this is definitely pointing to a spinlock issue. see: slock.c what you are seeing here is a backend getting caught in a spin loop via a stuck spinlock. 0-9 = 10 times. did you by any chance check the logs? did any backends restart? 
we are looking for this:

    elog(PANIC, "stuck spinlock (%p) detected at %s:%d", lock, file, line);

Anything else going on in the log? The way to enable lock status reporting is via the LWLOCK_STATS macro. But before doing anything, please check the logs for errors/panics.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
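The spin-then-sleep behavior Merlin is describing can be sketched as a toy (this is an illustration in the spirit of s_lock.c, NOT PostgreSQL source: the constants and the `hybrid_lock_*` names are made up for this example; the real logic lives in src/backend/storage/lmgr/s_lock.c):

```c
#include <assert.h>
#include <stdatomic.h>
#include <unistd.h>

/* Toy illustration (NOT PostgreSQL source) of a hybrid spinlock:
 * spin on a test-and-set flag a bounded number of times, then start
 * sleeping between attempts.  Constants and names are assumptions. */

#define SPINS_PER_DELAY 100    /* attempts before the first sleep (illustrative) */
#define MIN_DELAY_USEC  1000   /* 1 ms -- the floor seen in the strace */

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

/* Acquire the lock; returns how many times we had to sleep. */
int hybrid_lock_acquire(atomic_flag *lock)
{
    int spins = 0;
    int sleeps = 0;

    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
    {
        if (++spins >= SPINS_PER_DELAY)
        {
            usleep(MIN_DELAY_USEC);   /* this sleep is what strace shows as a timeout */
            spins = 0;
            sleeps++;
        }
    }
    return sleeps;
}

void hybrid_lock_release(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

An uncontended acquire returns immediately with zero sleeps; a lock that is never released would produce exactly the kind of endless timeout stream visible in the strace log.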
Re: [GENERAL] High SYS CPU - need advise
On Thu, Nov 15, 2012 at 3:49 PM, Merlin Moncure mmonc...@gmail.com wrote:
> [...]
> this is definitely pointing to a spinlock issue. see: s_lock.c
> what you are seeing here is a backend getting caught in a spin loop via a stuck spinlock. 0-9 = 10 times.
> did you by any chance check the logs? did any backends restart? we are looking for this:
>     elog(PANIC, "stuck spinlock (%p) detected at %s:%d", lock, file, line);
> Anything else going on in the log? The way to enable lock status reporting is via the LWLOCK_STATS macro. But before doing anything, please check the logs for errors/panics.

ah, scratch that. reading the code again: s_lock.c allows for up to 1 second of delay (that is, you can delay 1000 times), so the panic is unlikely to be in the log. also, spinlock waits shouldn't increase by exactly 1000 us. investigating...

merlin
Re: [GENERAL] High SYS CPU - need advise
Merlin Moncure escribió:
> ok, excellent. reviewing the log, this immediately caught my eye:
> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
> [...]
> select(0, NULL, NULL, NULL, {0, 9000}) = 0 (Timeout)
> this is definitely pointing to a spinlock issue.

I met Rik van Riel (Linux kernel hacker) recently and we chatted about this briefly. He strongly suggested that we consider using futexes on Linux instead of spinlocks; the big advantage is that futexes sleep instead of spinning when contention is high. That would reduce the system load in this scenario.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
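The futex approach Álvaro mentions can be sketched as a minimal Linux-only lock, loosely following Ulrich Drepper's "Futexes Are Tricky" (this is NOT PostgreSQL code; every name here is made up for the illustration). The key property is that a contended caller sleeps in the kernel until woken, rather than polling:

```c
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock states: 0 = free, 1 = locked, 2 = locked with waiters.
 * Uncontended lock/unlock never enters the kernel. */

static atomic_int futex_word = 0;

/* compare-and-swap that returns the previous value */
static int cmpxchg(atomic_int *p, int expected, int desired)
{
    atomic_compare_exchange_strong(p, &expected, desired);
    return expected;
}

static void futex_sleep(atomic_int *addr, int val)
{
    /* kernel sleeps us only if *addr still equals val -- no lost wakeups */
    syscall(SYS_futex, addr, FUTEX_WAIT, val, NULL, NULL, 0);
}

static void futex_wake_one(atomic_int *addr)
{
    syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
}

void futex_lock(atomic_int *f)
{
    int c = cmpxchg(f, 0, 1);          /* fast path: no syscall at all */
    if (c != 0)
    {
        do {
            /* mark the lock contended, then sleep until the holder wakes us */
            if (c == 2 || cmpxchg(f, 1, 2) != 0)
                futex_sleep(f, 2);
        } while ((c = cmpxchg(f, 0, 2)) != 0);
    }
}

void futex_unlock(atomic_int *f)
{
    if (atomic_exchange(f, 0) == 2)    /* someone may be sleeping */
        futex_wake_one(f);
}
```

Under heavy contention this burns no CPU in userspace, which is exactly the property that would help in the scenario reported here; the trade-off is a syscall on every contended handoff.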
Re: [GENERAL] High SYS CPU - need advise
sorry - no panics / errors in the log... -- Vlad
Re: [GENERAL] High SYS CPU - need advise
On Thu, Nov 15, 2012 at 4:29 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Merlin Moncure escribió:
>> [...]
>> this is definitely pointing to a spinlock issue.
>
> I met Rik van Riel (Linux kernel hacker) recently and we chatted about this briefly. He strongly suggested that we consider using futexes on Linux instead of spinlocks; the big advantage is that futexes sleep instead of spinning when contention is high. That would reduce the system load in this scenario.

Well, so do postgres spinlocks, right? When we overflow spins_per_delay we go to pg_usleep, which proxies to select() -- postgres spinlocks are a hybrid implementation. Moving to futexes is a possible improvement (that's another discussion) in degenerate cases, but I'm not sure that I've exactly zeroed in on the problem. Or am I missing something? What I've been scratching my head over is what code exactly would cause an iterative sleep like the above.
The code is here:

    pg_usleep(cur_delay * 1000L);

    /* increase delay by a random fraction between 1X and 2X */
    cur_delay += (int) (cur_delay * ((double) random() / (double) MAX_RANDOM_VALUE) + 0.5);

    /* wrap back to minimum delay when max is exceeded */
    if (cur_delay > MAX_DELAY_MSEC)
        cur_delay = MIN_DELAY_MSEC;

...so cur_delay is supposed to increase in a non-linear fashion. I've looked around the sleep, usleep, and latch calls and as of yet haven't found anything that advances the timeout just like that (yet, need to do another pass). And we don't know for sure if this is directly related to the OP's problem.

merlin
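One subtlety in the arithmetic above: at cur_delay = 1 the increment (int)(1 * r + 0.5) is 0 whenever r < 0.5, and each increment is at most the current delay, so sequences like 1000, 1000, 1000, 2000, 3000, 4000 us are reachable under this rule. A standalone re-creation of the step (MIN/MAX constants copied from 9.1's s_lock.c; `next_delay` is a wrapper name made up for this sketch):

```c
#include <assert.h>
#include <stdlib.h>

/* Standalone re-creation of the s_lock.c backoff step quoted above.
 * The MIN/MAX constants match 9.1's s_lock.c; next_delay() is a
 * made-up wrapper name for this sketch. */

#define MIN_DELAY_MSEC   1
#define MAX_DELAY_MSEC   1000
#define MAX_RANDOM_VALUE 0x7FFFFFFF

int next_delay(int cur_delay)
{
    /* increase delay by a random fraction between 1X and 2X */
    cur_delay += (int) (cur_delay *
                        ((double) random() / (double) MAX_RANDOM_VALUE) + 0.5);

    /* wrap back to minimum delay when max is exceeded */
    if (cur_delay > MAX_DELAY_MSEC)
        cur_delay = MIN_DELAY_MSEC;
    return cur_delay;
}
```

Each step stays in [cur_delay, 2 * cur_delay] until the wrap, so repeated identical 1 ms timeouts and small +1 ms steps early in the sequence are normal, not evidence of a different code path.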
Re: [GENERAL] High SYS CPU - need advise
Merlin Moncure mmonc...@gmail.com writes:
> What I've been scratching my head over is what code exactly would cause an iterative sleep like the above. The code is here:
> [...]

IIRC that coding is of relatively recent vintage. The OP is probably running some older version that increased cur_delay differently.

regards, tom lane
Re: [GENERAL] High SYS CPU - need advise
Tom,

I just checked the version I'm running (9.1.6), and the code is quite similar (src/backend/storage/lmgr/s_lock.c):

    pg_usleep(cur_delay * 1000L);

    #if defined(S_LOCK_TEST)
    fprintf(stdout, "*");
    fflush(stdout);
    #endif

    /* increase delay by a random fraction between 1X and 2X */
    cur_delay += (int) (cur_delay * ((double) random() / (double) MAX_RANDOM_VALUE) + 0.5);

    /* wrap back to minimum delay when max is exceeded */
    if (cur_delay > MAX_DELAY_MSEC)
        cur_delay = MIN_DELAY_MSEC;

-- vlad
Re: [GENERAL] High SYS CPU - need advise
On Thu, Nov 15, 2012 at 2:44 PM, Merlin Moncure mmonc...@gmail.com wrote:
> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 3000}) = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 6000}) = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 7000}) = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 9000}) = 0 (Timeout)

This is not entirely inconsistent with the spinlock. Note that 1000 is repeated 3 times, and 5000 is missing. This might just be a misleading random sample we got here; I've seen similar close spacing in some simulations I've run.

It is not clear to me why we use a resolution of 1 msec here. If the OS's implementation of select() eventually rounds to the nearest msec, that is its business. But why do we have to lose intermediate precision due to its decision?

Cheers, Jeff
Re: [GENERAL] High SYS CPU - need advise
On 11/14/12 1:13 PM, Vlad wrote:
> Postgresql 9.1.6. Postgres usually has 400-500 connected clients, most of them are idle. Database is over 1000 tables (across 5 namespaces), taking ~150Gb on disk.

that's a really high client connection count for an 8-core system. I'd consider implementing a connection pool (like pgbouncer) and rewriting your client applications to connect, do a transaction, disconnect, so the actual number of postgres connections is much lower, say in the 16-48 range.

-- john r pierce                N 37, W 122
   santa cruz ca                mid-left coast
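If pooling were adopted, a transaction-mode pgbouncer in front of postgres is the usual shape of this suggestion. A minimal sketch of a pgbouncer.ini -- every host, path, database name, and limit below is an illustrative assumption, not taken from the OP's setup:

```ini
; illustrative pgbouncer.ini sketch -- names and limits are assumptions
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; return the server connection to the pool after each transaction
default_pool_size = 32    ; actual backend connections, in the 16-48 range suggested above
max_client_conn = 500     ; the existing 400-500 clients keep connecting as before
```

With this, the 400-500 client connections would share a few dozen real backends, shrinking the number of postmaster processes competing for locks and CPU.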
Re: [GENERAL] High SYS CPU - need advise
John,

thanks for your feedback. While implementing connection pooling would make resource utilization more efficient, I don't think it's the root of my problem. Most of the connected clients are IDLE. When I do

    select * from pg_stat_activity where current_query not like '%IDLE%';

I only see several active queries at any given time.

-- Vlad

On Wed, Nov 14, 2012 at 3:23 PM, John R Pierce pie...@hogranch.com wrote:
> [...]
> that's a really high client connection count for an 8-core system. I'd consider implementing a connection pool (like pgbouncer) and rewriting your client applications to connect, do a transaction, disconnect, so the actual number of postgres connections is much lower, say in the 16-48 range.
Re: [GENERAL] High SYS CPU - need advise
On 11/14/12 1:34 PM, Vlad wrote:
> thanks for your feedback. While implementing connection pooling would make resource utilization more efficient, I don't think it's the root of my problem. Most of the connected clients are IDLE. [...] I only see several active queries at any given time.

what about during these spikes?

-- john r pierce                N 37, W 122
   santa cruz ca                mid-left coast