Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
Hello. We have not been able to reproduce this problem on a test servers. Use this patch to production servers do not dare. In the course of studying the problems we have identified that many queries are executed on the slave several times slower. On master function heap_hot_search_buffer execute 100 cycles, on the slave the same query with the same plan function heap_hot_search_buffer execute 2000 cycles. Also, we were able to reproduce the problem on the master and detect that there s_lock of slow queries. We have solved this problem. A large number of queries used 4 frequently changing index. In these indexes, 99% of the dead tuples. Autovacuum and even VACUUM FULL these tuples can not be removed because of autovacuum_freeze_max_age. We've added cron that 2-3 times a day, performs CREATE INDEX CONCURRENTLY idx_name_new; DROP INDEX CONCURRENTLY idx_name; ALTER INDEX idx_name_new RENAME TO idx_name; for this 4 indexes. As a result s_lock not exists in listed perf top. 2013/8/29 Merlin Moncure mmonc...@gmail.com so -- are you in a position where you might be able to test this patch? merlin
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
Hi, On 2013-09-17 17:55:01 +0600, Дмитрий Дегтярёв wrote: We have not been able to reproduce this problem on a test servers. Use this patch to production servers do not dare. In the course of studying the problems we have identified that many queries are executed on the slave several times slower. On master function heap_hot_search_buffer execute 100 cycles, on the slave the same query with the same plan function heap_hot_search_buffer execute 2000 cycles. Also, we were able to reproduce the problem on the master and detect that there s_lock of slow queries. What you describe is normally an indication that you have too many longrunning transactions around preventing hot pruning from working. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
On Tue, Sep 17, 2013 at 6:59 AM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2013-09-17 17:55:01 +0600, Дмитрий Дегтярёв wrote: We have not been able to reproduce this problem on a test servers. Use this patch to production servers do not dare. In the course of studying the problems we have identified that many queries are executed on the slave several times slower. On master function heap_hot_search_buffer execute 100 cycles, on the slave the same query with the same plan function heap_hot_search_buffer execute 2000 cycles. Also, we were able to reproduce the problem on the master and detect that there s_lock of slow queries. What you describe is normally an indication that you have too many longrunning transactions around preventing hot pruning from working. Do you think it's worth submitting the lock avoidance patch for formal review? merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote: On Tue, Sep 17, 2013 at 6:59 AM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2013-09-17 17:55:01 +0600, Дмитрий Дегтярёв wrote: We have not been able to reproduce this problem on a test servers. Use this patch to production servers do not dare. In the course of studying the problems we have identified that many queries are executed on the slave several times slower. On master function heap_hot_search_buffer execute 100 cycles, on the slave the same query with the same plan function heap_hot_search_buffer execute 2000 cycles. Also, we were able to reproduce the problem on the master and detect that there s_lock of slow queries. What you describe is normally an indication that you have too many longrunning transactions around preventing hot pruning from working. Do you think it's worth submitting the lock avoidance patch for formal review? You mean the bufmgr.c thing? Generally I think that that code needs a good of scalability work - there's a whole thread about it somewhere. But TBH the theories you've voiced about the issues you've seen haven't convinced me so far. If you can manage to prove it has a benefit in some case that's reproducable - why not go ahead? Quick question: Do you happen to have pg_locks output from back then around? We've recently found servers going into somewhat similar slowdowns because they exhausted the fastpath locks which made lwlocks far more expensive and made s_lock go up very high in the profle. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
On Tue, Sep 17, 2013 at 8:24 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote: Do you think it's worth submitting the lock avoidance patch for formal review? You mean the bufmgr.c thing? Generally I think that that code needs a good of scalability work - there's a whole thread about it somewhere. But TBH the theories you've voiced about the issues you've seen haven't convinced me so far. er, no (but I share your skepticism -- my challenge right now is to demonstrate measurable benefit which so far I've been unable to do). I was talking about the patch on *this* thread which bypasses the s_lock in RecoveryInProgress() :-). Quick question: Do you happen to have pg_locks output from back then around? We've recently found servers going into somewhat similar slowdowns because they exhausted the fastpath locks which made lwlocks far more expensive and made s_lock go up very high in the profle. I do. Unfortunately I don't have profile info. Not sure how useful it is -- I'll send it off-list. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
On 2013-09-17 08:32:30 -0500, Merlin Moncure wrote: On Tue, Sep 17, 2013 at 8:24 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote: Do you think it's worth submitting the lock avoidance patch for formal review? You mean the bufmgr.c thing? Generally I think that that code needs a good of scalability work - there's a whole thread about it somewhere. But TBH the theories you've voiced about the issues you've seen haven't convinced me so far. er, no (but I share your skepticism -- my challenge right now is to demonstrate measurable benefit which so far I've been unable to do). I was talking about the patch on *this* thread which bypasses the s_lock in RecoveryInProgress() :-). Ah, yes. Sorry confused issues ;). Yes, I think that'd made sense. Quick question: Do you happen to have pg_locks output from back then around? We've recently found servers going into somewhat similar slowdowns because they exhausted the fastpath locks which made lwlocks far more expensive and made s_lock go up very high in the profle. I do. Unfortunately I don't have profile info. Not sure how useful it is -- I'll send it off-list. Great. The primary thing I'd like to know is whether there are lots of non-fastpath locks... If you ever get into the situation I mistakenly referred to again, I'd strongly suggest recompling postgres with -fno-omit-frame-pointer. That makes hierarchical profiles actually useful which can help tremendously with diagnosing issues like this... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
On Tue, Sep 17, 2013 at 8:35 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-17 08:32:30 -0500, Merlin Moncure wrote: On Tue, Sep 17, 2013 at 8:24 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote: Do you think it's worth submitting the lock avoidance patch for formal review? You mean the bufmgr.c thing? Generally I think that that code needs a good of scalability work - there's a whole thread about it somewhere. But TBH the theories you've voiced about the issues you've seen haven't convinced me so far. er, no (but I share your skepticism -- my challenge right now is to demonstrate measurable benefit which so far I've been unable to do). I was talking about the patch on *this* thread which bypasses the s_lock in RecoveryInProgress() :-). Ah, yes. Sorry confused issues ;). Yes, I think that'd made sense. Quick question: Do you happen to have pg_locks output from back then around? We've recently found servers going into somewhat similar slowdowns because they exhausted the fastpath locks which made lwlocks far more expensive and made s_lock go up very high in the profle. I do. Unfortunately I don't have profile info. Not sure how useful it is -- I'll send it off-list. Great. The primary thing I'd like to know is whether there are lots of non-fastpath locks... If you ever get into the situation I mistakenly referred to again, I'd strongly suggest recompling postgres with -fno-omit-frame-pointer. That makes hierarchical profiles actually useful which can help tremendously with diagnosing issues like this... We may get an opportunity to do that. I'm curious enough about the THP compaction issues that Kevin mentioned to to maybe consider cranking buffers again. If I do that, it will be with strict instructions to the site operators to catch a profile before taking further action. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
On 2013-09-17 08:40:23 -0500, Merlin Moncure wrote: If you ever get into the situation I mistakenly referred to again, I'd strongly suggest recompling postgres with -fno-omit-frame-pointer. That makes hierarchical profiles actually useful which can help tremendously with diagnosing issues like this... We may get an opportunity to do that. I'm curious enough about the THP compaction issues that Kevin mentioned to to maybe consider cranking buffers again. If I do that, it will be with strict instructions to the site operators to catch a profile before taking further action. The THP issues should be very clearly diagnosable because a good part of the time will be spent in the kernel. Lots of spinlocking there, but the function names are easily discernible from pg's code. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Re: Planner performance extremely affected by an hanging transaction (20-30 times)?
What's more doing similar insert (even much larger) to the 'offers' table does not affect the benchmark results in any significant way... Just want clarify myself here: Insert to 'offers' table does not cause the slowdown. Only insert to 'categories' causes the problem.