Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-09-17 Thread Дмитрий Дегтярёв
Hello.

We have not been able to reproduce this problem on a test servers. Use this
patch to production servers do not dare.

In the course of studying the problems we have identified that many queries
are executed on the slave several times slower. On master function
heap_hot_search_buffer execute 100 cycles, on the slave the same query with
the same plan function heap_hot_search_buffer execute 2000 cycles.
Also, we were able to reproduce the problem on the master and detect that
there s_lock of slow queries.

We have solved this problem. A large number of queries used 4 frequently
changing index. In these indexes, 99% of the dead tuples. Autovacuum and
even VACUUM FULL these tuples can not be removed because of
autovacuum_freeze_max_age.

We've added cron that 2-3 times a day, performs CREATE INDEX CONCURRENTLY
idx_name_new; DROP INDEX CONCURRENTLY idx_name; ALTER INDEX idx_name_new
RENAME TO idx_name; for this 4 indexes.

As a result s_lock not exists in listed perf top.


2013/8/29 Merlin Moncure mmonc...@gmail.com


 so -- are you in a position where you might be able to test this patch?

 merlin


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-09-17 Thread Andres Freund
Hi,

On 2013-09-17 17:55:01 +0600, Дмитрий Дегтярёв wrote:
 We have not been able to reproduce this problem on a test servers. Use this
 patch to production servers do not dare.
 
 In the course of studying the problems we have identified that many queries
 are executed on the slave several times slower. On master function
 heap_hot_search_buffer execute 100 cycles, on the slave the same query with
 the same plan function heap_hot_search_buffer execute 2000 cycles.
 Also, we were able to reproduce the problem on the master and detect that
 there s_lock of slow queries.

What you describe is normally an indication that you have too many
longrunning transactions around preventing hot pruning from working.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-09-17 Thread Merlin Moncure
On Tue, Sep 17, 2013 at 6:59 AM, Andres Freund and...@2ndquadrant.com wrote:
 Hi,

 On 2013-09-17 17:55:01 +0600, Дмитрий Дегтярёв wrote:
 We have not been able to reproduce this problem on a test servers. Use this
 patch to production servers do not dare.

 In the course of studying the problems we have identified that many queries
 are executed on the slave several times slower. On master function
 heap_hot_search_buffer execute 100 cycles, on the slave the same query with
 the same plan function heap_hot_search_buffer execute 2000 cycles.
 Also, we were able to reproduce the problem on the master and detect that
 there s_lock of slow queries.

 What you describe is normally an indication that you have too many
 longrunning transactions around preventing hot pruning from working.

Do you think it's worth submitting the lock avoidance patch for formal review?

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-09-17 Thread Andres Freund
On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote:
 On Tue, Sep 17, 2013 at 6:59 AM, Andres Freund and...@2ndquadrant.com wrote:
  Hi,
 
  On 2013-09-17 17:55:01 +0600, Дмитрий Дегтярёв wrote:
  We have not been able to reproduce this problem on a test servers. Use this
  patch to production servers do not dare.
 
  In the course of studying the problems we have identified that many queries
  are executed on the slave several times slower. On master function
  heap_hot_search_buffer execute 100 cycles, on the slave the same query with
  the same plan function heap_hot_search_buffer execute 2000 cycles.
  Also, we were able to reproduce the problem on the master and detect that
  there s_lock of slow queries.
 
  What you describe is normally an indication that you have too many
  longrunning transactions around preventing hot pruning from working.
 
 Do you think it's worth submitting the lock avoidance patch for formal review?

You mean the bufmgr.c thing? Generally I think that that code needs a
good of scalability work - there's a whole thread about it
somewhere. But TBH the theories you've voiced about the issues you've
seen haven't convinced me so far.

If you can manage to prove it has a benefit in some case that's
reproducable - why not go ahead?

Quick question: Do you happen to have pg_locks output from back then
around? We've recently found servers going into somewhat similar
slowdowns because they exhausted the fastpath locks which made lwlocks
far more expensive and made s_lock go up very high in the profle.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-09-17 Thread Merlin Moncure
On Tue, Sep 17, 2013 at 8:24 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote:
 Do you think it's worth submitting the lock avoidance patch for formal 
 review?

 You mean the bufmgr.c thing? Generally I think that that code needs a
 good of scalability work - there's a whole thread about it
 somewhere. But TBH the theories you've voiced about the issues you've
 seen haven't convinced me so far.

er, no (but I share your skepticism -- my challenge right now is to
demonstrate measurable benefit which so far I've been unable to do).
I was talking about the patch on  *this* thread which bypasses the
s_lock in RecoveryInProgress()  :-).

 Quick question: Do you happen to have pg_locks output from back then
 around? We've recently found servers going into somewhat similar
 slowdowns because they exhausted the fastpath locks which made lwlocks
 far more expensive and made s_lock go up very high in the profle.

I do. Unfortunately I don't have profile info.   Not sure how useful
it is -- I'll send it off-list.

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-09-17 Thread Andres Freund
On 2013-09-17 08:32:30 -0500, Merlin Moncure wrote:
 On Tue, Sep 17, 2013 at 8:24 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote:
  Do you think it's worth submitting the lock avoidance patch for formal 
  review?
 
  You mean the bufmgr.c thing? Generally I think that that code needs a
  good of scalability work - there's a whole thread about it
  somewhere. But TBH the theories you've voiced about the issues you've
  seen haven't convinced me so far.
 
 er, no (but I share your skepticism -- my challenge right now is to
 demonstrate measurable benefit which so far I've been unable to do).
 I was talking about the patch on  *this* thread which bypasses the
 s_lock in RecoveryInProgress()  :-).

Ah, yes. Sorry confused issues ;). Yes, I think that'd made sense.

  Quick question: Do you happen to have pg_locks output from back then
  around? We've recently found servers going into somewhat similar
  slowdowns because they exhausted the fastpath locks which made lwlocks
  far more expensive and made s_lock go up very high in the profle.
 
 I do. Unfortunately I don't have profile info.   Not sure how useful
 it is -- I'll send it off-list.

Great.

The primary thing I'd like to know is whether there are lots of
non-fastpath locks...

If you ever get into the situation I mistakenly referred to again, I'd
strongly suggest recompling postgres with -fno-omit-frame-pointer. That
makes hierarchical profiles actually useful which can help tremendously
with diagnosing issues like this...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-09-17 Thread Merlin Moncure
On Tue, Sep 17, 2013 at 8:35 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-17 08:32:30 -0500, Merlin Moncure wrote:
 On Tue, Sep 17, 2013 at 8:24 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote:
  Do you think it's worth submitting the lock avoidance patch for formal 
  review?
 
  You mean the bufmgr.c thing? Generally I think that that code needs a
  good of scalability work - there's a whole thread about it
  somewhere. But TBH the theories you've voiced about the issues you've
  seen haven't convinced me so far.

 er, no (but I share your skepticism -- my challenge right now is to
 demonstrate measurable benefit which so far I've been unable to do).
 I was talking about the patch on  *this* thread which bypasses the
 s_lock in RecoveryInProgress()  :-).

 Ah, yes. Sorry confused issues ;). Yes, I think that'd made sense.

  Quick question: Do you happen to have pg_locks output from back then
  around? We've recently found servers going into somewhat similar
  slowdowns because they exhausted the fastpath locks which made lwlocks
  far more expensive and made s_lock go up very high in the profle.

 I do. Unfortunately I don't have profile info.   Not sure how useful
 it is -- I'll send it off-list.

 Great.

 The primary thing I'd like to know is whether there are lots of
 non-fastpath locks...

 If you ever get into the situation I mistakenly referred to again, I'd
 strongly suggest recompling postgres with -fno-omit-frame-pointer. That
 makes hierarchical profiles actually useful which can help tremendously
 with diagnosing issues like this...

We may get an opportunity to do that.  I'm curious enough about the
THP compaction issues that Kevin mentioned to to maybe consider
cranking buffers again.  If I do that, it will be with strict
instructions to the site operators to catch a profile before taking
further action.

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-09-17 Thread Andres Freund
On 2013-09-17 08:40:23 -0500, Merlin Moncure wrote:
  If you ever get into the situation I mistakenly referred to again, I'd
  strongly suggest recompling postgres with -fno-omit-frame-pointer. That
  makes hierarchical profiles actually useful which can help tremendously
  with diagnosing issues like this...
 
 We may get an opportunity to do that.  I'm curious enough about the
 THP compaction issues that Kevin mentioned to to maybe consider
 cranking buffers again.  If I do that, it will be with strict
 instructions to the site operators to catch a profile before taking
 further action.

The THP issues should be very clearly diagnosable because a good part of
the time will be spent in the kernel. Lots of spinlocking there, but the
function names are easily discernible from pg's code.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: Planner performance extremely affected by an hanging transaction (20-30 times)?

2013-09-17 Thread Bartłomiej Romański
 What's more doing similar insert (even much larger) to the 'offers' table
does not affect the benchmark results in any significant way...

Just want clarify myself here: Insert to 'offers' table does not cause the
slowdown. Only insert to 'categories' causes the problem.