Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-09-27 Thread Merlin Moncure
On Thu, Sep 26, 2013 at 10:14 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Thu, Sep 26, 2013 at 6:08 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> On 2013-08-27 12:17:55 -0500, Merlin Moncure wrote:
>>> On Tue, Aug 27, 2013 at 10:55 AM, Andres Freund <and...@2ndquadrant.com>
>>> wrote:
>>>> On 2013-08-27 09:57:38 -0500, Merlin Moncure wrote:
>>>>> + bool
>>>>> + RecoveryMightBeInProgress(void)
>>>>> + {
>>>>> +     /*
>>>>> +      * We check shared state each time only until we leave recovery mode. We
>>>>> +      * can't re-enter recovery, so there's no need to keep checking after the
>>>>> +      * shared variable has once been seen false.
>>>>> +      */
>>>>> +     if (!LocalRecoveryInProgress)
>>>>> +         return false;
>>>>> +     else
>>>>> +     {
>>>>> +         /* use volatile pointer to prevent code rearrangement */
>>>>> +         volatile XLogCtlData *xlogctl = XLogCtl;
>>>>> +
>>>>> +         /* Intentionally query xlogctl without spinlocking! */
>>>>> +         LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
>>>>> +
>>>>> +         return LocalRecoveryInProgress;
>>>>> +     }
>>>>> + }
>>>>
>>>> I don't think it's acceptable to *set* LocalRecoveryInProgress
>>>> here. That should only be done in the normal routine.
>>>
>>> quite right -- that was a major error -- you could bypass the
>>> initialization call to the xlog with some bad luck.
>>
>> I've seen this in profiles since, so I'd appreciate pushing this
>> forward.
>
> roger that -- will push ahead when I get into the office...

attached is new version fixing some comment typos.

merlin


recovery4.patch
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Troubleshooting query performance issues - Resolved (sort of)

2013-09-27 Thread Jim Garrison
We have traced this to the *addition* of a two-column index. 

The two tables in question both have single-column indexes on two foreign keys, 
say columns A and B.  The query joins the two large tables on A and B.  

With only the two indexes, the query plan does a bitmap AND on the index scan 
results and performance is stable.

I added an index on (A,B), and this caused the planner to use the new index, 
but I was never able to get the query to complete.  In one instance I let it 
run 18 hours.  

The only difference was the addition of the index.

Summary:

- With index on (A,B) -- query time is infinite 

- Without index on (A,B), relying on individual indexes and bitmap AND -- query 
time is about 4 minutes (as expected given the data volume)

Does this sound like a bug in the query planner?
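A low-risk way to compare the two plans side by side (object names below are hypothetical) is to drop the new index inside a transaction, so the drop can be rolled back:

```sql
-- Hypothetical names; adapt to the real schema.  Note that DROP INDEX
-- holds an ACCESS EXCLUSIVE lock on the table until ROLLBACK, so run
-- this on a test system, not in production.
BEGIN;
DROP INDEX fact_a_b_idx;          -- the new two-column (A, B) index
EXPLAIN (ANALYZE, BUFFERS)
    SELECT ...;                   -- the join of the two tables on A and B
ROLLBACK;                         -- the index comes back untouched
```

With the actual row counts from EXPLAIN ANALYZE for both variants, it is much easier to see whether the planner's estimates, rather than the index itself, are at fault.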

> -----Original Message-----
> From: pgsql-performance-ow...@postgresql.org
> [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Jim Garrison
> Sent: Wednesday, September 25, 2013 8:58 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Troubleshooting query performance issues
>
> I spent about a week optimizing a query in our performance-testing
> environment, which has hardware similar to production.
>
> I was able to refactor the query and reduce the runtime from hours to about
> 40 seconds, through the use of CTEs and a couple of new indexes.
>
> The database was rebuilt and refreshed with very similar data from
> production, but now the query takes hours again.
>
> In the query plan, it is clear that the row count estimates are WAY too low,
> even though the statistics are up to date.  Here's a sample query plan:
>
> CTE Scan on stef  (cost=164.98..165.00 rows=1 width=38)
>   CTE terms
>     ->  Nested Loop  (cost=0.00..62.40 rows=1 width=12)
>           ->  Index Scan using term_idx1 on term t  (cost=0.00..52.35 rows=1 width=12)
>                 Index Cond: (partner_id = 497)
>                 Filter: (recalculate_district_averages_yn AND (NOT is_deleted_yn))
>           ->  Index Scan using growth_measurement_window_fk1 on growth_measurement_window gw  (cost=0.00..10.04 rows=1 width=4)
>                 Index Cond: (term_id = t.term_id)
>                 Filter: (test_window_complete_yn AND (NOT is_deleted_yn) AND ((growth_window_type)::text = 'DISTRICT'::text))
>   CTE stef
>     ->  Nested Loop  (cost=0.00..102.58 rows=1 width=29)
>           Join Filter: ((ssef.student_id = terf.student_id) AND (ssef.grade_id = terf.grade_id))
>           ->  Nested Loop  (cost=0.00..18.80 rows=3 width=28)
>                 ->  CTE Scan on terms t  (cost=0.00..0.02 rows=1 width=8)
>                 ->  Index Scan using student_school_enrollment_fact_idx2 on student_school_enrollment_fact ssef  (cost=0.00..18.74 rows=3 width=20)
>                       Index Cond: ((partner_id = t.partner_id) AND (term_id = t.term_id))
>                       Filter: primary_yn
>           ->  Index Scan using test_event_result_fact_idx3 on test_event_result_fact terf  (cost=0.00..27.85 rows=4 width=25)
>                 Index Cond: ((partner_id = t.partner_id) AND (term_id = t.term_id))
>                 Filter: growth_event_yn
>
> The estimates in the first CTE are correct, but in the second, the scan on
> student_school_enrollment_fact will return about 1.5 million rows, and the
> scan on test_event_result_fact actually returns about 1.1 million.  The
> top-level join should return about 900K rows.  I believe the fundamental
> issue is that the CTE stef outer nested loop should be a merge join instead,
> but I cannot figure out why the optimizer is estimating one row when it has
> the statistics to correctly estimate the count.
>
> What would cause PG to so badly estimate the row counts?
>
> I've already regenerated the indexes and re-analyzed the tables involved.
>
> What else can I do to find out why it's running so slowly?






Re: [PERFORM] Planner performance extremely affected by a hanging transaction (20-30 times)?

2013-09-27 Thread Josh Berkus
Andres, Jeff,


>> As far as I can tell, the only downside of doing that is that, since hint
>> bits might be set later, it is possible some dirty pages will get written
>> unhinted and then re-dirtied by the hint bit setting, when more aggressive
>> setting would have only one combined dirty write instead.  But that seems
>> rather hypothetical, and if it really is a problem we should probably
>> tackle it directly rather than by barring other optimizations.
>
> I am - as evidenced - too tired to think about this properly, but I
> think you might be right here.

Any thoughts on a fix for this we could get into 9.2.5?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [PERFORM] Planner performance extremely affected by a hanging transaction (20-30 times)?

2013-09-27 Thread Andres Freund
On 2013-09-27 13:57:02 -0700, Josh Berkus wrote:
> Andres, Jeff,
>
>>> As far as I can tell, the only downside of doing that is that, since hint
>>> bits might be set later, it is possible some dirty pages will get written
>>> unhinted and then re-dirtied by the hint bit setting, when more aggressive
>>> setting would have only one combined dirty write instead.  But that seems
>>> rather hypothetical, and if it really is a problem we should probably
>>> tackle it directly rather than by barring other optimizations.
>>
>> I am - as evidenced - too tired to think about this properly, but I
>> think you might be right here.
>
> Any thoughts on a fix for this we could get into 9.2.5?

I don't see much chance to apply anything like this in a
backbranch. Changing IO patterns in a noticeable way in a minor release
is just asking for trouble.

Also, this really isn't going to fix the issue discussed here - this was
just about the additional ProcArrayLock contention. I don't think it
would change anything dramatical in your case.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

