Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
On Thu, Sep 26, 2013 at 10:14 PM, Merlin Moncure mmonc...@gmail.com wrote:
> On Thu, Sep 26, 2013 at 6:08 PM, Andres Freund and...@2ndquadrant.com wrote:
>> On 2013-08-27 12:17:55 -0500, Merlin Moncure wrote:
>>> On Tue, Aug 27, 2013 at 10:55 AM, Andres Freund and...@2ndquadrant.com wrote:
>>>> On 2013-08-27 09:57:38 -0500, Merlin Moncure wrote:
>>>>> + bool
>>>>> + RecoveryMightBeInProgress(void)
>>>>> + {
>>>>> +     /*
>>>>> +      * We check shared state each time only until we leave recovery mode. We
>>>>> +      * can't re-enter recovery, so there's no need to keep checking after the
>>>>> +      * shared variable has once been seen false.
>>>>> +      */
>>>>> +     if (!LocalRecoveryInProgress)
>>>>> +         return false;
>>>>> +     else
>>>>> +     {
>>>>> +         /* use volatile pointer to prevent code rearrangement */
>>>>> +         volatile XLogCtlData *xlogctl = XLogCtl;
>>>>> +
>>>>> +         /* Intentionally query xlogctl without spinlocking! */
>>>>> +         LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
>>>>> +
>>>>> +         return LocalRecoveryInProgress;
>>>>> +     }
>>>>> + }
>>>>
>>>> I don't think it's acceptable to *set* LocalRecoveryInProgress here.
>>>> That should only be done in the normal routine.
>>>
>>> quite right -- that was a major error -- you could bypass the
>>> initialization call to the xlog with some bad luck.
>>
>> I've seen this in profiles since, so I'd appreciate pushing this forward.
>
> roger that -- will push ahead when I get into the office...

attached is new version fixing some comment typos.

merlin

recovery4.patch
Description: Binary data

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Troubleshooting query performance issues - Resolved (sort of)
We have traced this to the *addition* of a two-column index.

The two tables in question both have single-column indexes on two foreign keys, say columns A and B. The query joins the two large tables on A and B. With only the two indexes, the query plan does a bitmap AND on the index scan results and performance is stable.

I added an index on (A,B), and this caused the planner to use the new index, but I was never able to get the query to complete. In one instance I let it run 18 hours. The only difference was the addition of the index.

Summary:
- With index on (A,B) -- query time is infinite
- Without index on (A,B), relying on individual indexes and bitmap AND -- query time is about 4 minutes (as expected given the data volume)

Does this sound like a bug in the query planner?

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Jim Garrison
Sent: Wednesday, September 25, 2013 8:58 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Troubleshooting query performance issues

I spent about a week optimizing a query in our performance-testing environment, which has hardware similar to production. I was able to refactor the query and reduce the runtime from hours to about 40 seconds, through the use of CTEs and a couple of new indexes. The database was rebuilt and refreshed with very similar data from production, but now the query takes hours again. In the query plan, it is clear that the row count estimates are WAY too low, even though the statistics are up to date.
Here's a sample query plan:

 CTE Scan on stef  (cost=164.98..165.00 rows=1 width=38)
   CTE terms
     ->  Nested Loop  (cost=0.00..62.40 rows=1 width=12)
           ->  Index Scan using term_idx1 on term t  (cost=0.00..52.35 rows=1 width=12)
                 Index Cond: (partner_id = 497)
                 Filter: (recalculate_district_averages_yn AND (NOT is_deleted_yn))
           ->  Index Scan using growth_measurement_window_fk1 on growth_measurement_window gw  (cost=0.00..10.04 rows=1 width=4)
                 Index Cond: (term_id = t.term_id)
                 Filter: (test_window_complete_yn AND (NOT is_deleted_yn) AND ((growth_window_type)::text = 'DISTRICT'::text))
   CTE stef
     ->  Nested Loop  (cost=0.00..102.58 rows=1 width=29)
           Join Filter: ((ssef.student_id = terf.student_id) AND (ssef.grade_id = terf.grade_id))
           ->  Nested Loop  (cost=0.00..18.80 rows=3 width=28)
                 ->  CTE Scan on terms t  (cost=0.00..0.02 rows=1 width=8)
                 ->  Index Scan using student_school_enrollment_fact_idx2 on student_school_enrollment_fact ssef  (cost=0.00..18.74 rows=3 width=20)
                       Index Cond: ((partner_id = t.partner_id) AND (term_id = t.term_id))
                       Filter: primary_yn
           ->  Index Scan using test_event_result_fact_idx3 on test_event_result_fact terf  (cost=0.00..27.85 rows=4 width=25)
                 Index Cond: ((partner_id = t.partner_id) AND (term_id = t.term_id))
                 Filter: growth_event_yn

The estimates in the first CTE are correct, but in the second, the scan on student_school_enrollment_fact will return about 1.5 million rows, and the scan on test_event_result_fact actually returns about 1.1 million. The top level join should return about 900K rows.

I believe the fundamental issue is that the CTE stef outer nested loop should be a merge join instead, but I cannot figure out why the optimizer is estimating one row when it has the statistics to correctly estimate the count.

What would cause PG to so badly estimate the row counts? I've already regenerated the indexes and re-analyzed the tables involved. What else can I do to find out why it's running so slowly?
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
Andres, Jeff,

>> As far as I can tell, the only downside of doing that is that, since
>> hint bits might be set later, it is possible some dirty pages will get
>> written unhinted and then re-dirtied by the hint bit setting, when more
>> aggressive setting would have only one combined dirty write instead.
>> But that seems rather hypothetical, and if it really is a problem we
>> should probably tackle it directly rather than by barring other
>> optimizations.
>
> I am - as evidenced - too tired to think about this properly, but I
> think you might be right here.

Any thoughts on a fix for this we could get into 9.2.5?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On 2013-09-27 13:57:02 -0700, Josh Berkus wrote:
> Andres, Jeff,
>
>>> As far as I can tell, the only downside of doing that is that, since
>>> hint bits might be set later, it is possible some dirty pages will get
>>> written unhinted and then re-dirtied by the hint bit setting, when more
>>> aggressive setting would have only one combined dirty write instead.
>>> But that seems rather hypothetical, and if it really is a problem we
>>> should probably tackle it directly rather than by barring other
>>> optimizations.
>>
>> I am - as evidenced - too tired to think about this properly, but I
>> think you might be right here.
>
> Any thoughts on a fix for this we could get into 9.2.5?

I don't see much chance to apply anything like this in a back branch. Changing IO patterns in a noticeable way in a minor release is just asking for trouble.

Also, this really isn't going to fix the issue discussed here - this was just about the additional ProcArrayLock contention. I don't think it would change anything dramatic in your case.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services