Re: [PERFORM] Performance bug in prepared statement binding in 9.2?
On Wed, Sep 11, 2013 at 2:10 PM, Andres Freund and...@2ndquadrant.com wrote:

On 2013-09-11 15:06:23 -0400, Andrew Dunstan wrote: One thing that this made me wonder is why we don't have transaction_timeout, or maybe transaction_idle_timeout.

Because it's harder than it sounds, at least if you want to support idle-in-transactions. Note that we do not support pg_cancel_backend() for those yet...

So we are left with pg_terminate_backend in a cron job. That mostly seems to work, because usually apps that abandon connections in the idle-in-transaction state will never check back on them anyway, but cancel would be nicer.

Also, I think it might lead to papering over actual issues with applications leaving transactions open. I don't really see a valid reason for an application needing cancelling of long idle transactions.

Some of us make a living, at least partially, by papering over issues with 3rd party applications that we have no control over!

Cheers, Jeff
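For reference, the cron-job approach mentioned above can be as small as the following sketch (it assumes the 9.2-era pg_stat_activity columns pid, state and state_change; the 30-minute threshold is only an example):

-- Terminate backends that have sat idle in a transaction for too long.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - interval '30 minutes';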
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On Tue, Sep 24, 2013 at 3:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Kevin Grittner kgri...@ymail.com writes: Are we talking about the probe for the end (or beginning) of an index? If so, should we even care about visibility of the row related to the most extreme index entry? Should we even go to the heap during the plan phase?

Consider the case where some transaction inserted a wildly out-of-range value, then rolled back. If we don't check validity of the heap row, we'd be using that silly endpoint value for planning purposes --- indefinitely.

Would it really be indefinite? Would it be any different than if someone inserted a wild value, committed, then deleted it and committed that? It seems like eventually the histogram would have to get rebuilt with the ability to shrink the range.

To get really complicated, it could stop at an in-progress tuple and use its value for immediate purposes, but suppress storing it in the histogram (storing only committed, not in-progress, values).

Cheers, Jeff
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On Tue, Sep 24, 2013 at 10:43 AM, Josh Berkus j...@agliodbs.com wrote:

On 09/24/2013 08:01 AM, jes...@krogh.cc wrote: This stuff is a 9.2 feature, right? What was the original problem to be addressed?

Earlier, actually. 9.1? 9.0? The problem addressed was that, for tables with a progressive value like a sequence or a timestamp, the planner tended to estimate 1 row any time the user queried the 10,000 most recent rows, due to the stats being out of date. This resulted in some colossally bad query plans for a very common situation. So there's no question that the current behavior is an improvement, since it affects *only* users who have left an idle transaction open for long periods of time, something you're not supposed to do anyway.

Some transactions just take a long time to complete their work. If the first thing such a transaction does is insert these poisoned values, and it then goes on to do other intensive work on other tables, it can do serious damage without ever being idle. Not that we shouldn't fix it (and backport the fix), but we don't want to regress to the prior planner behavior.

However, a solution is not readily obvious:

The mergejoinscansel code is almost pathologically designed to exercise this case (which seems to be what is happening for the original poster), because it systematically probes the highest and lowest values from one table against the other. If the two tables have the same range, that means it will always be testing the upper limit. Perhaps mergejoinscansel could pass a flag to prevent the look-up from happening. My gut feeling is that mergejoin would not be very sensitive to the progressive-value issue, but I can't really back that up. On the other hand, if we could just make getting the actual value faster, then everyone would be better off.

On 09/20/2013 03:01 PM, Jeff Janes wrote: 3) Even worse, asking if a given transaction has finished yet can be a serious point of system-wide contention, because it takes the ProcArrayLock, once per row which needs to be checked. So you have 20 processes all fighting over the ProcArrayLock, each doing so 1000 times per query.

Why do we need a procarraylock for this? Seems like the solution would be not to take a lock at all; the information on transaction commit is in the clog, after all.

My understanding is that you are not allowed to check the clog until after you verify the transaction is no longer in progress, otherwise you open up race conditions.

Cheers, Jeff
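A rough way to reproduce the scenario under discussion (a sketch only; the table names and values are hypothetical, and the size of the planning slowdown depends on version, table size, and plan shape):

-- Session 1: a long-running transaction inserts an extreme value on the join column
-- and keeps working (or simply sits idle in transaction).
BEGIN;
INSERT INTO events (id, created_at) VALUES (1, now() + interval '100 years');
-- ... transaction stays open ...

-- Session 2: merely planning a merge-joinable query probes the index endpoint
-- and has to keep checking whether the inserting transaction is still in progress.
EXPLAIN SELECT * FROM events e JOIN event_details d ON e.created_at = d.created_at;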
Re: [PERFORM] Slow plan for MAX/MIN or LIMIT 1?
Hi All and Merlin,

So here is the explain analyze output.

-- Query A -- single row output, but very slow query --
SELECT min(log_id) FROM event_log WHERE event='S-Create' AND insert_time>'2013-09-15' and insert_time<'2013-09-16'
http://explain.depesz.com/s/3H5

Result (cost=134.48..134.49 rows=1 width=0) (actual time=348370.719..348370.720 rows=1 loops=1)
  Output: $0
  InitPlan 1 (returns $0)
    ->  Limit (cost=0.00..134.48 rows=1 width=8) (actual time=348370.712..348370.713 rows=1 loops=1)
          Output: uco.event_log.log_id
          ->  Index Scan using event_log_pkey on uco.event_log (cost=0.00..1525564.02 rows=11344 width=8) (actual time=348370.709..348370.709 rows=1 loops=1)
                Output: uco.event_log.log_id
                Index Cond: (uco.event_log.log_id IS NOT NULL)
                Filter: ((uco.event_log.insert_time > '2013-09-15 00:00:00'::timestamp without time zone) AND (uco.event_log.insert_time < '2013-09-16 00:00:00'::timestamp without time zone) AND (uco.event_log.event = 'S-Create'::text))
                Rows Removed by Filter: 43249789
Total runtime: 348370.762 ms

-- Query B -- multiple row output, fast query, but I could get what I want from the first output row --
SELECT log_id FROM event_log WHERE event='S-Create' AND insert_time>'2013-09-15' and insert_time<'2013-09-16' ORDER BY log_id
http://explain.depesz.com/s/s6P

Sort (cost=41015.85..41021.52 rows=11344 width=8) (actual time=3651.695..3652.160 rows=6948 loops=1)
  Output: log_id
  Sort Key: event_log.log_id
  Sort Method: quicksort  Memory: 518kB
  ->  Bitmap Heap Scan on uco.event_log (cost=311.42..40863.05 rows=11344 width=8) (actual time=448.349..3645.465 rows=6948 loops=1)
        Output: log_id
        Recheck Cond: ((event_log.event = 'S-Create'::text) AND (event_log.insert_time > '2013-09-15 00:00:00'::timestamp without time zone) AND (event_log.insert_time < '2013-09-16 00:00:00'::timestamp without time zone))
        ->  Bitmap Index Scan on event_data_search (cost=0.00..310.86 rows=11344 width=0) (actual time=447.670..447.670 rows=6948 loops=1)
              Index Cond: ((event_log.event = 'S-Create'::text) AND (event_log.insert_time > '2013-09-15 00:00:00'::timestamp without time zone) AND (event_log.insert_time < '2013-09-16 00:00:00'::timestamp without time zone))
Total runtime: 3652.535 ms

P.S. If I put a LIMIT 1 at the end of this query, it gets a plan identical to Query A.

-- My observation: In Query A, the lower-bound cost estimate of the INDEX SCAN node is way off. The scan cannot really produce its first row at cost 0.00, because the filter has to be applied first.

Thanks, Sam
Re: [PERFORM] Slow plan for MAX/MIN or LIMIT 1?
On Tue, Sep 24, 2013 at 4:56 PM, Claudio Freire klaussfre...@gmail.com wrote:

On Tue, Sep 24, 2013 at 6:24 AM, Sam Wong s...@hellosam.net wrote: This event_log table has 4 million rows. “log_id” is the primary key (bigint), there is a composite index “event_data_search” over (event::text, insert_time::datetime).

I think you need to add log_id to that composite index to get pg to use it.

hurk: OP has two statistics misses (one of them massive) that are combining to gobsmack you. your solution unfortunately won't work: you can't combine two range searches in a single index scan. it would probably work if you did it like this. If insert_time is a timestamp, not a timestamptz, we can convert it to date to get what I think he wants (as long as his queries are along date boundaries). how about:

CREATE INDEX ON event_log(event_id, insert_time::date, log_id);

EXPLAIN ANALYZE
SELECT * FROM event_log
WHERE (event_id, insert_time::date, log_id) >= ('S-Create', '2013-09-15'::date, 0)
  AND event_id = 'S-Create'
  AND insert_time::date < '2013-09-16'::date
ORDER BY event_id, insert_time::date, log_id
LIMIT 1

if insert_time is a timestamptz, we can materialize the date into the table to get around that (the timestamptz-to-date cast is only a stable expression). If date boundary handling is awkward, our best bet is probably to hack the planner with a CTE. Note the above query will smoke the CTE based one.

WITH data AS
(
  SELECT log_id FROM event_log
  WHERE event='S-Create' AND insert_time>'2013-09-15' and insert_time<'2013-09-16'
)
SELECT * from data ORDER BY log_id LIMIT 1;

merlin
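A small side note on the sketch above: an expression in an index definition needs its own set of parentheses, and the OP's column is named event rather than event_id, so a concrete (untested) version of the idea might look like the following. It is valid only because insert_time is a plain timestamp, whose cast to date is immutable and therefore indexable:

CREATE INDEX event_log_event_day_id ON event_log (event, (insert_time::date), log_id);

SELECT log_id
FROM event_log
WHERE event = 'S-Create'
  AND insert_time::date = '2013-09-15'::date
ORDER BY event, insert_time::date, log_id
LIMIT 1;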
[PERFORM] earthdistance query performance
Hi,

I have a table with zip_code and latitude and longitude.

\d zip_code_based_lng_lat
Table "public.zip_code_based_lng_lat"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 zip    | character varying(100) |
 state  | character varying(100) |
 city   | character varying(100) |
 type   | character varying(100) |
 lat    | character varying(100) |
 lng    | character varying(100) |
Indexes:
    "zip_code_based_lng_lat_zipidx" btree (zip)

I need to find the closest locations by distance, using a zip_code provided by the user. I built the query like:

select *, earth_distance(q2_c1, q1.c1) as d
from
( select *, ll_to_earth(lat::float,lng::float) as c1 from zip_code_based_lng_lat ) as q1,
( select ll_to_earth(lat::float,lng::float) q2_c1 from zip_code_based_lng_lat where zip='18938' ) as q2
order by d limit 10

Limit (cost=216010.21..216010.24 rows=10 width=55) (actual time=38296.185..38296.191 rows=10 loops=1)
  ->  Sort (cost=216010.21..216415.74 rows=162212 width=55) (actual time=38296.182..38296.182 rows=10 loops=1)
        Sort Key: (sec_to_gc(cube_distance((ll_to_earth((public.zip_code_based_lng_lat.lat)::double precision, (public.zip_code_based_lng_lat.lng)::double precision))::cube, (ll_to_earth((public.zip_code_based_lng_lat.lat)::double precision, (public.zip_code_based_lng_lat.lng)::double precision))::cube)))
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Nested Loop (cost=0.00..212504.87 rows=162212 width=55) (actual time=3.244..38052.444 rows=81106 loops=1)
              ->  Seq Scan on zip_code_based_lng_lat (cost=0.00..817.90 rows=81106 width=38) (actual time=0.025..50.669 rows=81106 loops=1)
              ->  Materialize (cost=0.00..0.32 rows=2 width=17) (actual time=0.000..0.001 rows=1 loops=81106)
                    ->  Index Scan using zip_code_based_lng_lat_zipidx on zip_code_based_lng_lat (cost=0.00..0.31 rows=2 width=17) (actual time=0.080..0.084 rows=1 loops=1)
                          Index Cond: ((zip)::text = '18938'::text)
Total runtime: 38296.360 ms

The result is fine, but it is too slow. I am using PostgreSQL 9.2 with the following parameters:

shared_buffers = 6GB
work_mem = 500MB
seq_page_cost = 0.01
random_page_cost = 0.01

Any ideas how to improve it? Thanks.
Re: [PERFORM] Slow plan for MAX/MIN or LIMIT 1?
On Wed, Sep 25, 2013 at 10:29 AM, Merlin Moncure mmonc...@gmail.com wrote:

On Tue, Sep 24, 2013 at 4:56 PM, Claudio Freire klaussfre...@gmail.com wrote: On Tue, Sep 24, 2013 at 6:24 AM, Sam Wong s...@hellosam.net wrote: This event_log table has 4 million rows. “log_id” is the primary key (bigint), there is a composite index “event_data_search” over (event::text, insert_time::datetime). I think you need to add log_id to that composite index to get pg to use it.

hurk: OP has two statistics misses (one of them massive) that are combining to gobsmack you. your solution unfortunately won't work: you can't combine two range searches in a single index scan. it would probably work if you did it like this. If insert_time is a timestamp, not a timestamptz, we can convert it to date to get what I think he wants (as long as his queries are along date boundaries).

I was thinking an index over:

(event, date_trunc('day', insert_time), log_id)

And the query like

SELECT min(log_id) FROM event_log WHERE event='S-Create' AND date_trunc('day', insert_time) = '2013-09-15'

That's a regular simple range scan over the index.
Re: [PERFORM] Slow plan for MAX/MIN or LIMIT 1?
On Wed, Sep 25, 2013 at 10:20 AM, Claudio Freire klaussfre...@gmail.com wrote:

On Wed, Sep 25, 2013 at 10:29 AM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Sep 24, 2013 at 4:56 PM, Claudio Freire klaussfre...@gmail.com wrote: On Tue, Sep 24, 2013 at 6:24 AM, Sam Wong s...@hellosam.net wrote: This event_log table has 4 million rows. “log_id” is the primary key (bigint), there is a composite index “event_data_search” over (event::text, insert_time::datetime). I think you need to add log_id to that composite index to get pg to use it. hurk: OP has two statistics misses (one of them massive) that are combining to gobsmack you. your solution unfortunately won't work: you can't combine two range searches in a single index scan. it would probably work if you did it like this. If insert_time is a timestamp, not a timestamptz, we can convert it to date to get what I think he wants (as long as his queries are along date boundaries).

I was thinking an index over: (event, date_trunc('day', insert_time), log_id) And the query like SELECT min(log_id) FROM event_log WHERE event='S-Create' AND date_trunc('day', insert_time) = '2013-09-15' That's a regular simple range scan over the index.

*) date_trunc has the same problem as ::date: for timestamptz it is only a stable expression (so it can't be used in an index). also, the index will be bigger, since you're still indexing a timestamp rather than a date

*) a row-wise comparison search might be faster and is generalized to return N records, not just one.

merlin
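Spelled out, Claudio's variant would look roughly like this (a sketch only; it assumes insert_time is timestamp without time zone, for which date_trunc('day', ...) is immutable and therefore allowed in an index expression):

CREATE INDEX event_log_event_day_min ON event_log (event, date_trunc('day', insert_time), log_id);

SELECT min(log_id)
FROM event_log
WHERE event = 'S-Create'
  AND date_trunc('day', insert_time) = '2013-09-15';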
[PERFORM] Troubleshooting query performance issues
I spent about a week optimizing a query in our performance-testing environment, which has hardware similar to production. I was able to refactor the query and reduce the runtime from hours to about 40 seconds, through the use of CTEs and a couple of new indexes.

The database was rebuilt and refreshed with very similar data from production, but now the query takes hours again. In the query plan, it is clear that the row count estimates are WAY too low, even though the statistics are up to date. Here's a sample query plan:

CTE Scan on stef (cost=164.98..165.00 rows=1 width=38)
  CTE terms
    ->  Nested Loop (cost=0.00..62.40 rows=1 width=12)
          ->  Index Scan using term_idx1 on term t (cost=0.00..52.35 rows=1 width=12)
                Index Cond: (partner_id = 497)
                Filter: (recalculate_district_averages_yn AND (NOT is_deleted_yn))
          ->  Index Scan using growth_measurement_window_fk1 on growth_measurement_window gw (cost=0.00..10.04 rows=1 width=4)
                Index Cond: (term_id = t.term_id)
                Filter: (test_window_complete_yn AND (NOT is_deleted_yn) AND ((growth_window_type)::text = 'DISTRICT'::text))
  CTE stef
    ->  Nested Loop (cost=0.00..102.58 rows=1 width=29)
          Join Filter: ((ssef.student_id = terf.student_id) AND (ssef.grade_id = terf.grade_id))
          ->  Nested Loop (cost=0.00..18.80 rows=3 width=28)
                ->  CTE Scan on terms t (cost=0.00..0.02 rows=1 width=8)
                ->  Index Scan using student_school_enrollment_fact_idx2 on student_school_enrollment_fact ssef (cost=0.00..18.74 rows=3 width=20)
                      Index Cond: ((partner_id = t.partner_id) AND (term_id = t.term_id))
                      Filter: primary_yn
          ->  Index Scan using test_event_result_fact_idx3 on test_event_result_fact terf (cost=0.00..27.85 rows=4 width=25)
                Index Cond: ((partner_id = t.partner_id) AND (term_id = t.term_id))
                Filter: growth_event_yn

The estimates in the first CTE are correct, but in the second, the scan on student_school_enrollment_fact will return about 1.5 million rows, and the scan on test_event_result_fact actually returns about 1.1 million. The top-level join should return about 900K rows.

I believe the fundamental issue is that the outer nested loop of CTE stef should be a merge join instead, but I cannot figure out why the optimizer is estimating one row when it has the statistics to correctly estimate the count.

What would cause PG to so badly estimate the row counts? I've already regenerated the indexes and re-analyzed the tables involved. What else can I do to find out why it's running so slowly?
Re: [PERFORM] earthdistance query performance
On Wed, Sep 25, 2013 at 10:05 AM, AI Rumman rumman...@gmail.com wrote:

Hi, I have a table with zip_code and latitude and longitude. \d zip_code_based_lng_lat Table "public.zip_code_based_lng_lat" Column | Type | Modifiers: zip | character varying(100) | state | character varying(100) | city | character varying(100) | type | character varying(100) | lat | character varying(100) | lng | character varying(100) | Indexes: "zip_code_based_lng_lat_zipidx" btree (zip) I need to find the closest locations by distance, using a zip_code provided by the user. I built the query like: select *, earth_distance(q2_c1, q1.c1) as d from ( select *, ll_to_earth(lat::float,lng::float) as c1 from zip_code_based_lng_lat ) as q1, ( select ll_to_earth(lat::float,lng::float) q2_c1 from zip_code_based_lng_lat where zip='18938' ) as q2 order by d limit 10

Limit (cost=216010.21..216010.24 rows=10 width=55) (actual time=38296.185..38296.191 rows=10 loops=1) ->  Sort (cost=216010.21..216415.74 rows=162212 width=55) (actual time=38296.182..38296.182 rows=10 loops=1) Sort Key: (sec_to_gc(cube_distance((ll_to_earth((public.zip_code_based_lng_lat.lat)::double precision, (public.zip_code_based_lng_lat.lng)::double precision))::cube, (ll_to_earth((public.zip_code_based_lng_lat.lat)::double precision, (public.zip_code_based_lng_lat.lng)::double precision))::cube))) Sort Method: top-N heapsort  Memory: 27kB ->  Nested Loop (cost=0.00..212504.87 rows=162212 width=55) (actual time=3.244..38052.444 rows=81106 loops=1) ->  Seq Scan on zip_code_based_lng_lat (cost=0.00..817.90 rows=81106 width=38) (actual time=0.025..50.669 rows=81106 loops=1) ->  Materialize (cost=0.00..0.32 rows=2 width=17) (actual time=0.000..0.001 rows=1 loops=81106) ->  Index Scan using zip_code_based_lng_lat_zipidx on zip_code_based_lng_lat (cost=0.00..0.31 rows=2 width=17) (actual time=0.080..0.084 rows=1 loops=1) Index Cond: ((zip)::text = '18938'::text) Total runtime: 38296.360 ms

The result is fine, but it is too slow. I am using PostgreSQL 9.2 with the following parameters: shared_buffers = 6GB, work_mem = 500MB, seq_page_cost = 0.01, random_page_cost = 0.01. Any ideas how to improve it? Thanks.

your problem is that the sort is happening before the limit. you need to reconfigure your query so that it's compatible with nearest neighbor search (which was introduced with 9.1).

merlin
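One common rewrite along those lines is the earthdistance bounding-box pattern (a sketch only, not true KNN ordering: it simply bounds the search radius so the sort sees few candidate rows; the 50 km radius is an arbitrary cutoff, distances are in meters with the default earth radius, and the expression index is cleaner if lat/lng are stored as float8 rather than varchar):

-- GiST index on the earth location, so the radius filter can prune rows:
CREATE INDEX zip_ll_gist_idx ON zip_code_based_lng_lat
    USING gist (ll_to_earth(lat::float8, lng::float8));

-- 10 nearest zip codes within ~50 km of zip 18938, sorted by exact great-circle distance:
SELECT z.*,
       earth_distance(ll_to_earth(z.lat::float8, z.lng::float8),
                      ll_to_earth(o.lat::float8, o.lng::float8)) AS d
FROM zip_code_based_lng_lat z,
     zip_code_based_lng_lat o
WHERE o.zip = '18938'
  AND earth_box(ll_to_earth(o.lat::float8, o.lng::float8), 50000) @> ll_to_earth(z.lat::float8, z.lng::float8)
ORDER BY d
LIMIT 10;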
Re: [PERFORM] Troubleshooting query performance issues
On Wed, Sep 25, 2013 at 8:58 AM, Jim Garrison jim.garri...@nwea.org wrote:

I spent about a week optimizing a query in our performance-testing environment, which has hardware similar to production. I was able to refactor the query and reduce the runtime from hours to about 40 seconds, through the use of CTEs and a couple of new indexes. The database was rebuilt and refreshed with very similar data from production, but now the query takes hours again. In the query plan, it is clear that the row count estimates are WAY too low, even though the statistics are up to date. Here's a sample query plan:

CTE Scan on stef (cost=164.98..165.00 rows=1 width=38) CTE terms ->  Nested Loop (cost=0.00..62.40 rows=1 width=12) ->  Index Scan using term_idx1 on term t (cost=0.00..52.35 rows=1 width=12) Index Cond: (partner_id = 497) Filter: (recalculate_district_averages_yn AND (NOT is_deleted_yn)) ->  Index Scan using growth_measurement_window_fk1 on growth_measurement_window gw (cost=0.00..10.04 rows=1 width=4) Index Cond: (term_id = t.term_id) Filter: (test_window_complete_yn AND (NOT is_deleted_yn) AND ((growth_window_type)::text = 'DISTRICT'::text)) CTE stef ->  Nested Loop (cost=0.00..102.58 rows=1 width=29) Join Filter: ((ssef.student_id = terf.student_id) AND (ssef.grade_id = terf.grade_id)) ->  Nested Loop (cost=0.00..18.80 rows=3 width=28) ->  CTE Scan on terms t (cost=0.00..0.02 rows=1 width=8) ->  Index Scan using student_school_enrollment_fact_idx2 on student_school_enrollment_fact ssef (cost=0.00..18.74 rows=3 width=20) Index Cond: ((partner_id = t.partner_id) AND (term_id = t.term_id)) Filter: primary_yn ->  Index Scan using test_event_result_fact_idx3 on test_event_result_fact terf (cost=0.00..27.85 rows=4 width=25) Index Cond: ((partner_id = t.partner_id) AND (term_id = t.term_id)) Filter: growth_event_yn

The estimates in the first CTE are correct, but in the second, the scan on student_school_enrollment_fact will return about 1.5 million rows, and the scan on test_event_result_fact actually returns about 1.1 million. The top-level join should return about 900K rows. I believe the fundamental issue is that the outer nested loop of CTE stef should be a merge join instead, but I cannot figure out why the optimizer is estimating one row when it has the statistics to correctly estimate the count. What would cause PG to so badly estimate the row counts? I've already regenerated the indexes and re-analyzed the tables involved. What else can I do to find out why it's running so slowly?

More details about the environment would probably be helpful: https://wiki.postgresql.org/wiki/Slow_Query_Questions

Are you able to swap out the CTE for a temp table and index that (+analyze) to compare against the CTE version?
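For what it's worth, the temp-table experiment could look roughly like this (a sketch reconstructed from the plan above; the exact predicates, column list, and index are assumptions):

-- Materialize the first CTE's result so the planner gets real statistics for the later joins:
CREATE TEMP TABLE terms_tmp AS
SELECT t.partner_id, t.term_id
FROM term t
JOIN growth_measurement_window gw ON gw.term_id = t.term_id
WHERE t.partner_id = 497
  AND t.recalculate_district_averages_yn AND NOT t.is_deleted_yn
  AND gw.test_window_complete_yn AND NOT gw.is_deleted_yn
  AND gw.growth_window_type = 'DISTRICT';

CREATE INDEX ON terms_tmp (partner_id, term_id);
ANALYZE terms_tmp;

-- Then join the fact tables against terms_tmp instead of the "terms" CTE.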
Re: [PERFORM] Slow plan for MAX/MIN or LIMIT 1?
-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Merlin Moncure
Sent: Wednesday, September 25, 2013 23:55
To: Claudio Freire
Cc: Sam Wong; postgres performance list
Subject: Re: [PERFORM] Slow plan for MAX/MIN or LIMIT 1?

On Wed, Sep 25, 2013 at 10:20 AM, Claudio Freire klaussfre...@gmail.com wrote: On Wed, Sep 25, 2013 at 10:29 AM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Sep 24, 2013 at 4:56 PM, Claudio Freire klaussfre...@gmail.com wrote: On Tue, Sep 24, 2013 at 6:24 AM, Sam Wong s...@hellosam.net wrote: This event_log table has 4 million rows. log_id is the primary key (bigint), there is a composite index event_data_search over (event::text, insert_time::datetime). I think you need to add log_id to that composite index to get pg to use it. hurk: OP has two statistics misses (one of them massive) that are combining to gobsmack you. your solution unfortunately won't work: you can't combine two range searches in a single index scan. it would probably work if you did it like this. If insert_time is a timestamp, not a timestamptz, we can convert it to date to get what I think he wants (as long as his queries are along date boundaries). I was thinking an index over: (event, date_trunc('day', insert_time), log_id) And the query like SELECT min(log_id) FROM event_log WHERE event='S-Create' AND date_trunc('day', insert_time) = '2013-09-15' That's a regular simple range scan over the index. *) date_trunc has the same problem as ::date: for timestamptz it is only a stable expression. also, the index will be bigger since you're still indexing a timestamp *) a row-wise comparison search might be faster and is generalized to return N records, not just one. merlin

I'm afraid it's not anything about the composite index. (Query B works fine and its plan is as expected.) BTW, the timestamp is without timezone.

I just thought of another way to demonstrate the issue. Both queries return the same one-row result. log_id is the bigint primary key, and event_data_search is still indexed as before.
--- Fast query ---

with Q AS (
  select event from event_log WHERE log_id>1 and log_id<5 order by event
)
SELECT * FROM Q LIMIT 1

Limit (cost=2521.82..2521.83 rows=1 width=32) (actual time=88.342..88.342 rows=1 loops=1)
  Output: q.event
  Buffers: shared hit=93 read=622
  CTE q
    ->  Sort (cost=2502.07..2521.82 rows=39502 width=6) (actual time=88.335..88.335 rows=1 loops=1)
          Output: event_log.event
          Sort Key: event_log.event
          Sort Method: quicksort  Memory: 3486kB
          Buffers: shared hit=93 read=622
          ->  Index Scan using event_log_pkey on uco.event_log (cost=0.00..1898.89 rows=39502 width=6) (actual time=13.918..65.573 rows=3 loops=1)
                Output: event_log.event
                Index Cond: ((event_log.log_id > 101) AND (event_log.log_id < 105))
                Buffers: shared hit=93 read=622
  ->  CTE Scan on q (cost=0.00..237.01 rows=39502 width=32) (actual time=88.340..88.340 rows=1 loops=1)
        Output: q.event
        Buffers: shared hit=93 read=622
Total runtime: 89.039 ms

--- Slow Query ---

Result (cost=1241.05..1241.05 rows=1 width=0) (actual time=1099.532..1099.533 rows=1 loops=1)
  Output: $0
  Buffers: shared hit=49029 read=57866
  InitPlan 1 (returns $0)
    ->  Limit (cost=0.00..1241.05 rows=1 width=6) (actual time=1099.527..1099.527 rows=1 loops=1)
          Output: uco.event_log.event
          Buffers: shared hit=49029 read=57866
          ->  Index Scan using event_data_search on uco.event_log (cost=0.00..49024009.79 rows=39502 width=6) (actual time=1099.523..1099.523 rows=1 loops=1)
                Output: uco.event_log.event
                Index Cond: (uco.event_log.event IS NOT NULL)
                Filter: ((uco.event_log.log_id > 101) AND (uco.event_log.log_id < 105))
                Rows Removed by Filter: 303884
                Buffers: shared hit=49029 read=57866
Total runtime: 1099.568 ms

(Note: things got buffered so it goes down to 1 second, but comparing the buffer counts with the query above, this is a few orders of magnitude slower than that.)

---

The CTE fast query works: it is completed with an index scan over event_log_pkey, which is what I am expecting, and that is good. But it is much more straightforward to write it as the slow query, and I would expect the planner to give the same optimal plan for both.

The Slow query plan has an estimated total cost of 1241.05, and the Fast query 2521.83, hence the planner prefers that plan - the scan over the event_data_search index - but this choice doesn't make sense to me. That is the point I want to bring up: why would the planner do that, instead of scanning over event_log_pkey?

Looking into the estimated cost of the Slow Query plan, the Index Scan node's lower-bound estimate is 0.00. IIRC,
Re: [PERFORM] Slow plan for MAX/MIN or LIMIT 1?
-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Sam Wong
Sent: Thursday, September 26, 2013 0:34
To: 'postgres performance list'
Subject: Re: [PERFORM] Slow plan for MAX/MIN or LIMIT 1?

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Merlin Moncure
Sent: Wednesday, September 25, 2013 23:55
To: Claudio Freire
Cc: Sam Wong; postgres performance list
Subject: Re: [PERFORM] Slow plan for MAX/MIN or LIMIT 1?

On Wed, Sep 25, 2013 at 10:20 AM, Claudio Freire klaussfre...@gmail.com wrote: On Wed, Sep 25, 2013 at 10:29 AM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Sep 24, 2013 at 4:56 PM, Claudio Freire klaussfre...@gmail.com wrote: On Tue, Sep 24, 2013 at 6:24 AM, Sam Wong s...@hellosam.net wrote: This event_log table has 4 million rows. log_id is the primary key (bigint), there is a composite index event_data_search over (event::text, insert_time::datetime). I think you need to add log_id to that composite index to get pg to use it. hurk: OP has two statistics misses (one of them massive) that are combining to gobsmack you. your solution unfortunately won't work: you can't combine two range searches in a single index scan. it would probably work if you did it like this. If insert_time is a timestamp, not a timestamptz, we can convert it to date to get what I think he wants (as long as his queries are along date boundaries). I was thinking an index over: (event, date_trunc('day', insert_time), log_id) And the query like SELECT min(log_id) FROM event_log WHERE event='S-Create' AND date_trunc('day', insert_time) = '2013-09-15' That's a regular simple range scan over the index. *) date_trunc has the same problem as ::date: for timestamptz it is only a stable expression. also, the index will be bigger since you're still indexing a timestamp *) a row-wise comparison search might be faster and is generalized to return N records, not just one. merlin

I'm afraid it's not anything about the composite index. (Query B works fine and its plan is as expected.) BTW, the timestamp is without timezone. I just thought of another way to demonstrate the issue. Both queries return the same one-row result. log_id is the bigint primary key, and event_data_search is still indexed as before.

--- Fast query ---

Excuse me, this is the actual query.

with Q AS (
  select event from event_log WHERE log_id>101 and log_id<105 order by event
)
SELECT * FROM Q LIMIT 1

Limit (cost=2521.82..2521.83 rows=1 width=32) (actual time=88.342..88.342 rows=1 loops=1)
  Output: q.event
  Buffers: shared hit=93 read=622
  CTE q
    ->  Sort (cost=2502.07..2521.82 rows=39502 width=6) (actual time=88.335..88.335 rows=1 loops=1)
          Output: event_log.event
          Sort Key: event_log.event
          Sort Method: quicksort  Memory: 3486kB
          Buffers: shared hit=93 read=622
          ->  Index Scan using event_log_pkey on uco.event_log (cost=0.00..1898.89 rows=39502 width=6) (actual time=13.918..65.573 rows=3 loops=1)
                Output: event_log.event
                Index Cond: ((event_log.log_id > 101) AND (event_log.log_id < 105))
                Buffers: shared hit=93 read=622
  ->  CTE Scan on q (cost=0.00..237.01 rows=39502 width=32) (actual time=88.340..88.340 rows=1 loops=1)
        Output: q.event
        Buffers: shared hit=93 read=622
Total runtime: 89.039 ms

--- Slow Query ---

And the query for this... I must have forgotten to paste it:
select min(event) from event_log WHERE log_id>101 and log_id<105

Result (cost=1241.05..1241.05 rows=1 width=0) (actual time=1099.532..1099.533 rows=1 loops=1)
  Output: $0
  Buffers: shared hit=49029 read=57866
  InitPlan 1 (returns $0)
    ->  Limit (cost=0.00..1241.05 rows=1 width=6) (actual time=1099.527..1099.527 rows=1 loops=1)
          Output: uco.event_log.event
          Buffers: shared hit=49029 read=57866
          ->  Index Scan using event_data_search on uco.event_log (cost=0.00..49024009.79 rows=39502 width=6) (actual time=1099.523..1099.523 rows=1 loops=1)
                Output: uco.event_log.event
                Index Cond: (uco.event_log.event IS NOT NULL)
                Filter: ((uco.event_log.log_id > 101) AND (uco.event_log.log_id < 105))
                Rows Removed by Filter: 303884
                Buffers: shared hit=49029 read=57866
Total runtime: 1099.568 ms

(Note: things got buffered so it goes down to 1 second, but comparing the buffer counts with the query above, this is a few orders of magnitude slower than that.)

---

The CTE fast query works: it is completed with an index scan over event_log_pkey, which is what I am expecting, and that is good. But it's much
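For anyone else hitting this, the CTE form Sam shows is itself the usual workaround: in these releases a CTE acts as an optimization fence, so wrapping the small primary-key range in WITH keeps the min()/LIMIT 1 rewrite from switching to the event index. A minimal sketch of the same idea written directly with min():

WITH q AS (
    SELECT event
    FROM event_log
    WHERE log_id > 101 AND log_id < 105
)
SELECT min(event) FROM q;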
Re: [PERFORM] Slow plan for MAX/MIN or LIMIT 1?
On Wed, Sep 25, 2013 at 12:54 PM, Merlin Moncure mmonc...@gmail.com wrote:

I was thinking an index over: (event, date_trunc('day', insert_time), log_id) And the query like SELECT min(log_id) FROM event_log WHERE event='S-Create' AND date_trunc('day', insert_time) = '2013-09-15' That's a regular simple range scan over the index.

*) date_trunc has the same problem as ::date: for timestamptz it is only a stable expression. also, the index will be bigger since you're still indexing a timestamp

Ah, yes, good point.
Re: [PERFORM] Why is n_distinct always -1 for range types?
On 09/19/2013 01:47 PM, Josh Berkus wrote:

Test:
1. create a table with a range type column.
2. insert 1000 identical values into that column.
3. analyze
4. n-distinct will still be listed as -1 (unique) for the column.

Why? Anyone?

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
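A minimal script to reproduce the test described above (the table name and range value are arbitrary examples):

CREATE TABLE range_test (r int4range);
INSERT INTO range_test SELECT '[1,10)'::int4range FROM generate_series(1, 1000);
ANALYZE range_test;

SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'range_test' AND attname = 'r';
-- n_distinct comes back as -1 even though every row holds the same value.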
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On 09/25/2013 12:06 AM, Jeff Janes wrote:

Why do we need a procarraylock for this? Seems like the solution would be not to take a lock at all; the information on transaction commit is in the clog, after all.

My understanding is that you are not allowed to check the clog until after you verify the transaction is no longer in progress, otherwise you open up race conditions.

In this particular case, I'd argue that we don't care about race conditions -- it's a plan estimate. We certainly care about them a lot less than lock-blocks.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On 2013-09-25 00:06:06 -0700, Jeff Janes wrote:

On 09/20/2013 03:01 PM, Jeff Janes wrote: 3) Even worse, asking if a given transaction has finished yet can be a serious point of system-wide contention, because it takes the ProcArrayLock, once per row which needs to be checked. So you have 20 processes all fighting over the ProcArrayLock, each doing so 1000 times per query.

That should be gone in master; we don't use SnapshotNow anymore, which had those TransactionIdIsInProgress() calls you're probably referring to. The lookups discussed in this thread now use the statement's snapshot, and all those have their own copy of the currently running transactions.

Why do we need a procarraylock for this? Seems like the solution would be not to take a lock at all; the information on transaction commit is in the clog, after all.

More clog accesses would hardly improve the situation.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Why is n_distinct always -1 for range types?
On Thu, Sep 19, 2013 at 1:47 PM, Josh Berkus j...@agliodbs.com wrote:

4. n-distinct will still be listed as -1 (unique) for the column. Why?

Because of this: https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/rangetypes_typanalyze.c#L205

We only collect and use histograms of lower and upper bounds for range types, and the fraction of empty ranges.

-- Regards, Peter Geoghegan
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On Wed, Sep 25, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote:

On 2013-09-25 00:06:06 -0700, Jeff Janes wrote: On 09/20/2013 03:01 PM, Jeff Janes wrote: 3) Even worse, asking if a given transaction has finished yet can be a serious point of system-wide contention, because it takes the ProcArrayLock, once per row which needs to be checked. So you have 20 processes all fighting over the ProcArrayLock, each doing so 1000 times per query.

That should be gone in master, we don't use SnapshotNow anymore which had those TransactionIdIsInProgress() calls you're probably referring to. The lookups discussed in this thread now use the statement's snapshot. And all those have their own copy of the currently running transactions.

See HeapTupleSatisfiesMVCC, near line 943 of tqual.c:

    else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
        return false;
    else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
        SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
                    HeapTupleHeaderGetXmin(tuple));

If we guarded that check by moving up line 961 to before 943:

    if (XidInMVCCSnapshot(HeapTupleHeaderGetXmin(tuple), snapshot))
        return false;        /* treat as still in progress */

Then we could avoid the contention, as that check only refers to local memory.

As far as I can tell, the only downside of doing that is that, since hint bits might be set later, it is possible some dirty pages will get written unhinted and then re-dirtied by the hint bit setting, when more aggressive setting would have only one combined dirty write instead. But that seems rather hypothetical, and if it really is a problem we should probably tackle it directly rather than by barring other optimizations.

Cheers, Jeff
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On 2013-09-25 11:17:51 -0700, Jeff Janes wrote:

On Wed, Sep 25, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-25 00:06:06 -0700, Jeff Janes wrote: On 09/20/2013 03:01 PM, Jeff Janes wrote: 3) Even worse, asking if a given transaction has finished yet can be a serious point of system-wide contention, because it takes the ProcArrayLock, once per row which needs to be checked. So you have 20 processes all fighting over the ProcArrayLock, each doing so 1000 times per query. That should be gone in master, we don't use SnapshotNow anymore which had those TransactionIdIsInProgress() calls you're probably referring to. The lookups discussed in this thread now use the statement's snapshot. And all those have their own copy of the currently running transactions.

See HeapTupleSatisfiesMVCC, near line 943 of tqual.c:

    else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
        return false;
    else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
        SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
                    HeapTupleHeaderGetXmin(tuple));

Hm, sorry, misremembered things a bit there.

If we guarded that check by moving up line 961 to before 943:

    if (XidInMVCCSnapshot(HeapTupleHeaderGetXmin(tuple), snapshot))
        return false;        /* treat as still in progress */

Then we could avoid the contention, as that check only refers to local memory.

That wouldn't be correct afaics - the current TransactionIdIsInProgress callsite is only reached when HEAP_XMIN_COMMITTED was not set, so you would have to duplicate it.

As far as I can tell, the only downside of doing that is that, since hint bits might be set later, it is possible some dirty pages will get written unhinted and then re-dirtied by the hint bit setting, when more aggressive setting would have only one combined dirty write instead. But that seems rather hypothetical, and if it really is a problem we should probably tackle it directly rather than by barring other optimizations.

I am - as evidenced - too tired to think about this properly, but I think you might be right here.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services