Re: [GENERAL] Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?
On Jun 20, 2016, at 1:36 PM, David G. Johnston mailto:david.g.johns...@gmail.com>> wrote: Please don't top-post. Yes, using "[VACUUM] ANALYZE" is what was meant (the VACUUM doesn't hurt - and possibly helps - but wasn't required). ANALYZE recomputes the statistics for your database. The apparent problem was that those statistics were wrong which causes the planner to choose the wrong plan. EXPLAIN ANALYZE doesn't help in that scenario - the explain and planner both end up using the bad statistics. The addition of ANALYZE to EXPLAIN simply tells the system to not only explain your query but to execute it as well (but discard the results). Aside from sharing the same 7 characters the two words have nothing in common. I'm not sure how one would "...includ[e] ANALYZE in EXPLAIN ANALYZE" - you only get to explain one statement at a time. David J. Thank you, David. My confusion originated from a lack of familiarity with the ANALYZE command. Your responses and Tom’s response have been very enlightening. The head-scratcher for us is that our statistics became so out of date even though we have the autovacuum daemon enabled in RDS, and according to the docs that does run ANALYZE periodically. Now we know (thanks to your help) to check for this issue immediately when the planner is showing a large disparity between the estimated and actual cost. Markus E.
Re: [GENERAL] Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?
On Fri, Jun 17, 2016 at 5:05 PM, Erdmann, Markus @ Bellevue < markus.erdm...@cbre.com> wrote: > Thank you Tom and David for your very helpful replies. We dumped and > restored the RDS staging database on a local installation of pg and were > not able to reproduce the issue in 9.5.2, which led us to try running a > VACUUM ANALYZE and recreating indexes. After this we no longer saw a > discrepancy between the query plan in the separate environments. Is this > what you meant, Tom, by making sure to ANALYZE? Or did you mean including > ANALYZE in EXPLAIN ANALYZE? > > Please don't top-post. Yes, using "[VACUUM] ANALYZE" is what was meant (the VACUUM doesn't hurt - and possibly helps - but wasn't required). ANALYZE recomputes the statistics for your database. The apparent problem was that those statistics were wrong which causes the planner to choose the wrong plan. EXPLAIN ANALYZE doesn't help in that scenario - the explain and planner both end up using the bad statistics. The addition of ANALYZE to EXPLAIN simply tells the system to not only explain your query but to execute it as well (but discard the results). Aside from sharing the same 7 characters the two words have nothing in common. I'm not sure how one would "...includ[e] ANALYZE in EXPLAIN ANALYZE" - you only get to explain one statement at a time. David J.
Re: [GENERAL] Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?
Thank you Tom and David for your very helpful replies. We dumped and restored the RDS staging database on a local installation of pg and were not able to reproduce the issue in 9.5.2, which led us to try running a VACUUM ANALYZE and recreating indexes. After this we no longer saw a discrepancy between the query plan in the separate environments. Is this what you meant, Tom, by making sure to ANALYZE? Or did you mean including ANALYZE in EXPLAIN ANALYZE? On 6/17/16, 11:17 AM, "Tom Lane" wrote: >"Erdmann, Markus @ Bellevue" writes: >> We¹re trying to debug a performance issue affecting our staging >>database, and we¹ve narrowed it down to a difference in the query >>optimizer in 9.5.2. Upgrading to 9.5 is important for us because we need >>the ability to import foreign schemas. > >I think the core of the problem is the large rowcount estimation error >here: > >> -> Bitmap Index Scan on >>transactions_transaction_c69e55a4 (cost=0.00..18.02 rows=161 width=0) >>(actual time=20.153..20.153 rows=269021 loops=1) >> Index Cond: ((date_created >= >>'2010-01-01'::date) AND (date_created <= '2015-12-31'::date)) > >That's a pretty simple condition and it's hard to believe that 9.5 does it >much differently than 9.4 did. Perhaps you forgot to ANALYZE, or were >using a larger statistics target in the 9.4 installation? > > regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?
"Erdmann, Markus @ Bellevue" writes: > Were trying to debug a performance issue affecting our staging database, and > weve narrowed it down to a difference in the query optimizer in 9.5.2. > Upgrading to 9.5 is important for us because we need the ability to import > foreign schemas. I think the core of the problem is the large rowcount estimation error here: > -> Bitmap Index Scan on > transactions_transaction_c69e55a4 (cost=0.00..18.02 rows=161 width=0) > (actual time=20.153..20.153 rows=269021 loops=1) > Index Cond: ((date_created >= > '2010-01-01'::date) AND (date_created <= '2015-12-31'::date)) That's a pretty simple condition and it's hard to believe that 9.5 does it much differently than 9.4 did. Perhaps you forgot to ANALYZE, or were using a larger statistics target in the 9.4 installation? regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?
On Thu, Jun 16, 2016 at 8:40 PM, Erdmann, Markus @ Bellevue < markus.erdm...@cbre.com> wrote: > Hello, > > We’re trying to debug a performance issue affecting our staging database, > and we’ve narrowed it down to a difference in the query optimizer in 9.5.2. > Upgrading to 9.5 is important for us because we need the ability to import > foreign schemas. > > This is the query we’re running: > > CREATE TEMP TABLE tmp_joined_transactions_9gkgptn5xcp9 ( transaction_id > integer PRIMARY KEY ); > > INSERT INTO tmp_joined_transactions_9gkgptn5xcp9 (transaction_id) > This... > SELECT DISTINCT ON ("transactions_transaction"."id") > "transactions_transaction"."id" > DISTINCT is a code smell. DISTINCT ON less so - it helps to avoid self-joins - but your inclusion of ON here is pointless since the only output column is "id". As written there should be no way to get duplicate "id"s into the output result. Or, if the tmp_joined_transactions relationship is 1-to-many you should instead use a semi-join instead of an inner join. FROM "transactions_transaction" , "tmp_joined_transactions_75chlsokrsev" > WHERE > Here... (NOT ("transactions_transaction"."id" > IN (SELECT U0."id" AS Col1 > FROM "transactions_transaction" U0 > LEFT OUTER JOIN "transactions_commission" U1 > ON ( U0."id" = U1."transaction_id" ) > WHERE U1."id" IS NULL)) > Making this an anti-join (NOT EXISTS instead of NOT IN) stands out as an easy improvement to try: It also makes the logic clearer since you seem to have a double-negative here which means you really want a semi-join (which I wrote below) WHERE EXISTS (SELECT 1 FROM transactions_commission WHERE transactions_transaction.id = transactions_commission.transaction_id) I won't promise this gives the same answer...I don't have enough spare brain power or the ability to test it...but its seems correct. AND "transactions_transaction"."date_created" >= '2010-01-01'::date > AND "transactions_transaction"."date_created" <= '2015-12-31'::date > AND "transactions_transaction"."deal_status" IN (1) > Also... > AND (transactions_transaction.id = > tmp_joined_transactions_75chlsokrsev.transaction_id)) > This is style but I'm really a fan of using ANSI JOIN syntax...turning the above into a ORDER BY "transactions_transaction"."id" ASC; > > The regression itself someone else would need to comment on. David J.