Re: [PERFORM] Fast HashJoin only after a cluster/recreate table
Hi David,

Thanks for the explanations. But I don't understand why, when I recreate the table, the planner chooses the better plan for a while...

>I wonder what the planner would do if you pulled out the join to ES08T. If that generates a better plan, then providing that es08tipdoc is the primary key of that table, then you could just put a subquery in the SELECT clause to lookup the es08desdoc.

There is a problem with this approach: these queries are generated by a framework and can't be rewritten. Can you think of another solution directly in the DB (perhaps a partial index, table partitioning, etc.)???

Best regards,

Alexandre

2016-04-01 10:17 GMT-03:00 David Rowley:
> On 1 April 2016 at 15:44, Alexandre de Arruda Paes
> wrote:
>
>> In the query below, the planner chooses an extremely slow merge join (380
>> seconds). 'VACUUM ANALYZE' can't help.
>
> Yeah, it appears that the planner is estimating the WHERE clause on es09t
> quite badly, expecting just 1 row when there are actually 1212 rows. This
> seems to throw the whole plan decision out quite a bit: if you notice, the
> merge left join for t1.es09tipdoc = t2.es08tipdoc expects just 2 rows, so
> the planner most likely thinks it's not worth sorting those results on
> t1.es09tipdoc, t1.es09numdoc to match the known output order of the
> Materialize node on the inner side of that join. Instead it assumes the
> merge join will be good enough on just the es09tipdoc order, and adds the
> other two columns as join filters
>
> ... and this seems to be what's killing the performance. This Merge Join
> constantly has to perform a mark/restore on the Materialize node. This is
> why you're getting the insanely high "Rows Removed by Join Filter:
> 992875295"; in other words, the join filter threw away that many row
> combinations because they didn't match.
>
> This mark/restore is basically the rewind process that the merge join
> algorithm needs to do to match many rows to many rows.
> In actual fact, this rewind is pretty useless in this case, as the GROUP
> BY subquery ensures that no duplicate values will make it into the inner
> side of that merge join. The planner is not currently smart enough to
> detect this.
>
> There are some patches currently pending for 9.6 which might help fix this
> problem in the future:
>
> 1. Multivariate statistics; this might help the poor estimates on es09t.
> If this were available you could add statistics on es09codemp, es09datreq,
> which may well improve the estimate and cause the plan to change.
> https://commitfest.postgresql.org/9/450/
>
> 2. Unique joins. This tackles the problem a different way and allows the
> Merge Join algorithm to skip the restore, with some new smarts added to
> the planner to detect when the inner side of the join can produce, at
> most, a single row for each outer row.
> https://commitfest.postgresql.org/9/129/
>
> If you feel like compiling 9.6 devel from source and applying each of
> these patches independently, you could see whether either helps. Of course
> that does not solve your 9.4 production dilemma, but it may help the
> evaluation of each of these two patches for 9.6 or beyond.
>
> I wonder what the planner would do if you pulled out the join to ES08T. If
> that generates a better plan, then provided that es08tipdoc is the primary
> key of that table, you could just put a subquery in the SELECT clause
> to look up the es08desdoc.
>>
>> QUERY PLAN
>> --
>> Sort (cost=289546.93..289546.94 rows=2 width=78) (actual time=380405.796..380405.929 rows=2408 loops=1)
>>   Sort Key: t1.es09numdoc, t1.es09tipdoc
>>   Sort Method: quicksort Memory: 435kB
>>   Buffers: shared hit=82163
>>   -> Merge Left Join (cost=47.09..289546.92 rows=2 width=78) (actual time=1133.077..380398.160 rows=2408 loops=1)
>>      Merge Cond: (t1.es09tipdoc = es09t1.es09tipdoc)
>>      Join Filter: ((es09t1.es09codemp = t1.es09codemp) AND (es09t1.es09numdoc = t1.es09numdoc))
>>      Rows Removed by Join Filter: 992875295
>>      Buffers: shared hit=82163
>>      -> Merge Left Join (cost=46.53..49.29 rows=2 width=70) (actual time=12.206..18.155 rows=2408 loops=1)
>>         Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)
>>         Buffers: shared hit=6821
>>         -> Sort (cost=9.19..9.19 rows=2 width=44) (actual time=11.611..12.248 rows=2408 loops=1)
>>            Sort Key: t1.es09tipdoc
>>            Sort Method: quicksort Memory: 285kB
>>            Buffers: shared hit=6814
>>            -> Nested Loop (cost=1.11..9.18 rows=2 width=44) (actual time=0.040..10.398 rows=2408 loops=1)
>>               Buffers: shared hit=6814
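For readers finding this thread later: the multivariate statistics patch mentioned in the quoted reply eventually shipped as CREATE STATISTICS in PostgreSQL 10, so on a new-enough server the bad estimate on es09t could be attacked directly in the DB, roughly as follows. This is only a sketch; the statistics object name is made up, and the feature is not available on the 9.4 server discussed here.

```sql
-- Sketch only: requires PostgreSQL 10 or later.
-- Teach the planner about the correlation between the two
-- columns that the misestimated WHERE clause filters on.
CREATE STATISTICS es09t_codemp_datreq (dependencies)
    ON es09codemp, es09datreq
    FROM es09t;

-- Re-analyze so the planner can pick up the new statistics object.
ANALYZE es09t;
```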
Re: [PERFORM] Fast HashJoin only after a cluster/recreate table
On 1 April 2016 at 15:44, Alexandre de Arruda Paes wrote:
> In the query below, the planner chooses an extremely slow merge join (380
> seconds). 'VACUUM ANALYZE' can't help.

Yeah, it appears that the planner is estimating the WHERE clause on es09t quite badly, expecting just 1 row when there are actually 1212 rows. This seems to throw the whole plan decision out quite a bit: if you notice, the merge left join for t1.es09tipdoc = t2.es08tipdoc expects just 2 rows, so the planner most likely thinks it's not worth sorting those results on t1.es09tipdoc, t1.es09numdoc to match the known output order of the Materialize node on the inner side of that join. Instead it assumes the merge join will be good enough on just the es09tipdoc order, and adds the other two columns as join filters

... and this seems to be what's killing the performance. This Merge Join constantly has to perform a mark/restore on the Materialize node. This is why you're getting the insanely high "Rows Removed by Join Filter: 992875295"; in other words, the join filter threw away that many row combinations because they didn't match.

This mark/restore is basically the rewind process that the merge join algorithm needs to do to match many rows to many rows. In actual fact, this rewind is pretty useless in this case, as the GROUP BY subquery ensures that no duplicate values will make it into the inner side of that merge join. The planner is not currently smart enough to detect this.

There are some patches currently pending for 9.6 which might help fix this problem in the future:

1. Multivariate statistics; this might help the poor estimates on es09t. If this were available you could add statistics on es09codemp, es09datreq, which may well improve the estimate and cause the plan to change. https://commitfest.postgresql.org/9/450/

2. Unique joins.
This tackles the problem a different way and allows the Merge Join algorithm to skip the restore, with some new smarts added to the planner to detect when the inner side of the join can produce, at most, a single row for each outer row. https://commitfest.postgresql.org/9/129/

If you feel like compiling 9.6 devel from source and applying each of these patches independently, you could see whether either helps. Of course that does not solve your 9.4 production dilemma, but it may help the evaluation of each of these two patches for 9.6 or beyond.

I wonder what the planner would do if you pulled out the join to ES08T. If that generates a better plan, then provided that es08tipdoc is the primary key of that table, you could just put a subquery in the SELECT clause to look up the es08desdoc.

>
> QUERY PLAN
> --
> Sort (cost=289546.93..289546.94 rows=2 width=78) (actual time=380405.796..380405.929 rows=2408 loops=1)
>   Sort Key: t1.es09numdoc, t1.es09tipdoc
>   Sort Method: quicksort Memory: 435kB
>   Buffers: shared hit=82163
>   -> Merge Left Join (cost=47.09..289546.92 rows=2 width=78) (actual time=1133.077..380398.160 rows=2408 loops=1)
>      Merge Cond: (t1.es09tipdoc = es09t1.es09tipdoc)
>      Join Filter: ((es09t1.es09codemp = t1.es09codemp) AND (es09t1.es09numdoc = t1.es09numdoc))
>      Rows Removed by Join Filter: 992875295
>      Buffers: shared hit=82163
>      -> Merge Left Join (cost=46.53..49.29 rows=2 width=70) (actual time=12.206..18.155 rows=2408 loops=1)
>         Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)
>         Buffers: shared hit=6821
>         -> Sort (cost=9.19..9.19 rows=2 width=44) (actual time=11.611..12.248 rows=2408 loops=1)
>            Sort Key: t1.es09tipdoc
>            Sort Method: quicksort Memory: 285kB
>            Buffers: shared hit=6814
>            -> Nested Loop (cost=1.11..9.18 rows=2 width=44) (actual time=0.040..10.398 rows=2408 loops=1)
>               Buffers: shared hit=6814
>               -> Index Scan using ad_es09t_1 on es09t t3 (cost=0.56..4.58 rows=1 width=42) (actual time=0.020..0.687 rows=1212 loops=1)
>                  Index Cond: ((es09codemp = 1) AND (es09datreq >= '2016-02-02'::date) AND (es09datreq <= '2016-02-02'::date))
>                  Filter: (es09usuari ~~ ''::text)
>                  Buffers: shared hit=108
>               -> Index Scan using es09t1_pkey on es09t1 t1 (cost=0.56..4.59 rows=1 width=19) (actual time=0.006..0.007 rows=2 loops=1212)
>                  Index Cond: ((es09codemp = 1) AND (es09tipdoc = t3.es09tipdoc) AND (es09numdoc = t3.es09numdoc))
>                  Filter: (es09tipdoc ~~ '%'::text)
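For illustration, the suggested experiment could look like the query below, built from the query in the original post: the LEFT JOIN to ES08T is removed and es08desdoc is fetched with a correlated subquery in the SELECT list instead. This is a sketch, and it assumes es08tipdoc really is ES08T's primary key, so the subquery returns at most one row.

```sql
SELECT T1.es09item, T1.es09status, T3.es09usuari, T3.es09datreq,
       -- correlated lookup replaces the LEFT JOIN to ES08T
       (SELECT es08desdoc
          FROM ES08T
         WHERE es08tipdoc = T1.es09tipdoc) AS es09desdoc,
       T1.es09numdoc, T1.es09tipdoc AS es09tipdoc, T1.es09codemp,
       COALESCE(T4.es09quatre, 0) AS es09quatre
FROM ((ES09T1 T1
  LEFT JOIN ES09T T3 ON T3.es09codemp = T1.es09codemp
                    AND T3.es09tipdoc = T1.es09tipdoc
                    AND T3.es09numdoc = T1.es09numdoc)
  LEFT JOIN (SELECT COUNT(*) AS es09quatre, es09codemp, es09tipdoc, es09numdoc
               FROM ES09T1
              GROUP BY es09codemp, es09tipdoc, es09numdoc) T4
         ON T4.es09codemp = T1.es09codemp
        AND T4.es09tipdoc = T1.es09tipdoc
        AND T4.es09numdoc = T1.es09numdoc)
WHERE (T1.es09codemp = 1)
  AND (T3.es09datreq >= '2016-02-02' AND T3.es09datreq <= '2016-02-02')
  AND (T3.es09usuari LIKE '')
  AND (T1.es09tipdoc LIKE '%')
ORDER BY T1.es09codemp, T1.es09numdoc DESC, T1.es09tipdoc;
```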
[PERFORM] Fast HashJoin only after a cluster/recreate table
Hi,

In the query below, the planner chooses an extremely slow merge join (380 seconds). 'VACUUM ANALYZE' can't help. If I CLUSTER (or recreate) table ES09T1, the planner chooses a faster hash join (about 10 seconds). But, obviously, I can't do that with users connected. Some time after the CLUSTER (generally within the same day), the problem returns. Autovacuum is on, but the tables are also vacuumed explicitly after pg_dump, 3 times a day (00:00 - 12:00 - 23:00).

PostgreSQL 9.4.5
128GB RAM / 10x RAID10 SAS 15k

shared_buffers = 8GB
work_mem = 256MB
maintenance_work_mem = 16GB
random_page_cost = 2.0
effective_cache_size = 120GB

db=# explain (buffers,analyze)
SELECT T1.es09item, T1.es09status, T3.es09usuari, T3.es09datreq,
       T2.es08desdoc AS es09desdoc, T1.es09numdoc,
       T1.es09tipdoc AS es09tipdoc, T1.es09codemp,
       COALESCE(T4.es09quatre, 0) AS es09quatre
FROM (((ES09T1 T1
  LEFT JOIN ES08T T2 ON T2.es08tipdoc = T1.es09tipdoc)
  LEFT JOIN ES09T T3 ON T3.es09codemp = T1.es09codemp
                    AND T3.es09tipdoc = T1.es09tipdoc
                    AND T3.es09numdoc = T1.es09numdoc)
  LEFT JOIN (SELECT COUNT(*) AS es09quatre, es09codemp, es09tipdoc, es09numdoc
               FROM ES09T1
              GROUP BY es09codemp, es09tipdoc, es09numdoc) T4
         ON T4.es09codemp = T1.es09codemp
        AND T4.es09tipdoc = T1.es09tipdoc
        AND T4.es09numdoc = T1.es09numdoc)
WHERE (T1.es09codemp = 1)
  AND (T3.es09datreq >= '2016-02-02' AND T3.es09datreq <= '2016-02-02')
  AND (T3.es09usuari LIKE '')
  AND (T1.es09tipdoc LIKE '%')
ORDER BY T1.es09codemp, T1.es09numdoc DESC, T1.es09tipdoc;

QUERY PLAN
--
Sort (cost=289546.93..289546.94 rows=2 width=78) (actual time=380405.796..380405.929 rows=2408 loops=1)
  Sort Key: t1.es09numdoc, t1.es09tipdoc
  Sort Method: quicksort Memory: 435kB
  Buffers: shared hit=82163
  -> Merge Left Join (cost=47.09..289546.92 rows=2 width=78) (actual time=1133.077..380398.160 rows=2408 loops=1)
     Merge Cond: (t1.es09tipdoc = es09t1.es09tipdoc)
     Join Filter: ((es09t1.es09codemp = t1.es09codemp) AND (es09t1.es09numdoc = t1.es09numdoc))
     Rows Removed by Join Filter: 992875295
     Buffers: shared hit=82163
     -> Merge Left Join (cost=46.53..49.29 rows=2 width=70) (actual time=12.206..18.155 rows=2408 loops=1)
        Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)
        Buffers: shared hit=6821
        -> Sort (cost=9.19..9.19 rows=2 width=44) (actual time=11.611..12.248 rows=2408 loops=1)
           Sort Key: t1.es09tipdoc
           Sort Method: quicksort Memory: 285kB
           Buffers: shared hit=6814
           -> Nested Loop (cost=1.11..9.18 rows=2 width=44) (actual time=0.040..10.398 rows=2408 loops=1)
              Buffers: shared hit=6814
              -> Index Scan using ad_es09t_1 on es09t t3 (cost=0.56..4.58 rows=1 width=42) (actual time=0.020..0.687 rows=1212 loops=1)
                 Index Cond: ((es09codemp = 1) AND (es09datreq >= '2016-02-02'::date) AND (es09datreq <= '2016-02-02'::date))
                 Filter: (es09usuari ~~ ''::text)
                 Buffers: shared hit=108
              -> Index Scan using es09t1_pkey on es09t1 t1 (cost=0.56..4.59 rows=1 width=19) (actual time=0.006..0.007 rows=2 loops=1212)
                 Index Cond: ((es09codemp = 1) AND (es09tipdoc = t3.es09tipdoc) AND (es09numdoc = t3.es09numdoc))
                 Filter: (es09tipdoc ~~ '%'::text)
                 Buffers: shared hit=6706
        -> Sort (cost=37.35..38.71 rows=547 width=32) (actual time=0.592..2.206 rows=2919 loops=1)
           Sort Key: t2.es08tipdoc
           Sort Method: quicksort Memory: 67kB
           Buffers: shared hit=7
           -> Seq Scan on es08t t2 (cost=0.00..12.47 rows=547 width=32) (actual time=0.003..0.126 rows=547 loops=1)
              Buffers: shared hit=7
     -> Materialize (cost=0.56..287644.85 rows=716126 width=23) (actual time=0.027..68577.800 rows=993087854 loops=1)
        Buffers: shared hit=75342
        -> GroupAggregate (cost=0.56..278693.28 rows=716126 width=15) (actual time=0.025..4242.453 rows=3607573 loops=1)
           Group Key: es09t1.es09codemp, es09t1.es09tipdoc, es09t1.es09numdoc
           Buffers: shared hit=75342
           -> Index Only Scan using es09t1_pkey on es09t1 (cost=0.56..199919.49 rows=7161253 width=15) (actual time=0.016..1625.031 rows=7160921 loops=1)
              Index Cond: (es09codemp = 1)
              Heap Fetches: 51499
              Buffers: shared hit=75342
Planning time: 50.129 ms
Execution time:
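For completeness, the temporary workaround described above amounts to something like the following. es09t1_pkey is the primary-key index that appears in the plan; whether it is the index actually used for clustering here is an assumption.

```sql
-- Rewrites es09t1 in index order; takes an ACCESS EXCLUSIVE lock,
-- which is why it can't be run while users are connected.
CLUSTER es09t1 USING es09t1_pkey;

-- Refresh statistics after the rewrite.
ANALYZE es09t1;
```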