Re: [PERFORM] Fast HashJoin only after a cluster/recreate table

David Rowley Fri, 01 Apr 2016 06:18:14 -0700

On 1 April 2016 at 15:44, Alexandre de Arruda Paes <[email protected]>
wrote:


> In the query below, the planner choose an extreme slow mergejoin(380
> seconds). 'Vacuum analyze' can't help.
>

Yeah, it appears that planner is estimating the WHERE clause on es09t quite
badly, expecting just 1 row, but there's actually 1212 rows.  This seems to
throw the whole plan decision out quite a bit, as, if you notice in the
merge left join for t1.es09tipdoc = t2.es08tipdoc, it expect just 2 rows to
be present, therefore most likely thinks that it's not worth sorting those
results on t1.es09tipdoc, t1.es09numdoc in order for it to match the known
output order of Materialize node on the inner side of that join. Instead it
assumes the Merge join will be good enough on just the es09tipdoc order,
and just adds the other two columns as join filters....

... and this seems to be what's killing the performance. This Merge Join
constantly has to perform a mark/restore on the Materialize node. This why
you're getting the insanely high "Rows Removed by Join Filter: 992875295",
in other words the join filter throw away that many row combinations
because they didn't match.

This mark/restore is basically the rewind process that the merge join
algorithm needs to do to match many to many rows. In actual fact, this
rewind is pretty useless in this case as the GROUP BY subquery ensures that
no duplicate values will make it into the inner side of that merge join.
The planner is not currently smart enough to detect this.

There are some patches currently pending for 9.6 which might help fix this
problem in the future;

1. multivariate statistics; this might help the poor estimates on es09t. If
this was available you could add statistics on es09codemp, es09datreq,
which may well improve the estimate and cause the plan to change.
https://commitfest.postgresql.org/9/450/
2. Unique joins. This tackles the problem a different way and allows the
Merge Join algorithm to skip the restore with some new smarts that are
added to the planner to detect when the inner side of the join can only
produce, at most, a single row for each outer row.
https://commitfest.postgresql.org/9/129/

If you feel like compiling 9.6 devel from source and applying each of these
patches independently and seeing if it helps...  Of course that does not
solve your 9.4 production dilemma, but it may help evaluation of each of
these two patches for 9.6 or beyond.

I wonder what the planner would do if you pulled out the join to ES08T. If
that generates a better plan, then providing that es08tipdoc is the primary
key of that table, then you could just put a subquery in the SELECT clause
to lookup the es08desdoc.


>
>     QUERY PLAN
>
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=289546.93..289546.94 rows=2 width=78) (actual
> time=380405.796..380405.929 rows=2408 loops=1)
>    Sort Key: t1.es09numdoc, t1.es09tipdoc
>    Sort Method: quicksort  Memory: 435kB
>    Buffers: shared hit=82163
>    ->  Merge Left Join  (cost=47.09..289546.92 rows=2 width=78) (actual
> time=1133.077..380398.160 rows=2408 loops=1)
>          Merge Cond: (t1.es09tipdoc = es09t1.es09tipdoc)
>          Join Filter: ((es09t1.es09codemp = t1.es09codemp) AND
> (es09t1.es09numdoc = t1.es09numdoc))
>          Rows Removed by Join Filter: 992875295
>          Buffers: shared hit=82163
>          ->  Merge Left Join  (cost=46.53..49.29 rows=2 width=70) (actual
> time=12.206..18.155 rows=2408 loops=1)
>                Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)
>                Buffers: shared hit=6821
>                ->  Sort  (cost=9.19..9.19 rows=2 width=44) (actual
> time=11.611..12.248 rows=2408 loops=1)
>                      Sort Key: t1.es09tipdoc
>                      Sort Method: quicksort  Memory: 285kB
>                      Buffers: shared hit=6814
>                      ->  Nested Loop  (cost=1.11..9.18 rows=2 width=44)
> (actual time=0.040..10.398 rows=2408 loops=1)
>                            Buffers: shared hit=6814
>                            ->  Index Scan using ad_es09t_1 on es09t t3
>  (cost=0.56..4.58 rows=1 width=42) (actual time=0.020..0.687 rows=1212
> loops=1)
>                                  Index Cond: ((es09codemp = 1) AND
> (es09datreq >= '2016-02-02'::date) AND (es09datreq <= '2016-02-02'::date))
>                                  Filter: (es09usuari ~~
> '%%%%%%%%%%%%%%%%%%%%'::text)
>                                  Buffers: shared hit=108
>                            ->  Index Scan using es09t1_pkey on es09t1 t1
>  (cost=0.56..4.59 rows=1 width=19) (actual time=0.006..0.007 rows=2
> loops=1212)
>                                  Index Cond: ((es09codemp = 1) AND
> (es09tipdoc = t3.es09tipdoc) AND (es09numdoc = t3.es09numdoc))
>                                  Filter: (es09tipdoc ~~ '%%%%%'::text)
>                                  Buffers: shared hit=6706
>                ->  Sort  (cost=37.35..38.71 rows=547 width=32) (actual
> time=0.592..2.206 rows=2919 loops=1)
>                      Sort Key: t2.es08tipdoc
>                      Sort Method: quicksort  Memory: 67kB
>                      Buffers: shared hit=7
>                      ->  Seq Scan on es08t t2  (cost=0.00..12.47 rows=547
> width=32) (actual time=0.003..0.126 rows=547 loops=1)
>                            Buffers: shared hit=7
>          ->  Materialize  (cost=0.56..287644.85 rows=716126 width=23)
> (actual time=0.027..68577.800 rows=993087854 loops=1)
>                Buffers: shared hit=75342
>                ->  GroupAggregate  (cost=0.56..278693.28 rows=716126
> width=15) (actual time=0.025..4242.453 rows=3607573 loops=1)
>                      Group Key: es09t1.es09codemp, es09t1.es09tipdoc,
> es09t1.es09numdoc
>                      Buffers: shared hit=75342
>                      ->  Index Only Scan using es09t1_pkey on es09t1
>  (cost=0.56..199919.49 rows=7161253 width=15) (actual time=0.016..1625.031
> rows=7160921 loops=1)
>                            Index Cond: (es09codemp = 1)
>                            Heap Fetches: 51499
>                            Buffers: shared hit=75342
>  Planning time: 50.129 ms
>  Execution time: 380419.435 ms
>


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PERFORM] Fast HashJoin only after a cluster/recreate table

Reply via email to