Re: [PERFORM] Fast HashJoin only after a cluster/recreate table

2016-04-01 Thread Alexandre de Arruda Paes
Hi David,

Thanks for the explanations. But I don't understand why, when I recreate the
table, the planner chooses a better plan for a while...

> I wonder what the planner would do if you pulled out the join to ES08T. If
> that generates a better plan, then, provided that es08tipdoc is the primary
> key of that table, you could just put a subquery in the SELECT clause to
> look up the es08desdoc.

We have a problem with this approach: these queries are generated by a
framework and can't be rewritten.
Can you think of another solution directly in the DB (perhaps a partial
index, table partitioning, etc.)?

Best regards,

Alexandre



2016-04-01 10:17 GMT-03:00 David Rowley :

> On 1 April 2016 at 15:44, Alexandre de Arruda Paes 
> wrote:
>
>> In the query below, the planner chooses an extremely slow merge join (380
>> seconds). VACUUM ANALYZE can't help.
>>
>
> Yeah, it appears that the planner is estimating the WHERE clause on es09t
> quite badly, expecting just 1 row when there are actually 1212 rows.  This
> seems to throw the whole plan decision out quite a bit: if you notice, in
> the merge left join for t1.es09tipdoc = t2.es08tipdoc, it expects just 2
> rows to be present, and therefore most likely thinks that it's not worth
> sorting those results on t1.es09tipdoc, t1.es09numdoc in order to match the
> known output order of the Materialize node on the inner side of that join.
> Instead, it assumes the Merge Join will be good enough on just the
> es09tipdoc order, and simply adds the other two columns as join filters
>
> ... and this seems to be what's killing the performance. This Merge Join
> constantly has to perform a mark/restore on the Materialize node. This is
> why you're getting the insanely high "Rows Removed by Join Filter:
> 992875295"; in other words, the join filter threw away that many row
> combinations because they didn't match.
>
> This mark/restore is basically the rewind process that the merge join
> algorithm needs to do to match many-to-many rows. In actual fact, this
> rewind is pretty useless in this case, as the GROUP BY subquery ensures
> that no duplicate values will make it into the inner side of that merge
> join. The planner is not currently smart enough to detect this.
>
> There are some patches currently pending for 9.6 which might help fix this
> problem in the future:
>
> 1. Multivariate statistics: this might help the poor estimates on es09t.
> If these were available, you could add statistics on es09codemp,
> es09datreq, which may well improve the estimate and cause the plan to
> change. https://commitfest.postgresql.org/9/450/
> 2. Unique joins: this tackles the problem in a different way, and allows
> the Merge Join algorithm to skip the restore, with some new smarts added to
> the planner to detect when the inner side of the join can only produce, at
> most, a single row for each outer row.
> https://commitfest.postgresql.org/9/129/
>
> If you feel like compiling 9.6 devel from source and applying each of
> these patches independently, you could see whether either helps. Of course
> that does not solve your 9.4 production dilemma, but it may help the
> evaluation of each of these two patches for 9.6 or beyond.
>
> I wonder what the planner would do if you pulled out the join to ES08T. If
> that generates a better plan, then, provided that es08tipdoc is the primary
> key of that table, you could just put a subquery in the SELECT clause to
> look up the es08desdoc.
>
>
>>
>> QUERY PLAN
>>
>>
>> --
>>  Sort  (cost=289546.93..289546.94 rows=2 width=78) (actual
>> time=380405.796..380405.929 rows=2408 loops=1)
>>Sort Key: t1.es09numdoc, t1.es09tipdoc
>>Sort Method: quicksort  Memory: 435kB
>>Buffers: shared hit=82163
>>->  Merge Left Join  (cost=47.09..289546.92 rows=2 width=78) (actual
>> time=1133.077..380398.160 rows=2408 loops=1)
>>  Merge Cond: (t1.es09tipdoc = es09t1.es09tipdoc)
>>  Join Filter: ((es09t1.es09codemp = t1.es09codemp) AND
>> (es09t1.es09numdoc = t1.es09numdoc))
>>  Rows Removed by Join Filter: 992875295
>>  Buffers: shared hit=82163
>>  ->  Merge Left Join  (cost=46.53..49.29 rows=2 width=70) (actual
>> time=12.206..18.155 rows=2408 loops=1)
>>Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)
>>Buffers: shared hit=6821
>>->  Sort  (cost=9.19..9.19 rows=2 width=44) (actual
>> time=11.611..12.248 rows=2408 loops=1)
>>  Sort Key: t1.es09tipdoc
>>  Sort Method: quicksort  Memory: 285kB
>>  Buffers: shared hit=6814
>>  ->  Nested Loop  (cost=1.11..9.18 rows=2 width=44)
>> (actual time=0.040..10.398 rows=2408 loops=1)
>>Buffers: shared hit=6814

Re: [PERFORM] Fast HashJoin only after a cluster/recreate table

2016-04-01 Thread David Rowley
On 1 April 2016 at 15:44, Alexandre de Arruda Paes 
wrote:

> In the query below, the planner chooses an extremely slow merge join (380
> seconds). VACUUM ANALYZE can't help.
>

Yeah, it appears that the planner is estimating the WHERE clause on es09t
quite badly, expecting just 1 row when there are actually 1212 rows.  This
seems to throw the whole plan decision out quite a bit: if you notice, in the
merge left join for t1.es09tipdoc = t2.es08tipdoc, it expects just 2 rows to
be present, and therefore most likely thinks that it's not worth sorting
those results on t1.es09tipdoc, t1.es09numdoc in order to match the known
output order of the Materialize node on the inner side of that join. Instead,
it assumes the Merge Join will be good enough on just the es09tipdoc order,
and simply adds the other two columns as join filters

... and this seems to be what's killing the performance. This Merge Join
constantly has to perform a mark/restore on the Materialize node. This is why
you're getting the insanely high "Rows Removed by Join Filter: 992875295";
in other words, the join filter threw away that many row combinations because
they didn't match.

This mark/restore is basically the rewind process that the merge join
algorithm needs to do to match many-to-many rows. In actual fact, this rewind
is pretty useless in this case, as the GROUP BY subquery ensures that no
duplicate values will make it into the inner side of that merge join. The
planner is not currently smart enough to detect this.

There are some patches currently pending for 9.6 which might help fix this
problem in the future:

1. Multivariate statistics: this might help the poor estimates on es09t. If
these were available, you could add statistics on es09codemp, es09datreq,
which may well improve the estimate and cause the plan to change (see the
sketch after this list).
https://commitfest.postgresql.org/9/450/
2. Unique joins: this tackles the problem in a different way, and allows the
Merge Join algorithm to skip the restore, with some new smarts added to the
planner to detect when the inner side of the join can only produce, at most,
a single row for each outer row.
https://commitfest.postgresql.org/9/129/
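
For illustration, here is a hypothetical sketch of what the statistics from
point 1 might look like. The syntax follows the pending CREATE STATISTICS
patch and may differ from whatever finally gets committed, and the statistics
name is made up:

    -- Hypothetical sketch: declare es09codemp and es09datreq as correlated,
    -- so the planner stops treating their selectivities as independent.
    CREATE STATISTICS es09t_emp_datreq_stats (dependencies)
        ON es09codemp, es09datreq
        FROM es09t;

    -- Re-analyze the table so the new statistics are actually collected.
    ANALYZE es09t;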

If you feel like compiling 9.6 devel from source and applying each of these
patches independently, you could see whether either helps. Of course that
does not solve your 9.4 production dilemma, but it may help the evaluation of
each of these two patches for 9.6 or beyond.

I wonder what the planner would do if you pulled out the join to ES08T. If
that generates a better plan, then, provided that es08tipdoc is the primary
key of that table, you could just put a subquery in the SELECT clause to look
up the es08desdoc. A sketch of that rewrite is below.
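
Untested sketch; it assumes es08tipdoc really is the primary key of ES08T,
and it keeps the es09usuari pattern exactly as it appears in the original
post, where its value was lost:

    SELECT T1.es09item, T1.es09status, T3.es09usuari, T3.es09datreq,
           -- Scalar subquery instead of the LEFT JOIN to ES08T; it returns
           -- at most one row because es08tipdoc is the primary key.
           (SELECT T2.es08desdoc
              FROM ES08T T2
             WHERE T2.es08tipdoc = T1.es09tipdoc) AS es09desdoc,
           T1.es09numdoc, T1.es09tipdoc AS es09tipdoc, T1.es09codemp,
           COALESCE(T4.es09quatre, 0) AS es09quatre
      FROM (ES09T1 T1
            LEFT JOIN ES09T T3 ON T3.es09codemp = T1.es09codemp
                              AND T3.es09tipdoc = T1.es09tipdoc
                              AND T3.es09numdoc = T1.es09numdoc)
            LEFT JOIN (SELECT COUNT(*) AS es09quatre,
                              es09codemp, es09tipdoc, es09numdoc
                         FROM ES09T1
                        GROUP BY es09codemp, es09tipdoc, es09numdoc) T4
                   ON T4.es09codemp = T1.es09codemp
                  AND T4.es09tipdoc = T1.es09tipdoc
                  AND T4.es09numdoc = T1.es09numdoc
     WHERE T1.es09codemp = 1
       AND T3.es09datreq >= '2016-02-02' AND T3.es09datreq <= '2016-02-02'
       AND T3.es09usuari LIKE ''    -- pattern value lost in the archive
       AND T1.es09tipdoc LIKE '%'
     ORDER BY T1.es09codemp, T1.es09numdoc DESC, T1.es09tipdoc;

Since the rest of the query only needs es08desdoc, this removes ES08T from
the join search entirely and lets the planner cost the remaining joins on
their own.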


>
> QUERY PLAN
>
>
> --
>  Sort  (cost=289546.93..289546.94 rows=2 width=78) (actual
> time=380405.796..380405.929 rows=2408 loops=1)
>Sort Key: t1.es09numdoc, t1.es09tipdoc
>Sort Method: quicksort  Memory: 435kB
>Buffers: shared hit=82163
>->  Merge Left Join  (cost=47.09..289546.92 rows=2 width=78) (actual
> time=1133.077..380398.160 rows=2408 loops=1)
>  Merge Cond: (t1.es09tipdoc = es09t1.es09tipdoc)
>  Join Filter: ((es09t1.es09codemp = t1.es09codemp) AND
> (es09t1.es09numdoc = t1.es09numdoc))
>  Rows Removed by Join Filter: 992875295
>  Buffers: shared hit=82163
>  ->  Merge Left Join  (cost=46.53..49.29 rows=2 width=70) (actual
> time=12.206..18.155 rows=2408 loops=1)
>Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)
>Buffers: shared hit=6821
>->  Sort  (cost=9.19..9.19 rows=2 width=44) (actual
> time=11.611..12.248 rows=2408 loops=1)
>  Sort Key: t1.es09tipdoc
>  Sort Method: quicksort  Memory: 285kB
>  Buffers: shared hit=6814
>  ->  Nested Loop  (cost=1.11..9.18 rows=2 width=44)
> (actual time=0.040..10.398 rows=2408 loops=1)
>Buffers: shared hit=6814
>->  Index Scan using ad_es09t_1 on es09t t3
>  (cost=0.56..4.58 rows=1 width=42) (actual time=0.020..0.687 rows=1212
> loops=1)
>  Index Cond: ((es09codemp = 1) AND
> (es09datreq >= '2016-02-02'::date) AND (es09datreq <= '2016-02-02'::date))
>  Filter: (es09usuari ~~
> ''::text)
>  Buffers: shared hit=108
>->  Index Scan using es09t1_pkey on es09t1 t1
>  (cost=0.56..4.59 rows=1 width=19) (actual time=0.006..0.007 rows=2
> loops=1212)
>  Index Cond: ((es09codemp = 1) AND
> (es09tipdoc = t3.es09tipdoc) AND (es09numdoc = t3.es09numdoc))
>  Filter: (es09tipdoc ~~ '%'::text)
>   

[PERFORM] Fast HashJoin only after a cluster/recreate table

2016-03-31 Thread Alexandre de Arruda Paes
Hi,

In the query below, the planner chooses an extremely slow merge join (380
seconds). VACUUM ANALYZE can't help.
If I CLUSTER (or recreate) table ES09T1, the planner chooses a faster hash
join (about 10 seconds). But, obviously, I can't do that with the users
connected.
Some time after the CLUSTER (generally on the same day), the problem returns.
Autovacuum is on, and the tables are also vacuumed explicitly after pg_dump,
3 times a day (00:00 - 12:00 - 23:00).
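
For reference, the CLUSTER step is just something like the following,
assuming the primary key index that appears in the plan below (CLUSTER takes
an ACCESS EXCLUSIVE lock, which is exactly why it can't be run with users
connected):

    -- Rewrite es09t1 physically in es09t1_pkey order; blocks all access
    -- to the table while it runs.
    CLUSTER es09t1 USING es09t1_pkey;
    -- Refresh statistics so the planner sees the new physical layout.
    ANALYZE es09t1;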

PostgreSQL 9.4.5
128GB RAM / 10x RAID10 SAS 15k
shared_buffers = 8GB
work_mem = 256MB
maintenance_work_mem = 16GB
random_page_cost = 2.0
effective_cache_size = 120GB


db=# explain (buffers,analyze)
SELECT T1.es09item, T1.es09status, T3.es09usuari, T3.es09datreq,
       T2.es08desdoc AS es09desdoc, T1.es09numdoc,
       T1.es09tipdoc AS es09tipdoc, T1.es09codemp,
       COALESCE(T4.es09quatre, 0) AS es09quatre
FROM (((ES09T1 T1
        LEFT JOIN ES08T T2 ON T2.es08tipdoc = T1.es09tipdoc)
        LEFT JOIN ES09T T3 ON T3.es09codemp = T1.es09codemp
                          AND T3.es09tipdoc = T1.es09tipdoc
                          AND T3.es09numdoc = T1.es09numdoc)
        LEFT JOIN (SELECT COUNT(*) AS es09quatre,
                          es09codemp, es09tipdoc, es09numdoc
                     FROM ES09T1
                    GROUP BY es09codemp, es09tipdoc, es09numdoc) T4
               ON T4.es09codemp = T1.es09codemp
              AND T4.es09tipdoc = T1.es09tipdoc
              AND T4.es09numdoc = T1.es09numdoc)
WHERE (T1.es09codemp = 1)
  AND (T3.es09datreq >= '2016-02-02' AND T3.es09datreq <= '2016-02-02')
  AND (T3.es09usuari LIKE '')
  AND (T1.es09tipdoc LIKE '%')
ORDER BY T1.es09codemp, T1.es09numdoc DESC, T1.es09tipdoc;

  QUERY PLAN

--
 Sort  (cost=289546.93..289546.94 rows=2 width=78) (actual
time=380405.796..380405.929 rows=2408 loops=1)
   Sort Key: t1.es09numdoc, t1.es09tipdoc
   Sort Method: quicksort  Memory: 435kB
   Buffers: shared hit=82163
   ->  Merge Left Join  (cost=47.09..289546.92 rows=2 width=78) (actual
time=1133.077..380398.160 rows=2408 loops=1)
 Merge Cond: (t1.es09tipdoc = es09t1.es09tipdoc)
 Join Filter: ((es09t1.es09codemp = t1.es09codemp) AND
(es09t1.es09numdoc = t1.es09numdoc))
 Rows Removed by Join Filter: 992875295
 Buffers: shared hit=82163
 ->  Merge Left Join  (cost=46.53..49.29 rows=2 width=70) (actual
time=12.206..18.155 rows=2408 loops=1)
   Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)
   Buffers: shared hit=6821
   ->  Sort  (cost=9.19..9.19 rows=2 width=44) (actual
time=11.611..12.248 rows=2408 loops=1)
 Sort Key: t1.es09tipdoc
 Sort Method: quicksort  Memory: 285kB
 Buffers: shared hit=6814
 ->  Nested Loop  (cost=1.11..9.18 rows=2 width=44)
(actual time=0.040..10.398 rows=2408 loops=1)
   Buffers: shared hit=6814
   ->  Index Scan using ad_es09t_1 on es09t t3
 (cost=0.56..4.58 rows=1 width=42) (actual time=0.020..0.687 rows=1212
loops=1)
 Index Cond: ((es09codemp = 1) AND
(es09datreq >= '2016-02-02'::date) AND (es09datreq <= '2016-02-02'::date))
 Filter: (es09usuari ~~
''::text)
 Buffers: shared hit=108
   ->  Index Scan using es09t1_pkey on es09t1 t1
 (cost=0.56..4.59 rows=1 width=19) (actual time=0.006..0.007 rows=2
loops=1212)
 Index Cond: ((es09codemp = 1) AND
(es09tipdoc = t3.es09tipdoc) AND (es09numdoc = t3.es09numdoc))
 Filter: (es09tipdoc ~~ '%'::text)
 Buffers: shared hit=6706
   ->  Sort  (cost=37.35..38.71 rows=547 width=32) (actual
time=0.592..2.206 rows=2919 loops=1)
 Sort Key: t2.es08tipdoc
 Sort Method: quicksort  Memory: 67kB
 Buffers: shared hit=7
 ->  Seq Scan on es08t t2  (cost=0.00..12.47 rows=547
width=32) (actual time=0.003..0.126 rows=547 loops=1)
   Buffers: shared hit=7
 ->  Materialize  (cost=0.56..287644.85 rows=716126 width=23)
(actual time=0.027..68577.800 rows=993087854 loops=1)
   Buffers: shared hit=75342
   ->  GroupAggregate  (cost=0.56..278693.28 rows=716126
width=15) (actual time=0.025..4242.453 rows=3607573 loops=1)
 Group Key: es09t1.es09codemp, es09t1.es09tipdoc,
es09t1.es09numdoc
 Buffers: shared hit=75342
 ->  Index Only Scan using es09t1_pkey on es09t1
 (cost=0.56..199919.49 rows=7161253 width=15) (actual time=0.016..1625.031
rows=7160921 loops=1)
   Index Cond: (es09codemp = 1)
   Heap Fetches: 51499
   Buffers: shared hit=75342
 Planning time: 50.129 ms
 Execution time: