Re: [GENERAL] Performance problem on 2 PG versions on same query

2014-11-06 Thread Rémy-Christophe Schermesser
Thank you very much for the insights (and for showing how to correctly read
an explain :)).
There is a difference in data between the 2 databases: crons were running on
one and not on the other.

But your insights enabled us to dive deeper, thanks!

2014-11-05 18:30 GMT+01:00 Tom Lane t...@sss.pgh.pa.us:

 Tomas Vondra t...@fuzzy.cz writes:
  On 5 November 2014, 18:10, Tom Lane wrote:
  For some reason there's over 1000 times more rows in andalertsmatch in
  the 9.1.14 installation.  I'm betting on a foulup somewhere in the data
  dump/restore process.

  I'd bet that's mostly due to rescans in the merge join ...

 Doh, I should have thought of that :-(.  But still, the difference in
 the number of final output rows is pretty compelling evidence that
 it's not really the same data.

 regards, tom lane



[GENERAL] Performance problem on 2 PG versions on same query

2014-11-05 Thread Rémy-Christophe Schermesser
Hi,

We have 2 instances of PG, one on 9.1.1 and one on 9.1.14. They have the same
data, schema, and PG configuration, and run on almost identical machines (same
number of cores and memory, but different cloud providers). The data was
transferred with a pg_dump/pg_restore. We ran VACUUM ANALYSE, ANALYSE, and
REINDEX on both machines.
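
For reference, this is roughly what we ran on each machine after the restore
(a sketch; the database name below is a placeholder):

-- run on both machines after the pg_dump/pg_restore
VACUUM ANALYSE;
ANALYSE;
REINDEX DATABASE ourdb;  -- 'ourdb' is a placeholder name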

One query takes ~11 seconds on 9.1.1 and ~25 minutes on 9.1.14.

Can you explain why there is such a big difference, and how to fix it on
9.1.14?
In the explain plans (below), there are two differences that we cannot
explain:

   - The use of a Hash Anti Join in 9.1.1 versus a Merge Anti Join in 9.1.14
   - The use of the index andalertsmatch_a_mid_idx in 9.1.14 and no index in 9.1.1

Thanks for your help,


*Here is the query:*
SELECT a.*, e.id AS eid, e.fk_compc_id AS ecid
FROM gsm.andalertsteam a
JOIN gsm.events e ON (a.fk_team_id = e.part_a1_id OR e.part_b1_id = a.fk_team_id)
LEFT OUTER JOIN gsm.andalertsmatch am ON e.id = am.a_mid AND am.a_uid = a.t_uid AND am.a_bundle = a.t_bundle
WHERE (e.eventday = 1415232000 OR e.eventday = 1415059200 OR e.eventday = 1415145600)
AND e.fk_status_id != 2
AND am.a_mid IS NULL AND a.fk_comp_id IS NULL
LIMIT 1000;
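
For reference, the LEFT OUTER JOIN ... IS NULL is what gets planned as the
anti-join (the Hash/Merge Anti Join nodes in the plans below). A roughly
equivalent NOT EXISTS form, given only as an untested sketch against this
schema, would be:

SELECT a.*, e.id AS eid, e.fk_compc_id AS ecid
FROM gsm.andalertsteam a
JOIN gsm.events e ON (a.fk_team_id = e.part_a1_id OR e.part_b1_id = a.fk_team_id)
WHERE (e.eventday = 1415232000 OR e.eventday = 1415059200 OR e.eventday = 1415145600)
AND e.fk_status_id != 2
AND a.fk_comp_id IS NULL
-- the anti-join: keep only rows with no matching andalertsmatch entry
AND NOT EXISTS (
  SELECT 1
  FROM gsm.andalertsmatch am
  WHERE am.a_mid = e.id
    AND am.a_uid = a.t_uid
    AND am.a_bundle = a.t_bundle
)
LIMIT 1000;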


*Here is the explain plan on 9.1.1:*
Limit  (cost=100414.92..107502.31 rows=1000 width=279) (actual time=6200.302..11650.567 rows=1000 loops=1)
   ->  Hash Anti Join  (cost=100414.92..118856.31 rows=2602 width=279) (actual time=6200.302..11650.567 rows=1000 loops=1)
         Hash Cond: ((e.id = am.a_mid) AND ((a.t_uid)::text = (am.a_uid)::text) AND ((a.t_bundle)::text = (am.a_bundle)::text))
         ->  Nested Loop  (cost=88.78..7785.80 rows=2655 width=279) (actual time=190.009..9470.460 rows=7951 loops=1)
               ->  Seq Scan on events e  (cost=0.00..1650.14 rows=2 width=32) (actual time=40.002..260.013 rows=8 loops=1)
                     Filter: ((fk_status_id <> 2) AND ((eventday = 1415232000) OR (eventday = 1415059200) OR (eventday = 1415145600)))
               ->  Bitmap Heap Scan on andalertsteam a  (cost=88.78..3052.59 rows=1016 width=263) (actual time=178.759..1151.306 rows=994 loops=8)
                     Recheck Cond: ((fk_team_id = e.part_a1_id) OR (e.part_b1_id = fk_team_id))
                     Filter: (fk_comp_id IS NULL)
                     ->  BitmapOr  (cost=88.78..88.78 rows=1016 width=0) (actual time=175.008..175.008 rows=0 loops=8)
                           ->  Bitmap Index Scan on andalertsmatch_fk_team_id_idx  (cost=0.00..44.26 rows=508 width=0) (actual time=37.502..37.502 rows=555 loops=8)
                                 Index Cond: (fk_team_id = e.part_a1_id)
                           ->  Bitmap Index Scan on andalertsmatch_fk_team_id_idx  (cost=0.00..44.26 rows=508 width=0) (actual time=137.507..137.507 rows=4192 loops=8)
                                 Index Cond: (e.part_b1_id = fk_team_id)
         ->  Hash  (cost=71287.87..71287.87 rows=1064987 width=52) (actual time=2150.105..2150.105 rows=1064987 loops=1)
               Buckets: 65536  Batches: 2  Memory Usage: 44488kB
               ->  Seq Scan on andalertsmatch am  (cost=0.00..71287.87 rows=1064987 width=52) (actual time=0.000..1680.077 rows=1064987 loops=1)

*Here is the explain plan on 9.1.14:*
Limit  (cost=20.64..73294.62 rows=1000 width=279) (actual time=1419311.904..1419400.785 rows=3 loops=1)
   ->  Merge Anti Join  (cost=20.64..193463.96 rows=2640 width=279) (actual time=1419311.902..1419400.783 rows=3 loops=1)
         Merge Cond: (e.id = am.a_mid)
         Join Filter: (((am.a_uid)::text = (a.t_uid)::text) AND ((am.a_bundle)::text = (a.t_bundle)::text))
         ->  Nested Loop  (cost=20.64..8045.28 rows=2694 width=279) (actual time=13.230..555.366 rows=121063 loops=1)
               ->  Index Scan using events_pkey on events e  (cost=0.00..3369.07 rows=2 width=32) (actual time=12.494..14.003 rows=41 loops=1)
                     Filter: ((fk_status_id <> 2) AND ((eventday = 1415232000) OR (eventday = 1415059200) OR (eventday = 1415145600)))
               ->  Bitmap Heap Scan on andalertsteam a  (cost=20.64..2322.43 rows=1045 width=263) (actual time=0.467..8.612 rows=2953 loops=41)
                     Recheck Cond: ((fk_team_id = e.part_a1_id) OR (e.part_b1_id = fk_team_id))
                     Filter: (fk_comp_id IS NULL)
                     ->  BitmapOr  (cost=20.64..20.64 rows=1045 width=0) (actual time=0.335..0.335 rows=0 loops=41)
                           ->  Bitmap Index Scan on andalertsmatch_fk_team_id_idx  (cost=0.00..10.19 rows=522 width=0) (actual time=0.079..0.079 rows=401 loops=41)
                                 Index Cond: (fk_team_id = e.part_a1_id)
                           ->  Bitmap Index Scan on andalertsmatch_fk_team_id_idx  (cost=0.00..10.19 rows=522 width=0) (actual time=0.253..0.253 rows=2558 loops=41)
                                 Index Cond: (e.part_b1_id = fk_team_id)
         ->  Index Scan using andalertsmatch_a_mid_idx on andalertsmatch am  (cost=0.00..180798.61 rows=1173762 width=52) (actual time=0.015..875294.427 rows=1826118122 loops=1)
 Total 

Re: [GENERAL] Performance problem on 2 PG versions on same query

2014-11-05 Thread Tomas Vondra
Hi,

On 5 November 2014, 17:31, Rémy-Christophe Schermesser wrote:
 Hi,

 We have 2 instances of PG, one in 9.1.1 and 9.1.14. They have the same
 data, schema, PG configuration, and are almost identical machines, same
 number of cores and memory, but different cloud provider. The data was
 transferred with a pg_dump/pg_restore. We ran VACUUM ANALYSE, ANALYSE, and
 REINDEX on both machines.

What do you mean by cloud provider? Are you installing and configuring
the machines on your own, or is this set up by the provider? BTW, you should
do basic benchmarking first - numbers reported by the providers are just
random numbers until you verify them.

 One query take ~11 seconds on 9.1.1 and ~25 minutes on 9.1.14.

Well, the first observation is that the queries produce different results:

Limit  (cost=100414.92..107502.31 rows=1000 width=279) (actual time=6200.302..11650.567 rows=1000 loops=1)

Limit  (cost=20.64..73294.62 rows=1000 width=279) (actual time=1419311.904..1419400.785 rows=3 loops=1)

So while on 9.1.1 it produces 1000 rows very quickly, on 9.1.14 it only
ever finds 3 rows (so the query needs to scan all the data, and the early
abort from the LIMIT never triggers).
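
A quick way to check that directly (just a sketch, and it may take a while on
the slow machine) is to drop the LIMIT and count the matching rows on both
servers:

SELECT count(*)
FROM gsm.andalertsteam a
JOIN gsm.events e ON (a.fk_team_id = e.part_a1_id OR e.part_b1_id = a.fk_team_id)
LEFT OUTER JOIN gsm.andalertsmatch am
  ON e.id = am.a_mid AND am.a_uid = a.t_uid AND am.a_bundle = a.t_bundle
WHERE (e.eventday = 1415232000 OR e.eventday = 1415059200 OR e.eventday = 1415145600)
  AND e.fk_status_id != 2
  AND am.a_mid IS NULL
  AND a.fk_comp_id IS NULL;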

There are other differences, though. For example on 9.1.1 the nested loop
returns ~8k rows:

Nested Loop  (cost=88.78..7785.80 rows=2655 width=279) (actual time=190.009..9470.460 rows=7951 loops=1)

while on 9.1.14 it produces ~120k rows:

Nested Loop  (cost=20.64..8045.28 rows=2694 width=279) (actual time=13.230..555.366 rows=121063 loops=1)

This may be one of the reasons why the database decided to use a different
join method.

Are there any differences in settings between the two machines (e.g.
work_mem)?
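
For example, something like this on both machines would show the usual
suspects (a sketch, not an exhaustive list of settings):

-- compare a few planner/memory settings on both machines
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('work_mem', 'shared_buffers', 'effective_cache_size', 'random_page_cost');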

regards
Tomas





Re: [GENERAL] Performance problem on 2 PG versions on same query

2014-11-05 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes:
 On 5 November 2014, 17:31, Rémy-Christophe Schermesser wrote:
 We have 2 instances of PG, one in 9.1.1 and 9.1.14. They have the same
 data, schema, PG configuration, and are almost identical machines, same
 number of cores and memory, but different cloud provider. The data was
 transferred with a pg_dump/pg_restore. We ran VACUUM ANALYSE, ANALYSE, and
 REINDEX on both machines.

 Well, the first observation is that the queries produce different results:

Yeah.  Another reason to not believe that the databases contain identical
data is here:

->  Seq Scan on andalertsmatch am  (cost=0.00..71287.87 rows=1064987 width=52) (actual time=0.000..1680.077 rows=1064987 loops=1)

 ->  Index Scan using andalertsmatch_a_mid_idx on andalertsmatch am  (cost=0.00..180798.61 rows=1173762 width=52) (actual time=0.015..875294.427 rows=1826118122 loops=1)

For some reason there's over 1000 times more rows in andalertsmatch in
the 9.1.14 installation.  I'm betting on a foulup somewhere in the data
dump/restore process.
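
A quick sanity check would be to count the rows on both installations (a
sketch, using the table name from the plans):

SELECT count(*) FROM gsm.andalertsmatch;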

regards, tom lane




Re: [GENERAL] Performance problem on 2 PG versions on same query

2014-11-05 Thread Tomas Vondra
On 5 November 2014, 18:10, Tom Lane wrote:
 Tomas Vondra t...@fuzzy.cz writes:
 On 5 November 2014, 17:31, Rémy-Christophe Schermesser wrote:
 We have 2 instances of PG, one in 9.1.1 and 9.1.14. They have the same
 data, schema, PG configuration, and are almost identical machines, same
 number of cores and memory, but different cloud provider. The data was
 transferred with a pg_dump/pg_restore. We ran VACUUM ANALYSE, ANALYSE,
 and
 REINDEX on both machines.

 Well, the first observation is that the queries produce different
 results:

 Yeah.  Another reason to not believe that the databases contain identical
 data is here:

->  Seq Scan on andalertsmatch am  (cost=0.00..71287.87 rows=1064987 width=52) (actual time=0.000..1680.077 rows=1064987 loops=1)

 ->  Index Scan using andalertsmatch_a_mid_idx on andalertsmatch am  (cost=0.00..180798.61 rows=1173762 width=52) (actual time=0.015..875294.427 rows=1826118122 loops=1)

 For some reason there's over 1000 times more rows in andalertsmatch in
 the 9.1.14 installation.  I'm betting on a foulup somewhere in the data
 dump/restore process.

I'd bet that's mostly due to rescans in the merge join ...
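
One way to see the table's real size without a full count (a sketch;
reltuples is only the planner's estimate) is:

SELECT relname, reltuples::bigint AS estimated_rows, relpages
FROM pg_class
WHERE relname = 'andalertsmatch';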

Tomas





Re: [GENERAL] Performance problem on 2 PG versions on same query

2014-11-05 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes:
 On 5 November 2014, 18:10, Tom Lane wrote:
 For some reason there's over 1000 times more rows in andalertsmatch in
 the 9.1.14 installation.  I'm betting on a foulup somewhere in the data
 dump/restore process.

 I'd bet that's mostly due to rescans in the merge join ...

Doh, I should have thought of that :-(.  But still, the difference in
the number of final output rows is pretty compelling evidence that
it's not really the same data.

regards, tom lane

