On Wed, 27 Apr 2016 13:48:06 +0200
Alban Hertroys <[email protected]> wrote:
> In this case, you're using the values in adverse to filter relevant rid's for
> the FK join, so you might be better off with the inverse of above index:
> create index on report_adverses (adverse, rid);
> create index on report_drugs (drug, rid);
Hmmmm, like I reported yesterday this achieved a huge performance boost.
However, two (new) things I like to touch on while further experimenting with
this query:
1. Altering Drug IDs or Adverse names effects the executing time negatively.
In this example altering the drug IDs I search for makes the query 6 times
slower again:
SELECT r.id, r.age, r.gender, r.created, a.adverse, d.drug
FROM reports r
JOIN report_drugs d ON d.rid = r.id
JOIN report_adverses a ON a.rid = r.id
WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back
pain', 'back pain'])
AND d.drug = ANY (ARRAY[9557, 17848, 17880, 18223]) ORDER BY r.created;
Different drug ID ([9557, 17848, 17880, 18223]), but the rest is the same.
Query plan:
Sort (cost=31409.71..31417.48 rows=3107 width=41) (actual
time=662.707..662.819 rows=4076 loops=1)
Sort Key: r.created
Sort Method: quicksort Memory: 415kB
-> Nested Loop (cost=25693.17..31229.48 rows=3107 width=41) (actual
time=71.748..661.743 rows=4076 loops=1)
-> Merge Join (cost=25692.61..26795.53 rows=3107 width=29) (actual
time=70.841..97.094 rows=4076 loops=1)
Merge Cond: (d.rid = a.rid)
-> Sort (cost=16332.08..16671.61 rows=135810 width=8) (actual
time=48.946..58.623 rows=135370 loops=1)
Sort Key: d.rid
Sort Method: quicksort Memory: 12498kB
-> Index Scan using report_drugs_drug_idx on report_drugs
d (cost=0.44..4753.44 rows=135810 width=8) (actual time=0.681..28.441
rows=135532 loops=1)
Index Cond: (drug = ANY
('{9557,17848,17880,18223}'::integer[]))
-> Sort (cost=9360.53..9556.94 rows=78565 width=21) (actual
time=21.880..25.969 rows=77163 loops=1)
Sort Key: a.rid
Sort Method: quicksort Memory: 6682kB
-> Index Only Scan using report_adverses_adverse_rid_idx
on report_adverses a (cost=0.56..2972.57 rows=78565 width=21) (actual
time=0.983..10.744 rows=76972 loops=1)
Index Cond: (adverse = ANY ('{"back pain -
nonspecific","nonspecific back pain","back pain"}'::text[]))
Heap Fetches: 0
-> Index Only Scan using reports_id_age_gender_created_idx on reports
r (cost=0.56..1.42 rows=1 width=20) (actual time=0.137..0.138 rows=1
loops=4076)
Index Cond: (id = d.rid)
Heap Fetches: 0
Planning time: 16.438 ms
Execution time: 663.583 ms
A total of 660 ms instead of the previous 120 ms. The amount of rows returned
are 4076, instead of the 448 rows of the previous query. Could this be the
cause of the slow down? Or is it the fact that the integers in the ANY clause
are further apart from one another and thus make for longer index searches?
The same happens with changing the adverses, some adverses come back in ~120
ms, others can take up to one second.
2. Adding extra columns ignores indexes
This one was kind of expected. Adding extra columns from both "report_drugs"
and "report_adverses" slows the lot down again. Extra columns added:
- a.recovery
- a.severity
- d.reason
- d.effectiveness
- d.duration
As I would expect, adding these columns would make the previous multi-column
indexes useless. And they indeed were not used anymore.
So my initial thought was to create new indexes on the columns I now query:
- CREATE INDEX ON report_adverses(adverse, rid, severity, recovery);
- CREATE INDEX ON report_drugs(drug, rid, reason, effectiveness, duration);
After running the query again, the new index on "report_adverses" got picked
up, but the index on "report_drugs" did not:
Sort (cost=12365.79..12366.61 rows=329 width=76) (actual
time=129.106..129.120 rows=448 loops=1)
Sort Key: r.created
Sort Method: quicksort Memory: 66kB
-> Nested Loop (cost=11212.93..12352.04 rows=329 width=76) (actual
time=31.558..128.951 rows=448 loops=1)
Join Filter: (d.rid = r.id)
-> Merge Join (cost=11212.38..11680.44 rows=329 width=64) (actual
time=30.705..39.527 rows=448 loops=1)
Merge Cond: (d.rid = a.rid)
-> Sort (cost=1503.85..1539.82 rows=14390 width=35) (actual
time=6.977..7.993 rows=14074 loops=1)
Sort Key: d.rid
Sort Method: quicksort Memory: 1202kB
-> Index Scan using report_drugs_drug_idx on report_drugs
d (cost=0.44..510.02 rows=14390 width=35) (actual time=0.567..4.638 rows=14200
loops=1)
Index Cond: (drug = ANY
('{359,360,361,362,363}'::integer[]))
-> Sort (cost=9708.53..9904.94 rows=78565 width=29) (actual
time=23.717..26.540 rows=76974 loops=1)
Sort Key: a.rid
Sort Method: quicksort Memory: 6702kB
-> Index Only Scan using
report_adverses_adverse_rid_severity_recovery_idx on report_adverses a
(cost=0.56..3320.57 rows=78565 width=29) (actual time=0.878..12.297 rows=76972
loops=1)
Index Cond: (adverse = ANY ('{"back pain -
nonspecific","nonspecific back pain","back pain"}'::text[]))
Heap Fetches: 0
-> Index Only Scan using reports_id_age_gender_created_idx on reports
r (cost=0.56..2.03 rows=1 width=20) (actual time=0.198..0.199 rows=1 loops=448)
Index Cond: (id = a.rid)
Heap Fetches: 0
Planning time: 18.310 ms
Execution time: 129.483 ms
Still fast enough ... but I was wondering why the planner would not use the new
index and instead fall back on the "report_drugs_drug_idx" single column index.
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
Cheers,
Tim
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general