On Thu, 28 Apr 2016 16:38:53 +0200
Alban Hertroys <[email protected]> wrote:
Hi Alban
First off ... thanks you for your time on this so far and giving me the
educational smacks on the head :) I appreciate this a lot.
> You're doing ~9 times as many index lookups. A slowdown of 6x of this
> part of the query seems rather reasonable.
Again, very true. But why 9 times as much?
> > Planning time: 16.438 ms
> > Execution time: 663.583 ms
> >
> > A total of 660 ms instead of the previous 120 ms. The amount of rows
> > returned are 4076, instead of the 448 rows of the previous query. Could
> > this be the cause of the slow down? Or is it the fact that the integers in
> > the ANY clause are further apart from one another and thus make for longer
> > index searches?
>
> Technically probably both, but the index node distance (assuming a
> btree index) should be insignificant.
It is indeed a b-tree index.
> The sorting of found candidate rows on rid probably contributes the
> most. I forgot how quicksort scales to the number of items to sort,
> probably something like 2log(n) or something like that.
Looked this up and you are right, a quicksort (in C) is of nlog(n) complexity
... and if I'm right this is "rather" linear. Less sorting is thus the message
:)
> That said, you didn't happen to perform these tests (assuming >1)
> during a busier period of the database server? I ask because later on
> you seem to be getting fast results again after some more index
> changes and I don't see cause for the difference.
No, these test are on a local development machine with nothing else going on
but the database cluster.
> > 2. Adding extra columns ignores indexes
> > ...
> > - a.recovery
> > - a.severity
> > - d.reason
> > - d.effectiveness
> > - d.duration
>
> I have to admit that I'm a bit behind with the current state of the
> art of PostgreSQL, but last time I checked, the database needed to
> look at each row in the result-set for transaction visibility
> information. I recall that there was (at some point) much discussion
> whether that visibility information could be added to indexes and that
> there were strong arguments against doing so. Hence, I doubt that
> those new index-only scans skip that step.
I see.
> Unless I'm wrong there, adding non-queried fields to the index is only
> going to affect your performance adversely. Relevant fields for
> indices are those used in joins, those regularly used in conditions
> (where-clauses) and those that are regularly sorted over.
>
> Other fields are available from the candidate rows of the result set.
> Putting them in the index just results in storing redundant
> information.
Storing redundant information and making for bloated indexes which the planner
might choose to skip if I understand correctly. Good to know.
> > ...
> > Still fast enough ... but I was wondering why the planner would not use the
> > new index and instead fall back on the "report_drugs_drug_idx" single
> > column index.
>
> I'm actually a bit suspicious of those numbers; those are different
> drug id's than those from the first explain too.
You are completely right, this was the wrong plan ... the correct plan is:
Sort (cost=31757.71..31765.48 rows=3107 width=76) (actual
time=722.348..722.461 rows=4076 loops=1)
Sort Key: r.created
Sort Method: quicksort Memory: 495kB
-> Nested Loop (cost=26041.17..31577.48 rows=3107 width=76) (actual
time=139.407..721.090 rows=4076 loops=1)
-> Merge Join (cost=26040.61..27143.53 rows=3107 width=64) (actual
time=139.396..170.446 rows=4076 loops=1)
Merge Cond: (d.rid = a.rid)
-> Sort (cost=16332.08..16671.61 rows=135810 width=35) (actual
time=108.866..119.143 rows=135370 loops=1)
Sort Key: d.rid
Sort Method: quicksort Memory: 13992kB
-> Index Scan using report_drugs_drug_idx on report_drugs
d (cost=0.44..4753.44 rows=135810 width=35) (actual time=0.038..53.758
rows=135532 loops=1)
Index Cond: (drug = ANY
('{9557,17848,17880,18223}'::integer[]))
-> Sort (cost=9708.53..9904.94 rows=78565 width=29) (actual
time=30.517..34.876 rows=77163 loops=1)
Sort Key: a.rid
Sort Method: quicksort Memory: 6702kB
-> Index Only Scan using
report_adverses_adverse_rid_severity_recovery_idx on report_adverses a
(cost=0.56..3320.57 rows=78565 width=29) (actual time=1.005..16.135 rows=76972
loops=1)
Index Cond: (adverse = ANY ('{"back pain -
nonspecific","nonspecific back pain","back pain"}'::text[]))
Heap Fetches: 0
-> Index Only Scan using reports_id_age_gender_created_idx on reports
r (cost=0.56..1.42 rows=1 width=20) (actual time=0.134..0.135 rows=1
loops=4076)
Index Cond: (id = d.rid)
Heap Fetches: 0
Planning time: 29.415 ms
Execution time: 723.545 ms
And this is now indeed much closer to the ~660 ms from before, it doesn't make
much of a difference after all.
Cheers,
Tim
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general