Re: [GENERAL] Slow join over three tables

Tim van der Linden Thu, 28 Apr 2016 17:20:03 -0700

On Thu, 28 Apr 2016 16:38:53 +0200
Alban Hertroys <[email protected]> wrote:


Hi Alban

First off ... thanks you for your time on this so far and giving me the 
educational smacks on the head :) I appreciate this a lot. 

> You're doing ~9 times as many index lookups. A slowdown of 6x of this
> part of the query seems rather reasonable.

Again, very true. But why 9 times as much?
 
> >  Planning time: 16.438 ms
> >  Execution time: 663.583 ms
> >
> > A total of 660 ms instead of the previous 120 ms. The amount of rows 
> > returned are 4076, instead of the 448 rows of the previous query. Could 
> > this be the cause of the slow down? Or is it the fact that the integers in 
> > the ANY clause are further apart from one another and thus make for longer 
> > index searches?
> 
> Technically probably both, but the index node distance (assuming a
> btree index) should be insignificant.

It is indeed a b-tree index.

> The sorting of found candidate rows on rid probably contributes the
> most. I forgot how quicksort scales to the number of items to sort,
> probably something like 2log(n) or something like that.

Looked this up and you are right, a quicksort (in C) is of nlog(n) complexity 
... and if I'm right this is "rather" linear. Less sorting is thus the message 
:)

> That said, you didn't happen to perform these tests (assuming >1)
> during a busier period of the database server? I ask because later on
> you seem to be getting fast results again after some more index
> changes and I don't see cause for the difference.

No, these test are on a local development machine with nothing else going on 
but the database cluster.

> > 2. Adding extra columns ignores indexes
> > ...
> > - a.recovery
> > - a.severity
> > - d.reason
> > - d.effectiveness
> > - d.duration
> 
> I have to admit that I'm a bit behind with the current state of the
> art of PostgreSQL, but last time I checked, the database needed to
> look at each row in the result-set for transaction visibility
> information. I recall that there was (at some point) much discussion
> whether that visibility information could be added to indexes and that
> there were strong arguments against doing so. Hence, I doubt that
> those new index-only scans skip that step.

I see.

> Unless I'm wrong there, adding non-queried fields to the index is only
> going to affect your performance adversely. Relevant fields for
> indices are those used in joins, those regularly used in conditions
> (where-clauses) and those that are regularly sorted over.
> 
> Other fields are available from the candidate rows of the result set.
> Putting them in the index just results in storing redundant
> information.

Storing redundant information and making for bloated indexes which the planner 
might choose to skip if I understand correctly. Good to know.

> > ...
> > Still fast enough ... but I was wondering why the planner would not use the 
> > new index and instead fall back on the "report_drugs_drug_idx" single 
> > column index.
> 
> I'm actually a bit suspicious of those numbers; those are different
> drug id's than those from the first explain too.

You are completely right, this was the wrong plan ... the correct plan is:

 Sort  (cost=31757.71..31765.48 rows=3107 width=76) (actual 
time=722.348..722.461 rows=4076 loops=1)
   Sort Key: r.created
   Sort Method: quicksort  Memory: 495kB
   ->  Nested Loop  (cost=26041.17..31577.48 rows=3107 width=76) (actual 
time=139.407..721.090 rows=4076 loops=1)
         ->  Merge Join  (cost=26040.61..27143.53 rows=3107 width=64) (actual 
time=139.396..170.446 rows=4076 loops=1)
               Merge Cond: (d.rid = a.rid)
               ->  Sort  (cost=16332.08..16671.61 rows=135810 width=35) (actual 
time=108.866..119.143 rows=135370 loops=1)
                     Sort Key: d.rid
                     Sort Method: quicksort  Memory: 13992kB
                     ->  Index Scan using report_drugs_drug_idx on report_drugs 
d  (cost=0.44..4753.44 rows=135810 width=35) (actual time=0.038..53.758 
rows=135532 loops=1)
                           Index Cond: (drug = ANY 
('{9557,17848,17880,18223}'::integer[]))
               ->  Sort  (cost=9708.53..9904.94 rows=78565 width=29) (actual 
time=30.517..34.876 rows=77163 loops=1)
                     Sort Key: a.rid
                     Sort Method: quicksort  Memory: 6702kB
                     ->  Index Only Scan using 
report_adverses_adverse_rid_severity_recovery_idx on report_adverses a  
(cost=0.56..3320.57 rows=78565 width=29) (actual time=1.005..16.135 rows=76972 
loops=1)
                           Index Cond: (adverse = ANY ('{"back pain - 
nonspecific","nonspecific back pain","back pain"}'::text[]))
                           Heap Fetches: 0
         ->  Index Only Scan using reports_id_age_gender_created_idx on reports 
r  (cost=0.56..1.42 rows=1 width=20) (actual time=0.134..0.135 rows=1 
loops=4076)
               Index Cond: (id = d.rid)
               Heap Fetches: 0
 Planning time: 29.415 ms
 Execution time: 723.545 ms

And this is now indeed much closer to the ~660 ms from before, it doesn't make 
much of a difference after all.

Cheers,
Tim


-- 
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Slow join over three tables

Reply via email to