Dear all,
Tom's comments on my previous posting encouraged me to think some more about
this... and now I believe I have got to the heart of what I was attempting to
get across before. I have a fresh and hopefully clearer example.
Ok, let's start with a small table called 'dim0' that has a unique key called
'd0key' ( 10000 unique values ). Add to this a large table called 'fact2',
which contains 1000 of these 'd0key' values, with 3000 duplicates of each
value uniformly distributed throughout it ( a total of 3000000 rows ).
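For anyone who wants to play along without building the full 3000000-row
table, the shape of the data can be mocked up like this ( a sketch in Python
using SQLite rather than PostgreSQL; I am assuming here that the fact keys
are simply 1..1000, and the sizes are parameters so a scaled-down copy is
cheap to build ):

```python
import sqlite3

def build_tables(conn, n_dim=10000, n_fact_keys=1000, dups=3000):
    # dim0: n_dim unique d0key values; fact2: the first n_fact_keys of
    # them, each repeated dups times, uniformly interleaved.
    conn.execute("CREATE TABLE dim0 (d0key INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE fact2 (d0key INTEGER, val INTEGER)")
    conn.executemany("INSERT INTO dim0 VALUES (?)",
                     ((k,) for k in range(1, n_dim + 1)))
    conn.executemany("INSERT INTO fact2 VALUES (?, ?)",
                     ((k, i) for i in range(dups)
                             for k in range(1, n_fact_keys + 1)))
    conn.commit()

conn = sqlite3.connect(":memory:")
build_tables(conn, n_dim=100, n_fact_keys=10, dups=30)  # scaled down

# each key in the range should aggregate to exactly `dups` rows
groups = conn.execute(
    "SELECT f.d0key, count(f.val) FROM fact2 f "
    "WHERE f.d0key BETWEEN 3 AND 5 GROUP BY f.d0key").fetchall()
```

( SQLite of course says nothing about the PostgreSQL planner - this is only
to make the data set concrete. )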
Consider the query :
SELECT
f.d0key,
count(f.val)
FROM fact2 f
WHERE f.d0key BETWEEN 270 AND 350
GROUP BY f.d0key
which has execution plan :
Aggregate (cost=0.00..102500.80 rows=2592 width=8)
-> Group (cost=0.00..102436.00 rows=25920 width=8)
-> Index Scan using fact2_pk on fact2 f (cost=0.00..102371.20
rows=25920 width=8)
If we use 351 instead of 350, we get a sequential scan.
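As a sanity check on the numbers the planner is working with here ( all
figures taken straight from the EXPLAIN output above ):

```python
# Figures from the 7.2dev EXPLAIN output above (BETWEEN 270 AND 350).
total_rows = 3_000_000   # rows in fact2
est_rows = 25_920        # planner's row estimate for the range
selectivity = est_rows / total_rows
print(f"estimated selectivity: {selectivity:.4%}")
```

So for the plain query the planner gives up on the index somewhere just
under 1% estimated selectivity - a figure worth keeping in mind for the
joined variants that follow.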
Now examine a similar query, but with 'dim0' joined :
SELECT
f.d0key,
count(f.val)
FROM dim0 d0,
fact2 f
WHERE d0.d0key = f.d0key
AND f.d0key BETWEEN 270 AND 350
GROUP BY f.d0key
this has plan :
Aggregate (cost=0.00..103127.60 rows=2592 width=12)
-> Group (cost=0.00..103062.80 rows=25920 width=12)
-> Merge Join (cost=0.00..102998.00 rows=25920 width=12)
-> Index Scan using dim0_pk on dim0 d0 (cost=0.00..213.00
rows=10000 width=4)
-> Index Scan using fact2_pk on fact2 f (cost=0.00..102371.20
rows=25920 width=8)
No surprises there.
(If we use 351, again a sequential scan is used instead.)
So far this is all as one would expect. However, suppose we substitute
'd0.d0key' for 'f.d0key' in the 'AND' clause to obtain :
SELECT
f.d0key,
count(f.val)
FROM dim0 d0,
fact2 f
WHERE d0.d0key = f.d0key
AND d0.d0key BETWEEN 270 AND 350
GROUP BY f.d0key
Suddenly the plan is :
Aggregate (cost=103530.27..104293.15 rows=2624 width=12)
-> Group (cost=103530.27..104227.54 rows=26244 width=12)
-> Merge Join (cost=103530.27..104161.93 rows=26244 width=12)
-> Index Scan using dim0_pk on dim0 d0 (cost=0.00..213.00
rows=10000 width=4)
-> Sort (cost=103530.27..103530.27 rows=26244 width=8)
-> Seq Scan on fact2 f (cost=0.00..101604.00 rows=26244
width=8)
Now this is interesting: I would have expected an index scan still to be
used... This behaviour was what I was seeing ( in disguised form ) in the
queries of the previous posting.
( In fact, to encourage an index scan, the 350 has to be changed down to 313 )
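Note that given the equijoin d0.d0key = f.d0key, the two spellings of the
range restriction are semantically equivalent, so one might argue the plans
should be identical too. A scaled-down mock-up confirms the results match
( a sketch in Python using SQLite; table sizes shrunk and fact keys assumed
to be 1..10 here ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim0 (d0key INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE fact2 (d0key INTEGER, val INTEGER)")
conn.executemany("INSERT INTO dim0 VALUES (?)",
                 ((k,) for k in range(1, 101)))
conn.executemany("INSERT INTO fact2 VALUES (?, ?)",
                 ((k, i) for i in range(30) for k in range(1, 11)))

query = """SELECT f.d0key, count(f.val)
           FROM dim0 d0, fact2 f
           WHERE d0.d0key = f.d0key AND {col} BETWEEN 3 AND 5
           GROUP BY f.d0key"""
# restricting on either side of the equijoin gives identical rows;
# only the plan PostgreSQL chooses differs
on_fact = sorted(conn.execute(query.format(col="f.d0key")))
on_dim = sorted(conn.execute(query.format(col="d0.d0key")))
```

Both spellings return the same groups; the only thing that differs in
PostgreSQL is which plan the optimizer picks.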
I wondered how 7.1.x behaves when faced with this situation... a build of an
extra 7.1.3 database reveals that the corresponding plan for this query is as
follows (note that for 7.1.3 the magic number for the index -> sequential
scan switch is 369 instead of 350, but bear with me) :
Aggregate (cost=0.00..118850.17 rows=2970 width=12)
-> Group (cost=0.00..118775.91 rows=29703 width=12)
-> Nested Loop (cost=0.00..118701.66 rows=29703 width=12)
-> Index Scan using dim0_pk on dim0 d0 (cost=0.00..67.99
rows=99 width=4)
-> Index Scan using fact2_pk on fact2 f (cost=0.00..1194.45
rows=300 width=8)
So this version is using an index scan for this query ( in fact it will keep
using one until after d0key=445 - so in some sense the opposite behaviour to
7.2dev is being exhibited ).
Now the obvious question to ask here is "why are you griping about using a
seq scan...?". Timing the queries reveals that the index scan is considerably
faster : specifically 10s against 60s. Additionally, 7.1.3 also performs the
above query in 10s - and even "out" at the "extreme end", using
d0.d0key=445, the elapsed time is just 15s.
Why am I pointing this out? Well, I hope that "field testing" the optimizer
will occasionally provide food for thought!
regards
Mark
P.S.: I am using 7.2dev (2001-08-17) and all parameters are default apart
from shared_buffers and sort_mem.