Dear all,
Tom's comments on my previous posting encouraged me to think some more about
this... and now I believe I have got to the heart of what I was attempting to
get across before. I have a fresh and hopefully clearer example.
Ok, let's start with a small table called 'dim0' that has a unique key called
'd0key' ( 10000 unique values ). Add to this a large table called 'fact2',
which contains 1000 of these 'd0key' values, with 3000 duplicates of each
value uniformly distributed throughout it ( a total of 3000000 rows ).
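For anyone who wants to play along without building the full 3000000-row
table, the shape of the data can be mocked up like this ( a sketch in Python
using SQLite rather than PostgreSQL; I am assuming here that the fact keys
are simply 1..1000, and the sizes are parameters so a scaled-down copy is
cheap to build ):

```python
import sqlite3

def build_tables(conn, n_dim=10000, n_fact_keys=1000, dups=3000):
    # dim0: n_dim unique d0key values; fact2: the first n_fact_keys of
    # them, each repeated dups times, uniformly interleaved.
    conn.execute("CREATE TABLE dim0 (d0key INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE fact2 (d0key INTEGER, val INTEGER)")
    conn.executemany("INSERT INTO dim0 VALUES (?)",
                     ((k,) for k in range(1, n_dim + 1)))
    conn.executemany("INSERT INTO fact2 VALUES (?, ?)",
                     ((k, i) for i in range(dups)
                             for k in range(1, n_fact_keys + 1)))
    conn.commit()

conn = sqlite3.connect(":memory:")
build_tables(conn, n_dim=100, n_fact_keys=10, dups=30)  # scaled down

# each key in the range should aggregate to exactly `dups` rows
groups = conn.execute(
    "SELECT f.d0key, count(f.val) FROM fact2 f "
    "WHERE f.d0key BETWEEN 3 AND 5 GROUP BY f.d0key").fetchall()
```

( SQLite of course says nothing about the PostgreSQL planner - this is only
to make the data set concrete. )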
Consider the query :
SELECT
f.d0key,
count(f.val)
FROM fact2 f
WHERE f.d0key BETWEEN 270 AND 350
GROUP BY f.d0key
which has execution plan :
Aggregate (cost=0.00..102500.80 rows=2592 width=8)
-> Group (cost=0.00..102436.00 rows=25920 width=8)
-> Index Scan using fact2_pk on fact2 f (cost=0.00..102371.20
rows=25920 width=8)
If we use 351 instead of 350, we get a sequential scan.
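As a sanity check on the numbers the planner is working with here ( all
figures taken straight from the EXPLAIN output above ):

```python
# Figures from the 7.2dev EXPLAIN output above (BETWEEN 270 AND 350).
total_rows = 3_000_000   # rows in fact2
est_rows = 25_920        # planner's row estimate for the range
selectivity = est_rows / total_rows
print(f"estimated selectivity: {selectivity:.4%}")
```

So for the plain query the planner gives up on the index somewhere just
under 1% estimated selectivity - a figure worth keeping in mind for the
joined variants that follow.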
Now examine a similar query, but with 'dim0' joined :
SELECT
f.d0key,
count(f.val)
FROM dim0 d0,
fact2 f
WHERE d0.d0key = f.d0key
AND f.d0key BETWEEN 270 AND 350
GROUP BY f.d0key
this has plan :
Aggregate (cost=0.00..103127.60 rows=2592 width=12)
-> Group (cost=0.00..103062.80 rows=25920 width=12)
-> Merge Join (cost=0.00..102998.00 rows=25920 width=12)
-> Index Scan using dim0_pk on dim0 d0 (cost=0.00..213.00
rows=10000 width=4)
-> Index Scan using fact2_pk on fact2 f (cost=0.00..102371.20
rows=25920 width=8)
No surprises there.
(If we use 351, again a sequential scan is used instead.)
So far this is all as one would expect. However, suppose we substitute
'd0.d0key' for 'f.d0key' in the 'AND' clause to obtain :
SELECT
f.d0key,
count(f.val)
FROM dim0 d0,
fact2 f
WHERE d0.d0key = f.d0key
AND d0.d0key BETWEEN 270 AND 350
GROUP BY f.d0key
Suddenly the plan is :
Aggregate (cost=103530.27..104293.15 rows=2624 width=12)
-> Group (cost=103530.27..104227.54 rows=26244 width=12)
-> Merge Join (cost=103530.27..104161.93 rows=26244 width=12)
-> Index Scan using dim0_pk on dim0 d0 (cost=0.00..213.00
rows=10000 width=4)
-> Sort (cost=103530.27..103530.27 rows=26244 width=8)
-> Seq Scan on fact2 f (cost=0.00..101604.00 rows=26244
width=8)
Now this is interesting: I would have expected an index scan still to be
used... This behaviour was what I was seeing ( in disguised form ) in the
queries of the previous posting.
( In fact, to encourage an index scan, the 350 has to be changed down to 313 )
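Note that given the equijoin d0.d0key = f.d0key, the two spellings of the
range restriction are semantically equivalent, so one might argue the plans
should be identical too. A scaled-down mock-up confirms the results match
( a sketch in Python using SQLite; table sizes shrunk and fact keys assumed
to be 1..10 here ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim0 (d0key INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE fact2 (d0key INTEGER, val INTEGER)")
conn.executemany("INSERT INTO dim0 VALUES (?)",
                 ((k,) for k in range(1, 101)))
conn.executemany("INSERT INTO fact2 VALUES (?, ?)",
                 ((k, i) for i in range(30) for k in range(1, 11)))

query = """SELECT f.d0key, count(f.val)
           FROM dim0 d0, fact2 f
           WHERE d0.d0key = f.d0key AND {col} BETWEEN 3 AND 5
           GROUP BY f.d0key"""
# restricting on either side of the equijoin gives identical rows;
# only the plan PostgreSQL chooses differs
on_fact = sorted(conn.execute(query.format(col="f.d0key")))
on_dim = sorted(conn.execute(query.format(col="d0.d0key")))
```

Both spellings return the same groups; the only thing that differs in
PostgreSQL is which plan the optimizer picks.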
I wondered how 7.1.x behaves when faced with this situation... a build of an
extra 7.1.3 database reveals that the corresponding plan for this query is as
follows (note that for 7.1.3 the magic number for the index -> sequential
scan switch is 369 instead of 350, but bear with me) :
Aggregate (cost=0.00..118850.17 rows=2970 width=12)
-> Group (cost=0.00..118775.91 rows=29703 width=12)
-> Nested Loop (cost=0.00..118701.66 rows=29703 width=12)
-> Index Scan using dim0_pk on dim0 d0 (cost=0.00..67.99
rows=99 width=4)
-> Index Scan using fact2_pk on fact2 f (cost=0.00..1194.45
rows=300 width=8)
So this version is using an index scan for this query ( in fact it will keep
using one until after d0key=445 - so in some sense the opposite behaviour to
7.2dev is being exhibited ).
Now the obvious question to ask here is "why are you griping about using a
seq scan...?". Timing the queries reveals that the index scan is considerably
faster : specifically 10s against 60s. Additionally, 7.1.3 also performs the
above query in 10s - and even "out" at the "extreme end", using
d0.d0key=445, the elapsed time is just 15s.
Why am I pointing this out? Well, I hope that "field testing" the optimizer
will occasionally provide food for thought!
regards
Mark
P.S.: I am using 7.2dev (2001-08-17) and all parameters are default apart
from shared_buffers and sort_mem.