Hi,
I've noticed that the cost estimates for a lot of my queries are consistently far too high.
Sometimes it's because the row estimates are wrong, like
this:
explain analyze
select logtime from loginlog where
uid='Ymogen::YM_User::3e2c0869c2fdd26d8a74d218d5a6ff585d490560' and result =
'Success' order by logtime desc limit 3;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..221.85 rows=3 width=8) (actual time=0.21..2.39 rows=3 loops=1)
  ->  Index Scan Backward using loginlog_logtime_idx on loginlog  (cost=0.00..12846.69 rows=174 width=8) (actual time=0.20..2.37 rows=4 loops=1)
Total runtime: 2.48 msec
The row estimate
here is off by a factor of 50, but the cost estimate is off by a factor
of 5000.
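Spelling that arithmetic out (I'm treating one cost unit as roughly one millisecond for the comparison, which is admittedly a crude calibration on my part):

```python
# Numbers taken from the index scan node in the EXPLAIN ANALYZE output above.
estimated_rows, actual_rows = 174, 4
estimated_cost, actual_ms = 12846.69, 2.37

row_error = estimated_rows / actual_rows    # ~43x
cost_error = estimated_cost / actual_ms     # ~5400x, if 1 cost unit ~ 1 ms

print(f"rows off by {row_error:.0f}x, cost off by {cost_error:.0f}x")
```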
Sometimes the row
estimates are good, but the costs are still too high:
explain analyze
select u.email from ym_user u join mobilepm m on (m.ownerid = u._id) where
m.status = 'Validated' and m.network = 'TMOBILEUK';
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..2569.13 rows=441 width=145) (actual time=1.93..248.57 rows=553 loops=1)
  ->  Seq Scan on mobilepm m  (cost=0.00..795.11 rows=441 width=58) (actual time=1.69..132.83 rows=553 loops=1)
  ->  Index Scan using ym_user_id_idx on ym_user u  (cost=0.00..4.01 rows=1 width=87) (actual time=0.19..0.20 rows=1 loops=553)
Total runtime: 249.47 msec
loginlog has 180000
rows, mobilepm has 12000, ym_user has 50000, and they've all been analyzed prior
to running the query.
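For what it's worth, working backwards from the seq scan cost using my reading of the planner's formula (cost = pages + tuples * cpu_tuple_cost + tuples * quals * cpu_operator_cost; the page count below is inferred from that, not measured):

```python
# Back out the page count the planner must be assuming for mobilepm,
# given its estimated seq scan cost of 795.11 over 12000 rows.
cpu_tuple_cost = 0.01
cpu_operator_cost = 0.0025
tuples = 12000
quals = 2          # status = '...' AND network = '...'
total_cost = 795.11

pages = total_cost - tuples * cpu_tuple_cost - tuples * quals * cpu_operator_cost
print(f"implied relpages: {pages:.0f}")   # ~615 pages
```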
The server is a
Quad PIII 700 Xeon/1MB cache, 3GB RAM, hardware RAID 10 on two
SCSI channels with 128MB write-back cache.
I've lowered the
random_page_cost to 2 to reflect the decent disk IO, but I suppose the fact that
the DB & indexes are essentially all cached in RAM might also be affecting
the results, although effective_cache_size is set to a realistic 262144
(2GB). Those planner params in full:
#effective_cache_size = 1000    # default in 8k pages
#random_page_cost = 4
#cpu_tuple_cost = 0.01
#cpu_index_tuple_cost = 0.001
#cpu_operator_cost = 0.0025
effective_cache_size = 262144   # 2GB of FS cache
random_page_cost = 2
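One habit that's saved me from bad postgresql.conf changes: trying candidate values per-session with SET before making them permanent (the values below are purely illustrative, not recommendations):

```sql
-- Experiment in one session before touching postgresql.conf:
SET random_page_cost = 1.5;
SET cpu_tuple_cost = 0.05;
EXPLAIN ANALYZE SELECT ...;   -- re-run the plans that worry you
RESET random_page_cost;
RESET cpu_tuple_cost;
```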
For now the planner seems to be making the right choices, but
my concern is that at some point the planner might start making some bad
decisions, especially on more complex queries. Should I bother tweaking
the planner costs more, and if so which ones? Am I fretting over
nothing?
Cheers
Matt
Matt Clark
Ymogen Ltd
[EMAIL PROTECTED]
corp.ymogen.net