The planner is choosing a sequential scan for my query.  I am trying
to understand why since it is clearly not the fastest choice, and 
what the proper tuning dial is to adjust here.  Here's the query:

        select count(1) from measurement where measure_id = 98;

TIA.  Here are the details:

% psql -c "\d measurement"
                                     Table "public.measurement"
   Column   |           Type           |                          Modifiers            
              
------------+--------------------------+-------------------------------------------------------------
 id         | integer                  | not null default 
nextval('public.measurement_id_seq'::text)
 host_id    | integer                  | not null
 measure_id | integer                  | not null
 value      | double precision         | not null
 duration   | double precision         | not null default 0.0
 moment     | timestamp with time zone | not null default ('now'::text)::timestamp(6) 
with time zone
 source_id  | integer                  | not null default 1
Indexes: measurement_duration_idx btree (duration),
         measurement_host_idx btree (host_id),
         measurement_measure_idx btree (measure_id),
         measurement_moment_idx btree (moment),
         measurement_source_idx btree (source_id),
         measurement_value_idx btree (value)
Foreign Key constraints: $2 FOREIGN KEY (measure_id) REFERENCES measure(id) ON UPDATE 
NO ACTION ON DELETE NO ACTION,
                         $1 FOREIGN KEY (source_id) REFERENCES measurement_source(id) 
ON UPDATE NO ACTION ON DELETE NO ACTION

% psql -c "analyze measurement"
ANALYZE

% psql -c "select count(1) from measurement"
  count  
---------
 1800866
(1 row)


% psql -c "select count(1) from measurement where measure_id = 98"
 count 
-------
 38862
(1 row)

% time psql -c "set enable_seqscan=no; explain analyze select count(1) from 
measurement where measure_id = 98"
                                                                       QUERY PLAN      
                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=215208.66..215208.66 rows=1 width=0) (actual time=904.58..904.58 
rows=1 loops=1)
   ->  Index Scan using measurement_measure_idx on measurement  (cost=0.00..215062.64 
rows=58408 width=0) (actual time=0.17..843.78 rows=38866 loops=1)
         Index Cond: (measure_id = 98)
 Total runtime: 904.77 msec
(4 rows)


real    0m1.298s
user    0m0.010s
sys     0m0.000s

% time psql -c "explain analyze select count(1) from measurement where measure_id = 98"
                                                       QUERY PLAN                      
                                  
-------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=97857.09..97857.09 rows=1 width=0) (actual time=2493.90..2493.90 
rows=1 loops=1)
   ->  Seq Scan on measurement  (cost=0.00..97711.07 rows=58408 width=0) (actual 
time=12.94..2430.08 rows=38866 loops=1)
         Filter: (measure_id = 98)
 Total runtime: 2494.11 msec
(4 rows)


real    0m2.885s
user    0m0.000s
sys     0m0.000s


This seems to be saying the planner thinks its less expensive to do the
sequential scan, but why?

Including pg_stats data in case it is relevant here.

% psql -c "select * from pg_stats where tablename = 'measurement' and attname = 
'measure_id'"
 schemaname |  tablename  |  attname   | null_frac | avg_width | n_distinct |          
   most_common_vals              |                                most_common_freqs    
                             |            histogram_bounds             | correlation 
------------+-------------+------------+-----------+-----------+------------+-------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------+-------------
 public     | measurement | measure_id |         0 |         4 |         52 | 
{104,108,113,124,106,110,101,107,112,109} | 
{0.084,0.0556667,0.052,0.05,0.048,0.0473333,0.0383333,0.0363333,0.034,0.0326667} | 
{23,36,39,43,85,89,100,111,120,122,128} |    0.232087
(1 row)

% psql -c "select name, setting from pg_settings where name like 'random%'"
       name       | setting 
------------------+---------
 random_page_cost | 4
(1 row)




---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Reply via email to