On Mon, 2005-05-16 at 14:35 -0400, Tom Lane wrote:
> "Jeffrey W. Baker" <[EMAIL PROTECTED]> writes:
> > On Mon, 2005-05-16 at 09:53 -0400, Tom Lane wrote:
> >> This is a fallacy, and I think your concern is largely mistaken.  Have
> >> you experimented with the cases you are worried about?
> 
> > Perhaps I have not stated the problem clearly.  Believe me, I have
> > experimented extensively with this problem.
> 
> Sorry, perhaps I wasn't clear: have you experimented *with CVS tip*?
> The current code is certainly capable of choosing either seqscan,
> bitmap scan, or traditional index scan depending on the given query.
> For example,

[...]

> I think the work that's most needed at the moment is to test the
> bitmap-scan cost model to see if it has much to do with reality...

The cost model seems to work pretty darn well, as a matter of fact.  

This table is in-order by id, in-order by date, and randomly ordered by
"random".

scratch=# \d test
     Table "public.test"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | 
 date   | date    | 
 random | integer | 
Indexes:
    "test_pk" UNIQUE, btree (id)
    "test_date_idx" btree (date)
    "test_rand_idx" btree (random)

scratch=# select count(1) from test;
  count   
----------
 10000000

The size of the tables and indexes is about 1G, or double physical
memory.  I tested by selecting on the random column.  For 100 random
values, I selected the row that matches exactly, then in ranges of 1000,
10000, 100000, and 1000000.  These touch roughly 5, 50, 500, and 5000
tuples, respectively.  The test is repeated for index scan, bitmap scan,
and sequential scan.

Example query:

        select count(1) 
          from test  
         where random >= 1429076987 
           and random < 1429076987 + 10000;

                                                          QUERY PLAN            
                                              
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=171.31..171.31 rows=1 width=0) (actual time=0.996..1.000 
rows=1 loops=1)
   ->  Bitmap Heap Scan on test  (cost=2.26..171.20 rows=43 width=0) (actual 
time=0.145..0.829 rows=42 loops=1)
         Recheck Cond: ((random >= 1429076987) AND (random < 1429086987))
         ->  Bitmap Index Scan on test_rand_idx  (cost=0.00..2.26 rows=43 
width=0) (actual time=0.063..0.063 rows=42 loops=1)
               Index Cond: ((random >= 1429076987) AND (random < 1429086987))
 Total runtime: 1.114 ms

100 queries returning |  1  |  5  |  50  |  500  |  5000  |
----------------------+-----+-----+------+-------+--------+
bitmap scan           |  1s |  2s |  13s | 1m41s | 11m16s |
index scan            |  1s |  1s |  12s | 2m6s  | 24m19s |
----------------------+-----+-----+------+-------+--------+
sequential scan       |             28m20s                | 

The planner uses index scan for the first two columns, and chooses
bitmap scan for the rest.  This would seem to be a reasonable decision,
given the results.  The only real problem with the cost model in further
testing is that it doesn't switch to seq scan quickly enough.  If bitmap
scan is disabled, the planner will pick index scan even in cases when
sequential scan is 10x faster:

scratch=# set enable_bitmapscan to off;
SET
scratch=# explain analyze select count(1) from test where random >= 1429076987  
and  random < 1429076987 + 10000000;
                                                                 QUERY PLAN     
                                                      
      
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=170142.03..170142.03 rows=1 width=0) (actual 
time=177419.182..177419.185 rows=1 loops=1)
   ->  Index Scan using test_rand_idx on test  (cost=0.00..170034.11 rows=43167 
width=0) (actual time=0.035..177255.696 rows=46764 loops=1)
         Index Cond: ((random >= 1429076987) AND (random < 1439076987))
 Total runtime: 177419.302 ms
(4 rows)

scratch=# set enable_indexscan to off;
SET
scratch=# explain analyze select count(1) from test where random >= 1429076987  
and  random < 1429076987 + 10000000;
                                                      QUERY PLAN                
                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=204165.55..204165.55 rows=1 width=0) (actual 
time=12334.042..12334.045 rows=1 loops=1)
   ->  Seq Scan on test  (cost=0.00..204057.62 rows=43167 width=0) (actual 
time=17.436..12174.150 rows=46764 loops=1)
         Filter: ((random >= 1429076987) AND (random < 1439076987))
 Total runtime: 12334.156 ms
(4 rows)

Obviously in this case sequential scan was (would have been) a huge win.
Incrementing random_page_cost from 4 (the default) to 5 causes the
planner to make a better decision.

-jwb



---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Reply via email to