Russell Smith <mr-russ 'at' pws.com.au> writes:

> Pavel Stehule wrote:
>> Hello
>>
>> 2008/9/1 David West <[EMAIL PROTECTED]>:
>>   
>>> Thanks for your suggestion but the result is the same.
>>>
>>> Here is the explain analyse output from different queries.
>>> Select * from my_table where A is null and B = '21' limit 15
>>>
>>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual 
>>> time=85837.043..85896.140 rows=15 loops=1)"
>>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 
>>> width=128) (actual time=85837.038..85896.091 rows=15 loops=1)"
>>> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>> "Total runtime: 85896.214 ms"
>>>
>>>     
> [snip]
>
> Further to Pavel's comments;
>
> (actual time=85837.038..85896.091 rows=15 loops=1)
>
> That's 85 seconds on a sequence scan to return the first tuple.  The table is 
> not bloated by any chance is it?

Wouldn't this be normal if the distribution of values is uneven,
e.g. if the rows matching A IS NULL AND B = '21' are not near the
beginning of the table data?

By the way, my newbie eyes on "pg_stats" seem to tell me that PG
doesn't collect/use statistics about the physical distribution of
the data, am I wrong? E.g. in that situation, if a few A IS NULL
AND B = '21' rows move from the beginning to the end of the table
data, a seq scan becomes a totally different story (the
correlation changes, but may not change much if only a few rows
move).
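
(For reference, here is roughly what I was looking at in pg_stats --
the table/column names are only placeholders for David's real ones:)

gc=# select attname, null_frac, n_distinct, correlation
gc-#   from pg_stats
gc-#  where tablename = 'my_table' and attname in ('a', 'b');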

However, I cannot reproduce a situation similar to David's.

gc=# create table foo ( bar int, baz text );
CREATE TABLE
gc=# insert into foo ( select generate_series(0, 10000000) / 1000000, case when 
random() < 0.05 then 'Today Alcatel-Lucent has announced that P******* C**** is 
appointed non-executive Chairman and B** V******** is appointed Chief Executive 
Officer.' else null end );
INSERT 0 10000001
gc=# create index foobar on foo(bar);
CREATE INDEX
gc=# create index foobaz on foo(baz);
CREATE INDEX
gc=# explain select * from foo where baz is null and bar = '8';
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=1297.96..1783.17 rows=250 width=36)
   Recheck Cond: ((bar = 8) AND (baz IS NULL))
   ->  BitmapAnd  (cost=1297.96..1297.96 rows=250 width=0)
         ->  Bitmap Index Scan on foobar  (cost=0.00..595.69 rows=50000 width=0)
               Index Cond: (bar = 8)
         ->  Bitmap Index Scan on foobaz  (cost=0.00..701.90 rows=50000 width=0)
               Index Cond: (baz IS NULL)
(7 rows)

gc=# analyze foo;
ANALYZE
gc=# explain select * from foo where baz is null and bar = '8';
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Index Scan using foobar on foo  (cost=0.00..30398.66 rows=1079089 width=154)
   Index Cond: (bar = 8)
   Filter: (baz IS NULL)
(3 rows)

This is using pg 8.3.1 and:

random_page_cost = 2
effective_cache_size = 256MB
shared_buffers = 384MB

David, is there relevant information you've forgotten to mention:

- any other columns in your table?
- is the table bloated?
- has the table ever been analyzed?
- what version of postgresql? what overridden configuration?
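
For the bloat/analyze questions, something like this would already
tell a lot (this uses 8.3's pg_stat_user_tables columns, and the
table name is again just a placeholder):

gc=# select relname, n_live_tup, n_dead_tup, last_analyze, last_autoanalyze
gc-#   from pg_stat_user_tables
gc-#  where relname = 'my_table';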

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

