Russell Smith <mr-russ 'at' pws.com.au> writes:
> Pavel Stehule wrote:
>> Hello
>>
>> 2008/9/1 David West <[EMAIL PROTECTED]>:
>>
>>> Thanks for your suggestion but the result is the same.
>>>
>>> Here is the explain analyse output from different queries.
>>> Select * from my_table where A is null and B = '21' limit 15
>>>
>>> "Limit (cost=0.00..3.68 rows=15 width=128) (actual
>>> time=85837.043..85896.140 rows=15 loops=1)"
>>> " -> Seq Scan on my_table this_ (cost=0.00..258789.88 rows=1055580
>>> width=128) (actual time=85837.038..85896.091 rows=15 loops=1)"
>>> " Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>> "Total runtime: 85896.214 ms"
>>>
>>>
> [snip]
>
> Further to Pavel's comments;
>
> (actual time=85837.038..85896.091 rows=15 loops=1)
>
> That's 85 seconds on a sequence scan to return the first tuple. The table is
> not bloated by any chance is it?
Wouldn't this be e.g. normal if the distribution of values would
be uneven, e.g. A IS NULL AND B = '21' not near the beginning of
the table data?
By the way, my newbie eyes on "pg_stats" seem to tell me that PG
doesn't collect/use statistics about the distribution of the
data, am I wrong? E.g. in that situation, when a few A IS NULL
AND B = '21' rows move from the beginning to the end of the table
data, a seqscan becomes a totally different story.. (the
correlation changes, but may not change a lot if only a few rows
move).
However, I cannot reproduce a similar situation to David's.
gc=# create table foo ( bar int, baz text );
CREATE TABLE
gc=# insert into foo ( select generate_series(0, 10000000) / 1000000, case when
random() < 0.05 then 'Today Alcatel-Lucent has announced that P******* C**** is
appointed non-executive Chairman and B** V******** is appointed Chief Executive
Officer.' else null end );
INSERT 0 10000001
gc=# create index foobar on foo(bar);
CREATE INDEX
gc=# create index foobaz on foo(baz);
CREATE INDEX
gc=# explain select * from foo where baz is null and bar = '8';
QUERY PLAN
---------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=1297.96..1783.17 rows=250 width=36)
Recheck Cond: ((bar = 8) AND (baz IS NULL))
-> BitmapAnd (cost=1297.96..1297.96 rows=250 width=0)
-> Bitmap Index Scan on foobar (cost=0.00..595.69 rows=50000 width=0)
Index Cond: (bar = 8)
-> Bitmap Index Scan on foobaz (cost=0.00..701.90 rows=50000 width=0)
Index Cond: (baz IS NULL)
(7 rows)
gc=# analyze foo;
ANALYZE
gc=# explain select * from foo where baz is null and bar = '8';
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using foobar on foo (cost=0.00..30398.66 rows=1079089 width=154)
Index Cond: (bar = 8)
Filter: (baz IS NULL)
(3 rows)
This is using pg 8.3.1 and:
random_page_cost = 2
effective_cache_size = 256MB
shared_buffers = 384MB
David, is there relevant information you've forgot to tell:
- any other columns in your table?
- is table bloated?
- has table never been analyzed?
- what version of postgresql? what overriden configuration?
--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36
--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance