On Wed, 2005-03-09 at 23:06 +0100, Thomas Mueller wrote:
> Geo Carncross wrote:
> > On Wed, 2005-03-09 at 19:10 +0100, Thomas Mueller wrote:
> > 
> >>No index will ever solve the problem for PostgreSQL. COUNT() can't use
> >>any index, because indices know nothing about transactions.
> >>That is why every record that is counted using the index has to be
> >>checked to see if it's involved in a transaction (otherwise count()
> >>could return too high or low numbers!) - then it's cheaper to do a full
> >>table scan and that is what PostgreSQL does.
> > 
> > Is EXPLAIN really wrong here? I see PG generating the same query plan
> > for COUNT(*) as it appears to for every aggregate- and in each case it
> > says:
> > 
> >  Aggregate  (cost=123.57..123.57 rows=1 width=4)
> >    ->  Index Scan using ...
> 
> No I was wrong here I'm sorry. What I said is true for COUNT() without
> restricting WHERE.

Well, I would expect aggregates to walk the entire table- you're saying
that COUNT() isn't optimized _further_ than that because the
availability of rows isn't known. :)

That's fine.

> I guess the following happens with your query: in my example it returns
> only 600 of about 80000 records, that's a selectivity of 0.75%. So it's
> cheaper to do a index scan and check the records afterwards than to do a
> full table scan.
> But if I'm not completely wrong the 600 records have to be read, it's
> not enough to read the index (because of missing transaction information).

You're absolutely right about this.

->  Index Scan using dbmail_messages_8
        Index Cond: (mailbox_idnr = 5::bigint)
        Filter: ((status = 0::smallint) OR (status = 1::smallint))

This is the relevant part of the query-plan. It saying Pg only uses one
Index Cond: _even though_ status also exists in that very same index.


In thinking about this more, if exists (count*) is true, we STILL have
to get the id numbers for each message. Perhaps the simpler approach of:

SELECT message_idnr,status,seen_flag,recent_flag FROM dbmail_messages
                WHERE mailbox_idnr='5';

would be appropriate - after all, if exists (see db.c) we still download
all the messages.

Or was this code going to be deferred?


> > All these counts can be translated into SUM() - will it act different
> > (even though the query plan is the same) when using SUM?
> 
> No, the query plan is what will be executed if it's the same it
> shouldn't make a difference - at least for you :)
> On two machines two queries can lead to different query plans depending
> on the expected costs (IO p.e.).

Only brought up because it sounded like you were saying COUNT() was
special on Pg in that it ignores indexes.


-- 
Internet Connection High Quality Web Hosting
http://www.internetconnection.net/

Reply via email to