On Wed, 2005-03-09 at 23:06 +0100, Thomas Mueller wrote:
> Geo Carncross wrote:
> > On Wed, 2005-03-09 at 19:10 +0100, Thomas Mueller wrote:
> >
> >>No index will ever solve the problem for PostgreSQL. COUNT() can't use
> >>any index, because indices know nothing about transactions.
> >>That is why every record that is counted using the index has to be
> >>checked to see if it's involved in a transaction (otherwise count()
> >>could return too high or low numbers!) - then it's cheaper to do a full
> >>table scan and that is what PostgreSQL does.
> >
> > Is EXPLAIN really wrong here? I see PG generating the same query plan
> > for COUNT(*) as it appears to for every aggregate- and in each case it
> > says:
> >
> > Aggregate (cost=123.57..123.57 rows=1 width=4)
> > -> Index Scan using ...
>
> No I was wrong here I'm sorry. What I said is true for COUNT() without
> restricting WHERE.
Well, I would expect aggregates to walk the entire table- you're saying
that COUNT() isn't optimized _further_ than that because the
availability of rows isn't known. :)
That's fine.
> I guess the following happens with your query: in my example it returns
> only 600 of about 80000 records, that's a selectivity of 0.75%. So it's
> cheaper to do a index scan and check the records afterwards than to do a
> full table scan.
> But if I'm not completely wrong the 600 records have to be read, it's
> not enough to read the index (because of missing transaction information).
You're absolutely right about this.
-> Index Scan using dbmail_messages_8
Index Cond: (mailbox_idnr = 5::bigint)
Filter: ((status = 0::smallint) OR (status = 1::smallint))
This is the relevant part of the query-plan. It saying Pg only uses one
Index Cond: _even though_ status also exists in that very same index.
In thinking about this more, if exists (count*) is true, we STILL have
to get the id numbers for each message. Perhaps the simpler approach of:
SELECT message_idnr,status,seen_flag,recent_flag FROM dbmail_messages
WHERE mailbox_idnr='5';
would be appropriate - after all, if exists (see db.c) we still download
all the messages.
Or was this code going to be deferred?
> > All these counts can be translated into SUM() - will it act different
> > (even though the query plan is the same) when using SUM?
>
> No, the query plan is what will be executed if it's the same it
> shouldn't make a difference - at least for you :)
> On two machines two queries can lead to different query plans depending
> on the expected costs (IO p.e.).
Only brought up because it sounded like you were saying COUNT() was
special on Pg in that it ignores indexes.
--
Internet Connection High Quality Web Hosting
http://www.internetconnection.net/