Pavan Deolasee wrote:
I am reading discussion about improving count(*) performance. I have
also seen a TODO for this.

Many people have suggested TRIGGER based solution to the slow count(*)
problem. I looked at the following link which presents the solution
neatly.

http://www.varlena.com/GeneralBits/120.php
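
For context, the scheme there is roughly the following; the table, column
and function names are made up, and this is only a sketch rather than the
article's exact code:

    -- Single-row side table holding the current count of "mytable".
    CREATE TABLE mytable_count (n bigint NOT NULL);
    INSERT INTO mytable_count VALUES (0);

    -- Row-level trigger function that bumps the counter up or down.
    CREATE FUNCTION mytable_count_fn() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE mytable_count SET n = n + 1;
        ELSE  -- DELETE
            UPDATE mytable_count SET n = n - 1;
        END IF;
        RETURN NULL;  -- the return value of an AFTER trigger is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_count_trig
        AFTER INSERT OR DELETE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE mytable_count_fn();

so that "SELECT n FROM mytable_count" replaces the full-table count(*).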

But how does that really work for SERIALIZABLE transactions? If
two concurrent transactions INSERT/DELETE rows from a table,
the trigger execution of one of the transactions is bound to fail
because of concurrent access. Even for READ COMMITTED transactions,
the trigger execution would wait if the other transaction has executed
the trigger on the same table. Well, I think the READ COMMITTED case
can be handled with DEFERRED triggers, but that may require queuing up
too many triggers if there are many inserts/deletes in a transaction.
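
To illustrate, the same counter maintenance could be declared as a deferred
constraint trigger (reusing the function from the sketch above), so the
counter row is only locked briefly at commit; but it still queues one
trigger event per modified row:

    -- Sketch only: fires at commit time instead of immediately.
    CREATE CONSTRAINT TRIGGER mytable_count_deferred
        AFTER INSERT OR DELETE ON mytable
        DEFERRABLE INITIALLY DEFERRED
        FOR EACH ROW EXECUTE PROCEDURE mytable_count_fn();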

Running a trigger for every insert/delete seems too expensive. I
wonder if we could have a separate "counter" table (as suggested in
the TRIGGER-based solution) and track the total number of tuples
inserted and deleted in a transaction (and all of its committed
subtransactions), then execute a single UPDATE at the end of the
transaction. With HOT, updating the "counter" table should not be a
big pain, since all these updates can potentially be HOT updates.
Also, since the update of the "counter" table happens at commit time,
other transactions inserting into/deleting from the same user table
would only need to wait on the "counter" table tuple for a very short
period.
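
The per-transaction bookkeeping itself would presumably have to live
inside the backend, but the contention argument can be seen at the SQL
level with a slightly different trick: append +1/-1 delta rows instead of
updating one hot counter row, so writers never block each other on the
counter. Again just a sketch with made-up names, not the proposal itself:

    CREATE TABLE mytable_count_deltas (delta integer NOT NULL);

    CREATE FUNCTION mytable_count_delta_fn() RETURNS trigger AS $$
    BEGIN
        INSERT INTO mytable_count_deltas
            VALUES (CASE WHEN TG_OP = 'INSERT' THEN 1 ELSE -1 END);
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_count_delta_trig
        AFTER INSERT OR DELETE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE mytable_count_delta_fn();

    -- Readers sum the deltas; a maintenance job can periodically collapse
    -- the accumulated rows back down to a single row.
    SELECT coalesce(sum(delta), 0) AS row_count FROM mytable_count_deltas;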

This still doesn't solve the serializable transaction problem
though. But I am sure we can figure out some solution for that case
as well if we agree on the general approach.

I am sure this must have been discussed before. So what are the
objections?

If you are talking about automatically doing this for every table, my objection is that the performance impact seems unwarranted against the gain. We would still have every insert or delete updating some counter table, with the only mitigating factor being that the logic would be coded deeper into PostgreSQL, theoretically making it cheaper.

You can already, today, create a trigger on insert that appends to a summary table of some sort whose only purpose is to maintain counts. At its simplest, it is as somebody else suggested: have the other table store only the primary keys, with foreign key references back to the main table for handling deletes and updates. Storing transaction numbers and such might allow the data to be reduced in size (and therefore in elapsed time to scan), but it seems complex.
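
Something along these lines, say (the names and the integer "id" key are
assumptions, and this is only a sketch):

    -- Keys-only shadow table: counting it scans much less data, and the
    -- cascading foreign key keeps it in sync on deletes and key updates.
    CREATE TABLE mytable_keys (
        id integer PRIMARY KEY
            REFERENCES mytable (id) ON UPDATE CASCADE ON DELETE CASCADE
    );

    CREATE FUNCTION mytable_keys_fn() RETURNS trigger AS $$
    BEGIN
        INSERT INTO mytable_keys VALUES (NEW.id);
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_keys_trig
        AFTER INSERT ON mytable
        FOR EACH ROW EXECUTE PROCEDURE mytable_keys_fn();

    SELECT count(*) FROM mytable_keys;  -- same count, much smaller scan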

If this really is a problem that must be solved, I prefer the suggestion from the past of keeping track of live rows per block for a certain transaction range: any transaction that falls within this range can check off such a block quickly with an exact count, and only the exceptional blocks (the ones being changed) need to be scanned to be sure. But it's still pretty complicated to implement correctly and maintain, for what is probably limited gain.

I don't personally buy into the need to do an exact count(*) on a whole table quickly. I know people ask for it, but I find these same people are either confused or trying to use this functionality to accomplish some other end, on the assumption that because they can get counts faster from other databases, PostgreSQL should do it as well. I sometimes wonder whether these people would even notice if PostgreSQL translated count(*) on the whole table into a query against reltuples. :-)
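
For what it's worth, that estimate is already one query away (it is only as
fresh as the last VACUUM or ANALYZE, and "mytable" is a placeholder):

    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE oid = 'mytable'::regclass;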

Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>

