Re: [HACKERS] count(*) performance improvement ideas

Stephen Denne Wed, 19 Mar 2008 19:33:44 -0700

Mark Mielke wrote
> This returns to the question of whether count of the whole table is useful, 
> or whether 
> count of a GROUP BY or WHERE is useful.
> If GROUP BY or WHERE is useful, then trigger on UPDATE becomes necessary.


True... for the example I gave I should have had an update trigger on my table.
I neglected it as I based the example of a more complex multi-table example 
from my application, where I know that the value I'm grouping by doesn't change.

> What is the direction here? Is it count of the whole table only? (<-- not 
> interesting to me)
> Or count of more practical real life examples, which I completely agree with 
> Greg, 
> that this gets into the materialized view realm, and becomes very interesting.
> In my current db project, I never count all of the rows in a table. However, 
> I do use count(*) with GROUP BY and WHERE.
        
I'm trying to figure out how close I can come to a useful efficient 
materialized view with current production builds of postgresql,
and identifying areas where changes to postgresql could make it easier.

Currently I can see three areas of concern:

1) Turning it on
My query to initially populate the materialized view table takes 6 hours on my 
(limited hardware) development system, which highlights the problem of when do 
you turn on the triggers.
An outage is one way to guarantee that there are neither missing details nor 
double counted details.
Would turning on the triggers and then running my initial population query in 
the same transaction work?

2) Table bloat
I'm trying to group together a large number of +1 or -1 deltas into a single 
delta per transaction.
This creates as many dead rows as there are updates.
This is also a problem with Pavan's suggestion of maintaining a counter table.
They can all be HOT updates in 8.3, but they still all create new row versions.

Tom says "Collapsing a transaction's changes
into a single row would require keeping some intra-transaction state,
which is do-able at the C-code level but not from plpgsql."

So can I hope that this problem is solvable as a contrib module that will work 
with at least 8.3?
I'd still want to write plpgsql trigger function myself, as I know the 
aggregation rules, and call contributed functions to integrate with the 
collapsing of the transaction's changes into a single row. (Expect I'd need a 
small number of rows per transaction, as I need to create delta rows for each 
value of my grouping field involved in the transaction.

3) How to incorporate the deltas.
With my technique, if the transaction rolls back the delta record becomes a 
dead row, if it commits, the delta is then visible to transaction started after 
this time.
I need to have a regular process run to sum and remove the deltas, rewriting 
the summary rows.
I'd like to be able to write an after-commit trigger that fires after changes 
in dependent tables that I identify, which can add the transaction's deltas to 
the summary table. I would want it to effectively be a new, small transaction.

I think that if these three areas are addressed, then before even considering 
writing code to automatically convert any given view into a materialized view.

2) and 3) could perhaps be implemented with a per transaction map from 
my_custom_key to a prepared statement and a list of parameter values.
Provide access to the values, making them updateable within the transaction. 
Have the statement automatically executed on commit.

Pavan also refers to deferred triggers, which has got me thinking about another 
possible solution:

Instead of inserting a delta row, that will be updated a lot of times, create 
an on commit drop temp table named after the txid and the grouping value (which 
is an integer in my case, perhaps hash it if you're grouping by something that 
doesn't easily convert to part of a table name), 
create an after insert initially deferred constraint to call a function which 
will move the 'at commit' values of the rows in the temp table to the summary 
table.
The temp table will only have one row inserted, updated many times, then on 
commit the trigger is fired once, and the temp table is dropped.

Does anyone think this will or won't work for some reason?

Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer 
focus, and courage. This email with any attachments is confidential and may be 
subject to legal privilege.  If it is not intended for you please advise by 
reply immediately, destroy it and do not copy, disclose or use it in any way.
__________________________________________________________________
  This email has been scanned by the DMZGlobal Business Quality
              Electronic Messaging Suite.
Please see http://www.dmzglobal.com/dmzmessaging.htm for details.
__________________________________________________________________



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] count(*) performance improvement ideas

Reply via email to