On 04/02/2009 03:32 PM, Tom Lane wrote:
Tony Cebzanov <tony...@andrew.cmu.edu> writes:
  
What I want to do is update the assoc_count field in the dataset table
to reflect the count of related records in the assoc field.  To do so, I
added the following trigger:
    

  
CREATE OR REPLACE FUNCTION update_assoc_count_insert()
RETURNS TRIGGER AS
'
BEGIN
    UPDATE dataset
        SET assoc_count = assoc_count + 1
        WHERE dataset_id = NEW.dataset_id;
    RETURN NEW;
END
' LANGUAGE plpgsql;
    

  
CREATE TRIGGER assoc_update_assoc_count_insert AFTER INSERT ON assoc
    FOR EACH ROW EXECUTE PROCEDURE update_assoc_count_insert();
    

There is basically no way that this is going to not suck :-(.  In the
first place, using an AFTER trigger means that each update queues an
AFTER trigger update event that has to be fired at statement or
transaction end.  In the second place (as Craig correctly noted) this
results in a separate update to the count-table row for each inserted
row, which tremendously bloats the count table with dead tuples.
In the third place, if you have any concurrency of insertions, it
disappears because all the inserters need to update the same count row.

If you dig in the pgsql-hackers archives, you will find that the
original scheme for this was to have each transaction accumulate its
total number of insertions minus deletions for a table in local memory,
and then insert *one* delta row into the count table just before
transaction commit.  I don't think it's possible to do that with just
user-level triggers (not least because we haven't got ON COMMIT
triggers); it would have to be a C-code addition.  The various blog
entries you cite are non-peer-reviewed oversimplifications of that
design.

Digging around, the oldest description I can find of this idea is
http://archives.postgresql.org/pgsql-hackers/2003-09/msg00387.php
although there are more recent threads rehashing the topic.

One point I don't recall anyone mentioning is that the stats subsystem
now implements a fairly large subset of this work already, namely the
initial data accumulation.  So you could imagine plugging something into
that to send the deltas to a table in addition to the stats collector.

			regards, tom lane

  

So, basically other than reading from pg_class table about the tuple count, there isn't a good way to optimize the COUNT(*)?


Thanks
Wei

Reply via email to