gsst...@mit.edu (Greg Stark) writes:
> On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne <cbbro...@acm.org> wrote:
>> It would make it easy to conclude:
>>
>>   "This next transaction did 8328194 updates.  Maybe we should do
>>   some kind of checkpoint (e.g. - commit transaction or such) before
>>   working on it."
>>
>>    versus
>>
>>   "This transaction we're thinking of working on had 7 updates.  No
>>   big deal..."
>
> I'm puzzled how you would define this value. How do you add 7 inserts,
> 7 deletes, and 7 updates? Is that 21 rows modified? Why are the 7
> inserts and 7 deletes worth twice as much as the 7 updates when
> they're basically the same thing? What if the inserts fired triggers
> which inserted 7 more rows, is that 14? What if the 7 updates modified
> 2 TB of TOAST data but the 8238194 updates were all to the same record
> and they were all HOT updates so all it did was change 8kB?

The presence of those questions (and the ambiguity of their answers) is
precisely why there's a little squirming as to whether this is
super-useful and super-necessary.

What this offers is *SOME* idea of how much updating work a particular
transaction did.  It's a bit worse than you suggest:

 - If replication triggers have captured tuples, those would get
   counted.

 - TOAST updates might lead to extra updates being counted.

But back to where you started, I'd anticipate 7 inserts, 7 deletes,
and 7 updates being counted as something around 21 updates.

And if that included 5 TOAST changes, it might bump up to 26.

If there were replication triggers in place, that might bump the count
up to 45 (which I chose arbitrarily).
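
To make the arithmetic concrete, here's one way that tally might break
down (purely illustrative; exactly what gets counted is the open
question, and the 19 is just whatever makes the arbitrary 45 add up):

   7 inserts                      ->   7
   7 deletes                      ->   7
   7 updates                      ->   7
   5 TOAST changes                ->  +5
   rows inserted by replication
     triggers                     -> +19
                                     ----
   approximate # of updates          ~45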

> In any case you'll have all the actual data from your triggers or
> hooks or whatever so what value does having the system keep track of
> this add?

This means that when we pull the list of transactions to consider,
we'd get something like:

 select * from next_transactions('4218:23', 50);

[list of 50 transactions returned, each with:
   -> txid
   -> START timestamp
   -> COMMIT timestamp
   -> approximate # of updates]
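
Concretely, that result row might be declared along these lines (a
sketch only; next_transactions() is the hypothetical function from the
example above, and the type name, column names, and column types here
are all my own assumptions):

  -- Hypothetical result shape; names and types are illustrative only.
  CREATE TYPE txn_summary AS (
      txid           text,         -- e.g. '4218:24'
      start_time     timestamptz,  -- START timestamp
      commit_time    timestamptz,  -- COMMIT timestamp
      approx_updates bigint        -- the rough count discussed above
  );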

Then, for each of the 50, I'd pull replication log data for the
corresponding transaction.

If I have the approximate # of updates, that might lead me to stop
short and say:

  "That next update looks like a doozy!  I'm going to stop and commit
  what I've got before doing that one."

It's not strictly necessary, but would surely be useful for flow
control.
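
As a sketch of that flow control, assuming the hypothetical
txn_summary shape above (and using a PL/pgSQL DO block purely for
illustration; a real replication daemon would do this in its own
driver loop, and the threshold is of course policy):

  -- Hedged sketch: stop pulling work once a big transaction shows up,
  -- so the driver can commit what it has replayed so far.
  DO $$
  DECLARE
      t         txn_summary;
      threshold CONSTANT bigint := 1000000;  -- arbitrary cutoff
  BEGIN
      FOR t IN SELECT * FROM next_transactions('4218:23', 50) LOOP
          IF t.approx_updates > threshold THEN
              RAISE NOTICE 'stopping before txid % (~% updates)',
                           t.txid, t.approx_updates;
              EXIT;  -- leave the loop; the caller commits what it has
          END IF;
          -- ... replay the replication log data for t.txid here ...
      END LOOP;
  END $$;

The point is just that the approximate count is enough to make that
call without having to parse the replication log first.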
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS  apparently now  has a  team dedicated  to tracking  problems with
Linux  and publicizing them.   I guess  eventually they'll  figure out
this back fires... ;)" -- William Burrow <aa...@delete.fan.nb.ca>
