Manfred Koizar <[EMAIL PROTECTED]> writes:
> Visibility check by other transactions: If a tuple is visited and its
> XMIN/XMAX_IS_COMMITTED/ABORTED flags are not yet set, pg_clog has to
> be consulted to find out the status of the inserting/deleting
> transaction xid. If pg_clog[xid] is ...
>     00: transaction still active
>     10: aborted
>     01: committed
>     11: committed subtransaction, have to check parent
> Only in this last case do we have to get parentxid from pg_subtrans.

Unfortunately this discussion is wrong. User-level visibility checks
will usually have to fetch the parentxid in case 01 as well, because
even if the parent is committed, it might not be visible in our
snapshot. Snapshots will record only topmost-parent XIDs (because
that's what we can find in the PG_PROC array, and anything else would
create atomicity problems anyway). So we must chase to the topmost
parent before testing visibility.

This means that the parentxid will need to be fetched in enough cases
that it's quite dubious that pushing it to a different file saves I/O.

Also, using a 11 state doubles the amount of pg_clog I/O needed to
commit a collection of subtransactions. You have to write 11 as the
state of each committable subtransaction, then commit the parent (write
01 as its state), then go back and change the state of each
subtransaction to 01. (Whether this last bit is done as part of parent
transaction commit, or during later inspections of the state of the
subtransaction, doesn't change the argument.)

I think it would be preferable to use only three states: active,
aborted, committed. The parent commit protocol is

(1) write 10 as state of each aborted subtransaction (this should be
    done as soon as the subtransaction is known aborted, rather than
    delaying to parent commit);
(2) write 01 as state of parent (this is the atomic commit);
(3) write 01 as state of each committed subtransaction.

Readers who see 00 must check the parent state; if the parent is
committed then they have to go back and recheck the child state (to see
if it became "aborted" after they looked). This halves the write
traffic during a commit, at the cost of additional read traffic when
subtransaction state is checked in a narrow window after the time of
parent transaction commit.
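
To make the three-state protocol and the reader rule above concrete,
here is a toy sketch in Python. It is only an in-memory model: the
XactLog class and all of its method names are invented for illustration
and are not PostgreSQL code. It also assumes that a subtransaction still
showing 00 simply takes its parent's answer when the parent is active or
aborted, which the protocol implies but does not spell out.

```python
ACTIVE, COMMITTED, ABORTED = 0b00, 0b01, 0b10  # bit patterns used in this thread


class XactLog:
    """Hypothetical stand-in for pg_clog plus a parent-xid lookup."""

    def __init__(self):
        self.state = {}   # xid -> two-bit state; absent is treated as ACTIVE
        self.parent = {}  # subtransaction xid -> immediate parent xid

    def start(self, xid, parent=None):
        self.state[xid] = ACTIVE
        if parent is not None:
            self.parent[xid] = parent

    def abort_sub(self, xid):
        # Step (1): record the abort as soon as it is known.
        self.state[xid] = ABORTED

    def commit_parent(self, xid):
        # Step (2): the single write that is the atomic commit.
        self.state[xid] = COMMITTED

    def mark_sub_committed(self, xid):
        # Step (3): stamp a surviving subtransaction; never touches aborted ones.
        if self.state[xid] == ACTIVE:
            self.state[xid] = COMMITTED

    def topmost_parent(self, xid):
        # Snapshots hold only top-level xids, so visibility tests chase upward.
        while xid in self.parent:
            xid = self.parent[xid]
        return xid

    def resolve(self, xid):
        """Reader rule: 00 on a subtransaction means 'check the parent'."""
        s = self.state.get(xid, ACTIVE)
        if s != ACTIVE or xid not in self.parent:
            return s
        p = self.resolve(self.parent[xid])
        if p != COMMITTED:
            return p  # assumption: parent active/aborted decides the child
        # Parent is committed: recheck the child, since it may have been
        # marked aborted after the first look.
        s = self.state.get(xid, ACTIVE)
        return ABORTED if s == ABORTED else COMMITTED
```

In the window between steps (2) and (3), resolve() on a surviving child
costs an extra parent lookup plus a recheck, which is the additional
read traffic traded for writing each subtransaction's state only once.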

I believe it nets out to be faster.

			regards, tom lane