On Thu, 28 Nov 2002 12:59:21 -0500 (EST), Bruce Momjian
<[EMAIL PROTECTED]> wrote:
>Yes, locking is one possible solution, but no one likes that.  One hack
>lock idea would be to create a subtransaction-only lock, [...]
>
>> [...] without
>> having to touch the xids in the tuple headers.
>
>Yes, you could do that, but we can easily just set the clog bits
>atomically,

From what I read above I don't think we can *easily* set more than one
transaction's bits atomically.

> and it will not be needed --- the tuple bits really don't
>help us, I think.

Yes, this is what I said, or at least tried to say.  I just wanted to
make clear how this new approach (use the fourth status) differs from
older proposals (replace subtransaction ids in tuple headers).

>OK, we put it in a file.  And how do we efficiently clean it up?
>Remember, it is only to be used for a _brief_ period of time.  I think a
>file system solution is doable if we can figure out a way not to create
>a file for every xid.

I don't want to create one file for every transaction, but rather a
huge (sparse) array of parent xids.  This array is divided into
manageable chunks, represented by files, "pg_subtrans_NNNN".  These
files are only created when necessary.  At any time only a tiny part
of the whole array is kept in shared buffers.  The concept is nearly
identical to pg_clog, which is an array of two-bit status values.
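
To make the addressing concrete, here is a minimal sketch in Python (chosen
only for brevity).  The entry size, the number of xids per file and the
hexadecimal NNNN suffix are assumptions of mine, not settled parts of the
proposal, and the dict-based class merely stands in for real files and
shared buffers.

```python
# Hypothetical layout: one fixed-size parent-xid slot per transaction id.
ENTRY_SIZE = 4                    # bytes per parent xid (assumption)
ENTRIES_PER_SEGMENT = 1 << 20     # xids covered by one file (assumption)


def subtrans_location(xid):
    """Return (file name, byte offset) of the parent-xid slot for xid."""
    segment, index = divmod(xid, ENTRIES_PER_SEGMENT)
    # Hex suffix is an assumption; the proposal only says "NNNN".
    return "pg_subtrans_%04X" % segment, index * ENTRY_SIZE


class SparseParentArray:
    """In-memory stand-in: a segment exists only once something is written."""

    def __init__(self):
        self.segments = {}        # file name -> {byte offset: parent xid}

    def set_parent(self, xid, parent):
        name, offset = subtrans_location(xid)
        self.segments.setdefault(name, {})[offset] = parent

    def get_parent(self, xid):
        name, offset = subtrans_location(xid)
        # A missing segment or slot reads as 0, meaning "no parent".
        return self.segments.get(name, {}).get(offset, 0)
```

Reading a slot never creates a segment; only recording a parent does, which
is what keeps the array sparse.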

>Maybe we write the xid's to a file in a special directory in sorted
>order, and backends can do a btree search of each file in that directory
>looking for the xid, and then knowing the master xid, look up that
>status, and once all the children xid's are updated, you delete the
>file.

Yes, dense arrays or btrees are other possible implementations.  But
for simplicity I'd do it pg_clog style.

>Yes, but again, the xid status of subtransactions is only updated just
>before commit of the main transaction, so there is little value to
>having those visible.

Having them visible solves the atomicity problem without requiring
long locks.  Updating the status of a single (main or sub) transaction
is atomic, just like it is now.

Here is what is to be done for some operations:

BEGIN main transaction:
        Get a new xid (no change to current behaviour).
        pg_clog[xid] is still 00, meaning active.
        pg_subtrans[xid] is still 0, meaning no parent.

BEGIN subtransaction:
        Push current transaction info onto local stack.
        Get a new xid.
        Record parent xid in pg_subtrans[xid].
        pg_clog[xid] is still 00.

ROLLBACK subtransaction:
        Set pg_clog[xid] to 10 (aborted).
        Optionally set clog bits for subsubtransactions to 10.
        Pop transaction info from stack.

COMMIT subtransaction:
        Set pg_clog[xid] to 11 (committed subtrans).
        Don't touch clog bits for subsubtransactions!
        Pop transaction info from stack.

ROLLBACK main transaction:
        Set pg_clog[xid] to 10 (aborted).
        Optionally set clog bits for subtransactions to 10.
        
COMMIT main transaction:
        Set pg_clog[xid] to 01 (committed).
        Optionally set clog bits for subtransactions from 11 to 01.
        Don't touch clog bits for aborted subtransactions!
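
The six operations above can be modelled roughly as follows.  This is a toy
sketch, not backend code: pg_clog and pg_subtrans are plain dicts in which a
missing key reads as 00 or 0, the class and method names are hypothetical,
and all the "optionally set" steps are left out.

```python
import itertools

ACTIVE, COMMITTED, ABORTED, SUBCOMMITTED = 0b00, 0b01, 0b10, 0b11


class TransactionTree:
    """Toy model of one backend's main transaction and its subtransactions."""

    def __init__(self, clog, subtrans, first_xid):
        self.clog = clog                  # xid -> two-bit status
        self.subtrans = subtrans          # xid -> parent xid
        self.next_xid = itertools.count(first_xid)
        self.stack = []                   # local stack of open xids

    def begin(self):
        xid = next(self.next_xid)
        if self.stack:                    # subtransaction: record the parent
            self.subtrans[xid] = self.stack[-1]
        self.stack.append(xid)            # clog[xid] stays 00 (active)
        return xid

    def commit(self):
        xid = self.stack.pop()
        # One status write: 11 for a subtransaction, 01 for the main one.
        self.clog[xid] = SUBCOMMITTED if self.stack else COMMITTED

    def rollback(self):
        xid = self.stack.pop()
        self.clog[xid] = ABORTED
```

Every commit or rollback is a single status write for a single xid, which
is the property the proposal relies on for atomicity.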

Visibility check by other transactions:  If a tuple is visited and its
XMIN/XMAX_IS_COMMITTED/ABORTED flags are not yet set, pg_clog has to
be consulted to find out the status of the inserting/deleting
transaction xid.  If pg_clog[xid] is ...

        00:  transaction still active

        10:  aborted

        01:  committed

        11:  committed subtransaction, have to check parent

Only in this last case do we have to get parentxid from pg_subtrans.
Now we look at pg_clog[parentxid].  If we find ...

        00:  parent still active, so xid is considered active, too

        10:  parent aborted, so xid is considered aborted,
             optionally set pg_clog[xid] = 10

        01:  parent committed, so xid is considered committed,
             optionally set pg_clog[xid] = 01

        11:  recursively check grandparent(s) ...
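
A rough sketch of this lookup, again with dicts standing in for pg_clog and
pg_subtrans.  The function name and the update flag are hypothetical, and
the recursion is written as a loop; the optional write-back of the final
status is done for every xid passed on the way up.

```python
ACTIVE, COMMITTED, ABORTED, SUBCOMMITTED = 0b00, 0b01, 0b10, 0b11


def resolve_status(xid, clog, subtrans, update=True):
    """Return ACTIVE, COMMITTED or ABORTED for xid, following parents on 11."""
    chain = []                            # xids seen with status 11
    status = clog.get(xid, ACTIVE)        # missing entry reads as 00
    while status == SUBCOMMITTED:
        chain.append(xid)
        xid = subtrans[xid]               # parent must be recorded for an 11
        status = clog.get(xid, ACTIVE)
    if update and status != ACTIVE:
        for child in chain:               # optional side effect: replace 11
            clog[child] = status
    return status
```

While the topmost ancestor is still active nothing is written back, so a
committed subtransaction keeps status 11 until its fate is known.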

For brevity the following operations are not covered in detail:
. Visibility checks for tuples inserted/deleted by a (sub)transaction
belonging to the current transaction tree (have to check local
transaction stack whenever we look at an xid or switch to a parent xid)
. HeapTupleSatisfiesUpdate (sometimes has to wait for parent
transaction)

The trick here is that subtransaction status is immediately updated
in pg_clog on commit/abort.  Main transaction commit is atomic (just
set its commit bit).  Status 11 is short-lived; it is replaced with
the final status by one or more of

        - COMMIT/ROLLBACK of the main transaction
        - a later visibility check (as a side effect)
        - VACUUM

pg_subtrans cleanup:  A pg_subtrans_NNNN file covers a known range of
transaction ids.  As soon as none of these transactions has a pg_clog
status of 11, the pg_subtrans_NNNN file can be removed.  VACUUM can do
this, and it won't even have to check the heap.
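
The removal test could look roughly like this.  It is only a sketch:
pg_clog is a dict of xid to status, the number of xids per file is an
assumed constant, and the function name is hypothetical.

```python
SUBCOMMITTED = 0b11
ENTRIES_PER_SEGMENT = 1 << 20     # xids per pg_subtrans file (assumption)


def removable_segments(existing_segments, clog):
    """Return the segment numbers whose xid range holds no status-11 entry."""
    busy = {xid // ENTRIES_PER_SEGMENT
            for xid, status in clog.items() if status == SUBCOMMITTED}
    return sorted(set(existing_segments) - busy)
```

Only pg_clog is consulted here, which matches the point that the heap does
not have to be scanned.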

Servus
 Manfred
