Hello,

Some time ago at PgConn.Vienna we proposed the eXtensible Transaction Manager API (XTM). The idea is to make it possible to provide custom implementations of transaction managers as standard Postgres extensions.
The primary goal is the implementation of a distributed transaction manager.
It should not only support 2PC, but also provide consistent snapshots for global transactions executed at different nodes.

Actually, the current version of the XTM API does not propose any particular 2PC model. It can be implemented either on the coordinator side (as is done in our pg_tsdtm <https://github.com/postgrespro/pg_tsdtm> implementation, which is based on timestamps and does not require a centralized arbiter), or by an arbiter (pg_dtm <https://github.com/postgrespro/pg_dtm>). In the latter case the 2PC logic is hidden under the XTM SetTransactionStatus method:

bool (*SetTransactionStatus)(TransactionId xid, int nsubxids, TransactionId *subxids, XidStatus status, XLogRecPtr lsn);

which encapsulates TransactionIdSetTreeStatus in clog.c.
But you may notice that the original TransactionIdSetTreeStatus function is void: it is not intended to return anything. It is called in RecordTransactionCommit inside a critical section, where the commit is not expected to fail. But with a DTM, a transaction may be rejected by the arbiter. The XTM API allows us to control access to CLOG, so everybody will see that the transaction is aborted. But in any case we have to somehow notify the client about the abort of the transaction.
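
To make the indirection concrete, here is a minimal standalone sketch of how a hook with a bool result can wrap a void function. It is not PostgreSQL code and not the actual XTM patch: the typedefs and the mock CLOG are simplified stand-ins, and TransactionManager, DefaultSetTransactionStatus, ArbiterSetTransactionStatus and arbiter_accepts are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL types (assumptions of this sketch). */
typedef uint32_t TransactionId;
typedef int      XidStatus;
typedef uint64_t XLogRecPtr;

#define TRANSACTION_STATUS_COMMITTED 0x01
#define TRANSACTION_STATUS_ABORTED   0x02

/* Mock CLOG: remembers only the last status written. */
static XidStatus mock_clog_status = 0;

/* Stand-in for the void function in clog.c: it cannot report failure. */
static void
TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
                           TransactionId *subxids, XidStatus status,
                           XLogRecPtr lsn)
{
    assert(nsubxids == 0 || subxids != NULL);
    (void) xid;
    (void) lsn;
    mock_clog_status = status;
}

/* Hypothetical hook table; only the method discussed above is shown. */
typedef struct TransactionManager
{
    bool (*SetTransactionStatus)(TransactionId xid, int nsubxids,
                                 TransactionId *subxids, XidStatus status,
                                 XLogRecPtr lsn);
} TransactionManager;

/* Default behaviour: a purely local commit is never rejected. */
static bool
DefaultSetTransactionStatus(TransactionId xid, int nsubxids,
                            TransactionId *subxids, XidStatus status,
                            XLogRecPtr lsn)
{
    TransactionIdSetTreeStatus(xid, nsubxids, subxids, status, lsn);
    return true;
}

/* Hypothetical arbiter verdict; a real DTM would ask a remote service. */
static bool arbiter_accepts = true;

/* Arbiter-backed variant: a rejected commit is recorded as aborted. */
static bool
ArbiterSetTransactionStatus(TransactionId xid, int nsubxids,
                            TransactionId *subxids, XidStatus status,
                            XLogRecPtr lsn)
{
    if (status == TRANSACTION_STATUS_COMMITTED && !arbiter_accepts)
    {
        TransactionIdSetTreeStatus(xid, nsubxids, subxids,
                                   TRANSACTION_STATUS_ABORTED, lsn);
        return false;
    }
    TransactionIdSetTreeStatus(xid, nsubxids, subxids, status, lsn);
    return true;
}
```

A caller that goes through the table gets true from the default implementation in every case, while the arbiter-backed one may return false, and that result is exactly what the caller in RecordTransactionCommit has no natural way to handle.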

We cannot just call elog(ERROR,...) in the SetTransactionStatus implementation, because inside a critical section it causes Postgres to crash with a panic message. So we have to remember that the transaction was rejected and report the error later, after exiting the critical section:


        /*
         * Now we may update the CLOG, if we wrote a COMMIT record above
         */
        if (markXidCommitted) {
            committed = TransactionIdCommitTree(xid, nchildren, children);
        }
...
    /*
     * If we entered a commit critical section, leave it now, and let
     * checkpoints proceed.
     */
    if (markXidCommitted)
    {
        MyPgXact->delayChkpt = false;
        END_CRIT_SECTION();
        if (!committed) {
            CurrentTransactionState->state = TRANS_ABORT;
            CurrentTransactionState->blockState = TBLOCK_ABORT_PENDING;
            elog(ERROR, "Transaction commit rejected by XTM");
        }
    }

There is one more problem: at this moment the state of the transaction is TRANS_COMMIT. If the ERROR handler tries to abort it, we get yet another fatal error: an attempt to roll back a committed transaction. So we need to hide the fact that the transaction is actually committed in the local XLOG.
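
The control flow described above can be modelled in isolation. The following is only a toy sketch, not PostgreSQL code: crit_section_count, report_error, mock_commit and the three-value TransState enum are hypothetical stand-ins, and report_error merely mimics the rule that an ERROR raised inside a critical section is escalated to PANIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy transaction states (the real TransState enum has more values). */
typedef enum { TRANS_INPROGRESS, TRANS_COMMIT, TRANS_ABORT } TransState;

/* What a report would turn into, depending on where it is raised. */
typedef enum { REPORT_ERROR, REPORT_PANIC } ReportResult;

static int        crit_section_count = 0;
static TransState trans_state = TRANS_INPROGRESS;

static void start_crit_section(void) { crit_section_count++; }

static void
end_crit_section(void)
{
    assert(crit_section_count > 0);
    crit_section_count--;
}

/* Mock of elog(ERROR, ...): escalates to PANIC inside a critical section. */
static ReportResult
report_error(const char *msg)
{
    if (crit_section_count > 0)
    {
        fprintf(stderr, "PANIC: %s\n", msg);
        return REPORT_PANIC;
    }
    fprintf(stderr, "ERROR: %s\n", msg);
    return REPORT_ERROR;
}

/*
 * Mock commit path. xtm_accepts plays the role of the value returned by
 * SetTransactionStatus. The rejection is only remembered inside the
 * critical section and reported after leaving it.
 */
static bool
mock_commit(bool xtm_accepts, ReportResult *reported)
{
    bool committed;

    trans_state = TRANS_COMMIT;
    start_crit_section();
    committed = xtm_accepts;    /* remember the verdict, do not report yet */
    end_crit_section();

    if (!committed)
    {
        /* Hide the local commit so that the error handler can abort. */
        trans_state = TRANS_ABORT;
        *reported = report_error("Transaction commit rejected by XTM");
    }
    return committed;
}
```

Calling mock_commit(false, &r) leaves r equal to REPORT_ERROR and the state at TRANS_ABORT. Calling report_error before end_crit_section would instead yield REPORT_PANIC, which is the crash described above.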

This approach works, but it looks a little bit like a hack. It requires not only replacing the direct call of TransactionIdSetTreeStatus with an indirect one (through the XTM API), but also making some non-obvious changes in RecordTransactionCommit.

So what are the alternatives?

1. Move RecordTransactionCommit to XTM. In this case we have to copy the original RecordTransactionCommit into the DTM implementation and patch it there. This is also not nice, because it will complicate maintenance of the DTM implementation. The primary idea of XTM is to allow development of a DTM as a standard PostgreSQL extension, without creating specific clones of the main PostgreSQL source tree. But this idea will be compromised if we have to copy and paste pieces of PostgreSQL code. In some sense it is even worse than maintaining a separate branch: in the latter case we at least have some way to perform an automatic merge.

2. Propose some alternative two-phase commit implementation in PostgreSQL core. The main motivation for the "lightweight" implementation of 2PC in pg_dtm is that the original mechanism of prepared transactions in PostgreSQL adds too much overhead. In our benchmarks we have found that a simple credit-debit banking test (without any DTM) runs almost 10 times slower with PostgreSQL 2PC than without it. This is why we are trying to propose an alternative solution (right now pg_dtm is 2 times slower than vanilla PostgreSQL, but it not only performs 2PC but also provides consistent snapshots).

Maybe somebody can suggest some other solution?
Or give some comments concerning the current approach?

Thanks in advance,
Konstantin,
Postgres Professional
