On Saturday, November 14, 2015 8:41 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 13 November 2015 at 21:35, Michael Paquier <michael.paqu...@gmail.com> wrote:
>> On Tue, Nov 10, 2015 at 3:46 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>> If the database is corrupted, there's no way to guarantee that
>>> anything works as planned. This is like saying that criticizing
>>> somebody's disaster recovery plan on the basis that it will be
>>> inadequate if the entire planet earth is destroyed.

+1

Once all participating servers return "success" from the "prepare"
phase, the only sane way to proceed is to commit as much as possible.
(I guess you still have some room to argue about what to do after an
error on the attempt to commit the first prepared transaction, but
even there you need to move forward if it might have been due to a
communications failure preventing a success indication for a commit
which was actually successful.)  The idea of two phase commit is that
failure to commit after a successful prepare should be extremely rare.

>> As well as there could be FS, OS, network problems... To come back
>> to the point, my point is simply that I found surprising the
>> sentence of Konstantin upthread saying that if commit fails on some
>> of the nodes we should rollback the prepared transaction on all
>> nodes. In the example given, in the phase after calling
>> dtm_end_prepare, say we perform COMMIT PREPARED correctly on node 1,
>> but then failed it on node 2 because a meteor has hit a server, it
>> seems that we cannot rollback, instead we had better rolling in a
>> backup and be sure that the transaction gets committed. How would
>> you rollback the transaction already committed on node 1? But
>> perhaps I missed something...

Right.

> The usual way this works in an XA-like model is:
>
> In phase 1 (prepare transaction, in Pg's spelling), failure on
> any node triggers a rollback on all nodes.
>
> In phase 2 (commit prepared), failure on any node causes retries
> until it succeeds, or until the admin intervenes - say, to remove
> that node from operation. The global xact as a whole isn't
> considered successful until it's committed on all nodes.
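For anyone following along, the coordinator rules Craig describes can be
sketched roughly as follows. This is only an illustration, not code from
any posted patch; the Node class and its prepare/commit_prepared/
rollback_prepared methods are hypothetical stand-ins for real
per-server connections issuing PREPARE TRANSACTION, COMMIT PREPARED,
and ROLLBACK PREPARED.

```python
# Hedged sketch of the XA-style two-phase commit rules described above.
# "Node" is a hypothetical stand-in for a connection to one participant.
import time

class Node:
    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.committed = False

    def prepare(self):            # stand-in for PREPARE TRANSACTION
        if self.fail_prepare:
            raise RuntimeError(f"{self.name}: prepare failed")

    def commit_prepared(self):    # stand-in for COMMIT PREPARED
        self.committed = True

    def rollback_prepared(self):  # stand-in for ROLLBACK PREPARED
        pass

def global_commit(nodes, retry_delay=0.01, max_retries=3):
    # Phase 1: a failure on ANY node triggers rollback on all nodes
    # that have already prepared, and the global xact aborts.
    prepared = []
    for node in nodes:
        try:
            node.prepare()
            prepared.append(node)
        except Exception:
            for p in prepared:
                p.rollback_prepared()
            return False
    # Phase 2: once everyone has prepared, a commit failure on any
    # node is retried; rollback is no longer an option.  In reality
    # retries continue until an administrator intervenes.
    for node in prepared:
        for _attempt in range(max_retries):
            try:
                node.commit_prepared()
                break
            except Exception:
                time.sleep(retry_delay)
        else:
            raise RuntimeError(f"{node.name}: needs admin intervention")
    return True
```

Note that once phase 1 succeeds everywhere, the only exit paths are
"committed on all nodes" or "human intervention"; that asymmetry is
exactly the point being made in the quoted discussion.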
> 2PC and distributed commit is well studied, including the
> problems. We don't have to think this up for ourselves. We don't
> have to invent anything here. There's a lot of distributed
> systems theory to work with - especially when dealing with well
> studied relational DBs trying to maintain ACID semantics.

Right; it would be silly not to build on decades of work on theory
and practice in this area.

What is making me nervous as I watch this thread is a bit of loose
talk about the I in ACID.  At some points it seemed to be implied
that we would have sufficient transaction isolation as long as all
the participating nodes used the same snapshot, and I think I've
seen hints of an intention to use distributed strict two-phase
locking with a global lock manager to achieve serializable behavior.
The former is vulnerable to some well-known serialization anomalies,
and the latter was dropped from PostgreSQL when MVCC was implemented
because of its horrible concurrency and performance characteristics,
which are not going to be any less of an issue in a distributed
system using that technique.  It may be possible to implement the
Serializable Snapshot Isolation (SSI) technique across multiple
cooperating nodes, but it's not obvious exactly how that would be
done, and I have seen no discussion of it.

If we're going to talk about maintaining ACID semantics in this
environment, I think we need to explicitly state what level of
isolation we intend to provide, and how we intend to do that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers