On Saturday, November 14, 2015 8:41 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 13 November 2015 at 21:35, Michael Paquier <michael.paqu...@gmail.com> wrote:
>> On Tue, Nov 10, 2015 at 3:46 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>> If the database is corrupted, there's no way to guarantee that
>>> anything works as planned. This is like saying that criticizing
>>> somebody's disaster recovery plan on the basis that it will be
>>> inadequate if the entire planet earth is destroyed.

+1

Once all participating servers return "success" from the "prepare"
phase, the only sane way to proceed is to commit as much as possible.
(I guess you still have some room to argue about what to do after an
error on the attempt to commit the first prepared transaction, but
even there you need to move forward if it might have been due to a
communications failure preventing a success indication for a commit
which was actually successful.)  The idea of two phase commit is that
failure to commit after a successful prepare should be extremely rare.

>> As well as there could be FS, OS, network problems... To come back
>> to the point, my point is simply that I found surprising the
>> sentence of Konstantin upthread saying that if commit fails on some
>> of the nodes we should rollback the prepared transaction on all
>> nodes. In the example given, in the phase after calling
>> dtm_end_prepare, say we perform COMMIT PREPARED correctly on node 1,
>> but then failed it on node 2 because a meteor has hit a server, it
>> seems that we cannot rollback, instead we had better rolling in a
>> backup and be sure that the transaction gets committed. How would
>> you rollback the transaction already committed on node 1? But
>> perhaps I missed something...

Right.

> The usual way this works in an XA-like model is:
>
> In phase 1 (prepare transaction, in Pg's spelling), failure on
> any node triggers a rollback on all nodes.
>
> In phase 2 (commit prepared), failure on any node causes retries
> until it succeeds, or until the admin intervenes - say, to remove
> that node from operation. The global xact as a whole isn't
> considered successful until it's committed on all nodes.
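For anyone following along, the coordinator rules Craig describes can be
sketched roughly as follows. This is only an illustration, not code from
any posted patch; the Node class and its prepare/commit_prepared/
rollback_prepared methods are hypothetical stand-ins for real
per-server connections issuing PREPARE TRANSACTION, COMMIT PREPARED,
and ROLLBACK PREPARED.

```python
# Hedged sketch of the XA-style two-phase commit rules described above.
# "Node" is a hypothetical stand-in for a connection to one participant.
import time

class Node:
    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.committed = False

    def prepare(self):            # stand-in for PREPARE TRANSACTION
        if self.fail_prepare:
            raise RuntimeError(f"{self.name}: prepare failed")

    def commit_prepared(self):    # stand-in for COMMIT PREPARED
        self.committed = True

    def rollback_prepared(self):  # stand-in for ROLLBACK PREPARED
        pass

def global_commit(nodes, retry_delay=0.01, max_retries=3):
    # Phase 1: a failure on ANY node triggers rollback on all nodes
    # that have already prepared, and the global xact aborts.
    prepared = []
    for node in nodes:
        try:
            node.prepare()
            prepared.append(node)
        except Exception:
            for p in prepared:
                p.rollback_prepared()
            return False
    # Phase 2: once everyone has prepared, a commit failure on any
    # node is retried; rollback is no longer an option.  In reality
    # retries continue until an administrator intervenes.
    for node in prepared:
        for _attempt in range(max_retries):
            try:
                node.commit_prepared()
                break
            except Exception:
                time.sleep(retry_delay)
        else:
            raise RuntimeError(f"{node.name}: needs admin intervention")
    return True
```

Note that once phase 1 succeeds everywhere, the only exit paths are
"committed on all nodes" or "human intervention"; that asymmetry is
exactly the point being made in the quoted discussion.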
> 2PC and distributed commit is well studied, including the
> problems. We don't have to think this up for ourselves. We don't
> have to invent anything here. There's a lot of distributed
> systems theory to work with - especially when dealing with well
> studied relational DBs trying to maintain ACID semantics.

Right; it would be silly not to build on decades of work on theory
and practice in this area.

What is making me nervous as I watch this thread is a bit of loose
talk about the I in ACID.  At some points it seemed to be implied
that we would have sufficient transaction isolation as long as all
the participating nodes used the same snapshot, and I think I've
seen hints of an intention to use distributed strict two-phase
locking with a global lock manager to achieve serializable behavior.
The former is vulnerable to some well-known serialization anomalies,
and the latter was dropped from PostgreSQL when MVCC was implemented
because of its horrible concurrency and performance characteristics,
which are not going to be any less of an issue in a distributed
system using that technique.  It may be possible to implement the
Serializable Snapshot Isolation (SSI) technique across multiple
cooperating nodes, but it's not obvious exactly how that would be
done, and I have seen no discussion of it.

If we're going to talk about maintaining ACID semantics in this
environment, I think we need to explicitly state what level of
isolation we intend to provide, and how we intend to do that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers