On Sun, Jan 11, 2015 at 3:36 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
>> My understanding is that once you get a successful PREPARE that should mean
>> that it's basically impossible for the transaction to fail to commit. If
>> that's not the case, I fail to see how you can get any decent level of
>> sanity out of this...
> When giving the responsibility of a group of COMMIT PREPARED to a set
> of nodes in a network, there could be a couple of problems showing up,
> of the type split-brain for example.

I think this is just confusing the issue. When a machine reports that a transaction is successfully prepared, any future COMMIT PREPARED operation *must* succeed. If it doesn't, the machine has broken its promises, and that's not OK. Period. It doesn't matter whether that's due to split-brain or sunspots or Oscar Wilde having bad breath. If you say that it's prepared, then you're not allowed to change your mind later and say that it can't be committed. If you do, then you have a broken 2PC implementation and, as Jim says, all bets are off.

Now of course nothing is certain in life except death and taxes. If you PREPARE a transaction, and then go into the data directory and corrupt the 2PC state file using dd, and then try to commit it, it might fail. But no system can survive that sort of thing, whether 2PC is involved or not; in such extraordinary situations, of course operator intervention will be required. But in a more normal situation where you just have a failover, if the failover causes your prepared transaction to come unprepared, that means your failover mechanism is broken. If you're using synchronous replication, this shouldn't happen.

> There could be as well failures
> at hardware-level, so you would need a mechanism ensuring that WAL is
> consistent among all the nodes, with for example the addition of a
> common restore point on all the nodes once PREPARE is successfully
> done with for example XLOG_RESTORE_POINT. That's a reason why I think
> that the local Coordinator should use 2PC as well, to ensure a
> consistency point once all the remote nodes have successfully
> PREPAREd, and a reason why things can get complicated for either the
> DBA or the upper application in charge of ensuring the DB consistency
> even in case of critical failures.

It's up to the DBA to decide whether they care about surviving complete loss of a node while having 2PC still work.
If they do, they should use sync rep, and they should be fine -- the machine on which the transaction is prepared shouldn't acknowledge the PREPARE as having succeeded until the WAL is safely on disk on the standby. Most DBAs probably don't, though; that's a big performance penalty.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
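
To make the acknowledgment ordering concrete, here is a minimal toy sketch in Python. It is not how PostgreSQL implements 2PC or replication; the `Node` class, the `prepare` function, and the one-file-per-transaction layout are all hypothetical names invented for illustration. The only point it tries to show is that PREPARE is acknowledged after the state is durable on every node that must survive, so a COMMIT PREPARED issued after failover still succeeds.

```python
import json
import os
import tempfile


class Node:
    """Toy participant (hypothetical): one file per prepared transaction."""

    def __init__(self, state_dir):
        self.state_dir = state_dir
        os.makedirs(state_dir, exist_ok=True)

    def _path(self, gid):
        return os.path.join(self.state_dir, gid + ".prepared")

    def persist(self, gid, changes):
        # Write and fsync before returning, so the record survives a crash.
        with open(self._path(gid), "w") as f:
            json.dump(changes, f)
            f.flush()
            os.fsync(f.fileno())

    def is_prepared(self, gid):
        return os.path.exists(self._path(gid))

    def commit_prepared(self, gid):
        # Must succeed for every gid whose PREPARE was acknowledged.
        with open(self._path(gid)) as f:
            changes = json.load(f)
        os.remove(self._path(gid))
        return changes


def prepare(primary, gid, changes, standby=None):
    """Acknowledge PREPARE only once the state is durable where required."""
    primary.persist(gid, changes)
    if standby is not None:
        # Analogue of synchronous replication: wait for the standby first.
        standby.persist(gid, changes)
    return True  # the promise: COMMIT PREPARED must now succeed


def demo():
    root = tempfile.mkdtemp()
    primary = Node(os.path.join(root, "primary"))
    standby = Node(os.path.join(root, "standby"))
    assert prepare(primary, "tx1", {"balance": 42}, standby=standby)
    # Simulated failover: the primary is lost and a fresh process
    # opens the standby's state directory.
    promoted = Node(os.path.join(root, "standby"))
    return promoted.commit_prepared("tx1")
```

Calling `prepare` without a `standby` models the asynchronous case: the acknowledgment only covers the primary, so losing that node loses the prepared transaction, which is the trade-off left to the DBA above.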