On Sun, Jan 11, 2015 at 3:36 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
>> My understanding is that once you get a successful PREPARE that should mean
>> that it's basically impossible for the transaction to fail to commit. If
>> that's not the case, I fail to see how you can get any decent level of
>> sanity out of this...
> When giving the responsibility of a group of COMMIT PREPARED to a set
> of nodes in a network, there could be a couple of problems showing up,
> of the type split-brain for example.

I think this is just confusing the issue.  When a machine reports that
a transaction is successfully prepared, any future COMMIT PREPARED
operation *must* succeed.  If it doesn't, the machine has broken its
promises, and that's not OK.  Period.  It doesn't matter whether
that's due to split-brain or sunspots or Oscar Wilde having bad
breath.  If you say that it's prepared, then you're not allowed to
change your mind later and say that it can't be committed.  If you do,
then you have a broken 2PC implementation and, as Jim says, all bets
are off.
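
To make that contract concrete, here is a minimal toy sketch in Python.
It is not how PostgreSQL implements 2PC; the Participant class, its
method names, and the JSON state file are all hypothetical, made up for
illustration.  The only point it shows is the ordering: the prepared
state is made durable before PREPARE is acknowledged, so a restart in
between cannot turn a later COMMIT PREPARED into a failure.

```python
import json
import os
import tempfile


class Participant:
    """Toy 2PC participant (hypothetical, not PostgreSQL code).

    A JSON file per transaction stands in for the 2PC state file.
    """

    def __init__(self, state_dir):
        self.state_dir = state_dir

    def _path(self, gid):
        return os.path.join(self.state_dir, gid + ".prepared")

    def prepare(self, gid, changes):
        # Make the prepared state durable first: write to a temp file,
        # fsync it, then rename it into place.
        path = self._path(gid)
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(changes, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
        # Only now acknowledge.  From here on the promise is binding.
        return True

    def commit_prepared(self, gid):
        # Must succeed for any gid whose prepare() returned True.
        # (A real system would apply the changes durably before
        # discarding the state; this toy just returns them.)
        path = self._path(gid)
        with open(path) as f:
            changes = json.load(f)
        os.remove(path)
        return changes


def demo():
    with tempfile.TemporaryDirectory() as d:
        node = Participant(d)
        assert node.prepare("tx1", {"balance": 42})
        # Simulate a crash and restart: a fresh object, same directory.
        restarted = Participant(d)
        return restarted.commit_prepared("tx1")
```

Calling demo() prepares a transaction, throws away the in-memory
object, and commits from a fresh one using only what was on disk.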

Now of course nothing is certain in life except death and taxes.  If
you PREPARE a transaction, and then go into the data directory and
corrupt the 2PC state file using dd, and then try to commit it, it
might fail.  But no system can survive that sort of thing, whether 2PC
is involved or not; in such extraordinary situations, of course
operator intervention will be required.  But in a more normal
situation where you just have a failover, if the failover causes your
prepared transaction to come unprepared, that means your failover
mechanism is broken.  If you're using synchronous replication, this
shouldn't happen.

> There could be as well failures
> at hardware-level, so you would need a mechanism ensuring that WAL is
> consistent among all the nodes, with for example the addition of a
> common restore point on all the nodes once PREPARE is successfully
> done with for example XLOG_RESTORE_POINT. That's a reason why I think
> that the local Coordinator should use 2PC as well, to ensure a
> consistency point once all the remote nodes have successfully
> PREPAREd, and a reason why things can get complicated for either the
> DBA or the upper application in charge of ensuring the DB consistency
> even in case of critical failures.

It's up to the DBA to decide whether they care about surviving
complete loss of a node while having 2PC still work.  If they do, they
should use sync rep, and they should be fine -- the machine on which
the transaction is prepared shouldn't acknowledge the PREPARE as
having succeeded until the WAL is safely on disk on the standby.  Most
DBAs probably don't care, though; sync rep is a big performance penalty.
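
The difference can be sketched with a toy model, again in Python.  The
Node and Primary classes and the demo_failover function are
hypothetical and only mimic the ordering of events; in a real
PostgreSQL setup the synchronous case corresponds to configuring
synchronous replication (for example via synchronous_standby_names),
not to anything in this code.

```python
class Node:
    """Toy node: a list stands in for WAL that is safely on disk."""

    def __init__(self):
        self.wal = []


class Primary:
    """Toy primary with one standby (hypothetical model)."""

    def __init__(self, standby, synchronous):
        self.local = Node()
        self.standby = standby
        self.synchronous = synchronous
        self.unshipped = []  # records the standby has not received yet

    def prepare(self, gid):
        record = ("PREPARE", gid)
        self.local.wal.append(record)
        if self.synchronous:
            # Sync rep: the record is on the standby before we ack.
            self.standby.wal.append(record)
        else:
            # Async: the record would be shipped some time later.
            self.unshipped.append(record)
        return True  # acknowledge PREPARE to the client


def can_commit_after_failover(standby, gid):
    # After promotion, COMMIT PREPARED can only work if the standby
    # actually has the PREPARE record.
    return ("PREPARE", gid) in standby.wal


def demo_failover(synchronous):
    standby = Node()
    primary = Primary(standby, synchronous)
    assert primary.prepare("tx1")
    # The primary is lost here, before any unshipped WAL is sent.
    return can_commit_after_failover(standby, "tx1")
```

With synchronous=True the promoted standby still knows about the
prepared transaction; with synchronous=False the acknowledged PREPARE
is lost along with the primary, which is the trade-off the DBA is
choosing when skipping sync rep.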

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

