On Sat, Jan 10, 2015 at 9:02 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 1/8/15, 12:00 PM, Kevin Grittner wrote:
>> The key point is that the distributed transaction data must be
>> flagged as needing to commit rather than roll back between the
>> prepare phase and the final commit. If you try to avoid the
>> PREPARE, flagging, COMMIT PREPARED sequence by building the
>> flagging of the distributed transaction metadata into the COMMIT
>> process, you still have the problem of what to do on crash
>> recovery. You really need to use 2PC to keep that clean, I think.

Yes, 2PC is needed as soon as more than one node performs write
operations within a transaction.

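To make the PREPARE, flagging, COMMIT PREPARED sequence concrete, here is
a rough sketch in Python. It is only a toy in-memory model, not PostgreSQL
code: Node, Coordinator, fail_on_prepare and crash_after_flag are names
made up for illustration, and the "durable" decision flag is just an
attribute standing in for something that would be written and fsynced.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    ACTIVE = "active"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass
class Node:
    """Hypothetical participant standing in for a remote server."""
    name: str
    fail_on_prepare: bool = False  # assumption: lets us simulate a failed PREPARE
    state: State = State.ACTIVE

    def prepare(self) -> bool:
        # Analogous to PREPARE TRANSACTION: on success the node promises
        # that it can still commit later.
        if self.fail_on_prepare:
            self.state = State.ABORTED
            return False
        self.state = State.PREPARED
        return True

    def commit_prepared(self) -> None:
        # Analogous to COMMIT PREPARED; a no-op if already resolved.
        if self.state is State.PREPARED:
            self.state = State.COMMITTED

    def rollback_prepared(self) -> None:
        # Analogous to ROLLBACK PREPARED; a no-op if already resolved.
        if self.state is State.PREPARED:
            self.state = State.ABORTED


@dataclass
class Coordinator:
    nodes: list
    # Stand-in for the durable flag: None means no decision was recorded.
    decision: Optional[str] = None

    def run(self, crash_after_flag: bool = False) -> None:
        # Phase 1: ask every node to prepare.
        votes = [node.prepare() for node in self.nodes]
        # Flag the outcome before telling anyone to commit.
        self.decision = "commit" if all(votes) else "abort"
        if crash_after_flag:
            return  # simulate a crash between flagging and phase 2
        self.recover()

    def recover(self) -> None:
        # Phase 2, also usable as crash recovery: replay the recorded
        # decision. With no recorded decision, everything is rolled back.
        for node in self.nodes:
            if self.decision == "commit":
                node.commit_prepared()
            else:
                node.rollback_prepared()
```

The point of the flag is that recover() can be run again after a crash and
always drives the prepared transactions to the same outcome.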
> If we had an independent transaction coordinator then I agree with you
> Kevin. I think Robert is proposing that if we are controlling one of the
> nodes that's participating as well as coordinating the overall transaction
> that we can take some shortcuts. AIUI a PREPARE means you are completely
> ready to commit. In essence you're just waiting to write and fsync the
> commit message. That is in fact the state that a coordinating PG node would
> be in by the time everyone else has done their prepare. So from that
> standpoint we're OK.
>
> Now, as soon as ANY of the nodes commit, our coordinating node MUST be able
> to commit as well! That would require it to have a real prepared transaction
> of its own created. However, as long as there is zero chance of any other
> prepared transactions committing before our local transaction, that step
> isn't actually needed. Our local transaction will either commit or abort,
> and that will determine what needs to happen on all other nodes.

It is a property of 2PC that a prepared transaction is guaranteed to be
able to commit. Now, once the coordinator has confirmed that all the
remote nodes have successfully PREPAREd, it issues COMMIT PREPARED to
each node. What do you do if some nodes report ABORT PREPARED while
other nodes report COMMIT PREPARED? Do you abort the transaction on the
coordinator, commit it, or FATAL?

Any of these leaves the cluster in an inconsistent state, meaning that
some consistent cluster-wide recovery point is needed as well
(Postgres-XC and XL have introduced the concept of barriers for such
problems, work first done by Pavan Deolasee).
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)