On 5 January 2016 at 04:09, Riley Berton <rber...@appnexus.com> wrote:

>
> The conflict on the "thingy" table has resulted in node2 winning based
> on last_update wins default resolution.  However, both inserts have
> applied.  My expectation is that the entire TX applies or does not
> apply.  This expectation is clearly wrong.
>

Correct. Conflicts are resolved row-by-row. Their outcomes are determined
(by default) by transaction commit timestamps, but the conflicts themselves
are row-by-row.
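To make that concrete, here's a rough sketch of the behaviour you're describing. The schema and column names are guesses, since your actual tables aren't shown:

    -- On node1, before replication catches up:
    BEGIN;
    INSERT INTO thingy (id, val) VALUES (1, 'from-node1');      -- will conflict
    INSERT INTO other_table (id, note) VALUES (10, 'node1');    -- will NOT conflict
    COMMIT;

    -- Concurrently on node2:
    BEGIN;
    INSERT INTO thingy (id, val) VALUES (1, 'from-node2');      -- will conflict
    INSERT INTO other_table (id, note) VALUES (20, 'node2');    -- will NOT conflict
    COMMIT;

    -- After both changesets have replicated:
    --   thingy id=1        -> one row survives, chosen by last-update-wins
    --   other_table id=10  -> applied on all nodes (never conflicted)
    --   other_table id=20  -> applied on all nodes (never conflicted)
    -- i.e. the conflict is resolved per row, not per transaction.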

Because BDR:

* applies changes to other nodes only AFTER commit on the origin node; and
* does not take row and table locks across nodes

it has no way to sensibly apply all or none of a transaction on downstream
peers because the client has already committed and moved on to other
things. If the xact doesn't apply, what do we do? Log it on the failing
node(s) and throw the change away?

It would probably be practical to have xacts abort on the first conflict,
though some thought would be needed to make sure that doesn't break
consistency requirements across nodes. It's not clear that doing so would
actually be useful, though.

For that you IMO want synchronous replication where the client doesn't get
a local COMMIT until all nodes have confirmed they can commit the xact.
That's something that could be added to BDR in future, but doing it well
requires logical decoding of prepared transactions, which PostgreSQL's
logical decoding does not yet support. If that's something you think is
important or useful, you might want to explore what's involved in
implementing it.
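For illustration only, the building block such a scheme would lean on is ordinary PostgreSQL two-phase commit; the missing piece is being able to logically decode the prepared-but-not-yet-committed xact so peers can confirm it before the origin commits. Table name below is made up:

    -- Plain PostgreSQL two-phase commit (needs max_prepared_transactions > 0).
    -- BDR does not drive this for you today; this just shows the primitive.
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- hypothetical table
    PREPARE TRANSACTION 'xfer-42';   -- xact is persisted, held open, not committed

    -- A coordinator would wait for every node to confirm its prepare, then:
    COMMIT PREPARED 'xfer-42';
    -- ...or, if any node can't prepare:
    -- ROLLBACK PREPARED 'xfer-42';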

> Question is: is there a way (via a custom conflict handler) to have the
> TX obeyed?


No.

Even if you ERROR in your handler, BDR will just retry the xact. It has no
concept of "throw this transaction away forever".


> I can't see a way to even implement a simple bank account
> database that changes multiple tables in a single transaction without
> having the data end up in an inconsistent state.  Am I missing something
> obvious here?
>

You're trying to use asynchronous multimaster replication as if it was an
application-transparent synchronous cluster with a global transaction
manager and global lock manager.

BDR is not application-transparent. You need to understand replication
conflicts and think about them. It does not preserve full READ COMMITTED
semantics across nodes. This comes with big benefits in partition
tolerance, performance and latency tolerance, but it means you can't point
an existing app at more than one node and expect it to work properly.
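As a sketch of the kind of redesign that tends to work for cases like your bank account example, an append-only ledger avoids update conflicts entirely; all names here are invented for illustration:

    -- Inserts with non-colliding keys (UUIDs here) don't conflict under
    -- asynchronous multimaster, and the balance is derived rather than
    -- updated in place, so there's no row for two nodes to fight over.
    CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- for gen_random_uuid()

    CREATE TABLE account_ledger (
        entry_id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
        account_id integer NOT NULL,
        amount     numeric NOT NULL,          -- positive = credit, negative = debit
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- Record a transfer as two inserts in one local transaction; each row
    -- replicates independently, but neither can conflict with a peer's rows.
    BEGIN;
    INSERT INTO account_ledger (account_id, amount) VALUES (1, -100.00);
    INSERT INTO account_ledger (account_id, amount) VALUES (2,  100.00);
    COMMIT;

    SELECT account_id, sum(amount) AS balance
    FROM account_ledger
    GROUP BY account_id;

Note that this sidesteps conflicts but can't enforce a global "no overdraft" rule by itself; that sort of cross-node invariant still needs conflict-aware design or synchronous coordination.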

The documentation tries over and over to emphasise this. Can you suggest
where it can be made clearer or more prominent?

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
