I'm currently implementing commit sequence number (CSN) based
snapshots and I hit a design decision that I would like to resolve
before I have too much code to rewrite.

The issue is commit visibility ordering on slaves. As a couple of
threads on hackers have already noted, currently commit order on
slaves can differ from what is seen on master. This arises from the
fact that on master commit visibility is determined by the order of
ProcArrayLock acquisition by ProcArrayEndTransaction(). On slaves
commit visibility is exactly the order of commit records in WAL.
Because XLogInsert() in RecordTransactionCommit() is not interlocked
with ProcArrayEndTransaction() these orders can differ. In case of
mixed sync and async transactions they in fact are quite likely to
differ due to the durability wait in RecordTransactionCommit().

It's not possible to change master commit order to match WAL order
because then either async transactions must either wait behind sync
transactions before returning losing the point of async; or async
transactions must return without becoming visible, changing user
visible semantics; or sync transactions must become visible before
they become durable, again changing user visible semantics.

As it's not possible to change master commit order, the slave
visibility order must change for the orders to be consistent. WAL
currently doesn't have the information to reconstruct master commit
order. Either we need to add a new WAL record for the commit order
(only necessary when wal_level=hot_standby) or add a side channel to
replication connections to communicate commit order information.

One more consideration here is the wish expressed by several hackers
that commit record LSNs could be used as CSNs. One of the most
interesting benefits of this is the property of LSNs being the same
over the whole cluster, meaning that it would be relatively simple to
create cluster wide consistent snapshots.

I currently see the following courses of action:

1. Do nothing about the inconsistency, use a transient global counter
for master commit order and commit record LSN for slaves.
   Pro: doesn't change any semantics
   Con: we are not making any progress towards cluster wide snapshots
or even serializable transactions on slaves.

2. Create a new WAL record type that is inserted when a transaction
becomes visible. LSN of this record determines transaction visibility
order. Async transactions can be optimized to skip this record. This
record does not need to be flushed.
   Pro: cluster wide consistency, replication method agnostic
   Con: one extra WAL record insertion per writing transaction. (32
bytes of WAL per tx)

3. Use a transient global counter on master, send xid-csn pairs to
slave via a side channel on the replication connection.
   Pro: Less overhead than WAL records
   Con: replication protocol needs (possibly invasive) changes, WAL
shipping based replication can't use this mechanism, lots of extra
code required.

4. Make the choice between 1 and 2 user configurable (it seems to me
that it could even be changed without a restart).

Thoughts?

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to