Re: [HACKERS] The plan for FDW-based sharding

Konstantin Knizhnik Sat, 27 Feb 2016 11:15:20 -0800

Neither pg_dtm, neither pg_tsdtm supports serializable isolation level.
We implemented distributed snapshot isolation - repeatable-read isolation level.
We also do not support read-committed isolation level now.


We do not try to preserve transaction commit order at all nodes.
But in principle it can be implemented using XTM API: it allows to redefine 
function which actually sets transaction status.  pg_dtm performs 2PC here.
And in principle it is possible to enforce commits in any particular order.

Concerning CSNs, may be you are right and it is not correct to use this notion in this 
case. Actually there are many "CSNs" involved in transaction commit.
First of all each transaction is assigned local CSN (timestamp) when it is 
ready to commit. Then CSNs of all nodes are exchanged and maximal CSN is chosen.
This maximum is writen as final transaction CSN and is used in visibility check.


On 02/27/2016 01:48 AM, Kevin Grittner wrote:

On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
<[email protected]> wrote:

pg_tsdtm  is based on another approach: it is using system time
as CSN

Which brings up an interesting point, if we want logical
replication to be free of serialization anomalies for those using
serializable transactions, we need to support applying transactions
in an order which may not be the same as commit order -- CSN (as
such) would be the wrong thing.  If serializable transaction 1 (T1)
modifies a row and concurrent serializable transaction 2 (T2) reads
the old version of the row, and modifies something based on that,
T2 must be applied to a logical replica first even if T1 commits
before it; otherwise the logical replica could see a state not
consistent with business rules and which could not have been seen
(due to SSI) on the source database.  Any DTM API which does not
support some mechanism to rearrange the order of transactions from
commit order to some other order (based on, for example, read-write
dependencies) is not complete.  If it does support that, it gives
us a way forward for presenting consistent data on logical
replicas.

To avoid confusion, it might be best to reserve CSN for actual
commit sequence numbers, or at least values which increase
monotonically with each commit.  The term of art for what I
described above is "apparent order of execution", so maybe we want
to use AOE or AOoE for the order we choose to use in a particular
implementation.  It doesn't seem to me to be outright inaccurate
for cases where the system time on the various systems is used.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

Reply via email to