> On 31 Jul 2017, at 20:03, Robert Haas <robertmh...@gmail.com> wrote:
> 
> Regardless of whether we share XIDs or DXIDs, we need a more complex
> concept of transaction state than we have now.

It seems the discussion has shifted from 2PC itself to general issues with
distributed transactions, so it is probably appropriate to share a summary of
the work we have done in the area of distributed visibility. Over the last two
years we tried three quite different approaches and finally settled on Clock-SI.

To test different approaches, we first wrote a small patch that wraps calls to
visibility-related functions (SetTransactionStatus, GetSnapshot, etc.; described
in detail on the wiki [1]) so that they can be overridden from an extension.
Such an approach makes it possible to implement almost anything related to
distributed visibility, since it gives full control over how local visibility
is done. That API isn't a hard prerequisite, and anyone who wants to create a
concrete implementation can do it in place. However, I think it is good to have
such an API in some form.
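To illustrate the shape of such an API (this is a hypothetical sketch in Python for brevity, not the actual XTM C interface, which is a table of function pointers inside the server; all names here are made up):

```python
# Hypothetical sketch of an overridable visibility API: the server calls
# through one indirection point, and an "extension" can swap the whole
# manager to change visibility behaviour everywhere at once.

class LocalTM:
    """Default single-node behaviour (stand-in for stock PostgreSQL)."""
    def get_snapshot(self):
        return "local-snapshot"

    def set_transaction_status(self, xid, status):
        return ("local", xid, status)

class DistributedTM(LocalTM):
    """An extension overrides only the calls it needs."""
    def get_snapshot(self):
        # e.g. fetched from a central coordinator instead of locally
        return "distributed-snapshot"

current_tm = LocalTM()
assert current_tm.get_snapshot() == "local-snapshot"

# Loading the extension amounts to replacing the manager.
current_tm = DistributedTM()
assert current_tm.get_snapshot() == "distributed-snapshot"
assert current_tm.set_transaction_status(42, "committed") == ("local", 42, "committed")
```

The point is only the indirection: everything visibility-related funnels through one replaceable object, so an extension gets full control without patching the callers.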

So three approaches that we tried:

1) Postgres-XL-like:

That is the most straightforward way. Basically, we need a separate network
service (GTM/DTM) that is responsible for xid generation and for maintaining
the running list of transactions, so acquiring an xid or a snapshot is done via
network calls. Because of the shared xid space, xids can be compared in the
ordinary way and yield the right order. The gap between non-simultaneous 2PC
commits is covered by the fact that we get our snapshots from the GTM, and it
removes an xid from the running list only once the transaction has committed on
all participating nodes.

Such an approach is okay for OLAP-style transactions where the tps isn't high.
But for OLTP with a high transaction rate, the GTM immediately becomes a
bottleneck, since even write transactions need to get their snapshot from the
GTM — even if they access only one node.
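A toy single-process model of that scheme (names are illustrative, not Postgres-XL's actual API; in the real system every begin/snapshot call is a network round trip, which is exactly the bottleneck):

```python
# Toy stand-in for a GTM: hands out xids from a shared sequence and keeps
# the running-transaction list; a snapshot is a copy of that list.

class ToyGTM:
    def __init__(self):
        self.next_xid = 1
        self.running = set()
        self.pending_nodes = {}   # xid -> nodes that haven't committed yet

    def begin(self, nodes):
        xid = self.next_xid
        self.next_xid += 1
        self.running.add(xid)
        self.pending_nodes[xid] = set(nodes)
        return xid

    def snapshot(self):
        return set(self.running)

    def node_committed(self, xid, node):
        self.pending_nodes[xid].discard(node)
        # The xid leaves the running list only once ALL nodes have
        # committed, hiding the gap between non-simultaneous 2PC commits.
        if not self.pending_nodes[xid]:
            self.running.discard(xid)

gtm = ToyGTM()
t1 = gtm.begin(nodes={"A", "B"})
gtm.node_committed(t1, "A")
assert t1 in gtm.snapshot()       # node B hasn't committed: still running
gtm.node_committed(t1, "B")
assert t1 not in gtm.snapshot()   # committed everywhere: gone from snapshots
```

Concurrent observers taking snapshots between the two per-node commits still see t1 as running, so they never observe a half-committed transaction.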


2) Incremental SI [2]

An approach with a central coordinator that allows local reads without network
communication by slightly altering the visibility rules.

Apart from the fact that it is patented, we also failed to achieve proper
visibility when implementing the algorithms from that paper: they always showed
some inconsistencies, perhaps because of bugs in our implementation, perhaps
because of typos or mistakes in the algorithm description itself. The reasoning
in the paper wasn't very clear to us, and neither was the patent situation, so
we set it aside.


3) Clock-SI [3]

It is an MS Research paper that describes an algorithm similar to the ones used
in Spanner and CockroachDB, without a central GTM and with reads that do not
require a network round trip.

There are two ideas behind it:

* Assuming snapshot isolation where visibility on a node is based on CSNs, use
the local time as the CSN. Then, when doing 2PC, collect the prepare time from
all participating nodes and commit the transaction everywhere with the maximum
of those times. If during a read a node sees tuples committed by a transaction
with a CSN greater than the snapshot's CSN (which can happen due to clock
desynchronization between nodes), it simply waits until that time has come. So
clock desynchronization can affect performance, but it can't affect correctness.

* During a distributed commit, a transaction is neither running (if it commits,
the tuple should already be visible) nor committed/aborted (it can still be
aborted, so it is illegal to read). So an IN-DOUBT transaction state appears,
in which readers have to wait for writers.
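The two rules above can be sketched as follows (a simplified model with integer "clocks" and waits reported as a value rather than performed; this is not the paper's full algorithm, just its core decisions):

```python
# Toy Clock-SI sketch: rule 1 picks the distributed commit CSN, rule 2
# decides what a reader does when it encounters a tuple.

def commit_csn(prepare_times):
    # Rule 1: the commit timestamp is the maximum of the prepare times
    # collected from all participating nodes during 2PC.
    return max(prepare_times)

def read_decision(tuple_csn, tuple_state, snapshot_csn):
    # Rule 2: an IN-DOUBT writer forces the reader to wait, because the
    # tuple is neither safely visible nor safely invisible yet.
    if tuple_state == "IN-DOUBT":
        return "WAIT"
    # A committed tuple from the "future" (clock skew) also makes the
    # reader wait until its local clock passes tuple_csn.
    if tuple_state == "COMMITTED" and tuple_csn > snapshot_csn:
        return "WAIT"
    return tuple_state == "COMMITTED" and tuple_csn <= snapshot_csn

csn = commit_csn([105, 98, 110])
assert csn == 110                                     # max of prepare times
assert read_decision(csn, "IN-DOUBT", 120) == "WAIT"  # wait for the writer
assert read_decision(csn, "COMMITTED", 105) == "WAIT" # skewed reader waits
assert read_decision(csn, "COMMITTED", 115) is True   # safely visible
```

Since waiting only ever delays a read, clock skew degrades latency but never lets a reader see a tuple its snapshot shouldn't include — which is the correctness argument sketched above.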

We managed to implement that using the aforementioned XTM API; the XID<->CSN
mapping is maintained by the extension itself. Speed and scalability are also
good.

I want to resubmit the implementation of that algorithm for FDWs later in
August, along with some isolation tests based on the set of queries in [4].


[1] https://wiki.postgresql.org/wiki/DTM#eXtensible_Transaction_Manager_API
[2] http://pi3.informatik.uni-mannheim.de/~norman/dsi_jour_2014.pdf
[3] https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/samehe-clocksi.srds2013.pdf
[4] https://github.com/ept/hermitage


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



