Re: [HACKERS] The plan for FDW-based sharding

Konstantin Knizhnik Fri, 26 Feb 2016 23:30:07 -0800

On 02/27/2016 06:57 AM, Robert Haas wrote:

On Sat, Feb 27, 2016 at 1:49 AM, Konstantin Knizhnik
<[email protected]> wrote:

pg_tsdtm  is based on another approach: it is using system time as CSN and
doesn't require arbiter. In theory there is no limit for scalability. But
differences in system time and necessity to use more rounds of communication
have negative impact on performance.

How do you prevent clock skew from causing serialization anomalies?


If node receives message from "feature" it just needs to wait until this future 
arrive.
Practically we just "adjust" system time in this case, moving it forward 
(certainly system time is not actually changed, we just set correction value which need 
to be added to system time).
This approach was discussed in the article:
http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
I hope, in this article algorithm is explained much better than I can do here.

Few notes:
1. I can not prove that our pg_tsdtm absolutely correctly implements approach 
described in this article.
2. I didn't try to formally prove that our implementation can not cause some 
serialization anomalies.
3. We just run various synchronization tests (including simplest debit-credit 
test which breaks old version of Postgtes-XL) during several days and we didn't 
get any inconsistencies.

4. We have tested pg_tsdtm both at single node, blade cluster and geographically distributed nodes (distance more than thousand kilometers: one server was in Vladivostok, another in Kaliningrad). Ping between these two servers takes about 100msec.Performance of our benchmark drops about 100 times but there was no inconsistencies.


Also I once again want to notice that primary idea of the proposed patch was 
not pg_tsdtm.
There are well know limitation of this  pg_tsdtm which we will try to address 
in future.
What we want is to include XTM API in PostgreSQL to be able to continue our 
experiments with different transaction managers and implementing multimaster on 
top of it (our first practical goal) without affecting PostgreSQL core.

If XTM patch will be included in 9.6, then we can propose our multimaster as 
PostgreSQL extension and everybody can use it.
Otherwise we have to propose our own fork of Postgres which significantly 
complicates using and maintaining it.

So there is no ideal solution which can work well for all cluster. This is
why it is not possible to develop just one GTM, propose it as a patch for
review and then (hopefully) commit it in Postgres core. IMHO it will never
happen. And I do not think that it is actually needed. What we need is a way
to be able to create own transaction managers as Postgres extension not
affecting its  core.

This seems rather defeatist.  If the code is good and reliable, why
should it not be committed to core?


Two reasons:
1. There is no ideal implementation of DTM which will fit all possible needs 
and be  efficient for all clusters.
2. Even if such implementation exists, still the right way of it integration is 
Postgres should use kind of TM API.
I hope that everybody will agree that doing it in this way:

#ifdef PGXC
        /* In Postgres-XC, stop timestamp has to follow the timeline of GTM */
        xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp;
#else
        xlrec.xact_time = xactStopTimestamp;
#endif

or in this way:

        xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp : 
xactStopTimestamp;

is very very bad idea.
In OO programming we should have abstract TM interface and several 
implementations of this interface, for example
MVCC_TM, 2PL_TM, Distributed_TM...
This is actually what can be done with our XTM API.
As far as Postgres is implemented in C, not in C++ we have to emulate 
interfaces using structures with function pointers.
And please notice that there is completely no need to include DTM 
implementation in core, as far as it is not needed for everybody.
It can be easily distributed as extension.

I have that quite soon we can propose multimaster extension which should provides functionality similar with MySQL Gallera. But even right now we have integrated pg_dtm and pg_tsdtm with pg_shard and postgres_fdw, allowing to provide distributedconsistency for them.

All arguments against XTM can be applied to any other extension API in
Postgres, for example FDW.
Is it general enough? There are many useful operations which currently are
not handled by this API. For example performing aggregation and grouping at
foreign server side.  But still it is very useful and flexible mechanism,
allowing to implement many wonderful things.

That is true.  And everybody is entitled to an opinion on each new
proposed hook, as to whether that hook is general or not.  We have
both accepted and rejected proposed hooks in the past.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

Reply via email to