On 02/27/2016 06:54 AM, Robert Haas wrote:
> On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
> <k.knizh...@postgrespro.ru> wrote:
>> We do not have a formal proof that the proposed XTM is "general enough" to
>> handle all possible transaction manager implementations.
>> But there are two general ways of dealing with isolation: snapshot based
>> and CSN based.
> I don't believe that for a minute. For example, consider this article:
Well, I have to agree that I was wrong to say that there are just two ways
of providing distributed isolation.
There is at least one more method: conservative locking. But it requires a
huge number of extra network messages to be exchanged.
Also, I mostly considered solutions compatible with PostgreSQL's MVCC model,
and there are definitely other approaches, such as preserving transaction
commit order (as is done in Galera).
Some of them can be implemented with XTM (preserving commit order), some
cannot (2PL).
I have already acknowledged that XTM does not make it possible to implement
just any transaction manager.
But we did consider several approaches to distributed transaction management
that are described in articles about really working systems.
Some of them are real production systems, such as SAP HANA; some are just
prototypes, but working prototypes whose authors performed benchmarking and
comparison with other approaches. The references you mentioned are mostly
theoretical descriptions of the problem.
They are good to know, but it is hard to build a concrete implementation
based on such articles.
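
To make the CSN-based approach more concrete, here is a minimal sketch of
tuple visibility under a CSN scheme. All names here (csn_t, TupleHdr,
tuple_visible) are illustrative assumptions, not PostgreSQL's actual code:

    /*
     * Sketch of CSN-based visibility: a tuple version is visible iff it
     * was inserted by a transaction that committed at or before the
     * reader's snapshot CSN, and not deleted by such a transaction.
     */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t csn_t;
    #define INVALID_CSN 0        /* writer has not committed yet */

    typedef struct TupleHdr {
        csn_t xmin_csn;          /* CSN of inserting transaction */
        csn_t xmax_csn;          /* CSN of deleting transaction, or 0 */
    } TupleHdr;

    bool
    tuple_visible(const TupleHdr *tup, csn_t snapshot_csn)
    {
        /* inserter must have committed before the snapshot was taken */
        if (tup->xmin_csn == INVALID_CSN || tup->xmin_csn > snapshot_csn)
            return false;
        /* deleter, if any, must have committed after the snapshot */
        if (tup->xmax_csn != INVALID_CSN && tup->xmax_csn <= snapshot_csn)
            return false;
        return true;
    }

The attraction for a distributed setting is that the snapshot degenerates to
a single number, which is much cheaper to pass between nodes than an
XID-based snapshot with its array of in-progress transactions.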
Briefly answering your other questions:
> For example, consider a table with a million rows spread across any number
> of servers.
This is a sharding scenario; pg_tsdtm will work well in this case, without
requiring a lot of extra messages to be sent.
> Now consider another workload where each transaction reads a row on one
> server, reads a row on another server,
This can be solved both with pg_dtm (central arbiter) and with pg_tsdtm (no
arbiter).
But actually your scenarios just prove once again that there cannot be one
ideal distributed TM.
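
To illustrate how pg_tsdtm manages to avoid the arbiter, here is a rough
sketch in the style of Clock-SI, the algorithm it is based on. The RPC
helpers and all names are assumptions for illustration, not the actual
pg_tsdtm code:

    /*
     * Arbiter-free commit-timestamp agreement: each participant proposes
     * its local clock reading, the coordinator takes the maximum, and all
     * participants record that value as the commit CSN.
     */
    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t csn_t;

    /* assumed per-node monotonic clock, e.g. derived from gettimeofday() */
    extern csn_t node_local_clock(void);
    /* assumed RPC: ask node `id` to prepare and propose a timestamp */
    extern csn_t remote_prepare(int node_id);
    /* assumed RPC: tell node `id` to commit at the agreed timestamp */
    extern void remote_commit(int node_id, csn_t commit_csn);

    csn_t
    distributed_commit(const int *nodes, size_t nnodes)
    {
        csn_t commit_csn = node_local_clock();

        /* phase 1: every participant proposes its local clock reading */
        for (size_t i = 0; i < nnodes; i++) {
            csn_t proposal = remote_prepare(nodes[i]);
            if (proposal > commit_csn)
                commit_csn = proposal;
        }
        /* phase 2: all participants commit at the agreed maximum */
        for (size_t i = 0; i < nnodes; i++)
            remote_commit(nodes[i], commit_csn);
        return commit_csn;
    }

A transaction that touches only a single node never enters this path at all
and so produces no extra network traffic, which is exactly the sharding
scenario above.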
> So maybe the goal for the GTM isn't to provide true serializability
> across the cluster but some lesser degree of transaction isolation.
> But then exactly which serialization anomalies are we trying to
> prevent, and why is it OK to prevent those and not others?
Absolutely agree. There are theoretical discussions regarding CAP and the
different levels of distributed isolation.
But in practice people want to solve their tasks. Most PostgreSQL users run
at the default isolation level, read committed, although there are a lot of
"wonderful" anomalies with it.
Serializable transactions in Oracle actually violate the fundamental
serializability rule, and still Oracle is one of the most popular databases
in the world...
There was an isolation bug in Postgres-XL, which did not prevent commercial
customers from using it...
So I am not saying that discussing all these theoretical questions is
unnecessary, or that a formally proven correctness of the distributed
algorithm is useless.
But I do not understand why that should prevent us from providing an
extensible TM API.
Yes, we cannot do everything with it. But we can still implement many
different approaches.
I think that somehow proves it is "general enough".
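
To show what kind of extensibility I have in mind, here is a toy sketch of
the general shape such a pluggable API can take; the struct and member names
are hypothetical and do not reproduce the actual XTM patch:

    /*
     * A hypothetical pluggable transaction manager: the core calls
     * through a table of function pointers, and an extension substitutes
     * its own implementations.
     */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t xid_t;
    typedef struct SnapshotData SnapshotData;

    typedef struct TransactionManager {
        xid_t (*get_new_xid)(void);
        void  (*get_snapshot)(SnapshotData *snap);
        bool  (*xid_in_snapshot)(xid_t xid, const SnapshotData *snap);
        void  (*set_status)(xid_t xid, int status); /* commit/abort */
        int   (*get_status)(xid_t xid);
    } TransactionManager;

    /* built-in behavior stays the default ... */
    extern TransactionManager DefaultTM;
    static TransactionManager *CurrentTM = &DefaultTM;

    /* ... and an extension such as pg_dtm or pg_tsdtm swaps it out */
    void
    RegisterTransactionManager(TransactionManager *tm)
    {
        CurrentTM = tm;
    }

The built-in transaction manager remains the default, and an extension
registers its own implementation, in the same spirit as the existing planner
and executor hooks.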
> https://en.wikipedia.org/wiki/Global_serializability
>
> I think the neutrality of that article is *very* debatable, but it
> certainly contradicts the idea that snapshots and CSNs are the only
> methods of achieving global serializability.
>
> Or consider this lecture:
>
> http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf
>
> That's a great introduction to the problem we're trying to solve here,
> but again, snapshots are not mentioned, and CSNs certainly aren't
> mentioned.
>
> This write-up goes further, explaining three different methods for
> ensuring global serializability, none of which mention snapshots or
> CSNs:
>
> http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html
>
> Actually, I think the second approach is basically a snapshot/CSN-type
> approach, but it doesn't use that terminology and the connection to
> what you are proposing is very unclear.
>
> I think you're approaching this problem from a viewpoint that is
> entirely too focused on the code that exists in PostgreSQL today.
> Lots of people have done lots of academic research on how to solve
> this problem, and you can't possibly say that CSNs and snapshots are
> the only solution to this problem unless you haven't read any of those
> papers. The articles above aren't exceptional in mentioning neither
> of the approaches that you are advocating - they are typical of the
> literature in this area. How can it be that the only solutions to
> this problem are ones that are totally different from the approaches
> that university professors who spend time doing research on
> concurrency have spent time exploring?
>
> I think we need to back up here and examine our underlying design
> assumptions. The goal here shouldn't necessarily be to replace
> PostgreSQL's current transaction management with a distributed version
> of the same thing. We might want to do that, but I think the goal is
> or should be to provide ACID semantics in a multi-node environment,
> and specifically the I in ACID: transaction isolation. Making the
> existing transaction manager into something that can be spread across
> multiple nodes is one way of accomplishing that. Maybe the best one.
> Certainly one that's been experimented with in Postgres-XC. But it is
> often the case that an algorithm that works tolerably well on a single
> machine starts performing extremely badly in a distributed
> environment, because the latency of communicating between multiple
> systems is vastly higher than the latency of communicating between
> CPUs or cores on the same system. So I don't think we should be
> assuming that's the way forward.
>
> For example, consider a table with a million rows spread across any
> number of servers. Consider also a series of update transactions each
> of which reads exactly one row and then writes that row. If we adopt
> any solution that involves a central coordinator to arbitrate commit
> ordering, this is going to require at least one and probably two
> million network round trips, one per transaction to get a snapshot and
> a second to commit. But all of this is completely unnecessary.
> Because each transaction touches only a single node, a perfect global
> transaction manager doesn't really need to do anything at all in this
> case. The existing PostgreSQL mechanisms - snapshot isolation, and SSI
> if you have it turned on - will provide just as much transaction
> isolation on this workload as they would on a workload that only
> touched a single node. If we design a GTM that does two million
> network round trips in this scenario, we have just wasted two million
> network round trips.
>
> Now consider another workload where each transaction reads a row on
> one server, reads a row on another server, and then updates the second
> row. Here, the GTM has a job to do. If T1 reads R1, reads R2, writes
> R2; and T2 concurrently reads R2, reads R1, and then writes R1, it
> could happen that both transactions see the pre-update values of the
> row they read first and yet both transactions go on to commit. That's
> not equivalent to any serial history, so transaction isolation is
> broken. A GTM which aims to provide true cluster-wide serializability
> must do something to keep that from happening. If all of this were
> happening on a single node, those transactions would succeed if run at
> READ COMMITTED but SSI would roll one of them back at SERIALIZABLE.
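
This is the classic write-skew anomaly. To see why no serial order can
produce that outcome, here is a toy simulation of the interleaving in plain
C (no database involved; purely illustrative):

    #include <stdio.h>

    int main(void)
    {
        int r1 = 1, r2 = 1;                       /* initial row values */

        /* both transactions take their snapshots before either commits */
        int t1_r1 = r1, t1_r2 = r2;               /* T1: reads R1, R2 */
        int t2_r2 = r2, t2_r1 = r1;               /* T2: reads R2, R1 */

        /* each writes based on its own snapshot, then both commit */
        r2 = t1_r1 + t1_r2;                       /* T1 writes R2 = 2 */
        r1 = t2_r2 + t2_r1;                       /* T2 writes R1 = 2 */

        /* serial T1;T2 yields (3, 2); serial T2;T1 yields (2, 3) */
        printf("r1 = %d, r2 = %d (no serial order yields 2, 2)\n", r1, r2);
        return 0;
    }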
> So maybe the goal for the GTM isn't to provide true serializability
> across the cluster but some lesser degree of transaction isolation.
> But then exactly which serialization anomalies are we trying to
> prevent, and why is it OK to prevent those and not others?
>
> I have seen zero discussion of any of this. What I think we ought to
> be doing here is describing precisely what might break, and then
> deciding which of those problems we want to fix, and then deciding how
> we can do that with the least amount of network traffic. Jumping to
> "let's make the transaction API pluggable" is presupposing the answer
> to the first two questions without any discussion, and I'm afraid that
> it's not going to lead to a very agreeable solution to the third one.
>> Yes, it is certainly possible to develop a cluster by cloning PostgreSQL.
>> But it causes big problems both for developers, who have to permanently
>> synchronize their branch with master, and, what is more important, for
>> customers, who cannot use the standard version of PostgreSQL.
>> It may cause problems with system certification, with running Postgres in
>> the cloud, ...
>> Actually, the history of Postgres-XL/XC and Greenplum IMHO shows that this
>> is the wrong direction.
> I think the history of Postgres-XC/XL shows that developing technology
> outside of the PostgreSQL community is a risky business. You might
> end up developing something that is not widely used or adopted, and
> the lack of community review might cause that technology to be less
> good than it would have been had it been done through the community
> process. It seems to me that installing a bunch of hooks here and then
> having you go off and develop outside the community has those same
> perils. (Of course, in that case and this one, working outside the
> community also lets you go faster and do things the community
> doesn't like, which are sometimes advantages.)
>
> Also, what you are proposing solves problems for you while maybe
> creating them for other people. You're saying that we should have
> hooks so that you don't have to merge with master. But that's just
> transferring the maintenance burden from you to core. Instead of you
> having to merge when things change, core has got to maintain the hooks
> as things change so that things are easy for you. If there are no
> code changes in the relevant area anyway, then merging is trivial and
> you shouldn't need to worry about it. I could submit a patch adding
> hooks to core to enable all of the things (or even just some of the
> things) that EnterpriseDB has changed in Advanced Server, and that
> patch would be rejected so fast it would make your head spin, because
> of course the core project doesn't want to be burdened with
> maintaining a whole bunch of hooks for the convenience of
> EnterpriseDB. Which is understandable. I think it's fine for you to
> ask whether PostgreSQL will accept a certain set of hooks, but we've
> all got to understand that there is a difference between what is
> convenient for us or our employers and what is actually best for the
> project. I am not under any illusions that those two things are the
> same, and while I do a lot of things that I hope will benefit my
> employer, when I am writing to this mailing list I do not do things
> unless they are in the interest of PostgreSQL. When those two things
> intersect, great; when they don't, and the work is community work,
> PostgreSQL wins. I see very clearly that what you are proposing here
> will benefit your customers, but unless it will also benefit the
> PostgreSQL community in general, it's not a good submission.
>
> But I don't really want to spend a lot of time arguing about politics
> here. The real issue is whether this is a good approach. If it is,
> then it's the right thing to do for PostgreSQL and we should commit
> it. If it's not, then we should reject it. Let's focus on the
> technical concerns I wrote about in the first part of the email rather
> than wrangling about business interests. I'm not blind to the fact
> that we work for different companies and I realize that can create
> some tension, but if we want to *have* a PostgreSQL community we've
> got to try to get past that.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company