Re: [HACKERS] The plan for FDW-based sharding

Konstantin Knizhnik Mon, 07 Mar 2016 00:35:26 -0800

On 03/07/2016 04:28 AM, Robert Haas wrote:

On Fri, Mar 4, 2016 at 10:54 PM, Craig Ringer <[email protected]> wrote:

I've got to say that this is somewhat reminicient of the discussions around
in-core pooling, where argument 1 is applied to justify excluding pooling
from core/contrib.


I don't have a strong position on whether a DTM should be in core or not as
I haven't done enough work in the area. I do think it's interesting to
strongly require that a DTM be in core while we also reject things like
pooling that are needed by a large proportion of users.

I don't remember this discussion, but I don't think I feel differently
about either of these two issues.  I'm not opposed to having some
hooks in core to make it easier to build a DTM, but I'm not convinced
that these hooks are the right hooks or that the design underlying
those hooks is correct.

What can I try to convince you that design of XTM API is correct?
I already wrote that we have not introduced some new abstractions.
What we have done is just encapsulate some existed Postgres functions.
The main reason was that we tried to minimize changes in Postgres core.
If seems to betempting if we can provide enough level of flexibility without 
rewriting core, isn't it?

What does it mean "enough level of flexibility"? We are interested in 
implementation of DTM, so if XTM API allows to do it for several considered approaches,
then it is "flexible enough".

So do you agree than before rewriting/refactoring xact.c/transam.c/procarray.c 
it is better first to try introduce XTM over existed code?
And if we find out that some useful functionality is missed and can not be 
overrden through this API in convenient and efficient way,
without copying substantial peaces of code, then only in this case we should 
consider refactoring of core transaction processing code to make it more 
modular and tunable.

If you agree with this statement, then next question is which set of functions 
needs to be overridden by XTM.
PostgreSQL transaction manager has many different functions, some of them are 
doing almost the same things, but in different way.
For example consider TransactionIdIsInProgress,TransactionIdIsKnownCompleted, 
TransactionIdDidCommit, TransactionIdDidAbort, TransactionIdGetStatus.
Some of them are accessing clog, some - procarray, some just check cached 
value. And so them are scattered through different Postgres modules.

So which of them has to be included in XTM API?
We have investigated code and usage of all this functions.
We found out that TransactionIdDidCommit is always called by visibility check 
after TransactionIdIsInProgress.
And it is in turn using TransactionIdGetStatus to extract information about 
transaction from clog.
So we have included in XTM TransactionIdIsInProgress and 
TransactionIdGetStatus, but not TransactionIdDidCommit,TransactionIdDidAbort 
and TransactionIdIsKnownCompleted.

Similar story is with other functions. For example: transaction commit.
There are once again a bundle of functions: CommitTransactionCommand, 
CommitTransaction, CommitSubTransaction, RecordTransactionCommit, 
TransactionIdSetTreeStatus.
CommitTransactionCommand - is function from public API. It is initiating switch 
of state of Postgres TM finite state automaton.
We do not want to affect logic of this automaton: it is the same for DTM and 
local TM. So we are looking deeper.
CommitTransaction/CommitSubTransaction are called by this FSM. We also do not 
want to change logic of processing subtransactions.
One more step deeper. So we arrive at TransactionIdSetTreeStatus. And this is 
why it is included in XTM.

Another example is tuple visibility check. There is a family of 
HeapTupleSatisfies* functions in  utils/time/tqual.c (IMHO: very strange place 
for one of the core Postgres submodule:)
Should we override all of them? No, because them are mostly based on few other 
functions, such as TransactionIdIsInProgress, TransactionIdIsInProgress, 
XidInMVCCSnapshot...
As far as we do not want to change heap tuple format, we leave all 
manipulations with tuple status bits as it is and redefine only 
XidInMVCCSnapshot() function.

So, I can provide arguments for all functions included in XTM: why it was 
included in this API and why some other related functions were not included.
But I can not provide that is a necessary and sufficient subset of function.

I do not see big problems in extending and refactoring this API in future. Postgres lives for a years a without custom TMs and I do not expect that if presence of XTM API will cause development of many different TMs. Most likely very few people orcompanies will try to develop their TMs. So compatibility will not be a buig issue here.

And, eventually, I would like to see a DTM in
core or contrib so that it can be accessible to everyone relatively
easily.


So am I. But before including something in core, it will be best to test it for 
many different scenarios.
It is especially true for DTM, because requirements of various cluster solution 
are very different.
And the most convenient way of doing it is to ship DTM as extension, not as 
some fork of Postgres.
It will greatly simplify using it.

Now, on connection pooling, I am similarly not opposed to
having some well-designed hooks, but I also think in the long run it
would be better for some improvements in this area to be part of core.
None of that means I would support any particular hook proposal, of
course.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

Reply via email to