From: Robert Haas <robertmh...@gmail.com>
> On Thu, Jun 10, 2021 at 9:58 PM tsunakawa.ta...@fujitsu.com
> <tsunakawa.ta...@fujitsu.com> wrote:
> > The question I have been asking is how.  With that said, we should only
> > have two options; one is the return value of the FDW commit routine, and
> > the other is via ereport(ERROR).  I suggested the possibility of the
> > former, because if the FDW does ereport(ERROR), Postgres core (transaction
> > manager) may have difficulty in handling the rest of the participants.
> 
> I don't think that is going to work. It is very difficult to write
> code that doesn't ever ERROR in PostgreSQL. It is not impossible if
> the operation is trivial enough, but I think you're greatly
> underestimating the complexity of committing the remote transaction.
> If somebody had designed PostgreSQL so that every function returns a
> return code and every time you call some other function you check that
> return code and pass any error up to your own caller, then there would
> be no problem here. But in fact the design was that at the first sign
> of trouble you throw an ERROR. It's not easy to depart from that
> programming model in just one place.

> > I'm not completely sure about this.  I thought (and said) that the only
> > thing the FDW does would be to send a commit request through an existing
> > connection.  So, I think it's not a severe restriction to require FDWs not
> > to do ereport(ERROR) during commits (of the second phase of 2PC).
> 
> To send a commit request through an existing connection, you have to
> send some bytes over the network using a send() or write() system
> call. That can fail. Then you have to read the response back over the
> network using recv() or read(). That can also fail. You also need to
> parse the result that you get from the remote side, which can also
> fail, because you could get back garbage for some reason. And
> depending on the details, you might first need to construct the
> message you're going to send, which might be able to fail too. Also,
> the data might be encrypted using SSL, so you might have to decrypt
> it, which can also fail, and you might need to encrypt data before
> sending it, which can fail. In fact, if you're using OpenSSL,
> trying to call SSL_read() or SSL_write() can both read and write data
> from the socket, even multiple times, so you have extra opportunities
> to fail.

I know sending a commit request may get an error from various underlying 
functions, but we're talking about the client side here, not the Postgres 
server side that could unexpectedly ereport(ERROR) somewhere.  So, the new FDW 
commit routine won't lose control, and it can report an error through its 
return value.  For instance, the FDW commit routine for DBMS-X would typically 
look like this:

int
DBMSXCommit(...)
{
        int     ret;

        /* extract info from the argument to pass to xa_commit() */

        /*
         * DBMSX_xa_commit() is the actual commit function, which is exposed
         * to the app server (e.g. Tuxedo) through the xa_commit() interface.
         */
        ret = DBMSX_xa_commit(...);

        /*
         * Map xa_commit() return values to the corresponding return values
         * of the FDW commit routine.
         */
        switch (ret)
        {
                case XA_RMERR:
                        ret = ...;
                        break;
                ...
        }

        return ret;
}
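
To illustrate the point, here is a minimal sketch of how the transaction 
manager in core could consume that return value (all names below, such as 
CommitForeignParticipants, FdwParticipant and FDW_COMMIT_OK, are made up for 
illustration and are not from any existing patch):

static void
CommitForeignParticipants(List *participants)
{
        ListCell   *lc;

        foreach(lc, participants)
        {
                FdwParticipant *participant = (FdwParticipant *) lfirst(lc);
                int     ret;

                /* call the FDW commit routine, e.g. DBMSXCommit() above */
                ret = participant->routine->CommitForeignTransaction(participant);

                if (ret != FDW_COMMIT_OK)
                {
                        /*
                         * No ereport(ERROR) here; just remember the failure
                         * so that this participant can be resolved later
                         * (e.g. by a background process), and continue with
                         * the remaining participants.
                         */
                        participant->needs_resolution = true;
                }
        }
}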


> I think that's a valid concern, but we also have to have a plan that
> is realistic. Some things are indeed not possible in PostgreSQL's
> design. Also, some of these problems are things everyone has to
> somehow confront. There's no database doing 2PC that can't have a
> situation where one of the machines disappears unexpectedly due to
> some natural disaster or administrator interference. It might be the
> case that our inability to do certain things safely during transaction
> commit puts us out of compliance with the spec, but it can't be the
> case that some other system has no possible failures during
> transaction commit. The problem of the network potentially being
> disconnected between one packet and the next exists in every system.

So, we need to design how commit behaves from the user's perspective.  That's 
the functional design.  We should first figure out what the desirable response 
of commit is, and then see whether we can implement it or have to compromise in 
some way.  I think we can reference the X/Open TX standard and/or the JTS (Java 
Transaction Service) specification (I haven't had a chance to read them yet, 
though).  In case those specifications don't define the expected commit 
behavior for a disaster case like the volcano example, then (I'm hesitant to 
say this because it may be hard) it would be desirable to follow representative 
products such as Tuxedo and GlassFish (the reference implementation of the Java 
EE specs).


> > I don't think the resolver-based approach would bring us far enough.  It's
> > fundamentally a bottleneck.  Such a background process should only handle
> > commits whose requests could not be sent because the server was down.
> 
> Why is it fundamentally a bottleneck? It seems to me in some cases it
> could scale better than any other approach. If we have to commit on
> 100 shards in only one process we can only do those commits one at a
> time. If we can use resolver processes we could do all 100 at once if
> the user can afford to run that many resolvers, which should be way
> faster. It is true that if the resolver does not have a connection
> open and must open one, that might be slow, but presumably after that
> it can keep the connection open and reuse it for subsequent
> distributed transactions. I don't really see why that should be
> particularly slow.

Concurrent transactions are serialized at the resolver.  I heard that the 
current patch handles 2PC like this: the TM (transaction manager in Postgres 
core) asks the resolver to prepare, the resolver sends PREPARE to the remote 
server and waits for the reply, control returns to the TM, the TM asks the 
resolver to commit, the resolver sends COMMIT to the remote server and waits 
for the reply, and control returns to the TM.  The resolver handles one 
transaction at a time.
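
To make the serialization concrete, the flow described above would look 
roughly like the following from the backend's side (a sketch only; every name 
here is invented for illustration and is not the patch's actual code).  Each 
step is a synchronous hand-off, so every remote server costs two full round 
trips through the single resolver, and concurrent backends queue up behind one 
another at each step:

void
CommitDistributedTransaction(TransactionId xid, List *remote_servers)
{
        ListCell   *lc;

        /* first phase: one synchronous round trip per server via the resolver */
        foreach(lc, remote_servers)
        {
                RemoteServer *server = (RemoteServer *) lfirst(lc);

                ResolverRequest(server, xid, RESOLVER_PREPARE);
                ResolverWaitForReply(xid);      /* backend sleeps here */
        }

        /* second phase: again one synchronous round trip per server */
        foreach(lc, remote_servers)
        {
                RemoteServer *server = (RemoteServer *) lfirst(lc);

                ResolverRequest(server, xid, RESOLVER_COMMIT_PREPARED);
                ResolverWaitForReply(xid);
        }
}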

As for the case where one session has to commit on multiple remote servers, 
what we have in mind is an asynchronous interface like the one the XA standard 
provides.
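
For reference, the XA-style asynchronous pattern would look roughly like this 
(only a sketch; DBMSX_xa_commit()/DBMSX_xa_complete() stand for the resource 
manager's xa_commit()/xa_complete() entry points, and error handling is 
reduced to the bare minimum).  The commit requests to all remote servers are 
issued first and the replies are collected afterwards, so the network waits 
overlap instead of being serialized:

#include "xa.h"         /* XID, TMASYNC, TMNOFLAGS, XA_OK from the XA spec */

static int
commit_all_participants(XID *xid, int rmids[], int nrm)
{
        int     handles[nrm];
        int     i;

        /* issue every commit request without waiting; an async call returns a handle */
        for (i = 0; i < nrm; i++)
                handles[i] = DBMSX_xa_commit(xid, rmids[i], TMASYNC);

        /* then collect the results; the round trips to the servers overlap */
        for (i = 0; i < nrm; i++)
        {
                int     retval;

                if (handles[i] < 0 ||
                    DBMSX_xa_complete(&handles[i], &retval, rmids[i], TMNOFLAGS) != XA_OK ||
                    retval != XA_OK)
                        return -1;      /* leave this participant for later resolution */
        }

        return 0;
}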


Regards
Takayuki Tsunakawa
