On 2020/09/11 0:37, Masahiko Sawada wrote:
On Tue, 8 Sep 2020 at 13:00, tsunakawa.ta...@fujitsu.com
<tsunakawa.ta...@fujitsu.com> wrote:

From: Amit Kapila <amit.kapil...@gmail.com>
I intend to say that the global-visibility work can impact this in a
major way and we have analyzed that to some extent during a discussion
on the other thread. So, I think without having a complete
design/solution that addresses both the 2PC and global-visibility, it
is not apparent what is the right way to proceed. It seems to me that
rather than working on individual (or smaller) parts one needs to come
up with a bigger picture (or overall design) and then once we have
figured that out correctly, it would be easier to decide which parts
can go first.

I'm really sorry I've been getting later and later in publishing the 
revised scale-out design wiki to discuss the big picture!  I don't know why I'm 
taking this long; I feel as if I were captive in a time prison (yes, nobody is 
holding me captive; I'm just late.)  Please wait a few days.

But to proceed with the development, let me comment on the atomic commit and 
global visibility.

* We have to hear from Andrey about their check on whether Clock-SI could be 
covered by a Microsoft patent, and if so, whether we can avoid it.

* I have a feeling that we can adopt the approach used by Spanner, 
CockroachDB, and YugabyteDB.  That is, 2PC for multi-node atomic commit, Paxos 
or Raft for replica synchronization (in the process of commit) to make 2PC more 
highly available, and timestamp-based global visibility (sketched below).  
However, the timestamp-based approach forces a database instance to shut down 
when its clock drifts too far from the other nodes' clocks.
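
To illustrate, the commit rule behind that approach could look roughly like 
the following.  This is only a minimal sketch in C; clock_now_us() and the 
skew bound are my assumptions, not any real API.

#include <stdint.h>
#include <unistd.h>

#define MAX_CLOCK_ERROR_US 500     /* assumed worst-case cross-node skew */

extern uint64_t clock_now_us(void);   /* hypothetical clock source */

uint64_t
assign_commit_timestamp_and_wait(void)
{
    /*
     * Pick the commit timestamp at the upper edge of the uncertainty
     * window, then wait until every node's clock has surely passed it.
     * After the wait, any transaction starting anywhere in the cluster
     * gets a snapshot timestamp later than ours and therefore sees our
     * writes; this is what gives timestamp-based global visibility.
     */
    uint64_t commit_ts = clock_now_us() + MAX_CLOCK_ERROR_US;

    while (clock_now_us() < commit_ts + MAX_CLOCK_ERROR_US)
        usleep(10);                /* "commit wait" */

    return commit_ts;
}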

* Or, maybe we can use the following Commitment ordering, which doesn't 
require timestamps or any other information to be transferred among the 
cluster nodes.  However, it seems to have to track the order of read and write 
operations among concurrent transactions to ensure the correct commit order, 
so I'm not sure about the performance (see the rough sketch after the link).  
The MVCO paper seems to present the information we need, but I haven't 
understood it well yet (it's difficult.)  Could anybody kindly interpret it?

Commitment ordering (CO) - yoavraz2
https://sites.google.com/site/yoavraz2/the_principle_of_co
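
As far as I can tell from that page, the generic CO rule is: when a 
transaction is ready to commit, abort every undecided transaction that 
conflict-precedes it.  A rough sketch, with entirely hypothetical data 
structures (this is my reading, not working code):

#include <stdbool.h>

typedef struct Xact
{
    bool        decided;        /* already committed or aborted? */
    struct Xact **preceders;    /* undecided xacts with a conflict edge
                                 * into this one, i.e. they accessed the
                                 * same data before this xact did, with
                                 * at least one side writing */
    int         npreceders;
} Xact;

extern void abort_xact(Xact *x);            /* hypothetical */
extern void write_commit_record(Xact *x);   /* hypothetical */

/*
 * CO requires the commit order to match the conflict order, so every
 * undecided transaction that conflict-precedes the committing one must
 * be aborted first.  Maintaining those conflict edges for every pair of
 * concurrent read/write operations is the bookkeeping cost I'm unsure
 * about.
 */
void
co_commit(Xact *t)
{
    for (int i = 0; i < t->npreceders; i++)
        if (!t->preceders[i]->decided)
            abort_xact(t->preceders[i]);

    write_commit_record(t);
    t->decided = true;
}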


As for Sawada-san's 2PC patch, which I find interesting purely as an FDW 
enhancement, I raised the following issues to be addressed:

1. Make the FDW API implementable by FDWs other than postgres_fdw (this is 
what Amit-san kindly pointed out.)  I think oracle_fdw and jdbc_fdw would be 
good examples to consider, while MySQL may not be a good one because it 
exposes the XA feature as SQL statements, not as C functions as defined in the 
XA specification.

I agree that we need to verify that the new FDW APIs will be suitable for
FDWs other than postgres_fdw as well.
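
To make that discussion concrete, the API shape could be something like the 
following.  These names and signatures are purely illustrative, not the 
patch's actual definitions.  An FDW like oracle_fdw could implement them on 
top of XA's C interface (xa_prepare()/xa_commit()), while postgres_fdw would 
send PREPARE TRANSACTION / COMMIT PREPARED over libpq.

/* Hypothetical callback shapes only, to illustrate the point. */
typedef struct FdwXactInfo FdwXactInfo;   /* per-foreign-transaction state */

typedef void (*PrepareForeignTransaction_function) (FdwXactInfo *info);
typedef void (*CommitPreparedForeignTransaction_function) (FdwXactInfo *info);
typedef void (*RollbackPreparedForeignTransaction_function) (FdwXactInfo *info);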


2. 2PC processing is queued and serialized in one background worker.  That 
severely limits transaction throughput.  Each backend should perform 2PC.

I'm not sure it's safe for each backend to perform PREPARE and COMMIT
PREPARED, since the current design aims to avoid an inconsistency
between the actual transaction result and the result the user sees.
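
For reference, "each backend performs 2PC" would mean each backend issuing 
something like the following itself.  This is a libpq sketch with error 
handling omitted, and the GIDs are made up:

#include <libpq-fe.h>

void
backend_side_2pc(PGconn *server1, PGconn *server2)
{
    /* Phase 1: prepare on every foreign server involved. */
    PQclear(PQexec(server1, "PREPARE TRANSACTION 'fx_1234_1'"));
    PQclear(PQexec(server2, "PREPARE TRANSACTION 'fx_1234_2'"));

    /* ... the local commit record is WAL-logged here ... */

    /*
     * Phase 2: commit the prepared transactions.  Any error raised from
     * this point on reaches the client even though the transaction is
     * already durable -- the inconsistency discussed below.
     */
    PQclear(PQexec(server1, "COMMIT PREPARED 'fx_1234_1'"));
    PQclear(PQexec(server2, "COMMIT PREPARED 'fx_1234_2'"));
}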

Can I check my understanding of why the resolver process is necessary?

Firstly, you think that issuing the COMMIT PREPARED command to the foreign 
server can cause an error, for example because of a connection failure, OOM, 
etc. On the other hand, merely waiting for another process to issue the 
command is much less likely to cause an error. Right?

If an error occurs in the backend process after the commit record has been 
WAL-logged, the error is reported to the client, which may then misunderstand 
that the transaction failed even though the commit record was already flushed. 
So you think that each backend should not issue the COMMIT PREPARED command 
itself, to avoid that inconsistency. Instead, it's better to have another 
process, the resolver, issue the command and make each backend just wait for 
that to complete. Right?
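
In other words, the commit path I'm assuming the current design takes is 
something like this (every function name here is hypothetical):

extern void log_local_commit_record(void);      /* flush the commit WAL */
extern void queue_fdw_xacts_for_resolver(void);
extern void wait_for_resolver(void);

void
commit_with_resolver(void)
{
    log_local_commit_record();        /* the transaction is durable here */
    queue_fdw_xacts_for_resolver();

    /*
     * Waiting can still be interrupted, but unlike issuing COMMIT
     * PREPARED ourselves it cannot fail half-way through a network
     * conversation with a foreign server, so the client never sees a
     * spurious error for a transaction that actually committed.
     */
    wait_for_resolver();
}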

Also, using the resolver process has another merit: when there are unresolved 
foreign transactions but the corresponding backend has already exited, the 
resolver can try to resolve them. If something like this automatic resolution 
is necessary, a process like the resolver would be necessary. Right?
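
For completeness, the resolver I have in mind is just a loop like the 
following (again, all names are hypothetical):

#include <stdbool.h>

typedef struct FdwXact FdwXact;

extern FdwXact *dequeue_unresolved_fdw_xact(void);
extern bool resolve_fdw_xact(FdwXact *fx);   /* COMMIT/ROLLBACK PREPARED */
extern void wait_for_work(void);
extern void notify_waiting_backend(FdwXact *fx);
extern void requeue_with_backoff(FdwXact *fx);

void
fdw_xact_resolver_main(void)
{
    for (;;)
    {
        FdwXact *fx = dequeue_unresolved_fdw_xact();

        if (fx == NULL)
        {
            wait_for_work();
            continue;
        }

        /*
         * Errors here (connection failure, OOM, ...) never reach a
         * client, and a queued entry survives a backend exit, which
         * covers both the consistency and the automatic-resolution
         * points above.
         */
        if (resolve_fdw_xact(fx))
            notify_waiting_backend(fx);
        else
            requeue_with_backoff(fx);
    }
}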

To the contrary, if we don't need such automatic resolution (i.e., unresolved 
foreign transactions always need to be resolved manually) and we can prevent 
the code that issues the COMMIT PREPARED command from causing an error (I'm 
not sure that's possible, though...), we probably don't need the resolver 
process. Right?


But in the future, I think we can have multiple background workers per
database for better performance.

Yes, that's an idea.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

