On 2020/09/05 3:31, Alexey Kondratov wrote:
Hi,
On 2020-07-27 09:44, Andrey V. Lepikhov wrote:
On 7/27/20 11:22 AM, tsunakawa.ta...@fujitsu.com wrote:
US8356007B2 - Distributed transaction management for database systems with
multiversioning - Google Patents
https://patents.google.com/patent/US8356007
If it is, can we circumvent this patent?
Thank you for the research (and previous links too).
I haven't seen this patent before. This should be carefully studied.
I had a look on the patch set, although it is quite outdated, especially on
0003.
Two thoughts about 0003:
First, IIUC atomicity of the distributed transaction in the postgres_fdw is
achieved by the usage of 2PC. I think that this postgres_fdw 2PC support should
be separated from global snapshots.
Agreed.
It could be useful to have such atomic distributed transactions even without a
proper visibility, which is guaranteed by the global snapshot. Especially
taking into account the doubts about Clock-SI and general questions about
algorithm choosing criteria above in the thread.
Thus, I propose to split 0003 into two parts and add a separate GUC
'postgres_fdw.use_twophase', which could be turned on independently from
'postgres_fdw.use_global_snapshots'. Of course if the latter is enabled, then
2PC should be forcedly turned on as well.
Second, there are some problems with errors handling in the 0003 (thanks to
Arseny Sher for review).
+error:
+ if (!res)
+ {
+ sql = psprintf("ABORT PREPARED '%s'", fdwTransState->gid);
+ BroadcastCmd(sql);
+ elog(ERROR, "Failed to PREPARE transaction on remote node");
+ }
It seems that we should never reach this point, just because BroadcastStmt will
throw an ERROR if it fails to prepare transaction on the foreign server:
+ if (PQresultStatus(result) != expectedStatus ||
+ (handler && !handler(result, arg)))
+ {
+ elog(WARNING, "Failed command %s: status=%d, expected
status=%d", sql, PQresultStatus(result), expectedStatus);
+ pgfdw_report_error(ERROR, result, entry->conn, true, sql);
+ allOk = false;
+ }
Moreover, It doesn't make much sense to try to abort prepared xacts, since if
we failed to prepare it somewhere, then some foreign servers may become
unavailable already and this doesn't provide us a 100% guarantee of clean up.
+ /* COMMIT open transaction of we were doing 2PC */
+ if (fdwTransState->two_phase_commit &&
+ (event == XACT_EVENT_PARALLEL_COMMIT || event == XACT_EVENT_COMMIT))
+ {
+ BroadcastCmd(psprintf("COMMIT PREPARED '%s'", fdwTransState->gid));
+ }
At this point, the host (local) transaction is already committed and there is
no way to abort it gracefully. However, BroadcastCmd may rise an ERROR that
will cause a PANIC, since it is non-recoverable state:
PANIC: cannot abort transaction 487, it was already committed
Attached is a patch, which implements a plain 2PC in the postgres_fdw and adds
a GUC 'postgres_fdw.use_twophase'. Also it solves these errors handling issues
above and tries to add proper comments everywhere. I think, that 0003 should be
rebased on the top of it, or it could be a first patch in the set, since it may
be used independently. What do you think?
Thanks for the patch!
Sawada-san was proposing another 2PC patch at [1]. Do you have any thoughts
about pros and cons between your patch and Sawada-san's?
Regards,
[1]
https://www.postgresql.org/message-id/ca+fd4k4z6_b1etevqamwqhu4rx7xsrn5orl7ohj4b5b6sw-...@mail.gmail.com
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION