Added to 2015-06 commitfest to attract some reviews and comments.

On Tue, Feb 17, 2015 at 2:56 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Hi All,
>
> Here are the steps and infrastructure for achieving atomic commits across
> multiple foreign servers. I have tried to address most of the concerns
> raised in this mail thread before. Let me know, if I have left something.
> Attached is a WIP patch implementing the same for postgres_fdw. I have
> tried to make it FDW-independent.
>
> A. Steps during transaction processing
> ------------------------------------------------
>
> 1. When an FDW connects to a foreign server and starts a transaction, it
> registers that server with a boolean flag indicating whether that server is
> capable of participating in a two phase commit. In the patch this is
> implemented using function RegisterXactForeignServer(), which raises an
> error, thus aborting the transaction, if there is at least one foreign
> server incapable of 2PC in a multiserver transaction. This error thrown as
> early as possible. If all the foreign servers involved in the transaction
> are capable of 2PC, the function just updates the information. As of now,
> in the patch the function is in the form of a stub.
>
> Whether a foreign server is capable of 2PC, can be
> a. FDW level decision e.g. file_fdw as of now, is incapable of 2PC but it
> can build the capabilities which can be used for all the servers using
> file_fdw
> b. a decision based on server version type etc. thus FDW can decide that
> by looking at the server properties for each server
> c. a user decision where the FDW can allow a user to specify it in the
> form of CREATE/ALTER SERVER option. Implemented in the patch.
>
> For a transaction involving only a single foreign server, the current code
> remains unaltered as two phase commit is not needed. Rest of the discussion
> pertains to a transaction involving more than one foreign servers.
> At the commit or abort time, the FDW receives call backs with the
> appropriate events. FDW then takes following actions on each event.
>
> 2. On XACT_EVENT_PRE_COMMIT event, the FDW coins one prepared transaction
> id per foreign server involved and saves it along with xid, dbid, foreign
> server id and user mapping and foreign transaction status = PREPARING
> in-memory. The prepared transaction id can be anything represented as byte
> string. Same information is flushed to the disk to survive crashes. This is
> implemented in the patch as prepare_foreign_xact(). Persistent and
> in-memory storages and their usages are discussed later in the mail. FDW
> then prepares the transaction on the foreign server. If this step is
> successful, the foreign transaction status is changed to PREPARED. If the
> step is unsuccessful, the local transaction is aborted and each FDW will
> receive XACT_EVENT_ABORT (discussed later). The updates to the foreign
> transaction status need not be flushed to the disk, as they can be inferred
> from the status of local transaction.
>
> 3. If the local transaction is committed, the FDW callback will get
> XACT_EVENT_COMMIT event. Foreign transaction status is changed to
> COMMITTING. FDW tries to commit the foreign transaction with the prepared
> transaction id. If the commit is successful, the foreign transaction entry
> is removed. If the commit is unsuccessful because of local/foreign server
> crash or network failure, the foreign prepared transaction resolver takes
> care of the committing it at later point of time.
>
> 4. If the local transaction is aborted, the FDW callback will get
> XACT_EVENT_ABORT event. At this point, the FDW may or may not have prepared
> a transaction on foreign server as per step 1 above. If it has not prepared
> the transaction, it simply aborts the transaction on foreign server; a
> server crash or network failure doesn't alter the ultimate result in this
> case. If FDW has prepared the foreign transaction, it updates the foreign
> transaction status as ABORTING and tries to rollback the prepared
> transaction. If the rollback is successful, the foreign transaction entry
> is removed. If the rollback is not successful, the foreign prepared
> transaction resolver takes care of aborting it at later point of time.
>
> B. Foreign prepared transaction resolver
> ---------------------------------------------------
> In the patch this is implemented as a built-in function pg_fdw_resolve().
> Ideally the functionality should be run by a background worker process
> frequently.
>
> The resolver looks at each entry and invokes the FDW routine to resolve
> the transaction. The FDW routine returns boolean status: true if the
> prepared transaction was resolved (committed/aborted), false otherwise.
> The resolution is as follows -
> 1. If foreign transaction status is COMMITTING or ABORTING, commits or
> aborts the prepared transaction resp through the FDW routine. If the
> transaction is successfully resolved, it removes the foreign transaction
> entry.
> 2. Else, it checks if the local transaction was committed or aborted, it
> update the foreign transaction status accordingly and takes the action
> according to above step 1.
> 3. The resolver doesn't touch entries created by in-progress local
> transactions.
>
> If server/backend crashes after it has registered the foreign transaction
> entry (during step A.1), we will be left with a prepared transaction id,
> which was never prepared on the foreign server. Similarly the
> server/backend crashes after it has resolved the foreign prepared
> transaction but before removing the entry, same situation can arise. FDW
> should detect these situations, when foreign server complains about
> non-existing prepared transaction ids and consider such foreign
> transactions as resolved.
>
> After looking at all the entries the resolver flushes the entries to the
> disk, so as to retain the latest status across shutdown and crash.
>
> C. Other methods and infrastructure
> ------------------------------------------------
> 1. Method to show the current foreign transaction entries (in progress or
> waiting to be resolved). Implemented as function pg_fdw_xact() in the patch.
> 2. Method to drop foreign transaction entries in case they are resolved by
> user/DBA themselves. Not implemented in the patch.
> 3. Method to prevent altering or dropping foreign server and user mapping
> used to prepare the foreign transaction till the later gets resolved. Not
> implemented in the patch. While altering or dropping the foreign server or
> user mapping, that portion of the code needs to check if there exists an
> foreign transaction entry depending upon the foreign server or user mapping
> and should error out.
> 4. The information about the xid needs to be available till it is decided
> whether to commit or abort the foreign transaction and that decision is
> persisted. That should put some constraint on the xid wraparound or oldest
> active transaction. Not implemented in the patch.
> 5. Method to propagate the foreign transaction information to the slave.
>
> D. Persistent and in-memory storage considerations
> --------------------------------------------------------------------
> I considered following options for persistent storage
> 1. in-memory table and file(s) - The foreign transaction entries are saved
> and manipulated in shared memory. They are written to file whenever
> persistence is necessary e.g. while registering the foreign transaction in
> step A.2. Requirements C.1, C.2 need some SQL interface in the form of
> built-in functions or SQL commands.
>
> The patch implements the in-memory foreign transaction table as a fixed
> size array of foreign transaction entries (similar to prepared transaction
> entries in twophase.c). This puts a restriction on number of foreign
> prepared transactions that need to be maintained at a time. We need
> separate locks to syncronize the access to the shared memory; the patch
> uses only a single LW lock. There is restriction on the length of prepared
> transaction id (or prepared transaction information saved by FDW to be
> general), since everything is being saved in fixed size memory. We may be
> able to overcome that restriction by writing this information to separate
> files (one file per foreign prepared transaction). We need to take the same
> route as 2PC for C.5.
>
> 2. New catalog - This method takes out the need to have separate method
> for C1, C5 and even C2, also the synchronization will be taken care of by
> row locks, there will be no limit on the number of foreign transactions as
> well as the size of foreign prepared transaction information. But big
> problem with this approach is that, the changes to the catalogs are atomic
> with the local transaction. If a foreign prepared transaction can not be
> aborted while the local transaction is rolled back, that entry needs to
> retained. But since the local transaction is aborting the corresponding
> catalog entry would become invisible and thus unavailable to the resolver
> (alas! we do not have autonomous transaction support). We may be able to
> overcome this, by simulating autonomous transaction through a background
> worker (which can also act as a resolver). But the amount of communication
> and synchronization, might affect the performance.
>
> A mixed approach where the backend shifts the entries from storage in
> approach 1 to catalog, thus lifting the constraints on size is possible,
> but is very complicated.
>
> Any other ideas to use catalog table as the persistent storage here? Does
> anybody think, catalog table is a viable option?
>
> 3. WAL records - Since the algorithm follows "write ahead of action", WAL
> seems to be a possible way to persist the foreign transaction entries. But
> WAL records can not be used for repeated scan as is required by the foreign
> transaction resolver. Also, replaying WALs is controlled by checkpoint, so
> not all WALs are replayed. If a checkpoint happens after a foreign prepared
> transaction remains resolved, corresponding WALs will never be replayed,
> thus causing the foreign prepared transaction to remain unresolved forever
> without a clue. So, WALs alone don't seem to be a fit here.
>
> The algorithms rely on the FDWs to take right steps to the large extent,
> rather than controlling each step explicitly. It expects the FDWs to take
> the right steps for each event and call the right functions to manipulate
> foreign transaction entries. It does not ensure the correctness of these
> steps, by say examining the foreign transaction entries in response to each
> event or by making the callback return the information and manipulate the
> entries within the core. I am willing to go the stricter but more intrusive
> route if the others also think that way. Otherwise, the current approach is
> less intrusive and I am fine with that too.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Reply via email to