On 20.02.21 04:38, Amit Kapila wrote:
> I see a problem with this assumption. During the initial
> synchronization, this transaction won't be visible to snapshot and we
> won't copy it. Then later if we won't decode and send it then the
> replica will be out of sync. Such a problem won't happen with Ajin's
> patch.

You are assuming that the initial snapshot is a) logical and b) dumb.

A physical snapshot does "see" prepared transactions and restores them to their prepared state. But even in the logical case, I think it's beneficial to keep the decoder simpler and instead require some support for two-phase commit in the initial synchronization logic, for example by using the following approach (you will recognize similarities to what snapbuild does):

1.) create the slot
2.) start to retrieve changes and queue them
3.) wait for the prepared transactions that were pending at the
    point in time of step 1 to complete
4.) take a snapshot (by visibility, without requiring it to "see" prepared
    transactions)
5.) apply the snapshot
6.) replay the queue, filtering commits already visible in the
    snapshot
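The six steps can be modelled with a toy, single-threaded sketch. Everything in it is hypothetical and not PostgreSQL code: the linear `log` of `Event`s, the event kinds, `pending_prepared` and `initial_sync`. It also simplifies by treating a COMMIT PREPARED as carrying the transaction's changes, and by assuming the whole log is known up front rather than arriving while the sync runs.

```python
from dataclasses import dataclass

COMMITS = ("commit", "commit_prepared")
FINISHES = ("commit_prepared", "rollback_prepared")


@dataclass(frozen=True)
class Event:
    kind: str  # "commit", "prepare", "commit_prepared" or "rollback_prepared"
    xid: int


def pending_prepared(log, upto):
    """Transactions PREPAREd before log position `upto` and not yet finished there."""
    pending = set()
    for ev in log[:upto]:
        if ev.kind == "prepare":
            pending.add(ev.xid)
        elif ev.kind in FINISHES:
            pending.discard(ev.xid)
    return pending


def initial_sync(log, slot_pos):
    """Toy model of steps 1-6; returns (snapshot contents, commits replayed)."""
    # 1.) The slot is created at log position slot_pos.
    # 2.) Every commit decoded from the slot onward is queued.
    queue = [ev.xid for ev in log[slot_pos:] if ev.kind in COMMITS]
    # 3.) Wait until the transactions prepared at slot creation have finished.
    waiting = pending_prepared(log, slot_pos)
    snap_pos = slot_pos
    while waiting:
        if snap_pos == len(log):
            raise RuntimeError(f"still waiting for prepared xids {sorted(waiting)}")
        ev = log[snap_pos]
        if ev.kind in FINISHES:
            waiting.discard(ev.xid)
        snap_pos += 1
    # 4.) Take a snapshot purely by visibility: everything committed before snap_pos.
    visible = {ev.xid for ev in log[:snap_pos] if ev.kind in COMMITS}
    # 5.) Applying the snapshot gives the replica exactly the `visible` set.
    # 6.) Replay the queue, filtering commits already visible in the snapshot.
    replayed = [xid for xid in queue if xid not in visible]
    return visible, replayed
```

With the log `commit 1, prepare 2, commit 3, commit 4, commit_prepared 2, commit 5` and the slot created at position 2, xid 2 is pending at slot creation. Waiting for it before taking the snapshot puts it into the snapshot ({1, 2, 3, 4}), so only xid 5 is replayed from the queue and no transaction is lost or applied twice.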

Just as with the solution proposed by Ajin and you, this risks showing transactions as committed without the effects of their PREPAREs being "visible" (after step 5 but before step 6).

However, this approach of solving the problem outside of the walsender has two advantages:

* The delay in step 3 can be made visible and dealt with. As there's
  no upper bound on that delay, it makes sense to, e.g., inform the
  user after 10 minutes and provide a list of the two-phase
  transactions still in progress.

* Second, it becomes possible to avoid inconsistencies during the
  reconciliation window in between steps 5 and 6 by disallowing
  concurrent (user) transactions to run until after completion of
  step 6.
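The first advantage could look roughly like a polling loop for step 3 that warns once the wait exceeds a threshold. This is a minimal sketch under stated assumptions: `wait_for_prepared`, its callbacks and the injected `clock`/`sleep` are hypothetical names. In a real deployment `still_pending` might be backed by a query against PostgreSQL's `pg_prepared_xacts` view, but that wiring is an assumption, not part of the proposal.

```python
import time


def wait_for_prepared(still_pending, initial, notify, warn_after=600.0,
                      poll_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Step 3: block until every transaction in `initial` has finished.

    still_pending(ids) returns the subset of `ids` that is still prepared.
    notify(elapsed, ids) is called once, after `warn_after` seconds
    (600 s = the 10 minutes suggested above), with the transactions that
    are still holding up the initial synchronization.
    Returns the total time spent waiting.
    """
    start = clock()
    pending = set(initial)
    warned = False
    while True:
        # Re-check only the transactions we are still waiting for.
        pending = set(still_pending(pending))
        elapsed = clock() - start
        if not pending:
            return elapsed
        if not warned and elapsed >= warn_after:
            notify(elapsed, sorted(pending))  # make the delay visible to the user
            warned = True
        sleep(poll_interval)
```

Injecting `clock` and `sleep` keeps the loop testable with a simulated clock; a real caller would keep the defaults and could have `notify` log or report the listed transactions.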

The current implementation, by contrast, hides this in the walsender, without any way to determine how long a PREPARE has been delayed or when consistency has been reached. (Short of using the very same initial snapshotting approach outlined above, of course, in which case the reordering logic in the walsender does more harm than good.)

Essentially, I think I'm saying that while I agree that some kind of snapshot synchronization logic is needed, it should live in a different place.

Regards

Markus

