Re: repeated decoding of prepared transactions

Amit Kapila Sat, 20 Feb 2021 22:02:56 -0800

On Sat, Feb 20, 2021 at 10:26 PM Andres Freund <[email protected]> wrote:
>
> Hi,
>
> On Fri, Feb 19, 2021, at 19:38, Amit Kapila wrote:
> > On Fri, Feb 19, 2021 at 8:23 PM Markus Wanner
> > <[email protected]> wrote:
> > >
> > > With that line of thinking, the point in time (or in WAL) of the COMMIT
> > > PREPARED does not matter at all to reason about the decoding of the
> > > PREPARE operation.  Instead, there are only exactly two cases to consider:
> > >
> > > a) the PREPARE happened before the start_decoding_at LSN and must not be
> > > decoded. (But the effects of the PREPARE must then be included in the
> > > initial synchronization. If that's not supported, the output plugin
> > > should not enable two-phase commit.)
> > >
> >
> > I see a problem with this assumption. During the initial
> > synchronization, this transaction won't be visible to snapshot and we
> > won't copy it. Then later if we won't decode and send it then the
> > replica will be out of sync. Such a problem won't happen with Ajin's
> > patch.
>
> Why isn't the more obvious answer to this to not allow/disable 2pc decoding 
> during the initial sync?
>

Here, I am assuming you are suggesting that we disable 2PC for both the
apply worker and the tablesync workers until the initial sync phase is
complete (i.e., until all tables are in SUBREL_STATE_READY state). If we
do that, what happens when the prepare happened before the end of the
initial sync phase but the commit prepared happens after it? At commit
prepared time, because 2PC is now enabled, we will send just the Commit
Prepared, without the prepare and the actual data. To solve that, say we
remember in the TXN that 2PC was not enabled at prepare time, so that at
commit prepared time we treat 2PC as disabled for that TXN and send the
entire transaction along with the commit, as we do for non-2PC TXNs.
But a restart might happen before the commit prepared, and the prepare
might then fall before the start_decoding_at point. In that case we
will still skip the prepare and, because 2PC is enabled after the
restart, send just the commit prepared. So, again, the replica can go
out of sync.
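
To make the restart case concrete, here is a toy Python model of it.
This is only a sketch, not PostgreSQL code: Record, decode() and the
LSN values are made up for illustration, and the in-memory TXN state
that would be lost on restart is simply not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    lsn: int
    kind: str  # "PREPARE" or "COMMIT PREPARED"
    gid: str


def decode(wal, start_decoding_at, two_phase):
    """Return the messages a toy decoder would send downstream."""
    sent = []
    for rec in wal:
        if rec.lsn < start_decoding_at:
            continue  # anything before the start point is skipped
        if rec.kind == "PREPARE":
            if two_phase:
                sent.append(("PREPARE + data", rec.gid))
            # with 2PC disabled, nothing is sent at prepare time
        elif rec.kind == "COMMIT PREPARED":
            if two_phase:
                sent.append(("COMMIT PREPARED", rec.gid))
            else:
                sent.append(("data + COMMIT", rec.gid))
    return sent


wal = [
    Record(100, "PREPARE", "gid1"),          # during initial sync
    Record(200, "COMMIT PREPARED", "gid1"),  # after initial sync
]

# Before the restart: 2PC is disabled, so the prepare sends nothing.
during_sync = decode(wal[:1], start_decoding_at=0, two_phase=False)

# After the restart: the prepare is before start_decoding_at and 2PC
# is enabled, so only the commit prepared goes out, without the data.
after_restart = decode(wal, start_decoding_at=150, two_phase=True)
```

In this model the replica never receives the changes of gid1: the
first pass sends nothing and the second pass sends only the commit
prepared.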

The other related thing is to ensure that via the SQL APIs we don't
skip any prepared xacts and return just the commit prepared. Basically,
this is the example case I described in my earlier email [1].

One of the ideas I speculated about previously to overcome these
challenges is to somehow persist the information about Prepares that
have been decoded. Say, after sending a prepare, we update the slot
information on disk to indicate that the particular GID has been sent.
Then, the next time we have to skip a prepare for whatever reason, we
can check whether persistent information exists on disk for that GID:
if it exists, we need to send just the Commit Prepared; otherwise, the
entire transaction. We can remove this information during or after
CheckPointSnapBuild; basically, we can remove the information of all
GIDs that are after the cutoff LSN computed via
ReplicationSlotsComputeLogicalRestartLSN. But that seems to be costly,
so we didn't pursue it.
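
A rough Python sketch of that bookkeeping, under stated assumptions:
SentPrepares, on_commit_prepared() and the JSON file are hypothetical
stand-ins for the slot information on disk, and the cleanup at
checkpoint time is left out.

```python
import json
import os
import tempfile


class SentPrepares:
    """Toy on-disk set of GIDs whose PREPARE has already been sent."""

    def __init__(self, path):
        self.path = path
        self.gids = set()
        if os.path.exists(path):
            with open(path) as f:
                self.gids = set(json.load(f))

    def mark_sent(self, gid):
        # Write to a temp file and rename, so a crash cannot leave a
        # half-written file behind.
        self.gids.add(gid)
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(self.gids), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def was_sent(self, gid):
        return gid in self.gids


def on_commit_prepared(store, gid):
    """Decide what to send when the prepare itself was skipped."""
    if store.was_sent(gid):
        return "COMMIT PREPARED"
    return "ENTIRE TRANSACTION"


# Usage: the information survives a simulated restart.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sent_prepares.json")
    SentPrepares(path).mark_sent("gid1")
    reloaded = SentPrepares(path)  # re-read from disk after "restart"
    known = on_commit_prepared(reloaded, "gid1")
    unknown = on_commit_prepared(reloaded, "gid2")
```

Here gid1 was recorded before the restart, so only the Commit Prepared
is sent for it, while gid2 has no record and gets the entire
transaction.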

[1] -
https://www.postgresql.org/message-id/CAA4eK1L5aX1BL9Xg-wSULbFeB417G0v9qk5qZ6NbYCkCo6JUGQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
