On Sat, Feb 20, 2021 at 10:26 PM Andres Freund <and...@anarazel.de> wrote:
>
> Hi,
>
> On Fri, Feb 19, 2021, at 19:38, Amit Kapila wrote:
> > On Fri, Feb 19, 2021 at 8:23 PM Markus Wanner
> > <markus.wan...@enterprisedb.com> wrote:
> > >
> > > With that line of thinking, the point in time (or in WAL) of the COMMIT
> > > PREPARED does not matter at all to reason about the decoding of the
> > > PREPARE operation. Instead, there are only exactly two cases to consider:
> > >
> > > a) the PREPARE happened before the start_decoding_at LSN and must not be
> > > decoded. (But the effects of the PREPARE must then be included in the
> > > initial synchronization. If that's not supported, the output plugin
> > > should not enable two-phase commit.)
> > >
> >
> > I see a problem with this assumption. During the initial
> > synchronization, this transaction won't be visible to snapshot and we
> > won't copy it. Then later if we won't decode and send it then the
> > replica will be out of sync. Such a problem won't happen with Ajin's
> > patch.
>
> Why isn't the more obvious answer to this to not allow/disable 2pc decoding
> during the initial sync?
>

Here, I am assuming you are asking to disable 2PC in both the apply
worker and the tablesync worker until the initial sync phase is complete
(that is, until all tables are in SUBREL_STATE_READY state). If we do
that, what happens if the commit prepared arrives after the initial sync
phase but the prepare happened before it? At commit prepared time,
because 2PC is enabled, we will send just the Commit Prepared without
the prepare and the actual data.

Now, to solve that, say we remember in the TXN that 2PC was not enabled
at prepare time, so that at commit prepared time we treat 2PC as
disabled for that TXN and send the entire transaction along with the
commit, as we do for non-2PC TXNs. But a restart might happen before the
commit prepared, and the prepare might then fall before the
start_decoding_at point. In that case we will still skip sending the
prepare, even though 2PC is enabled after the restart, and send just the
commit prepared. So, again, that can lead to the replica going out of
sync.

The other related thing is to ensure that via the SQL APIs we don't skip
any prepared xacts and return just the commit prepared. This is
basically the example case I described in my earlier email [1].

One of the ideas I have previously speculated about to overcome these
challenges is to somehow persist the information about the Prepares that
have been decoded. Say, after sending a prepare, we update the slot
information on disk to indicate that the particular GID has been sent.
Then, the next time we have to skip a prepare for whatever reason, we
can check whether persistent information exists on disk for that GID: if
it exists, we need to send just the Commit Prepared; otherwise, we send
the entire transaction. We can remove this information during or after
CheckPointSnapBuild; basically, we can remove the information of all
GIDs that are after the cutoff LSN computed via
ReplicationSlotsComputeLogicalRestartLSN. But that seems to be costly,
so we didn't pursue it.

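To make the GID-persistence idea above a bit more concrete, here is a
minimal Python sketch of the decision it implies. This is not PostgreSQL
code: the class SlotState, its method names, the returned strings, and
the in-memory set standing in for the on-disk slot information are all
hypothetical, and cleanup of old GIDs is left out.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SlotState:
    """Hypothetical stand-in for a logical slot; not a PostgreSQL structure."""

    start_decoding_at: int   # changes with an LSN below this are skipped
    two_phase: bool = True   # whether 2PC decoding is currently enabled
    # Stand-in for the on-disk record of GIDs whose PREPARE was already sent.
    sent_prepares: Set[str] = field(default_factory=set)

    def decode_prepare(self, gid: str, prepare_lsn: int) -> Optional[str]:
        """Return what is sent at PREPARE time, or None if it is skipped."""
        if not self.two_phase or prepare_lsn < self.start_decoding_at:
            return None
        # Remember that this GID went out, so a later COMMIT PREPARED
        # knows the downstream already has the transaction contents.
        self.sent_prepares.add(gid)
        return "PREPARE"

    def decode_commit_prepared(self, gid: str) -> str:
        """Decide what to send when COMMIT PREPARED is decoded."""
        if gid in self.sent_prepares:
            return "COMMIT PREPARED"
        # The PREPARE was never sent, so the whole transaction must go
        # out with the commit, as for a non-2PC transaction.
        return "FULL TRANSACTION + COMMIT"
```

For example, with start_decoding_at=100, a prepare at LSN 50 is skipped
and its commit prepared later yields the full transaction, while a
prepare at LSN 150 is sent and its commit prepared yields only the
Commit Prepared, even if start_decoding_at has moved past 150 in between
(provided the recorded GIDs survive the restart).
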
[1] - https://www.postgresql.org/message-id/CAA4eK1L5aX1BL9Xg-wSULbFeB417G0v9qk5qZ6NbYCkCo6JUGQ%40mail.gmail.com

--
With Regards,
Amit Kapila.