On Tue, Nov 25, 2025 at 10:25 PM Zhijie Hou (Fujitsu) <[email protected]> wrote: > > On Wednesday, November 26, 2025 2:57 AM Masahiko Sawada > <[email protected]> wrote: > > > > On Tue, Nov 25, 2025 at 4:02 AM Zhijie Hou (Fujitsu) > > <[email protected]> > > wrote: > > > > > > On Tuesday, November 25, 2025 3:30 AM Masahiko Sawada > > <[email protected]> wrote: > > > > > > > > > > > > Given that the computation of xmin and catalog_xmin among all slots > > > > could be executed concurrently, could the following scenario happen > > > > where > > > > procArray->replication_slot_xmin and replication_slot_catalog_xmin > > > > procArray->are retreat to a non-invalid > > > > XID? > > > > > > > > 1. Suppose the initial value procArray->replication_slot_catalog_xmin is > > 50. > > > > 2. Process-A updates its owned slot's catalog_xmin to 100, and > > > > computes the new catalog_xmin as 100 while holding > > > > ReplicationSlotControlLock in a shared mode in > > > > ReplicationSlotsComputeRequiredLSN(). But it doesn't update the > > procArray's catalog_xmin value yet. > > > > 3. Process-B updates its owned slot's catalog_xmin to 150, and > > > > computes the new catalog_xmin as 150. > > > > 4. Process-B updates the procArray->replication_slot_catalog_xmin to > > 150. > > > > 5. Process-A updates the procArray->repilcation_slot_catalog_xmin to > > > > 100, which was 150. > > > > > > After further investigation, I think that steps 3 and 4 cannot occur > > > because Process-B must have already encountered the catalog_xmin > > > maintained by Process-A, either 50 or 100. Consequently, Process-B > > > will refrain from updating the catalog_xmin to a more recent value, such > > > as > > 150. > > > > Right. But the following scenario seems to happen: > > > > 1. Both processes have a slot with effective_catalog_xmin = 100. > > 2. Process-A updates effective_catalog_xmin to 150, and computes the new > > catalog_xmin as 100 because process-B slot still has effective_catalog_xmin > > = > > 100. > > 3. Process-B updates effective_catalog_xmin to 150, and computes the new > > catalog_xmin as 150. > > 4. Process-B updates procArray->replication_slot_catalog_xmin to 150. > > 5. Process-A updates procArray->replication_slot_catalog_xmin to 100. > > I think this scenario can occur, but is not harmful. Because the catalog rows > removed prior to xid:150 would no longer be used, as both slots have advanced > their catalog_xmin and flushed the value to disk. Therefore, even if > replication_slot_catalog_xmin regresses, it should be OK. > > Considering all above, I think allowing concurrent xmin computation, as the > patch does, is acceptable. What do you think ?
I agree with your analysis. Another thing I'd like to confirm is that in an extreme case, if the server crashes suddenly after removing catalog tuples older than XID 100 and logical decoding restarts, it ends up missing necessary catalog tuples? I think it's not a problem as long as the subscriber knows the next commit LSN they want but could it be problematic if the user switches to use the logical decoding SQL API? I might be worrying too much, though. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
