On Tue, May 5, 2026 at 6:17 PM Antonin Houska <[email protected]> wrote:
>
> Antonin Houska <[email protected]> wrote:
>
> I think the problem is that with database-specific snapshot,
> SnapBuildProcessRunningXacts() returns early, w/o adjusting builder->xmin
>
>         /*
>          * Database specific transaction info may exist to reach CONSISTENT state
>          * faster, however the code below makes no use of it. Moreover, such
>          * record might cause problems because the following normal (cluster-wide)
>          * record can have lower value of oldestRunningXid. In that case, let's
>          * wait with the cleanup for the next regular cluster-wide record.
>          */
>         if (OidIsValid(running->dbid))
>                 return;
>
> and thus some transactions whose XID is below running->oldestRunningXid may
> continue to be incorrectly considered running.
>
> I originally thought that this should not happen because such transactions
> will be added to the builder's array of committed transactions by
> SnapBuildCommitTxn() anyway. However, I failed to notice that COMMIT record of
> a transaction listed in the xl_running_xacts WAL record is not guaranteed to
> follow the xl_running_xacts record in WAL. In other words, even if
> xl_running_xacts is created before a COMMIT record of the contained
> transaction, it may end up at higher LSN in WAL. So the cleanup I relied on
> might not take place.
>

BTW, is it possible to write a test using injection_points or manual
steps (with a debugger, etc.) so that we can more clearly understand
this problem and the proposed fix?

-- 
With Regards,
Amit Kapila.
