On Wed, Jun 7, 2023 at 6:02 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
>
>
> Well, I think the issue is pretty clear - we end up with an initial
> snapshot that's in between the ASSIGNMENT and NEW_CID, and because
> NEW_CID has both xact and subxact XID it fails because we add two TXNs
> with the same LSN, not realizing one of them is subxact.
>
> That's obviously wrong, although somewhat benign in production because
> it only fails because of hitting an assert.
>

Doesn't this indicate that we can end up decoding a partial
transaction when we restore a snapshot? Won't that be a problem even
for production?

> Regular builds are likely to
> just ignore it, although I haven't checked if the COMMIT cleanup (I
> wonder if we remove the subxact from the toplevel list on commit).
>
> I think the problem is we just grab an existing snapshot, before all
> running xacts complete. Maybe we should fix that, and leave the
> needs_full_snapshot alone.
>

It is not clear what exactly you have in mind to fix this, because if
there is no running xact, we don't even need to restore the snapshot
because of a prior check
"if (running->oldestRunningXid == running->nextXid)". I think the main
problem is that we start decoding immediately from the point where we
restore a snapshot, even though at that point there could be running
xacts that we would decode only partially.

--
With Regards,
Amit Kapila.