On 2017-04-24 13:29:11 +0100, Simon Riggs wrote:
> On 24 April 2017 at 00:25, Andres Freund <and...@anarazel.de> wrote:
> > if the subxid->xid mapping doesn't actually exist - as it's the case
> > with this bug afaics - we'll not get the correct toplevel
> > transaction.
>
> The nature of the corruption is that in some cases
> * a subxid will point to nothing (even though in most cases it was
>   already set correctly)
> * the parent will point to a subxid

Right. Those cases aren't that different from the point of view of
trying to find the parent of a subxid.

> > Which'll mean the following block:
> >     /*
> >      * We now have either a top-level xid higher than xmin or an
> >      * indeterminate xid. We don't know whether it's top level or subxact
> >      * but it doesn't matter. If it's present, the xid is visible.
> >      */
> >     for (j = 0; j < snapshot->subxcnt; j++)
> >     {
> >         if (TransactionIdEquals(xid, snapshot->subxip[j]))
> >             return true;
> >     }
> > won't work correctly if suboverflowed.
>
> Your example of snapshots taken during recovery is not correct.

Oh?

> Note that SubTransGetTopmostTransaction() returns a valid, running
> xid, even though it is the wrong one.

Sure.

> Snapshots work differently on standbys - we store all known running
> xids, so the test still passes correctly, even when overflowed.

I don't think that's generally true. Isn't that precisely what
ProcArrayStruct->lastOverflowedXid is about? If we have a snapshot
that's suboverflowed due to the lastOverflowedXid cutoff, then the
subxip array does *not* contain all known running xids anymore, we rely
on pg_subtrans to only guarantee that toplevel xids are stored in the
KnownAssignedXids machinery.

See:

 * When we throw away subXIDs from KnownAssignedXids, we need to keep track of
 * that, similarly to tracking overflow of a PGPROC's subxids array.  We do
 * that by remembering the lastOverflowedXID, ie the last thrown-away subXID.
 * As long as that is within the range of interesting XIDs, we have to assume
 * that subXIDs are missing from snapshots.  (Note that subXID overflow occurs
 * on primary when 65th subXID arrives, whereas on standby it occurs when 64th
 * subXID arrives - that is not an error.)

    /*
     * Highest subxid that has been removed from KnownAssignedXids array to
     * prevent overflow; or InvalidTransactionId if none.  We track this for
     * similar reasons to tracking overflowing cached subxids in PGXACT
     * entries.  Must hold exclusive ProcArrayLock to change this, and shared
     * lock to read it.
     */
    TransactionId lastOverflowedXid;

Greetings,

Andres Freund