On Tue, May 10, 2016 at 3:05 AM, Andres Freund <and...@anarazel.de> wrote:
> The easy way to trigger this problem would be to have an oid wraparound
> - but the WAL shows that that's not the case here. I've not figured
> that one out entirely (and won't tonight). But I do see WAL records
> like:
> rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID 4302693
> rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID 4302693
> i.e. two NEXTOID records allocating the same range, which obviously
> doesn't seem right. There's also every now and then close by ranges:
> rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 1/9A404DB8, prev 1/9A404270, desc: NEXTOID 3311455
> rmgr: XLOG len (rec/tot): 4/ 30, tx: 7814505, lsn: 1/9A4EC888, prev 1/9A4EB9D0, desc: NEXTOID 3311461
>
> As far as I can see something like the above, or an oid wraparound, are
> pretty much deadly for toast.
>
> Is anybody ready with a good defense for SatisfiesToast not doing any
> actual liveliness checks?
I assume that this was added as a performance optimization, and I don't see why it shouldn't be safe, or at least couldn't be made safe. I assume that the wraparound case was deemed safe because at that time the idea of 4 billion OIDs getting used with old transactions still active seemed inconceivable.

It seems to me that the real question here is how you're getting two calls to XLogPutNextOid() with the same value of ShmemVariableCache->nextOid, and the answer, as far as I can see, must be that LWLocks are broken. Either two processes are managing to hold OidGenLock in exclusive mode at the same time, or they're acquiring it in quick succession but without the second process seeing all of the updates performed by the first process.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company