I believe I've identified the reason why skink and some other buildfarm members have been failing the pg_upgrade test recently. It is that recent changes in sequence support have caused binary-upgrade restore runs to do some sequence OID/relfilenode assignments without any heed to the OIDs that pg_upgrade tried to impose on those sequences. Once those sequences have relfilenodes other than the intended ones, they are land mines for all subsequent pg_upgrade-controlled table OID assignments.
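To make the hazard concrete, here is a toy model of it. This is a sketch, not PostgreSQL code: ToyCluster, toy_assign_from_counter() and toy_assign_preset() are hypothetical names invented for illustration, and the real logic in GetNewRelFileNode() is more involved. It only shows why a relfilenode taken from the OID counter can later block an assignment that pg_upgrade dictates.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RELS 16

typedef unsigned int Oid;

/* Hypothetical stand-in for the relfilenodes in use in the new cluster. */
typedef struct
{
    Oid used[MAX_RELS];   /* relfilenodes already assigned */
    int nused;            /* number of valid entries in used[] */
    Oid next_oid;         /* toy OID counter */
} ToyCluster;

/* Return true if some relation already owns this relfilenode. */
bool
toy_in_use(const ToyCluster *c, Oid oid)
{
    for (int i = 0; i < c->nused; i++)
        if (c->used[i] == oid)
            return true;
    return false;
}

/*
 * Uncontrolled assignment: take whatever the counter hands out next,
 * skipping values already in use.  This models a sequence getting a new
 * relfilenode with no regard to what pg_upgrade wanted.
 */
Oid
toy_assign_from_counter(ToyCluster *c)
{
    Oid oid;

    do
        oid = c->next_oid++;
    while (toy_in_use(c, oid));

    assert(c->nused < MAX_RELS);
    c->used[c->nused++] = oid;
    return oid;
}

/*
 * Controlled assignment: the caller dictates the value, as pg_upgrade does.
 * Returns false if the value was already consumed by someone else.
 */
bool
toy_assign_preset(ToyCluster *c, Oid wanted)
{
    if (toy_in_use(c, wanted))
        return false;

    assert(c->nused < MAX_RELS);
    c->used[c->nused++] = wanted;
    return true;
}
```

In this model, once a sequence has picked up a value from the counter, any later toy_assign_preset() call for that same value fails, while preset values the counter never reached still succeed. Whether a real collision occurs therefore depends on where the counter happens to be.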
I am not very sure why it's so hard to duplicate the misbehavior; perhaps, in order to make the failure happen with the current regression tests, it's necessary for a background auto-analyze to happen and consume some OIDs (for pg_statistic TOAST entries) at just the wrong time.

However, I can definitely demonstrate that there are uncontrolled relfilenode assignments happening during pg_upgrade's restore run. I stuck an elog() call into GetNewObjectId(), along with generation of a stack trace using backtrace(), and here is one example:

[593daad3.4863:2243] LOG:  generated OID 16735
[593daad3.4863:2244] STATEMENT:
	-- For binary upgrade, must preserve pg_class oids
	SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('46851'::pg_catalog.oid);

	-- For binary upgrade, must preserve pg_type oid
	SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('46852'::pg_catalog.oid);

	ALTER TABLE "itest10" ALTER COLUMN "a" ADD GENERATED BY DEFAULT AS IDENTITY (
	    SEQUENCE NAME "itest10_a_seq"
	    START WITH 1
	    INCREMENT BY 1
	    NO MINVALUE
	    NO MAXVALUE
	    CACHE 1
	);

postgres: postgres regression [local] ALTER TABLE(GetNewObjectId+0xda) [0x50397a]
postgres: postgres regression [local] ALTER TABLE(GetNewRelFileNode+0xec) [0x52430c]
postgres: postgres regression [local] ALTER TABLE(RelationSetNewRelfilenode+0x79) [0x851d59]
postgres: postgres regression [local] ALTER TABLE(AlterSequence+0x1cd) [0x5d976d]
postgres: postgres regression [local] ALTER TABLE() [0x75d279]
postgres: postgres regression [local] ALTER TABLE(standard_ProcessUtility+0xb7) [0x75dec7]
postgres: postgres regression [local] ALTER TABLE() [0x75cb1d]
postgres: postgres regression [local] ALTER TABLE(standard_ProcessUtility+0xb7) [0x75dec7]
postgres: postgres regression [local] ALTER TABLE() [0x759f0b]
postgres: postgres regression [local] ALTER TABLE() [0x75ae91]
postgres: postgres regression [local] ALTER TABLE(PortalRun+0x250) [0x75b740]
postgres: postgres regression [local] ALTER TABLE() [0x757be7]
postgres: postgres regression [local] ALTER TABLE(PostgresMain+0xe08) [0x759968]
postgres: postgres regression [local] ALTER TABLE(PostmasterMain+0x1a99) [0x6e21a9]
postgres: postgres regression [local] ALTER TABLE(main+0x6b8) [0x65b958]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x3f3bc1ed1d]
postgres: postgres regression [local] ALTER TABLE() [0x473899]

Judging by when we started to see buildfarm failures, I think that commit 3d79013b9 probably broke it, but the problem seems latent in the whole concept of transactional sequence information.

Not sure what we want to do about it. One idea is to make ALTER SEQUENCE not so transactional when in binary-upgrade mode. (I'm also tempted to make GetNewRelFileNode complain if IsBinaryUpgrade is true, but that's a separate matter.)

In any case, this is a "must fix" problem IMO, so I'll go add it to the open items list.

			regards, tom lane