Buildfarm member culicidae just showed a transient failure in the 9.4 branch:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-07-21%2017%3A49%3A37

It's an assert trap, for which the buildfarm helpfully captured a
stack trace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fb8d388a3fa in __GI_abort () at abort.c:89
#2  0x0000558d34d90814 in ExceptionalCondition (conditionName=conditionName@entry=0x558d34df6e2d "!(!found)", errorType=errorType@entry=0x558d34dcef3c "FailedAssertion", fileName=fileName@entry=0x558d34f19dc0 "/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/storage/lmgr/predicate.c", lineNumber=lineNumber@entry=2023) at /home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/utils/error/assert.c:54
#3  0x0000558d34c9374b in RestoreScratchTarget (lockheld=lockheld@entry=1 '\001') at /home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/storage/lmgr/predicate.c:2023
#4  0x0000558d34c966c4 in DropAllPredicateLocksFromTable (transfer=1 '\001', relation=relation@entry=0x7fb8d4d3aa18) at /home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/storage/lmgr/predicate.c:2997
#5  TransferPredicateLocksToHeapRelation (relation=relation@entry=0x7fb8d4d3aa18) at /home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/storage/lmgr/predicate.c:3014
#6  0x0000558d34ac7a70 in index_drop (indexId=29755, concurrent=concurrent@entry=0 '\000') at /home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/catalog/index.c:1516
#7  0x0000558d34ac00f8 in doDeletion (flags=-1369083928, object=0x558d35c2c03c) at /home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/catalog/dependency.c:1125
#8  deleteOneObject (object=0x558d35c2c03c, depRel=depRel@entry=0x7fffae656fe8, flags=flags@entry=0) at /home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/catalog/dependency.c:1036
#9  0x0000558d34ac0545 in deleteObjectsInList (targetObjects=targetObjects@entry=0x558d35bae140, depRel=depRel@entry=0x7fffae656fe8, flags=flags@entry=0) at /home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/catalog/dependency.c:227
#10 0x0000558d34ac06c8 in performMultipleDeletions (objects=objects@entry=0x558d35badef0, behavior=DROP_CASCADE, flags=flags@entry=0) at /home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/catalog/dependency.c:366
#11 0x0000558d34b3e2e9 in RemoveObjects (stmt=stmt@entry=0x558d35bf5678) at /home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/commands/dropcmds.c:134
#12 0x0000558d34ca61f0 in ExecDropStmt (stmt=stmt@entry=0x558d35bf5678, isTopLevel=isTopLevel@entry=1 '\001') at /home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/tcop/utility.c:1364
#13 0x0000558d34ca8455 in ProcessUtilitySlow (parsetree=parsetree@entry=0x558d35bf5678, queryString=queryString@entry=0x558d35bf4b50 "DROP SCHEMA selinto_schema CASCADE;", context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, dest=dest@entry=0x558d35bf5a20, completionTag=completionTag@entry=0x7fffae657710 "") at /home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/tcop/utility.c:1295

I've been staring at that for a little while, and I can't see any logic
error that would lead to the failure.  Clearly it'd be expected if two
sessions tried to remove/reinsert the "scratch target" concurrently,
but the locking operations should be enough to prevent that.  (Moreover,
if that had happened, you'd have expected an earlier assertion failure
in one or the other of the RemoveScratchTarget calls.)

Plausible explanations at this point seem to be:

1. Cosmic ray bit-flip.
2. There's some bug in the lock infrastructure, allowing two processes
   to acquire an LWLock concurrently.
3. Logic error I'm missing.

Probably it's #3, but what?

And, while I'm looking at this ... isn't this "scratch target" logic
just an ineffective attempt at waving a dead chicken?  It's assuming
that freeing an entry in a shared hash table guarantees that it can
insert another entry.  But that hash table is partitioned, meaning it has
a separate freelist per partition.  So the extra entry only provides a
guarantee that you can insert something into the same partition it's in,
making it useless for this purpose AFAICS.

By the same token, I do not think I believe the nearby assumptions that
deleting one entry from PredicateLockHash guarantees we can insert
another one.  That hash is partitioned as well.

It looks to me like we either need to do a fairly significant rewrite
here, or to give up on making these hashtables partitioned.  Either one
is pretty annoying, considering the very low probability of running
out of shared memory right here; but what we've got is not up to project
standards IMO.

I have some ideas about fixing this by enlisting the help of dynahash.c
explicitly, rather than fooling with "scratch entries".  But I haven't
been able yet to write a design for that that doesn't have obvious bugs.

			regards, tom lane
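
The partitioned-freelist concern above can be illustrated with a toy model.
This is only a sketch of the argument as stated in the message, not
dynahash.c or predicate.c code: every name here (ToyTable, toy_insert, and
so on) is hypothetical, the partition count and sizes are made up, and the
model assumes a full partition simply fails, with no fallback allocation
and no borrowing between freelists.

```c
#include <stdbool.h>

#define TOY_NUM_PARTITIONS        4	/* hypothetical, not PostgreSQL's value */
#define TOY_ENTRIES_PER_PARTITION 2	/* hypothetical fixed capacity */

/* Toy table: each partition only tracks how many free entries it has left. */
typedef struct ToyTable
{
	int			nfree[TOY_NUM_PARTITIONS];
} ToyTable;

void
toy_init(ToyTable *t)
{
	for (int i = 0; i < TOY_NUM_PARTITIONS; i++)
		t->nfree[i] = TOY_ENTRIES_PER_PARTITION;
}

/* Map a key to its partition (stand-in for a real hash function). */
int
toy_partition(unsigned key)
{
	return (int) (key % TOY_NUM_PARTITIONS);
}

/* Insert succeeds only if the key's own partition has a free entry. */
bool
toy_insert(ToyTable *t, unsigned key)
{
	int			p = toy_partition(key);

	if (t->nfree[p] == 0)
		return false;			/* "out of shared memory" in this model */
	t->nfree[p]--;
	return true;
}

/*
 * Delete returns one entry to the freelist of the key's own partition.
 * The toy does not check that the key was actually present.
 */
void
toy_delete(ToyTable *t, unsigned key)
{
	t->nfree[toy_partition(key)]++;
}

/*
 * Reserve a "scratch" entry, fill every partition, then give the scratch
 * entry back and try to insert new_key.  Returns whether that insert worked.
 */
bool
toy_scratch_then_insert(unsigned scratch_key, unsigned new_key)
{
	ToyTable	t;

	toy_init(&t);
	if (!toy_insert(&t, scratch_key))
		return false;

	/* Exhaust every partition; key p always lands in partition p. */
	for (int p = 0; p < TOY_NUM_PARTITIONS; p++)
		while (toy_insert(&t, (unsigned) p))
			;

	toy_delete(&t, scratch_key);
	return toy_insert(&t, new_key);
}
```

Under these assumptions, toy_scratch_then_insert(0, 4) succeeds because both
keys map to partition 0, while toy_scratch_then_insert(0, 1) fails because
the freed entry went back to a different partition's freelist.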