I wrote:
> I've been chasing the intermittent "cache lookup failed for access method
> 403" failure at session startup that's been seen lately in the buildfarm,
> for instance here:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-06-04%2019%3A22%3A46
> (Axolotl has shown this 3 times in the last 90 days, not sure if any
> others have seen it.)  I hypothesized that this was triggered by the
> "VACUUM FULL pg_am" in the concurrently running vacuum.sql regression
> test, so I started running the regression tests in parallel with a
> shell script doing
>     while sleep 0.1; do psql -c 'vacuum full pg_am' regression; done
> and sure enough, I can reproduce it once in awhile.
I've tracked down what's happening here, and it's pretty nasty.

relcache.c keeps a list of the OIDs of all system catalogs/indexes that are
in the relcache init file, which it uses to decide whether the init file
needs to be updated after a relcache invalidation has been issued against
some relation.  That list is created in one of two ways:

1. If we read in an init file, we just record all the entries we saw.

2. If we write out an init file, we just record all the entries we wrote out.

Before doing #2, we open all the catalog caches, which should result in
populating the relcache with all the interesting system catalogs and
indexes.

Now there's a race condition here, which is that we might receive an
invalidation message telling us that one of those relcache entries is stale
before we get done writing it out.  Fine, we check for that, and if it
happens then we discard the already-stale init file instead of installing
it as valid.  That all works.

BUT: the session keeps on using the initFileRelationIds list that it
constructed while writing the bogus init file.  Processing of the
invalidation message might've resulted in flushing one or more relcache
entries, so that they were never written out and thus are missing from the
initFileRelationIds list.  This can lead to wrong decisions later in the
session about whether its own actions require init file updates.

To be concrete, here's the sequence of events that I see happening in the
above-described test case:

1. The "VACUUM FULL pg_am" in the vacuum.sql regression test rewrites
pg_am, sends out a relcache inval for pg_am, and unlinks the init file.
(Actually the init file was probably already gone from other activity.)

2. An incoming session from the manual script fails to open the init file,
so it loads up its catalog caches.  Somewhere pretty late in that sequence,
it notices the relcache inval for pg_am and flushes its pg_am relcache
entry, which it then has no need to reload later in the sequence.  So when
it goes to write the init file, it doesn't write a pg_am entry and doesn't
add pg_am's OID to the initFileRelationIds list either.  At the end of
that, it correctly notices that what it's written is stale, so it doesn't
install a busted init file ... but it does sally on with a busted
initFileRelationIds list.

3. Meanwhile, some other regression-test session that arrived slightly
later successfully builds and installs a valid init file.

4. When the manual script session does "vacuum full pg_am", it does not
realize that there ought to be an init file invalidation event from that.
So the init file created by step 3 is left alone, even though it now
contains obsolete data about the relfilenode of pg_am.

5. Specifically, the init file's pg_am entry is referencing a physical disk
file that has been truncated to zero size and is pending deletion at the
next checkpoint.  So, when incoming sessions decide they need to fetch a
pg_am entry, the symptom is "row not found" rather than "file not found".
That matches the failure symptom we see in the buildfarm.

Now, this test case seems to require two successive vacuum fulls on pg_am
in order to create the observed problem, which is something that doesn't
occur in the regression tests themselves, so how come we saw it in the
buildfarm?  I think the answer is probably that an sinval overflow/reset
event can substitute for step 1.
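To make that concrete in code, here is a minimal standalone simulation of
steps 2 and 4 under the current scheme.  It's purely illustrative: none of
the names or the toy OID list below come from relcache.c; the only real
thing about it is that the recorded OID list contains just the entries that
actually got written.

#include <stdbool.h>
#include <stdio.h>

#define MAX_RELS 8

/* toy relcache: OIDs of catalogs currently in cache (0 = flushed) */
static unsigned int relcache[MAX_RELS] = {1247, 1249, 1255, 1259, 2601 /* pg_am */};
static unsigned int init_file_ids[MAX_RELS];    /* stand-in for initFileRelationIds */
static int          n_init_file_ids = 0;

/* relcache inval: flush the entry, as happens mid-write in step 2 */
static void
process_relcache_inval(unsigned int oid)
{
    for (int i = 0; i < MAX_RELS; i++)
        if (relcache[i] == oid)
            relcache[i] = 0;
}

/* write the init file, recording only the entries actually written */
static void
write_init_file(void)
{
    n_init_file_ids = 0;
    for (int i = 0; i < MAX_RELS; i++)
    {
        if (relcache[i] == 0)
            continue;           /* flushed entry: never written, never listed */
        /* ... entry would be written to pg_internal.init here ... */
        init_file_ids[n_init_file_ids++] = relcache[i];

        /* inval for pg_am (2601) arrives while we are still writing */
        if (i == 2)
            process_relcache_inval(2601);
    }
}

/* the later test deciding whether an action needs an init file inval */
static bool
relation_id_is_in_init_file(unsigned int oid)
{
    for (int i = 0; i < n_init_file_ids; i++)
        if (init_file_ids[i] == oid)
            return true;
    return false;
}

int
main(void)
{
    write_init_file();
    /* later "vacuum full pg_am" asks: must I unlink the init file? */
    printf("pg_am (2601) in recorded init file list: %s\n",
           relation_id_is_in_init_file(2601)
           ? "yes" : "no -- stale init file is never unlinked");
    return 0;
}

Run it and the answer comes out "no": the mid-write flush of pg_am leaves a
hole in the recorded list, so the session later concludes, wrongly, that
rewriting pg_am can't affect the init file.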
I tried instrumenting RelationCacheInitializePhase3 to log a complaint
anytime the list length wasn't the currently-expected 94 entries, and got
results like these:

LOG:  length(initFileRelationIds) = 11 in pid 2054
LOG:  length(initFileRelationIds) = 93 in pid 2298
LOG:  length(initFileRelationIds) = 93 in pid 4631
LOG:  length(initFileRelationIds) = 11 in pid 4684
LOG:  length(initFileRelationIds) = 11 in pid 4680
LOG:  length(initFileRelationIds) = 11 in pid 4695
LOG:  length(initFileRelationIds) = 11 in pid 4692
LOG:  length(initFileRelationIds) = 11 in pid 4682
LOG:  length(initFileRelationIds) = 11 in pid 4687
LOG:  length(initFileRelationIds) = 93 in pid 5722

The events with one missing entry are probably consequences of the sequence
described above, but the events with only 11 entries seem like they must
reflect an sinval overflow causing everything not nailed in cache to be
lost.  (Note that there are 11 nailed non-shared relations at present, so
that's why 11 and not some other number.)

Bottom line is that what we have is an unsafe way of constructing the list
of significant relations.  What I'm inclined to do is get rid of
initFileRelationIds and RelationIdIsInInitFile() as such altogether, and
instead have a function in syscache.c that reports "true" if the relation
OID being inquired of is one of the tables or indexes that are listed in
the static catalog cache definitions.  (RelationHasSysCache is the same
general idea, but it considers only tables not indexes.)  This is a bit
more coupling between "is in init file" and "supports a syscache" than was
originally intended, but we clearly need a less shaky way of determining
"is in init file", and I don't see a better alternative.

Comments?

			regards, tom lane
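PS: for concreteness, the shape of the syscache.c test I have in mind is
roughly the untested sketch below.  The struct, the two-entry table, and
the function name are placeholders rather than actual syscache.c contents;
the only assumption is that the static cache definitions expose each
cache's catalog OID and the OID of its supporting index.

#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Oid;

/*
 * Abbreviated stand-in for the static catalog cache definitions; assume
 * each real entry names (among other things) the catalog the cache is
 * over and the index it searches with.
 */
typedef struct
{
    Oid     reloid;             /* catalog the syscache is over */
    Oid     indoid;             /* index used to search it */
} CacheDescStub;

static const CacheDescStub cache_defs[] = {
    {2601, 2652},               /* pg_am / pg_am_oid_index (illustrative) */
    {1255, 2690},               /* pg_proc / pg_proc_oid_index (illustrative) */
    /* ... one entry per defined syscache ... */
};

/*
 * Placeholder name: report whether the given relation is one of the tables
 * or indexes listed in the static syscache definitions.  This would replace
 * the dynamically-built initFileRelationIds test.
 */
static bool
RelidListedInSysCacheDefs(Oid relid)
{
    for (int i = 0; i < (int) (sizeof(cache_defs) / sizeof(cache_defs[0])); i++)
    {
        if (cache_defs[i].reloid == relid ||
            cache_defs[i].indoid == relid)
            return true;
    }
    return false;
}

int
main(void)
{
    printf("pg_am's OID index (2652) covered: %s\n",
           RelidListedInSysCacheDefs(2652) ? "yes" : "no");
    return 0;
}

Since the static definitions can't change at runtime, the answer can't be
falsified by a mid-write inval or an sinval reset the way the dynamically
built initFileRelationIds list can.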