On 2015-01-04 21:54:40 +0100, Andres Freund wrote: > On January 4, 2015 9:51:43 PM CET, Andrew Dunstan <and...@dunslane.net> wrote: > > > >On 12/15/2014 12:04 PM, Andres Freund wrote: > > > >>> I think the safest fix would be to defer catchup interrupt > >processing > >>> while you're in this mode. You don't really want to be processing > >any > >>> remote sinval messages at all, I'd think. > >> Well, we need to do relmap, smgr and similar things. So I think > >that'd > >> be more complicated than we want. > >> > > > > > > > >Where are we on this? Traffic seems to have gone quite but we still > >have > >a bunch of buildfarm animals red. > > I've a simple fix (similar too what I iriginally outkined) which I > plan to post soonish. I've tried a bunch of things roughly in the vein > of Tom's suggestions, but they all are more invasive and still > incomplete.
Attached. Note that part 1) referenced in the commit message is actually problematic in all branches. I think we actually should backpatch that part all the way, because if we ever hit that case the consequences of the current coding will be rather hard to analyze. If others think so as well, I'm going to split the commit into two parts, so commit messages for < 9.4 won't reference logical decoding. Since there hasn't been a report of that error for a long while (~8.1 era), I can also live with not backpatching further than 9.4. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
>From e5a6c2e9211561bc5c9985a8bf107d9911aad9f6 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Mon, 5 Jan 2015 11:52:05 +0100 Subject: [PATCH] Improve handling of relcache invalidations to currently invisible relations. The corner case where a relcache invalidation tried to rebuild the entry for a referenced relation but couldn't find it in the catalog wasn't correct. 1) The code tried to RelationCacheDelete/RelationDestroyRelation the entry. That didn't work when assertions are enabled because the latter contains an assertion ensuring the refcount is zero. It's also a bad idea, because by virtue of being referenced somebody might actually look at the entry, which is possible if the error is trapped and handled via a subtransaction abort. Instead just error out, without deleting the entry. As the entry is marked invalid, the worst that can happen is that we leak some memory. 2) When using a historic snapshot for logical decoding it can validly happen that a relation that's in the relcache isn't visible to that historic snapshot. E.g. if the relation is referenced in the query that uses the SQL interface for logical decoding. While erroring cleanly, instead of hitting an assertion due to 1) is already an improvement, it's not good enough. So additionally allow that case without an error if a historic snapshot is set up - that won't allow an invalid entry to stay in the cache because it's a) already marked invalid and will thus be rebuilt during the next access b) the syscaches will be reset at the end of decoding. There might be prettier solutions to handle this case, but all that we could think of so far end up being much more complex than this simple fix. This fixes the assertion failures reported by the buildfarm (markhor, tick, leech) after the introduction of new regression tests in 89fd41b390a4. The failure there weren't actually directly caused by CLOBBER_CACHE_ALWAYS but via the long runtimes due to it. Backpatch the whole patch to 9.4, and 1) from above all the way back. --- src/backend/utils/cache/relcache.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index c2e574d..2cce2c1 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -2192,9 +2192,24 @@ RelationClearRelation(Relation relation, bool rebuild) newrel = RelationBuildDesc(save_relid, false); if (newrel == NULL) { - /* Should only get here if relation was deleted */ - RelationCacheDelete(relation); - RelationDestroyRelation(relation, false); + /* + * We can validly get here, if we're using a historic snapshot in + * which a relation, accessed from outside logical decoding, is + * still invisible. In that case it's fine to just mark the + * relation as invalid and return - it'll fully get reloaded after + * by the cache reset at the end of logical decoding. + */ + if (HistoricSnapshotActive()) + return; + + /* + * This situation shouldn't happen as dropping a relation + * shouldn't be possible if still referenced + * (c.f. CheckTableNotInUse()). But if get here anyway, we can't + * just delete the relcache entry as it's possibly still + * referenced and the error might get caught and handled via a + * subtransaction rollback. + */ elog(ERROR, "relation %u deleted while still in use", save_relid); } -- 2.0.0.rc2.4.g1dc51c6.dirty
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers