On 2015-01-04 21:54:40 +0100, Andres Freund wrote:
> On January 4, 2015 9:51:43 PM CET, Andrew Dunstan <and...@dunslane.net> wrote:
> >
> >On 12/15/2014 12:04 PM, Andres Freund wrote:
> >
> >>> I think the safest fix would be to defer catchup interrupt
> >processing
> >>> while you're in this mode.  You don't really want to be processing
> >any
> >>> remote sinval messages at all, I'd think.
> >> Well, we need to do relmap, smgr and similar things. So I think
> >that'd
> >> be more complicated than we want.
> >>
> >
> >
> >
> >Where are we on this? Traffic seems to have gone quite but we still
> >have 
> >a bunch of buildfarm animals red.
> 
> I've a simple fix (similar too what I iriginally outkined) which I
> plan to post soonish. I've tried a bunch of things roughly in the vein
> of Tom's suggestions, but they all are more invasive and still
> incomplete.

Attached.

Note that part 1) referenced in the commit message is actually
problematic in all branches. I think we actually should backpatch that
part all the way, because if we ever hit that case the consequences of
the current coding will be rather hard to analyze. If others think so as
well, I'm going to split the commit into two parts, so commit messages
for < 9.4 won't reference logical decoding. Since there hasn't been a
report of that error for a long while (~8.1 era), I can also live with
not backpatching further than 9.4.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From e5a6c2e9211561bc5c9985a8bf107d9911aad9f6 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 5 Jan 2015 11:52:05 +0100
Subject: [PATCH] Improve handling of relcache invalidations to currently
 invisible relations.

The corner case where a relcache invalidation tried to rebuild the
entry for a referenced relation but couldn't find it in the catalog
wasn't correct.

1) The code tried to RelationCacheDelete/RelationDestroyRelation the
   entry. That didn't work when assertions are enabled because the
   latter contains an assertion ensuring the refcount is zero. It's
   also a bad idea, because by virtue of being referenced somebody
   might actually look at the entry, which is possible if the error is
   trapped and handled via a subtransaction abort.  Instead just error
   out, without deleting the entry. As the entry is marked invalid,
   the worst that can happen is that we leak some memory.

2) When using a historic snapshot for logical decoding it can validly
   happen that a relation that's in the relcache isn't visible to that
   historic snapshot. E.g. if the relation is referenced in the query
   that uses the SQL interface for logical decoding.  While erroring
   cleanly, instead of hitting an assertion due to 1) is already an
   improvement, it's not good enough. So additionally allow that case
   without an error if a historic snapshot is set up - that won't
   allow an invalid entry to stay in the cache because it's a) already
   marked invalid and will thus be rebuilt during the next access b)
   the syscaches will be reset at the end of decoding.

   There might be prettier solutions to handle this case, but all that
   we could think of so far end up being much more complex than this
   simple fix.

This fixes the assertion failures reported by the buildfarm (markhor,
tick, leech) after the introduction of new regression tests in
89fd41b390a4. The failure there weren't actually directly caused by
CLOBBER_CACHE_ALWAYS but via the long runtimes due to it.

Backpatch the whole patch to 9.4, and 1) from above all the way back.
---
 src/backend/utils/cache/relcache.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index c2e574d..2cce2c1 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2192,9 +2192,24 @@ RelationClearRelation(Relation relation, bool rebuild)
 		newrel = RelationBuildDesc(save_relid, false);
 		if (newrel == NULL)
 		{
-			/* Should only get here if relation was deleted */
-			RelationCacheDelete(relation);
-			RelationDestroyRelation(relation, false);
+			/*
+			 * We can validly get here, if we're using a historic snapshot in
+			 * which a relation, accessed from outside logical decoding, is
+			 * still invisible. In that case it's fine to just mark the
+			 * relation as invalid and return - it'll fully get reloaded after
+			 * by the cache reset at the end of logical decoding.
+			 */
+			if (HistoricSnapshotActive())
+				return;
+
+			/*
+			 * This situation shouldn't happen as dropping a relation
+			 * shouldn't be possible if still referenced
+			 * (c.f. CheckTableNotInUse()). But if get here anyway, we can't
+			 * just delete the relcache entry as it's possibly still
+			 * referenced and the error might get caught and handled via a
+			 * subtransaction rollback.
+			 */
 			elog(ERROR, "relation %u deleted while still in use", save_relid);
 		}
 
-- 
2.0.0.rc2.4.g1dc51c6.dirty

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to