
On 2019-01-28 14:10:55 -0800, Andres Freund wrote:
> So, I'd pushed the latest version. And longfin came back with an
> interesting error:
> ERROR:  page 135 of relation "pg_class" should be empty but is not
> The only way I can currently imagine this happening is that there's a
> concurrent vacuum that discovers the page is empty, enters it into the
> FSM (which now isn't happening under an extension lock anymore), and
> then a separate backend starts to actually use that buffer.  That seems
> tight but possible.  Seems we need to re-add the
> LockRelationForExtension/UnlockRelationForExtension() logic :(

Hm, but thinking about this, isn't this a risk independent of this
change? The FSM isn't WAL logged, and given its tree-ish structure it's
possible to "rediscover" free space (e.g. FreeSpaceMapVacuum()). ISTM
that after a crash the FSM might point to free space that doesn't
actually exist, and is rediscovered after independent changes.  Not sure
if that's actually a realistic issue.

I'm inclined to put back the
           LockBuffer(buf, BUFFER_LOCK_UNLOCK);
           LockRelationForExtension(onerel, ExclusiveLock);
           UnlockRelationForExtension(onerel, ExclusiveLock);
           if (PageIsNew(page))
dance regardless, just to get the buildfarm to green?

But I do wonder if we should just make hio.c cope with this instead.
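
To make the "cope with this" idea concrete, here is a minimal sketch as a
standalone toy model. It is not hio.c code: ToyPage, toy_page_init() and
toy_get_page_for_tuple() are hypothetical names, the FSM is reduced to a
flat array, and all buffer locking is left out. The point is only that a
caller which trusts neither the FSM nor the page state can initialize a
new page it is handed, and can repair a stale FSM entry, instead of
raising an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 8192      /* hypothetical; same as the default BLCKSZ */

/* Hypothetical stand-in for a heap page; not a PostgreSQL struct. */
typedef struct ToyPage
{
    bool    initialized;        /* false models PageIsNew() being true */
    size_t  free_bytes;         /* only meaningful once initialized */
} ToyPage;

/* Models PageInit(): a never-initialized page becomes an empty one. */
static void
toy_page_init(ToyPage *page)
{
    page->initialized = true;
    page->free_bytes = TOY_PAGE_SIZE;
}

/*
 * Return the index of a page with at least "needed" free bytes and
 * reserve that space on it, or return -1 if no page qualifies.
 *
 * fsm[i] is the advertised free space of pages[i].  It may be stale,
 * e.g. after a crash, so it is treated as a hint only.
 */
static int
toy_get_page_for_tuple(size_t *fsm, ToyPage *pages, int npages, size_t needed)
{
    assert(needed <= TOY_PAGE_SIZE);

    for (int i = 0; i < npages; i++)
    {
        if (fsm[i] < needed)
            continue;           /* FSM says this page is too full */

        /* Cope with a new page instead of erroring out. */
        if (!pages[i].initialized)
            toy_page_init(&pages[i]);

        if (pages[i].free_bytes >= needed)
        {
            pages[i].free_bytes -= needed;
            fsm[i] = pages[i].free_bytes;
            return i;
        }

        /* The FSM entry was stale: record the real value and move on. */
        fsm[i] = pages[i].free_bytes;
    }
    return -1;
}
```

In the real code any such check would have to happen with the buffer
exclusively locked, which the toy model does not attempt to show.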


Andres Freund
