Andres Freund <and...@anarazel.de> writes:
> On 2017-09-29 20:26:42 +0000, Tom Lane wrote:
>> get_rel_oids used to not take any relation locks at all, but that stopped
>> being a good idea with commit 3c3bb9933, which inserted a syscache lookup
>> into the function.  A concurrent DROP TABLE could now produce "cache lookup
>> failed", which we don't want to have happen in normal operation.  The best
>> solution seems to be to transiently take a lock on the relation named by
>> the RangeVar (which also makes the result of RangeVarGetRelid a lot less
>> spongy).  But we shouldn't hold the lock beyond this function, because we
>> don't want VACUUM to lock more than one table at a time.  (That would not
>> be a big problem right now, but it will become one after the pending
>> feature patch to allow multiple tables to be named in VACUUM.)

> I'm not sure how big a problem this is, but I suspect it is
> one. Autovacuum used to skip relations when they're locked, and the
> vacuum isn't an anti-wraparound one.  But after this change it appears
> we'll get stuck behind this new lock, even if VACOPT_NOWAIT is
> specified.  That's bad because now relations that are AEL locked
> (e.g. because of some longrunning DDL) can block global autovacuum
> progress.

Hm.  Maybe we should take this lock with nowait if the vacuum option
is specified?

Another idea is to revert both this patch and 3c3bb9933, and instead
handle partitioning more like we handle recursion for toast tables, ie
make no decisions until after we do have lock on a relation.  The
really fundamental problem here is that 3c3bb9933 thought it is OK
to do a syscache lookup on a table we have no lock on, which is flat
wrong.  But we don't necessarily have to do it exactly like that.

                        regards, tom lane

Reply via email to