Hi,

On Sun, Jun 14, 2026 at 12:13 AM cca5507 <[email protected]> wrote:
>
> Hi hackers,
>
> When doing a whole database vacuum, we scan pg_class to construct
> a list of vacuumable tables. For each vacuumable table, we call
> vacuum_is_permitted_for_relation() to check permissions. If a
> concurrent drop happens, the pg_class_aclcheck() might report an
> error because of failing to search the syscache:
>
> ERROR:  relation with OID ****** does not exist
>
> To fix it, we can use pg_class_aclcheck_ext() to detect the concurrent
> drop and report a warning instead.
>
> Note that a concurrent drop after constructing the list of vacuumable
> tables is handled by vacuum_open_relation().
>
> Thoughts?

Nice catch!

IMHO, the vacuum command failing in this situation may not be a huge
problem in practice, because the user sees the error and can re-issue
the command.

get_all_vacuum_rels is called only for database-wide vacuum (neither
autovacuum nor vacuum <<tables-list>> calls it - in those cases, the
permission check happens after the table lock is taken in
vacuum_open_relation).

Let's think about what happens if we don't fix this. For example, a
database-wide vacuum with 1000 tables has scanned 990 of them in
pg_class and the 991st table gets concurrently dropped - the vacuum
command fails and the user needs to notice that and re-issue the
command, at which point it rescans pg_class. What exactly is the
problem we're solving? Is it the vacuum command failing and the user
missing it or forgetting to re-issue? The pg_class scan being
expensive? Or consistency with how autovacuum and vacuum
<<tables-list>> handle this?

ReindexMultipleTables has a similar scan on pg_class with a permission
check, but it is not affected because the permission check is gated
behind shared relations (so droppable user tables never reach it).

Some comments on the v3 patches:

1/
+ *
+ * Note: use pg_class_aclcheck_ext() to detect a concurrent drop.

This comment seems redundant - the function call, the missing_ok
check, and the warning message already make it clear.

2/
+ * Note that the relation might not be locked, so it can be dropped
concurrently.
+ * This can happen when doing a whole database vacuum or analyze in
+ * get_all_vacuum_rels().  We issue a WARNING log message and return false in
+ * this case.

Instead of saying "relation might not be locked", can we be more
explicit about the exact cases? For example: For database-wide
vacuums, this function is called without holding a relation lock, so
the table can be dropped concurrently. In that case, issue a WARNING
and return false.

I couldn't think of a good way to reproduce this other than creating a
large number of tables to make the pg_class scan costlier. For
predictable testing I might agree to adding an injection point.
However, I have some comments:

3/
+# Make sure that current user is not the owner of current database.
+step setrole1
+{
+    SET ROLE regress_vacuum;
+}

Is this necessary?

4/
+step analyze1
+{
+    ANALYZE;
+}

I think just testing VACUUM or VACUUM ANALYZE in a single test keeps
things smaller, since the underlying code is the same for both.

5/
+++ b/src/test/modules/injection_points/specs/vacuum_concurrent_drop.spec

Why do you need isolation tests with an injection point? Why not a TAP
test under src/test/modules/test_misc? A TAP test with just VACUUM
(not ANALYZE) would keep the test simpler with fewer lines of code.

6/
+ INJECTION_POINT("vacuum-constructing-vacuumable-rels", NULL);

Nit: How about a shorter name like vacuum-get-rels or vacuum-build-rels?

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com


Reply via email to