On Fri, May 1, 2026 at 7:00 AM Bharath Rupireddy <[email protected]> wrote: > > Hi, > > On Wed, Apr 29, 2026 at 8:41 PM Chao Li <[email protected]> wrote: > > > > > <v5-0001-PG16-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-Fix-pg_get_publication_tables-race-with-concurren.patch><v5-0001-PG18-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-PG17-Fix-pg_get_publication_tables-race-with-conc.txt> > > > > I am afraid this is only a partial fix. > > Thanks for reviewing it. Please find my responses below. > > > ``` > > @@ -1599,12 +1621,18 @@ pg_get_publication_tables(FunctionCallInfo fcinfo, > > ArrayType *pubnames, > > /* Show all columns when the column list is not specified. > > */ > > if (nulls[2]) > > { > > - Relation rel = table_open(relid, > > AccessShareLock); > > + Relation rel = try_table_open(relid, > > AccessShareLock); > > int nattnums = 0; > > int16 *attnums; > > - TupleDesc desc = RelationGetDescr(rel); > > + TupleDesc desc; > > int i; > > > > + /* Skip if the relation has been concurrently > > dropped. */ > > + if (rel == NULL) > > + continue; > > ``` > > > > This change uses try_table_open() to detect whether a table has been > > dropped, but try_table_open() is only called when the column list is not > > specified. If a table is included in the publication with an explicit > > column list, then even if it is dropped concurrently, it may still be > > returned by pg_get_publication_tables(). > > Right. The try_table_open() is only needed there because that's the > only code path that actually opens the relation (to enumerate its > columns). The column list path reads from the pg_publication_rel > catalog - it never calls table_open(), so it cannot hit the ERROR. > > > So this patch removes the “could not open relation with OID” error, but it > > does not fully ensure the accuracy of the returned table list. > > IMO, no function returning table OIDs can guarantee they remain valid > - a drop can happen right after we return the OID, and accuracy is in > the caller's hands. All the callers of pg_get_publication_tables() > already handle this by JOINing with pg_class. >
I agree with Bharath. Also I would like to add one more point. We do have this: + /* Skip if the relation has been concurrently dropped. */ + if (!OidIsValid(schemaid)) + continue; So if a table is dropped before we could access its explicitly mentioned column-list, above should handle it right even without 'try_table_open' done? Moreover, pg_publication_rel will not return any row for dropped table, so we should be good. For 'table_infos', the case was different, as we saved the list in first call and are using that in subsequent call? But that is not the case with pg_publication_rel. > > It also introduces inconsistent behavior between tables published with and > > without column lists. > > These two paths do different things - one needs the relation open, the > other doesn't. For callers, the outcome is the same: any stale OID > gets filtered out by their pg_class JOIN. > > > To resolve the race condition completely, I think we should try to open the > > table regardless of whether a column list is specified. > > IMO, that would add unnecessary locking overhead in a path that > doesn't need it. The bug is the ERROR, and this patch fixes it. > > In short, when no column list is specified, > pg_get_publication_tables() needs to open the relation to enumerate > all publishable columns and return them as an int2vector in the attrs > output column. It currently uses table_open() for this, which errors > out with concurrent table drops. This patch fixes it by using > try_table_open() and skipping the relation if it's been dropped. > > Thoughts? > > -- > Bharath Rupireddy > Amazon Web Services: https://aws.amazon.com
