On Mon, Jun 22, 2026 at 2:38 PM Amit Kapila <[email protected]> wrote:
>
> On Mon, Jun 22, 2026 at 1:56 PM shveta malik <[email protected]> wrote:
> >
> > On Mon, Jun 22, 2026 at 9:33 AM vignesh C <[email protected]> wrote:
> > >
> > > On Mon, 22 Jun 2026 at 08:41, Amit Kapila <[email protected]> wrote:
> > > >
> > > > On Sun, Jun 21, 2026 at 7:53 PM vignesh C <[email protected]> wrote:
> > > > >
> > > > > While attempting to log a conflict, a concurrent ALTER SUBSCRIPTION
> > > > > can change the conflict logging destination from all to log. In this
> > > > > scenario, the apply worker may already have cached the conflictlogdest
> > > > > information, including the OID of the current conflict log table.
> > > > > However, the concurrent ALTER SUBSCRIPTION drops the conflict log
> > > > > table as part of the destination change:
> > > > > +Relation
> > > > > +GetConflictLogDestAndTable(ConflictLogDest *log_dest)
> > > > > +{
> > > > > +       Oid                     conflictlogrelid;
> > > > > +
> > > > > +       /*
> > > > > +        * Convert the text log destination to the internal enum.
> > > > > MySubscription
> > > > > +        * already contains the data from pg_subscription.
> > > > > +        */
> > > > > +       *log_dest = 
> > > > > GetConflictLogDest(MySubscription->conflictlogdest);
> > > > > +
> > > > > +       /* Quick exit if a conflict log table was not requested. */
> > > > > +       if (!CONFLICTS_LOGGED_TO_TABLE(*log_dest))
> > > > > +               return NULL;
> > > > > +
> > > > > +       conflictlogrelid = MySubscription->conflictlogrelid;
> > > > > +
> > > > > +       Assert(OidIsValid(conflictlogrelid));
> > > > > +
> > > > > +       return table_open(conflictlogrelid, RowExclusiveLock);
> > > > > +}
> > > > >
> > > > > As a result, when the apply worker later attempts to open the cached
> > > > > conflict log table, table_open() fails because the relation has
> > > > > already been dropped. This causes the error handling path itself to
> > > > > fail before the conflict record can be written to either the conflict
> > > > > log table or the server log.
> > > > >
> > > > > In such cases, the conflict record is effectively lost and is not
> > > > > logged anywhere. For example:
> > > > > 2026-06-21 19:31:13.592 IST [263598] LOG:  logical replication apply
> > > > > worker for subscription "sub1" has started
> > > > > 2026-06-21 19:32:26.731 IST [263598] ERROR:  could not open relation
> > > > > with OID 16405
> > > > > 2026-06-21 19:32:26.731 IST [263598] CONTEXT:  processing remote data
> > > > > for replication origin "pg_16404" during message type "INSERT" for
> > > > > replication target relation "public.t1" in transaction 698, finished
> > > > > at 0/017D39A0
> > > > > 2026-06-21 19:32:26.735 IST [263471] LOG:  background worker "logical
> > > > > replication apply worker" (PID 263598) exited with exit code 1
> > > > >
> > > > > Ideally, failure to access the conflict log table should not prevent
> > > > > the conflict from being reported in the server log. This issue is
> > > > > present with the v52 version. I have not yet checked if Amit's recent
> > > > > patch posted a few minutes ago at [1] handles this issue.
> > > > >
> > > >
> > > > There are two places in the patch from where we LOG/Insert the
> > > > conflict data. First is ReportApplyConflict() where we LOG if the
> > > > conflict arises from a non-ERROR path (aka conflicts other
> > > > INSERT/UPDATE_EXISTS).  In that case, the conflict data will be logged
> > > > even when we fail to insert into CLT. Second is the place for
> > > > conflicts that arose as ERRORs (aka INSERT/UPDATE_EXISTS), where the
> > > > conflict information will be logged along with insert failure as
> > > > CONTEXT. Can you please verify your test based on this input and share
> > > > your findings and thoughts?
> > >
> > > The scenario I am testing is an insert_exists conflict.
> > > On the publisher:
> > > CREATE TABLE t1 (c1 int);
> > >
> > > On the subscriber:
> > > CREATE TABLE t1 (c1 int PRIMARY KEY);
> > >
> > > Then execute the following on the publisher:
> > > INSERT INTO t1 VALUES (10);
> > > INSERT INTO t1 VALUES (10);
> > >
> > > The second insert generates an insert_exists conflict on the
> > > subscriber. The conflict is reported and logged through the following
> > > call chain:
> > > apply_handle_insert
> > >   -> apply_handle_insert_internal
> > >      -> ExecSimpleRelationInsert
> > >         -> CheckAndReportConflict
> > >            -> ReportApplyConflict
> > >
> > > Pause execution in ReportApplyConflict() at
> > > GetConflictLogDestAndTable(), immediately before opening the conflict
> > > log table:
> > > ...
> > > return table_open(conflictlogrelid, RowExclusiveLock);
> > > ...
> > >
> > > While the apply worker is paused, execute the following command 
> > > concurrently:
> > > ALTER SUBSCRIPTION sub1
> > >   SET (conflict_log_destination = 'log');
> > >
> > > This succeeds and drops the conflict log table:
> > > NOTICE:  dropped conflict log table "pg_conflict.pg_conflict_log_16404"
> > >          for subscription "sub1"
> > > ALTER SUBSCRIPTION
> > >
> > > At this point, GetConflictLogDestAndTable() has already determined
> > > that the conflict should be logged to a table and has cached the
> > > corresponding relation OID. However, the concurrent ALTER SUBSCRIPTION
> > > has removed that table.
> > >
> > > When execution resumes, the subsequent table_open() call fails with:
> > > 2026-06-22 09:24:53.072 IST [304864] ERROR:  could not open relation
> > > with OID 16405
> > >
> > > As a result, conflict processing itself fails before the conflict
> > > details can be recorded. The conflict is therefore not logged to the
> > > conflict log table and is also not emitted to the server log.
> > >
> >
> >
> > I understand this case, but I feel it isn't critical because the table
> > is going to be dropped in parallel, so ultimately, all data is lost.
> > At-max, we can provide a  LOG when table-open fails, indicating that
> > the CLT table is dropped concurrently and thus conflict-data cannott
> > be logged to table.
> >
>
> Instead of adding additional LOG, a simpler fix would be to use
> try_table_open() and if the table is dropped, silently just LOG the
> conflict and proceed (see attached top-up patch).

Yeah, the idea looks good.

> In general, I agree
> that it is not a very critical issue but as the fix is simpler, I
> thought it is better to address so that apply worker can continue
> instead of erroring out.

yes, my intent was the same, the apply worker can simply LOG
(table-open issue) and continue when table_open does not give table.
But we can LOG the conflict instead as you suggested.

> Having said that, I think we can't handle
> each and every corner case where there are some other errors before we
> can LOG the conflict, say some OOM or some other error happens during
> CheckAndReportConflict.
>

I agree.

thanks
Shveta


Reply via email to