On Saturday, February 19, 2022, Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Sat, Feb 19, 2022 at 1:17 AM David G. Johnston
> <david.g.johns...@gmail.com> wrote:
> >
> > On Fri, Feb 18, 2022 at 1:26 AM Masahiko Sawada <sawada.m...@gmail.com>
> wrote:
> >>
> >>
> >> Here is the summary of the discussion, changes, and plan.
> >>
> >> 1. Move some error information such as the error message to a new
> >> system catalog, pg_subscription_error. The pg_subscription_error table
> >> would have the following columns:
> >>
> >> * sesubid : subscription Oid.
> >> * serelid : relation Oid (NULL for apply worker).
> >> * seerrlsn : commit-LSN or the error transaction.
> >> * seerrcmd : command (INSERT, UPDATE, etc.) of the error transaction.
> >> * seerrmsg : error message
> >
> >
> > Not a fan of the "se" prefix but overall yes. We should include a
> timestamp.
> >
>
> How about naming it pg_subscription_worker_error as Peter E. has
> suggested in one of his emails? I find pg_subscription_error slightly
> odd as one could imagine that even the errors related to subscription
> commands like Alter Subscription.
>
>
Adding worker makes sense.


> >>
> >> The tuple is inserted or updated when an apply worker or a tablesync
> >> worker raises an error. If the same error occurs in a row, the update
> >> is skipped.
> >
>
> Are you going to query table to check if it is same error?


I don’t get the question, the quoted text is your which I disagree with.
But the error message is being captured in any case.

>
> >
> > I disagree with this - I would treat every new instance of an error to
> be important and insert on conflict (sesubid, serelid) the new entry,
> updating lsn/cmd/msg with the new values.
> >
>
> I don't think that will be a problem but what advantage are you
> envisioning with updating the same info except for timestamp?


Omission of timestamp (or any other non-key field we have) from the update
is an oversight.


> >> The tuple is removed in the following cases:
> >>
> >> * the subscription is dropped.
> >> * the table is dropped.
> >>
> >> * the table is removed from the subscription.
> >> * the worker successfully committed a non-empty transaction.
> >
> >
> > Correct.  This handles the "end of sync worker" just fine since its
> final action should be a successful commit of a non-empty transaction.
> >>
>
> I think for tablesync workers, we may need slightly different handling
> as there could probably be no transactions to apply apart from the
> initial copy. Now, I think for tablesync worker, we can't postpone it
> till after we update the rel state as SUBREL_STATE_SYNCDONE because if
> we do it after that and there is some error updating/deleting the
> tuple, the tablesync worker won't be launched again and that entry
> will remain in the system for a longer duration.
>

Not sure…but I don’t see how you can not have a non-empty transaction while
still having an error.

Are subscriptions “dropped” upon a reboot?  If not, that needs its own case
for row removal.

David J.

Reply via email to