On Saturday, February 19, 2022, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Sat, Feb 19, 2022 at 1:17 AM David G. Johnston > <david.g.johns...@gmail.com> wrote: > > > > On Fri, Feb 18, 2022 at 1:26 AM Masahiko Sawada <sawada.m...@gmail.com> > wrote: > >> > >> > >> Here is the summary of the discussion, changes, and plan. > >> > >> 1. Move some error information such as the error message to a new > >> system catalog, pg_subscription_error. The pg_subscription_error table > >> would have the following columns: > >> > >> * sesubid : subscription Oid. > >> * serelid : relation Oid (NULL for apply worker). > >> * seerrlsn : commit-LSN or the error transaction. > >> * seerrcmd : command (INSERT, UPDATE, etc.) of the error transaction. > >> * seerrmsg : error message > > > > > > Not a fan of the "se" prefix but overall yes. We should include a > timestamp. > > > > How about naming it pg_subscription_worker_error as Peter E. has > suggested in one of his emails? I find pg_subscription_error slightly > odd as one could imagine that even the errors related to subscription > commands like Alter Subscription. > > Adding worker makes sense. > >> > >> The tuple is inserted or updated when an apply worker or a tablesync > >> worker raises an error. If the same error occurs in a row, the update > >> is skipped. > > > > Are you going to query table to check if it is same error? I don’t get the question, the quoted text is your which I disagree with. But the error message is being captured in any case. > > > > > I disagree with this - I would treat every new instance of an error to > be important and insert on conflict (sesubid, serelid) the new entry, > updating lsn/cmd/msg with the new values. > > > > I don't think that will be a problem but what advantage are you > envisioning with updating the same info except for timestamp? Omission of timestamp (or any other non-key field we have) from the update is an oversight. > >> The tuple is removed in the following cases: > >> > >> * the subscription is dropped. > >> * the table is dropped. > >> > >> * the table is removed from the subscription. > >> * the worker successfully committed a non-empty transaction. > > > > > > Correct. This handles the "end of sync worker" just fine since its > final action should be a successful commit of a non-empty transaction. > >> > > I think for tablesync workers, we may need slightly different handling > as there could probably be no transactions to apply apart from the > initial copy. Now, I think for tablesync worker, we can't postpone it > till after we update the rel state as SUBREL_STATE_SYNCDONE because if > we do it after that and there is some error updating/deleting the > tuple, the tablesync worker won't be launched again and that entry > will remain in the system for a longer duration. > Not sure…but I don’t see how you can not have a non-empty transaction while still having an error. Are subscriptions “dropped” upon a reboot? If not, that needs its own case for row removal. David J.