On Fri, Feb 18, 2022 at 1:26 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > Here is the summary of the discussion, changes, and plan. > > 1. Move some error information such as the error message to a new > system catalog, pg_subscription_error. The pg_subscription_error table > would have the following columns: > > * sesubid : subscription Oid. > * serelid : relation Oid (NULL for apply worker). > * seerrlsn : commit-LSN or the error transaction. > * seerrcmd : command (INSERT, UPDATE, etc.) of the error transaction. > * seerrmsg : error message > Not a fan of the "se" prefix but overall yes. We should include a timestamp. > The tuple is inserted or updated when an apply worker or a tablesync > worker raises an error. If the same error occurs in a row, the update > is skipped. I disagree with this - I would treat every new instance of an error to be important and insert on conflict (sesubid, serelid) the new entry, updating lsn/cmd/msg with the new values. The tuple is removed in the following cases: > > * the subscription is dropped. > * the table is dropped. * the table is removed from the subscription. > * the worker successfully committed a non-empty transaction. > Correct. This handles the "end of sync worker" just fine since its final action should be a successful commit of a non-empty transaction. > > With this change, pg_stat_subscription_workers will be like: > > * subid > * subname > * subrelid > * error_count > * last_error_timestamp > > This view will be extended by adding transaction statistics proposed > on another thread[1]. > I haven't reviewed that thread but in-so-far as this one goes I would just drop this altogether. The error count, if desired, can be added to pg_subscription_error, and the timestamp should be added there as noted above. If this is useful for the feature on the other thread it can be reconstituted from there. > 2. Fix a bug in pg_stat_subscription_workers. As pointed out by > Andres, there is a bug in pg_stat_subscription_workers; it doesn't > drop entries for already-dropped tables. We need to fix it. > Given the above, this becomes an N/A. > 3. Show commit-LSN of the error transaction in errcontext. Currently, > we show XID and commit timestamp in the errcontext. But given that we > use LSN in pg_subscriptoin_error, it's better to show commit-LSN as > well (or instead of error-XID). > Agreed, I think: what "errcontext" is this referring to? > > 5. Record skipped data to the system catalog, say > pg_subscription_conflict_history so that there is a chance to analyze > and recover. We can discuss the > details in a new thread. > Agreed - the "before skipping" consideration seems considerably more helpful; but wouldn't need to be persistent, it could just be a view. A permanent record probably would best be recorded in the logs - though if we get the pre-skip functionality the user can view directly and save the results wherever they wish or we can provide a function to spool the information to the log. I don't see persistent in-database storage being that desirable here. If we only do something after the transaction has been skipped it may be useful to add an option to the skipping command to auto-disable the subscription after the transaction has been skipped and before any subsequent transactions are applied. This will give the user a chance to process the post-skipped information before restoring sync and having the system begin changing underneath them again. > > 4 and 5 might be better introduced together but I think since the user > is able to check what changes will be skipped on the publisher side we > can do 5 for v16. How would one go about doing that (checking what changes will be skipped on the publisher side)? David J.