Re: Design of pg_stat_subscription_workers vs pgstats

Andres Freund Tue, 15 Feb 2022 10:18:02 -0800

Hi,

On 2022-02-04 09:23:06 +0530, Amit Kapila wrote:
> On Thu, Feb 3, 2022 at 3:25 PM Peter Eisentraut
> <[email protected]> wrote:
> >
> > On 02.02.22 07:54, Amit Kapila wrote:
> >
> > > Sure, but is this the reason you want to store all the error info in
> > > the system catalog? I agree that providing more error info could be
> > > useful and also possibly the previously failed (apply) xacts info as
> > > well but I am not able to see why you want to have that sort of info
> > > in the catalog. I could see storing info like err_lsn/err_xid that can
> > > allow to proceed to apply worker automatically or to slow down the
> > > launch of errored apply worker but not all sort of other error info
> > > (like err_cnt, err_code, err_message, err_time, etc.). I want to know
> > > why you are insisting to make all the error info persistent via the
> > > system catalog?
> >
> > Let's flip this around and ask, why not?
> >
> 
> Because we don't necessarily need all this information after the crash
> and neither is this information about any system object which we
> require for performing operations on objects.


I find this not particularly convincing. IMO data that leads the user to
compromise "replication integrity" is pretty crucial.

And skipped data needs to be logged somewhere persistent, so that there's a
chance to analyze / recover.

We should also use more detailed knowledge about the error to decide how soon
replication is retried. Serialization error: retry soon. Other errors: retry
with increasing backoff.
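
To make the retry policy above concrete, here is a minimal sketch in Python.
It is only an illustration and does not correspond to any existing apply worker
code: the function name, the delay constants and the choice to classify errors
by SQLSTATE are all assumptions. The only PostgreSQL fact relied on is that
SQLSTATE 40001 is serialization_failure.

```python
# Hypothetical sketch: pick a retry delay from the error class.
# All names and constants below are illustrative assumptions.

# SQLSTATE 40001 is PostgreSQL's serialization_failure.
RETRY_SOON_SQLSTATES = {"40001"}


def retry_delay_ms(sqlstate, consecutive_failures,
                   soon_ms=100, base_ms=1000, max_ms=300_000):
    """Return how long to wait before relaunching the errored worker.

    sqlstate: five-character SQLSTATE of the last apply error.
    consecutive_failures: number of failures in a row, starting at 1.
    """
    if sqlstate in RETRY_SOON_SQLSTATES:
        # Likely to succeed on the next attempt, so retry soon.
        return soon_ms
    # Any other error: double the delay per consecutive failure, with a cap.
    exponent = max(consecutive_failures - 1, 0)
    return min(base_ms * 2 ** exponent, max_ms)
```

With these made-up defaults, a serialization failure always waits 100 ms,
while e.g. a unique violation (23505) waits 1 s, 2 s, 4 s, ... up to a cap of
5 minutes. Persisting the error class and failure count would be what lets such
a policy survive a restart.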


> In walreceiver (for standby), we don't store the errors/conflicts in any
> table, they are either reported in logs or shared via stats.

That's imo quite different - they're fundamentally time-limited problems. And
they aren't leading the user / DBA to skip transactions etc.

Greetings,

Andres Freund
