Hi,

On 2022-02-04 09:23:06 +0530, Amit Kapila wrote:
> On Thu, Feb 3, 2022 at 3:25 PM Peter Eisentraut
> <peter.eisentr...@enterprisedb.com> wrote:
> >
> > On 02.02.22 07:54, Amit Kapila wrote:
> >
> > > Sure, but is this the reason you want to store all the error info in
> > > the system catalog? I agree that providing more error info could be
> > > useful and also possibly the previously failed (apply) xacts info as
> > > well but I am not able to see why you want to have that sort of info
> > > in the catalog. I could see storing info like err_lsn/err_xid that can
> > > allow to proceed to apply worker automatically or to slow down the
> > > launch of errored apply worker but not all sort of other error info
> > > (like err_cnt, err_code, err_message, err_time, etc.). I want to know
> > > why you are insisting to make all the error info persistent via the
> > > system catalog?
> >
> > Let's flip this around and ask, why not?
> >
>
> Because we don't necessarily need all this information after the crash
> and neither is this information about any system object which we
> require for performing operations on objects.
I find this not particularly convincing. IMO data that leads the user to
compromise "replication integrity" is pretty crucial. And skipped data
needs to be logged somewhere persistent, so that there's a chance to
analyze / recover.

We also should utilize more detailed knowledge about errors to influence
at which interval replication is retried. Serialization error: retry
soon. Other errors: retry with increasing backoff.

> In walreceiver (for standby), we don't store the errors/conflicts in any
> table, they are either reported in logs or shared via stats.

That's imo quite different - they're fundamentally time-limited
problems. And they aren't leading the user / DBA to skip transactions
etc.

Greetings,

Andres Freund