On Tue, Oct 18, 2022 at 22:35 PM Fabrice Chapuis <fabrice636...@gmail.com> 
wrote:
> Hello Amit,
>
> In version 14.4 the timeout problem for logical replication happens again 
> despite
> the patch provided for this issue in this version. When bulky materialized 
> views
> are reloaded it broke logical replication. It is possible to solve this 
> problem by
> using your new "streaming" option.
> Have you ever had this issue reported to you?
>
> Regards
>
> Fabrice
>
> 2022-10-10 17:19:02 CEST [538424]: [17-1]
> user=postgres,db=dbxxxa00,client=[local] CONTEXT:  SQL statement "REFRESH
> MATERIALIZED VIEW sxxxa00.table_base"
>         PL/pgSQL function refresh_materialized_view(text) line 5 at EXECUTE
> 2022-10-10 17:19:02 CEST [538424]: [18-1]
> user=postgres,db=dbxxxa00,client=[local] STATEMENT:  select
> refresh_materialized_view('sxxxa00.table_base');
> 2022-10-10 17:19:02 CEST [538424]: [19-1]
> user=postgres,db=dbxxxa00,client=[local] LOG:  duration: 264815.652
> ms  statement: select refresh_materialized_view('sxxxa00.table_base');
> 2022-10-10 17:19:27 CEST [559156]: [1-1] user=,db=,client= LOG:  automatic
> vacuum of table "dbxxxa00.sxxxa00.table_base": index scans: 0
>         pages: 0 removed, 296589 remain, 0 skipped due to pins, 0 skipped 
> frozen
>         tuples: 0 removed, 48472622 remain, 0 are dead but not yet removable,
> oldest xmin: 1501528
>         index scan not needed: 0 pages from table (0.00% of total) had 0 dead 
> item
> identifiers removed
>         I/O timings: read: 1.494 ms, write: 0.000 ms
>         avg read rate: 0.028 MB/s, avg write rate: 107.952 MB/s
>         buffer usage: 593301 hits, 77 misses, 294605 dirtied
>         WAL usage: 296644 records, 46119 full page images, 173652718 bytes
>         system usage: CPU: user: 17.26 s, system: 0.29 s, elapsed: 21.32 s
> 2022-10-10 17:19:28 CEST [559156]: [2-1] user=,db=,client= LOG:  automatic
> analyze of table "dbxxxa00.sxxxa00.table_base"
>         I/O timings: read: 0.043 ms, write: 0.000 ms
>         avg read rate: 0.026 MB/s, avg write rate: 0.026 MB/s
>         buffer usage: 30308 hits, 2 misses, 2 dirtied
>         system usage: CPU: user: 0.54 s, system: 0.00 s, elapsed: 0.59 s
> 2022-10-10 17:19:34 CEST [3898111]: [6840-1] user=,db=,client= LOG:  
> checkpoint
> complete: wrote 1194 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 
> recycled;
> write=269.551 s, sync=0.002 s, total=269.560 s; sync files=251, longest=0.00
> 1 s, average=0.001 s; distance=583790 kB, estimate=583790 kB
> 2022-10-10 17:20:02 CEST [716163]: [2-1] user=,db=,client= ERROR:  terminating
> logical replication worker due to timeout
> 2022-10-10 17:20:02 CEST [3897921]: [13-1] user=,db=,client= LOG:  background
> worker "logical replication worker" (PID 716163) exited with exit code 1
> 2022-10-10 17:20:02 CEST [561346]: [1-1] user=,db=,client= LOG:  logical
> replication apply worker for subscription "subxxx_sxxxa00" has started

Thanks for reporting!

There is one thing I want to confirm:
Is the statement `select refresh_materialized_view('sxxxa00.table_base');`
executed on the publisher-side?

If so, I think the reason for this timeout problem could be that during DDL
(`REFRESH MATERIALIZED VIEW`), lots of temporary data is generated due to
rewrite. Since these temporary data will not be processed by the pgoutput 
plugin, our previous fix for DML had no impact on this case.
I think setting "streaming" option to "on" could work around this problem.

I tried to write a draft patch (see attachment) on REL_14_4 to fix this.
I tried it locally and it seems to work.
Could you please confirm whether this problem is fixed after applying this
draft patch?

If this draft patch works, I will improve it and try to fix this problem.

Regards,
Wang wei

Attachment: 0001-draft-for-REL_14_4.patch
Description: 0001-draft-for-REL_14_4.patch

Reply via email to