Re: Logical replication missing information
Hello!Thank you very much, the information you just sent to me is very, very valuable! Best regards, Cristi Boboc On Wednesday, November 23, 2022 at 04:45:09 AM GMT+2, Peter Smith wrote: On Fri, Nov 18, 2022 at 4:50 AM PG Doc comments form wrote: > > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/15/logical-replication-row-filter.html > Description: Hi, FYI - I have forwarded this post to the hacker's list, where I think it will receive more attention. I am not sure why that (above) page was cited -- the section "31.3 Row Filters" is specifically about row filtering, whereas the items you reported seem unrelated to row filters, but are generic for all Logical Replication. > > There are several things missing here and some of them I found to be highly > important: > 1. How can I find why a logical replication failed. Currently I only can see > it "does nothing" in pg_stat_subscriptions. There should be logs reporting any replication conflicts etc. See [1] for example logs. See also the answer for #2 below. > 2. In case of copying the existing data: how can I find which tables or > partitions were processed and which are on the processing queue (while > monitoring I have observed no specific order or rule). There is no predictable processing queue or order - The initial tablesyncs might be happening in multiple asynchronous processes according to the GUC max_sync_workers_per_subscription [2]. Below I show examples of replicating two tables (tab1 and tab2). ~~ >From the logs you should see which table syncs have completed OK: e.g. (the initial copy is all good) test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost dbname=test_pub' PUBLICATION pub1; NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION test_sub=# 2022-11-23 12:23:18.501 AEDT [27961] LOG: logical replication apply worker for subscription "sub1" has started 2022-11-23 12:23:18.513 AEDT [27963] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has started 2022-11-23 12:23:18.524 AEDT [27965] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has started 2022-11-23 12:23:18.593 AEDT [27963] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has finished 2022-11-23 12:23:18.611 AEDT [27965] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has finished e.g. (where there is conflict in table tab2) test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost dbname=test_pub' PUBLICATION pub1; NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION test_sub=# 2022-11-23 12:40:56.794 AEDT [23401] LOG: logical replication apply worker for subscription "sub1" has started 2022-11-23 12:40:56.808 AEDT [23403] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has started 2022-11-23 12:40:56.819 AEDT [23405] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has started 2022-11-23 12:40:56.890 AEDT [23405] ERROR: duplicate key value violates unique constraint "tab2_pkey" 2022-11-23 12:40:56.890 AEDT [23405] DETAIL: Key (id)=(1) already exists. 2022-11-23 12:40:56.890 AEDT [23405] CONTEXT: COPY tab2, line 1 2022-11-23 12:40:56.891 AEDT [3233] LOG: background worker "logical replication worker" (PID 23405) exited with exit code 1 2022-11-23 12:40:56.902 AEDT [23403] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has finished ... ~~ Alternatively, you can use some SQL query to discover which tables of the subscription had attained a READY state. The READY state (denoted by 'r') means that the initial COPY was completed ok. The table replication state is found in the 'srsubstate' column. See [3] e.g. (the initial copy is all good) test_sub=# select sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s where ut.relid = sr.srrelid and s.oid=sr.srsubid; srsubid | srrelid | subname | relname | srsubstate -+-+-+-+ 16418 | 16409 | sub1 | tab1 | r 16418 | 16402 | sub1 | tab2 | r (2 rows) e.g. (where it has a conflict in table tab2, so it did not get to READY state) test_sub=# select sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s where ut.relid = sr.srrelid and s.oid=sr.srsubid;2022-11-23 12:41:37.686 AEDT [24501] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has started 2022-11-23 12:41:37.774 AEDT [24501] ERROR: duplicate key value violates unique constraint "tab2_pkey" 2022-11-23 12:41:37.774 AEDT [24501] DETAIL: Key (id)=(1) already exists. 2022-11-23 12:41:3
Re: Logical replication missing information
On Fri, Nov 18, 2022 at 4:50 AM PG Doc comments form wrote: > > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/15/logical-replication-row-filter.html > Description: Hi, FYI - I have forwarded this post to the hacker's list, where I think it will receive more attention. I am not sure why that (above) page was cited -- the section "31.3 Row Filters" is specifically about row filtering, whereas the items you reported seem unrelated to row filters, but are generic for all Logical Replication. > > There are several things missing here and some of them I found to be highly > important: > 1. How can I find why a logical replication failed. Currently I only can see > it "does nothing" in pg_stat_subscriptions. There should be logs reporting any replication conflicts etc. See [1] for example logs. See also the answer for #2 below. > 2. In case of copying the existing data: how can I find which tables or > partitions were processed and which are on the processing queue (while > monitoring I have observed no specific order or rule). There is no predictable processing queue or order - The initial tablesyncs might be happening in multiple asynchronous processes according to the GUC max_sync_workers_per_subscription [2]. Below I show examples of replicating two tables (tab1 and tab2). ~~ >From the logs you should see which table syncs have completed OK: e.g. (the initial copy is all good) test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost dbname=test_pub' PUBLICATION pub1; NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION test_sub=# 2022-11-23 12:23:18.501 AEDT [27961] LOG: logical replication apply worker for subscription "sub1" has started 2022-11-23 12:23:18.513 AEDT [27963] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has started 2022-11-23 12:23:18.524 AEDT [27965] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has started 2022-11-23 12:23:18.593 AEDT [27963] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has finished 2022-11-23 12:23:18.611 AEDT [27965] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has finished e.g. (where there is conflict in table tab2) test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost dbname=test_pub' PUBLICATION pub1; NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION test_sub=# 2022-11-23 12:40:56.794 AEDT [23401] LOG: logical replication apply worker for subscription "sub1" has started 2022-11-23 12:40:56.808 AEDT [23403] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has started 2022-11-23 12:40:56.819 AEDT [23405] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has started 2022-11-23 12:40:56.890 AEDT [23405] ERROR: duplicate key value violates unique constraint "tab2_pkey" 2022-11-23 12:40:56.890 AEDT [23405] DETAIL: Key (id)=(1) already exists. 2022-11-23 12:40:56.890 AEDT [23405] CONTEXT: COPY tab2, line 1 2022-11-23 12:40:56.891 AEDT [3233] LOG: background worker "logical replication worker" (PID 23405) exited with exit code 1 2022-11-23 12:40:56.902 AEDT [23403] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has finished ... ~~ Alternatively, you can use some SQL query to discover which tables of the subscription had attained a READY state. The READY state (denoted by 'r') means that the initial COPY was completed ok. The table replication state is found in the 'srsubstate' column. See [3] e.g. (the initial copy is all good) test_sub=# select sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s where ut.relid = sr.srrelid and s.oid=sr.srsubid; srsubid | srrelid | subname | relname | srsubstate -+-+-+-+ 16418 | 16409 | sub1| tab1| r 16418 | 16402 | sub1| tab2| r (2 rows) e.g. (where it has a conflict in table tab2, so it did not get to READY state) test_sub=# select sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s where ut.relid = sr.srrelid and s.oid=sr.srsubid;2022-11-23 12:41:37.686 AEDT [24501] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has started 2022-11-23 12:41:37.774 AEDT [24501] ERROR: duplicate key value violates unique constraint "tab2_pkey" 2022-11-23 12:41:37.774 AEDT [24501] DETAIL: Key (id)=(1) already exists. 2022-11-23 12:41:37.774 AEDT [24501] CONTEXT: COPY tab2, line 1 2022-11-23 12:41:37.775 AEDT [3233] LOG: background worker "logical replication worker" (PID 24501) exited with exit code 1 srsubid | srrelid | sub