Re: Logical replication missing information

2022-11-23 Thread Boboc Cristi
Hello!Thank you very much, the information you just sent to me is very, very 
valuable!
Best regards, Cristi Boboc 

On Wednesday, November 23, 2022 at 04:45:09 AM GMT+2, Peter Smith 
 wrote:  
 
 On Fri, Nov 18, 2022 at 4:50 AM PG Doc comments form
 wrote:
>
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/15/logical-replication-row-filter.html
> Description:

Hi,

FYI - I have forwarded this post to the hacker's list, where I think
it will receive more attention.

I am not sure why that (above) page was cited -- the section "31.3 Row
Filters" is specifically about row filtering, whereas the items you
reported seem unrelated to row filters, but are generic for all
Logical Replication.

>
> There are several things missing here and some of them I found to be highly
> important:
> 1. How can I find why a logical replication failed. Currently I only can see
> it "does nothing" in pg_stat_subscriptions.

There should be logs reporting any replication conflicts etc. See [1]
for example logs. See also the answer for #2 below.

> 2. In case of copying the existing data: how can I find which tables or
> partitions were processed and which are on the processing queue (while
> monitoring I have observed no specific order or rule).

There is no predictable processing queue or order - The initial
tablesyncs might be happening in multiple asynchronous processes
according to the GUC max_sync_workers_per_subscription [2].

Below I show examples of replicating two tables (tab1 and tab2).

~~

>From the logs you should see which table syncs have completed OK:

e.g. (the initial copy is all good)
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION pub1;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2022-11-23 12:23:18.501 AEDT [27961] LOG:  logical
replication apply worker for subscription "sub1" has started
2022-11-23 12:23:18.513 AEDT [27963] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
started
2022-11-23 12:23:18.524 AEDT [27965] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:23:18.593 AEDT [27963] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
finished
2022-11-23 12:23:18.611 AEDT [27965] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
finished

e.g. (where there is conflict in table tab2)
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION pub1;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2022-11-23 12:40:56.794 AEDT [23401] LOG:  logical
replication apply worker for subscription "sub1" has started
2022-11-23 12:40:56.808 AEDT [23403] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
started
2022-11-23 12:40:56.819 AEDT [23405] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:40:56.890 AEDT [23405] ERROR:  duplicate key value
violates unique constraint "tab2_pkey"
2022-11-23 12:40:56.890 AEDT [23405] DETAIL:  Key (id)=(1) already exists.
2022-11-23 12:40:56.890 AEDT [23405] CONTEXT:  COPY tab2, line 1
2022-11-23 12:40:56.891 AEDT [3233] LOG:  background worker "logical
replication worker" (PID 23405) exited with exit code 1
2022-11-23 12:40:56.902 AEDT [23403] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
finished
...

~~

Alternatively, you can use some SQL query to discover which tables of
the subscription had attained a READY state. The READY state (denoted
by 'r') means that the initial COPY was completed ok. The table
replication state is found in the 'srsubstate' column. See [3]

e.g. (the initial copy is all good)
test_sub=# select
sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from
pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s
where ut.relid = sr.srrelid and s.oid=sr.srsubid;
 srsubid | srrelid | subname | relname | srsubstate
-+-+-+-+
  16418 |  16409 | sub1    | tab1    | r
  16418 |  16402 | sub1    | tab2    | r
(2 rows)

e.g. (where it has a conflict in table tab2, so it did not get to READY state)
test_sub=# select
sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from
pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s
where ut.relid = sr.srrelid and s.oid=sr.srsubid;2022-11-23
12:41:37.686 AEDT [24501] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:41:37.774 AEDT [24501] ERROR:  duplicate key value
violates unique constraint "tab2_pkey"
2022-11-23 12:41:37.774 AEDT [24501] DETAIL:  Key (id)=(1) already exists.
2022-11-23 

Re: Logical replication missing information

2022-11-22 Thread Peter Smith
On Fri, Nov 18, 2022 at 4:50 AM PG Doc comments form
 wrote:
>
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/15/logical-replication-row-filter.html
> Description:

Hi,

FYI - I have forwarded this post to the hacker's list, where I think
it will receive more attention.

I am not sure why that (above) page was cited -- the section "31.3 Row
Filters" is specifically about row filtering, whereas the items you
reported seem unrelated to row filters, but are generic for all
Logical Replication.

>
> There are several things missing here and some of them I found to be highly
> important:
> 1. How can I find why a logical replication failed. Currently I only can see
> it "does nothing" in pg_stat_subscriptions.

There should be logs reporting any replication conflicts etc. See [1]
for example logs. See also the answer for #2 below.

> 2. In case of copying the existing data: how can I find which tables or
> partitions were processed and which are on the processing queue (while
> monitoring I have observed no specific order or rule).

There is no predictable processing queue or order - The initial
tablesyncs might be happening in multiple asynchronous processes
according to the GUC max_sync_workers_per_subscription [2].

Below I show examples of replicating two tables (tab1 and tab2).

~~

>From the logs you should see which table syncs have completed OK:

e.g. (the initial copy is all good)
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION pub1;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2022-11-23 12:23:18.501 AEDT [27961] LOG:  logical
replication apply worker for subscription "sub1" has started
2022-11-23 12:23:18.513 AEDT [27963] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
started
2022-11-23 12:23:18.524 AEDT [27965] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:23:18.593 AEDT [27963] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
finished
2022-11-23 12:23:18.611 AEDT [27965] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
finished

e.g. (where there is conflict in table tab2)
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION pub1;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2022-11-23 12:40:56.794 AEDT [23401] LOG:  logical
replication apply worker for subscription "sub1" has started
2022-11-23 12:40:56.808 AEDT [23403] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
started
2022-11-23 12:40:56.819 AEDT [23405] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:40:56.890 AEDT [23405] ERROR:  duplicate key value
violates unique constraint "tab2_pkey"
2022-11-23 12:40:56.890 AEDT [23405] DETAIL:  Key (id)=(1) already exists.
2022-11-23 12:40:56.890 AEDT [23405] CONTEXT:  COPY tab2, line 1
2022-11-23 12:40:56.891 AEDT [3233] LOG:  background worker "logical
replication worker" (PID 23405) exited with exit code 1
2022-11-23 12:40:56.902 AEDT [23403] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
finished
...

~~

Alternatively, you can use some SQL query to discover which tables of
the subscription had attained a READY state. The READY state (denoted
by 'r') means that the initial COPY was completed ok. The table
replication state is found in the 'srsubstate' column. See [3]

e.g. (the initial copy is all good)
test_sub=# select
sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from
pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s
where ut.relid = sr.srrelid and s.oid=sr.srsubid;
 srsubid | srrelid | subname | relname | srsubstate
-+-+-+-+
   16418 |   16409 | sub1| tab1| r
   16418 |   16402 | sub1| tab2| r
(2 rows)

e.g. (where it has a conflict in table tab2, so it did not get to READY state)
test_sub=# select
sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from
pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s
where ut.relid = sr.srrelid and s.oid=sr.srsubid;2022-11-23
12:41:37.686 AEDT [24501] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:41:37.774 AEDT [24501] ERROR:  duplicate key value
violates unique constraint "tab2_pkey"
2022-11-23 12:41:37.774 AEDT [24501] DETAIL:  Key (id)=(1) already exists.
2022-11-23 12:41:37.774 AEDT [24501] CONTEXT:  COPY tab2, line 1
2022-11-23 12:41:37.775 AEDT [3233] LOG:  background worker "logical
replication worker" (PID 24501) exited with exit code 1

 srsubid | srrelid |