Re: Subscription stuck at initialize state
Hi Vijaykumar Thank you so much for taking out so much of your time to recreate the bug. I checked the max_logical_replication_workers on both the nodes and they are set at 4 at the moment. The reason why it is failing is that there are 3 existing replications on the target node already and when I create this 4th one, it is not able to create that temporary replication worker, needed to copy the initial data, and therefore it never moves ahead of that state. Since these are production databases, I can't restart them during the weekdays, so I will try to change them during the weekend. But I did find the warning message in the target DB logs. "WARNING,53400,"out of logical replication worker slots",,"You might need to increase max_logical_replication_workers.",,,"","logical replication worker" So I am sure this will work. THANK YOU SO MUCH. On Fri, Feb 4, 2022 at 3:04 AM Vijaykumar Jain < vijaykumarjain.git...@gmail.com> wrote: > trimming the email, to avoid noise. > > I spent a lot of time trying multiple options/combinations and finally > managed to replicate your debug output. > this is when i have , > > postgres=# show max_logical_replication_workers; > * max_logical_replication_workers* > - > * 2* > (1 row) > > > on publisher > postgres@controller:~$ psql -p 5001 > psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1)) > Type "help" for help. > > postgres=# create table t_bytea(id int primary key, somebytea bytea); > CREATE TABLE > postgres=# create table t1(id int); > CREATE TABLE > postgres=# create table t2(id int); > CREATE TABLE > postgres=# create table t3(id int); > CREATE TABLE > postgres=# create publication mypub1 for table t1; > CREATE PUBLICATION > postgres=# create publication mypub2 for table t2; > CREATE PUBLICATION > postgres=# create publication mypub3 for table t3; > CREATE PUBLICATION > postgres=# create publication mypub4 for table t3; > CREATE PUBLICATION > postgres=# create publication mypub5 for table t_bytea; > CREATE PUBLICATION > postgres=# insert into t_bytea select x,repeat(repeat('xxx', 100), > 1000)::bytea from generate_series(1, 1000) x; > INSERT 0 1000 > > > > on subscriber > postgres@controller:~$ psql -p 5002 > psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1)) > Type "help" for help. > > postgres=# create table t_bytea(id int primary key, somebytea bytea); > CREATE TABLE > postgres=# create table t1(id int); > CREATE TABLE > postgres=# create table t2(id int); > CREATE TABLE > postgres=# create table t3(id int); > CREATE TABLE > postgres=# create table t4(id int); > CREATE TABLE > postgres=# create table t5(id int); > CREATE TABLE > postgres=# create subscription mysub1 connection 'port=5001' publication > mypub1; > NOTICE: created replication slot "mysub1" on publisher > CREATE SUBSCRIPTION > postgres=# create subscription mysub2 connection 'port=5001' publication > mypub2; > NOTICE: created replication slot "mysub2" on publisher > CREATE SUBSCRIPTION > postgres=# create subscription mysub3 connection 'port=5001' publication > mypub3; > NOTICE: created replication slot "mysub3" on publisher > CREATE SUBSCRIPTION > postgres=# create subscription mysub4 connection 'port=5001' publication > mypub4; > NOTICE: created replication slot "mysub4" on publisher > CREATE SUBSCRIPTION > postgres=# create subscription mysub5 connection 'port=5001' publication > mypub5; > NOTICE: created replication slot "mysub5" on publisher > CREATE SUBSCRIPTION > *postgres=# select count(1) from t_bytea;* > * count* > *---* > * 0* > *(1 row)* > > postgres=# table pg_subscription_rel; > srsubid | srrelid | srsubstate | srsublsn > -+-++--- >16406 | 16391 | r | 0/1722838 > * 16407 | 16394 | i |* > * 16408 | 16397 | i |* > * 16409 | 16397 | i |* > * 16410 | 16384 | i |* > (5 rows) > > # as expected no data in t_bytea (as it could not get any worker) > postgres=# select count(1) from t_bytea; > count > --- > 0 > (1 row) > > > > but logs clearly state the problem > 2022-02-03 23:18:31.107 IST [3430] LOG: logical replication table > synchronization worker for subscription "mysub1", table "t1" has started > 2022-02-03 23:18:31.138 IST [3430] LOG: logical replication table > synchronization worker for subscription "mysub1", table "t1" has finished > 2022-02-03 23:18:40.730 IST [3433] LOG: logical replication apply worker > for subscription "mysub2" has started > *2022-02-03 23:18:40.737 IST [3433] WARNING: out of logical replication > worker slots* > *2022-02-03 23:18:40.737 IST [3433] HINT: You might need to increase > max_logical_replication_workers.* > *2022-02-03 23:18:45.865 IST [3433] WARNING: out of logical replication > worker slots* > > > *#publisher logs* > 2022-02-03 23:18:31.096 IST [3427] STATEMENT: CREATE_REPLICATION_SLOT > "mysub1" LOGICAL pgoutput NOEXPORT_SNAPSHOT > 2022-02-03 23:18:31.106 IST [3429] LOG: starting logical decoding
Re: Subscription stuck at initialize state
trimming the email, to avoid noise. I spent a lot of time trying multiple options/combinations and finally managed to replicate your debug output. this is when i have , postgres=# show max_logical_replication_workers; * max_logical_replication_workers* - * 2* (1 row) on publisher postgres@controller:~$ psql -p 5001 psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1)) Type "help" for help. postgres=# create table t_bytea(id int primary key, somebytea bytea); CREATE TABLE postgres=# create table t1(id int); CREATE TABLE postgres=# create table t2(id int); CREATE TABLE postgres=# create table t3(id int); CREATE TABLE postgres=# create publication mypub1 for table t1; CREATE PUBLICATION postgres=# create publication mypub2 for table t2; CREATE PUBLICATION postgres=# create publication mypub3 for table t3; CREATE PUBLICATION postgres=# create publication mypub4 for table t3; CREATE PUBLICATION postgres=# create publication mypub5 for table t_bytea; CREATE PUBLICATION postgres=# insert into t_bytea select x,repeat(repeat('xxx', 100), 1000)::bytea from generate_series(1, 1000) x; INSERT 0 1000 on subscriber postgres@controller:~$ psql -p 5002 psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1)) Type "help" for help. postgres=# create table t_bytea(id int primary key, somebytea bytea); CREATE TABLE postgres=# create table t1(id int); CREATE TABLE postgres=# create table t2(id int); CREATE TABLE postgres=# create table t3(id int); CREATE TABLE postgres=# create table t4(id int); CREATE TABLE postgres=# create table t5(id int); CREATE TABLE postgres=# create subscription mysub1 connection 'port=5001' publication mypub1; NOTICE: created replication slot "mysub1" on publisher CREATE SUBSCRIPTION postgres=# create subscription mysub2 connection 'port=5001' publication mypub2; NOTICE: created replication slot "mysub2" on publisher CREATE SUBSCRIPTION postgres=# create subscription mysub3 connection 'port=5001' publication mypub3; NOTICE: created replication slot "mysub3" on publisher CREATE SUBSCRIPTION postgres=# create subscription mysub4 connection 'port=5001' publication mypub4; NOTICE: created replication slot "mysub4" on publisher CREATE SUBSCRIPTION postgres=# create subscription mysub5 connection 'port=5001' publication mypub5; NOTICE: created replication slot "mysub5" on publisher CREATE SUBSCRIPTION *postgres=# select count(1) from t_bytea;* * count* *---* * 0* *(1 row)* postgres=# table pg_subscription_rel; srsubid | srrelid | srsubstate | srsublsn -+-++--- 16406 | 16391 | r | 0/1722838 * 16407 | 16394 | i |* * 16408 | 16397 | i |* * 16409 | 16397 | i |* * 16410 | 16384 | i |* (5 rows) # as expected no data in t_bytea (as it could not get any worker) postgres=# select count(1) from t_bytea; count --- 0 (1 row) but logs clearly state the problem 2022-02-03 23:18:31.107 IST [3430] LOG: logical replication table synchronization worker for subscription "mysub1", table "t1" has started 2022-02-03 23:18:31.138 IST [3430] LOG: logical replication table synchronization worker for subscription "mysub1", table "t1" has finished 2022-02-03 23:18:40.730 IST [3433] LOG: logical replication apply worker for subscription "mysub2" has started *2022-02-03 23:18:40.737 IST [3433] WARNING: out of logical replication worker slots* *2022-02-03 23:18:40.737 IST [3433] HINT: You might need to increase max_logical_replication_workers.* *2022-02-03 23:18:45.865 IST [3433] WARNING: out of logical replication worker slots* *#publisher logs* 2022-02-03 23:18:31.096 IST [3427] STATEMENT: CREATE_REPLICATION_SLOT "mysub1" LOGICAL pgoutput NOEXPORT_SNAPSHOT 2022-02-03 23:18:31.106 IST [3429] LOG: starting logical decoding for slot "mysub1" 2022-02-03 23:18:31.106 IST [3429] DETAIL: Streaming transactions committing after 0/1722800, reading WAL from 0/17227C8. 2022-02-03 23:18:31.106 IST [3429] STATEMENT: START_REPLICATION SLOT "mysub1" LOGICAL 0/0 (proto_version '2', publication_names '"mypub1"') 2022-02-03 23:18:31.106 IST [3429] LOG: logical decoding found consistent point at 0/17227C8 2022-02-03 23:18:31.106 IST [3429] DETAIL: There are no running transactions. 2022-02-03 23:18:31.106 IST [3429] STATEMENT: START_REPLICATION SLOT "mysub1" LOGICAL 0/0 (proto_version '2', publication_names '"mypub1"') 2022-02-03 23:18:31.129 IST [3431] LOG: logical decoding found consistent point at 0/1722800 2022-02-03 23:18:31.129 IST [3431] DETAIL: There are no running transactions. 2022-02-03 23:18:31.129 IST [3431] STATEMENT: CREATE_REPLICATION_SLOT "pg_16406_sync_16391_7060540926182153512" LOGICAL pgoutput USE_SNAPSHOT 2022-02-03 23:18:31.135 IST [3431] LOG: starting logical decoding for slot "pg_16406_sync_16391_7060540926182153512" 2022-02-03 23:18:31.135 IST [3431] DETAIL: Streaming transactions committing after 0/1722838, reading WAL from 0/1722800. 2022-02-03 23:18:31.135 IST [
Re: Subscription stuck at initialize state
* *Basic Connectivity from target(subscriber) to source (Publisher)-* postgres@targethost:~$ psql -h sourcedb-vip --port=5432 --user=dba -d sourcedb Password for user dba: psql (13.3) Type "help" for help. sourcedb=# * *Create Publication* sourcedb=# CREATE PUBLICATION omx_archive FOR TABLE sample_table1, sample_table2, sample_table3 ; * *Create subscription* - *Please see the table sample_tables exists on both and have the same structure* targetdb=# CREATE SUBSCRIPTION sub_omx_archive_tci CONNECTION 'host=sourcedb-vip port=5432 user=dba dbname=sourcedb password=x' PUBLICATION omx_archive with (enabled =true, create_slot = true, copy_data = true); * *Target DB replication origin status having 0/0 remote_lsb* targetdb=# select * from pg_replication_origin_status; local_id | external_id | remote_lsn | local_lsn --+---++ 1 | *pg_3615756798 *| 9E98/9F62A7A8 | 15197/E0CA940 2 | *pg_3616584803 *| 0/0| 0/0 (2 rows) *postgres@targethost:~$ ps -ef | grep subscription* postgres 30997 238163 0 18:07 pts/000:00:00 grep --color=auto subscription postgres 168277 124365 1 13:46 ?00:04:23 postgres: logical replication worker for subscription *3616584803* postgres 178901 124365 0 05:08 ?00:00:08 postgres: logical replication worker for subscription *3615756798* * *Strace for the subscription 3616584803 (PID 168277) - the one with the problem. The other one is okay and hence not shown* postgres@targethost:~$ *strace -p 168277 * strace: Process 168277 attached epoll_wait(21, [], 1, 831) = 0 close(21) = 0 sendto(8, "d\0\0\0&r\0\0\236\230\343\275V\370\0\0\236\230\343\275V\370\0\0\236\230\343\275V\370\0\2"..., 39, MSG_NOSIGNAL, NULL, 0) = 39 recvfrom(8, 0x21f60f0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) epoll_create1(EPOLL_CLOEXEC)= 21 epoll_ctl(21, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=34804168, u64=34804168}}) = 0 epoll_ctl(21, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=34804192, u64=34804192}}) = 0 epoll_ctl(21, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=34804216, u64=34804216}}) = 0 epoll_wait(21, [], 1, 1000) = 0 close(21) = 0 recvfrom(8, 0x21f60f0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) epoll_create1(EPOLL_CLOEXEC)= 21 epoll_ctl(21, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=34804168, u64=34804168}}) = 0 epoll_ctl(21, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=34804192, u64=34804192}}) = 0 epoll_ctl(21, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=34804216, u64=34804216}}) = 0 epoll_wait(21, [], 1, 1000) = 0 close(21) = 0 sendto(8, "d\0\0\0&r\0\0\236\230\343\275V\370\0\0\236\230\343\275V\370\0\0\236\230\343\275V\370\0\2"..., 39, MSG_NOSIGNAL, NULL, 0) = 39 recvfrom(8, 0x21f60f0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) epoll_create1(EPOLL_CLOEXEC)= 21 epoll_ctl(21, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=34804168, u64=34804168}}) = 0 epoll_ctl(21, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=34804192, u64=34804192}}) = 0 epoll_ctl(21, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=34804216, u64=34804216}}) = 0 epoll_wait(21, [], 1, 1000) = 0 close(21) = 0 recvfrom(8, 0x21f60f0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) epoll_create1(EPOLL_CLOEXEC)= 21 epoll_ctl(21, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=34804168, u64=34804168}}) = 0 epoll_ctl(21, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=34804192, u64=34804192}}) = 0 epoll_ctl(21, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=34804216, u64=34804216}}) = 0 epoll_wait(21, [], 1, 1000) = 0 close(21) = 0 sendto(8, "d\0\0\0&r\0\0\236\230\343\275V\370\0\0\236\230\343\275V\370\0\0\236\230\343\275V\370\0\2"..., 39, MSG_NOSIGNAL, NULL, 0) = 39 recvfrom(8, 0x21f60f0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) sendto(9, "<133>Feb 3 18:08:21 postgres[16"..., 167, MSG_NOSIGNAL, NULL, 0) = 167 sendto(9, "<133>Feb 3 18:08:21 postgres[16"..., 184, MSG_NOSIGNAL, NULL, 0) = 184 write(2, "\0\0\371\0U\221\2\0T2022-02-03 18:08:21.440"..., 258) = 258 sendto(9, "<133>Feb 3 18:08:21 postgres[16"..., 167, MSG_NOSIGNAL, NULL, 0) = 167 sendto(9, "<133>Feb 3 18:08:21 postgres[16"..., 184, MSG_NOSIGNAL, NULL, 0) = 184 write(2, "\0\0\371\0U\221\2\0T2022-02-03 18:08:21.440"..., 258) = 258 sendto(9, "<133>Feb 3 18:08:21 postgres[16"..., 167, MSG_NOSIGNAL, NULL, 0) = 167 sendto(9, "<133>Feb 3 18:08:21 postgres[16"..., 184, MSG_NOSIGNAL, NULL, 0) = 184 write(2, "\0\0\371\0U\221\2\0T2022-02-03 18:08:21.440"..., 258) = 258 sendto(9, "<133>Feb 3 18:08:21 postgres[16"..., 167, MSG_NOSIG
Re: Subscription stuck at initialize state
On Thu, 3 Feb 2022 at 12:44, Abhishek Bhola wrote: > Hi Vijaykumar, > > I checked the pg_subscription_rel and all the tables in that subscription > are in the state - i (initialize). > I also tried creating a new publication on the source DB with just one > table and tried to subscribe it, it doesn't work either. > However, when I try to subscribe it on some other DB than the one > mentioned above, it works. > By which I am deducing that publication and the source DB are okay, the > problem is on the target DB and it's subscription. > Maybe I will have to restart the DB as a last resort, but I am not sure if > that will solve the problem either. > > its a very verbose mail, so if it noisy, kindly ignore. else, can you check basic connectivity from the subscriber to publisher using psql and run a simple query ? can you share your "create publication" and "create subscription" commands/statements too please? i am attaching a general logical replication setup on a single server and put some scenarios where replication breaks and how to monitor and how to resume. and how that is monitored. postgres@controller:~$ tail db1/postgresql.conf db2/postgresql.conf ==> db1/postgresql.conf <== # Add settings for extensions here wal_level=logical archive_mode = on archive_command = '/bin/true' max_wal_size = 48MB min_wal_size = 32MB shared_buffers = 32MB port = 5001 max_logical_replication_workers = 10 ==> db2/postgresql.conf <== # Add settings for extensions here wal_level=logical archive_mode = on archive_command = '/bin/true' max_wal_size = 48MB min_wal_size = 32MB shared_buffers = 32MB port = 5002 max_logical_replication_workers = 10 postgres@controller:~$ pg_ctl -D db1 -l db1.log start waiting for server to start done server started postgres@controller:~$ pg_ctl -D db2 -l db2.log start waiting for server to start done server started postgres@controller:~$ psql -p 5001 psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1)) Type "help" for help. postgres=# \x Expanded display is on. postgres=# create table t1(id int primary key); CREATE TABLE postgres=# create table t2(id int); -- this will throw error on delete, no replica identity CREATE TABLE postgres=# insert into t1 select x from generate_series(1, 100) x; INSERT 0 100 postgres=# insert into t2 select x from generate_series(1, 100) x; INSERT 0 100 postgres=# checkpoint; CHECKPOINT postgres=# \q postgres@controller:~$ psql -p 5002 psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1)) Type "help" for help. postgres=# create table t1(id int primary key); CREATE TABLE postgres=# create table t2(id int); CREATE TABLE postgres=# \q postgres@controller:~$ ps aux | grep -i postgres: postgres1116 0.0 0.4 113632 8232 ?Ss 13:24 0:00 postgres: checkpointer postgres1117 0.0 0.2 113496 5868 ?Ss 13:24 0:00 postgres: background writer postgres1118 0.0 0.3 113496 6964 ?Ss 13:24 0:00 postgres: walwriter postgres1119 0.0 0.4 114032 8432 ?Ss 13:24 0:00 postgres: autovacuum launcher postgres1120 0.0 0.2 113496 4132 ?Ss 13:24 0:00 postgres: archiver postgres1121 0.0 0.2 72112 4896 ?Ss 13:24 0:00 postgres: stats collector postgres1122 0.0 0.3 113928 6732 ?Ss 13:24 0:00 postgres: logical replication launcher postgres1128 0.0 0.3 113496 5956 ?Ss 13:24 0:00 postgres: checkpointer postgres1129 0.0 0.3 113496 5916 ?Ss 13:24 0:00 postgres: background writer postgres1130 0.0 0.3 113496 6952 ?Ss 13:24 0:00 postgres: walwriter postgres1131 0.0 0.4 114032 8384 ?Ss 13:24 0:00 postgres: autovacuum launcher postgres1132 0.0 0.2 113496 4160 ?Ss 13:24 0:00 postgres: archiver postgres1133 0.0 0.2 72112 4868 ?Ss 13:24 0:00 postgres: stats collector postgres1134 0.0 0.3 113928 6804 ?Ss 13:24 0:00 postgres: logical replication launcher postgres1186 0.0 0.0 8164 724 pts/0S+ 13:26 0:00 grep -i postgres: postgres@controller:~$ psql -p 5001 psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1)) Type "help" for help. postgres=# create publication mypub for all tables postgres=# create publication mypub for all tables with (publish='insert,update,delete,truncate'); CREATE PUBLICATION postgres=# \q postgres@controller:~$ psql -p 5002 psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1)) Type "help" for help. postgres=# create subscription mysub connection 'port=5001' publication mypub; NOTICE: created replication slot "mysub" on publisher CREATE SUBSCRIPTION postgres=# select count(1) from t1; count --- 100 (1 row) postgres=# select count(1) from t2; count --- 100 (1 row) postgres=# \q postgres@controller:~$ ps aux | grep postgres: # strace the below pids for movement postgres1195 0.0 0.7 114800 14744 ?Ss 13:27 0:00 postgres: logical replication worker for subscription 16392 postgres1196 0.0 0.7 114728 14676 ?S
Re: Subscription stuck at initialize state
Hi Vijaykumar, I checked the pg_subscription_rel and all the tables in that subscription are in the state - i (initialize). I also tried creating a new publication on the source DB with just one table and tried to subscribe it, it doesn't work either. However, when I try to subscribe it on some other DB than the one mentioned above, it works. By which I am deducing that publication and the source DB are okay, the problem is on the target DB and it's subscription. Maybe I will have to restart the DB as a last resort, but I am not sure if that will solve the problem either. On Thu, Feb 3, 2022 at 3:33 PM Vijaykumar Jain < vijaykumarjain.git...@gmail.com> wrote: > > > On Thu, Feb 3, 2022, 10:32 AM Abhishek Bhola < > abhishek.bh...@japannext.co.jp> wrote: > >> So far I figured out that the problem is on the subscriber side. >> The same publication, when subscribed to on another DB, works fine. >> Also noticed that the remote_lsn value on the target DB is still 0/0. >> >> targetdb=# select * from pg_replication_origin_status ; >> local_id | external_id | remote_lsn | local_lsn >> --+---++ >> 1 | pg_3615756798 | 9E96/37152C80 | 1518C/9014BD98 >> 2 | pg_3616584803 | 0/0 | 0/0 >> (2 rows) >> >> Would really appreciate it if someone could help me with this. >> >> >> >> On Thu, Feb 3, 2022 at 9:53 AM Abhishek Bhola < >> abhishek.bh...@japannext.co.jp> wrote: >> >>> Update: Tried dropping and recreating publication on the source DB and >>> subscription still won't move ahead. >>> Not sure what I am missing. >>> >>> On Wed, Feb 2, 2022 at 1:20 PM Abhishek Bhola < >>> abhishek.bh...@japannext.co.jp> wrote: >>> The only statement I see on the target DB log is "logical replication apply worker for subscription ""sub_omx_archive_tci"" has started","" I don't see the logical replication table synchronization worker started for any of the tables in this subscription as I see in the other one. >>> > > This might help you track subscription state along with > pg_stat_subscription. > https://www.postgresql.org/docs/10/catalog-pg-subscription-rel.html > > It might be a huge table being 'copy' ed and writes would be slow due to > too many indexes etc. > > https://postgresteam.slack.com/files/UQMFAU01W/F02V69YK59P/untitled.sql > > also if you see nothing is moving, you may want to login to the server and > strace the worker processes pid to see if you see any movement at all or it > keeps looping on some errors. > > -- _This correspondence (including any attachments) is for the intended recipient(s) only. It may contain confidential or privileged information or both. No confidentiality or privilege is waived or lost by any mis-transmission. If you receive this correspondence by mistake, please contact the sender immediately, delete this correspondence (and all attachments) and destroy any hard copies. You must not use, disclose, copy, distribute or rely on any part of this correspondence (including any attachments) if you are not the intended recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。本情報の閲覧、発信または本情報に基づくいかなる行為も明確に禁止されていることをご了承ください。_
Re: Subscription stuck at initialize state
On Thu, Feb 3, 2022, 10:32 AM Abhishek Bhola wrote: > So far I figured out that the problem is on the subscriber side. > The same publication, when subscribed to on another DB, works fine. > Also noticed that the remote_lsn value on the target DB is still 0/0. > > targetdb=# select * from pg_replication_origin_status ; > local_id | external_id | remote_lsn | local_lsn > --+---++ > 1 | pg_3615756798 | 9E96/37152C80 | 1518C/9014BD98 > 2 | pg_3616584803 | 0/0 | 0/0 > (2 rows) > > Would really appreciate it if someone could help me with this. > > > > On Thu, Feb 3, 2022 at 9:53 AM Abhishek Bhola < > abhishek.bh...@japannext.co.jp> wrote: > >> Update: Tried dropping and recreating publication on the source DB and >> subscription still won't move ahead. >> Not sure what I am missing. >> >> On Wed, Feb 2, 2022 at 1:20 PM Abhishek Bhola < >> abhishek.bh...@japannext.co.jp> wrote: >> >>> The only statement I see on the target DB log is >>> "logical replication apply worker for subscription >>> ""sub_omx_archive_tci"" has started","" >>> >>> I don't see the logical replication table synchronization worker started >>> for any of the tables in this subscription as I see in the other one. >>> >> This might help you track subscription state along with pg_stat_subscription. https://www.postgresql.org/docs/10/catalog-pg-subscription-rel.html It might be a huge table being 'copy' ed and writes would be slow due to too many indexes etc. https://postgresteam.slack.com/files/UQMFAU01W/F02V69YK59P/untitled.sql also if you see nothing is moving, you may want to login to the server and strace the worker processes pid to see if you see any movement at all or it keeps looping on some errors.
Re: Subscription stuck at initialize state
So far I figured out that the problem is on the subscriber side. The same publication, when subscribed to on another DB, works fine. Also noticed that the remote_lsn value on the target DB is still 0/0. targetdb=# select * from pg_replication_origin_status ; local_id | external_id | remote_lsn | local_lsn --+---++ 1 | pg_3615756798 | 9E96/37152C80 | 1518C/9014BD98 2 | pg_3616584803 | 0/0 | 0/0 (2 rows) Would really appreciate it if someone could help me with this. On Thu, Feb 3, 2022 at 9:53 AM Abhishek Bhola < abhishek.bh...@japannext.co.jp> wrote: > Update: Tried dropping and recreating publication on the source DB and > subscription still won't move ahead. > Not sure what I am missing. > > On Wed, Feb 2, 2022 at 1:20 PM Abhishek Bhola < > abhishek.bh...@japannext.co.jp> wrote: > >> The only statement I see on the target DB log is >> "logical replication apply worker for subscription >> ""sub_omx_archive_tci"" has started","" >> >> I don't see the logical replication table synchronization worker started >> for any of the tables in this subscription as I see in the other one. >> >> Is there anything in particular that I should be looking for in the log >> files? >> >> On Wed, Feb 2, 2022 at 11:31 AM Steve Baldwin >> wrote: >> >>> Hi Abishek, >>> >>> Have you checked the subscriber and publisher database log files to see >>> if there is a problem blocking the subscription? For example, a subscribed >>> table missing a column that exists in the publisher. >>> >>> Cheers, >>> >>> Steve >>> >>> On Wed, Feb 2, 2022 at 1:26 PM Abhishek Bhola < >>> abhishek.bh...@japannext.co.jp> wrote: >>> I have 2 sets of publication/subscription between my 2 DBs. One of them is working fine and the other one is stuck at initializing state for all the tables. sourcedb=# select * from pg_publication; oid |pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate | pubviaroot ---++--+--+---+---+---+-+ 19585 | omx_archive_big_tables |16420 | f| t | t | t | t | f 19584 | omx_archive|16420 | f| t | t | t | t | f targetdb=# select * from pg_subscription_rel ; srsubid | srrelid | srsubstate | srsublsn +++--- 3615804367 | 3322052690 | i | 3615804367 | 3322052570 | i | 3615756798 | 3322051793 | r | 9E7E/BF5F82D8 3615804367 | 3322052133 | i | 3615804367 | 3322054214 | i | 3615756798 | 3322051802 | r | 9E7E/C149BBD8 3615804367 | 3322051757 | i | targetdb=# select * from pg_subscription; -[ RECORD 1 ]---+-- oid | 3615756798 subdbid | 16589 subname | sub_omx_archive_big_tables_tci subowner| 16420 subenabled | t subconninfo | xxx subslotname | sub_omx_archive_big_tables_tci subsynccommit | off subpublications | {omx_archive_big_tables} -[ RECORD 2 ]---+-- oid | 3615804367 subdbid | 16589 subname | sub_omx_archive_tci subowner| 16420 subenabled | t subconninfo | subslotname | sub_omx_archive_tci subsynccommit | off subpublications | {omx_archive} I have dropped the subscription, recreated and refreshed it many times, but it won't move from the initializing phase. Any suggestions on how to start copying the data again, other than dropping the publication and re-creating it? *This correspondence (including any attachments) is for the intended recipient(s) only. It may contain confidential or privileged information or both. No confidentiality or privilege is waived or lost by any mis-transmission. If you receive this correspondence by mistake, please contact the sender immediately, delete this correspondence (and all attachments) and destroy any hard copies. You must not use, disclose, copy, distribute or rely on any part of this correspondence (including any attachments) if you are not the intended recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。
Re: Subscription stuck at initialize state
Update: Tried dropping and recreating publication on the source DB and subscription still won't move ahead. Not sure what I am missing. On Wed, Feb 2, 2022 at 1:20 PM Abhishek Bhola < abhishek.bh...@japannext.co.jp> wrote: > The only statement I see on the target DB log is > "logical replication apply worker for subscription ""sub_omx_archive_tci"" > has started","" > > I don't see the logical replication table synchronization worker started > for any of the tables in this subscription as I see in the other one. > > Is there anything in particular that I should be looking for in the log > files? > > On Wed, Feb 2, 2022 at 11:31 AM Steve Baldwin > wrote: > >> Hi Abishek, >> >> Have you checked the subscriber and publisher database log files to see >> if there is a problem blocking the subscription? For example, a subscribed >> table missing a column that exists in the publisher. >> >> Cheers, >> >> Steve >> >> On Wed, Feb 2, 2022 at 1:26 PM Abhishek Bhola < >> abhishek.bh...@japannext.co.jp> wrote: >> >>> I have 2 sets of publication/subscription between my 2 DBs. >>> One of them is working fine and the other one is stuck at initializing >>> state for all the tables. >>> >>> sourcedb=# select * from pg_publication; >>> oid |pubname | pubowner | puballtables | pubinsert | >>> pubupdate | pubdelete | pubtruncate | pubviaroot >>> >>> ---++--+--+---+---+---+-+ >>> 19585 | omx_archive_big_tables |16420 | f| t | >>> t | t | t | f >>> 19584 | omx_archive|16420 | f| t >>> | t | t | t | f >>> >>> >>> targetdb=# select * from pg_subscription_rel ; >>> srsubid | srrelid | srsubstate | srsublsn >>> +++--- >>> 3615804367 | 3322052690 | i | >>> 3615804367 | 3322052570 | i | >>> 3615756798 | 3322051793 | r | 9E7E/BF5F82D8 >>> 3615804367 | 3322052133 | i | >>> 3615804367 | 3322054214 | i | >>> 3615756798 | 3322051802 | r | 9E7E/C149BBD8 >>> 3615804367 | 3322051757 | i | >>> >>> targetdb=# select * from pg_subscription; >>> -[ RECORD 1 >>> ]---+-- >>> oid | 3615756798 >>> subdbid | 16589 >>> subname | sub_omx_archive_big_tables_tci >>> subowner| 16420 >>> subenabled | t >>> subconninfo | xxx >>> subslotname | sub_omx_archive_big_tables_tci >>> subsynccommit | off >>> subpublications | {omx_archive_big_tables} >>> -[ RECORD 2 >>> ]---+-- >>> oid | 3615804367 >>> subdbid | 16589 >>> subname | sub_omx_archive_tci >>> subowner| 16420 >>> subenabled | t >>> subconninfo | >>> subslotname | sub_omx_archive_tci >>> subsynccommit | off >>> subpublications | {omx_archive} >>> >>> I have dropped the subscription, recreated and refreshed it many times, >>> but it won't move from the initializing phase. >>> >>> Any suggestions on how to start copying the data again, other than >>> dropping the publication and re-creating it? >>> >>> >>> *This correspondence (including any attachments) is for the intended >>> recipient(s) only. It may contain confidential or privileged information or >>> both. No confidentiality or privilege is waived or lost by any >>> mis-transmission. If you receive this correspondence by mistake, please >>> contact the sender immediately, delete this correspondence (and all >>> attachments) and destroy any hard copies. You must not use, disclose, copy, >>> distribute or rely on any part of this correspondence (including any >>> attachments) if you are not the intended >>> recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。本情報の閲覧、発信または本情報に基づくいかなる行為も明確に禁止されていることをご了承ください。* >> >> -- _This correspondence (including any attachments) is for the intended recipient(s) only. It may contain confidential or privileged information or both. No confidentiality or privilege is waived or lost by any mis-transmission. If you receive this correspondence by mistake, please contact the sender immediately, delete this correspondence (and all attachments) and destroy any hard copies. You must not use, disclose, copy, distribute or rely on any part of this correspondence (including any attachments) if you are not the intended recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、
Re: Subscription stuck at initialize state
The only statement I see on the target DB log is "logical replication apply worker for subscription ""sub_omx_archive_tci"" has started","" I don't see the logical replication table synchronization worker started for any of the tables in this subscription as I see in the other one. Is there anything in particular that I should be looking for in the log files? On Wed, Feb 2, 2022 at 11:31 AM Steve Baldwin wrote: > Hi Abishek, > > Have you checked the subscriber and publisher database log files to see if > there is a problem blocking the subscription? For example, a subscribed > table missing a column that exists in the publisher. > > Cheers, > > Steve > > On Wed, Feb 2, 2022 at 1:26 PM Abhishek Bhola < > abhishek.bh...@japannext.co.jp> wrote: > >> I have 2 sets of publication/subscription between my 2 DBs. >> One of them is working fine and the other one is stuck at initializing >> state for all the tables. >> >> sourcedb=# select * from pg_publication; >> oid |pubname | pubowner | puballtables | pubinsert | >> pubupdate | pubdelete | pubtruncate | pubviaroot >> >> ---++--+--+---+---+---+-+ >> 19585 | omx_archive_big_tables |16420 | f| t | t >> | t | t | f >> 19584 | omx_archive|16420 | f| t >> | t | t | t | f >> >> >> targetdb=# select * from pg_subscription_rel ; >> srsubid | srrelid | srsubstate | srsublsn >> +++--- >> 3615804367 | 3322052690 | i | >> 3615804367 | 3322052570 | i | >> 3615756798 | 3322051793 | r | 9E7E/BF5F82D8 >> 3615804367 | 3322052133 | i | >> 3615804367 | 3322054214 | i | >> 3615756798 | 3322051802 | r | 9E7E/C149BBD8 >> 3615804367 | 3322051757 | i | >> >> targetdb=# select * from pg_subscription; >> -[ RECORD 1 >> ]---+-- >> oid | 3615756798 >> subdbid | 16589 >> subname | sub_omx_archive_big_tables_tci >> subowner| 16420 >> subenabled | t >> subconninfo | xxx >> subslotname | sub_omx_archive_big_tables_tci >> subsynccommit | off >> subpublications | {omx_archive_big_tables} >> -[ RECORD 2 >> ]---+-- >> oid | 3615804367 >> subdbid | 16589 >> subname | sub_omx_archive_tci >> subowner| 16420 >> subenabled | t >> subconninfo | >> subslotname | sub_omx_archive_tci >> subsynccommit | off >> subpublications | {omx_archive} >> >> I have dropped the subscription, recreated and refreshed it many times, >> but it won't move from the initializing phase. >> >> Any suggestions on how to start copying the data again, other than >> dropping the publication and re-creating it? >> >> >> *This correspondence (including any attachments) is for the intended >> recipient(s) only. It may contain confidential or privileged information or >> both. No confidentiality or privilege is waived or lost by any >> mis-transmission. If you receive this correspondence by mistake, please >> contact the sender immediately, delete this correspondence (and all >> attachments) and destroy any hard copies. You must not use, disclose, copy, >> distribute or rely on any part of this correspondence (including any >> attachments) if you are not the intended >> recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。本情報の閲覧、発信または本情報に基づくいかなる行為も明確に禁止されていることをご了承ください。* > > -- _This correspondence (including any attachments) is for the intended recipient(s) only. It may contain confidential or privileged information or both. No confidentiality or privilege is waived or lost by any mis-transmission. If you receive this correspondence by mistake, please contact the sender immediately, delete this correspondence (and all attachments) and destroy any hard copies. You must not use, disclose, copy, distribute or rely on any part of this correspondence (including any attachments) if you are not the intended recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。本情報の閲覧、発信または本情報に基づくいかなる行為も明確に禁止されていることをご了承ください。_
Re: Subscription stuck at initialize state
Hi Abishek, Have you checked the subscriber and publisher database log files to see if there is a problem blocking the subscription? For example, a subscribed table missing a column that exists in the publisher. Cheers, Steve On Wed, Feb 2, 2022 at 1:26 PM Abhishek Bhola < abhishek.bh...@japannext.co.jp> wrote: > I have 2 sets of publication/subscription between my 2 DBs. > One of them is working fine and the other one is stuck at initializing > state for all the tables. > > sourcedb=# select * from pg_publication; > oid |pubname | pubowner | puballtables | pubinsert | > pubupdate | pubdelete | pubtruncate | pubviaroot > > ---++--+--+---+---+---+-+ > 19585 | omx_archive_big_tables |16420 | f| t | t > | t | t | f > 19584 | omx_archive|16420 | f| t > | t | t | t | f > > > targetdb=# select * from pg_subscription_rel ; > srsubid | srrelid | srsubstate | srsublsn > +++--- > 3615804367 | 3322052690 | i | > 3615804367 | 3322052570 | i | > 3615756798 | 3322051793 | r | 9E7E/BF5F82D8 > 3615804367 | 3322052133 | i | > 3615804367 | 3322054214 | i | > 3615756798 | 3322051802 | r | 9E7E/C149BBD8 > 3615804367 | 3322051757 | i | > > targetdb=# select * from pg_subscription; > -[ RECORD 1 > ]---+-- > oid | 3615756798 > subdbid | 16589 > subname | sub_omx_archive_big_tables_tci > subowner| 16420 > subenabled | t > subconninfo | xxx > subslotname | sub_omx_archive_big_tables_tci > subsynccommit | off > subpublications | {omx_archive_big_tables} > -[ RECORD 2 > ]---+-- > oid | 3615804367 > subdbid | 16589 > subname | sub_omx_archive_tci > subowner| 16420 > subenabled | t > subconninfo | > subslotname | sub_omx_archive_tci > subsynccommit | off > subpublications | {omx_archive} > > I have dropped the subscription, recreated and refreshed it many times, > but it won't move from the initializing phase. > > Any suggestions on how to start copying the data again, other than > dropping the publication and re-creating it? > > > *This correspondence (including any attachments) is for the intended > recipient(s) only. It may contain confidential or privileged information or > both. No confidentiality or privilege is waived or lost by any > mis-transmission. If you receive this correspondence by mistake, please > contact the sender immediately, delete this correspondence (and all > attachments) and destroy any hard copies. You must not use, disclose, copy, > distribute or rely on any part of this correspondence (including any > attachments) if you are not the intended > recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。本情報の閲覧、発信または本情報に基づくいかなる行為も明確に禁止されていることをご了承ください。*
Subscription stuck at initialize state
I have 2 sets of publication/subscription between my 2 DBs. One of them is working fine and the other one is stuck at initializing state for all the tables. sourcedb=# select * from pg_publication; oid |pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate | pubviaroot ---++--+--+---+---+---+-+ 19585 | omx_archive_big_tables |16420 | f| t | t | t | t | f 19584 | omx_archive|16420 | f| t | t | t | t | f targetdb=# select * from pg_subscription_rel ; srsubid | srrelid | srsubstate | srsublsn +++--- 3615804367 | 3322052690 | i | 3615804367 | 3322052570 | i | 3615756798 | 3322051793 | r | 9E7E/BF5F82D8 3615804367 | 3322052133 | i | 3615804367 | 3322054214 | i | 3615756798 | 3322051802 | r | 9E7E/C149BBD8 3615804367 | 3322051757 | i | targetdb=# select * from pg_subscription; -[ RECORD 1 ]---+-- oid | 3615756798 subdbid | 16589 subname | sub_omx_archive_big_tables_tci subowner| 16420 subenabled | t subconninfo | xxx subslotname | sub_omx_archive_big_tables_tci subsynccommit | off subpublications | {omx_archive_big_tables} -[ RECORD 2 ]---+-- oid | 3615804367 subdbid | 16589 subname | sub_omx_archive_tci subowner| 16420 subenabled | t subconninfo | subslotname | sub_omx_archive_tci subsynccommit | off subpublications | {omx_archive} I have dropped the subscription, recreated and refreshed it many times, but it won't move from the initializing phase. Any suggestions on how to start copying the data again, other than dropping the publication and re-creating it? -- _This correspondence (including any attachments) is for the intended recipient(s) only. It may contain confidential or privileged information or both. No confidentiality or privilege is waived or lost by any mis-transmission. If you receive this correspondence by mistake, please contact the sender immediately, delete this correspondence (and all attachments) and destroy any hard copies. You must not use, disclose, copy, distribute or rely on any part of this correspondence (including any attachments) if you are not the intended recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。本情報の閲覧、発信または本情報に基づくいかなる行為も明確に禁止されていることをご了承ください。_