On Sat, Dec 19, 2020 at 5:38 PM Amit Kapila <[email protected]> wrote:
>
> On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <[email protected]> wrote:
> >
> > TODO / Known Issues:
> >
> > * the current implementation of tablesync drop slot (e.g. from
> > DropSubscription or finish_sync_worker) regenerates the tablesync slot
> > name so it knows what slot to drop.
> >
>
> If you always drop the slot at finish_sync_worker, then in which case
> do you need to drop it during DropSubscription? Is it when the table
> sync workers are crashed?
Yes. It is not the normal case. But if the tablesync never yet got to
SYNCDONE state (maybe crashed) then finish_sync_worker may not be
called.
So I think a rogue tablesync slot might still exist during DropSubscription.
>
> > The current code might be ok for
> > normal use cases, but if there is an ALTER SUBSCRIPTION ... SET
> > (slot_name = newname) it would fail to be able to find the tablesync
> > slot.
> >
>
> Sure, but the same will be true for the apply worker slot as well. I
> agree the problem would be more for table sync workers but I think we
> can solve it, see below.
>
> > * I think if there are crashed tablesync workers then they are not
> > known to DropSubscription. So this might be a problem to cleanup slots
> > and/or origin tracking belonging to those unknown workers.
> >
>
> Yeah, I think we can do two things to avoid this and the previous
> problem. (a) We can generate the slot_name for the table sync worker
> based on only subscription_id and rel_id. (b) Immediately after
> creating the slot, advance the replication origin with the position
> (origin_startpos) we get from walrcv_create_slot, this will help us to
> start from the right location.
>
> Do you see anything which will still not be addressed after doing the above?
(a) V5 Patch is updated as suggested.
(b) V5 Patch is updated as suggested. Now calling replorigin_advance.
No problems seen so far. All TAP tests pass, but more testing needed
for the origin stuff
>
> I understand why you are trying to create this patch atop logical
> decoding of 2PC patch but I think it is better to create this as an
> independent patch and then use it to test 2PC problem.
OK. The latest patch still applies to v30 just for my convenience
today, but I will head towards converting this to an independent patch
ASAP.
> Also, please
> explain what kind of testing you did to ensure that it works properly
> after the table sync worker restarts after the crash.
So far tested like this - I caused the tablesync to crash after
COPYDONE (but before SYNCDONE) by sending a row to cause a PK
violation while holding the tablesync at the CATCHUP state in the
debugger. The tablesync then handles the insert, encounters the PK
violation error, and re-launches. Then I can remove the extra row so
the PK violation does not happen, so the (re-launched) tablesync can
complete and finish normally. The Apply worker then takes over.
I have attached some captured/annotated logging of my test scenario
which I ran using the V4 patch (the log has a lot of extra temporary
output to help see what is going on)
---
Kind Regards,
Peter Smith.
Fujitsu Australia.
TEST CASE:
§ Insert some row to subscriber side only
§ Start the SUBSCRIPTION – to start tablesync worker
§ Let tablesync do the copy (so mode will be COPYDONE)
§ In debugger hold the tablesync worker at the CATCHUP point
§ Then issue INSERT (from publisher) to cause PK violation with same row that
was previous inserted as subscriber side
§ Tablesync worker should ERROR when handling this duplicate insert
§ Then a *new* tablesync worker may be launched for same table
§ And this newtablesync worker should detect the state was at COPYDONE, and so
this time it should bypass the initial copy phase.
§ In debugger hold the (re-launched) tablesync worker at the CATCHUP point
§ Delete the subscriber row which was causeing the PK viloation
§ Let the tablesync proceed. Now it should be ableto handle replaciation and
finish OK.
§ Then Apply worker takes over...
LOG FRAGMENTS BELOW
===================
##
## Start with empty tables
##
[postgres@CentOS7-x64 ~]$ psql -d test_pub -c "TRUNCATE test_tab;"
TRUNCATE TABLE
[postgres@CentOS7-x64 ~]$ psql -d test_sub -p 54321 -c "TRUNCATE test_tab;"
TRUNCATE TABLE
##
## Poulate 1 row data on publisher.
##
[postgres@CentOS7-x64 ~]$ psql -d test_pub -c "INSERT INTO test_tab VALUES(1,
'foo');"
INSERT 0 1
##
## Add a row to subscription side only (this is to cause a subsequent PK
violation)
##
[postgres@CentOS7-x64 ~]$ psql -d test_sub -p 54321 -c "INSERT INTO test_tab
VALUES(2, 'bar');"
INSERT 0 1
##
## Start SUBSCRIPTION
## Attach to tablesync
## Get to CATCHUP mode
##
[postgres@CentOS7-x64 ~]$ psql -d test_sub -p 54321 -c "CREATE SUBSCRIPTION
tap_sub CONNECTION 'host=localhost dbname=test_pub application_name=tap_sub'
PUBLICATION tap_pub WITH (two_phase = on);"
2020-12-18 22:54:04.280 AEDT [18765] LOG: logical decoding found consistent
point at 0/1653608
2020-12-18 22:54:04.280 AEDT [18765] DETAIL: There are no running transactions.
2020-12-18 22:54:04.280 AEDT [18765] STATEMENT: CREATE_REPLICATION_SLOT
"tap_sub" LOGICAL pgoutput NOEXPORT_SNAPSHOT
NOTICE: created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
2020-12-18 22:54:04.289 AEDT [18766] LOG: logical replication apply worker for
subscription "tap_sub" has started
2020-12-18 22:54:04.289 AEDT [18766] LOG: !!>> The apply worker process has
PID = 18766
[postgres@CentOS7-x64 ~]$ 2020-12-18 22:54:04.296 AEDT [18771] LOG: starting
logical decoding for slot "tap_sub"
2020-12-18 22:54:04.296 AEDT [18771] DETAIL: Streaming transactions committing
after 0/1653640, reading WAL from 0/1653608.
2020-12-18 22:54:04.296 AEDT [18771] STATEMENT: START_REPLICATION SLOT
"tap_sub" LOGICAL 0/0 (proto_version '2', two_phase 'on', publication_names
'"tap_pub"')
2020-12-18 22:54:04.296 AEDT [18771] LOG: logical decoding found consistent
point at 0/1653608
2020-12-18 22:54:04.296 AEDT [18771] DETAIL: There are no running transactions.
2020-12-18 22:54:04.296 AEDT [18771] STATEMENT: START_REPLICATION SLOT
"tap_sub" LOGICAL 0/0 (proto_version '2', two_phase 'on', publication_names
'"tap_pub"')
2020-12-18 22:54:04.296 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:04.296 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:04.300 AEDT [18774] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "test_tab" has started
2020-12-18 22:54:04.300 AEDT [18774] LOG: !!>> The tablesync worker process
has PID = 18774
2020-12-18 22:54:04.300 AEDT [18774] LOG: !!>>
Sleeping 30 secs. For debugging, attach to process 18774 now!
2020-12-18 22:54:05.311 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:05.311 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:06.312 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:06.313 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:07.314 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:07.314 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:08.315 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:08.315 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:09.317 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:09.317 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:10.319 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:10.319 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:11.320 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:11.321 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:12.323 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:12.323 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:13.325 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:13.325 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:14.331 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:14.332 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:14.378 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:14.378 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:15.382 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:15.382 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:16.384 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:16.384 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:17.386 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:17.386 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:18.388 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:18.388 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:18.656 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:18.657 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:19.658 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:19.658 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:20.662 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:20.662 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:21.666 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:21.666 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:22.677 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:22.677 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:23.680 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:23.680 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:24.680 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:24.680 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:25.686 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:25.687 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:26.688 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:26.688 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:27.689 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:27.689 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:28.694 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:28.694 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:29.696 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:29.696 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:30.700 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:30.700 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:31.703 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:31.703 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:32.712 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:32.712 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:33.715 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:33.715 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:34.307 AEDT [18774] LOG: !!>> tablesync worker: About to call
LogicalRepSyncTableStart to do initial syncing
2020-12-18 22:54:34.718 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:34.718 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:35.723 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:35.723 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:36.727 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:36.727 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:37.731 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:37.731 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:38.735 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:38.735 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:39.739 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:39.739 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:40.745 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:40.745 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:41.746 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:41.746 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:42.749 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:42.749 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:43.753 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:43.753 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:44.754 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:44.755 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:45.759 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:45.759 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:46.761 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:46.761 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:47.763 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:47.763 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:48.669 AEDT [18774] LOG: !!>> LogicalRepSyncTableStart:
walrcv_create_slot for "tap_sub_65608_sync_16385".
2020-12-18 22:54:48.684 AEDT [20106] LOG: logical decoding found consistent
point at 0/1653728
2020-12-18 22:54:48.684 AEDT [20106] DETAIL: There are no running transactions.
2020-12-18 22:54:48.684 AEDT [20106] STATEMENT: CREATE_REPLICATION_SLOT
"tap_sub_65608_sync_16385" LOGICAL pgoutput USE_SNAPSHOT
2020-12-18 22:54:48.690 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:48.691 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 22:54:48.701 AEDT [18774] LOG: !!>> LogicalRepSyncTableStart:
create replication origin tracking "pg_65608_16385".
2020-12-18 22:54:48.702 AEDT [18774] LOG: !!>> LogicalRepSyncTableStart: setup
replication origin tracking "pg_65608_16385".
2020-12-18 22:54:48.702 AEDT [18774] LOG: !!>> tablesync worker: wait for
CATCHUP state notification
2020-12-18 22:54:48.702 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 22:54:48.702 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
## COPYDONE state?
## Origin tracking names?
psql -d test_sub -p 54321 -c "select * from pg_subscription_rel;"
srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+----------
65608 | 16385 | C |
(1 row)
[postgres@CentOS7-x64 ~]$ psql -d test_sub -p 54321 -c "select * from
pg_replication_origin;"
roident | roname
---------+----------------
1 | pg_65608
2 | pg_65608_16385
(2 rows)
## Copied data?
[postgres@CentOS7-x64 ~]$ psql -d test_sub -p 54321 -c "SELECT * FROM test_tab;"
a | b | c | d
---+-----+-------------------------------+-----
2 | bar | 2020-12-18 22:53:26.772818+11 | 999
1 | foo | 2020-12-18 22:52:56.26031+11 | 999
(2 rows)
## debugging (in CATCHUP mode)
(gdb) p *MyLogicalRepWorker
$1 = {launch_time = 661607644296792, in_use = true, generation = 2, proc =
0x7ff727e95180,
dbid = 16384, userid = 10, subid = 65608, relid = 16385, relstate = 99 'c',
relstate_lsn = 23410528, relmutex = 0 '\000', last_lsn = 0, last_send_time =
661607644297610,
last_recv_time = 661607644297610, reply_lsn = 0, reply_time = 661607644297610}
## Now try to replicate a row which will cause PK violation
2020-12-18 23:01:32.897 AEDT [18771] CONTEXT: slot "tap_sub", output plugin
"pgoutput", in the commit callback, associated LSN 0/16538C0
2020-12-18 23:01:32.897 AEDT [18771] STATEMENT: START_REPLICATION SLOT
"tap_sub" LOGICAL 0/0 (proto_version '2', two_phase 'on', publication_names
'"tap_pub"')
INSERT 0 1
##
## Let tablesync try to cope with that
##
[postgres@CentOS7-x64 ~]$ 2020-12-18 23:02:01.396 AEDT [18774] LOG: !!>>
tablesync worker: received CATCHUP state notification
2020-12-18 23:02:04.490 AEDT [18774] LOG: !!>> tablesync worker: Returned from
LogicalRepSyncTableStart
2020-12-18 23:02:07.522 AEDT [20106] LOG: starting logical decoding for slot
"tap_sub_65608_sync_16385"
2020-12-18 23:02:07.522 AEDT [20106] DETAIL: Streaming transactions committing
after 0/1653760, reading WAL from 0/1653728.
2020-12-18 23:02:07.522 AEDT [20106] STATEMENT: START_REPLICATION SLOT
"tap_sub_65608_sync_16385" LOGICAL 0/0 (proto_version '2', two_phase 'on',
publication_names '"tap_pub"')
2020-12-18 23:02:07.522 AEDT [20106] LOG: logical decoding found consistent
point at 0/1653728
2020-12-18 23:02:07.522 AEDT [20106] DETAIL: There are no running transactions.
2020-12-18 23:02:07.522 AEDT [20106] STATEMENT: START_REPLICATION SLOT
"tap_sub_65608_sync_16385" LOGICAL 0/0 (proto_version '2', two_phase 'on',
publication_names '"tap_pub"')
2020-12-18 23:02:07.523 AEDT [20106] LOG: !!>>
pgoutput_begin_txn
2020-12-18 23:02:07.523 AEDT [20106] CONTEXT: slot "tap_sub_65608_sync_16385",
output plugin "pgoutput", in the begin callback, associated LSN 0/1653760
2020-12-18 23:02:07.523 AEDT [20106] STATEMENT: START_REPLICATION SLOT
"tap_sub_65608_sync_16385" LOGICAL 0/0 (proto_version '2', two_phase 'on',
publication_names '"tap_pub"')
2020-12-18 23:02:07.523 AEDT [20106] LOG: !!>>
pgoutput_commit_txn
2020-12-18 23:02:07.523 AEDT [20106] CONTEXT: slot "tap_sub_65608_sync_16385",
output plugin "pgoutput", in the commit callback, associated LSN 0/16538C0
2020-12-18 23:02:07.523 AEDT [20106] STATEMENT: START_REPLICATION SLOT
"tap_sub_65608_sync_16385" LOGICAL 0/0 (proto_version '2', two_phase 'on',
publication_names '"tap_pub"')
2020-12-18 23:02:09.681 AEDT [18774] LOG: !!>> tablesync worker:
LogicalRepApplyLoop
2020-12-18 23:02:11.457 AEDT [18774] LOG: !!>> tablesync worker:
apply_dispatch for message kind 'B'
2020-12-18 23:02:13.482 AEDT [18774] LOG: !!>> tablesync worker:
apply_dispatch for message kind 'R'
2020-12-18 23:02:15.450 AEDT [18774] LOG: !!>> tablesync worker:
apply_dispatch for message kind 'I'
2020-12-18 23:02:17.309 AEDT [18774] LOG: !!>> tablesync worker:
should_apply_changes_for_rel: true
rel->state == READY: false
rel->state == SYNCDONE: false
rel->statelsn <= remote_final_lsn: true
2020-12-18 23:02:17.309 AEDT [18774] ERROR: duplicate key value violates
unique constraint "test_tab_pkey"
2020-12-18 23:02:17.309 AEDT [18774] DETAIL: Key (a)=(2) already exists.
## Above is the PK violation
## The apply worker proceeded but decided it should not yet do anything
(because tablesync not finished?)
2020-12-18 23:02:17.311 AEDT [27204] LOG: background worker "logical
replication worker" (PID 18774) exited with exit code 1
2020-12-18 23:02:17.311 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:02:17.311 AEDT [18766] LOG: !!>> apply worker: apply_dispatch
for message kind 'B'
2020-12-18 23:02:17.311 AEDT [18766] LOG: !!>> apply worker: apply_dispatch
for message kind 'R'
2020-12-18 23:02:17.311 AEDT [18766] LOG: !!>> apply worker: apply_dispatch
for message kind 'I'
2020-12-18 23:02:17.312 AEDT [18766] LOG: !!>> apply worker:
should_apply_changes_for_rel: false
rel->state == READY: false
rel->state == SYNCDONE: false
rel->statelsn <= remote_final_lsn: true
2020-12-18 23:02:17.312 AEDT [18766] LOG: !!>> apply worker: apply_dispatch
for message kind 'C'
2020-12-18 23:02:17.312 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
## The replacement tablesyn relaunches
## It skips the COPY since already done
## It reused same origin tracking that was previously created
2020-12-18 23:02:17.316 AEDT [32050] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "test_tab" has started
2020-12-18 23:02:17.316 AEDT [32050] LOG: !!>> The tablesync worker process
has PID = 32050
2020-12-18 23:02:17.316 AEDT [32050] LOG: !!>>
Sleeping 30 secs. For debugging, attach to process 32050 now!
2020-12-18 23:02:17.323 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:02:18.325 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:02:18.325 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:02:19.326 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:02:19.326 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:02:20.329 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:02:20.329 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
...
2020-12-18 23:02:44.417 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:02:45.419 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:02:45.419 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:02:46.420 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:02:46.420 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:02:47.332 AEDT [32050] LOG: !!>> tablesync worker: About to call
LogicalRepSyncTableStart to do initial syncing
2020-12-18 23:02:47.422 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:02:47.422 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:02:47.422 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:02:47.422 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:02:48.423 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:02:48.423 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:02:48.773 AEDT [32050] LOG: !!>> LogicalRepSyncTableStart:
tablesync relstate was SUBREL_STATE_COPYDONE.
2020-12-18 23:02:48.773 AEDT [32050] LOG: !!>> LogicalRepSyncTableStart: setup
replication origin tracking "pg_65608_16385".
2020-12-18 23:02:48.773 AEDT [32050] LOG: !!>> tablesync worker: wait for
CATCHUP state notification
2020-12-18 23:02:48.773 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:02:48.773 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
## Here we are at the 2nd tablesync at CATCHUP mode
## Delete the subscriber row so this replication wont cause PK violation next
time thru
psql -d test_sub -p 54321 -c "DELETE FROM test_tab WHERE a = 2";
DELETE 1
## And let 2nd tablesync keep going
[postgres@CentOS7-x64 ~]$ 2020-12-18 23:08:48.580 AEDT [32050] LOG: !!>>
tablesync worker: received CATCHUP state notification
2020-12-18 23:08:51.638 AEDT [32050] LOG: !!>> tablesync worker: Returned from
LogicalRepSyncTableStart
2020-12-18 23:09:03.006 AEDT [395] LOG: starting logical decoding for slot
"tap_sub_65608_sync_16385"
2020-12-18 23:09:03.006 AEDT [395] DETAIL: Streaming transactions committing
after 0/1653760, reading WAL from 0/1653728.
2020-12-18 23:09:03.006 AEDT [395] STATEMENT: START_REPLICATION SLOT
"tap_sub_65608_sync_16385" LOGICAL 0/0 (proto_version '2', two_phase 'on',
publication_names '"tap_pub"')
2020-12-18 23:09:03.006 AEDT [395] LOG: logical decoding found consistent
point at 0/1653728
2020-12-18 23:09:03.006 AEDT [395] DETAIL: There are no running transactions.
2020-12-18 23:09:03.006 AEDT [395] STATEMENT: START_REPLICATION SLOT
"tap_sub_65608_sync_16385" LOGICAL 0/0 (proto_version '2', two_phase 'on',
publication_names '"tap_pub"')
2020-12-18 23:09:03.006 AEDT [395] LOG: !!>>
pgoutput_begin_txn
2020-12-18 23:09:03.006 AEDT [395] CONTEXT: slot "tap_sub_65608_sync_16385",
output plugin "pgoutput", in the begin callback, associated LSN 0/1653760
2020-12-18 23:09:03.006 AEDT [395] STATEMENT: START_REPLICATION SLOT
"tap_sub_65608_sync_16385" LOGICAL 0/0 (proto_version '2', two_phase 'on',
publication_names '"tap_pub"')
2020-12-18 23:09:03.008 AEDT [395] LOG: !!>>
pgoutput_commit_txn
2020-12-18 23:09:03.008 AEDT [395] CONTEXT: slot "tap_sub_65608_sync_16385",
output plugin "pgoutput", in the commit callback, associated LSN 0/16538C0
2020-12-18 23:09:03.008 AEDT [395] STATEMENT: START_REPLICATION SLOT
"tap_sub_65608_sync_16385" LOGICAL 0/0 (proto_version '2', two_phase 'on',
publication_names '"tap_pub"')
## This time tablesync succeeds the inserts
## And cleanly finishes
2020-12-18 23:09:05.077 AEDT [32050] LOG: !!>> tablesync worker:
LogicalRepApplyLoop
2020-12-18 23:09:08.085 AEDT [32050] LOG: !!>> tablesync worker:
apply_dispatch for message kind 'B'
2020-12-18 23:09:11.228 AEDT [32050] LOG: !!>> tablesync worker:
apply_dispatch for message kind 'R'
2020-12-18 23:09:12.868 AEDT [32050] LOG: !!>> tablesync worker:
apply_dispatch for message kind 'I'
2020-12-18 23:09:15.189 AEDT [32050] LOG: !!>> tablesync worker:
should_apply_changes_for_rel: true
rel->state == READY: false
rel->state == SYNCDONE: false
rel->statelsn <= remote_final_lsn: true
2020-12-18 23:09:29.383 AEDT [32050] LOG: !!>> tablesync worker:
apply_dispatch for message kind 'C'
2020-12-18 23:09:40.517 AEDT [32050] LOG: !!>> tablesync worker: called
process_syncing_tables
2020-12-18 23:10:31.043 AEDT [32050] LOG: !!>> tablesync worker: called
process_syncing_tables
2020-12-18 23:10:39.233 AEDT [32050] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "test_tab" has finished
2020-12-18 23:10:39.233 AEDT [32050] LOG: !!>> finish_sync_worker: dropping
the tablesync slot "tap_sub_65608_sync_16385".
2020-12-18 23:10:39.240 AEDT [32050] LOG: dropped replication slot
"tap_sub_65608_sync_16385" on publisher
## Appy worker takes over again, and removes the origin tracking from the (now
completed) tablesync
2020-12-18 23:10:39.240 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:10:39.240 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:10:39.240 AEDT [18766] LOG: !!>> apply worker: find tablesync
origin tracking for "pg_65608_16385".
2020-12-18 23:10:39.240 AEDT [18766] LOG: !!>> apply worker: dropping
tablesync origin tracking for "pg_65608_16385".
2020-12-18 23:10:39.241 AEDT [18766] LOG: !!>> apply worker: dropped tablesync
origin tracking for "pg_65608_16385".
2020-12-18 23:10:39.241 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:10:39.241 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:10:40.243 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:10:40.243 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:10:41.244 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:10:41.244 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:10:42.246 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
...
2020-12-18 23:10:52.281 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:10:52.281 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:10:53.284 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:10:53.284 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:10:54.287 AEDT [18766] LOG: !!>> apply worker:
LogicalRepApplyLoop
2020-12-18 23:10:54.287 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
psql -d test_sub -p 54321 -c "DROP SUBSCRIPTION tap_sub;"2020-12-18
23:10:55.294 AEDT [18766] LOG: !!>> apply worker: LogicalRepApplyLoop
2020-12-18 23:10:55.294 AEDT [18766] LOG: !!>> apply worker: called
process_syncing_tables
2020-12-18 23:10:55.896 AEDT [18766] FATAL: terminating logical replication
worker due to administrator command
2020-12-18 23:10:55.897 AEDT [27204] LOG: background worker "logical
replication worker" (PID 18766) exited with exit code 1
NOTICE: dropped replication slot "tap_sub" on publisher
DROP SUBSCRIPTION
[postgres@CentOS7-x64 ~]$ psql -d test_sub -p 54321 -c "select * from
pg_replication_origin;"
roident | roname
---------+--------
(0 rows)