On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <[email protected]> wrote:
>
> I have updated the patch to display WARNING for each of the tablesync
> slots during DropSubscription. As discussed, I have moved the drop
> slot related code towards the end in AlterSubscription_refresh. Apart
> from this, I have fixed one more issue in tablesync code where in
> after catching the exception we were not clearing the transaction
> state on the publisher, see changes in LogicalRepSyncTableStart. I
> have also fixed other comments raised by you. Additionally, I have
> removed the test because it was creating the same name slot as the
> tablesync worker and tablesync worker removed the same due to new
> logic in LogicalRepSyncStart. Earlier, it was not failing because of
> the bug in that code which I have fixed in the attached.
>
I was testing this patch. I had a table on the subscriber which had a
row that would cause a PK constraint violation during the table copy.
This results in the subscriber trying to roll back the table copy and
failing:

2021-02-01 23:28:16.041 EST [23738] LOG: logical replication apply worker for subscription "tap_sub" has started
2021-02-01 23:28:16.051 EST [23740] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:21.118 EST [23740] ERROR: table copy could not rollback transaction on publisher
2021-02-01 23:28:21.118 EST [23740] DETAIL: The error was: another command is already in progress
2021-02-01 23:28:21.122 EST [8028] LOG: background worker "logical replication worker" (PID 23740) exited with exit code 1
2021-02-01 23:28:21.125 EST [23908] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:21.138 EST [23908] ERROR: could not create replication slot "pg_16398_sync_16384": ERROR: replication slot "pg_16398_sync_16384" already exists
2021-02-01 23:28:21.139 EST [8028] LOG: background worker "logical replication worker" (PID 23908) exited with exit code 1
2021-02-01 23:28:26.168 EST [24048] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:34.244 EST [24048] ERROR: table copy could not rollback transaction on publisher
2021-02-01 23:28:34.244 EST [24048] DETAIL: The error was: another command is already in progress
2021-02-01 23:28:34.251 EST [8028] LOG: background worker "logical replication worker" (PID 24048) exited with exit code 1
2021-02-01 23:28:34.254 EST [24337] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:34.263 EST [24337] ERROR: could not create replication slot "pg_16398_sync_16384": ERROR: replication slot "pg_16398_sync_16384" already exists
2021-02-01 23:28:34.264 EST [8028] LOG: background worker "logical replication worker" (PID 24337) exited with exit code 1

One more thing I see is that we now error out in PG_CATCH() in
LogicalRepSyncTableStart() with the above error and, as a result, the
tablesync slot is not dropped. This causes the slot creation to fail
on the next restart. I think this can be avoided. We could either
attempt a rollback only on specific failures, or drop the slot prior
to erroring out.

regards,
Ajin Cherian
Fujitsu Australia
