On Wed, 13 Jan 2021 at 13:26, Amit Kapila wrote:
> On Tue, Jan 12, 2021 at 4:59 PM Bharath Rupireddy
> <bharath.rupireddyforpostg...@gmail.com> wrote:
>>
>> On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>> > > Here's my analysis:
>> > > 1) in the publisher, alter publication drop table successfully
>> > > removes(PublicationDropTables) the table from the catalogue
>> > > pg_publication_rel
>> > > 2) in the subscriber, alter subscription refresh publication
>> > > successfully removes the table from the catalogue pg_subscription_rel
>> > > (AlterSubscription_refresh->RemoveSubscriptionRel)
>> > > so far so good
>> > >
>> >
>> > Here, it should register the worker to stop on commit, and then on
>> > commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
>> > Once the apply worker is stopped, the corresponding WALSender will
>> > also be stopped. Something here is not happening as per expected
>> > behavior.
>>
>> On the subscriber, an entry for worker stop is created in
>> AlterSubscription_refresh --> logicalrep_worker_stop_at_commit. At the end
>> of txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it cannot
>> be stopped because logicalrep_worker_find returns null
>> (AtEOXact_ApplyLauncher --> logicalrep_worker_stop -->
>> logicalrep_worker_find). The worker entry for that subscriber is having
>> relid as 0 [1], due to which the following if condition will not be hit. The
>> apply worker on the subscriber related to the subscription on which refresh
>> publication was run is not closed. It looks like relid 0 is valid because it
>> will be applicable only during the table sync phase, the comment in the
>> LogicalRepWorker structure says that.
>>
>> And also, I think, expecting the apply worker to be closed this way doesn't
>> make sense because the apply worker is a per-subscription base, and the
>> subscription can have other tables too.
>>
>
> Okay, that makes sense. As responded to Li Japin, let's focus on
> figuring out why we are sending the changes from the publisher node in
> some cases and not in other cases.
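
For reference, the lookup failure described in the quoted text can be modelled roughly as below. This is a small Python sketch, not the PostgreSQL source: Worker and worker_find are simplified stand-ins for LogicalRepWorker and logicalrep_worker_find, the subscription OID 16394 is made up, and it assumes the lookup matches on both subid and relid.

```python
from dataclasses import dataclass
from typing import List, Optional

INVALID_OID = 0  # stand-in for PostgreSQL's InvalidOid


@dataclass
class Worker:
    """Simplified stand-in for a LogicalRepWorker slot."""
    subid: int
    relid: int  # INVALID_OID for the apply worker, table OID for a tablesync worker
    in_use: bool = True


def worker_find(workers: List[Worker], subid: int, relid: int) -> Optional[Worker]:
    """Return the in-use worker matching both subid and relid, else None."""
    for w in workers:
        if w.in_use and w.subid == subid and w.relid == relid:
            return w
    return None


# One apply worker for a hypothetical subscription 16394; table sync has
# finished, so no tablesync worker for the table is left.
workers = [Worker(subid=16394, relid=INVALID_OID)]

# The stop request registered at refresh time carries the dropped table's OID.
print(worker_find(workers, 16394, 16412))        # None: nothing to stop
print(worker_find(workers, 16394, INVALID_OID))  # the apply worker itself
```

Under these assumptions the stop request never matches the apply worker, which is consistent with the apply worker staying alive after the refresh.
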

After some analysis, I find that changes to the dropped table are always
sent to the subscriber. The difference is that if we drop the table from
the publication and refresh the publication (on the subscriber), the
LogicalRepRelMapEntry seen in should_apply_changes_for_rel() has its state
set to SUBREL_STATE_UNKNOWN.

(gdb) p *rel
$2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public",
    relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0,
    atttyps = 0x5564fb017780, replident = 100 'd', relkind = 0 '\000',
    attkeys = 0x0}, localrelvalid = true, localreloid = 16412,
  localrel = 0x7f78705da1b8, attrmap = 0x5564fb017800, updatable = false,
  *state = 0 '\000'*, statelsn = 0}

If we insert data between dropping the table from the publication and
refreshing the publication, the state of the LogicalRepRelMapEntry is
always SUBREL_STATE_READY.

(gdb) p *rel
$2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public",
    relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0,
    atttyps = 0x5564fb017780, replident = 100 'd', relkind = 0 '\000',
    attkeys = 0x0}, localrelvalid = true, localreloid = 16412,
  localrel = 0x7f78705d9d38, attrmap = 0x5564fb017800, updatable = false,
  *state = 114 'r'*, statelsn = 23545672}

I will dig into why the state of the LogicalRepRelMapEntry doesn't change
in the second case. Any suggestion is welcome!

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
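
The state value in the two gdb dumps matters because it decides whether the apply worker applies a change for the relation. A rough Python model follows. It assumes that the apply-worker branch of should_apply_changes_for_rel() accepts a relation only when its state is READY, or SYNCDONE with statelsn at or before the remote commit LSN. That condition is an assumption and is not shown in this thread. RelMapEntry and should_apply are illustrative names, and the LSN 23545700 is made up.

```python
from dataclasses import dataclass

# State codes as printed by gdb above: 0 '\000' and 114 'r'.
# 's' for SYNCDONE is taken from the catalog's state letters.
SUBREL_STATE_UNKNOWN = "\0"
SUBREL_STATE_SYNCDONE = "s"
SUBREL_STATE_READY = "r"


@dataclass
class RelMapEntry:
    """Simplified stand-in for the two LogicalRepRelMapEntry fields of interest."""
    state: str
    statelsn: int


def should_apply(rel: RelMapEntry, remote_final_lsn: int) -> bool:
    """Model of the apply-worker branch only; the tablesync branch is omitted."""
    return rel.state == SUBREL_STATE_READY or (
        rel.state == SUBREL_STATE_SYNCDONE and rel.statelsn <= remote_final_lsn
    )


case1 = RelMapEntry(SUBREL_STATE_UNKNOWN, 0)       # first gdb dump
case2 = RelMapEntry(SUBREL_STATE_READY, 23545672)  # second gdb dump
lsn = 23545700  # hypothetical commit LSN of the incoming transaction

print(should_apply(case1, lsn))  # False
print(should_apply(case2, lsn))  # True
```

If that assumption holds, a stale READY state in the cached entry would be enough for the subscriber to keep applying rows for a table that was already removed from pg_subscription_rel.
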