On Wed, 13 Jan 2021 at 13:26, Amit Kapila wrote:
> On Tue, Jan 12, 2021 at 4:59 PM Bharath Rupireddy
> <bharath.rupireddyforpostg...@gmail.com> wrote:
>>
>> On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>> > > Here's my analysis:
>> > > 1) in the publisher, alter publication drop table successfully
>> > > removes (PublicationDropTables) the table from the catalogue
>> > > pg_publication_rel
>> > > 2) in the subscriber, alter subscription refresh publication
>> > > successfully removes the table from the catalogue pg_subscription_rel
>> > > (AlterSubscription_refresh->RemoveSubscriptionRel)
>> > > so far so good
>> > >
>> >
>> > Here, it should register the worker to stop on commit, and then on
>> > commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
>> > Once the apply worker is stopped, the corresponding WALSender will
>> > also be stopped. Something here is not happening as per expected
>> > behavior.
>>
>> On the subscriber, an entry for the worker stop is created in
>> AlterSubscription_refresh --> logicalrep_worker_stop_at_commit. At the end
>> of the txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it
>> cannot be stopped because logicalrep_worker_find returns NULL
>> (AtEOXact_ApplyLauncher --> logicalrep_worker_stop -->
>> logicalrep_worker_find). The apply worker's entry for that subscription
>> has relid 0 [1], so the following if condition is never hit, and the
>> apply worker of the subscription on which refresh publication was run is
>> not stopped. relid 0 looks valid here: relid is only meaningful during the
>> table sync phase, as the comment in the LogicalRepWorker structure says.
>>
>> Also, I think expecting the apply worker to be stopped this way doesn't
>> make sense, because there is one apply worker per subscription, and the
>> subscription can have other tables too.
>>
>
> Okay, that makes sense. As responded to Li Japin, let's focus on
> figuring out why we are sending the changes from the publisher node in
> some cases and not in other cases.

After some analysis, I found that changes to the dropped table are always
sent to the subscriber. The difference is that if we drop the table from the
publication and then refresh the publication (on the subscriber), the state
of the LogicalRepRelMapEntry checked in should_apply_changes_for_rel() is
set to SUBREL_STATE_UNKNOWN.

(gdb) p *rel
$2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public",
    relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0, atttyps = 0x5564fb017780,
    replident = 100 'd', relkind = 0 '\000', attkeys = 0x0}, localrelvalid = true,
  localreloid = 16412, localrel = 0x7f78705da1b8, attrmap = 0x5564fb017800, updatable = false,
  *state = 0 '\000'*, statelsn = 0}

If we insert data between dropping the table from the publication and
refreshing the publication, the LogicalRepRelMapEntry state is always
SUBREL_STATE_READY.

(gdb) p *rel
$2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public",
    relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0, atttyps = 0x5564fb017780,
    replident = 100 'd', relkind = 0 '\000', attkeys = 0x0}, localrelvalid = true,
  localreloid = 16412, localrel = 0x7f78705d9d38, attrmap = 0x5564fb017800, updatable = false,
  *state = 114 'r'*, statelsn = 23545672}

I will dig into why the state of the LogicalRepRelMapEntry doesn't change in the second case.

Any suggestion is welcome!

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

