On 2021/10/11 22:15, Greg Nancarrow wrote:
On Mon, Oct 11, 2021 at 6:55 PM houzj.f...@fujitsu.com
<houzj.f...@fujitsu.com> wrote:

I can see the walsender tried to release a not-quite-ready repliaction slot
that was created when create a subscription. But the pgstat has been shutdown
before invoking ReplicationSlotRelease().

The stack is as follows:

#2  in ExceptionalCondition (pgstat_is_initialized && !pgstat_is_shutdown)
#3  in pgstat_assert_is_up () at pgstat.c:4852
#4  in pgstat_send (msg=msg@entry=0x7ffe716f7470, len=len@entry=144) at 
pgstat.c:3075
#5  in pgstat_report_replslot_drop (slotname=slotname@entry=0x7fbcf57a3c98 
"sub") at pgstat.c:1869
#6  in ReplicationSlotDropPtr (slot=0x7fbcf57a3c80) at slot.c:696
#7  in ReplicationSlotDropAcquired () at slot.c:585
#8  in ReplicationSlotRelease () at slot.c:482
#9  in ProcKill (code=<optimized out>, arg=<optimized out>) at proc.c:852
#10 in shmem_exit (code=code@entry=0) at ipc.c:272
#11 in proc_exit_prepare (code=code@entry=0) at ipc.c:194
#12 in proc_exit (code=code@entry=0) at ipc.c:107
#13 in ProcessRepliesIfAny () at walsender.c:1807
#14 in WalSndWaitForWal (loc=loc@entry=22087632) at walsender.c:1417
#15 in logical_read_xlog_page (state=0x2f8c600, targetPagePtr=22085632,
     reqLen=, targetRecPtr=, cur_page=0x2f6c1e0 "\016\321\005") at 
walsender.c:821
#16 in ReadPageInternal (state=state@entry=0x2f8c600,
     pageptr=pageptr@entry=22085632, reqLen=reqLen@entry=2000) at 
xlogreader.c:667
#17 in XLogReadRecord (state=0x2f8c600,
     errormsg=errormsg@entry=0x7ffe716f7f98) at xlogreader.c:337
#18 in DecodingContextFindStartpoint (ctx=ctx@entry=0x2f8c240)
     at logical.c:606
#19 in CreateReplicationSlot (cmd=cmd@entry=0x2f1aef0)

Is this behavior expected ?


I'd say it's not!

Yes. I think this is a bug.


Just looking at the stacktrace, I'm thinking that the following commit
may have had a bearing on this problem, by causing pgstat to be
shutdown earlier:

commit fb2c5028e63589c01fbdf8b031be824766dc7a68
Author: Andres Freund <and...@anarazel.de>
Date:   Fri Aug 6 10:05:57 2021 -0700

     pgstat: Schedule per-backend pgstat shutdown via before_shmem_exit().


Can you see if the problem can be reproduced prior to this commit?

Even in prior to the commit, pgstat_shutdown_hook() can be called
before ProcKill() at the backend exit, so ISTM that the problem can
be reproduced.

Probably we need to make sure that pgstat_shutdown_hook() is called
after ProcKill(), e.g., by registering pgstat_shutdown_hook() into
on_proc_exit_list (I'm not sure if this change is safe, though).
Or maybe pgstat logic for replication slot drop needs to be overhauled.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply via email to