> > 13.
> > @@ -507,7 +558,16 @@ CreateSubscription(CreateSubscriptionStmt *stmt,
> > bool isTopLevel)
> >   {
> >   Assert(slotname);
> >
> > - walrcv_create_slot(wrconn, slotname, false,
> > + /*
> > + * Even if two_phase is set, don't create the slot with
> > + * two-phase enabled. Will enable it once all the tables are
> > + * synced and ready. This avoids race-conditions like prepared
> > + * transactions being skipped due to changes not being applied
> > + * due to checks in should_apply_changes_for_rel() when
> > + * tablesync for the corresponding tables are in progress. See
> > + * comments atop worker.c.
> > + */
> > + walrcv_create_slot(wrconn, slotname, false, false,
> >
> > Can't we enable two_phase if copy_data is false? Because in that case,
> > all relations will be in a READY state. If we do that then we should
> > also set two_phase state as 'enabled' during createsubscription. I
> > think we need to be careful to check that connect option is given and
> > copy_data is false before setting such a state. Now, I guess we may
> > not be able to optimize this to not set 'enabled' state when the
> > subscription has no rels.
> >
> 
> Fixed in v77-0001

I noticed this modification in v77-0001 and executed "CREATE SUBSCRIPTION ... 
WITH (two_phase = on, copy_data = false)", but it crashed.
-------------
postgres=# CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres' PUBLICATION pub 
WITH(two_phase = on, copy_data = false);
WARNING:  relcache reference leak: relation "pg_subscription" not closed
WARNING:  snapshot 0x34278d0 still active
NOTICE:  created replication slot "sub" on publisher
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?>
-------------

There are two warnings and a segmentation fault in subscriber log:
-------------
2021-05-24 15:08:32.435 CST [2848572] WARNING:  relcache reference leak: 
relation "pg_subscription" not closed
2021-05-24 15:08:32.435 CST [2848572] WARNING:  snapshot 0x32ce8b0 still active
2021-05-24 15:08:33.012 CST [2848555] LOG:  server process (PID 2848572) was 
terminated by signal 11: Segmentation fault
2021-05-24 15:08:33.012 CST [2848555] DETAIL:  Failed process was running: 
CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres' PUBLICATION pub 
WITH(two_phase = on, copy_data = false);
-------------

The backtrace about segmentation fault is attached. It happened in table_close 
function, we got it because "CurrentResourceOwner" was NULL.

I think it was related with the first warning, which reported "relcache 
reference leak". The backtrace information is attached, too. When updating 
two-phase state in CreateSubscription function, it released resource owner and 
set "CurrentResourceOwner" as NULL in CommitTransaction function.

The second warning about "snapshot still active" was also happened in 
CommitTransaction function. It called AtEOXact_Snapshot function, checked 
leftover snapshots and reported the warning.
I debugged and found the snapshot was added in function PortalRunUtility by 
calling PushActiveSnapshot function, the address of "ActiveSnapshot" at this 
time was same as the address in warning.

In summary, when creating subscription with two_phase = on and copy_data = 
false, it calls UpdateTwoPhaseState function in CreateSubscription function to 
set two_phase state as 'enabled', and it checked and released relcache and 
snapshot too early so the NG happened. I think some change should be made to 
avoid it. Thought?

FYI
 I also tested the new released V78* at [1], the above NG still exists.
[1] 
https://www.postgresql.org/message-id/CAFPTHDab56twVmC%2B0a%3DRNcRw4KuyFdqzW0JAcvJdS63n_fRnOQ%40mail.gmail.com

Regards
Tang

#0  ResourceArrayRemove (resarr=resarr@entry=0x80, value=value@entry=46738512) 
at resowner.c:309
#1  0x0000000000e46e96 in ResourceOwnerForgetRelationRef (owner=0x0, 
rel=rel@entry=0x2c92c50) at resowner.c:1127
#2  0x0000000000dcdfb9 in RelationDecrementReferenceCount 
(rel=rel@entry=0x2c92c50) at relcache.c:2083
#3  0x0000000000dcdff0 in RelationClose (relation=relation@entry=0x2c92c50) at 
relcache.c:2101
#4  0x00000000004a7a50 in relation_close (relation=relation@entry=0x2c92c50, 
lockmode=lockmode@entry=3) at relation.c:213
#5  0x00000000005bf6df in table_close (relation=relation@entry=0x2c92c50, 
lockmode=lockmode@entry=3) at table.c:169
#6  0x00000000007c6687 in CreateSubscription (stmt=stmt@entry=0x2c30338, 
isTopLevel=isTopLevel@entry=true) at subscriptioncmds.c:590
#7  0x0000000000bae4c8 in ProcessUtilitySlow (pstate=pstate@entry=0x2c52a40, 
pstmt=pstmt@entry=0x2c306a8,
    queryString=queryString@entry=0x2c2f610 "create subscription sub connection 
'dbname=postgres' publication pub with(two_phase=on, copy_data = false);",
    context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, 
queryEnv=queryEnv@entry=0x0, dest=0x2c30798, qc=0x7fff20540200) at 
utility.c:1803
#8  0x0000000000babeea in standard_ProcessUtility (pstmt=0x2c306a8,
    queryString=0x2c2f610 "create subscription sub connection 'dbname=postgres' 
publication pub with(two_phase=on, copy_data = false);", 
context=PROCESS_UTILITY_TOPLEVEL,
    params=0x0, queryEnv=0x0, dest=0x2c30798, qc=0x7fff20540200) at 
utility.c:1034
#9  0x0000000000bac05b in ProcessUtility (pstmt=pstmt@entry=0x2c306a8, 
queryString=<optimized out>, context=<optimized out>, params=<optimized out>,
    queryEnv=<optimized out>, dest=dest@entry=0x2c30798, qc=0x7fff20540200) at 
utility.c:525
#10 0x0000000000ba4013 in PortalRunUtility (portal=portal@entry=0x2c944e0, 
pstmt=pstmt@entry=0x2c306a8, isTopLevel=isTopLevel@entry=true,
    setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x2c30798, 
qc=qc@entry=0x7fff20540200) at pquery.c:1159
#11 0x0000000000ba56b4 in PortalRunMulti (portal=portal@entry=0x2c944e0, 
isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false,
    dest=dest@entry=0x2c30798, altdest=altdest@entry=0x2c30798, 
qc=qc@entry=0x7fff20540200) at pquery.c:1305
#12 0x0000000000ba6fc2 in PortalRun (portal=portal@entry=0x2c944e0, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, 
run_once=run_once@entry=true,
    dest=dest@entry=0x2c30798, altdest=altdest@entry=0x2c30798, 
qc=0x7fff20540200) at pquery.c:779
#13 0x0000000000ba1f97 in exec_simple_query (
    query_string=query_string@entry=0x2c2f610 "create subscription sub 
connection 'dbname=postgres' publication pub with(two_phase=on, copy_data = 
false);")
    at postgres.c:1214
#14 0x0000000000ba3457 in PostgresMain (argc=argc@entry=1, 
argv=argv@entry=0x7fff205403e0, dbname=<optimized out>, username=<optimized 
out>) at postgres.c:4486
#15 0x0000000000a72979 in BackendRun (port=port@entry=0x2c55690) at 
postmaster.c:4491
#16 0x0000000000a78ce9 in BackendStartup (port=port@entry=0x2c55690) at 
postmaster.c:4213
#17 0x0000000000a79128 in ServerLoop () at postmaster.c:1745
#18 0x0000000000a7bd0e in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x2c28ab0) at postmaster.c:1417
#19 0x000000000090dcf9 in main (argc=3, argv=0x2c28ab0) at main.c:209
#0  PrintRelCacheLeakWarning (rel=rel@entry=0x1e5dc30) at resowner.c:1137
#1  0x0000000000e48dd0 in ResourceOwnerReleaseInternal 
(owner=owner@entry=0x1e76c10, phase=phase@entry=RESOURCE_RELEASE_BEFORE_LOCKS, 
isCommit=isCommit@entry=true,
    isTopLevel=isTopLevel@entry=true) at resowner.c:543
#2  0x0000000000e48ce7 in ResourceOwnerReleaseInternal (owner=0x1e761a0, 
phase=phase@entry=RESOURCE_RELEASE_BEFORE_LOCKS, isCommit=isCommit@entry=true,
    isTopLevel=isTopLevel@entry=true) at resowner.c:508
#3  0x0000000000e49cf3 in ResourceOwnerRelease (owner=<optimized out>, 
phase=phase@entry=RESOURCE_RELEASE_BEFORE_LOCKS, isCommit=isCommit@entry=true,
    isTopLevel=isTopLevel@entry=true) at resowner.c:492
#4  0x00000000005e94fe in CommitTransaction () at xact.c:2233
#5  0x00000000005eb971 in CommitTransactionCommand () at xact.c:2965
#6  0x0000000000ace286 in UpdateTwoPhaseState (suboid=suboid@entry=16389, 
new_state=new_state@entry=101 'e') at tablesync.c:1285
#7  0x00000000007c8408 in CreateSubscription (stmt=stmt@entry=0x1dfd328, 
isTopLevel=isTopLevel@entry=true) at subscriptioncmds.c:571
#8  0x0000000000bb18b3 in ProcessUtilitySlow (pstate=pstate@entry=0x1e1ffa0, 
pstmt=pstmt@entry=0x1dfd698,
    queryString=queryString@entry=0x1dfc600 "CREATE SUBSCRIPTION sub CONNECTION 
'dbname=postgres' PUBLICATION pub WITH(two_phase = on, copy_data = false);",
    context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, 
queryEnv=queryEnv@entry=0x0, dest=0x1dfd788, qc=0x7ffc6ae51740) at 
utility.c:1803
#9  0x0000000000baf2d5 in standard_ProcessUtility (pstmt=0x1dfd698,
    queryString=0x1dfc600 "CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres' 
PUBLICATION pub WITH(two_phase = on, copy_data = false);",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1dfd788, 
qc=0x7ffc6ae51740) at utility.c:1034
#10 0x0000000000baf446 in ProcessUtility (pstmt=pstmt@entry=0x1dfd698, 
queryString=<optimized out>, context=<optimized out>, params=<optimized out>,
    queryEnv=<optimized out>, dest=dest@entry=0x1dfd788, qc=0x7ffc6ae51740) at 
utility.c:525
#11 0x0000000000ba9580 in PortalRunUtility (portal=portal@entry=0x1e5f4c0, 
pstmt=pstmt@entry=0x1dfd698, isTopLevel=isTopLevel@entry=true,
    setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x1dfd788, 
qc=qc@entry=0x7ffc6ae51740) at pquery.c:1147
#12 0x0000000000ba9a0e in PortalRunMulti (portal=portal@entry=0x1e5f4c0, 
isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false,
    dest=dest@entry=0x1dfd788, altdest=altdest@entry=0x1dfd788, 
qc=qc@entry=0x7ffc6ae51740) at pquery.c:1303
#13 0x0000000000baa683 in PortalRun (portal=portal@entry=0x1e5f4c0, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, 
run_once=run_once@entry=true,
    dest=dest@entry=0x1dfd788, altdest=altdest@entry=0x1dfd788, 
qc=0x7ffc6ae51740) at pquery.c:786
#14 0x0000000000ba5200 in exec_simple_query (
    query_string=query_string@entry=0x1dfc600 "CREATE SUBSCRIPTION sub 
CONNECTION 'dbname=postgres' PUBLICATION pub WITH(two_phase = on, copy_data = 
false);")
    at postgres.c:1214
#15 0x0000000000ba66c0 in PostgresMain (argc=argc@entry=1, 
argv=argv@entry=0x7ffc6ae51920, dbname=<optimized out>, username=<optimized 
out>) at postgres.c:4486
#16 0x0000000000a7579d in BackendRun (port=port@entry=0x1e20cd0) at 
postmaster.c:4507
#17 0x0000000000a7bc1c in BackendStartup (port=port@entry=0x1e20cd0) at 
postmaster.c:4229
#18 0x0000000000a7c05b in ServerLoop () at postmaster.c:1745
#19 0x0000000000a7ec41 in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x1df5aa0) at postmaster.c:1417
#20 0x00000000009108a5 in main (argc=3, argv=0x1df5aa0) at main.c:209

Reply via email to