> On Mar 5, 2026, at 19:34, Zhijie Hou (Fujitsu) <[email protected]> wrote:
> 
> On Thursday, March 5, 2026 6:47 PM Amit Kapila <[email protected]> 
> wrote:
>> On Thu, Mar 5, 2026 at 9:35 AM shveta malik <[email protected]>
>> wrote:
>>> 
>>> 
>>> 3)
>>> + /* Sleep for the configured interval */
>>> + (void) WaitLatch(MyLatch,
>>> + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, sleep_ms,
>>> + WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE);
>>> 
>>> I don't think this wait-event is appropriate. Unlike tablesync, we are
>>> not waiting for any state change here. Shall we add a new one for our
>>> case? How about WAIT_EVENT_LOGICAL_SEQSYNC_MAIN? Thoughts?
>>> 
>> 
>> +1 for a new wait event. Few other minor comments:
> 
> Added.
> 
>> 
>> 1.
>> + * Check if the subscription includes sequences and start a
>> + sequencesync
>> + * worker if one is not already running. The active sequencesync worker
>> + will
>> + * handle all pending sequence synchronization. If any sequences remain
>> + * unsynchronized after it exits, a new worker can be started in the
>> + next
>> + * iteration.
>>  *
>> - * Start a sequencesync worker if one is not already running. The active
>> - * sequencesync worker will handle all pending sequence synchronization. If
>> any
>> - * sequences remain unsynchronized after it exits, a new worker can be
>> started
>> - * in the next iteration.
>> 
>> Why did this comment change? The earlier one sounds okay to me.
> 
> I think either version is fine, so reverted this change now.
> 
>> 
>> 2.
>>  break;
>> +
>>  case COPYSEQ_INSUFFICIENT_PERM:
>> 
>> Why does this patch add additional new lines? We use both styles (existing
>> and what this patch does) in the code but it seems unnecessary to change for
>> this patch.
> 
> Removed.
> 
>> 
>> 3.
>> - ProcessSequencesForSync();
>> +
>> + /* Check if sequence worker needs to be started */
>> + MaybeLaunchSequenceSyncWorker();
>> 
>> No need for an additional line and a comment here.
> 
> Removed.
> 
> Here is the V11 patch which addressed all above comments and [1][2].
> 
> [1] 
> https://www.postgresql.org/message-id/CAJpy0uAfu-VPqCknLLvJ%2BPUx_cyoR-b70xUNT6Pyv8N-odKizQ%40mail.gmail.com
> [2] 
> https://www.postgresql.org/message-id/CAJpy0uBeAdz6-3P26Eryeq3TyjA-GiKY3z0hFMxzZD%3DAYGqQ3Q%40mail.gmail.com
> 
> Best Regards,
> Hou zj
> <v11-0002-Synchronize-sequences-directly-in-REFRESH-SEQUEN.patch><v11-0001-Support-automatic-sequence-replication.patch>

Hi Zhijie,

Thanks for your clarification in the other email. I have reviewed v11 and here 
comes my comments:

1 - 0001
```
 static CopySeqResult
-copy_sequence(LogicalRepSequenceInfo *seqinfo, Oid seqowner)
+validate_seqsync_state(LogicalRepSequenceInfo *seqinfo, Relation sequence_rel)
 {
-       UserContext ucxt;
        AclResult       aclresult;
+       Oid                     seqoid = seqinfo->localrelid;
+
+       /* Perform drift check if it's not the initial sync */
+       if (seqinfo->relstate == SUBREL_STATE_READY)
+       {
+               int64           local_last_value;
+               bool            local_is_called;
+
+               /*
+                * Skip synchronization if the current user does not have 
sufficient
+                * privileges to read the sequence data.
+                */
+               if (!GetSequence(sequence_rel, &local_last_value, 
&local_is_called))
+                       return COPYSEQ_INSUFFICIENT_PERM;
+
+               /*
+                * Skip synchronization if the sequence has not drifted from the
+                * publisher's value.
+                */
+               if (local_last_value == seqinfo->last_value &&
+                       local_is_called == seqinfo->is_called)
+                       return COPYSEQ_NO_DRIFT;
+       }
+
+       /* Verify that the current user can update the sequence */
+       aclresult = pg_class_aclcheck(seqoid, GetUserId(), ACL_UPDATE);
+
+       if (aclresult != ACLCHECK_OK)
+               return COPYSEQ_INSUFFICIENT_PERM;
+
+       return COPYSEQ_ALLOWED;
+}
```

I think we can move pg_class_aclcheck() to before the drift check, because it’s 
a local check, should be cheaper than a remote check. If the local check fails, 
we don’t need to touch remote.

2 - 0001
```
+static CopySeqResult
+copy_sequence(LogicalRepSequenceInfo *seqinfo, Relation sequence_rel)
+{
+       UserContext ucxt;
        bool            run_as_owner = MySubscription->runasowner;
        Oid                     seqoid = seqinfo->localrelid;
+       CopySeqResult result;
+       XLogRecPtr      local_page_lsn;
+
+       (void) GetSubscriptionRelState(MySubscription->oid,
+                                                                  
RelationGetRelid(sequence_rel),
+                                                                  
&local_page_lsn);
```

I think GetSubscriptionRelState is called to get local_page_lsn. But 
local_page_lsn is only used very late in this function, and there is an early 
return branch, so can we move this call to right before where local_page_lsn is 
used?

3 - 0001
```
        /*
         * Record the remote sequence's LSN in pg_subscription_rel and mark the
-        * sequence as READY.
+        * sequence as READY if updating a sequence that is in INIT state.
         */
-       UpdateSubscriptionRelState(MySubscription->oid, seqoid, 
SUBREL_STATE_READY,
-                                                          seqinfo->page_lsn, 
false);
+       if (seqinfo->relstate == SUBREL_STATE_INIT ||
+               seqinfo->page_lsn != local_page_lsn)
+               UpdateSubscriptionRelState(MySubscription->oid, seqoid, 
SUBREL_STATE_READY,
+                                                                  
seqinfo->page_lsn, false);
```

In the comment, I think you don’t have to add “if updating .. that is in INIT 
state”, but if you do, then you should also mention the lsn condition.

4 - 0001
```
+                       else
+                       {
+                               /*
+                                * Double the sleep time, but not beyond the 
maximum allowable
+                                * value.
+                                */
+                               sleep_ms = Min(sleep_ms * 2, 
SEQSYNC_MAX_SLEEP_MS);
+                       }
```

Double the sleep time when no drift is an optimization. But looks like the 
doubling happens only when all sequences have no drift. Say, there are 1000 
sequences, and only one is hot, then the it will still fetch the 1000 sequences 
from remote every 2 seconds, making the optimization much less efficient. I 
think the worker can wake up every 2 seconds, but next fetch time should be per 
sequence.

5 - 0001
```
+ * If the state of sequence is SUBREL_STATE_READY, only synchronize sequences
```

“The state of sequence is SUBREL_STATE_READY” is inaccurate, I think it should 
be “the state of sequence sync is SUBREL_STATE_READY”.

6 - 0001

start_sequence_sync runs an infinite loop to periodically sync sequences. I 
don’t it has an auto reconnect mechanism. When something wrong happens, the 
sync worker will exit, how can the worker 

7 - 0001

Say a major version upgrade use case that uses logical replication. Before 
shutdown the publication side (old version), if there are 1000 sequences, how 
can a user decide if all sequences have been synced? From this perspective, 
would it make sense to log a INFO message when all sequences have no drift? If 
next round still no drift, then don’t repeat the message. In other words, when 
the states between (all no drift) and (any drift) switch, log a INFO message.

8 - 0001
```
+bool
+GetSequence(Relation seqrel, int64 *last_value, bool *is_called)
```

This function name feels too general. How about ReadSequenceState or 
GetSequenceLastValueAndIsCalled?

9 - 0001
```
+                                       elog(ERROR, "unrecognized Sequence 
replication result: %d", (int) sync_status);
```
Nit: The “S” seems not have to be capital.

10 - 0002
```
+                                       ? errdetail("The local last_value %lld 
is ahead of the one on publisher",
+                                                               (long long int) 
local_last_value)
+                                       : errdetail("The local last_value %lld 
is behind of the one on publisher",
+                                                               (long long int) 
local_last_value));
```

Per https://www.postgresql.org/docs/18/error-style-guide.html, detail message 
should ends with a period:

Detail and hint messages: Use complete sentences, and end each with a period. 
Capitalize the first word of sentences. Put two spaces after the period if 
another sentence follows (for English text; might be inappropriate in other 
languages).

11 - 0002 - same code block as 10

If we give up the sync, does that mean it’s a safe situation for upgrade? If 
yes, does it make sense to add a hint to say something like “Now it’s safe for 
upgrade”, or whatever else guide to users.

12 - 0002
```
+void
+AlterSubSyncSequences(WalReceiverConn *conn, Oid subid, char *subname,
+                                         bool runasowner)
+{
+       /*
+        * Init the SequenceSyncContext which we clean up after the sequence
+        * synchronization.
+        */
+       SequenceSyncContext = AllocSetContextCreate(CurrentMemoryContext,
+                                                                               
                "SequenceSyncContext",
+                                                                               
                ALLOCSET_DEFAULT_SIZES);
+
+       PG_TRY();
+       {
+               MemoryContext   oldctx;
+
+               oldctx = MemoryContextSwitchTo(SequenceSyncContext);
+
+               LogicalRepSyncSequences(conn, subid, subname, runasowner);
```

I think we can take the return value of LogicalRepSyncSequences and log an INFO 
to tell user the current status.

13 - 0002
```
+# Verify there is no logical replication apply worker running
+$result = $node_subscriber->safe_psql(
+       'postgres',
+       "SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'logical 
replication apply worker'");
+
+is($result, '0', 'no logical replication worker is running’);
```

Looks like a mismatch. The test checks “apply worker” but the error message 
mentions “logical replication worker”.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






Reply via email to