github-actions[bot] commented on code in PR #63514:
URL: https://github.com/apache/doris/pull/63514#discussion_r3287566954


##########
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/JdbcIncrementalSourceReader.java:
##########
@@ -443,7 +444,7 @@ private SplitReadResult prepareStreamSplit(

Review Comment:
   Making the whole `pollRecords()` method synchronized means the reader 
monitor stays held while the stream path is blocked in 
`streamReader.pollSplitRecords()` (line 619). `close()` cancels 
`activePollFutures` outside the monitor, which helps the snapshot path, but the 
stream/binlog path has no future to cancel, so `close()` then waits for the 
same monitor before `finishSplitRecords()` can close `streamReader`. During 
DROP/cancel of a streaming PG job with no incoming records, the close request 
can therefore block behind the polling thread instead of closing the 
replication reader that would unblock it, leaving cleanup and slot release 
stuck. Please avoid holding this monitor across the blocking stream poll, or 
let `close()` close/cancel the active stream reader outside the monitor as 
well.
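
   A minimal sketch of the first option (not holding the monitor across the 
blocking poll), under stated assumptions: `IdleStreamReader` and 
`NarrowLockReader` are hypothetical stand-ins, not classes from this PR, and 
the stand-in reader models "close unblocks the poll" with a simple flag. The 
monitor only guards the bookkeeping; the blocking call runs after it is 
released, so `close()` can always acquire it promptly.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a stream reader whose poll blocks until data arrives. */
final class IdleStreamReader implements AutoCloseable {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private volatile boolean closed;

    /** Waits for the next event; returns null once the reader has been closed. */
    String pollSplitRecords() throws InterruptedException {
        while (!closed) {
            String event = events.poll(20, TimeUnit.MILLISECONDS);
            if (event != null) {
                return event;
            }
        }
        return null;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Hypothetical reader that guards its state with a monitor but polls outside it. */
final class NarrowLockReader {
    private final Object monitor = new Object();
    private IdleStreamReader activeReader; // guarded by monitor
    private boolean closed;                // guarded by monitor

    String pollRecords(IdleStreamReader reader) throws InterruptedException {
        synchronized (monitor) {
            if (closed) {
                return null;
            }
            activeReader = reader; // publish so close() can find it
        }
        // The blocking call runs with the monitor released.
        return reader.pollSplitRecords();
    }

    void close() {
        IdleStreamReader toClose;
        synchronized (monitor) { // held briefly, never across a blocking poll
            closed = true;
            toClose = activeReader;
            activeReader = null;
        }
        if (toClose != null) {
            toClose.close(); // unblocks a poller that is waiting for events
        }
    }

    /** Demo: true if close() releases a poller that has no incoming records. */
    static boolean closeUnblocksIdlePoll() {
        NarrowLockReader reader = new NarrowLockReader();
        IdleStreamReader stream = new IdleStreamReader();
        Thread poller = new Thread(() -> {
            try {
                reader.pollRecords(stream);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        poller.start();
        try {
            Thread.sleep(100); // give the poller time to reach the blocking call
            reader.close();
            poller.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !poller.isAlive();
    }
}
```

   Because `close()` flips `closed` and takes the active reader in one short 
critical section, a poller either registered its reader before that section 
(and gets closed) or enters afterwards and returns immediately.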



##########
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/mysql/MySqlSourceReader.java:
##########
@@ -429,7 +429,7 @@ private SplitReadResult prepareSnapshotSplits(
     }
 
     /** Prepare binlog split */
-    private SplitReadResult prepareBinlogSplit(
+    private synchronized SplitReadResult prepareBinlogSplit(
             Map<String, Object> offsetMeta, JobBaseRecordRequest baseReq) throws Exception {
         // Load tableSchemas from FE if available (avoids re-discover on restart)
         tryLoadTableSchemasFromRequest(baseReq);

Review Comment:
   Same close-path regression as the JDBC/PG reader: this synchronized method 
holds the instance monitor while the binlog path calls 
`binlogReader.pollSplitRecords()` (line 632), but `close()` cannot call 
`finishSplitRecords()`/`binlogReader.close()` until it obtains that monitor. 
Because there are no `activePollFutures` in the binlog path, a DROP/cancel 
while the binlog reader is waiting for events can block cleanup instead of 
interrupting the reader. Please do not hold the monitor across the blocking 
binlog poll, or have `close()` close/cancel the active binlog reader before it 
waits for the monitor.
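
   A minimal sketch of the second option (closing the active reader before 
waiting for the monitor), assuming the polling method stays synchronized. 
`IdleBinlogReader` and `EarlyCloseReader` are hypothetical names, not classes 
from this PR. The two volatile fields form a handshake: the poller publishes 
its reader and then checks `closing`, while `close()` sets `closing` and then 
reads the published reader, so at least one side always sees the other.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a binlog reader whose poll blocks until events arrive. */
final class IdleBinlogReader implements AutoCloseable {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private volatile boolean closed;

    /** Waits for the next event; returns null once the reader has been closed. */
    String pollSplitRecords() throws InterruptedException {
        while (!closed) {
            String event = events.poll(20, TimeUnit.MILLISECONDS);
            if (event != null) {
                return event;
            }
        }
        return null;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Hypothetical reader whose poll is synchronized but whose close() acts first. */
final class EarlyCloseReader {
    private volatile IdleBinlogReader activeReader;
    private volatile boolean closing;
    private boolean cleanedUp; // guarded by this

    synchronized String pollRecords(IdleBinlogReader reader) throws InterruptedException {
        activeReader = reader; // publish first...
        try {
            if (closing) {     // ...then re-check, so a concurrent close() is never missed
                return null;
            }
            return reader.pollSplitRecords(); // blocks while holding the monitor
        } finally {
            activeReader = null;
        }
    }

    void close() {
        closing = true;                         // set the flag first...
        IdleBinlogReader reader = activeReader; // ...then look for a blocked poller
        if (reader != null) {
            reader.close();                     // unblock it without the monitor
        }
        synchronized (this) {                   // now the monitor becomes available
            cleanedUp = true;
        }
    }

    /** Demo: true if close() releases a poller that is waiting for events. */
    static boolean closeUnblocksIdlePoll() {
        EarlyCloseReader reader = new EarlyCloseReader();
        IdleBinlogReader binlog = new IdleBinlogReader();
        Thread poller = new Thread(() -> {
            try {
                reader.pollRecords(binlog);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        poller.start();
        try {
            Thread.sleep(100); // give the poller time to reach the blocking call
            reader.close();
            poller.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !poller.isAlive();
    }
}
```

   This keeps the existing `synchronized` signature at the cost of two extra 
volatile fields; narrowing the monitor scope is likely simpler if the method 
body allows it.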



##########
regression-test/suites/job_p0/streaming_job/cdc/test_streaming_postgres_job_special_offset_restart_fe.groovy:
##########
@@ -0,0 +1,184 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import org.apache.doris.regression.suite.ClusterOptions
+import org.awaitility.Awaitility
+
+import static java.util.concurrent.TimeUnit.SECONDS
+
+// Mirror of test_streaming_mysql_job_special_offset_restart_fe for the PG path:
+// CREATE JOB with a JSON LSN offset, sync, restart FE, verify currentOffset
+// survives the replay and subsequent binlog DML still lands.
+//
+// PG-specific wrinkle: an auto-managed slot starts retaining WAL only at slot
+// creation time, so a CREATE-with-past-LSN against an auto slot would fail
+// because PG has already purged the requested LSN. We therefore pre-create a
+// user-provided slot first — that pins the WAL retention horizon back in time
+// far enough to make the LSN we capture valid.
+suite("test_streaming_postgres_job_special_offset_restart_fe",
+        "docker,pg,external_docker,external_docker_pg,nondatalake") {
+    def jobName = "test_streaming_pg_special_offset_restart_fe"
+    def options = new ClusterOptions()
+    options.setFeNum(1)
+    options.cloudMode = null
+
+    docker(options) {
+        def currentDb = (sql "select database()")[0][0]
+        def table1 = "special_offset_restart_pg_tbl"
+        def pgDB = "postgres"
+        def pgSchema = "cdc_test"
+        def pgUser = "postgres"
+        def pgPassword = "123456"
+        def userSlot = "special_offset_restart_slot"
+        def userPub = "special_offset_restart_pub"
+
+        sql """DROP JOB IF EXISTS where jobname = '${jobName}'"""
+        sql """drop table if exists ${currentDb}.${table1} force"""
+
+        String enabled = context.config.otherConfigs.get("enableJdbcTest")
+        if (enabled != null && enabled.equalsIgnoreCase("true")) {
+            String pg_port = context.config.otherConfigs.get("pg_14_port");
+            String externalEnvIp = context.config.otherConfigs.get("externalEnvIp")
+            String s3_endpoint = getS3Endpoint()
+            String bucket = getS3BucketName()
+            String driver_url = "https://${bucket}.${s3_endpoint}/regression/jdbc_driver/postgresql-42.5.0.jar";
+
+            // Setup: fresh PG table + fresh user slot/pub. Slot must be created
+            // BEFORE the LSN we capture below, otherwise PG would have purged
+            // the WAL covering that LSN by the time the job tries to replay it.
+            def lsnAtCreate = ""
+            connect("${pgUser}", "${pgPassword}", "jdbc:postgresql://${externalEnvIp}:${pg_port}/${pgDB}") {
+                sql """DROP TABLE IF EXISTS ${pgDB}.${pgSchema}.${table1}"""
+                sql """CREATE TABLE ${pgDB}.${pgSchema}.${table1} (
+                      "id" int PRIMARY KEY,
+                      "name" varchar(100)
+                    )"""
+                sql """DROP PUBLICATION IF EXISTS ${userPub}"""
+                sql """CREATE PUBLICATION ${userPub} FOR TABLE ${pgDB}.${pgSchema}.${table1}"""
+                def existing = sql """SELECT COUNT(1) FROM pg_replication_slots WHERE slot_name = '${userSlot}'"""
+                if (existing[0][0] != 0) {
+                    sql """SELECT pg_drop_replication_slot('${userSlot}')"""
+                }
+                sql """SELECT pg_create_logical_replication_slot('${userSlot}', 'pgoutput')"""
+
+                // Capture LSN AFTER slot creation, BEFORE the INSERTs the job will read.
+                def lsnRows = sql """SELECT pg_current_wal_lsn()::text"""
+                def lsnStr = lsnRows[0][0].toString()
+                def parts = lsnStr.split("/")
+                def high = Long.parseLong(parts[0], 16)
+                def low = Long.parseLong(parts[1], 16)
+                lsnAtCreate = String.valueOf((high << 32) + low)

Review Comment:
   PostgreSQL LSNs are unsigned 64-bit values, but this conversion builds the 
numeric offset with signed `Long` arithmetic. Once the high half reaches 
`0x80000000`, `(high << 32) + low` becomes negative and the test creates an 
invalid JSON `lsn` even though the source LSN is valid. The new 
`slot_lsn_advance` test already uses `BigInteger` for this reason; please use 
the same `new BigInteger(parts[0], 16).shiftLeft(32).add(new 
BigInteger(parts[1], 16)).toString()` conversion here.
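
   A small sketch of the difference, using a hypothetical helper class 
`LsnOffsets` (not part of this PR). `toUnsignedDecimal` applies the 
`BigInteger` conversion suggested above; `toSignedDecimal` reproduces the 
signed `long` arithmetic from the test purely for comparison.

```java
import java.math.BigInteger;

/** Hypothetical helper for converting PostgreSQL textual LSNs ("HIGH/LOW" in hex). */
final class LsnOffsets {
    private LsnOffsets() {}

    /** Converts an LSN such as "16/B374D848" to its unsigned 64-bit decimal form. */
    static String toUnsignedDecimal(String lsnText) {
        String[] parts = lsnText.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected HIGH/LOW hex LSN, got: " + lsnText);
        }
        return new BigInteger(parts[0], 16)
                .shiftLeft(32)
                .add(new BigInteger(parts[1], 16))
                .toString();
    }

    /** Signed-long variant, kept only to show where it overflows. */
    static String toSignedDecimal(String lsnText) {
        String[] parts = lsnText.split("/");
        long high = Long.parseLong(parts[0], 16);
        long low = Long.parseLong(parts[1], 16);
        return String.valueOf((high << 32) + low);
    }
}
```

   For the LSN `80000000/0`, `toUnsignedDecimal` returns `9223372036854775808` 
while `toSignedDecimal` returns `-9223372036854775808`; below that boundary the 
two agree.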



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

