laughingman7743 commented on PR #156:
URL: https://github.com/apache/flink-connector-jdbc/pull/156#issuecomment-4775237639

   ### CI failure is a pre-existing flaky test on `main`, not caused by this PR
   
   The red job (https://github.com/apache/flink-connector-jdbc/actions/runs/27995114250/job/82864660173)
fails in `DerbyDynamicTableSourceITCase.testLimit`:
   
   ```
   Caused by: java.sql.SQLNonTransientConnectionException: No current connection.  (ERROR 08003)
     at ...EmbedPreparedStatement.executeQuery(...)
   ```
   
   **This is not introduced by this PR:**
   - This branch is rebased on `af994651`. CI for that exact commit on `main` 
(https://github.com/apache/flink-connector-jdbc/actions/runs/27738188350) fails 
in the **same `DerbyDynamicTableSourceITCase` class with the same `ERROR 08003: 
No current connection`**, only a different method (`testProject`).
   - The latest `main` 
(https://github.com/apache/flink-connector-jdbc/actions/runs/27966883251) also 
fails CI on another flaky core ITCase 
(`JdbcSourceStreamRelatedITCase.testAtLeastOnceWithFailure`, 30s 
`TimeoutException`).
   - None of the failing test files are touched by this PR; changes are 
isolated to the new `flink-connector-jdbc-spanner` module plus 
catalog/dialect/lineage in core.
   
   ### Root cause of the flaky `testLimit`
   
   `testLimit` runs `SELECT * FROM t LIMIT 1` over a source split into **2 
partitions** (`scan.partition.num=2`). With `LIMIT 1` the job completes after 
the first row and cancels the source while the split-fetcher thread is still 
opening the **second** split. Cancellation tears the JDBC connection down, so 
the in-flight `JdbcSourceSplitReader.openResultSetForSplitWhenAtLeastOnce()` -> 
`statement.executeQuery()` runs against an already-closed connection -> `08003: 
No current connection`.
   
   Two gaps in `JdbcSourceSplitReader` make this fatal and racy:
   - `wakeUp()` is a no-op (https://github.com/apache/flink-connector-jdbc/blob/main/flink-connector-jdbc-core/src/main/java/org/apache/flink/connector/jdbc/core/datastream/source/reader/JdbcSourceSplitReader.java),
so there is no cooperative cancellation: the reader keeps opening the next 
split during shutdown.
   - `fetch()` rethrows any `SQLException` as a fatal `RuntimeException`, so a 
benign shutdown-time "connection closed" error fails the whole job.
   
   ### How the fix differs from the closed PR https://github.com/apache/flink-connector-jdbc/pull/191
   
   That PR only added a `resultSet.isClosed()` guard before `extract()` / 
`resultSet.next()` inside the record loop. But per the stack trace, the failure 
is at `executeQuery()` when **opening the next split**, a path that PR does not 
touch, so it would not reliably fix this. It was closed without merge.
   
   The fix I'd propose is different and targets the actual cause — cooperative 
cancellation:
   1. `wakeUp()` / `close()` set a `volatile boolean wakeup` flag.
   2. `checkSplitOrStartNext()` checks the flag before opening a new split (the 
`executeQuery()` path) and returns "no more records" instead of starting a 
query during shutdown.
   3. In `fetch()`, if a `SQLException` occurs while the reader is shutting 
down (flag set / thread interrupted), treat it as a graceful end-of-split 
instead of rethrowing — this closes the residual race window where cancellation 
lands during the Derby call.
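
A minimal sketch of the three steps above, using a simplified stand-in rather than the real `JdbcSourceSplitReader`. The names `CooperativeCancelSketch`, `SplitOpener`, and `Reader` are hypothetical and exist only for illustration; the actual change would go into `checkSplitOrStartNext()` and `fetch()` and may well look different.

```java
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

/** Simplified illustration only; not the real JdbcSourceSplitReader. */
public class CooperativeCancelSketch {

    /** Hypothetical stand-in for the path that calls executeQuery() for a split. */
    interface SplitOpener {
        List<String> open(String split) throws SQLException;
    }

    static final class Reader {
        private final Queue<String> pendingSplits = new ArrayDeque<>();
        private final SplitOpener opener;

        // Step 1: flag written by wakeUp()/close(), read by the fetcher thread.
        private volatile boolean wakeup;

        Reader(List<String> splits, SplitOpener opener) {
            this.pendingSplits.addAll(splits);
            this.opener = opener;
        }

        void wakeUp() {
            wakeup = true;
        }

        void close() {
            wakeup = true;
        }

        /** Returns the rows of the next split, or an empty list for "no more records". */
        List<String> fetch() {
            // Step 2: never start a new query once shutdown has been requested.
            if (wakeup || pendingSplits.isEmpty()) {
                return Collections.emptyList();
            }
            try {
                return opener.open(pendingSplits.poll());
            } catch (SQLException e) {
                // Step 3: an error that lands during shutdown ends the split quietly.
                if (wakeup || Thread.currentThread().isInterrupted()) {
                    return Collections.emptyList();
                }
                // Outside shutdown the error is still fatal, as before.
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) {
        Reader reader =
                new Reader(List.of("split-0", "split-1"), split -> List.of(split + ":row"));
        System.out.println(reader.fetch()); // first split is opened normally
        reader.wakeUp();                    // simulates cancellation after LIMIT 1
        System.out.println(reader.fetch()); // second split is never opened
    }
}
```

The flag check alone does not close the race, because cancellation can still land after the check and before the query returns; that is why the sketch also inspects the flag in the `catch` block.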
   
   This covers the `executeQuery()` failure point that #191 missed, and removes 
the "shutdown error -> job failure" behavior. I'm happy to open a separate, 
focused PR + JIRA for this so both this PR and `main` go green.
   

