DanielCarter-stack commented on PR #10275:
URL: https://github.com/apache/seatunnel/pull/10275#issuecomment-3796688348
<!-- code-pr-reviewer -->
<!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10275", "part": 1,
"total": 1} -->
### MAJOR: Concurrent race condition: timing risk between splitsDiscovered
assignment and checkpoint
**Location**:
`seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplitEnumerator.java:63`
**Problem Description**:
According to the SourceSplitEnumerator interface documentation, run() and
snapshotState() may be called concurrently. The current code sets
splitsDiscovered = true after discoverySplits() completes, but not within a
synchronized block. If a checkpoint occurs during this period, splitsDiscovered
will be reinitialized to false upon recovery, causing registerReader() and
handleSplitRequest() to not trigger signalNoMoreSplits(), potentially
reintroducing the hang issue.
**Impact**:
Impact: 1) Theoretically may bypass this fix, with hang risk still present
after recovery; 2) Requires specific timing to trigger (discoverySplits
completes -> checkpoint -> splitsDiscovered=true), probability is low but
exists; 3) Violates best practices for state consistency.
### MINOR: Core field splitsDiscovered lacks JavaDoc comments
**Location**:
`seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplitEnumerator.java:47`
**Problem Description**:
splitsDiscovered is the core field of this fix, controlling whether to send
signalNoMoreSplits signals to readers. This field lacks JavaDoc comments
explaining: 1) The field's purpose and semantics; 2) Why the volatile keyword
is used; 3) Under what scenarios it is set to true; 4) Interaction
relationships with other methods (registerReader, handleSplitRequest).
**Impact**:
Impact: 1) Reduces code maintainability, future maintainers may not
understand its importance; 2) May be accidentally deleted or misused; 3)
Violates project coding standards (other fields in the same class have
comments).
### MINOR: Insufficient test coverage, missing key scenario validation
**Location**:
`seatunnel-connectors-v2/connector-fake/src/test/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplitEnumeratorTest.java:1`
**Problem Description**:
Although 2 test cases were added, key scenarios remain uncovered: 1)
Behavior of handleSplitRequest() (this is the modified method); 2) Behavior of
addSplitsBack() (triggers reallocation after modification); 3) Concurrent
scenarios (run() and snapshotState() called simultaneously, verifying the race
condition in FINDING-001); 4) Boundary conditions (empty splits, single reader,
multiple recoveries).
**Impact**:
Impact: 1) Untested code paths may contain bugs; 2) The concurrent risk in
FINDING-001 has no test validation; 3) Only one of the three modified methods
(handleSplitRequest, addSplitsBack, registerReader) is directly tested
(registerReader).
### MINOR: signalNoMoreSplits call lacks logging
**Location**:
`seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplitEnumerator.java:173`
**Problem Description**:
signalNoMoreSplits() is a critical operation to prevent reader hangs,
especially in recovery scenarios. However, this call has no logging, while
other important operations in the same class (assignSplit, discoverySplits)
have info-level logs. Missing logging causes: 1) Inability to confirm which
readers received the signal; 2) Difficult to debug recovery issues; 3)
Inability to verify through logs whether the fix is effective in production.
**Impact**:
Impact: 1) Reduces debuggability, making production issue troubleshooting
difficult; 2) Unable to monitor fix effectiveness through logs; 3) Inconsistent
with existing logging style (other critical operations have logs).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]