zhangshenghang opened a new pull request, #10687: URL: https://github.com/apache/seatunnel/pull/10687
### Purpose of this pull request

This PR fixes an engine-side terminal-state convergence bug after worker node failure.

When a worker goes offline, the engine can start cleaning distributed state from the running job state maps before all asynchronous task/pipeline/job callbacks have finished. In the current code path, `PhysicalVertex`, `SubPlan`, and `PhysicalPlan` all assume the current state still exists in the distributed map and call `current.equals(...)` / `current.isEndState()` directly. If the state has already been cleaned, those callbacks can throw `NullPointerException` and interrupt terminal-state convergence.

This PR makes those state transitions null-safe by:

- falling back to the local in-memory state when the distributed state has already been cleaned
- skipping timestamp persistence when the timestamp map entry has already been removed
- avoiding re-writing already-cleaned state back into the distributed maps

It also adds targeted regression tests for the cleanup-race path and an engine E2E scenario for the `BATCH + no checkpoint + job.retry.times=0` failure path, so node shutdown must converge to a terminal state instead of hanging in the middle.

### Does this PR introduce _any_ user-facing change?

No user-facing API/config change. This changes failure handling so jobs are less likely to hang in an intermediate state after node failure.

### How was this patch tested?

Verified locally:

- `./mvnw -nsu -pl seatunnel-engine/seatunnel-engine-server spotless:check`
- `./mvnw -nsu -pl seatunnel-e2e/seatunnel-engine-e2e/connector-seatunnel-e2e-base spotless:check`

Additional notes:

- Added targeted regression test: `StateTransitionCleanupTest`
- Added engine E2E coverage: `ClusterFailureNoRestoreIT`
- Full Maven test/compile validation in this checkout is currently blocked by unrelated upstream build issues in other reactor modules (for example `seatunnel-config-shade`), so this PR is opened as draft.
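For readers who have not seen the diff, the three null-safety rules listed under "Purpose of this pull request" can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the class `NullSafeStateTransition`, the `State` enum, the method `updateState`, and the two plain `Map` fields (standing in for the distributed state and timestamp maps) are all hypothetical names chosen for illustration, and the timestamp entry is simplified to a single `Long`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: hypothetical names, not the SeaTunnel implementation.
final class NullSafeStateTransition {

    enum State {
        RUNNING, FAILING, FAILED;

        boolean isEndState() {
            return this == FAILED;
        }
    }

    private final String key;
    // Stand-ins for the distributed maps that cleanup may empty at any time.
    private final Map<String, State> distributedStates;
    private final Map<String, Long> stateTimestamps;
    // Local in-memory copy used as a fallback once the distributed entry is gone.
    private volatile State localState;

    NullSafeStateTransition(String key,
                            Map<String, State> distributedStates,
                            Map<String, Long> stateTimestamps,
                            State initial) {
        this.key = key;
        this.distributedStates = distributedStates;
        this.stateTimestamps = stateTimestamps;
        this.localState = initial;
    }

    State localState() {
        return localState;
    }

    synchronized boolean updateState(State expected, State target) {
        State stored = distributedStates.get(key);
        boolean cleaned = (stored == null);

        // Rule 1: fall back to the local state instead of dereferencing null.
        State current = cleaned ? localState : stored;
        if (current.isEndState() || current != expected) {
            return false;
        }
        localState = target;

        // Rule 3: never re-create an entry that cleanup already removed.
        // replace(k, old, new) writes only if the key is still mapped to 'stored'.
        if (!cleaned) {
            distributedStates.replace(key, stored, target);
        }

        // Rule 2: persist the timestamp only if its entry still exists.
        stateTimestamps.computeIfPresent(key, (k, old) -> System.currentTimeMillis());
        return true;
    }

    public static void main(String[] args) {
        Map<String, State> states = new ConcurrentHashMap<>();
        Map<String, Long> timestamps = new ConcurrentHashMap<>();
        states.put("task-1", State.RUNNING);
        timestamps.put("task-1", 0L);

        NullSafeStateTransition task =
                new NullSafeStateTransition("task-1", states, timestamps, State.RUNNING);

        // Simulate cleanup racing ahead of the asynchronous callback.
        states.remove("task-1");
        timestamps.remove("task-1");

        System.out.println(task.updateState(State.RUNNING, State.FAILING));
        System.out.println(states.containsKey("task-1"));
    }
}
```

In this sketch the callback still advances the local state to a terminal value after cleanup, which is the convergence property the PR is after, while the cleaned maps stay empty. Using `replace` and `computeIfPresent` rather than `put` is one way to keep a late callback from resurrecting removed entries; the actual patch may achieve this differently.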
