MartijnVisser opened a new pull request, #28640: URL: https://github.com/apache/flink/pull/28640
## What is the purpose of the change

Backport of the FLINK-39921 and FLINK-39929 test-stability fixes from master to release-2.3.

`ExecutionVertexCancelTest.testSendCancelAndReceiveFail` and `ExecutionTimeBasedSlowTaskDetectorTest` flake on release-2.3 nightlies with `IllegalStateException: BUG: trying to schedule a region which is not in CREATED state` at `startScheduling` (e.g. [build 76729](https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=76729&view=results), where `ExecutionVertexCancelTest` failed in both the `test_ci core` and `test_cron_azure core` legs).

Root cause (same as on master): since FLINK-38114, TaskDeploymentDescriptor creation is offloaded to the I/O executor and the deploy continuations complete on background threads. The `ComponentMainThreadExecutorServiceAdapter.forMainThread()` test executor asserts the caller is the test main thread, so these completions trip the assertion and surface as scheduling-state errors. FLINK-38114 is on release-2.3, so the assertion relaxation applied by the master fixes is equally justified here.

Test-only change; the production scheduler is not affected.

## Brief change log

Clean `git cherry-pick -x` of the merged master fixes (byte-identical, original authorship preserved):

- FLINK-39921 (master a8d6a287ae2, #28449): use `NoMainThreadCheckComponentMainThreadExecutor` and wait for TDD creation via `ExecutionUtils.waitForTaskDeploymentDescriptorsCreation` in `ExecutionVertexCancelTest`.
- FLINK-39929 (master 67eba458114, #28434): same treatment in `ExecutionTimeBasedSlowTaskDetectorTest`.

The sibling fixes of this family (FLINK-39387, FLINK-39914, FLINK-39922) are already present on release-2.3.

## Verifying this change

This change is already covered by existing tests: ran `ExecutionVertexCancelTest` and `ExecutionTimeBasedSlowTaskDetectorTest` 4x each on this branch, 20/20 tests green every run.
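For reviewers unfamiliar with the failure mode, the root cause described above can be reproduced in miniature with plain `java.util.concurrent`. This is only an illustrative sketch, not Flink code: the class `MainThreadCheckSketch`, the method `deployCompletesCleanly`, and the `"descriptor"` placeholder are hypothetical names, and the `assertMainThread` flag merely stands in for the difference between a main-thread-asserting test executor and one that skips the check.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MainThreadCheckSketch {

    /**
     * Simulates a deployment whose descriptor is built on an I/O thread.
     * Returns true if the continuation finished normally, false if the
     * (hypothetical) main-thread assertion failed.
     */
    public static boolean deployCompletesCleanly(boolean assertMainThread) throws Exception {
        Thread mainThread = Thread.currentThread();
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        // Holds the I/O task back until the continuation is registered, so the
        // continuation deterministically runs on the I/O thread.
        CountDownLatch continuationRegistered = new CountDownLatch(1);
        try {
            CompletableFuture<String> deployed =
                    CompletableFuture.supplyAsync(
                                    () -> {
                                        try {
                                            continuationRegistered.await();
                                        } catch (InterruptedException e) {
                                            Thread.currentThread().interrupt();
                                            throw new IllegalStateException(e);
                                        }
                                        return "descriptor"; // placeholder payload
                                    },
                                    ioExecutor)
                            .thenApply(
                                    descriptor -> {
                                        // Stand-in for the test executor's thread check.
                                        if (assertMainThread
                                                && Thread.currentThread() != mainThread) {
                                            throw new IllegalStateException(
                                                    "not running in the main thread");
                                        }
                                        return descriptor + " deployed";
                                    });
            continuationRegistered.countDown();
            try {
                // Waiting for the background work before asserting on its result.
                deployed.get(5, TimeUnit.SECONDS);
                return true;
            } catch (ExecutionException e) {
                return false;
            }
        } finally {
            ioExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("strict check:  " + deployCompletesCleanly(true));
        System.out.println("relaxed check: " + deployCompletesCleanly(false));
    }
}
```

With the strict check the continuation fails because it runs on the I/O thread; with the check relaxed it completes. The explicit `get` before returning is a rough analogue of waiting for the asynchronous work to finish before the test asserts on state.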
## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no (test-only)
- The S3 file system connector: no

## Documentation

- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable

---

##### Was generative AI tooling used to co-author this PR?

- [X] Yes (Claude Code)

Generated-by: Claude Code (Claude Fable 5)
