MartijnVisser opened a new pull request, #28641: URL: https://github.com/apache/flink/pull/28641
## What is the purpose of the change

Backport of the async-TDD `ComponentMainThreadExecutor` test-stability fix family from master to release-2.2. This PR intentionally bundles five JIRAs because they form one coherent cluster with an internal dependency (the FLINK-39387 pick introduces `NoMainThreadCheckComponentMainThreadExecutor`, which the later picks need to compile); each is a separate clean `-x` cherry-pick, so individual reverts remain possible.

Observed on release-2.2 nightlies: `ExecutionVertexCancelTest.testSendCancelAndReceiveFail` and `ExecutionTimeBasedSlowTaskDetectorTest.testFinishedTaskNotExceedRatio` fail with `IllegalStateException: BUG: trying to schedule a region which is not in CREATED state` at `startScheduling` ([build 76677](https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=76677&view=results), `test_cron_jdk21 core` and `test_cron_azure core` legs).

Root cause (same as on master): since FLINK-38114, TaskDeploymentDescriptor creation is offloaded to the I/O executor and deploy continuations complete on background threads. The `ComponentMainThreadExecutorServiceAdapter.forMainThread()` test executor asserts the caller is the test main thread, so these completions trip the assertion. FLINK-38114 is on release-2.2, so the assertion relaxation applied by the master fixes is equally justified here.

Test-only change; the production scheduler is not affected. The same fixes are already on release-2.3 (partly inherited, partly via the parallel backport PR).

## Brief change log

Clean `git cherry-pick -x` of the merged master fixes, in dependency order (byte-identical, original authorship preserved):

- FLINK-39387 (master 106acbced18): prerequisite, introduces `NoMainThreadCheckComponentMainThreadExecutor` (absent on release-2.2).
- FLINK-39914 (master 6d47db1d541, #28398): fix flaky `TaskDeploymentDescriptorFactoryTest#testHybridVertexFinish`.
- FLINK-39922 (master 6d106cda908, #28425): fix flaky `AbstractAsyncRunnableStreamOperatorTest#testCheckpointDrain`.
- FLINK-39921 (master a8d6a287ae2, #28449): fix flaky `ExecutionVertexCancelTest.testSendCancelAndReceiveFail`.
- FLINK-39929 (master 67eba458114, #28434): fix flaky `ExecutionTimeBasedSlowTaskDetectorTest`.

## Verifying this change

This change is already covered by existing tests:

- `ExecutionVertexCancelTest` + `ExecutionTimeBasedSlowTaskDetectorTest`: 4 runs, 20/20 green each run.
- `TaskDeploymentDescriptorFactoryTest` + `AbstractAsyncRunnableStreamOperatorTest`: green (36/36 combined run).
- `flink-runtime` test-compile verified at every individual commit boundary.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no (test-only)
- The S3 file system connector: no

## Documentation

- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable

---

##### Was generative AI tooling used to co-author this PR?

- [X] Yes (Claude Code)

Generated-by: Claude Code (Claude Fable 5)
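
For reviewers unfamiliar with the failure mode described under "Root cause", here is a minimal, self-contained sketch of it. It does not use Flink classes: `CheckingExecutor`, `NoCheckExecutor`, and `deployContinuationPasses` are hypothetical names made up for illustration, and the sketch is not the implementation of `ComponentMainThreadExecutorServiceAdapter` or `NoMainThreadCheckComponentMainThreadExecutor`. It only assumes the behavior stated above: a continuation completes on an I/O thread, and a strict executor rejects callers that are not the test main thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class MainThreadAssertionSketch {

    /** Hypothetical stand-in for a test executor that checks the calling thread. */
    static class CheckingExecutor {
        private final Thread mainThread;

        CheckingExecutor(Thread mainThread) {
            this.mainThread = mainThread;
        }

        void assertRunningInMainThread() {
            if (Thread.currentThread() != mainThread) {
                throw new IllegalStateException(
                        "Not running in main thread: " + Thread.currentThread().getName());
            }
        }
    }

    /** Hypothetical relaxed variant: the thread check is a no-op. */
    static class NoCheckExecutor extends CheckingExecutor {
        NoCheckExecutor() {
            super(null);
        }

        @Override
        void assertRunningInMainThread() {
            // Intentionally empty: callers on any thread are accepted.
        }
    }

    /**
     * Simulates a deployment whose continuation is forced onto an I/O thread.
     * Returns true if the executor's thread check passed, false if it threw.
     */
    static boolean deployContinuationPasses(CheckingExecutor executor) throws Exception {
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture
                    .supplyAsync(() -> "descriptor", ioExecutor)
                    // thenApplyAsync pins the continuation to the I/O thread,
                    // so the outcome does not depend on timing.
                    .thenApplyAsync(descriptor -> {
                        executor.assertRunningInMainThread();
                        return true;
                    }, ioExecutor)
                    .exceptionally(failure -> false)
                    .get();
        } finally {
            ioExecutor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        CheckingExecutor strict = new CheckingExecutor(Thread.currentThread());
        System.out.println("strict check passes:  " + deployContinuationPasses(strict));
        System.out.println("relaxed check passes: " + deployContinuationPasses(new NoCheckExecutor()));
    }
}
```

The sketch uses `thenApplyAsync` so the continuation always lands on the background thread. With a plain `thenApply`, the continuation may instead run on the registering thread when the future has already completed, which is one way a check like this can pass on some runs and fail on others.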
