MartijnVisser opened a new pull request, #28640:
URL: https://github.com/apache/flink/pull/28640

   ## What is the purpose of the change
   
   Backport of the FLINK-39921 and FLINK-39929 test-stability fixes from master to release-2.3.
   
   `ExecutionVertexCancelTest.testSendCancelAndReceiveFail` and `ExecutionTimeBasedSlowTaskDetectorTest` are flaky on release-2.3 nightly builds, failing with `IllegalStateException: BUG: trying to schedule a region which is not in CREATED state` at `startScheduling` (e.g. [build 76729](https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=76729&view=results), where `ExecutionVertexCancelTest` failed in both the `test_ci core` and `test_cron_azure core` legs).
   
   Root cause (same as on master): since FLINK-38114, TaskDeploymentDescriptor (TDD) creation is offloaded to the I/O executor, so the deploy continuations complete on background threads. The `ComponentMainThreadExecutorServiceAdapter.forMainThread()` test executor asserts that the caller is the test main thread, so these completions trip the assertion and surface as scheduling-state errors. FLINK-38114 is on release-2.3, so the assertion relaxation applied by the master fixes is equally justified here. This is a test-only change; the production scheduler is not affected.
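   The failure mode described above can be reproduced outside Flink with plain `CompletableFuture`. The following is a minimal, self-contained sketch, not Flink code: `ThreadChecker`, `strictChecker()`, `relaxedChecker()` and `continuationTripsCheck` are hypothetical stand-ins for the main-thread assertion of `ComponentMainThreadExecutorServiceAdapter.forMainThread()` and for the relaxed `NoMainThreadCheckComponentMainThreadExecutor`, and the offloaded work only imitates TDD creation. It relies on standard JDK semantics: a non-async continuation attached to a not-yet-completed future runs on the thread that completes that future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class MainThreadAssertionSketch {

    /** Hypothetical stand-in for a main-thread assertion; not a Flink class. */
    interface ThreadChecker {
        void assertRunningInMainThread();
    }

    /** Strict variant: only the thread that created the checker counts as "main". */
    static ThreadChecker strictChecker() {
        final Thread mainThread = Thread.currentThread();
        return () -> {
            if (Thread.currentThread() != mainThread) {
                throw new IllegalStateException(
                        "not in main thread: " + Thread.currentThread().getName());
            }
        };
    }

    /** Relaxed variant: skips the thread check entirely. */
    static ThreadChecker relaxedChecker() {
        return () -> {};
    }

    /** Returns true if a continuation of offloaded work trips the given check. */
    static boolean continuationTripsCheck(ThreadChecker checker) {
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Offloaded "descriptor creation"; blocks until the continuation is attached.
            CompletableFuture<String> descriptor =
                    CompletableFuture.supplyAsync(
                            () -> {
                                try {
                                    release.await();
                                } catch (InterruptedException e) {
                                    Thread.currentThread().interrupt();
                                }
                                return "descriptor";
                            },
                            ioExecutor);
            // Non-async stage on a pending future: it runs on the completing (I/O) thread.
            CompletableFuture<Void> deployed =
                    descriptor.thenAccept(d -> checker.assertRunningInMainThread());
            release.countDown();
            // Wait for the offloaded work and its continuation before inspecting the outcome.
            return deployed.handle((ignored, error) -> error != null).join();
        } finally {
            ioExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("strict check fails: " + continuationTripsCheck(strictChecker()));
        System.out.println("relaxed check fails: " + continuationTripsCheck(relaxedChecker()));
    }
}
```

   Run as-is, it should print `strict check fails: true` followed by `relaxed check fails: false`. The latch only makes the race deterministic: it guarantees the continuation is attached before the offloaded work completes, which mirrors the interleaving described above.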
   
   ## Brief change log
   
   Clean `git cherry-pick -x` of the merged master fixes (byte-identical, original authorship preserved):
   
     - FLINK-39921 (master a8d6a287ae2, #28449): use `NoMainThreadCheckComponentMainThreadExecutor` and wait for TDD creation via `ExecutionUtils.waitForTaskDeploymentDescriptorsCreation` in `ExecutionVertexCancelTest`.
     - FLINK-39929 (master 67eba458114, #28434): same treatment in `ExecutionTimeBasedSlowTaskDetectorTest`.
   
   The sibling fixes of this family (FLINK-39387, FLINK-39914, FLINK-39922) are already present on release-2.3.
   
   ## Verifying this change
   
   This change is already covered by existing tests: `ExecutionVertexCancelTest` and `ExecutionTimeBasedSlowTaskDetectorTest` were each run 4 times on this branch, and all 20 tests passed in every run.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no (test-only)
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes (Claude Code)
   
   Generated-by: Claude Code (Claude Fable 5)
   

