[
https://issues.apache.org/jira/browse/FLINK-39414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073707#comment-18073707
]
Taranpreet Kaur commented on FLINK-39414:
-----------------------------------------
I also observed flakiness in testEnableBatchJobRecoveryAndNotRetainPartitions.
The cause is that two things happen at exactly the same moment, when the
test's duration field (10s) elapses:
1. The Pekko scheduler fires the partition cleanup callback (scheduled at
REGISTRATION_TIMEOUT = duration)
2. Future.get(duration) times out inside willNotCompleteWithin
These race each other. If the scheduled task completes the future at or just
before duration, willNotCompleteWithin fails because the future did complete
within the window.
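
To illustrate the race outside of Flink, here is a minimal sketch using only
java.util.concurrent. It is not the actual test code: TimeoutRaceSketch,
staysIncomplete and this reimplementation of willNotCompleteWithin are
hypothetical stand-ins, and a ScheduledExecutorService plays the role of the
Pekko scheduler. The delays are arbitrary values picked to keep the run short.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutRaceSketch {

    // Hypothetical stand-in for the matcher used in the test: returns true
    // if the future is still incomplete after waiting up to waitMillis.
    static boolean willNotCompleteWithin(CompletableFuture<?> future, long waitMillis)
            throws InterruptedException, ExecutionException {
        try {
            future.get(waitMillis, TimeUnit.MILLISECONDS);
            return false; // completed inside the window
        } catch (TimeoutException e) {
            return true; // still pending when the window closed
        }
    }

    // Schedules a task that completes a future after cleanupDelayMillis
    // (the role of the cleanup callback), then reports whether the future
    // stayed incomplete for waitMillis.
    static boolean staysIncomplete(long cleanupDelayMillis, long waitMillis)
            throws Exception {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        try {
            CompletableFuture<Void> released = new CompletableFuture<>();
            scheduler.schedule(
                    () -> { released.complete(null); },
                    cleanupDelayMillis,
                    TimeUnit.MILLISECONDS);
            return willNotCompleteWithin(released, waitMillis);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Wait window clearly shorter than the scheduled delay.
        System.out.println("wait < delay:  " + staysIncomplete(2000, 100));
        // Wait window clearly longer than the scheduled delay.
        System.out.println("wait > delay:  " + staysIncomplete(50, 2000));
        // Both deadlines identical: the result depends on thread timing.
        System.out.println("wait == delay: " + staysIncomplete(200, 200));
    }
}
```

With a clear margin between the two deadlines the outcome should be stable in
either direction. Only the last call, where both deadlines coincide, can go
either way from run to run, which matches the flakiness described above.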
> TaskExecutorPartitionLifecycleTest#testEnableBatchJobRecoveryAnd* take long
> time to complete
> --------------------------------------------------------------------------------------------
>
> Key: FLINK-39414
> URL: https://issues.apache.org/jira/browse/FLINK-39414
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 2.0.1, 2.2.0, 2.1.1
> Reporter: Matthias Pohl
> Priority: Minor
> Labels: pull-request-available, test-stability
>
> There are two test methods in {{TaskExecutorPartitionLifecycleTest}} which
> take quite long to finish:
> * {{testEnableBatchJobRecoveryAndNotRetainPartitions}} takes 15s
> * {{testEnableBatchJobRecoveryAndRetainPartitions}} takes 30s
> We might want to look into why they take that long and maybe add some
> improvement here to speed up CI.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)