[ 
https://issues.apache.org/jira/browse/FLINK-39414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073707#comment-18073707
 ] 

Taranpreet Kaur commented on FLINK-39414:
-----------------------------------------

I also observed flakiness in the test 
testEnableBatchJobRecoveryAndNotRetainPartitions. The cause is that two 
things happen at exactly the same moment, when the test's duration field 
(10s) elapses:
1. The Pekko scheduler fires the partition cleanup callback (scheduled at 
REGISTRATION_TIMEOUT = duration)
2. Future.get(duration) times out inside willNotCompleteWithin

These race each other. If the scheduled task completes the future at or just 
before duration, willNotCompleteWithin fails because the future did complete 
within the window.
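
A rough illustration of the race, not the actual Flink test code: the 
sketch below uses a plain ScheduledExecutorService in place of the Pekko 
scheduler, and staysIncompleteFor is a hypothetical stand-in for 
willNotCompleteWithin. The class name, the helper, and the 500 ms / 100 ms 
values are assumptions chosen for illustration. One possible way to avoid 
the race is to check a window that is clearly shorter than the scheduled 
delay.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutRaceSketch {

    /**
     * Hypothetical stand-in for willNotCompleteWithin: returns true only if
     * the future is still incomplete after waiting for the given window.
     */
    public static boolean staysIncompleteFor(CompletableFuture<?> future, Duration window) {
        try {
            future.get(window.toMillis(), TimeUnit.MILLISECONDS);
            return false; // completed normally inside the window
        } catch (TimeoutException e) {
            return true; // still pending when the window elapsed
        } catch (ExecutionException e) {
            return false; // completed exceptionally, which is still "completed"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            // Plays the role of REGISTRATION_TIMEOUT (value is illustrative).
            Duration cleanupDelay = Duration.ofMillis(500);
            CompletableFuture<Void> cleanup = new CompletableFuture<>();
            scheduler.schedule(
                    () -> cleanup.complete(null),
                    cleanupDelay.toMillis(),
                    TimeUnit.MILLISECONDS);

            // Racy variant: staysIncompleteFor(cleanup, cleanupDelay) waits exactly
            // as long as the scheduled delay, so either side can win.
            // Variant with margin: wait for a window well below the delay.
            boolean pending = staysIncompleteFor(cleanup, cleanupDelay.dividedBy(5));
            System.out.println("still pending after short window: " + pending);
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

With the window equal to the scheduled delay, the outcome depends on which 
timer fires first. Keeping the window well below the delay removes that 
dependency, at the cost of asserting over a shorter period.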

> TaskExecutorPartitionLifecycleTest#testEnableBatchJobRecoveryAnd* take long 
> time to complete
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-39414
>                 URL: https://issues.apache.org/jira/browse/FLINK-39414
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 2.0.1, 2.2.0, 2.1.1
>            Reporter: Matthias Pohl
>            Priority: Minor
>              Labels: pull-request-available, test-stability
>
> There are two test methods in {{TaskExecutorPartitionLifecycleTest}} which 
> take quite long to finish:
> * {{testEnableBatchJobRecoveryAndNotRetainPartitions}} takes 15s
> * {{testEnableBatchJobRecoveryAndRetainPartitions}} takes 30s
> We might want to look into why they take that long and maybe add some 
> improvement here to speed up CI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
