[ 
https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818400#comment-17818400
 ] 

Matthias Pohl commented on FLINK-34336:
---------------------------------------

https://github.com/apache/flink/actions/runs/7938595320/job/21677775512#step:10:11191

> AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState 
> may hang sometimes
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-34336
>                 URL: https://issues.apache.org/jira/browse/FLINK-34336
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 1.19.0, 1.20.0
>            Reporter: Rui Fan
>            Assignee: Rui Fan
>            Priority: Major
>              Labels: pull-request-available, test-stability
>             Fix For: 1.19.0
>
>
> AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState 
> may hang in 
> waitForRunningTasks({color:#9876aa}restClusterClient{color}{color:#cc7832}, 
> {color}jobID{color:#cc7832}, {color}parallelism2){color:#cc7832};{color}
> h2. Reason:
> The job has 2 tasks(vertices), after calling updateJobResourceRequirements. 
> The source parallelism isn't changed (It's parallelism) , and the 
> FlatMapper+Sink is changed from  parallelism to parallelism2.
> So we expect the task number should be parallelism + parallelism2 instead of 
> parallelism2.
>  
> h2. Why it can be passed for now?
> Flink 1.19 supports the scaling cooldown, and the cooldown time is 30s by 
> default. It means, flink job will rescale job 30 seconds after 
> updateJobResourceRequirements is called.
>  
> So the running tasks are old parallelism when we call 
> waitForRunningTasks({color:#9876aa}restClusterClient{color}{color:#cc7832}, 
> {color}jobID{color:#cc7832}, {color}parallelism2){color:#cc7832};. {color}
> IIUC, it cannot be guaranteed, and it's unexpected.
>  
> h2. How to reproduce this bug?
> [https://github.com/1996fanrui/flink/commit/ffd713e24d37db2c103e4cd4361d0cd916d0d2f6]
>  * Disable the cooldown
>  * Sleep for a while before waitForRunningTasks
> If so, the job running in new parallelism, so `waitForRunningTasks` will hang 
> forever.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to