[ https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818614#comment-17818614 ]
Rui Fan commented on FLINK-34336: --------------------------------- Merged to master(1.20) via: e2e3de2d48e3f02b746bdbdcb4da7b0477986a11 > AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState > may hang sometimes > --------------------------------------------------------------------------------------------- > > Key: FLINK-34336 > URL: https://issues.apache.org/jira/browse/FLINK-34336 > Project: Flink > Issue Type: Bug > Components: Tests > Affects Versions: 1.19.0, 1.20.0 > Reporter: Rui Fan > Assignee: Rui Fan > Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.19.0 > > > AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState > may hang in > waitForRunningTasks({color:#9876aa}restClusterClient{color}{color:#cc7832}, > {color}jobID{color:#cc7832}, {color}parallelism2){color:#cc7832};{color} > h2. Reason: > The job has 2 tasks(vertices), after calling updateJobResourceRequirements. > The source parallelism isn't changed (It's parallelism) , and the > FlatMapper+Sink is changed from parallelism to parallelism2. > So we expect the task number should be parallelism + parallelism2 instead of > parallelism2. > > h2. Why it can be passed for now? > Flink 1.19 supports the scaling cooldown, and the cooldown time is 30s by > default. It means, flink job will rescale job 30 seconds after > updateJobResourceRequirements is called. > > So the running tasks are old parallelism when we call > waitForRunningTasks({color:#9876aa}restClusterClient{color}{color:#cc7832}, > {color}jobID{color:#cc7832}, {color}parallelism2){color:#cc7832};. {color} > IIUC, it cannot be guaranteed, and it's unexpected. > > h2. How to reproduce this bug? > [https://github.com/1996fanrui/flink/commit/ffd713e24d37db2c103e4cd4361d0cd916d0d2f6] > * Disable the cooldown > * Sleep for a while before waitForRunningTasks > If so, the job running in new parallelism, so `waitForRunningTasks` will hang > forever. -- This message was sent by Atlassian Jira (v8.20.10#820010)