Liu created FLINK-24174:
---------------------------
Summary: MiniClusterTestEnvironment's triggerTaskManagerFailover may get stuck in CommonTestUtils.waitForJobStatus()
Key: FLINK-24174
URL: https://issues.apache.org/jira/browse/FLINK-24174
Project: Flink
Issue Type: Improvement
Components: Test Infrastructure
Reporter: Liu
When writing TaskManager failover tests with the [unified testing framework for connectors|https://issues.apache.org/jira/browse/FLINK-19554], I found that triggerTaskManagerFailover may get stuck in CommonTestUtils.waitForJobStatus(), as follows:
# triggerTaskManagerFailover is called.
# The job status switches from RUNNING to RESTARTING.
# The job status switches from RESTARTING back to RUNNING.
# terminateTaskManager() completes.
# Only now is CommonTestUtils.waitForJobStatus() called. Because the job status is already RUNNING again, it never observes FAILING, FAILED or RESTARTING and does not exit.
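The sequence above can be reproduced without a cluster. The following is a minimal sketch, not Flink code: the Status enum, terminateTaskManagerSync() and waitForStatus() are hypothetical stand-ins for the MiniCluster, JobStatus and CommonTestUtils.waitForJobStatus(), and the 200 ms timeout replaces the 5-minute deadline. It only shows why a poller that starts after the job has gone RESTARTING and back to RUNNING cannot see RESTARTING.

```java
import java.time.Duration;
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class FailoverRaceDemo {
    // Hypothetical subset of job states; not Flink's JobStatus.
    enum Status { RUNNING, RESTARTING, FAILING, FAILED }

    static final AtomicReference<Status> jobStatus = new AtomicReference<>(Status.RUNNING);

    // Hypothetical blocking termination: by the time it returns, the job has
    // already gone RUNNING -> RESTARTING -> RUNNING (steps 2 to 4 above).
    static void terminateTaskManagerSync() {
        jobStatus.set(Status.RESTARTING);
        jobStatus.set(Status.RUNNING);
    }

    // Polls every 10 ms until an expected status is seen or the timeout elapses.
    static boolean waitForStatus(Set<Status> expected, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() - deadline < 0) {
            if (expected.contains(jobStatus.get())) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Step 5: the waiter starts only after termination returned, so it only ever sees RUNNING.
    public static boolean missedTransientStatus() {
        jobStatus.set(Status.RUNNING);
        terminateTaskManagerSync();
        boolean observed = waitForStatus(
                EnumSet.of(Status.FAILING, Status.FAILED, Status.RESTARTING),
                Duration.ofMillis(200)); // short timeout instead of 5 minutes for the demo
        return !observed;
    }

    public static void main(String[] args) {
        System.out.println("missed transient status: " + missedTransientStatus());
    }
}
```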
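One solution is to call terminateTaskManager() asynchronously and, at the same time, call CommonTestUtils.waitForJobStatus(), so that the job status is already being polled while the TaskManager is being terminated. The pseudocode could look as follows: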
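{code:java}
public void triggerTaskManagerFailover(JobClient jobClient, Runnable afterFailAction)
        throws Exception {
    // Start terminating the TaskManager without blocking; this assumes
    // terminateTaskManager() is changed to return a CompletableFuture<Void>.
    CompletableFuture<Void> completableFuture = terminateTaskManager();
    // Wait for the job to react to the lost TaskManager while termination is in flight.
    CommonTestUtils.waitForJobStatus(
            jobClient,
            Arrays.asList(JobStatus.FAILING, JobStatus.FAILED, JobStatus.RESTARTING),
            Deadline.fromNow(Duration.ofMinutes(5)));
    // Make sure termination itself has finished (and surface any exception it threw).
    completableFuture.get();
    afterFailAction.run();
    startTaskManager();
}
{code}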
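The same ordering can be checked with a self-contained, plain-JDK sketch. It is not Flink code: the Status enum, terminateTaskManagerAsync(), waitForStatus() and the timings are hypothetical stand-ins for the MiniCluster, JobClient and CommonTestUtils.waitForJobStatus(). Termination runs via CompletableFuture.runAsync, the status is polled concurrently, and the future is joined afterwards, mirroring the pseudocode above.

```java
import java.time.Duration;
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

public class AsyncFailoverSketch {
    // Hypothetical subset of job states; not Flink's JobStatus.
    public enum Status { RUNNING, RESTARTING, FAILING, FAILED }

    public static final AtomicReference<Status> jobStatus = new AtomicReference<>(Status.RUNNING);

    // Hypothetical asynchronous termination: the future completes only after the
    // job is RUNNING again; RESTARTING is held for ~200 ms to mimic a restart delay.
    static CompletableFuture<Void> terminateTaskManagerAsync() {
        return CompletableFuture.runAsync(() -> {
            sleepQuietly(50);
            jobStatus.set(Status.RESTARTING);
            sleepQuietly(200);
            jobStatus.set(Status.RUNNING);
        });
    }

    // Polls every 5 ms until an expected status is seen or the timeout elapses.
    static boolean waitForStatus(Set<Status> expected, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() - deadline < 0) {
            if (expected.contains(jobStatus.get())) {
                return true;
            }
            sleepQuietly(5);
        }
        return false;
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Start termination, wait for a failure status while it is in flight, then join.
    public static boolean triggerFailover() {
        jobStatus.set(Status.RUNNING);
        CompletableFuture<Void> termination = terminateTaskManagerAsync();
        boolean observed = waitForStatus(
                EnumSet.of(Status.FAILING, Status.FAILED, Status.RESTARTING),
                Duration.ofSeconds(5));
        termination.join(); // rethrows (wrapped) if termination failed
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("observed failure status: " + triggerFailover());
        System.out.println("final status: " + jobStatus.get());
    }
}
```

Polling can only catch a transient status that lasts longer than the polling interval; in this sketch RESTARTING is held for about 200 ms against a 5 ms poll, which is why the concurrent wait sees it.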