[ 
https://issues.apache.org/jira/browse/FLINK-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119492#comment-16119492
 ] 

Till Rohrmann commented on FLINK-7352:
--------------------------------------

I think [~StephanEwen] is right and the problem is at 
https://github.com/apache/flink/blob/master/flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/ExecutionGraphTestUtils.java#L203.
 You can reproduce it by removing the sleep there and adding a short sleep at 
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L401.

I think the solution would be to wait on the {{SimpleAckingTaskManagerGateway}} 
until it has received all task submissions before switching the {{Executions}} 
to running.
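
A rough sketch of what I mean, not the actual Flink code: {{CountingGateway}}, 
{{submitTask}}, {{awaitAllSubmissions}} and {{deployAndWait}} are hypothetical 
names made up for illustration, and a plain {{CountDownLatch}} stands in for 
whatever mechanism the real gateway would use. The idea is only that the test 
blocks until every submission has been acknowledged instead of sleeping.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SubmissionWaitSketch {

    /** Hypothetical stand-in for a test gateway that acknowledges task submissions. */
    static final class CountingGateway {
        private final CountDownLatch pendingSubmissions;
        private final AtomicInteger received = new AtomicInteger();

        CountingGateway(int expectedSubmissions) {
            this.pendingSubmissions = new CountDownLatch(expectedSubmissions);
        }

        /** Called once per deployed task; stands in for the real submit call. */
        void submitTask(String taskName) {
            received.incrementAndGet();
            pendingSubmissions.countDown();
        }

        /** Blocks until all expected submissions arrived; false on timeout. */
        boolean awaitAllSubmissions(long timeout, TimeUnit unit) throws InterruptedException {
            return pendingSubmissions.await(timeout, unit);
        }

        int receivedCount() {
            return received.get();
        }
    }

    /** Submits numTasks asynchronously and returns only once all have been received. */
    static int deployAndWait(int numTasks) throws InterruptedException {
        CountingGateway gateway = new CountingGateway(numTasks);
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            for (int i = 0; i < numTasks; i++) {
                final String name = "task-" + i;
                // Deployment happens on another thread, as it may in the real code path.
                executor.execute(() -> gateway.submitTask(name));
            }
            // Wait on the gateway instead of sleeping for a fixed amount of time.
            if (!gateway.awaitAllSubmissions(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Timed out waiting for task submissions");
            }
            // Only at this point would the test switch the executions to running.
            return gateway.receivedCount();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("submissions received: " + deployAndWait(8));
    }
}
```

With a latch like this the test no longer depends on how long the asynchronous 
deployment takes, which is what the fixed sleep was trying to approximate.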

> ExecutionGraphRestartTest timeouts
> ----------------------------------
>
>                 Key: FLINK-7352
>                 URL: https://issues.apache.org/jira/browse/FLINK-7352
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, Tests
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Nico Kruber
>            Priority: Critical
>              Labels: test-stability
>
> Recently, I received timeouts from some tests in 
> {{ExecutionGraphRestartTest}} like this:
> {code}
> Tests in error: 
>   ExecutionGraphRestartTest.testConcurrentLocalFailAndRestart:638 » Timeout
> {code}
> This particular instance is from 1.3.2 RC2 and was stuck in 
> {{ExecutionGraphTestUtils#waitUntilDeployedAndSwitchToRunning()}}, but I also 
> had instances stuck in {{ExecutionGraphTestUtils#waitUntilJobStatus}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
