[ https://issues.apache.org/jira/browse/FLINK-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119545#comment-16119545 ]
ASF GitHub Bot commented on FLINK-7352:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/4501

    [FLINK-7352] [tests] Stabilize ExecutionGraphRestartTest

    ## What is the purpose of the change

    Introduce an explicit waiting for the deployment of tasks. This replaces the loose ordering induced by Thread.sleep and fixes the race conditions caused by it.

    ## Brief change log

    - Introduce `WaitForTasks` consumer which is given to the `SimpleAckingTaskManagerGateway`
    - Using a single `SimpleAckingTaskManagerGateway` to receive all task submission calls

    ## Verifying this change

    This change is a trivial rework / code cleanup without any test coverage.

    ## Does this pull request potentially affect one of the following parts:

    - Dependencies (does it add or upgrade a dependency): (no)
    - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
    - The serializers: (no)
    - The runtime per-record code paths (performance sensitive): (no)
    - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

    ## Documentation

    - Does this pull request introduce a new feature? (no)
    - If yes, how is the feature documented? (not applicable)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixExecutionGraphRestartTest

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4501.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4501

----
commit 40cd0c860dd600ce2baa69b0f0ba8cf7a787ff63
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2017-08-09T07:57:56Z

    [FLINK-7352] [tests] Stabilize ExecutionGraphRestartTest

    Introduce an explicit waiting for the deployment of tasks.
    This replaces the loose ordering induced by Thread.sleep and fixes the race conditions caused by it.

----

> ExecutionGraphRestartTest timeouts
> ----------------------------------
>
>                 Key: FLINK-7352
>                 URL: https://issues.apache.org/jira/browse/FLINK-7352
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, Tests
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Nico Kruber
>            Assignee: Till Rohrmann
>            Priority: Critical
>              Labels: test-stability
>
> Recently, I received timeouts from some tests in
> {{ExecutionGraphRestartTest}} like this
> {code}
> Tests in error:
>   ExecutionGraphRestartTest.testConcurrentLocalFailAndRestart:638 » Timeout
> {code}
> This particular instance is from 1.3.2 RC2 and stuck in
> {{ExecutionGraphTestUtils#waitUntilDeployedAndSwitchToRunning()}} but I also
> had instances stuck in {{ExecutionGraphTestUtils#waitUntilJobStatus}}.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
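
For readers unfamiliar with the pattern, the explicit waiting described in the pull request above can be sketched roughly as follows. This is not the code from the pull request: the `WaitForTasks` class below is a hypothetical stand-in that only borrows the name, it takes plain `String` task IDs, and `deployAndWait` and the thread pool that simulates task submissions are assumptions made for illustration. The idea is that the test blocks on a latch that is counted down once per observed submission, rather than sleeping for a fixed time and hoping the submissions have happened.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class WaitForTasksSketch {

    /** Hypothetical consumer: counts task submissions so a test can block until all arrived. */
    public static final class WaitForTasks implements Consumer<String> {
        private final CountDownLatch latch;

        public WaitForTasks(int expectedTasks) {
            this.latch = new CountDownLatch(expectedTasks);
        }

        @Override
        public void accept(String taskId) {
            // Called once per submitted task (in a real test, by the gateway).
            latch.countDown();
        }

        /** Returns true if all expected submissions were seen before the timeout. */
        public boolean await(long timeout, TimeUnit unit) throws InterruptedException {
            return latch.await(timeout, unit);
        }
    }

    /** Simulates asynchronous task submissions and waits for them explicitly. */
    public static boolean deployAndWait(int numTasks, long timeoutMillis)
            throws InterruptedException {
        WaitForTasks waitForTasks = new WaitForTasks(numTasks);
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            for (int i = 0; i < numTasks; i++) {
                final String taskId = "task-" + i;
                // Stand-in for the gateway receiving a task submission call.
                executor.execute(() -> waitForTasks.accept(taskId));
            }
            // Explicit synchronization point instead of Thread.sleep(...).
            return waitForTasks.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(deployAndWait(4, 5000));
    }
}
```

In this sketch the timeout only bounds how long a failing run can hang. A passing run continues as soon as the last submission is observed, which removes the timing dependence that a fixed sleep introduces.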