[
https://issues.apache.org/jira/browse/FLINK-38223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gustavo de Morais updated FLINK-38223:
--------------------------------------
Description:
ExecutionGraphRestartTest and ExecutionGraphCoLocationRestartTest are both
really flaky on master. Tests like testConstraintsAfterRestart and
testCancelWhileFailing constantly fail CI pipelines with errors like
{code:java}
Aug 11 00:04:37 00:04:37.047 [ERROR] Errors:
Aug 11 00:04:37 00:04:37.047 [ERROR] ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart:113 » Timeout Not all executions fulfilled the predicate in time.
{code}
{code:java}
org.opentest4j.AssertionFailedError: expected: RUNNING but was: FAILING
    at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testCancelWhileFailing(ExecutionGraphRestartTest.java:217)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
    Suppressed: java.lang.IllegalStateException: Free slot must not be used.
        at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
        at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:564)
        at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.freeAndReleaseSlots(DefaultDeclarativeSlotPool.java:507)
        at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:477)
        at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.internalReleaseTaskManager(DeclarativeSlotPoolService.java:281)
        at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.releaseAllTaskManagers(DeclarativeSlotPoolService.java:271)
        at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.close(DeclarativeSlotPoolService.java:160)
        at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testCancelWhileFailing(ExecutionGraphRestartTest.java:200)
        ... 7 more
{code}
{code:java}
java.util.concurrent.TimeoutException: Not all executions fulfilled the predicate in time.
    at org.apache.flink.runtime.executiongraph.ExecutionGraphTestUtils.waitForAllExecutionsPredicate(ExecutionGraphTestUtils.java:203)
    at org.apache.flink.runtime.executiongraph.ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart(ExecutionGraphCoLocationRestartTest.java:113)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
{code}
was:
testConstraintsAfterRestart currently seems to be quite flaky, causing CI
pipelines to fail with
{code:java}
java.util.concurrent.TimeoutException: Not all executions fulfilled the predicate in time.
    at org.apache.flink.runtime.executiongraph.ExecutionGraphTestUtils.waitForAllExecutionsPredicate(ExecutionGraphTestUtils.java:203)
    at org.apache.flink.runtime.executiongraph.ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart(ExecutionGraphCoLocationRestartTest.java:113)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
{code}
> ExecutionGraphRestartTest and ExecutionGraphCoLocationRestartTest are flaky
> on master
> -------------------------------------------------------------------------------------
>
> Key: FLINK-38223
> URL: https://issues.apache.org/jira/browse/FLINK-38223
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 2.1
> Reporter: Gustavo de Morais
> Priority: Major
> Fix For: 2.2
>