[ 
https://issues.apache.org/jira/browse/FLINK-38223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gustavo de Morais updated FLINK-38223:
--------------------------------------
    Description: 
Both of these suites are very flaky on master. Tests such as 
testConstraintsAfterRestart and testCancelWhileFailing keep failing 
CI pipelines with errors like:
{code:java}
Aug 11 00:04:37 00:04:37.047 [ERROR] Errors: 
Aug 11 00:04:37 00:04:37.047 [ERROR]   ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart:113 » Timeout Not all executions fulfilled the predicate in time.
{code}
{code:java}
org.opentest4j.AssertionFailedError: expected: RUNNING but was: FAILING
Expected :RUNNING
Actual   :FAILING
        at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testCancelWhileFailing(ExecutionGraphRestartTest.java:217)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
        Suppressed: java.lang.IllegalStateException: Free slot must not be used.
                at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
                at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:564)
                at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.freeAndReleaseSlots(DefaultDeclarativeSlotPool.java:507)
                at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:477)
                at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.internalReleaseTaskManager(DeclarativeSlotPoolService.java:281)
                at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.releaseAllTaskManagers(DeclarativeSlotPoolService.java:271)
                at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.close(DeclarativeSlotPoolService.java:160)
                at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testCancelWhileFailing(ExecutionGraphRestartTest.java:200)
                ... 7 more
 {code}
{code:java}
java.util.concurrent.TimeoutException: Not all executions fulfilled the predicate in time.
        at org.apache.flink.runtime.executiongraph.ExecutionGraphTestUtils.waitForAllExecutionsPredicate(ExecutionGraphTestUtils.java:203)
        at org.apache.flink.runtime.executiongraph.ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart(ExecutionGraphCoLocationRestartTest.java:113)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
 {code}
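
For context, a TimeoutException of this kind is characteristic of a polling wait whose deadline expires before the awaited condition holds. The sketch below is only an illustration of that pattern and is not the implementation in ExecutionGraphTestUtils; the names PollingWait and waitUntil, and the timeout and poll values, are hypothetical.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of Flink: polls a condition until a deadline. */
final class PollingWait {

    private PollingWait() {}

    /**
     * Re-checks the condition every pollMillis until it returns true.
     * Throws TimeoutException if timeoutMillis elapses first.
     */
    static void waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws TimeoutException, InterruptedException {
        // Use the monotonic clock so wall-clock adjustments cannot skew the deadline.
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Condition was not fulfilled in time.");
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // A condition that becomes true on the third check.
        AtomicInteger polls = new AtomicInteger();
        waitUntil(() -> polls.incrementAndGet() >= 3, 1_000, 10);
        System.out.println("condition met after " + polls.get() + " polls");

        // A condition that never becomes true hits the deadline instead.
        try {
            waitUntil(() -> false, 50, 10);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

With a fixed deadline like this, a test may fail either because the awaited state is never reached or because the machine was too slow to reach it in time; the trace alone does not say which applies here.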
 

  was:
The testConstraintsAfterRestart test currently seems to be quite flaky, causing CI 
pipelines to fail with
{code:java}
java.util.concurrent.TimeoutException: Not all executions fulfilled the predicate in time.
        at org.apache.flink.runtime.executiongraph.ExecutionGraphTestUtils.waitForAllExecutionsPredicate(ExecutionGraphTestUtils.java:203)
        at org.apache.flink.runtime.executiongraph.ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart(ExecutionGraphCoLocationRestartTest.java:113)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
 {code}
 


> ExecutionGraphRestartTest and ExecutionGraphCoLocationRestartTest are flaky 
> on master
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-38223
>                 URL: https://issues.apache.org/jira/browse/FLINK-38223
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 2.1
>            Reporter: Gustavo de Morais
>            Priority: Major
>             Fix For: 2.2
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
