[ https://issues.apache.org/jira/browse/FLINK-38223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gustavo de Morais updated FLINK-38223:
--------------------------------------
    Description: 
Both of these suites are flaky on master. Tests such as
testConstraintsAfterRestart and testCancelWhileFailing are constantly failing
CI pipelines with errors like the ones below.

You can reproduce the failures by running the suites locally.
{code:java}
Aug 11 00:04:37 00:04:37.047 [ERROR] Errors:
Aug 11 00:04:37 00:04:37.047 [ERROR]   ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart:113 » Timeout Not all executions fulfilled the predicate in time.
{code}
{code:java}
org.opentest4j.AssertionFailedError: expected: RUNNING but was: FAILING
Expected :RUNNING
Actual   :FAILING

    at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testCancelWhileFailing(ExecutionGraphRestartTest.java:217)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
    Suppressed: java.lang.IllegalStateException: Free slot must not be used.
        at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
        at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:564)
        at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.freeAndReleaseSlots(DefaultDeclarativeSlotPool.java:507)
        at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:477)
        at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.internalReleaseTaskManager(DeclarativeSlotPoolService.java:281)
        at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.releaseAllTaskManagers(DeclarativeSlotPoolService.java:271)
        at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.close(DeclarativeSlotPoolService.java:160)
        at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testCancelWhileFailing(ExecutionGraphRestartTest.java:200)
        ... 7 more
{code}
{code:java}
java.util.concurrent.TimeoutException: Not all executions fulfilled the predicate in time.
    at org.apache.flink.runtime.executiongraph.ExecutionGraphTestUtils.waitForAllExecutionsPredicate(ExecutionGraphTestUtils.java:203)
    at org.apache.flink.runtime.executiongraph.ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart(ExecutionGraphCoLocationRestartTest.java:113)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
{code}
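For readers unfamiliar with this failure mode: the TimeoutException above is characteristic of a helper that polls a condition until a deadline expires. The following is a minimal sketch of that general pattern, not Flink's actual ExecutionGraphTestUtils.waitForAllExecutionsPredicate. The class PollUntil, the method waitForAll, and the 2 ms polling interval are hypothetical and chosen only for illustration.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

// Hypothetical helper, NOT Flink code: illustrates poll-until-deadline only.
final class PollUntil {

    /**
     * Re-checks the predicate against every item until all items pass,
     * or throws a TimeoutException once the deadline has elapsed.
     */
    static <T> void waitForAll(List<T> items, Predicate<? super T> predicate, Duration timeout)
            throws TimeoutException, InterruptedException {
        // Use the monotonic clock so wall-clock adjustments cannot skew the deadline.
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (items.stream().allMatch(predicate)) {
                return; // every item reached the expected state
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Not all executions fulfilled the predicate in time.");
            }
            Thread.sleep(2L); // assumed polling interval
        }
    }
}
```

With a helper of this shape, a state transition that takes longer than the deadline (for example on a heavily loaded CI machine) surfaces as a timeout even if the code under test would eventually reach the expected state. Whether that is what happens in these suites is not established by the logs above.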
Example CI run:

[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=69283&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=1ffc5ec2-7913-50ff-0177-3fca16f1b8f0]

 

  was:
Both these suites are really flaky on master. Tests like
testConstraintsAfterRestart and testCancelWhileFailing are constantly failing
CI pipelines with errors like
{code:java}
Aug 11 00:04:37 00:04:37.047 [ERROR] Errors:
Aug 11 00:04:37 00:04:37.047 [ERROR]   ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart:113 » Timeout Not all executions fulfilled the predicate in time.
{code}
{code:java}
org.opentest4j.AssertionFailedError: expected: RUNNING but was: FAILING
Expected :RUNNING
Actual   :FAILING

    at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testCancelWhileFailing(ExecutionGraphRestartTest.java:217)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
    Suppressed: java.lang.IllegalStateException: Free slot must not be used.
        at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
        at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:564)
        at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.freeAndReleaseSlots(DefaultDeclarativeSlotPool.java:507)
        at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:477)
        at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.internalReleaseTaskManager(DeclarativeSlotPoolService.java:281)
        at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.releaseAllTaskManagers(DeclarativeSlotPoolService.java:271)
        at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.close(DeclarativeSlotPoolService.java:160)
        at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testCancelWhileFailing(ExecutionGraphRestartTest.java:200)
        ... 7 more
{code}
{code:java}
java.util.concurrent.TimeoutException: Not all executions fulfilled the predicate in time.
    at org.apache.flink.runtime.executiongraph.ExecutionGraphTestUtils.waitForAllExecutionsPredicate(ExecutionGraphTestUtils.java:203)
    at org.apache.flink.runtime.executiongraph.ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart(ExecutionGraphCoLocationRestartTest.java:113)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
{code}
 


> ExecutionGraphRestartTest and ExecutionGraphCoLocationRestartTest are flaky 
> on master
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-38223
>                 URL: https://issues.apache.org/jira/browse/FLINK-38223
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 2.1
>            Reporter: Gustavo de Morais
>            Priority: Major
>             Fix For: 2.2
>
>
> Both of these suites are flaky on master. Tests such as
> testConstraintsAfterRestart and testCancelWhileFailing are constantly failing
> CI pipelines with errors like the ones below.
> You can reproduce the failures by running the suites locally.
> {code:java}
> Aug 11 00:04:37 00:04:37.047 [ERROR] Errors:
> Aug 11 00:04:37 00:04:37.047 [ERROR]   ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart:113 » Timeout Not all executions fulfilled the predicate in time.
> {code}
> {code:java}
> org.opentest4j.AssertionFailedError: expected: RUNNING but was: FAILING
> Expected :RUNNING
> Actual   :FAILING
>     at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testCancelWhileFailing(ExecutionGraphRestartTest.java:217)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>     at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
>     at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
>     at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
>     at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
>     at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
>     at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
>     Suppressed: java.lang.IllegalStateException: Free slot must not be used.
>         at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
>         at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:564)
>         at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.freeAndReleaseSlots(DefaultDeclarativeSlotPool.java:507)
>         at org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool.releaseSlots(DefaultDeclarativeSlotPool.java:477)
>         at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.internalReleaseTaskManager(DeclarativeSlotPoolService.java:281)
>         at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.releaseAllTaskManagers(DeclarativeSlotPoolService.java:271)
>         at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolService.close(DeclarativeSlotPoolService.java:160)
>         at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testCancelWhileFailing(ExecutionGraphRestartTest.java:200)
>         ... 7 more
> {code}
> {code:java}
> java.util.concurrent.TimeoutException: Not all executions fulfilled the predicate in time.
>     at org.apache.flink.runtime.executiongraph.ExecutionGraphTestUtils.waitForAllExecutionsPredicate(ExecutionGraphTestUtils.java:203)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart(ExecutionGraphCoLocationRestartTest.java:113)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>     at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:373)
>     at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
>     at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
>     at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
>     at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
>     at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
> {code}
> Example CI run:
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=69283&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=1ffc5ec2-7913-50ff-0177-3fca16f1b8f0]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
