Roman Khachatryan created FLINK-35787:
-----------------------------------------

             Summary: DefaultSlotStatusSyncer might bring down JVM (exit code 239 instead of a proper shutdown)
                 Key: FLINK-35787
                 URL: https://issues.apache.org/jira/browse/FLINK-35787
             Project: Flink
          Issue Type: Bug
            Reporter: Roman Khachatryan


In our internal CI, I've encountered the following error:
{code:java}
* 12:02:47,205 [   pool-126-thread-1] ERROR org.apache.flink.util.FatalExitExceptionHandler              [] - FATAL: Thread 'pool-126-thread-1' produced an uncaught exception. Stopping the process...
  java.util.concurrent.CompletionException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@38ce013a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@640a9cf7[Wrapped task = java.util.concurrent.>
          at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
          at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:951) ~[?:?]
          at java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2282) ~[?:?]
          at org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) ~[classes/:?]
          at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) ~[classes/:?]
          at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) ~[classes/:?]
          at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$checkResourceRequirementsWithDelay$12(FineGrainedSlotManager.java:603) ~[classes/:?]
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
          at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
          at java.lang.Thread.run(Thread.java:829) [?:?]
  Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@38ce013a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@640a9cf7[Wrapped task = java.util.concurrent.CompletableFuture$UniHandle@f3d>
          at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) ~[?:?]
          at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
          at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340) ~[?:?]
          at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562) ~[?:?]
          at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:705) ~[?:?]
          at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:687) ~[?:?]
          at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:949) ~[?:?]
          ... 11 more
{code}
From the code, it looks like the RM main thread executor had already been shut down, and that triggered the JVM exit:
{code:java}
CompletableFuture<Acknowledge> requestFuture =
        gateway.requestSlot(
                SlotID.getDynamicSlotID(resourceId),
                jobId,
                allocationId,
                resourceProfile,
                targetAddress,
                resourceManagerId,
                taskManagerRequestTimeout);

CompletableFuture<Void> returnedFuture = new CompletableFuture<>();

FutureUtils.assertNoException(
        requestFuture.handleAsync(
                (Acknowledge acknowledge, Throwable throwable) -> { ... },
                mainThreadExecutor));
 {code}
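The suspected mechanism can be reproduced outside Flink with plain JDK classes. The following is only a minimal sketch, not Flink code: the class and method names are hypothetical, and a single-thread scheduled executor stands in for the RM main thread executor. It shows that calling handleAsync with an executor that has already been shut down yields a future that fails with a CompletionException caused by RejectedExecutionException, which is presumably what FutureUtils.assertNoException then treats as fatal.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical standalone demo; none of these names exist in Flink.
final class RejectedHandleAsyncDemo {

    // Returns the cause with which the handleAsync stage failed, or null if it succeeded.
    static Throwable failureOnShutDownExecutor() {
        // Stand-in for the main thread executor (an assumption for illustration).
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.shutdownNow(); // the executor is already shut down

        // Stand-in for the future returned by gateway.requestSlot(...), already completed.
        CompletableFuture<String> requestFuture = CompletableFuture.completedFuture("ack");

        // The handler cannot be submitted, so the dependent future completes exceptionally.
        CompletableFuture<Void> handled =
                requestFuture.handleAsync((ack, throwable) -> null, executor);

        try {
            handled.join();
            return null;
        } catch (CompletionException e) {
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        System.out.println(failureOnShutDownExecutor());
    }
}
```

In this sketch the handler lambda never runs, so any error handling inside it cannot intercept the rejection; the failure is only visible on the future returned by handleAsync.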
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)