[jira] [Updated] (FLINK-15320) JobManager crashes in the standalone model when cancelling job which subtask' status is scheduled

2020-01-02 Thread Gary Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Yao updated FLINK-15320:
-
Description: 
Use start-cluster.sh to start a standalone cluster, and then submit a job from 
the streaming's example which name is TopSpeedWindowing, parallelism is 20. 
Wait for one minute, cancel the job, jobmanager will crash. The exception stack 
is:

{noformat}
2019-12-19 10:12:11,060 ERROR 
org.apache.flink.runtime.util.FatalExitExceptionHandler       - FATAL: Thread 
'flink-akka.actor.default-dispatcher-2' produced an uncaught exception. 
Stopping the process...2019-12-19 10:12:11,060 ERROR 
org.apache.flink.runtime.util.FatalExitExceptionHandler       - FATAL: Thread 
'flink-akka.actor.default-dispatcher-2' produced an uncaught exception. 
Stopping the process...java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: java.lang.IllegalStateException: 
Could not assign resource 
org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@583585c6 to 
current execution Attempt #0 (Window(GlobalWindows(), DeltaTrigger, 
TimeEvictor, ComparableAggregator, PassThroughWindowFunction) -> Sink: Print to 
Std. Out (19/20)) @ (unassigned) - [CANCELED]. at 
org.apache.flink.runtime.scheduler.DefaultScheduler.propagateIfNonNull(DefaultScheduler.java:387)
 at 
org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$deployAll$4(DefaultScheduler.java:372)
 at 
java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) at 
java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
 at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
 at 
org.apache.flink.runtime.concurrent.FutureUtils$WaitingConjunctFuture.handleCompletedFuture(FutureUtils.java:705)
 at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
 at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
at 
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:170)
 at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
 at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
at 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.tryFulfillSlotRequestOrMakeAvailable(SlotPoolImpl.java:534)
 at 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.releaseSingleSlot(SlotPoolImpl.java:479)
 at 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.releaseSlot(SlotPoolImpl.java:390)
 at 
org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:557)
 at 
org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.releaseChild(SlotSharingManager.java:607)
 at 
org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.access$700(SlotSharingManager.java:352)
 at 
org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:716)
 at 
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.releaseSharedSlot(SchedulerImpl.java:552)
 at 
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.cancelSlotRequest(SchedulerImpl.java:184)
 at 
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.returnLogicalSlot(SchedulerImpl.java:195)
 at 
org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.lambda$returnSlotToOwner$0(SingleLogicalSlot.java:181)
 at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 at 
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:778)
 at 
java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2140)
 at 
org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.returnSlotToOwner(SingleLogicalSlot.java:178)
 at 
org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.releaseSlot(SingleLogicalSlot.java:125)
 at 
org.apache.flink.runtime.executiongraph.Execution.releaseAssignedResource(Execution.java:1451)
 at 
org.apache.flink.runtime.executiongraph.Execution.finishCancellation(Execution.java:1170)
 at 
org.apache.flink.runtime.executiongraph.Execution.completeCancelling(Execution.java:1150)
 at 
org.apache.flink.runtime.executiongraph.Execution.completeCancelling(Execution.java:1129)
 at 
org.apache.flink.runtime.executiongraph.Execution.cancelAtomically(Execution.java:)
 at 

[jira] [Updated] (FLINK-15320) JobManager crashes in the standalone model when cancelling job which subtask' status is scheduled

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-15320:
---
Labels: pull-request-available  (was: )

> JobManager crashes in the standalone model when cancelling job which subtask' 
> status is scheduled
> -
>
> Key: FLINK-15320
> URL: https://issues.apache.org/jira/browse/FLINK-15320
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: lining
>Assignee: Zhu Zhu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>
> Use start-cluster.sh to start a standalone cluster, and then submit a job 
> from the streaming's example which name is TopSpeedWindowing, parallelism is 
> 20. Wait for one minute, cancel the job, jobmanager will crash. The exception 
> stack is:
> 2019-12-19 10:12:11,060 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler       - FATAL: Thread 
> 'flink-akka.actor.default-dispatcher-2' produced an uncaught exception. 
> Stopping the process...2019-12-19 10:12:11,060 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler       - FATAL: Thread 
> 'flink-akka.actor.default-dispatcher-2' produced an uncaught exception. 
> Stopping the process...java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: java.lang.IllegalStateException: 
> Could not assign resource 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@583585c6 to 
> current execution Attempt #0 (Window(GlobalWindows(), DeltaTrigger, 
> TimeEvictor, ComparableAggregator, PassThroughWindowFunction) -> Sink: Print 
> to Std. Out (19/20)) @ (unassigned) - [CANCELED]. at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.propagateIfNonNull(DefaultScheduler.java:387)
>  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$deployAll$4(DefaultScheduler.java:372)
>  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) 
> at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils$WaitingConjunctFuture.handleCompletedFuture(FutureUtils.java:705)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
> at 
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:170)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
> at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.tryFulfillSlotRequestOrMakeAvailable(SlotPoolImpl.java:534)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.releaseSingleSlot(SlotPoolImpl.java:479)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.releaseSlot(SlotPoolImpl.java:390)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:557)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.releaseChild(SlotSharingManager.java:607)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.access$700(SlotSharingManager.java:352)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:716)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.releaseSharedSlot(SchedulerImpl.java:552)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.cancelSlotRequest(SchedulerImpl.java:184)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.returnLogicalSlot(SchedulerImpl.java:195)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.lambda$returnSlotToOwner$0(SingleLogicalSlot.java:181)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at 
> 

[jira] [Updated] (FLINK-15320) JobManager crashes in the standalone model when cancelling job which subtask' status is scheduled

2019-12-18 Thread Yu Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated FLINK-15320:
--
Fix Version/s: 1.10.0

> JobManager crashes in the standalone model when cancelling job which subtask' 
> status is scheduled
> -
>
> Key: FLINK-15320
> URL: https://issues.apache.org/jira/browse/FLINK-15320
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: lining
>Assignee: Zhu Zhu
>Priority: Blocker
> Fix For: 1.10.0
>
>
> Use start-cluster.sh to start a standalone cluster, and then submit a job 
> from the streaming's example which name is TopSpeedWindowing, parallelism is 
> 20. Wait for one minute, cancel the job, jobmanager will crash. The exception 
> stack is:
> 2019-12-19 10:12:11,060 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler       - FATAL: Thread 
> 'flink-akka.actor.default-dispatcher-2' produced an uncaught exception. 
> Stopping the process...2019-12-19 10:12:11,060 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler       - FATAL: Thread 
> 'flink-akka.actor.default-dispatcher-2' produced an uncaught exception. 
> Stopping the process...java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: java.lang.IllegalStateException: 
> Could not assign resource 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@583585c6 to 
> current execution Attempt #0 (Window(GlobalWindows(), DeltaTrigger, 
> TimeEvictor, ComparableAggregator, PassThroughWindowFunction) -> Sink: Print 
> to Std. Out (19/20)) @ (unassigned) - [CANCELED]. at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.propagateIfNonNull(DefaultScheduler.java:387)
>  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$deployAll$4(DefaultScheduler.java:372)
>  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) 
> at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils$WaitingConjunctFuture.handleCompletedFuture(FutureUtils.java:705)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
> at 
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:170)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
> at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.tryFulfillSlotRequestOrMakeAvailable(SlotPoolImpl.java:534)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.releaseSingleSlot(SlotPoolImpl.java:479)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.releaseSlot(SlotPoolImpl.java:390)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:557)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.releaseChild(SlotSharingManager.java:607)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.access$700(SlotSharingManager.java:352)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:716)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.releaseSharedSlot(SchedulerImpl.java:552)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.cancelSlotRequest(SchedulerImpl.java:184)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.returnLogicalSlot(SchedulerImpl.java:195)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.lambda$returnSlotToOwner$0(SingleLogicalSlot.java:181)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:778)
>  at 
> 

[jira] [Updated] (FLINK-15320) JobManager crashes in the standalone model when cancelling job which subtask' status is scheduled

2019-12-18 Thread lining (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lining updated FLINK-15320:
---
Summary: JobManager crashes in the standalone model when cancelling job 
which subtask' status is scheduled  (was: JobManager crash in the model of 
standalone)

> JobManager crashes in the standalone model when cancelling job which subtask' 
> status is scheduled
> -
>
> Key: FLINK-15320
> URL: https://issues.apache.org/jira/browse/FLINK-15320
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: lining
>Priority: Blocker
>
> Use start-cluster.sh to start a standalone cluster, and then submit a job 
> from the streaming's example which name is TopSpeedWindowing, parallelism is 
> 20. Wait for one minute, cancel the job, jobmanager will crash. The exception 
> stack is:
> 2019-12-19 10:12:11,060 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler       - FATAL: Thread 
> 'flink-akka.actor.default-dispatcher-2' produced an uncaught exception. 
> Stopping the process...2019-12-19 10:12:11,060 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler       - FATAL: Thread 
> 'flink-akka.actor.default-dispatcher-2' produced an uncaught exception. 
> Stopping the process...java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: java.lang.IllegalStateException: 
> Could not assign resource 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@583585c6 to 
> current execution Attempt #0 (Window(GlobalWindows(), DeltaTrigger, 
> TimeEvictor, ComparableAggregator, PassThroughWindowFunction) -> Sink: Print 
> to Std. Out (19/20)) @ (unassigned) - [CANCELED]. at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.propagateIfNonNull(DefaultScheduler.java:387)
>  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$deployAll$4(DefaultScheduler.java:372)
>  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) 
> at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils$WaitingConjunctFuture.handleCompletedFuture(FutureUtils.java:705)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
> at 
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:170)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
> at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.tryFulfillSlotRequestOrMakeAvailable(SlotPoolImpl.java:534)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.releaseSingleSlot(SlotPoolImpl.java:479)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.releaseSlot(SlotPoolImpl.java:390)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:557)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.releaseChild(SlotSharingManager.java:607)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.access$700(SlotSharingManager.java:352)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:716)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.releaseSharedSlot(SchedulerImpl.java:552)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.cancelSlotRequest(SchedulerImpl.java:184)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.returnLogicalSlot(SchedulerImpl.java:195)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot.lambda$returnSlotToOwner$0(SingleLogicalSlot.java:181)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at 
>