Zhou Parker created FLINK-24692:
-----------------------------------

             Summary: kubernetes session mode deployment failed since slot allocation timeout
                 Key: FLINK-24692
                 URL: https://issues.apache.org/jira/browse/FLINK-24692
             Project: Flink
          Issue Type: Bug
          Components: Deployment / Kubernetes
    Affects Versions: 1.11.2
            Reporter: Zhou Parker


Kubernetes: 1.15
Flink: 1.11.2
 
When submitting the TopSpeedWindowing demo in session mode on k8s, the job failed.

Log from JM:
 
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
    ... 45 more
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_275]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_275]
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_275]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_275]
    ... 25 more
Caused by: java.util.concurrent.TimeoutException
    ... 23 more
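The JM trace shows a bare `TimeoutException` being wrapped in a `CompletionException` as it passes through `CompletableFuture.uniApply`. The minimal sketch below reproduces that wrapping with plain JDK 8 classes, matching the `1.8.0_275` runtime in the trace. It is not Flink's scheduler code. The class name `SlotTimeoutChainDemo`, the helper `withTimeout`, and the 100 ms delay are hypothetical and chosen only for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SlotTimeoutChainDemo {

    // Hypothetical helper: fails the future with a TimeoutException after delayMs.
    // It stands in for CompletableFuture.orTimeout, which only exists in Java 9+.
    static <T> CompletableFuture<T> withTimeout(
            CompletableFuture<T> future, long delayMs, ScheduledExecutorService timer) {
        timer.schedule(() -> future.completeExceptionally(new TimeoutException()),
                delayMs, TimeUnit.MILLISECONDS);
        return future;
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            // Hypothetical slot request that nobody ever fulfils.
            CompletableFuture<String> slotRequest =
                    withTimeout(new CompletableFuture<>(), 100, timer);

            // thenApply goes through uniApply, which wraps the upstream failure
            // in a CompletionException (encodeThrowable / completeThrowable).
            CompletableFuture<String> assigned = slotRequest.thenApply(slot -> "assigned " + slot);

            try {
                assigned.join();
            } catch (CompletionException e) {
                // Prints the same two-level cause chain seen in the JM log.
                System.out.println(e.getClass().getName() + " <- " + e.getCause().getClass().getName());
            }
        } finally {
            timer.shutdownNow();
        }
    }
}
```

The dependent `thenApply` stage is what adds the `CompletionException` layer. The original `TimeoutException` is still available through `getCause()`.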
 

Log from TM:

 

2021-10-29 06:54:22,862 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/rpc/taskmanager_0 .
2021-10-29 06:54:22,875 INFO org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Start job leader service.
2021-10-29 06:54:22,877 INFO org.apache.flink.runtime.filecache.FileCache [] - User file cache uses directory /tmp/flink-dist-cache-7fb5ad02-77e1-4942-8ab6-3e10347664c4
2021-10-29 06:54:22,935 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Connecting to ResourceManager akka.tcp://flink@test.default:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000).
2021-10-29 06:54:22,940 DEBUG org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Try to connect to remote RPC endpoint with address akka.tcp://flink@test.default:6123/user/rpc/resourcemanager_*. Returning a org.apache.flink.runtime.resourcemanager.ResourceManagerGateway gateway.
2021-10-29 06:54:23,265 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Resolved ResourceManager address, beginning registration
2021-10-29 06:54:23,265 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Registration at ResourceManager attempt 1 (timeout=100ms)
2021-10-29 06:54:23,391 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Successful registration at resource manager akka.tcp://flink@test.default:6123/user/rpc/resourcemanager_* under registration id dca9eaff5da556d2b99bd447a07538b7.
2021-10-29 06:54:23,456 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Receive slot request 190c5be552e5aed60834096b6e1efc2f for job f5680609a3e78061e63e97268e1860c6 from resource manager with leader id 00000000000000000000000000000000.
2021-10-29 06:54:23,462 DEBUG org.apache.flink.runtime.memory.MemoryManager [] - Initialized MemoryManager with total memory size 536870920 and page size 32768.
2021-10-29 06:54:23,464 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Allocated slot for 190c5be552e5aed60834096b6e1efc2f.
2021-10-29 06:54:23,465 INFO org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Add job f5680609a3e78061e63e97268e1860c6 for job leader monitoring.
2021-10-29 06:54:23,466 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - New leader information for job f5680609a3e78061e63e97268e1860c6. Address: akka.tcp://flink@test.default:6123/user/rpc/jobmanager_2, leader id: 00000000000000000000000000000000.
2021-10-29 06:54:23,467 INFO org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Try to register at job manager akka.tcp://flink@test.default:6123/user/rpc/jobmanager_2 with leader id 00000000-0000-0000-0000-000000000000.
2021-10-29 06:54:23,468 DEBUG org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Try to connect to remote RPC endpoint with address akka.tcp://flink@test.default:6123/user/rpc/jobmanager_2. Returning a org.apache.flink.runtime.jobmaster.JobMasterGateway gateway.
2021-10-29 06:54:23,541 INFO org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Resolved JobManager address, beginning registration
2021-10-29 06:54:23,542 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager attempt 1 (timeout=100ms)
2021-10-29 06:54:23,660 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager (akka.tcp://flink@test.default:6123/user/rpc/jobmanager_2) attempt 1 timed out after 100 ms
2021-10-29 06:54:23,660 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager attempt 2 (timeout=200ms)
2021-10-29 06:54:23,878 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager (akka.tcp://flink@test.default:6123/user/rpc/jobmanager_2) attempt 2 timed out after 200 ms
2021-10-29 06:54:23,879 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager attempt 3 (timeout=400ms)
2021-10-29 06:54:24,299 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager (akka.tcp://flink@test.default:6123/user/rpc/jobmanager_2) attempt 3 timed out after 400 ms
2021-10-29 06:54:24,299 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager attempt 4 (timeout=800ms)
2021-10-29 06:54:25,118 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager (akka.tcp://flink@test.default:6123/user/rpc/jobmanager_2) attempt 4 timed out after 800 ms
2021-10-29 06:54:25,119 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager attempt 5 (timeout=1600ms)
2021-10-29 06:54:26,603 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Received heartbeat request from 8edb8ed60a1b18ffb9913e3d01670115.
2021-10-29 06:54:26,739 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager (akka.tcp://flink@test.default:6123/user/rpc/jobmanager_2) attempt 5 timed out after 1600 ms
2021-10-29 06:54:26,739 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager attempt 6 (timeout=3200ms)
2021-10-29 06:54:29,958 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager (akka.tcp://flink@test.default:6123/user/rpc/jobmanager_2) attempt 6 timed out after 3200 ms
2021-10-29 06:54:29,959 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Registration at JobManager attempt 7 (timeout=6400ms)
2021-10-29 06:54:33,465 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Free slot with allocation id 190c5be552e5aed60834096b6e1efc2f because: The slot 190c5be552e5aed60834096b6e1efc2f has timed out.
2021-10-29 06:54:33,466 DEBUG org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot TaskSlot(index:0, state:ALLOCATED, resource profile: ResourceProfile{cpuCores=1.0000000000000000, taskHeapMemory=384.000mb (402653174 bytes), taskOffHeapMemory=0 bytes, managedMemory=512.000mb (536870920 bytes), networkMemory=128.000mb (134217730 bytes)}, allocationId: 190c5be552e5aed60834096b6e1efc2f, jobId: f5680609a3e78061e63e97268e1860c6).
java.lang.Exception: The slot 190c5be552e5aed60834096b6e1efc2f has timed out.
 at org.apache.flink.runtime.taskexecutor.TaskExecutor.timeoutSlot(TaskExecutor.java:1653) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2800(TaskExecutor.java:173) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at org.apache.flink.runtime.taskexecutor.TaskExecutor$SlotActionsImpl.lambda$timeoutSlot$1(TaskExecutor.java:1940) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.2.jar:1.11.2]
2021-10-29 06:54:33,471 INFO org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Remove job f5680609a3e78061e63e97268e1860c6 from job leader monitoring.
2021-10-29 06:54:33,471 DEBUG org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Retrying registration towards akka.tcp://flink@test.default:6123/user/rpc/jobmanager_2 was cancelled.
2021-10-29 06:54:33,472 DEBUG org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager [] - Releasing local state under allocation id 190c5be552e5aed60834096b6e1efc2f.
2021-10-29 06:54:36,622 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Received heartbeat request from 8edb8ed60a1b18ffb9913e3d01670115.
2021-10-29 06:54:46,642 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Received heartbeat request from 8edb8ed60a1b18ffb9913e3d01670115.
2021-10-29 06:54:56,662 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Received heartbeat request from 8edb8ed60a1b18ffb9913e3d01670115.
2021-10-29 06:55:06,616 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Close ResourceManager connection 8edb8ed60a1b18ffb9913e3d01670115.
org.apache.flink.util.FlinkException: TaskExecutor exceeded the idle timeout.
 at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.releaseTaskExecutor(SlotManagerImpl.java:1258) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.lambda$releaseTaskExecutorIfPossible$14(SlotManagerImpl.java:1251) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_275]
 at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_275]
 at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456) ~[?:1.8.0_275]
 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.2.jar:1.11.2]
 at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.2.jar:1.11.2]
2021-10-29 06:55:06,622 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Connecting to ResourceManager akka.tcp://flink@test.default:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000).
2021-10-29 06:55:06,623 DEBUG org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Try to connect to remote RPC endpoint with address akka.tcp://flink@test.default:6123/user/rpc/resourcemanager_*. Returning a org.apache.flink.runtime.resourcemanager.ResourceManagerGateway gateway.
2021-10-29 06:55:06,631 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Resolved ResourceManager address, beginning registration
2021-10-29 06:55:06,631 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Registration at ResourceManager attempt 1 (timeout=100ms)
2021-10-29 06:55:06,636 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2021-10-29 06:55:06,638 INFO org.apache.flink.runtime.blob.TransientBlobCache [] - Shutting down BLOB cache
2021-10-29 06:55:06,639 DEBUG org.apache.flink.runtime.io.disk.iomanager.IOManager [] - Shutting down I/O manager.
2021-10-29 06:55:06,640 INFO org.apache.flink.runtime.filecache.FileCache [] - removed file cache directory /tmp/flink-dist-cache-7fb5ad02-77e1-4942-8ab6-3e10347664c4
2021-10-29 06:55:06,641 INFO org.apache.flink.runtime.blob.PermanentBlobCache [] - Shutting down BLOB cache
2021-10-29 06:55:06,643 INFO org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager [] - Shutting down TaskExecutorLocalStateStoresManager.
2021-10-29 06:55:06,645 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager removed spill file directory /tmp/flink-io-66cad1f9-ce74-4c01-a02b-32d2e11dcb5a
2021-10-29 06:55:06,646 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager removed spill file directory /tmp/flink-netty-shuffle-bbc6e6a4-9973-48a5-83b1-3ef94d8605f3
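In the TM log, the JobManager registration timeout starts at 100 ms and doubles on every attempt. The allocated slot is freed about 10 s after allocation (06:54:23,464 to 06:54:33,465). The minimal Java sketch below works through that timing. The 100 ms start and the doubling rule come straight from the log. The 10 s slot timeout is only inferred from those two timestamps. The 30 s cap (`MAX_TIMEOUT_MS`) is an assumed maximum registration timeout that the log does not show. The class and member names are hypothetical and are not Flink APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class RegistrationBackoffTimeline {

    static final long INITIAL_TIMEOUT_MS = 100;   // "attempt 1 (timeout=100ms)" in the TM log
    static final long MAX_TIMEOUT_MS = 30_000;    // assumption: cap not visible in the log
    static final long SLOT_TIMEOUT_MS = 10_000;   // inferred from allocation/free timestamps

    /** Per-attempt timeouts: start at INITIAL_TIMEOUT_MS, double each time, capped. */
    static List<Long> attemptTimeouts(int attempts) {
        List<Long> timeouts = new ArrayList<>();
        long t = INITIAL_TIMEOUT_MS;
        for (int i = 0; i < attempts; i++) {
            timeouts.add(t);
            t = Math.min(t * 2, MAX_TIMEOUT_MS);
        }
        return timeouts;
    }

    /** Attempts that can fully time out before the slot timeout fires
     *  (ignores the ~78 ms between slot allocation and attempt 1). */
    static int attemptsWithinSlotTimeout() {
        long elapsed = 0;
        int count = 0;
        for (long t : attemptTimeouts(64)) {   // 64 is just a generous upper bound
            if (elapsed + t > SLOT_TIMEOUT_MS) {
                break;
            }
            elapsed += t;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long elapsed = 0;
        List<Long> timeouts = attemptTimeouts(7);
        for (int i = 0; i < timeouts.size(); i++) {
            elapsed += timeouts.get(i);
            System.out.printf("attempt %d: timeout=%dms, cumulative=%dms%n",
                    i + 1, timeouts.get(i), elapsed);
        }
        System.out.println("attempts that time out before the slot is freed: "
                + attemptsWithinSlotTimeout());
    }
}
```

Six attempts add up to 6,300 ms and fit inside the 10 s window. The seventh attempt (6,400 ms) cannot finish before the slot is freed. This matches the log: attempts 1 to 6 time out, and attempt 7 ends with "Retrying registration ... was cancelled". The sketch only reproduces the timing. It does not explain why each JobManager registration times out.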



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
