[ https://issues.apache.org/jira/browse/FLINK-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426596#comment-16426596 ]

Fabian Hueske commented on FLINK-9056:
--------------------------------------

I assume that the following happens: 

The client's job submission call blocks while the JobManager tries to start the
job. However, the job never starts because there are not enough slots to
achieve the requested parallelism. Eventually the Akka ask times out (after
300000 ms in the reported trace) and throws the reported exception.
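The blocking behaviour described above can be sketched in plain Java, assuming a CompletableFuture stands in for the Akka ask and a future that is never completed stands in for a JobManager waiting for slots that never arrive. The class and method names (AskTimeoutDemo, submitJob, clientSubmit) are hypothetical and are not Flink APIs; the timeout is shortened from 300000 ms so the demo finishes quickly.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class AskTimeoutDemo {

    // Hypothetical JobManager stand-in: the response future only completes
    // if enough slots are available; otherwise it is left pending forever.
    static CompletableFuture<String> submitJob(int parallelism, int availableSlots) {
        CompletableFuture<String> result = new CompletableFuture<>();
        if (parallelism <= availableSlots) {
            result.complete("job started");
        }
        return result;
    }

    // Hypothetical client: block on the response, giving up after a fixed timeout.
    static String clientSubmit(int parallelism, int availableSlots, long timeoutMs) {
        try {
            return submitJob(parallelism, availableSlots).get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // All the client learns is that the call timed out, not why.
            return "timed out after " + timeoutMs + " ms";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(clientSubmit(2, 4, 100)); // job started
        System.out.println(clientSubmit(8, 4, 100)); // timed out after 100 ms
    }
}
{code}

The second call mirrors the reported situation: the requested parallelism (8) exceeds the available slots (4), so the client only sees a timeout.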

I think it is OK to throw an exception, but it would be better if the
exception indicated why the job could not be started. If the situation is as
I assumed, this won't be trivial: right now the client simply times out,
whereas a proper error message would require the JM to propagate an exception
back to the client.
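A minimal sketch of the alternative, under the same assumptions as a plain-Java model: the hypothetical JobManager stand-in fails the pending future with a descriptive exception instead of leaving it open, and the client unwraps the cause from the ExecutionException. PropagatedFailureDemo and InsufficientSlotsException are made-up names, not Flink classes.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class PropagatedFailureDemo {

    // Hypothetical exception raised when the requested parallelism cannot be satisfied.
    static final class InsufficientSlotsException extends Exception {
        InsufficientSlotsException(int requested, int available) {
            super("Could not start job: requested parallelism " + requested
                    + " exceeds the " + available + " available slots");
        }
    }

    // Hypothetical JM-side handling: fail the response future with the reason
    // rather than letting the client-side timeout fire.
    static CompletableFuture<String> submitJob(int parallelism, int availableSlots) {
        CompletableFuture<String> result = new CompletableFuture<>();
        if (parallelism <= availableSlots) {
            result.complete("job started");
        } else {
            result.completeExceptionally(new InsufficientSlotsException(parallelism, availableSlots));
        }
        return result;
    }

    static String clientSubmit(int parallelism, int availableSlots, long timeoutMs) {
        try {
            return submitJob(parallelism, availableSlots).get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // The cause carries the JM's explanation back to the client.
            return e.getCause().getMessage();
        } catch (TimeoutException e) {
            return "timed out after " + timeoutMs + " ms";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        // Could not start job: requested parallelism 8 exceeds the 4 available slots
        System.out.println(clientSubmit(8, 4, 100));
    }
}
{code}

The sketch checks the slot count eagerly; it does not model how or when a real JobManager would decide that the missing slots will not become available.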

> Job submission fails with AskTimeoutException if not enough slots are available
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-9056
>                 URL: https://issues.apache.org/jira/browse/FLINK-9056
>             Project: Flink
>          Issue Type: Improvement
>          Components: Job-Submission
>    Affects Versions: 1.5.0
>         Environment: * FLIP-6 enabled
>  * Local Flink instance with fixed number of TMs
>  * Job parallelism exceeds available slots
>            Reporter: Fabian Hueske
>            Assignee: yuqi
>            Priority: Major
>
> The error message when a job submission fails due to a lack of available slots is not helpful:
> {code:java}
> Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/8f0fabba-4021-45b6-a1f7-b8afd6627640#-574617182]] after [300000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalRpcInvocation".
>      at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>      at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>      at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>      at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>      at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>      at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>      at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>      at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>      at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>      at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)