[ https://issues.apache.org/jira/browse/FLINK-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550201#comment-14550201 ]

Stephan Ewen commented on FLINK-1952:
-------------------------------------

I cannot reproduce this error any more with these steps:
  - ConnectedComponents (successful)
  - KMeans (runs out of buffers)
  - ConnectedComponents

The third program now succeeds as expected.

I think the reason is that the task lifecycle is more robust now (with my 
changes from two weeks ago). 

Still, this does not mean that the slot sharing groups are bug-free. It only 
means that this sequence of jobs no longer triggers the bug.

> Cannot run ConnectedComponents example: Could not allocate a slot on instance
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-1952
>                 URL: https://issues.apache.org/jira/browse/FLINK-1952
>             Project: Flink
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>            Priority: Blocker
>
> Steps to reproduce
> {code}
> ./bin/yarn-session.sh -n 350 
> {code}
> ... wait until they are connected ...
> {code}
> Number of connected TaskManagers changed to 266. Slots available: 266
> Number of connected TaskManagers changed to 323. Slots available: 323
> Number of connected TaskManagers changed to 334. Slots available: 334
> Number of connected TaskManagers changed to 343. Slots available: 343
> Number of connected TaskManagers changed to 350. Slots available: 350
> {code}
> Start CC
> {code}
> ./bin/flink run -p 350 ./examples/flink-java-examples-0.9-SNAPSHOT-ConnectedComponents.jar
> {code}
> ---> it runs
> Run KMeans, let it fail with 
> {code}
> Failed to deploy the task Map (Map at main(KMeans.java:100)) (1/350) - execution #0 to slot SimpleSlot (2)(2)(0) - 182b7661ca9547a84591de940c47a200 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 350, but only 254 available. The total number of network buffers is currently set to 2048. You can increase this number by setting the configuration key 'taskmanager.network.numberOfBuffers'.
> {code}
> ... as expected.
> (I've waited for 10 minutes between the two submissions)
> Starting CC now will fail:
> {code}
> ./bin/flink run -p 350 ./examples/flink-java-examples-0.9-SNAPSHOT-ConnectedComponents.jar
> {code}
> Error message(s):
> {code}
> Caused by: java.lang.IllegalStateException: Could not schedule consumer vertex IterationHead(WorksetIteration (Unnamed Delta Iteration)) (19/350)
>       at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:479)
>       at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:469)
>       at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94)
>       at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>       at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>       at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
>       ... 4 more
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate a slot on instance 4a6d761cb084c32310ece1f849556faf @ cloud-19 - 1 slots - URL: akka.tcp://flink@130.149.21.23:51400/user/taskmanager, as required by the co-location constraint.
>       at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:247)
>       at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:110)
>       at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:262)
>       at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:436)
>       at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:475)
>       ... 9 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
