Maximilian Michels created FLINK-4296:
-----------------------------------------

             Summary: Scheduler accepts more tasks than it has task slots available
                 Key: FLINK-4296
                 URL: https://issues.apache.org/jira/browse/FLINK-4296
             Project: Flink
          Issue Type: Bug
          Components: JobManager, TaskManager
    Affects Versions: 1.1.0
            Reporter: Maximilian Michels
            Priority: Critical
             Fix For: 1.1.0, 1.2.0


Flink's scheduler does not support queued scheduling: it expects to find all
necessary task slots at scheduling time and throws an error if it cannot. Due to
some changes in the latest master, this check seems to be broken.
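To make the expected (non-queued) behavior concrete, here is a minimal Java sketch. It is not Flink's scheduler code. It assumes each parallel subtask needs exactly one slot (Flink's slot sharing is not modeled), and {{ImmediateSchedulingSketch}}, {{scheduleAll}} and {{NoSlotAvailableException}} are hypothetical names chosen for illustration.

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of immediate (non-queued) scheduling: every subtask
// must get a free slot right away, otherwise scheduling fails as a whole.
public class ImmediateSchedulingSketch {

    /** Hypothetical stand-in for a "no resource available" failure. */
    static class NoSlotAvailableException extends RuntimeException {
        NoSlotAvailableException(String msg) {
            super(msg);
        }
    }

    /** Assigns subtasks 1..parallelism to free slots; throws if any subtask finds none. */
    static List<Integer> scheduleAll(int parallelism, int totalSlots) {
        Deque<Integer> freeSlots = new ArrayDeque<>();
        for (int s = 0; s < totalSlots; s++) {
            freeSlots.add(s);
        }
        List<Integer> assignment = new ArrayList<>();
        for (int subtask = 1; subtask <= parallelism; subtask++) {
            Integer slot = freeSlots.poll(); // null once all slots are taken
            if (slot == null) {
                // Expected behavior: fail fast instead of leaving the
                // subtask in SCHEDULED forever.
                throw new NoSlotAvailableException("Not enough free slots for subtask "
                        + subtask + "/" + parallelism + " (total slots: " + totalSlots + ")");
            }
            assignment.add(slot);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(scheduleAll(10, 10).size()); // prints 10
        try {
            scheduleAll(11, 10);
            System.out.println("scheduled");
        } catch (NoSlotAvailableException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
{code}

With {{p=11}} and 10 slots, the sketch rejects subtask 11 immediately rather than leaving it waiting, which is what the scheduler is expected to do.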

Flink accepts jobs with {{parallelism > total number of task slots}}, schedules
and deploys tasks in all available task slots, and leaves the remaining tasks
stuck in SCHEDULED forever.

Easy to reproduce: 
{code}
./bin/flink run -p TASK_SLOTS+n
{code} 

where {{TASK_SLOTS}} is the total number of task slots in the cluster and
{{n>=1}}.

Example output with {{p=11}} and {{TASK_SLOTS=10}}:

{noformat}
Cluster configuration: Standalone cluster with JobManager at 
localhost/127.0.0.1:6123
Using address localhost:6123 to connect to JobManager.
JobManager web interface address http://localhost:8081
Starting execution of program
Executing EnumTriangles example with default edges data set.
Use --edges to specify file input.
Printing result to stdout. Use --output to specify output path.
Submitting job with JobID: cd0c0b4cbe25643d8d92558168cfc045. Waiting for job 
completion.
08/01/2016 12:12:12     Job execution switched to status RUNNING.
08/01/2016 12:12:12     CHAIN DataSource (at 
getDefaultEdgeDataSet(EnumTrianglesData.java:57) 
(org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at 
main(EnumTriangles.java:108))(1/1) switched to SCHEDULED
08/01/2016 12:12:12     CHAIN DataSource (at 
getDefaultEdgeDataSet(EnumTrianglesData.java:57) 
(org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at 
main(EnumTriangles.java:108))(1/1) switched to DEPLOYING
08/01/2016 12:12:12     CHAIN DataSource (at 
getDefaultEdgeDataSet(EnumTrianglesData.java:57) 
(org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at 
main(EnumTriangles.java:108))(1/1) switched to RUNNING
08/01/2016 12:12:12     CHAIN DataSource (at 
getDefaultEdgeDataSet(EnumTrianglesData.java:57) 
(org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at 
main(EnumTriangles.java:108))(1/1) switched to FINISHED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(1/11) switched to SCHEDULED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(3/11) switched to SCHEDULED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(2/11) switched to SCHEDULED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(7/11) switched to SCHEDULED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(7/11) switched to DEPLOYING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(6/11) switched to SCHEDULED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(4/11) switched to SCHEDULED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(5/11) switched to SCHEDULED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(4/11) switched to DEPLOYING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(3/11) switched to DEPLOYING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(9/11) switched to SCHEDULED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(9/11) switched to DEPLOYING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(5/11) switched to DEPLOYING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(1/11) switched to DEPLOYING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(1/11) 
switched to SCHEDULED
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(1/11) 
switched to DEPLOYING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(2/11) 
switched to SCHEDULED
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(2/11) 
switched to DEPLOYING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(3/11) 
switched to SCHEDULED
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(3/11) 
switched to DEPLOYING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(4/11) 
switched to SCHEDULED
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(4/11) 
switched to DEPLOYING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(5/11) 
switched to SCHEDULED
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(5/11) 
switched to DEPLOYING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(6/11) 
switched to SCHEDULED
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(6/11) 
switched to DEPLOYING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(7/11) 
switched to SCHEDULED
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(7/11) 
switched to DEPLOYING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(8/11) 
switched to SCHEDULED
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(8/11) 
switched to DEPLOYING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(9/11) 
switched to SCHEDULED
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(9/11) 
switched to DEPLOYING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(10/11) 
switched to SCHEDULED
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(10/11) 
switched to DEPLOYING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(11/11) switched to SCHEDULED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(10/11) switched to SCHEDULED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(11/11) switched to DEPLOYING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(10/11) switched to DEPLOYING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(8/11) switched to SCHEDULED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(6/11) switched to DEPLOYING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(2/11) switched to DEPLOYING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(3/11) switched to RUNNING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(11/11) 
switched to SCHEDULED
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(1/11) switched to RUNNING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(1/11) 
switched to RUNNING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(2/11) 
switched to RUNNING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(3/11) 
switched to RUNNING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(9/11) switched to RUNNING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(4/11) switched to RUNNING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(5/11) switched to RUNNING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(7/11) 
switched to RUNNING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(6/11) 
switched to RUNNING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(8/11) 
switched to RUNNING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(9/11) 
switched to RUNNING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(10/11) 
switched to RUNNING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(10/11) switched to RUNNING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(11/11) switched to RUNNING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(4/11) 
switched to RUNNING
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(5/11) 
switched to RUNNING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(7/11) switched to RUNNING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(2/11) switched to RUNNING
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(6/11) switched to RUNNING
08/01/2016 12:12:13     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(1/11) switched to FINISHED
08/01/2016 12:12:13     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(2/11) switched to FINISHED
08/01/2016 12:12:13     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(7/11) switched to FINISHED
08/01/2016 12:12:13     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(6/11) switched to FINISHED
08/01/2016 12:12:13     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(3/11) switched to FINISHED
08/01/2016 12:12:13     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(9/11) switched to FINISHED
08/01/2016 12:12:13     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(11/11) switched to FINISHED
08/01/2016 12:12:13     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(5/11) switched to FINISHED
08/01/2016 12:12:13     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(10/11) switched to FINISHED
08/01/2016 12:12:13     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(4/11) switched to FINISHED
{noformat}

For subtask {{8/11}}, the {{Join}} task switches to RUNNING, but the
{{GroupReduce}} never gets past SCHEDULED:
{noformat}
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(8/11) 
switched to SCHEDULED
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(8/11) 
switched to DEPLOYING
....
08/01/2016 12:12:12     GroupReduce (GroupReduce at 
main(EnumTriangles.java:112))(8/11) switched to SCHEDULED
....
08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(8/11) 
switched to RUNNING
{noformat}
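To spot stuck subtasks in a longer run, one option is to track the last state reported for each subtask. The following Java sketch is a hypothetical helper, not part of Flink. It assumes each log entry sits on a single line (the entries above are wrapped across lines) and only recognizes the {{... (i/n) switched to STATE}} pattern shown in the output; {{StuckSubtaskFinder}} and {{findStuck}} are illustrative names.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StuckSubtaskFinder {

    // Matches one unwrapped state-transition entry: date, time, task name,
    // "(i/n)" subtask index, and the new state.
    private static final Pattern TRANSITION = Pattern.compile(
            "^\\S+ \\S+\\s+(.*)\\((\\d+)/(\\d+)\\) switched to (\\w+)$");

    /** Returns the subtasks whose last observed state is SCHEDULED. */
    static List<String> findStuck(List<String> logLines) {
        // Insertion-ordered map: subtask -> most recent state seen.
        Map<String, String> lastState = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = TRANSITION.matcher(line.trim());
            if (m.matches()) {
                String subtask = m.group(1).trim() + " (" + m.group(2) + "/" + m.group(3) + ")";
                lastState.put(subtask, m.group(4));
            }
        }
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, String> e : lastState.entrySet()) {
            if (e.getValue().equals("SCHEDULED")) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(8/11) switched to SCHEDULED",
                "08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(8/11) switched to DEPLOYING",
                "08/01/2016 12:12:12     GroupReduce (GroupReduce at main(EnumTriangles.java:112))(8/11) switched to SCHEDULED",
                "08/01/2016 12:12:12     Join(Join at main(EnumTriangles.java:114))(8/11) switched to RUNNING");
        // prints [GroupReduce (GroupReduce at main(EnumTriangles.java:112)) (8/11)]
        System.out.println(findStuck(log));
    }
}
{code}

Fed the full output above with the wrapped entries rejoined, it should report {{GroupReduce}} (8/11) and also {{Join}} (11/11), which likewise never leaves SCHEDULED.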



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
