Hi all, I run Spark on a Mesos cluster and have hit a problem: when I submit 6 Spark drivers *at the same time*, the dispatcher UI on node3:8081 shows 4 drivers under "Launched Drivers" and 2 under "Queued Drivers". On the Mesos UI (node1:5050) I can see 4 active tasks running, but each of them keeps logging the warning "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources". At the same time, node1:5050 shows that all of the cluster's CPUs are in use, and the tasks run forever until I kill one.
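In case it's useful, this is roughly how I double-checked that every CPU was allocated (assuming the stock Mesos HTTP endpoints; `/metrics/snapshot` and the `master/cpus_*` keys are what my 0.28.1 build exposes):

    # compare the master's view of total vs. allocated CPUs on node1
    curl -s http://node1:5050/metrics/snapshot | python -m json.tool \
      | grep -E '"master/cpus_(total|used)"'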
My question is: should I control the cores used by each Spark driver myself, or can MesosClusterDispatcher manage driver cores for me? Could someone help me? Thank you.

*spark-submit script*:

    bin/spark-submit \
      --name org.apache.spark.examples.SparkPi \
      --deploy-mode cluster \
      --supervise \
      --master mesos://node3:7077 \
      --driver-cores 1.0 \
      --driver-memory 1024M \
      --class org.apache.spark.examples.SparkPi \
      hdfs://node4:9000/spark/spark/lib/spark-examples-1.6.1-hadoop2.6.0.jar 10

*My environment*:

- Mesos 0.28.1; node1 is the Mesos master, node2 is a Mesos slave
- node3 runs the MesosClusterDispatcher
- node4 and node5 form a Hadoop 2.6.4 cluster (HDFS at node4:9000)
- each node has 4 CPUs and 8 GB RAM

*config file* (spark-defaults.conf):

    spark.master               mesos://node3:7077
    spark.executor.uri         hdfs://node4:9000/spark/spark-1.6.1-bin-hadoop2.6.tgz
    spark.mesos.executor.home  /root/spark

*Driver log*:

    I0627 22:18:41.414885  3422 sched.cpp:703] Framework registered with 24f88e71-0eee-4023-9e5d-2e4595e2c5b4-0002
    16/06/27 22:18:41 INFO mesos.CoarseMesosSchedulerBackend: Registered as framework ID 24f88e71-0eee-4023-9e5d-2e4595e2c5b4-0002
    16/06/27 22:18:41 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 48636.
    16/06/27 22:18:41 INFO netty.NettyBlockTransferService: Server created on 48636
    16/06/27 22:18:41 INFO storage.BlockManagerMaster: Trying to register BlockManager
    16/06/27 22:18:41 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.1.46:48636 with 511.5 MB RAM, BlockManagerId(driver, 192.168.1.46, 48636)
    16/06/27 22:18:41 INFO storage.BlockManagerMaster: Registered BlockManager
    16/06/27 22:18:41 INFO mesos.CoarseMesosSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
    16/06/27 22:18:42 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:36
    16/06/27 22:18:42 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 10 output partitions
    16/06/27 22:18:42 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:36)
    16/06/27 22:18:42 INFO scheduler.DAGScheduler: Parents of final stage: List()
    16/06/27 22:18:42 INFO scheduler.DAGScheduler: Missing parents: List()
    16/06/27 22:18:42 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
    16/06/27 22:18:42 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1904.0 B, free 1904.0 B)
    16/06/27 22:18:42 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1218.0 B, free 3.0 KB)
    16/06/27 22:18:42 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.46:48636 (size: 1218.0 B, free: 511.5 MB)
    16/06/27 22:18:42 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
    16/06/27 22:18:42 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
    16/06/27 22:18:42 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
    16/06/27 22:18:58 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    16/06/27 22:19:13 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    16/06/27 22:19:28 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
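One thing I'm planning to try, in case it's relevant: my understanding is that in coarse-grained mode each Spark framework accepts every core it is offered unless `spark.cores.max` is set, so the running jobs may be starving each other of executor cores. I'm not sure this also explains the queueing behaviour, but here is a sketch of the capped submission (the value 1 is only an illustration, not a recommendation):

    bin/spark-submit \
      --name org.apache.spark.examples.SparkPi \
      --deploy-mode cluster \
      --supervise \
      --master mesos://node3:7077 \
      --driver-cores 1.0 \
      --driver-memory 1024M \
      --conf spark.cores.max=1 \
      --class org.apache.spark.examples.SparkPi \
      hdfs://node4:9000/spark/spark/lib/spark-examples-1.6.1-hadoop2.6.0.jar 10

With this, each job should need only one driver core plus one executor core, rather than grabbing all remaining CPUs for executors.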