Hi all

I run Spark on a Mesos cluster and have run into a problem: when I submit 6 Spark
drivers *at the same time*, the dispatcher UI on node3:8081 shows 4 drivers under
"Launched Drivers" and 2 under "Queued Drivers". On mesos:5050 I can see 4 active
tasks running, but each driver keeps logging the warning: "Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient resources". At this point all of the Mesos CPUs
show as used on node1:5050, and the jobs hang forever until I kill a task.

My question is : should I control the driver cores for each Spark job myself,
or does MesosClusterDispatcher manage driver-cores for me ?

Could someone help me ?

Thank you

spark-submit script :
bin/spark-submit --name org.apache.spark.examples.SparkPi \
  --deploy-mode cluster --supervise \
  --master mesos://node3:7077 \
  --driver-cores 1.0 --driver-memory 1024M \
  --class org.apache.spark.examples.SparkPi \
  hdfs://node4:9000/spark/spark/lib/spark-examples-1.6.1-hadoop2.6.0.jar 10
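
Is setting spark.cores.max per job the right way to keep one job from taking
everything ? I am not sure whether that property is the correct one for my
case, so this is just a guess at what I would try, e.g. :

bin/spark-submit --name org.apache.spark.examples.SparkPi \
  --deploy-mode cluster --supervise \
  --master mesos://node3:7077 \
  --driver-cores 1.0 --driver-memory 1024M \
  --conf spark.cores.max=2 \
  --class org.apache.spark.examples.SparkPi \
  hdfs://node4:9000/spark/spark/lib/spark-examples-1.6.1-hadoop2.6.0.jar 10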

My environment :
mesos version 0.28.1 , node1 master, node2 slave
MesosClusterDispatcher, node3
hadoop cluster version 2.6.4, node4, node5

each node has 4 CPUs and 8 GB RAM
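
If I count correctly (assuming node2 is the only Mesos slave offering
resources), the launched drivers alone already take every CPU, which might be
why no executors can start :

  node2 total CPUs               : 4
  4 launched drivers x 1 core    : 4
  CPUs left for executors        : 0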

config file :
spark-defaults.conf :
spark.master    mesos://node3:7077
spark.executor.uri      hdfs://node4:9000/spark/spark-1.6.1-bin-hadoop2.6.tgz
spark.mesos.executor.home       /root/spark


*Stack Trace* :

I0627 22:18:41.414885  3422 sched.cpp:703] Framework registered with
24f88e71-0eee-4023-9e5d-2e4595e2c5b4-0002
16/06/27 22:18:41 INFO mesos.CoarseMesosSchedulerBackend: Registered
as framework ID 24f88e71-0eee-4023-9e5d-2e4595e2c5b4-0002
16/06/27 22:18:41 INFO util.Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port
48636.
16/06/27 22:18:41 INFO netty.NettyBlockTransferService: Server created on 48636
16/06/27 22:18:41 INFO storage.BlockManagerMaster: Trying to register
BlockManager
16/06/27 22:18:41 INFO storage.BlockManagerMasterEndpoint: Registering
block manager 192.168.1.46:48636 with 511.5 MB RAM,
BlockManagerId(driver, 192.168.1.46, 48636)
16/06/27 22:18:41 INFO storage.BlockManagerMaster: Registered BlockManager
16/06/27 22:18:41 INFO mesos.CoarseMesosSchedulerBackend:
SchedulerBackend is ready for scheduling beginning after reached
minRegisteredResourcesRatio: 0.0
16/06/27 22:18:42 INFO spark.SparkContext: Starting job: reduce at
SparkPi.scala:36
16/06/27 22:18:42 INFO scheduler.DAGScheduler: Got job 0 (reduce at
SparkPi.scala:36) with 10 output partitions
16/06/27 22:18:42 INFO scheduler.DAGScheduler: Final stage:
ResultStage 0 (reduce at SparkPi.scala:36)
16/06/27 22:18:42 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/06/27 22:18:42 INFO scheduler.DAGScheduler: Missing parents: List()
16/06/27 22:18:42 INFO scheduler.DAGScheduler: Submitting ResultStage
0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no
missing parents
16/06/27 22:18:42 INFO storage.MemoryStore: Block broadcast_0 stored
as values in memory (estimated size 1904.0 B, free 1904.0 B)
16/06/27 22:18:42 INFO storage.MemoryStore: Block broadcast_0_piece0
stored as bytes in memory (estimated size 1218.0 B, free 3.0 KB)
16/06/27 22:18:42 INFO storage.BlockManagerInfo: Added
broadcast_0_piece0 in memory on 192.168.1.46:48636 (size: 1218.0 B,
free: 511.5 MB)
16/06/27 22:18:42 INFO spark.SparkContext: Created broadcast 0 from
broadcast at DAGScheduler.scala:1006
16/06/27 22:18:42 INFO scheduler.DAGScheduler: Submitting 10 missing
tasks from ResultStage 0 (MapPartitionsRDD[1] at map at
SparkPi.scala:32)
16/06/27 22:18:42 INFO scheduler.TaskSchedulerImpl: Adding task set
0.0 with 10 tasks
16/06/27 22:18:58 WARN scheduler.TaskSchedulerImpl: Initial job has
not accepted any resources; check your cluster UI to ensure that
workers are registered and have sufficient resources
16/06/27 22:19:13 WARN scheduler.TaskSchedulerImpl: Initial job has
not accepted any resources; check your cluster UI to ensure that
workers are registered and have sufficient resources
16/06/27 22:19:28 WARN scheduler.TaskSchedulerImpl: Initial job has
not accepted any resources; check your cluster UI to ensure that
workers are registered and have sufficient resources
