I’ve set up a Spark cluster over the last few weeks and everything is working, but
I cannot run spark-shell interactively against the cluster from a remote host.

  *   Deploy a .jar to the cluster from the remote laptop via spark-submit and
have it run (roughly as sketched after this list) – Check
  *   Run the same .jar via spark-shell locally – Check
  *   Run the same .jar via spark-shell on the master server – Check
  *   Run spark-shell interactively against the cluster on the master server – Check
  *   Run spark-shell interactively from the remote laptop against the cluster – FAIL
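
For reference, the remote deploy in the first bullet was done roughly like the
sketch below (the class and jar names here are just placeholders, and XXXX is
the masked master host):

>>>> Start Console example (sketch)
$ spark-submit \
    --master spark://XXXX:7077 \
    --class com.example.MyApp \
    /path/to/my-app.jar
>>>> End console example (sketch)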

It seems other people have faced this same issue:
http://apache-spark-user-list.1001560.n3.nabble.com/spark-shell-working-local-but-not-remote-td19727.html

I’m getting the same warning about memory, despite plenty of memory being
available for the job (see the working cases above):

"WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your 
cluster UI to ensure that workers are registered and have sufficient memory”

Some have suggested that the memory warning is spurious and that the real
problem is conflicting JARs on the class path:
http://apache-spark-user-list.1001560.n3.nabble.com/WARN-ClusterScheduler-Initial-job-has-not-accepted-any-resources-check-your-cluster-UI-to-ensure-thay-td374.html#a396
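
(Not from that thread, just a sketch of how I could try to rule that out from
the remote shell: sc.getConf.toDebugString shows the Spark properties the
driver is actually using, and java.class.path shows the driver JVM’s classpath
to compare against what the workers have.)

>>>> Start Console example (sketch)
scala> // Spark properties in effect on the remote driver
scala> println(sc.getConf.toDebugString)

scala> // classpath of the driver JVM on the laptop, to compare with the workers
scala> println(System.getProperty("java.class.path"))
>>>> End console example (sketch)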

Details:

  *   Cluster: 1 master, 3 workers, each on a 4 GB / 4-core Ubuntu 14.04 LTS host
  *   Local machine (the remote laptop): MacBook Pro, OS X 10.10.1
  *   All running HotSpot Java (builds 1.8.0_31-b13 and 1.8.0_25-b17)
  *   All running spark-1.2.0-bin-hadoop2.4
  *   Using Standalone cluster manager

Cluster UI: (screenshot attached at the end of this message)

Even when I clamp down to the most restrictive settings, 1 core, 1 executor, and
128 MB (of the 3 GB available), it still says I don’t have the resources:

>>>> Start Console example
$ spark-shell --executor-memory 128m --total-executor-cores 1 --driver-cores 1 \
    --master spark://XXXX:7077

15/01/24 15:57:29 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala> val rdd = sc.parallelize(1 to 1000);
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at 
<console>:12
scala> rdd.count

15/01/24 15:58:20 INFO BlockManagerMaster: Updated info of block 
broadcast_0_piece0
15/01/24 15:58:20 INFO SparkContext: Created broadcast 0 from broadcast at 
DAGScheduler.scala:838
15/01/24 15:58:20 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 
(ParallelCollectionRDD[0] at parallelize at <console>:12)
15/01/24 15:58:20 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/01/24 15:58:35 WARN TaskSchedulerImpl: Initial job has not accepted any 
resources; check your cluster UI to ensure that workers are registered and have 
sufficient memory
>>>> End console example
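
(As a sanity check, and again just a sketch rather than my actual session, the
same limits can be passed as configuration properties and read back inside the
shell to confirm they reached the driver; spark.executor.memory and
spark.cores.max are the standard property names, and XXXX is the masked master
host.)

>>>> Start Console example (sketch)
$ spark-shell --master spark://XXXX:7077 \
    --conf spark.executor.memory=128m \
    --conf spark.cores.max=1

scala> // should print Some(128m) and Some(1) if the settings took effect
scala> sc.getConf.getOption("spark.executor.memory")
scala> sc.getConf.getOption("spark.cores.max")
>>>> End console example (sketch)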

So, can anyone tell me whether running spark-shell interactively from a remote
machine against a Standalone cluster even works? Thanks for your help.

The cluster UI below shows that the job is running on the cluster, that it has a
driver application and a worker assigned, and that there are plenty of cores and
gigabytes of memory free.

[cluster UI screenshot attached]

Sincerely,
Joe Lust
