Re: Why spark-submit command hangs?

2014-07-22 Thread Earthson
I've just had the same problem.

I'm using

$SPARK_HOME/bin/spark-submit --master yarn --deploy-mode client $JOBJAR
--class $JOBCLASS

It's really strange, because the log shows that 

14/07/22 16:16:58 INFO ui.SparkUI: Started SparkUI at
14/07/22 16:16:58 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/07/22 16:16:58 INFO spark.SparkContext: Added JAR
with timestamp 1406017018666
14/07/22 16:16:58 INFO cluster.YarnClusterScheduler: Created
14/07/22 16:16:58 INFO yarn.ApplicationMaster$$anon$1: Adding shutdown hook
for context org.apache.spark.SparkContext@41ecfc8c

Why does cluster.YarnClusterScheduler start? Where's the Client?

View this message in context:
Sent from the Apache Spark User List mailing list archive at

Re: Why spark-submit command hangs?

2014-07-22 Thread Earthson
That's what my problem is:)


Re: Why spark-submit command hangs?

2014-07-22 Thread Andrew Or
Hi Earthson,

Is your problem resolved? The way you submit your application looks alright
to me; spark-submit should be able to parse the combination of --master and
--deploy-mode correctly. I suspect you might have hard-coded yarn-cluster
or something in your application.
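
One quick way to check for a hard-coded master is to search the application
sources for it. The sketch below is only an illustration: the function name
find_hardcoded_master and the src/ path are hypothetical, and the pattern
simply looks for setMaster calls or literal yarn-* master strings.

```shell
# Hypothetical helper: list source lines that may hard-code a Spark master.
# Matches setMaster(...) calls and literal yarn-cluster / yarn-client /
# yarn-standalone strings; prints file:line:text for each hit.
find_hardcoded_master() {
  grep -rnE 'setMaster|yarn-(cluster|client|standalone)' "$1"
}

# Usage (src/ is an assumed location for the application sources):
#   find_hardcoded_master src/
```

Any hit is only a candidate: a master set in code takes effect when the
SparkContext is created, so it is worth comparing against what is passed to
spark-submit.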


2014-07-22 1:25 GMT-07:00 Earthson

 That's what my problem is:)


Re: Why spark-submit command hangs?

2014-07-21 Thread Andrew Or
Hi Sam,

Did you specify the MASTER in your [...]? I ask because I didn't see
a --master in your launch command. Also, your app seems to take in a master
(yarn-standalone). This is not exactly correct because by the time the
SparkContext is launched locally, which is the default, it is too late to
use yarn-cluster mode by definition, since the driver should be launched
within one of the containers on the worker machines.

I would suggest the following:
- change your application to not take in the Spark master as a command-line argument
- use yarn-cluster instead of yarn-standalone (which is deprecated)
- add --master yarn-cluster in your spark-submit command
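
Putting the suggestions above together, the launch command might look like
the sketch below. It assumes the application no longer reads the master from
its arguments, and it places all spark-submit options before the application
JAR, following the documented usage of spark-submit
([options] <app jar> [app arguments]). The JAR path, class name and HDFS
paths are the ones from the original command; the SUBMIT_CMD variable is
hypothetical and is only there so the command can be reviewed before running.

```shell
# Sketch only: options first, then the application JAR, then the
# application's own arguments (input and output paths).
# The master is given to spark-submit, not to the application.
SUBMIT_CMD="./bin/spark-submit \
--master yarn-cluster \
--class scala.spark.WordCount \
--num-executors 1 \
--driver-memory 300M \
--executor-memory 300M \
--executor-cores 1 \
tests/wordcount-spark-scala.jar \
hdfs://hostname/tmp/input hdfs://hostname/tmp/output"

# Print the assembled command for review; run it with: eval "$SUBMIT_CMD"
echo "$SUBMIT_CMD"
```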

Another worrying thing is the warning from your logs:
14/07/21 22:38:42 WARN spark.SparkConf: null jar passed to SparkContext

How are you creating your SparkContext?


2014-07-21 7:47 GMT-07:00 Sam Liu

 Hi Experts,

 I set up a YARN and Spark environment in which all services run on a single
 node. I then submitted a WordCount job using the spark-submit script with
 the command:

 ./bin/spark-submit tests/wordcount-spark-scala.jar --class
 scala.spark.WordCount --num-executors 1 --driver-memory 300M
 --executor-memory 300M --executor-cores 1 yarn-standalone
 hdfs://hostname/tmp/input hdfs://hostname/tmp/output

 However, the command hangs and no job is submitted to YARN. Any comments?


 Spark assembly has been built with Hive, including Datanucleus jars on
 14/07/21 22:38:42 WARN spark.SparkConf: null jar passed to SparkContext
 14/07/21 22:38:43 INFO spark.SecurityManager: Changing view acls to:
 14/07/21 22:38:43 INFO spark.SecurityManager: SecurityManager:
 authentication disabled; ui acls disabled; users with view permissions:
 14/07/21 22:38:43 INFO slf4j.Slf4jLogger: Slf4jLogger started
 14/07/21 22:38:43 INFO Remoting: Starting remoting
 14/07/21 22:38:43 INFO Remoting: Remoting started; listening on addresses
 14/07/21 22:38:43 INFO Remoting: Remoting now listens on addresses:
 14/07/21 22:38:43 INFO spark.SparkEnv: Registering MapOutputTracker
 14/07/21 22:38:43 INFO spark.SparkEnv: Registering BlockManagerMaster
 14/07/21 22:38:43 INFO storage.DiskBlockManager: Created local directory
 at /tmp/spark-local-20140721223843-75cd
 14/07/21 22:38:43 INFO storage.MemoryStore: MemoryStore started with
 capacity 180.0 MB.
 14/07/21 22:38:43 INFO network.ConnectionManager: Bound socket to port
 57453 with id = ConnectionManagerId(hostname,57453)
 14/07/21 22:38:43 INFO storage.BlockManagerMaster: Trying to register
 14/07/21 22:38:43 INFO storage.BlockManagerInfo: Registering block manager
 hostname:57453 with 180.0 MB RAM
 14/07/21 22:38:43 INFO storage.BlockManagerMaster: Registered BlockManager
 14/07/21 22:38:43 INFO spark.HttpServer: Starting HTTP Server
 14/07/21 22:38:43 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/21 22:38:43 INFO server.AbstractConnector: Started
 14/07/21 22:38:43 INFO broadcast.HttpBroadcast: Broadcast server started
 14/07/21 22:38:43 INFO spark.HttpFileServer: HTTP File server directory is
 14/07/21 22:38:43 INFO spark.HttpServer: Starting HTTP Server
 14/07/21 22:38:43 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/21 22:38:43 INFO server.AbstractConnector: Started
 14/07/21 22:38:43 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/21 22:38:43 INFO server.AbstractConnector: Started
 14/07/21 22:38:43 INFO ui.SparkUI: Started SparkUI at http://hostname:4040
 14/07/21 22:38:44 WARN util.NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable
 14/07/21 22:38:44 WARN spark.SparkContext: yarn-standalone is deprecated
 as of Spark 1.0. Use yarn-cluster instead.
 14/07/21 22:38:44 INFO cluster.YarnClusterScheduler: Created
 14/07/21 22:38:44 INFO yarn.ApplicationMaster$$anon$1: Adding shutdown
 hook for context org.apache.spark.SparkContext@610c610c

 Sam Liu