Yes, that's how file: URLs are interpreted everywhere in Spark. (It's also
explained in the link to the docs I posted earlier.)
The second interpretation below corresponds to local: URLs in Spark, but
those don't work with YARN on Spark 1.0 (so they won't work with CDH 5.1
and older either).
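To illustrate the difference (the command lines and jar paths here are
hypothetical, not from this thread): a file: URL names a file on the
machine you submit from, which Spark then uploads to the cluster, whereas
a local: URL names a file that must already exist at that path on every
node:

user$ pyspark --master yarn-client --driver-java-options '-Dspark.yarn.jar=file:///usr/lib/spark/assembly/lib/spark-assembly.jar'
user$ pyspark --master yarn-client --driver-java-options '-Dspark.yarn.jar=local:/usr/lib/spark/assembly/lib/spark-assembly.jar'

(The second, local: form is the one that doesn't work with YARN on Spark 1.0.)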
On Mon, Sep 8, 2014, Dimension Data, LLC. subscripti...@didata.us wrote:
Hello friends:
It was mentioned in another (YARN-centric) email thread that 'SPARK_JAR'
was deprecated, and that the 'spark.yarn.jar' property should be used
instead for YARN submission. For example:
user$ pyspark [some-options] --driver-java-options
spark.yarn.jar=hdfs://namenode:8020/path/to/spark-assembly-*.jar
On Mon, Sep 8, 2014 at 9:35 AM, Dimension Data, LLC.
subscripti...@didata.us wrote:
user$ pyspark [some-options] --driver-java-options
spark.yarn.jar=hdfs://namenode:8020/path/to/spark-assembly-*.jar
This command line does not look correct. spark.yarn.jar is not a JVM
command line option.
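If the intent is to pass it through the driver's Java options, something
along these lines would presumably be needed instead (the -D prefix is
what makes it a JVM system property; note too that a glob like
spark-assembly-*.jar will not be expanded inside an HDFS URL, so the
actual jar name has to be spelled out):

user$ pyspark [some-options] --driver-java-options '-Dspark.yarn.jar=hdfs://namenode:8020/path/to/spark-assembly.jar'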
On Mon, Sep 8, 2014 at 10:00 AM, Dimension Data, LLC.
subscripti...@didata.us wrote:
user$ export MASTER=local[nn]   # Run spark shell on LOCAL CPU threads.
user$ pyspark [someOptions] --driver-java-options -Dspark.*XYZ*.jar='/usr/lib/spark/assembly/lib/spark-assembly-*.jar'
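An aside, under my own assumptions: if spark.*XYZ*.jar above stands for
spark.yarn.jar, then as far as I can tell that property is only consulted
when the master is one of the YARN modes; with MASTER=local[nn] the
assembly jar is already on the local classpath, so the setting should be
a no-op there. For example:

user$ export MASTER=local[4]
user$ pyspark --driver-java-options '-Dspark.yarn.jar=hdfs://namenode:8020/path/to/spark-assembly.jar'   # presumably ignored in local mode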
My question is,
On Mon, Sep 8, 2014 at 11:52 AM, Dimension Data, LLC.
subscripti...@didata.us wrote:
So just to clarify for me: when specifying 'spark.yarn.jar' as I did
above, even if I don't use HDFS to create an RDD (e.g. something as
simple as 'sc.parallelize(range(100))'), is it still necessary for that
spark-assembly jar to be reachable at its HDFS location?
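For concreteness, the kind of session I mean (reusing the hypothetical
HDFS path from earlier in the thread):

user$ pyspark --master yarn-client --driver-java-options '-Dspark.yarn.jar=hdfs://namenode:8020/path/to/spark-assembly.jar'
>>> sc.parallelize(range(100)).count()   # no HDFS input data involved
100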
On Mon, Sep 8, 2014 at 3:54 PM, Dimension Data, LLC.
subscripti...@didata.us wrote:
You're probably right about the above because, as seen *below* for
pyspark (and probably for other Spark applications too), once
'-Dspark.master=[yarn-client|yarn-cluster]' is specified, the app
invocation