, Gerard.
On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc emre.sev...@gmail.com wrote:
I've decided to try
spark-submit ... --conf
spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties
But when I try to retrieve the value of propertiesFile via
System.err.println("propertiesFile : " + System.getProperty("propertiesFile"));
I get NULL.
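[Editor's note: a hedged sketch, reusing the path from the thread and borrowing the class/jar names used elsewhere in it. Two things worth checking: quote the whole --conf value so the shell does not split it at the space, and for a client-mode driver the documentation recommends --driver-java-options, since spark.driver.extraJavaOptions may only take effect after the driver JVM has already been launched.]
spark-submit --class com.myModule --master "local[4]" \
  --driver-java-options "-DpropertiesFile=/home/emre/data/myModule.properties" \
  --conf "spark.executor.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties" \
  myModule.jar
# the executor line is only needed if the property is read inside tasks running on executors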
directory, and
that
file is put inside the über JAR file when I build my application with
Maven,
and then when I submit it using spark-submit, I can read that
module.properties file via the traditional method:
properties.load(MyModule.class.getClassLoader().getResourceAsStream
Hello,
I'm using Spark 1.2.1 and have a module.properties file, and in it I have
non-Spark properties as well as Spark properties, e.g.:
job.output.dir=file:///home/emre/data/mymodule/out
I'm trying to pass it to spark-submit via:
spark-submit --class com.myModule --master local[4] --deploy-mode client
--verbose --properties-file /home/emre/data/mymodule.properties mymodule.jar
And I thought I could read the value of my non-Spark property, namely
job.output.dir, by using:
SparkConf
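[Editor's note: spark-submit only forwards keys that start with spark. from a --properties-file into the SparkConf; other keys are dropped with a warning, so job.output.dir will not be visible there. A hedged sketch of one workaround, with the file names from the thread: ship the file alongside the job and have the application load it itself.]
spark-submit --class com.myModule --master "local[4]" --deploy-mode client --verbose \
  --properties-file /home/emre/data/mymodule.properties \
  --files /home/emre/data/mymodule.properties \
  mymodule.jar
# spark.* keys arrive via --properties-file; the application re-loads the file shipped
# with --files (or the copy bundled in the jar) to read non-Spark keys such as job.output.dir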
Sean,
I'm trying this as an alternative to what I currently do. Currently I have
my module.properties file for my module in the resources directory, and
that file is put inside the über JAR file when I build my application with
Maven, and then when I submit it using spark-submit, I can read
-configuration:commons 1.8
Is there any workaround for this?
Greg
</dependency>
Ey-Chih
Date: Tue, 20 Jan 2015 16:57:20 -0800
Subject: Re: Spark 1.1.0 - spark-submit failed
From: yuzhih...@gmail.com
To: eyc...@hotmail.com
CC: user@spark.apache.org
Please check which netty jar(s) are on the classpath.
NioWorkerPool(Executor workerExecutor, int
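[Editor's note: a quick way to see which jar(s) actually provide that class; a sketch, point it at the lib directories of your own deployment.]
for j in $(find /path/to/spark/lib /path/to/app/lib -name '*.jar'); do
  unzip -l "$j" 2>/dev/null | grep -q 'NioWorkerPool' && echo "$j"
done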
are supported:
s3://pathtomybucket/mylibrary.py.
Simplest way to reproduce in local:
bin/spark-submit --py-files s3://whatever.path.com/library.py main.py
Actual commands to run it in EMR
#launch cluster
aws emr create-cluster --name SparkCluster --ami-version 3.3.1
--instance-type m1.medium
those files with '--py-files' and it works fine in local mode, but it fails
and gives me the following message when run in EMR:
Error: Only local python files are supported:
s3://pathtomybucket/mylibrary.py.
Simplest way to reproduce in local:
bin/spark-submit --py-files s3://whatever.path.com
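[Editor's note: since --py-files appears to accept only local paths here (per the error), one workaround is to copy the dependency out of S3 on the machine running spark-submit first. A sketch; the bucket name is from the thread and the local target path is a placeholder.]
aws s3 cp s3://pathtomybucket/mylibrary.py /home/hadoop/mylibrary.py
spark-submit --py-files /home/hadoop/mylibrary.py main.py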
Hi,
I issued the following command in an EC2 cluster launched using spark-ec2:
~/spark/bin/spark-submit --class com.crowdstar.cluster.etl.ParseAndClean
--master spark://ec2-54-185-107-113.us-west-2.compute.amazonaws.com:7077
--deploy-mode cluster --total-executor-cores 4
file:///tmp/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar
/ETL/input/2015/01/10/12/10Jan2015.avro
file
are accessible by Yarn (when running
the applications repeatedly I've seen all five nodes in use in various
attempts.) The applications are written in python and are run via spark-submit
with yarn-client as the master.
Example application submission: bin/spark-submit --num-executors 1 --conf
Hi,
On Fri, Dec 12, 2014 at 7:01 AM, ryaminal tacmot...@gmail.com wrote:
Now our solution is to make a very simple YARN application which executes
as its command spark-submit --master yarn-cluster
s3n://application/jar.jar
This seemed so simple and elegant, but it has some weird issues
made
private in future releases.
Now our solution is to make a very simple YARN application which executes
as its command spark-submit --master yarn-cluster s3n://application/jar.jar
This seemed so simple and elegant, but it has some weird issues. We
get NoClassDefFoundErrors. When we ssh
Hey Tobias,
Can you try using the YARN Fair Scheduler and set
yarn.scheduler.fair.continuous-scheduling-enabled to true?
-Sandy
On Sun, Dec 7, 2014 at 5:39 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
thanks for your responses!
On Sat, Dec 6, 2014 at 4:22 AM, Sandy Ryza
Hi,
On Tue, Dec 9, 2014 at 4:39 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
Can you try using the YARN Fair Scheduler and set
yarn.scheduler.fair.continuous-scheduling-enabled to true?
I'm using Cloudera 5.2.0 and my configuration says
yarn.resourcemanager.scheduler.class =
Hi,
thanks for your responses!
On Sat, Dec 6, 2014 at 4:22 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
What version are you using? In some recent versions, we had a couple of
large hardcoded sleeps on the Spark side.
I am using Spark 1.1.1.
As Andrew mentioned, I guess most of the 10
executors have registered.
-Sandy
On Fri, Dec 5, 2014 at 12:46 PM, Ashish Rangole arang...@gmail.com
wrote:
Likely this is not the case here, yet one thing to point out with YARN
parameters like --num-executors is that they should be specified *before*
the app jar and app args on spark-submit
Hi, all:
According to https://github.com/apache/spark/pull/2732, when a Spark job fails
or exits nonzero in yarn-cluster mode, spark-submit will get the
corresponding return code of the Spark job. But when I tried it on a spark-1.1.1 YARN
cluster, spark-submit returned zero anyway.
Here is my spark
I tried in Spark client mode; spark-submit can get the correct return code from
the Spark job. But in yarn-cluster mode, it failed.
From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in
yarn-cluster mode
Date: Fri, 5
("sHiveFromSpark")
val sc = new SparkContext(sparkConf)
val hiveContext = new HiveContext(sc)
// Before exit
Util.printLog("INFO", "Exit")
exit(100)
}
There were two `exit` calls in this code. If the args were wrong, spark-submit
will get the return code 101, but, if the args are correct, spark-submit
cannot get the second return code 100. What's the difference
between these two `exit`? I was so confused.
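[Editor's note: a hedged aside, since only fragments of the job are shown. In yarn-cluster mode the main() runs inside the YARN application master, so, as far as I can tell, a specific System.exit(100) in the remote driver is not forwarded back; the local spark-submit exit status only reflects whether YARN reports the application as succeeded or failed. The quickest way to see what actually comes back (class and jar are placeholders):]
spark-submit --master yarn-cluster --class com.example.MyJob myjob.jar
echo "spark-submit exited with: $?"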
of the
overhead comes from YARN itself.
In other words, no I don't know of any quick fix on your end that you can
do to speed this up.
-Andrew
2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer t...@preferred.jp:
Hi,
I am using spark-submit to submit my application to YARN in yarn-cluster
mode. I have
maybe also try to run SparkPi against YARN as a speed check.
spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode
cluster --master yarn
/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/spark-examples-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar
10
On Fri, Dec 5, 2014 at 2:32 PM, Denny
Likely this is not the case here, yet one thing to point out with YARN
parameters like --num-executors is that they should be specified *before*
the app jar and app args on the spark-submit command line; otherwise the app only
gets the default number of containers, which is 2.
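[Editor's note: in other words, everything after the application jar is passed to the application itself, so the YARN flags have to come first. A minimal sketch; class, jar, and argument names are placeholders.]
# correct: options before the app jar
spark-submit --master yarn-cluster --num-executors 4 --class com.example.MyApp myapp.jar appArg1
# wrong: --num-executors becomes an application argument, so only the default 2 containers are requested
spark-submit --master yarn-cluster --class com.example.MyApp myapp.jar --num-executors 4 appArg1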
On Dec 5, 2014 12:22 PM, Sandy
On Dec 5, 2014 12:22 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Denny,
Those sleeps were only at startup, so if jobs are taking
There were two exit in this code. If the args was wrong, the spark-submit
will get the return code 101, but, if the args is correct, spark-submit
cannot get the second return code 100. What’s the difference between these
two exit? I was so confused.
I’m also confused. When I tried your codes
Hi,
I am using spark-submit to submit my application to YARN in yarn-cluster
mode. I have both the Spark assembly jar file as well as my application jar
file put in HDFS and can see from the logging output that both files are
used from there. However, it still takes about 10 seconds for my
!
However, in local[N] mode, neither that one nor
the spark.files.userClassPathFirst one works. So when using spark-submit
with --master local[3] instead of --master yarn-cluster, the value
for spark.files.userClassPathFirst is displayed correctly, but the classes
are still loaded from the wrong jar
Hi,
I am using spark-submit to submit my application jar to a YARN cluster. I
want to deliver a single jar file to my users, so I would like to avoid to
tell them also, please put that log4j.xml file somewhere and add that path
to the spark-submit command.
I thought it would be sufficient
-on-yarn.html
On Thu, Nov 20, 2014 at 9:20 AM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
I am using spark-submit to submit my application jar to a YARN cluster. I
want to deliver a single jar file to my users, so I would like to avoid to
tell them also, please put that log4j.xml file
Check the --files argument in the output of spark-submit -h.
On Thu, Nov 20, 2014 at 7:51 AM, Matt Narrell matt.narr...@gmail.com wrote:
How do I configure the files to be uploaded to YARN containers? So far, I've
only seen --conf "spark.yarn.jar=hdfs://…" which allows me to specify the
HDFS
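[Editor's note: for the log4j.xml question above, a common pattern (a hedged sketch; the file path and application jar name are placeholders) is to ship the file with --files and point log4j at the copy that lands in each container's working directory.]
spark-submit --master yarn-cluster \
  --files /path/to/log4j.xml \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.xml" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.xml" \
  my-application.jar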
4:59 AM, Samarth Mailinglist
mailinglistsama...@gmail.com wrote:
I am trying to run a job written in python with the following command:
bin/spark-submit --master spark://localhost:7077
/path/spark_solution_basic.py --py-files /path/*.py --files
/path/config.properties
I always get
You are changing these paths and filenames to match your own actual scripts
and file locations right?
On Nov 17, 2014 4:59 AM, Samarth Mailinglist mailinglistsama...@gmail.com
wrote:
I am trying to run a job written in python with the following command:
bin/spark-submit --master spark
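[Editor's note: one detail that may matter with the command above (hedged, since the error itself is cut off): spark-submit stops parsing its own options at the primary .py file, so --py-files and --files placed after it are handed to the script as arguments, and a shell glob like /path/*.py is not expanded into the comma-separated list spark-submit expects. A sketch with the options moved forward; dep1.py and dep2.py stand in for the actual modules.]
bin/spark-submit --master spark://localhost:7077 \
  --py-files /path/dep1.py,/path/dep2.py \
  --files /path/config.properties \
  /path/spark_solution_basic.py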
) for the time
being.
Date: Tue, 11 Nov 2014 20:15:17 +0530
Subject: Re: Spark-submit and Windows / Linux mixed network
From: riteshoneinamill...@gmail.com
To: as...@live.com
CC: user@spark.apache.org
Never tried this form but just guessing,
What's the output when you submit this jar:
\\shares
Hi,
I'm trying to submit a spark application from a network share to the spark master.
Network shares are configured so that the master and all nodes have access to
the target jar at (say):
\\shares\publish\Spark\app1\someJar.jar
And this is mounted on each linux box (i.e. master and workers) at:
Never tried this form but just guessing,
What's the output when you submit this jar: \\shares\publish\Spark\app1\someJar.jar
using spark-submit.cmd
Not sure why that is failing, but I found a workaround like:
#!/bin/bash -e
SPARK_SUBMIT=/home/akhld/mobi/localcluster/spark-1/bin/spark-submit
export _JAVA_OPTIONS="-Xmx1g"
OPTS+=" --class org.apache.spark.examples.SparkPi"
echo $SPARK_SUBMIT $OPTS lib/spark-examples-1.1.0-hadoop1.0.4.jar
about this. Unfortunately, it doesn't
change anything. With this setting both true and false (as indicated by
the Spark web interface) and no matter whether local[N] or yarn-client
or yarn-cluster mode are used with spark-submit, the classpath looks the
same and the netty class is loaded from
that and
was really happy, but it seems like spark-submit puts an older version
of netty on the classpath when submitting to a cluster, such that my
code ends up with a NoSuchMethodError:
Code:
val a = new DefaultHttpRequest(HttpVersion.HTTP_1_1, HttpMethod.POST,
"http://localhost")
val
I have a Spark Streaming program that works fine if I execute it via
sbt runMain com.cray.examples.spark.streaming.cyber.StatefulDhcpServerHisto
-f /Users/spr/Documents/.../tmp/ -t 10
but if I start it via
$S/bin/spark-submit --master local[12] --class StatefulNewDhcpServers
target/scala-2.10
From: Tobias Pfeiffer t...@preferred.jp
Am I right that you are actually executing two different classes here?
Yes, I realized after I posted that I was calling 2 different classes, though
they are in the same JAR. I went back and tried it again with the same class
Hi,
I tried hard to get a version of netty into my jar file created with sbt
assembly that works with all my libraries. Now I managed that and was
really happy, but it seems like spark-submit puts an older version of netty
on the classpath when submitting to a cluster, such that my code ends up
I see the job in the web interface but don't know how to kill it
running
~/spark-1.1.0-bin-hadoop2.4/bin/spark-submit \
--class my.spark.MyClass --master local[3] \
target/scala-2.10/myclass-assembly-1.0.jar
I get:
Spark assembly has been built with Hive, including Datanucleus jars on
classpath
Exception in thread "main" java.lang.NoClassDefFoundError:
com
Hi again,
On Thu, Oct 30, 2014 at 11:50 AM, Tobias Pfeiffer t...@preferred.jp wrote:
Spark assembly has been built with Hive, including Datanucleus jars on
classpath
Exception in thread "main" java.lang.NoClassDefFoundError:
com/typesafe/scalalogging/slf4j/Logger
It turned out scalalogging
-1.2.0-SNAPSHOT.jar
But not through spark-submit:
./bin/spark-submit --class
org.apache.spark.examples.streaming.CassandraSave --master
spark://ip-172-31-38-112:7077
streaming-test/target/scala-2.10/simple-streaming_2.10-1.0.jar --jars
local:///home/ubuntu/spark-cassandra-connector/spark-cassandra
-connector-assembly-1.2.0-SNAPSHOT.jar
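[Editor's note: worth noting for the command above: spark-submit treats everything after the primary application jar as arguments to the application, so a --jars flag placed at the end is never seen by spark-submit itself. A hedged rearrangement of the same command:]
./bin/spark-submit --class org.apache.spark.examples.streaming.CassandraSave \
  --master spark://ip-172-31-38-112:7077 \
  --jars local:///home/ubuntu/spark-cassandra-connector/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar \
  streaming-test/target/scala-2.10/simple-streaming_2.10-1.0.jar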
Version: spark 1.1.0
42 workers,40g memory per worker
Running graphx componentgraph ,use five hours
On Oct 25, 2014, at 1:27, Sameer Farooqui same...@databricks.com wrote:
That does seem a bit odd. How many Executors are running under this Driver?
Does the spark-submit process start out using
You can open the application UI (that runs on 4040) and see how much memory
is being allocated to the executor tabs and from the environments tab.
Thanks
Best Regards
On Wed, Oct 22, 2014 at 9:55 PM, Holden Karau hol...@pigscanfly.ca wrote:
Hi Michael Campbell,
Are you deploying against yarn
I used standalone Spark and set spark.driver.memory=5g, but the spark-submit process uses
57g of memory. Is this normal? How can I decrease it?
That does seem a bit odd. How many Executors are running under this Driver?
Does the spark-submit process start out using ~60GB of memory right away or
does it start out smaller and slowly build up to that high? If so, how long
does it take to get that high?
Also, which version of Spark are you
Hi Michael Campbell,
Are you deploying against yarn or standalone mode? In yarn try setting the
shell variables SPARK_EXECUTOR_MEMORY=2G in standalone try and
set SPARK_WORKER_MEMORY=2G.
Cheers,
Holden :)
On Thu, Oct 16, 2014 at 2:22 PM, Michael Campbell
michael.campb...@gmail.com wrote:
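[Editor's note: equivalently (a hedged sketch; class and jar names are placeholders), the per-executor memory can also be passed straight on the spark-submit command line instead of through those shell variables.]
spark-submit --master yarn-cluster --executor-memory 2g --class com.example.MyApp myapp.jar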
Hi,
I'd like to run my python script using spark-submit together with a JAR
file containing Java specifications for a Hadoop file system. How can I do
that? It seems I can either provide a JAR file or a Python file to
spark-submit.
So far I have been running my code in ipython with IPYTHON_OPTS
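[Editor's note: a --jars flag can be combined with a Python main script on a single spark-submit line; a sketch, with the jar and script names as placeholders.]
spark-submit --jars /path/to/hadoop-fs-connector.jar /path/to/my_script.py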
Hi,
I am using the comma-separated style to submit multiple jar files with the
following shell command, but it does not work:
bin/spark-submit --class org.apache.spark.examples.mllib.JavaKMeans
--master yarn-cluster --execur-memory 2g --jars
lib/spark-examples-1.0.2-hadoop2.2.0.jar,lib/spark-mllib_2.10-1.0.0.jar
hdfs://master:8000/srcdata/kmeans 8 4
Thanks!
--
WangHaihua
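[Editor's note: two things stand out in that command, hedged since the failure message is not shown: the flag is spelled --executor-memory, and the jar that contains the --class is normally passed as the primary application jar, with only the additional jars in the comma-separated (no spaces) --jars list. A sketch of the rearranged command:]
bin/spark-submit --class org.apache.spark.examples.mllib.JavaKMeans \
  --master yarn-cluster --executor-memory 2g \
  --jars lib/spark-mllib_2.10-1.0.0.jar \
  lib/spark-examples-1.0.2-hadoop2.2.0.jar \
  hdfs://master:8000/srcdata/kmeans 8 4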
-15 22:39 GMT+08:00 Soumitra Kumar kumar.soumi...@gmail.com:
I am writing to HBase, following are my options:
export
SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar
spark-submit \
--jars
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,/opt/cloudera/parcels
...@hbase.apache.org
Sent: Thursday, October 16, 2014 12:50:01 AM
Subject: Re: How to add HBase dependencies and conf with spark-submit?
Thanks, Soumitra Kumar,
I didn't know why you put hbase-protocol.jar in SPARK_CLASSPATH, while adding
hbase-protocol.jar, hbase-common.jar, hbase-client.jar, htrace
TL;DR - a spark SQL job fails with an OOM (Out of heap space) error. If
given --executor-memory values, it won't even start. Even (!) if the
values given ARE THE SAME AS THE DEFAULT.
Without --executor-memory:
14/10/16 17:14:58 INFO TaskSetManager: Serialized task 1.0:64 as 14710
bytes in 1
+user@hbase
2014-10-15 20:48 GMT+08:00 Fengyun RAO raofeng...@gmail.com:
We use Spark 1.1, and HBase 0.98.1-cdh5.1.0, and need to read and write an
HBase table in Spark program.
I notice there are:
spark.driver.extraClassPath
spark.executor.extraClassPathproperties to manage extra
I am writing to HBase, following are my options:
export SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar
spark-submit \
--jars
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar,/opt/cloudera/parcels/CDH/lib
("spark.executor.memory", "4g")
that I can run manually with sbt run without any problem.
But, I try to run the same job with spark-submit:
./spark-1.1.0-bin-hadoop2.4/bin/spark-submit \
  --class value.jobs.MyJob \
  --master local[4] \
  --conf spark.executor.memory=4g
)
.setMaster("local[4]")
.set("spark.executor.memory", "4g")
that I can run manually with sbt run without any problem.
But, I try to run the same job with spark-submit
./spark-1.1.0-bin-hadoop2.4/bin/spark-submit \
--class value.jobs.MyJob \
--master local[4
What is the proper way to specify java options for the Spark executors
using spark-submit? We had done this previously using
export SPARK_JAVA_OPTS='..
previously, for example to attach a debugger to each executor or add
-verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
On spark-submit I
-defaults.conf file used with the spark-submit script. Heap
size settings can be set with spark.executor.memory.
you can find it at Runtime Environment
Larry
On 9/24/14 10:52 PM, Arun Ahuja wrote:
What is the proper way to specify java options for the Spark executors
using spark-submit? We had done
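[Editor's note: for the original question about per-executor JVM flags, the spark.executor.extraJavaOptions property (settable in spark-defaults.conf or with --conf) is the usual replacement for SPARK_JAVA_OPTS. A hedged sketch with the GC flags mentioned above; the class and jar names are placeholders.]
spark-submit --class com.example.MyApp \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --conf spark.executor.memory=4g \
  myapp.jar
# heap size goes through spark.executor.memory, not through -Xmx in extraJavaOptions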
-laptop (has the /tmp/myobject.ser /opt/test/lib/spark-test.jar)
launches spark-submit --files .. hadoop-yarn-cluster[3 nodes]
and on my laptop: $HADOOP_CONF_DIR, I have the configuration that points to
this 3-node YARN cluster.
What is the right way to get to this file (myobject.ser) in my
, correct? That's why it's probably not finding the file.
Here's what I am trying to do:
my-laptop (has the /tmp/myobject.ser /opt/test/lib/spark-test.jar)
launches spark-submit --files .. hadoop-yarn-cluster[3 nodes]
and on my laptop: $HADOOP_CONF_DIR, I have the configuration
the file on
hdfs ?
-C
(myobject.ser) to get the file. Am I doing something wrong?
CMD:
bin/spark-submit --name Test --class
com.test.batch.modeltrainer.ModelTrainerMain \
--master local --files /tmp/myobject.ser --verbose
/opt/test/lib/spark-test.jar
com.test.batch.modeltrainer.ModelTrainerMain.scala
37: val serFile
Hey just a minor clarification, you _can_ use SparkFiles.get in your
application only if it runs on the executors, e.g. in the following way:
sc.parallelize(1 to 100).map { i = SparkFiles.get(my.file) }.collect()
But not in general (otherwise NPE, as in your case). Perhaps this should be
Hi,
I am wondering: Is it possible to run spark-submit in a mode where it will
start an application on a YARN cluster (i.e., driver and executors run on
the cluster) and then forget about it in the sense that the Spark
application is completely independent from the host that ran the
spark-submit
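[Editor's note: in yarn-cluster mode the driver runs inside the YARN application master, so once the application is up the local spark-submit process is only a monitor and, per the replies below, can be killed without affecting the job. A sketch; class and jar names are placeholders.]
spark-submit --master yarn-cluster --class com.example.MyApp myapp.jar &
SUBMIT_PID=$!
# once the application shows up as RUNNING in `yarn application -list`,
# the launcher process can be killed without stopping the job:
kill $SUBMIT_PID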
...@cloudera.com
wrote:
Yes, what Sandy said.
On top of that, I would suggest filing a bug for a new command line
argument for spark-submit to make the launcher process exit cleanly as
soon as a cluster job starts successfully. That can be helpful for
code that launches Spark jobs but monitors the job
idea Marcelo. There isn't AFAIK any reason the
client needs to hang there for correct operation.
On Thu, Sep 18, 2014 at 9:39 AM, Marcelo Vanzin van...@cloudera.com
wrote:
Yes, what Sandy said.
On top of that, I would suggest filing a bug for a new command line
argument for spark-submit
Hi,
thanks for everyone's replies!
On Thu, Sep 18, 2014 at 7:37 AM, Sandy Ryza sandy.r...@cloudera.com
wrote:
YARN cluster mode should have the behavior you're looking for. The
client
process will stick around to report on things, but should be able to be
killed without affecting the
yarn-cluster with the same result]. I am using the
SparkFiles.get("myobject.ser") to get the file. Am I doing something wrong?
CMD:
bin/spark-submit --name Test --class
com.test.batch.modeltrainer.ModelTrainerMain \
--master local --files /tmp/myobject.ser --verbose
/opt/test/lib/spark
.
From: Xiangrui Meng men...@gmail.com
Sent: Sunday, September 07, 2014 11:40 PM
To: Victor Tso-Guillen
Cc: Penny Espinoza; Spark
Subject: Re: prepending jars to the driver class path for spark-submit on YARN
There is an undocumented configuration to put users jars
with
org.apache.httpcomponents httpcore and httpclient when using spark-submit
with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster. I’ve seen several
posts about this issue, but no resolution.
The error message is this:
Caused by: java.lang.NoSuchMethodError
exception on line 31, where the ClassToRoundTrip object is
deserialized. Strangely, the earlier use on line 28 is okay:
spark-submit --class SimpleApp \
--master local[4] \
target/scala-2.10/simpleapp_2.10-1.0.jar
However, if I add extra parameters for driver-class-path
I don't understand what you mean. Can you be more specific?
From: Victor Tso-Guillen v...@paxata.com
Sent: Saturday, September 06, 2014 5:13 PM
To: Penny Espinoza
Cc: Spark
Subject: Re: prepending jars to the driver class path for spark-submit on YARN
I ran
When you submit the job to yarn with spark-submit, set --conf
spark.yarn.user.classpath.first=true .
On Mon, Sep 8, 2014 at 10:46 AM, Penny Espinoza
pesp...@societyconsulting.com wrote:
I don't understand what you mean. Can you be more specific?
From: Victor