Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-17 Thread Gerard Maas
, Gerard. On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc emre.sev...@gmail.com wrote: I've decided to try spark-submit ... --conf spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties But when I try to retrieve the value of propertiesFile via System.err.println

Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-17 Thread Emre Sevinc
I've decided to try spark-submit ... --conf spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties But when I try to retrieve the value of propertiesFile via System.err.println("propertiesFile : " + System.getProperty("propertiesFile")); I get NULL

Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Corey Nolet
directory, and that file is put inside the über JAR file when I build my application with Maven, and then when I submit it using spark-submit, I can read that module.properties file via the traditional method: properties.load(MyModule.class.getClassLoader().getResourceAsStream

Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Emre Sevinc
Hello, I'm using Spark 1.2.1 and have a module.properties file, and in it I have non-Spark properties, as well as Spark properties, e.g.: job.output.dir=file:///home/emre/data/mymodule/out I'm trying to pass it to spark-submit via: spark-submit --class com.myModule --master local[4

Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Sean Owen
to pass it to spark-submit via: spark-submit --class com.myModule --master local[4] --deploy-mode client --verbose --properties-file /home/emre/data/mymodule.properties mymodule.jar And I thought I could read the value of my non-Spark property, namely, job.output.dir by using: SparkConf

Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Emre Sevinc
Sean, I'm trying this as an alternative to what I currently do. Currently I have my module.properties file for my module in the resources directory, and that file is put inside the über JAR file when I build my application with Maven, and then when I submit it using spark-submit, I can read

Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Charles Feduke
.: job.output.dir=file:///home/emre/data/mymodule/out I'm trying to pass it to spark-submit via: spark-submit --class com.myModule --master local[4] --deploy-mode client --verbose --properties-file /home/emre/data/mymodule.properties mymodule.jar And I thought I could read the value
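
The thread's underlying issue: --properties-file feeds spark-submit's SparkConf, and SparkConf only keeps keys that start with spark., so non-Spark properties are silently dropped. A minimal sketch of the system-property workaround tried above, reusing the thread's own paths; the application would then read the path from System.getProperty("propertiesFile") and load it with java.util.Properties (though, as reported above, the property came back null in at least one setup):

    spark-submit \
      --class com.myModule \
      --master local[4] \
      --deploy-mode client \
      --conf spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties \
      mymodule.jar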

Re: spark-submit conflicts with dependencies

2015-01-28 Thread Sean Owen
-configuration:commons 1.8 Is there any workaround for this? Greg

Re: spark-submit conflicts with dependencies

2015-01-27 Thread soid

Re: spark-submit conflicts with dependencies

2015-01-27 Thread Ted Yu

RE: Spark 1.1.0 - spark-submit failed

2015-01-21 Thread ey-chih chow
/dependency Ey-Chih Date: Tue, 20 Jan 2015 16:57:20 -0800 Subject: Re: Spark 1.1.0 - spark-submit failed From: yuzhih...@gmail.com To: eyc...@hotmail.com CC: user@spark.apache.org Please check which netty jar(s) are on the classpath. NioWorkerPool(Executor workerExecutor, int
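
One way to act on the "check which netty jar(s) are on the classpath" advice quoted above, sketched with JVM class-loading verbosity; the local master and the grep are assumptions, not part of the thread:

    spark-submit --class com.crowdstar.cluster.etl.ParseAndClean \
      --master local[2] \
      --conf spark.driver.extraJavaOptions=-verbose:class \
      file:///tmp/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar 2>&1 | grep -i netty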

Re: spark-submit --py-files remote: Only local additional python files are supported

2015-01-21 Thread Vladimir Grigor
are supported: s3://pathtomybucket/mylibrary.py. Simplest way to reproduce in local: bin/spark-submit --py-files s3://whatever.path.com/library.py main.py Actual commands to run it in EMR #launch cluster aws emr create-cluster --name SparkCluster --ami-version 3.3.1 --instance-type m1.medium

spark-submit --py-files remote: Only local additional python files are supported

2015-01-20 Thread Vladimir Grigor
those files with '--py-files' and it works fine in local mode but it fails and gives me following message when run in EMR: Error: Only local python files are supported: s3://pathtomybucket/mylibrary.py. Simplest way to reproduce in local: bin/spark-submit --py-files s3://whatever.path.com

Re: spark-submit --py-files remote: Only local additional python files are supported

2015-01-20 Thread Andrew Or
message when run in EMR: Error: Only local python files are supported: s3://pathtomybucket/mylibrary.py. Simplest way to reproduce in local: bin/spark-submit --py-files s3://whatever.path.com/library.py main.py Actual commands to run it in EMR #launch cluster aws emr create-cluster --name
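
Since only local --py-files are accepted here, the usual workaround is to copy the dependency out of S3 before submitting; a sketch using the thread's bucket and file names, with /tmp as an assumed staging directory:

    aws s3 cp s3://pathtomybucket/mylibrary.py /tmp/mylibrary.py
    bin/spark-submit --py-files /tmp/mylibrary.py main.py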

Spark 1.1.0 - spark-submit failed

2015-01-20 Thread ey-chih chow
Hi, I issued the following command in a ec2 cluster launched using spark-ec2: ~/spark/bin/spark-submit --class com.crowdstar.cluster.etl.ParseAndClean --master spark://ec2-54-185-107-113.us-west-2.compute.amazonaws.com:7077 --deploy-mode cluster --total-executor-cores 4 file:///tmp/etl-admin/jar

Re: Spark 1.1.0 - spark-submit failed

2015-01-20 Thread Ted Yu
: ~/spark/bin/spark-submit --class com.crowdstar.cluster.etl.ParseAndClean --master spark://ec2-54-185-107-113.us-west-2.compute.amazonaws.com:7077 --deploy-mode cluster --total-executor-cores 4 file:///tmp/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar /ETL/input/2015/01/10/12/10Jan2015.avro file

spark-submit --py-files remote: Only local additional python files are supported

2015-01-17 Thread voukka
those files with '--py-files' and it works fine in local mode but it fails and gives me following message when run in EMR: Error: Only local python files are supported: s3://pathtomybucket/mylibrary.py. Simplest way to reproduce in local: bin/spark-submit --py-files s3://whatever.path.com

spark/yarn ignoring num-executors (python, Amazon EMR, spark-submit, yarn-client)

2014-12-19 Thread Tim Schweichler
are accessible by Yarn (when running the applications repeatedly I've seen all five nodes in use in various attempts.) The applications are written in python and are run via spark-submit with yarn-client as the master. Example application submission: bin/spark-submit --num-executors 1 --conf

Re: Running spark-submit from a remote machine using a YARN application

2014-12-14 Thread Tobias Pfeiffer
Hi, On Fri, Dec 12, 2014 at 7:01 AM, ryaminal tacmot...@gmail.com wrote: Now our solution is to make a very simple YARN application which executes as its command spark-submit --master yarn-cluster s3n://application/jar.jar This seemed so simple and elegant, but it has some weird issues

Running spark-submit from a remote machine using a YARN application

2014-12-11 Thread ryaminal
made private in future releases. Now our solution is to make a very simple YARN application which executes as its command spark-submit --master yarn-cluster s3n://application/jar.jar This seemed so simple and elegant, but it has some weird issues. We get NoClassDefFoundErrors. When we ssh

Re: spark-submit on YARN is slow

2014-12-08 Thread Sandy Ryza
Hey Tobias, Can you try using the YARN Fair Scheduler and set yarn.scheduler.fair.continuous-scheduling-enabled to true? -Sandy On Sun, Dec 7, 2014 at 5:39 PM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, thanks for your responses! On Sat, Dec 6, 2014 at 4:22 AM, Sandy Ryza

Re: spark-submit on YARN is slow

2014-12-08 Thread Tobias Pfeiffer
Hi, On Tue, Dec 9, 2014 at 4:39 AM, Sandy Ryza sandy.r...@cloudera.com wrote: Can you try using the YARN Fair Scheduler and set yarn.scheduler.fair.continuous-scheduling-enabled to true? I'm using Cloudera 5.2.0 and my configuration says yarn.resourcemanager.scheduler.class =

Re: spark-submit on YARN is slow

2014-12-07 Thread Tobias Pfeiffer
Hi, thanks for your responses! On Sat, Dec 6, 2014 at 4:22 AM, Sandy Ryza sandy.r...@cloudera.com wrote: What version are you using? In some recent versions, we had a couple of large hardcoded sleeps on the Spark side. I am using Spark 1.1.1. As Andrew mentioned, I guess most of the 10

Re: spark-submit on YARN is slow

2014-12-06 Thread Sandy Ryza
executors have registered. -Sandy On Fri, Dec 5, 2014 at 12:46 PM, Ashish Rangole arang...@gmail.com wrote: Likely this is not the case here, yet one thing to point out with Yarn parameters like --num-executors is that they should be specified *before* app jar and app args on spark-submit

Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread LinQili
Hi, all: According to https://github.com/apache/spark/pull/2732, when a spark job fails or exits nonzero in yarn-cluster mode, spark-submit will get the corresponding return code of the spark job. But I tried it in a spark-1.1.1 yarn cluster, and spark-submit returns zero anyway. Here is my spark

RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread LinQili
I tried in spark client mode; spark-submit can get the correct return code from the spark job. But in yarn-cluster mode, it failed. From: lin_q...@outlook.com To: u...@spark.incubator.apache.org Subject: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode Date: Fri, 5

RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread LinQili
(s"HiveFromSpark")
val sc = new SparkContext(sparkConf)
val hiveContext = new HiveContext(sc)
// Before exit
Util.printLog(INFO, "Exit")
exit(100)
}
There were two `exit` in this code. If the args was wrong, the spark-submit will get the return code 101, but, if the args is correct, spark

Re: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread Shixiong Zhu
(INFO, Exit) exit(100) } There were two `exit` in this code. If the args was wrong, the spark-submit will get the return code 101, but, if the args is correct, spark-submit cannot get the second return code 100. What's the difference between these two `exit`? I was so confused
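
The behavior under discussion, seen from the shell; the class and jar names below are placeholders:

    spark-submit --master yarn-cluster --class com.example.ExitJob app.jar
    echo $?   # reported above to be 0 in yarn-cluster mode even when the job
              # calls exit(100) after the SparkContext is up, while client
              # mode propagates the code as expected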

Re: spark-submit on YARN is slow

2014-12-05 Thread Andrew Or
of the overhead comes from YARN itself. In other words, no I don't know of any quick fix on your end that you can do to speed this up. -Andrew 2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer t...@preferred.jp: Hi, I am using spark-submit to submit my application to YARN in yarn-cluster mode. I have

Re: spark-submit on YARN is slow

2014-12-05 Thread Sandy Ryza
quick fix on your end that you can do to speed this up. -Andrew 2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer t...@preferred.jp: Hi, I am using spark-submit to submit my application to YARN in yarn-cluster mode. I have both the Spark assembly jar file as well as my application jar file put

Re: spark-submit on YARN is slow

2014-12-05 Thread Denny Lee
. In other words, no I don't know of any quick fix on your end that you can do to speed this up. -Andrew 2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer t...@preferred.jp: Hi, I am using spark-submit to submit my application to YARN in yarn-cluster mode. I have both the Spark assembly jar file

Re: spark-submit on YARN is slow

2014-12-05 Thread Sameer Farooqui
maybe also try to run SparkPi against YARN as a speed check. spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster --master yarn /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/spark-examples-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar 10 On Fri, Dec 5, 2014 at 2:32 PM, Denny

Re: spark-submit on YARN is slow

2014-12-05 Thread Sandy Ryza
. -Andrew 2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer t...@preferred.jp: Hi, I am using spark-submit to submit my application to YARN in yarn-cluster mode. I have both the Spark assembly jar file as well as my application jar file put in HDFS and can see from the logging output that both

Re: spark-submit on YARN is slow

2014-12-05 Thread Arun Ahuja
:10 GMT-08:00 Tobias Pfeiffer t...@preferred.jp: Hi, I am using spark-submit to submit my application to YARN in yarn-cluster mode. I have both the Spark assembly jar file as well as my application jar file put in HDFS and can see from the logging output that both files are used from

Re: spark-submit on YARN is slow

2014-12-05 Thread Ashish Rangole
Likely this is not the case here, yet one thing to point out with Yarn parameters like --num-executors is that they should be specified *before* app jar and app args on the spark-submit command line, otherwise the app only gets the default number of containers, which is 2. On Dec 5, 2014 12:22 PM, Sandy
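
The ordering rule is easy to demonstrate; the jar and argument names below are placeholders:

    # Honored: options come before the application jar.
    spark-submit --master yarn-cluster --num-executors 8 app.jar input output

    # Ignored: everything after the jar is handed to the application as its
    # own arguments, so YARN falls back to the default of 2 executors.
    spark-submit --master yarn-cluster app.jar --num-executors 8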

Re: spark-submit on YARN is slow

2014-12-05 Thread Sandy Ryza
is that they should be specified *before* app jar and app args on the spark-submit command line, otherwise the app only gets the default number of containers, which is 2. On Dec 5, 2014 12:22 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Denny, Those sleeps were only at startup, so if jobs are taking

Re: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread Shixiong Zhu
There were two exit in this code. If the args was wrong, the spark-submit will get the return code 101, but, if the args is correct, spark-submit cannot get the second return code 100. What’s the difference between these two exit? I was so confused. I’m also confused. When I tried your codes

Re: spark-submit on YARN is slow

2014-12-05 Thread Denny Lee
, Ashish Rangole arang...@gmail.com wrote: Likely this is not the case here, yet one thing to point out with Yarn parameters like --num-executors is that they should be specified *before* app jar and app args on the spark-submit command line, otherwise the app only gets the default number of containers

Re: spark-submit on YARN is slow

2014-12-05 Thread Denny Lee
...@gmail.com wrote: Likely this is not the case here, yet one thing to point out with Yarn parameters like --num-executors is that they should be specified *before* app jar and app args on the spark-submit command line, otherwise the app only gets the default number of containers, which is 2. On Dec 5, 2014 12

spark-submit on YARN is slow

2014-12-03 Thread Tobias Pfeiffer
Hi, I am using spark-submit to submit my application to YARN in yarn-cluster mode. I have both the Spark assembly jar file as well as my application jar file put in HDFS and can see from the logging output that both files are used from there. However, it still takes about 10 seconds for my

Re: netty on classpath when using spark-submit

2014-12-03 Thread Tobias Pfeiffer
! However, in local[N] mode, neither that one nor the spark.files.userClassPathFirst one works. So when using spark-submit with --master local[3] instead of --master yarn-cluster, the value for spark.files.userClassPathFirst is displayed correctly, but the classes are still loaded from the wrong jar

spark-submit and logging

2014-11-20 Thread Tobias Pfeiffer
Hi, I am using spark-submit to submit my application jar to a YARN cluster. I want to deliver a single jar file to my users, so I would like to avoid telling them "also, please put that log4j.xml file somewhere and add that path to the spark-submit command". I thought it would be sufficient

Re: spark-submit and logging

2014-11-20 Thread Sean Owen
, Tobias Pfeiffer t...@preferred.jp wrote: Hi, I am using spark-submit to submit my application jar to a YARN cluster. I want to deliver a single jar file to my users, so I would like to avoid to tell them also, please put that log4j.xml file somewhere and add that path to the spark-submit

Re: spark-submit and logging

2014-11-20 Thread Matt Narrell
-on-yarn.html On Thu, Nov 20, 2014 at 9:20 AM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, I am using spark-submit to submit my application jar to a YARN cluster. I want to deliver a single jar file to my users, so I would like to avoid to tell them also, please put that log4j.xml file

Re: spark-submit and logging

2014-11-20 Thread Marcelo Vanzin
Check the --files argument in the output of spark-submit -h. On Thu, Nov 20, 2014 at 7:51 AM, Matt Narrell matt.narr...@gmail.com wrote: How do I configure the files to be uploaded to YARN containers? So far, I've only seen --conf "spark.yarn.jar=hdfs://…" which allows me to specify the HDFS

Re: spark-submit and logging

2014-11-20 Thread Marcelo Vanzin
at 12:20 AM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, I am using spark-submit to submit my application jar to a YARN cluster. I want to deliver a single jar file to my users, so I would like to avoid to tell them also, please put that log4j.xml file somewhere and add that path to the spark
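
A sketch of the --files route suggested above: ship the log4j configuration alongside the job so it lands in each YARN container's working directory, which is on the classpath. The file, class, and jar names are assumptions:

    spark-submit --master yarn-cluster \
      --files /local/path/log4j.xml \
      --class com.example.Main app.jar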

Re: spark-submit question

2014-11-17 Thread Samarth Mailinglist
4:59 AM, Samarth Mailinglist mailinglistsama...@gmail.com wrote: I am trying to run a job written in python with the following command: bin/spark-submit --master spark://localhost:7077 /path/spark_solution_basic.py --py-files /path/*.py --files /path/config.properties I always get

Re: spark-submit question

2014-11-16 Thread Sean Owen
You are changing these paths and filenames to match your own actual scripts and file locations right? On Nov 17, 2014 4:59 AM, Samarth Mailinglist mailinglistsama...@gmail.com wrote: I am trying to run a job written in python with the following command: bin/spark-submit --master spark

RE: Spark-submit and Windows / Linux mixed network

2014-11-12 Thread Ashic Mahtab
) for the time being. Date: Tue, 11 Nov 2014 20:15:17 +0530 Subject: Re: Spark-submit and Windows / Linux mixed network From: riteshoneinamill...@gmail.com To: as...@live.com CC: user@spark.apache.org Never tried this form but just guessing, What's the output when you submit this jar: \\shares

Spark-submit and Windows / Linux mixed network

2014-11-11 Thread Ashic Mahtab
Hi, I'm trying to submit a spark application from a network share to the spark master. Network shares are configured so that the master and all nodes have access to the target jar at (say): \\shares\publish\Spark\app1\someJar.jar And this is mounted on each Linux box (i.e. master and workers) at:

Re: Spark-submit and Windows / Linux mixed network

2014-11-11 Thread Ritesh Kumar Singh
Never tried this form but just guessing, What's the output when you submit this jar: \\shares\publish\Spark\app1\someJar.jar using spark-submit.cmd

Re: spark-submit inside script... need some bash help

2014-11-09 Thread Akhil Das
Not sure why that is failing, but I found a workaround like:

    #!/bin/bash -e
    SPARK_SUBMIT=/home/akhld/mobi/localcluster/spark-1/bin/spark-submit
    export _JAVA_OPTIONS=-Xmx1g
    OPTS+=" --class org.apache.spark.examples.SparkPi"
    echo $SPARK_SUBMIT $OPTS lib/spark-examples-1.1.0-hadoop1.0.4.jar

Re: netty on classpath when using spark-submit

2014-11-09 Thread Tobias Pfeiffer
about this. Unfortunately, it doesn't change anything. With this setting both true and false (as indicated by the Spark web interface) and no matter whether local[N] or yarn-client or yarn-cluster mode are used with spark-submit, the classpath looks the same and the netty class is loaded from

Re: with SparkStreeaming spark-submit, don't see output after ssc.start()

2014-11-05 Thread spr

Re: netty on classpath when using spark-submit

2014-11-04 Thread M. Dale
that and was really happy, but it seems like spark-submit puts an older version of netty on the classpath when submitting to a cluster, such that my code ends up with a NoSuchMethodError: Code: val a = new DefaultHttpRequest(HttpVersion.HTTP_1_1, HttpMethod.POST, "http://localhost") val

Re: netty on classpath when using spark-submit

2014-11-04 Thread Tobias Pfeiffer
know about this. Unfortunately, it doesn't change anything. With this setting both true and false (as indicated by the Spark web interface) and no matter whether local[N] or yarn-client or yarn-cluster mode are used with spark-submit, the classpath looks the same and the netty class is loaded from

with SparkStreeaming spark-submit, don't see output after ssc.start()

2014-11-03 Thread spr
I have a Spark Streaming program that works fine if I execute it via sbt runMain com.cray.examples.spark.streaming.cyber.StatefulDhcpServerHisto -f /Users/spr/Documents/.../tmp/ -t 10 but if I start it via $S/bin/spark-submit --master local[12] --class StatefulNewDhcpServers target/scala-2.10

Re: with SparkStreeaming spark-submit, don't see output after ssc.start()

2014-11-03 Thread Steve Reinhardt
From: Tobias Pfeiffer t...@preferred.jp Am I right that you are actually executing two different classes here? Yes, I realized after I posted that I was calling 2 different classes, though they are in the same JAR. I went back and tried it again with the same class

netty on classpath when using spark-submit

2014-11-03 Thread Tobias Pfeiffer
Hi, I tried hard to get a version of netty into my jar file created with sbt assembly that works with all my libraries. Now I managed that and was really happy, but it seems like spark-submit puts an older version of netty on the classpath when submitting to a cluster, such that my code ends up

How do I kill a job submitted with spark-submit

2014-11-02 Thread Steve Lewis
I see the job in the web interface but don't know how to kill it
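
For a driver launched with --deploy-mode cluster against a standalone master there is a kill command; a sketch, with the master URL and driver ID (visible in the master web UI) as placeholders. A client-mode job can simply have its spark-submit process killed:

    bin/spark-class org.apache.spark.deploy.Client kill spark://master:7077 driver-20141102120000-0000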

spark-submit results in NoClassDefFoundError

2014-10-29 Thread Tobias Pfeiffer
running ~/spark-1.1.0-bin-hadoop2.4/bin/spark-submit \ --class my.spark.MyClass --master local[3] \ target/scala-2.10/myclass-assembly-1.0.jar I get: Spark assembly has been built with Hive, including Datanucleus jars on classpath Exception in thread "main" java.lang.NoClassDefFoundError: com

Re: spark-submit results in NoClassDefFoundError

2014-10-29 Thread Tobias Pfeiffer
Hi again, On Thu, Oct 30, 2014 at 11:50 AM, Tobias Pfeiffer t...@preferred.jp wrote: Spark assembly has been built with Hive, including Datanucleus jars on classpath Exception in thread "main" java.lang.NoClassDefFoundError: com/typesafe/scalalogging/slf4j/Logger It turned out scalalogging

Including jars in Spark-shell vs Spark-submit

2014-10-28 Thread Harold Nguyen
-1.2.0-SNAPSHOT.jar But not through spark-submit: ./bin/spark-submit --class org.apache.spark.examples.streaming.CassandraSave --master spark://ip-172-31-38-112:7077 streaming-test/target/scala-2.10/simple-streaming_2.10-1.0.jar --jars local:///home/ubuntu/spark-cassandra-connector/spark-cassandra

Re: Including jars in Spark-shell vs Spark-submit

2014-10-28 Thread Helena Edelson
-connector-assembly-1.2.0-SNAPSHOT.jar But not through spark-submit: ./bin/spark-submit --class org.apache.spark.examples.streaming.CassandraSave --master spark://ip-172-31-38-112:7077 streaming-test/target/scala-2.10/simple-streaming_2.10-1.0.jar --jars local:///home/ubuntu/spark

Re: Including jars in Spark-shell vs Spark-submit

2014-10-28 Thread Helena Edelson
-assembly-1.2.0-SNAPSHOT.jar But not through spark-submit: ./bin/spark-submit --class org.apache.spark.examples.streaming.CassandraSave --master spark://ip-172-31-38-112:7077 streaming-test/target/scala-2.10/simple-streaming_2.10-1.0.jar --jars local:///home/ubuntu/spark-cassandra

Re: spark-submit memory too larger

2014-10-25 Thread marylucy
Version: spark 1.1.0, 42 workers, 40g memory per worker. Running the graphx componentgraph job takes five hours. On Oct 25, 2014, at 1:27, Sameer Farooqui same...@databricks.com wrote: That does seem a bit odd. How many Executors are running under this Driver? Does the spark-submit process start out using

Re: Spark Bug? job fails to run when given options on spark-submit (but starts and fails without)

2014-10-24 Thread Akhil Das
You can open the application UI (that runs on port 4040) and see how much memory is being allocated to the executors from the Executors and Environment tabs. Thanks Best Regards On Wed, Oct 22, 2014 at 9:55 PM, Holden Karau hol...@pigscanfly.ca wrote: Hi Michael Campbell, Are you deploying against yarn

spark-submit memory too larger

2014-10-24 Thread marylucy
I used standalone Spark and set spark.driver.memory=5g, but the spark-submit process uses 57g of memory. Is this normal? How can I decrease it?

Re: spark-submit memory too larger

2014-10-24 Thread Sameer Farooqui
That does seem a bit odd. How many Executors are running under this Driver? Does the spark-submit process start out using ~60GB of memory right away or does it start out smaller and slowly build up to that high? If so, how long does it take to get that high? Also, which version of Spark are you

Re: Spark Bug? job fails to run when given options on spark-submit (but starts and fails without)

2014-10-22 Thread Holden Karau
Hi Michael Campbell, Are you deploying against yarn or standalone mode? In yarn, try setting the shell variable SPARK_EXECUTOR_MEMORY=2G; in standalone, try setting SPARK_WORKER_MEMORY=2G. Cheers, Holden :) On Thu, Oct 16, 2014 at 2:22 PM, Michael Campbell michael.campb...@gmail.com wrote:

Spark-Submit Python along with JAR

2014-10-21 Thread TJ Klein
Hi, I'd like to run my Python script using spark-submit together with a JAR file containing Java specifications for a Hadoop file system. How can I do that? It seems I can either provide a JAR file or a Python file to spark-submit. So far I have been running my code in ipython with IPYTHON_OPTS

how to submit multiple jar files when using spark-submit script in shell?

2014-10-17 Thread eric wong
Hi, I'm using the comma-separated style to submit multiple jar files in the following shell command, but it does not work: bin/spark-submit --class org.apache.spark.examples.mllib.JavaKMeans --master yarn-cluster --execur-memory 2g --jars lib/spark-examples-1.0.2-hadoop2.2.0.jar,lib/spark-mllib_2.10-1.0.0

Re: how to submit multiple jar files when using spark-submit script in shell?

2014-10-17 Thread Andrew Or
/spark-submit --class org.apache.spark.examples.mllib.JavaKMeans --master yarn-cluster --execur-memory 2g --jars lib/spark-examples-1.0.2-hadoop2.2.0.jar,lib/spark-mllib_2.10-1.0.0.jar hdfs://master:8000/srcdata/kmeans 8 4 Thanks! -- WangHaihua

Re: how to submit multiple jar files when using spark-submit script in shell?

2014-10-17 Thread Marcelo Vanzin
shell but it does not work: bin/spark-submit --class org.apache.spark.examples.mllib.JavaKMeans --master yarn-cluster --execur-memory 2g --jars lib/spark-examples-1.0.2-hadoop2.2.0.jar,lib/spark-mllib_2.10-1.0.0.jar hdfs://master:8000/srcdata/kmeans 8 4 Thanks! -- WangHaihua
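
One plausible reading of the failure, not confirmed in the truncated replies: spark-submit treats the first bare argument as the primary application jar, so the jar containing the main class must be passed on its own rather than folded into --jars, and the memory flag is spelled --executor-memory. A sketch:

    bin/spark-submit --class org.apache.spark.examples.mllib.JavaKMeans \
      --master yarn-cluster --executor-memory 2g \
      --jars lib/spark-mllib_2.10-1.0.0.jar \
      lib/spark-examples-1.0.2-hadoop2.2.0.jar \
      hdfs://master:8000/srcdata/kmeans 8 4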

Re: How to add HBase dependencies and conf with spark-submit?

2014-10-16 Thread Fengyun RAO
-15 22:39 GMT+08:00 Soumitra Kumar kumar.soumi...@gmail.com: I am writing to HBase, following are my options: export SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar spark-submit \ --jars /opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,/opt/cloudera/parcels

Re: How to add HBase dependencies and conf with spark-submit?

2014-10-16 Thread Soumitra Kumar
...@hbase.apache.org Sent: Thursday, October 16, 2014 12:50:01 AM Subject: Re: How to add HBase dependencies and conf with spark-submit? Thanks, Soumitra Kumar. I didn't know why you put hbase-protocol.jar in SPARK_CLASSPATH while adding hbase-protocol.jar, hbase-common.jar, hbase-client.jar, htrace

Spark Bug? job fails to run when given options on spark-submit (but starts and fails without)

2014-10-16 Thread Michael Campbell
TL;DR - a spark SQL job fails with an OOM (Out of heap space) error. If given --executor-memory values, it won't even start. Even (!) if the values given ARE THE SAME AS THE DEFAULT. Without --executor-memory: 14/10/16 17:14:58 INFO TaskSetManager: Serialized task 1.0:64 as 14710 bytes in 1

Re: How to add HBase dependencies and conf with spark-submit?

2014-10-15 Thread Fengyun RAO
+user@hbase 2014-10-15 20:48 GMT+08:00 Fengyun RAO raofeng...@gmail.com: We use Spark 1.1, and HBase 0.98.1-cdh5.1.0, and need to read and write an HBase table in Spark program. I notice there are spark.driver.extraClassPath and spark.executor.extraClassPath properties to manage extra

Re: How to add HBase dependencies and conf with spark-submit?

2014-10-15 Thread Soumitra Kumar
I am writing to HBase, following are my options: export SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar spark-submit \ --jars /opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar,/opt/cloudera/parcels/CDH/lib
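
The pattern quoted in this thread, assembled into one runnable shape; only the jars actually named above are listed (the original list is truncated at htrace), and the application class and jar are placeholders:

    export SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar
    spark-submit \
      --jars /opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar \
      --class com.example.HBaseJob app.jar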

java.lang.OutOfMemoryError: Java heap space when running job via spark-submit

2014-10-09 Thread Jaonary Rabarisoa
("spark.executor.memory", "4g") that I can run manually with sbt run without any problem. But when I try to run the same job with spark-submit: ./spark-1.1.0-bin-hadoop2.4/bin/spark-submit \ --class value.jobs.MyJob \ --master local[4] \ --conf spark.executor.memory=4g

Re: java.lang.OutOfMemoryError: Java heap space when running job via spark-submit

2014-10-09 Thread Jaonary Rabarisoa
) .setMaster("local[4]") .set("spark.executor.memory", "4g") that I can run manually with sbt run without any problem. But when I try to run the same job with spark-submit: ./spark-1.1.0-bin-hadoop2.4/bin/spark-submit \ --class value.jobs.MyJob \ --master local[4
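
One plausible explanation, offered as a sketch rather than the thread's confirmed fix: in local[4] mode the executor runs inside the driver JVM, so spark.executor.memory set in code arrives too late to size the already-running heap; the memory has to be set from outside. The jar name is a placeholder:

    ./spark-1.1.0-bin-hadoop2.4/bin/spark-submit \
      --class value.jobs.MyJob \
      --master local[4] \
      --driver-memory 4g \
      myjob.jar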

Specifying Spark Executor Java options using Spark Submit

2014-09-24 Thread Arun Ahuja
What is the proper way to specify java options for the Spark executors using spark-submit? We had done this previously using export SPARK_JAVA_OPTS='..', for example to attach a debugger to each executor or add -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps On spark-submit I

Re: Specifying Spark Executor Java options using Spark Submit

2014-09-24 Thread Larry Xiao
-defaults.conf file used with the spark-submit script. Heap size settings can be set with spark.executor.memory. You can find it under Runtime Environment. Larry On 9/24/14 10:52 PM, Arun Ahuja wrote: What is the proper way to specify java options for the Spark executors using spark-submit? We had done
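
The --conf spelling of the SPARK_JAVA_OPTS example from the original post; the class and jar names are placeholders:

    spark-submit \
      --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
      --class com.example.Main app.jar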

Re: spark-submit command-line with --files

2014-09-20 Thread chinchu
will be added to the discussion below: http://apache-spark-user-list.1001560.n3.nabble.com/spark-submit-command-line-with-files-tp14645p14708.html

Re: spark-submit command-line with --files

2014-09-20 Thread chinchu
-laptop (has the /tmp/myobject.ser /opt/test/lib/spark-test.jar) launches spark-submit ---files .. hadoop-yarn-cluster[3 nodes] * and on my laptop:$HADOOP_CONF_DIR, I have the configuration that points to this 3-node yarn cluster. *What is the right way to get to this file (myobject.ser) in my

Re: spark-submit command-line with --files

2014-09-20 Thread Marcelo Vanzin
, correct? That's why it's probably not finding the file. * Here's what I am trying to do: my-laptop (has the /tmp/myobject.ser /opt/test/lib/spark-test.jar) launches spark-submit ---files .. hadoop-yarn-cluster[3 nodes] * and on my laptop:$HADOOP_CONF_DIR, I have the configuration

Re: spark-submit command-line with --files

2014-09-20 Thread chinchu
the file on hdfs? -C

Re: spark-submit command-line with --files

2014-09-19 Thread Andrew Or
("myobject.ser") to get the file. Am I doing something wrong? CMD: bin/spark-submit --name Test --class com.test.batch.modeltrainer.ModelTrainerMain \ --master local --files /tmp/myobject.ser --verbose /opt/test/lib/spark-test.jar com.test.batch.modeltrainer.ModelTrainerMain.scala 37: val serFile

Re: spark-submit command-line with --files

2014-09-19 Thread Andrew Or
Hey just a minor clarification, you _can_ use SparkFiles.get in your application only if it runs on the executors, e.g. in the following way: sc.parallelize(1 to 100).map { i => SparkFiles.get("my.file") }.collect() But not in general (otherwise NPE, as in your case). Perhaps this should be

spark-submit: fire-and-forget mode?

2014-09-18 Thread Tobias Pfeiffer
Hi, I am wondering: Is it possible to run spark-submit in a mode where it will start an application on a YARN cluster (i.e., driver and executors run on the cluster) and then forget about it in the sense that the Spark application is completely independent from the host that ran the spark-submit

Re: spark-submit: fire-and-forget mode?

2014-09-18 Thread Andrew Or
...@cloudera.com wrote: Yes, what Sandy said. On top of that, I would suggest filing a bug for a new command line argument for spark-submit to make the launcher process exit cleanly as soon as a cluster job starts successfully. That can be helpful for code that launches Spark jobs but monitors the job

Re: spark-submit: fire-and-forget mode?

2014-09-18 Thread Nicholas Chammas
idea Marcelo. There isn't AFAIK any reason the client needs to hang there for correct operation. On Thu, Sep 18, 2014 at 9:39 AM, Marcelo Vanzin van...@cloudera.com wrote: Yes, what Sandy said. On top of that, I would suggest filing a bug for a new command line argument for spark-submit

Re: spark-submit: fire-and-forget mode?

2014-09-18 Thread Tobias Pfeiffer
Hi, thanks for everyone's replies! On Thu, Sep 18, 2014 at 7:37 AM, Sandy Ryza sandy.r...@cloudera.com wrote: YARN cluster mode should have the behavior you're looking for. The client process will stick around to report on things, but should be able to be killed without affecting the
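
Until a dedicated flag exists, the replies above suggest a shell-level approximation in yarn-cluster mode, where the client process only polls for status; the names below are placeholders:

    nohup spark-submit --master yarn-cluster --class com.example.Main app.jar > submit.log 2>&1 &
    # Once YARN reports the application as RUNNING, this client process can be
    # killed without affecting the job itself.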

spark-submit command-line with --files

2014-09-18 Thread chinchu
yarn-cluster with the same result]. I am using SparkFiles.get("myobject.ser") to get the file. Am I doing something wrong? CMD: bin/spark-submit --name Test --class com.test.batch.modeltrainer.ModelTrainerMain \ --master local --files /tmp/myobject.ser --verbose /opt/test/lib/spark

Re: prepending jars to the driver class path for spark-submit on YARN

2014-09-09 Thread Penny Espinoza
. From: Xiangrui Meng men...@gmail.com Sent: Sunday, September 07, 2014 11:40 PM To: Victor Tso-Guillen Cc: Penny Espinoza; Spark Subject: Re: prepending jars to the driver class path for spark-submit on YARN There is an undocumented configuration to put users jars

Re: prepending jars to the driver class path for spark-submit on YARN

2014-09-08 Thread Xiangrui Meng
with org.apache.httpcomponents httpcore and httpclient when using spark-submit with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster. I’ve seen several posts about this issue, but no resolution. The error message is this: Caused by: java.lang.NoSuchMethodError

Spark-submit ClassNotFoundException with JAR!

2014-09-08 Thread Peter Aberline
exception on line 31, where the ClassToRoundTrip object is deserialized. Strangely, the earlier use on line 28 is okay: spark-submit --class SimpleApp \ --master local[4] \ target/scala-2.10/simpleapp_2.10-1.0.jar However, if I add extra parameters for driver-class-path

RE: prepending jars to the driver class path for spark-submit on YARN

2014-09-08 Thread Penny Espinoza
I don't understand what you mean. Can you be more specific? From: Victor Tso-Guillen v...@paxata.com Sent: Saturday, September 06, 2014 5:13 PM To: Penny Espinoza Cc: Spark Subject: Re: prepending jars to the driver class path for spark-submit on YARN I ran

Re: prepending jars to the driver class path for spark-submit on YARN

2014-09-08 Thread Xiangrui Meng
When you submit the job to yarn with spark-submit, set --conf spark.yarn.user.classpath.first=true. On Mon, Sep 8, 2014 at 10:46 AM, Penny Espinoza pesp...@societyconsulting.com wrote: I don't understand what you mean. Can you be more specific? From: Victor
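
The undocumented setting suggested above, as it would appear on the command line; the class and jar names are placeholders:

    spark-submit --master yarn-cluster \
      --conf spark.yarn.user.classpath.first=true \
      --class com.example.Main app.jar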
