Re: deployment options for Spark and YARN w/ many app jar library dependencies

2015-06-17 Thread Sandy Ryza
I’m looking to deploy spark on YARN and I have read through the docs (https://spark.apache.org/docs/latest/running-on-yarn.html). One question that I still have is whether there is an alternative means of including your own app jars, as opposed to the process in the “Adding Other Jars” section

deployment options for Spark and YARN w/ many app jar library dependencies

2015-06-17 Thread Sweeney, Matt
Hi folks, I’m looking to deploy spark on YARN and I have read through the docs (https://spark.apache.org/docs/latest/running-on-yarn.html). One question that I still have is whether there is an alternative means of including your own app jars, as opposed to the process in the “Adding Other Jars

Resource allocation configurations for Spark on Yarn

2015-06-12 Thread Jim Green
Hi Team, Sharing one article which summarizes the resource allocation configurations for Spark on Yarn: Resource allocation configurations for Spark on Yarn http://www.openkb.info/2015/06/resource-allocation-configurations-for.html -- Thanks, www.openkb.info (Open KnowledgeBase for Hadoop
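
For orientation, the knobs such an article typically covers are the standard spark-submit resource flags; a minimal sketch with placeholder values (the class name and jar are hypothetical, and the numbers are not tuning advice):

    spark-submit --master yarn-cluster \
      --num-executors 10 \
      --executor-cores 4 \
      --executor-memory 4g \
      --conf spark.yarn.executor.memoryOverhead=512 \
      --class com.example.MyApp myapp.jar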

Re: Is it possible to see Spark jobs on MapReduce job history ? (running Spark on YARN cluster)

2015-06-12 Thread Steve Loughran
...@gmail.com wrote: Hi all, I wonder if anyone has used MapReduce Job History to show Spark jobs. I can see my Spark jobs (Spark running on Yarn cluster) on the Resource Manager (RM). I start the Spark History server, and then through Spark's web-based user interface I can monitor

Is it possible to see Spark jobs on MapReduce job history ? (running Spark on YARN cluster)

2015-06-11 Thread Elkhan Dadashov
Hi all, I wonder if anyone has used MapReduce Job History to show Spark jobs. I can see my Spark jobs (Spark running on Yarn cluster) on the Resource Manager (RM). I start the Spark History server, and then through Spark's web-based user interface I can monitor the cluster (and track cluster

spark on yarn

2015-06-09 Thread Neera
.n3.nabble.com/spark-on-yarn-tp23230.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-04 Thread Ji ZHANG
Hi, I set spark.shuffle.io.preferDirectBufs to false in SparkConf and this setting can be seen in web ui's environment tab. But, it still eats memory, i.e. -Xmx set to 512M but RES grows to 1.5G in half a day. On Wed, Jun 3, 2015 at 12:02 PM, Shixiong Zhu zsxw...@gmail.com wrote: Could you
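
For reference, the flag mentioned here is set in code roughly like this (a sketch; the app name is a placeholder):

    val conf = new SparkConf()
      .setAppName("my-streaming-app") // hypothetical
      .set("spark.shuffle.io.preferDirectBufs", "false")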

Re: Spark 1.4 YARN Application Master fails with 500 connect refused

2015-06-02 Thread Night Wolf
Just testing with Spark 1.3, it looks like it sets the proxy correctly to be the YARN RM host (0101) 15/06/03 10:34:19 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT] 15/06/03 10:34:20 INFO yarn.ApplicationMaster: ApplicationAttemptId:

Re: Spark 1.4 YARN Application Master fails with 500 connect refused

2015-06-02 Thread Marcelo Vanzin
That code hasn't changed at all between 1.3 and 1.4; it also has been working fine for me. Are you sure you're using exactly the same Hadoop libraries (since you're building with -Phadoop-provided) and Hadoop configuration in both cases? On Tue, Jun 2, 2015 at 5:29 PM, Night Wolf

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-02 Thread Ji ZHANG
Hi, Thanks for your information. I'll give Spark 1.4 a try when it's released. On Wed, Jun 3, 2015 at 11:31 AM, Tathagata Das t...@databricks.com wrote: Could you try it out with Spark 1.4 RC3? Also pinging Cloudera folks, they may be aware of something. BTW, the way I have debugged memory

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-02 Thread Tathagata Das
Could you try it out with Spark 1.4 RC3? Also pinging Cloudera folks, they may be aware of something. BTW, the way I have debugged memory leaks in the past is as follows. Run with a small driver memory, say 1 GB. Periodically (maybe via a script), take snapshots of the histogram and also do memory
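
A minimal sketch of the periodic-snapshot loop described here, assuming the driver JVM pid is at hand; the interval and output path are placeholders:

    # dump a class histogram of live objects every 5 minutes
    while true; do
      jmap -histo:live "$DRIVER_PID" > "/tmp/histo-$(date +%s).txt"
      sleep 300
    done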

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-02 Thread Ji ZHANG
Hi, Thanks for your reply. Here's the top 30 entries of the jmap -histo:live result:

    num   #instances   #bytes      class name
    ------------------------------------------
     1:        40802   145083848   [B
     2:        99264    12716112   methodKlass
     3:        99264    12291480

Re: Spark 1.4 YARN Application Master fails with 500 connect refused

2015-06-02 Thread Night Wolf
Thanks Marcelo - looks like it was my fault. Seems when we deployed the new version of spark it was picking up the wrong yarn site and setting the wrong proxy host. All good now! On Wed, Jun 3, 2015 at 11:01 AM, Marcelo Vanzin van...@cloudera.com wrote: That code hasn't changed at all between

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-28 Thread Akhil Das
Hi Zhang, Could you paste your code in a gist? Not sure what you are doing inside the code to fill up memory. Thanks Best Regards On Thu, May 28, 2015 at 10:08 AM, Ji ZHANG zhangj...@gmail.com wrote: Hi, Yes, I'm using createStream, but the storageLevel param is by default

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-28 Thread Ji ZHANG
Hi, I wrote a simple test job; it only does very basic operations. For example: val lines = KafkaUtils.createStream(ssc, zkQuorum, group, Map(topic -> 1)).map(_._2) val logs = lines.flatMap { line => try { Some(parse(line).extract[Impression]) } catch { case _:
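
The truncated code above corresponds roughly to the self-contained sketch below; the Impression schema, the error handling, and the counting step (taken from Akhil's reply in the next entry) are assumptions:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.json4s._
    import org.json4s.jackson.JsonMethods.parse

    case class Impression(s_id: Long) // hypothetical schema, only the field used below

    object MemTestJob {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("mem-test"), Seconds(10))
        // receiver-based Kafka stream; .map(_._2) keeps only the message bodies
        val lines = KafkaUtils
          .createStream(ssc, "zk1:2181", "test-group", Map("some-topic" -> 1))
          .map(_._2)
        // parse each line into an Impression, dropping malformed records
        val logs = lines.flatMap { line =>
          implicit val formats: Formats = DefaultFormats
          try Some(parse(line).extract[Impression])
          catch { case _: Exception => None }
        }
        // the counting step suggested in Akhil's reply
        logs.filter(_.s_id > 0).foreachRDD(rdd => println(rdd.count()))
        ssc.start()
        ssc.awaitTermination()
      }
    }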

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-28 Thread Akhil Das
Can you replace your counting part with this? logs.filter(_.s_id > 0).foreachRDD(rdd => logger.info(rdd.count())) Thanks Best Regards On Thu, May 28, 2015 at 1:02 PM, Ji ZHANG zhangj...@gmail.com wrote: Hi, I wrote a simple test job; it only does very basic operations. For example:

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-28 Thread Ji ZHANG
Hi, Unfortunately, they're still growing, both driver and executors. I run the same job in local mode and everything is fine. On Thu, May 28, 2015 at 5:26 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Can you replace your counting part with this? logs.filter(_.s_id > 0).foreachRDD(rdd =>

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-27 Thread Akhil Das
After submitting the job, if you do a ps aux | grep spark-submit then you can see all the JVM params. Are you using the high-level consumer (receiver based) for receiving data from Kafka? In that case, if your throughput is high and the processing delay exceeds the batch interval then you will hit this

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-27 Thread Ji ZHANG
Hi, Yes, I'm using createStream, but the storageLevel param is by default MEMORY_AND_DISK_SER_2. Besides, the driver's memory is also growing. I don't think Kafka messages will be cached in the driver. On Thu, May 28, 2015 at 12:24 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Are you using the

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-27 Thread Ji ZHANG
Hi Akhil, Thanks for your reply. According to the Streaming tab of the Web UI, the Processing Time is around 400ms, and there's no Scheduling Delay, so I suppose it's not the Kafka messages that eat up the off-heap memory. Or maybe it is, but how to tell? I googled how to check the off-heap

running spark on yarn

2015-05-21 Thread Nathan Kronenfeld
to do that under yarn? I've managed to get it mostly there by including all spark, yarn, hadoop, and hdfs config files in my SparkConf (somewhat indirectly, and that is a bit of a short-hand), but while the job shows up now under yarn, and has its own applications web ui page, it's not showing up

Re: Spark on Yarn : Map outputs lifetime ?

2015-05-18 Thread Imran Rashid
the stage completes, because it might get used again by another later (e.g., if the stage is retried). On Tue, May 12, 2015 at 6:50 PM, Ashwin Shankar ashwinshanka...@gmail.com wrote: Hi, In spark on yarn, when running spark_shuffle as an auxiliary service on the node manager, do map spills of a stage

Spark on Yarn : Map outputs lifetime ?

2015-05-12 Thread Ashwin Shankar
Hi, In spark on yarn, when running spark_shuffle as an auxiliary service on the node manager, do map spills of a stage get cleaned up once the next stage completes, OR are they preserved till the app completes (i.e. till all the stages complete)? -- Thanks, Ashwin

Re: How to install spark in spark on yarn mode

2015-04-30 Thread xiaohe lan
link: http://mbonaci.github.io/mbo-spark/ You don't need to install spark on every node. Just install it on one node, or you can install it on a remote system and make a spark cluster. Thanks Madhvi On Thursday 30 April 2015 09:31 AM, xiaohe lan wrote: Hi experts, I see spark on yarn has

Re: How to install spark in spark on yarn mode

2015-04-30 Thread madhvi
spark on every node. Just install it on one node, or you can install it on a remote system and make a spark cluster. Thanks Madhvi On Thursday 30 April 2015 09:31 AM, xiaohe lan wrote: Hi experts, I see spark on yarn has yarn-client and yarn-cluster mode. I

Re: How to install spark in spark on yarn mode

2015-04-30 Thread Shixiong Zhu
and make a spark cluster. Thanks Madhvi On Thursday 30 April 2015 09:31 AM, xiaohe lan wrote: Hi experts, I see spark on yarn has yarn-client and yarn-cluster mode. I also have a 5-node hadoop cluster (hadoop 2.4). How do I install spark if I want to try the spark on yarn mode? Do I need

Re: How to install spark in spark on yarn mode

2015-04-29 Thread madhvi
, xiaohe lan wrote: Hi experts, I see spark on yarn has yarn-client and yarn-cluster mode. I also have a 5-node hadoop cluster (hadoop 2.4). How do I install spark if I want to try the spark on yarn mode? Do I need to install spark on each node of the hadoop cluster? Thanks, Xiaohe

How to install spark in spark on yarn mode

2015-04-29 Thread xiaohe lan
Hi experts, I see spark on yarn has yarn-client and yarn-cluster mode. I also have a 5-node hadoop cluster (hadoop 2.4). How do I install spark if I want to try the spark on yarn mode? Do I need to install spark on each node of the hadoop cluster? Thanks, Xiaohe

Re: How to debug Spark on Yarn?

2015-04-28 Thread Steve Loughran
On 27 Apr 2015, at 07:51, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: Spark 1.3 1. View stderr/stdout from executor from Web UI: when the job is running I figured out the executor that I am supposed to see, and those two links show 4 special characters on the browser. 2.

Re: How to debug Spark on Yarn?

2015-04-27 Thread ๏̯͡๏
Spark 1.3 1. View stderr/stdout from executor from Web UI: when the job is running I figured out the executor that I am supposed to see, and those two links show 4 special characters on the browser. 2. Tail on Yarn logs: /apache/hadoop/bin/yarn logs -applicationId application_1429087638744_151059 |

Re: How to debug Spark on Yarn?

2015-04-27 Thread ๏̯͡๏
1) Application container logs from Web RM UI never load on browser. I eventually have to kill the browser. 2) /apache/hadoop/bin/yarn logs -applicationId application_1429087638744_151059 | less emits logs only after the application has completed. Are there no better ways to see the logs as they

Re: How to debug Spark on Yarn?

2015-04-27 Thread Zoltán Zvara
You can check container logs from RM web UI or when log-aggregation is enabled with the yarn command. There are other, but less convenient options. On Mon, Apr 27, 2015 at 8:53 AM ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: Spark 1.3 1. View stderr/stdout from executor from Web UI: when the job

Re: How to debug Spark on Yarn?

2015-04-24 Thread Marcelo Vanzin
On top of what's been said... On Wed, Apr 22, 2015 at 10:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: 1) I can go to Spark UI and see the status of the APP but cannot see the logs as the job progresses. How can i see logs of executors as they progress ? Spark 1.3 should have links to the

Re: How to debug Spark on Yarn?

2015-04-24 Thread Sven Krasser
, Ted Yu yuzhih...@gmail.com wrote: For step 2, you can pipe application log to a file instead of copy-pasting. Cheers On Apr 22, 2015, at 10:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: I submit a spark app to YARN and i get these messages 15/04/22 22:45:04 INFO yarn.Client

Re: How to debug Spark on Yarn?

2015-04-24 Thread Sven Krasser
On Fri, Apr 24, 2015 at 11:31 AM, Marcelo Vanzin van...@cloudera.com wrote: Spark 1.3 should have links to the executor logs in the UI while the application is running. Not yet in the history server, though. You're absolutely correct -- didn't notice it until now. This is a great addition!

Re: How to debug Spark on Yarn?

2015-04-23 Thread Ted Yu
For step 2, you can pipe application log to a file instead of copy-pasting. Cheers On Apr 22, 2015, at 10:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: I submit a spark app to YARN and i get these messages 15/04/22 22:45:04 INFO yarn.Client: Application report

How to debug Spark on Yarn?

2015-04-22 Thread ๏̯͡๏
I submit a spark app to YARN and i get these messages 15/04/22 22:45:04 INFO yarn.Client: Application report for application_1429087638744_101363 (state: RUNNING) 15/04/22 22:45:04 INFO yarn.Client: Application report for application_1429087638744_101363 (state: RUNNING). ... 1) I can go

RE: shuffle.FetchFailedException in spark on YARN job

2015-04-20 Thread Shao, Saisai
...@sigmoidanalytics.com] Sent: Monday, April 20, 2015 2:56 PM To: roy Cc: user@spark.apache.org Subject: Re: shuffle.FetchFailedException in spark on YARN job Which version of Spark are you using? Did you try using spark.shuffle.blockTransferService=nio Thanks Best Regards On Sat, Apr 18, 2015 at 11:14 PM

Re: shuffle.FetchFailedException in spark on YARN job

2015-04-20 Thread Akhil Das
spark.akka.frameSize=1 --executor-cores 3 any idea why its failing on shuffle fetch ? thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/shuffle-FetchFailedException-in-spark-on-YARN-job-tp22557.html Sent from the Apache Spark User List

shuffle.FetchFailedException in spark on YARN job

2015-04-18 Thread roy
spark.yarn.driver.memoryOverhead=1000 --conf spark.akka.frameSize=1 --executor-cores 3 any idea why its failing on shuffle fetch ? thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/shuffle-FetchFailedException-in-spark-on-YARN-job-tp22557.html Sent from the Apache

Re: Spark-on-YARN architecture

2015-03-10 Thread Harika Matha
In YARN cluster mode, there is no Spark master, since YARN is your resource manager. Yes you could force your AM somehow to run on the same node as the RM, but why -- what do you think is faster about that? On Tue, Mar 10, 2015 at 10:06 AM, Harika matha.har...@gmail.com wrote: Hi all, I have

Re: Spark-on-YARN architecture

2015-03-10 Thread Sean Owen
in YARN client mode. And I want to run the AM on the same node as the RM in order to use the node which would otherwise run the AM. How can I get the AM to run on the same node as the RM? On Tue, Mar 10, 2015 at 3:49 PM, Sean Owen so...@cloudera.com wrote: In YARN cluster mode, there is no Spark master, since YARN

Spark-on-YARN architecture

2015-03-10 Thread Harika
of the cluster. 1. Is this the correct way of configuration? What is the architecture of Spark on YARN? 2. Is there a way in which I can run Spark master, YARN application master and resource manager on a single node?(so that I can use three other nodes for the computation) Thanks Harika -- View

Re: Spark-on-YARN architecture

2015-03-10 Thread Sean Owen
In YARN cluster mode, there is no Spark master, since YARN is your resource manager. Yes you could force your AM somehow to run on the same node as the RM, but why -- what do you think is faster about that? On Tue, Mar 10, 2015 at 10:06 AM, Harika matha.har...@gmail.com wrote: Hi all, I have Spark

Re: Setting up Spark with YARN on EC2 cluster

2015-03-10 Thread roni
Hi Harika, Did you get any solution for this? I want to use yarn , but the spark-ec2 script does not support it. Thanks -Roni -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Setting-up-Spark-with-YARN-on-EC2-cluster-tp21818p21991.html Sent from the Apache

Re: Setting up Spark with YARN on EC2 cluster

2015-03-10 Thread Deborah Siegel
]$ ./sbin/stop-all.sh ephemeral-hdfs]$ ./sbin/start-all.sh HTH Deb On Wed, Feb 25, 2015 at 11:46 PM, Harika matha.har...@gmail.com wrote: Hi, I want to setup a Spark cluster with YARN dependency on Amazon EC2. I was reading this https://spark.apache.org/docs/1.2.0/running-on-yarn.html

Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-27 Thread Christophe Préaud
Yes, spark.yarn.historyServer.address is used to access the spark history server from yarn, it is not needed if you use only the yarn history server. It may be possible to have both history servers running, but I have not tried that yet. Besides, as far as I have understood, yarn and spark

Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-26 Thread Christophe Préaud
You can see this information in the yarn web UI using the configuration I provided in my former mail (click on the application id, then on logs; you will then be automatically redirected to the yarn history server UI). On 24/02/2015 19:49, Colin Kincaid Williams wrote: So back to my original

Setting up Spark with YARN on EC2 cluster

2015-02-25 Thread Harika
Hi, I want to setup a Spark cluster with YARN dependency on Amazon EC2. I was reading this https://spark.apache.org/docs/1.2.0/running-on-yarn.html document and I understand that Hadoop has to be setup for running Spark with YARN. My questions - 1. Do we have to setup Hadoop cluster on EC2

Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-24 Thread Christophe Préaud
Hi Colin, Here is how I have configured my hadoop cluster to have yarn logs available through both the yarn CLI and the yarn history server (with gzip compression and 10-day retention): 1. Add the following properties in the yarn-site.xml on each node manager and on the resource manager:
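
The property list itself is truncated in this snippet; a plausible reconstruction of the yarn-site.xml fragment being described, using standard Hadoop property names with illustrative values (the history-server host is hypothetical):

    <property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.log-aggregation.retain-seconds</name>
      <value>864000</value> <!-- 10 days -->
    </property>
    <property>
      <name>yarn.nodemanager.log-aggregation.compression-type</name>
      <value>gz</value>
    </property>
    <property>
      <name>yarn.log.server.url</name>
      <value>http://historyserver:19888/jobhistory/logs</value>
    </property>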

Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-24 Thread Colin Kincaid Williams
Looks like in my tired state, I didn't mention spark the whole time. However, it might be implied by the application log above. Spark log aggregation appears to be working, since I can run the yarn command above. I do have yarn logging setup for the yarn history server. I was trying to use the

Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-24 Thread Imran Rashid
the spark history server and the yarn history server are totally independent. Spark knows nothing about yarn logs, and vice versa, so unfortunately there isn't any way to get all the info in one place. On Tue, Feb 24, 2015 at 12:36 PM, Colin Kincaid Williams disc...@uw.edu wrote: Looks like in

Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-24 Thread Colin Kincaid Williams
So back to my original question. I can see the spark logs using the example above: yarn logs -applicationId application_1424740955620_0009 This shows yarn log aggregation working. I can see the std out and std error in that container information above. Then how can I get this information in a

How to get yarn logs to display in the spark or yarn history-server?

2015-02-23 Thread Colin Kincaid Williams
Hi, I have been trying to get my yarn logs to display in the spark history-server or yarn history-server. I can see the log information yarn logs -applicationId application_1424740955620_0009 15/02/23 22:15:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to

Open file limit settings for Spark on Yarn job

2015-02-10 Thread Arun Luthra
Hi, I'm running Spark on Yarn from an edge node, and the tasks run on the Data Nodes. My job fails with the "Too many open files" error once it gets to groupByKey(). Alternatively, I can make it fail immediately if I repartition the data when I create the RDD. Where do I need to make sure

Re: Open file limit settings for Spark on Yarn job

2015-02-10 Thread Sandy Ryza
Spark on Yarn from an edge node, and the tasks run on the Data Nodes. My job fails with the "Too many open files" error once it gets to groupByKey(). Alternatively, I can make it fail immediately if I repartition the data when I create the RDD. Where do I need to make sure that ulimit -n is high

Re: Spark on Yarn: java.lang.IllegalArgumentException: Invalid rule

2015-02-03 Thread maven
The version I'm using was already pre-built for Hadoop 2.3. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Yarn-java-lang-IllegalArgumentException-Invalid-rule-tp21382p21485.html Sent from the Apache Spark User List mailing list archive

spark on yarn succeeds but exit code 1 in logs

2015-01-31 Thread Koert Kuipers
i have a simple spark app that i run with spark-submit on yarn. it runs fine and shows up with finalStatus=SUCCEEDED in the resource manager logs. however in the nodemanager logs i see this: 2015-01-31 18:30:48,195 INFO

Re: spark on yarn succeeds but exit code 1 in logs

2015-01-31 Thread Ted Yu
with spark-submit on yarn. it runs fine and shows up with finalStatus=SUCCEEDED in the resource manager logs. however in the nodemanager logs i see this: 2015-01-31 18:30:48,195 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage

Re: spark on yarn succeeds but exit code 1 in logs

2015-01-31 Thread Koert Kuipers
clue there ? You can pastebin part of the RM log around the time your job ran ? What hadoop version are you using ? Thanks On Sat, Jan 31, 2015 at 11:24 AM, Koert Kuipers ko...@tresata.com wrote: i have a simple spark app that i run with spark-submit on yarn. it runs fine and shows up

Re: Spark on YARN: java.lang.ClassCastException SerializedLambda to org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1

2015-01-30 Thread Milad khajavi
Don't know how to solve it. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-YARN-java-lang-ClassCastException-SerializedLambda-to-org-apache-spark-api-java-function-Fu1-tp21261p21315.html Sent from the Apache Spark User List mailing list archive

Re: Spark on Yarn: java.lang.IllegalArgumentException: Invalid rule

2015-01-28 Thread siddardha
Then your spark is not built for yarn. Try to build with sbt/sbt -Dhadoop.version=2.3.0 -Pyarn assembly -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Yarn-java-lang-IllegalArgumentException-Invalid-rule-tp21382p21404.html Sent from the Apache

Re: Spark on Yarn: java.lang.IllegalArgumentException: Invalid rule

2015-01-27 Thread Niranjan Reddy
and on yarn, without any issues. It looks like I may be missing a configuration step somewhere. Any thoughts on what may be causing this? NR -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Yarn-java-lang-IllegalArgumentException-Invalid-rule-tp21382

Re: Spark on Yarn: java.lang.IllegalArgumentException: Invalid rule

2015-01-27 Thread maven
Thanks, Siddardha. I did but got the same error. Kerberos is enabled on my cluster and I may be missing a configuration step somewhere. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Yarn-java-lang-IllegalArgumentException-Invalid-rule

Re: Spark on Yarn: java.lang.IllegalArgumentException: Invalid rule

2015-01-27 Thread Ted Yu
this? NR -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Yarn-java-lang-IllegalArgumentException-Invalid-rule-tp21382.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Spark on Yarn: java.lang.IllegalArgumentException: Invalid rule

2015-01-26 Thread maven
on what may be causing this? NR -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Yarn-java-lang-IllegalArgumentException-Invalid-rule-tp21382.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark on YARN: java.lang.ClassCastException SerializedLambda to org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1

2015-01-22 Thread thanhtien522
Update: I deployed a stand-alone spark in localhost then set Master as spark://localhost:7077 and it met the same issue Don't know how to solve it. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-YARN-java-lang-ClassCastException-SerializedLambda

Spark on YARN: java.lang.ClassCastException SerializedLambda to org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1

2015-01-20 Thread thanhtien522
Hi, I try to run Spark on a YARN cluster by setting the master as yarn-client in java code. It works fine with the count task but not with other commands. It threw a ClassCastException: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field

Re: Trying to execute Spark in Yarn

2015-01-08 Thread Shixiong Zhu
`--jars` accepts a comma-separated list of jars. See the usage about `--jars` --jars JARS Comma-separated list of local jars to include on the driver and executor classpaths. Best Regards, Shixiong Zhu 2015-01-08 19:23 GMT+08:00 Guillermo Ortiz konstt2...@gmail.com: I'm trying to execute
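
For instance (the class name and jar paths are placeholders):

    spark-submit --master yarn-cluster --class com.example.MyApp \
      --jars /path/to/dep1.jar,/path/to/dep2.jar \
      myapp.jar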

Trying to execute Spark in Yarn

2015-01-08 Thread Guillermo Ortiz
I'm trying to execute Spark from a Hadoop Cluster; I have created this script to try it:

    #!/bin/bash
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    SPARK_CLASSPATH=
    for lib in `ls /user/local/etc/lib/*.jar`
    do
      SPARK_CLASSPATH=$SPARK_CLASSPATH:$lib
    done

Re: Trying to execute Spark in Yarn

2015-01-08 Thread Guillermo Ortiz
thanks! 2015-01-08 12:59 GMT+01:00 Shixiong Zhu zsxw...@gmail.com: `--jars` accepts a comma-separated list of jars. See the usage about `--jars` --jars JARS Comma-separated list of local jars to include on the driver and executor classpaths. Best Regards, Shixiong Zhu 2015-01-08

Re: spark-network-yarn 2.11 depends on spark-network-shuffle 2.10

2015-01-08 Thread Aniket Bhatnagar
libraries are java-only (the scala version appended there is just for helping the build scripts). But it does look weird, so it would be nice to fix it. On Wed, Jan 7, 2015 at 12:25 AM, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: It seems that spark-network-yarn compiled for scala 2.11

spark-network-yarn 2.11 depends on spark-network-shuffle 2.10

2015-01-07 Thread Aniket Bhatnagar
It seems that spark-network-yarn compiled for scala 2.11 depends on spark-network-shuffle compiled for scala 2.10. This causes cross version dependencies conflicts in sbt. Seems like a publishing error? http://www.uploady.com/#!/download/6Yn95UZA0DR/3taAJFjCJjrsSXOR

Re: spark-network-yarn 2.11 depends on spark-network-shuffle 2.10

2015-01-07 Thread Marcelo Vanzin
wrote: It seems that spark-network-yarn compiled for scala 2.11 depends on spark-network-shuffle compiled for scala 2.10. This causes cross version dependencies conflicts in sbt. Seems like a publishing error? http://www.uploady.com/#!/download/6Yn95UZA0DR/3taAJFjCJjrsSXOR -- Marcelo

Re: Spark 1.2.0 Yarn not published

2014-12-29 Thread Aniket Bhatnagar
Thanks Ted. Adding dependency to spark-network-yarn would allow resolution to YarnShuffleService which from docs suggests that it runs on Node Manager and perhaps isn't useful while submitting jobs programmatically. I think what I need is a dependency to spark-yarn module so that classes like

Spark 1.2.0 Yarn not published

2014-12-28 Thread Aniket Bhatnagar
Hi all I just realized that spark-yarn artifact hasn't been published for 1.2.0 release. Any particular reason for that? I was using it in my yet another spark-job-server project to submit jobs to a YARN cluster through convenient REST APIs (with some success). The job server was creating

Re: Spark 1.2.0 Yarn not published

2014-12-28 Thread Ted Yu
See this thread: http://search-hadoop.com/m/JW1q5vd61V1/Spark-yarn+1.2.0subj=Re+spark+yarn_2+10+1+2+0+artifacts Cheers On Dec 28, 2014, at 11:13 PM, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: Hi all I just realized that spark-yarn artifact hasn't been published for 1.2.0 release

Re: Who manage the log4j appender while running spark on yarn?

2014-12-22 Thread WangTaoTheTonic
.1001560.n3.nabble.com/Who-manage-the-log4j-appender-while-running-spark-on-yarn-tp20778p20818.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Who manage the log4j appender while running spark on yarn?

2014-12-22 Thread Marcelo Vanzin
section of the Running on YARN docs. On Fri, Dec 19, 2014 at 12:37 AM, WangTaoTheTonic barneystin...@aliyun.com wrote: Hi guys, I recently ran spark on yarn and found spark didn't set any log4j properties file in configuration or code. And the log4j logs was writing into stderr file under
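
The docs section being referenced describes shipping your own properties file with the application; a sketch of that approach (the file name is a placeholder):

    spark-submit --master yarn-cluster \
      --files my-log4j.properties \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=my-log4j.properties" \
      ...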

Who manage the log4j appender while running spark on yarn?

2014-12-19 Thread WangTaoTheTonic
Hi guys, I recently ran spark on yarn and found spark didn't set any log4j properties file in configuration or code. And the log4j logs were written into the stderr file under ${yarn.nodemanager.log-dirs}/application_${appid}. I wanna know which side (spark or hadoop) controls the appender? Have

spark/yarn ignoring num-executors (python, Amazon EMR, spark-submit, yarn-client)

2014-12-19 Thread Tim Schweichler
Hello, I'm experiencing an issue where yarn is scheduling two executors (the default) regardless of what I enter as num-executors when submitting an application. Background: I'm running Spark with Yarn on Amazon EMR. My cluster has two core nodes and three task nodes. All five nodes

resource allocation spark on yarn

2014-12-12 Thread gpatcham
Hi All, I have spark on yarn and there are multiple spark jobs on the cluster. Sometimes some jobs are not getting enough resources even when there are enough free resources available on cluster, even when I use below settings --num-workers 75 \ --worker-cores 16 Jobs stick with the resources

Re: resource allocation spark on yarn

2014-12-12 Thread Sameer Farooqui
, Dec 12, 2014 at 10:52 AM, gpatcham gpatc...@gmail.com wrote: Hi All, I have spark on yarn and there are multiple spark jobs on the cluster. Sometimes some jobs are not getting enough resources even when there are enough free resources available on cluster, even when I use below settings --num

Re: resource allocation spark on yarn

2014-12-12 Thread Giri P
12, 2014 at 10:52 AM, gpatcham gpatc...@gmail.com wrote: Hi All, I have spark on yarn and there are multiple spark jobs on the cluster. Sometimes some jobs are not getting enough resources even when there are enough free resources available on cluster, even when I use below settings --num

Re: resource allocation spark on yarn

2014-12-12 Thread Tsuyoshi OZAWA
spark on yarn and there are multiple spark jobs on the cluster. Sometimes some jobs are not getting enough resources even when there are enough free resources available on cluster, even when I use below settings --num-workers 75 \ --worker-cores 16 Jobs stick with the resources what they get

Re: Spark on YARN memory utilization

2014-12-09 Thread Denny Lee
Thanks Sandy! On Mon, Dec 8, 2014 at 23:15 Sandy Ryza sandy.r...@cloudera.com wrote: Another thing to be aware of is that YARN will round up containers to the nearest increment of yarn.scheduler.minimum-allocation-mb, which defaults to 1024. -Sandy On Sat, Dec 6, 2014 at 3:48 PM, Denny Lee

Re: vcores used in cluster metrics(yarn resource manager ui) when running spark on yarn

2014-12-08 Thread Sandy Ryza
Hi yuemeng, Are you possibly running the Capacity Scheduler with the default resource calculator? -Sandy On Sat, Dec 6, 2014 at 7:29 PM, yuemeng1 yueme...@huawei.com wrote: Hi all, when I run an app with this cmd: ./bin/spark-sql --master yarn-client --num-executors 2 --executor

Re: Spark on YARN memory utilization

2014-12-08 Thread Sandy Ryza
Another thing to be aware of is that YARN will round up containers to the nearest increment of yarn.scheduler.minimum-allocation-mb, which defaults to 1024. -Sandy On Sat, Dec 6, 2014 at 3:48 PM, Denny Lee denny.g@gmail.com wrote: Got it - thanks! On Sat, Dec 6, 2014 at 14:56 Arun Ahuja

Spark on YARN memory utilization

2014-12-06 Thread Denny Lee
This is perhaps more of a YARN question than a Spark question but i was just curious to how is memory allocated in YARN via the various configurations. For example, if I spin up my cluster with 4GB with a different number of executors as noted below 4GB executor-memory x 10 executors = 46GB

Re: Spark on YARN memory utilization

2014-12-06 Thread Arun Ahuja
Hi Denny, This is due to the spark.yarn.memoryOverhead parameter; depending on what version of Spark you are on, the default may differ, but it should be the larger of 1024mb per executor or .07 * executorMemory. When you set executor memory, the yarn resource request is executorMemory +
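
Working that formula through the 4GB example in the original question, together with Sandy's rounding note above:

    overhead  = max(1024 MB, 0.07 x 4096 MB) = 1024 MB
    container = 4096 MB + 1024 MB = 5120 MB per executor
    10 executors = 51200 MB, i.e. roughly 50 GB, plus one more container for the AM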

Re: Spark on YARN memory utilization

2014-12-06 Thread Denny Lee
Got it - thanks! On Sat, Dec 6, 2014 at 14:56 Arun Ahuja aahuj...@gmail.com wrote: Hi Denny, This is due the spark.yarn.memoryOverhead parameter, depending on what version of Spark you are on the default of this may differ, but it should be the larger of 1024mb per executor or .07 *

vcores used in cluster metrics(yarn resource manager ui) when running spark on yarn

2014-12-06 Thread yuemeng1
Hi all, when I run an app with this cmd: ./bin/spark-sql --master yarn-client --num-executors 2 --executor-cores 3, I noticed that the yarn resource manager ui shows the `vcores used` in cluster metrics as 3. It seems `vcores used` shows the wrong num (should it be 7?). Or am I missing something

Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread LinQili
Hi, all: According to https://github.com/apache/spark/pull/2732, when a spark job fails or exits nonzero in yarn-cluster mode, spark-submit will get the corresponding return code of the spark job. But I tried in a spark-1.1.1 yarn cluster, and spark-submit returns zero anyway. Here is my spark

RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread LinQili
I tried in spark client mode; spark-submit can get the correct return code from the spark job. But in yarn-cluster mode, it failed. From: lin_q...@outlook.com To: u...@spark.incubator.apache.org Subject: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode Date: Fri, 5

RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread LinQili
-submit cannot get the second return code 100. What's the difference between these two `exit`? I was so confused. From: lin_q...@outlook.com To: u...@spark.incubator.apache.org Subject: RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode Date: Fri, 5 Dec 2014 17

Re: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread Shixiong Zhu
. -- From: lin_q...@outlook.com To: u...@spark.incubator.apache.org Subject: RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode Date: Fri, 5 Dec 2014 17:11:39 +0800 I tried in spark client mode, spark-submit can get the correct return code

Re: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread Shixiong Zhu
these two `exit`? I was so confused. -- From: lin_q...@outlook.com To: u...@spark.incubator.apache.org Subject: RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode Date: Fri, 5 Dec 2014 17:11:39 +0800 I tried in spark client mode

Spark on YARN - master role

2014-11-25 Thread Praveen Sripati
Hi, In Spark on YARN, the AM (driver) will ask the RM for resources. Once the resources are allocated by the RM, the AM will start the executors through the NM. This is my understanding. But, according to the Spark documentation (1), the `spark.yarn.applicationMaster.waitTries` property

Re: Spark on YARN

2014-11-19 Thread Alan Prando
Owen so...@cloudera.com: My guess is you're asking for all cores of all machines but the driver needs at least one core, so one executor is unable to find a machine to fit on. On Nov 18, 2014 7:04 PM, Alan Prando a...@scanboo.com.br wrote: Hi Folks! I'm running Spark on YARN cluster installed

Re: Spark on YARN

2014-11-19 Thread Sean Owen
for all cores of all machines but the driver needs at least one core, so one executor is unable to find a machine to fit on. On Nov 18, 2014 7:04 PM, Alan Prando a...@scanboo.com.br wrote: Hi Folks! I'm running Spark on YARN cluster installed with Cloudera Manager Express. The cluster has 1
