Re: Spark-on-Yarn ClassNotFound Exception

2022-12-18 Thread Hariharan
Hi scrypso, Sorry for the late reply. Yes, I did mean spark.driver.extraClassPath. I was able to work around this issue by removing the need for an extra class, but I'll investigate along these lines nonetheless. Thanks again for all your help! On Thu, Dec 15, 2022 at 9:56 PM scrypso wrote: >

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-15 Thread scrypso
Hmm, did you mean spark.*driver*.extraClassPath? That is very odd then - if you check the logs directory for the driver (on the cluster) I think there should be a launch container log, where you can see the exact command used to start the JVM (at the very end), and a line starting "export

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
Hi scrypso, Thanks for the help so far, and I think you're definitely on to something here. I tried loading the class as you suggested with the code below: try { Thread.currentThread().getContextClassLoader().loadClass(MyS3ClientFactory.class.getCanonicalName()); logger.info("Loaded
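
A minimal Scala sketch of the probe described above; the class and package names are placeholders, since the real ones are masked in the thread:

    import org.slf4j.LoggerFactory

    object ClassLoaderProbe {
      private val logger = LoggerFactory.getLogger(getClass)

      // Try to resolve the factory through the thread context classloader and report
      // which classloader actually served it (or that it is not visible at all).
      def probe(className: String = "com.example.MyS3ClientFactory"): Unit =
        try {
          val cls = Thread.currentThread().getContextClassLoader.loadClass(className)
          logger.info(s"Loaded $className via ${cls.getClassLoader}")
        } catch {
          case e: ClassNotFoundException =>
            logger.warn(s"$className is not visible from the thread context classloader", e)
        }
    }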

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread scrypso
I'm on my phone, so can't compare with the Spark source, but that looks to me like it should be well after the ctx loader has been set. You could try printing the classpath of the loader Thread.currentThread().getContextClassLoader(), or try to load your class from that yourself to see if
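
A rough sketch of that suggestion, assuming the context classloader is (or extends) java.net.URLClassLoader, as Spark's MutableURLClassLoader does:

    import java.net.URLClassLoader

    val ctxLoader = Thread.currentThread().getContextClassLoader
    ctxLoader match {
      case u: URLClassLoader => u.getURLs.foreach(println)              // entries the loader can see
      case other             => println(s"Not a URLClassLoader: $other") // fall back to naming it
    }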

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
Thanks for the response, scrypso! I will try adding the extraClassPath option. Meanwhile, please find the full stack trace below (I have masked/removed references to proprietary code) java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread scrypso
Two ideas you could try: You can try spark.driver.extraClassPath as well. Spark loads the user's jar in a child classloader, so Spark/Yarn/Hadoop can only see your classes reflectively. Hadoop's Configuration should use the thread ctx classloader, and Spark should set that to the loader that
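
A hedged illustration of that point: Hadoop's Configuration resolves class names through its classloader, which normally defaults to the thread context classloader, so a class that lives only in the user jar must be reachable from there. The class name below is hypothetical:

    import org.apache.hadoop.conf.Configuration

    val hadoopConf = new Configuration()
    try {
      // getClassByName goes through Configuration's classloader, not the Spark/YARN system classpath.
      hadoopConf.getClassByName("com.example.MyS3ClientFactory")
      println("Configuration can resolve the factory class")
    } catch {
      case _: ClassNotFoundException => println("Configuration cannot see the factory class")
    }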

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
I missed mentioning it above, but just to add, the error is coming from the driver. I tried using *--driver-class-path /path/to/my/jar* as well, but no luck. Thanks! On Mon, Dec 12, 2022 at 4:21 PM Hariharan wrote: > Hello folks, > > I have a spark app with a custom implementation of >

Re: Spark 3.0 yarn does not support cdh5

2019-10-21 Thread melin li
Many clusters still use CDH5 and want continued support for it; CDH5 is based on Hadoop 2.6. melin li wrote on Mon, Oct 21, 2019 at 3:02 PM: > Many clusters still use CDH5 and we hope it continues to be supported; CDH5 is based on Hadoop 2.6 > > dev/make-distribution.sh --tgz -Pkubernetes -Pyarn -Phive-thriftserver > -Phive -Dhadoop.version=2.6.0-cdh5.15.0

Re: [spark on yarn] spark on yarn without DFS

2019-05-23 Thread Achilleus 003
This is interesting. Would really appreciate it if you could share what exactly you changed in *core-site.xml* and *yarn-site.xml*. On Wed, May 22, 2019 at 9:14 AM Gourav Sengupta wrote: > just wondering what is the advantage of doing this? > > Regards > Gourav Sengupta > > On Wed, May 22,

Re: [spark on yarn] spark on yarn without DFS

2019-05-22 Thread Gourav Sengupta
just wondering what is the advantage of doing this? Regards Gourav Sengupta On Wed, May 22, 2019 at 3:01 AM Huizhe Wang wrote: > Hi Hari, > Thanks :) I tried to do it as you said. It works ;) > > > Hariharan wrote on Mon, May 20, 2019 at 3:54 PM: > >> Hi Huizhe, >> >> You can set the "fs.defaultFS" field in

Re: [spark on yarn] spark on yarn without DFS

2019-05-21 Thread Huizhe Wang
Hi Hari, Thanks :) I tried to do it as you said. It works ;) Hariharan wrote on Mon, May 20, 2019 at 3:54 PM: > Hi Huizhe, > > You can set the "fs.defaultFS" field in core-site.xml to some path on s3. > That way your spark job will use S3 for all operations that need HDFS. > Intermediate data will still be

Re: [spark on yarn] spark on yarn without DFS

2019-05-20 Thread JB Data31
There is a kind of check in the *yarn-site.xml*: *yarn.nodemanager.remote-app-log-dir /var/yarn/logs*. Using *hdfs://:9000* as *fs.defaultFS* in *core-site.xml* you have to *hdfs dfs -mkdir /var/yarn/logs*. Using *S3://* as *fs.defaultFS*... Take care of *.dir* properties in

Re: [spark on yarn] spark on yarn without DFS

2019-05-20 Thread Hariharan
Hi Huizhe, You can set the "fs.defaultFS" field in core-site.xml to some path on s3. That way your spark job will use S3 for all operations that need HDFS. Intermediate data will still be stored on local disk, though. Thanks, Hari On Mon, May 20, 2019 at 10:14 AM Abdeali Kothari wrote: > While
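
A sketch of that setup as it would look from application code; the bucket name is a placeholder, and in practice the property would normally live in core-site.xml rather than be set programmatically:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("s3a-default-fs-sketch")
      .getOrCreate()

    // Route "default filesystem" operations to S3 through the s3a connector.
    spark.sparkContext.hadoopConfiguration.set("fs.defaultFS", "s3a://my-bucket")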

Re: [spark on yarn] spark on yarn without DFS

2019-05-19 Thread Abdeali Kothari
While spark can read from S3 directly in EMR, I believe it still needs HDFS to perform shuffles and to write intermediate data to disk when doing jobs (i.e. when the in-memory data needs to spill over to disk). For these operations, Spark does need a distributed file system - You could use

Re: [spark on yarn] spark on yarn without DFS

2019-05-19 Thread Jeff Zhang
I am afraid not, because yarn needs dfs. Huizhe Wang wrote on Mon, May 20, 2019 at 9:50 AM: > Hi, > > I want to use Spark on Yarn without HDFS. I store my resources in AWS and > use s3a to get them. However, when I used stop-dfs.sh to stop the NameNode and > DataNode, I got an error when using yarn cluster mode.

Re: Spark on yarn - application hangs

2019-05-10 Thread Mich Talebzadeh
sure NP. I meant these topics (inline screenshot omitted). Have a look at this article of mine https://www.linkedin.com/pulse/real-time-processing-trade-data-kafka-flume-spark-talebzadeh-ph-d-/ under the section Understanding the Spark Application Through Visualization. See if it helps. HTH Dr Mich

Re: Spark on yarn - application hangs

2019-05-10 Thread Mkal
How can I check what exactly is stagnant? Do you mean on the DAG visualization on the Spark UI? Sorry, I'm new to spark. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail:

Re: Spark on yarn - application hangs

2019-05-10 Thread Mich Talebzadeh
Hi, Have you checked the metrics from the Spark UI by any chance? What is stagnant? HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw *

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-12 Thread Vadim Semenov
Yeah, then the easiest would be to fork spark and run using the forked version, and in case of YARN it should be pretty easy to do. git clone https://github.com/apache/spark.git cd spark export MAVEN_OPTS="-Xmx4g -XX:ReservedCodeCacheSize=512m" ./build/mvn -DskipTests clean package

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-12 Thread Serega Sheypak
I tried a similar approach, it works well for user functions, but I need to crash tasks or executors when the spark application runs "repartition". I didn't find any way to inject a "poison pill" into the repartition call :( On Mon, Feb 11, 2019 at 21:19, Vadim Semenov wrote: > something like this > > import

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Vadim Semenov
something like this import org.apache.spark.TaskContext ds.map(r => { val taskContext = TaskContext.get() if (taskContext.partitionId == 1000) { throw new RuntimeException } r }) On Mon, Feb 11, 2019 at 8:41 AM Serega Sheypak wrote: > > I need to crash task which does repartition. >
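
The same idea filled out as a self-contained sketch; the dataset and partition count are arbitrary and only ensure that the targeted partition actually exists:

    import org.apache.spark.TaskContext
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("poison-pill-sketch").getOrCreate()
    import spark.implicits._

    val poisoned = spark.range(0, 1000000).as[Long]
      .repartition(2000)                       // make sure partition 1000 exists
      .map { r =>
        if (TaskContext.get().partitionId() == 1000) {
          throw new RuntimeException("artificial failure for fault-tolerance testing")
        }
        r
      }
    poisoned.count()                           // fails once a task for partition 1000 runs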

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Serega Sheypak
I need to crash a task which does repartition. On Mon, Feb 11, 2019 at 10:37, Gabor Somogyi wrote: > What prevents you from putting if conditions inside the mentioned map function? > > On Mon, Feb 11, 2019 at 10:31 AM Serega Sheypak > wrote: > >> Yeah, but I don't need to crash the entire app, I want to fail

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Gabor Somogyi
What prevents you from putting if conditions inside the mentioned map function? On Mon, Feb 11, 2019 at 10:31 AM Serega Sheypak wrote: > Yeah, but I don't need to crash the entire app, I want to fail several tasks > or executors and then wait for completion. > > On Sun, Feb 10, 2019 at 21:49, Gabor Somogyi

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Serega Sheypak
Yeah, but I don't need to crash the entire app, I want to fail several tasks or executors and then wait for completion. On Sun, Feb 10, 2019 at 21:49, Gabor Somogyi wrote: > Another approach is adding an artificial exception into the application's > source code like this: > > val query = input.toDS.map(_ /

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Gabor Somogyi
Another approach is adding an artificial exception into the application's source code like this: val query = input.toDS.map(_ / 0).writeStream.format("console").start() G On Sun, Feb 10, 2019 at 9:36 PM Serega Sheypak wrote: > Hi BR, > thanks for your reply. I want to mimic the issue and kill

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Serega Sheypak
Hi BR, thanks for your reply. I want to mimic the issue and kill tasks at a certain stage. Killing an executor is also an option for me. I'm curious how core spark contributors test spark fault tolerance? On Sun, Feb 10, 2019 at 16:57, Gabor Somogyi wrote: > Hi Serega, > > If I understand your

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Gabor Somogyi
Hi Serega, If I understand your problem correctly you would like to kill one executor only and the rest of the app has to be untouched. If that's true yarn -kill is not what you want because it stops the whole application. I've done a similar thing when testing Spark's HA features. - jps

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Jörn Franke
yarn application -kill applicationid ? > On 10.02.2019 at 13:30, Serega Sheypak wrote: > > Hi there! > I have a weird issue that appears only when tasks fail at a specific stage. I > would like to imitate the failure on my own. > The plan is to run the problematic app and then kill the entire executor or

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-23 Thread Serega Sheypak
Hi Imran, here is my use case: there is a 1K-node cluster and jobs have performance degradation because of a single node. It's rather hard to convince Cluster Ops to decommission a node because of "performance degradation". Imagine 10 dev teams chasing a single ops team for a valid reason (the node has problems)

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-23 Thread Imran Rashid
Serega, can you explain a bit more why you want this ability? If the node is really bad, wouldn't you want to decommission the NM entirely? If you've got heterogeneous resources, then node labels seem like they would be more appropriate -- and I don't feel great about adding workarounds for the

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Jörn Franke
You can try with Yarn node labels: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeLabel.html Then you can whitelist nodes. > On 19.01.2019 at 00:20, Serega Sheypak wrote: > > Hi, is there any possibility to tell the Scheduler to blacklist specific nodes in > advance?
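
If the node-label route is taken, Spark itself can be pinned to labelled nodes; a hedged sketch, assuming the cluster admins have already created and assigned a YARN node label called good_nodes:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.yarn.am.nodeLabelExpression", "good_nodes")       // restrict where the AM may run
      .set("spark.yarn.executor.nodeLabelExpression", "good_nodes") // restrict where executors may run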

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Attila Zsolt Piros
The new issue is https://issues.apache.org/jira/browse/SPARK-26688. On Tue, Jan 22, 2019 at 11:30 AM Attila Zsolt Piros wrote: > Hi, > > >> Is it this one: https://github.com/apache/spark/pull/23223 ? > > No. My old development was https://github.com/apache/spark/pull/21068, > which is closed.

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Attila Zsolt Piros
Hi, >> Is it this one: https://github.com/apache/spark/pull/23223 ? No. My old development was https://github.com/apache/spark/pull/21068, which is closed. This would be a new improvement with a new Apache JIRA issue ( https://issues.apache.org) and with a new Github pull request. >> Can I try

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-21 Thread Serega Sheypak
Hi Apiros, thanks for your reply. Is it this one: https://github.com/apache/spark/pull/23223 ? Can I try to reach you through the Cloudera Support portal? On Mon, Jan 21, 2019 at 20:06, attilapiros wrote: > Hello, I was working on this area last year (I have developed the > YarnAllocatorBlacklistTracker)

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-21 Thread attilapiros
Hello, I was working on this area last year (I have developed the YarnAllocatorBlacklistTracker) and if you haven't found any solution for your problem I can introduce a new config which would contain a sequence of always blacklisted nodes. This way blacklisting would improve a bit again :) --

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-20 Thread Serega Sheypak
o:* Felix Cheung > *Cc:* Serega Sheypak; user > *Subject:* Re: Spark on Yarn, is it possible to manually blacklist nodes > before running spark job? > > on yarn it is impossible afaik. on kubernetes you can use taints to keep > certain nodes outside of spark > > On Fri, Jan

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-19 Thread Felix Cheung
From: Li Gao Sent: Saturday, January 19, 2019 8:43 AM To: Felix Cheung Cc: Serega Sheypak; user Subject: Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job? on yarn it is impossible afaik. on kubernetes you can use taints

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-19 Thread Li Gao
on yarn it is impossible afaik. on kubernetes you can use taints to keep certain nodes outside of spark On Fri, Jan 18, 2019 at 9:35 PM Felix Cheung wrote: > Not as far as I recall... > > > -- > *From:* Serega Sheypak > *Sent:* Friday, January 18, 2019 3:21 PM >

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-18 Thread Felix Cheung
Not as far as I recall... From: Serega Sheypak Sent: Friday, January 18, 2019 3:21 PM To: user Subject: Spark on Yarn, is it possible to manually blacklist nodes before running spark job? Hi, is there any possibility to tell Scheduler to blacklist specific

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-10 Thread Gourav Sengupta
Hi Dillon, yes we can understand the number of executors that are running, but the question is more around understanding the relation between YARN containers, their persistence and Spark executors. Regards, Gourav On Wed, Oct 10, 2018 at 6:38 AM Dillon Dukek wrote: > There is documentation here

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Dillon Dukek
There is documentation here http://spark.apache.org/docs/latest/running-on-yarn.html about running spark on YARN. Like I said before you can use either the logs from the application or the Spark UI to understand how many executors are running at any given time. I don't think I can help much

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Gourav Sengupta
Hi Dillon, I do think that there is a setting available wherein once YARN sets up the containers you do not deallocate them. I had used it previously in Hive, and it just saves processing time in terms of allocating containers. That said, I am still trying to understand how do we determine

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Dillon Dukek
I'm still not sure exactly what you mean by saying that you have 6 yarn containers. Yarn should just be aware of the total available resources in your cluster and then be able to launch containers based on the executor requirements you set when you submit your job. If you can, I think it
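
For what it's worth, in YARN mode each executor is requested as one YARN container and the application master takes one more, so the container count follows directly from the executor settings. A sketch with arbitrary example values:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.instances", "6") // 6 executors -> 6 containers, plus 1 for the AM
      .set("spark.executor.cores", "2")     // vcores requested per executor container
      .set("spark.executor.memory", "4g")   // heap per executor; YARN adds memory overhead on top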

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Gourav Sengupta
hi, maybe I am not quite clear in my head on this one. But how do we know that 1 yarn container = 1 executor? Regards, Gourav Sengupta On Tue, Oct 9, 2018 at 8:53 PM Dillon Dukek wrote: > Can you send how you are launching your streaming process? Also what > environment is this cluster

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Dillon Dukek
Can you send how you are launching your streaming process? Also what environment is this cluster running in (EMR, GCP, self managed, etc)? On Tue, Oct 9, 2018 at 10:21 AM kant kodali wrote: > Hi All, > > I am using Spark 2.3.1 and using YARN as a cluster manager. > > I currently got > > 1) 6

Re: Spark on YARN in client-mode: do we need 1 vCore for the AM?

2018-05-24 Thread Jeff Zhang
I don't think it is possible to have less than 1 core for the AM; this is due to yarn, not spark. The AM's resource usage compared to that of the executors should be small and acceptable. If you do want to save more resources, I would suggest you use yarn cluster mode, where the driver and AM run in the
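
For reference, in yarn-client mode the AM's footprint is controlled separately from the driver's; a hedged sketch with arbitrary values:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.yarn.am.cores", "1")     // client mode only; YARN will not allocate less than 1 vcore
      .set("spark.yarn.am.memory", "512m") // client mode only; driver memory is configured separately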

Re: spark on yarn can't load kafka dependency jar

2016-12-15 Thread Mich Talebzadeh
Try this, it should work, and yes they are comma separated: spark-streaming-kafka_2.10-1.5.1.jar Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw *

Re: spark on yarn can't load kafka dependency jar

2016-12-15 Thread neil90
Don't the jars need to be comma separated when you pass them? i.e. --jars "hdfs://zzz:8020/jars/kafka_2.10-0.8.2.2.jar", /opt/bigdevProject/sparkStreaming_jar4/sparkStreaming.jar -- View this message in context:

Re: Spark on yarn enviroment var

2016-10-01 Thread Vadim Semenov
The question should be addressed to the oozie community. As far as I remember, a spark action doesn't have support for env variables. On Fri, Sep 30, 2016 at 8:11 PM, Saurabh Malviya (samalviy) < samal...@cisco.com> wrote: > Hi, > > > > I am running spark on yarn using oozie. > > > > When submit

Re: Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-03 Thread Mungeol Heo
Try to turn yarn.scheduler.capacity.resource-calculator on, then check again. On Wed, Aug 3, 2016 at 4:53 PM, Saisai Shao wrote: > Use dominant resource calculator instead of default resource calculator will > get the expected vcores as you wanted. Basically by default

Re: Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-03 Thread Mungeol Heo
Try to turn "yarn.scheduler.capacity.resource-calculator" on On Wed, Aug 3, 2016 at 4:53 PM, Saisai Shao wrote: > Use dominant resource calculator instead of default resource calculator will > get the expected vcores as you wanted. Basically by default yarn does not >

Re: Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-03 Thread Saisai Shao
Using the dominant resource calculator instead of the default resource calculator will get the expected vcores you wanted. Basically, by default yarn does not honor cpu cores as a resource, so you will always see vcores as 1 no matter what number of cores you set in spark. On Wed, Aug 3, 2016 at 12:11 PM,

Re: spark on yarn

2016-05-26 Thread Steve Loughran
> On 21 May 2016, at 15:14, Shushant Arora wrote: > > And will it allocate the rest of the executors when other containers get freed which > were occupied by other hadoop jobs/spark applications? > requests will go into the queue(s), they'll stay outstanding until things free

Re: spark on yarn

2016-05-21 Thread Shushant Arora
3. And is the same behavior applied to streaming applications also? On Sat, May 21, 2016 at 7:44 PM, Shushant Arora wrote: > And will it allocate the rest of the executors when other containers get freed which > were occupied by other hadoop jobs/spark applications? > > And is

Re: spark on yarn

2016-05-21 Thread Shushant Arora
And will it allocate the rest of the executors when other containers get freed which were occupied by other hadoop jobs/spark applications? And is there any minimum (% of executors demanded vs available) it waits for to be freed, or does it just start with even 1? Thanks! On Thu, Apr 21, 2016 at 8:39 PM,

Re: spark on yarn

2016-04-21 Thread Steve Loughran
If there isn't enough space in your cluster for all the executors you asked for to be created, Spark will only get the ones which can be allocated. It will start work without waiting for the others to arrive. Make sure you ask for enough memory: YARN is a lot more unforgiving about memory use

Re: spark on yarn

2016-04-20 Thread Mail.com
I get an error with a message that states the max number of cores allowed. > On Apr 20, 2016, at 11:21 AM, Shushant Arora > wrote: > > I am running a spark application on a yarn cluster. > > say I have available vcores in the cluster as 100. And I start spark

Re: Spark with Yarn Client

2016-03-11 Thread Alexander Pivovarov
Check doc - http://spark.apache.org/docs/latest/running-on-yarn.html also you can start EMR-4.2.0 or 4.3.0 cluster with Spark app and see how it's configured On Fri, Mar 11, 2016 at 7:50 PM, Divya Gehlot wrote: > Hi, > I am trying to understand behaviour /configuration

Re: Spark on YARN memory consumption

2016-03-11 Thread Jan Štěrba
Thanks that explains a lot. -- Jan Sterba https://twitter.com/honzasterba | http://flickr.com/honzasterba | http://500px.com/honzasterba On Fri, Mar 11, 2016 at 2:36 PM, Silvio Fiorito wrote: > Hi Jan, > > > > Yes what you’re seeing is due to YARN container memory

RE: Spark on YARN memory consumption

2016-03-11 Thread Silvio Fiorito
Hi Jan, Yes what you’re seeing is due to YARN container memory overhead. Also, typically the memory increments for YARN containers is 1GB. This gives a good overview: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ Thanks, Silvio From: Jan
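
A hedged illustration of the two knobs involved; the values are examples, and in the 1.x line discussed here the overhead property carries the spark.yarn prefix (newer releases use spark.executor.memoryOverhead):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "4g")
      // Off-heap overhead (MB) that YARN adds to each container request; defaults to a
      // fraction of the executor memory, with a 384 MB floor.
      .set("spark.yarn.executor.memoryOverhead", "512")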

Re: Spark on Yarn with Dynamic Resource Allocation. Container always marked as failed

2016-03-02 Thread Xiaoye Sun
Hi Jeff and Prabhu, Thanks for your help. I looked deep in the nodemanager log and found that I have an error message like this: 2016-03-02 03:13:59,692 ERROR org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error opening leveldb file

Re: Spark on Yarn with Dynamic Resource Allocation. Container always marked as failed

2016-03-02 Thread Prabhu Joseph
Were all NodeManager services restarted after the change in yarn-site.xml? On Thu, Mar 3, 2016 at 6:00 AM, Jeff Zhang wrote: > The executor may fail to start. You need to check the executor logs, if > there's no executor log then you need to check node manager log. > > On Wed,

Re: Spark on Yarn with Dynamic Resource Allocation. Container always marked as failed

2016-03-02 Thread Jeff Zhang
The executor may fail to start. You need to check the executor logs, if there's no executor log then you need to check node manager log. On Wed, Mar 2, 2016 at 4:26 PM, Xiaoye Sun wrote: > Hi all, > > I am very new to spark and yarn. > > I am running a BroadcastTest

Re: Spark 1.5.2 Yarn Application Master - resiliencey

2016-02-03 Thread Nirav Patel
Awesome! it looks promising. Thanks Rishabh and Marcelo. On Wed, Feb 3, 2016 at 12:09 PM, Rishabh Wadhawan wrote: > Check out this link > http://spark.apache.org/docs/latest/configuration.html and check > spark.shuffle.service. Thanks > > On Feb 3, 2016, at 1:02 PM,

Re: Spark 1.5.2 Yarn Application Master - resiliencey

2016-02-03 Thread Marcelo Vanzin
Without the exact error from the driver that caused the job to restart, it's hard to tell. But a simple way to improve things is to install the Spark shuffle service on the YARN nodes, so that even if an executor crashes, its shuffle output is still available to other executors. On Wed, Feb 3,

Re: Spark 1.5.2 Yarn Application Master - resiliencey

2016-02-03 Thread Nirav Patel
Do you mean this setup? https://spark.apache.org/docs/1.5.2/job-scheduling.html#dynamic-resource-allocation On Wed, Feb 3, 2016 at 11:50 AM, Marcelo Vanzin wrote: > Without the exact error from the driver that caused the job to restart, > it's hard to tell. But a simple

Re: Spark 1.5.2 Yarn Application Master - resiliencey

2016-02-03 Thread Marcelo Vanzin
Yes, but you don't necessarily need to use dynamic allocation (just enable the external shuffle service). On Wed, Feb 3, 2016 at 11:53 AM, Nirav Patel wrote: > Do you mean this setup? > > https://spark.apache.org/docs/1.5.2/job-scheduling.html#dynamic-resource-allocation
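
A sketch of the application-side configuration; the NodeManager-side auxiliary service still has to be installed separately, as discussed in the shuffle-service threads further down this page:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.shuffle.service.enabled", "true")    // shuffle files stay available if an executor dies
      .set("spark.dynamicAllocation.enabled", "false") // optional; the shuffle service does not require it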

Re: Spark 1.5.2 Yarn Application Master - resiliencey

2016-02-03 Thread Rishabh Wadhawan
Hi Nirav There is a difference between dynamic resource allocation and the shuffle service. With dynamic allocation, when you enable the configurations for it, every time you run a task spark will determine the number of executors required to run that task for you, which means decreasing the

Re: Spark 1.5.2 Yarn Application Master - resiliencey

2016-02-03 Thread Rishabh Wadhawan
Check out this link http://spark.apache.org/docs/latest/configuration.html and check spark.shuffle.service. Thanks > On Feb 3, 2016, at 1:02 PM, Marcelo Vanzin wrote: > > Yes, but you don't necessarily need to use

RE: Spark App -Yarn-Cluster-Mode ===> Hadoop_conf_**.zip file.

2016-01-18 Thread Siddharth Ubale
To: Siddharth Ubale <siddharth.ub...@syncoms.com> Cc: user@spark.apache.org Subject: Re: Spark App -Yarn-Cluster-Mode ===> Hadoop_conf_**.zip file. Interesting. Which hbase / Phoenix releases are you using ? The following method has been removed from Put: public Put setWriteToWAL(bool

Re: Spark 1.6.0, yarn-shuffle

2016-01-18 Thread johd
Hi, No, I have not. :-/ Regards, J -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-6-0-yarn-shuffle-tp25961p26002.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark App -Yarn-Cluster-Mode ===> Hadoop_conf_**.zip file.

2016-01-15 Thread Ted Yu
bq. check application tracking page: http://slave1:8088/proxy/application_1452763526769_0011/ Then, ... Have you done the above to see what error was in each attempt? Which Spark / hadoop release are you using? Thanks On Fri, Jan

Re: Spark App -Yarn-Cluster-Mode ===> Hadoop_conf_**.zip file.

2016-01-15 Thread Ted Yu
at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:497) > > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:483) >

RE: Spark App -Yarn-Cluster-Mode ===> Hadoop_conf_**.zip file.

2016-01-15 Thread Siddharth Ubale
.ub...@syncoms.com> Cc: user@spark.apache.org Subject: Re: Spark App -Yarn-Cluster-Mode ===> Hadoop_conf_**.zip file. bq. check application tracking page: http://slave1:8088/proxy/application_1452763526769_0011/ Then, ...

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2016-01-06 Thread Deenar Toraskar
Hi guys 1. >> Add this jar to the classpath of all NodeManagers in your cluster. A related question on configuration of the auxiliary shuffle service. *How do I find the classpath for the NodeManager?* I tried finding all places where the existing mapreduce shuffle jars are present and place

Re: ​Spark 1.6 - YARN Cluster Mode

2015-12-21 Thread Akhil Das
Try adding these properties: spark.driver.extraJavaOptions -Dhdp.version=2.3.2.0-2950 spark.yarn.am.extraJavaOptions -Dhdp.version=2.3.2.0-2950 There was a similar discussion with Spark 1.3.0 over here http://stackoverflow.com/questions/29470542/spark-1-3-0-running-pi-example-on-yarn-fails

Re: Spark on YARN multitenancy

2015-12-15 Thread Ben Roling
I'm curious to see the feedback others will provide. My impression is the only way to get Spark to give up resources while it is idle would be to use the preemption feature of the scheduler you're using in YARN. When another user comes along the scheduler would preempt one or more Spark

Re: Spark on YARN multitenancy

2015-12-15 Thread Ashwin Sai Shankar
We run large multi-tenant clusters with spark/hadoop workloads, and we use 'yarn's preemption'/'spark's dynamic allocation' to achieve multitenancy. See following link on how to enable/configure preemption using fair scheduler :

Re: Spark on YARN multitenancy

2015-12-15 Thread Ben Roling
Oops - I meant while it is *busy* when I said while it is *idle*. On Tue, Dec 15, 2015 at 11:35 AM Ben Roling wrote: > I'm curious to see the feedback others will provide. My impression is the > only way to get Spark to give up resources while it is idle would be to use >

Re: Spark on YARN: java.lang.ClassCastException SerializedLambda to org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1

2015-12-06 Thread Mohamed Nadjib Mami
Your jars are not delivered to the workers. Have a look at this: http://stackoverflow.com/questions/24052899/how-to-make-it-easier-to-deploy-my-jar-to-spark-cluster-in-standalone-mode -- View this message in context:

Re: Spark on yarn vs spark standalone

2015-11-30 Thread Jacek Laskowski
Hi, My understanding of Spark on YARN and even Spark in general is very limited so keep that in mind. I'm not sure why you compare yarn-cluster and spark standalone? In yarn-cluster a driver runs on a node in the YARN cluster while spark standalone keeps the driver on the machine you launched a

Re: Spark on yarn vs spark standalone

2015-11-30 Thread Jacek Laskowski
Hi Mark, I said I've only managed to develop a limited understanding of how Spark works in the different deploy modes ;-) But somehow I thought that cluster in spark standalone is not supported. I think I've seen a JIRA with a change quite recently where it was said or something similar. Can't

Re: Spark on yarn vs spark standalone

2015-11-30 Thread Mark Hamstra
Standalone mode also supports running the driver on a cluster node. See "cluster" mode in http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications . Also, http://spark.apache.org/docs/latest/spark-standalone.html#high-availability On Mon, Nov 30, 2015 at 9:47 AM,

Re: Spark on yarn vs spark standalone

2015-11-26 Thread Jeff Zhang
If your cluster is a dedicated spark cluster (only running spark job, no other jobs like hive/pig/mr), then spark standalone would be fine. Otherwise I think yarn would be a better option. On Fri, Nov 27, 2015 at 3:36 PM, cs user wrote: > Hi All, > > Apologies if this

Re: Spark on YARN using Java 1.8 fails

2015-11-11 Thread mvle
Unfortunately, no. I switched back to OpenJDK 1.7. Didn't get a chance to dig deeper. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-YARN-using-Java-1-8-fails-tp24925p25360.html Sent from the Apache Spark User List mailing list archive at

Re: Spark on YARN using Java 1.8 fails

2015-11-11 Thread Abel Rincón
Hi, There was another related question https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201506.mbox/%3CCAJ2peNeruM2Y2Tbf8-Wiras-weE586LM_o25FsN=+z1-bfw...@mail.gmail.com%3E Some months ago, if I remember well, using spark 1.3 + YARN + Java 8 we had the same problem.

RE: Spark on Yarn

2015-10-21 Thread Jean-Baptiste Onofré
Hi, The compiled version (master side) and the client version diverge on spark network JavaUtils. You should use the same/aligned version. Regards JB Sent from my Samsung device Original message From: Raghuveer Chanda Date: 21/10/2015 12:33

Re: Spark on Yarn

2015-10-21 Thread Raghuveer Chanda
Hi, So does this mean I can't run a spark 1.4 fat jar on yarn without installing spark 1.4? I am including spark 1.4 in my pom.xml, so doesn't this mean it's compiling in 1.4? On Wed, Oct 21, 2015 at 4:38 PM, Jean-Baptiste Onofré wrote: > Hi > > The compiled version (master

Re: Spark on Yarn

2015-10-21 Thread Adrian Tanase
, 2015 at 2:14 PM To: Jean-Baptiste Onofré Cc: "user@spark.apache.org" Subject: Re: Spark on Yarn Hi, So does this mean I can't run spark 1.4 fat jar on yarn without installing spark 1.4. I am including spark 1.4 in my pom.xml so doesn't this me

Re: Spark on Yarn

2015-10-21 Thread Raghuveer Chanda
th maven) and marking it as provided in sbt. > > -adrian > > From: Raghuveer Chanda > Date: Wednesday, October 21, 2015 at 2:14 PM > To: Jean-Baptiste Onofré > Cc: "user@spark.apache.org" > Subject: Re: Spark on Yarn > > Hi, > > So does this mean I can't run

Re: Spark on YARN using Java 1.8 fails

2015-10-12 Thread Abhisheks
Did you get any resolution for this? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-YARN-using-Java-1-8-fails-tp24925p25039.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-06 Thread Steve Loughran
On 6 Oct 2015, at 01:23, Andrew Or wrote: Both the history server and the shuffle service are backward compatible, but not forward compatible. This means as long as you have the latest version of history server / shuffle service running in

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-06 Thread Alex Rovner
Thank you all for your help. *Alex Rovner* *Director, Data Engineering * *o:* 646.759.0052 * * On Tue, Oct 6, 2015 at 11:17 AM, Steve Loughran wrote: > > On 6 Oct 2015, at 01:23, Andrew Or wrote: > > Both the history

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-06 Thread Andreas Fritzler
Hi Andrew, thanks a lot for the clarification! Regards, Andreas On Tue, Oct 6, 2015 at 2:23 AM, Andrew Or wrote: > Hi all, > > Both the history server and the shuffle service are backward compatible, > but not forward compatible. This means as long as you have the

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Andrew Or
Hi all, Both the history server and the shuffle service are backward compatible, but not forward compatible. This means as long as you have the latest version of history server / shuffle service running in your cluster then you're fine (you don't need multiple of them). That said, an old shuffle

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Alex Rovner
We are running CDH 5.4 with Spark 1.3 as our main version and that version is configured to use the external shuffling service. We have also installed Spark 1.5 and have configured it not to use the external shuffling service and that works well for us so far. I would be interested myself how to

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Steve Loughran
> On 5 Oct 2015, at 15:59, Alex Rovner wrote: > > I have the same question about the history server. We are trying to run > multiple versions of Spark and are wondering if the history server is > backwards compatible. yes, it supports the pre-1.4 "Single attempt"

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Andreas Fritzler
Hi Steve, Alex, how do you handle the distribution and configuration of the spark-*-yarn-shuffle.jar on your NodeManagers if you want to use 2 different Spark versions? Regards, Andreas On Mon, Oct 5, 2015 at 4:54 PM, Steve Loughran wrote: > > > On 5 Oct 2015, at

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Alex Rovner
I have the same question about the history server. We are trying to run multiple versions of Spark and are wondering if the history server is backwards compatible. *Alex Rovner* *Director, Data Engineering * *o:* 646.759.0052 * * On Mon, Oct 5, 2015 at 9:22 AM, Andreas

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Alex Rovner
Hey Steve, Are you referring to the 1.5 version of the history server? *Alex Rovner* *Director, Data Engineering * *o:* 646.759.0052 * * On Mon, Oct 5, 2015 at 10:18 AM, Steve Loughran wrote: > > > On 5 Oct 2015, at 15:59, Alex Rovner
