Re: how can i use spark with yarn cluster in java

2023-09-06 Thread Mich Talebzadeh
Sounds like a network issue, for example connecting to a remote server? Try ping 172.21.242.26, telnet 172.21.242.26 596590, or nc -vz 172.21.242.26 596590. Example: nc -vz rhes76 1521 Ncat: Version 7.50 ( https://nmap.org/ncat ) Ncat: Connected to 50.140.197.230:1521. Ncat: 0 bytes sent, 0 bytes

how can i use spark with yarn cluster in java

2023-09-06 Thread BCMS
I want to use a YARN cluster with my current code. If I use conf.set("spark.master","local[*]") in place of conf.set("spark.master","yarn"), everything works well, but when I try to use yarn in setMaster, my code gives the error below. ``` package com.example.pocsparkspring; import
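A minimal sketch (in Scala, though the same conf keys apply from Java) of what typically has to be in place before setMaster("yarn") can work when the app is launched directly rather than through spark-submit; HADOOP_CONF_DIR/YARN_CONF_DIR pointing at the cluster's client configs is an assumption here, not something shown in the original post:

```scala
import org.apache.spark.sql.SparkSession

// Assumes HADOOP_CONF_DIR (or YARN_CONF_DIR) is exported in the launching JVM's
// environment so that "yarn" can be resolved to a ResourceManager.
val spark = SparkSession.builder()
  .appName("poc-spark-on-yarn")
  .master("yarn")                               // was local[*] in the working run
  .config("spark.submit.deployMode", "client")  // cluster mode is normally driven by spark-submit
  .getOrCreate()

println(spark.sparkContext.master)
spark.stop()
```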

[Spark RPC]: Yarn - Application Master / executors to Driver communication issue

2023-07-14 Thread Sunayan Saikia
Hey Spark Community, Our Jupyterhub/Jupyterlab (with spark client) runs behind two layers of HAProxy and the Yarn cluster runs remotely. We want to use deploy mode 'client' so that we can capture the output of any spark sql query in jupyterlab. I'm aware of other technologies like Livy and Spark

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-18 Thread Hariharan
>>>> at scala.collection.immutable.List.map(List.scala:293) >>>> at >>>> org.apache.spark.sql.execution.datasources.DataSource$.checkAndGlobPathIfNecessary(DataSource.scala:750) >>>> at >>>> org.apache.spark.sql.execution

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-15 Thread scrypso
>>> at >>> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:408) >>> at >>> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) >>> at >>> org.apache.spark.sql.

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
eader.$anonfun$load$2(DataFrameReader.scala:210) >> at scala.Option.getOrElse(Option.scala:189) >> at >> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) >> >> Thanks again! >> >> On Tue, Dec 13, 2022 at 9

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread scrypso
> On Tue, Dec 13, 2022 at 9:52 PM scrypso wrote: > >> Two ideas you could try: >> >> You can try spark.driver.extraClassPath as well. Spark loads the user's >> jar in a child classloader, so Spark/Yarn/Hadoop can only see your classes >> reflectively. Hadoop

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) Thanks again! On Tue, Dec 13, 2022 at 9:52 PM scrypso wrote: > Two ideas you could try: > > You can try spark.driver.extraClassPath as well. Spark loads the user's > jar in a child classloader, so Spark/Yarn/Hadoop can o

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread scrypso
Two ideas you could try: You can try spark.driver.extraClassPath as well. Spark loads the user's jar in a child classloader, so Spark/Yarn/Hadoop can only see your classes reflectively. Hadoop's Configuration should use the thread ctx classloader, and Spark should set that to the loader
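A small sketch of the two class-path knobs mentioned here, under the assumption that the jar containing the custom factory sits at a node-local path; the path itself is a placeholder:

```scala
import org.apache.spark.SparkConf

// Make the jar visible to the driver's and executors' system classloader,
// not only to the child classloader Spark uses for the user jar.
val conf = new SparkConf()
  .set("spark.driver.extraClassPath", "/path/to/my/jar")
  .set("spark.executor.extraClassPath", "/path/to/my/jar")
// spark-submit equivalents: --driver-class-path and --conf spark.executor.extraClassPath=...
```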

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
I missed mentioning it above, but just to add, the error is coming from the driver. I tried using *--driver-class-path /path/to/my/jar* as well, but no luck. Thanks! On Mon, Dec 12, 2022 at 4:21 PM Hariharan wrote: > Hello folks, > > I have a spark app with a custom implementation of >

Spark-on-Yarn ClassNotFound Exception

2022-12-12 Thread Hariharan
Hello folks, I have a spark app with a custom implementation of *fs.s3a.s3.client.factory.impl* which is packaged into the same jar. Output of *jar tf* *2620 Mon Dec 12 11:23:00 IST 2022 aws/utils/MyS3ClientFactory.class* However, when I run my spark app with spark-submit in cluster mode, it

Re: Spark 3.0 yarn does not support cdh5

2019-10-21 Thread melin li
.0-cdh5.15.0 -DskipTest > > ``` > [INFO] Compiling 25 Scala sources to > /Users/libinsong/Documents/codes/tongdun/spark-3.0/resource-managers/yarn/target/scala-2.12/classes > ... > [ERROR] [Error] > /Users/libinsong/Documents/codes/tongdun/spark-3.0/resource-managers/yarn/src/main/s

Spark 3.0 yarn does not support cdh5

2019-10-21 Thread melin li
/classes ... [ERROR] [Error] /Users/libinsong/Documents/codes/tongdun/spark-3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:298: value setRolledLogsIncludePattern is not a member of org.apache.hadoop.yarn.api.records.LogAggregationContext [ERROR] [Error] /Users

Spark on YARN with private Docker repositories/registries

2019-08-16 Thread Tak-Lon (Stephen) Wu
Hi guys, Has anyone been using spark (spark-submit) in yarn mode to pull images from a private Docker repository/registry? How do you pass in the docker config.json which includes the auth tokens? Or is there any environment variable that can be added in the system environment to make

Spark on Yarn - Dynamically getting a list of archives from --archives in spark-submit

2019-06-13 Thread Tommy Li
Hi Is there any way to get a list of the archives submitted with a spark job from the spark context? I see that spark context has a `.files()` function which returns the files included with `--files`, but I don't see an equivalent for `--archives`. Thanks, Tommy
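One hedged workaround for the question above, assuming the job runs on YARN: the Spark versions discussed here expose no archives() accessor on SparkContext, but the value handed to --archives is visible through the YARN-specific config key, so it can be read back and split:

```scala
import org.apache.spark.SparkContext

// Returns the archives passed via --archives (spark.yarn.dist.archives),
// or an empty list when none were submitted.
def submittedArchives(sc: SparkContext): Seq[String] =
  sc.getConf.getOption("spark.yarn.dist.archives")
    .map(_.split(",").map(_.trim).toSeq)
    .getOrElse(Seq.empty)
```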

Re: [spark on yarn] spark on yarn without DFS

2019-05-23 Thread Achilleus 003
> doing jobs (I.e. when the in memory need stop spill over to disk) >>>> >>>> For these operations, Spark does need a distributed file system - You >>>> could use something like EMRFS (which is like a HDFS backed by S3) on >>>> Amazon. >>>> >&

Re: [spark on yarn] spark on yarn without DFS

2019-05-22 Thread Gourav Sengupta
ing else too - so a stacktrace or error message >>> could help in understanding the problem. >>> >>> >>> >>> On Mon, May 20, 2019, 07:20 Huizhe Wang wrote: >>> >>>> Hi, >>>> >>>> I wanna to use Spark on Yarn without HDFS.I store my resource in AWS >>>> and using s3a to get them. However, when I use stop-dfs.sh stoped Namenode >>>> and DataNode. I got an error when using yarn cluster mode. Could I using >>>> yarn without start DFS, how could I use this mode? >>>> >>>> Yours, >>>> Jane >>>> >>>

Re: [spark on yarn] spark on yarn without DFS

2019-05-21 Thread Huizhe Wang
issue could be something else too - so a stacktrace or error message >> could help in understanding the problem. >> >> >> >> On Mon, May 20, 2019, 07:20 Huizhe Wang wrote: >> >>> Hi, >>> >>> I wanna to use Spark on Yarn without HDFS

Re: [spark on yarn] spark on yarn without DFS

2019-05-20 Thread JB Data31
e something like EMRFS (which is like a HDFS backed by S3) on >> Amazon. >> >> The issue could be something else too - so a stacktrace or error message >> could help in understanding the problem. >> >> >> >> On Mon, May 20, 2019, 07:20 Huizhe Wang wrote

Re: [spark on yarn] spark on yarn without DFS

2019-05-20 Thread Hariharan
system - You > could use something like EMRFS (which is like a HDFS backed by S3) on > Amazon. > > The issue could be something else too - so a stacktrace or error message > could help in understanding the problem. > > > > On Mon, May 20, 2019, 07:20 Huizhe Wang wrote:

Re: [spark on yarn] spark on yarn without DFS

2019-05-19 Thread Abdeali Kothari
something like EMRFS (which is like a HDFS backed by S3) on Amazon. The issue could be something else too - so a stacktrace or error message could help in understanding the problem. On Mon, May 20, 2019, 07:20 Huizhe Wang wrote: > Hi, > > I wanna to use Spark on Yarn without HDFS.I

Re: [spark on yarn] spark on yarn without DFS

2019-05-19 Thread Jeff Zhang
I am afraid not, because yarn needs dfs. Huizhe Wang wrote on Mon, May 20, 2019 at 9:50 AM: > Hi, > > I wanna to use Spark on Yarn without HDFS.I store my resource in AWS and > using s3a to get them. However, when I use stop-dfs.sh stoped Namenode and > DataNode. I got an error when using ya

[spark on yarn] spark on yarn without DFS

2019-05-19 Thread Huizhe Wang
Hi, I want to use Spark on YARN without HDFS. I store my resources in AWS and use s3a to get them. However, when I used stop-dfs.sh to stop the NameNode and DataNode, I got an error when using yarn cluster mode. Can I use yarn without starting DFS, and how could I use this mode? Yours, Jane
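A hedged sketch of one way this is usually approached: YARN still needs some filesystem for staging the application's jars, so both the default filesystem and Spark's YARN staging directory can be pointed at S3A instead of HDFS. The bucket name below is a placeholder and S3A credentials are assumed to be configured separately:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("yarn-without-hdfs")
  .master("yarn")
  // use an S3A bucket instead of HDFS for the default FS and for YARN staging
  .config("spark.hadoop.fs.defaultFS", "s3a://my-bucket")
  .config("spark.yarn.stagingDir", "s3a://my-bucket/spark-staging")
  .getOrCreate()

println(spark.read.textFile("s3a://my-bucket/input/sample.txt").count())
spark.stop()
```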

Re: How to configure alluxio cluster with spark in yarn

2019-05-16 Thread Bin Fan
hi Andy Assuming you are running Spark with YARN, then I would recommend deploying Alluxio in the same YARN cluster if you are looking for the best performance. Alluxio can also be deployed separately as a standalone service, but in that case, you may need to transfer data from the Alluxio cluster to your
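Once Alluxio is co-located with the YARN NodeManagers, reading through it from Spark is just a matter of using the alluxio:// scheme; a minimal sketch, assuming the Alluxio client jar is on the executor classpath and using Alluxio's default master RPC port 19998 (host and path below are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("alluxio-read").getOrCreate()

// Data cached in Alluxio is read with the alluxio:// URI scheme.
val df = spark.read.parquet("alluxio://alluxio-master:19998/datasets/events")
println(df.count())
```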

Re: Spark on yarn - application hangs

2019-05-10 Thread Mich Talebzadeh
sure NP. I meant these topics. Have a look at this article of mine https://www.linkedin.com/pulse/real-time-processing-trade-data-kafka-flume-spark-talebzadeh-ph-d-/ under section Understanding the Spark Application Through Visualization See if it helps HTH Dr Mich

Re: Spark on yarn - application hangs

2019-05-10 Thread Mkal
How can I check what exactly is stagnant? Do you mean on the DAG visualization on the Spark UI? Sorry, I'm new to spark. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: Spark on yarn - application hangs

2019-05-10 Thread Mich Talebzadeh
Hi, Have you checked metrics from the Spark UI by any chance? What is stagnant? HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw *

Spark on yarn - application hangs

2019-05-10 Thread Mkal
I've built a spark job in which an external program is called through the use of pipe(). The job runs correctly on the cluster when the input is a small sample dataset, but when the input is a really large dataset it stays in the RUNNING state forever. I've tried different ways to tune executor memory, executor

How to configure alluxio cluster with spark in yarn

2019-05-09 Thread u9g
Hey, I want to speed up the Spark tasks running in the Yarn cluster through Alluxio. Is Alluxio recommended to run in the same yarn cluster in yarn mode? Should I deploy Alluxio independently on the nodes of the yarn cluster? Or deploy a separate cluster? Best, Andy Li

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-12 Thread Vadim Semenov
Yeah, then the easiest would be to fork spark and run using the forked version, and in case of YARN it should be pretty easy to do. git clone https://github.com/apache/spark.git cd spark export MAVEN_OPTS="-Xmx4g -XX:ReservedCodeCacheSize=512m" ./build/mvn -DskipTests clean package

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-12 Thread Serega Sheypak
I tried a similar approach, it works well for user functions. But I need to crash tasks or executors when the spark application runs "repartition". I didn't find any way to inject a "poison pill" into the repartition call :( Mon, Feb 11, 2019 at 21:19, Vadim Semenov: > something like this > > import
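A hedged workaround sketch for this specific case: the data produced by repartition() is consumed by the following stage, so a mapPartitions placed right after the repartition (keyed on the partition id, as in the reply below) crashes a task of exactly the stage that reads repartition's shuffle output. The dataset, partition count and partition id here are placeholders:

```scala
import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("poison-pill-repartition").getOrCreate()
import spark.implicits._

val ds = spark.range(1000000L).as[Long].repartition(200)

// Throwing inside the stage that reads the repartition shuffle output
// simulates a task failure "inside" the repartition.
val poisoned = ds.mapPartitions { it =>
  if (TaskContext.get().partitionId() == 0)
    throw new RuntimeException("poison pill: simulated task failure")
  it
}
poisoned.count()
spark.stop()
```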

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Vadim Semenov
something like this import org.apache.spark.TaskContext ds.map(r => { val taskContext = TaskContext.get() if (taskContext.partitionId == 1000) { throw new RuntimeException } r }) On Mon, Feb 11, 2019 at 8:41 AM Serega Sheypak wrote: > > I need to crash task which does repartition. >

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Serega Sheypak
I need to crash a task which does repartition. Mon, Feb 11, 2019 at 10:37, Gabor Somogyi: > What blocks you to put if conditions inside the mentioned map function? > > On Mon, Feb 11, 2019 at 10:31 AM Serega Sheypak > wrote: > >> Yeah, but I don't need to crash entire app, I want to fail

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Gabor Somogyi
What blocks you to put if conditions inside the mentioned map function? On Mon, Feb 11, 2019 at 10:31 AM Serega Sheypak wrote: > Yeah, but I don't need to crash entire app, I want to fail several tasks > or executors and then wait for completion. > > Sun, Feb 10, 2019 at 21:49, Gabor Somogyi

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Serega Sheypak
Yeah, but I don't need to crash the entire app, I want to fail several tasks or executors and then wait for completion. Sun, Feb 10, 2019 at 21:49, Gabor Somogyi: > Another approach is adding artificial exception into the application's > source code like this: > > val query = input.toDS.map(_ /

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Gabor Somogyi
Another approach is adding artificial exception into the application's source code like this: val query = input.toDS.map(_ / 0).writeStream.format("console").start() G On Sun, Feb 10, 2019 at 9:36 PM Serega Sheypak wrote: > Hi BR, > thanks for your reply. I want to mimic the issue and kill

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Serega Sheypak
Hi BR, thanks for your reply. I want to mimic the issue and kill tasks at a certain stage. Killing an executor is also an option for me. I'm curious how core spark contributors test spark fault tolerance? Sun, Feb 10, 2019 at 16:57, Gabor Somogyi: > Hi Serega, > > If I understand your

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Gabor Somogyi
Hi Serega, If I understand your problem correctly you would like to kill one executor only and the rest of the app has to be untouched. If that's true yarn -kill is not what you want because it stops the whole application. I've done similar thing when tested/testing Spark's HA features. - jps

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Jörn Franke
yarn application -kill applicationid ? > On 10.02.2019 at 13:30, Serega Sheypak wrote: > > Hi there! > I have weird issue that appears only when tasks fail at specific stage. I > would like to imitate failure on my own. > The plan is to run problematic app and then kill entire executor or

Spark on YARN, HowTo kill executor or individual task?

2019-02-10 Thread Serega Sheypak
Hi there! I have a weird issue that appears only when tasks fail at a specific stage. I would like to imitate the failure on my own. The plan is to run the problematic app and then kill an entire executor or some tasks when execution reaches a certain stage. Is it do-able?

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-23 Thread Serega Sheypak
Hi Imran, here is my use case: there is a 1K-node cluster and jobs have performance degradation because of a single node. It's rather hard to convince Cluster Ops to decommission the node because of "performance degradation". Imagine 10 dev teams chasing a single ops team for a valid reason (the node has problems)

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-23 Thread Imran Rashid
Serga, can you explain a bit more why you want this ability? If the node is really bad, wouldn't you want to decommission the NM entirely? If you've got heterogeneous resources, then node labels seem like they would be more appropriate -- and I don't feel great about adding workarounds for the

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Jörn Franke
You can try with Yarn node labels: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeLabel.html Then you can whitelist nodes. > On 19.01.2019 at 00:20, Serega Sheypak wrote: > > Hi, is there any possibility to tell Scheduler to blacklist specific nodes in > advance?
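A minimal sketch of the node-label route suggested here, assuming a YARN label (named "healthy" below, a placeholder) has already been created and attached to the good nodes; Spark can then pin its AM and executors to that label:

```scala
import org.apache.spark.SparkConf

// Only nodes carrying the "healthy" label will be used for the AM and executors.
val conf = new SparkConf()
  .set("spark.yarn.am.nodeLabelExpression", "healthy")
  .set("spark.yarn.executor.nodeLabelExpression", "healthy")
```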

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Attila Zsolt Piros
The new issue is https://issues.apache.org/jira/browse/SPARK-26688. On Tue, Jan 22, 2019 at 11:30 AM Attila Zsolt Piros wrote: > Hi, > > >> Is it this one: https://github.com/apache/spark/pull/23223 ? > > No. My old development was https://github.com/apache/spark/pull/21068, > which is closed.

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Attila Zsolt Piros
Hi, >> Is it this one: https://github.com/apache/spark/pull/23223 ? No. My old development was https://github.com/apache/spark/pull/21068, which is closed. This would be a new improvement with a new Apache JIRA issue ( https://issues.apache.org) and with a new Github pull request. >> Can I try

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-21 Thread Serega Sheypak
Hi Apiros, thanks for your reply. Is it this one: https://github.com/apache/spark/pull/23223 ? Can I try to reach you through the Cloudera Support portal? Mon, Jan 21, 2019 at 20:06, attilapiros: > Hello, I was working on this area last year (I have developed the > YarnAllocatorBlacklistTracker)

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-21 Thread attilapiros
Hello, I was working on this area last year (I have developed the YarnAllocatorBlacklistTracker) and if you haven't found any solution for your problem I can introduce a new config which would contain a sequence of always blacklisted nodes. This way blacklisting would improve a bit again :) --

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-20 Thread Serega Sheypak
Thanks, so I'll check YARN. Does anyone know if Spark-on-Yarn plans to expose such functionality? Sat, Jan 19, 2019 at 18:04, Felix Cheung: > To clarify, yarn actually supports excluding node right when requesting > resources. It's spark that doesn't provide a way to po

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-19 Thread Felix Cheung
From: Li Gao Sent: Saturday, January 19, 2019 8:43 AM To: Felix Cheung Cc: Serega Sheypak; user Subject: Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job? on yarn it is impossible afaik. on kubernetes you can use taints

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-19 Thread Li Gao
y 18, 2019 3:21 PM > *To:* user > *Subject:* Spark on Yarn, is it possible to manually blacklist nodes > before running spark job? > > Hi, is there any possibility to tell Scheduler to blacklist specific nodes > in advance? >

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-18 Thread Felix Cheung
Not as far as I recall... From: Serega Sheypak Sent: Friday, January 18, 2019 3:21 PM To: user Subject: Spark on Yarn, is it possible to manually blacklist nodes before running spark job? Hi, is there any possibility to tell Scheduler to blacklist specific

Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-18 Thread Serega Sheypak
Hi, is there any possibility to tell Scheduler to blacklist specific nodes in advance?

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-10 Thread Gourav Sengupta
ere > http://spark.apache.org/docs/latest/running-on-yarn.html about running > spark on YARN. Like I said before you can use either the logs from the > application or the Spark UI to understand how many executors are running at > any given time. I don't think I can help much furth

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Dillon Dukek
There is documentation here http://spark.apache.org/docs/latest/running-on-yarn.html about running spark on YARN. Like I said before you can use either the logs from the application or the Spark UI to understand how many executors are running at any given time. I don't think I can help much

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Gourav Sengupta
Hi Dillon, I do think that there is a setting available wherein once YARN sets up the containers you do not deallocate them; I had used it previously in HIVE, and it just saves processing time in terms of allocating containers. That said, I am still trying to understand how we determine

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Dillon Dukek
I'm still not sure exactly what you are meaning by saying that you have 6 yarn containers. Yarn should just be aware of the total available resources in your cluster and then be able to launch containers based on the executor requirements you set when you submit your job. If you can, I think it

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Gourav Sengupta
hi, maybe I am not quite clear in my head on this one. But how do we know that 1 yarn container = 1 executor? Regards, Gourav Sengupta On Tue, Oct 9, 2018 at 8:53 PM Dillon Dukek wrote: > Can you send how you are launching your streaming process? Also what > environment is this cluster

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Dillon Dukek
Can you send how you are launching your streaming process? Also what environment is this cluster running in (EMR, GCP, self managed, etc)? On Tue, Oct 9, 2018 at 10:21 AM kant kodali wrote: > Hi All, > > I am using Spark 2.3.1 and using YARN as a cluster manager. > > I currently got > > 1) 6

Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread kant kodali
Hi All, I am using Spark 2.3.1 and using YARN as a cluster manager. I currently got 1) 6 YARN containers(executors=6) with 4 executor cores for each container. 2) 6 Kafka partitions from one topic. 3) You can assume every other configuration is set to whatever the default values are. Spawned a
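For reference on the setup described above, the number of YARN containers used for executors follows directly from the executor settings (in YARN mode each executor runs in its own container, plus one extra container for the AM). A sketch with values mirroring the numbers in this thread; the memory size is a placeholder:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.instances", "6")  // 6 executor containers
  .set("spark.executor.cores", "4")      // 4 cores per executor
  .set("spark.executor.memory", "4g")    // placeholder size
```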

Re: Spark on YARN in client-mode: do we need 1 vCore for the AM?

2018-05-24 Thread Jeff Zhang
I don't think it is possible to have less than 1 core for the AM; this is due to yarn, not spark. The number of AMs compared to the number of executors should be small and acceptable. If you do want to save more resources, I would suggest you use yarn cluster mode, where the driver and AM run

Spark on YARN in client-mode: do we need 1 vCore for the AM?

2018-05-18 Thread peay
Hello, I run a Spark cluster on YARN, and we have a bunch of client-mode applications we use for interactive work. Whenever we start one of these, an application master container is started. My understanding is that this is mostly an empty shell, used to request further containers or get
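As noted in the reply above, one vCore is the floor that YARN imposes for the AM container, but the client-mode AM's footprint can at least be kept small through the YARN-specific AM settings; the values below are placeholders:

```scala
import org.apache.spark.SparkConf

// Sizing for the client-mode application master container.
val conf = new SparkConf()
  .set("spark.yarn.am.cores", "1")
  .set("spark.yarn.am.memory", "512m")
```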

Hortonworks Spark-Hbase-Connector does not read zookeeper configurations from spark session config ??(Spark on Yarn)

2018-02-22 Thread Dharmin Siddesh J
ever it works good when i run it in cluster mode. sample command spark-submit --master yarn --deploy-mode cluster --files /home/siddesh/hbase-site.xml --class com.orzota.rs.json.HbaseConnector --packages com.hortonworks:shc:1.0.0-2.0-s_2.11 --repositories http://repo.hortonworks.com/content/groups

How to create security filter for Spark UI in Spark on YARN

2018-01-09 Thread Jhon Anderson Cardenas Diaz
*Environment*: AWS EMR, yarn cluster. *Description*: I am trying to use a java filter to protect access to the spark ui, this by using the property spark.ui.filters; the problem is that when spark is running in yarn mode, that property is always being overridden by hadoop with the filter

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-04 Thread Marcelo Vanzin
On Wed, Jan 3, 2018 at 8:18 PM, John Zhuge <john.zh...@gmail.com> wrote: > Something like: > > Note: When running Spark on YARN, environment variables for the executors > need to be set using the spark.yarn.executorEnv.[EnvironmentVariableName] > property in your conf/spa

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread John Zhuge
Sounds good. Should we add another paragraph after this paragraph in configuration.md to explain executor env as well? I will be happy to upload a simple patch. Note: When running Spark on YARN in cluster mode, environment variables > need to be set using the spark.yarn.appMaster
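The two properties under discussion map to configuration like this; MY_ENV_VAR and its value are placeholders. spark.yarn.appMasterEnv.* covers the AM (and the driver in cluster mode), while spark.executorEnv.* covers every executor:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.appMasterEnv.MY_ENV_VAR", "some-value")  // AM / cluster-mode driver
  .set("spark.executorEnv.MY_ENV_VAR", "some-value")        // all executors
```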

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Marcelo Vanzin
Because spark-env.sh is something that makes sense only on the gateway machine (where the app is being submitted from). On Wed, Jan 3, 2018 at 6:46 PM, John Zhuge wrote: > Thanks Jacek and Marcelo! > > Any reason it is not sourced? Any security consideration? > > > On Wed,

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread John Zhuge
Thanks Jacek and Marcelo! Any reason it is not sourced? Any security consideration? On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin wrote: > On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote: > > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Marcelo Vanzin
On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote: > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is > spark-env.sh sourced when starting the Spark AM container or the executor > container? No, it's not. -- Marcelo

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Jacek Laskowski
/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala?utf8=%E2%9C%93#L796-L801 for the code that does the settings to properties mapping. With that I think conf/spark-defaults.conf won't be loaded by itself. Why don't you set a property and see if it's

Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-02 Thread John Zhuge
Hi, I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is spark-env.sh sourced when starting the Spark AM container or the executor container? Saw this paragraph on https://github.com/apache/spark/blob/master/docs/configuration.md: Note: When running Spark on YARN in cluster

[Spark on YARN] Asynchronously launching containers in YARN

2017-10-13 Thread Craig Ingram
I was recently doing some research into Spark on YARN's startup time and observed slow, synchronous allocation of containers/executors. I am testing on a 4 node bare metal cluster w/48 cores and 128GB memory per node. YARN was only allocating about 3 containers per second. Moreover when starting 3

Re: Port to open for submitting Spark on Yarn application

2017-09-03 Thread Satoshi Yamada
Jerry, Thanks for your comment. On Mon, Sep 4, 2017 at 10:43 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote: > I think spark.yarn.am.port is not used any more, so you don't need to > consider this. > > If you're running Spark on YARN, I think some YARN RM port to submit >

Re: Port to open for submitting Spark on Yarn application

2017-09-03 Thread Saisai Shao
I think spark.yarn.am.port is not used any more, so you don't need to consider this. If you're running Spark on YARN, I think some YARN RM port to submit applications should also be reachable via firewall, as well as HDFS port to upload resources. Also in the Spark side, executors

Port to open for submitting Spark on Yarn application

2017-09-03 Thread Satoshi Yamada
Hi, In case we run Spark on Yarn in client mode, we have a firewall for the Hadoop cluster, and the client node is outside the firewall, I think I have to open some ports that the Application Master uses. I think the port is specified by "spark.yarn.am.port" as the document says. https://spark.apach

Re: How to configure spark on Yarn cluster

2017-07-28 Thread yohann jardin
user@spark.apache.org Subject: Re: How to configure spark on Yarn cluster Not sure that we are OK on one thing: Yarn limitations are for the sum of all nodes, while you only specify the memory for a single node through Spark. By the way, the memory displayed in the

Re: How to configure spark on Yarn cluster

2017-07-28 Thread jeff saremi
e: How to configure spark on Yarn cluster Not sure that we are OK on one thing: Yarn limitations are for the sum of all nodes, while you only specify the memory for a single node through Spark. By the way, the memory displayed in the UI is only a part of the total memory allocation:

Re: How to configure spark on Yarn cluster

2017-07-28 Thread yohann jardin
park.apache.org> Subject: Re: How to configure spark on Yarn cluster Check the executor page of the Spark UI, to check if your storage level is limiting. Also, instead of starting with 100 TB of data, sample it, make it work, and grow it little by little until you reached 100 TB.

Re: How to configure spark on Yarn cluster

2017-07-28 Thread jeff saremi
From: yohann jardin <yohannjar...@hotmail.com> Sent: Thursday, July 27, 2017 11:15:39 PM To: jeff saremi; user@spark.apache.org Subject: Re: How to configure spark on Yarn cluster Check the executor page of the Spark UI, to check if your storage level is limiting. Also, i

Re: How to configure spark on Yarn cluster

2017-07-28 Thread yohann jardin
Check the executor page of the Spark UI, to check if your storage level is limiting. Also, instead of starting with 100 TB of data, sample it, make it work, and grow it little by little until you reached 100 TB. This will validate the workflow and let you see how much data is shuffled, etc.

How to configure spark on Yarn cluster

2017-07-28 Thread jeff saremi
I have the simplest job which I'm running against 100TB of data. The job keeps failing with ExecutorLostFailure's on containers killed by Yarn for exceeding memory limits. I have varied the executor-memory from 32GB to 96GB, the spark.yarn.executor.memoryOverhead from 8192 to 36000, and similar
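The knobs being varied in this thread, expressed as configuration for reference; the values are placeholders and, on Spark 2.x, spark.yarn.executor.memoryOverhead is given in megabytes:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "32g")
  .set("spark.yarn.executor.memoryOverhead", "8192")  // MB of off-heap headroom per executor
  .set("spark.yarn.driver.memoryOverhead", "4096")    // same idea for the driver container
```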

Re: Running Spark und YARN on AWS EMR

2017-07-17 Thread Takashi Sasaki
Hi Josh, As you say, I also recognize the problem. I feel I got a warning when specifying a huge data set. We also adjust the partition size, but we do it via command options instead of default settings, or in code. Regards, Takashi 2017-07-18 6:48 GMT+09:00 Josh Holbrook

Re: Running Spark und YARN on AWS EMR

2017-07-17 Thread Josh Holbrook
I just ran into this issue! Small world. As far as I can tell, by default spark on EMR is completely untuned, but it comes with a flag that you can set to tell EMR to autotune spark. In your configuration.json file, you can add something like: { "Classification": "spark", "Properties":

Re: Running Spark und YARN on AWS EMR

2017-07-17 Thread Pascal Stammer
Hi Takashi, thanks for your help. After further investigation, I figured out that the killed container was the driver process. After setting spark.yarn.driver.memoryOverhead instead of spark.yarn.executor.memoryOverhead the error was gone and the application executed without error. Maybe it

Re: Running Spark und YARN on AWS EMR

2017-07-17 Thread Takashi Sasaki
Hi Pascal, The error also occurred frequently in our project. As a solution, it was effective to specify the memory size directly with the spark-submit command, e.g. spark-submit --executor-memory 2g Regards, Takashi > 2017-07-18 5:18 GMT+09:00 Pascal Stammer : >> Hi, >> >>

Running Spark und YARN on AWS EMR

2017-07-17 Thread Pascal Stammer
Hi, I am running a Spark 2.1.x Application on AWS EMR with YARN and get following error that kill my application: AM Container for appattempt_1500320286695_0001_01 exited with exitCode: -104 For more detailed output, check application tracking

Spark on yarn logging

2017-06-29 Thread John Vines
I followed the instructions for configuring a custom logger per https://spark.apache.org/docs/2.0.2/running-on-yarn.html (because we have long-running spark jobs that occasionally get stuck, and without a rolling file appender they will fill up the disk). This seems to work well for us, but it breaks
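A hedged sketch of the recipe that the linked page describes: ship a custom log4j.properties (containing a RollingFileAppender) with the application and point each JVM at it. File names and paths below are placeholders; the same can be achieved with spark-submit --files plus --conf:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // distribute the custom log4j config to the YARN containers
  .set("spark.yarn.dist.files", "file:///etc/spark/custom-log4j.properties")
  // make the driver and executors load it instead of the default log4j.properties
  .set("spark.driver.extraJavaOptions", "-Dlog4j.configuration=custom-log4j.properties")
  .set("spark.executor.extraJavaOptions", "-Dlog4j.configuration=custom-log4j.properties")
```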

spark on yarn cluster model can't use saveAsTable ?

2017-05-15 Thread lk_spark
hi, all: I have a test under spark 2.1.0 which reads txt files as a DataFrame and saves to hive. When I submit the app jar in yarn client mode it works well, but if I submit in cluster mode, it will not create the table or write data, and I didn't find any error log ... can anybody
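A hedged sketch of the usual checklist for this symptom: in yarn-cluster mode the driver runs on an arbitrary cluster node, so it only reaches the Hive metastore if Hive support is enabled and hive-site.xml is shipped with the job (for example via spark-submit --files /etc/hive/conf/hive-site.xml). Table and path names below are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("txt-to-hive")
  .enableHiveSupport()   // without this, saveAsTable goes to the session-local catalog, not Hive
  .getOrCreate()

val df = spark.read.text("hdfs:///data/input/")
df.write.mode("overwrite").saveAsTable("mydb.my_table")
```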

Re: notebook connecting Spark On Yarn

2017-02-15 Thread Jon Gregg
rying to create multiple notebooks connecting to spark on yarn. > After starting few jobs my cluster went out of containers. All new notebook > request are in busy state as Jupyter kernel gateway is not getting any > containers for master to be started. > > Some job are not leaving t

notebook connecting Spark On Yarn

2017-02-15 Thread Sachin Aggarwal
Hi, I am trying to create multiple notebooks connecting to spark on yarn. After starting a few jobs my cluster ran out of containers. All new notebook requests are in a busy state as the Jupyter kernel gateway is not getting any containers for the master to be started. Some jobs are not leaving

Re: spark on yarn can't load kafka dependency jar

2016-12-15 Thread Mich Talebzadeh
/bigdevProject/sparkStreaming_jar4/sparkStreaming.jar > > > > -- > View this message in context: http://apache-spark-user-list. > 1001560.n3.nabble.com/spark-on-yarn-can-t-load-kafka- > dependency-jar-tp28216p28220.html > Sent from the

Re: spark on yarn can't load kafka dependency jar

2016-12-15 Thread neil90
Don't the jars need to be comma separated when you pass them? i.e. --jars "hdfs://zzz:8020/jars/kafka_2.10-0.8.2.2.jar", /opt/bigdevProject/sparkStreaming_jar4/sparkStreaming.jar -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-on-yarn-can-t-

Re: Can i display message on console when use spark on yarn?

2016-10-20 Thread ayan guha
What do you exactly mean by Yarn Console? We use spark-submit and it generates exactly the same log as you mentioned on the driver console. On Thu, Oct 20, 2016 at 8:21 PM, Jone Zhang <joyoungzh...@gmail.com> wrote: > I submit spark with "spark-submit --master yarn-cluster --deploy-mode &g

Can i display message on console when use spark on yarn?

2016-10-20 Thread Jone Zhang
I submit spark with "spark-submit --master yarn-cluster --deploy-mode cluster" How can I display messages on the yarn console? I expect it to be like this: . 16/10/20 17:12:53 main INFO org.apache.spark.deploy.yarn.Client>SPK> Application report for application_1453970859007_481440

DataFrame/Dataset join not producing correct results in Spark 2.0/Yarn

2016-10-12 Thread shankinson
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DataFrame-Dataset-join-not-producing-correct-results-in-Spark-2-0-Yarn-tp27888.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

DataFrame/Dataset join not producing correct results in Spark 2.0/Yarn

2016-10-12 Thread Stephen Hankinson
Hi, We have a cluster running Apache Spark 2.0 on Hadoop 2.7.2, Centos 7.2. We had written some new code using the Spark DataFrame/DataSet APIs but are noticing incorrect results on a join after writing and then reading data to Windows Azure Storage Blobs (The default HDFS location). I've been

Re: Spark on yarn enviroment var

2016-10-01 Thread Vadim Semenov
The question should be addressed to the oozie community. As far as I remember, a spark action doesn't have support of env variables. On Fri, Sep 30, 2016 at 8:11 PM, Saurabh Malviya (samalviy) < samal...@cisco.com> wrote: > Hi, > > > > I am running spark on yarn using oozie

Spark on yarn enviroment var

2016-09-30 Thread Saurabh Malviya (samalviy)
Hi, I am running spark on yarn using oozie. When submitting through the command line using spark-submit, spark is able to read the env variable. But when submitting through oozie, it's not able to get the env variable and I don't see the driver log. Is there any way to specify an env variable in the oozie spark action

Does Spark on YARN inherit or replace the Hadoop/YARN configs?

2016-08-30 Thread Everett Anderson
Hi, I've had a bit of trouble getting Spark on YARN to work. When executing in this mode and submitting from outside the cluster, one must set HADOOP_CONF_DIR or YARN_CONF_DIR <https://spark.apache.org/docs/latest/running-on-yarn.html>, from which spark-submit can find the params it
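Related to the question above, a hedged sketch of how the two layers interact: spark-submit reads the cluster's Hadoop/YARN client configs from HADOOP_CONF_DIR / YARN_CONF_DIR, and individual Hadoop properties can then be overridden per application with the spark.hadoop.* prefix (the key below is just an example):

```scala
import org.apache.spark.SparkConf

// Any key prefixed with spark.hadoop. is injected into the job's Hadoop Configuration,
// overriding what was picked up from HADOOP_CONF_DIR / YARN_CONF_DIR.
val conf = new SparkConf()
  .set("spark.hadoop.fs.s3a.connection.maximum", "100")
```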

Re: Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-03 Thread Mungeol Heo
y by default yarn does not > honor cpu cores as resource, so you will always see vcore is 1 no matter > what number of cores you set in spark. > > On Wed, Aug 3, 2016 at 12:11 PM, satyajit vegesna > <satyajit.apas...@gmail.com> wrote: >> >> Hi All, >> >>

Re: Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-03 Thread Mungeol Heo
y by default yarn does not > honor cpu cores as resource, so you will always see vcore is 1 no matter > what number of cores you set in spark. > > On Wed, Aug 3, 2016 at 12:11 PM, satyajit vegesna > <satyajit.apas...@gmail.com> wrote: >> >> Hi All, >> >> I am t
