Spark on yarn vs spark standalone

2015-11-26 Thread cs user
Hi All, Apologies if this question has been asked before. I'd like to know if there are any downsides to running spark over yarn with the --master yarn-cluster option vs having a separate spark standalone cluster to execute jobs? We're looking at installing a hdfs/hadoop cluster with Ambari

Re: Spark on yarn vs spark standalone

2015-11-26 Thread Jeff Zhang
> Apologies if this question has been asked before. I'd like to know if > there are any downsides to running spark over yarn with the --master > yarn-cluster option vs having a separate spark standalone cluster to > execute jobs? > > We're looking at installing a hdfs/hadoop cluster wit

Re: Spark on YARN using Java 1.8 fails

2015-11-11 Thread mvle
Unfortunately, no. I switched back to OpenJDK 1.7. Didn't get a chance to dig deeper. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-YARN-using-Java-1-8-fails-tp24925p25360.html Sent from the Apache Spark User List mailing list archive

Re: Spark on YARN using Java 1.8 fails

2015-11-11 Thread Abel Rincón
Hi, There was another related question https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201506.mbox/%3CCAJ2peNeruM2Y2Tbf8-Wiras-weE586LM_o25FsN=+z1-bfw...@mail.gmail.com%3E Some months ago, if I remember well, using spark 1.3 + YARN + Java 8 we had the same problem. https

Spark using Yarn timelineserver - High CPU usage

2015-11-05 Thread Krzysztof Zarzycki
Hi there, I have a serious problem in my Hadoop cluster, that YARN Timeline server generates very high load, 800% CPU when there are 8 Spark Streaming jobs running in parallel. I discuss this problem on Hadoop group in parallel:

"java.io.IOException: Connection reset by peer" thrown on the resource manager when launching Spark on Yarn

2015-10-22 Thread PashMic
Hi all, I am trying to launch a Spark job using yarn-client mode on a cluster. I have already tried spark-shell with yarn and I can launch the application. But, I also would like to be able run the driver program from, say eclipse, while using the cluster to run the tasks. I have also added spark

Spark on Yarn

2015-10-21 Thread Raghuveer Chanda
Hi all, I am trying to run spark on yarn in the quickstart cloudera vm. It already has spark 1.3 and Hadoop 2.6.0-cdh5.4.0 installed. (I am not using spark-submit since I want to run a different version of spark.) I am able to run spark 1.3 on yarn but get the below error for spark 1.4. The log shows

RE: Spark on Yarn

2015-10-21 Thread Jean-Baptiste Onofré
12:33 (GMT+01:00) To: user@spark.apache.org Subject: Spark on Yarn Hi all, I am trying to run spark on yarn in quickstart cloudera vm.It already has spark 1.3 and Hadoop 2.6.0-cdh5.4.0 installed.(I am not using spark-submit since I want to run a different version of spark). I am able to run spa

Re: Spark on Yarn

2015-10-21 Thread Raghuveer Chanda
Hi, So does this mean I can't run a spark 1.4 fat jar on yarn without installing spark 1.4? I am including spark 1.4 in my pom.xml, so doesn't this mean it's compiling in 1.4? On Wed, Oct 21, 2015 at 4:38 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote: > Hi > > The compiled

Re: Spark on Yarn

2015-10-21 Thread Adrian Tanase
, 2015 at 2:14 PM To: Jean-Baptiste Onofré Cc: "user@spark.apache.org<mailto:user@spark.apache.org>" Subject: Re: Spark on Yarn Hi, So does this mean I can't run spark 1.4 fat jar on yarn without installing spark 1.4. I am including spark 1.4 in my pom.xml so doesn't this me

Re: Spark on Yarn

2015-10-21 Thread Raghuveer Chanda
th maven) and marking it as provided in sbt. > > -adrian > > From: Raghuveer Chanda > Date: Wednesday, October 21, 2015 at 2:14 PM > To: Jean-Baptiste Onofré > Cc: "user@spark.apache.org" > Subject: Re: Spark on Yarn > > Hi, > > So does this mean I can't run
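The advice quoted in this thread — compiling against Spark but marking it as provided so it stays out of the fat jar — looks roughly like this in a pom.xml (a sketch; the artifact name and version are assumptions matching the thread's Spark 1.4 / Scala 2.10 era):

```xml
<!-- pom.xml: compile against Spark but exclude it from the assembled fat jar -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.4.1</version>
  <scope>provided</scope>
</dependency>
```

At runtime the cluster's own Spark installation supplies these classes, which is why the version bundled in the jar does not override what is installed on the nodes.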

Preemption with Spark on Yarn

2015-10-20 Thread surbhi.mungre
Hi All, I am new to Spark and I am trying to understand how preemption works with Spark on Yarn. My goal is to determine amount of re-work a Spark application has to do if an executor is preempted. For my test, I am using a 4 node cluster with Cloudera VM running Spark 1.3.0. I am running

Re: Spark on YARN using Java 1.8 fails

2015-10-12 Thread Abhisheks
Did you get any resolution for this? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-YARN-using-Java-1-8-fails-tp24925p25039.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Running Spark in Yarn-client mode

2015-10-08 Thread Sushrut Ikhar
t; Regards > JB > > On 10/08/2015 07:23 AM, Sushrut Ikhar wrote: > >> Hi, >> I am new to Spark and I have been trying to run Spark in yarn-client mode. >> >> I get this error in yarn logs : >> Error: Could not find or load main class >> org.apache.s

Running Spark in Yarn-client mode

2015-10-07 Thread Sushrut Ikhar
Hi, I am new to Spark and I have been trying to run Spark in yarn-client mode. I get this error in yarn logs : Error: Could not find or load main class org.apache.spark.executor.CoarseGrainedExecutorBackend Also, I keep getting these warnings: WARN YarnScheduler: Initial job has not accepted

Re: Running Spark in Yarn-client mode

2015-10-07 Thread Jean-Baptiste Onofré
Hi Sushrut, which packaging of Spark do you use ? Do you have a working Yarn cluster (with at least one worker) ? spark-hadoop-x ? Regards JB On 10/08/2015 07:23 AM, Sushrut Ikhar wrote: Hi, I am new to Spark and I have been trying to run Spark in yarn-client mode. I get this error in yarn

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-06 Thread Steve Loughran
On 6 Oct 2015, at 01:23, Andrew Or wrote: Both the history server and the shuffle service are backward compatible, but not forward compatible. This means as long as you have the latest version of history server / shuffle service running in

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-06 Thread Alex Rovner
Thank you all for your help. *Alex Rovner* *Director, Data Engineering * *o:* 646.759.0052 * * On Tue, Oct 6, 2015 at 11:17 AM, Steve Loughran wrote: > > On 6 Oct 2015, at 01:23, Andrew Or wrote: > > Both the history

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-06 Thread Andreas Fritzler
Alex Rovner* >> *Director, Data Engineering * >> *o:* 646.759.0052 >> >> * <http://www.magnetic.com/>* >> >> On Mon, Oct 5, 2015 at 11:06 AM, Andreas Fritzler < >> andreas.fritz...@gmail.com> wrote: >> >>> Hi Steve, Alex,

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Andrew Or
eve, Alex, >> >> how do you handle the distribution and configuration of >> the spark-*-yarn-shuffle.jar on your NodeManagers if you want to use 2 >> different Spark versions? >> >> Regards, >> Andreas >> >> On Mon, Oct 5, 2015 at 4:54 PM, St

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Alex Rovner
> how do you handle the distribution and configuration of > the spark-*-yarn-shuffle.jar on your NodeManagers if you want to use 2 > different Spark versions? > > Regards, > Andreas > > On Mon, Oct 5, 2015 at 4:54 PM, Steve Loughran <ste...@hortonworks.com> > wrote: >

[Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Andreas Fritzler
Hi, I was just wondering, if it is possible to register multiple versions of the aux-services with YARN as described in the documentation: 1. In the yarn-site.xml on each node, add spark_shuffle to yarn.nodemanager.aux-services, then set
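The yarn-site.xml change referenced here looks roughly like the following (a sketch based on the Spark docs for the external shuffle service; the spark_shuffle service name and class are the documented values, but verify them against your Spark version):

```xml
<!-- yarn-site.xml (each NodeManager): register the Spark shuffle service -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

The spark-*-yarn-shuffle.jar must also be on the NodeManager classpath, which is exactly why running two different service versions side by side is awkward.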

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Steve Loughran
> On 5 Oct 2015, at 15:59, Alex Rovner wrote: > > I have the same question about the history server. We are trying to run > multiple versions of Spark and are wondering if the history server is > backwards compatible. yes, it supports the pre-1.4 "Single attempt"

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Andreas Fritzler
Hi Steve, Alex, how do you handle the distribution and configuration of the spark-*-yarn-shuffle.jar on your NodeManagers if you want to use 2 different Spark versions? Regards, Andreas On Mon, Oct 5, 2015 at 4:54 PM, Steve Loughran <ste...@hortonworks.com> wrote: > > > On 5 Oct

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Alex Rovner
I have the same question about the history server. We are trying to run multiple versions of Spark and are wondering if the history server is backwards compatible. *Alex Rovner* *Director, Data Engineering * *o:* 646.759.0052 * * On Mon, Oct 5, 2015 at 9:22 AM, Andreas

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Alex Rovner
Hey Steve, Are you referring to the 1.5 version of the history server? *Alex Rovner* *Director, Data Engineering * *o:* 646.759.0052 * * On Mon, Oct 5, 2015 at 10:18 AM, Steve Loughran wrote: > > > On 5 Oct 2015, at 15:59, Alex Rovner

Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Steve Loughran
> On 5 Oct 2015, at 16:48, Alex Rovner wrote: > > Hey Steve, > > Are you referring to the 1.5 version of the history server? > Yes. I should warn, however, that there's no guarantee that a history server running the 1.4 code will handle the histories of a 1.5+

Re: Spark on YARN using Java 1.8 fails

2015-10-05 Thread Ted Yu
YARN 2.7.1 (running on the cluster) was built with Java 1.8, I assume. Have you used the following command to retrieve / inspect logs ? yarn logs -applicationId Cheers On Mon, Oct 5, 2015 at 8:41 AM, mvle <m...@us.ibm.com> wrote: > Hi, > > I have successfully run pyspark on Spar
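The log-retrieval command Ted refers to is the standard YARN CLI (a sketch; the application ID is a placeholder):

```shell
# Fetch aggregated container logs for a finished YARN application
yarn logs -applicationId application_1446000000000_0001
```

This only works after the application finishes and if log aggregation is enabled (yarn.log-aggregation-enable); otherwise the logs stay in each NodeManager's local log directory.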

Spark on YARN using Java 1.8 fails

2015-10-05 Thread mvle
Hi, I have successfully run pyspark on Spark 1.5.1 on YARN 2.7.1 with Java OpenJDK 1.7. However, when I run the same test on Java OpenJDK 1.8 (or Oracle Java 1.8), I cannot start up pyspark. Has anyone been able to run Spark on YARN with Java 1.8? I get ApplicationMaster disassociated messages

Where are logs for Spark Kafka Yarn on Cloudera

2015-09-29 Thread Rachana Srivastava
application logs are logged. Also tried setting log output to spark.yarn.app.container.log.dir but got access denied error. Question: Do we need to have some special setup to run spark streaming on Yarn? How do we debug? Where to find more details to test streaming on Yarn. Thanks, Rachana

Re: Where are logs for Spark Kafka Yarn on Cloudera

2015-09-29 Thread Marcelo Vanzin
. Is it the only location > where application logs are logged. > > > > Also tried setting log output to spark.yarn.app.container.log.dir but got > access denied error. > > > > Question: Do we need to have some special setup to run spark streaming on > Yarn? How

Re: Re: How to fix some WARN when submit job on spark 1.5 YARN

2015-09-24 Thread r7raul1...@163.com
Thank you r7raul1...@163.com From: Sean Owen Date: 2015-09-24 16:18 To: r7raul1...@163.com CC: user Subject: Re: How to fix some WARN when submit job on spark 1.5 YARN You can ignore all of these. Various libraries can take advantage of native acceleration if libs are available but it's

Re: Spark on YARN / aws - executor lost on node restart

2015-09-24 Thread Adrian Tanase
lto:user@spark.apache.org>" Subject: Re: Spark on YARN / aws - executor lost on node restart Hi guys, Digging up this question after spending some more time trying to replicate it. It seems to be an issue with the YARN – spark integration, wondering if there is a bug already tracking this?

Re: How to fix some WARN when submit job on spark 1.5 YARN

2015-09-24 Thread Sean Owen
You can ignore all of these. Various libraries can take advantage of native acceleration if libs are available but it's no problem if they don't. On Thu, Sep 24, 2015 at 3:25 AM, r7raul1...@163.com wrote: > 1 WARN netlib.BLAS: Failed to load implementation from: >

How to fix some WARN when submit job on spark 1.5 YARN

2015-09-23 Thread r7raul1...@163.com
1 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS 2 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS 3 WARN Unable to load native-hadoop library for your platform r7raul1...@163.com

Re: Spark on Yarn vs Standalone

2015-09-21 Thread Saisai Shao
> wrote: >>>> >>>>> Hi Sandy >>>>> >>>>> Thank you for your reply >>>>> Currently we use r3.2xlarge boxes (vCPU: 8, Mem: 61 GiB) >>>>> with emr setting for Spark "maximizeResourceAllocation": "true"

Re: Spark on Yarn vs Standalone

2015-09-21 Thread Sandy Ryza
>>> Those settings seem reasonable to me. >>>>> >>>>> Are you observing performance that's worse than you would expect? >>>>> >>>>> -Sandy >>>>> >>>>> On Mon, Sep 7, 2015 at 11:22 AM, Alexander Piv

Re: Spark on Yarn vs Standalone

2015-09-21 Thread Alexander Pivovarov
because of GC or it might occupy more memory than Yarn allows) >>>>> >>>>> >>>>> >>>>> On Tue, Sep 8, 2015 at 3:02 PM, Sandy Ryza <sandy.r...@cloudera.com> >>>>> wrote: >>>>> >>>>>> Th

Re: Spark on Yarn vs Standalone

2015-09-21 Thread Alexander Pivovarov
>>>>> Hi Alex, >>>>> >>>>> If they're both configured correctly, there's no reason that Spark >>>>> Standalone should provide performance or memory improvement over Spark on >>>>> YARN. >>>>> >>>>>

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-18 Thread Vipul Rai
Hi Nick/Igor, ​​ Any solution for this ? Even I am having the same issue and copying jar to each executor is not feasible if we use lot of jars. Thanks, Vipul

Re: Spark on YARN / aws - executor lost on node restart

2015-09-18 Thread Adrian Tanase
Hi guys, Digging up this question after spending some more time trying to replicate it. It seems to be an issue with the YARN – spark integration, wondering if there is a bug already tracking this? If I just kill the process on the machine, YARN detects the container is dead and the spark

RE: Spark streaming on spark-standalone/ yarn inside Spring XD

2015-09-17 Thread Vignesh Radhakrishnan
spark-standalone/ yarn inside Spring XD I am not at all familiar with how SpringXD works so hard to say. On Wed, Sep 16, 2015 at 12:12 PM, Vignesh Radhakrishnan <vignes...@altiux.com<mailto:vignes...@altiux.com>> wrote: Yes, it is TD. I'm able to run word count etc on spark standalon

Spark w/YARN Scheduling Questions...

2015-09-17 Thread Robert Saccone
it relates to the Spark concepts of Jobs, Stages, and Tasks in the online documentation. This makes it hard to reason about the scheduling behavior. What is the heuristic used to kill executors when running Spark with YARN in dynamic mode? From the logs what we observe is that executors that have

Re: Spark w/YARN Scheduling Questions...

2015-09-17 Thread Saisai Shao
entioned in the Spark logs we get from our > runs but we can't seem to find a definition and how it relates to the Spark > concepts of Jobs, Stages, and Tasks in the online documentation. This > makes it hard to reason about the scheduling behavior. > > > What is the heuristic

Spark streaming on spark-standalone/ yarn inside Spring XD

2015-09-16 Thread Vignesh Radhakrishnan
spark processor on spark standalone or yarn inside spring XD or is spark local the only option here ? The processor module is: class WordCount extends Processor[String, (String, Int)] { def process(input: ReceiverInputDStream[String]): DStream[(String, Int)] = { val words

Spark on YARN / aws - executor lost on node restart

2015-09-16 Thread Adrian Tanase
Hi all, We’re using spark streaming (1.4.0), deployed on AWS through yarn. It’s a stateful app that reads from kafka (with the new direct API) and we’re checkpointing to HDFS. During some resilience testing, we restarted one of the machines and brought it back online. During the offline

Re: Spark streaming on spark-standalone/ yarn inside Spring XD

2015-09-16 Thread Tathagata Das
t spark to spark standalone > (running on the same machine) or yarn-client. *Is it possible to run > spark processor on spark standalone or yarn inside spring XD or is spark > local the only option here ?* > > > > The processor module is: > > > > class WordCount exte

Re: Spark streaming on spark-standalone/ yarn inside Spring XD

2015-09-16 Thread Vignesh Radhakrishnan
Yes, it is TD. I'm able to run word count etc on spark standalone/ yarn when it's not integrated with spring xd. But the same breaks when used as processor on spring. Was trying to get an opinion on whether it's doable or it's something that's not supported at the moment On 16 Sep 2015 23:50

Re: Spark streaming on spark-standalone/ yarn inside Spring XD

2015-09-16 Thread Tathagata Das
I am not at all familiar with how SpringXD works so hard to say. On Wed, Sep 16, 2015 at 12:12 PM, Vignesh Radhakrishnan < vignes...@altiux.com> wrote: > Yes, it is TD. I'm able to run word count etc on spark standalone/ yarn > when it's not integrated with spring xd. > But the s

Re: Spark on Yarn vs Standalone

2015-09-10 Thread Sandy Ryza
ark.yarn.executor.memoryOverhead 5324 >>> >>> we also set spark.default.parallelism = slave_count * 16 >>> >>> Does it look good for you? (we run single heavy job on cluster) >>> >>> Alex >>> >>> On Mon, Sep 7, 2015 at 11:03 AM

bad substitution for [hdp.version] Error in spark on YARN job

2015-09-09 Thread Jeetendra Gangele
Hi , I am getting below error when running the spark job on YARN with HDP cluster. I have installed spark and yarn from Ambari and I am using spark 1.3.1 with HDP version HDP-2.3.0.0-2557. My spark-default.conf has correct entry spark.driver.extraJavaOptions -Dhdp.version=2.3.0.0-2557
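For reference, the spark-defaults.conf entries under discussion look like this (a sketch; the HDP build number is the one quoted in the message and will differ per cluster — the ApplicationMaster-side option is commonly needed as well):

```properties
# spark-defaults.conf (HDP sketch; the version string is illustrative)
spark.driver.extraJavaOptions   -Dhdp.version=2.3.0.0-2557
spark.yarn.am.extraJavaOptions  -Dhdp.version=2.3.0.0-2557
```

The "bad substitution" error typically means some process launched without this property, so the literal `${hdp.version}` placeholder in the HDP configs was never expanded.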

Re: bad substitution for [hdp.version] Error in spark on YARN job

2015-09-09 Thread Jeetendra Gangele
when running the spark job on YARN with HDP > cluster. > I have installed spark and yarn from Ambari and I am using spark 1.3.1 > with HDP version HDP-2.3.0.0-2557. > > My spark-default.conf has correct entry > > spark.driver.extraJavaOptions -Dhdp.version=2.3.0.0-2557 > spark.ya

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Igor Berman
as a starting point, attach your stacktrace... ps: look for duplicates in your classpath, maybe you include another jar with same class On 8 September 2015 at 06:38, Nicholas R. Peterson wrote: > I'm trying to run a Spark 1.4.1 job on my CDH5.4 cluster, through Yarn. >

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Nicholas R. Peterson
Thanks, Igor; I've got it running again right now, and can attach the stack trace when it finishes. In the meantime, I've noticed something interesting: in the Spark UI, the application jar that I submit is not being included on the classpath. It has been successfully uploaded to the nodes -- in

Re: Spark on Yarn vs Standalone

2015-09-08 Thread Sandy Ryza
m = slave_count * 16 > > Does it look good for you? (we run single heavy job on cluster) > > Alex > > On Mon, Sep 7, 2015 at 11:03 AM, Sandy Ryza <sandy.r...@cloudera.com> > wrote: > >> Hi Alex, >> >> If they're both configured correctly, there'

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Nick Peterson
Yes, the jar contains the class: $ jar -tf lumiata-evaluation-assembly-1.0.jar | grep 2028/Document/Document com/i2028/Document/Document$1.class com/i2028/Document/Document.class What else can I do? Is there any way to get more information about the classes available to the particular

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Nicholas R. Peterson
Here is the stack trace: (Sorry for the duplicate, Igor -- I forgot to include the list.) 15/09/08 05:56:43 WARN scheduler.TaskSetManager: Lost task 183.0 in stage 41.0 (TID 193386, ds-compute2.lumiata.com): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error constructing

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Igor Berman
java.lang.ClassNotFoundException: com.i2028.Document.Document 1. so have you checked that jar that you create(fat jar) contains this class? 2. might be there is some stale cache issue...not sure though On 8 September 2015 at 16:12, Nicholas R. Peterson wrote: > Here is

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Igor Berman
hmm...out of ideas. can you check in spark ui environment tab that this jar is not somehow appears 2 times or more...? or more generally - any 2 jars that can contain this class by any chance regarding your question about classloader - no idea, probably there is, I remember stackoverflow has some

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Igor Berman
another idea - you can add this fat jar explicitly to the classpath of executors...it's not a solution, but might be it work... I mean place it somewhere locally on executors and add it to cp with spark.executor.extraClassPath On 8 September 2015 at 18:30, Nick Peterson

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Nick Peterson
Yeah... none of the jars listed on the classpath contain this class. The only jar that does is the fat jar that I'm submitting with spark-submit, which as mentioned isn't showing up on the classpath anywhere. -- Nick On Tue, Sep 8, 2015 at 8:26 AM Igor Berman wrote: >

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Nick Peterson
Yes, putting the jar on each node and adding it manually to the executor classpath does it. So, it seems that's where the issue lies. I'll do some experimenting and see if I can narrow down the problem; but, for now, at least I can run my job! Thanks for your help. On Tue, Sep 8, 2015 at 8:40
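The workaround confirmed here can be expressed on the submit command line instead of editing each node's config (a sketch; the paths and class name are illustrative, and the jar must already exist at that path on every executor node):

```shell
spark-submit \
  --class com.example.Main \
  --master yarn-client \
  --conf spark.executor.extraClassPath=/opt/jars/app-assembly-1.0.jar \
  /opt/jars/app-assembly-1.0.jar
```

Note this bypasses, rather than fixes, whatever prevented the uploaded application jar from landing on the executor classpath in the first place.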

Re: Spark on Yarn vs Standalone

2015-09-08 Thread Alexander Pivovarov
>> Alex >> >> On Mon, Sep 7, 2015 at 11:03 AM, Sandy Ryza <sandy.r...@cloudera.com> >> wrote: >> >>> Hi Alex, >>> >>> If they're both configured correctly, there's no reason that Spark >>> Standalone should provide performance

Re: Spark on Yarn vs Standalone

2015-09-07 Thread Sandy Ryza
Hi Alex, If they're both configured correctly, there's no reason that Spark Standalone should provide performance or memory improvement over Spark on YARN. -Sandy On Fri, Sep 4, 2015 at 1:24 PM, Alexander Pivovarov <apivova...@gmail.com> wrote: > Hi Everyone > > We are trying

Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-07 Thread Nicholas R. Peterson
I'm trying to run a Spark 1.4.1 job on my CDH5.4 cluster, through Yarn. Serialization is set to use Kryo. I have a large object which I send to the executors as a Broadcast. The object seems to serialize just fine. When it attempts to deserialize, though, Kryo throws a ClassNotFoundException...

Re: Spark on Yarn vs Standalone

2015-09-07 Thread Alexander Pivovarov
ason that Spark > Standalone should provide performance or memory improvement over Spark on > YARN. > > -Sandy > > On Fri, Sep 4, 2015 at 1:24 PM, Alexander Pivovarov <apivova...@gmail.com> > wrote: > >> Hi Everyone >> >> We are trying the latest aws emr-

Spark on Yarn vs Standalone

2015-09-04 Thread Alexander Pivovarov
Hi Everyone We are trying the latest aws emr-4.0.0 and Spark and my question is about YARN vs Standalone mode. Our usecase is - start 100-150 nodes cluster every week, - run one heavy spark job (5-6 hours) - save data to s3 - stop cluster Officially aws emr-4.0.0 comes with Spark on Yarn It's

Re: Spark-on-YARN LOCAL_DIRS location

2015-08-29 Thread Akhil Das
space filling up during Spark jobs because Spark-on-YARN uses the yarn.nodemanager.local-dirs for shuffle space. I noticed this message appears when submitting Spark-on-YARN jobs: WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager

Spark-on-YARN LOCAL_DIRS location

2015-08-26 Thread michael.england
Hi, I am having issues with /tmp space filling up during Spark jobs because Spark-on-YARN uses the yarn.nodemanager.local-dirs for shuffle space. I noticed this message appears when submitting Spark-on-YARN jobs: WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden

spark on yarn is slower than spark-ec2 standalone, how to tune?

2015-08-15 Thread AlexG
I'm using a manual installation of Spark under Yarn to run a 30 node r3.8xlarge EC2 cluster (each node has 244Gb RAM, 600Gb SSD). All my code runs much faster on a cluster launched w/ the spark-ec2 script, but there's a mysterious problem with nodes becoming inaccessible, so I switched to using

Re: Spark on YARN

2015-08-10 Thread Jem Tucker
On Fri, Aug 7, 2015 at 1:48 AM, Jem Tucker jem.tuc...@gmail.com wrote: Hi, I am running spark on YARN on the CDH5.3.2 stack. I have created a new user to own and run a testing environment, however when using this user applications I submit to yarn never begin to run, even

Re: Spark on YARN

2015-08-08 Thread Sandy Ryza
...@gmail.com wrote: Hi, I am running spark on YARN on the CDH5.3.2 stack. I have created a new user to own and run a testing environment, however when using this user applications I submit to yarn never begin to run, even if they are the exact same application that is successful with another user

Re: Spark on YARN

2015-08-08 Thread Jem Tucker
at 1:48 AM, Jem Tucker jem.tuc...@gmail.com wrote: Hi, I am running spark on YARN on the CDH5.3.2 stack. I have created a new user to own and run a testing environment, however when using this user applications I submit to yarn never begin to run, even if they are the exact same application

Re: Spark on YARN

2015-08-08 Thread Jem Tucker
them resources? Does an application master start? If so, what are in its logs? If not, anything suspicious in the YARN ResourceManager logs? -Sandy On Fri, Aug 7, 2015 at 1:48 AM, Jem Tucker jem.tuc...@gmail.com wrote: Hi, I am running spark on YARN on the CDH5.3.2 stack. I have created

Re: Spark on YARN

2015-08-08 Thread Shushant Arora
ResourceManager logs? -Sandy On Fri, Aug 7, 2015 at 1:48 AM, Jem Tucker jem.tuc...@gmail.com wrote: Hi, I am running spark on YARN on the CDH5.3.2 stack. I have created a new user to own and run a testing environment, however when using this user applications I submit to yarn never begin

Spark on YARN

2015-08-07 Thread Jem Tucker
Hi, I am running spark on YARN on the CDH5.3.2 stack. I have created a new user to own and run a testing environment; however, when using this user, applications I submit to yarn never begin to run, even if they are the exact same application that is successful with another user. Has anyone seen

Re: Spark on YARN

2015-07-30 Thread Jeetendra Gangele
it. If you met similar problem, you could increase this configuration “yarn.nodemanager.vmem-pmem-ratio”. Thanks Jerry *From:* Jeff Zhang [mailto:zjf...@gmail.com] *Sent:* Thursday, July 30, 2015 4:36 PM *To:* Jeetendra Gangele *Cc:* user *Subject:* Re: Spark on YARN 15/07/30 12:13:35

Re: Spark on YARN

2015-07-30 Thread Jeetendra Gangele
but at the terminal job is succeeding .I guess there are con issue job it not at all launching /bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster lib/spark-examples-1.4.1-hadoop2.6.0.jar 10 Complete log SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found

Re: Spark on YARN

2015-07-30 Thread Jeff Zhang
is succeeding .I guess there are con issue job it not at all launching /bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster lib/spark-examples-1.4.1-hadoop2.6.0.jar 10 Complete log SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file

Spark on YARN

2015-07-30 Thread Jeetendra Gangele
I am running the below command; this is the default spark PI program, but it is not running. All the logs are going to stderr, but at the terminal the job shows as succeeding. I guess there is a connection issue and the job is not launching at all. /bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster lib

RE: Spark on YARN

2015-07-30 Thread Shao, Saisai
Gangele Cc: user Subject: Re: Spark on YARN 15/07/30 12:13:35 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM AM is killed somehow, may due to preemption. Does it always happen ? Resource manager log would be helpful. On Thu, Jul 30, 2015 at 4:17 PM, Jeetendra Gangele gangele

spark on yarn

2015-07-14 Thread Shushant Arora
I am running spark application on yarn managed cluster. When I specify --executor-cores 4 it fails to start the application. I am starting the app as spark-submit --class classname --num-executors 10 --executor-cores 5 --master masteradd jarname Exception in thread main

Re: spark on yarn

2015-07-14 Thread Marcelo Vanzin
On Tue, Jul 14, 2015 at 9:57 AM, Shushant Arora shushantaror...@gmail.com wrote: When I specify --executor-cores 4 it fails to start the application. When I give --executor-cores as 4 , it works fine. Do you have any NM that advertises more than 4 available cores? Also, it's always worth it

Re: spark on yarn

2015-07-14 Thread Shushant Arora
Ok thanks a lot! few more doubts : What happens in a streaming application say with spark-submit --class classname --num-executors 10 --executor-cores 4 --master masteradd jarname Will it allocate 10 containers throughout the life of streaming application on same nodes until any node failure

Re: spark on yarn

2015-07-14 Thread Marcelo Vanzin
On Tue, Jul 14, 2015 at 11:13 AM, Shushant Arora shushantaror...@gmail.com wrote: spark-submit --class classname --num-executors 10 --executor-cores 4 --master masteradd jarname Will it allocate 10 containers throughout the life of streaming application on same nodes until any node failure

Re: spark on yarn

2015-07-14 Thread Marcelo Vanzin
On Tue, Jul 14, 2015 at 12:03 PM, Shushant Arora shushantaror...@gmail.com wrote: Can a container have multiple JVMs running in YARN? Yes and no. A container runs a single command, but that process can start other processes, and those also count towards the resource usage of the container

Re: spark on yarn

2015-07-14 Thread Shushant Arora
Can a container have multiple JVMs running in YARN? I am comparing Hadoop Mapreduce running on yarn vs spark running on yarn here : 1.Is the difference is in Hadoop Mapreduce job - say I specify 20 reducers and my job uses 10 map tasks then, it need total 30 containers or 30 vcores ? I guess 30

Re: spark on yarn

2015-07-14 Thread Marcelo Vanzin
On Tue, Jul 14, 2015 at 10:40 AM, Shushant Arora shushantaror...@gmail.com wrote: My understanding was --executor-cores(5 here) are maximum concurrent tasks possible in an executor and --num-executors (10 here)are no of executors or containers demanded by Application master/Spark driver
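As a rough sanity check, the container/vcore arithmetic discussed in this thread can be sketched in shell (the +1 terms for the ApplicationMaster container are an assumption based on yarn-cluster mode, not exact YARN accounting):

```shell
# Sketch: total YARN resources requested for --num-executors 10 --executor-cores 5.
# Assumes one extra container and one extra vcore for the ApplicationMaster.
num_executors=10
executor_cores=5
containers=$(( num_executors + 1 ))              # executors + ApplicationMaster
vcores=$(( num_executors * executor_cores + 1 )) # cores across all executors + AM
echo "containers=$containers vcores=$vcores"     # containers=11 vcores=51
```

Each individual executor's request must still fit within yarn.scheduler.maximum-allocation-vcores, which is why --executor-cores 5 fails on a cluster capped at 4.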

Re: spark on yarn

2015-07-14 Thread Shushant Arora
Is yarn.scheduler.maximum-allocation-vcores the setting for max vcores per container? What's the setting for the max limit of --num-executors? On Tue, Jul 14, 2015 at 11:18 PM, Marcelo Vanzin van...@cloudera.com wrote: On Tue, Jul 14, 2015 at 10:40 AM, Shushant Arora shushantaror...@gmail.com

Re: spark on yarn

2015-07-14 Thread Ted Yu
Shushant : Please also see 'Debugging your Application' section of https://spark.apache.org/docs/latest/running-on-yarn.html On Tue, Jul 14, 2015 at 10:48 AM, Marcelo Vanzin van...@cloudera.com wrote: On Tue, Jul 14, 2015 at 10:40 AM, Shushant Arora shushantaror...@gmail.com wrote: My

Re: spark on yarn

2015-07-14 Thread Marcelo Vanzin
On Tue, Jul 14, 2015 at 10:55 AM, Shushant Arora shushantaror...@gmail.com wrote: Is yarn.scheduler.maximum-allocation-vcores the setting for max vcores per container? I don't remember YARN config names by heart, but that sounds promising. I'd look at the YARN documentation for details.

Re: spark on yarn

2015-07-14 Thread Shushant Arora
got the below exception in logs: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException): Invalid resource request, requested virtual cores 0, or requested virtual cores max configured, requestedVirtualCores=5, maxVirtualCores=4 at

spark on yarn failing silently

2015-06-22 Thread roy
Hi, suddenly our spark job on yarn started failing silently without showing any error, following is the trace in verbose mode Using properties file: /usr/lib/spark/conf/spark-defaults.conf Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer Adding default

Re: Spark on Yarn - How to configure

2015-06-19 Thread Andrew Or
Hi Ashish, For Spark on YARN, you actually only need the Spark files on one machine - the submission client. This machine could even live outside of the cluster. Then all you need to do is point YARN_CONF_DIR to the directory containing your hadoop configuration files (e.g. yarn-site.xml
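Andrew's setup reduces to two steps on the submission machine (a sketch; the config path shown is a common default and may differ on your distribution):

```shell
# Point Spark at the Hadoop/YARN client configs, then submit to the cluster
export YARN_CONF_DIR=/etc/hadoop/conf   # must contain yarn-site.xml, core-site.xml, etc.
./bin/spark-submit --master yarn-cluster --class com.example.Main app.jar
```

No Spark installation is needed on the worker nodes themselves; YARN distributes the Spark assembly and application jar to the containers it allocates.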

Spark on Yarn - How to configure

2015-06-19 Thread Ashish Soni
Can someone please let me know everything I need to configure to have Spark run using Yarn? There is a lot of documentation, but none of it says how and which files need to be changed. Let's say I have 4 nodes for Spark - SparkMaster, SparkSlave1, SparkSlave2, SparkSlave3. Now in which node

Re: deployment options for Spark and YARN w/ many app jar library dependencies

2015-06-18 Thread Sweeney, Matt
them on each node after the first download. You can also use the spark.executor.extraClassPath config to point to them. -Sandy On Wed, Jun 17, 2015 at 4:47 PM, Sweeney, Matt mswee...@fourv.commailto:mswee...@fourv.com wrote: Hi folks, I'm looking to deploy spark on YARN and I have read through

Re: Spark Streming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-18 Thread Ji ZHANG
Hi, We switched from ParallelGC to CMS, and the symptom is gone. On Thu, Jun 4, 2015 at 3:37 PM, Ji ZHANG zhangj...@gmail.com wrote: Hi, I set spark.shuffle.io.preferDirectBufs to false in SparkConf and this setting can be seen in web ui's environment tab. But, it still eats memory, i.e.

Re: Spark-sql(yarn-client) java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ExecutorLauncher

2015-06-18 Thread Yin Huai
btw, user list will be a better place for this thread. On Thu, Jun 18, 2015 at 8:19 AM, Yin Huai yh...@databricks.com wrote: Is it the full stack trace? On Thu, Jun 18, 2015 at 6:39 AM, Sea 261810...@qq.com wrote: Hi, all: I want to run spark sql on yarn(yarn-client), but ... I already

Spark-sql(yarn-client) java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ExecutorLauncher

2015-06-18 Thread Sea
Hi, all: I want to run spark sql on yarn(yarn-client), but ... I already set spark.yarn.jar and spark.jars in conf/spark-defaults.conf. ./bin/spark-sql -f game.sql --executor-memory 2g --num-executors 100 game.txt Exception in thread main java.lang.NoClassDefFoundError: org/apache/spark

Re: Spark Streming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-18 Thread Tathagata Das
Glad to hear that. :) On Thu, Jun 18, 2015 at 6:25 AM, Ji ZHANG zhangj...@gmail.com wrote: Hi, We switched from ParallelGC to CMS, and the symptom is gone. On Thu, Jun 4, 2015 at 3:37 PM, Ji ZHANG zhangj...@gmail.com wrote: Hi, I set spark.shuffle.io.preferDirectBufs to false in
