Spark performance on small dataset

2022-11-20 Thread Prarthi Jain
Hi Everyone, Spark and the RDD approach it favors assume that most applications run on big data and need massive parallelism via sharding and concurrent computation. But some tasks run on small data and do not need or benefit from RDD parallelism. How are these tasks expected to perform on Spark?

Re: A scene with unstable Spark performance

2022-05-18 Thread Chang Chen
This is a case where resources are fixed within the same SparkContext, but SQLs have different priorities. Some SQLs are only allowed to execute if there are spare resources; once a high-priority SQL comes in, the task sets of those SQLs are either killed or stalled. If we set a high priority pool's
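Spark's documented mechanism for prioritising jobs within one SparkContext is the FAIR scheduler with weighted pools. A minimal sketch (pool names, weights, and minShares below are illustrative, not taken from the thread):

```xml
<!-- fairscheduler.xml, referenced via spark.scheduler.allocation.file;
     requires spark.scheduler.mode=FAIR on the SparkContext -->
<allocations>
  <pool name="high_priority">
    <schedulingMode>FAIR</schedulingMode>
    <weight>10</weight>       <!-- ~10x the share of task slots -->
    <minShare>8</minShare>    <!-- guaranteed minimum cores -->
  </pool>
  <pool name="low_priority">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```

A job is routed to a pool by calling `sc.setLocalProperty("spark.scheduler.pool", "low_priority")` before submitting its actions. Note that pools share slots by weight as tasks finish; they do not preempt or kill already-running low-priority task sets, which is the gap the thread is discussing.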

Re: A scene with unstable Spark performance

2022-05-17 Thread Sungwoo Park
The problem you describe is the motivation for developing Spark on MR3. From the blog article (https://www.datamonad.com/post/2021-08-18-spark-mr3/ ): *The main motivation for developing Spark on MR3 is to allow multiple Spark applications to share compute resources such as Yarn containers or

Re: A scene with unstable Spark performance

2022-05-17 Thread Bowen Song
From: Qian SUN Sent: Wednesday, May 18, 2022 9:32 To: Bowen Song Cc: user.spark Subject: Re: A scene with unstable Spark performance Hi. I think you need Spark dynamic resource allocation. Please refer to https://spark.apache.org/docs/latest/job-scheduling.html#dynamic
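For reference, dynamic resource allocation is switched on with a handful of properties in spark-defaults.conf; the values below are illustrative, and on YARN the external shuffle service must also be enabled:

```
spark.dynamicAllocation.enabled                 true
spark.shuffle.service.enabled                   true
spark.dynamicAllocation.minExecutors            2
spark.dynamicAllocation.maxExecutors            40
spark.dynamicAllocation.executorIdleTimeout     60s
spark.dynamicAllocation.schedulerBacklogTimeout 1s
```

Executors idle past the timeout are released, and new ones are requested when tasks queue up, so short jobs can claim resources that long-running jobs would otherwise pin.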

Re: A scene with unstable Spark performance

2022-05-17 Thread Qian SUN
Song wrote on Tue, 17 May 2022 at 22:33: > Hi all, > > > > I find Spark performance is unstable in this scene: we divided the jobs > into two groups according to the job completion time. One group of jobs had > an execution time of less than 10s, and the other group of jobs had an >

A scene with unstable Spark performance

2022-05-17 Thread Bowen Song
Hi all, I find Spark performance is unstable in this scene: we divided the jobs into two groups according to the job completion time. One group of jobs had an execution time of less than 10s, and the other group of jobs had an execution time from 10s to 300s. The reason for the difference

RE: Spark performance over S3

2021-04-07 Thread Boris Litvak
Oh, Tzahi, I misread the metrics in the first reply. It’s about reads indeed, not writes. From: Tzahi File Sent: Wednesday, 7 April 2021 16:02 To: Hariharan Cc: user Subject: Re: Spark performance over S3 Hi Hariharan, Thanks for your reply. In both cases we are writing the data to S3

Re: Spark performance over S3

2021-04-07 Thread Tzahi File
Hi Hariharan, Thanks for your reply. In both cases we are writing the data to S3. The difference is that in the first case we read the data from S3 and in the second we read from HDFS. We are using ListObjectsV2 API in S3A . The S3 bucket and

Re: Spark performance over S3

2021-04-07 Thread Vladimir Prus
VPC endpoint can also make a major difference in costs. Without it, access to S3 incurs data transfer costs and NAT costs, and these can be large. On Wed, 7 Apr 2021 at 14:13, Hariharan wrote: > Hi Tzahi, > > Comparing the first two cases: > >- > reads the parquet files from S3 and also

Re: Spark performance over S3

2021-04-07 Thread Hariharan
Hi Tzahi, Comparing the first two cases: - > reads the parquet files from S3 and also writes to S3, it takes 22 min - > reads the parquet files from S3 and writes to its local hdfs, it takes the same amount of time (±22 min) It looks like most of the time is being spent in reading, and the time
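When the read side dominates like this, a few documented S3A client settings are worth benchmarking (the values below are illustrative, not recommendations from the thread):

```
# spark-defaults.conf -- these are passed through to the S3A connector
spark.hadoop.fs.s3a.connection.maximum         200     # max parallel S3 connections
spark.hadoop.fs.s3a.experimental.input.fadvise random  # seek-friendly mode for columnar (Parquet) reads
spark.hadoop.fs.s3a.list.version               2       # ListObjectsV2, as mentioned in the thread
```

The `fadvise random` mode avoids aborting and reopening HTTP connections on every Parquet seek, which can matter much more than write-side tuning for this access pattern.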

RE: Spark performance over S3

2021-04-07 Thread Boris Litvak
tion to compare this with EMRFS performance … I know it requires you to put in some work. Boris From: Gourav Sengupta Sent: Tuesday, 6 April 2021 22:24 To: Tzahi File Cc: user Subject: Re: Spark performance over S3 Hi Tzahi, that is a huge cost. So that I can understand the question before answe

Re: Spark performance over S3

2021-04-06 Thread Gourav Sengupta
Hi Tzahi, that is a huge cost. So that I can understand the question before answering it: 1. what is the SPARK version that you are using? 2. what is the SQL code that you are using to read and write? There are several other questions that are pertinent, but the above will be a great starting

Spark performance over S3

2021-04-06 Thread Tzahi File
Hi All, We have a spark cluster on aws ec2 that has 60 X i3.4xlarge. The spark job running on that cluster reads from an S3 bucket and writes to that bucket. the bucket and the ec2 run in the same region. As part of our efforts to reduce the runtime of our spark jobs we found there's serious

Re: How does extending an existing parquet with columns affect impala/spark performance?

2018-04-03 Thread naresh Goud
From Spark's point of view it shouldn't have an effect. It's possible to extend new parquet files with columns, and it won't affect performance or require changes to the Spark application code. On Tue, Apr 3, 2018 at 9:14 AM Vitaliy Pisarev wrote: > This is not strictly a

How does extending an existing parquet with columns affect impala/spark performance?

2018-04-03 Thread Vitaliy Pisarev
This is not strictly a spark question but I'll give it a shot: have an existing setup of parquet files that are being queried from impala and from spark. I intend to add some 30 relatively 'heavy' columns to the parquet. Each column would store an array of structs. Each struct can have from 5 to

Re: GroupBy and Spark Performance issue

2017-01-17 Thread Andy Dang
Repartition wouldn't save you from skewed data unfortunately. The way Spark works now is that it pulls data of the same key to one single partition, and Spark, AFAIK, retains the mapping from key to data in memory. You can use aggregateBykey() or combineByKey() or reduceByKey() to avoid this
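The map-side combine that aggregateByKey/combineByKey/reduceByKey perform can be illustrated without a cluster. The sketch below is a plain-Python toy model (not Spark API code): with a skewed key, a groupByKey-style shuffle moves every record to one partition, while a reduceByKey-style shuffle moves only one pre-combined record per partition and key.

```python
from collections import defaultdict

def group_by_key_max(partitions):
    # groupByKey-style: every (key, value) record crosses the shuffle,
    # and all values for a key land on a single reducer partition.
    shuffled = 0
    merged = defaultdict(list)
    for part in partitions:
        for k, v in part:
            shuffled += 1          # one record shuffled per input value
            merged[k].append(v)
    return {k: max(vs) for k, vs in merged.items()}, shuffled

def reduce_by_key(partitions, f):
    # reduceByKey-style: combine within each partition first, so only
    # one record per (partition, key) crosses the shuffle.
    shuffled = 0
    merged = {}
    for part in partitions:
        local = {}
        for k, v in part:          # map-side combine
            local[k] = f(local[k], v) if k in local else v
        for k, v in local.items():
            shuffled += 1          # one record per partition-local key
            merged[k] = f(merged[k], v) if k in merged else v
    return merged, shuffled

# One heavily skewed key spread over two partitions
parts = [[("a", i) for i in range(1000)], [("a", i) for i in range(500)]]
g, n_g = group_by_key_max(parts)
r, n_r = reduce_by_key(parts, max)
print(g == r, n_g, n_r)   # True 1500 2
```

Same answer either way, but the combining version shuffles 2 records instead of 1500, which is why the reply recommends it for skewed keys.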

GroupBy and Spark Performance issue

2017-01-16 Thread KhajaAsmath Mohammed
Hi, I am trying to group data in Spark and find the maximum value for each group. I have to use group by as I need to transpose based on the values. I tried repartitioning the data by increasing the number from 1 to 1. The job runs till the below stage and then takes a long time to move ahead. I was

Re: SPARK PERFORMANCE TUNING

2016-09-21 Thread Mich Talebzadeh
nput > data format? Is compression used? > > On 21 Sep 2016, at 13:37, Trinadh Kaja <ktr.hadoo...@gmail.com> wrote: > > Hi all, > > how to increase spark performance ,i am using pyspark. > > cluster info : > > Total memory :600gb > Cores:96 > > command

Re: SPARK PERFORMANCE TUNING

2016-09-21 Thread Jörn Franke
Do you mind sharing what your software does? What is the input data size? What is the spark version and apis used? How many nodes? What is the input data format? Is compression used? > On 21 Sep 2016, at 13:37, Trinadh Kaja <ktr.hadoo...@gmail.com> wrote: > > Hi all, > >

SPARK PERFORMANCE TUNING

2016-09-21 Thread Trinadh Kaja
Hi all, how to increase Spark performance? I am using PySpark. cluster info : Total memory : 600gb Cores : 96 command : spark-submit --master yarn-client --executor-memory 10G --num-executors 50 --executor-cores 2 --driver-memory 10g --queue thequeue please help on this -- Thanks
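A quick back-of-the-envelope check of that command against the quoted cluster (plain arithmetic; the per-executor YARN memory overhead of max(384 MB, 10% of executor memory) is the documented default for YARN-era Spark and is an assumption here):

```python
# Cluster capacity quoted in the thread
total_mem_gb, total_cores = 600, 96

# Resources requested by the spark-submit command above
num_executors, exec_mem_gb, exec_cores = 50, 10, 2

# Assumed YARN per-executor overhead: max(384 MB, 10% of executor memory)
overhead_gb = max(0.384, 0.10 * exec_mem_gb)

mem_needed = num_executors * (exec_mem_gb + overhead_gb)
cores_needed = num_executors * exec_cores

print(mem_needed, "GB requested of", total_mem_gb)      # 550.0 GB of 600
print(cores_needed, "cores requested of", total_cores)  # 100 cores of 96
```

The core request exceeds what the cluster has, so YARN cannot grant all 50 executors at 2 cores each; trimming the request to fit capacity (or letting dynamic allocation size it) is usually the first tuning step.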

increase spark performance

2016-09-21 Thread Trinadh Kaja
Hi all, how to increase spark performance , cluster info : total memory :600gb cores -- Thanks K.Trinadh Ph-7348826118

Re: Spark performance testing

2016-07-09 Thread Mich Talebzadeh
> Yea, I'm looking for any personal experiences people have had with tools > like these. > > On Jul 8, 2016, at 8:57 PM, charles li <charles.up...@gmail.com> wrote: > > Hi, Andrew, I've got lots of materials when asking google for "*spark > performance test*" &g

Re: Spark performance testing

2016-07-08 Thread Andrew Ehrlich
Yea, I'm looking for any personal experiences people have had with tools like these. > On Jul 8, 2016, at 8:57 PM, charles li <charles.up...@gmail.com> wrote: > > Hi, Andrew, I've got lots of materials when asking google for "spark > performance test" > > h

Re: Spark performance testing

2016-07-08 Thread charles li
Hi, Andrew, I've got lots of materials when asking google for "*spark performance test*" - https://github.com/databricks/spark-perf - https://spark-summit.org/2014/wp-content/uploads/2014/06/Testing-Spark-Best-Practices-Anupama-Shetty-Neil-Marshall.pdf - http://people

Spark performance testing

2016-07-08 Thread Andrew Ehrlich
Hi group, What solutions are people using to do performance testing and tuning of spark applications? I have been doing a pretty manual technique where I lay out an Excel sheet of various memory settings and caching parameters and then execute each one by hand. It’s pretty tedious though, so

Re: Is that normal spark performance?

2016-06-15 Thread Deepak Goel
r-3] [TaskSetManager] > Finished task 1.0 in stage 1.0 (TID 6) in 199 ms on node1 (4/5) > [2016-06-15 09:26:01.599] [INFO ] [task-result-getter-2] [TaskSetManager] > Finished task 3.0 in stage 1.0 (TID 8) in 200 ms on node1 (5/5) > [2016-06-15 09:26:01.599] [INFO ] [task-result-getter-2]

Re: Is that normal spark performance?

2016-06-15 Thread Jörn Franke
s on node1 (4/5) > [2016-06-15 09:26:01.599] [INFO ] [task-result-getter-2] [TaskSetManager] > Finished task 3.0 in stage 1.0 (TID 8) in 200 ms on node1 (5/5) > [2016-06-15 09:26:01.599] [INFO ] [task-result-getter-2] [TaskSchedulerImpl] > Removed TaskSet 1.0, whose tasks have all completed, from pool > [2016-06-15 09:26:01.599] [INFO ] [dag-scheduler-event-loop] [DAGScheduler] > ResultStage 1 (collect at EquityTCAAnalytics.java:88) finished in 0.202 s > [2016-06-15 09:26:01.612] [INFO ] [main] [DAGScheduler] Job 0 finished: > collect at EquityTCAAnalytics.java:88, took 32.496470 s > [2016-06-15 09:26:01.634] [INFO ] [main] [EquityTCAAnalytics] [((2016-06-10 > 13:45:00.0,DA),6944), ((2016-06-10 14:25:00.0,B),5241), ..., ((2016-06-10 > 10:55:00.0,QD),109080), ((2016-06-10 14:55:00.0,A),1300)] > [2016-06-15 09:26:01.641] [INFO ] [main] [EquityTCAAnalytics] finish > 32.5 s is normal? > View this message in context: Is that normal spark performance? > Sent from the Apache Spark User List mailing list archive at Nabble.com.

Is that normal spark performance?

2016-06-15 Thread nikita.dobryukha
). I've created an example:32.5 s is normal? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-that-normal-spark-performance-tp27174.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread Gajanan Satone
2016/5/18 5:16 > To: user @spark <user@spark.apache.org> > Subject: Re: My notes on Spark Performance & Tuning Guide > > Hi all, > > Many thanks for your tremendous interest in the forthcoming notes. I have > had nearly thirty requests and many supporting kind words fro

Re: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread 谭成灶
Thanks for your sharing! Please include me too. From: Mich Talebzadeh<mailto:mich.talebza...@gmail.com> Sent: 2016/5/18 5:16 To: user @spark<mailto:user@spark.apache.org> Subject: Re: My notes on Spark Performance & Tuning Guide Hi all, Many

Re: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread Jeff Zhang
I think you can write it in gitbook and share it in user mail list then everyone can comment on that. On Wed, May 18, 2016 at 10:12 AM, Vinayak Agrawal < vinayakagrawa...@gmail.com> wrote: > Please include me too. > > Vinayak Agrawal > Big Data Analytics > IBM > > "To Strive, To Seek, To Find

Re: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread Vinayak Agrawal
Please include me too. Vinayak Agrawal Big Data Analytics IBM "To Strive, To Seek, To Find and Not to Yield!" ~Lord Alfred Tennyson > On May 17, 2016, at 2:15 PM, Mich Talebzadeh > wrote: > > Hi all, > > Many thanks for your tremendous interest in the forthcoming

Re: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread Abi
Please include me too On May 12, 2016 6:08:14 AM EDT, Mich Talebzadeh wrote: >Hi Al,, > > >Following the threads in spark forum, I decided to write up on >configuration of Spark including allocation of resources and >configuration >of driver, executors, threads,

Re: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread Cesar Flores
Please send it to me too! Thanks! Cesar Flores On Tue, May 17, 2016 at 4:55 PM, Femi Anthony wrote: > Please send it to me as well. > > Thanks > > Sent from my iPhone > > On May 17, 2016, at 12:09 PM, Raghavendra Pandey < > raghavendra.pan...@gmail.com> wrote: > >

Re: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread Femi Anthony
Please send it to me as well. Thanks Sent from my iPhone > On May 17, 2016, at 12:09 PM, Raghavendra Pandey > wrote: > > Can you please send me as well. > > Thanks > Raghav > >> On 12 May 2016 20:02, "Tom Ellis" wrote: >> I would like to

Re: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread rakesh sharma
It would be a rare doc. Please share Get Outlook for Android On Tue, May 17, 2016 at 9:14 AM -0700, "Natu Lauchande" > wrote: Hi Mich, I am also interested in the write up. Regards, Natu On Thu, May 12, 2016 at 12:08

Re: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread Natu Lauchande
Hi Mich, I am also interested in the write up. Regards, Natu On Thu, May 12, 2016 at 12:08 PM, Mich Talebzadeh wrote: > Hi Al,, > > > Following the threads in spark forum, I decided to write up on > configuration of Spark including allocation of resources and

Re: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread Raghavendra Pandey
Can you please send me as well. Thanks Raghav On 12 May 2016 20:02, "Tom Ellis" wrote: > I would like to also Mich, please send it through, thanks! > > On Thu, 12 May 2016 at 15:14 Alonso Isidoro wrote: > >> Me too, send me the guide. >> >> Sent from

Re: My notes on Spark Performance & Tuning Guide

2016-05-12 Thread Tom Ellis
I would like to also Mich, please send it through, thanks! On Thu, 12 May 2016 at 15:14 Alonso Isidoro wrote: > Me too, send me the guide. > > Sent from my iPhone > > On 12 May 2016, at 12:11, Ashok Kumar >

Re: My notes on Spark Performance & Tuning Guide

2016-05-12 Thread Alonso Isidoro
Me too, send me the guide. Sent from my iPhone > On 12 May 2016, at 12:11, Ashok Kumar > wrote: > > Hi Dr Mich, > > I will be very keen to have a look at it and review if possible. > > Please forward me a copy > > Thanking you warmly > > > On

Re: My notes on Spark Performance & Tuning Guide

2016-05-12 Thread Ashok Kumar
Hi Dr Mich, I will be very keen to have a look at it and review if possible. Please forward me a copy Thanking you warmly On Thursday, 12 May 2016, 11:08, Mich Talebzadeh wrote: Hi Al,, Following the threads in spark forum, I decided to write up on

My notes on Spark Performance & Tuning Guide

2016-05-12 Thread Mich Talebzadeh
Hi All, Following the threads in the spark forum, I decided to write up on configuration of Spark including allocation of resources and configuration of driver, executors, threads, execution of Spark apps and general troubleshooting taking into account the allocation of resources for Spark

Re: Hive on Spark performance

2016-03-13 Thread Mich Talebzadeh
Depending on the version of Hive on Spark engine. As far as I am aware the latest version of Hive that I am using (Hive 2) has improvements compared to the previous versions of Hive (0.14,1.2.1) on Spark engine. As of today I have managed to use Hive 2.0 on Spark version 1.3.1. So it is not the

spark performance non-linear response

2015-10-07 Thread Yadid Ayzenberg
) performance improvement as the cluster size increases (plot below). Each node has 4 cores and each worker is configured to use 10GB of RAM. I would expect a more linear response given the number of partitions and the fact that all of the data is cached. Can anyone suggest what I

Re: spark performance non-linear response

2015-10-07 Thread Sean Owen
more or > less equal and entirely cached in RAM. > I evaluated the performance on several cluster sizes, and am witnessing a > non linear (power) performance improvement as the cluster size increases > (plot below). Each node has 4 cores and each worker is configured to use > 10GB o

Re: spark performance non-linear response

2015-10-07 Thread Yadid Ayzenberg
cores and each worker is configured to use 10GB of RAM. I would expect a more linear response given the number of partitions and the fact that all of the data is cached. Can anyone suggest what I should tweak in order to improve the performance? Or perhaps provide

Re: spark performance non-linear response

2015-10-07 Thread Jonathan Coveney
048 partitions which are more or > less equal and entirely cached in RAM. > I evaluated the performance on several cluster sizes, and am witnessing a > non linear (power) performance improvement as the cluster size increases > (plot below). Each node has 4 cores and each worker is configured t

Re: flatmap() and spark performance

2015-09-28 Thread Hemant Bhanawat
You can use spark.executor.memory to specify the memory of the executors, which will hold these intermediate results. You may want to look at the section "Understanding Memory Management in Spark" of this link:

Re: spark performance - executor computing time

2015-09-17 Thread Adrian Tanase
lto:user@spark.apache.org>" Subject: Re: spark performance - executor computing time Is this repeatable? Do you always get one or two executors that are 6 times as slow? It could be that some of your tasks have more work to do (maybe you are filtering some records out? If it’s always one p

Re: spark performance - executor computing time

2015-09-16 Thread Robin East
Is this repeatable? Do you always get one or two executors that are 6 times as slow? It could be that some of your tasks have more work to do (maybe you are filtering some records out? If it’s always one particular worker node is there something about the machine configuration (e.g. CPU speed)

spark performance - executor computing time

2015-09-15 Thread patcharee
Hi, I was running a job (on Spark 1.5 + Yarn + Java 8). In a stage that performs lookup (org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:873)), there was an executor whose computing time was more than 6 times the median. This executor had almost the same shuffle read size and

DataFrames in Spark - Performance when interjected with RDDs

2015-09-07 Thread Pallavi Rao
Hello All, I had a question regarding the performance optimization (Catalyst Optimizer) of DataFrames. I understand that DataFrames are interoperable with RDDs. If I switch back and forth between DataFrames and RDDs, does the performance optimization still kick in? I need to switch to RDDs to

blogs/articles/videos on how to analyse spark performance

2015-08-19 Thread Todd
Hi, I would like to ask if there are some blogs/articles/videos on how to analyse Spark performance during runtime, e.g., tools that can be used or something related.

Re: blogs/articles/videos on how to analyse spark performance

2015-08-19 Thread Gourav Sengupta
to analyse spark performance during runtime,eg, tools that can be used or something related.

Re: blogs/articles/videos on how to analyse spark performance

2015-08-19 Thread Igor Berman
, Gourav Sengupta On Wed, Aug 19, 2015 at 4:12 PM, Todd bit1...@163.com wrote: Hi, I would ask if there are some blogs/articles/videos on how to analyse spark performance during runtime,eg, tools that can be used or something related.

RE: Spark performance

2015-07-13 Thread Mohammed Guller
. Mohammed From: Michael Segel [mailto:msegel_had...@hotmail.com] Sent: Sunday, July 12, 2015 6:59 AM To: Mohammed Guller Cc: David Mitchell; Roman Sokolov; user; Ravisankar Mani Subject: Re: Spark performance Not necessarily. It depends on the use case and what you intend to do with the data. 4-6

Re: Spark performance

2015-07-12 Thread santoshv98
cluster. So why are you planning to move away from MSSql and move to Spark as the destination platform? You said “Spark performance” is slow as compared to MSSql. What kind of load are you running and what kind of querying are you performing? There may be startup costs associated with running

Re: Spark performance

2015-07-11 Thread Jörn Franke
What is your business case for the move? On Fri, 10 Jul 2015 at 12:49, Ravisankar Mani rrav...@gmail.com wrote: Hi everyone, I have planned to move mssql server to spark?. I have using around 50,000 to 1l records. The spark performance is slow when compared to mssql server. What

Re: Spark performance

2015-07-11 Thread David Mitchell
@spark.apache.org *Subject:* Spark performance Hi everyone, I have planned to move mssql server to spark?. I have using around 50,000 to 1l records. The spark performance is slow when compared to mssql server. What is the best data base(Spark or sql) to store or retrieve data around

RE: Spark performance

2015-07-11 Thread Roman Sokolov
...@gmail.com] *Sent:* Friday, July 10, 2015 3:50 AM *To:* user@spark.apache.org *Subject:* Spark performance Hi everyone, I have planned to move mssql server to spark?. I have using around 50,000 to 1l records. The spark performance is slow when compared to mssql server. What is the best

RE: Spark performance

2015-07-11 Thread Mohammed Guller
To: Roman Sokolov Cc: Mohammed Guller; user; Ravisankar Mani Subject: Re: Spark performance You can certainly query over 4 TB of data with Spark. However, you will get an answer in minutes or hours, not in milliseconds or seconds. OLTP databases are used for web applications, and typically return

Re: Spark performance

2015-07-11 Thread Jörn Franke
[mailto:rrav...@gmail.com] *Sent:* Friday, July 10, 2015 3:50 AM *To:* user@spark.apache.org *Subject:* Spark performance Hi everyone, I have planned to move mssql server to spark?. I have using around 50,000 to 1l records. The spark performance is slow when compared to mssql server

Re: Spark performance

2015-07-11 Thread Jörn Franke
for that volume of data. Mohammed *From:* Ravisankar Mani [mailto:rrav...@gmail.com] *Sent:* Friday, July 10, 2015 3:50 AM *To:* user@spark.apache.org *Subject:* Spark performance Hi everyone, I have planned to move mssql server to spark?. I have using around 50,000 to 1l records. The spark

Spark performance

2015-07-10 Thread Ravisankar Mani
Hi everyone, I have planned to move from MSSQL Server to Spark. I am using around 50,000 to 1 lakh (100,000) records. Spark performance is slow compared to MSSQL Server. What is the best database (Spark or SQL) to store or retrieve around 50,000 to 1 lakh records? regards, Ravi

RE: Spark performance

2015-07-10 Thread Mohammed Guller
will perform much better for that volume of data. Mohammed From: Ravisankar Mani [mailto:rrav...@gmail.com] Sent: Friday, July 10, 2015 3:50 AM To: user@spark.apache.org Subject: Spark performance Hi everyone, I have planned to move mssql server to spark?. I have using around 50,000 to 1l records

Re: Spark performance issue

2015-07-03 Thread Silvio Fiorito
Date: Friday, July 3, 2015 at 8:58 AM To: user@spark.apache.orgmailto:user@spark.apache.org Subject: Spark performance issue Hello guys, I'm after some advice on Spark performance. I've a MapReduce job that read inputs carry out a simple calculation and write the results into HDFS. I've

Spark performance issue

2015-07-03 Thread diplomatic Guru
Hello guys, I'm after some advice on Spark performance. I have a MapReduce job that reads inputs, carries out a simple calculation, and writes the results into HDFS. I've implemented the same logic in a Spark job. When I tried both jobs on the same datasets, I'm getting different execution times, which

Does spark performance really scale out with multiple machines?

2015-06-15 Thread Wang, Ningjun (LNG-NPV)
I am trying to measure how Spark standalone cluster performance scales out with multiple machines. I ran a test training an SVM model, which is heavy in in-memory computation. I measured the run time for a Spark standalone cluster of 1 - 3 nodes; the results are as follows: 1 node: 35 minutes; 2 nodes: 30.1

Re: Does spark performance really scale out with multiple machines?

2015-06-15 Thread William Briggs
There are a lot of variables to consider. I'm not an expert on Spark, and my ML knowledge is rudimentary at best, but here are some questions whose answers might help us to help you: - What type of Spark cluster are you running (e.g., Stand-alone, Mesos, YARN)? - What does the HTTP UI

Re: Does spark performance really scale out with multiple machines?

2015-06-15 Thread William Briggs
I just wanted to clarify - when I said you hit your maximum level of parallelism, I meant that the default number of partitions might not be large enough to take advantage of more hardware, not that there was no way to increase your parallelism - the documentation I linked gives a few suggestions
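The default partition count the reply refers to can be raised globally in spark-defaults.conf, or per-RDD; the value below is illustrative (a common rule of thumb is 2-3 tasks per CPU core in the cluster):

```
# spark-defaults.conf
spark.default.parallelism  192   # e.g. ~3 tasks per core on a 64-core cluster
```

Alternatively, call `rdd.repartition(n)` on the specific RDDs that are too coarse, or pass a numPartitions argument to shuffle operations such as reduceByKey.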

Re: Spark performance in cluster mode using yarn

2015-05-15 Thread Sachin Singh
results, Thanks in advance, -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-performance-in-cluster-mode-using-yarn-tp22877.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark performance in cluster mode using yarn

2015-05-14 Thread ayan guha
://apache-spark-user-list.1001560.n3.nabble.com/Spark-performance-in-cluster-mode-using-yarn-tp22877.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr

Spark performance in cluster mode using yarn

2015-05-13 Thread sachin Singh
this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-performance-in-cluster-mode-using-yarn-tp22877.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail

Re: Spark Performance on Yarn

2015-04-22 Thread Ted Yu
suffice. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729p22610.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark Performance on Yarn

2015-04-22 Thread nsalian
+1 to executor-memory to 5g. Do check the overhead space for both the driver and the executor as per Wilfred's suggestion. Typically, 384 MB should suffice. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729p22610.html Sent

Re: Spark Performance on Yarn

2015-04-22 Thread Neelesh Salian
: +1 to executor-memory to 5g. Do check the overhead space for both the driver and the executor as per Wilfred's suggestion. Typically, 384 MB should suffice. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn

Re: Spark Performance on Yarn

2015-04-21 Thread hnahak
Try --executor-memory 5g, because you have 8 GB RAM in each machine -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729p22603.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark Performance on Yarn

2015-04-20 Thread Peng Cheng
I got exactly the same problem, except that I'm running on a standalone master. Can you tell me the counterpart parameter on standalone master for increasing the same memroy overhead? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn

RE: General configurations on CDH5 to achieve maximum Spark Performance

2015-04-17 Thread Evo Eftimov
' Subject: RE: General configurations on CDH5 to achieve maximum Spark Performance Essentially to change the performance yield of software cluster infrastructure platform like spark you play with different permutations of: - Number of CPU cores used by Spark Executors on every cluster

General configurations on CDH5 to achieve maximum Spark Performance

2015-04-16 Thread Manish Gupta 8
Hi, Is there a document/link that describes the general configuration settings to achieve maximum Spark performance while running on CDH5? In our environment, we made a lot of changes (and are still doing so) to get decent performance; otherwise our 6-node dev cluster with default configurations lags

RE: General configurations on CDH5 to achieve maximum Spark Performance

2015-04-16 Thread Evo Eftimov
because all worker instances run in the memory of a single machine .. Regards, Evo Eftimov From: Manish Gupta 8 [mailto:mgupt...@sapient.com] Sent: Thursday, April 16, 2015 6:03 PM To: user@spark.apache.org Subject: General configurations on CDH5 to achieve maximum Spark Performance Hi

RE: General configurations on CDH5 to achieve maximum Spark Performance

2015-04-16 Thread Manish Gupta 8
. Thanks, Manish From: Evo Eftimov [mailto:evo.efti...@isecc.com] Sent: Thursday, April 16, 2015 10:38 PM To: Manish Gupta 8; user@spark.apache.org Subject: RE: General configurations on CDH5 to achieve maximum Spark Performance Well there are a number of performance tuning guidelines in dedicated

RE: General configurations on CDH5 to achieve maximum Spark Performance

2015-04-16 Thread Evo Eftimov
-on-yarn.html From: Manish Gupta 8 [mailto:mgupt...@sapient.com] Sent: Thursday, April 16, 2015 6:21 PM To: Evo Eftimov; user@spark.apache.org Subject: RE: General configurations on CDH5 to achieve maximum Spark Performance Thanks Evo. Yes, my concern is only regarding the infrastructure

Re: General configurations on CDH5 to achieve maximum Spark Performance

2015-04-16 Thread Sean Owen
, 2015 at 6:02 PM, Manish Gupta 8 mgupt...@sapient.com wrote: Hi, Is there a document/link that describes the general configuration settings to achieve maximum Spark Performance while running on CDH5? In our environment, we did lot of changes (and still doing it) to get decent performance

Spark Performance -Hive or Hbase?

2015-03-25 Thread Siddharth Ubale
Hi, We have started R&D on Apache Spark to use features such as Spark SQL and Spark Streaming. I have two pain points; can any of you address them? They are as follows: 1. Does Spark allow us to fetch updated items after an RDD has been mapped and a schema has been

Need some help on the Spark performance on Hadoop Yarn

2015-03-19 Thread Yi Ming Huang
Dear Spark experts, I would appreciate it if you could look into my problem and give me some help and suggestions here... Thank you! I have a simple Spark application to parse and analyze logs, and I can run it on my Hadoop YARN cluster. The problem is that it runs quite slowly on the cluster,

Re: Spark Performance on Yarn

2015-02-23 Thread Lee Bierman
.ec2.internal:42535/user/Coarse -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729p21739.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark performance tuning

2015-02-22 Thread Akhil Das
To: user@spark.apache.org Subject: Spark performance tuning Date: Fri, 20 Feb 2015 16:04:23 -0500 Hi, I am new to Spark, and I am trying to test the Spark SQL performance vs Hive. I setup a standalone box, with 24 cores and 64G memory. We have one SQL in mind to test. Here

Re: Spark Performance on Yarn

2015-02-21 Thread Davies Liu
/container/application_1423083596644_0238/container_1423083596644_0238_01_004160 org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sparkDriver@ip-10-168-86-13.ec2.internal:42535/user/Coarse -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark

RE: Spark performance tuning

2015-02-21 Thread java8964
Can someone share some ideas about how to tune the GC time? Thanks From: java8...@hotmail.com To: user@spark.apache.org Subject: Spark performance tuning Date: Fri, 20 Feb 2015 16:04:23 -0500 Hi, I am new to Spark, and I am trying to test the Spark SQL performance vs Hive. I setup

Re: Spark Performance on Yarn

2015-02-20 Thread Sean Owen
); }) .foreachpartition(dostuff) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark Performance on Yarn

2015-02-20 Thread Kelvin Chu
org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sparkDriver@ip-10-168-86-13.ec2.internal:42535/user/Coarse -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729p21739.html Sent from the Apache Spark User List mailing list archive

Spark performance tuning

2015-02-20 Thread java8964
Hi, I am new to Spark, and I am trying to test Spark SQL performance vs Hive. I set up a standalone box with 24 cores and 64G memory. We have one SQL query in mind to test. Here is the basic setup on this one box for the SQL we are trying to run: 1) Dataset 1, 6.6G AVRO file with snappy

Re: Spark Performance on Yarn

2015-02-20 Thread Sandy Ryza
.nabble.com/Spark-Performance-on-Yarn-tp21729p21739.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user

Re: Spark Performance on Yarn

2015-02-20 Thread Lee Bierman
.nabble.com/Spark-Performance-on-Yarn-tp21729p21739.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h

Re: Spark Performance on Yarn

2015-02-20 Thread lbierman
org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sparkDriver@ip-10-168-86-13.ec2.internal:42535/user/Coarse -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729p21739.html Sent from the Apache Spark User List mailing list archive

Re: Spark Performance on Yarn

2015-02-20 Thread Sandy Ryza
(right); }) .foreachpartition(dostuff) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark Performance on Yarn

2015-02-20 Thread Sandy Ryza
/container_1423083596644_0238_01_004160 org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sparkDriver@ip-10-168-86-13.ec2.internal:42535/user/Coarse -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729p21739.html Sent from the Apache Spark

Re: Spark Performance on Yarn

2015-02-20 Thread Kelvin Chu
-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729p21739.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional

Re: Spark Performance on Yarn

2015-02-20 Thread Sandy Ryza
/application_1423083596644_0238/container_1423083596644_0238_01_004160 org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sparkDriver@ip-10-168-86-13.ec2.internal:42535/user/Coarse -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance
