Spark [SQL] performance tuning

2020-11-12 Thread Lakshmi Nivedita
Hi all, I have a PySpark SQL script that loads one 80 MB table, one 2 MB table, and three other small tables, and performs many joins to fetch the data. My system configuration is 4 nodes, 300 GB, 64 cores. To write a data frame of 24 MB of records into a table, the system is taking 4
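A common fix for this pattern (one large table joined against several small ones) is to broadcast the small tables so the join happens locally on each worker instead of through a shuffle. The following is a minimal pure-Python sketch of what a broadcast (map-side) join does; the tables and sizes are illustrative, not taken from the thread:

```python
# Conceptual sketch of a broadcast (map-side) join: the small table is
# turned into an in-memory dict and shipped to every worker, so the large
# table is joined via a local lookup instead of a cluster-wide shuffle.
small_table = [(1, "US"), (2, "IN"), (3, "UK")]          # e.g. a 2 MB dimension table
large_table = [(101, 1), (102, 3), (103, 1), (104, 2)]   # e.g. an 80 MB fact table

lookup = dict(small_table)  # the "broadcast" copy; cheap because the table is small

joined = [(order_id, country_id, lookup[country_id])
          for order_id, country_id in large_table
          if country_id in lookup]

print(joined)
```

In PySpark itself this corresponds to joining with a broadcast hint, e.g. `large_df.join(broadcast(small_df), "country_id")` using `pyspark.sql.functions.broadcast`; Spark also broadcasts automatically when the small side is below `spark.sql.autoBroadcastJoinThreshold`.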

Re: Question on Spark SQL performance of Range Queries on Large Datasets

2015-04-27 Thread ayan guha
The answer is: it depends :) The fact that query runtime increases indicates more shuffle. You may want to construct RDDs partitioned on the keys you use. You may want to specify what kind of nodes you are using and how many executors. You may also want to play around with executor memory
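The "construct RDDs based on keys" advice refers to co-partitioning: if both datasets are hash-partitioned on the join key the same way, matching keys already live in partitions with the same index, so the join needs no shuffle. A minimal stdlib sketch of the idea, with made-up data:

```python
# Conceptual sketch of hash partitioning: records whose keys hash to the
# same bucket land in the same partition, so two datasets partitioned the
# same way can be joined partition-by-partition without a shuffle.
NUM_PARTITIONS = 4

def partition(records):
    buckets = [[] for _ in range(NUM_PARTITIONS)]
    for key, value in records:
        buckets[hash(key) % NUM_PARTITIONS].append((key, value))
    return buckets

left = partition([("a", 1), ("b", 2), ("c", 3)])
right = partition([("a", 10), ("c", 30)])

# Matching keys are guaranteed to sit in buckets with the same index.
matches = []
for i in range(NUM_PARTITIONS):
    right_lookup = dict(right[i])
    for key, value in left[i]:
        if key in right_lookup:
            matches.append((key, value, right_lookup[key]))

print(sorted(matches))
```

In Spark this is what `rdd.partitionBy(n)` (with the same partitioner on both sides) achieves before a join.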

Question on Spark SQL performance of Range Queries on Large Datasets

2015-04-27 Thread Mani
Hi, I am a graduate student from Virginia Tech (USA) pursuing my Masters in Computer Science. I’ve been researching parallel and distributed databases and their performance for running range queries involving simple joins and group-by on large datasets. As part of my research, I tried

Re: Spark SQL performance issue.

2015-04-23 Thread ayan guha
Quick questions: why are you caching both the RDD and the table? Which stage of the job is slow? On 23 Apr 2015 17:12, Nikolay Tikhonov tikhonovnico...@gmail.com wrote: Hi, I have a Spark SQL performance issue. My code contains a simple JavaBean: public class Person implements Externalizable

Re: Spark SQL performance issue.

2015-04-23 Thread Nikolay Tikhonov
...@gmail.com: Quick questions: why are you caching both the RDD and the table? Which stage of the job is slow? On 23 Apr 2015 17:12, Nikolay Tikhonov tikhonovnico...@gmail.com wrote: Hi, I have a Spark SQL performance issue. My code contains a simple JavaBean: public class Person implements Externalizable

Re: Spark SQL performance issue.

2015-04-23 Thread Arush Kharbanda
tikhonovnico...@gmail.com wrote: Hi, I have a Spark SQL performance issue. My code contains a simple JavaBean: public class Person implements Externalizable { private int id; private String name; private double salary; } Apply
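The question "why are you caching both?" matters because caching the same data twice, once as an RDD and again as a table, doubles the memory footprint without making either copy faster; the usual pattern is to cache exactly once (in the Spark 1.x API of this thread's era, `sqlContext.cacheTable("person")` or `df.cache()`, not both). A toy stdlib sketch of the compute-once principle, with hypothetical names:

```python
# Toy sketch of compute-once caching: the expensive load runs a single
# time, and every later access reuses the one cached copy.
calls = {"count": 0}

def expensive_load():
    calls["count"] += 1  # track how often the real work happens
    return [{"id": i, "name": f"p{i}", "salary": 1000.0 * i} for i in range(3)]

_cache = None
def load_people():
    global _cache
    if _cache is None:       # populate the cache once ...
        _cache = expensive_load()
    return _cache            # ... and reuse it everywhere

a = load_people()
b = load_people()
print(calls["count"])  # the load ran only once
```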

Re: spark sql performance

2015-03-13 Thread Akhil Das
That totally depends on your data size and your cluster setup. Thanks Best Regards On Thu, Mar 12, 2015 at 7:32 PM, Udbhav Agarwal udbhav.agar...@syncoms.com wrote: Hi, What is the query time for a join query on HBase with Spark SQL? Say the tables in HBase have 0.5 million records each. I am

RE: spark sql performance

2015-03-13 Thread Udbhav Agarwal
Thanks Akhil. What more info should I give so we can estimate the query time in my scenario? Thanks, Udbhav Agarwal From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: 13 March, 2015 12:01 PM To: Udbhav Agarwal Cc: user@spark.apache.org Subject: Re: spark sql performance That totally depends

RE: spark sql performance

2015-03-13 Thread Udbhav Agarwal
To: Udbhav Agarwal Cc: user@spark.apache.org Subject: Re: spark sql performance The size/type of your data and your cluster configuration would be fine, I think. Thanks Best Regards On Fri, Mar 13, 2015 at 12:07 PM, Udbhav Agarwal udbhav.agar...@syncoms.com wrote

Re: spark sql performance

2015-03-13 Thread Akhil Das
Udbhav Agarwal From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: 13 March, 2015 12:01 PM To: Udbhav Agarwal Cc: user@spark.apache.org Subject: Re: spark sql performance That totally depends on your data size and your cluster setup. Thanks Best Regards

RE: spark sql performance

2015-03-13 Thread Udbhav Agarwal
Okay Akhil! Thanks for the information. Thanks, Udbhav Agarwal From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: 13 March, 2015 12:34 PM To: Udbhav Agarwal Cc: user@spark.apache.org Subject: Re: spark sql performance Can't say that unless you try it. Thanks Best Regards On Fri, Mar 13

Re: spark sql performance

2015-03-13 Thread Akhil Das
Udbhav Agarwal From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: 13 March, 2015 12:27 PM To: Udbhav Agarwal Cc: user@spark.apache.org Subject: Re: spark sql performance So you can cache up to 8 GB of data in memory (hoping your data size of one table is 2 GB

RE: spark sql performance

2015-03-13 Thread Udbhav Agarwal
: spark sql performance So you can cache up to 8 GB of data in memory (hoping the data size of one table is 2 GB), then it should be pretty fast with Spark SQL. Also, I'm assuming you have around 12-16 cores total. Thanks Best Regards On Fri, Mar 13, 2015 at 12:22 PM, Udbhav Agarwal udbhav.agar

RE: spark sql performance

2015-03-13 Thread Udbhav Agarwal
Additionally, I wanted to mention that presently I am running the query on one machine with 3 GB RAM, and the join query is taking around 6 seconds. Thanks, Udbhav Agarwal From: Udbhav Agarwal Sent: 13 March, 2015 12:45 PM To: 'Akhil Das' Cc: user@spark.apache.org Subject: RE: spark sql performance

Re: spark sql performance

2015-03-13 Thread Akhil Das
was running the query on one machine with 3 GB RAM, and the join query was taking around 6 seconds. Thanks, Udbhav Agarwal From: Udbhav Agarwal Sent: 13 March, 2015 12:45 PM To: 'Akhil Das' Cc: user@spark.apache.org Subject: RE: spark sql performance Okay Akhil! Thanks

Re: spark sql performance

2015-03-13 Thread Akhil Das
To: Udbhav Agarwal Cc: user@spark.apache.org Subject: Re: spark sql performance The size/type of your data and your cluster configuration would be fine, I think. Thanks Best Regards On Fri, Mar 13, 2015 at 12:07 PM, Udbhav Agarwal udbhav.agar...@syncoms.com wrote

RE: spark sql performance

2015-03-13 Thread Udbhav Agarwal
Okay Akhil. I have a 4-core CPU (2.4 GHz). Thanks, Udbhav Agarwal From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: 13 March, 2015 1:07 PM To: Udbhav Agarwal Cc: user@spark.apache.org Subject: Re: spark sql performance You can see where it is spending time, and whether there is any GC

spark sql performance

2015-03-12 Thread Udbhav Agarwal
Hi, What is the query time for a join query on HBase with Spark SQL? Say the tables in HBase have 0.5 million records each. I am expecting a query time (latency) in milliseconds with Spark SQL. Is this possible? Thanks, Udbhav Agarwal
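As a rough sanity check on the "milliseconds" expectation: the join itself over 0.5 million rows per side is cheap even on a single core; in practice the HBase scan and Spark's job-scheduling overhead, not the join, dominate latency. A pure-Python hash-join sketch over synthetic data at that scale:

```python
import time

# Synthetic tables of 0.5 million rows each (keys and values are made up).
N = 500_000
left = [(i, f"name{i}") for i in range(N)]
right = [(i, i * 1.5) for i in range(0, N, 2)]   # half the keys match

start = time.perf_counter()
lookup = dict(right)                                  # build side of the hash join
matches = sum(1 for key, _ in left if key in lookup)  # probe side
elapsed = time.perf_counter() - start

# The in-memory join finishes in well under a second on commodity hardware.
print(matches, f"{elapsed:.3f}s")
```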

Spark SQL performance and data size constraints

2014-11-26 Thread SK
This message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-performance-and-data-size-constraints-tp19843.html

RE: Spark SQL performance and data size constraints

2014-11-26 Thread Cheng, Hao
-----Original Message----- From: SK [mailto:skrishna...@gmail.com] Sent: Wednesday, November 26, 2014 4:17 PM To: u...@spark.incubator.apache.org Subject: Spark SQL performance and data size constraints Hi, I use the following code to read in data and extract the unique users using Spark SQL. The data
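The code itself is truncated in the archive, but "extract the unique users" in Spark SQL is typically `SELECT DISTINCT user FROM ...` or `df.select("user").distinct()`. A stdlib sketch of the same deduplication; the records and field names here are assumed, since the original snippet is cut off:

```python
# Deduplicating a user column; the data and field names are illustrative,
# as the original code does not survive in the archive snippet.
records = [
    {"user": "alice", "action": "click"},
    {"user": "bob",   "action": "view"},
    {"user": "alice", "action": "view"},
]

unique_users = sorted({r["user"] for r in records})  # set comprehension dedups
print(unique_users)
```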