Hi all,
I have a PySpark SQL script that loads one table of 80 MB and one of 2 MB,
while the remaining three are small tables; the script performs lots of joins to fetch
the data.
My system configuration is
4 nodes, 300 GB, 64 cores.
To write a data frame of 24 MB of records into a table, the system is taking 4
The answer is it depends :)
The fact that query runtime increases indicates more shuffle. You may want
to construct RDDs based on the keys you use.
You may want to specify what kind of nodes you are using and how many
executors you are using. You may also want to play around with executor
memory.
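For example (a rough sketch only, with made-up values -- the right numbers depend on your nodes and cluster manager), executor settings can be passed through SparkConf, and the join-key idea amounts to keying and co-partitioning the RDDs before joining:

from pyspark import SparkConf, SparkContext

# Illustrative values only; tune these to your node size and cluster manager.
conf = (SparkConf()
        .set("spark.executor.instances", "8")   # number of executors (YARN)
        .set("spark.executor.cores", "4")
        .set("spark.executor.memory", "16g"))
sc = SparkContext(conf=conf)

# "Constructing RDDs based on the keys you use": key both sides of a join on
# the join key and partition them the same way before joining.
left = sc.parallelize([(1, "a"), (2, "b")]).partitionBy(32)
right = sc.parallelize([(1, 10.0), (2, 20.0)]).partitionBy(32)
joined = left.join(right)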
Hi,
I am a graduate student from Virginia Tech (USA) pursuing my Master's in
Computer Science. I have been researching parallel and distributed databases
and their performance when running range queries involving simple joins and
group by on large datasets. As part of my research, I tried
Quick questions: why are you caching both the RDD and the table?
Which stage of the job is slow?
On 23 Apr 2015 17:12, Nikolay Tikhonov tikhonovnico...@gmail.com wrote:
Hi,
I have a Spark SQL performance issue. My code contains a simple JavaBean:
public class Person implements Externalizable {
    private int id;
    private String name;
    private double salary;
    // The readExternal/writeExternal methods required by Externalizable are
    // omitted in this excerpt.
}
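On the caching question above, here is a minimal PySpark sketch (the original code is Java; the sample records here are invented) of caching the registered table once instead of caching both the RDD and the table:

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext(appName="person-cache-sketch")
sqlContext = SQLContext(sc)

# Made-up sample records standing in for the Person beans.
people = sc.parallelize([Row(id=1, name="a", salary=100.0),
                         Row(id=2, name="b", salary=200.0)])

df = sqlContext.createDataFrame(people)
df.registerTempTable("people")

# Cache the table once; also caching the underlying RDD would keep the same
# data in memory twice.
sqlContext.cacheTable("people")

print(sqlContext.sql("SELECT name FROM people WHERE salary > 150.0").collect())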
That totally depends on your data size and your cluster setup.
Thanks
Best Regards
On Thu, Mar 12, 2015 at 7:32 PM, Udbhav Agarwal udbhav.agar...@syncoms.com
wrote:
Hi,
What is the query time for a join query on HBase with Spark SQL? Say the tables in
HBase have 0.5 million records each. I am
Thanks Akhil,
What more info should I give so we can estimate query time in my scenario?
Thanks,
Udbhav Agarwal
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: 13 March, 2015 12:01 PM
To: Udbhav Agarwal
Cc: user@spark.apache.org
Subject: Re: spark sql performance
That totally depends
To: Udbhav Agarwal
Cc: user@spark.apache.org
Subject: Re: spark sql performance
The size/type of your data and your cluster configuration would be fine, I
think.
Thanks
Best Regards
On Fri, Mar 13, 2015 at 12:07 PM, Udbhav Agarwal
udbhav.agar...@syncoms.com wrote:
Okay Akhil! Thanks for the information.
Thanks,
Udbhav Agarwal
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: 13 March, 2015 12:34 PM
To: Udbhav Agarwal
Cc: user@spark.apache.org
Subject: Re: spark sql performance
Can't say that unless you try it.
Thanks
Best Regards
On Fri, Mar 13
Thanks,
Udbhav Agarwal
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: 13 March, 2015 12:27 PM
To: Udbhav Agarwal
Cc: user@spark.apache.org
Subject: Re: spark sql performance
So you can cache up to 8 GB of data in memory (hoping your data size of one table
is 2 GB), then it should be pretty fast with Spark SQL. Also I'm assuming you
have around 12-16 cores in total.
Thanks
Best Regards
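For context, how much can be cached is governed mainly by the executor heap and, on Spark 1.x, the storage fraction. A tiny sketch with assumed values, not taken from the thread:

from pyspark import SparkConf

# Assumed values: on Spark 1.x, roughly executor memory multiplied by
# spark.storage.memoryFraction (default 0.6) is usable for cached data
# on each executor.
conf = (SparkConf()
        .set("spark.executor.memory", "8g")
        .set("spark.storage.memoryFraction", "0.6"))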
On Fri, Mar 13, 2015 at 12:22 PM, Udbhav Agarwal
udbhav.agar
Additionally, I wanted to mention that presently I am running the query on one
machine with 3 GB of RAM, and the join query is taking around 6 seconds.
Thanks,
Udbhav Agarwal
From: Udbhav Agarwal
Sent: 13 March, 2015 12:45 PM
To: 'Akhil Das'
Cc: user@spark.apache.org
Subject: RE: spark sql performance
Okay Akhil.
I have a 4-core CPU (2.4 GHz).
Thanks,
Udbhav Agarwal
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: 13 March, 2015 1:07 PM
To: Udbhav Agarwal
Cc: user@spark.apache.org
Subject: Re: spark sql performance
You can see where it is spending time, whether there is any GC
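One way to check that (a sketch; the flag values are assumptions) is to enable GC logging on the executors and then compare task time with GC time in the Spark web UI:

from pyspark import SparkConf, SparkContext

# GC logs end up in the executors' stdout/stderr, and the web UI's stage
# pages report per-task GC time alongside task duration.
conf = (SparkConf()
        .set("spark.executor.extraJavaOptions",
             "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"))
sc = SparkContext(conf=conf)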
Hi,
What is the query time for a join query on HBase with Spark SQL? Say the tables in HBase
have 0.5 million records each. I am expecting a query time (latency) in
milliseconds with Spark SQL. Is this possible?
Thanks,
Udbhav Agarwal
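For reference, once the HBase tables are exposed to Spark SQL as registered tables, the join itself would look something like the sketch below (table and column names are invented, and loading from HBase is not shown):

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext(appName="hbase-join-sketch")
sqlContext = SQLContext(sc)

# Stand-ins for the two HBase tables; loading from HBase itself is not shown.
customers = sqlContext.createDataFrame(
    sc.parallelize([Row(id=1, name="a"), Row(id=2, name="b")]))
orders = sqlContext.createDataFrame(
    sc.parallelize([Row(customer_id=1, amount=10.0)]))
customers.registerTempTable("customers")
orders.registerTempTable("orders")

result = sqlContext.sql(
    "SELECT c.id, c.name, o.amount "
    "FROM customers c JOIN orders o ON o.customer_id = c.id")
result.show()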
-Original Message-
From: SK [mailto:skrishna...@gmail.com]
Sent: Wednesday, November 26, 2014 4:17 PM
To: u...@spark.incubator.apache.org
Subject: Spark SQL performance and data size constraints
Hi,
I use the following code to read in data and extract the unique users using
Spark SQL. The data
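The quoted code is cut off here; as a stand-in, a minimal sketch of reading a text file and extracting distinct users with a Spark 1.3+ style API might look like this (the path and record layout are assumptions, not the original poster's code):

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext(appName="unique-users-sketch")
sqlContext = SQLContext(sc)

# Hypothetical input path and record layout (user id in the first field).
lines = sc.textFile("hdfs:///path/to/events")
rows = lines.map(lambda l: Row(user_id=l.split(",")[0]))

df = sqlContext.createDataFrame(rows)
df.registerTempTable("events")

unique_users = sqlContext.sql("SELECT DISTINCT user_id FROM events")
print(unique_users.count())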