Hi all,

I have a simple query, "Select * from tableX where attribute1 between 0 and
5", that I run over a Kryo-serialized file with four partitions, which in
our case comes to around 3.5 million rows.
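
Roughly, the "hard-coded" path looks like the sketch below. MyBean and
hardCodedQuery are hypothetical stand-ins for our actual bean class and job
code, and the Kryo load step is elided; assume rows has already been read
from the file:

import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the JavaBean each record is mapped to;
// the real bean has more fields than this.
case class MyBean(attribute1: Int)

// The hand-coded equivalent of the SQL predicate, assuming rows was
// already loaded from the Kryo file.
def hardCodedQuery(rows: RDD[MyBean]): RDD[MyBean] =
  rows
    .map(identity) // stands in for whatever per-record mapping our job does
    .filter(b => b.attribute1 >= 0 && b.attribute1 <= 5)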
If I run this query as a simple map().filter() it takes around 9.6 seconds,
but when I apply a schema, register the table with a SQLContext, and then
run the same query through SQL, it takes around 16 seconds. This is using
Spark 1.2.1 with Scala 2.10.0.
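
The SQL path is set up roughly as follows (a sketch against the Spark 1.2
API; the single-column schema is simplified from our real setup, and sc is
the active SparkContext):

import org.apache.spark.sql._

// rows is the RDD[MyBean] from the sketch above.
val sqlContext = new SQLContext(sc)

// Only the column the predicate touches is shown here; the real schema
// describes all of the bean's fields.
val schema = StructType(Seq(StructField("attribute1", IntegerType, nullable = false)))

// Convert each bean to a Row so applySchema can build a SchemaRDD.
val rowRDD = rows.map(b => Row(b.attribute1))

val schemaRDD = sqlContext.applySchema(rowRDD, schema)
schemaRDD.registerTempTable("tableX")

val result = sqlContext.sql("Select * from tableX where attribute1 between 0 and 5")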
I am wondering why there is such a big gap in performance if it is just a
filter. Internally, the file's records are mapped to a JavaBean. Could this
difference in data representation (JavaBeans vs. Spark SQL's internal
representation) account for the gap? Is there anything I could do to bring
the performance closer to the "hard-coded" option?
Thanks in advance for any suggestions or ideas.


Renato M.
