Spark SQL's gains come primarily from column pruning and predicate
pushdown. Here you are taking advantage of neither: SELECT * leaves
nothing to prune.
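
For example (a rough sketch, assuming tableX is registered the way you
describe below and that attribute1 is the only column you actually need
downstream):

    // SELECT * materializes every column of every row; nothing can be
    // pruned:
    val allCols = sqlContext.sql(
      "SELECT * FROM tableX WHERE attribute1 BETWEEN 0 AND 5")

    // Naming only the columns you use lets the optimizer drop the rest;
    // with a pushdown-capable source (e.g. Parquet) the BETWEEN predicate
    // could also be evaluated at the scan:
    val oneCol = sqlContext.sql(
      "SELECT attribute1 FROM tableX WHERE attribute1 BETWEEN 0 AND 5")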

I am curious to know what goes into your filter() function, as you are
not using a filter on the SQL side.
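
For reference, here is roughly how I picture your two versions side by
side (a minimal, self-contained sketch against the Spark 1.2 API; the
Record case class and the sample rows are my own stand-ins for your
JavaBean and your Kryo file):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    case class Record(attribute1: Int, attribute2: String)

    object FilterComparison {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("filter-comparison"))
        val sqlContext = new SQLContext(sc)
        // Implicitly converts RDD[Record] to a SchemaRDD (Spark 1.0-1.2):
        import sqlContext.createSchemaRDD

        // Stand-in for the ~3.5M rows loaded from the Kryo file:
        val records = sc.parallelize(
          Seq(Record(3, "a"), Record(9, "b"), Record(1, "c")))

        // "Hard-coded" version: filter the deserialized objects directly.
        val viaRdd = records.filter(r => r.attribute1 >= 0 && r.attribute1 <= 5)

        // SQL version: register the RDD as a table, then run the query.
        records.registerTempTable("tableX")
        val viaSql = sqlContext.sql(
          "SELECT * FROM tableX WHERE attribute1 BETWEEN 0 AND 5")

        println(s"RDD: ${viaRdd.count()} rows, SQL: ${viaSql.count()} rows")
        sc.stop()
      }
    }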

Best
Ayan
On 21 Apr 2015 08:05, "Renato Marroquín Mogrovejo" <
renatoj.marroq...@gmail.com> wrote:

> Does anybody have an idea? A clue? A hint?
> Thanks!
>
>
> Renato M.
>
> 2015-04-20 9:31 GMT+02:00 Renato Marroquín Mogrovejo <
> renatoj.marroq...@gmail.com>:
>
>> Hi all,
>>
>> I have a simple query, "SELECT * FROM tableX WHERE attribute1 BETWEEN 0
>> AND 5", that I run over a Kryo file with four partitions, which ends up
>> being around 3.5 million rows in our case.
>> If I run this query with a simple map().filter() it takes ~9.6 seconds,
>> but when I apply a schema, register the table in a SQLContext, and then
>> run the query, it takes ~16 seconds. This is on Spark 1.2.1 with Scala
>> 2.10.0.
>> I am wondering why there is such a big gap in performance if it is just a
>> filter. Internally, the relation's files are mapped to a JavaBean. Could
>> this difference in data representation (JavaBeans vs. Spark SQL's internal
>> representation) lead to such a gap? Is there anything I could do to bring
>> the performance closer to the "hard-coded" option?
>> Thanks in advance for any suggestions or ideas.
>>
>>
>> Renato M.
>>
>
>
