Hi all,

I have a simple query, "SELECT * FROM tableX WHERE attribute1 BETWEEN 0 AND 5", that I run over a Kryo file with four partitions, which comes to around 3.5 million rows in our case. If I run this query as a plain map().filter() over the RDD, it takes around ~9.6 seconds, but when I apply a schema, register the table in a SQLContext, and run the same query through Spark SQL, it takes around ~16 seconds. This is with Spark 1.2.1 and Scala 2.10.0.

I am wondering why there is such a big performance gap if it is just a filter. Internally, the relation's rows are mapped to a JavaBean. Could this difference in data representation (JavaBeans vs. Spark SQL's internal representation) account for the gap? Is there anything I could do to bring the performance closer to the "hard-coded" option?

Thanks in advance for any suggestions or ideas.
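For concreteness, here is roughly what the two versions look like (Record is a stand-in for our JavaBean, and the tiny inline dataset replaces the real Kryo file; only tableX and attribute1 are the actual names from our job):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Stand-in for our JavaBean; attribute1 is the real field name, the rest is elided.
case class Record(attribute1: Int)

object FilterComparison {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("filter-comparison").setMaster("local[4]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD[Product] => SchemaRDD (Spark 1.2 API)

    // Tiny illustrative dataset in place of the real 3.5M-row, 4-partition Kryo file
    val rdd = sc.parallelize(Seq(Record(1), Record(3), Record(7)), 4)

    // Version 1: plain RDD filter -- the "hard-coded" path (~9.6 s for us)
    val direct = rdd.filter(r => r.attribute1 >= 0 && r.attribute1 <= 5)

    // Version 2: schema applied + table registered + SQL query (~16 s for us)
    rdd.registerTempTable("tableX")
    val viaSql = sqlContext.sql(
      "SELECT * FROM tableX WHERE attribute1 BETWEEN 0 AND 5")

    // Both paths should return the same rows
    println(direct.count())
    println(viaSql.count())
    sc.stop()
  }
}
```

Both versions produce identical results; the question is purely about the runtime difference between the two paths.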
Renato M.