Re: Spark SQL Custom Predicate Pushdown

2015-01-17 Thread Michael Armbrust
1) The fields in the SELECT clause are not pushed down to the predicate pushdown API. I have many optimizations that allow fields to be filtered out before the resulting object is serialized on the Accumulo tablet server. How can I get the selection information from the execution plan? I'm a

Re: Spark SQL Custom Predicate Pushdown

2015-01-17 Thread Corey Nolet
I see now. It optimizes the selection semantics so that fewer things need to be included just to do a count(). Very nice. I did a collect() instead of a count just to see what would happen, and it looks like all of the SELECT fields were propagated down as expected. Thanks. On Sat,
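For context, a rough illustration of the behavior described in this exchange; the table and column names (myTable, name, age) are placeholders, and this assumes an existing sqlContext with the custom relation registered as myTable:

```scala
// Sketch only: the terminal action changes what the optimizer asks the relation for.
val results = sqlContext.sql("SELECT name, age FROM myTable WHERE age > 21")

// A count() does not need the selected columns to produce its answer, so the
// pruned column list handed to buildScan can be much narrower than the SELECT
// clause (which is what made the pushdown look wrong at first glance).
results.count()

// A collect() has to materialize the selected fields, so name and age show up
// in requiredColumns as expected.
results.collect()
```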

Re: Spark SQL Custom Predicate Pushdown

2015-01-17 Thread Corey Nolet
Michael, What I'm seeing (in Spark 1.2.0) is that the required columns being pushed down to the DataRelation are not the product of the SELECT clause but rather just the columns explicitly included in the WHERE clause. Examples from my testing: SELECT * FROM myTable -- The required columns are

Re: Spark SQL Custom Predicate Pushdown

2015-01-17 Thread Michael Armbrust
How are you running your test here? Are you perhaps doing a .count()? On Sat, Jan 17, 2015 at 12:54 PM, Corey Nolet cjno...@gmail.com wrote: Michael, What I'm seeing (in Spark 1.2.0) is that the required columns being pushed down to the DataRelation are not the product of the SELECT clause

Re: Spark SQL Custom Predicate Pushdown

2015-01-17 Thread Corey Nolet
I did an initial implementation. There are two assumptions I had from the start that I was very surprised were not a part of the predicate pushdown API: 1) The fields in the SELECT clause are not pushed down to the predicate pushdown API. I have many optimizations that allow fields to be filtered

Re: Spark SQL Custom Predicate Pushdown

2015-01-16 Thread Corey Nolet
Hao, Thanks so much for the links! This is exactly what I'm looking for. If I understand correctly, I can extend PrunedFilteredScan, PrunedScan, and TableScan and I should be able to support all the SQL semantics? I'm a little confused about the Array[Filter] that is used with the filtered scan.
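For reference, the Array[Filter] passed to the filtered scan carries the WHERE-clause predicates that Spark SQL can translate into simple source-side filters (org.apache.spark.sql.sources.Filter). A rough, hypothetical illustration of the mapping, assuming columns named key and value:

```scala
import org.apache.spark.sql.sources.{EqualTo, Filter, GreaterThan}

// A query like ... WHERE key = 'a' AND value > 5 would typically reach
// buildScan as an array of simple filter objects along these lines:
val pushed: Array[Filter] = Array(EqualTo("key", "a"), GreaterThan("value", 5))

// A relation can act on the filters it understands and ignore the rest;
// Spark SQL still re-applies the original predicates to the rows the relation
// returns, so skipping a filter costs performance, not correctness.
pushed.foreach {
  case EqualTo(col, v)     => println(s"seek directly to $col = $v in the source")
  case GreaterThan(col, v) => println(s"range-scan $col > $v in the source")
  case other               => println(s"not handled here, Spark will filter: $other")
}
```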

RE: Spark SQL Custom Predicate Pushdown

2015-01-15 Thread Cheng, Hao
The Data Source API probably works for this purpose. It supports column pruning and predicate pushdown: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala Examples can also be found in the unit tests:
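To get a concrete feel for those interfaces, here is a minimal, made-up relation sketched against the Spark 1.2-style API in the file linked above (DummyProvider/DummyRelation and the in-memory data are purely illustrative, not part of any real connector; exact class shapes may differ in later versions):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import org.apache.spark.sql.sources._

// Registers the relation; referenced from SQL via
// CREATE TEMPORARY TABLE ... USING <this provider's class name>.
class DummyProvider extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new DummyRelation(sqlContext)
}

// PrunedFilteredScan hands buildScan both the pruned column list and the
// WHERE-clause predicates Spark SQL managed to translate into Filters.
class DummyRelation(val sqlContext: SQLContext) extends PrunedFilteredScan {

  override def schema: StructType = StructType(Seq(
    StructField("key", StringType),
    StructField("value", IntegerType)))

  // Stand-in for a real scan of an external store (e.g. an Accumulo table).
  private val data = Seq(Row("a", 1), Row("b", 2), Row("c", 3))

  override def buildScan(
      requiredColumns: Array[String],
      filters: Array[Filter]): RDD[Row] = {
    // Apply the filters we know how to handle on the "source" side; anything
    // skipped here is re-checked by Spark SQL after the scan.
    def keep(row: Row): Boolean = filters.forall {
      case EqualTo("key", v)            => row.getString(0) == v
      case GreaterThan("value", v: Int) => row.getInt(1) > v
      case _                            => true
    }
    // Keep only the requested columns, in the requested order.
    val idx = schema.fieldNames.zipWithIndex.toMap
    val rows = data.filter(keep).map { r =>
      Row(requiredColumns.map(c => r(idx(c))): _*)
    }
    sqlContext.sparkContext.parallelize(rows)
  }
}
```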