The Data Source API probably works for this purpose. It supports column pruning and predicate pushdown: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala
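As a minimal sketch of how that API receives the pruned columns and pushed-down predicates: a relation mixing in PrunedFilteredScan gets the required columns and supported Filter objects in buildScan, and that is the point where you would translate them into your Accumulo query object. AccumuloRelation and the placeholder scan logic below are hypothetical, not part of Spark or Accumulo.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types._

// Hypothetical relation exposing an Accumulo table to Spark SQL.
class AccumuloRelation(@transient val sqlContext: SQLContext)
  extends BaseRelation with PrunedFilteredScan {

  // The schema Spark SQL will plan queries against.
  override def schema: StructType = StructType(Seq(
    StructField("rowId", StringType, nullable = false),
    StructField("value", StringType, nullable = true)))

  // Spark SQL calls buildScan with only the columns the query needs
  // (column pruning) and the predicates it was able to push down
  // (Filter objects such as EqualTo, GreaterThan, In, ...). Translate
  // them into your Accumulo "query object" here.
  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = {
    filters.foreach {
      case EqualTo(attr, value) => // e.g. constrain the scan to an exact range
      case GreaterThan(attr, value) => // e.g. set the start key of the range
      case _ => // filters you don't handle are simply re-evaluated by Spark
    }
    // Placeholder: build the RDD from your existing InputFormat, e.g. via
    // sqlContext.sparkContext.newAPIHadoopRDD(...), mapped to Rows.
    sqlContext.sparkContext.emptyRDD[Row]
  }
}

// Provider so the relation can be created through the data sources API.
class DefaultSource extends RelationProvider {
  override def createRelation(sqlContext: SQLContext,
                              parameters: Map[String, String]): BaseRelation =
    new AccumuloRelation(sqlContext)
}

Once registered (e.g. CREATE TEMPORARY TABLE ... USING your.package.name), WHERE clauses on the table arrive in buildScan as filters, so nothing needs to be extracted from the HiveContext/SQLContext directly.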
Examples can also be found in the unit tests: https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/sources

From: Corey Nolet [mailto:cjno...@gmail.com]
Sent: Friday, January 16, 2015 1:51 PM
To: user
Subject: Spark SQL Custom Predicate Pushdown

I have document storage services in Accumulo that I'd like to expose to Spark SQL. I am able to push predicate logic down to Accumulo so that it performs only the seeks necessary on each tablet server to fetch the requested results. I'm interested in using Spark SQL to push those predicates down to the tablet servers. Where would I begin my implementation? Currently I have an input format that accepts a "query object" which gets pushed down. How would I extract this information from the HiveContext/SQLContext to be able to push it down?