The Data Source API should work for this purpose.
It supports column pruning and predicate push down:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala

Examples can also be found in the unit tests:
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/sources
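
As a rough sketch of the idea (all of the class, package, and table names below are made up, and details such as whether PrunedFilteredScan is a trait or an abstract class and where the type classes live differ a bit between 1.2.x and master), a relation that receives the pruned columns and the pushed-down filters could look something like this:

    package com.example.accumulo

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.sources._
    import org.apache.spark.sql.types._

    // RelationProvider lets Spark SQL construct the relation from
    // CREATE TEMPORARY TABLE ... USING com.example.accumulo OPTIONS (...)
    class DefaultSource extends RelationProvider {
      override def createRelation(
          sqlContext: SQLContext,
          parameters: Map[String, String]): BaseRelation =
        new AccumuloRelation(parameters("table"))(sqlContext)
    }

    // PrunedFilteredScan hands buildScan the required columns and the
    // predicates Spark SQL could translate into sources.Filter objects.
    class AccumuloRelation(table: String)(@transient val sqlContext: SQLContext)
      extends BaseRelation with PrunedFilteredScan {

      override def schema: StructType = StructType(Seq(
        StructField("rowId", StringType, nullable = false),
        StructField("value", StringType, nullable = true)))

      override def buildScan(
          requiredColumns: Array[String],
          filters: Array[Filter]): RDD[Row] = {
        // Translate the pushed-down filters into your Accumulo "query object",
        // e.g. ranges or iterator settings that do the seeks on the tablet servers.
        filters.foreach {
          case EqualTo(attr, value)     => () // exact-match range/seek
          case GreaterThan(attr, value) => () // lower-bounded range
          case _                        => () // unhandled filters are re-checked by Spark SQL
        }
        // Run the Accumulo scan and emit only requiredColumns, in order (sketch only).
        sqlContext.sparkContext.emptyRDD[Row]
      }
    }

Once that class is on the classpath, you should be able to register it from the SQLContext/HiveContext with something like CREATE TEMPORARY TABLE docs USING com.example.accumulo OPTIONS (table 'docs'), and supported WHERE clauses on that table should arrive in buildScan as Filter objects. Spark SQL re-evaluates any filters you do not handle, so the pushdown is an optimization rather than a correctness requirement.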


From: Corey Nolet [mailto:cjno...@gmail.com]
Sent: Friday, January 16, 2015 1:51 PM
To: user
Subject: Spark SQL Custom Predicate Pushdown

I have document storage services in Accumulo that I'd like to expose to Spark 
SQL. I am able to push down predicate logic to Accumulo to have it perform only 
the seeks necessary on each tablet server to grab the results being asked for.

I'm interested in using Spark SQL to push those predicates down to the tablet 
servers. Where would I begin my implementation? Currently I have an input format 
which accepts a "query object" that gets pushed down. How would I extract this 
information from the HiveContext/SQLContext to be able to push this down?
