I have document storage services in Accumulo that I'd like to expose to Spark SQL. I can already push predicate logic down to Accumulo so that each tablet server performs only the seeks needed to fetch the requested results.
I'd like Spark SQL to push those predicates down to the tablet servers. Where would I begin my implementation? Currently I have an input format that accepts a "query object", which is what gets pushed down. How would I extract the predicate information from the HiveContext/SQLContext so that I can build that query object and push it down?