You can hack around this by constructing the logical plan yourself and then creating a DataFrame from it in order to execute it. Note that this all depends on framework internals and can break when Spark is upgraded.
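A minimal sketch of that approach, written against Spark 1.5-era internals (the DataFrame constructor that takes a LogicalPlan is private[sql], so the code has to live inside the org.apache.spark.sql package, and the exact classes may move between releases; the column names student_id and teacherName are just the examples from this thread):

  // Declared inside org.apache.spark.sql to reach the private[sql]
  // DataFrame constructor -- this is the internals dependency.
  package org.apache.spark.sql

  import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
  import org.apache.spark.sql.catalyst.expressions.{EqualTo, Literal}
  import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}

  object PlanRewriter {
    def rewrite(sqlContext: SQLContext, userQuery: String): DataFrame = {
      // Let Spark parse the SQL, then take the raw logical plan.
      val parsed: LogicalPlan =
        sqlContext.sql(userQuery).queryExecution.logical

      // 1. Reject under-constrained queries: collect the columns
      //    referenced in Filter conditions and require the mandatory one.
      val filteredCols = parsed.collect { case Filter(cond, _) =>
        cond.collect { case a: UnresolvedAttribute => a.name }
      }.flatten.toSet
      require(filteredCols.contains("student_id"),
        "query must constrain student_id")

      // 2. Augment: wrap the user's plan in an extra Filter node,
      //    e.g. teacherName = 'Smith' looked up from your private mapping.
      val augmented = Filter(
        EqualTo(UnresolvedAttribute("teacherName"), Literal("Smith")),
        parsed)

      // Hand the rewritten plan back to Spark; analysis and
      // optimization run as usual when the DataFrame is executed.
      new DataFrame(sqlContext, augmented)
    }
  }

Since analysis is deferred to execution, the injected UnresolvedAttribute gets resolved, and the extra predicate participates in partition pruning, through the normal analyzer/optimizer passes.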
On Thu, Nov 5, 2015 at 4:18 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:

> I don't think a view would help -- in the case of under-constraining, I
> want to make sure that the user is constraining a column (e.g. I want to
> restrict them to querying a single partition at a time, but I don't care
> which one)...a view per partition value is not practical due to the
> fairly high cardinality...
>
> In the case of predicate augmentation, the additional predicate depends
> on the value the user is providing, e.g. my data is partitioned under
> teacherName but the end users don't have this information. So if they
> ask for student_id="1234" I'd like to add teacherName='Smith' based on a
> mapping that is not surfaced to the user (sorry for the contrived
> example)...But I don't think I can do this with a view. A join would
> produce the right answer but is counter-productive, as my goal is to
> minimize the partitions being processed.
>
> I can parse the query myself -- I was not fond of this solution, as I'd
> go from SQL string to parse tree and back to an augmented SQL string,
> only to have Spark repeat the first part of the exercise...but I will if
> need be. And yes, I'd have to be able to process sub-queries too...
>
> On Thu, Nov 5, 2015 at 5:50 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Would it be possible to use views to address some of your requirements?
>>
>> Alternatively it might be better to parse it yourself. There are open
>> source libraries for it if you really need a complete SQL parser. Do
>> you want to do it on sub-queries?
>>
>> On 05 Nov 2015, at 23:34, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
>>
>> Hi folks, not sure if this belongs on the dev or user list...sending to
>> dev as it seems a bit convoluted.
>>
>> I have a UI in which we allow users to write ad-hoc queries against a
>> (very large, partitioned) table. I would like to analyze the queries
>> prior to execution for two purposes:
>>
>> 1. Reject under-constrained queries (i.e. there is a field predicate
>> that I want to make sure is always present).
>> 2. Augment the query with additional predicates (e.g. if the user asks
>> for a student_id, I also want to push a constraint on another field).
>>
>> I could parse the SQL string before passing it to Spark, but obviously
>> Spark already does this anyway. Can someone give me general direction
>> on how to do this (if possible)?
>>
>> Something like:
>>
>> myDF = sql("user_sql_query")
>> myDF.queryExecution.logical // here examine the filters provided by
>> the user, reject if under-constrained, push new filters as needed (via
>> withNewChildren?)
>>
>> At this point, with some luck, I'd have a new LogicalPlan -- what is
>> the proper way to create an execution plan on top of this new plan? I'm
>> looking at
>> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L329
>> but this method is restricted to the package. I'd really prefer to hook
>> in as early as possible and still let Spark run the plan optimizations
>> as usual.
>>
>> Any guidance or pointers much appreciated.