You can hack around this by constructing the logical plan yourself and then creating a DataFrame from it in order to execute it. Note that this all depends on framework internals and can break when Spark is upgraded.
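A minimal sketch of that approach, written against Spark 1.5-era internals (the DataFrame constructor that takes a LogicalPlan is private[sql], so the code has to live inside the org.apache.spark.sql package, and the exact classes may move between releases; the column names student_id and teacherName are just the examples from this thread):

  // Declared inside org.apache.spark.sql to reach the private[sql]
  // DataFrame constructor -- this is the internals dependency.
  package org.apache.spark.sql

  import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
  import org.apache.spark.sql.catalyst.expressions.{EqualTo, Literal}
  import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}

  object PlanRewriter {
    def rewrite(sqlContext: SQLContext, userQuery: String): DataFrame = {
      // Let Spark parse the SQL, then take the raw logical plan.
      val parsed: LogicalPlan =
        sqlContext.sql(userQuery).queryExecution.logical

      // 1. Reject under-constrained queries: collect the columns
      //    referenced in Filter conditions and require the mandatory one.
      val filteredCols = parsed.collect { case Filter(cond, _) =>
        cond.collect { case a: UnresolvedAttribute => a.name }
      }.flatten.toSet
      require(filteredCols.contains("student_id"),
        "query must constrain student_id")

      // 2. Augment: wrap the user's plan in an extra Filter node,
      //    e.g. teacherName = 'Smith' looked up from your private mapping.
      val augmented = Filter(
        EqualTo(UnresolvedAttribute("teacherName"), Literal("Smith")),
        parsed)

      // Hand the rewritten plan back to Spark; analysis and
      // optimization run as usual when the DataFrame is executed.
      new DataFrame(sqlContext, augmented)
    }
  }

Since analysis is deferred to execution, the injected UnresolvedAttribute gets resolved, and the extra predicate participates in partition pruning, through the normal analyzer/optimizer passes.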
On Thu, Nov 5, 2015 at 4:18 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:

> I don't think a view would help -- in the case of under-constraining, I
> want to make sure that the user is constraining a column (e.g. I want to
> restrict them to querying a single partition at a time, but I don't care
> which one)...a view per partition value is not practical due to the
> fairly high cardinality...
>
> In the case of predicate augmentation, the additional predicate depends
> on the value the user is providing, e.g. my data is partitioned under
> teacherName but the end users don't have this information. So if they
> ask for student_id="1234" I'd like to add teacherName='Smith' based on a
> mapping that is not surfaced to the user (sorry for the contrived
> example)...But I don't think I can do this with a view. A join would
> produce the right answer but is counter-productive, as my goal is to
> minimize the partitions being processed.
>
> I can parse the query myself -- I was not fond of this solution, as I'd
> go from SQL string to parse tree and back to an augmented SQL string,
> only to have Spark repeat the first part of the exercise...but I will if
> need be. And yes, I'd have to be able to process sub-queries too...
>
> On Thu, Nov 5, 2015 at 5:50 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Would it be possible to use views to address some of your requirements?
>>
>> Alternatively it might be better to parse it yourself. There are open
>> source libraries for it if you really need a complete SQL parser. Do
>> you want to do it on sub-queries?
>>
>> On 05 Nov 2015, at 23:34, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
>>
>> Hi folks, not sure if this belongs on the dev or user list...sending to
>> dev as it seems a bit convoluted.
>>
>> I have a UI in which we allow users to write ad-hoc queries against a
>> (very large, partitioned) table. I would like to analyze the queries
>> prior to execution for two purposes:
>>
>> 1. Reject under-constrained queries (i.e. there is a field predicate
>> that I want to make sure is always present).
>> 2. Augment the query with additional predicates (e.g. if the user asks
>> for a student_id, I also want to push a constraint on another field).
>>
>> I could parse the SQL string before passing it to Spark, but obviously
>> Spark already does this anyway. Can someone give me general direction
>> on how to do this (if possible)?
>>
>> Something like:
>>
>> myDF = sql("user_sql_query")
>> myDF.queryExecution.logical // here examine the filters provided by
>> the user, reject if under-constrained, push new filters as needed (via
>> withNewChildren?)
>>
>> At this point, with some luck, I'd have a new LogicalPlan -- what is
>> the proper way to create an execution plan on top of this new plan? I'm
>> looking at
>> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L329
>> but this method is restricted to the package. I'd really prefer to hook
>> in as early as possible and still let Spark run the plan optimizations
>> as usual.
>>
>> Any guidance or pointers much appreciated.