[jira] [Commented] (SPARK-18455) General support for subquery processing

Nattavut Sutyanyong (JIRA) Wed, 16 Nov 2016 10:20:21 -0800

    [ https://issues.apache.org/jira/browse/SPARK-18455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671197#comment-15671197 ]


Nattavut Sutyanyong commented on SPARK-18455:
---------------------------------------------

This comment captures a discussion from the PR for SPARK-17348 about one of the 
rationales behind the implementation choice to transform correlated 
subqueries into joins in the Analyzer phase.

[~hvanhovell] wrote:
{quote}
{code:sql}
-- hive: subquery_exists_having.q
select b.key, min(b.value)
from src b
group by b.key
having exists ( select a.key
                from src a
                where a.value > 'val_9' and a.value = min(b.value)
                )
{code}

The difficulty here is that we need to evaluate the min(b.value) in the 
aggregate. So we needed a way to extract the entire min(b.value) expression. 
The most straightforward way was to extract the entire predicate and rewrite 
the tree in the process. This is quite an aggressive approach, and it breaks as 
soon as you cannot/should not move the predicate. In hindsight it might have 
been better to isolate the entire outer expression instead of only isolating 
the outer reference, and to do the rewriting in a later stage.
{quote}
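
For illustration, the join form that this kind of transformation targets could 
be hand-written roughly as below. This is only a sketch of an equivalent query, 
not the plan the Analyzer actually produces; the alias agg and the column name 
min_value are made up for the example. It evaluates min(b.value) in the 
aggregate first, and then applies the former subquery predicate as the 
condition of a LEFT SEMI join.

{code:sql}
-- hand-written sketch; agg and min_value are hypothetical names
select agg.key, agg.min_value
from ( select b.key, min(b.value) as min_value
       from src b
       group by b.key
     ) agg
left semi join src a
  on a.value > 'val_9' and a.value = agg.min_value
{code}

Because the correlated predicate refers to min(b.value), the join has to sit 
above the aggregate, which is why the whole aggregate expression, and not only 
the outer reference, needs to be available to the rewrite.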

> General support for subquery processing
> ---------------------------------------
>
>                 Key: SPARK-18455
>                 URL: https://issues.apache.org/jira/browse/SPARK-18455
>             Project: Spark
>          Issue Type: Story
>          Components: SQL
>            Reporter: Nattavut Sutyanyong
>
> Subquery support has been introduced in Spark 2.0. The initial implementation 
> covers the most common subquery use cases, for instance the ones used in TPC 
> queries.
> Spark currently supports the following subqueries:
> * Uncorrelated Scalar Subqueries. All cases are supported.
> * Correlated Scalar Subqueries. We only allow subqueries that are aggregated 
> and use equality predicates.
> * Predicate Subqueries. IN or Exists type of queries. We allow most 
> predicates, except when they are pulled from under an Aggregate or Window 
> operator. In that case we only support equality predicates.
> However, this does not cover the full range of possible subqueries. This, in 
> part, has to do with the fact that we currently rewrite all correlated 
> subqueries into a (LEFT/LEFT SEMI/LEFT ANTI) join.
> We currently lack support for the following use cases:
> * The use of predicate subqueries in a projection.
> * The use of non-equality predicates below Aggregate and/or Window operators.
> * The use of non-Aggregate subqueries for correlated scalar subqueries.
> This JIRA aims to lift these current limitations in subquery processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

