[ 
https://issues.apache.org/jira/browse/SPARK-14785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261157#comment-15261157
 ] 

Frederick Reiss commented on SPARK-14785:
-----------------------------------------

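Note that the rewritten query in the issue description needs an additional 
filter on {{t.b > t3.AVG}}. Without it, the join returns every row of {{t}} 
that has a matching {{id}} in {{t3}}, regardless of the value of {{b}}.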
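
As a minimal sketch of the corrected rewrite, the snippet below runs both forms 
of the query against SQLite through Python's {{sqlite3}} module. The sample 
rows and the {{avg_c}} alias are assumptions made for illustration (the query 
in the issue description leaves the aggregate column unnamed), and SQLite 
stands in for Spark SQL here.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the issue description.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t  (id INTEGER, a TEXT, b REAL);
    CREATE TABLE t2 (id INTEGER, c REAL);
    INSERT INTO t  VALUES (1, 'x', 10.0), (2, 'y', 1.0), (3, 'z', 5.0);
    INSERT INTO t2 VALUES (1, 4.0), (1, 6.0), (2, 3.0);
""")

# Original form: correlated scalar subquery.
correlated = """
    SELECT a FROM t
    WHERE b > (SELECT AVG(c) FROM t2 WHERE t.id = t2.id)
    ORDER BY a
"""

# Rewrite from the issue description, which has no filter on b.
join_only = """
    SELECT a
    FROM t JOIN (SELECT id, AVG(c) AS avg_c FROM t2 GROUP BY id) t3
      ON t3.id = t.id
    ORDER BY a
"""

# Rewrite with the additional filter (avg_c is an assumed alias).
rewritten = """
    SELECT a
    FROM t JOIN (SELECT id, AVG(c) AS avg_c FROM t2 GROUP BY id) t3
      ON t3.id = t.id
    WHERE t.b > t3.avg_c
    ORDER BY a
"""

correlated_rows = conn.execute(correlated).fetchall()
join_only_rows = conn.execute(join_only).fetchall()
rewritten_rows = conn.execute(rewritten).fetchall()

print(correlated_rows, join_only_rows, rewritten_rows)
```

On this data the join-only form also returns the row with {{id = 2}}, whose 
{{b}} is below the average; adding the filter brings the result back in line 
with the correlated form.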

This rewrite is described in [a 1982 
paper|https://pdfs.semanticscholar.org/e8ac/c9f63559a09d608e62c4061520c24d970c31.pdf].
 
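This rewrite does not always give correct query results. In particular, the 
rewritten query may be missing results if the subquery contains a {{COUNT}} 
aggregate, or if the subquery sometimes returns {{NULL}} and the filtering 
predicate returns {{TRUE}} on a {{NULL}} input. In both cases the inner join 
drops rows of the outer table that have no matching group in the subquery, 
even though the original predicate would accept them: {{COUNT}} over an empty 
set is 0, and the other aggregates over an empty set are {{NULL}}. See the 
chapter "Query Rewrite Optimization Rules in IBM DB2 Universal Database" in 
_Readings in Database Systems_ for more information.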
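
The two failure cases can be reproduced with a small sketch, again using 
SQLite through Python's {{sqlite3}} module as a stand-in for Spark SQL. The 
sample rows and the {{cnt}} and {{avg_c}} aliases are assumptions made for 
illustration only.

```python
import sqlite3

# Hypothetical data: id = 2 exists in t but has no rows in t2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t  (id INTEGER, a TEXT, b INTEGER);
    CREATE TABLE t2 (id INTEGER, c INTEGER);
    INSERT INTO t  VALUES (1, 'x', 5), (2, 'y', 5);
    INSERT INTO t2 VALUES (1, 7), (1, 8);
""")


def rows(sql):
    """Run a query and return all result rows."""
    return conn.execute(sql).fetchall()


# Case 1: COUNT. For id = 2 the subquery yields 0, and 5 > 0 holds.
count_correlated = rows("""
    SELECT a FROM t
    WHERE b > (SELECT COUNT(c) FROM t2 WHERE t.id = t2.id)
    ORDER BY a
""")

# The join has no group for id = 2, so that row is lost.
count_rewritten = rows("""
    SELECT a
    FROM t JOIN (SELECT id, COUNT(c) AS cnt FROM t2 GROUP BY id) t3
      ON t3.id = t.id
    WHERE t.b > t3.cnt
    ORDER BY a
""")

# Case 2: a predicate that is TRUE on NULL input.
# For id = 2 the subquery yields NULL, so IS NULL holds.
null_correlated = rows("""
    SELECT a FROM t
    WHERE (SELECT AVG(c) FROM t2 WHERE t.id = t2.id) IS NULL
    ORDER BY a
""")

# Again the join drops id = 2 before the predicate is evaluated.
null_rewritten = rows("""
    SELECT a
    FROM t JOIN (SELECT id, AVG(c) AS avg_c FROM t2 GROUP BY id) t3
      ON t3.id = t.id
    WHERE t3.avg_c IS NULL
    ORDER BY a
""")

print(count_correlated, count_rewritten)
print(null_correlated, null_rewritten)
```

In both cases the correlated form returns the row with {{id = 2}} and the 
join-based form does not.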

For the purpose of covering TPC-DS, this rewrite should work for queries 6, 32, 
81, and 92. Those queries only use {{AVG}} aggregates in their subqueries.

> Support correlated scalar subquery
> ----------------------------------
>
>                 Key: SPARK-14785
>                 URL: https://issues.apache.org/jira/browse/SPARK-14785
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Davies Liu
>
> For example:
> {code}
> SELECT a from t where b > (select avg(c) from t2 where t.id = t2.id)
> {code}
> it could be rewritten as 
> {code}
> SELECT a FROM t JOIN (SELECT id, AVG(c) FROM t2 GROUP BY id) t3
>   ON t3.id = t.id
> {code}
> TPCDS Q92, Q81, Q6 required this



