[ https://issues.apache.org/jira/browse/SPARK-14785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261157#comment-15261157 ]
Frederick Reiss commented on SPARK-14785:
-----------------------------------------

Note that the rewritten query in the issue description (quoted below) needs an additional filter on {{t.b > t3.AVG}}. This rewrite is described in [a 1982 paper|https://pdfs.semanticscholar.org/e8ac/c9f63559a09d608e62c4061520c24d970c31.pdf].

This rewrite does not always give correct query results. In particular, the rewritten query may be missing results if the subquery contains a {{COUNT}} aggregate, or if the subquery sometimes returns {{NULL}} and the filtering predicate returns {{TRUE}} on a {{NULL}} input. See the chapter "Query Rewrite Optimization Rules in IBM DB2 Universal Database" in _Readings in Database Systems_ for more information.

For the purpose of covering TPC-DS, this rewrite should work for queries 6, 32, 81, and 92. Those queries only use {{AVG}} aggregates in their subqueries.

> Support correlated scalar subquery
> ----------------------------------
>
>                 Key: SPARK-14785
>                 URL: https://issues.apache.org/jira/browse/SPARK-14785
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Davies Liu
>
> For example:
> {code}
> SELECT a from t where b > (select avg(c) from t2 where t.id = t2.id)
> {code}
> it could be rewritten as
> {code}
> SELECT a FROM t JOIN (SELECT id, AVG(c) FROM t2 GROUP by id) t3 ON t3.id = t.id
> {code}
> TPCDS Q92, Q81, Q6 required this
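
The {{COUNT}} caveat from the comment can be reproduced with a small script. This is only a sketch, not Spark code: it uses Python's built-in sqlite3 module as a stand-in engine, the rows in {{t}} and {{t2}} are invented for illustration, and the alias {{agg_c}} is a hypothetical name for the aggregate column. The row with id 3 has no match in {{t2}}, so the correlated {{COUNT}} subquery yields 0 for it, while the join-based rewrite drops the row entirely.

```python
import sqlite3

# In-memory database with made-up sample data (not from the issue).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t  (id INTEGER, a TEXT, b INTEGER);
    CREATE TABLE t2 (id INTEGER, c INTEGER);
    INSERT INTO t  VALUES (1, 'x', 10), (2, 'y', 1), (3, 'z', 5);
    -- id 3 deliberately has no rows in t2
    INSERT INTO t2 VALUES (1, 4), (1, 6), (2, 3);
""")


def correlated(agg):
    """Original form: correlated scalar subquery using the given aggregate."""
    sql = f"""
        SELECT a FROM t
        WHERE b > (SELECT {agg}(c) FROM t2 WHERE t.id = t2.id)
        ORDER BY a
    """
    return [row[0] for row in conn.execute(sql)]


def rewritten(agg):
    """Join-based rewrite, including the extra filter on the aggregate."""
    sql = f"""
        SELECT a FROM t
        JOIN (SELECT id, {agg}(c) AS agg_c FROM t2 GROUP BY id) t3
          ON t3.id = t.id
        WHERE t.b > t3.agg_c
        ORDER BY a
    """
    return [row[0] for row in conn.execute(sql)]


# AVG: both forms agree, because AVG over no rows is NULL and
# b > NULL never passes the filter.
print(correlated("AVG"), rewritten("AVG"))      # ['x'] ['x']

# COUNT: the correlated form sees COUNT = 0 for id 3 and keeps 'z';
# the inner join has no t3 row for id 3, so the rewrite loses it.
print(correlated("COUNT"), rewritten("COUNT"))  # ['x', 'z'] ['x']
```

With {{AVG}} the two forms return the same rows on this data, which is consistent with the rewrite being adequate for the TPC-DS queries listed above. With {{COUNT}} the rewritten query is missing a row.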