Github user nsyca commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15763#discussion_r86653844

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1044,6 +1044,34 @@ class Analyzer(
                 failOnOuterReference(p)
                 p
             }
    +
    +        // SPARK-17348
    +        // Looking for a potential incorrect result case.
    +        // When a correlated predicate is a non-equality predicate
    +        // it must be placed at the immediate child operator.
    +        // Otherwise, the pull up of the correlated predicate
    +        // will generate a plan with a different semantics
    +        // which could return incorrect result.
    +        var continue : Boolean = true
    --- End diff --

    One technique I know of for transforming correlated queries into queries with no correlation is outlined in this 1996 IEEE Data Engineering paper:

    Complex query decorrelation
    P. Seshadri; H. Pirahesh; T. Y. C. Leung
    Data Engineering, 1996. Proceedings of the Twelfth International Conference on
    Pages: 450 - 458

    Distributed systems aggravate the performance impact of correlated queries, because the entire data set of the subquery has to be moved to where the data of the outer tables resides. This processing is similar to `BroadcastNestedLoopJoinExec` in Spark.

    The idea behind the paper is to build a duplicate of the relevant portion of the outer tables and to decorrelate the original subquery by joining that duplicate portion inside the subquery. The algorithm is claimed to be generic and applicable to all forms of correlation: both shallow correlation, where the correlated point is immediately below the operation over the outer table(s), and deep correlation, where the correlated point is at an arbitrary level below the operation over the outer tables.
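To make the idea concrete, here is a minimal sketch on plain Scala collections. It is not Spark's Analyzer code and not the paper's full algorithm. The tables `Dept` and `Emp`, their fields, and the query itself are hypothetical, chosen only to show a non-equality correlated predicate (`e.salary > d.minSalary`) being decorrelated through a duplicate of the outer table's correlation column.

```scala
// Hypothetical toy tables; names and data are illustrative only.
final case class Dept(id: Int, minSalary: Int, maxCount: Int)
final case class Emp(name: String, salary: Int)

object DecorrelationSketch {
  val depts: Seq[Dept] =
    Seq(Dept(1, 50, 1), Dept(2, 80, 1), Dept(3, 50, 5), Dept(4, 200, 0))
  val emps: Seq[Emp] =
    Seq(Emp("a", 60), Emp("b", 90), Emp("c", 100))

  // Correlated form: the subquery is re-evaluated for every outer row.
  //   SELECT d.id FROM dept d
  //   WHERE (SELECT COUNT(*) FROM emp e WHERE e.salary > d.minSalary) <= d.maxCount
  def correlated: Seq[Int] =
    depts.filter(d => emps.count(_.salary > d.minSalary) <= d.maxCount).map(_.id)

  // Decorrelated form: the subquery no longer refers to the outer rows.
  def decorrelated: Seq[Int] = {
    // Step 1: duplicate only the portion of the outer table that the
    // subquery needs, i.e. the distinct correlation values.
    val bindings: Seq[Int] = depts.map(_.minSalary).distinct

    // Step 2: join that duplicate portion inside the subquery and aggregate
    // per binding. The non-equality predicate stays next to emp.
    val joined: Seq[(Int, Emp)] =
      for { b <- bindings; e <- emps if e.salary > b } yield (b, e)
    val countByBinding: Map[Int, Int] =
      joined.groupBy(_._1).map { case (b, rows) => b -> rows.size }

    // Step 3: join the result back to the outer table on equality.
    // A binding with no matching emp rows is absent from the map, so COUNT
    // must default to 0 (the job of an outer join in a real plan).
    depts
      .filter(d => countByBinding.getOrElse(d.minSalary, 0) <= d.maxCount)
      .map(_.id)
  }

  def main(args: Array[String]): Unit = {
    println(s"correlated:   $correlated")
    println(s"decorrelated: $decorrelated")
  }
}
```

Both forms select departments 3 and 4 on this data. Department 4 is kept only because the missing count defaults to 0 in step 3. Dropping that default would silently lose the row, which is the kind of semantic change the SPARK-17348 check is meant to guard against.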