Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12954#discussion_r62391407
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1645,3 +1646,30 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] {
           }
       }
     }
    +
    +/**
    + * Rewrite [[Filter]] plans that contain correlated [[ScalarSubquery]] expressions. When these
    + * correlated [[ScalarSubquery]] expressions are wrapped in some Predicate expression, we rewrite
    + * them into [[PredicateSubquery]] expressions.
    + */
    +object RewriteScalarSubqueriesInFilter extends Rule[LogicalPlan] {
    --- End diff --
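
    For context, a minimal sketch of the plan shape the quoted rule targets (the table and column names are made up, and this goes through the public SQL API rather than the optimizer rule itself): a Filter whose condition wraps a correlated scalar subquery in a comparison predicate.

        // Assumes a running SparkSession named `spark`; `orders` and `payments`
        // are hypothetical registered tables used only for illustration.
        spark.sql("""
          SELECT *
          FROM orders o
          WHERE (SELECT max(p.amount)
                 FROM payments p
                 WHERE p.customer_id = o.customer_id) > 5
        """)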
    
    The advantage of a semi join is that the subquery can return multiple results for one outer row without causing correctness problems. My initial approach was to relax the rules for scalar subqueries (allow disjunctive predicates) and prevent possible duplicates by using left semi joins. This didn't work because I was also pulling non-correlated predicates through the aggregate, which invalidates its results.
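
    To make the correctness point concrete, here is a minimal sketch with made-up data, using the DataFrame API rather than the optimizer rule: a correlated subquery that matches two rows for the same outer row would be ambiguous as a scalar subquery, but expressed as a left semi join the duplicates cannot change the result, because each left row is emitted at most once.

        import org.apache.spark.sql.SparkSession

        // Assumes a local Spark build; names and data are purely illustrative.
        val spark = SparkSession.builder().master("local[*]").appName("semi-join-sketch").getOrCreate()
        import spark.implicits._

        // Customer 1 has two matching payment rows: ambiguous for a scalar
        // subquery, harmless for a left semi join.
        val orders   = Seq((1, 10), (2, 20)).toDF("customer_id", "amount")
        val payments = Seq((1, 5), (1, 7), (3, 9)).toDF("customer_id", "amount")

        // Keep only the orders that have at least one matching payment;
        // each order row appears at most once in the output.
        orders.join(payments, Seq("customer_id"), "left_semi").show()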
    
    I am not sure which predicates you want to push through the left semi, since all predicates that should be pushed down to the left-hand side are already in the predicate condition. But I might be missing something here.
    
    I have removed this in my last commit. I do feel that this might be a small improvement over the current situation. Let's revisit it after Spark 2.0.


