Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17491#discussion_r109211310
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -90,11 +90,12 @@ trait PredicateHelper {
        * Returns true iff `expr` could be evaluated as a condition within join.
        */
       protected def canEvaluateWithinJoin(expr: Expression): Boolean = expr match {
    -    case l: ListQuery =>
    +    case _: ListQuery | _: Exists =>
           // A ListQuery defines the query which we want to search in an IN subquery expression.
           // Currently the only way to evaluate an IN subquery is to convert it to a
           // LeftSemi/LeftAnti/ExistenceJoin by `RewritePredicateSubquery` rule.
           // It cannot be evaluated as part of a Join operator.
    +      // An Exists shouldn't be push into a Join operator too.
    --- End diff --
    
    @nsyca Looking at this further, there is a SubqueryExec operator that can
    execute a ScalarSubquery and an InSubquery (see PlanSubqueries). As part of
    my change, I had removed the case for PredicateSubquery, since we removed
    PredicateSubquery altogether. I quickly tried the following and got the
    query to work, but I haven't verified the semantics. Basically, if we keep
    the Exists expression as it is, push it down as a join condition, and
    execute it as an InSubquery (possibly with an additional limit clause), the
    infrastructure for it already seems to exist. Alternatively, we may want to
    introduce an ExistSubquery exec operator that can work more efficiently.
    
    ```scala
      case subquery: expressions.Exists =>
        val executedPlan = new QueryExecution(sparkSession, subquery.plan).executedPlan
        InSubquery(
          Literal.TrueLiteral,
          SubqueryExec(s"subquery${subquery.exprId.id}", executedPlan),
          subquery.exprId)
    ```
    What do you think, Natt?
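
    On the unverified semantics: below is a minimal plain-Scala sketch of the
    equivalence this idea relies on, namely that `EXISTS (subquery)` should
    agree with `TRUE IN (SELECT TRUE FROM subquery LIMIT 1)`. It has no Spark
    dependency, and `Row`, `exists`, and `existsViaIn` are hypothetical names
    used only for illustration, not Spark APIs. It models the intended
    behaviour and says nothing about how InSubquery is actually evaluated.

    ```scala
    // Plain-Scala model of the proposed rewrite; all names are hypothetical.
    object ExistsAsInSubquery {
      type Row = Seq[Any]

      // EXISTS (subquery): true iff the subquery returns at least one row.
      def exists(subquery: Iterator[Row]): Boolean = subquery.hasNext

      // Proposed rewrite: TRUE IN (SELECT TRUE FROM subquery LIMIT 1).
      def existsViaIn(subquery: Iterator[Row]): Boolean = {
        // Project every row to the literal TRUE and keep at most one row (the limit).
        val projected: List[Boolean] = subquery.map(_ => true).take(1).toList
        // The IN list holds only non-null literals, so the result is never NULL:
        // an empty list gives false, a non-empty one gives true.
        projected.contains(true)
      }
    }
    ```

    Under this model the two functions agree on both empty and non-empty
    inputs, and the limit only saves work because a single row is enough to
    decide the result.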

