GitHub user liwensun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22141#discussion_r211798295

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    +          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
    +          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
    --- End diff --

Thanks for working on this! But the new pattern only matches when `Not` is the immediate parent of the `InSubquery`, so I'm not sure it correctly handles expressions like ```Not(And/Or(InSubquery, otherExpressions*))``` or ```Not(Not(InSubquery))```.

As I understand it, what we fundamentally want is to change how the existing `InSubquery` case is handled here, by making the `ExistenceJoin` null-aware somehow, rather than adding another `Not(InSubquery(..))` case. Does that sound right?
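To illustrate why a null-aware result composes better than a special `Not(InSubquery(..))` case, here is a toy model of SQL three-valued logic. It is a hypothetical Scala sketch, not Catalyst code: `ThreeValued`, `in`, `naiveExists`, and `ThreeValuedDemo` are illustrative names, `Option[Boolean]` stands in for a nullable boolean (`None` = NULL), and it assumes standard SQL semantics for `IN` over a subquery that may return NULLs.

```scala
// Toy model of SQL three-valued logic; Option[Boolean] stands in for a nullable BOOLEAN.
// Hypothetical illustration only: these are not Catalyst classes or APIs.
object ThreeValued {
  type TBool = Option[Boolean] // Some(true) = TRUE, Some(false) = FALSE, None = NULL

  def not(a: TBool): TBool = a.map(b => !b) // NOT NULL is NULL

  def and(a: TBool, b: TBool): TBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false) // FALSE dominates AND
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  def or(a: TBool, b: TBool): TBool = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true) // TRUE dominates OR
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // Null-aware `value IN (rows)`: FALSE for an empty subquery, TRUE on a match,
  // NULL if there is no match but the value or some row is NULL, otherwise FALSE.
  def in(value: Option[Int], rows: Seq[Option[Int]]): TBool =
    if (rows.isEmpty) Some(false)
    else {
      value match {
        case None => None
        case Some(v) =>
          if (rows.contains(Some(v))) Some(true)
          else if (rows.contains(None)) None
          else Some(false)
      }
    }

  // Two-valued existence flag (compare the non-nullable `exists` attribute in the diff):
  // TRUE iff some row equals the value; NULL comparisons simply count as "no match".
  def naiveExists(value: Option[Int], rows: Seq[Option[Int]]): TBool =
    Some(value.exists(v => rows.contains(Some(v))))
}

object ThreeValuedDemo {
  import ThreeValued._

  def main(args: Array[String]): Unit = {
    val rows = Seq(Some(1), None)            // subquery output containing a NULL
    val nullAware = in(Some(2), rows)        // no match, NULL present => NULL
    println(not(nullAware))                  // None: NOT IN is NULL, so a filter drops the row
    println(not(not(nullAware)))             // None: same as the IN result itself
    println(not(or(nullAware, Some(false)))) // None
    println(not(naiveExists(Some(2), rows))) // Some(true): the row would be wrongly kept
  }
}
```

In this model, once `in` itself returns NULL in the right cases, wrapping it in `not`, `and`, or `or` needs no special handling, whatever the nesting. The two-valued `naiveExists`, like the `nullable = false` `exists` attribute in the diff, can only report FALSE here, so its negation evaluates to TRUE where SQL expects NULL.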