[ https://issues.apache.org/jira/browse/SPARK-21759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liang-Chi Hsieh updated SPARK-21759: ------------------------------------ Description: With the check for structural integrity proposed in SPARK-21726, I found that an optimization rule {{PullupCorrelatedPredicates}} can produce unresolved plans. For a correlated IN query like: {code} Project [a#0] +- Filter a#0 IN (list#4 [b#1]) : +- Project [c#2] : +- Filter (outer(b#1) < d#3) : +- LocalRelation <empty>, [c#2, d#3] +- LocalRelation <empty>, [a#0, b#1] {code} After {{PullupCorrelatedPredicates}}, it produces query plan like: {code} 'Project [a#0] +- 'Filter a#0 IN (list#4 [(b#1 < d#3)]) : +- Project [c#2, d#3] : +- LocalRelation <empty>, [c#2, d#3] +- LocalRelation <empty>, [a#0, b#1] {code} Because the correlated predicate involves another attribute {{d#3}} in subquery, it has been pulled out and added into the {{Project}} on the top of the subquery. When {{list}} in {{In}} contains just one {{ListQuery}}, {{In.checkInputDataTypes}} checks if the size of {{value}} expressions matches the output size of subquery. In the above example, there is only {{value}} expression and the subquery output has two attributes {{c#2, d#3}}, so it fails the check and {{In.resolved}} returns {{false}}. We should not let {{In.checkInputDataTypes}} wrongly report unresolved plans to fail the structural integrity check. was: With the check for structural integrity proposed in SPARK-21726, I found that an optimization rule {{PullupCorrelatedPredicates}} can produce unresolved plans. For a correlated IN query like: {code} Project [a#0] +- Filter a#0 IN (list#4 [b#1]) : +- Project [c#2] : +- Filter (outer(b#1) < d#3) : +- LocalRelation <empty>, [c#2, d#3] +- LocalRelation <empty>, [a#0, b#1] {code} After {{PullupCorrelatedPredicates}}, it produces query plan like: {code} 'Project [a#0] +- 'Filter a#0 IN (list#4 [(b#1 < d#3)]) : +- Project [c#2, d#3] : +- LocalRelation <empty>, [c#2, d#3] +- LocalRelation <empty>, [a#0, b#1] {code} Because the correlated predicate involves another attribute {{d#3}} in subquery, it has been pulled out and added into the {{Project}} on the top of the subquery. When {{list}} in {{In}} contains just one {{ListQuery}}, {{In.checkInputDataTypes}} checks if the size of {{value}} expressions matches the output size of subquery. In the above example, there is only {{value}} expression and the subquery output has two attributes {{c#2, d#3}}, so it fails the check and {{In.resolved}} returns {{false}}. We should let {{PullupCorrelatedPredicates}} produce resolved plans to pass the structural integrity check. > In.checkInputDataTypes should not wrongly report unresolved plans for IN > correlated subquery > -------------------------------------------------------------------------------------------- > > Key: SPARK-21759 > URL: https://issues.apache.org/jira/browse/SPARK-21759 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.2.0 > Reporter: Liang-Chi Hsieh > > With the check for structural integrity proposed in SPARK-21726, I found that > an optimization rule {{PullupCorrelatedPredicates}} can produce unresolved > plans. > For a correlated IN query like: > {code} > Project [a#0] > +- Filter a#0 IN (list#4 [b#1]) > : +- Project [c#2] > : +- Filter (outer(b#1) < d#3) > : +- LocalRelation <empty>, [c#2, d#3] > +- LocalRelation <empty>, [a#0, b#1] > {code} > After {{PullupCorrelatedPredicates}}, it produces query plan like: > {code} > 'Project [a#0] > +- 'Filter a#0 IN (list#4 [(b#1 < d#3)]) > : +- Project [c#2, d#3] > : +- LocalRelation <empty>, [c#2, d#3] > +- LocalRelation <empty>, [a#0, b#1] > {code} > Because the correlated predicate involves another attribute {{d#3}} in > subquery, it has been pulled out and added into the {{Project}} on the top of > the subquery. > When {{list}} in {{In}} contains just one {{ListQuery}}, > {{In.checkInputDataTypes}} checks if the size of {{value}} expressions > matches the output size of subquery. In the above example, there is only > {{value}} expression and the subquery output has two attributes {{c#2, d#3}}, > so it fails the check and {{In.resolved}} returns {{false}}. > We should not let {{In.checkInputDataTypes}} wrongly report unresolved plans > to fail the structural integrity check. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org