[ 
https://issues.apache.org/jira/browse/SPARK-21759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-21759:
------------------------------------
    Description: 
With the check for structural integrity proposed in SPARK-21726, I found that 
an optimization rule {{PullupCorrelatedPredicates}} can produce unresolved 
plans.

For a correlated IN query like:
{code}
Project [a#0]
+- Filter a#0 IN (list#4 [b#1])
   :  +- Project [c#2]
   :     +- Filter (outer(b#1) < d#3)
   :        +- LocalRelation <empty>, [c#2, d#3]
   +- LocalRelation <empty>, [a#0, b#1]
{code}

After {{PullupCorrelatedPredicates}}, it produces query plan like:
{code}
'Project [a#0]
+- 'Filter a#0 IN (list#4 [(b#1 < d#3)])
   :  +- Project [c#2, d#3]
   :     +- LocalRelation <empty>, [c#2, d#3]
   +- LocalRelation <empty>, [a#0, b#1]
{code}

Because the correlated predicate involves another attribute {{d#3}} in 
subquery, it has been pulled out and added into the {{Project}} on the top of 
the subquery.

When {{list}} in {{In}} contains just one {{ListQuery}}, 
{{In.checkInputDataTypes}} checks if the size of {{value}} expressions matches 
the output size of subquery. In the above example, there is only {{value}} 
expression and the subquery output has two attributes {{c#2, d#3}}, so it fails 
the check and {{In.resolved}} returns {{false}}.

We should not let {{In.checkInputDataTypes}} wrongly report unresolved plans to 
fail the structural integrity check.



  was:
With the check for structural integrity proposed in SPARK-21726, I found that 
an optimization rule {{PullupCorrelatedPredicates}} can produce unresolved 
plans.

For a correlated IN query like:
{code}
Project [a#0]
+- Filter a#0 IN (list#4 [b#1])
   :  +- Project [c#2]
   :     +- Filter (outer(b#1) < d#3)
   :        +- LocalRelation <empty>, [c#2, d#3]
   +- LocalRelation <empty>, [a#0, b#1]
{code}

After {{PullupCorrelatedPredicates}}, it produces query plan like:
{code}
'Project [a#0]
+- 'Filter a#0 IN (list#4 [(b#1 < d#3)])
   :  +- Project [c#2, d#3]
   :     +- LocalRelation <empty>, [c#2, d#3]
   +- LocalRelation <empty>, [a#0, b#1]
{code}

Because the correlated predicate involves another attribute {{d#3}} in 
subquery, it has been pulled out and added into the {{Project}} on the top of 
the subquery.

When {{list}} in {{In}} contains just one {{ListQuery}}, 
{{In.checkInputDataTypes}} checks if the size of {{value}} expressions matches 
the output size of subquery. In the above example, there is only {{value}} 
expression and the subquery output has two attributes {{c#2, d#3}}, so it fails 
the check and {{In.resolved}} returns {{false}}.

We should let {{PullupCorrelatedPredicates}} produce resolved plans to pass the 
structural integrity check.




> In.checkInputDataTypes should not wrongly report unresolved plans for IN 
> correlated subquery
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21759
>                 URL: https://issues.apache.org/jira/browse/SPARK-21759
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Liang-Chi Hsieh
>
> With the check for structural integrity proposed in SPARK-21726, I found that 
> an optimization rule {{PullupCorrelatedPredicates}} can produce unresolved 
> plans.
> For a correlated IN query like:
> {code}
> Project [a#0]
> +- Filter a#0 IN (list#4 [b#1])
>    :  +- Project [c#2]
>    :     +- Filter (outer(b#1) < d#3)
>    :        +- LocalRelation <empty>, [c#2, d#3]
>    +- LocalRelation <empty>, [a#0, b#1]
> {code}
> After {{PullupCorrelatedPredicates}}, it produces query plan like:
> {code}
> 'Project [a#0]
> +- 'Filter a#0 IN (list#4 [(b#1 < d#3)])
>    :  +- Project [c#2, d#3]
>    :     +- LocalRelation <empty>, [c#2, d#3]
>    +- LocalRelation <empty>, [a#0, b#1]
> {code}
> Because the correlated predicate involves another attribute {{d#3}} in 
> subquery, it has been pulled out and added into the {{Project}} on the top of 
> the subquery.
> When {{list}} in {{In}} contains just one {{ListQuery}}, 
> {{In.checkInputDataTypes}} checks if the size of {{value}} expressions 
> matches the output size of subquery. In the above example, there is only 
> {{value}} expression and the subquery output has two attributes {{c#2, d#3}}, 
> so it fails the check and {{In.resolved}} returns {{false}}.
> We should not let {{In.checkInputDataTypes}} wrongly report unresolved plans 
> to fail the structural integrity check.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to