[
https://issues.apache.org/jira/browse/SPARK-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271120#comment-15271120
]
Herman van Hovell edited comment on SPARK-14986 at 5/4/16 6:13 PM:
-------------------------------------------------------------------
I have taken a look at this. Your query yields the following plan:
{noformat}
== Parsed Logical Plan ==
'Project ['nil]
+- 'Generate 'EXPLODE('array()), true, true, Some(n), ['nil]
   +- SubqueryAlias x
      +- Project [1 AS x#0]
         +- OneRowRelation$

== Analyzed Logical Plan ==
nil: null
Project [nil#6]
+- Generate explode(array()), true, true, Some(n), [nil#6]
   +- SubqueryAlias x
      +- Project [1 AS x#0]
         +- OneRowRelation$

== Optimized Logical Plan ==
Generate explode([]), false, true, Some(n), [nil#6]
+- OneRowRelation$

== Physical Plan ==
Generate explode([]), false, true, [nil#6]
+- Scan OneRowRelation[]
{noformat}
The optimizer sets the {{join}} flag (the first boolean after the generator
expression in the {{Generate}} node) to false because no fields from the first
relation ({{select 1 as x}}) are used. See:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L365
Setting the {{join}} flag to false triggers a different code path. This code
path emits all the rows in the generated relation for each input row, so it
returns no rows when the generated relation is empty, which is what you are
seeing. The other code path would produce a row because it performs a
left-join-like operation on the generated results.
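
To make the difference between the two code paths concrete, here is a minimal sketch in plain Python. It is a toy model of the behavior described above, not Spark's actual implementation; the names {{generate}}, {{explode_empty_array}} and {{generated_width}} are made up for illustration.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[Optional[int], ...]


def generate(
    rows: List[Row],
    generator: Callable[[Row], List[Row]],
    join: bool,
    outer: bool,
    generated_width: int = 1,
) -> List[Row]:
    """Toy model of a Generate operator (assumption: not Spark's real code)."""
    output: List[Row] = []
    for row in rows:
        generated = generator(row)
        if not join:
            # join=false path: emit only the generated rows, so an empty
            # generator result contributes nothing; outer is never consulted.
            output.extend(generated)
        elif generated:
            # join=true path: prepend the input row to every generated row.
            output.extend(row + g for g in generated)
        elif outer:
            # join=true and outer=true: keep the input row, padded with nulls.
            output.append(row + (None,) * generated_width)
    return output


def explode_empty_array(row: Row) -> List[Row]:
    # Stands in for explode(array()): it never produces a row.
    return []


one_row = [(1,)]  # stands in for (select 1 as x)
joined = generate(one_row, explode_empty_array, join=True, outer=True)
pruned = generate(one_row, explode_empty_array, join=False, outer=True)
```

In this model {{joined}} contains a single row padded with a null, while {{pruned}} is empty, which mirrors the expected and the actual result in the report.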
This is only a problem for {{OUTER}} lateral views. We could make the optimizer
rule check the {{outer}} flag, so that it leaves {{join}} untouched for outer
generators. Does anyone know what the default behavior of Hive is?
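
A rough sketch of what such a guard could look like, again as a plain Python model rather than the actual Catalyst rule. The {{Generate}} dataclass, {{prune_join}} and the column names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet


@dataclass(frozen=True)
class Generate:
    """Hypothetical stand-in for a Generate plan node (not the Catalyst class)."""
    join: bool
    outer: bool
    child_output: FrozenSet[str]


def prune_join(node: Generate, referenced: FrozenSet[str]) -> Generate:
    # Drop the join only when no child column is referenced AND the
    # generator is not OUTER; an outer generator must keep the join so that
    # an empty result still yields a null-padded row.
    uses_child = bool(node.child_output & referenced)
    if node.join and not node.outer and not uses_child:
        return replace(node, join=False)
    return node


outer_view = Generate(join=True, outer=True, child_output=frozenset({"x"}))
inner_view = Generate(join=True, outer=False, child_output=frozenset({"x"}))
only_nil = frozenset({"nil"})  # the query selects only the generated column
```

With this guard, {{prune_join(outer_view, only_nil)}} keeps {{join}} set, while {{prune_join(inner_view, only_nil)}} still clears it, so the pruning is preserved for non-outer lateral views.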
> Spark SQL returns incorrect results for LATERAL VIEW OUTER queries if all
> inner columns are projected out
> ---------------------------------------------------------------------------------------------------------
>
> Key: SPARK-14986
> URL: https://issues.apache.org/jira/browse/SPARK-14986
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2
> Reporter: Andrey Balmin
>
> Repro: using Hive context, run this SQL query:
> select nil from (select 1 as x ) x LATERAL VIEW OUTER EXPLODE( array ())
> n as nil
> Actual result: returns 0 rows.
> Expected results: should return 1 row with null value.
> Details:
> If the query is modified to also return x:
> select x, nil from (select 1 as x ) x LATERAL VIEW OUTER EXPLODE( array
> ()) n as nil
> it works correctly and returns 1 row: [ 1, null ]
> Clearly, changing the SELECT clause of a query should not change the number
> of rows it returns.
> Looking at the query plan it seems that the Generator object was
> (incorrectly) marked with "join=false"