[jira] [Commented] (SPARK-22204) Explain output for SQL with commands shows no optimization

Andrew Ash (JIRA) Tue, 17 Oct 2017 18:14:44 -0700

    [ https://issues.apache.org/jira/browse/SPARK-22204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208676#comment-16208676 ]


Andrew Ash commented on SPARK-22204:
------------------------------------

One way to work around this issue is to take the child of the command node 
and run explain on that child directly.  The downside is that the query is 
planned twice.
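
A minimal sketch of that workaround for the Spark shell follows. The helper 
name explainCommandChild is hypothetical (it is not part of Spark), and the 
sketch assumes the command node exposes its query through children, which 
may not hold for every command or Spark version. It also relies on 
spark.sessionState, which Spark marks as an unstable API.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution

// Hypothetical helper (not part of Spark): explain the query nested in a command.
def explainCommandChild(spark: SparkSession, sqlText: String): Unit = {
  // As in the report, spark.sql is called on the full statement, so a
  // CREATE TABLE may actually create the table as a side effect.
  val analyzed = spark.sql(sqlText).queryExecution.analyzed

  // Assumption: the command exposes its query as its first child.
  analyzed.children.headOption match {
    case Some(child) =>
      // Plan the child on its own; this is the second round of planning.
      val qe: QueryExecution = spark.sessionState.executePlan(child)
      // QueryExecution.toString renders the parsed, analyzed, optimized
      // and physical plans, like explain(true).
      println(qe)
    case None =>
      println(s"${analyzed.nodeName} has no child plan to explain")
  }
}
```

In the shell this could be called as explainCommandChild(spark, "CREATE TABLE 
IF NOT EXISTS tmptable AS SELECT ..."), with the full statement from the 
report in place of the abbreviated string.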

See also discussion at 
https://github.com/apache/spark/pull/19269#discussion_r139841435

> Explain output for SQL with commands shows no optimization
> ----------------------------------------------------------
>
>                 Key: SPARK-22204
>                 URL: https://issues.apache.org/jira/browse/SPARK-22204
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Andrew Ash
>
> When displaying the explain output for a basic SELECT query, the query plan 
> changes as expected between the analyzed and optimized stages.  But when that 
> same query is wrapped in a command, for example {{CREATE TABLE}}, the explain 
> output makes it appear that no optimization takes place.
> In Spark shell:
> Explain output for a {{SELECT}} statement shows optimization:
> {noformat}
> scala> spark.sql("SELECT a FROM (SELECT a FROM (SELECT a FROM (SELECT 1 AS a) AS b) AS c) AS d").explain(true)
> == Parsed Logical Plan ==
> 'Project ['a]
> +- 'SubqueryAlias d
>    +- 'Project ['a]
>       +- 'SubqueryAlias c
>          +- 'Project ['a]
>             +- SubqueryAlias b
>                +- Project [1 AS a#29]
>                   +- OneRowRelation
> == Analyzed Logical Plan ==
> a: int
> Project [a#29]
> +- SubqueryAlias d
>    +- Project [a#29]
>       +- SubqueryAlias c
>          +- Project [a#29]
>             +- SubqueryAlias b
>                +- Project [1 AS a#29]
>                   +- OneRowRelation
> == Optimized Logical Plan ==
> Project [1 AS a#29]
> +- OneRowRelation
> == Physical Plan ==
> *Project [1 AS a#29]
> +- Scan OneRowRelation[]
> scala> 
> {noformat}
> But the same query run inside {{CREATE TABLE}} does not:
> {noformat}
> scala> spark.sql("CREATE TABLE IF NOT EXISTS tmptable AS SELECT a FROM (SELECT a FROM (SELECT a FROM (SELECT 1 AS a) AS b) AS c) AS d").explain(true)
> == Parsed Logical Plan ==
> 'CreateTable `tmptable`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Ignore
> +- 'Project ['a]
>    +- 'SubqueryAlias d
>       +- 'Project ['a]
>          +- 'SubqueryAlias c
>             +- 'Project ['a]
>                +- SubqueryAlias b
>                   +- Project [1 AS a#33]
>                      +- OneRowRelation
> == Analyzed Logical Plan ==
> CreateHiveTableAsSelectCommand [Database:default}, TableName: tmptable, InsertIntoHiveTable]
>    +- Project [a#33]
>       +- SubqueryAlias d
>          +- Project [a#33]
>             +- SubqueryAlias c
>                +- Project [a#33]
>                   +- SubqueryAlias b
>                      +- Project [1 AS a#33]
>                         +- OneRowRelation
> == Optimized Logical Plan ==
> CreateHiveTableAsSelectCommand [Database:default}, TableName: tmptable, InsertIntoHiveTable]
>    +- Project [a#33]
>       +- SubqueryAlias d
>          +- Project [a#33]
>             +- SubqueryAlias c
>                +- Project [a#33]
>                   +- SubqueryAlias b
>                      +- Project [1 AS a#33]
>                         +- OneRowRelation
> == Physical Plan ==
> CreateHiveTableAsSelectCommand CreateHiveTableAsSelectCommand [Database:default}, TableName: tmptable, InsertIntoHiveTable]
>    +- Project [a#33]
>       +- SubqueryAlias d
>          +- Project [a#33]
>             +- SubqueryAlias c
>                +- Project [a#33]
>                   +- SubqueryAlias b
>                      +- Project [1 AS a#33]
>                         +- OneRowRelation
> scala>
> {noformat}
> Note that there is no change between the analyzed and optimized plans when 
> run in a command.
> This is misleading my users, as they think that there is no optimization 
> happening in the query!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
