[ https://issues.apache.org/jira/browse/SPARK-22204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-22204.
----------------------------------
    Resolution: Incomplete

> Explain output for SQL with commands shows no optimization
> ----------------------------------------------------------
>
> Key: SPARK-22204
> URL: https://issues.apache.org/jira/browse/SPARK-22204
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Andrew Ash
> Priority: Major
> Labels: bulk-closed
>
> When displaying the explain output for a basic SELECT query, the query plan
> changes as expected from analyzed -> optimized stages. But when putting that
> same query into a command, for example {{CREATE TABLE}} it appears that the
> optimization doesn't take place.
> In Spark shell:
> Explain output for a {{SELECT}} statement shows optimization:
> {noformat}
> scala> spark.sql("SELECT a FROM (SELECT a FROM (SELECT a FROM (SELECT 1 AS a) AS b) AS c) AS d").explain(true)
> == Parsed Logical Plan ==
> 'Project ['a]
> +- 'SubqueryAlias d
>    +- 'Project ['a]
>       +- 'SubqueryAlias c
>          +- 'Project ['a]
>             +- SubqueryAlias b
>                +- Project [1 AS a#29]
>                   +- OneRowRelation
> == Analyzed Logical Plan ==
> a: int
> Project [a#29]
> +- SubqueryAlias d
>    +- Project [a#29]
>       +- SubqueryAlias c
>          +- Project [a#29]
>             +- SubqueryAlias b
>                +- Project [1 AS a#29]
>                   +- OneRowRelation
> == Optimized Logical Plan ==
> Project [1 AS a#29]
> +- OneRowRelation
> == Physical Plan ==
> *Project [1 AS a#29]
> +- Scan OneRowRelation[]
> scala>
> {noformat}
> But the same command run inside {{CREATE TABLE}} does not:
> {noformat}
> scala> spark.sql("CREATE TABLE IF NOT EXISTS tmptable AS SELECT a FROM (SELECT a FROM (SELECT a FROM (SELECT 1 AS a) AS b) AS c) AS d").explain(true)
> == Parsed Logical Plan ==
> 'CreateTable `tmptable`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Ignore
> +- 'Project ['a]
>    +- 'SubqueryAlias d
>       +- 'Project ['a]
>          +- 'SubqueryAlias c
>             +- 'Project ['a]
>                +- SubqueryAlias b
>                   +- Project [1 AS a#33]
>                      +- OneRowRelation
> == Analyzed Logical Plan ==
> CreateHiveTableAsSelectCommand [Database:default}, TableName: tmptable, InsertIntoHiveTable]
> +- Project [a#33]
>    +- SubqueryAlias d
>       +- Project [a#33]
>          +- SubqueryAlias c
>             +- Project [a#33]
>                +- SubqueryAlias b
>                   +- Project [1 AS a#33]
>                      +- OneRowRelation
> == Optimized Logical Plan ==
> CreateHiveTableAsSelectCommand [Database:default}, TableName: tmptable, InsertIntoHiveTable]
> +- Project [a#33]
>    +- SubqueryAlias d
>       +- Project [a#33]
>          +- SubqueryAlias c
>             +- Project [a#33]
>                +- SubqueryAlias b
>                   +- Project [1 AS a#33]
>                      +- OneRowRelation
> == Physical Plan ==
> CreateHiveTableAsSelectCommand CreateHiveTableAsSelectCommand [Database:default}, TableName: tmptable, InsertIntoHiveTable]
> +- Project [a#33]
>    +- SubqueryAlias d
>       +- Project [a#33]
>          +- SubqueryAlias c
>             +- Project [a#33]
>                +- SubqueryAlias b
>                   +- Project [1 AS a#33]
>                      +- OneRowRelation
> scala>
> {noformat}
> Note that there is no change between the analyzed and optimized plans when
> run in a command.
> This is misleading my users, as they think that there is no optimization
> happening in the query!