[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-14488:
-------------------------------
    Description:
Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY TABLE ... USING ... AS SELECT ...}}, which results in weird behavior and weird semantics.

Let's try the following Spark shell snippet:

{code}
sqlContext range 10 registerTempTable "x"

// The problematic DDL statement:
sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x"

sqlContext.tables().show()
{code}

It shows the following result:

{noformat}
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
|        y|      false|
|        x|       true|
+---------+-----------+
{noformat}

*Weird behavior*

Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}, and the query result is written in Parquet format under the default Hive warehouse location, which is {{/user/hive/warehouse/y}} on my local machine.

*Weird semantics*

Secondly, even if this DDL statement did create a temporary table, the semantics would still be somewhat weird:

# It has an {{AS SELECT ...}} clause, which is supposed to run a given query instead of loading data from existing files.
# It has a {{USING <format>}} clause, which is supposed to, I guess, convert the result of the above query into the given format. And to "convert", we have to write the data out to the file system.
# It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an in-memory temporary table using the files written above?

The main questions:

# Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a valid one?
# If it's not, why do we have a [{{CreateTempTableUsingAsSelect}} command|https://github.com/apache/spark/blob/583b5e05309adb73cdffd974a810d6bfb5f2ff95/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L116], which exactly maps to this combination?
# If it is, what are the expected semantics?

  was:
Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY TABLE ... USING ... AS SELECT ...}}, which results in weird behavior and weird semantics.

Let's try the following Spark shell snippet:

{code}
sqlContext range 10 registerTempTable "x"

// The problematic DDL statement:
sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x"

sqlContext.tables().show()
{code}

It shows the following result:

{noformat}
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
|        y|      false|
|        x|       true|
+---------+-----------+
{noformat}

*Weird behavior*

Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}, and the query result is written in Parquet format under the default Hive warehouse location, which is {{/user/hive/warehouse/y}} on my local machine.

*Weird semantics*

Secondly, even if this DDL statement did create a temporary table, the semantics would still be somewhat weird:

# It has an {{AS SELECT ...}} clause, which is supposed to run a given query instead of loading data from existing files.
# It has a {{USING <format>}} clause, which is supposed to, I guess, convert the result of the above query into the given format. And to "convert", we have to write the data out to the file system.
# It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an in-memory temporary table using the files written above?

The main questions:

# Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a valid one? If it's not, why do we have a [{{CreateTempTableUsingAsSelect}} command|https://github.com/apache/spark/blob/583b5e05309adb73cdffd974a810d6bfb5f2ff95/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L116], which exactly maps to this combination?
# If it is, what are the expected semantics?


> Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..."
> --------------------------------------------------------------------------
>
>                 Key: SPARK-14488
>                 URL: https://issues.apache.org/jira/browse/SPARK-14488
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)