[ 
https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-14488:
-------------------------------
    Description: 
Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY TABLE 
... USING ... AS SELECT ...}}, which exhibit both weird behavior and weird semantics.

Let's try the following Spark shell snippet:

{code}
sqlContext.range(10).registerTempTable("x")

// The problematic DDL statement:
sqlContext.sql("CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x")

sqlContext.tables().show()
{code}

It shows the following result:

{noformat}
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
|        y|      false|
|        x|       true|
+---------+-----------+
{noformat}

*Weird behavior*

Note that {{y}} is NOT temporary although it was created with {{CREATE TEMPORARY 
TABLE ...}}, and the query result is written in Parquet format under the default 
Hive warehouse location, which is {{/user/hive/warehouse/y}} on my local 
machine.
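
To make the "NOT temporary" part concrete, here is a sanity check that should 
work in the same shell session (the exact warehouse path depends on local 
configuration):

{code}
// A truly temporary table should not be backed by files in the Hive
// warehouse. On my setup this prints Parquet part files under
// /user/hive/warehouse/y/.
sqlContext.table("y").inputFiles.foreach(println)
{code}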

*Weird semantics*

Secondly, even if this DDL statement did create a temporary table, its 
semantics would still be somewhat weird:

# It has an {{AS SELECT ...}} clause, which is supposed to run the given query 
instead of loading data from existing files.
# It has a {{USING <format>}} clause, which is supposed to, I guess, convert 
the result of the above query into the given format. And "converting" implies 
writing the data out to the file system.
# It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an 
in-memory temporary table backed by the files written above?
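
Under the reading above, the combination would be roughly syntactic sugar for 
the following explicit steps (a sketch only; the output path is illustrative, 
not necessarily what Spark uses internally):

{code}
// Hypothetical expansion of
//   CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x

// 1. Run the given query (the AS SELECT part) ...
val result = sqlContext.sql("SELECT * FROM x")

// 2. ... write the result out in the given format (the USING part) ...
result.write.parquet("/user/hive/warehouse/y")

// 3. ... and register the written files as an in-memory temporary table
//    (the TEMPORARY part).
sqlContext.read.parquet("/user/hive/warehouse/y").registerTempTable("y")
{code}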

The main questions:

# Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a valid 
one?
# If it's not, why do we have a [{{CreateTempTableUsingAsSelect}} 
command|https://github.com/apache/spark/blob/583b5e05309adb73cdffd974a810d6bfb5f2ff95/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L116],
 which exactly maps to this combination?
# If it is, what are the expected semantics?



> Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..."
> --------------------------------------------------------------------------
>
>                 Key: SPARK-14488
>                 URL: https://issues.apache.org/jira/browse/SPARK-14488
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
