[ 
https://issues.apache.org/jira/browse/SPARK-29421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated SPARK-29421:
-------------------------------
    Description: 
Use the CREATE TABLE tb1 LIKE tb2 command to create an empty table tb1 based on 
the definition of table tb2. The most common use case is to create tb1 with the 
same schema as tb2. An inconvenience is that this command also copies the 
FileFormat from tb2; there is no way to change the input/output format or 
serde. The ability to change the file format is useful in scenarios such as 
upgrading a table from a low-performance file format to a high-performance one 
(parquet, orc).

Hive supports STORED AS syntax to specify a new file format:
{code}
CREATE TABLE tbl(a int) STORED AS TEXTFILE;
CREATE TABLE tbl2 LIKE tbl STORED AS PARQUET;
{code}
We add a similar syntax for Spark. The work is split into two features:
1. specify a different table provider in CREATE TABLE LIKE
2. Hive compatibility

In this PR, we address the first one: 
use `USING provider` to specify a different table provider in CREATE TABLE 
LIKE.
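
As a sketch of the first feature, assuming the syntax lands as proposed here 
(the table names are illustrative, the STORED AS statement needs Hive support 
enabled, and the final grammar may differ):
{code}
-- Source table stored as plain text.
CREATE TABLE tbl(a int) STORED AS TEXTFILE;
-- Proposed: same schema as tbl, but with parquet as the table provider.
CREATE TABLE tbl2 LIKE tbl USING parquet;
-- Without USING, the file format is copied from tbl, as it is today.
CREATE TABLE tbl3 LIKE tbl;
{code}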

  was:
Use the CREATE TABLE tb1 LIKE tb2 command to create an empty table tb1 based on 
the definition of table tb2. The most common use case is to create tb1 with the 
same schema as tb2. An inconvenience is that this command also copies the 
FileFormat from tb2; there is no way to change the input/output format or 
serde. The ability to change the file format is useful in scenarios such as 
upgrading a table from a low-performance file format to a high-performance one 
(parquet, orc).

Here are two options to enhance it.
Option1: Add a configuration {{spark.sql.createTableLike.fileformat}} whose 
default value is "none", which keeps the current behaviour: copying the file 
format from the source table. After running SET 
spark.sql.createTableLike.fileformat=parquet (or any other valid file format 
defined in {{HiveSerDe}}), {{CREATE TABLE ... LIKE}} will use the new file 
format.
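
A hypothetical session under Option1 might look like the following. The 
configuration key is the one proposed above and is not an existing Spark 
setting; the table names are illustrative:
{code}
-- Proposed: every following CREATE TABLE ... LIKE uses parquet.
SET spark.sql.createTableLike.fileformat=parquet;
CREATE TABLE tb1 LIKE tb2;  -- tb1 would be parquet, whatever tb2 uses
-- Back to the default: copy the file format from the source table.
SET spark.sql.createTableLike.fileformat=none;
CREATE TABLE tb3 LIKE tb2;  -- tb3 would keep tb2's file format
{code}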

Option2: Add syntax {{USING fileformat}} after {{CREATE TABLE ... LIKE}}. For 
example,
{code}
CREATE TABLE tb1 LIKE tb2 USING parquet;
{code}
If the USING clause is omitted, the current behaviour is kept: the file format 
is copied from the source table.

Both options preserve the current behaviour by default.
We use Option1 with the parquet file format as an enhancement in our production 
thriftserver because we need to change the file format used by many existing 
SQL scripts without modifying them. For the community, Option2 could be treated 
as a new feature, since it requires users to write the additional USING clause.

cc [~dongjoon] [~hyukjin.kwon] [~joshrosen] [~cloud_fan] [~yumwang]


> Add an opportunity to change the file format of command CREATE TABLE LIKE
> -------------------------------------------------------------------------
>
>                 Key: SPARK-29421
>                 URL: https://issues.apache.org/jira/browse/SPARK-29421
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major


