[ 
https://issues.apache.org/jira/browse/SPARK-29421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948506#comment-16948506
 ] 

Lantao Jin edited comment on SPARK-29421 at 10/10/19 12:07 PM:
---------------------------------------------------------------

[~cloud_fan] Yes, Hive supports a similar command with {{STORED AS}}:
{code}
hive> CREATE TABLE tbl(a int) STORED AS TEXTFILE;
OK
Time taken: 0.726 seconds
hive> CREATE TABLE tbl2 LIKE tbl STORED AS PARQUET;
OK
Time taken: 0.294 seconds
{code}

Here I use {{USING}} to be compatible with the Spark command {{CREATE TABLE ... USING}}.
Which option do you think would be better?


> Add an opportunity to change the file format of command CREATE TABLE LIKE
> -------------------------------------------------------------------------
>
>                 Key: SPARK-29421
>                 URL: https://issues.apache.org/jira/browse/SPARK-29421
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> Use the {{CREATE TABLE tb1 LIKE tb2}} command to create an empty table tb1 
> based on the definition of table tb2. The most common use case is to create 
> tb1 with the same schema as tb2. An inconvenience is that this command also 
> copies the FileFormat from tb2, so the input/output format and serde cannot 
> be changed. The ability to change the file format is useful in scenarios 
> such as upgrading a table from a low-performance file format to a 
> high-performance one (parquet, orc).
> Here are two options to enhance it.
> Option1: Add a configuration {{spark.sql.createTableLike.fileformat}} whose 
> default value is "none", which keeps the current behaviour of copying the 
> file format from the source table. After running {{SET 
> spark.sql.createTableLike.fileformat=parquet}}, or setting any other valid 
> file format defined in {{HiveSerDe}}, {{CREATE TABLE ... LIKE}} will use the 
> new file format.
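> A minimal sketch of how Option1 might be used in a session, assuming the 
> proposed configuration is adopted under the name given above; {{tb1}} and 
> {{tb2}} are placeholder table names:
> {code}
> -- Proposed setting from this issue, not an existing Spark configuration.
> SET spark.sql.createTableLike.fileformat=parquet;
> -- tb1 would take its schema from tb2 but be stored as parquet.
> CREATE TABLE tb1 LIKE tb2;
> {code}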
> Option2: Add syntax {{USING fileformat}} after {{CREATE TABLE ... LIKE}}. For 
> example,
> {code}
> CREATE TABLE tb1 LIKE tb2 USING parquet;
> {code}
> If the USING clause is omitted, the current behaviour is kept: the file 
> format is copied from the source table.
> Both options therefore preserve the current behaviour by default.
> We use Option1 with the parquet file format as an enhancement in our 
> production thriftserver because we need many existing SQL scripts to pick up 
> the change without any modification. For the community, however, Option2 
> could be treated as a new feature, since it requires users to write the 
> additional USING clause.
> cc [~dongjoon] [~hyukjin.kwon] [~joshrosen] [~cloud_fan] [~yumwang]


