[jira] [Updated] (SPARK-31136) Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

Dongjoon Hyun (Jira) Thu, 12 Mar 2020 21:23:29 -0700


     [ https://issues.apache.org/jira/browse/SPARK-31136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Dongjoon Hyun updated SPARK-31136:
----------------------------------
    Description: 
We need to consider the behavior change introduced by SPARK-30098.
This issue is a placeholder to track the discussion and record the final decision.

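The `CREATE TABLE` syntax changes its behavior silently: without an explicit provider or storage format, it now creates a datasource table with the default provider instead of a Hive table.

The following examples show how this breaks existing user data pipelines.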
*Apache Spark 2.4.5*
{code}
spark-sql> CREATE TABLE t(a STRING);

spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;

spark-sql> SELECT * FROM t LIMIT 1;
# Apache Spark
Time taken: 2.05 seconds, Fetched 1 row(s)
{code}

{code}
spark-sql> CREATE TABLE t(a CHAR(3));

spark-sql> INSERT INTO TABLE t SELECT 'a ';

spark-sql> SELECT a, length(a) FROM t;
a       3
{code}

*Apache Spark 3.0.0-preview2*
{code}
spark-sql> CREATE TABLE t(a STRING);

spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
Error in query: LOAD DATA is not supported for datasource tables: `default`.`t`;
{code}
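
As a hedged sketch of one way to keep the 2.4.5 behavior for this pipeline, the table can be declared as a Hive table explicitly. This assumes a Spark build with Hive support; the table name and path are the ones from the example above, and the statements have not been verified against 3.0.0-preview2 here.
{code}
-- Assumption: naming a Hive storage format makes this a Hive table rather than a datasource table.
spark-sql> CREATE TABLE t(a STRING) STORED AS TEXTFILE;

-- LOAD DATA is rejected only for datasource tables, so it should be accepted for this table.
spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
{code}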

{code}
spark-sql> CREATE TABLE t(a CHAR(3));

spark-sql> INSERT INTO TABLE t SELECT 'a ';

spark-sql> SELECT a, length(a) FROM t;
a       2
{code}
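
To confirm which kind of table a bare `CREATE TABLE` produced, the table metadata can be inspected. This is a minimal sketch, assuming the table `t` from the examples above; the output layout differs between versions, so none is shown.
{code}
-- The detailed table information includes the provider of the table.
spark-sql> DESCRIBE TABLE EXTENDED t;
{code}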

  was:
We need to consider the behavior change of SPARK-30098.
This is a placeholder to keep the discussion and the final decision.

`CREATE TABLE` syntax changes its behavior silently.

The following examples show how this breaks existing user data pipelines.
*Apache Spark 2.4.5*
{code}
spark-sql> CREATE TABLE t(a STRING);
Time taken: 3.061 seconds
spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
Time taken: 0.383 seconds
spark-sql> SELECT * FROM t LIMIT 1;
# Apache Spark
Time taken: 2.05 seconds, Fetched 1 row(s)
{code}

{code}
spark-sql> CREATE TABLE t(a CHAR(3));
Time taken: 1.823 seconds
spark-sql> INSERT INTO TABLE t SELECT 'a ';
Time taken: 1.735 seconds
spark-sql> SELECT a, length(a) FROM t;
a       3
Time taken: 0.289 seconds, Fetched 1 row(s)
{code}

*Apache Spark 3.0.0-preview2*
{code}
spark-sql> CREATE TABLE t(a STRING);
Time taken: 3.969 seconds
spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
Error in query: LOAD DATA is not supported for datasource tables: `default`.`t`;
{code}

{code}
spark-sql> CREATE TABLE t(a CHAR(3));
Time taken: 2.206 seconds
spark-sql> INSERT INTO TABLE t SELECT 'a ';
Time taken: 1.935 seconds
spark-sql> SELECT a, length(a) FROM t;
a       2
Time taken: 0.45 seconds, Fetched 1 row(s)
{code}


> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-31136
>                 URL: https://issues.apache.org/jira/browse/SPARK-31136
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Blocker
>              Labels: correctness
>
> We need to consider the behavior change introduced by SPARK-30098.
> This issue is a placeholder to track the discussion and record the final decision.
> The `CREATE TABLE` syntax changes its behavior silently.
> The following examples show how this breaks existing user data pipelines.
> *Apache Spark 2.4.5*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> spark-sql> SELECT * FROM t LIMIT 1;
> # Apache Spark
> Time taken: 2.05 seconds, Fetched 1 row(s)
> {code}
> {code}
> spark-sql> CREATE TABLE t(a CHAR(3));
> spark-sql> INSERT INTO TABLE t SELECT 'a ';
> spark-sql> SELECT a, length(a) FROM t;
> a     3
> {code}
> *Apache Spark 3.0.0-preview2*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Error in query: LOAD DATA is not supported for datasource tables: `default`.`t`;
> {code}
> {code}
> spark-sql> CREATE TABLE t(a CHAR(3));
> spark-sql> INSERT INTO TABLE t SELECT 'a ';
> spark-sql> SELECT a, length(a) FROM t;
> a     2
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
