Hi, All.

It's great to see community activities to polish 4.0.0 more and more.
Thank you all.

I'd like to bring SPARK-46122 (another SQL topic) to your attention from
the subtasks of SPARK-44444 (Prepare Apache Spark 4.0.0):

- https://issues.apache.org/jira/browse/SPARK-46122
   Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

This legacy configuration is about the `CREATE TABLE` SQL syntax without
`USING` and `STORED AS`, which is currently mapped to a `Hive` table.
The proposal of SPARK-46122 is to switch the default value of this
configuration from `true` to `false` so that Spark native tables are used,
because we support them better.

In other words, Spark will use the value of `spark.sql.sources.default`
as the table provider instead of `Hive`, like the other Spark APIs. Of
course, users can restore the legacy behavior by setting the configuration
back to `true`.
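
To make the rule concrete, here is a minimal sketch in plain Python of how
the provider would be chosen under the description above. It is only a toy
model, not Spark's actual code: the function name and its parameters are
hypothetical, and `parquet` is assumed as the value of
`spark.sql.sources.default`.

```python
from typing import Optional


def resolve_table_provider(
    using: Optional[str],
    stored_as: Optional[str],
    create_hive_table_by_default: bool,
    sources_default: str = "parquet",  # assumed value of spark.sql.sources.default
) -> str:
    """Toy model (hypothetical helper, not a Spark API) of provider choice.

    `create_hive_table_by_default` stands in for
    `spark.sql.legacy.createHiveTableByDefault`.
    """
    if using is not None and stored_as is not None:
        # Combining both clauses is outside the scope of this sketch.
        raise ValueError("USING together with STORED AS is not modeled here")
    if using is not None:
        # An explicit USING clause always names the provider.
        return using
    if stored_as is not None:
        # STORED AS is Hive syntax, so the table is a Hive table.
        return "hive"
    # Neither clause is given: this is the case the configuration controls.
    return "hive" if create_hive_table_by_default else sources_default


# Current default (true): a plain CREATE TABLE becomes a Hive table.
print(resolve_table_provider(None, None, create_hive_table_by_default=True))
# Proposed default (false): the default data source is used instead.
print(resolve_table_provider(None, None, create_hive_table_by_default=False))
```

Only the last branch changes with the proposal. Statements that already
specify `USING` or `STORED AS` resolve the same way under either setting.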

Historically, this behavior change was already merged once during the
Apache Spark 3.0.0 preparation via SPARK-30098, but it was reverted during
the 3.0.0 RC period.

2019-12-06: SPARK-30098 Use default datasource as provider for CREATE TABLE
2020-05-16: SPARK-31707 Revert SPARK-30098 Use default datasource as
            provider for CREATE TABLE command

For Apache Spark 3.1.0, we had another discussion about this and defined it
as a legacy behavior controlled by this configuration, reusing the same
JIRA ID, SPARK-30098.

2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204
2020-12-03: SPARK-30098 Add a configuration to use default datasource as
            provider for CREATE TABLE command

Last year, we received two additional requests to switch this default
because Apache Spark 4.0.0 is a good time to make a decision about the
future direction.

2023-02-27: SPARK-42603 as an independent idea.
2023-11-27: SPARK-46122 as a part of the Apache Spark 4.0.0 preparation.


WDYT? The technical scope is defined in the following PR, which consists of
one line of main code, one line of migration guide, and a few lines of test
code.

- https://github.com/apache/spark/pull/46207

Dongjoon.
