I'm reviving this thread because this feature was reverted before the 3.0 release, and now we are trying to add it back since the CREATE TABLE syntax is unified.
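
For concreteness, here is a rough sketch of the statement forms in question. The table and column names are made up for illustration, and the comments only restate the behavior discussed in this thread, assuming the default data source is left at Parquet:

```sql
-- No USING and no STORED AS: the form whose default is under discussion.
-- Today this creates a Hive text table; with the proposed change it
-- would create a Spark native (Parquet) table instead.
CREATE TABLE t1 (id INT, name STRING);

-- Explicit USING clause: a Spark native table, not affected by the change.
CREATE TABLE t2 (id INT, name STRING) USING parquet;

-- Explicit Hive syntax: a Hive text table, not affected by the change.
CREATE TABLE t3 (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```
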

The benefits are pretty clear: CREATE TABLE by default (without USING or STORED AS) should create native tables that work best with Spark. You can see all the benefits listed in https://github.com/apache/spark/pull/30554.

I'm sending this email to collect feedback about the risks. AFAIK the broken use cases are:

1. A user issues `CREATE TABLE ... LOCATION ...`. After some table insertions, they want to read the data files directly from the table location. Because the file format changes from Hive text to Parquet, this use case may break.
2. A user issues `CREATE TABLE ...` and then runs `ALTER TABLE SET SERDE` or `LOAD DATA`. These are Hive-specific commands and don't work with Spark native tables.
3. A user issues `CREATE TABLE ...` and then uses Hive to add partitions with different serdes to this table. Spark doesn't allow a native partitioned table to have partitions with different formats.

From my personal experience, Hive text tables are usually used to import CSV-like data. It's very likely that people create Hive text tables explicitly, as they need the Hive syntax to specify options like the delimiter. Besides, I'm not sure how many Spark users are using this feature, as the native CSV data source can do the same job.

I'd consider it a bad user experience if a simple `CREATE TABLE` gives users a very slow table. Changing it to return a native Parquet table doesn't seem likely to break many users, but I could be wrong.

Please reply to this thread if you know of more use cases that may be affected by this change, and share your thoughts.

Thanks,
Wenchen

On Sat, Dec 7, 2019 at 1:58 PM Takeshi Yamamuro <linguin....@gmail.com> wrote:

> Oh, looks nice. Thanks for sharing, Dongjoon
>
> Bests,
> Takeshi
>
> On Sat, Dec 7, 2019 at 3:35 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Hi, All.
>>
>> I want to share the following change with the community.
>>
>> SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
>>
>> This is merged today and now Spark's `CREATE TABLE` is using Spark's
>> default data sources instead of `hive` provider. This is a good and big
>> improvement for Apache Spark 3.0, but this might surprise someone. (Please
>> note that there is a fallback option for them.)
>>
>> Thank you, Yi, Wenchen, Xiao.
>>
>> Cheers,
>> Dongjoon.
>>
>
>
> --
> ---
> Takeshi Yamamuro
>