+1 for Wenchen's suggestion. I believe the differences and their effects have been widely communicated and discussed at length on two occasions.
First, this was shared last December:
"FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax", 2019/12/06
https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E

Second (this time, in this thread), this has been discussed according to the new community rubric.
- https://spark.apache.org/versioning-policy.html (Section: "Considerations When Breaking APIs")

Thank you all.

Bests,
Dongjoon.

On Tue, Mar 17, 2020 at 10:41 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> OK let me put a proposal here:
>
> 1. Permanently ban CHAR for native data source tables, and only keep it
> for Hive compatibility.
> It's OK to forget about padding like what Snowflake and MySQL have done.
> But it's hard for Spark to require consistent behavior about CHAR type in
> all data sources. Since CHAR type is not that useful nowadays, seems OK to
> just ban it. Another way is to document that the padding of CHAR type is
> data source dependent, but it's a bit weird to leave this inconsistency in
> Spark.
>
> 2. Leave VARCHAR unchanged in 3.0
> VARCHAR type is so widely used in databases and it's weird if Spark
> doesn't support it. VARCHAR type is exactly the same as Spark StringType
> when the length limitation is not hit, and I'm fine to temporarily leave
> this flaw in 3.0 and users may hit behavior changes when the string values
> hit the VARCHAR length limitation.
>
> 3. Finalize the VARCHAR behavior in 3.1
> For now I have 2 ideas:
> a) Make VARCHAR(x) a first-class data type. This means Spark data sources
> should support VARCHAR, and CREATE TABLE should fail if a column is VARCHAR
> type and the underlying data source doesn't support it (e.g. JSON/CSV).
> Type cast, type coercion, table insertion, etc. should be updated as well.
> b) Simply document that, the underlying data source may or may not enforce
> the length limitation of VARCHAR(x).
>
> Please let me know if you have different ideas.
>
> Thanks,
> Wenchen
>
> On Wed, Mar 18, 2020 at 1:08 AM Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> What I'd oppose is to just ban char for the native data sources, and do
>>> not have a plan to address this problem systematically.
>>>
>>
>> +1
>>
>>
>>> Just forget about padding, like what Snowflake and MySQL have done.
>>> Document that char(x) is just an alias for string. And then move on. Almost
>>> no work needs to be done...
>>>
>>
>> +1
>>
>>
>