Technically, I have been struggling with (1) `CREATE TABLE` because of its many differences for a long time (since 2017). So I made a wrong assumption about the implications of "(2) FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax", Reynold. I admit that. You may not feel the same way, but it was a lot to me. Also, switching `convertMetastoreOrc` in 2.4 was a big change to me, although it makes no difference for Parquet-only users.
Dongjoon.

> References:
> 1. "CHAR implementation?", 2017/09/15
> https://lists.apache.org/thread.html/96b004331d9762e356053b5c8c97e953e398e489d15e1b49e775702f%40%3Cdev.spark.apache.org%3E
> 2. "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax", 2019/12/06
> https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E

On Thu, Mar 19, 2020 at 8:47 PM Reynold Xin <r...@databricks.com> wrote:

> You are joking when you said " informed widely and discussed in many ways
> twice" right?
>
> This thread doesn't even talk about char/varchar:
> https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E
>
> (Yes it talked about changing the default data source provider, but that's
> just one of the ways we are exposing this char/varchar issue).
>
> On Thu, Mar 19, 2020 at 8:41 PM, Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> +1 for Wenchen's suggestion.
>>
>> I believe that the difference and effects are informed widely and
>> discussed in many ways twice.
>>
>> First, this was shared on last December.
>>
>> "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE
>> syntax", 2019/12/06
>>
>> https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E
>>
>> Second (at this time in this thread), this has been discussed according
>> to the new community rubric.
>>
>> - https://spark.apache.org/versioning-policy.html (Section:
>> "Considerations When Breaking APIs")
>>
>> Thank you all.
>>
>> Bests,
>> Dongjoon.
>>
>> On Tue, Mar 17, 2020 at 10:41 PM Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>>> OK let me put a proposal here:
>>>
>>> 1. Permanently ban CHAR for native data source tables, and only keep it
>>> for Hive compatibility.
>>> It's OK to forget about padding like what Snowflake and MySQL have done.
>>> But it's hard for Spark to require consistent behavior about CHAR type in
>>> all data sources. Since CHAR type is not that useful nowadays, seems OK to
>>> just ban it. Another way is to document that the padding of CHAR type is
>>> data source dependent, but it's a bit weird to leave this inconsistency in
>>> Spark.
>>>
>>> 2. Leave VARCHAR unchanged in 3.0
>>> VARCHAR type is so widely used in databases and it's weird if Spark
>>> doesn't support it. VARCHAR type is exactly the same as Spark StringType
>>> when the length limitation is not hit, and I'm fine to temporarily leave
>>> this flaw in 3.0 and users may hit behavior changes when the string values
>>> hit the VARCHAR length limitation.
>>>
>>> 3. Finalize the VARCHAR behavior in 3.1
>>> For now I have 2 ideas:
>>> a) Make VARCHAR(x) a first-class data type. This means Spark data
>>> sources should support VARCHAR, and CREATE TABLE should fail if a column is
>>> VARCHAR type and the underlying data source doesn't support it (e.g.
>>> JSON/CSV). Type cast, type coercion, table insertion, etc. should be
>>> updated as well.
>>> b) Simply document that, the underlying data source may or may not
>>> enforce the length limitation of VARCHAR(x).
>>>
>>> Please let me know if you have different ideas.
>>>
>>> Thanks,
>>> Wenchen
>>>
>>> On Wed, Mar 18, 2020 at 1:08 AM Michael Armbrust <mich...@databricks.com>
>>> wrote:
>>>
>>>>> What I'd oppose is to just ban char for the native data sources, and do
>>>>> not have a plan to address this problem systematically.
>>>>
>>>> +1
>>>>
>>>>> Just forget about padding, like what Snowflake and MySQL have done.
>>>>> Document that char(x) is just an alias for string. And then move on.
>>>>> Almost no work needs to be done...
>>>>
>>>> +1
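
For readers following the thread, the three behaviors being weighed above (CHAR(x)/VARCHAR(x) as a plain string alias, CHAR(x) with padding, and VARCHAR(x) with an enforced length limit) can be sketched in a few lines of plain Python. This is only an illustration of the semantics under discussion, not Spark's or any data source's implementation. The function names are hypothetical, and raising an error on over-long values is an assumption made for the sketch.

```python
def as_string_alias(value: str, length: int) -> str:
    """CHAR(x)/VARCHAR(x) documented as an alias for string.

    The declared length is ignored: no padding, no length check.
    """
    return value


def as_padded_char(value: str, length: int) -> str:
    """CHAR(x) with padding: shorter values are right-padded with spaces.

    Assumption for this sketch: over-long values are rejected.
    """
    if len(value) > length:
        raise ValueError(f"value exceeds CHAR({length})")
    return value.ljust(length)


def as_checked_varchar(value: str, length: int) -> str:
    """VARCHAR(x) with an enforced limit: identical to a plain string
    until the length limitation is hit.

    Assumption for this sketch: over-long values are rejected.
    """
    if len(value) > length:
        raise ValueError(f"value exceeds VARCHAR({length})")
    return value


if __name__ == "__main__":
    # Same input and declared length, three different stored results.
    for fn in (as_string_alias, as_padded_char, as_checked_varchar):
        print(fn.__name__, repr(fn("ab", 5)))
```

The three functions agree whenever the value is exactly the declared length. They diverge for shorter values (padding) and for longer values (enforcement), which are the cases where a table's behavior can change depending on which data source or provider backs it.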