Technically, I have struggled with (1) `CREATE TABLE` for a long time
(since 2017) because of its many differences. So I made a wrong assumption
about the implication of "(2) FYI: SPARK-30098 Use default datasource as
provider for CREATE TABLE syntax", Reynold. I admit that. You may not feel
the same way. However, it was a lot to me. Also, switching
`convertMetastoreOrc` at 2.4 was a big change to me, although it made no
difference for Parquet-only users.


> References:
> 1. "CHAR implementation?", 2017/09/15
> 2. "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE
> syntax", 2019/12/06
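
For readers following the CHAR/VARCHAR options in the quoted thread, here is
a minimal Python sketch (not Spark code) of the semantics at stake: CHAR(n)
with padding, CHAR(n) treated as a plain string alias, and VARCHAR(n) with or
without length enforcement. All function names are hypothetical, and raising
an error on over-long values is an assumption made for illustration; real
engines and data sources differ.

```python
def char_padded(value: str, n: int) -> str:
    """CHAR(n) with padding: right-pad with spaces up to n characters.

    Assumption: over-long values are rejected rather than truncated.
    """
    if len(value) > n:
        raise ValueError(f"value longer than CHAR({n})")
    return value.ljust(n)


def char_as_string(value: str, n: int) -> str:
    """CHAR(n) as an alias for string: no padding, no length check."""
    return value


def varchar_enforced(value: str, n: int) -> str:
    """VARCHAR(n) with the length limit enforced (assumed to raise)."""
    if len(value) > n:
        raise ValueError(f"value longer than VARCHAR({n})")
    return value


def varchar_unenforced(value: str, n: int) -> str:
    """VARCHAR(n) where the underlying source ignores the length limit."""
    return value


if __name__ == "__main__":
    # The same logical value differs depending on whether padding applies.
    padded = char_padded("ab", 5)
    plain = char_as_string("ab", 5)
    print(repr(padded), repr(plain), padded == plain)

    # Below the limit, enforced and unenforced VARCHAR behave identically.
    print(varchar_enforced("abc", 5) == varchar_unenforced("abc", 5))
```

The two CHAR variants return different values for the same input, which is
the cross-source inconsistency the proposal is concerned with. The two
VARCHAR variants only diverge once a value exceeds the declared length.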

On Thu, Mar 19, 2020 at 8:47 PM Reynold Xin <> wrote:

> You are joking when you said "informed widely and discussed in many ways
> twice", right?
> This thread doesn't even talk about char/varchar:
> (Yes it talked about changing the default data source provider, but that's
> just one of the ways we are exposing this char/varchar issue).
> On Thu, Mar 19, 2020 at 8:41 PM, Dongjoon Hyun <>
> wrote:
>> +1 for Wenchen's suggestion.
>> I believe that the difference and effects are informed widely and
>> discussed in many ways twice.
>> First, this was shared last December.
>>     "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE
>> syntax", 2019/12/06
>> Second (at this time in this thread), this has been discussed according
>> to the new community rubric.
>>     - (Section:
>> "Considerations When Breaking APIs")
>> Thank you all.
>> Bests,
>> Dongjoon.
>> On Tue, Mar 17, 2020 at 10:41 PM Wenchen Fan <> wrote:
>>> OK let me put a proposal here:
>>> 1. Permanently ban CHAR for native data source tables, and only keep it
>>> for Hive compatibility.
>>> It's OK to forget about padding like what Snowflake and MySQL have done.
>>> But it's hard for Spark to require consistent behavior about CHAR type in
>>> all data sources. Since CHAR type is not that useful nowadays, seems OK to
>>> just ban it. Another way is to document that the padding of CHAR type is
>>> data source dependent, but it's a bit weird to leave this inconsistency in
>>> Spark.
>>> 2. Leave VARCHAR unchanged in 3.0
>>> VARCHAR type is so widely used in databases and it's weird if Spark
>>> doesn't support it. VARCHAR type is exactly the same as Spark StringType
>>> when the length limitation is not hit, and I'm fine to temporarily leave
>>> this flaw in 3.0 and users may hit behavior changes when the string values
>>> hit the VARCHAR length limitation.
>>> 3. Finalize the VARCHAR behavior in 3.1
>>> For now I have 2 ideas:
>>> a) Make VARCHAR(x) a first-class data type. This means Spark data
>>> sources should support VARCHAR, and CREATE TABLE should fail if a column is
>>> VARCHAR type and the underlying data source doesn't support it (e.g.
>>> JSON/CSV). Type cast, type coercion, table insertion, etc. should be
>>> updated as well.
>>> b) Simply document that the underlying data source may or may not
>>> enforce the length limitation of VARCHAR(x).
>>> Please let me know if you have different ideas.
>>> Thanks,
>>> Wenchen
>>> On Wed, Mar 18, 2020 at 1:08 AM Michael Armbrust <>
>>> wrote:
>>>>> What I'd oppose is to just ban char for the native data sources, and do
>>>>> not have a plan to address this problem systematically.
>>>> +1
>>>>> Just forget about padding, like what Snowflake and MySQL have done.
>>>>> Document that char(x) is just an alias for string. And then move on.
>>>>> Almost no work needs to be done...
>>>> +1
