Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

Xiao Li Mon, 11 May 2020 00:30:17 -0700

>
> 1. Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default,
> which effectively reverts SPARK-30098. The CREATE TABLE syntax is still
> confusing, but it's the same as 2.4.
> 2. Do not support the v2 CreateTable command if STORED AS/BY or EXTERNAL is
> specified. This gives us more time to think about how to do it in 3.1.
>


I prefer to first turn on *spark.sql.legacy.createHiveTableByDefault.enabled*
by default and then start RC2.

We can still keep working on option 2 if we can finish it within 10
days. BTW, we still have multiple ongoing discussions about data source v2
APIs. To be honest, most Spark users will not hit these cases in Spark 3.0.
Thus, temporarily blocking a few cases in DSV2 looks reasonable to me. We
can support them in Spark 3.1.
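
To make "temporarily blocking a few cases" concrete, here is a rough sketch of
the check that option 2 implies. It is plain Python, not Spark code, and every
name in it (CreateTableSpec, check_create_table, UnsupportedForV2Catalog) is
hypothetical and chosen only for illustration. It assumes the parser has
already reduced a CREATE TABLE statement to a few optional clauses.

```python
from dataclasses import dataclass
from typing import Optional


class UnsupportedForV2Catalog(Exception):
    """Hypothetical error: the clause cannot yet be mapped to a v2 catalog call."""


@dataclass
class CreateTableSpec:
    # Hypothetical, simplified view of a parsed CREATE TABLE statement.
    name: str
    provider: Optional[str] = None   # USING <provider>
    stored_as: Optional[str] = None  # STORED AS <format>
    stored_by: Optional[str] = None  # STORED BY <handler>
    external: bool = False           # CREATE EXTERNAL TABLE


def check_create_table(spec: CreateTableSpec, is_v2_catalog: bool) -> None:
    """Raise if the statement is invalid or blocked; return None otherwise."""
    hive_style = spec.stored_as is not None or spec.stored_by is not None

    # Rule from the thread: STORED AS/BY cannot co-exist with USING provider.
    if hive_style and spec.provider is not None:
        raise ValueError("STORED AS/BY cannot be combined with USING")

    # Option 2: for a v2 catalog, reject the clauses whose translation to
    # catalog plugin API calls has not been agreed on yet.
    if is_v2_catalog and (hive_style or spec.external):
        raise UnsupportedForV2Catalog(
            f"CREATE TABLE {spec.name}: STORED AS/BY and EXTERNAL are not "
            "supported for v2 catalogs"
        )
```

With this shape, a statement that only uses USING passes for both kinds of
catalog, and the built-in catalog keeps accepting STORED AS and EXTERNAL, so
only the v2 path is restricted.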

Xiao





On Sun, May 10, 2020 at 9:32 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> Let's focus on how to unblock Spark 3.0.0 for now, as other blockers are
> getting resolved.
>
> I'm in favor of option 1, to avoid bringing multiple backward-incompatible
> changes. Unifying create table would bring backward incompatibility (I'd
> rather say the new syntax should be cleaned up without regard to backward
> compatibility), and we'd better not force end users to adopt the
> changes twice.
>
> On Fri, May 8, 2020 at 11:22 PM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> Hi all,
>>
>> I'd like to bring this up again to share the status and get more
>> feedback. Currently, we all agree to unify the CREATE TABLE syntax by
>> merging the native and Hive-style syntaxes.
>>
>> The unified CREATE TABLE syntax will become the native syntax and there
>> is no Hive-style syntax anymore. This brings several changes:
>> 1. support PARTITIONED BY (col type, ...). This can't co-exist with
>> PARTITIONED BY (col, ...), and simply adds partition columns to the end.
>> 2. support SKEWED BY, which just fails
>> 3. support STORED AS/BY, which can't co-exist with USING provider
>> 4. support EXTERNAL as well
>>
>> All the behaviors will remain the same as before, for the builtin
>> catalog. However, the native CREATE TABLE syntax needs to support the v2
>> CreateTable command and we need to translate the new syntax changes to
>> catalog plugin API calls, and we are still working on reaching an agreement
>> about how to do it.
>>
>> To unblock 3.0, I think there are two choices:
>> 1. Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default,
>> which effectively reverts SPARK-30098. The CREATE TABLE syntax is still
>> confusing, but it's the same as 2.4.
>> 2. Do not support the v2 CreateTable command if STORED AS/BY or EXTERNAL is
>> specified. This gives us more time to think about how to do it in 3.1.
>>
>> If you have other ideas, please reply to this thread.
>>
>> Thanks,
>> Wenchen
>>
>> On Thu, Mar 26, 2020 at 7:28 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Thanks, filed SPARK-31257
>>> <https://issues.apache.org/jira/browse/SPARK-31257>. Thanks again for
>>> looking into this. I'll take a look as soon as I get time.
>>>
>>> On Thu, Mar 26, 2020 at 8:06 AM Ryan Blue <rb...@netflix.com> wrote:
>>>
>>>> Feel free to open another issue, I just used that one since it
>>>> describes this and doesn't appear to be done.
>>>>
>>>> On Wed, Mar 25, 2020 at 4:03 PM Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> UPDATE: Sorry, I just missed the PR (
>>>>> https://github.com/apache/spark/pull/28026). I still think it'd be
>>>>> nice to avoid recycling a JIRA issue that was already resolved. Shall we
>>>>> file a new JIRA issue linking to SPARK-30098 and set a proper priority?
>>>>>
>>>>> On Thu, Mar 26, 2020 at 7:59 AM Jungtaek Lim <
>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>
>>>>>> Would it be better to prioritize this to make sure the change is
>>>>>> included in Spark 3.0? (Maybe by filing an issue and setting it as a
>>>>>> blocker.)
>>>>>>
>>>>>> Looks like there's consensus that SPARK-30098 introduced an ambiguity
>>>>>> that should be fixed (though opinions on its severity seem to
>>>>>> differ), and now that we have noticed the issue it would be really odd
>>>>>> to publish it as it is and try to fix it later (the fix may not even be
>>>>>> included in 3.0.x, as it might bring a behavioral change).
>>>>>>
>>>>>> On Tue, Mar 24, 2020 at 3:37 PM Wenchen Fan <cloud0...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Ryan,
>>>>>>>
>>>>>>> It's great to hear that you are cleaning up this long-standing mess.
>>>>>>> Please let me know if you hit any problems that I can help with.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Wenchen
>>>>>>>
>>>>>>> On Sat, Mar 21, 2020 at 3:16 AM Nicholas Chammas <
>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>
>>>>>>>> On Thu, Mar 19, 2020 at 3:46 AM Wenchen Fan <cloud0...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> 2. PARTITIONED BY colTypeList: I think we can support it in the
>>>>>>>>> unified syntax. Just make sure it doesn't appear together with
>>>>>>>>> PARTITIONED BY transformList.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Another side note: Perhaps as part of (or after) unifying the
>>>>>>>> CREATE TABLE syntax, we can also update Catalog.createTable() to
>>>>>>>> support creating partitioned tables
>>>>>>>> <https://issues.apache.org/jira/browse/SPARK-31001>.
>>>>>>>>
>>>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>

