Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

Jungtaek Lim Tue, 12 May 2020 16:23:23 -0700

Never mind, forget about the dead code. Turned out that reverting "via
manual" can be very easily done - remove config and apply to the tests ->
remove field -> remove the changes into docs. It was considered as
non-trivial because we only consider about "git revert" but there's no
strict rule to do so.


Please take a look at this PR, https://github.com/apache/spark/pull/28517

On Wed, May 13, 2020 at 3:10 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> I think that the dead code approach, while a bit unpalatable and worse
> than reverting, is probably better than leaving the parameter (even if it
> is hidden)
>
> On Tue, May 12, 2020 at 12:46 PM Ryan Blue <rb...@netflix.com> wrote:
>
>> +1 for the approach Jungtaek suggests. That will avoid needing to support
>> behavior that is not well understood with minimal changes.
>>
>> On Tue, May 12, 2020 at 1:45 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Before I forget, we'd better not forget to change the doc, as create
>>> table doc looks to represent current syntax which will be incorrect later.
>>>
>>> On Tue, May 12, 2020 at 5:32 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> It's not only for end users, but also for us. Spark itself uses the
>>>> config "true" and "false" in tests and it still brings confusion. We still
>>>> have to deal with both situations.
>>>>
>>>> I'm wondering how long days it would be needed to revert it cleanly,
>>>> but if we worry about the amount of code change just around the new RC,
>>>> what about make the code dirty (should be fixed soon) but less headache via
>>>> applying traditional (and bad) way?
>>>>
>>>> Let's just remove the config so that the config cannot be used in any
>>>> way (even in Spark codebase), and set corresponding field in parser to the
>>>> constant value so that no one can modify in any way. This would make the
>>>> dead code by intention which should be cleaned it up later, so let's add
>>>> FIXME comment there so that anyone can take it up for cleaning up the code
>>>> later. (If no one volunteers then I'll probably pick up.)
>>>>
>>>> That is a bad pattern, but still better as we prevent end users (even
>>>> early adopters) go through the undocumented path in any way, and that will
>>>> be explicitly marked as "should be fixed". This is different from retaining
>>>> config - I don't expect unified create table syntax will be landed in
>>>> bugfix version, so even unified create table syntax can be landed in 3.1.0
>>>> (this is also not guaranteed) the config will live in 3.0.x in any way. If
>>>> we temporarily go dirty way then we can clean up the code in any version,
>>>> even from bugfix version, maybe within a couple of weeks just after 3.0.0
>>>> is released.
>>>>
>>>> Does it sound valid?
>>>>
>>>> On Tue, May 12, 2020 at 2:35 PM Wenchen Fan <cloud0...@gmail.com>
>>>> wrote:
>>>>
>>>>> SPARK-30098 was merged about 6 months ago. It's not a clean revert and
>>>>> we may need to spend quite a bit of time to resolve conflicts and fix 
>>>>> tests.
>>>>>
>>>>> I don't see why it's still a problem if a feature is disabled and
>>>>> hidden from end-users (it's undocumented, the config is internal). The
>>>>> related code will be replaced in the master branch sooner or later, when 
>>>>> we
>>>>> unify the syntaxes.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, May 12, 2020 at 6:16 AM Ryan Blue <rb...@netflix.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> I'm all for getting the unified syntax into master. The only issue
>>>>>> appears to be whether or not to pass the presence of the EXTERNAL keyword
>>>>>> through to a catalog in v2. Maybe it's time to start a discuss thread for
>>>>>> that issue so we're not stuck for another 6 weeks on it.
>>>>>>
>>>>>> On Mon, May 11, 2020 at 3:13 PM Jungtaek Lim <
>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>>> Btw another wondering here is, is it good to retain the flag on
>>>>>>> master as an intermediate step? Wouldn't it be better for us to
>>>>>>> start "unified create table syntax" from scratch?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, May 12, 2020 at 6:50 AM Jungtaek Lim <
>>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I'm sorry, but I have to agree with Ryan and Russell. I chose the
>>>>>>>> option 1 because it's less worse than option 2, but it doesn't mean I 
>>>>>>>> fully
>>>>>>>> agree with option 1.
>>>>>>>>
>>>>>>>> Let's make below things clear if we really go with option 1,
>>>>>>>> otherwise please consider reverting it.
>>>>>>>>
>>>>>>>> * Do you fully indicate about "all" the paths where the second
>>>>>>>> create table syntax is taken?
>>>>>>>> * Could you explain "why" to end users without any confusion? Do
>>>>>>>> you think end users will understand it easily?
>>>>>>>> * Do you have an actual end users to guide to turn this on? Or do
>>>>>>>> you have a plan to turn this on for your team/customers and deal with
>>>>>>>> the ambiguity?
>>>>>>>> * Could you please document about how things will change if the
>>>>>>>> flag is turned on?
>>>>>>>>
>>>>>>>> I guess the option 1 is to leave a flag as "undocumented" one and
>>>>>>>> forget about the path to turn on, but I think that would lead to make 
>>>>>>>> the
>>>>>>>> feature be "broken window" even we are not able to touch.
>>>>>>>>
>>>>>>>> On Tue, May 12, 2020 at 6:45 AM Russell Spitzer <
>>>>>>>> russell.spit...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I think reverting 30098 is the right decision here if we want to
>>>>>>>>> unblock 3.0. We shouldn't ship with features which we know do not 
>>>>>>>>> function
>>>>>>>>> in the way we intend, regardless of how little exposure most users 
>>>>>>>>> have to
>>>>>>>>> them. Even if it's off my default, we should probably work to avoid
>>>>>>>>> switches that cause things to behave unpredictably or require a flow 
>>>>>>>>> chart
>>>>>>>>> to actually determine what will happen.
>>>>>>>>>
>>>>>>>>> On Mon, May 11, 2020 at 3:07 PM Ryan Blue
>>>>>>>>> <rb...@netflix.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> I'm all for fixing behavior in master by turning this off as an
>>>>>>>>>> intermediate step, but I don't think that Spark 3.0 can safely 
>>>>>>>>>> include
>>>>>>>>>> SPARK-30098.
>>>>>>>>>>
>>>>>>>>>> The problem is that SPARK-30098 introduces strange behavior, as
>>>>>>>>>> Jungtaek pointed out. And that behavior is not fully understood. 
>>>>>>>>>> While
>>>>>>>>>> working on a unified CREATE TABLE syntax, I hit additional test
>>>>>>>>>> failures
>>>>>>>>>> <https://github.com/apache/spark/pull/28026#issuecomment-606967363>
>>>>>>>>>> where the wrong create path was being used.
>>>>>>>>>>
>>>>>>>>>> Unless we plan to NOT support the behavior
>>>>>>>>>> when spark.sql.legacy.createHiveTableByDefault.enabled is disabled, 
>>>>>>>>>> we
>>>>>>>>>> should not ship Spark 3.0 with SPARK-30098. Otherwise, we will have 
>>>>>>>>>> to deal
>>>>>>>>>> with this problem for years to come.
>>>>>>>>>>
>>>>>>>>>> On Mon, May 11, 2020 at 1:06 AM JackyLee <qcsd2...@163.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1. Agree with Xiao Li and Jungtaek Lim.
>>>>>>>>>>>
>>>>>>>>>>> This seems to be controversial, and can not be done in a short
>>>>>>>>>>> time. It is
>>>>>>>>>>> necessary to choose option 1 to unblock Spark 3.0 and support it
>>>>>>>>>>> in 3.1.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Sent from:
>>>>>>>>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ryan Blue
>>>>>>>>>> Software Engineer
>>>>>>>>>> Netflix
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Software Engineer
>>>>>> Netflix
>>>>>>
>>>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

Reply via email to