This is a fine thing to VOTE on. Committers (and the community,
non-binding) can VOTE on whatever we like; we just don't do it often
where it isn't required, because a) it's extra overhead over simple
lazy consensus, and b) it can be hard to say what a binding VOTE binds
when it's not a discrete commit or release. This is a big enough deal
that it's not overkill. The question is: what does it bind?

It means the release is definitely blocked until the items here are
done, but what's 'done'? That brings us back to the same questions
already on the table, like whether defining just the APIs is enough,
and to what degree of stability. At worst it might not resolve
anything.

I don't see much harm in nailing down what appears to be agreement at
the level of specific goals, even if this isn't a vote on a release
date or a specific commit. I think it's clear these items must be
resolved to the level of a semi-stable API by 3.0; it's coming soon,
and this is the right time to establish these APIs. A vote might
provide the clarity and constraints needed to get it over the line.
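
To make "semi-stable API" concrete, here is a minimal sketch of the
kind of catalog plugin surface under discussion. To be clear, this is
only my illustration: the type and method names below are assumptions
about the general shape, not the SPIP's actual definitions.

    import org.apache.spark.sql.types.StructType

    // Illustrative stand-ins only -- not the SPIP's actual types.
    case class Identifier(namespace: Seq[String], name: String)
    trait Table { def name: String; def schema: StructType }

    // The plugin contract: an implementation is loaded by name from
    // configuration (e.g. spark.sql.catalog.<name>=<impl class>) and
    // initialized with that name plus any options.
    trait CatalogPlugin {
      def initialize(name: String, options: Map[String, String]): Unit
      def name: String
    }

    // A table catalog: just enough surface to resolve, create, and
    // drop v2 tables.
    trait TableCatalog extends CatalogPlugin {
      def loadTable(ident: Identifier): Table
      def createTable(
          ident: Identifier,
          schema: StructType,
          properties: Map[String, String]): Table
      def dropTable(ident: Identifier): Boolean
    }

Settling even that much gives the v2 session catalog implementation
and the resolution rule a fixed target, without promising that the
interfaces can't evolve after 3.0.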

To Mark -- yeah, this is asserting that DSv2 is a primary or necessary
goal of the release, just like a "Blocker" does. Why would this
argument be different or better if it waited until 3.0 was imminent? I
get that one might say, well, we ended up working on more important
stuff in the meantime and now we don't have time. But this VOTE's
purpose is to declare that this is the important stuff now.

To Jose -- what's the "just a few PRs in review" issue? Do you worry
that we might rush DSv2 at the end to meet a deadline? All the better,
if anything, to agree it's important now. This is also an agreement to
delay the release for it, not to rush it. I don't see that later is a
better time to make the decision, if rushing is the worry.

Given my definition and understanding of the issues, I'd say +1.

On Thu, Feb 28, 2019 at 12:24 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
>
> Mark, I disagree. Setting common goals is a critical part of getting things 
> done.
>
> This doesn't commit the community to push out the release if the goals aren't 
> met, but does mean that we will, as a community, seriously consider it. This 
> is also an acknowledgement that this is the most important feature in the 
> next release (whether major or minor) for many of us. This has been in limbo 
> for a very long time, so I think it is important for the community to commit 
> to getting it to a functional state.
>
> It sounds like your objection is to this commitment for 3.0, but remember 
> that 3.0 is the next release so that we can remove deprecated APIs. It does 
> not mean that we aren't adding new features in that release and aren't 
> considering other goals.
>
> On Thu, Feb 28, 2019 at 10:12 AM Mark Hamstra <m...@clearstorydata.com> wrote:
>>
>> Then I'm -1. Setting new features as blockers of major releases is not 
>> proper project management, IMO.
>>
>> On Thu, Feb 28, 2019 at 10:06 AM Ryan Blue <rb...@netflix.com> wrote:
>>>
>>> Mark, if this goal is adopted, "we" is the Apache Spark community.
>>>
>>> On Thu, Feb 28, 2019 at 9:52 AM Mark Hamstra <m...@clearstorydata.com> 
>>> wrote:
>>>>
>>>> Who is "we" in these statements, such as "we should consider a functional 
>>>> DSv2 implementation a blocker for Spark 3.0"? If it means those 
>>>> contributing to the DSv2 effort want to set their own goals, milestones, 
>>>> etc., then that is fine with me. If you mean that the Apache Spark project 
>>>> should officially commit to the lack of a functional DSv2 implementation 
>>>> being a blocker for the release of Spark 3.0, then I'm -1. A major release 
>>>> is just not about adding new features. Rather, it is about making changes 
>>>> to the existing public API. As such, I'm opposed to any new feature or any 
>>>> API addition being considered a blocker of the 3.0.0 release.
>>>>
>>>>
>>>> On Thu, Feb 28, 2019 at 9:09 AM Matt Cheah <mch...@palantir.com> wrote:
>>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Are identifiers and namespaces going to be rolled under one of those six 
>>>>> points?
>>>>>
>>>>> From: Ryan Blue <rb...@netflix.com.INVALID>
>>>>> Reply-To: "rb...@netflix.com" <rb...@netflix.com>
>>>>> Date: Thursday, February 28, 2019 at 8:39 AM
>>>>> To: Spark Dev List <dev@spark.apache.org>
>>>>> Subject: [VOTE] Functional DataSourceV2 in Spark 3.0
>>>>>
>>>>> I’d like to call a vote for committing to getting DataSourceV2 in a 
>>>>> functional state for Spark 3.0.
>>>>>
>>>>> For more context, please see the discussion thread, but here is a quick 
>>>>> summary of what this commitment means:
>>>>>
>>>>> - We think that a “functional DSv2” is an achievable goal for the 
>>>>> Spark 3.0 release
>>>>>
>>>>> - We will consider this a blocker for Spark 3.0, and take 
>>>>> reasonable steps to make it happen
>>>>>
>>>>> - We will not delay the release without a community discussion
>>>>>
>>>>> Here’s what we’ve defined as a functional DSv2:
>>>>>
>>>>> - Add a plugin system for catalogs
>>>>>
>>>>> - Add an interface for table catalogs (see the ongoing SPIP vote)
>>>>>
>>>>> - Add an implementation of the new interface that calls 
>>>>> SessionCatalog to load v2 tables
>>>>>
>>>>> - Add a resolution rule to load v2 tables from the v2 catalog
>>>>>
>>>>> - Add CTAS logical and physical plan nodes
>>>>>
>>>>> - Add conversions from SQL parsed plans to v2 logical plans 
>>>>> (e.g., INSERT INTO support)
>>>>>
>>>>> Please vote in the next 3 days on whether you agree with committing to 
>>>>> this goal.
>>>>>
>>>>> [ ] +1: Agree that we should consider a functional DSv2 implementation a 
>>>>> blocker for Spark 3.0
>>>>> [ ] +0: . . .
>>>>> [ ] -1: I disagree with this goal because . . .
>>>>>
>>>>> Thank you!
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Software Engineer
>>>>> Netflix
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>
> --
> Ryan Blue
> Software Engineer
> Netflix
