I’m sure we, as a community, will seriously consider any proposal that
Spark would benefit from the PMC delaying release X to include changes A,
B, and C.
Indeed, every release I remember has had a few iterations of “can we hold
the train for a bit because it would be super great to get this PR in”.

Many contributors (including me) do believe data source v2 should be done
by 3.0, can mark the appropriate JIRAs as blockers, and will, at release
time, argue in favor of holding the train for a week or two if that’s what
it takes to get all the pieces on board.

What the vote seems to imply is that we will consider holding the release
even beyond the “just a few PRs in review” level, if there are serious
outstanding design or implementation questions. That’s not a judgment I
think we can make in advance. Is it better to delay Spark 3.0 by N months
or DSv2 by 6 months? Who knows? It depends on the PMC’s priorities at the
time and on how confident we are in the value of N.

On Thu, Feb 28, 2019 at 10:24 AM Ryan Blue <rb...@netflix.com.invalid>
wrote:

> Mark, I disagree. Setting common goals is a critical part of getting
> things done.
>
> This doesn't commit the community to delaying the release if the goals
> aren't met, but it does mean that we will, as a community, seriously consider
> it. This is also an acknowledgement that this is the most important feature
> in the next release (whether major or minor) for many of us. This has been
> in limbo for a very long time, so I think it is important for the community
> to commit to getting it to a functional state.
>
> It sounds like your objection is to this commitment for 3.0, but remember
> that 3.0 is the next release so that we can remove deprecated APIs. It does
> not mean that we aren't adding new features in that release and aren't
> considering other goals.
>
> On Thu, Feb 28, 2019 at 10:12 AM Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>> Then I'm -1. Setting new features as blockers of major releases is not
>> proper project management, IMO.
>>
>> On Thu, Feb 28, 2019 at 10:06 AM Ryan Blue <rb...@netflix.com> wrote:
>>
>>> Mark, if this goal is adopted, "we" is the Apache Spark community.
>>>
>>> On Thu, Feb 28, 2019 at 9:52 AM Mark Hamstra <m...@clearstorydata.com>
>>> wrote:
>>>
>>>> Who is "we" in these statements, such as "we should consider a
>>>> functional DSv2 implementation a blocker for Spark 3.0"? If it means those
>>>> contributing to the DSv2 effort want to set their own goals, milestones,
>>>> etc., then that is fine with me. If you mean that the Apache Spark project
>>>> should officially commit to the lack of a functional DSv2 implementation
>>>> being a blocker for the release of Spark 3.0, then I'm -1. A major release
>>>> is not primarily about adding new features; rather, it is about making
>>>> changes to the existing public API. As such, I'm opposed to any new
>>>> feature or API addition being considered a blocker of the 3.0.0 release.
>>>>
>>>>
>>>> On Thu, Feb 28, 2019 at 9:09 AM Matt Cheah <mch...@palantir.com> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>>
>>>>>
>>>>> Are identifiers and namespaces going to be rolled under one of those
>>>>> six points?
>>>>>
>>>>>
>>>>>
>>>>> From: Ryan Blue <rb...@netflix.com.INVALID>
>>>>> Reply-To: "rb...@netflix.com" <rb...@netflix.com>
>>>>> Date: Thursday, February 28, 2019 at 8:39 AM
>>>>> To: Spark Dev List <dev@spark.apache.org>
>>>>> Subject: [VOTE] Functional DataSourceV2 in Spark 3.0
>>>>>
>>>>>
>>>>>
>>>>> I’d like to call a vote for committing to getting DataSourceV2 in a
>>>>> functional state for Spark 3.0.
>>>>>
>>>>> For more context, please see the discussion thread, but here is a
>>>>> quick summary about what this commitment means:
>>>>>
>>>>>    - We think that a “functional DSv2” is an achievable goal for the
>>>>>      Spark 3.0 release
>>>>>    - We will consider this a blocker for Spark 3.0, and take
>>>>>      reasonable steps to make it happen
>>>>>    - We will *not* delay the release without a community discussion
>>>>>
>>>>> Here’s what we’ve defined as a functional DSv2:
>>>>>
>>>>>    - Add a plugin system for catalogs
>>>>>    - Add an interface for table catalogs (see the ongoing SPIP vote)
>>>>>    - Add an implementation of the new interface that calls
>>>>>      SessionCatalog to load v2 tables
>>>>>    - Add a resolution rule to load v2 tables from the v2 catalog
>>>>>    - Add CTAS logical and physical plan nodes
>>>>>    - Add conversions from SQL parsed plans to v2 logical plans (e.g.,
>>>>>      INSERT INTO support)
>>>>>
>>>>> Please vote in the next 3 days on whether you agree with committing to
>>>>> this goal.
>>>>>
>>>>> [ ] +1: Agree that we should consider a functional DSv2 implementation
>>>>> a blocker for Spark 3.0
>>>>> [ ] +0: . . .
>>>>> [ ] -1: I disagree with this goal because . . .
>>>>>
>>>>> Thank you!
>>>>>
>>>>> --
>>>>>
>>>>> Ryan Blue
>>>>>
>>>>> Software Engineer
>>>>>
>>>>> Netflix
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>