Re: [VOTE] Functional DataSourceV2 in Spark 3.0

Matt Cheah Thu, 28 Feb 2019 14:06:15 -0800

I want to specifically highlight and +1 a point that Ryan brought up:


A commitment binds us to do this and make a reasonable attempt at finishing on 
time. If we choose not to commit, or if we choose to commit and don’t make a 
reasonable attempt, then we need to ask, “what happened?” Is Spark the right 
place for this work?

 

What I don’t want is to work on it for 3-4 more months, miss the release, and 
then not have anyone take that problem seriously because we never said it was 
important. If we try and fail, then we need to fix what went wrong. This 
removes the option to pretend it wasn’t a goal in the first place. That’s why I 
think it is important that we make a statement that we, the community, intend 
to do it.

 

This is the crux of the matter we want to tackle here. Whether or not we block 
the release is a decision we can make when we are closer to the release date. 
But the fact of the matter is that Data Source V2’s new APIs have not been 
given the prioritization and urgency that they deserve. This vote is binding us 
to consider Data Source V2 so important that it needs to be prioritized far 
more highly than it is right now, to the point where we would at least consider 
delaying the release if it meant we could finish the work.

 

I also don’t quite follow the reason why we shouldn’t consider features to be 
as important to target as API breaks in major versions. When major versions of 
any software product are introduced, they certainly include API breaks as 
necessary, but they also add new features that give users incentive to upgrade 
in the first place. If all we do is introduce API breaks but no new features or 
critical bug fixes (and critical bug fixes are often severe enough that they’re 
backported to earlier branches anyways), what appeal is there for users to 
upgrade to that latest version?

 

-Matt Cheah

 

On 2/28/19, 1:37 PM, "Mridul Muralidharan" <[email protected]> wrote:

 

      I am -1 on this vote for pretty much all the reasons that Mark mentioned.

    A major version change gives us an opportunity to remove deprecated

    interfaces, stabilize experimental/developer APIs, drop support for

    outdated functionality/platforms and evolve the project with a vision

    for the foreseeable future.

    IMO the primary focus should be on interface evolution, stability and

    lowering tech debt which might result in breaking changes.

    

    Which is not to say DSv2 should not be part of 3.0

    Along with a lot of other exciting features also being added, it can

    be one more important enhancement.

    

    But I am not for delaying the release simply to accommodate a specific 
feature.

    Features can be added in subsequent releases as well - I have yet to hear a

    good reason why it must make it into 3.0 to need a VOTE thread.

    

    Regards,

    Mridul

    

    On Thu, Feb 28, 2019 at 10:44 AM Mark Hamstra <[email protected]> 
wrote:

    >

    > I agree that adding new features in a major release is not forbidden, but 
that is just not the primary goal of a major release. If we reach the point 
where we are happy with the new public API before some new features are in a 
satisfactory state to be merged, then I don't want there to be a prior 
presumption that we cannot complete the primary goal of the major release. If 
at that point you want to argue that it is worth waiting for some new feature, 
then that would be fine and may have sufficient merits to warrant some delay.

    >

    > Regardless of whether significant new public API comes into a major 
release or a feature release, it should come in with an experimental annotation 
so that we can make changes without requiring a new major release.

    >

    > If you want to argue that some new features that are currently targeting 
3.0.0 are significant enough that one or more of them should justify an 
accelerated 3.1.0 release schedule if it is not ready in time for the 3.0.0 
release, then I can much more easily get behind that kind of commitment; but I 
remain opposed to the notion of promoting any new features to the status of 
blockers of 3.0.0 at this time.

    >

    > On Thu, Feb 28, 2019 at 10:23 AM Ryan Blue <[email protected]> wrote:

    >>

    >> Mark, I disagree. Setting common goals is a critical part of getting 
things done.

    >>

    >> This doesn't commit the community to push out the release if the goals 
aren't met, but does mean that we will, as a community, seriously consider it. 
This is also an acknowledgement that this is the most important feature in the 
next release (whether major or minor) for many of us. This has been in limbo 
for a very long time, so I think it is important for the community to commit to 
getting it to a functional state.

    >>

    >> It sounds like your objection is to this commitment for 3.0, but 
remember that 3.0 is the next release so that we can remove deprecated APIs. It 
does not mean that we aren't adding new features in that release and aren't 
considering other goals.

    >>

    >> On Thu, Feb 28, 2019 at 10:12 AM Mark Hamstra <[email protected]> 
wrote:

    >>>

    >>> Then I'm -1. Setting new features as blockers of major releases is not 
proper project management, IMO.

    >>>

    >>> On Thu, Feb 28, 2019 at 10:06 AM Ryan Blue <[email protected]> wrote:

    >>>>

    >>>> Mark, if this goal is adopted, "we" is the Apache Spark community.

    >>>>

    >>>> On Thu, Feb 28, 2019 at 9:52 AM Mark Hamstra <[email protected]> 
wrote:

    >>>>>

    >>>>> Who is "we" in these statements, such as "we should consider a 
functional DSv2 implementation a blocker for Spark 3.0"? If it means those 
contributing to the DSv2 effort want to set their own goals, milestones, etc., 
then that is fine with me. If you mean that the Apache Spark project should 
officially commit to the lack of a functional DSv2 implementation being a 
blocker for the release of Spark 3.0, then I'm -1. A major release is just not 
about adding new features. Rather, it is about making changes to the existing 
public API. As such, I'm opposed to any new feature or any API addition being 
considered a blocker of the 3.0.0 release.

    >>>>>

    >>>>>

    >>>>> On Thu, Feb 28, 2019 at 9:09 AM Matt Cheah <[email protected]> 
wrote:

    >>>>>>

    >>>>>> +1 (non-binding)

    >>>>>>

    >>>>>>

    >>>>>>

    >>>>>> Are identifiers and namespaces going to be rolled under one of those 
six points?

    >>>>>>

    >>>>>>

    >>>>>>

    >>>>>> From: Ryan Blue <[email protected]>

    >>>>>> Reply-To: "[email protected]" <[email protected]>

    >>>>>> Date: Thursday, February 28, 2019 at 8:39 AM

    >>>>>> To: Spark Dev List <[email protected]>

    >>>>>> Subject: [VOTE] Functional DataSourceV2 in Spark 3.0

    >>>>>>

    >>>>>>

    >>>>>>

    >>>>>> I’d like to call a vote for committing to getting DataSourceV2 in a 
functional state for Spark 3.0.

    >>>>>>

    >>>>>> For more context, please see the discussion thread, but here is a 
quick summary about what this commitment means:

    >>>>>>

    >>>>>> ·         We think that a “functional DSv2” is an achievable goal 
for the Spark 3.0 release

    >>>>>>

    >>>>>> ·         We will consider this a blocker for Spark 3.0, and take 
reasonable steps to make it happen

    >>>>>>

    >>>>>> ·         We will not delay the release without a community 
discussion

    >>>>>>

    >>>>>> Here’s what we’ve defined as a functional DSv2:

    >>>>>>

    >>>>>> ·         Add a plugin system for catalogs

    >>>>>>

    >>>>>> ·         Add an interface for table catalogs (see the ongoing SPIP 
vote)

    >>>>>>

    >>>>>> ·         Add an implementation of the new interface that calls 
SessionCatalog to load v2 tables

    >>>>>>

    >>>>>> ·         Add a resolution rule to load v2 tables from the v2 catalog

    >>>>>>

    >>>>>> ·         Add CTAS logical and physical plan nodes

    >>>>>>

    >>>>>> ·         Add conversions from SQL parsed plans to v2 logical plans 
(e.g., INSERT INTO support)

    >>>>>>

    >>>>>> Please vote in the next 3 days on whether you agree with committing 
to this goal.

    >>>>>>

    >>>>>> [ ] +1: Agree that we should consider a functional DSv2 
implementation a blocker for Spark 3.0

    >>>>>> [ ] +0: . . .

    >>>>>> [ ] -1: I disagree with this goal because . . .

    >>>>>>

    >>>>>> Thank you!

    >>>>>>

    >>>>>> --

    >>>>>>

    >>>>>> Ryan Blue

    >>>>>>

    >>>>>> Software Engineer

    >>>>>>

    >>>>>> Netflix

    >>>>

    >>>>

    >>>>

    >>>> --

    >>>> Ryan Blue

    >>>> Software Engineer

    >>>> Netflix

    >>

    >>

    >>

    >> --

    >> Ryan Blue

    >> Software Engineer

    >> Netflix
