Re: [VOTE] Functional DataSourceV2 in Spark 3.0

Ryan Blue Thu, 28 Feb 2019 13:24:11 -0800

> The question is, what does it bind?

I’m not pushing for a binding statement to do this or delay the 3.0 release
because I don’t think that’s a very reasonable thing to do. It may well be
that there is a good reason for missing the goal.


So “what does it bind?” is an apt question.

A commitment binds us to do this and make a reasonable attempt at finishing
on time. If we choose not to commit, or if we choose to commit and don’t
make a reasonable attempt, then we need to ask, “what happened?” Is Spark
the right place for this work?

What I don’t want is to work on it for 3-4 more months, miss the release,
and then not have anyone take that problem seriously because we never said
it was important. If we try and fail, then we need to fix what went wrong.
This removes the option to pretend it wasn’t a goal in the first place.
That’s why I think it is important that we make a statement that we, the
community, intend to do it.

On Thu, Feb 28, 2019 at 11:48 AM Sean Owen <[email protected]> wrote:

> This is a fine thing to VOTE on. Committers (and community,
> non-binding) can VOTE on what we like; we just don't do it often where
> not required because it's a) overkill overhead over simple lazy
> consensus, and b) it can be hard to say what the binding VOTE binds if
> it's not a discrete commit or release. This is a big enough deal that
> it's not overkill. The question is, what does it bind?
>
> It means the release is definitely blocked until the items here are
> done, but, what's 'done'? It will return to the same questions already
> on the table, like do we need to define just APIs, and to what degree
> of stability. At worst it might not resolve anything.
>
> I don't see much harm in nailing down what appears to be agreement at
> the level of specific goals, even if this isn't a vote on a release
> date or specific commit. I think it's clear these items must be
> resolved to the level of semi-stable API by 3.0, as it's coming soon
> and this is the right time to establish these APIs. It might provide
> necessary clarity and constraints to get it over the line.
>
> To Mark -- yeah, this is asserting that DSv2 is a primary or necessary
> goal of the release, just like a "Blocker" does. Why would this
> argument be different or better if it waited until 3.0 was imminent? I
> get that one might say, well, we ended up working on more important
> stuff in the meantime and now we don't have time. But this VOTE's
> purpose is to declare that this is the important stuff now.
>
> To Jose -- what's the "just a few PRs in review" issue? you worry that
> we might rush DSv2 at the end to meet a deadline? all the better to,
> if anything, agree it's important now. It's also an agreement to delay
> the release for it, not rush it. I don't see that later is a better
> time to make the decision, if rush is a worry?
>
> Given my definition, and understanding of the issues, I'd say +1
>
> On Thu, Feb 28, 2019 at 12:24 PM Ryan Blue <[email protected]> wrote:
> >
> > Mark, I disagree. Setting common goals is a critical part of getting
> > things done.
> >
> > This doesn't commit the community to push out the release if the goals
> > aren't met, but does mean that we will, as a community, seriously consider
> > it. This is also an acknowledgement that this is the most important feature
> > in the next release (whether major or minor) for many of us. This has been
> > in limbo for a very long time, so I think it is important for the community
> > to commit to getting it to a functional state.
> >
> > It sounds like your objection is to this commitment for 3.0, but
> > remember that 3.0 is the next release so that we can remove deprecated
> > APIs. It does not mean that we aren't adding new features in that release
> > and aren't considering other goals.
> >
> > On Thu, Feb 28, 2019 at 10:12 AM Mark Hamstra <[email protected]> wrote:
> >>
> >> Then I'm -1. Setting new features as blockers of major releases is not
> >> proper project management, IMO.
> >>
> >> On Thu, Feb 28, 2019 at 10:06 AM Ryan Blue <[email protected]> wrote:
> >>>
> >>> Mark, if this goal is adopted, "we" is the Apache Spark community.
> >>>
> >>> On Thu, Feb 28, 2019 at 9:52 AM Mark Hamstra <[email protected]> wrote:
> >>>>
> >>>> Who is "we" in these statements, such as "we should consider a
> >>>> functional DSv2 implementation a blocker for Spark 3.0"? If it means those
> >>>> contributing to the DSv2 effort want to set their own goals, milestones,
> >>>> etc., then that is fine with me. If you mean that the Apache Spark project
> >>>> should officially commit to the lack of a functional DSv2 implementation
> >>>> being a blocker for the release of Spark 3.0, then I'm -1. A major release
> >>>> is just not about adding new features. Rather, it is about making changes
> >>>> to the existing public API. As such, I'm opposed to any new feature or any
> >>>> API addition being considered a blocker of the 3.0.0 release.
> >>>>
> >>>>
> >>>> On Thu, Feb 28, 2019 at 9:09 AM Matt Cheah <[email protected]> wrote:
> >>>>>
> >>>>> +1 (non-binding)
> >>>>>
> >>>>>
> >>>>>
> >>>>> Are identifiers and namespaces going to be rolled under one of those
> >>>>> six points?
> >>>>>
> >>>>>
> >>>>>
> >>>>> From: Ryan Blue <[email protected]>
> >>>>> Reply-To: "[email protected]" <[email protected]>
> >>>>> Date: Thursday, February 28, 2019 at 8:39 AM
> >>>>> To: Spark Dev List <[email protected]>
> >>>>> Subject: [VOTE] Functional DataSourceV2 in Spark 3.0
> >>>>>
> >>>>>
> >>>>>
> >>>>> I’d like to call a vote for committing to getting DataSourceV2 in a
> >>>>> functional state for Spark 3.0.
> >>>>>
> >>>>> For more context, please see the discussion thread, but here is a
> >>>>> quick summary about what this commitment means:
> >>>>>
> >>>>> · We think that a “functional DSv2” is an achievable goal for the
> >>>>>   Spark 3.0 release
> >>>>> · We will consider this a blocker for Spark 3.0, and take reasonable
> >>>>>   steps to make it happen
> >>>>> · We will not delay the release without a community discussion
> >>>>>
> >>>>> Here’s what we’ve defined as a functional DSv2:
> >>>>>
> >>>>> · Add a plugin system for catalogs
> >>>>> · Add an interface for table catalogs (see the ongoing SPIP vote)
> >>>>> · Add an implementation of the new interface that calls
> >>>>>   SessionCatalog to load v2 tables
> >>>>> · Add a resolution rule to load v2 tables from the v2 catalog
> >>>>> · Add CTAS logical and physical plan nodes
> >>>>> · Add conversions from SQL parsed plans to v2 logical plans
> >>>>>   (e.g., INSERT INTO support)
> >>>>>
> >>>>> Please vote in the next 3 days on whether you agree with committing
> >>>>> to this goal.
> >>>>>
> >>>>> [ ] +1: Agree that we should consider a functional DSv2
> >>>>> implementation a blocker for Spark 3.0
> >>>>> [ ] +0: . . .
> >>>>> [ ] -1: I disagree with this goal because . . .
> >>>>>
> >>>>> Thank you!
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Ryan Blue
> >>>>>
> >>>>> Software Engineer
> >>>>>
> >>>>> Netflix
> >>>
> >>>
> >>>
> >>> --
> >>> Ryan Blue
> >>> Software Engineer
> >>> Netflix
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>


-- 
Ryan Blue
Software Engineer
Netflix
