Re: [RESULT] [VOTE] Functional DataSourceV2 in Spark 3.0

2019-03-03 Thread Mark Hamstra
No, it is not at all dead! There just isn't any kind of expectation or
commitment that the 3.0.0 release will be held up in any way if DSv2 is not
ready to go when the rest of 3.0.0 is. There is nothing new preventing
continued work on DSv2 or its eventual inclusion in a release.

On Sun, Mar 3, 2019 at 1:36 PM Jean Georges Perrin  wrote:

> Hi, I am kind of new at the whole Apache process (not specifically Spark).
> Does that mean that DataSourceV2 is dead or stays experimental? Thanks
> for clarifying for a newbie.
>
> jg
>
>
> On Mar 3, 2019, at 11:21, Ryan Blue  wrote:
>
> This vote fails with the following counts:
>
> 3 +1 votes:
>
>- Matt Cheah
>- Ryan Blue
>- Sean Owen (binding)
>
> 1 -0 vote:
>
>- Jose Torres
>
> 2 -1 votes:
>
>- Mark Hamstra (binding)
>- Mridul Muralidharan (binding)
>
> Thanks for the discussion, everyone. It sounds to me like the main
> objection is simply that we’ve already committed to a release that removes
> deprecated APIs and we don’t want to commit to features at the same time.
> While I’m a bit disappointed, I think that’s a reasonable position for the
> community to take and at least is a clear result.
>
> rb
>
> On Thu, Feb 28, 2019 at 8:38 AM Ryan Blue rb...@netflix.com
>  wrote:
>
> I’d like to call a vote for committing to getting DataSourceV2 in a
>> functional state for Spark 3.0.
>>
>> For more context, please see the discussion thread, but here is a quick
>> summary about what this commitment means:
>>
>>- We think that a “functional DSv2” is an achievable goal for the
>>Spark 3.0 release
>>- We will consider this a blocker for Spark 3.0, and take reasonable
>>steps to make it happen
>>- We will *not* delay the release without a community discussion
>>
>> Here’s what we’ve defined as a functional DSv2:
>>
>>- Add a plugin system for catalogs
>>- Add an interface for table catalogs (see the ongoing SPIP vote)
>>- Add an implementation of the new interface that calls
>>SessionCatalog to load v2 tables
>>- Add a resolution rule to load v2 tables from the v2 catalog
>>- Add CTAS logical and physical plan nodes
>>- Add conversions from SQL parsed plans to v2 logical plans (e.g.,
>>INSERT INTO support)
>>
>> Please vote in the next 3 days on whether you agree with committing to
>> this goal.
>>
>> [ ] +1: Agree that we should consider a functional DSv2 implementation a
>> blocker for Spark 3.0
>> [ ] +0: . . .
>> [ ] -1: I disagree with this goal because . . .
>>
>> Thank you!
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>

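For readers new to the proposal, the six "functional DSv2" items quoted above (catalog plugins, a table catalog interface, a SessionCatalog-backed implementation, a resolution rule, and CTAS support) can be illustrated with a toy sketch. The interfaces and names below are hypothetical simplifications for illustration only — they are not the actual Spark API, which was still under SPIP discussion at the time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for the proposed DSv2 catalog API.
interface Table {
    String name();
    String schema(); // simplified: schema as a plain string
}

interface TableCatalog {
    Table loadTable(String ident);                  // used by a resolution rule
    Table createTable(String ident, String schema); // used by a CTAS plan node
}

// A minimal in-memory implementation, standing in for one backed by
// SessionCatalog.
class InMemoryCatalog implements TableCatalog {
    private final Map<String, Table> tables = new HashMap<>();

    @Override
    public Table loadTable(String ident) {
        Table t = tables.get(ident);
        if (t == null) throw new IllegalArgumentException("Table not found: " + ident);
        return t;
    }

    @Override
    public Table createTable(String ident, String schema) {
        Table t = new Table() {
            public String name() { return ident; }
            public String schema() { return schema; }
        };
        tables.put(ident, t);
        return t;
    }
}

class CatalogDemo {
    // A plugin registry keyed by catalog name, standing in for configuration
    // along the lines of spark.sql.catalog.<name>=<implementation class>.
    static final Map<String, TableCatalog> registry = new HashMap<>();

    public static void main(String[] args) {
        registry.put("session", new InMemoryCatalog());
        TableCatalog catalog = registry.get("session");
        catalog.createTable("db.t1", "id INT, data STRING"); // CTAS-style creation
        Table t = catalog.loadTable("db.t1");                // resolution loads the v2 table
        System.out.println(t.name() + ": " + t.schema());
    }
}
```

The point of the sketch is the layering the vote text describes: a pluggable registry of catalogs, a narrow catalog interface, and plan nodes (CTAS, INSERT INTO) that go through that interface rather than straight to the session catalog.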

Re: [RESULT] [VOTE] Functional DataSourceV2 in Spark 3.0

2019-03-03 Thread Jean Georges Perrin
Hi, I am kind of new at the whole Apache process (not specifically Spark). Does 
that mean that DataSourceV2 is dead or stays experimental? Thanks for 
clarifying for a newbie. 

jg


> On Mar 3, 2019, at 11:21, Ryan Blue  wrote:
> 
> This vote fails with the following counts:
> 
> 3 +1 votes:
> 
> - Matt Cheah
> - Ryan Blue
> - Sean Owen (binding)
> 
> 1 -0 vote:
> 
> - Jose Torres
> 
> 2 -1 votes:
> 
> - Mark Hamstra (binding)
> - Mridul Muralidharan (binding)
> 
> Thanks for the discussion, everyone. It sounds to me like the main objection 
> is simply that we’ve already committed to a release that removes deprecated 
> APIs and we don’t want to commit to features at the same time. While I’m a 
> bit disappointed, I think that’s a reasonable position for the community to 
> take and at least is a clear result.
> 
> rb
> 
>> On Thu, Feb 28, 2019 at 8:38 AM Ryan Blue rb...@netflix.com wrote:
>> 
>> I’d like to call a vote for committing to getting DataSourceV2 in a 
>> functional state for Spark 3.0.
>> 
>> For more context, please see the discussion thread, but here is a quick 
>> summary about what this commitment means:
>> 
>> - We think that a “functional DSv2” is an achievable goal for the Spark 3.0 
>> release
>> - We will consider this a blocker for Spark 3.0, and take reasonable steps to 
>> make it happen
>> - We will not delay the release without a community discussion
>> 
>> Here’s what we’ve defined as a functional DSv2:
>> 
>> - Add a plugin system for catalogs
>> - Add an interface for table catalogs (see the ongoing SPIP vote)
>> - Add an implementation of the new interface that calls SessionCatalog to load 
>> v2 tables
>> - Add a resolution rule to load v2 tables from the v2 catalog
>> - Add CTAS logical and physical plan nodes
>> - Add conversions from SQL parsed plans to v2 logical plans (e.g., INSERT INTO 
>> support)
>> 
>> Please vote in the next 3 days on whether you agree with committing to this 
>> goal.
>> 
>> [ ] +1: Agree that we should consider a functional DSv2 implementation a 
>> blocker for Spark 3.0
>> [ ] +0: . . .
>> [ ] -1: I disagree with this goal because . . .
>> 
>> Thank you!
>> 
>> -- 
>> Ryan Blue
>> Software Engineer
>> Netflix
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix


[RESULT] [VOTE] Functional DataSourceV2 in Spark 3.0

2019-03-03 Thread Ryan Blue
This vote fails with the following counts:

3 +1 votes:

   - Matt Cheah
   - Ryan Blue
   - Sean Owen (binding)

1 -0 vote:

   - Jose Torres

2 -1 votes:

   - Mark Hamstra (binding)
   - Mridul Muralidharan (binding)

Thanks for the discussion, everyone. It sounds to me like the main
objection is simply that we’ve already committed to a release that removes
deprecated APIs and we don’t want to commit to features at the same time.
While I’m a bit disappointed, I think that’s a reasonable position for the
community to take and at least is a clear result.

rb

On Thu, Feb 28, 2019 at 8:38 AM Ryan Blue rb...@netflix.com
 wrote:

I’d like to call a vote for committing to getting DataSourceV2 in a
> functional state for Spark 3.0.
>
> For more context, please see the discussion thread, but here is a quick
> summary about what this commitment means:
>
>- We think that a “functional DSv2” is an achievable goal for the
>Spark 3.0 release
>- We will consider this a blocker for Spark 3.0, and take reasonable
>steps to make it happen
>- We will *not* delay the release without a community discussion
>
> Here’s what we’ve defined as a functional DSv2:
>
>- Add a plugin system for catalogs
>- Add an interface for table catalogs (see the ongoing SPIP vote)
>- Add an implementation of the new interface that calls SessionCatalog
>to load v2 tables
>- Add a resolution rule to load v2 tables from the v2 catalog
>- Add CTAS logical and physical plan nodes
>- Add conversions from SQL parsed plans to v2 logical plans (e.g.,
>INSERT INTO support)
>
> Please vote in the next 3 days on whether you agree with committing to
> this goal.
>
> [ ] +1: Agree that we should consider a functional DSv2 implementation a
> blocker for Spark 3.0
> [ ] +0: . . .
> [ ] -1: I disagree with this goal because . . .
>
> Thank you!
> --
> Ryan Blue
> Software Engineer
> Netflix
>
-- 
Ryan Blue
Software Engineer
Netflix


Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Matt Cheah
>> ...that we aren't adding new features in that release and aren't
>> considering other goals.
>>
>> On Thu, Feb 28, 2019 at 10:12 AM Mark Hamstra  wrote:
>>>
>>> Then I'm -1. Setting new features as blockers of major releases is not
>>> proper project management, IMO.
>>>
>>> On Thu, Feb 28, 2019 at 10:06 AM Ryan Blue  wrote:
>>>>
>>>> Mark, if this goal is adopted, "we" is the Apache Spark community.
>>>>
>>>> On Thu, Feb 28, 2019 at 9:52 AM Mark Hamstra  wrote:
>>>>>
>>>>> Who is "we" in these statements, such as "we should consider a
>>>>> functional DSv2 implementation a blocker for Spark 3.0"? If it means those
>>>>> contributing to the DSv2 effort want to set their own goals, milestones,
>>>>> etc., then that is fine with me. If you mean that the Apache Spark project
>>>>> should officially commit to the lack of a functional DSv2 implementation
>>>>> being a blocker for the release of Spark 3.0, then I'm -1. A major release
>>>>> is just not about adding new features. Rather, it is about making changes
>>>>> to the existing public API. As such, I'm opposed to any new feature or any
>>>>> API addition being considered a blocker of the 3.0.0 release.
>>>>>
>>>>> On Thu, Feb 28, 2019 at 9:09 AM Matt Cheah  wrote:
>>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Are identifiers and namespaces going to be rolled under one of those
>>>>>> six points?
>>>>>>
>>>>>> From: Ryan Blue 
>>>>>> Reply-To: "rb...@netflix.com" 
>>>>>> Date: Thursday, February 28, 2019 at 8:39 AM
>>>>>> To: Spark Dev List 
>>>>>> Subject: [VOTE] Functional DataSourceV2 in Spark 3.0
>>>>>>
>>>>>> I’d like to call a vote for committing to getting DataSourceV2 in a
>>>>>> functional state for Spark 3.0.
>>>>>>
>>>>>> For more context, please see the discussion thread, but here is a
>>>>>> quick summary about what this commitment means:
>>>>>>
>>>>>> - We think that a “functional DSv2” is an achievable goal for the
>>>>>> Spark 3.0 release
>>>>>> - We will consider this a blocker for Spark 3.0, and take reasonable
>>>>>> steps to make it happen
>>>>>> - We will not delay the release without a community discussion
>>>>>>
>>>>>> Here’s what we’ve defined as a functional DSv2:
>>>>>>
>>>>>> - Add a plugin system for catalogs
>>>>>> - Add an interface for table catalogs (see the ongoing SPIP vote)
>>>>>> - Add an implementation of the new interface that calls
>>>>>> SessionCatalog to load v2 tables
>>>>>> - Add a resolution rule to load v2 tables from the v2 catalog
>>>>>> - Add CTAS logical and physical plan nodes
>>>>>> - Add conversions from SQL parsed plans to v2 logical plans
>>>>>> (e.g., INSERT INTO support)
>>>>>>
>>>>>> Please vote in the next 3 days on whether you agree with committing
>>>>>> to this goal.
>>>>>>
>>>>>> [ ] +1: Agree that we should consider a functional DSv2
>>>>>> implementation a blocker for Spark 3.0
>>>>>> [ ] +0: . . .
>>>>>> [ ] -1: I disagree with this goal because . . .
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Software Engineer
>>>>>> Netflix
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix





Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Mridul Muralidharan
  I am -1 on this vote for pretty much all the reasons that Mark mentioned.
A major version change gives us an opportunity to remove deprecated
interfaces, stabilize experimental/developer api, drop support for
outdated functionality/platforms and evolve the project with a vision
for foreseeable future.
IMO the primary focus should be on interface evolution, stability, and
lowering tech debt, which might result in breaking changes.

Which is not to say DSv2 should not be part of 3.0. Along with a lot of
other exciting features also being added, it can be one more important
enhancement.

But I am not for delaying the release simply to accommodate a specific feature.
Features can be added in subsequent releases as well - I have yet to hear a
good reason why it must make it into 3.0, to the point of needing a VOTE
thread.

Regards,
Mridul

On Thu, Feb 28, 2019 at 10:44 AM Mark Hamstra  wrote:
>
> I agree that adding new features in a major release is not forbidden, but 
> that is just not the primary goal of a major release. If we reach the point 
> where we are happy with the new public API before some new features are in a 
> satisfactory state to be merged, then I don't want there to be a prior 
> presumption that we cannot complete the primary goal of the major release. If 
> at that point you want to argue that it is worth waiting for some new 
> feature, then that would be fine and may have sufficient merits to warrant 
> some delay.
>
> Regardless of whether significant new public API comes into a major release 
> or a feature release, it should come in with an experimental annotation so 
> that we can make changes without requiring a new major release.
>
> If you want to argue that some new features that are currently targeting 
> 3.0.0 are significant enough that one or more of them should justify an 
> accelerated 3.1.0 release schedule if it is not ready in time for the 3.0.0 
> release, then I can much more easily get behind that kind of commitment; but 
> I remain opposed to the notion of promoting any new features to the status of 
> blockers of 3.0.0 at this time.
>
> On Thu, Feb 28, 2019 at 10:23 AM Ryan Blue  wrote:
>>
>> Mark, I disagree. Setting common goals is a critical part of getting things 
>> done.
>>
>> This doesn't commit the community to push out the release if the goals 
>> aren't met, but does mean that we will, as a community, seriously consider 
>> it. This is also an acknowledgement that this is the most important feature 
>> in the next release (whether major or minor) for many of us. This has been 
>> in limbo for a very long time, so I think it is important for the community 
>> to commit to getting it to a functional state.
>>
>> It sounds like your objection is to this commitment for 3.0, but remember 
>> that 3.0 is the next release so that we can remove deprecated APIs. It does 
>> not mean that we aren't adding new features in that release and aren't 
>> considering other goals.
>>
>> On Thu, Feb 28, 2019 at 10:12 AM Mark Hamstra  
>> wrote:
>>>
>>> Then I'm -1. Setting new features as blockers of major releases is not 
>>> proper project management, IMO.
>>>
>>> On Thu, Feb 28, 2019 at 10:06 AM Ryan Blue  wrote:
>>>>
>>>> Mark, if this goal is adopted, "we" is the Apache Spark community.
>>>>
>>>> On Thu, Feb 28, 2019 at 9:52 AM Mark Hamstra  
>>>> wrote:
>>>>>
>>>>> Who is "we" in these statements, such as "we should consider a functional 
>>>>> DSv2 implementation a blocker for Spark 3.0"? If it means those 
>>>>> contributing to the DSv2 effort want to set their own goals, milestones, 
>>>>> etc., then that is fine with me. If you mean that the Apache Spark 
>>>>> project should officially commit to the lack of a functional DSv2 
>>>>> implementation being a blocker for the release of Spark 3.0, then I'm -1. 
>>>>> A major release is just not about adding new features. Rather, it is 
>>>>> about making changes to the existing public API. As such, I'm opposed to 
>>>>> any new feature or any API addition being considered a blocker of the 
>>>>> 3.0.0 release.
>>>>>
>>>>>
>>>>> On Thu, Feb 28, 2019 at 9:09 AM Matt Cheah  wrote:
>>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Are identifiers and namespaces going to be rolled under one of those six 
>>>>>> points?
>>>>>>
>>>>>

Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Joseph Torres


Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Ryan Blue


-- 
Ryan Blue
Software Engineer
Netflix


Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Sean Owen
This is a fine thing to VOTE on. Committers (and community,
non-binding) can VOTE on what we like; we just don't do it often where
not required because it's a) overkill overhead over simple lazy
consensus, and b) it can be hard to say what the binding VOTE binds if
it's not a discrete commit or release. This is a big enough deal that
it's not overkill. The question is, what does it bind?

It means the release is definitely blocked until the items here are
done, but, what's 'done'? It will return to the same questions already
on the table, like do we need to define just APIs, and to what degree
of stability. At worst it might not resolve anything.

I don't see much harm in nailing down what appears to be agreement at
the level of specific goals, even if this isn't a vote on a release
date or specific commit. I think it's clear these items must be
resolved to the level of semi-stable API by 3.0, as it's coming soon
and this is the right time to establish these APIs. It might provide
necessary clarity and constraints to get it over the line.

To Mark -- yeah, this is asserting that DSv2 is a primary or necessary
goal of the release, just like a "Blocker" does. Why would this
argument be different or better if it waited until 3.0 was imminent? I
get that one might say, well, we ended up working on more important
stuff in the meantime and now we don't have time. But this VOTE's
purpose is to declare that this is the important stuff now.

To Jose -- what's the "just a few PRs in review" issue? you worry that
we might rush DSv2 at the end to meet a deadline? all the better to,
if anything, agree it's important now. It's also an agreement to delay
the release for it, not rush it. I don't see that later is a better
time to make the decision, if rush is a worry?

Given my definition, and understanding of the issues, I'd say +1

On Thu, Feb 28, 2019 at 12:24 PM Ryan Blue  wrote:
>
> Mark, I disagree. Setting common goals is a critical part of getting things 
> done.
>
> This doesn't commit the community to push out the release if the goals aren't 
> met, but does mean that we will, as a community, seriously consider it. This 
> is also an acknowledgement that this is the most important feature in the 
> next release (whether major or minor) for many of us. This has been in limbo 
> for a very long time, so I think it is important for the community to commit 
> to getting it to a functional state.
>
> It sounds like your objection is to this commitment for 3.0, but remember 
> that 3.0 is the next release so that we can remove deprecated APIs. It does 
> not mean that we aren't adding new features in that release and aren't 
> considering other goals.
>
> On Thu, Feb 28, 2019 at 10:12 AM Mark Hamstra  wrote:
>>
>> Then I'm -1. Setting new features as blockers of major releases is not 
>> proper project management, IMO.
>>
>> On Thu, Feb 28, 2019 at 10:06 AM Ryan Blue  wrote:
>>>
>>> Mark, if this goal is adopted, "we" is the Apache Spark community.
>>>
>>> On Thu, Feb 28, 2019 at 9:52 AM Mark Hamstra  
>>> wrote:
>>>>
>>>> Who is "we" in these statements, such as "we should consider a functional 
>>>> DSv2 implementation a blocker for Spark 3.0"? If it means those 
>>>> contributing to the DSv2 effort want to set their own goals, milestones, 
>>>> etc., then that is fine with me. If you mean that the Apache Spark project 
>>>> should officially commit to the lack of a functional DSv2 implementation 
>>>> being a blocker for the release of Spark 3.0, then I'm -1. A major release 
>>>> is just not about adding new features. Rather, it is about making changes 
>>>> to the existing public API. As such, I'm opposed to any new feature or any 
>>>> API addition being considered a blocker of the 3.0.0 release.
>>>>
>>>>
>>>> On Thu, Feb 28, 2019 at 9:09 AM Matt Cheah  wrote:
>>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>>
>>>>>
>>>>> Are identifiers and namespaces going to be rolled under one of those six 
>>>>> points?
>>>>>
>>>>>
>>>>>
>>>>> From: Ryan Blue 
>>>>> Reply-To: "rb...@netflix.com" 
>>>>> Date: Thursday, February 28, 2019 at 8:39 AM
>>>>> To: Spark Dev List 
>>>>> Subject: [VOTE] Functional DataSourceV2 in Spark 3.0
>>>>>
>>>>>
>>>>>
    >>>>> I’d like to call a vote for committing to getting DataSourceV2 in a
functional state for Spark 3.0.

Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Joseph Torres
I’m sure we, as a community, will seriously consider any proposal that
Spark would benefit if the PMC delays release X to include changes A, B, C.
Indeed, every release I remember has had a few iterations of “can we hold
the train for a bit because it would be super great to get this PR in”.

Many contributors (including me) do believe data source v2 should be done
by 3.0, can mark the appropriate JIRAs as blockers, and will at release
time argue in favor of holding the train for a week or two if that’s what’s
needed to get all the pieces on board.

What the vote seems to imply is that we will consider holding the release
even beyond the “just a few PRs in review” level, if there are serious
outstanding design or implementation questions. That’s not a judgment I
think we can make in advance. Is it better to delay Spark 3.0 by N months
or DSv2 by 6 months? Who knows - depends on the PMC’s priorities at the
time and how confident we are in the value of N.

On Thu, Feb 28, 2019 at 10:24 AM Ryan Blue 
wrote:

> Mark, I disagree. Setting common goals is a critical part of getting
> things done.
>
> This doesn't commit the community to push out the release if the goals
> aren't met, but does mean that we will, as a community, seriously consider
> it. This is also an acknowledgement that this is the most important feature
> in the next release (whether major or minor) for many of us. This has been
> in limbo for a very long time, so I think it is important for the community
> to commit to getting it to a functional state.
>
> It sounds like your objection is to this commitment for 3.0, but remember
> that 3.0 is the next release so that we can remove deprecated APIs. It does
> not mean that we aren't adding new features in that release and aren't
> considering other goals.
>
> On Thu, Feb 28, 2019 at 10:12 AM Mark Hamstra 
> wrote:
>
>> Then I'm -1. Setting new features as blockers of major releases is not
>> proper project management, IMO.
>>
>> On Thu, Feb 28, 2019 at 10:06 AM Ryan Blue  wrote:
>>
>>> Mark, if this goal is adopted, "we" is the Apache Spark community.
>>>
>>> On Thu, Feb 28, 2019 at 9:52 AM Mark Hamstra 
>>> wrote:
>>>
>>>> Who is "we" in these statements, such as "we should consider a
>>>> functional DSv2 implementation a blocker for Spark 3.0"? If it means those
>>>> contributing to the DSv2 effort want to set their own goals, milestones,
>>>> etc., then that is fine with me. If you mean that the Apache Spark project
>>>> should officially commit to the lack of a functional DSv2 implementation
>>>> being a blocker for the release of Spark 3.0, then I'm -1. A major release
>>>> is just not about adding new features. Rather, it is about making changes
>>>> to the existing public API. As such, I'm opposed to any new feature or any
>>>> API addition being considered a blocker of the 3.0.0 release.
>>>>
>>>>
>>>> On Thu, Feb 28, 2019 at 9:09 AM Matt Cheah  wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>>
>>>>>
>>>>> Are identifiers and namespaces going to be rolled under one of those
>>>>> six points?
>>>>>
>>>>>
>>>>>
>>>>> *From: *Ryan Blue 
>>>>> *Reply-To: *"rb...@netflix.com" 
>>>>> *Date: *Thursday, February 28, 2019 at 8:39 AM
>>>>> *To: *Spark Dev List 
>>>>> *Subject: *[VOTE] Functional DataSourceV2 in Spark 3.0
>>>>>
>>>>>
>>>>>
>>>>> I’d like to call a vote for committing to getting DataSourceV2 in a
>>>>> functional state for Spark 3.0.
>>>>>
>>>>> For more context, please see the discussion thread, but here is a
>>>>> quick summary about what this commitment means:
>>>>>
>>>>> - We think that a “functional DSv2” is an achievable goal for
>>>>> the Spark 3.0 release
>>>>>
>>>>> - We will consider this a blocker for Spark 3.0, and take
>>>>> reasonable steps to make it happen
>>>>>
>>>>> - We will *not* delay the release without a community
>>>>> discussion
>>>>>
>>>>> Here’s what we’ve defined as a functional DSv2:
>>>>>
>>>>> - Add a plugin system for catalogs
>>>>>
>>>>> - Add an interface for table catalogs (see the ongoing SPIP
>>>>> vote)
>>>>>
>>>>> - Add an implementation of the new interface that calls
>>>>> SessionCatalog to load v2 tables
>>>>>
>>>>> - Add a resolution rule to load v2 tables from the v2 catalog
>>>>>
>>>>> - Add CTAS logical and physical plan nodes
>>>>>
>>>>> - Add conversions from SQL parsed plans to v2 logical plans
>>>>> (e.g., INSERT INTO support)
>>>>>
>>>>> Please vote in the next 3 days on whether you agree with committing to
>>>>> this goal.
>>>>>
>>>>> [ ] +1: Agree that we should consider a functional DSv2 implementation
>>>>> a blocker for Spark 3.0
>>>>> [ ] +0: . . .
>>>>> [ ] -1: I disagree with this goal because . . .
>>>>>
>>>>> Thank you!
>>>>>
>>>>> --
>>>>>
>>>>> Ryan Blue
>>>>>
>>>>> Software Engineer
>>>>>
>>>>> Netflix
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Mark Hamstra
I agree that adding new features in a major release is not forbidden, but
that is just not the primary goal of a major release. If we reach the point
where we are happy with the new public API before some new features are in
a satisfactory state to be merged, then I don't want there to be a prior
presumption that we cannot complete the primary goal of the major release.
If at that point you want to argue that it is worth waiting for some new
feature, then that would be fine and may have sufficient merits to warrant
some delay.

Regardless of whether significant new public API comes into a major release
or a feature release, it should come in with an experimental annotation so
that we can make changes without requiring a new major release.

If you want to argue that some new features that are currently targeting
3.0.0 are significant enough that one or more of them should justify an
accelerated 3.1.0 release schedule if it is not ready in time for the 3.0.0
release, then I can much more easily get behind that kind of commitment;
but I remain opposed to the notion of promoting any new features to the
status of blockers of 3.0.0 at this time.


Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Ryan Blue
Mark, I disagree. Setting common goals is a critical part of getting things
done.

This doesn't commit the community to push out the release if the goals
aren't met, but does mean that we will, as a community, seriously consider
it. This is also an acknowledgement that this is the most important feature
in the next release (whether major or minor) for many of us. This has been
in limbo for a very long time, so I think it is important for the community
to commit to getting it to a functional state.

It sounds like your objection is to this commitment for 3.0, but remember
that 3.0 is the next release so that we can remove deprecated APIs. It does
not mean that we aren't adding new features in that release and aren't
considering other goals.


-- 
Ryan Blue
Software Engineer
Netflix


Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Ryan Blue
Mark, if this goal is adopted, "we" is the Apache Spark community.


-- 
Ryan Blue
Software Engineer
Netflix


Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Mark Hamstra
Then I'm -1. Setting new features as blockers of major releases is not
proper project management, IMO.



Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Mark Hamstra
Who is "we" in these statements, such as "we should consider a functional
DSv2 implementation a blocker for Spark 3.0"? If it means those
contributing to the DSv2 effort want to set their own goals, milestones,
etc., then that is fine with me. If you mean that the Apache Spark project
should officially commit to the lack of a functional DSv2 implementation
being a blocker for the release of Spark 3.0, then I'm -1. A major release
is just not about adding new features. Rather, it is about making changes
to the existing public API. As such, I'm opposed to any new feature or any
API addition being considered a blocker of the 3.0.0 release.




Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Matt Cheah
+1 (non-binding)

Are identifiers and namespaces going to be rolled under one of those six
points?




[VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Ryan Blue
I’d like to call a vote for committing to getting DataSourceV2 in a
functional state for Spark 3.0.

For more context, please see the discussion thread, but here is a quick
summary about what this commitment means:

   - We think that a “functional DSv2” is an achievable goal for the Spark
   3.0 release
   - We will consider this a blocker for Spark 3.0, and take reasonable
   steps to make it happen
   - We will *not* delay the release without a community discussion

Here’s what we’ve defined as a functional DSv2:

   - Add a plugin system for catalogs
   - Add an interface for table catalogs (see the ongoing SPIP vote)
   - Add an implementation of the new interface that calls SessionCatalog
   to load v2 tables
   - Add a resolution rule to load v2 tables from the v2 catalog
   - Add CTAS logical and physical plan nodes
   - Add conversions from SQL parsed plans to v2 logical plans (e.g.,
   INSERT INTO support)

Please vote in the next 3 days on whether you agree with committing to this
goal.

[ ] +1: Agree that we should consider a functional DSv2 implementation a
blocker for Spark 3.0
[ ] +0: . . .
[ ] -1: I disagree with this goal because . . .

Thank you!
-- 
Ryan Blue
Software Engineer
Netflix