Re: [VOTE] Functional DataSourceV2 in Spark 3.0
I want to specifically highlight and +1 a point that Ryan brought up:

> A commitment binds us to do this and make a reasonable attempt at finishing on time. If we choose not to commit, or if we choose to commit and don’t make a reasonable attempt, then we need to ask, “what happened?” Is Spark the right place for this work?
>
> What I don’t want is to work on it for 3-4 more months, miss the release, and then not have anyone take that problem seriously because we never said it was important. If we try and fail, then we need to fix what went wrong. This removes the option to pretend it wasn’t a goal in the first place.
>
> That’s why I think it is important that we make a statement that we, the community, intend to do it.

This is the crux of the matter we want to tackle here. Whether or not we block the release is a decision we can make when we are closer to the release date. But the fact of the matter is that Data Source V2’s new APIs have not been given the prioritization and urgency that they deserve. This vote is binding us to consider Data Source V2 so important that it needs to be prioritized far more highly than it is right now, to the point where we would at least consider delaying the release if it meant we could finish the work.

I also don’t quite follow the reason why we shouldn’t consider features to be as important to target as API breaks in major versions. When major versions of any software product are introduced, they certainly include API breaks as necessary, but they also add new features that give users incentive to upgrade in the first place. If all we do is introduce API breaks but no new features or critical bug fixes (and critical bug fixes are often severe enough that they’re backported to earlier branches anyways), what appeal is there for users to upgrade to that latest version?

-Matt Cheah

On 2/28/19, 1:37 PM, "Mridul Muralidharan" wrote:

> I am -1 on this vote for pretty much all the reasons that Mark mentioned. A major version change gives us an opportunity to remove deprecated interfaces, stabilize experimental/developer APIs, drop support for outdated functionality/platforms and evolve the project with a vision for the foreseeable future. IMO the primary focus should be on interface evolution, stability and lowering tech debt, which might result in breaking changes.
>
> Which is not to say DSv2 should not be part of 3.0. Along with a lot of other exciting features also being added, it can be one more important enhancement. But I am not for delaying the release simply to accommodate a specific feature. Features can be added in subsequent releases as well - I am yet to hear of a good reason why it must make it into 3.0 to need a VOTE thread.
>
> Regards,
> Mridul
Re: [VOTE] Functional DataSourceV2 in Spark 3.0
I am -1 on this vote for pretty much all the reasons that Mark mentioned. A major version change gives us an opportunity to remove deprecated interfaces, stabilize experimental/developer APIs, drop support for outdated functionality/platforms and evolve the project with a vision for the foreseeable future. IMO the primary focus should be on interface evolution, stability and lowering tech debt, which might result in breaking changes.

Which is not to say DSv2 should not be part of 3.0. Along with a lot of other exciting features also being added, it can be one more important enhancement. But I am not for delaying the release simply to accommodate a specific feature. Features can be added in subsequent releases as well - I am yet to hear of a good reason why it must make it into 3.0 to need a VOTE thread.

Regards,
Mridul

On Thu, Feb 28, 2019 at 10:44 AM Mark Hamstra wrote:

> I agree that adding new features in a major release is not forbidden, but that is just not the primary goal of a major release. If we reach the point where we are happy with the new public API before some new features are in a satisfactory state to be merged, then I don't want there to be a prior presumption that we cannot complete the primary goal of the major release. If at that point you want to argue that it is worth waiting for some new feature, then that would be fine and may have sufficient merits to warrant some delay.
>
> Regardless of whether significant new public API comes into a major release or a feature release, it should come in with an experimental annotation so that we can make changes without requiring a new major release.
>
> If you want to argue that some new features that are currently targeting 3.0.0 are significant enough that one or more of them should justify an accelerated 3.1.0 release schedule if it is not ready in time for the 3.0.0 release, then I can much more easily get behind that kind of commitment; but I remain opposed to the notion of promoting any new features to the status of blockers of 3.0.0 at this time.
Re: [VOTE] Functional DataSourceV2 in Spark 3.0
I'm not worried about rushing. I worry that, without clear parameters for the amount or types of DSv2 delays that are acceptable, we might end up holding back 3.0 indefinitely to meet the deadline when we wouldn't have made that decision de novo. (Or even worse, the PMC eventually feels they must release 3.0 anyway, and then we're in the same position but everyone's angry and frustrated.)

I do recognize that I'm particularly allergic to this risk, which is why I'm not giving a -1 here - one of the first projects I worked on in my career was delayed for over a year because an incomplete feature was bound to its release date. But I don't agree that "might not resolve anything" is the worst possible outcome here.

On Thu, Feb 28, 2019 at 11:48 AM Sean Owen wrote:

> This is a fine thing to VOTE on. Committers (and community, non-binding) can VOTE on what we like; we just don't do it often where not required because it's a) overkill overhead over simple lazy consensus, and b) it can be hard to say what the binding VOTE binds if it's not a discrete commit or release. This is a big enough deal that it's not overkill. The question is, what does it bind?
>
> It means the release is definitely blocked until the items here are done, but, what's 'done'? It will return to the same questions already on the table, like do we need to define just APIs, and to what degree of stability. At worst it might not resolve anything.
>
> I don't see much harm in nailing down what appears to be agreement at the level of specific goals, even if this isn't a vote on a release date or specific commit. I think it's clear these items must be resolved to the level of semi-stable API by 3.0, as it's coming soon and this is the right time to establish these APIs. It might provide necessary clarity and constraints to get it over the line.
>
> To Mark -- yeah, this is asserting that DSv2 is a primary or necessary goal of the release, just like a "Blocker" does. Why would this argument be different or better if it waited until 3.0 was imminent? I get that one might say, well, we ended up working on more important stuff in the meantime and now we don't have time. But this VOTE's purpose is to declare that this is the important stuff now.
>
> To Jose -- what's the "just a few PRs in review" issue? You worry that we might rush DSv2 at the end to meet a deadline? All the better to, if anything, agree it's important now. It's also an agreement to delay the release for it, not rush it. I don't see that later is a better time to make the decision, if rush is a worry?
>
> Given my definition, and understanding of the issues, I'd say +1
Re: [VOTE] Functional DataSourceV2 in Spark 3.0
> The question is, what does it bind?

I’m not pushing for a binding statement to do this or delay the 3.0 release because I don’t think that’s a very reasonable thing to do. It may well be that there is a good reason for missing the goal. So “what does it bind?” is an apt question.

A commitment binds us to do this and make a reasonable attempt at finishing on time. If we choose not to commit, or if we choose to commit and don’t make a reasonable attempt, then we need to ask, “what happened?” Is Spark the right place for this work?

What I don’t want is to work on it for 3-4 more months, miss the release, and then not have anyone take that problem seriously because we never said it was important. If we try and fail, then we need to fix what went wrong. This removes the option to pretend it wasn’t a goal in the first place.

That’s why I think it is important that we make a statement that we, the community, intend to do it.

On Thu, Feb 28, 2019 at 11:48 AM Sean Owen wrote:

> This is a fine thing to VOTE on. Committers (and community, non-binding) can VOTE on what we like; we just don't do it often where not required because it's a) overkill overhead over simple lazy consensus, and b) it can be hard to say what the binding VOTE binds if it's not a discrete commit or release. This is a big enough deal that it's not overkill. The question is, what does it bind?
>
> It means the release is definitely blocked until the items here are done, but, what's 'done'? It will return to the same questions already on the table, like do we need to define just APIs, and to what degree of stability. At worst it might not resolve anything.
>
> I don't see much harm in nailing down what appears to be agreement at the level of specific goals, even if this isn't a vote on a release date or specific commit. I think it's clear these items must be resolved to the level of semi-stable API by 3.0, as it's coming soon and this is the right time to establish these APIs. It might provide necessary clarity and constraints to get it over the line.
>
> To Mark -- yeah, this is asserting that DSv2 is a primary or necessary goal of the release, just like a "Blocker" does. Why would this argument be different or better if it waited until 3.0 was imminent? I get that one might say, well, we ended up working on more important stuff in the meantime and now we don't have time. But this VOTE's purpose is to declare that this is the important stuff now.
>
> To Jose -- what's the "just a few PRs in review" issue? You worry that we might rush DSv2 at the end to meet a deadline? All the better to, if anything, agree it's important now. It's also an agreement to delay the release for it, not rush it. I don't see that later is a better time to make the decision, if rush is a worry?
>
> Given my definition, and understanding of the issues, I'd say +1
Re: [VOTE] Functional DataSourceV2 in Spark 3.0
This is a fine thing to VOTE on. Committers (and community, non-binding) can VOTE on what we like; we just don't do it often where not required because it's a) overkill overhead over simple lazy consensus, and b) it can be hard to say what the binding VOTE binds if it's not a discrete commit or release. This is a big enough deal that it's not overkill. The question is, what does it bind?

It means the release is definitely blocked until the items here are done, but, what's 'done'? It will return to the same questions already on the table, like do we need to define just APIs, and to what degree of stability. At worst it might not resolve anything.

I don't see much harm in nailing down what appears to be agreement at the level of specific goals, even if this isn't a vote on a release date or specific commit. I think it's clear these items must be resolved to the level of semi-stable API by 3.0, as it's coming soon and this is the right time to establish these APIs. It might provide necessary clarity and constraints to get it over the line.

To Mark -- yeah, this is asserting that DSv2 is a primary or necessary goal of the release, just like a "Blocker" does. Why would this argument be different or better if it waited until 3.0 was imminent? I get that one might say, well, we ended up working on more important stuff in the meantime and now we don't have time. But this VOTE's purpose is to declare that this is the important stuff now.

To Jose -- what's the "just a few PRs in review" issue? You worry that we might rush DSv2 at the end to meet a deadline? All the better to, if anything, agree it's important now. It's also an agreement to delay the release for it, not rush it. I don't see that later is a better time to make the decision, if rush is a worry?

Given my definition, and understanding of the issues, I'd say +1

On Thu, Feb 28, 2019 at 12:24 PM Ryan Blue wrote:

> Mark, I disagree. Setting common goals is a critical part of getting things done.
>
> This doesn't commit the community to push out the release if the goals aren't met, but does mean that we will, as a community, seriously consider it. This is also an acknowledgement that this is the most important feature in the next release (whether major or minor) for many of us. This has been in limbo for a very long time, so I think it is important for the community to commit to getting it to a functional state.
>
> It sounds like your objection is to this commitment for 3.0, but remember that 3.0 is the next release so that we can remove deprecated APIs. It does not mean that we aren't adding new features in that release and aren't considering other goals.
Re: [VOTE] Functional DataSourceV2 in Spark 3.0
I agree that adding new features in a major release is not forbidden, but that is just not the primary goal of a major release. If we reach the point where we are happy with the new public API before some new features are in a satisfactory state to be merged, then I don't want there to be a prior presumption that we cannot complete the primary goal of the major release. If at that point you want to argue that it is worth waiting for some new feature, then that would be fine and may have sufficient merits to warrant some delay.

Regardless of whether significant new public API comes into a major release or a feature release, it should come in with an experimental annotation so that we can make changes without requiring a new major release.

If you want to argue that some new features that are currently targeting 3.0.0 are significant enough that one or more of them should justify an accelerated 3.1.0 release schedule if it is not ready in time for the 3.0.0 release, then I can much more easily get behind that kind of commitment; but I remain opposed to the notion of promoting any new features to the status of blockers of 3.0.0 at this time.

On Thu, Feb 28, 2019 at 10:23 AM Ryan Blue wrote:

> Mark, I disagree. Setting common goals is a critical part of getting things done.
>
> This doesn't commit the community to push out the release if the goals aren't met, but does mean that we will, as a community, seriously consider it. This is also an acknowledgement that this is the most important feature in the next release (whether major or minor) for many of us. This has been in limbo for a very long time, so I think it is important for the community to commit to getting it to a functional state.
>
> It sounds like your objection is to this commitment for 3.0, but remember that 3.0 is the next release so that we can remove deprecated APIs. It does not mean that we aren't adding new features in that release and aren't considering other goals.
Re: [VOTE] Functional DataSourceV2 in Spark 3.0
Mark, I disagree. Setting common goals is a critical part of getting things done.

This doesn't commit the community to push out the release if the goals aren't met, but does mean that we will, as a community, seriously consider it. This is also an acknowledgement that this is the most important feature in the next release (whether major or minor) for many of us. This has been in limbo for a very long time, so I think it is important for the community to commit to getting it to a functional state.

It sounds like your objection is to this commitment for 3.0, but remember that 3.0 is the next release so that we can remove deprecated APIs. It does not mean that we aren't adding new features in that release and aren't considering other goals.

On Thu, Feb 28, 2019 at 10:12 AM Mark Hamstra wrote:

> Then I'm -1. Setting new features as blockers of major releases is not proper project management, IMO.
>
> On Thu, Feb 28, 2019 at 10:06 AM Ryan Blue wrote:
>
>> Mark, if this goal is adopted, "we" is the Apache Spark community.

--
Ryan Blue
Software Engineer
Netflix
Re: [VOTE] Functional DataSourceV2 in Spark 3.0
Mark, if this goal is adopted, "we" is the Apache Spark community.

On Thu, Feb 28, 2019 at 9:52 AM Mark Hamstra wrote:

> Who is "we" in these statements, such as "we should consider a functional
> DSv2 implementation a blocker for Spark 3.0"? If it means those
> contributing to the DSv2 effort want to set their own goals, milestones,
> etc., then that is fine with me. If you mean that the Apache Spark project
> should officially commit to the lack of a functional DSv2 implementation
> being a blocker for the release of Spark 3.0, then I'm -1. A major release
> is just not about adding new features. Rather, it is about making changes
> to the existing public API. As such, I'm opposed to any new feature or any
> API addition being considered a blocker of the 3.0.0 release.

--
Ryan Blue
Software Engineer
Netflix
Re: [VOTE] Functional DataSourceV2 in Spark 3.0
Then I'm -1. Setting new features as blockers of major releases is not proper project management, IMO.

On Thu, Feb 28, 2019 at 10:06 AM Ryan Blue wrote:

> Mark, if this goal is adopted, "we" is the Apache Spark community.
Re: [VOTE] Functional DataSourceV2 in Spark 3.0
Who is "we" in these statements, such as "we should consider a functional DSv2 implementation a blocker for Spark 3.0"? If it means those contributing to the DSv2 effort want to set their own goals, milestones, etc., then that is fine with me. If you mean that the Apache Spark project should officially commit to the lack of a functional DSv2 implementation being a blocker for the release of Spark 3.0, then I'm -1.

A major release is just not about adding new features. Rather, it is about making changes to the existing public API. As such, I'm opposed to any new feature or any API addition being considered a blocker of the 3.0.0 release.

On Thu, Feb 28, 2019 at 9:09 AM Matt Cheah wrote:

> +1 (non-binding)
>
> Are identifiers and namespaces going to be rolled under one of those six
> points?
Re: [VOTE] Functional DataSourceV2 in Spark 3.0
+1 (non-binding)

Are identifiers and namespaces going to be rolled under one of those six points?

From: Ryan Blue
Reply-To: "rb...@netflix.com"
Date: Thursday, February 28, 2019 at 8:39 AM
To: Spark Dev List
Subject: [VOTE] Functional DataSourceV2 in Spark 3.0

I’d like to call a vote for committing to getting DataSourceV2 in a functional state for Spark 3.0.

For more context, please see the discussion thread, but here is a quick summary about what this commitment means:

· We think that a “functional DSv2” is an achievable goal for the Spark 3.0 release
· We will consider this a blocker for Spark 3.0, and take reasonable steps to make it happen
· We will not delay the release without a community discussion

Here’s what we’ve defined as a functional DSv2:

· Add a plugin system for catalogs
· Add an interface for table catalogs (see the ongoing SPIP vote)
· Add an implementation of the new interface that calls SessionCatalog to load v2 tables
· Add a resolution rule to load v2 tables from the v2 catalog
· Add CTAS logical and physical plan nodes
· Add conversions from SQL parsed plans to v2 logical plans (e.g., INSERT INTO support)

Please vote in the next 3 days on whether you agree with committing to this goal.

[ ] +1: Agree that we should consider a functional DSv2 implementation a blocker for Spark 3.0
[ ] +0: . . .
[ ] -1: I disagree with this goal because . . .

Thank you!

--
Ryan Blue
Software Engineer
Netflix