Re: Spark Improvement Proposals

Nicholas Chammas Sun, 09 Oct 2016 13:48:58 -0700

Oh, hmm… I guess I’m a little confused on the relation between Cody’s email
and the document he linked to, which says:


https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md#when

SIPs should be used for significant user-facing or cross-cutting changes,
not day-to-day improvements. When in doubt, if a committer thinks a change
needs an SIP, it does.

Nick


On Sun, Oct 9, 2016 at 4:40 PM Matei Zaharia <matei.zaha...@gmail.com>
wrote:

> Yup, but the example you gave is for alternatives about *user-facing
> behavior*, not implementation. The current SIP doc describes "strategy"
> more as implementation strategy. I'm just saying there are different
> possible goals for these types of docs.
>
> BTW, PEPs and Scala SIPs focus primarily on user-facing behavior, but also
> require a reference implementation. This is a bit different from what Cody
> had in mind, I think.
>
>
> Matei
>
> On Oct 9, 2016, at 1:25 PM, Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
>
>    - Rejected strategies: I personally wouldn’t put this, because what’s
>    the point of voting to reject a strategy before you’ve really begun
>    designing and implementing something? What if you discover that the
>    strategy is actually better when you start doing stuff?
>
> I would guess the point is to document alternatives that were discussed
> and rejected, so that later on people can be pointed to that discussion and
> the devs don’t have to repeat themselves unnecessarily every time someone
> comes along and asks “Why didn’t you do this other thing?” That doesn’t
> mean a rejected proposal can’t later be revisited and the SIP can’t be
> updated.
>
> For reference from the Python community, PEP 492
> <https://www.python.org/dev/peps/pep-0492/>, a Python Enhancement
> Proposal for adding async and await syntax and “first-class” coroutines
> to Python, has a section on rejected ideas
> <https://www.python.org/dev/peps/pep-0492/#why-async-def> for the new
> syntax. It captures a summary of what the devs discussed, but it doesn’t
> mean the PEP can’t be updated and a previously rejected proposal can’t be
> revived.
>
> At least in the Python community, a PEP serves not just as a formal starting
> point for a proposal (the “real” starting point is usually a discussion on
> python-ideas or python-dev), but also as documentation of what was agreed
> on and a living “spec” of sorts. So PEPs sometimes get updated years after
> they are approved when revisions are agreed upon. PEPs are also intended
> for wide consumption, vs. bug tracker issues which the broader Python dev
> community are not expected to follow closely.
>
> Dunno if we want to follow a similar pattern for Spark, since the
> project’s needs are different. But the Python community has used PEPs to
> help organize and steer development since 2000; there are plenty of
> examples there we can probably take inspiration from.
>
> By the way, can we call these things something other than Spark
> Improvement Proposals? The acronym, SIP, conflicts with Scala SIPs
> <http://docs.scala-lang.org/sips/index.html>. Since the Scala and Spark
> communities have a lot of overlap, we don’t want, for example, names like
> “SIP-10” to have an ambiguous meaning.
>
> Nick
> 
>
> On Sun, Oct 9, 2016 at 3:34 PM Matei Zaharia <matei.zaha...@gmail.com>
> wrote:
>
> Hi Cody,
>
> I think this would be a lot more concrete if we had a more detailed
> template for SIPs. Right now, it's not super clear what's in scope -- e.g.
> are  they a way to solicit feedback on the user-facing behavior or on the
> internals? "Goals" can cover both things. I've been thinking of SIPs more
> as Product Requirements Docs (PRDs), which focus on *what* a code change
> should do as opposed to how.
>
> In particular, here are some things that you may or may not consider in
> scope for SIPs:
>
> - Goals and non-goals: This is definitely in scope, and IMO should focus
> on user-visible behavior (e.g. "system supports SQL window functions" or
> "system continues working if one node fails"). BTW I wouldn't say "rejected
> goals" because some of them might become goals later, so we're not
> definitively rejecting them.
>
> - Public API: Probably should be included in most SIPs unless it's too
> large to fully specify then (e.g. "let's add an ML library").
>
> - Use cases: I usually find this very useful in PRDs to better communicate
> the goals.
>
> - Internal architecture: This is usually *not* a thing users can easily
> comment on and it sounds more like a design doc item. Of course it's
> important to show that the SIP is feasible to implement. One exception,
> however, is that I think we'll have some SIPs primarily on internals (e.g.
> if somebody wants to refactor Spark's query optimizer or something).
>
> - Rejected strategies: I personally wouldn't put this, because what's the
> point of voting to reject a strategy before you've really begun designing
> and implementing something? What if you discover that the strategy is
> actually better when you start doing stuff?
>
> At a super high level, it depends on whether you want the SIPs to be PRDs
> for getting some quick feedback on the goals of a feature before it is
> designed, or something more like full-fledged design docs (just a more
> visible design doc for bigger changes). I looked at Kafka's KIPs, and they
> actually seem to be more like design docs. This can work too but it does
> require more work from the proposer and it can lead to the same problems
> you mentioned with people already having a design and implementation in
> mind.
>
> Basically, the question is, are you trying to iterate faster on design by
> adding a step for user feedback earlier? Or are you just trying to make
> design docs for key features more visible (and their approval more formal)?
>
> BTW note that in either case, I'd like to have a template for design docs
> too, which should also include goals. I think that would've avoided some of
> the issues you brought up.
>
> Matei
>
> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
> Here's my specific proposal (meta-proposal?)
>
> Spark Improvement Proposals (SIP)
>
>
> Background:
>
> The current problem is that design and implementation of large features
> are often done in private, before soliciting user feedback.
>
> When feedback is solicited, it is often as to detailed design specifics,
> not focused on goals.
>
> When implementation does take place after design, there is often
> disagreement as to what goals are or are not in scope.
>
> This results in commits that don't fully meet user needs.
>
>
> Goals:
>
> - Ensure user, contributor, and committer goals are clearly identified and
> agreed upon, before implementation takes place.
>
> - Ensure that a technically feasible strategy is chosen that is likely to
> meet the goals.
>
>
> Rejected Goals:
>
> - SIPs are not for detailed design.  Design by committee doesn't work.
>
> - SIPs are not for every change.  We don't need that much process.
>
>
> Strategy:
>
> My suggestion is outlined as a Spark Improvement Proposal process
> documented at
>
>
> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>
> Specifics of Jira manipulation are an implementation detail we can figure
> out.
>
> I'm suggesting voting; the need here is for a _clear_ outcome.
>
>
> Rejected Strategies:
>
> Having someone who understands the problem implement it first works, but
> only if significant iteration after user feedback is allowed.
>
> Historically this has been problematic due to pressure to limit public api
> changes.
>
> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <r...@databricks.com> wrote:
>
> Alright, looks like there is quite a bit of support. We should wait to
> hear from more people too.
>
> To push this forward, Cody and I will be working together in the next
> couple of weeks to come up with a concrete, detailed proposal on what this
> entails, and then we can discuss the specific proposal as well.
>
>
> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
> Yeah, in case it wasn't clear, I was talking about SIPs for major
> user-facing or cross-cutting changes, not minor feature adds.
>
> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos <
> stavros.kontopou...@lightbend.com> wrote:
>
> +1 to the SIP label as long as it does not slow things down and it targets
> optimizing efforts, coordination, etc. For example, really small features
> (assuming they don't touch public interfaces) and refactorings should not
> need to go through this process, and I hope it will be kept this way. So a
> guideline doc should be provided, like in the KIP case.
>
> IMHO, aside from tagging things and linking them elsewhere, simply having
> design docs and prototype implementations in PRs is something that has
> worked so far. What is really a pain in many projects out there is
> discontinuity in the progress of PRs, missing features, and slow reviews,
> which is understandable to some extent. This is not only about Spark, but
> things can surely be improved for this project in particular, as already
> stated.
>
> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <c...@koeninger.org>
> wrote:
>
> +1 to adding an SIP label and linking it from the website.  I think it
> needs
>
> - template that focuses it towards soliciting user goals / non goals
> - clear resolution as to which strategy was chosen to pursue.  I'd
> recommend a vote.
>
> Matei asked me to clarify what I meant by changing interfaces, I think
> it's directly relevant to the SIP idea so I'll clarify here, and split
> a thread for the other discussion per Nicholas' request.
>
> I meant changing public user interfaces.  I think the first design is
> unlikely to be right, because it's done at a time when you have the
> least information.  As a user, I find it considerably more frustrating
> to be unable to use a tool to get my job done, than I do having to
> make minor changes to my code in order to take advantage of features.
> I've seen committers be seriously reluctant to allow changes to
> @experimental code that are needed in order for it to really work
> right.  You need to be able to iterate, and if people on both sides of
> the fence aren't going to respect that some newer apis are subject to
> change, then why even mark them as such?
>
> Ideally a finished SIP should give me a checklist of things that an
> implementation must do, and things that it doesn't need to do.
> Contributors/committers should be seriously discouraged from putting
> out a version 0.1 that doesn't have at least a prototype
> implementation of all those things, especially if they're then going
> to argue against interface changes necessary to get the rest of
> the things done in the 0.2 version.
>
>
> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <r...@databricks.com> wrote:
> > I like the lightweight proposal to add a SIP label.
> >
> > During Spark 2.0 development, Tom (Graves) and I suggested using a wiki to
> > track the list of major changes, but that never really materialized due to
> > the overhead. Adding a SIP label on major JIRAs and then linking to them
> > prominently on the Spark website makes a lot of sense.
> >
> >
> > On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <matei.zaha...@gmail.com>
> > wrote:
> >>
> >> For the improvement proposals, I think one major point was to make them
> >> really visible to users who are not contributors, so we should do more than
> >> sending stuff to dev@. One very lightweight idea is to have a new type of
> >> JIRA called a SIP and have a link to a filter that shows all such JIRAs from
> >> http://spark.apache.org. I also like the idea of SIP and design doc
> >> templates (in fact many projects have them).
> >>
> >> Matei
> >>
> >> On Oct 7, 2016, at 10:38 AM, Reynold Xin <r...@databricks.com> wrote:
> >>
> >> I called Cody last night and talked about some of the topics in his email.
> >> It became clear to me Cody genuinely cares about the project.
> >>
> >> Some of the frustrations come from the success of the project itself
> >> becoming very "hot", and it is difficult to get clarity from people who
> >> don't dedicate all their time to Spark. In fact, it is in some ways similar
> >> to scaling an engineering team in a successful startup: old processes that
> >> worked well might not work so well when it gets to a certain size, cultures
> >> can get diluted, building culture vs building process, etc.
> >>
> >> I would also really like to have a more visible process for larger changes,
> >> especially major user-facing API changes. Historically we have uploaded
> >> design docs for major changes, but not always consistently, and it is
> >> difficult to ensure the quality of the docs, due to the volunteer nature of
> >> the organization.
> >>
> >> Some of the more concrete ideas we discussed focus on building a culture
> >> to improve clarity:
> >>
> >> - Process: Large changes should have design docs posted on JIRA. One thing
> >> Cody and I didn't discuss, but an idea that just came to me, is that we
> >> should create a design doc template for the project and ask everybody to
> >> follow it. The design doc template should also explicitly list goals and
> >> non-goals, to make design docs more consistent.
> >>
> >> - Process: Email dev@ to solicit feedback. We have done this with some
> >> changes, but again very inconsistently. Just posting something on JIRA isn't
> >> sufficient, because there are simply too many JIRAs and the signal gets lost
> >> in the noise. While this is generally impossible to enforce because we can't
> >> force all volunteers to conform to a process (or they might not even be
> >> aware of this), those who are more familiar with the project can help by
> >> emailing dev@ when they see something that hasn't been sent there.
> >>
> >> - Culture: The design doc author(s) should be open to feedback. A design
> >> doc should serve as the base for discussion and is by no means the final
> >> design. Of course, this does not mean the author has to accept every
> >> piece of feedback. They should also be comfortable accepting / rejecting
> >> ideas on technical grounds.
> >>
> >> - Process / Culture: For major ongoing projects, it can be useful to have
> >> some monthly Google hangouts that are open to the world. I am actually not
> >> sure how well this will work, because of the volunteering nature and we need
> >> to adjust for timezones for people across the globe, but it seems worth
> >> trying.
> >>
> >> - Culture: Contributors (including committers) should be more direct in
> >> setting expectations, including whether they are working on a specific
> >> issue, whether they will be working on a specific issue, and whether an
> >> issue or pr or jira should be rejected. Most people I know in this community
> >> are nice and don't enjoy telling other people no, but it is often more
> >> annoying to a contributor to not know anything than getting a no.
> >>
> >>
> >> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <matei.zaha...@gmail.com>
> >> wrote:
> >>>
> >>>
> >>> Love the idea of a more visible "Spark Improvement Proposal" process that
> >>> solicits user input on new APIs. For what it's worth, I don't think
> >>> committers are trying to minimize their own work -- every committer cares
> >>> about making the software useful for users. However, it is always hard to
> >>> get user input and so it helps to have this kind of process. I've certainly
> >>> looked at the *IPs a lot in other software I use just to see the biggest
> >>> things on the roadmap.
> >>>
> >>> When you're talking about "changing interfaces", are you talking about
> >>> public or internal APIs? I do think many people hate changing public APIs
> >>> and I actually think that's for the best of the project. That's a technical
> >>> debate, but basically, the worst thing when you're using a piece of software
> >>> is that the developers constantly ask you to rewrite your app to update to a
> >>> new version (and thus benefit from bug fixes, etc). Cue anyone who's used
> >>> Protobuf, or Guava. The "let's get everyone to change their code this
> >>> release" model works well within a single large company, but doesn't work
> >>> well for a community, which is why nearly all *very* widely used programming
> >>> interfaces (I'm talking things like Java standard library, Windows API, etc)
> >>> almost *never* break backwards compatibility. All this is done within reason
> >>> though, e.g. we do change things in major releases (2.x, 3.x, etc).
> >>
> >>
> >>
> >>
> >
>
> --
> Stavros Kontopoulos
>
> *Senior Software Engineer*
> *Lightbend, Inc.*
>
> *p:  +30 6977967274*
> *e: stavros.kontopou...@lightbend.com* <dave.mar...@lightbend.com>
>
>
>
>
>
>
>
>
