Oh, hmm… I guess I’m a little confused on the relation between Cody’s email and the document he linked to, which says:
https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md#when

SIPs should be used for significant user-facing or cross-cutting changes, not day-to-day improvements. When in doubt, if a committer thinks a change needs an SIP, it does.

Nick

On Sun, Oct 9, 2016 at 4:40 PM Matei Zaharia <matei.zaha...@gmail.com> wrote:

> Yup, but the example you gave is for alternatives about *user-facing behavior*, not implementation. The current SIP doc describes "strategy" more as implementation strategy. I'm just saying there are different possible goals for these types of docs.
>
> BTW, PEPs and Scala SIPs focus primarily on user-facing behavior, but also require a reference implementation. This is a bit different from what Cody had in mind, I think.
>
> Matei
>
> On Oct 9, 2016, at 1:25 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
> - Rejected strategies: I personally wouldn’t put this, because what’s the point of voting to reject a strategy before you’ve really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?
>
> I would guess the point is to document alternatives that were discussed and rejected, so that later on people can be pointed to that discussion and the devs don’t have to repeat themselves unnecessarily every time someone comes along and asks “Why didn’t you do this other thing?” That doesn’t mean a rejected proposal can’t later be revisited and the SIP can’t be updated.
>
> For reference from the Python community, PEP 492 <https://www.python.org/dev/peps/pep-0492/>, a Python Enhancement Proposal for adding async and await syntax and “first-class” coroutines to Python, has a section on rejected ideas <https://www.python.org/dev/peps/pep-0492/#why-async-def> for the new syntax.
> It captures a summary of what the devs discussed, but it doesn’t mean the PEP can’t be updated and a previously rejected proposal can’t be revived.
>
> At least in the Python community, a PEP serves not just as formal starting point for a proposal (the “real” starting point is usually a discussion on python-ideas or python-dev), but also as documentation of what was agreed on and a living “spec” of sorts. So PEPs sometimes get updated years after they are approved when revisions are agreed upon. PEPs are also intended for wide consumption, vs. bug tracker issues which the broader Python dev community are not expected to follow closely.
>
> Dunno if we want to follow a similar pattern for Spark, since the project’s needs are different. But the Python community has used PEPs to help organize and steer development since 2000; there are plenty of examples there we can probably take inspiration from.
>
> By the way, can we call these things something other than Spark Improvement Proposals? The acronym, SIP, conflicts with Scala SIPs <http://docs.scala-lang.org/sips/index.html>. Since the Scala and Spark communities have a lot of overlap, we don’t want, for example, names like “SIP-10” to have an ambiguous meaning.
>
> Nick
>
> On Sun, Oct 9, 2016 at 3:34 PM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
> Hi Cody,
>
> I think this would be a lot more concrete if we had a more detailed template for SIPs. Right now, it's not super clear what's in scope -- e.g. are they a way to solicit feedback on the user-facing behavior or on the internals? "Goals" can cover both things. I've been thinking of SIPs more as Product Requirements Docs (PRDs), which focus on *what* a code change should do as opposed to how.
>
> In particular, here are some things that you may or may not consider in scope for SIPs:
>
> - Goals and non-goals: This is definitely in scope, and IMO should focus on user-visible behavior (e.g. "system supports SQL window functions" or "system continues working if one node fails"). BTW I wouldn't say "rejected goals" because some of them might become goals later, so we're not definitively rejecting them.
>
> - Public API: Probably should be included in most SIPs unless it's too large to fully specify then (e.g. "let's add an ML library").
>
> - Use cases: I usually find this very useful in PRDs to better communicate the goals.
>
> - Internal architecture: This is usually *not* a thing users can easily comment on and it sounds more like a design doc item. Of course it's important to show that the SIP is feasible to implement. One exception, however, is that I think we'll have some SIPs primarily on internals (e.g. if somebody wants to refactor Spark's query optimizer or something).
>
> - Rejected strategies: I personally wouldn't put this, because what's the point of voting to reject a strategy before you've really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?
>
> At a super high level, it depends on whether you want the SIPs to be PRDs for getting some quick feedback on the goals of a feature before it is designed, or something more like full-fledged design docs (just a more visible design doc for bigger changes). I looked at Kafka's KIPs, and they actually seem to be more like design docs. This can work too but it does require more work from the proposer and it can lead to the same problems you mentioned with people already having a design and implementation in mind.
>
> Basically, the question is, are you trying to iterate faster on design by adding a step for user feedback earlier? Or are you just trying to make design docs for key features more visible (and their approval more formal)?
>
> BTW note that in either case, I'd like to have a template for design docs too, which should also include goals.
> I think that would've avoided some of the issues you brought up.
>
> Matei
>
> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
> Here's my specific proposal (meta-proposal?)
>
> Spark Improvement Proposals (SIP)
>
> Background:
>
> The current problem is that design and implementation of large features are often done in private, before soliciting user feedback.
>
> When feedback is solicited, it is often as to detailed design specifics, not focused on goals.
>
> When implementation does take place after design, there is often disagreement as to what goals are or are not in scope.
>
> This results in commits that don't fully meet user needs.
>
> Goals:
>
> - Ensure user, contributor, and committer goals are clearly identified and agreed upon, before implementation takes place.
>
> - Ensure that a technically feasible strategy is chosen that is likely to meet the goals.
>
> Rejected Goals:
>
> - SIPs are not for detailed design. Design by committee doesn't work.
>
> - SIPs are not for every change. We don't need that much process.
>
> Strategy:
>
> My suggestion is outlined as a Spark Improvement Proposal process documented at
>
> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>
> Specifics of Jira manipulation are an implementation detail we can figure out.
>
> I'm suggesting voting; the need here is for a _clear_ outcome.
>
> Rejected Strategies:
>
> Having someone who understands the problem implement it first works, but only if significant iteration after user feedback is allowed.
>
> Historically this has been problematic due to pressure to limit public api changes.
>
> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <r...@databricks.com> wrote:
>
> Alright looks like there is quite a bit of support. We should wait to hear from more people too.
>
> To push this forward, Cody and I will be working together in the next couple of weeks to come up with a concrete, detailed proposal on what this entails, and then we can discuss the specific proposal as well.
>
> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
> Yeah, in case it wasn't clear, I was talking about SIPs for major user-facing or cross-cutting changes, not minor feature adds.
>
> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos <stavros.kontopou...@lightbend.com> wrote:
>
> +1 to the SIP label as long as it does not slow down things and it targets optimizing efforts, coordination etc. For example really small features should not need to go through this process (assuming they don't touch public interfaces) or re-factorings and hope it will be kept this way. So as a guideline doc should be provided, like in the KIP case.
>
> IMHO so far aside from tagging things and linking them elsewhere simply having design docs and prototypes implementations in PRs is not something that has not worked so far. What is really a pain in many projects out there is discontinuity in progress of PRs, missing features, slow reviews which is understandable to some extent... it is not only about Spark but things can be improved for sure for this project in particular as already stated.
>
> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
> +1 to adding an SIP label and linking it from the website. I think it needs
>
> - template that focuses it towards soliciting user goals / non goals
> - clear resolution as to which strategy was chosen to pursue. I'd recommend a vote.
>
> Matei asked me to clarify what I meant by changing interfaces, I think it's directly relevant to the SIP idea so I'll clarify here, and split a thread for the other discussion per Nicholas' request.
>
> I meant changing public user interfaces.
> I think the first design is unlikely to be right, because it's done at a time when you have the least information. As a user, I find it considerably more frustrating to be unable to use a tool to get my job done, than I do having to make minor changes to my code in order to take advantage of features. I've seen committers be seriously reluctant to allow changes to @experimental code that are needed in order for it to really work right. You need to be able to iterate, and if people on both sides of the fence aren't going to respect that some newer apis are subject to change, then why even mark them as such?
>
> Ideally a finished SIP should give me a checklist of things that an implementation must do, and things that it doesn't need to do. Contributors/committers should be seriously discouraged from putting out a version 0.1 that doesn't have at least a prototype implementation of all those things, especially if they're then going to argue against interface changes necessary to get the rest of the things done in the 0.2 version.
>
> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <r...@databricks.com> wrote:
> > I like the lightweight proposal to add a SIP label.
> >
> > During Spark 2.0 development, Tom (Graves) and I suggested using wiki to track the list of major changes, but that never really materialized due to the overhead. Adding a SIP label on major JIRAs and then linking to them prominently on the Spark website makes a lot of sense.
> >
> > On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> >>
> >> For the improvement proposals, I think one major point was to make them really visible to users who are not contributors, so we should do more than sending stuff to dev@. One very lightweight idea is to have a new type of JIRA called a SIP and have a link to a filter that shows all such JIRAs from http://spark.apache.org.
> >> I also like the idea of SIP and design doc templates (in fact many projects have them).
> >>
> >> Matei
> >>
> >> On Oct 7, 2016, at 10:38 AM, Reynold Xin <r...@databricks.com> wrote:
> >>
> >> I called Cody last night and talked about some of the topics in his email. It became clear to me Cody genuinely cares about the project.
> >>
> >> Some of the frustrations come from the success of the project itself becoming very "hot", and it is difficult to get clarity from people who don't dedicate all their time to Spark. In fact, it is in some ways similar to scaling an engineering team in a successful startup: old processes that worked well might not work so well when it gets to a certain size, cultures can get diluted, building culture vs building process, etc.
> >>
> >> I also really like to have a more visible process for larger changes, especially major user facing API changes. Historically we upload design docs for major changes, but it is not always consistent and difficult to quality of the docs, due to the volunteering nature of the organization.
> >>
> >> Some of the more concrete ideas we discussed focus on building a culture to improve clarity:
> >>
> >> - Process: Large changes should have design docs posted on JIRA. One thing Cody and I didn't discuss but an idea that just came to me is we should create a design doc template for the project and ask everybody to follow. The design doc template should also explicitly list goals and non-goals, to make design doc more consistent.
> >>
> >> - Process: Email dev@ to solicit feedback. We have done this with some changes, but again very inconsistent. Just posting something on JIRA isn't sufficient, because there are simply too many JIRAs and the signal gets lost in the noise.
> >> While this is generally impossible to enforce because we can't force all volunteers to conform to a process (or they might not even be aware of this), those who are more familiar with the project can help by emailing the dev@ when they see something that hasn't been.
> >>
> >> - Culture: The design doc author(s) should be open to feedback. A design doc should serve as the base for discussion and is by no means the final design. Of course, this does not mean the author has to accept every feedback. They should also be comfortable accepting / rejecting ideas on technical grounds.
> >>
> >> - Process / Culture: For major ongoing projects, it can be useful to have some monthly Google hangouts that are open to the world. I am actually not sure how well this will work, because of the volunteering nature and we need to adjust for timezones for people across the globe, but it seems worth trying.
> >>
> >> - Culture: Contributors (including committers) should be more direct in setting expectations, including whether they are working on a specific issue, whether they will be working on a specific issue, and whether an issue or pr or jira should be rejected. Most people I know in this community are nice and don't enjoy telling other people no, but it is often more annoying to a contributor to not know anything than getting a no.
> >>
> >> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> >>>
> >>> Love the idea of a more visible "Spark Improvement Proposal" process that solicits user input on new APIs. For what it's worth, I don't think committers are trying to minimize their own work -- every committer cares about making the software useful for users. However, it is always hard to get user input and so it helps to have this kind of process.
> >>> I've certainly looked at the *IPs a lot in other software I use just to see the biggest things on the roadmap.
> >>>
> >>> When you're talking about "changing interfaces", are you talking about public or internal APIs? I do think many people hate changing public APIs and I actually think that's for the best of the project. That's a technical debate, but basically, the worst thing when you're using a piece of software is that the developers constantly ask you to rewrite your app to update to a new version (and thus benefit from bug fixes, etc). Cue anyone who's used Protobuf, or Guava. The "let's get everyone to change their code this release" model works well within a single large company, but doesn't work well for a community, which is why nearly all *very* widely used programming interfaces (I'm talking things like Java standard library, Windows API, etc) almost *never* break backwards compatibility. All this is done within reason though, e.g. we do change things in major releases (2.x, 3.x, etc).