Sounds good. Just to comment on the compatibility part:

> I meant changing public user interfaces. I think the first design is
> unlikely to be right, because it's done at a time when you have the
> least information. As a user, I find it considerably more frustrating
> to be unable to use a tool to get my job done, than I do having to
> make minor changes to my code in order to take advantage of features.
> I've seen committers be seriously reluctant to allow changes to
> @experimental code that are needed in order for it to really work
> right. You need to be able to iterate, and if people on both sides of
> the fence aren't going to respect that some newer APIs are subject to
> change, then why even mark them as such?
>
> Ideally a finished SIP should give me a checklist of things that an
> implementation must do, and things that it doesn't need to do.
> Contributors/committers should be seriously discouraged from putting
> out a version 0.1 that doesn't have at least a prototype
> implementation of all those things, especially if they're then going
> to argue against interface changes necessary to get the rest of
> the things done in the 0.2 version.

Experimental APIs and alpha components are indeed supposed to be changeable
(https://cwiki.apache.org/confluence/display/SPARK/Spark+Versioning+Policy).
Maybe people are being too conservative in some cases, but I do want to note
that regardless of what precise policy we try to write down, this type of
issue will ultimately be a judgment call. Is it worth making a small cosmetic
change in an API that's marked experimental, but has been used widely for a
year? Perhaps not. Is it worth making it in something one month old, or even
in an older API as we move to 2.0? Maybe yes. I think we should just discuss
each one (start an email thread if resolving it on JIRA is too complex) and
perhaps be more religious about making things non-experimental when we think
they're done.

Matei

> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <r...@databricks.com> wrote:
>> I like the lightweight proposal to add a SIP label.
>>
>> During Spark 2.0 development, Tom (Graves) and I suggested using the wiki to
>> track the list of major changes, but that never really materialized due to
>> the overhead. Adding a SIP label on major JIRAs and then linking to them
>> prominently on the Spark website makes a lot of sense.
>>
>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <matei.zaha...@gmail.com>
>> wrote:
>>>
>>> For the improvement proposals, I think one major point was to make them
>>> really visible to users who are not contributors, so we should do more than
>>> sending stuff to dev@. One very lightweight idea is to have a new type of
>>> JIRA called a SIP and have a link to a filter that shows all such JIRAs from
>>> http://spark.apache.org. I also like the idea of SIP and design doc
>>> templates (in fact many projects have them).
>>>
>>> Matei
>>>
>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <r...@databricks.com> wrote:
>>>
>>> I called Cody last night and talked about some of the topics in his email.
>>> It became clear to me Cody genuinely cares about the project.
>>>
>>> Some of the frustrations come from the success of the project: it has
>>> become very "hot", and it is difficult to get clarity from people who
>>> don't dedicate all their time to Spark. In fact, it is in some ways similar
>>> to scaling an engineering team in a successful startup: old processes that
>>> worked well might not work so well when it gets to a certain size, cultures
>>> can get diluted, building culture vs building process, etc.
>>>
>>> I would also really like to have a more visible process for larger changes,
>>> especially major user-facing API changes. Historically we have uploaded
>>> design docs for major changes, but not always consistently, and it is
>>> difficult to control the quality of the docs, due to the volunteer nature
>>> of the organization.
>>>
>>> Some of the more concrete ideas we discussed focus on building a culture
>>> to improve clarity:
>>>
>>> - Process: Large changes should have design docs posted on JIRA. One thing
>>> Cody and I didn't discuss, but an idea that just came to me, is that we
>>> should create a design doc template for the project and ask everybody to
>>> follow it. The design doc template should also explicitly list goals and
>>> non-goals, to make design docs more consistent.
>>>
>>> - Process: Email dev@ to solicit feedback. We have done this with some
>>> changes, but again very inconsistently. Just posting something on JIRA
>>> isn't sufficient, because there are simply too many JIRAs and the signal
>>> gets lost in the noise. While this is generally impossible to enforce,
>>> because we can't force all volunteers to conform to a process (or they
>>> might not even be aware of it), those who are more familiar with the
>>> project can help by emailing dev@ when they see something that hasn't
>>> been sent there.
>>>
>>> - Culture: The design doc author(s) should be open to feedback. A design
>>> doc should serve as the base for discussion and is by no means the final
>>> design.
>>> Of course, this does not mean the author has to accept every piece of
>>> feedback. They should also be comfortable accepting / rejecting ideas on
>>> technical grounds.
>>>
>>> - Process / Culture: For major ongoing projects, it can be useful to have
>>> some monthly Google hangouts that are open to the world. I am actually not
>>> sure how well this will work, because of the volunteer nature of the
>>> project and the need to adjust for time zones for people across the globe,
>>> but it seems worth trying.
>>>
>>> - Culture: Contributors (including committers) should be more direct in
>>> setting expectations, including whether they are working on a specific
>>> issue, whether they will be working on a specific issue, and whether an
>>> issue, PR, or JIRA should be rejected. Most people I know in this community
>>> are nice and don't enjoy telling other people no, but it is often more
>>> annoying for a contributor to know nothing than to get a no.
>>>
>>>
>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <matei.zaha...@gmail.com>
>>> wrote:
>>>>
>>>> Love the idea of a more visible "Spark Improvement Proposal" process that
>>>> solicits user input on new APIs. For what it's worth, I don't think
>>>> committers are trying to minimize their own work -- every committer cares
>>>> about making the software useful for users. However, it is always hard to
>>>> get user input and so it helps to have this kind of process. I've certainly
>>>> looked at the *IPs a lot in other software I use just to see the biggest
>>>> things on the roadmap.
>>>>
>>>> When you're talking about "changing interfaces", are you talking about
>>>> public or internal APIs? I do think many people hate changing public APIs
>>>> and I actually think that's for the best of the project.
>>>> That's a technical debate, but basically, the worst thing when you're
>>>> using a piece of software is that the developers constantly ask you to
>>>> rewrite your app to update to a new version (and thus benefit from bug
>>>> fixes, etc). Cue anyone who's used Protobuf, or Guava. The "let's get
>>>> everyone to change their code this release" model works well within a
>>>> single large company, but doesn't work well for a community, which is why
>>>> nearly all *very* widely used programming interfaces (I'm talking things
>>>> like Java standard library, Windows API, etc) almost *never* break
>>>> backwards compatibility. All this is done within reason though, e.g. we
>>>> do change things in major releases (2.x, 3.x, etc).
>>>
>>
---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org