Re: Spark Improvement Proposals

Matei Zaharia Sat, 08 Oct 2016 14:22:59 -0700

Sounds good. Just to comment on the compatibility part:

> I meant changing public user interfaces.  I think the first design is
> unlikely to be right, because it's done at a time when you have the
> least information.  As a user, I find it considerably more frustrating
> to be unable to use a tool to get my job done, than I do having to
> make minor changes to my code in order to take advantage of features.
> I've seen committers be seriously reluctant to allow changes to
> @experimental code that are needed in order for it to really work
> right.  You need to be able to iterate, and if people on both sides of
> the fence aren't going to respect that some newer apis are subject to
> change, then why even mark them as such?
> 
> Ideally a finished SIP should give me a checklist of things that an
> implementation must do, and things that it doesn't need to do.
> Contributors/committers should be seriously discouraged from putting
> out a version 0.1 that doesn't have at least a prototype
> implementation of all those things, especially if they're then going
> to argue against interface changes necessary to get the rest of
> the things done in the 0.2 version.


Experimental APIs and alpha components are indeed supposed to be changeable 
(https://cwiki.apache.org/confluence/display/SPARK/Spark+Versioning+Policy). 
Maybe people are being too conservative in some cases, but I do want to note 
that regardless of what precise policy we try to write down, this type of issue 
will ultimately be a judgment call. Is it worth making a small cosmetic change 
in an API that's marked experimental, but has been used widely for a year? 
Perhaps not. Is it worth making it in something one month old, or even in an 
older API as we move to 2.0? Maybe yes. I think we should just discuss each one 
(start an email thread if resolving it on JIRA is too complex) and perhaps be 
more religious about making things non-experimental when we think they're done.

Matei


> 
> 
> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <r...@databricks.com> wrote:
>> I like the lightweight proposal to add a SIP label.
>> 
>> During Spark 2.0 development, Tom (Graves) and I suggested using a wiki to
>> track the list of major changes, but that never really materialized due to
>> the overhead. Adding a SIP label on major JIRAs and then linking to them
>> prominently on the Spark website makes a lot of sense.
>> 
>> 
>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <matei.zaha...@gmail.com>
>> wrote:
>>> 
>>> For the improvement proposals, I think one major point was to make them
>>> really visible to users who are not contributors, so we should do more than
>>> sending stuff to dev@. One very lightweight idea is to have a new type of
>>> JIRA called a SIP and have a link to a filter that shows all such JIRAs from
>>> http://spark.apache.org. I also like the idea of SIP and design doc
>>> templates (in fact many projects have them).
>>> 
>>> Matei
>>> 
>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <r...@databricks.com> wrote:
>>> 
>>> I called Cody last night and talked about some of the topics in his email.
>>> It became clear to me that Cody genuinely cares about the project.
>>> 
>>> Some of the frustrations come from the success of the project itself
>>> becoming very "hot", and it is difficult to get clarity from people who
>>> don't dedicate all their time to Spark. In fact, it is in some ways similar
>>> to scaling an engineering team in a successful startup: old processes that
>>> worked well might not work so well when it gets to a certain size, cultures
>>> can get diluted, building culture vs building process, etc.
>>> 
>>> I would also really like to have a more visible process for larger changes,
>>> especially major user-facing API changes. Historically we have uploaded
>>> design docs for major changes, but this has not always been consistent, and
>>> it is difficult to ensure the quality of the docs, due to the volunteer
>>> nature of the organization.
>>> 
>>> Some of the more concrete ideas we discussed focus on building a culture
>>> to improve clarity:
>>> 
>>> - Process: Large changes should have design docs posted on JIRA. One thing
>>> Cody and I didn't discuss but an idea that just came to me is we should
>>> create a design doc template for the project and ask everybody to follow it.
>>> The design doc template should also explicitly list goals and non-goals, to
>>> make design docs more consistent.
>>> 
>>> - Process: Email dev@ to solicit feedback. We have done this with some
>>> changes, but again very inconsistently. Just posting something on JIRA isn't
>>> sufficient, because there are simply too many JIRAs and the signal gets lost
>>> in the noise. While this is generally impossible to enforce because we can't
>>> force all volunteers to conform to a process (or they might not even be
>>> aware of it), those who are more familiar with the project can help by
>>> emailing dev@ when they see something that hasn't been sent there yet.
>>> 
>>> - Culture: The design doc author(s) should be open to feedback. A design
>>> doc should serve as the base for discussion and is by no means the final
>>> design. Of course, this does not mean the author has to accept every piece
>>> of feedback. They should also be comfortable accepting / rejecting ideas on
>>> technical grounds.
>>> 
>>> - Process / Culture: For major ongoing projects, it can be useful to have
>>> some monthly Google Hangouts that are open to the world. I am actually not
>>> sure how well this will work, because of the volunteer nature and the need
>>> to adjust for time zones for people across the globe, but it seems worth
>>> trying.
>>> 
>>> - Culture: Contributors (including committers) should be more direct in
>>> setting expectations, including whether they are working on a specific
>>> issue, whether they will be working on a specific issue, and whether an
>>> issue, PR, or JIRA should be rejected. Most people I know in this community
>>> are nice and don't enjoy telling other people no, but it is often more
>>> annoying to a contributor to not know anything than to get a no.
>>> 
>>> 
>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <matei.zaha...@gmail.com>
>>> wrote:
>>>> 
>>>> 
>>>> Love the idea of a more visible "Spark Improvement Proposal" process that
>>>> solicits user input on new APIs. For what it's worth, I don't think
>>>> committers are trying to minimize their own work -- every committer cares
>>>> about making the software useful for users. However, it is always hard to
>>>> get user input and so it helps to have this kind of process. I've certainly
>>>> looked at the *IPs a lot in other software I use just to see the biggest
>>>> things on the roadmap.
>>>> 
>>>> When you're talking about "changing interfaces", are you talking about
>>>> public or internal APIs? I do think many people hate changing public APIs
>>>> and I actually think that's for the best of the project. That's a technical
>>>> debate, but basically, the worst thing when you're using a piece of
>>>> software is that the developers constantly ask you to rewrite your app to
>>>> update to a new version (and thus benefit from bug fixes, etc.). Cue anyone
>>>> who's used Protobuf, or Guava. The "let's get everyone to change their code
>>>> this release" model works well within a single large company, but doesn't
>>>> work well for a community, which is why nearly all *very* widely used
>>>> programming interfaces (I'm talking things like the Java standard library,
>>>> the Windows API, etc.) almost *never* break backwards compatibility. All
>>>> this is done within reason, though, e.g. we do change things in major
>>>> releases (2.x, 3.x, etc.).
>>> 
>>> 
>>> 
>>> 
>> 

