[discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
We currently have three levels of interface annotation:

- unannotated: stable public API
- DeveloperApi: A lower-level, unstable API intended for developers.
- Experimental: An experimental user-facing API.

After using this annotation for ~2 years, I would like to propose the following changes:

1. Require explicit annotation for public APIs. This reduces the chance of us accidentally exposing private APIs.

2. Separate interface annotation into two components: one that describes the intended audience, and another that describes stability, similar to what Hadoop does. This allows us to define "low level" APIs that are stable, e.g. the data source API (I'd argue this is the API that should be more stable than end-user-facing APIs).

InterfaceAudience: Public, Developer

InterfaceStability: Stable, Experimental

What do you think?
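To make the two-component idea concrete, here is a minimal Java sketch of what separate audience and stability annotations might look like. Only the names InterfaceAudience, InterfaceStability, Public, Developer, Stable and Experimental come from the proposal; the class AnnotationSketch, the two annotated example types and the describe helper are hypothetical, and nothing here should be read as the actual Spark or Hadoop definitions.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AnnotationSketch {

    /** Who an API is intended for (value names taken from the proposal). */
    public static final class InterfaceAudience {
        private InterfaceAudience() {}

        @Documented @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
        public @interface Public {}

        @Documented @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
        public @interface Developer {}
    }

    /** How much the API is expected to change (value names taken from the proposal). */
    public static final class InterfaceStability {
        private InterfaceStability() {}

        @Documented @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
        public @interface Stable {}

        @Documented @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
        public @interface Experimental {}
    }

    // Hypothetical low-level API that is nonetheless stable: the combination
    // the single-annotation scheme could not express.
    @InterfaceAudience.Developer
    @InterfaceStability.Stable
    public interface HypotheticalDataSource {}

    // Hypothetical end-user API that may still change.
    @InterfaceAudience.Public
    @InterfaceStability.Experimental
    public static class HypotheticalUserFacingFeature {}

    /** Reads both annotations back via reflection. */
    public static String describe(Class<?> api) {
        String audience =
            api.isAnnotationPresent(InterfaceAudience.Public.class) ? "Public"
            : api.isAnnotationPresent(InterfaceAudience.Developer.class) ? "Developer"
            : "unannotated";
        String stability =
            api.isAnnotationPresent(InterfaceStability.Stable.class) ? "Stable"
            : api.isAnnotationPresent(InterfaceStability.Experimental.class) ? "Experimental"
            : "unannotated";
        return "audience=" + audience + ", stability=" + stability;
    }

    public static void main(String[] args) {
        System.out.println(describe(HypotheticalDataSource.class));
        System.out.println(describe(HypotheticalUserFacingFeature.class));
    }
}
```

The point of the sketch is only that audience and stability vary independently: a Developer-audience type can be Stable, and a Public one can be Experimental.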
Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
I think this is fairly important to do, so I went ahead and created a PR for the first mini step: https://github.com/apache/spark/pull/15374

On Wed, Aug 24, 2016 at 9:48 AM, Reynold Xin wrote:

> Looks like in general people like it. Next step is for somebody to take
> the lead and implement it.
>
> Tom do you have cycles to do this?
>
> On Wednesday, August 24, 2016, Tom Graves wrote:
>
>> ping, did this discussion conclude or did we decide what we are doing?
>>
>> Tom
Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
On Thu, May 12, 2016 at 2:29 PM, Reynold Xin wrote:
> We currently have three levels of interface annotation:
>
> - unannotated: stable public API
> - DeveloperApi: A lower-level, unstable API intended for developers.
> - Experimental: An experimental user-facing API.
>
> After using this annotation for ~ 2 years, I would like to propose the
> following changes:
>
> 1. Require explicit annotation for public APIs. This reduces the chance of
> us accidentally exposing private APIs.

+1

> 2. Separate interface annotation into two components: one that describes
> intended audience, and the other that describes stability, similar to what
> Hadoop does. This allows us to define "low level" APIs that are stable, e.g.
> the data source API (I'd argue this is the API that should be more stable
> than end-user-facing APIs).
>
> InterfaceAudience: Public, Developer
>
> InterfaceStability: Stable, Experimental

I'm not very sure about this. What advantage do we get from Public vs. Developer? Also, somebody needs to make a judgement call on that, which might not always be easy to do.

> What do you think?

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
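Point 1 of the proposal (requiring explicit annotation for public APIs) is the kind of rule that could be checked mechanically. The following is a rough Java sketch of such a check using plain reflection. Everything in it is hypothetical: the annotation names Stable and Experimental stand in for whatever the project settles on, the three example classes are invented, and a real check would need to scan the whole code base (and handle methods, Scala visibility and so on) rather than take a hand-written list of classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class ExplicitAnnotationCheck {

    // Stand-in stability annotations (assumed names, not a real library's API).
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    public @interface Stable {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    public @interface Experimental {}

    // Hypothetical classes used only to exercise the check.
    @Stable
    public static class AnnotatedApi {}

    public static class ForgottenApi {}   // public but never annotated

    static class InternalHelper {}        // not public, so no annotation is required

    /** Returns the simple names of public classes that carry no stability annotation. */
    public static List<String> unannotatedPublicClasses(Class<?>... candidates) {
        List<String> missing = new ArrayList<>();
        for (Class<?> c : candidates) {
            boolean isPublic = Modifier.isPublic(c.getModifiers());
            boolean annotated = c.isAnnotationPresent(Stable.class)
                || c.isAnnotationPresent(Experimental.class);
            if (isPublic && !annotated) {
                missing.add(c.getSimpleName());
            }
        }
        return missing;
    }

    /** Runs the check over the three example classes above. */
    public static List<String> demo() {
        return unannotatedPublicClasses(
            AnnotatedApi.class, ForgottenApi.class, InternalHelper.class);
    }

    public static void main(String[] args) {
        System.out.println("Public classes missing an annotation: " + demo());
    }
}
```

A check like this only catches the "accidentally exposed" case; it says nothing about whether the chosen annotation is the right one, which is the judgement call raised in the reply above.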
Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
That's true. I think I want to differentiate end-user vs developer. Public isn't the best word. Maybe EndUser?

On Thu, May 12, 2016 at 3:34 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:

> I'm not very sure about this. What advantage do we get from Public vs.
> Developer ? Also somebody needs to take a judgement call on that which
> might not always be easy to do
Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
We could switch to the Audience Annotation from Apache Yetus[1], and then rely on Public for end-users and LimitedPrivate for those things we intend as lower-level things with particular non-end-user audiences.

[1]: http://yetus.apache.org/documentation/in-progress/#yetus-audience-annotations

On Thu, May 12, 2016 at 3:35 PM, Reynold Xin wrote:
> That's true. I think I want to differentiate end-user vs developer. Public
> isn't the best word. Maybe EndUser?

--
busbey
Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
Hm, LimitedPrivate is not the intention. Those APIs (e.g. data source) are by no means private. They are just lower level APIs whose intended audience is library developers, not end users.

On Thu, May 12, 2016 at 8:32 PM, Sean Busbey wrote:

> We could switch to the Audience Annotation from Apache Yetus[1], and
> then rely on Public for end-users and LimitedPrivate for those things
> we intend as lower-level things with particular non-end-user
> audiences.
>
> [1]:
> http://yetus.apache.org/documentation/in-progress/#yetus-audience-annotations
Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
> On 12 May 2016, at 22:29, Reynold Xin wrote:
>
> We currently have three levels of interface annotation:
>
> - unannotated: stable public API
> - DeveloperApi: A lower-level, unstable API intended for developers.
> - Experimental: An experimental user-facing API.
>
> After using this annotation for ~ 2 years, I would like to propose the
> following changes:
>
> 1. Require explicit annotation for public APIs. This reduces the chance of
> us accidentally exposing private APIs.

+1

> 2. Separate interface annotation into two components: one that describes
> intended audience, and the other that describes stability, similar to what
> Hadoop does. This allows us to define "low level" APIs that are stable, e.g.
> the data source API (I'd argue this is the API that should be more stable
> than end-user-facing APIs).
>
> InterfaceAudience: Public, Developer
>
> InterfaceStability: Stable, Experimental
>
> What do you think?

You should know there's a bit of a "discussion" in Hadoop right now about what "LimitedPrivate" means. That is: things marked "LimitedPrivate(MapReduce)" are pretty much universally used in YARN apps, and other things tagged as private (UGI) are so universal that it's meaningless.

That is: even if you tag up something as Developer, it may end up being used so widely that it becomes public. The hard part then becomes recognising which classes and methods have such a use, which ends up needing an IDE with everything loaded in.

Java 9 is going to open up a lot more in terms of modularization, though I don't know what that will mean for Scala. For Java projects, it may allow isolation to be more explicit.
Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
So we definitely need to be careful here. I know you didn't mention it, but it was mentioned by others, so I would not recommend using LimitedPrivate. I had started a discussion on Hadoop about some of this due to the way Spark needed to use some of the APIs: https://issues.apache.org/jira/browse/HADOOP-10506

Overall it seems like a good idea, but we definitely need definitions for these, and we need to make sure they are clear to the end user looking at the code or docs.

I assume Developer really means to be used only within Spark? Developer is a pretty broad term which could mean end user developer or Spark internal developer, etc. Hadoop uses Private for this. I think from an end user point of view PRIVATE makes it more obvious that they shouldn't be using it. So perhaps something other than Developer (INTERNAL, PROJECT_PRIVATE, etc.).

Tom

On Thursday, May 12, 2016 4:29 PM, Reynold Xin wrote:

We currently have three levels of interface annotation:

- unannotated: stable public API
- DeveloperApi: A lower-level, unstable API intended for developers.
- Experimental: An experimental user-facing API.

After using this annotation for ~ 2 years, I would like to propose the following changes:

1. Require explicit annotation for public APIs. This reduces the chance of us accidentally exposing private APIs.

2. Separate interface annotation into two components: one that describes intended audience, and the other that describes stability, similar to what Hadoop does. This allows us to define "low level" APIs that are stable, e.g. the data source API (I'd argue this is the API that should be more stable than end-user-facing APIs).

InterfaceAudience: Public, Developer

InterfaceStability: Stable, Experimental

What do you think?
Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
On Fri, May 13, 2016 at 6:37 AM, Tom Graves wrote:
> So we definitely need to be careful here. I know you didn't mention it but
> it was mentioned by others so I would not recommend using LimitedPrivate. I had
> started a discussion on Hadoop about some of this due to the way Spark
> needed to use some of the APIs.
> https://issues.apache.org/jira/browse/HADOOP-10506

I think LimitedPrivate gets a bad rap due to the way it is misused in Hadoop. The use case here -- "we offer this to developers of intermediate layers; those willing to update their software as we update ours" -- is a perfectly acceptable distinction from the "this is just for us" and "this is something folks can rely on enough to contract out their software development".

Essentially, LimitedPrivate(LIBRARY) or LimitedPrivate(PORCELAIN) (to borrow from git's distinction on interfaces for tool makers vs end users).

--
busbey
Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
On Fri, May 13, 2016 at 10:18 AM, Sean Busbey wrote:
> I think LimitedPrivate gets a bad rap due to the way it is misused in
> Hadoop. The use case here -- "we offer this to developers of
> intermediate layers; those willing to update their software as we
> update ours"

I think "LimitedPrivate" is a rather confusing name for that. I think Reynold's first e-mail better matches that use case: this would be "InterfaceAudience(Developer)" and "InterfaceStability(Experimental)".

But I don't really like "Developer" as a name here, because it's ambiguous. Developer of what? Theoretically everybody writing Spark or on top of its APIs is a developer. In that sense, I prefer using something like "Library" and "Application" instead of "Developer" and "Public".

Personally, in fact, I don't see a lot of gain in differentiating between the target users of an interface... knowing whether it's a stable interface or not is a lot more useful. If you're equating a "developer API" with "it's not really stable", then you don't really need two annotations for that - just say it's not stable.

--
Marcelo
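One way to read the two suggestions above together (use "Library" and "Application" as the audience names, and treat stability as the part that really matters) is a single annotation where stability is mandatory and audience is an optional hint. The Java sketch below is only an illustration of that reading. Nobody in the thread proposed this exact shape, and the names Api, Stability, Audience and the example types are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AudienceNamingSketch {

    public enum Stability { STABLE, EXPERIMENTAL }

    // "Library" and "Application" in place of "Developer" and "Public".
    public enum Audience { APPLICATION, LIBRARY }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Api {
        // Stability has no default, so every annotated type must state it.
        Stability stability();

        // Audience is an optional hint; assumed here to default to end-user code.
        Audience audience() default Audience.APPLICATION;
    }

    // Hypothetical extension point aimed at library authors, promised stable.
    @Api(stability = Stability.STABLE, audience = Audience.LIBRARY)
    public interface HypotheticalDataSource {}

    // Hypothetical end-user feature that may still change.
    @Api(stability = Stability.EXPERIMENTAL)
    public static class HypotheticalUserFeature {}

    /** Formats the annotation as AUDIENCE/STABILITY, or "unannotated". */
    public static String describe(Class<?> api) {
        Api tag = api.getAnnotation(Api.class);
        if (tag == null) {
            return "unannotated";
        }
        return tag.audience() + "/" + tag.stability();
    }

    public static void main(String[] args) {
        System.out.println(describe(HypotheticalDataSource.class));
        System.out.println(describe(HypotheticalUserFeature.class));
    }
}
```

The trade-off versus two separate annotations is mostly ergonomic: a single annotation makes it impossible to state audience without stability, at the cost of a slightly noisier call site.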
Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
+1 to the general structure of Reynold's proposal. I've found what we do currently a little confusing. In particular, it doesn't make much sense that @DeveloperApi things are always labeled as possibly changing. For example, the Data Source API should arguably be one of the most stable interfaces, since it's very difficult for users to recompile libraries that might break when there are changes.

For a similar reason, I don't really see the point of LimitedPrivate. The goal here should be communication of promises of stability or future stability.

Regarding Developer vs. Public: I don't care too much about the naming, but it does seem useful to differentiate APIs that we expect end users to consume from those that are used to augment Spark. "Library" and "Application" also seem reasonable.

On Fri, May 13, 2016 at 11:15 AM, Marcelo Vanzin wrote:

> I think "LimitedPrivate" is a rather confusing name for that. I think
> Reynold's first e-mail better matches that use case: this would be
> "InterfaceAudience(Developer)" and "InterfaceStability(Experimental)".
>
> But I don't really like "Developer" as a name here, because it's
> ambiguous. Developer of what? Theoretically everybody writing Spark or
> on top of its APIs is a developer. In that sense, I prefer using
> something like "Library" and "Application" instead of "Developer" and
> "Public".
>
> Personally, in fact, I don't see a lot of gain in differentiating
> between the target users of an interface... knowing whether it's a
> stable interface or not is a lot more useful. If you're equating a
> "developer API" with "it's not really stable", then you don't really
> need two annotations for that - just say it's not stable.
>
> --
> Marcelo
Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
ping, did this discussion conclude or did we decide what we are doing?

Tom

On Friday, May 13, 2016 3:19 PM, Michael Armbrust wrote:

> +1 to the general structure of Reynold's proposal. I've found what we do
> currently a little confusing. In particular, it doesn't make much sense
> that @DeveloperApi things are always labeled as possibly changing. For
> example the Data Source API should arguably be one of the most stable
> interfaces since it's very difficult for users to recompile libraries that
> might break when there are changes.
>
> For a similar reason, I don't really see the point of LimitedPrivate.
> The goal here should be communication of promises of stability or future
> stability.
>
> Regarding Developer vs. Public. I don't care too much about the naming,
> but it does seem useful to differentiate APIs that we expect end users to
> consume from those that are used to augment Spark. "Library" and
> "Application" also seem reasonable.
Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability
Looks like in general people like it. Next step is for somebody to take the lead and implement it.

Tom, do you have cycles to do this?

On Wednesday, August 24, 2016, Tom Graves wrote:

> ping, did this discussion conclude or did we decide what we are doing?
>
> Tom