Re: [Rd] [External] Re: Is ALTREP "non-API"?
On Thu, Apr 25, 2024 at 4:24 AM Ivan Krylov via R-devel wrote:
>
> On Wed, 24 Apr 2024 15:31:39 -0500 (CDT)
> luke-tierney--- via R-devel wrote:
>
> > We would be better off (in my view, not necessarily shared by others
> > in R-core) if we could get to a point where:
> >
> >      all entry points listed in installed header files can be used in
> >      packages, at least with some caveats;
> >
> >      the caveats are expressed in a standard way that is searchable,
> >      e.g. with a standardized comment syntax at the header file or
> >      individual declaration level.
>
> This sounds almost like Doxygen, although the exact syntax used to
> denote the entry points and the necessary comments is far from the most
> important detail at this point.

I'm guessing Doxygen would be overkill here? I think instead these would
just be structured comments that mark a particular function, or set of
functions, as part of the API -- and some automated tool could then just
pull those functions out into a list of API functions. Then, we would
have a single "source of truth" for what is in the API, which could be
seen at a glance by browsing / grepping the installed R headers. I see
this as a structured way of accomplishing what is already being done to
clarify whether functions are part of the API in the R headers.

A similar approach would have macros like R_API, or with a bit more
specificity, maybe something like R_API(ALTREP), which would have no
actual definition -- they would exist in the source purely to mark
functions as part of (some subset of) the API. Or, similarly, anything
declared within a block like R_API {} would be considered part of the
API (to avoid the need to tag every declaration individually).

This would at least make it easy to figure out what functions are part
of the R API, without requiring too much extra maintenance effort from
the R maintainers.
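To make the tagging idea concrete, here is a minimal sketch of the kind
of automated tool described above. It assumes a hypothetical
R_API(group) marker macro that expands to nothing; no such macro exists
in the installed R headers today, and the header excerpt, the group
names and the helper some_internal_helper are made up for illustration.

```python
import re

# Made-up header excerpt. R_API(group) is an ASSUMED marker macro that
# expands to nothing; it is not part of the real R headers.
HEADER = """
#define R_API(group)   /* marker only: expands to nothing */

R_API(ALTREP) SEXP R_altrep_data1(SEXP x);
R_API(memory) SEXP Rf_allocVector(SEXPTYPE type, R_xlen_t length);
SEXP some_internal_helper(SEXP x);
"""

# A declaration counts as tagged when its line starts with R_API(<group>).
# The second capture is the identifier right before the argument list.
TAGGED = re.compile(
    r"^\s*R_API\((\w+)\)\s+[\w\s\*]*?(\w+)\s*\(", re.MULTILINE
)


def tagged_entry_points(header_text):
    """Map each API group to the tagged function names, in source order."""
    groups = {}
    for group, name in TAGGED.findall(header_text):
        groups.setdefault(group, []).append(name)
    return groups


if __name__ == "__main__":
    for group, names in tagged_entry_points(HEADER).items():
        print(group, names)
```

The untagged declaration is simply skipped, so the list of API
functions falls out of the headers themselves. A regular expression is
only good enough for a sketch; a real tool would more likely run on
preprocessed declarations.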
The other alternative I could imagine would be an installed header like
R_ext/API.h, which package authors who want to submit packages to CRAN
would be required to use, with direct usage of other headers eventually
being phased out. But that would be a larger maintenance burden, unless
its generation could be automated (e.g. from the functions tagged
above).

As a side note, it's worth stating that the API endpoints that R Core
wants to make available to CRAN packages and those that are intended
for other usages (e.g. applications embedding R) are different sets.
But I suspect this discussion is most relevant to R package authors who
wish to submit their packages to CRAN.

> > There are some 500 entry points in the R shared library that are in
> > the installed headers but not mentioned in WRE. These would need to
> > be reviewed and adjusted.
>
> Is there a way for outsiders to help? For example, would it help to
> produce the linking graph (package P links to entry points X, Y)? I
> understand that an entry point being unpopular doesn't mean it
> shouldn't be public (and the other way around), but combined with a
> list of entry points that are listed in WRE, such a graph could be
> useful to direct effort or estimate impact from interface changes.

I'm guessing the most welcome kinds of contributions would be
documentation? IMHO, "documenting an API" and "describing how an API can
be used" are somewhat separate endeavors. I believe R-exts does an
excellent job of the latter, but may not be the right vehicle for the
former. To that end, I believe it would be helpful to have some
structured API documentation as a separate R-api document. Each
documented function would have described inputs, outputs, whether
inputs + outputs require protection from the garbage collector, and
other important usage notes.
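As a rough illustration of what one entry in such a structured document
might hold, here is a minimal sketch. The ApiEntry record, its field
names and the rendered layout are all hypothetical; only the
Rf_allocVector signature and the need to protect a freshly allocated
result are taken as given.

```python
from dataclasses import dataclass, field


@dataclass
class ApiEntry:
    """One function in a hypothetical 'R-api' document (assumed schema)."""
    name: str
    inputs: list                  # (argument name, C type) pairs
    returns: str                  # C return type
    result_needs_protect: bool    # must the caller PROTECT the result?
    notes: list = field(default_factory=list)

    def render(self):
        """Format the entry as a short plain-text block."""
        args = ", ".join(f"{ctype} {arg}" for arg, ctype in self.inputs)
        lines = [f"{self.returns} {self.name}({args})"]
        answer = "yes" if self.result_needs_protect else "no"
        lines.append(f"  protect result: {answer}")
        lines.extend(f"  note: {note}" for note in self.notes)
        return "\n".join(lines)


# Example entry; the note text is illustrative wording, not manual text.
ENTRY = ApiEntry(
    name="Rf_allocVector",
    inputs=[("type", "SEXPTYPE"), ("length", "R_xlen_t")],
    returns="SEXP",
    result_needs_protect=True,
    notes=["Illustrative entry only; see R-exts for authoritative details."],
)

if __name__ == "__main__":
    print(ENTRY.render())
```

Keeping the entries as data rather than prose would let community
contributors submit one function at a time and let a script check that
every entry has the same fields filled in.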
This is something that I think could be developed and maintained by
community members, with members of the R community submitting
documentation for each of the available API functions. Such an effort
could be started independently from R Core, but some guidance would be
appreciated on (1) whether such a document would eventually be accepted
as part of the official R manuals, and (2) if so, what would be required
of such a document.

> --
> Best regards,
> Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Is ALTREP "non-API"?
On Wed, 24 Apr 2024 15:31:39 -0500 (CDT)
luke-tierney--- via R-devel wrote:

> We would be better off (in my view, not necessarily shared by others
> in R-core) if we could get to a point where:
>
>      all entry points listed in installed header files can be used in
>      packages, at least with some caveats;
>
>      the caveats are expressed in a standard way that is searchable,
>      e.g. with a standardized comment syntax at the header file or
>      individual declaration level.

This sounds almost like Doxygen, although the exact syntax used to
denote the entry points and the necessary comments is far from the most
important detail at this point.

> There are some 500 entry points in the R shared library that are in
> the installed headers but not mentioned in WRE. These would need to
> be reviewed and adjusted.

Is there a way for outsiders to help? For example, would it help to
produce the linking graph (package P links to entry points X, Y)? I
understand that an entry point being unpopular doesn't mean it
shouldn't be public (and the other way around), but combined with a
list of entry points that are listed in WRE, such a graph could be
useful to direct effort or estimate impact from interface changes.

--
Best regards,
Ivan
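The linking-graph idea can be sketched in a few lines. Everything in the
data below is invented for illustration: the package names are
placeholders, and the IN_WRE set does not reflect what the manual
actually documents. How the links would be collected (for example from
the symbol tables of compiled packages) is left open.

```python
from collections import defaultdict

# HYPOTHETICAL data: which entry points each package links to.
LINKS = {
    "pkgA": {"Rf_allocVector", "Rf_length", "SET_TYPEOF"},
    "pkgB": {"Rf_allocVector", "Rf_xlength"},
    "pkgC": {"Rf_allocVector", "SET_TYPEOF"},
}

# HYPOTHETICAL set of entry points found in WRE (not the real list).
IN_WRE = {"Rf_allocVector", "Rf_length"}


def users_by_entry_point(links):
    """Invert package -> entry points into entry point -> packages."""
    users = defaultdict(set)
    for package, symbols in links.items():
        for symbol in symbols:
            users[symbol].add(package)
    return users


def review_queue(links, documented):
    """Entry points missing from `documented`, most widely used first."""
    users = users_by_entry_point(links)
    pending = [
        (symbol, sorted(packages))
        for symbol, packages in users.items()
        if symbol not in documented
    ]
    # Sort by descending number of users, then by name for stable output.
    return sorted(pending, key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    for symbol, packages in review_queue(LINKS, IN_WRE):
        print(symbol, packages)
```

Ordering by the number of dependent packages is only one possible
heuristic. As noted above, an entry point being unpopular does not by
itself mean it should not be public.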
Re: [Rd] [External] Re: Is ALTREP "non-API"?
On Wed, 24 Apr 2024, Hadley Wickham wrote:

> A few more thoughts based on a simple question: how do you determine the
> length of a vector?
>
> Rf_length() is used in example code in R-exts, but I don't think it's
> formally documented anywhere (although it's possible I missed it). Is
> use in an example sufficient to consider a function to be part of the
> public API? If so, SET_TYPEOF() is used in a number of examples, and
> hence used by CRAN packages, but is no longer considered part of the
> public API.
>
> Rf_xlength() doesn't appear to be mentioned anywhere in R-exts. Does
> this imply that long vectors are not part of the exported API? Or is
> there some other way we should be determining the length of such
> vectors?
>
> Are the macro variants LENGTH and XLENGTH part of the exported API? Are
> we supposed to use them or avoid them?
>
> Relatedly, I presume that LOGICAL() is the way we're supposed to extract
> logical values from a vector, but it isn't documented in R-exts,
> suggesting that it's not part of the public API?

My pragmatic approach to deciding if an entry point is usable in a
package is to

     grep for it in the installed headers

     grep for it in WRE

     if those are good, check the text in both places to make sure it
     doesn't tell me not to use it

The first two can be automated; the text reading can't for now.

One place this runs into trouble is when the prose in WRE doesn't
explicitly mention the entry point, but says something like 'this one
and similar ones are OK'. A couple of years ago I worked on improving
some of those by explicitly adding some of those implicit ones, which
did sometimes make the text more cumbersome. I'm pretty sure I added
LOGICAL() and RAW() at that point (but may be misremembering); they are
there now.

In some other cases I left the text alone but added index entries. That
makes them findable with a text search. I think I got most that can be
handled that way, but there may be some others left.

Far from ideal, but at least a step forward.
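The two automatable steps of the approach above can be sketched as
follows. This is a minimal sketch under stated assumptions: the HEADERS
and MANUAL strings are made-up stand-ins for the installed headers and
for WRE, and the function names mentions and triage are hypothetical.
The third step, reading the surrounding text, stays manual.

```python
import re


def mentions(symbol, text):
    """True if `symbol` occurs in `text` as a whole C identifier."""
    pattern = r"(?<![A-Za-z0-9_])" + re.escape(symbol) + r"(?![A-Za-z0-9_])"
    return re.search(pattern, text) is not None


def triage(symbol, header_text, manual_text):
    """Classify an entry point by where it is mentioned."""
    in_headers = mentions(symbol, header_text)
    in_manual = mentions(symbol, manual_text)
    if in_headers and in_manual:
        return "read the text in both places"
    if in_headers:
        return "in headers only"
    return "not in installed headers"


# Made-up snippets standing in for the installed headers and for WRE.
HEADERS = (
    "SEXP Rf_allocVector(SEXPTYPE, R_xlen_t);\n"
    "R_xlen_t Rf_xlength(SEXP);\n"
)
MANUAL = "Vectors can be allocated with Rf_allocVector.\n"

if __name__ == "__main__":
    for name in ("Rf_allocVector", "Rf_xlength", "SET_TYPEOF"):
        print(name, "->", triage(name, HEADERS, MANUAL))
```

Matching whole identifiers matters here: a plain substring search for
LENGTH would also hit XLENGTH. A real check would additionally have to
cope with the manual using names without the Rf_ prefix, and with prose
of the 'this one and similar ones are OK' kind, which no search can
resolve.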
> ---
>
> It's also worth pointing out where R-exts does well, with the
> documentation of utility functions (
> https://cran.r-project.org/doc/manuals/R-exts.html#Utility-functions).
> I think this is what most people would consider documentation to imply,
> i.e. a list of input arguments/types, the output type, and basic notes
> on their operation.
>
> ---
>
> Finally, it's worth noting that there are some lingering ill feelings
> over how the connections API was treated. It was documented in R-exts
> only to be later removed, including expunging mentions of it in the
> news. That's obviously water under the bridge, but I do believe that
> there is the potential for the R core team to build goodwill with the
> community if they are willing to engage a bit more with the users of
> their APIs.

As you well know R-core is not a monolith. There are several R-core
members who also are not happy about how that played out and where that
stands now. But there was and is no viable option other than to agree
to disagree. There is really no upside to re-litigating this now.

Best,

luke

> Hadley
>
>       [[alternative HTML version deleted]]

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu/
Re: [Rd] [External] Re: Is ALTREP "non-API"?
On Wed, Apr 24, 2024 at 1:32 PM luke-tierney--- via R-devel wrote:
>
> On Wed, 24 Apr 2024, Hadley Wickham wrote:
>
> >>>>>> That is not true at all - the presence of header does not
> >>>>>> constitute declaration of something as the R API. There are
> >>>>>> cases where internal functions are in the headers for
> >>>>>> historical or other reasons since the headers are used both
> >>>>>> for the internal implementation and packages. That's why this
> >>>>>> is in R-exts under "The R API: entry points for C code":
> >>>>>
> >>>>> If I understand your point correctly, does this mean that
> >>>>> Rf_allocVector() is not part of the "official" R API? It does
> >>>>> not appear to be documented in the "The R API: entry points for
> >>>>> C code" section.
> >>>>
> >>>> It does, obviously:
> >>>> https://cran.r-project.org/doc/manuals/R-exts.html#Allocating-storage-1
> >>>
> >>> I'm just trying to understand the precise definition of the official API
> >>> here. So it's any function mentioned in R-exts, regardless of which
> >>> section it appears in?
> >>>
> >>> Does this sentence imply that all functions starting with alloc* are part
> >>> of the official API?
> >>
> >> Again, I can only quote the R-exts (few lines below the previous "The R
> >> API" quote):
> >>
> >> We can classify the entry points as
> >> API
> >> Entry points which are documented in this manual and declared in an
> >> installed header file. These can be used in distributed packages and will
> >> only be changed after deprecation.
> >>
> >> It says "in this manual" - I don't see anywhere restriction on a
> >> particular section of the manual, so I really don't see why you would think
> >> that allocation is not part of the API.
> >
> > Because you mentioned that section explicitly earlier in the thread. This
> > obviously seems clear to you, but it's not at all clear to me and I suspect
> > many of the wider community.
> > It's frustrating because we are trying
> > our best to do what y'all want us to do, but it feels like we keep getting
> > the rug pulled out from under us with very little notice, and then have to
> > spend a large amount of time figuring out workarounds.
>
> Please try to keep this discussion non-adversarial.
>
> > That is at least
> > feasible for my team since we have multiple talented folks who are paid
> > full-time to work on R, but it's a huge struggle for most people who are
> > generally maintaining packages in their spare time.
>
> As you well know, almost all R-core members are also trying to
> maintain and improve R in their spare time. Good for folks to keep in
> mind before demanding R-core do X, Y, or Z for you.
>
> > For the purposes of this discussion could you please clarify what
> > "documented in the manual" means? For example, this line mentions
> > allocXxx functions: "There
> > are quite a few allocXxx functions defined in Rinternals.h—you may want to
> > explore them.". Does that imply that they are documented and free to use?
>
> Where we are now in terms of what package authors can use to write R
> extensions has evolved organically over many years. The current state
> is certainly not ideal:
>
>      There are entry points in installed headers that might be
>      available;
>
>      but to find out if they are in fact available requires reading
>      prose text in the header files and in WRE.
>
> Trying to fine-tune wording in WRE, or add a lot of additional entries
> is not really a good or realistic way forward: WRE is both
> documentation and tutorial and more legalistic language/more complete
> coverage would make it less readable and still not guarantee
> completeness or clarity.
>
> We would be better off (in my view, not necessarily shared by others
> in R-core) if we could get to a point where:
>
>      all entry points listed in installed header files can be used in
>      packages, at least with some caveats;
>
>      the caveats are expressed in a standard way that is searchable,
>      e.g. with a standardized comment syntax at the header file or
>      individual declaration level.
>
> In principle this is achievable, but getting there from where we are
> now is a lot of work. There are some 500 entry points in the R shared
> library that are in the installed headers but not mentioned in WRE.
> These would need to be reviewed and adjusted. My guess is about a
> third are fine and intended to be API-stable, another third are not
> used in packages and don't need to be in public headers. The remainder
> are things that may be used in current packages but really should not
> be, for example because they expose internal data in ways that can
> cause segfaults or they make it difficult to implement performance
> improvements in the base engine. Sorting through these and working
> with package authors to find alternate, safer options takes a lot of
> time (see 'spare time' above) and energy (some package authors are
> easier to work with than others). Several of us
Re: [Rd] [External] Re: Is ALTREP "non-API"?
On Wed, 24 Apr 2024, Hadley Wickham wrote:

>>>>>> That is not true at all - the presence of header does not
>>>>>> constitute declaration of something as the R API. There are
>>>>>> cases where internal functions are in the headers for
>>>>>> historical or other reasons since the headers are used both
>>>>>> for the internal implementation and packages. That's why this
>>>>>> is in R-exts under "The R API: entry points for C code":
>>>>>
>>>>> If I understand your point correctly, does this mean that
>>>>> Rf_allocVector() is not part of the "official" R API? It does
>>>>> not appear to be documented in the "The R API: entry points for
>>>>> C code" section.
>>>>
>>>> It does, obviously:
>>>> https://cran.r-project.org/doc/manuals/R-exts.html#Allocating-storage-1
>>>
>>> I'm just trying to understand the precise definition of the official API
>>> here. So it's any function mentioned in R-exts, regardless of which
>>> section it appears in?
>>>
>>> Does this sentence imply that all functions starting with alloc* are part
>>> of the official API?
>>
>> Again, I can only quote the R-exts (few lines below the previous "The R
>> API" quote):
>>
>> We can classify the entry points as
>> API
>> Entry points which are documented in this manual and declared in an
>> installed header file. These can be used in distributed packages and will
>> only be changed after deprecation.
>>
>> It says "in this manual" - I don't see anywhere restriction on a
>> particular section of the manual, so I really don't see why you would think
>> that allocation is not part of the API.
>
> Because you mentioned that section explicitly earlier in the thread. This
> obviously seems clear to you, but it's not at all clear to me and I suspect
> many of the wider community. It's frustrating because we are trying
> our best to do what y'all want us to do, but it feels like we keep getting
> the rug pulled out from under us with very little notice, and then have to
> spend a large amount of time figuring out workarounds.

Please try to keep this discussion non-adversarial.
> That is at least
> feasible for my team since we have multiple talented folks who are paid
> full-time to work on R, but it's a huge struggle for most people who are
> generally maintaining packages in their spare time.

As you well know, almost all R-core members are also trying to
maintain and improve R in their spare time. Good for folks to keep in
mind before demanding R-core do X, Y, or Z for you.

> For the purposes of this discussion could you please clarify what
> "documented in the manual" means? For example, this line mentions
> allocXxx functions: "There
> are quite a few allocXxx functions defined in Rinternals.h—you may want to
> explore them.". Does that imply that they are documented and free to use?

Where we are now in terms of what package authors can use to write R
extensions has evolved organically over many years. The current state
is certainly not ideal:

     There are entry points in installed headers that might be
     available;

     but to find out if they are in fact available requires reading
     prose text in the header files and in WRE.

Trying to fine-tune wording in WRE, or add a lot of additional entries
is not really a good or realistic way forward: WRE is both
documentation and tutorial and more legalistic language/more complete
coverage would make it less readable and still not guarantee
completeness or clarity.

We would be better off (in my view, not necessarily shared by others
in R-core) if we could get to a point where:

     all entry points listed in installed header files can be used in
     packages, at least with some caveats;

     the caveats are expressed in a standard way that is searchable,
     e.g. with a standardized comment syntax at the header file or
     individual declaration level.

In principle this is achievable, but getting there from where we are
now is a lot of work. There are some 500 entry points in the R shared
library that are in the installed headers but not mentioned in WRE.
These would need to be reviewed and adjusted.
My guess is about a
third are fine and intended to be API-stable, another third are not
used in packages and don't need to be in public headers. The remainder
are things that may be used in current packages but really should not
be, for example because they expose internal data in ways that can
cause segfaults or they make it difficult to implement performance
improvements in the base engine. Sorting through these and working
with package authors to find alternate, safer options takes a lot of
time (see 'spare time' above) and energy (some package authors are
easier to work with than others). Several of us have taken cracks at
moving this forward from time to time, but it rarely gets to the top
of anyone's priority list.

> And in general, I'd urge R Core to make an explicit list of functions that
> you consider to be part of the exported API, and then grandfather in
> packages that used those functions prior to learning that we weren't
> supposed to.

Making a list and hoping that it will remain up to date is not
realistic. The only way that would