On Sat, 8 Jun 2024, Reed A. Cartwright wrote:

Would it be reasonable to move the non-API stuff that cannot be hidden
into header files inside a "details" directory (or some other specific
naming scheme)?

That's what I use when I need to separate a public API from an internal API.

As do I, as does everyone else. As I wrote originally: " ... for a
variety of reasons that isn't achievable, at least not in the near
term." Can we leave it at that please?

luke


On Fri, Jun 7, 2024 at 7:30 AM luke-tierney--- via R-devel
<r-devel@r-project.org> wrote:

On Fri, 7 Jun 2024, Steven Dirkse wrote:

Thanks for sharing this overview of an interesting and much-needed project.
You mention that R exports about 1500 symbols (on platforms supporting
visibility) but this subject isn't mentioned explicitly again in your note,
so I'm wondering how things tie together.  Un-exported symbols cannot be
part of the API - how would people use them in this case?  In a perfect
world the set of exported symbols could define the API or match it exactly,
but I guess that isn't the case at present.  So I conclude that R exports
extra (i.e. non-API) symbols.  Is part of the goal to remove these extra
exports?

No. We'll hide what we can, but base packages, for one, need access to
some entry points that should not be in the API, so those have to stay
un-hidden.

Best,

luke


-Steve

On Thu, Jun 6, 2024 at 10:47 AM luke-tierney--- via R-devel
<r-devel@r-project.org> wrote:
      This is an update on some current work on the C API for use in R
      extensions.

      The internal R implementation makes use of tens of thousands of C
      entry points. On Linux and Windows, which support visibility
      restrictions, most of these are visible only within the R executable
      or shared library. About 1500 are not hidden and are visible to
      dynamically loaded shared libraries, such as ones in packages, and to
      embedding applications.
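
      As a minimal sketch of what such visibility restrictions can look
      like with GCC or Clang: the macro and function names below are
      hypothetical and are not taken from the R sources, and other
      compilers simply fall back to default visibility here.

```c
/* Hypothetical macros for illustration only; not the ones R uses. */
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_HIDDEN __attribute__((visibility("hidden")))
#  define DEMO_EXPORT __attribute__((visibility("default")))
#else
#  define DEMO_HIDDEN
#  define DEMO_EXPORT
#endif

/* Internal helper: callable from anywhere inside the library, but not
   meant to be resolvable by name from a dynamically loaded client. */
DEMO_HIDDEN int demo_internal_scale(int x)
{
    return 2 * x;
}

/* Exported entry point: stays visible to clients of the library. */
DEMO_EXPORT int demo_public_double(int x)
{
    return demo_internal_scale(x);
}
```

      Built as a shared library on a typical Linux toolchain (for example
      with cc -shared -fPIC), only demo_public_double would be expected to
      appear in the dynamic symbol table listed by nm -D.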

      There are two main reasons for limiting access to entry points in a
      software framework:

      - Some entry points are very easy to use in ways that corrupt internal
         data, leading to segfaults or, worse, incorrect computations without
         segfaults.

      - Some entry points expose internal structure and other implementation
         details, which makes it hard to make improvements without breaking
         client code that has come to depend on these details.
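
      Both concerns can be illustrated with a toy example. This is only a
      sketch: demo_vec and its functions are hypothetical and do not
      reflect R's internal data structures. Clients that go through the
      accessors cannot break the invariant length <= capacity, and the
      struct layout can change without affecting them, whereas a client
      writing to v->length directly could do both.

```c
#include <stdlib.h>

/* Hypothetical vector type. In a real library the struct body would sit
   in a private source file and clients would see only the typedef. */
typedef struct demo_vec demo_vec;

struct demo_vec {
    size_t length;    /* number of elements in use */
    size_t capacity;  /* number of elements allocated */
    double *data;
};

/* Allocate a vector with room for `capacity` elements, length 0. */
demo_vec *demo_vec_new(size_t capacity)
{
    demo_vec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    /* Allocate at least one element so a zero capacity is still valid. */
    v->data = calloc(capacity ? capacity : 1, sizeof *v->data);
    if (v->data == NULL) {
        free(v);
        return NULL;
    }
    v->length = 0;
    v->capacity = capacity;
    return v;
}

/* Read-only accessor: hides where the length is stored. */
size_t demo_vec_length(const demo_vec *v)
{
    return v->length;
}

/* Checked setter: refuses a length that exceeds the allocation.
   Returns 0 on success and -1 if the request is rejected. */
int demo_vec_set_length(demo_vec *v, size_t n)
{
    if (n > v->capacity) return -1;
    v->length = n;
    return 0;
}

/* Release the vector and its storage; NULL is accepted. */
void demo_vec_free(demo_vec *v)
{
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}
```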

      The API of C entry points that can be used in R extensions, both for
      packages and embedding, has evolved organically over many years. The
      definition for the current release expressed in the Writing R
      Extensions manual (WRE) is roughly:

           An entry point can be used if (1) it is declared in a header file
           in R.home("include"), and (2) if it is documented for use in WRE.

      Ideally, (1) would be necessary and sufficient, but for a variety of
      reasons that isn't achievable, at least not in the near term. (2) can
      be challenging to determine; in particular, it is not amenable to a
      computational answer.

      An experimental effort is underway to add annotations to the WRE
      Texinfo source to allow (2) to be answered unambiguously. The
      annotations so far mostly reflect my reading of WRE and may be revised
      as they are reviewed by others. The annotated document can be used for
      programmatically identifying what is currently considered part of the
      C API. The result so far is an experimental function
      tools:::funAPI():

          > head(tools:::funAPI())
                           name                    loc apitype
          1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
          2        alloc3DArray                    WRE     api
          3          allocArray                    WRE     api
          4           allocLang                    WRE     api
          5           allocList                    WRE     api
          6         allocMatrix                    WRE     api

      The 'apitype' field has three possible levels:

           | api  | stable (ideally) API |
           | eapi | experimental API     |
           | emb  | embedding API        |

      Entry points in the embedding API would typically only be used in
      applications embedding R or providing new front ends, but might be
      reasonable to use in packages that support embedding.

      The 'loc' field indicates how the entry point is identified as part of
      an API: explicit mention in WRE, or declaration in a header file
      identified as fully part of an API.

      [tools:::funAPI() may not be completely accurate as it relies on
      regular expressions for examining header files considered part of the
      API rather than proper parsing. But it seems to be pretty close to
      what can be achieved with proper parsing.  Proper parsing would add
      dependencies on additional tools, which I would like to avoid for
      now. One dependency already present is that a C compiler has to be on
      the search path and cc -E has to run the C pre-processor.]
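
      To give a feel for the regular-expression approach and its limits,
      here is a rough sketch using POSIX regex.h. The function
      demo_decl_name and its pattern are hypothetical; this is not the
      code or the pattern used by tools:::funAPI(). It pulls out the
      first identifier that is directly followed by an opening
      parenthesis in a one-line declaration.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy the first identifier that is immediately followed by '(' from a
   one-line declaration into `out` (at most outlen - 1 characters).
   Returns 1 if a name was found and fits, 0 otherwise. */
int demo_decl_name(const char *decl, char *out, size_t outlen)
{
    /* Assumed pattern: an identifier, optional whitespace, then '('. */
    static const char *pattern =
        "([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*\\(";
    regex_t re;
    regmatch_t m[2];
    int found = 0;

    if (outlen == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, decl, 2, m, 0) == 0) {
        /* m[1] holds the offsets of the captured identifier. */
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outlen) {
            memcpy(out, decl + m[1].rm_so, len);
            out[len] = '\0';
            found = 1;
        }
    }
    regfree(&re);
    return found;
}
```

      For a declaration such as "SEXP Rf_allocMatrix(SEXPTYPE, int, int);"
      this yields Rf_allocMatrix. For a function-pointer declaration such
      as "int (*handler)(int);" it yields int, which is the kind of
      inaccuracy that proper parsing would avoid.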

      Two additional experimental functions are available for analyzing
      package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
      These examine installed packages.

      [These may produce some false positives on macOS; they may or may not
      work on Windows at this point.]

      Using these tools initially showed around 200 non-API entry points
      used across packages on CRAN and BIOC. Ideally this number should be
      reduced to zero. This will require a combination of additions to the
      API and changes in packages.

      Some entry points can safely be added to the API. Around 40 have
      already been added to WRE with API annotations; another 40 or so can
      probably be added after review.

      The remainder mostly fall into two groups:

      - Entry points that should never be used in packages, such as
         SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for that
         matter) that can create inconsistent or corrupt internal state.

      - Entry points that depend on the existence of internal structure that
         might be subject to change, such as the existence of promise objects
         or internal structure of environments.

      Many, if not most, of these seem to be used in idioms that can either
      be accomplished with existing higher-level functions already in the
      API, or by new higher-level functions that can be created and
      added. Working through these will take some time and coordination
      between R-core and maintainers of affected packages.

      Once things have gelled a bit more I hope to turn this into a blog
      post that will include some examples of moving non-API entry point
      uses into compliance.

      Best,

      luke

      --
      Luke Tierney
      Ralph E. Wareham Professor of Mathematical Sciences
      University of Iowa                  Phone:             319-335-3386
      Department of Statistics and        Fax:               319-335-3017
          Actuarial Science
      241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
      Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

      ______________________________________________
      R-devel@r-project.org mailing list
      https://stat.ethz.ch/mailman/listinfo/r-devel






