Another quick update:

Over 100 entry points used in packages have now been marked as part of
an API where it was safe to do so (in some cases after adding error
checking of arguments). These can be used in package C code, with
caveats for ones considered experimental or intended for embedded
use.

The remaining 100 or so non-API entry points used in packages will
require changes in package C code. In some cases the API already
provides safe alternatives to unsafe internal entry points.  In most
other cases it should be possible to develop safer interfaces that
allow packages to accomplish what they need to do in a more robust
way, while giving R maintainers and developers the freedom to make
needed internal changes without disrupting package space.

It will take some time to develop these new interfaces. 'Writing R
Extensions' now has a new section 'Moving into C API compliance' that
should help with adapting to these changes.

Best,

luke

On Thu, 6 Jun 2024, luke-tier...@uiowa.edu wrote:

This is an update on some current work on the C API for use in R
extensions.

The internal R implementation makes use of tens of thousands of C
entry points. On Linux and Windows, which support visibility
restrictions, most of these are visible only within the R executable or
shared library. About 1500 are not hidden and are visible to
dynamically loaded shared libraries, such as ones in packages, and to
embedding applications.
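
As a rough illustration of what a visibility restriction looks like in
C source, here is a minimal sketch using the GCC/Clang visibility
attribute on ELF platforms such as Linux. The macro TOY_HIDDEN and both
function names are hypothetical and chosen only for this example; this
is not a description of how the R sources spell it, and Windows uses a
different mechanism.

```c
/* Sketch only: GCC/Clang on ELF platforms such as Linux.
   Other compilers and platforms get a no-op macro. */
#if defined(__GNUC__) && !defined(_WIN32)
# define TOY_HIDDEN __attribute__((visibility("hidden")))
#else
# define TOY_HIDDEN
#endif

/* Hypothetical internal helper: callable from anywhere inside the
   library, but not exported to code that loads the library. */
TOY_HIDDEN int toy_internal_double(int x) { return 2 * x; }

/* Hypothetical public entry point: default visibility, so dynamically
   loaded client code can call it. */
int toy_public_entry(int x) { return toy_internal_double(x) + 1; }
```

If this were built into a shared library, the dynamic symbol table
would be expected to list toy_public_entry but not
toy_internal_double.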

There are two main reasons for limiting access to entry points in a
software framework:

- Some entry points are very easy to use in ways that corrupt internal
 data, leading to segfaults or, worse, incorrect computations without
 segfaults.

- Some entry points expose internal structure and other implementation
 details, which makes it hard to make improvements without breaking
 client code that has come to depend on these details.

The API of C entry points that can be used in R extensions, both for
packages and embedding, has evolved organically over many years. The
definition for the current release, as expressed in the Writing R
Extensions manual (WRE), is roughly:

   An entry point can be used if (1) it is declared in a header file
   in R.home("include"), and (2) it is documented for use in WRE.

Ideally, (1) would be necessary and sufficient, but for a variety of
reasons that isn't achievable, at least not in the near term. (2) can
be challenging to determine; in particular, it is not amenable to a
computational answer.

An experimental effort is underway to add annotations to the WRE
Texinfo source to allow (2) to be answered unambiguously. The
annotations so far mostly reflect my reading of WRE and may be revised
as they are reviewed by others. The annotated document can be used for
programmatically identifying what is currently considered part of the C
API. The result so far is an experimental function tools:::funAPI():

   > head(tools:::funAPI())
                     name                    loc apitype
   1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
   2        alloc3DArray                    WRE     api
   3          allocArray                    WRE     api
   4           allocLang                    WRE     api
   5           allocList                    WRE     api
   6         allocMatrix                    WRE     api

The 'apitype' field has three possible levels:

   | api  | stable (ideally) API |
   | eapi | experimental API     |
   | emb  | embedding API        |

Entry points in the embedding API would typically only be used in
applications embedding R or providing new front ends, but might be
reasonable to use in packages that support embedding.

The 'loc' field indicates how the entry point is identified as part of
an API: explicit mention in WRE, or declaration in a header file
identified as fully part of an API.
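
To make the shape of this table concrete, here is a small C sketch
that stores the six rows shown above and looks a symbol up by name.
The type and function names (api_type, api_entry, toy_api_lookup,
toy_count_unlisted) are hypothetical, and the table is only the
excerpt printed by head(), so "not found" here means "not in this
excerpt" and nothing more.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three 'apitype' levels, plus a value for "not in the table". */
typedef enum { API_NONE, API_STABLE, API_EXPERIMENTAL, API_EMBEDDING } api_type;

typedef struct {
    const char *name;  /* entry point name */
    const char *loc;   /* "WRE" or the header that declares it */
    api_type type;     /* api, eapi or emb */
} api_entry;

/* Only the six rows shown in the head() output; the full table is
   much longer. */
static const api_entry toy_api_table[] = {
    { "Rf_AdobeSymbol2utf8", "R_ext/GraphicsDevice.h", API_EXPERIMENTAL },
    { "alloc3DArray",        "WRE",                    API_STABLE },
    { "allocArray",          "WRE",                    API_STABLE },
    { "allocLang",           "WRE",                    API_STABLE },
    { "allocList",           "WRE",                    API_STABLE },
    { "allocMatrix",         "WRE",                    API_STABLE },
};

/* Return the API level of a symbol, or API_NONE if it is not listed. */
api_type toy_api_lookup(const char *name)
{
    assert(name != NULL);
    size_t n = sizeof toy_api_table / sizeof toy_api_table[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(toy_api_table[i].name, name) == 0)
            return toy_api_table[i].type;
    return API_NONE;
}

/* Count how many of the given symbols are not listed in the table. */
size_t toy_count_unlisted(const char *const *symbols, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (toy_api_lookup(symbols[i]) == API_NONE)
            count++;
    return count;
}
```

A compliance check over a package then amounts to collecting the
entry points it references and reporting those for which the lookup
finds nothing.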

[tools:::funAPI() may not be completely accurate as it relies on
regular expressions for examining header files considered part of the
API rather than proper parsing. But it seems to be pretty close to
what can be achieved with proper parsing.  Proper parsing would add
dependencies on additional tools, which I would like to avoid for
now. One dependency already present is that a C compiler has to be on
the search path and cc -E has to run the C pre-processor.]

Two additional experimental functions are available for analyzing
package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
These examine installed packages.

[These may produce some false positives on macOS; they may or may not
work on Windows at this point.]

Using these tools initially showed around 200 non-API entry points
used across packages on CRAN and BIOC. Ideally this number should be
reduced to zero. This will require a combination of additions to the
API and changes in packages.

Some entry points can safely be added to the API. Around 40 have
already been added to WRE with API annotations; another 40 or so can
probably be added after review.

The remainder mostly fall into two groups:

- Entry points that should never be used in packages, such as
 SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for that
 matter) that can create inconsistent or corrupt internal state.

- Entry points that depend on the existence of internal structure that
 might be subject to change, such as the existence of promise objects
 or internal structure of environments.
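
The first group can be illustrated with a toy example that is
independent of R. The sketch below contrasts a raw length setter, which
trusts the caller completely, with a checked resize function that keeps
the object consistent. The toy_vec type and all function names are
hypothetical and do not reflect R's internal vector layout or any
actual R entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy vector; this is NOT R's internal vector layout. */
typedef struct {
    size_t length;    /* elements in use */
    size_t capacity;  /* elements allocated; invariant: length <= capacity */
    double *data;
} toy_vec;

/* Allocate a zero-filled vector of n elements; NULL on failure. */
toy_vec *toy_alloc(size_t n)
{
    size_t cap = n > 0 ? n : 1;
    toy_vec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->data = calloc(cap, sizeof *v->data);
    if (v->data == NULL) { free(v); return NULL; }
    v->length = n;
    v->capacity = cap;
    return v;
}

void toy_free(toy_vec *v)
{
    if (v != NULL) { free(v->data); free(v); }
}

/* Raw setter in the spirit of an internal SETXYZ function: it trusts
   the caller. Setting n > capacity breaks the invariant, and later
   element access can run off the end of the buffer. */
void toy_set_length_raw(toy_vec *v, size_t n) { v->length = n; }

/* Checked alternative: grows the buffer when needed and zero-fills
   newly exposed elements. Returns 0 on success, -1 on failure, in
   which case v is left unchanged. */
int toy_resize(toy_vec *v, size_t n)
{
    if (v == NULL) return -1;
    if (n > v->capacity) {
        if (n > SIZE_MAX / sizeof *v->data) return -1;
        double *p = realloc(v->data, n * sizeof *p);
        if (p == NULL) return -1;
        v->data = p;
        v->capacity = n;
    }
    for (size_t i = v->length; i < n; i++) v->data[i] = 0.0;
    v->length = n;
    assert(v->length <= v->capacity);
    return 0;
}
```

Client code that only ever calls toy_resize cannot put the object into
an inconsistent state, which is the property a safer interface is
meant to provide.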

Many, if not most, of these seem to be used in idioms that can be
accomplished either with existing higher-level functions already in
the API or with new higher-level functions that can be created and
added. Working through these will take some time and coordination
between R-core and maintainers of affected packages.

Once things have gelled a bit more I hope to turn this into a blog
post that will include some examples of moving non-API entry point
uses into compliance.

Best,

luke



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
