from:"Luke Tierney"

Re: [R-pkg-devel] [External] Use of ‘R_InputHandlers’

2024-07-11 Thread luke-tierney

On Wed, 10 Jul 2024, Duncan Murdoch wrote:

An update to the rgl package was rejected with this note:

* checking compiled code ... NOTE
File ‘rgl/libs/rgl.so’:
Found non-API call to R: ‘R_InputHandlers’

Compiled code should not call non-API entry points in R.

See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual,
and section ‘Moving into C API compliance’ for issues with the use of
non-API entry points.

See
<https://win-builder.r-project.org/incoming_pretest/rgl_1.3.10_20240707_165632/Debian/00check.log>
for more info.

`R_InputHandlers` isn't actually a function, it's a linked list of structures
holding input handlers. rgl links into it to handle mouse and keyboard
interaction when it is displaying a window in X11.
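
To make "a linked list of structures holding input handlers" concrete, here is a minimal standalone sketch in C. It is a toy model under stated assumptions, not R's declarations from eventloop.h: the names toy_handler, toy_add_handler, toy_run_handlers_once and toy_count_event are hypothetical, and the polling uses plain POSIX select().

```c
#include <stdlib.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical toy types; these are NOT R's actual declarations. */
typedef void (*toy_handler_fn)(void *user_data);

typedef struct toy_handler {
    int fd;                    /* descriptor to watch for input */
    toy_handler_fn fn;         /* callback run when fd is readable */
    void *user_data;           /* passed through to the callback */
    struct toy_handler *next;  /* next node in the list */
} toy_handler;

/* Push a new handler on the front of the list and return the new head
   (the old head is returned unchanged if allocation fails). */
toy_handler *toy_add_handler(toy_handler *head, int fd,
                             toy_handler_fn fn, void *user_data)
{
    toy_handler *h = malloc(sizeof *h);
    if (h == NULL) return head;
    h->fd = fd;
    h->fn = fn;
    h->user_data = user_data;
    h->next = head;
    return h;
}

/* Poll every registered descriptor once, without blocking, and run the
   callback of each one that is readable. Returns the number of callbacks
   run, or -1 if select() fails. */
int toy_run_handlers_once(toy_handler *head)
{
    fd_set readfds;
    int maxfd = -1;
    FD_ZERO(&readfds);
    for (toy_handler *h = head; h != NULL; h = h->next) {
        FD_SET(h->fd, &readfds);
        if (h->fd > maxfd) maxfd = h->fd;
    }
    if (maxfd < 0) return 0;            /* nothing registered */

    struct timeval tv = {0, 0};         /* zero timeout: just poll */
    if (select(maxfd + 1, &readfds, NULL, NULL, &tv) < 0) return -1;

    int n = 0;
    for (toy_handler *h = head; h != NULL; h = h->next) {
        if (FD_ISSET(h->fd, &readfds)) {
            h->fn(h->user_data);
            n++;
        }
    }
    return n;
}

/* Example callback: counts invocations through user_data. */
void toy_count_event(void *user_data) { ++*(int *)user_data; }
```

A package that wants its window events serviced would, in this model, register the descriptor of its display connection with toy_add_handler and rely on the host's event loop to call something like toy_run_handlers_once.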

Ideally the code generating the NOTE should be revised since a few of
the things checked for at that point are global variables, not entry
points.

Given the design of this interface this variable should be added to
the embedding API and dropped from the check list. I have now dropped
it from the check list in R-devel so you should no longer get this
NOTE.

Best,

luke

`R_InputHandlers` is declared in R in src/include/R_ext/eventloop.h, where
comments state

/*
For use by alternative front-ends and packages which need to share
the R event loop (on Unix-alikes).

Not part of the API and subject to change without notice.

NB: HAVE_SYS_SELECT_H should be checked and defined before this is
included.
*/

WRE has a discussion of the issue in 8.1.4, "meshing event loops". It refers
to comments in src/unix/sys-std.c, but I'm not sure which comments.

rgl references it from this code:

https://github.com/dmurdoch/rgl/blob/fbedc326e291c3ec28a9ccac7d030f04b05edfa3/src/x11lib.cpp#L53-L72

Can anyone tell me whether I can fix this?

Duncan Murdoch

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu/

Re: [Rd] [External] API for converting LANGSXP to LISTSXP?

2024-07-06 Thread luke-tierney--- via R-devel


We have long been discouraging the use of pairlists. So no, we will
not do anything to facilitate this conversion; if anything the
opposite. SET_TYPEOF is used more than it should be in the sources.
It is something I would like us to fix sometime, but isn't high
priority.

Best,

luke

On Fri, 5 Jul 2024, Kevin Ushey wrote:


Hi,

A common idiom in the R sources is to convert objects between LANGSXP
and LISTSXP by using SET_TYPEOF. However, this is soon going to be
disallowed in packages. From what I can see, there isn't currently a
direct way to convert between these two object types using the
available API. At the R level, one can convert pairlists to calls
with:

> as.call(pairlist(as.symbol("rnorm"), 42))
rnorm(42)

However, the reverse is not possible:

> as.pairlist(call("rnorm", 42))
Error in as.pairlist(call("rnorm", 42)) :
  'language' object cannot be coerced to type 'pairlist'

One can do such a conversion via conversion to e.g. an intermediate R
list (VECSXP), but that seems wasteful. Would it make sense to permit
this coercion? Or, is there some other relevant API I'm missing?

For completeness, Rf_coerceVector() also emits the same error above
since it uses the same code path.

Thanks,
Kevin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [Rd] [External] Non-API updates

2024-06-25 Thread luke-tierney--- via R-devel


On Tue, 25 Jun 2024, Josiah Parry wrote:


Hey folks,

I'm sure many of you all woke to the same message I did: "Please correct
before 2024-07-09 to safely retain your package on CRAN" caused by Non-API
changes to CRAN.

This is quite unexpected as Luke Tierney's June 6th email writes (emphasis
mine):

"An *experimental* *effort* is underway to add annotations to the WRE..."

"*Once things have gelled a bit more *I hope to turn this into a blog
post that will include some examples of moving non-API entry point
uses into compliance."

Since then there has not been any indication of stabilization of the
Non-API changes nor has there been a blog post outlining how to migrate. As
things have been coming and going from the Non-API changes for quite some
time now, we (the royal we, here) have been waiting for an official
announcement from CRAN on the stabilizing changes.


I posted an update to this list a few days ago. If you missed it you
can find it in the archive.


*Can we extend this very short notice to handle the Non-API changes before
removal from CRAN? *


Timing decisions are up to CRAN.


In the case of the 3 packages I have to fix within 2 weeks, these are all
using Rust. These changes require upstream changes to the extendr library.
There are other packages that are also affected here. Making these changes
is a delicate act and requires care and focus. All of the extendr
developers work full time and cannot make addressing these changes their
only priority for the next 2 weeks.


Using non-API entry points is a choice that comes with risks. The ones
leading to WARNINGs for your packages (PRSEEN and SYMVALUE) have been
receiving NOTEs in check results for several weeks. Using
tools:::checkPkgAPI you can see that your packages are referencing a
lot of non-API entry points. Some of these may be added to the API,
but most will not. This may be a good time to look into that.

To minimize disruption we have been adding entry points to the API as
long as it is safe to do so, in some cases against our better
judgment. But ones that are unsafe to use will not be
added. Eventually their declarations will be removed from public
header files and they will be hidden when that is possible. Packages
that have chosen to use these non-API entry points will have to adapt
if they want to pass R CMD check. For now, we will try to first have
use of these entry points result in NOTEs, and then WARNINGs. Once
their declarations are removed and they are hidden, packages using
them will fail to install.


Additionally, a blog post with "examples of moving non-API entry point uses
into compliance" would be very helpful in this endeavor.


WRE now contains a section 'Moving into C API compliance'; that seems
a better option for the moment given that things are still very much
in flux. We will try to add to this section as needed. For the
specific entry points generating WARNINGs for your packages the advice
is simple: stop using them.

Best,

luke




[Rd] non-API entry point Rf_findVarInFrame3 will be removed

2024-06-19 Thread luke-tierney--- via R-devel


The non-API entry point Rf_findVarInFrame3 used by some packages will
be removed as it is not needed in one use case and not working as
intended in the other.

The most common use case, Rf_findVarInFrame3(rho, sym, TRUE), is
equivalent to the simpler Rf_findVarInFrame(rho, sym).

The less common use case is to test for existence of a binding with

Rf_findVarInFrame3(rho, sym, FALSE) != R_UnboundValue

The intent is that this have no side effects, but that is not the
case: if the binding exists and is an active binding, then its
function will be called to produce a value. This usage should be
replaced with R_existsVarInFrame(rho, sym).
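
The difference between the two idioms can be illustrated with a small standalone model in C. This is only a sketch under stated assumptions: it does not use the R API, and toy_binding, toy_lookup, toy_find, toy_exists, toy_counter and toy_calls are hypothetical names. An "active" binding is modelled as a function pointer that must be called to obtain a value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a frame; none of these names are R API. */
typedef int (*toy_active_fn)(void);

typedef struct toy_binding {
    const char *name;
    int value;                 /* used for ordinary bindings */
    toy_active_fn active;      /* non-NULL marks an active binding */
    struct toy_binding *next;
} toy_binding;

/* Locate a binding by name without touching its value. */
toy_binding *toy_lookup(toy_binding *frame, const char *name)
{
    for (toy_binding *b = frame; b != NULL; b = b->next)
        if (strcmp(b->name, name) == 0) return b;
    return NULL;
}

/* "find"-style lookup: returns 1 and stores the value if the name is bound.
   For an active binding it has to call the function to produce a value,
   so any side effect of that function happens here. */
int toy_find(toy_binding *frame, const char *name, int *out)
{
    toy_binding *b = toy_lookup(frame, name);
    if (b == NULL) return 0;
    *out = b->active ? b->active() : b->value;
    return 1;
}

/* "exists"-style lookup: only checks for the binding, evaluates nothing. */
int toy_exists(toy_binding *frame, const char *name)
{
    return toy_lookup(frame, name) != NULL;
}

/* An active-binding function with a visible side effect. */
int toy_calls = 0;
int toy_counter(void) { return ++toy_calls; }
```

In this model, testing for existence through toy_find increments toy_calls as a side effect, while toy_exists leaves it untouched, which is the behaviour the replacement is meant to provide.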

R_existsVarInFrame has been marked as part of the experimental API.
It is not yet clear whether Rf_findVarInFrame will become part of an
API.  If it does, then its semantics will likely have to change; if it
does not, an alternate interface will be provided.

Best,

luke



Re: [Rd] clarifying and adjusting the C API for R

2024-06-18 Thread luke-tierney--- via R-devel

[...] The remainder mostly fall into two groups:

- Entry points that should never be used in packages, such as
 SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for that
 matter) that can create inconsistent or corrupt internal state.

- Entry points that depend on the existence of internal structure that
 might be subject to change, such as the existence of promise objects
 or internal structure of environments.

Many, if not most, of these seem to be used in idioms that can either
be accomplished with existing higher-level functions already in the
API, or by new higher level functions that can be created and
added. Working through these will take some time and coordination
between R-core and maintainers of affected packages.

Once things have gelled a bit more I hope to turn this into a blog
post that will include some examples of moving non-API entry point
uses into compliance.

Best,

luke





Re: [Rd] [External] Re: changes in R-devel and zero-extent objects in Rcpp

2024-06-08 Thread luke-tierney--- via R-devel


Re: [Rd] [External] Re: clarifying and adjusting the C API for R

2024-06-07 Thread luke-tierney--- via R-devel


On Sat, 8 Jun 2024, Reed A. Cartwright wrote:



Would it be reasonable to move the non-API stuff that cannot be hidden
into header files inside a "details" directory (or some other specific
naming scheme)?

That's what I use when I need to separate a public API from an internal API.


As do I, as does everyone else. As I wrote originally: " ... for a
variety of reasons that isn't achievable, at least not in the near
term." Can we leave it at that please?

luke




Re: [Rd] [External] Re: clarifying and adjusting the C API for R

2024-06-07 Thread luke-tierney--- via R-devel


On Fri, 7 Jun 2024, Hadley Wickham wrote:


Thanks for working on this Luke! We appreciate your efforts to make it
easier to tell what's in the exported API and we're very happy to work with
you on any changes needed to tidyverse/r-lib packages.
Hadley


Thanks. Glad to hear -- I may be reminding you when we hit some of the
tougher challenges down the road :-)

Best,

luke




Re: [Rd] [External] Re: clarifying and adjusting the C API for R

2024-06-07 Thread luke-tierney--- via R-devel


On Fri, 7 Jun 2024, Steven Dirkse wrote:


Thanks for sharing this overview of an interesting and much-needed project.
You mention that R exports about 1500 symbols (on platforms supporting
visibility) but this subject isn't mentioned explicitly again in your note,
so I'm wondering how things tie together.  Un-exported symbols cannot be
part of the API - how would people use them in this case?  In a perfect
world the set of exported symbols could define the API or match it exactly,
but I guess that isn't the case at present.  So I conclude that R exports
extra (i.e. non-API) symbols.  Is part of the goal to remove these extra
exports?


No. We'll hide what we can, but base packages for one need access to
some entry points that should not be in the API, so those have to stay
un-hidden.

Best,

luke



-Steve


[Rd] clarifying and adjusting the C API for R

2024-06-06 Thread luke-tierney--- via R-devel


This is an update on some current work on the C API for use in R
extensions.

The internal R implementation makes use of tens of thousands of C
entry points. On Linux and Windows, which support visibility
restrictions, most of these are visible only within the R executable or
shared library. About 1500 are not hidden and are visible to
dynamically loaded shared libraries, such as ones in packages, and to
embedding applications.
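
As a rough illustration of what a visibility restriction looks like at the source level, here is a minimal C sketch. It assumes a GCC- or Clang-compatible compiler on an ELF platform such as Linux; the macro and function names (TOY_HIDDEN, TOY_EXPORT, toy_internal_double, toy_public_entry) are hypothetical, and the post does not say which mechanism R itself uses on each platform.

```c
/* Assumption: GCC/Clang-style visibility attributes are available.
   On other toolchains the macros expand to nothing and every symbol
   keeps the platform's default visibility. */
#if defined(__GNUC__) && !defined(_WIN32)
# define TOY_HIDDEN __attribute__((visibility("hidden")))
# define TOY_EXPORT __attribute__((visibility("default")))
#else
# define TOY_HIDDEN
# define TOY_EXPORT
#endif

/* Internal helper: callable from anywhere inside the same shared library
   or executable, but not resolvable by code that is loaded dynamically. */
TOY_HIDDEN int toy_internal_double(int x) { return 2 * x; }

/* Entry point left visible to dynamically loaded client code. */
TOY_EXPORT int toy_public_entry(int x) { return toy_internal_double(x) + 1; }
```

When such a file is built into a shared library, a client can link against toy_public_entry but gets an unresolved symbol if it tries to call toy_internal_double directly.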

There are two main reasons for limiting access to entry points in a
software framework:

- Some entry points are very easy to use in ways that corrupt internal
  data, leading to segfaults or, worse, incorrect computations without
  segfaults.

- Some entry points expose internal structure and other implementation
  details, which makes it hard to make improvements without breaking
  client code that has come to depend on these details.

The API of C entry points that can be used in R extensions, both for
packages and embedding, has evolved organically over many years. The
definition for the current release expressed in the Writing R
Extensions manual (WRE) is roughly:

    An entry point can be used if (1) it is declared in a header file
    in R.home("include"), and (2) if it is documented for use in WRE.

Ideally, (1) would be necessary and sufficient, but for a variety of
reasons that isn't achievable, at least not in the near term. (2) can
be challenging to determine; in particular, it is not amenable to a
computational answer.

An experimental effort is underway to add annotations to the WRE
Texinfo source to allow (2) to be answered unambiguously. The
annotations so far mostly reflect my reading of WRE and may be revised
as they are reviewed by others. The annotated document can be used for
programmatically identifying what is currently considered part of the C
API. The result so far is an experimental function tools:::funAPI():

    > head(tools:::funAPI())
                     name                    loc apitype
    1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
    2        alloc3DArray                    WRE     api
    3          allocArray                    WRE     api
    4           allocLang                    WRE     api
    5           allocList                    WRE     api
    6         allocMatrix                    WRE     api

The 'apitype' field has three possible levels

    | api  | stable (ideally) API |
    | eapi | experimental API     |
    | emb  | embedding API        |

Entry points in the embedded API would typically only be used in
applications embedding R or providing new front ends, but might be
reasonable to use in packages that support embedding.

The 'loc' field indicates how the entry point is identified as part of
an API: explicit mention in WRE, or declaration in a header file
identified as fully part of an API.

[tools:::funAPI() may not be completely accurate as it relies on
regular expressions for examining header files considered part of the
API rather than proper parsing. But it seems to be pretty close to
what can be achieved with proper parsing.  Proper parsing would add
dependencies on additional tools, which I would like to avoid for
now. One dependency already present is that a C compiler has to be on
the search path and cc -E has to run the C pre-processor.]

Two additional experimental functions are available for analyzing
package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
These examine installed packages.

[These may produce some false positives on macOS; they may or may not
work on Windows at this point.]

Using these tools initially showed around 200 non-API entry points
used across packages on CRAN and BIOC. Ideally this number should be
reduced to zero. This will require a combination of additions to the
API and changes in packages.

Some entry points can safely be added to the API. Around 40 have
already been added to WRE with API annotations; another 40 or so can
probably be added after review.

The remainder mostly fall into two groups:

- Entry points that should never be used in packages, such as
  SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for that
  matter) that can create inconsistent or corrupt internal state.

- Entry points that depend on the existence of internal structure that
  might be subject to change, such as the existence of promise objects
  or internal structure of environments.

Many, if not most, of these seem to be used in idioms that can either
be accomplished with existing higher-level functions already in the
API, or by new higher level functions that can be created and
added. Working through these will take some time and coordination
between R-core and maintainers of affected packages.

Once things have gelled a bit more I hope to turn this into a blog
post that will include some examples of moving non-API entry point
uses into compliance.

Best,

luke


Re: [Rd] [External] R hang/bug with circular references and promises

2024-05-13 Thread luke-tierney--- via R-devel

On Mon, 13 May 2024, Ivan Krylov wrote:


On Mon, 13 May 2024 09:54:27 -0500 (CDT)
luke-tierney--- via R-devel  wrote:

Looks like I added that warning 22 years ago, so that should be enough
notice :-). I'll look into removing it now.

For now I have just changed the internal code to throw an error
if the change would produce a cycle (r86545). This gives

> e <- new.env()
> parent.env(e) <- e
Error in `parent.env<-`(`*tmp*`, value = <environment>) :
  cycles in parent chains are not allowed

Dear Luke,

I've got a somewhat niche use case: as a way of protecting myself
against rogue *.rds files and vulnerabilities in the C code, I've been
manually unserializing "plain" data objects (without anything
executable), including environments, in R [1].

I would try using two passes: create the environments in the first pass
and in a second pass, either over the file or a new object with place holders, 
fill them in.

I see that SET_ENCLOS() is already commented as "not API and probably
should not be <...> used". Do you think there is a way to recreate an
environment, taking the REFSXP entries into account, without
`parent.env<-`?  Would you recommend to abandon the folly of
unserializing environments manually?

SET_ENCLOS is one of a number of SET... functions that are not in the
API and should not be since they are potentially unsafe to use. (One
that is in the API and needs to be removed is SET_TYPEOF). So we would
like to move them out of installed headers and not export them as
entry points. For this particular case most uses I see are something
like

env = allocSExp(ENVSXP);
SET_FRAME(env, R_NilValue);
SET_ENCLOS(env, parent);
SET_HASHTAB(env, R_NilValue);
SET_ATTRIB(env, R_NilValue);

which could just use

 env = R_NewEnv(parent, FALSE, 0);

Best,

luke

--
Best regards,
Ivan

[1]
https://codeberg.org/aitap/unserializeData/src/commit/33d72705c1ee265349b3e369874ce4b47f9cd358/R/unserialize.R#L289-L313

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email: luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] R hang/bug with circular references and promises

2024-05-13 Thread luke-tierney--- via R-devel


On Sat, 11 May 2024, Peter Langfelder wrote:


On Sat, May 11, 2024 at 9:34 AM luke-tierney--- via R-devel
 wrote:


On Sat, 11 May 2024, Travers Ching wrote:


The following code snippet causes R to hang. This example might be a
bit contrived as I was experimenting and trying to understand
promises, but uses only base R.


This has nothing to do with promises. You created a cycle in the
environment chain. A simpler variant:

e <- new.env()
parent.env(e) <- e
get("x", e)

This will hang and is not interruptable -- loops searching up
environment chains are too speed-critical to check for interrupts.  It
is, however, pretty easy to check whether the parent change would
create a cycle and throw an error if it would. Need to think a bit
about exactly where the check should go.


FWIW, the help for parent.env already explicitly warns against using
parent.env <-:

The replacement function ‘parent.env<-’ is extremely dangerous as
it can be used to destructively change environments in ways that
violate assumptions made by the internal C code.  It may be
removed in the near future.


Looks like I added that warning 22 years ago, so that should be enough
notice :-). I'll look into removing it now.

Best,

luke



Peter




Re: [Rd] [External] R hang/bug with circular references and promises

2024-05-10 Thread luke-tierney--- via R-devel


On Sat, 11 May 2024, Travers Ching wrote:


The following code snippet causes R to hang. This example might be a
bit contrived as I was experimenting and trying to understand
promises, but uses only base R.

It looks like it is looking for "not_a_variable" recursively but since
it doesn't exist it goes on indefinitely.

x0 <- new.env()
x1 <- new.env(parent = x0)
parent.env(x0) <- x1
delayedAssign("v", not_a_variable, eval.env=x1)
delayedAssign("w", v, assign.env=x1, eval.env=x0)
x1$w


This has nothing to do with promises. You created a cycle in the
environment chain. A simpler variant:

e <- new.env()
parent.env(e) <- e
get("x", e)

This will hang and is not interruptable -- loops searching up
environment chains are too speed-critical to check for interrupts.  It
is, however, pretty easy to check whether the parent change would
create a cycle and throw an error if it would. Need to think a bit
about exactly where the check should go.
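
At the R level, the kind of check described here could be sketched as
follows. This is only an illustration of the idea, not the code that went
into R; `would_create_cycle()` is a hypothetical helper, and it assumes the
chain above the proposed parent is itself acyclic and ends at emptyenv().

```r
# Would making `new_parent` the enclosure of `env` close a loop?
# Walk up from the proposed parent; if we meet `env`, the answer is yes.
would_create_cycle <- function(env, new_parent) {
  p <- new_parent
  repeat {
    if (identical(p, env)) return(TRUE)
    if (identical(p, emptyenv())) return(FALSE)
    p <- parent.env(p)
  }
}

e <- new.env()
child <- new.env(parent = e)

self_loop  <- would_create_cycle(e, e)            # e as its own parent
back_edge  <- would_create_cycle(e, child)        # e -> child -> e
harmless   <- would_create_cycle(child, globalenv())
```

The walk is linear in the depth of the chain and only runs when a parent is
changed, so it does not touch the speed-critical lookup loops.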

Best,

luke




Re: [R-pkg-devel] [External] SETLENGTH()

2024-05-04 Thread luke-tierney


On Sat, 4 May 2024, Vladimir Dergachev wrote:



I noticed a note on RMVL package check page for development version of R:

Found non-API call to R: ‘SETLENGTH’

Is this something that is work-in-progress for the development version, or
has SETLENGTH() been deprecated? What should I use instead?


SETLENGTH has never been part of the API. It is not safe to use except
in a very, very limited set of circumstances. Using it in other
settings will confuse the memory manager, leading at least to
mis-calculation of memory use information and possibly to
segfaults. For most uses I have seen, copying to a new vector of the
right size is the only safe option.

The one context where something along these lines might be OK is for
growable vectors. This concept is emphatically not in the API at this
point, and the way it is currently implemented in base is not robust
enough to become an API (even though some packages have used it). It
is possible that a proper API for this will be added; at that point
SETLENGTH will be removed from the accessible entry points on
platforms that allow this.

So if you are getting a note about SETLENGTH, either stop using it or
be prepared to make some changes at fairly short notice.

[Similar considerations apply to SET_TRUELENGTH. In most but not all
cases using it is less dangerous, but you should still look for other
options if you want your code to continue to work.]

Best,

luke



thank you very much

Vladimir Dergachev
__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel




Re: [R-pkg-devel] [External] May .External2() be used in packages?

2024-05-01 Thread luke-tierney


On Wed, 1 May 2024, Konrad Rudolph wrote:


Thanks,

That’s a shame but good to know.

  Packages that for whatever reason have chosen to use it
  could instead use .External(), and that is what you should use.


Unfortunately .External() is not a replacement (in general, or for my
purpose) since it’s missing the `call` and `rho` arguments, and computing
the same information without these arguments in C code is far from trivial.


The call you would get is not likely to be all that useful, but it is
the one you wrote. The environment is the one you get from environment()
at the point where you would call .External2. So instead of

.External2("foo", x, y)

do

.External("foo", quote(.External2("foo", x, y)), environment(), x, y)

and adjust your C function accordingly.
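
To see what the two extra arguments contain, here is a pure-R stand-in. A
real .External() call needs compiled code, so `foo_standin()` below is a
hypothetical R function playing the role of the C routine; it is only meant
to show that the quoted call and environment() carry the same information
the `call` and `rho` arguments would.

```r
# Hypothetical stand-in for the C function: receives call and rho explicitly.
foo_standin <- function(call, rho, x, y) {
  list(
    call  = call,                      # the call as written by the caller
    value = eval(quote(x + y), rho)    # look things up in the caller's frame
  )
}

f <- function(x, y) {
  # Same argument layout as the suggested .External() replacement:
  # name omitted, then the quoted call, then the current environment.
  foo_standin(quote(.External2("foo", x, y)), environment(), x, y)
}

res <- f(1, 2)
```

Inside `f`, environment() is the evaluation frame of that particular call to
`f`, which is the environment .External2 would have passed as `rho`.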

Best,

luke


--
Konrad Rudolph // @klmr





Re: [R-pkg-devel] [External] May .External2() be used in packages?

2024-05-01 Thread luke-tierney


.External2() is not in the API and is not intended to be used in
packages.  Packages that for whatever reason have chosen to use it
could instead use .External(), and that is what you should use. I don't
expect that to be enforced by the check code soon, but it might be.

[.External2() exists for historical reasons to ease moving things that
used to be primitives in base out into packages where they fit more
naturally. It could be removed now, but I don't think that is high on
anyone's priority list.]

Best,

luke

On Wed, 1 May 2024, Konrad Rudolph wrote:


Hello,

Is the `.External2()` function part of the public API, and can it be used
in R packages submitted to CRAN? I would like to start using it in a
package, and there *are* packages on CRAN which use it. But its man page
[1] calls it “internal”, R-exts doesn’t mention it at all (unlike `.C()`,
`.Call()` and `.External()`), and it doesn’t have any actual documentation.
In the context of the recent tightening of the C API CRAN rules, this makes
me concerned that `.External2()` might be next on the chopping block.

[1]
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Foreign-internal.html

Cheers,





Re: [Rd] [External] View() segfaulting ...

2024-04-25 Thread luke-tierney--- via R-devel


I saw it also on some of my Ubuntu builds, but the issue went away
after a make clean/make, so maybe give that a try.

Best,

luke

On Wed, 24 Apr 2024, Ben Bolker wrote:

 I'm using bleeding-edge R-devel, so maybe my build is weird. Can anyone 
else reproduce this?


 View() seems to crash on just about anything.

View(1:3)
*** stack smashing detected ***: terminated
Aborted (core dumped)

 If I debug(View) I get to the last line of code with nothing obviously 
looking pathological:


Browse[1]>
debug: invisible(.External2(C_dataviewer, x, title))
Browse[1]> x
$x
[1] "1" "2" "3"

Browse[1]> title
[1] "Data: 1:3"
Browse[1]>
*** stack smashing detected ***: terminated
Aborted (core dumped)




R Under development (unstable) (2024-04-24 r86483)
Platform: x86_64-pc-linux-gnu
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

time zone: America/Toronto
tzcode source: system (glibc)

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.5.0


Re: [Rd] [External] Re: Is ALTREP "non-API"?

2024-04-24 Thread luke-tierney--- via R-devel


On Wed, 24 Apr 2024, Hadley Wickham wrote:


A few more thoughts based on a simple question: how do you determine the
length of a vector?

Rf_length() is used in example code in R-exts, but I don't think it's
formally documented anywhere (although it's possible I missed it). Is using
in an example sufficient to consider a function to be part of the public
API? If so, SET_TYPEOF() is used in a number of examples, and hence used by
CRAN packages, but is no longer considered part of the public API.

Rf_xlength() doesn't appear to be mentioned anywhere in R-exts. Does this
imply that long vectors are not part of the exported API? Or is there some
other way we should be determining the length of such vectors?

Are the macro variants LENGTH and XLENGTH part of the exported API? Are we
supposed to use them or avoid them?

Relatedly, I presume that LOGICAL() is the way we're supposed to extract
logical values from a vector, but it isn't documented in R-exts, suggesting
that it's not part of the public API?


My pragmatic approach to deciding if an entry point is usable in a
package is to

grep for it in the installed headers

grep for it in WRE

if those are good, check the text in both places to make sure it
doesn't tell me not to use it

The first two can be automated; the text reading can't for now.
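
The header half of that check can be scripted in a few lines of R. This is a
sketch under assumptions, not an official tool: `find_in_headers()` is a
hypothetical helper, it assumes the headers are installed under
R.home("include"), and it does a plain substring match, so "LENGTH" also
hits "XLENGTH" and "SETLENGTH". A hit still needs the manual reading step
described above.

```r
# Return the installed header files that mention `name`.
find_in_headers <- function(name, include_dir = R.home("include")) {
  headers <- list.files(include_dir, pattern = "\\.h$",
                        recursive = TRUE, full.names = TRUE)
  has_name <- vapply(headers, function(h) {
    lines <- readLines(h, warn = FALSE)
    any(grepl(name, lines, fixed = TRUE, useBytes = TRUE))
  }, logical(1))
  unname(headers[has_name])
}

# Example call against the running R installation:
hits <- find_in_headers("Rf_xlength")
```

The same function pointed at a directory holding a text copy of WRE would
cover the second grep; where that copy lives depends on the installation.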

One place this runs into trouble is when the prose in WRE doesn't
explicitly mention the entry point, but says something like 'this one
and similar ones are OK'. A couple of years ago I worked on improving
some of those by explicitly adding some of those implicit ones, which
did sometimes make the text more cumbersome. I'm pretty sure I added
LOGICAL() and RAW() at that point (but may be misremembering); they
are there now. In some other cases I left the text alone but added
index entries. That makes them findable with a text search. I think I
got most that can be handled that way, but there may be some others
left. Far from ideal, but at least a step forward.



---

It's also worth pointing out where R-exts does well, with the documentation
of utility functions (
https://cran.r-project.org/doc/manuals/R-exts.html#Utility-functions). I
think this is what most people would consider documentation to imply, i.e.
a list of input arguments/types, the output type, and basic notes on their
operation.
---

Finally, it's worth noting that there's some lingering ill feelings over
how the connections API was treated. It was documented in R-exts only to be
later removed, including expunging mentions of it in the news. That's
obviously water under the bridge, but I do believe that there is
the potential for the R core team to build goodwill with the community if
they are willing to engage a bit more with the users of their APIs.


As you well know R-core is not a monolith. There are several R-core
members who also are not happy about how that played out and where
that stands now. But there was and is no viable option other than to
agree to disagree. There is really no upside to re-litigating this
now.

Best,

luke



Hadley


Re: [Rd] [External] Re: Is ALTREP "non-API"?

2024-04-24 Thread luke-tierney--- via R-devel

t it will remain up to date is not
realistic.  The only way that would work reliably is if the list could
be programmatically generated, for example by parsing installed
headers for declarations and caveats as above. Which would be possible
with changes like the ones listed above.

Best,

luke



Hadley






Re: [Rd] [External] Calling applyClosure from a package?

2024-04-14 Thread luke-tierney--- via R-devel


On Sun, 14 Apr 2024, Matthew Kay wrote:



Hi,

Short version of my question: Rf_applyClosure was marked
attribute_hidden in Oct 2023, and I am curious why and if there is an
alternative interface to it planned.


applyClosure has never been part of the API and was/is not intended
for use by packages. Keeping things like this internal is essential to
give us flexibility to make needed improvements to the basic engine.
Moving this out of the installed headers and marking it as not to be
exported merely clarifies that it is internal.


Long version:

I have been toying with building a package that makes it easier to do
non-standard evaluation directly using promises, rather than wrapping
these in a custom type (like e.g. rlang does). The advantage of this
approach is that it should be fully compatible with functions that use
the standard R functions for NSE and inspecting function context, like
substitute(), match.call(), or parent.frame(). And indeed, it works!
-- in R 4.3, that is. The prototype version of the package is here:
https://github.com/mjskay/uneval  (the relevant function to my
question is probably do_invoke, in R/invoke.R).

While testing on R-devel, I noticed that Rf_applyClosure(), which used
to be exported, is now marked with attribute_hidden. I traced the
change to this commit in Oct 2023:
https://github.com/r-devel/r-svn/commit/57dbe8ad471c8a34314ee74362ad479db03c033a

However, the commit message did not give me clarity on the reason for
the change, and I have not been able to find mention of this change in
R-devel, R-package-devel, or the R bug tracker.
So, I am curious why this function is no longer exported and if there
is an alternative function planned to take its place.

Neither Rf_eval nor do.call can do what I need to fully support
rlang-style NSE using base R. The problem is that I need to be able to
manually set up the list of promises provided as arguments to the
function.

I fully understand that the answer to my question might be "don't do
that" ;).


That would be my advice: Don't do that. The API does not provide an
interface for working with promises; in fact the existence of promises
is not guaranteed in the future. Some packages have unfortunately made
use of some internal functions related to promises. For the ones on
CRAN we will work with the maintainers to find alternate
approaches. This may mean adding some functions to the API for dealing
with some lazy-evaluation-related features at a higher level.

Best,

luke


But I will humbly suggest that it would be really nice to be
able to do NSE that can capture expressions with heterogeneous
environments and pass these to functions in a way that is compatible
with existing R functions that do NSE. The basic tools to do it are
there in R 4.3, I think...

Thanks for the help!

---Matt

--
Matthew Kay
Associate Professor
Computer Science & Communication Studies
Northwestern University
matthew@u.northwestern.edu
http://www.mjskay.com/


Re: [Rd] [External] Re: Repeated library() of one package with different include.only= entries

2024-04-11 Thread luke-tierney--- via R-devel

On Thu, 11 Apr 2024, Duncan Murdoch wrote:

On 11/04/2024 7:04 a.m., Martin Maechler wrote:

Michael Chirico
 on Mon, 8 Apr 2024 10:19:29 -0700 writes:

 > Right now, attaching the same package with different include.only= has no
 > effect:

 > library(Matrix, include.only="fac2sparse")
 > library(Matrix)
 > ls("package:Matrix")
 > # [1] "fac2sparse"

 > ?library does not cover this case -- what is covered is the _loading_
 > behavior of repeated calls:

 >> [library and require] check and update the list of currently attached
 >> packages and do not reload a namespace which is already loaded

 > But here we're looking at the _attach_ behavior of repeated calls.

 > I am particularly interested in allowing the exports of a package to be
 > built up gradually:

 > library(Matrix, include.only="fac2sparse")
 > library(Matrix, include.only="isDiagonal") # want: ls("package:Matrix") -->
 > c("fac2sparse", "isDiagonal")
 > ...

 > It seems quite hard to accomplish this at the moment. Is the behavior to
 > ignore new inclusions intentional? Could there be an argument to get
 > different behavior?

As you did not get an answer yet, ..., some remarks by an
R-corer who has tweaked library() behavior in the past :

- The `include.only = *` argument to library() has been a
   *relatively* recent addition {given the 25+ years of R history}:

   It was part of the extensive new features by Luke Tierney for
   R 3.6.0  [r76248 | luke | 2019-03-18 17:29:35 +0100], with NEWS entry

 • library() and require() now allow more control over handling
   search path conflicts when packages are attached. The policy is
   controlled by the new conflicts.policy option.

- I haven't seen these (then) new features being used much, unfortunately,
   also not by R-core members, but I'd be happy to be told a different story.

For the above reasons, it could well be that the current
implementation {of these features} has not been exercised a lot
yet, and limitations as you found them haven't been noticed yet,
or at least not noticed on the public R mailing lists, nor
otherwise by R-core (?).

Your implicitly proposed new feature (or even *changed*
default behavior) seems to make sense to me -- but as alluded
to, above, I haven't been a conscious user of any
'library(.., include.only = *)' till now.

I don't think it makes sense.  I would assume that

 library(Matrix, include.only="isDiagonal")

implies that only `isDiagonal` ends up on the search path, i.e. 
"include.only" means "include only", not "include in addition to whatever 
else has already been attached".

I think a far better approach to solve Michael's problem is simply to use

 fac2sparse <- Matrix::fac2sparse
 isDiagonal <- Matrix::isDiagonal

instead of messing around with the user's search list, which may have been 
intentionally set to include only one of those.

So I'd suggest changing the docs to say

"[library and require] check and update the list of currently attached
packages and do not reload a namespace which is already loaded.  If a package 
is already attached, no change will be made."

?library could also mention using detach() followed by library() or
attachNamespace() with a new include.only specification.
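
The detach()-then-library() route looks like this in practice. The sketch
uses the `tools` package purely as a stand-in for Matrix so that it runs on a
plain installation; the functions named in include.only are arbitrary
exports chosen for the example. It needs R 3.6.0 or later.

```r
pkg <- "package:tools"
# Start from a known state in case tools is already on the search path.
if (pkg %in% search()) detach(pkg, character.only = TRUE)

library(tools, include.only = "file_ext")
first <- ls(pkg)

# Already attached: this second call leaves the search path entry as it is.
library(tools, include.only = "toTitleCase")
second <- ls(pkg)

# Detach and re-attach with the combined specification.
detach(pkg, character.only = TRUE)
library(tools, include.only = c("file_ext", "toTitleCase"))
third <- ls(pkg)
```

detach() only removes the search path entry here; the namespace stays
loaded, so the re-attach is cheap.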

Best,

luke

Duncan Murdoch


Re: [Rd] [External] Bug in out-of-bounds assignment of list object to expression() vector

2024-04-05 Thread luke-tierney--- via R-devel


Thanks for the report. Fixed in R-devel and R-patched (both
R-4-4-branch and R-4-3-branch).

On Fri, 5 Apr 2024, June Choe wrote:



There seems to be a bug in out-of-bounds assignment of list objects to an
expression() vector. Tested on release and devel. (Many thanks to folks
over at Mastodon for the help narrowing down this bug)

When assigning a list into an existing index, it correctly errors on
incompatible type, and the expression vector is unchanged:

```
x <- expression(a,b,c)
x[[3]] <- list() # Error
x
#> expression(a, b, c)
```

When assigning a list to an out of bounds index (ex: the next, n+1 index),
it errors the same but now changes the values of the vector to NULL:

```
x <- expression(a,b,c)
x[[4]] <- list() # Error
x
#> expression(NULL, NULL, NULL)
```

Curiously, this behavior disappears if a prior attempt is made at assigning
to the same index, using a different incompatible object that does not
share this bug (like a function):

```
x <- expression(a,b,c)
x[[4]] <- base::sum # Error
x[[4]] <- list() # Error
x
#> expression(a, b, c)
```

That "protection" persists until x[[4]] is evaluated, at which point the
bug can be produced again:

```
x[[4]] # Error
x[[4]] <- list() # Error
x
#> expression(NULL, NULL, NULL)
```

Note that `x` has remained a 3-length vector throughout.

Best,
June


Re: [Rd] [External] Re: Bug in out-of-bounds assignment of list object to expression() vector

2024-04-05 Thread luke-tierney--- via R-devel


On Fri, 5 Apr 2024, Ivan Krylov via R-devel wrote:


On Fri, 5 Apr 2024 08:15:20 -0400
June Choe  wrote:


When assigning a list to an out of bounds index (ex: the next, n+1
index), it errors the same but now changes the values of the vector
to NULL:

```
x <- expression(a,b,c)
x[[4]] <- list() # Error
x
#> expression(NULL, NULL, NULL)
```

Curiously, this behavior disappears if a prior attempt is made at
assigning to the same index, using a different incompatible object
that does not share this bug (like a function)


Here's how the problem happens:

1. The call lands in src/main/subassign.c, do_subassign2_dflt().

2. do_subassign2_dflt() calls SubassignTypeFix() to prepare the operand
for the assignment.

3. Since the assignment is "stretching", SubassignTypeFix() calls
EnlargeVector() to provide the space for the assignment.

The bug relies on `x` not being IS_GROWABLE(), which may explain
why a plain x[[4]] <- list() sometimes doesn't fail.

The future assignment result `x` is now expression(a, b, c, NULL), and
the old `x` set to expression(NULL, NULL, NULL) by SET_VECTOR_ELT(newx,
i, VECTOR_ELT(x, i)); CLEAR_VECTOR_ELT(x, i); during EnlargeVector().

4. But then the assignment fails, raising the error back in
do_subassign2_dflt(), because the assignment kind is invalid: there is
no way to put data.frames into an expression vector. The new resized
`x` is lost, and the old overwritten `x` stays there.

Not sure what the right way to fix this is. It's desirable to avoid
shallow_duplicate(x) for the overwriting assignments, but then the
sub-assignment must either succeed or leave the operand untouched.
Is there a way to perform the type check before overwriting the operand?


Yes. There are two places where there are some checks, one early and
the other late. The early one is explicitly letting this one through
and shouldn't. So a one line change would address this particular
problem. But it would be a good idea to review why the late checks
are needed at all and maybe change that. I'll look into it.

Best,

luke


[Rd] Ordered comparison operators on language objects will signal errors

2024-03-04 Thread luke-tierney--- via R-devel


Comparison operators == and != can be used on language objects
(i.e. call objects and symbols). The == operator in particular often
seems to be used as a shorthand for calling identical(). The current
implementation involves comparing deparsed calls as strings. This has
a number of drawbacks and we would like to transition to a more robust
and efficient implementation. As a first step, R-devel will soon be
modified to signal an error when the ordered comparison operators <,
<=, >, >= are used on language objects. A small number of CRAN and
BIOC packages will fail after this change. If you want to check your
packages or code before the change is committed you can run the
current R-devel with the environment variable setting

_R_COMPARE_LANG_OBJECTS=eqonly

where using such a comparison now produces

> quote(x + y) > 1
Error in quote(x + y) > 1 :
  comparison (>) is not possible for language types
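
For code that uses == on language objects as a shorthand for an identity
test, identical() expresses the intent directly and does not depend on how
the calls deparse. A minimal sketch, with `same_lang()` as a hypothetical
wrapper name:

```r
a <- quote(x + y)
b <- quote(x + y)
d <- quote(x - y)

# Structural comparison of two language objects.
same_lang <- function(u, v) identical(u, v)

ab <- same_lang(a, b)                     # same call
ad <- same_lang(a, d)                     # different operator
sym <- same_lang(quote(x), as.name("x"))  # two ways to write the symbol x
```

Ordered comparisons such as `quote(x + y) > 1` have no comparable
replacement; code relying on them needs to be rewritten.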

Best,

luke



Re: [Rd] [External] Get list of active calling handlers?

2024-02-07 Thread luke-tierney--- via R-devel


On Tue, 6 Feb 2024, Duncan Murdoch wrote:

The SO post https://stackoverflow.com/q/77943180 tried to call 
globalCallingHandlers() from a function, and it failed with the error message 
"should not be called with handlers on the stack".  A much simpler 
illustration of the same error comes from this line:


 try(globalCallingHandlers(warning = function(e) e))

The problem here is that try() sets an error handler, and 
globalCallingHandlers() sees it and aborts.
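
The same failure can be captured rather than printed. In this sketch the
handler is established by tryCatch() instead of try(); that this triggers
the same check is an assumption based on try() being built on tryCatch().
It needs R 4.0.0 or later, where globalCallingHandlers() was introduced.

```r
msg <- tryCatch(
  {
    # Called with a handler (the one tryCatch just set) on the stack.
    globalCallingHandlers(warning = function(w) w)
    "registered"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# The registration itself has to happen at top level, outside any
# try(), tryCatch() or withCallingHandlers() call, for example:
#   globalCallingHandlers(warning = function(w) w)
```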


If I call globalCallingHandlers() with no arguments, I get a list of 
currently active global handlers.  Is there also a way to get a list of 
active handlers, including non-global ones (like the one try() added in the 
line above)?


There is not. The internal stack is not safe to allow to escape to the
R level.  It would be possible to write a reflection function to
provide some information, but it would be a fair bit of work to design
and I don't think would be of enough value to justify that.

The original SO question would be better addressed to
Posit/RStudio. Someone with enough motivation might also be able to
figure out an answer by looking at the source code at
https://github.com/rstudio/rstudio.

Best,

luke




Duncan Murdoch


Re: [Rd] [External] readChar() could read the whole file by default?

2024-01-26 Thread luke-tierney--- via R-devel


On Fri, 26 Jan 2024, Michael Chirico wrote:


I am curious why readLines() has a default (n=-1L) to read the full
file while readChar() has no default for nchars= (i.e., readChar(file)
is an error). Is there a technical reason for this?

I often[1] see code like paste(readLines(f), collapse="\n") which
would be better served by readChar(), especially given issues with the
global string cache I've come across[2]. But lacking the default, the
replacement might come across less clean.


The string cache seems like a very dark pink herring to me. The fact
that the lines are allocated on the heap might create an issue; the
cache isn't likely to add much to that. In any case I would need to
see a realistic example to convince me this is worth addressing on
performance grounds.

I don't see any reason in principle not to have readChar and readBin
read the entire file if n = -1 (others might) but someone would need
to write a patch to implement that.

Best,

luke


For my own purposes the incantation readChar(file, file.size(file)) is
ubiquitous. Taking CRAN code[3] as a sample[4], 41% of readChar()
calls use either readChar(f, file.info(f)$size) or readChar(f,
file.size(f))[5].
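
For comparison, the two idioms side by side on a small temporary file. This
is a sketch for illustration only; the file contents are made up, and
useBytes = TRUE is added so that the byte count from file.size() is used as
is.

```r
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("alpha\nbeta\n"), path)   # two lines, trailing newline

# Whole file in one string, byte for byte.
whole <- readChar(path, file.size(path), useBytes = TRUE)

# Line-by-line read, then re-joined.
joined <- paste(readLines(path), collapse = "\n")

unlink(path)
```

The two results are not interchangeable: readChar() keeps the final newline,
while the readLines() version drops it.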

Thanks for the consideration and feedback,
Mike C

[1] e.g. a quick search shows O(100) usages in CRAN packages:
https://github.com/search?q=org%3Acran+%2Fpaste%5B%28%5D%5Cs*readLines%5B%28%5D.*%5B%29%5D%2C%5Cs*collapse%5Cs*%3D%5Cs*%5B%27%22%5D%5B%5C%5C%5D%2F+lang%3AR=code,
and O(1000) usages generally on GitHub:
https://github.com/search?q=lang%3AR+%2Fpaste%5B%28%5D%5Cs*readLines%5B%28%5D.*%5B%29%5D%2C%5Cs*collapse%5Cs*%3D%5Cs*%5B%27%22%5D%5B%5C%5C%5D%2F+lang%3AR=code
[2] AIUI the readLines() approach "pollutes" the global string cache
with potentially 1000s/1s of strings for each line, only to get
them gc()'d after combining everything with paste(collapse="\n")
[3] The mirror on GitHub, which includes archived packages as well as
current (well, eventually-consistent) versions.
[4] Note that usage in packages is likely not representative of usage
in scripts, e.g. I often saw readChar(f, 1), or eol-finders like
readChar(f, 500) + grep("[\n\r]"), which makes more sense to me as
something to find in package internals than in analysis scripts. FWIW
I searched an internal codebase (scripts and packages) and found 70%
of usages reading the full file.
[5] repro: 
https://gist.github.com/MichaelChirico/247ea9500460dca239f031e74bdcf76b
requires GitHub PAT in env GITHUB_PAT for API permissions.


Re: [Rd] [External] Re: Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-18 Thread luke-tierney--- via R-devel


On Thu, 18 Jan 2024, Ivan Krylov via R-devel wrote:


On Tue, 16 Jan 2024 14:16:19 -0500
Dipterix Wang wrote:


Could you recommend any packages/functions that compute hash such
that the source references and sexpinfo_struct are ignored? Basically
a version of `serialize` that convert R objects to raw without
storing the ancillary source reference and sexpinfo.


I can show how this can be done, but it's not currently on CRAN or even
a well-defined package API. I have adapted a copy of R's serialize()
[*] with the following changes:

* Function bytecode and flags are ignored:

f <- function() invisible()
depcache:::hash(f, 2) # This is plain FNV1a-64 of serialize() output
# [1] "9b7a1af5468deba4"
.Call(depcache:::C_hash2, f) # This is the new hash
# [1] 91 5f b8 a1 b0 6b cb 40
f() # called once: function gets the MAYBEJIT_MASK flag
depcache:::hash(f, 2)
# [1] "7d30e05546e7a230"
.Call(depcache:::C_hash2, f)
# [1] 91 5f b8 a1 b0 6b cb 40
f() # called twice: function now has bytecode
depcache:::hash(f, 2)
# [1] "2a2cba4150e722b8"
.Call(depcache:::C_hash2, f)
# [1] 91 5f b8 a1 b0 6b cb 40 # new hash stays the same

* Source references are ignored:

.Call(depcache:::C_hash2, \( ) invisible( ))
# [1] 91 5f b8 a1 b0 6b cb 40 # compare vs. above

# For quoted function definitions, source references have to be handled
# differently
.Call(depcache:::C_hash2, quote(function(){}))
# [1] 58 0d 44 8e d4 fd 37 6f
.Call(depcache:::C_hash2, quote(\( ){  }))
# [1] 58 0d 44 8e d4 fd 37 6f

* ALTREP is ignored:

identical(1:10, 1:10+0L)
# [1] TRUE
identical(serialize(1:10, NULL), serialize(1:10+0L, NULL))
# [1] FALSE
identical(
.Call(depcache:::C_hash2, 1:10),
.Call(depcache:::C_hash2, 1:10+0L)
)
# [1] TRUE

* Strings not marked as bytes are encoded into UTF-8:

identical('\uff', iconv('\uff', 'UTF-8', 'latin1'))
# [1] TRUE
identical(
serialize('\uff', NULL),
serialize(iconv('\uff', 'UTF-8', 'latin1'), NULL)
)
# [1] FALSE
identical(
.Call(depcache:::C_hash2, '\uff'),
.Call(depcache:::C_hash2, iconv('\uff', 'UTF-8', 'latin1'))
)
# [1] TRUE

* NaNs with different payloads (except NA_numeric_) are replaced by
  R_NaN.
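
If only the source references matter (not bytecode, flags, ALTREP, or encodings), a much smaller sketch is possible with utils::removeSource() from base R. This is not what depcache does; it only illustrates how source references change serialize() output. The helper make_fun is hypothetical, and the functions are created in baseenv() purely as an assumption that keeps the serialized closure environment identical in both cases.

```r
# Parse a function definition from text, keeping source references.
make_fun <- function(text) {
  eval(parse(text = text, keep.source = TRUE), envir = baseenv())
}
f1 <- make_fun("function(x) x + 1")
f2 <- make_fun("function(x)   x + 1")   # extra spaces only

has_srcref <- !is.null(attr(f1, "srcref"))

# With srcrefs attached, the serialized bytes differ.
same_with_src <- identical(serialize(f1, NULL), serialize(f2, NULL))

# After dropping source references, they agree.
same_without_src <- identical(
  serialize(utils::removeSource(f1), NULL),
  serialize(utils::removeSource(f2), NULL)
)
```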

One of the many downsides to the current approach is that we rely on
the non-API entry point getPRIMNAME() in order to hash builtins.
Looking at the source code for identical() is no help here, because it
uses the private PRIMOFFSET macro.

The bitstream being hashed is also, unfortunately, not exactly
compatible with R serialization format version 2: I had to ignore the
LEVELS of the language objects being hashed both because identical()
seems to ignore those and because I was missing multiple private
definitions (e.g. the MAYBEJIT flag) to handle them properly.

Then there's also the problem of immediate bindings [**]: I've seen bits
of vctrs, rstudio, rlang blow up when calling CAR() on SEXP objects that
are not safe to handle this way, but R_expand_binding_value() (used by
serialize()) is again a private function that is not accessible from
packages. identical() won't help here, because it compares reference
objects (which may or may not contain such immediate bindings) by their
pointer values instead of digging down into them.


What does 'blow up' mean? If it is anything other than signal a "bad
binding access" error then it would be good to have more details.

Best,

luke


Dropping the (already violated) requirement to be compatible with R
serialization bitstream will make it possible to simplify the code
further.

Finally:

a <- new.env()
b <- new.env()
a$x <- b$x <- 42
identical(a, b)
# [1] FALSE
.Call(depcache:::C_hash2, a)
# [1] 44 21 f1 36 5d 92 03 1b
.Call(depcache:::C_hash2, b)
# [1] 44 21 f1 36 5d 92 03 1b

...but that's unavoidable when looking at frozen object contents
instead of their live memory layout.

If you're interested, here's the development version of the package:
install.packages('depcache',contriburl='https://aitap.github.io/Rpackages')





[Rd] UseMethod forwarding of local variables

2023-10-20 Thread luke-tierney


UseMethod has since the beginning had the 'feature' that local
variables in the generic are added to the environment in which the
method body is evaluated. This is documented in ?UseMethod and
R-lang.texi, but use of this 'feature' has been explicitly discouraged
in R-lang.texi for many years.

This is an unfortunate design decision for a number of reasons (see
below), so the plan is to remove this 'feature' in the next major
release.

Fortunately only a small number of packages on CRAN (see below) seem
to make use of this feature directly; a few more as reverse
dependencies.  The maintainers of the directly affected packages will
be notified separately.

Current R-devel allows you to set the environment variable
R_USEMETHOD_FORWARD_LOCALS=none to run R without this feature or
R_USEMETHOD_FORWARD_LOCALS=error to signal an error when a forwarded
variable's value is used.

Some more details:

An example:

> foo <- function(x) { yyy <- 77; UseMethod("foo") }
> foo.bar <- function(x) yyy
> foo(structure(1, class = "bar"))
[1] 77

Some reasons the design is a bad idea:

- You can't determine what a method does without knowing what the
  generic it will be called from looks like.

- Code analysis (codetools, the compiler) can't analyze method
  code reliably.

- You can't debug a method on its own. For the foo() example,

> foo.bar(structure(1, class = "bar"))
Error in foo.bar(structure(1, class = "bar")) : object 'yyy' not found

- A method relying on these variables won't work when reached via NextMethod:

> foo.baz <- function(x) NextMethod("foo")
> foo(structure(2, class = c("baz", "bar")))
Error in foo.bar(structure(2, class = c("baz", "bar"))) :
  object 'yyy' not found
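
One way to avoid relying on forwarded locals, sketched under the assumption that the value can be an ordinary argument (the names foo2, foo2.bar and foo2.baz are hypothetical), is to give each method its own formal argument with a default:

```r
# The generic no longer creates a local variable for methods to pick up.
foo2 <- function(x, yyy = 77) UseMethod("foo2")

# Each method declares what it needs, so it can be read and run on its own.
foo2.bar <- function(x, yyy = 77) yyy
foo2.baz <- function(x, yyy = 77) NextMethod()

via_generic <- foo2(structure(1, class = "bar"))
direct      <- foo2.bar(structure(1, class = "bar"))
via_next    <- foo2(structure(2, class = c("baz", "bar")))
explicit    <- foo2(structure(1, class = "bar"), yyy = 5)
```

Written this way the method gives the same answer whether it is reached through the generic, called directly, or reached via NextMethod().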

The directly affected CRAN packages I have identified are:

- actuar
- quanteda
- optmatch
- rlang
- saeRobust
- Sim.DiffProc
- sugrrants
- texmex

Some of these fail with the environment set to 'error' but not to
'none', so they are getting a value from somewhere else that may or
may not be right.

Affected as revdeps of optmatch:

- cobalt
- htetree
- jointVIP
- MatchIt
- PCAmatchR
- rcbalance
- rcbsubset
- RItools
- stratamatch

Affected as revdeps of texmex:

- lax
- mobirep

Best,

luke


Re: [Rd] [External] On PRINTNAME() encoding, EncodeChar(), and being painted into a corner

2023-09-22 Thread luke-tierney

,7 @@ SEXP eval(SEXP e, SEXP rho)
const char *n = CHAR(PRINTNAME(e));
-   if(*n) errorcall(getLexicalCall(rho),
+   if(*n) errorcall_cpy(getLexicalCall(rho),
 _("argument \"%s\" is missing, with no default"),
-CHAR(PRINTNAME(e)));
+EncodeChar(PRINTNAME(e)));
else errorcall(getLexicalCall(rho),
   _("argument is missing, with no default"));
}

--- src/main/match.c
+++ src/main/match.c
@@ -229,7 +229,7 @@ attribute_hidden SEXP matchArgs_NR(SEXP
  if (fargused[arg_i] == 2)
- errorcall(call,
+ errorcall_cpy(call,
  _("formal argument \"%s\" matched by multiple actual arguments"),
- CHAR(PRINTNAME(TAG(f))));
+ EncodeChar(PRINTNAME(TAG(f))));
  if (ARGUSED(b) == 2)
  errorcall(call,
  _("argument %d matches multiple formal arguments"),
@@ -272,12 +271,12 @@ attribute_hidden SEXP matchArgs_NR(SEXP
if (fargused[arg_i] == 1)
-   errorcall(call,
+   errorcall_cpy(call,
_("formal argument \"%s\" matched by multiple actual arguments"),
-   CHAR(PRINTNAME(TAG(f))));
+   EncodeChar(PRINTNAME(TAG(f))));
if (R_warn_partial_match_args) {
warningcall(call,
_("partial argument match of '%s' to '%s'"),
CHAR(PRINTNAME(TAG(b))),
CHAR(PRINTNAME(TAG(f))) );
}
SETCAR(a, CAR(b));
if (CAR(b) != R_MissingArg) SET_MISSING(a, 0);

The changes become more complicated with a plain error() (have to
figure out the current call and provide it to errorcall_cpy), still
more complicated with warnings (there's currently no warningcall_cpy(),
though one can be implemented) and even more complicated when multiple
symbols are used in the same warning or error, like in the last
warningcall() above (EncodeChar() can only be called once at a time).

The only solution to the latter problem is an EncodeChar() variant that
allocates its memory dynamically. Would R_alloc() be acceptable in this
context? With errors, the allocation stack would be quickly reset
(except when withCallingHandlers() is in effect?), but with warnings,
the code would have to restore it manually every time.


Or allow/require a buffer to be provided. So replacing the calls like

   CHAR(PRINTNAME(sym))

with

   EncodeSymbol(sym, buf, buf_size)


Is it even worth
the effort to try to handle the (pretty rare) non-syntactic symbol names
while constructing error messages? Other languages (like Lua or SQLite)
provide a special printf specifier (typically %q) to create
quoted/escaped string representations, but we're not yet at the point
of providing a C-level printf implementation.


Not clear it is worth it. But the situation now is not good, because
sometimes we encode and sometimes we don't. It would be better to be
consistent, both for the end user and for maintainers who now have to
spend time figuring out which way to go.

Best,

luke


Re: [Rd] [External] Re: Calling a replacement function in a custom environment

2023-08-27 Thread luke-tierney


On Sun, 27 Aug 2023, Duncan Murdoch wrote:

I think there isn't a way to make this work other than calling `is.na<-` 
explicitly:


 x <- b$`is.na<-`(x, TRUE)



Replacement functions are not intended to be called directly. Calling
a replacement function directly may produce an error, or may just do
the wrong thing in terms of mutation.


It seems like a reasonable suggestion to make

 b$is.na(x) <- TRUE

work as long as b is an environment.



I do not think it is a reasonable suggestion. The reasons a::b and
a:::b were made to work is that many users read these as a single
symbol, not a call to a binary operator. So supporting this helped to
reduce confusion.

Allowing $<- to "work" on environments was probably a mistake since
environments behave differently with respect to
duplication. Disallowing it entirely may be too disruptive at this
point, but disallowing it in complex assignment expressions may be
necessary to prevent mutations that should not happen. (There are open
bug reports that boil down to this.)

In any case, complicating the complex assignment code, which is
already barely maintainable, would be a very bad idea.

Best,

luke

If you wanted it to work when b was a list, it would be more problematic 
because of partial name matching.  E.g. suppose b was a list containing 
functions partial(), partial<-(), and part<-(), and I call


 b$part(x) <- 1

what would be called?

Duncan Murdoch

On 27/08/2023 10:59 a.m., Konrad Rudolph wrote:

Hello all,

I am wondering whether it’s at all possible to call a replacement function
in a custom environment. From my experiments this appears not to be the
case, and I am wondering whether that restriction is intentional.

To wit, the following works:

x = 1
base::is.na(x) = TRUE

However, the following fails:

x = 1
b = baseenv()
b$is.na(x) = TRUE

The error message is "invalid function in complex assignment". Grepping the
R code for this error message reveals that this behaviour seems to be
hard-coded in function `applydefine` in src/main/eval.c: the function
explicitly checks for `::` and `:::` and permits those assignments, but has
no equivalent treatment for `$`.

Am I overlooking something to make this work? And if not — unless there’s a
concrete reason against it, could it be considered to add support for this
syntax, i.e. for calling a replacement function by `$`-subsetting the
defining environment, as shown above?

Cheers,
Konrad
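
A workaround that stays within the ordinary complex-assignment syntax, sketched here with a made-up replacement function `second<-` stored in a made-up environment e, is to bind the replacement function under its own name first. This is only one possible approach and not something proposed in the thread.

```r
# A custom environment holding a replacement function (hypothetical example).
e <- new.env()
e$`second<-` <- function(x, value) {
  x[2] <- value
  x
}

# Bind the replacement function under its own name, then use the normal
# assignment form, which looks up `second<-` in the current environment.
`second<-` <- e$`second<-`
v <- c(1, 2, 3)
second(v) <- 10
```

This leaves the mutation handling to the complex assignment code instead of calling the replacement function directly.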




Re: [R-pkg-devel] [External] Another DLL requires the use of native symbols thread

2023-03-25 Thread luke-tierney


Take another look at R-exts, specifically

https://cran.r-project.org/doc/manuals/R-exts.html#Registering-native-routines

Your call

R_forceSymbols(dll, TRUE);

says you want foreign function calls to only work with entry points
specified by R objects representing native symbols. This was not being
enforced but is now in R devel.

Your NAMESPACE file directive

useDynLib(splines, .registration = TRUE, .fixes = "C_")

says that these R objects should use a C_ prefix, so your .Fortran
call should be

   res <- .Fortran(C_bt, as.double(Temp), as.double(y), as.integer(icode))

So you can either fix your .Fortran call to be consistent with what
you have asked R to use, or drop the R_forceSymbols call.

Best,

luke

On Sat, 25 Mar 2023, Shawn Way wrote:


Sorry to kind of repeat this but I really didn't understand the issues with the 
prior thread and how it relates to my issue.

I'm getting the error message in

   B_T <- BT(Temp)
   Error in BT(Temp) : DLL requires the use of native symbols
   Execution halted

And frankly, I have no idea what this means.  The function call is pretty 
simple and meets the requirement of .Fortran

res <- .Fortran('BT', as.double(Temp), as.double(y), as.integer(icode))

and the following matches the R-ext's method for using registration:

extern void F77_NAME(bt)(double *T, double *B, int *icode);

void R_init_IAPWS95(DllInfo *dll)
{
 R_registerRoutines(dll, NULL, NULL, FortranEntries, NULL);
 R_useDynamicSymbols(dll, FALSE);
 R_forceSymbols(dll, TRUE);
}

With

static const R_FortranMethodDef FortranEntries[] = {
 {"bt", (DL_FUNC) &F77_NAME(bt), 3},
 {NULL, NULL, 0}
};

Furthermore the NAMESPACE includes:

useDynLib(splines, .registration = TRUE, .fixes = "C_")


which matches R-ext.

Since this is only occurring on the Linux versions for the software and I use 
windows, can someone point me in the right direction to fix this error?

Shawn Way

[[alternative HTML version deleted]]


Re: [Rd] [External] Time to drop globalenv() from searches in package code?

2022-09-17 Thread luke-tierney


On Sat, 17 Sep 2022, Kurt Hornik wrote:


luke-tierney  writes:



On Thu, 15 Sep 2022, Duncan Murdoch wrote:

The author of this Stackoverflow question
https://stackoverflow.com/q/73722496/2554330 got confused because a typo in
his code didn't trigger an error in normal circumstances, but it did when he
ran his code in pkgdown.

The typo was to use "x" in a test, when the local variable was named ".x".
There was no "x" defined locally or in the package or its imports, so the
search got all the way to the global environment and found one.  (The very
confusing part for this user was that it found the right variable.)

This author had suppressed the "R CMD check" check for use of global
variables.  Obviously he shouldn't have done that, but he's working with
tidyverse NSE, and that causes so many false positives that it is somewhat
understandable he would suppress one too many.

The pkgdown simulation of code in examples doesn't do perfect mimicry of
running it at top level; the fake global environment never makes it onto the
search list.  Some might call this a bug, but I'd call it the right search
strategy.

My suggestion is that the search for variables in package code should never
get to globalenv().  The chain of environments should stop after handling the
imports.  (Probably base package functions should also be implicitly
imported, but nothing else.)




This was considered and discussed when I added namespaces. Basically
it would mean making the parent of the base namespace environment be
the empty environment instead of the global environment. As a design
this is cleaner, and it would be a one-line change in eval.c.  But
there were technical reasons this was not a viable option at the time,
also a few political reasons. The technical reasons mostly had to do
with S3 dispatch.
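
The current lookup chain can be inspected from R itself. The helper env_chain below is hypothetical; it just walks parent.env() from a package namespace (stats is used as an arbitrary example) until it reaches the empty environment.

```r
# Hypothetical helper: names of all enclosing environments of env.
env_chain <- function(env) {
  out <- character()
  while (!identical(env, emptyenv())) {
    out <- c(out, environmentName(env))
    env <- parent.env(env)
  }
  out
}

chain <- env_chain(asNamespace("stats"))

# The base namespace currently has the global environment as its parent,
# which is why lookups from package code can end up on the search path.
base_parent_is_global <- identical(parent.env(.BaseNamespaceEnv), globalenv())
```

The chain starts with the namespace, then its imports and the base namespace, and only then continues through the global environment and the rest of the search path.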



Changes over the years, mostly from work Kurt has done, to S3 dispatch
for methods defined and registered in packages might make this more
viable in principle, but there would still be a lot of existing code
that would stop working. For example, 'make check' with the one-line
change fails in a base example that defines an S3 method. It might be
possible to fiddle with the dispatch to keep most of that code
working, but I suspect that would be a lot of work. Seeing what it
would take to get 'make check' to succeed would be a first step if
anyone wants to take a crack at it.


Luke,

Can you please share the one-line change so that I can take a closer
look?


Index: src/main/envir.c
===
--- src/main/envir.c(revision 82861)
+++ src/main/envir.c(working copy)
@@ -683,7 +683,7 @@
 R_GlobalCachePreserve = CONS(R_GlobalCache, R_NilValue);
 R_PreserveObject(R_GlobalCachePreserve);
 #endif
-R_BaseNamespace = NewEnvironment(R_NilValue, R_NilValue, R_GlobalEnv);
+R_BaseNamespace = NewEnvironment(R_NilValue, R_NilValue, R_EmptyEnv);
 R_PreserveObject(R_BaseNamespace);
 SET_SYMVALUE(install(".BaseNamespaceEnv"), R_BaseNamespace);
 R_BaseNamespaceName = ScalarString(mkChar("base"));

-

For S3 the dispatch will have to be changed to explicitly search
.GlobalEnv and parents after the namespace if we don't want to break
too much.

Another idiom that will be broken is

if (require("foo"))
   bar(...)

with bar exported from foo. I don't know if that is already warned
about.  Moving away from this is arguably good in principle but also
probably fairly disruptive. We might need to add some cleaner
use-if-available mechanism, or maybe just adjust some checking code.
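
For comparison, the existing idiom that does not depend on the search path is requireNamespace() combined with a qualified call. In this sketch "stats" merely stands in for an optional package, since it is always installed; whether this counts as the cleaner mechanism mentioned here is an open question.

```r
# "stats" stands in for an optional package; any package name could be used.
pkg_available <- requireNamespace("stats", quietly = TRUE)

# The qualified call does not need the package to be attached.
result <- if (pkg_available) stats::median(c(1, 3, 5)) else NA_real_
```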

Best,

luke



Best
-k


I suspect this change would reveal errors in lots of packages, but the number
of legitimate uses of the current search strategy has got to be pretty small
nowadays, since we've been getting warnings for years about implicit imports
from other standard packages.



Your definition of 'legitimate' is probably quite similar to mine, but
there is likely to be a small but vocal minority with very different
views :-).



Best,



luke



Duncan Murdoch


Re: [Rd] [External] assignment

2021-12-27 Thread luke-tierney


On Mon, 27 Dec 2021, Gabor Grothendieck wrote:


In a recent SO post this came up (changed example to simplify it
here).  It seems that `test` still has the value sin.

 test <- sin
 environment(test)$test <- cos
 test(0)
 ## [1] 0

It appears to be related to the double use of `test` in `$<-` since if
we break it up it works as expected:

 test <- sin
 e <- environment(test)
 e$test <- cos
 test(0)
 ## [1] 1

`assign` also works:

 test <- sin
 assign("test", cos, environment(test))
 test(0)
 ## [1] 1

Can anyone shed some light on this?


See my response in

https://bugs.r-project.org/show_bug.cgi?id=18269
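
The details are in the bug report. As a rough reading, based on how the R Language Definition describes complex assignment in terms of `*tmp*`, the effect can be reproduced with closures; f and g are made-up stand-ins, used on the assumption that environment() should return an actual environment, which it does not for primitives such as sin and cos.

```r
f <- function() "old"
g <- function() "new"

# Roughly evaluated as:
#   `*tmp*` <- f
#   f <- `environment<-`(`*tmp*`,
#                        value = `$<-`(environment(`*tmp*`), "f", g))
# The inner `$<-` does rebind f to g, but the outer assignment then
# stores the old function (held in `*tmp*`) back into f.
environment(f)$f <- g
one_step <- f()

# Splitting it up avoids the outer assignment to f.
f <- function() "old"
e <- environment(f)
e$f <- g
two_step <- f()
```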

Best,

luke


Re: [Rd] [External] Re: hashtab address arg

2021-12-22 Thread luke-tierney


On Wed, 22 Dec 2021, Ivan Krylov wrote:


On Sat, 18 Dec 2021 11:50:54 +0100
Arnaud FELD  wrote:


However, I'm a bit troubled about the "address" argument. What is it
intended for since (as far as I know) "address equality" is until now
something that isn't really left for the user to decide within R.


Using the words from "Extending R" by John M. Chambers, the concept of
address identity could be related to the question:


If some of the data in the object has changed, is this still the
same object?


Most objects in R are defined by their content. If you had a 100x100
matrix and changed an element at [50,50], it's now a different matrix,
even if it's stored in the same variable. If you create another 100x100
matrix in a different variable but fill it with the same numbers, it
should still compare equal to your original matrix.

Not all types of R objects are like that. Environments are good
candidates for pointer equality comparison. For example, the contents
of the global environment change every time you assign some variable in
the R command line, but it remains the same global environment. Indeed,
identical() for environments just compares their pointers: even if two
different environments only contain objects that compare equal, they
cannot be considered the same environment, because different closures
might be referring to them. Similar are data.tables: if you had a giant
dataset and, as part of cleaning it up, removed some outliers, perhaps
it should be considered the same dataset, even if the contents aren't
strictly the same any more. Same goes for reference class and R6
objects: unlike the pass-by-value semantics associated with most
objects in R, these are assumed to carry global state within them, and
modifications to them are reflected everywhere they are referenced, not
limited to the current function call.
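
The difference between content equality and identity can be seen with plain environments; this small sketch uses only base R, and the variable names are arbitrary.

```r
a <- new.env()
b <- new.env()
a$x <- b$x <- 42

same_object   <- identical(a, a)                    # same environment
equal_content <- identical(as.list(a), as.list(b))  # contents compare equal
same_as_other <- identical(a, b)                    # but different objects

# A reference object stays "the same" while its contents change.
alias <- a
alias$x <- 0
seen_through_a <- a$x
```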


This is still experimental and the 'address' option may not survive at
the R level. There are some C level applications where it can be
useful; maybe it will only be retained there.


I *think* that most (if not all) objects with reference semantics
already use pointer comparison when being compared by identical(), so
the default of "identical" is, as the help page says, almost always the
right choice, but if it matters to your code whether the objects are
actually stored in the same area in the memory, use hashes of type
"address".


Unfortunately not all: External pointer objects are reference objects
but by default are not compared based on object address. Fixing the
default is not an option in the short term as it breaks too much code
(mostly through dependencies on a few packages).


(Perhaps this topic could be a better fit for R-help.)


R-devel is the right place for this.

Best,

luke


Re: [Rd] [External] Status of '=>'

2021-12-20 Thread luke-tierney


It's still work in progress. Probably => will be dropped in favor of
limited use of _ for non-first-argument passing.
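
For reference, the form that later shipped in R 4.2.0 is the `_` placeholder, which has to be passed as a named argument. A small sketch (it needs R >= 4.2.0 to parse; the data frame is made up for illustration):

```r
d <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 6, 8))

# The left-hand side is passed as the named argument marked by `_`
# instead of as the first argument.
fit <- d |> lm(y ~ x, data = _)
slope <- unname(coef(fit)["x"])
```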

Best,

luke

On Mon, 20 Dec 2021, Dirk Eddelbuettel wrote:



R 4.1.0 brought the native pipe and the related ability to use '=>' if one
opted into it by setting _R_USE_PIPEBIND_. I often forget about '=>' and
sadly can never find anything in the docs either (particularly no 'see also'
from '|>' docs) which is not all that helpful.

Can we anticipate a change with R 4.2.0, or will it remain as is, somewhat
available but not really documented or enabled? Clarifications welcome,
otherwise 'time will tell' as usual.

Thanks,  Dirk





Re: [Rd] [External] DOCS: Exactly when, in the signaling process, is option 'warn' applied?

2021-11-18 Thread luke-tierney


On Thu, 18 Nov 2021, Henrik Bengtsson wrote:


Hi,

the following question sprung out of a package setting option warn=-1
to silence warnings, but those warnings were still caught by
withCallingHandlers(..., warning), which the package author did not
anticipate. The package has been updated to use suppressWarnings()
instead, but as I see a lot of packages on CRAN [1] use
options(warn=-1) to temporarily silence warnings, I wanted to bring
this one up. Even base R itself [2] does this, e.g.
utils::assignInMyNamespace().

Exactly when is the value of 'warn' options used when calling warning("boom")?



In the default handler; it doesn't affect signaling.

Much of the documentation pre-dates the condition system; happy to
consider patches.
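
A small sketch of what that means in practice, using only base R (the variables seen_with_option and seen_with_suppress are made up): a calling handler still sees the warning when warn = -1, but not when the warning is muffled by suppressWarnings() before it reaches the handler.

```r
# Case 1: options(warn = -1) only changes the default handler, so an
# outer calling handler is still invoked.
seen_with_option <- character()
old <- options(warn = -1)
withCallingHandlers(
  warning("boom"),
  warning = function(w) {
    seen_with_option <<- c(seen_with_option, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
options(old)

# Case 2: suppressWarnings() muffles the condition closer to the signal,
# so the outer handler never runs.
seen_with_suppress <- character()
withCallingHandlers(
  suppressWarnings(warning("boom")),
  warning = function(w) {
    seen_with_suppress <<- c(seen_with_suppress, conditionMessage(w))
  }
)
```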

Best,

luke


I think the docs, including ?options, would benefit from clarifying
that. To the best of my understanding, it should also mention that
option 'warn' is meant to be used by end-users, and not in package
code where suppressWarnings() should be used.

To clarify, if we do:


options(warn = -1)
tryCatch(warning("boom"), warning = function(w) stop("Caught warning: ", conditionMessage(w), call. = FALSE))

Error: Caught warning: boom

we see that the warning is indeed signaled.  However, in Section '8.2
warning' of the 'R Language Definition' [3], we can read:

"The function `warning` takes a single argument that is a character
string. The behaviour of a call to `warning` depends on the value of
the option `"warn"`. If `"warn"` is negative warnings are ignored.
[...]"

The way this is written, it may suggest that warnings are
ignored/silenced already early on when calling warning(), but the
above example shows that that is not the case.

From the same section, we can also read:

"[...] If it is zero, they are stored and printed after the top-level
function has completed. [...]"

which may hint that the 'warn' option is applied only when a warning
condition is allowed to "bubble up" all the way to the top level.
(FWIW, this is how I always thought it worked, but it's only now that I
looked into the docs and saw that they are ambiguous on this).

/Henrik

[1] 
https://github.com/search?q=org%3Acran+language%3Ar+R%2F+in%3Afile%2Cpath+options+warn+%22-1%22=Code
[2] 
https://github.com/wch/r-source/blob/0a31ab2d1df247a4289efca5a235dc45b511d04a/src/library/utils/R/objects.R#L402-L405
[3] https://cran.r-project.org/doc/manuals/R-lang.html#warning


Re: [Rd] [External] GC: speeding-up the CHARSXP cache maintenance, 2nd try

2021-11-04 Thread luke-tierney


Can you please submit this as a wishlist item to bugzilla? It is
easier to keep track of there. You could also submit your threads-based
suggestion there, again to make it easier to keep track of and
possibly get back to in the future.

I will have a look at your approach when I get a chance, but I am
exploring a different approach to avoid scanning old generations that
may be simpler.

Best,

luke

On Wed, 3 Nov 2021, Andreas Kersting wrote:


Hi,

In https://stat.ethz.ch/pipermail/r-devel/2021-October/081147.html I proposed 
to speed up the CHARSXP cache maintenance during GC using threading. This was 
rejected by Luke in 
https://stat.ethz.ch/pipermail/r-devel/2021-October/081172.html.

Here I want to propose an alternative approach to significantly speed up 
CHARSXP cache maintenance during partial GCs. A patch which passes `make 
check-devel` is attached. Compared to R devel (revision 81110) I get the 
following performance improvements on my system:

Elapsed time for five non-full gc in a session after

x <- as.character(runif(5e7))[]
gc(full = TRUE)

+20sec -> ~1sec.


This patch introduces (theoretical) overheads to mkCharLenCE() and full GCs. 
However, I did not measure dramatic differences:

y <- "old_CHARSXP"

after

x <- "old_CHARSXP"; gc(); gc()

takes a median 32 nanoseconds with and without the patch.


gc(full = TRUE)

in a new session takes a median 16 milliseconds with and 14 without the patch.


The basic idea is to maintain the CHARSXP cache using subtables in R_StringHash, one for 
each of the (NUM_GC_GENERATIONS := NUM_OLD_GENERATIONS + 1) GC generations. New CHARSXPs 
are added by mkCharLenCE() to the subtable of the youngest generation. After a partial 
GC, only the chains anchored at the subtables of the youngest (num_old_gens_to_collect + 
1) generations need to be searched for and cleaned of unmarked nodes. Afterwards, these 
chains need to be merged into those of the respective next generation, if any. This 
approach relies on the fact that an object/CHARSXP can never become younger again. It is 
OK though if an object/CHARSXP "skips" a GC generation.

R_StringHash, which is now of length (NUM_GC_GENERATIONS * char_hash_size), is 
structured such that the chains for the same hashcode but for different 
generations are anchored at slots of R_StringHash which are next to each other 
in memory. This is because we often need to access two or more (i.e. currently 
all three) of them for one operation and this avoids cache misses.
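
A minimal sketch of the layout and sweep described above, assuming a toy node type rather than R's real CHARSXP representation. All names here (toy_node, toy_table, toy_slot, toy_insert, toy_sweep, toy_count, TOY_HASH_SIZE) are hypothetical and are not taken from the attached patch; only NUM_OLD_GENERATIONS, NUM_GC_GENERATIONS and num_old_gens_to_collect come from the text.

```c
#include <stddef.h>
#include <stdlib.h>

#define NUM_OLD_GENERATIONS 2
#define NUM_GC_GENERATIONS (NUM_OLD_GENERATIONS + 1)
#define TOY_HASH_SIZE 8 /* hypothetical; stands in for char_hash_size */

/* Toy stand-in for a cached CHARSXP; not R's real node layout. */
typedef struct toy_node {
    struct toy_node *next;
    int marked; /* set by an (imaginary) mark phase before sweeping */
} toy_node;

/* Chains for one hashcode occupy adjacent slots, one per generation. */
static toy_node *toy_table[TOY_HASH_SIZE * NUM_GC_GENERATIONS];

static size_t toy_slot(unsigned hash, int gen)
{
    return (size_t)(hash % TOY_HASH_SIZE) * NUM_GC_GENERATIONS + (size_t)gen;
}

/* New entries always enter the youngest generation (0). */
static toy_node *toy_insert(unsigned hash)
{
    toy_node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    size_t s = toy_slot(hash, 0);
    n->next = toy_table[s];
    toy_table[s] = n;
    return n;
}

/* Sweep only the youngest (num_old_gens_to_collect + 1) generations: free
   unmarked nodes, then move survivors to the next older generation, if any.
   Going from old to young promotes each survivor by exactly one step.
   num_old_gens_to_collect must lie in 0..NUM_OLD_GENERATIONS. */
static void toy_sweep(int num_old_gens_to_collect)
{
    for (unsigned h = 0; h < TOY_HASH_SIZE; h++) {
        for (int g = num_old_gens_to_collect; g >= 0; g--) {
            toy_node *n = toy_table[toy_slot(h, g)], *keep = NULL, *tail = NULL;
            toy_table[toy_slot(h, g)] = NULL;
            while (n != NULL) {
                toy_node *next = n->next;
                if (n->marked) {
                    n->marked = 0;
                    n->next = keep;
                    if (keep == NULL) tail = n; /* first survivor ends the list */
                    keep = n;
                } else {
                    free(n);
                }
                n = next;
            }
            if (keep != NULL) {
                int dest = (g + 1 < NUM_GC_GENERATIONS) ? g + 1 : g;
                tail->next = toy_table[toy_slot(h, dest)];
                toy_table[toy_slot(h, dest)] = keep;
            }
        }
    }
}

/* Number of cached entries currently filed under one generation. */
static size_t toy_count(int gen)
{
    size_t k = 0;
    for (unsigned h = 0; h < TOY_HASH_SIZE; h++)
        for (toy_node *n = toy_table[toy_slot(h, gen)]; n != NULL; n = n->next)
            k++;
    return k;
}
```

The point of the sketch is that toy_sweep(0) never touches the chains of the older generations, which is where the saving for partial GCs would come from.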

HASHPRI, i.e. the number of occupied primary slots, is computed and stored as 
NUM_GC_GENERATIONS times the number of slots which are occupied in at least one 
of the subtables. This is done because in mkCharLenCE() we need to iterate 
through one or more chains if and only if there is a chain for the particular 
hashcode in at least one subtable.

I tried to keep the patch as minimal as possible. In particular, I did not add 
long vector support to R_StringHash. I rather reduced the max value of 
char_hash_size from 2^30 to 2^29, assuming that NUM_OLD_GENERATIONS is (not 
larger than) 2. I also did not yet adjust do_show_cache() and do_write_cache(), 
but I could do so if the patch is accepted.

Thanks for your consideration and feedback.

Regards,
Andreas


P.S. I had a hard time getting the indentation right in the patch due to the mix of 
tabs and spaces. Sorry if I screwed this up.



Re: [Rd] [External] Re: Wrong number of names?

2021-11-02 Thread luke-tierney

On Mon, 1 Nov 2021, Martin Maechler wrote:

Duncan Murdoch
on Mon, 1 Nov 2021 06:36:17 -0400 writes:

   > The StackOverflow post
   > https://stackoverflow.com/a/69767361/2554330 discusses a
   > dataframe which has a named numeric column of length 1488
   > that has 744 names. I don't think this is ever legal, but
   > am I wrong about that?

   > The `dat.rds` file mentioned in the post is temporarily
   > available online in case anyone else wants to examine it.

   > Assuming that the file contains a badly formed object, I
   > wonder if readRDS() should do some sanity checks as it
   > reads.

   > Duncan Murdoch

Good question.

In the meantime, I've also added a bit on the SO page
above, e.g.

---

d <- readRDS("<.>dat.rds")
str(d)
## 'data.frame':	1488 obs. of  4 variables:
##  $ facet_var: chr  "AUT" "AUT" "AUT" "AUT" ...
##  $ date     : Date, format: "2020-04-26" "2020-04-27" ...
##  $ variable : Factor w/ 2 levels "arima","prophet": 1 1 1 1 1 1 1 1 1 1 ...
##  $ score    : Named num  2.74e-06 2.41e-06 2.48e-06 2.39e-06 2.79e-06 ...
##   ..- attr(*, "names")= chr [1:744] "new_confirmed10" "new_confirmed10" "new_confirmed10" "new_confirmed10" ...

ds <- d$score
c(length(ds), length(names(ds)))
## 1488   744

dput(ds) # ->

##  *** caught segfault ***
## address (nil), cause 'memory not mapped'

If I'm reading this right then dput is where the segfault is
happening, so that could use some more bulletproofing.

Best,

luke

---

Hence  "proving" that the dat.rds  really contains an invalid object,
when simple  dput(.) directly gives a segmentation fault.

I think we are aware that using C code and say .Call(..)  one
can create all kinds of invalid objects "easily".. and I think
it's clear that it's not feasible to check for validity of such
objects "everywhere".

Your proposal to have at least our deserialization code used in
readRDS() do (at least *some*) validity checks seems good, but
maybe we should think of more cases, and / or  do such validity
checks already during serialization { <-> saveRDS() here } ?

.. Such questions then really are for those who understand more than
me about (de)serialization in R, its performance bottlenecks etc.
Given the speed impact we should probably have such checks *optional*
but have them *on* by default e.g., at least for saveRDS() ?

Martin


Re: [Rd] [External] GC: improving the marking performance for STRSXPs

2021-10-18 Thread luke-tierney


Thanks. I have committed a modified version, also incorporating the
handling of R_StringHash from your other post, in r81073. I prefer to
be more conservative in the GC, for example not assuming without
checking that STRSXP elements are CHARSXP. This does add some
overhead, but the change is still beneficial.

I don't think we would want to add the complexity of threading at this
point, though it might be worth considering at a later time. There are
a few other possible modifications that I'll explore that might
provide comparable improvements to the ones seen with your patch
without adding the complexity of threads.

Best,

luke

On Thu, 7 Oct 2021, Andreas Kersting wrote:


Hi all,

in GC (in src/main/memory.c), FORWARD_CHILDREN() (called by PROCESS_NODES()) 
treats STRSXPs just like VECSXPs, i.e. it calls FORWARD_NODE() for all its 
children. I claim that this is unnecessarily inefficient since the children of 
a STRSXP can legitimately only be (atomic) CHARSXPs and could hence be marked 
directly in the call of FORWARD_CHILDREN() on the STRSXP.

Attached patch (atomic_CHARSXP.diff) implements this and gives the following 
performance improvements on my system compared to R devel (revision 81008):

Elapsed time for two full gc in a session after

x <- as.character(runif(5e7))[]

19sec -> 15sec.

This is the best-case scenario for the patch: very many unique/unmarked CHARSXP 
in the STRSXP. For already marked CHARSXP there is no performance gain since 
FORWARD_NODE() is a no-op for them.

The relative performance gain is even bigger if iterating through the STRSXP 
produces many cache misses, as e.g. after

x <- as.character(runif(5e7))[]
x <- sample(x, length(x))

Elapsed time for two full gc here: 83sec -> 52sec. This is because we have fewer 
cache misses per CHARSXP.

This patch additionally also assumes that the ATTRIBs of a CHARSXP are not to 
be traced because they are just used for maintaining the CHARSXP hash chains.

The second attached patch (atomic_CHARSXP_safe_unlikely.diff) checks both 
assumptions and calls gc_error() if they are violated and is still noticeably faster 
than R devel: 19sec -> 17sec and 83sec -> 54sec, respectively.

Attached gc_test.R is the script I used to get the previously mentioned and 
more gc timings.

Do you think that this is a reasonable change? It does make the code more 
complex and I am not sure if there might be situations in which the assumptions 
are violated, even though SET_STRING_ELT() and installAttrib() do enforce 
them.

Best regards,
Andreas



Re: [Bioc-devel] [External] Re: Strange "internal logical NA value has been modified" error

2021-10-13 Thread luke-tierney

The most likely culprit is C code that is modifying a logical vector
without checking whether this is legitimate for R semantics
(i.e. making sure MAYBE_REFERENCED or at least MAYBE_SHARED is FALSE).
If that is the case, then this is legitimate for C code to do in
principle, so UBSAN and valgrind won't help. You need to set a gdb
watchpoint on the location, catch where it is modified, and look up
the call stack from there.

The error signaled in the GC is a sanity check for catching that this
sort of misbehavior has happened in C code. But it is a check after
the fact; it can't tell you more than that the problem happened
sometime before it was detected.
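
To illustrate the kind of check meant here, the following is a toy sketch only: it does not use R's headers, and toy_lgl, toy_new, toy_maybe_shared and toy_set are hypothetical names. toy_maybe_shared() merely plays the role that MAYBE_SHARED plays for real R objects. The idea is that code modifies a vector in place only when nobody else can see it, and otherwise works on a private copy.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an R logical vector with a reference count; not a SEXP. */
typedef struct {
    int refs;
    size_t n;
    int *data;
} toy_lgl;

static toy_lgl *toy_new(size_t n, int fill)
{
    toy_lgl *x = malloc(sizeof *x);
    if (x == NULL) return NULL;
    x->data = malloc((n ? n : 1) * sizeof *x->data);
    if (x->data == NULL) { free(x); return NULL; }
    for (size_t i = 0; i < n; i++) x->data[i] = fill;
    x->refs = 1;
    x->n = n;
    return x;
}

/* Plays the role of MAYBE_SHARED(): more than one reference exists. */
static int toy_maybe_shared(const toy_lgl *x) { return x->refs > 1; }

/* Set x[i] on behalf of one holder of x. Returns the vector that now carries
   the change (a private copy if x was shared), or NULL on failure. */
static toy_lgl *toy_set(toy_lgl *x, size_t i, int value)
{
    if (i >= x->n) return NULL;
    if (toy_maybe_shared(x)) {
        toy_lgl *copy = toy_new(x->n, 0);
        if (copy == NULL) return NULL;
        memcpy(copy->data, x->data, x->n * sizeof *x->data);
        x->refs--; /* this holder now refers to the copy instead */
        x = copy;
    }
    x->data[i] = value;
    return x;
}
```

Code that skips the shared check and writes straight into a shared object is exactly the pattern that can silently change a value other code relies on, such as a cached logical NA.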

Best,

luke

On Wed, 13 Oct 2021, Martin Morgan wrote:

The problem with using gdb is you'd find yourself in the garbage collector, but 
perhaps quite removed from where the corruption occurred, e.g., gc() might / 
will likely be triggered after you've returned to the top-level evaluation 
loop, and the part of your code that did the corruption might be off the stack.

The problem with devtools::check() (and R CMD check) is that running the unit 
tests occurs in a separate process, so things like setting a global option (and 
even system variable from within R) may not be visible in the process doing the 
check. Conversely, for the same reasons, it seems like the problem can be 
tickled by running the tests alone. So

 R -f /tests/testthat.R

would seem to be a good enough starting point.

Actually, I liked Henrik's UBSAN suggestion, which requires the least amount of 
work. I think I'd then try

 R -d valgrind -f /tests/testthat.R

and then further into the weeds... actually from the section of R-exts you 
mention

 R_C_BOUNDS_CHECK=yes R -f /tests/testthat.R

might also be promising.

Martin

On 10/12/21, 10:30 PM, "Bioc-devel on behalf of Pariksheet Nanda" 
 wrote:

   Hi all,

   On 10/12/21 6:43 PM, Pariksheet Nanda wrote:
   >
   > Error in `...`: internal logical NA value has been modified

   In the R source code, this error is in src/main/memory.c so I was
   thinking one way of investigating might be to run `R --debugger gdb`,
   then running R to load the symbols and either:

   1) set a breakpoint for when it reaches that particular line in
   memory.c:R_gc_internal and then walk up the stack,

   2) or set a watch point on memory.c:R_gc_internal:R_LogicalNAValue
   (somehow; having trouble getting gdb to reach that context).

   3) Then I thought, maybe this is getting far into the weeds and instead
   I could check the most common C related error by enabling bounds
   checking of my C arrays per section 4.4 of the R-exts manual:

   $ R -q
   > options(CBoundsCheck = TRUE)
   > Sys.setenv(R_C_BOUNDS_CHECK = "yes") # Try both ways *shrug*
   > devtools::test()
   ... # All tests still pass.
   > devtools::check()
   ... # No change :(

   Maybe I'm not using that option correctly?  Or the option is
   ignored in devtools::check().  Or indeed, the error is not from
   overrunning C array boundaries.

   It turns out that using the precompiled debug symbols[1] isn't all that
   useful here because I don't get line numbers in gdb without the source
   files and many symbols are optimized out, so it looks like I would need
   to compile R from source with -ggdb first instead of using the Debian
   packages.

   Hopefully this is still the right approach?

   Pariksheet

   [1] After installing r-base-core-dbg on Debian for the debug symbols.

   ___
   Bioc-devel@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Rd] [External] Re: Workaround very slow NAN/Infinities arithmetic?

2021-09-30 Thread luke-tierney

e not rare in real-life, I think that it would be worth an extra check 
in functions based on long doubles, such as sum(). The check for special 
representations does not necessarily have to be done at each iteration for 
cumulative functions.
If you are interested, I can write a bunch of patches to fix the main functions using 
long doubles: cumsum, cumprod, sum, prod, rowSums, colSums, matrix multiplication 
(matprod="internal").

What do you think of that?

--
Sincerely
André GILLIBERT


Re: [Rd] [External] Re: Is it a good choice to increase the NCONNECTION value?

2021-08-24 Thread luke-tierney


We do need to be careful about using too many file descriptors.  The
standard soft limit on Linux is fairly low (1024; the hard limit is
usually quite a bit higher). Hitting that limit, e.g. with runaway
code allocating lots of connections, can cause other things, like
loading packages, to fail with hard to diagnose error messages. A
static connection limit is a crude way to guard against that. Doing
anything substantially better is probably a lot of work. A simple
option that may be worth pursuing is to allow the limit to be adjusted
at runtime. Users who want to go higher would do so at their own risk
and may need to know how to adjust the soft limit on the process.
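
As a rough illustration of what a runtime-adjustable limit might consult on a Unix-alike, here is a sketch that reads the soft descriptor limit with the POSIX getrlimit() call. The function names and the clamping policy (including the `reserve` parameter) are assumptions made for this example and are not part of R. As noted elsewhere in the thread, not every connection consumes a file descriptor, so such a clamp would only be a conservative guard.

```c
#include <limits.h>
#include <sys/resource.h>

/* Soft limit on open file descriptors for this process, -1 if unknown. */
static long soft_fd_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)LONG_MAX)
        return LONG_MAX;
    return (long)rl.rlim_cur;
}

/* Hypothetical policy: honor a requested connection limit only as far as it
   leaves `reserve` descriptors for everything else (package loading etc.). */
static long clamp_connection_limit(long requested, long soft_limit, long reserve)
{
    if (requested < 0 || soft_limit <= reserve) return 0;
    long available = soft_limit - reserve;
    return requested < available ? requested : available;
}
```

With a soft limit of 1024 and 64 descriptors held back, a request for 1024 connections would be cut to 960 under this policy, while a request for 100 would pass through unchanged.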

Best,

luke

On Wed, 25 Aug 2021, Simon Urbanek wrote:



Martin,

I don't think a static connection limit is sensible. Recall that connections can 
be anything, not necessarily just sockets or file descriptors, so they are not 
linked to the system fd limit. For example, if you use a codec then you will 
need twice as many connections as fds. To be honest the connection 
limit is one of the main reasons why in our big data applications we have 
always avoided R connections and used C-level sockets instead (others were lack 
of control over the socket flags, but that has been addressed in the last 
release). So I'd vote for at the very least increasing the limit significantly 
(at least 1k if not more) and, ideally, make it dynamic if memory footprint is 
an issue.

Cheers,
Simon



On Aug 25, 2021, at 8:53 AM, Martin Maechler  wrote:


GILLIBERT, Andre
   on Tue, 24 Aug 2021 09:49:52 + writes:



Rconnection is a pointer to an Rconn structure. The Rconn
structure must be allocated independently (e.g. by
malloc() in R_new_custom_connection).  Therefore,
increasing NCONNECTION to 1024 should only use 8
kilobytes on 64-bit platforms and 4 kilobytes on 32-bit
platforms.


You are right indeed, and I was wrong.


Ideally, it should be dynamically allocated : either as
a linked list or as a dynamic array
(malloc/realloc). However, a simple change of
NCONNECTION to 1024 should be enough for most uses.
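
A small sketch of the dynamic-array variant, under the assumption that the table holds nothing but pointers. The types and functions (toy_conn, toy_conn_table, toy_table_bytes, toy_next_free) are hypothetical and do not reflect how R's connection table is actually implemented; the initial capacity of 16 and the doubling rule are arbitrary choices for the example.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct toy_conn toy_conn; /* opaque stand-in for Rconn */

typedef struct {
    toy_conn **slots; /* NULL entries are free */
    size_t capacity;
} toy_conn_table;

/* Memory taken by the pointer table itself: 1024 slots cost
   1024 * sizeof(pointer) bytes, i.e. 8 KB with 8-byte pointers. */
static size_t toy_table_bytes(size_t capacity)
{
    return capacity * sizeof(toy_conn *);
}

/* Index of a free slot, doubling the table via realloc() when it is full.
   Returns (size_t)-1 if memory cannot be obtained. */
static size_t toy_next_free(toy_conn_table *t)
{
    for (size_t i = 0; i < t->capacity; i++)
        if (t->slots[i] == NULL) return i;
    size_t old = t->capacity;
    size_t newcap = old ? 2 * old : 16;
    toy_conn **p = realloc(t->slots, newcap * sizeof *p);
    if (p == NULL) return (size_t)-1;
    for (size_t i = old; i < newcap; i++) p[i] = NULL;
    t->slots = p;
    t->capacity = newcap;
    return old;
}
```

Growing on demand keeps the footprint small for typical sessions, but it does nothing about operating-system limits on open files, which is a separate concern.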


There is one other important problem I've been made aware of
(similarly to the number of open DLL libraries, an issue 1-2
years ago) :

The OS itself has limits on the number of open files
(yes, I know that there are other connections than files) and
these limits may quite differ from platform to platform.

On my Linux laptop, in a shell, I see

 $ ulimit -n
 1024

which is barely conformant with your proposed 1024 NCONNECTION.

Now if NCONNECTION is larger than the max allowed number of
open files and if R opens more files than the OS allows, the
user may get quite unpleasant behavior, e.g. R being terminated brutally
(or behaving crazily) without good R-level warning / error messages.

It's also not at all sufficient to check the open files limit at
compile time; it would have to be checked at R process startup time.

So this may need considerably more work than you / we have
hoped, and it's probably hard to find a safe number that is
considerably larger than 128  and less than the smallest of all
non-crazy platforms' {number of open files limit}.


Sincerely
André GILLIBERT



Re: [Rd] [External] Re: Update on rtools4 and ucrt support

2021-08-23 Thread luke-tierney


On Mon, 23 Aug 2021, Duncan Murdoch wrote:


On 23/08/2021 8:15 a.m., jan Vitek via R-devel wrote:

Hi Jeroen,

I mostly lurk on this list, but I was struck by your combative tone.

To pick on two random bits:


… a 6gb tarball with manually built things on his personal machine…



… a black-box system that is so opaque and complex that only one person
knows how it works, and would make it much more difficult for
students, universities, and other organisations to build R packages
and libraries on Windows…



Tomas’ tool chain isn't a blackbox, it has copious documentation (see [1])
and builds on any machine thanks to the provided docker container.

This is not to criticise your work which has its unique strengths, but to
state the obvious: these strengths are best discussed without passion
based on factually accurate descriptions.


I agree with Jan.  I'm not sure a discussion in this forum would be fruitful, 
but I really wish Jeroen and Tomas would get together, aiming to merge their 
toolchains, keeping the best aspects of both.


I haven't been involved in the development of either one, but have been a 
"victim" of the two chain rivalry, because the rgl package is not easy to 
build.  I get instructions from each of them on how to do the build, and 
those instructions for one toolchain generally break the build on the other 
one.  While it is probably possible to detect the toolchain and have the 
build adapt to whichever one is in use, it would be a lot easier for me (and 
I imagine every other maintainer of a package using external libs) if I just 
had to follow one set of instructions.


Duncan Murdoch


Here are just a few comments from my perspective (I am an R-core
member, but am not part of the CRAN team and do only very limited work
on Windows). Other R-core members may have different perspectives and
insights.

One bit of background: dealing with encoding issues on Windows has
been taking an unsustainable amount of R-core resources for some time
now. Tomas Kalibera has been taking the lead on trying to address
these issues in the existing framework, but this means he has not had
the time to make any of the many other valuable and important
contributions he could make. The only viable way forward is to move to
a Windows tool chain that supports UTF-8 as the C library current
encoding via the Windows UCRT framework.

Tomas Kalibera has, on behalf of all of R core and in
coordination with CRAN, been looking for a way forward for some
time and has reported on the progress in several blog posts at
https://developer.r-project.org/Blog/public/. This has led to
the development of the MXE-based UCRT tool chain, which is now
well tested and ready for deployment.  Checks using the UCRT tool
chain have been part of the CRAN check process for a while. I
believe CRAN plans to switch R-devel checks and builds to the
UCRT tool chain during the upcoming CRAN downtime. I expect there
will be some communication from CRAN on this soon, including on
any issues in supporting binaries for both R-devel and R-patched.

In putting together something as large as a tool chain there will
always be many choices, each with advantages and disadvantages.  Some
things may be advantages in some settings and not others. Taking just
one case in point: Cross compilation. This is likely to be a better
approach for CRAN in the future and is supported by the MXE framework
on which the new tool chain is based.

The much more recent changes in rtools4 to support UCRT are at this
point not yet as well tested as the new tool chain. Once these changes
to rtools4 mature, and if binary compatibility can be assured, then
having a second tool chain may be useful in some cases.  But if there
are incompatibilities then it will be up to rtools4 to keep up with
the tool chain used by CRAN. On the other hand, contributing to improving
the MXE-based tool chain may be a better investment of time.

Best,

luke




Re: [Rd] [External] Re: JIT compiler does not compile closures with custom environments

2021-08-18 Thread luke-tierney


On Wed, 18 Aug 2021, Duncan Murdoch wrote:


On 18/08/2021 9:00 a.m., Taras Zakharko wrote:
I have encountered a behavior of R’s JIT compiler that I can’t quite figure 
out. Consider the following code:



f_global <- function(x) {
  for(i in 1:1) x <- x + 1
  x
}

f_env <- local({
  function(x) {
    for(i in 1:1) x <- x + 1
    x
  }
})

compiler::enableJIT(3)

bench::mark(f_global(0), f_env(0))
#   expression       min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
# 1 f_global(0)    103µs 107.61µs     8770.    11.4KB      0    4384     0
# 2 f_env(0)       1.1ms   1.42ms      712.        0B     66.3   290    27

Inspecting the closures shows that f_global has been byte-compiled while 
f_env has not been byte-compiled. Furthermore, if I assign a new 
environment to f_global (e.g. via environment(f_global) <- new.env()), it 
won’t be byte-compiled either.


However, if I have a function returning a closure, that closure does get 
byte-compiled:


f_closure <- (function() {
  function(x) {
    for(i in 1:1) x <- x + 1
    x
  }
})()

bench::mark(f_closure(0))
#   expression        min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
# 1 f_closure(0)    105µs    109µs     8625.        0B     2.01  4284     1      497ms


What is going on here? Both f_closure and f_env have non-global 
environments. Why is one JIT-compiled, but not the other? Is there a way to 
ensure that functions defined in environments will be JIT-compiled?


About what is going on in f_closure:  I think the anonymous factory

function() {
 function(x) {
   for(i in 1:1) x <- x + 1
  x
}
   }

got byte compiled before first use, and that compiled its result.  That seems 
to be what this code indicates:


 f_closure <- (function() {
 res <- function(x) {
 for(i in 1:1) x <- x + 1
 x
 }; print(res); res
 })()
 #> function(x) {
 #> for(i in 1:1) x <- x + 1
 #> x
 #> }
 #> 
 #> 


That is right.

But even if that's true, it doesn't address the bigger question of why 
f_global and f_env are treated differently.


There are various heuristics in the JIT code to avoid spending too
much time in the JIT. The current details are in the source
code. Mostly this is to deal with usually ill-advised coding practices
that programmatically build many small functions.  Hopefully these
heuristics can be reduced or eliminated over time.

For now, putting the code in a package, where the default is to byte
compile on source install, or explicitly calling compiler::cmpfun are
options.

Best,

luke



Duncan Murdoch


Re: [Rd] [External] svd For Large Matrix

2021-08-13 Thread luke-tierney


[copying the list]

svd() does support matrices with long vector data. Your example works
fine for me on a machine with enough memory with either the reference
BLAS/LAPACK or the BLAS/LAPACK used on Fedora 33 (flexiblas backed, I
believe, by a version of openBLAS). Take a look at sessionInfo() to
see what you are using and consider switching to another BLAS/LAPACK
if necessary. Running under gdb may help tracking down where the issue
is and reporting it for the BLAS/LAPACK you are using.

Best,

luke

On Fri, 13 Aug 2021, Dario Strbenac via R-devel wrote:


Good day,

I have a real scenario involving 45 million biological cells (samples) and 60 
proteins (variables) which leads to a segmentation fault for svd. I thought 
this might be a good example of why it might benefit from a long vector upgrade.

test <- matrix(rnorm(45000000*60), ncol = 60)
testSVD <- svd(test)

*** caught segfault ***
address 0x7fe93514d618, cause 'memory not mapped'

Traceback:
1: La.svd(x, nu, nv)
2: svd(test)

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia


Re: [Rd] [External] difference of m1 <- lm(f, data) and update(m1, formula=f)

2021-08-11 Thread luke-tierney

in a
formula object actually works: The only difference between the AST for
a call of `~` and the formula such a call produces when evaluated is
the class and environment attributes the call adds, and most code that
works with expressions, like eval(), ignores attributes.

It would seem somewhat more consistent if update.default put the
expression that would produce the formula into the call (i.e. stripped
out the two attributes).

But I do not know if there is logic in base R code, never mind package
code, that takes advantage of the attributes on the formula expression
if they are found. formula() looks in the 'terms' component so would
not be affected, but I don't know if something else might be.

Best,

luke




Martin


Re: [Rd] [External] problem with pipes, textConnection and read.dcf

2021-08-10 Thread luke-tierney


Not an issue with pipes. The pipe just rewrites the expression to a
nested call and that is then evaluated. The call this produces is


> quote(L |>
+     gsub(pattern = " ", replacement = "") |>
+     gsub(pattern = " ", replacement = "") |>
+     textConnection() |>
+     read.dcf())
read.dcf(textConnection(gsub(gsub(L, pattern = " ", replacement = ""),
    pattern = " ", replacement = "")))

If you run that expression, or just the argument to read.dcf, then you
get the error you report. So the issue is somewhere in textConnection().
This produces a similar message:

read.dcf(textConnection(c(L, "aa", "", 
"", "ddd")))

File a bug report and someone who understands the textConnection()
internals better than I do can take a look.

Best,

luke

On Tue, 10 Aug 2021, Gabor Grothendieck wrote:


This gives an error, but if the first gsub line is commented out then there is no
error even though it is equivalent code.

 L <- c("Variable:id", "Length:112630 ")

 L |>
   gsub(pattern = " ", replacement = "") |>
   gsub(pattern = " ", replacement = "") |>
   textConnection() |>
   read.dcf()
 ## Error in textConnection(gsub(gsub(L, pattern = " ", replacement = ""),  :
 ##  argument 'object' must deparse to a single character string

That is, this works:

 L |>
   # gsub(pattern = " ", replacement = "") |>
   gsub(pattern = " ", replacement = "") |>
   textConnection() |>
   read.dcf()
 ##  Variable Length
 ## [1,] "id" "112630"

 R.version.string
 ## [1] "R version 4.1.0 RC (2021-05-16 r80303)"
 win.version()
 ## [1] "Windows 10 x64 (build 19042)"





Re: [Rd] [External] Re: [R-pkg-devel] Tracking down inconsistent errors and notes across operating systems

2021-07-22 Thread luke-tierney


Thanks; fix committed in r80654.

Best,

luke

On Thu, 22 Jul 2021, Bill Dunlap wrote:


A small example of the problem is 
#define USE_RINTERNALS 1
#include 
#include 
#include 
static s_object* obj = NULL;

Prior to 2021-07-20, with svn 80639, this compiled, but after that, with
svn 80647, I get

$ gcc -I"/mnt/c/R/R-svn/trunk/src/include" -I.   -I/usr/local/include  
-fpic  -g -O2 -flto -c s_object.c 2>&1
In file included from s_object.c:5:
/mnt/c/R/R-svn/trunk/src/include/Rdefines.h:168:33: error: unknown type name
‘SEXPREC’
  168 | #define s_object                SEXPREC
      |                                 ^~~
s_object.c:7:8: note: in expansion of macro ‘s_object’
    7 | static s_object* obj = NULL;
      |        ^~~~



On Thu, Jul 22, 2021 at 10:18 AM Bill Dunlap 
wrote:
  I think the problem with RPostgreSQL/src/RS-DBI.c comes from
  some changes to Defn.h and Rinternals.h in RHOME/include that
  Luke made recently (2021-07-20, svn 80647).  Since then the
  line   #define s_object SEXPREC
in Rdefines.h causes problems.  Should it now be 'struct SEXPREC'?

-Bill
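
The compiler error can be reproduced without R's headers. The sketch below assumes that, after the header change, only the struct tag and the SEXP pointer typedef are visible to package code and there is no longer a typedef named SEXPREC. That assumption is inferred from the error message above, and the thread does not show what the actual fix in r80654 looks like. The helper obj_is_unset() is hypothetical and only exists so that the variable is used.

```c
#include <stddef.h>

/* Assumed shape of what package code sees: a struct tag and a pointer
   typedef, but no typedef called SEXPREC. */
struct SEXPREC;
typedef struct SEXPREC *SEXP;

/* With "#define s_object SEXPREC" the declaration below would fail with
   "unknown type name 'SEXPREC'", because in C a struct tag alone is not a
   type name. Spelling out the struct keyword makes it valid again. */
#define s_object struct SEXPREC

static s_object *obj = NULL;

static int obj_is_unset(void) { return obj == NULL; }
```

A pointer to an incomplete struct type is fine in C, so code that only passes such pointers around compiles without needing the full structure definition.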


On Thu, Jul 22, 2021 at 7:04 AM Iñaki Ucar 
wrote:
  Hi,

  On Thu, 22 Jul 2021 at 15:51, Hannah Owens wrote:
  >
  > Hi all,
  > I am working on an update to a package I have on CRAN called occCite. My
  > latest release attempt didn’t pass incoming automated checks, because there
  > is an outstanding error. Additionally, there are some weird notes I would
  > like to get rid of, if anyone has suggestions.
  >
  > The killing error is in r-devel-linux-x86_64-debian-gcc, which is: Packages
  > required but not available: 'BIEN', 'taxize', 'RPostgreSQL'
  >
  > I don’t understand this, as it is the only system that throws this error,
  > and the packages mentioned are available via CRAN. Any suggestions?

  This kind of message usually arises when there is some problem with
  those packages on CRAN. Indeed,

  https://cran.r-project.org/web/checks/check_results_BIEN.html
  https://cran.r-project.org/web/checks/check_results_taxize.html
  https://cran.r-project.org/web/checks/check_results_RPostgreSQL.html

  the three of them have ERRORs on that platform. No issue on your end.
  You can reply pointing to that.

  > Additionally, there are multiple platforms
  > (r-devel-linux-x86_64-fedora-clang; r-devel-linux-x86_64-fedora-gcc;
  > r-devel-windows-x86_64-gcc10-UCRT; r-patched-solaris-x86;
  > r-release-macos-arm64; r-release-macos-x86_64; r-oldrel-macos-x86_64) where
  > two notes pop up:
  >
  > NOTE 1: Namespace in Imports field not imported from: ‘bit64’ All declared
  > Imports should be used.
  >
  > The package does use bit64. Any tips on how to address this note?

  Are you sure? Your NAMESPACE file does not import(bit64) nor
  importFrom(bit64,) anything.

  > NOTE 2: Found 6 marked UTF-8 strings.
  >
  > I presume this is thrown because of the small sample dataset I’ve included
  > in the package, but why is it not thrown for all the platforms?

  Not all the checks are necessarily done in all the platforms. You can
  silence this NOTE by converting the offending strings in your datasets
  to ASCII and resaving them.

  --
  Iñaki Úcar

  __
  r-package-de...@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-package-devel





--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] changes in some header files

2021-07-20 Thread luke-tierney


We are working on rearranging some of our header files with the goal
of making the installed headers correspond more closely to the C API
available to packages. Packages that only use entry points and
definitions that are part of the API as specified in Chapter 6 of
Writing R Extensions should not be affected.

I have committed an initial set of changes to R-devel in
r80644. About 10 CRAN packages that use non-API features will fail
under R-devel after these changes and their maintainers have been
notified.

If you are currently using non-API features in a package it would be a
good idea to review what you are doing and to try to revise your code
to work within the API. If you feel there are features missing in the
API then you can suggest additions on this mailing list or bugzilla.

Best,

luke


Re: [Rd] [External] Clearing attributes returns ALTREP, serialize still saves them

2021-07-03 Thread luke-tierney


Please do not cross post. You have already raised this on bugzilla. I
will follow up there later today.

luke

On Sat, 3 Jul 2021, Zafer Barutcuoglu wrote:


Hi all,

Setting names/dimnames on vectors/matrices of length>=64 returns an ALTREP 
wrapper which internally still contains the names/dimnames, and calling 
base::serialize on the result writes them out. They are unserialized in the same 
way, with the names/dimnames hidden in the ALTREP wrapper, so the problem is not 
obvious except in wasted time, bandwidth, or disk space.

Example:
  v1 <- setNames(rnorm(64), paste("element name", 1:64))
  v2 <- unname(v1)
  names(v2)
  # NULL
  length(serialize(v1, NULL))
  # [1] 2039
  length(serialize(v2, NULL))
  # [1] 2132
  length(serialize(v2[TRUE], NULL))
  # [1] 543

  con <- rawConnection(raw(), "w")
  serialize(v2, con)
  v3 <- unserialize(rawConnectionValue(con))
  names(v3)
  # NULL
  length(serialize(v3, NULL))
  # 2132

  # Similarly for matrices:
  m1 <- matrix(rnorm(64), 8, 8,
               dimnames=list(paste("row name", 1:8), paste("col name", 1:8)))
  m2 <- unname(m1)
  dimnames(m2)
  # NULL
  length(serialize(m1, NULL))
  # [1] 918
  length(serialize(m2, NULL))
  # [1] 1035
  length(serialize(m2[TRUE, TRUE], NULL))
  # 582

Previously discussed here, too:
https://r.789695.n4.nabble.com/Invisible-names-problem-td4764688.html

This happens with other attributes as well, but less predictably:
  x1 <- structure(rnorm(100), data=rnorm(1e6))
  x2 <- structure(x1, data=NULL)
  length(serialize(x1, NULL))
  # [1] 8000952
  length(serialize(x2, NULL))
  # [1] 924

  x1b <- rnorm(100)
  attr(x1b, "data") <- rnorm(1e6)
  x2b <- x1b
  attr(x2b, "data") <- NULL
  length(serialize(x1b, NULL))
  # [1] 8000863
  length(serialize(x2b, NULL))
  # [1] 8000956

This is pretty severe, trying to track down why serializing a small object 
kills the network, because of which large attributes it may have once had 
during its lifetime around the codebase that are still secretly tagging along.

Is there a plan to resolve this? Any suggestions for maybe a C++ workaround 
until then? Or an alternative performant serialization solution?

Best,
--
Zafer



Re: [Rd] [External] SET_COMPLEX_ELT and SET_RAW_ELT missing from Rinternals.h

2021-07-02 Thread luke-tierney

On Thu, 1 Jul 2021, Konrad Siek wrote:

Thanks!

So what would be the prescribed way of assigning elements to a CPLXSXP if I
needed to?

The first question is whether you need to do this. Or, more to the
point, whether it is safe to do this. In R objects should behave as if
they are not mutable. Mutation in C code may be OK if the objects are
not reachable from any R variables, but that almost always means they
are private to your code so you can use what you know about internal
structure.

If it is legitimate to mutate you can use SET_COMPLEX_ELT. I've added
the declaration to Rinternals.h in R-devel and R-patched.

For now SET_COMPLEX_ELT(sexp, index, value) is equivalent to
COMPLEX(sexp)[index] = value, but that could change in the future if
Set methods are supported.

This does materialize a potentially compact object, but again the most
important question is whether mutation is legitimate at all.

One way I see is to do what most of the code inside the interpreter does and
grab the vector's data pointer:

    COMPLEX(sexp)[index] = value;
    COMPLEX0(sexp)[index] = value;

COMPLEX0 is not in the API; it will probably be removed from the
installed header files as we clean these up.

This will materialize an ALTREP CPLXSXP though, so maybe the best way would
be to mirror what SET_COMPLEX_ELT does in Rinlinedfuns.h?

    if (ALTREP(sexp)) ALTCOMPLEX_SET_ELT(sexp, index, value);
    else COMPLEX0(sexp)[index] = value;

ALTCOMPLEX_SET_ELT is an internal implementation feature and not in the API.
Again, it will probably be removed from the installed headers.

Best,

luke

This seems better, but it's not used in the interpreter anywhere as far as I
can tell, presumably because of the setter interface not being complete, as
you point out. But should I be avoiding this second approach for some
reason?

k

On Tue, Jun 29, 2021 at 4:06 AM  wrote:
  The setter interface for atomic types is not yet implemented. It may
  be some day.

  Best,

  luke

  On Fri, 25 Jun 2021, Konrad Siek wrote:

  > Hello,
  >
  > I am working on a package that works with various types of R vectors,
  > implemented in C. My code has a lot of SET_*_ELT operations in it for
  > various types of vectors, including for CPLXSXPs and RAWSXPs.
  >
  > I noticed SET_COMPLEX_ELT and SET_RAW_ELT are defined in Rinlinedfuns.h but
  > not declared in Rinternals.h, so they cannot be used in packages. I was
  > going to re-implement them or extern them in my package, however,
  > interestingly, ALTCOMPLEX_SET_ELT and ALTRAW_SET_ELT are both declared in
  > Rinternals.h, making me think SET_COMPLEX_ELT and SET_RAW_ELT could be
  > purposefully obscured. Otherwise it may just be an oversight and I should
  > bring it to someone's attention anyway.
  >
  > I have three questions that I hope R-devel could help me with.
  >
  > 1. Is this an oversight, or are SET_COMPLEX_ELT and SET_RAW_ELT not exposed
  > on purpose?
  > 2. If they are not exposed on purpose, I was wondering why.
  > 3. More importantly, what would be good ways to set elements of these
  > vectors while playing nice with ALTREP and avoiding whatever pitfalls
  > caused these functions to be obscured in the first place?
  >
  > Best regards,
  > Konrad,

Re: [Rd] [External] Re: ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior

2021-06-29 Thread luke-tierney


Call the R sum() function, either before going to C code or by calling
back into R. You may only want to do this if the vector is long enough
for the possible savings to be worthwhile.
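
As background on where such a saving can come from, here is a minimal sketch in plain C, with no R headers. It assumes a compact integer sequence can be described by its first value, its length and an increment of +1 or -1; the function names are hypothetical and this is not R's internal code. A sum method for such a representation can use the arithmetic-series formula instead of visiting every element, so the benefit grows with the length of the vector.

```c
/* Closed-form sum of first, first+inc, ... over n terms (inc is +1 or -1).
   Returned as double because the result can exceed the int range. */
double compact_seq_sum(long long first, long long n, int inc)
{
    return (n / 2.0) * (2.0 * first + (double) inc * (n - 1));
}

/* Element-by-element reference: O(n) work for the same answer. */
double loop_seq_sum(long long first, long long n, int inc)
{
    double total = 0.0;
    for (long long i = 0; i < n; i++)
        total += (double) (first + i * inc);
    return total;
}
```

For the x <- 1:1e8 case from the question, compact_seq_sum(1, 100000000, 1) gives 5000000050000000 without a loop.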


On Tue, 29 Jun 2021, Sebastian Martin Krantz wrote:


Thanks both. Is there a suggested way I can get this speedup in a package?
Or just leave it for now?

Thanks also for the clarification Bill. The issue I have with that is that
in my C code ALTREP(x) evaluates to true even after adding and removing
dimensions (otherwise it would be handled by the normal sum method and I’d
be fine).


When you use a longer vector


Also .Internal(inspect(x)) still shows the compact
representation.


A different representation (wrapper around a compact sequence).

Best,

luke



-Sebastian

On Tue 29. Jun 2021 at 19:43, Bill Dunlap  wrote:


Adding the dimensions attribute takes away the altrep-ness.  Removing
dimensions
does not make it altrep.  E.g.,


a <- 1:10
am <- a ; dim(am) <- c(2L,5L)
amn <- am ; dim(amn) <- NULL
.Call("is_altrep", a)

[1] TRUE

.Call("is_altrep", am)

[1] FALSE

.Call("is_altrep", amn)

[1] FALSE

where is_altrep() is defined by the following C code:

#include <R.h>
#include <Rinternals.h>

SEXP is_altrep(SEXP x)
{
    return Rf_ScalarLogical(ALTREP(x));
}


-Bill

On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
sebastian.kra...@graduateinstitute.ch> wrote:


Hello together, I'm working on some custom (grouped, weighted) sum, min and
max functions and I want them to support the special case of plain integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):

SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
  int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
    narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
  if(ALTREP(x) && ng == 0 && nwl) {
    switch(tx) {
    case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
    case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
    case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
    default: error("ALTREP object must be integer or real typed");
    }
  }
// ...
}

when I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
into a vector again, dim(x) <- NULL, fsum(x) gives  NULL and a warning
message 'converting NULL pointer to R NULL'. For functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how do
I make this a robust implementation?

Best regards,

Sebastian Krantz


Re: [Rd] [External] Re: ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior

2021-06-29 Thread luke-tierney


It depends on the size. For a larger vector adding dim will create a
wrapper ALTREP.

Currently the wrapper does not try to use the payload's sum method;
this could be added.

Best,

luke

On Tue, 29 Jun 2021, Bill Dunlap wrote:


Adding the dimensions attribute takes away the altrep-ness.  Removing
dimensions
does not make it altrep.  E.g.,


a <- 1:10
am <- a ; dim(am) <- c(2L,5L)
amn <- am ; dim(amn) <- NULL
.Call("is_altrep", a)

[1] TRUE

.Call("is_altrep", am)

[1] FALSE

.Call("is_altrep", amn)

[1] FALSE

where is_altrep() is defined by the following C code:

#include <R.h>
#include <Rinternals.h>

SEXP is_altrep(SEXP x)
{
    return Rf_ScalarLogical(ALTREP(x));
}

-Bill

On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
sebastian.kra...@graduateinstitute.ch> wrote:


Hello together, I'm working on some custom (grouped, weighted) sum, min and
max functions and I want them to support the special case of plain integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):

SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
  int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
    narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
  if(ALTREP(x) && ng == 0 && nwl) {
    switch(tx) {
    case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
    case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
    case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
    default: error("ALTREP object must be integer or real typed");
    }
  }
// ...
}

when I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
into a vector again, dim(x) <- NULL, fsum(x) gives  NULL and a warning
message 'converting NULL pointer to R NULL'. For functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how do
I make this a robust implementation?

Best regards,

Sebastian Krantz


Re: [Rd] [External] ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior

2021-06-29 Thread luke-tierney


ALTINTEGER_SUM and friends are _not_ intended for use in package code.
Once we get some time to clean up headers they will no longer be
visible to packages.

Best,

luke

On Tue, 29 Jun 2021, Sebastian Martin Krantz wrote:


Hello together, I'm working on some custom (grouped, weighted) sum, min and
max functions and I want them to support the special case of plain integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):

SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
 int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
   narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
 if(ALTREP(x) && ng == 0 && nwl) {
   switch(tx) {
   case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
   case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
   case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
   default: error("ALTREP object must be integer or real typed");
   }
 }
// ...
}

when I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
into a vector again, dim(x) <- NULL, fsum(x) gives  NULL and a warning
message 'converting NULL pointer to R NULL'. For functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how do
I make this a robust implementation?

Best regards,

Sebastian Krantz


Re: [Rd] [External] SET_COMPLEX_ELT and SET_RAW_ELT missing from Rinternals.h

2021-06-28 Thread luke-tierney


The setter interface for atomic types is not yet implemented. It may
be some day.

Best,

luke

On Fri, 25 Jun 2021, Konrad Siek wrote:


Hello,

I am working on a package that works with various types of R vectors,
implemented in C. My code has a lot of SET_*_ELT operations in it for
various types of vectors, including for CPLXSXPs and RAWSXPs.

I noticed SET_COMPLEX_ELT and SET_RAW_ELT are defined in Rinlinedfuns.h but
not declared in Rinternals.h, so they cannot be used in packages. I was
going to re-implement them or extern them in my package, however,
interestingly, ALTCOMPLEX_SET_ELT and ALTRAW_SET_ELT  are both declared in
Rinternals.h, making me think SET_COMPLEX_ELT and SET_RAW_ELT could be
purposefully obscured. Otherwise it may just be an oversight and I should
bring it to someone's attention anyway.

I have three questions that I hope R-devel could help me with.

1. Is this an oversight, or are SET_COMPLEX_ELT and SET_RAW_ELT not exposed
on purpose? 2. If they are not exposed on purpose, I was wondering why.
3. More importantly, what would be good ways to set elements of these
vectors while playing nice with ALTREP and avoiding whatever pitfalls
caused these functions to be obscured in the first place?

Best regards,
Konrad,


Re: [Rd] [External] Possible ALTREP bug

2021-06-17 Thread luke-tierney

  >>> Just to hopefully add a bit to what Luke already answered, from what I am
  >>> recalling looking back at that bioconductor thread Elt methods are used in
  >>> places where there are hard implicit assumptions that no garbage collection
  >>> will occur (ie they are called on things that aren't PROTECTed), and beyond
  >>> that, in places where there are hard assumptions that no error (longjmp)
  >>> will occur. I could be wrong, but I don't know that suspending garbage
  >>> collection would protect from the second one. Ie it is possible that an
  >>> error *ever* being raised from R code that implements an elt method could
  >>> cause all hell to break loose.
  >>>
  >>> Luke or Tomas Kalibera would know more.
  >>>
  >>> I was disappointed that implementing ALTREPs in R code was not in the cards
  >>> (it was in my original proposal back in 2016 to the DSC) but I trust Luke
  >>> that there are important reasons we can't safely allow that.
  >>>
  >>> Best,
  >>> ~G
  >>>
  >>> On Fri, May 28, 2021 at 8:31 AM Jim Hester wrote:
  >>>      From reading the discussion on the Bioconductor issue tracker it
  >>>      seems like the reason the GC is not suspended for the non-string
  >>>      ALTREP Elt methods is primarily due to performance concerns.
  >>>
  >>>      If this is the case perhaps an additional flag could be added to
  >>>      the `R_set_altrep_*()` functions so ALTREP authors could indicate
  >>>      if GC should be halted when that particular method is called for
  >>>      that particular ALTREP class.
  >>>
  >>>      This would avoid the performance hit (other than a boolean check)
  >>>      for the standard case when no allocations are expected, but allow
  >>>      authors to indicate that R should pause GC if needed for methods
  >>>      in their class.
  >>>
  >>>      On Fri, May 28, 2021 at 9:42 AM wrote:
  >>>
  >>>> integer and real Elt methods are not expected to allocate. You would
  >>>> have to suspend GC to be able to do that. This currently can't be done
  >>>> from package code.
  >>>>
  >>>> Best,
  >>>>
  >>>> luke
  >>>>
  >>>> On Fri, 28 May 2021, Gábor Csárdi wrote:
  >>>>
  >>>>> I have found some weird SEXP corruption behavior with ALTREP, which
  >>>>> could be a bug. (Or I could be doing something wrong.)
  >>>>>
  >>>>> I have an integer ALTREP vector that calls back to R from the Elt
  >>>>> method. When this vector is indexed in a lapply(), its first element
  >>>>> gets corrupted. Sometimes it's just a type change to logical, but
  >>>>> sometimes the corruption causes a crash.
  >>>>>
  >>>>> I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package
  >>>>> that demonstrates this: https://github.com/gaborcsardi/redfish
  >>>>>
  >>>>> The R callback in this package calls `loadNamespace("Matrix")`, but
  >>>>> the same crash happens for other packages as well, and sometimes it
  >>>>> also happens if I don't load any packages at all. (But that example
  >>>>> was much more complicated, so I went with the package loading.)
  >>>>>
  >>>>> It is somewhat random, and sometimes turning off the JIT avoids the
  >>>>> crash, but not always.
  >>>>>
  >>>>> Hopefully I am just doing something wrong in the ALTREP

Re: [Rd] [External] Possible ALTREP bug

2021-05-28 Thread luke-tierney

Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may
be possible to check that places where they are used allow for them to
allocate. I have fixed the one that got caught by Gabor's example, and
a rchk run might be able to pick up others if rchk knows these could
allocate. (I may also be forgetting other places where the _ELT
methods are used.)  Fixing all call sites for REAL, INTEGER, etc, was
never realistic so there GC has to be suspended during the method
call, and that is done in the dispatch mechanism.

The bigger problem is jumps from inside things that existing code
assumes will not do that. Catching those jumps is possible but
expensive; doing anything sensible if one is caught is really not
possible.
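
The jump problem can be shown without R at all. The sketch below uses plain setjmp/longjmp; `top_level`, `read_first` and the other names are hypothetical stand-ins, with `jumping_elt` playing the role of an Elt method that raises an error and `read_first` the role of existing code that assumes the call always returns. When the callback jumps, the caller's own bookkeeping after the call never runs.

```c
#include <setjmp.h>

static jmp_buf top_level;        /* stand-in for a top-level error context */
static int critical_depth = 0;   /* state the caller expects to restore itself */

typedef int (*elt_fn)(int i);

/* Well-behaved element method: always returns. */
int plain_elt(int i)
{
    return 2 * i;
}

/* Misbehaving element method: leaves via a non-local jump. */
int jumping_elt(int i)
{
    (void) i;
    longjmp(top_level, 1);
    return 0;                    /* not reached */
}

/* Caller written on the assumption that elt() returns normally. */
int read_first(elt_fn elt)
{
    int value;
    critical_depth++;
    value = elt(0);
    critical_depth--;            /* skipped entirely if elt() jumps */
    return value;
}

/* Returns critical_depth after the call: 0 if the bookkeeping was undone. */
int leftover_depth(elt_fn elt)
{
    critical_depth = 0;
    if (setjmp(top_level) == 0)
        (void) read_first(elt);
    return critical_depth;
}
```

With `plain_elt` the depth is back at 0 afterwards; with `jumping_elt` it is left at 1, which is the kind of inconsistent state that code not written with jumps in mind can end up in.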

Best,

luke

On Fri, 28 May 2021, Gabriel Becker wrote:

Hi Jim et al,
Just to hopefully add a bit to what Luke already answered, from what I am
recalling looking back at that bioconductor thread Elt methods are used in
places where there are hard implicit assumptions that no garbage collection
will occur (ie they are called on things that aren't PROTECTed), and beyond
that, in places where there are hard assumptions that no error (longjmp)
will occur. I could be wrong, but I don't know that suspending garbage
collection would protect from the second one. Ie it is possible that an
error *ever* being raised from R code that implements an elt method could
cause all hell to break loose.

Luke or Tomas Kalibera would know more.

I was disappointed that implementing ALTREPs in R code was not in the cards
(it was in my original proposal back in 2016 to the DSC) but I trust Luke
that there are important reasons we can't safely allow that.

Best,
~G

On Fri, May 28, 2021 at 8:31 AM Jim Hester wrote:
  From reading the discussion on the Bioconductor issue tracker it seems like
  the reason the GC is not suspended for the non-string ALTREP Elt methods is
  primarily due to performance concerns.

  If this is the case perhaps an additional flag could be added to the
  `R_set_altrep_*()` functions so ALTREP authors could indicate if GC should
  be halted when that particular method is called for that particular ALTREP
  class.

  This would avoid the performance hit (other than a boolean check) for the
  standard case when no allocations are expected, but allow authors to
  indicate that R should pause GC if needed for methods in their class.

  On Fri, May 28, 2021 at 9:42 AM wrote:

  > integer and real Elt methods are not expected to allocate. You would
  > have to suspend GC to be able to do that. This currently can't be done
  > from package code.
  >
  > Best,
  >
  > luke
  >
  > On Fri, 28 May 2021, Gábor Csárdi wrote:
  >
  > > I have found some weird SEXP corruption behavior with ALTREP, which
  > > could be a bug. (Or I could be doing something wrong.)
  > >
  > > I have an integer ALTREP vector that calls back to R from the Elt
  > > method. When this vector is indexed in a lapply(), its first element
  > > gets corrupted. Sometimes it's just a type change to logical, but
  > > sometimes the corruption causes a crash.
  > >
  > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package
  > > that demonstrates this: https://github.com/gaborcsardi/redfish
  > >
  > > The R callback in this package calls `loadNamespace("Matrix")`, but
  > > the same crash happens for other packages as well, and sometimes it
  > > also happens if I don't load any packages at all. (But that example
  > > was much more complicated, so I went with the package loading.)
  > >
  > > It is somewhat random, and sometimes turning off the JIT avoids the
  > > crash, but not always.
  > >
  > > Hopefully I am just doing something wrong in the ALTREP code (see
  > > https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it
  > > is not actually a bug.
  > >
  > > Thanks,
  > > Gabor

Re: [Rd] [External] Possible ALTREP bug

2021-05-28 Thread luke-tierney


integer and real Elt methods are not expected to allocate. You would
have to suspend GC to be able to do that. This currently can't be done
from package code.

Best,

luke

On Fri, 28 May 2021, Gábor Csárdi wrote:


I have found some weird SEXP corruption behavior with ALTREP, which
could be a bug. (Or I could be doing something wrong.)

I have an integer ALTREP vector that calls back to R from the Elt
method. When this vector is indexed in a lapply(), its first element
gets corrupted. Sometimes it's just a type change to logical, but
sometimes the corruption causes a crash.

I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package
that demonstrates this: https://github.com/gaborcsardi/redfish

The R callback in this package calls `loadNamespace("Matrix")`, but
the same crash happens for other packages as well, and sometimes it
also happens if I don't load any packages at all. (But that example
was much more complicated, so I went with the package loading.)

It is somewhat random, and sometimes turning off the JIT avoids the
crash, but not always.

Hopefully I am just doing something wrong in the ALTREP code (see
https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it
is not actually a bug.

Thanks,
Gabor


Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread luke-tierney

 is there any reason only an NA should have such meta-data? Why not have
reasons associated with Inf stating it was an Inf because you asked for one
or the result of a calculation such as dividing by Zero (albeit maybe that
might be a NaN) and so on. Maybe I could annotate integers with whether
they are prime or even  versus odd  or a factor of 144 or anything else I
can imagine. But at some point, the overhead from allowing all this can
become substantial. I was amused at how python allows a function to be
annotated including by itself since it is an object. So it can store such
metadata perhaps in an attached dictionary so a complex costly calculation
can have the results cached and when you ask for the same thing in the same
session, it checks if it has done it and just returns the result in linear
time. But after a while, how many cached results can there be?

-Original Message-
From: R-devel  On Behalf Of
luke-tier...@uiowa.edu
Sent: Monday, May 24, 2021 9:15 AM
To: Adrian Dușa 
Cc: Greg Minshall ; r-devel 
Subject: Re: [Rd] [External] Re: 1954 from NA

On Mon, 24 May 2021, Adrian Dușa wrote:


On Mon, May 24, 2021 at 2:11 PM Greg Minshall wrote:



[...]
if you have 500 columns of possibly-NA'd variables, you could have
one column of 500 "bits", where each bit has one of N values, N being
the number of explanations the corresponding column has for why the
NA exists.



PLEASE DO NOT DO THIS!

It will not work reliably, as has been explained to you ad nauseam in this
thread.

If you distribute code that does this it will only lead to bug reports on
R that will waste R-core time.

As Alex explained, you can use attributes for this. If you need operations
to preserve attributes across subsetting you can define subsetting methods
that do that.

If you are dead set on doing something in C you can try to develop an
ALTREP class that provides augmented missing value information.

Best,

luke





The mere thought of implementing something like that gives me shivers.
Not to mention such a solution should also be robust when subsetting,
splitting, column and row binding, etc. and everything can be lost if
the user deletes that particular column without realising its importance.

Social science datasets are much more alive and complex than one might
first think: there are multi-wave studies with tens of countries, and
aggregating such data is already a complex process to add even more
complexity on top of that.

As undocumented as they may be, or even subject to change, I think the
R internals are much more reliable that this.

Best wishes,
Adrian




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel







--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: 1954 from NA

2021-05-24 Thread luke-tierney


On Mon, 24 May 2021, Adrian Dușa wrote:


On Mon, May 24, 2021 at 2:11 PM Greg Minshall  wrote:


[...]
if you have 500 columns of possibly-NA'd variables, you could have one
column of 500 "bits", where each bit has one of N values, N being the
number of explanations the corresponding column has for why the NA
exists.



PLEASE DO NOT DO THIS!

It will not work reliably, as has been explained to you ad nauseam in
this thread.

If you distribute code that does this it will only lead to bug reports
on R that will waste R-core time.

As Alex explained, you can use attributes for this. If you need
operations to preserve attributes across subsetting you can define
subsetting methods that do that.
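
A minimal sketch of the attribute-based approach, assuming a hypothetical S3 class `reasoned` that carries a parallel character vector of reasons; the class name, the `reason` attribute and the example data are illustrative and do not come from the thread.

```r
# Constructor: a numeric vector plus one reason per element
# (NA_character_ where the value is not missing).
reasoned <- function(x, reason = rep(NA_character_, length(x))) {
  stopifnot(length(x) == length(reason))
  structure(x, reason = reason, class = "reasoned")
}

# Subsetting method: subset the values and the reasons together so the
# attribute survives `[`.
`[.reasoned` <- function(x, i) {
  reasoned(unclass(x)[i], attr(x, "reason")[i])
}

age <- reasoned(c(34, NA, 51, NA),
                c(NA, "refused", NA, "not applicable"))
sub <- age[c(2, 4)]
attr(sub, "reason")
```

Operations other than `[` (for example `c()`, `rbind()` or `split()`) would need their own methods in the same style.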

If you are dead set on doing something in C you can try to develop an
ALTREP class that provides augmented missing value information.

Best,

luke





The mere thought of implementing something like that gives me shivers. Not
to mention such a solution should also be robust when subsetting,
splitting, column and row binding, etc. and everything can be lost if the
user deletes that particular column without realising its importance.

Social science datasets are much more alive and complex than one might
first think: there are multi-wave studies with tens of countries, and
aggregating such data is already a complex process without adding even
more complexity on top of that.

As undocumented as they may be, or even subject to change, I think the R
internals are much more reliable than this.

Best wishes,
Adrian




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Pipe bind restored in R 4.1.0?

2021-04-17 Thread luke-tierney


No. We need more time to resolve issues revealed in testing.

Best,

luke

On Sat, 17 Apr 2021, Brenton Wiernik wrote:


Is the pipe bind `=>` operator likely to be restored by default in time for the 
4.1 release?

Brenton



Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-08 Thread luke-tierney


Looks like this is an unavoidable interaction between the way source
references and lazy loading are implemented. The link back to the
crash_dumps environment comes through source references on an
unevaluated argument promise. Creating a fresh environment in
.onLoad() avoids this and is probably your best bet.

Having an option to serialize without source references might be nice
but would probably not be high enough on anyone's priority list to get
done anytime soon.
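
As a small illustration of how source references travel with serialized objects, the sketch below compares the serialized size of a function with and without its srcref. It uses utils::removeSource(), which is not mentioned in the thread and is not offered as a fix for the lazy-loading interaction; the function `f` is a made-up example.

```r
# Parse with keep.source = TRUE so the closure carries a "srcref" attribute.
f <- eval(parse(text = "function(x) x + 1", keep.source = TRUE)[[1]],
          envir = baseenv())
has_srcref <- !is.null(attr(f, "srcref"))

# Drop the source references and compare what serialize() has to write.
g <- utils::removeSource(f)
size_with    <- length(serialize(f, NULL))
size_without <- length(serialize(g, NULL))

c(with_srcref = size_with, without_srcref = size_without)
```

The srcref points at a srcfile environment holding the original source text, which is why the first size is larger.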

Best,

luke


Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-08 Thread luke-tierney


I see that now also. Not sure yet what is going on.

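One work-around that may work for you is to create a fresh crash dump
environment in a .onLoad function; something like

# placeholder; the real environment is created at load time
crash_dumps <- NULL
# runs each time the namespace is loaded
.onLoad <- function(...) crash_dumps <<- new.env()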

Best,

luke

On Wed, 7 Apr 2021, Andreas Kersting wrote:


Hi Dirk, hi Luke,

Thanks for checking!

I could narrow it down further. I have the issue only if I install 
--with-keep.source, i.e.

R CMD INSTALL --with-keep.source dumpTest

Since this is the default in RStudio when clicking "Install and Restart", I was 
always having the issue - also from base R. If I install using e.g. 
devtools::install_github() directly it is also fine for me.

Could you please confirm? Thanks!

Regards,
Andreas

2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" :


On 7 April 2021 at 16:06, Andreas Kersting wrote:
| Hi Luke,
|
| Please see https://github.com/akersting/dumpTest for the package.
|
| Here a session showing my issue:
|
| > library(dumpTest)
| > sessionInfo()
| R version 4.0.5 (2021-03-31)
| Platform: x86_64-pc-linux-gnu (64-bit)
| Running under: Debian GNU/Linux 10 (buster)
|
| Matrix products: default
| BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
| LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
|
| locale:
|  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
|  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
|  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
|  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
|  [9] LC_ADDRESS=C               LC_TELEPHONE=C
| [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
|
| attached base packages:
| [1] stats     graphics  grDevices utils     datasets  methods   base
|
| other attached packages:
| [1] dumpTest_0.1.0
|
| loaded via a namespace (and not attached):
| [1] compiler_4.0.5
| > for (i in 1:100) {
| +   print(i)
| +   print(system.time(f()))
| + }
| [1] 1
|    user  system elapsed
|   0.028   0.004   0.034
| [1] 2
|    user  system elapsed
|   0.067   0.008   0.075
| [1] 3
|    user  system elapsed
|   0.176   0.000   0.176
| [1] 4
|    user  system elapsed
|   0.335   0.012   0.349
| [1] 5
|    user  system elapsed
|   0.745   0.023   0.770
| [1] 6
|    user  system elapsed
|   1.495   0.060   1.572
| [1] 7
|    user  system elapsed
|   2.902   0.136   3.040
| [1] 8
|    user  system elapsed
|   5.753   0.272   6.034
| [1] 9
|    user  system elapsed
|  11.807   0.708  12.597
| [1] 10
| ^C
| Timing stopped at: 6.638 0.549 7.214
|
| I had to interrupt in iteration 10 because I was running low on RAM.

No issue here.  Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build
off my Debian package, hence instrumentation as in the Debian package.

edd@rob:~$ installGithub.r akersting/dumpTest
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo akersting/dumpTest@HEAD
✔  checking for file 
‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ...
─  preparing ‘dumpTest’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘dumpTest_0.1.0.tar.gz’

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
* installing *source* package ‘dumpTest’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
No man pages found in package  ‘dumpTest’
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (dumpTest)
edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})'
   user  system elapsed
  0.481   0.019   0.500
edd@rob:~$

(I also ran the variant you showed with the dual print statements, it just
consumes more screen real estate and ends on

[...]
[1] 97
   user  system elapsed
  0.004   0.000   0.005
[1] 98
   user  system elapsed
  0.004   0.000   0.005
[1] 99
   user  system elapsed
  0.004   0.000   0.004
[1] 100
   user  system elapsed
  0.005   0.000   0.005
edd@rob:~$ )

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org




Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-07 Thread luke-tierney


No issues here with that either. Looks like something is different on
your end.

Best,

luke

On Wed, 7 Apr 2021, Andreas Kersting wrote:


Hi Luke,

Please see https://github.com/akersting/dumpTest for the package.

Here a session showing my issue:


> library(dumpTest)
> sessionInfo()

R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] dumpTest_0.1.0

loaded via a namespace (and not attached):
[1] compiler_4.0.5

> for (i in 1:100) {
+   print(i)
+   print(system.time(f()))
+ }
[1] 1
  user  system elapsed
 0.028   0.004   0.034
[1] 2
  user  system elapsed
 0.067   0.008   0.075
[1] 3
  user  system elapsed
 0.176   0.000   0.176
[1] 4
  user  system elapsed
 0.335   0.012   0.349
[1] 5
  user  system elapsed
 0.745   0.023   0.770
[1] 6
  user  system elapsed
 1.495   0.060   1.572
[1] 7
  user  system elapsed
 2.902   0.136   3.040
[1] 8
  user  system elapsed
 5.753   0.272   6.034
[1] 9
  user  system elapsed
11.807   0.708  12.597
[1] 10
^C
Timing stopped at: 6.638 0.549 7.214

I had to interrupt in iteration 10 because I was running low on RAM.

Regards,
Andreas


Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-07 Thread luke-tierney




On Wed, 7 Apr 2021, Andreas Kersting wrote:


Hi,

please consider the following minimal reproducible example:

Create a new R package which just contains the following two (exported) objects:


I would not expect this behavior and I don't see it when I make such a
package (in R 4.0.3 or R-devel on Ubuntu).  You will need to provide a
more complete reproducible example if you want help with what you are
trying to do; also sessionInfo() would help.

Best,

luke




crash_dumps <- new.env()

f <- function() {
 x <- runif(1e5)
 dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
 assign("last.dump", dump, crash_dumps)
}


WARNING: the following will probably eat all your RAM!

Attach this package and run:

for (i in 1:100) {
 print(i)
 f()
}

You will notice that with each iteration the execution of f() slows down 
significantly while the memory consumption of the R process (v4.0.5 on Linux) 
quickly explodes.

I am having a hard time understanding what exactly is happening here. Something
w.r.t. too deeply nested environments? Could someone please enlighten me?
Thanks!

Regards,
Andreas


Background:
In an R package I store crash dumps on error in parallel processes in a way
similar to what I have just shown (hence the (un)serialize(), which happens as
part of returning the objects to the parent process). The first 2 or 3 times I
do so in a session everything is fine, but afterwards it takes very long and I
soon run out of memory.

Some more observations:
- If I omit `x <- runif(1e5)`, the issues seem to be less pronounced.
- If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue,
  probably because .GlobalEnv is not included in sys.frames(), while
  crash_dumps is indirectly via the namespace of the package being the parent.env
  of some of the sys.frames()!?
- If I omit the lapply(...), i.e. use `dump <-
  unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue.
  The immediate consequence is that there are fewer sys.frames and, in particular,
  there is no frame which has the base namespace as its parent.env.
- If I make crash_dumps a list and use assignInMyNamespace() to store the dump
  in it, there also seems to be no issue. I will probably use this as a
  workaround:

crash_dumps <- list()

f <- function() {
 x <- runif(1e5)
 dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
 crash_dumps[["last.dump"]] <- dump
 assignInMyNamespace("crash_dumps", crash_dumps)
}


Re: [Rd] [External] brief update on the pipe operator in R-devel

2021-01-12 Thread luke-tierney

After some discussions we've settled on a syntax of the form

mtcars |> subset(cyl == 4) |> d => lm(mpg ~ disp, data = d)

to handle cases where the pipe lhs needs to be passed to an argument
other than the first of the function called on the rhs. This seems to
be a reasonable balance between making these non-standard cases
easy to see but still easy to write. This is now committed to R-devel.

Best,

luke


[Rd] brief update on the pipe operator in R-devel

2020-12-22 Thread luke-tierney


It turns out that allowing a bare function expression on the
right-hand side (RHS) of a pipe creates opportunities for confusion
and mistakes that are too risky. So we will be dropping support for
this from the pipe operator.

The case of a RHS call that wants to receive the LHS result in an
argument other than the first can be handled with just implicit first
argument passing along the lines of

mtcars |> subset(cyl == 4) |> (\(d) lm(mpg ~ disp, data = d))()

It was hoped that allowing a bare function expression would make this
more convenient, but it has issues as outlined below. We are exploring
some alternatives, and will hopefully settle on one soon after the
holidays.

The basic problem, pointed out in a comment on Twitter, is that in
expressions of the form

1 |> \(x) x + 1 -> y
1 |> \(x) x + 1 |> \(y) x + y

everything after the \(x) is parsed as part of the body of the
function.  So these are parsed along the lines of

1 |> \(x) { x + 1 -> y }
1 |> \(x) { x + 1 |> \(y) x + y }

In the first case the result is assigned to a (useless) local
variable.  Someone writing this is more likely to have intended to
assign the result to a global variable, as this would:

(1 |> \(x) x + 1) -> y
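
The parse described above can be inspected without using the pipe at all. The sketch below uses `function(x)` in place of the `\(x)` shorthand (the two are equivalent) and looks at the body the parser attaches to the function expression.

```r
# Everything after the argument list becomes the function body,
# including the right-assignment to y.
e <- quote(function(x) x + 1 -> y)

body_expr <- e[[3]]   # third element of a `function` call is its body
print(body_expr)
```

The body is the whole assignment, so `y` would be a local variable created each time the function is called.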

In the second case the 'x' in 'x + y' refers to the local variable 'x'
in the first RHS function. Someone writing this is more likely to have
meant

(1 |> \(x) x + 1) |> \(y) x + y

with 'x' in 'x + y' now referring to a global variable:

> x <- 2
> 1 |> \(x) x + 1 |> \(y) x + y
[1] 3
> (1 |> \(x) x + 1) |> \(y) x + y
[1] 4
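
The two results above can be reproduced with ordinary function calls, which makes the scoping explicit. This is a hand-written rendering of the two groupings, not output from the parser; it assumes that `|>` binds tighter than `+`, as ?Syntax documents for released versions of R.

```r
x <- 2  # global x, as in the transcript above

# Unparenthesised chain: the second lambda sits inside the body of the
# first, so 'x' in 'x + y' is the first lambda's argument (here 1).
nested <- (function(x) x + (function(y) x + y)(1))(1)

# Parenthesised chain: the second lambda is applied to the result of the
# first, so 'x' in 'x + y' is the global x (here 2).
sequential <- (function(y) x + y)((function(x) x + 1)(1))

nested
sequential
```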

These issues arise with any approach in R that allows a bare function
expression on the RHS of a pipe operation. They also arise in other
languages with pipe operators. For example, here is the last example
in Julia:

julia> x = 2
2
julia> 1 |> x -> x + 1 |> y -> x + y
3
julia> ( 1 |> x -> x + 1 ) |> y -> x + y
4

Even though proper use of parentheses can work around these issues,
the likelihood of making mistakes that are hard to track down is too
high. So we will disallow the use of bare function expressions on the
right hand side of a pipe.

Best,

luke


Re: [Rd] [External] setting .libPaths() with parallel::clusterCall

2020-12-22 Thread luke-tierney


On Tue, 22 Dec 2020, Mark van der Loo wrote:


Dear all,

It is not possible to set library paths on worker nodes with
parallel::clusterCall (or snow::clusterCall) and I wonder if this is
intended behavior.

Example.

library(parallel)
libdir <- "./tmplib"
if (!dir.exists(libdir)) dir.create(libdir)

cl <- makeCluster(2)
clusterCall(cl, .libPaths, c(libdir, .libPaths()) )

The output is as expected with the extra libdir returned for each worker
node. However, running

clusterEvalQ(cl, .libPaths())

shows that the library paths have not been set.


Use this:

clusterCall(cl, ".libPaths", c(libdir, .libPaths()) )

This will find the function .libPaths on the workers.

Your clusterCall sends across a serialized copy of your process'
.libPaths and calls that. Usually that is equivalent to calling the
function found by the name you used on the workers, but not when the
function has an enclosing environment that the function modifies by
assignment.
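
The effect can be reproduced without a cluster. In the sketch below, `make_paths` and its `paths` variable are hypothetical stand-ins for `.libPaths` and its private state, and a serialize()/unserialize() round trip stands in for sending the function value to a worker.

```r
# Hypothetical stand-in for .libPaths(): a closure that keeps its state in
# its own enclosing environment and updates it by assignment.
make_paths <- function() {
  paths <- "default-lib"
  function(new) {
    if (!missing(new)) paths <<- new
    paths
  }
}
lib_paths <- make_paths()

# Passing the function *value* means serializing it; the enclosing
# environment is copied along with the function.
shipped <- unserialize(serialize(lib_paths, NULL))

shipped(c("./tmplib", shipped()))  # updates only the copy's private state
shipped()                          # the copy sees the new path
lib_paths()                        # the original is unchanged
```

When a character string is passed instead, the worker looks the function up by name and therefore calls its own copy, which is why the quoted form behaves as expected.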

Alternate implementations of .libPaths that are more
serialization-friendly are possible in principle but probably not
practical given limitations of the base package.

The distinction between providing a function value or a character
string as the function argument to clusterCall and others could
probably use a paragraph in the help file; happy to consider a patch
if anyone wants to take a crack at it.

Best,

luke



If this is indeed a bug, I'm happy to file it at bugzilla. Tested on R
4.0.3 and r-devel.

Best,
Mark
ps: a workaround is documented here:
https://www.markvanderloo.eu/yaRb/2020/12/17/how-to-set-library-path-on-a-parallel-r-cluster/



> sessionInfo()

R Under development (unstable) (2020-12-21 r79668)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /home/mark/projects/Rdev/R-devel/lib/libRblas.so
LAPACK: /home/mark/projects/Rdev/R-devel/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

loaded via a namespace (and not attached):
[1] compiler_4.1.0


Re: [Rd] [External] R crashes when using huge data sets with character string variables

2020-12-12 Thread luke-tierney


If R is receiving a kill signal there is nothing it can do about it.

I am guessing you are running into a memory over-commit issue in your OS.
https://en.wikipedia.org/wiki/Memory_overcommitment
https://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/

If you have to run this close to your physical memory limits you might
try using your shell's facility (ulimit for bash, limit for some
others) to limit process memory/virtual memory use to your available
physical memory. You can also try setting the R_MAX_VSIZE environment
variable mentioned in ?Memory; that only affects the R heap, not
malloc() done elsewhere.
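
For a rough sense of scale, the sketch below estimates the size of a single length-1e9 vector. It assumes a 64-bit build, where a double and a character-vector element (a pointer to a cached string) each take 8 bytes; the helper `gib` is an illustrative name.

```r
nObs <- 1e9

# Assumption: 8 bytes per element on a 64-bit build.
bytes_per_element <- 8
gib <- function(bytes) bytes / 2^30

one_vector_gib <- gib(nObs * bytes_per_element)
one_vector_gib

# Reports whether R_MAX_VSIZE was set for this session ("" if not);
# see ?Memory for how it is interpreted.
Sys.getenv("R_MAX_VSIZE")
```

An expression that builds several such vectors before combining them into a character result needs a multiple of this figure at once.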

Best,

luke

On Sat, 12 Dec 2020, Arne Henningsen wrote:


When working with a huge data set with character string variables, I
found that various commands make R crash. When I run R in a
Linux/bash console, R terminates with the message "Killed". When I use
RStudio, I get the message "R Session Aborted. R encountered a fatal
error. The session was terminated. Start New Session". If an object in
the R workspace needs too much memory, I would expect R not to
crash but to issue an error message "Error: cannot allocate vector of
size ...".  A minimal reproducible example (at least on my computer)
is:

nObs <- 1e9

date <- paste( round( runif( nObs, 1981, 2015 ) ), round( runif( nObs,
1, 12 ) ), round( runif( nObs, 1, 31 ) ), sep = "-" )

Is this a bug or a feature of R?

Some information about my R version, OS, etc:

R> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_DK.UTF-8        LC_NUMERIC=C
[3] LC_TIME=en_DK.UTF-8         LC_COLLATE=en_DK.UTF-8
[5] LC_MONETARY=en_DK.UTF-8     LC_MESSAGES=en_DK.UTF-8
[7] LC_PAPER=en_DK.UTF-8        LC_NAME=C
[9] LC_ADDRESS=C                LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.3

/Arne




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: New pipe operator

2020-12-07 Thread luke-tierney


On Mon, 7 Dec 2020, Peter Dalgaard wrote:





On 7 Dec 2020, at 17:35 , Duncan Murdoch  wrote:

On 07/12/2020 11:18 a.m., peter dalgaard wrote:

Hmm,
I feel a bit bad coming late to this, but I think I am beginning to side with those who want  
"... |> head" to work. And yes, that has to happen at the expense of |> head().


Just curious, how would you express head(df, 10)?  Currently it is

df |> head(10)

Would I have to write it as

df |> function(d) head(d, 10)


It could be

df |> ~ head(_, 10)

which in a sense is "yes" to your question.




As I think Gabor points out, the current structure goes down a 
nonstandard evaluation route, which may be difficult to explain and departs 
from usual operator evaluation paradigms by being an odd mix of syntax and 
semantics. R lets you do these sorts of things, witness ggplot and tidyverse, 
but the transparency of the language tends to suffer.


I wouldn't call it non-standard evaluation.  There is no function corresponding to |>, so there's no evaluation at 
all.  It is more like the way "x -> y" is parsed as "y <- x", or "if (x) y" is 
transformed to `if`(x, y).


That's a point, but maybe also my point. Currently, the parser is inserting the 
LHS as the 1st argument of the RHS, right? Things might be simpler if it was 
more like a simple binop.


It can only be a simple binop if you only allow RHS functions of one argument.
Which would require currying along the lines Duncan showed. Something like:

`%>>%` <- function(x, f) f(x)
C1 <- function(f, ...) function(x) f(x, ...)

mtcars %>>% head
mtcars %>>% C1(head, 2)
mtcars %>>% C1(subset, cyl == 4) %>>% \(d) lm(mpg ~ disp, data = d)

This might fly if we lived in a world where most RHS functions take
one argument and only a few needed currying. That is the case in many
functional languages, but not for R. Making the common case of
multiple arguments easy means you have to work at the source level,
either in the parser or with some form of NSE.
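
The difference between the two designs can be seen by inspecting what the parser produces. This is a small sketch assuming R 4.1.0 or later; `%>>%` is the illustrative operator defined above, redefined here only so the block is self-contained.

```r
# Requires R >= 4.1.0 for the native pipe.
`%>>%` <- function(x, f) f(x)           # ordinary binary operator, as above

# The native pipe is rewritten by the parser: no call to a pipe function remains.
piped <- quote(mtcars |> head(2))
print(piped)
stopifnot(identical(piped, quote(head(mtcars, 2))))

# A user-defined operator stays in the parsed expression as a real function call.
binop <- quote(mtcars %>>% head)
stopifnot(identical(binop[[1]], as.name("%>>%")))

# Both give the same value when the RHS takes a single argument.
stopifnot(identical(mtcars |> head(), mtcars %>>% head))
```

Because the native pipe works on the source expression, extra arguments on the RHS need no currying helper such as C1.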

Best,

luke



-pd


Duncan Murdoch


It would be neater if it was simply so that the class/type of the object on the 
right hand side decided what should happen. So we could have a rule that we 
could have an object, an expression, and possibly an unevaluated call on the 
RHS. Or maybe a formula, i.e., we could have
... |> head
but not
... |> head()
because head() does not evaluate to anything useful. Instead, we could have 
some of these
... |> quote(head())
... |> expression(head())
... |> ~ head()
... |> \(_) head(_)
possibly also using a placeholder mechanism for the first three. I kind of 
like the idea that the ~ could be equivalent to \(_).
(And yes, I am kicking myself a bit for not using ~ in the NSE arguments in 
subset() and transform())
-pd

On 7 Dec 2020, at 16:20 , Deepayan Sarkar  wrote:

On Mon, Dec 7, 2020 at 6:53 PM Gabor Grothendieck
 wrote:


On Mon, Dec 7, 2020 at 5:41 AM Duncan Murdoch  wrote:

I agree it's all about call expressions, but they aren't all being
treated equally:

x |> f(...)

expands to f(x, ...), while

x |> `function`(...)

expands to `function`(...)(x).  This is an exception to the rule for
other calls, but I think it's a justified one.


This admitted inconsistency is justified by what?  No argument has been
presented.  The justification seems to be implicitly driven by implementation
concerns at the expense of usability and language consistency.


Sorry if I have missed something, but is your consistency argument
basically that if

foo <- function(x) x + 1

then

x |> foo
x |> function(x) x + 1

should both work the same? Suppose it did. Would you then be OK if

x |> foo()

no longer worked as it does now, and produced foo()(x) instead of foo(x)?

If you are not OK with that and want to retain the current behaviour,
what would you want to happen with the following?

bar <- function(x) function(n) rnorm(n, mean = x)

10 |> bar(runif(1))() # works 'as expected' ~ bar(runif(1))(10)
10 |> bar(runif(1)) # currently bar(10, runif(1))

both of which you probably want. But then

baz <-  bar(runif(1))
10 |> baz

(not currently allowed) will not be the same as what you would want from

10 |> bar(runif(1))

which leads to a different kind of inconsistency, doesn't it?

-Deepayan


Re: [Rd] [External] anonymous functions

2020-12-07 Thread luke-tierney


I don't disagree in principle, but the reality is users want shortcuts
and as a result various packages, in particular tidyverse, have been
providing them. Mostly based on formulas, mostly with significant
issues since formulas weren't designed for this, and mostly
incompatible (tidyverse ones are compatible within tidyverse but not
with others). And of course none work in sapply or lapply. Providing a
shorthand in base may help to improve this. You don't have to use it
if you don't want to, and you can establish coding standards that
disallow it if you like.
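
As a small illustration of the point about sapply and lapply, the base shorthand is an ordinary function expression and so can be passed anywhere a function is expected. A sketch, assuming R 4.1.0 or later:

```r
# \(x) is parsed as function(x), so it works with any higher-order function.
res <- sapply(1:3, \(x) x + 1)
sums <- lapply(list(a = 1:3, b = 4:6), \(v) sum(v))

# The long form gives the same result.
stopifnot(identical(res, sapply(1:3, function(x) x + 1)))
```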

Best,

luke

On Mon, 7 Dec 2020, Therneau, Terry M., Ph.D. via R-devel wrote:

“The shorthand form \(x) x + 1 is parsed as function(x) x + 1. It may be 
helpful in making code containing simple function expressions more readable.”


Color me unimpressed.
Over the decades I've seen several "who can write the shortest code" threads: 
in Fortran, in C, in Splus, ...   The same old idea that "short" is a synonym 
for either elegant, readable, or efficient is now being recycled in the 
tidyverse.   The truth is that "short" is actually an antonym for all of 
these things, at least for anyone else reading the code; or for the original 
coder 30-60 minutes after the "clever" lines were written.  Minimal use of 
the spacebar and/or the return key isn't usually held up as a goal, but 
creeps into many practitioners' code as well.


People are excited by replacing "function(" with "\("?  Really?   Are people 
typing code with their thumbs?
I am ambivalent about pipes: I think it is a great concept, but too many of 
my colleagues think that using pipes = no need for any comments.


As time goes on, I find my goal is to make my code less compact and more 
readable.  Every bug fix or new feature in the survival package now adds more 
lines of comments or other documentation than lines of code.  If I have to 
puzzle out what a line does, what about the poor sod who inherits the 
maintenance?







Re: [Rd] [External] Re: New pipe operator

2020-12-07 Thread luke-tierney


Or, keeping dplyr but with R-devel pipe and function shorthand:

DF <- "myfile.csv" %>%
   readLines() |>
   \(.) gsub(r'{(c\(.*?\)|integer\(0\))}', r'{"\1"}', .) |>
   \(.) read.csv(text = .) |>
   mutate(across(2:3, \(col) lapply(col, \(x) eval(parse(text = x)))))

Using named arguments to redirect to the implicit first does work,
also in magrittr, but for me at least it is the kind of thing I would
probably regret a month later when trying to figure out the code.

Best,

luke

On Mon, 7 Dec 2020, Gabor Grothendieck wrote:


On Sat, Dec 5, 2020 at 1:19 PM  wrote:

Let's get some experience


Here is my last SO post using dplyr rewritten to use R 4.1 devel.  Seems
not too bad.  Was able to work around the placeholder for gsub by specifying
the arg names and used \(...)... elsewhere.  This does not address the
inconsistency discussed though.  I have indented by 2 spaces in case the
email wraps around.  The objective is to read myfile.csv including columns that
contain c(...) and integer(0), parsing and evaluating them.


 # taken from:
 # https://stackoverflow.com/questions/65174764/reading-in-a-csv-that-contains-vectors-cx-y-in-r/65175172#65175172

 # create input file for testing
 Lines <- 
"\"col1\",\"col2\",\"col3\"\n\"a\",1,integer(0)\n\"c\",c(3,4),5\n\"e\",6,7\n"
 cat(Lines, file = "myfile.csv")

 #
 # base R 4.1 (devel)
 DF <- "myfile.csv" |>
   readLines() |>
   gsub(pattern = r'{(c\(.*?\)|integer\(0\))}', replacement = r'{"\1"}') |>
   \(.) read.csv(text = .) |>
   \(.) replace(., 2:3, lapply(.[2:3], \(col) lapply(col, \(x)
     eval(parse(text = x)))))

 #
 # dplyr/magrittr
 library(dplyr)

 DF <- "myfile.csv" %>%
   readLines %>%
   gsub(r'{(c\(.*?\)|integer\(0\))}', r'{"\1"}', .) %>%
   { read.csv(text = .) } %>%
   mutate(across(2:3, ~ lapply(., function(x) eval(parse(text = x)))))




Re: [Rd] [External] Re: New pipe operator

2020-12-06 Thread luke-tierney

+ 1’. The pipe implementation as a syntax transformation
was motivated by suggestions from Jim Hester and Lionel Henry. These
features are experimental and may change prior to release.


This is a good addition; by using "|>" instead of "%>%" there should be
a chance to get operator precedence right.  That said, the ?Syntax help
topic hasn't been updated, so I'm not sure where it fits in.

There are some choices that take a little getting used to:


mtcars |> head

Error: The pipe operator requires a function call or an anonymous
function expression as RHS

(I need to say mtcars |> head() instead.)  This sometimes leads to error
messages that are somewhat confusing:


mtcars |> magrittr::debug_pipe |> head

Error: function '::' not supported in RHS call of a pipe

but

mtcars |> magrittr::debug_pipe() |> head()

works.

Overall, I think this is a great addition, though it's going to be
disruptive for a while.

Duncan Murdoch


Re: [Rd] [External] Re: New pipe operator

2020-12-06 Thread luke-tierney

nce right.  That said, the ?Syntax help
topic hasn't been updated, so I'm not sure where it fits in.

There are some choices that take a little getting used to:

> mtcars |> head
Error: The pipe operator requires a function call or an anonymous
function expression as RHS

(I need to say mtcars |> head() instead.)  This sometimes leads to error
messages that are somewhat confusing:

> mtcars |> magrittr::debug_pipe |> head
Error: function '::' not supported in RHS call of a pipe

but

mtcars |> magrittr::debug_pipe() |> head()

works.

Overall, I think this is a great addition, though it's going to be
disruptive for a while.

Duncan Murdoch


Re: [Rd] [External] Re: New pipe operator

2020-12-05 Thread luke-tierney

or a while.

Duncan Murdoch


Re: [Rd] [External] Re: New pipe operator

2020-12-04 Thread luke-tierney


On Sat, 5 Dec 2020, Duncan Murdoch wrote:


On 04/12/2020 2:26 p.m., luke-tier...@uiowa.edu wrote:

On Fri, 4 Dec 2020, Dénes Tóth wrote:



On 12/4/20 3:05 PM, Duncan Murdoch wrote:

...

It's tempting to suggest it should allow something like

    mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .)

which would be expanded to something equivalent to the other versions: but
that makes it quite a bit more complicated.  (Maybe _ or \. should be used
instead of ., since those are not legal variable names.)


I support the idea of using an underscore (_) as the placeholder symbol.


I strongly oppose adding a placeholder. Allowing for an optional
placeholder significantly complicates both implementing and explaining
the semantics. For a simple syntax transformation to be viable it
would also require some restrictions, such as only allowing a
placeholder as a top level argument and only once. Checking that these
restrictions are met, and accurately signaling when they are not with
reasonable error messages, is essentially an unsolvable problem given
R's semantics.


I don't think you read my suggestion, but that's okay:  you're maintaining 
it, not me.


I thought I did but maybe I missed something. You are right that
supporting a placeholder makes things a lot more complicated. For
being able to easily recognize the non-standard cases _ is better than
. but for me at least not by much.

We did try a number of variations; the code is in the R-syntax branch.
At the root of that branch are two .md files with some notes as of
around useR20. Once things settle down I may update those and look
into turning them into a blog post.

Best,

luke



Duncan Murdoch



The case where the LHS is to be passed as something other than the
first argument is unusual. For me, having that case stand out by using
a function expression makes it much easier to see and so makes the
code easier to understand. As a wearer of progressive bifocals
and someone whose screen is not always free of small dust particles,
having to spot the non-standard pipe stages by seeing a placeholder,
especially a . placeholder, is a bug, not a feature.
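
For concreteness, here is how the lm() example from earlier in the thread looks with a function expression instead of a placeholder. This is a sketch assuming R 4.1.0 or later, where the function expression has to be wrapped in parentheses and called; the R-devel version under discussion at the time accepted a bare function expression on the RHS.

```r
# Requires R >= 4.1.0.
fit <- mtcars |>
  subset(cyl == 4) |>
  (\(d) lm(mpg ~ disp, data = d))()   # LHS passed as 'data', not as first argument

coef(fit)
```

The stage that does not follow the first-argument convention is visibly different from the others, which is the readability argument made above.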

Best,

luke

Syntactic sugars work best if 1) they require fewer keystrokes and/or
2) are easier to read compared to the "normal" syntax, and 3) cannot lead to
unexpected bugs (which is a major problem with the magrittr pipe). Using '_'
fulfills all of these criteria since '_' cannot clash with any variable in
the environment.

Denes


Re: [Rd] [External] Re: New pipe operator

2020-12-04 Thread luke-tierney


On Fri, 4 Dec 2020, Dénes Tóth wrote:



On 12/4/20 3:05 PM, Duncan Murdoch wrote:

...

It's tempting to suggest it should allow something like

   mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .)

which would be expanded to something equivalent to the other versions: but 
that makes it quite a bit more complicated.  (Maybe _ or \. should be used 
instead of ., since those are not legal variable names.)


I support the idea of using an underscore (_) as the placeholder symbol.


I strongly oppose adding a placeholder. Allowing for an optional
placeholder significantly complicates both implementing and explaining
the semantics. For a simple syntax transformation to be viable it
would also require some restrictions, such as only allowing a
placeholder as a top level argument and only once. Checking that these
restrictions are met, and accurately signaling when they are not with
reasonable error messages, is essentially an unsolvable problem given
R's semantics.

The case where the LHS is to be passed as something other than the
first argument is unusual. For me, having that case stand out by using
a function expression makes it much easier to see and so makes the
code easier to understand. As a wearer of progressive bifocals
and someone whose screen is not always free of small dust particles,
having to spot the non-standard pipe stages by seeing a placeholder,
especially a . placeholder, is a bug, not a feature.

Best,

luke

Syntactic sugars work best if 1) they require fewer keystrokes and/or 
2) are easier to read compared to the "normal" syntax, and 3) cannot lead to 
unexpected bugs (which is a major problem with the magrittr pipe). Using '_' 
fulfills all of these criteria since '_' cannot clash with any variable in 
the environment.


Denes


Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

2020-12-01 Thread luke-tierney

leanTempDir() could 
call R_unlink() instead of having a subprocess call 'rm -rf ...'.  Then it 
could also issue a specific warning if it was impossible to delete all of 
tempdir().  (That should be very rare.)


q("no")

Breakpoint 1, R_system (command=command@entry=0x7fffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
311 {
(gdb) where
#0  R_system (command=command@entry=0x7fffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
#1  0x557c30ec in R_CleanTempDir () at sys-std.c:1178
#2  0x557c31d7 in Rstd_CleanUp (saveact=, status=0, runLast=) at sys-std.c:1243
#3  0x557c593d in R_CleanUp (saveact=saveact@entry=SA_NOSAVE, status=status@entry=0, runLast=) at system.c:87
#4  0x556cc85e in do_quit (call=, op=, args=0x57813f90, rho=) at main.c:1393

-Bill

On Mon, Nov 23, 2020 at 3:15 AM Tomas Kalibera  wrote:

On 11/21/20 6:51 PM, Jan Gorecki wrote:

Dear R-developers,

Some of the larger scripts I am running (50+ GB of memory used by R)
quit with q("no", status=0) when they finish.
Quite often, extra stderr output is produced
at the very end, which looks like this:

Warning message:
In .Internal(quit(save, status, runLast)) :
system call failed: Cannot allocate memory

Is there any way to avoid this kind of warning? I am using stderr
output for detecting failures in scripts and this warning is a false
positive of a failure.

Maybe the quit function could wait a little bit longer trying to allocate
before it raises this warning?

If you see this warning, some call to system() or system2() or similar,
which executes an external program, failed to even run a shell to run
that external program, because there was not enough memory. You should
be able to find out where it happens by checking the exit status of
system().

Tomas



Best regards,
Jan Gorecki


Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

2020-11-27 Thread luke-tierney

 directory
should follow.

It might also help diagnosing your problem, but I don't think it would
solve it. If the diagnostics in R work fine and the OS was so
hopelessly out of memory that it couldn't run any more external
processes, then really this is not a problem of R, but of having
exhausted the resources. And it would be a coincidence that just this
particular call to "system" at the end of the session did not work.
Anything else could break as well close to the end of the script. This
seems the most likely explanation to me.

Do you get this warning repeatedly, reproducibly at least in slightly
different scripts at the very end, with this warning always from quit()?
So that the "call" part of the warning message has .Internal(quit) like
in the case you posted? Would adding another call to "system" before the
call to "q()" work - with checking the return value? If it is always
only the last call to "system" in "q()", then it is suspicious, perhaps
an indication that some diagnostics in R are not correct. In that case, a
reproducible example would be the key: either you diagnose
on your end what the problem is, or you create a reproducible example that
someone else can use to reproduce and debug.
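
The suggestion to add an explicit system() call with a status check before q() could look roughly like the sketch below. It assumes a Unix-alike with a POSIX shell; the helper name run_checked and the commands are made up for illustration.

```r
# Hypothetical helper: run a shell command and report a non-zero exit status.
run_checked <- function(cmd) {
  status <- system(cmd)               # returns the exit status when intern = FALSE
  if (status != 0L)
    message("command failed with status ", status, ": ", cmd)
  invisible(status)
}

ok  <- run_checked("true")            # exits with status 0
bad <- run_checked("exit 3")          # exits with status 3

# In a long-running script, a probe like this just before q() shows whether
# the process can still start a shell at that point:
# run_checked("true"); q("no", status = 0)
```

If the probe fails as well, that would point to general memory exhaustion rather than to something specific to the cleanup done in q().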

Best
Tomas



On Mon, Nov 23, 2020 at 7:10 PM Bill Dunlap  wrote:

The call to system() probably is an internal call used to delete the session's 
tempdir().  This sort of failure means that a potentially large amount of disk 
space is not being recovered when R is done.  Perhaps R_CleanTempDir() could 
call R_unlink() instead of having a subprocess call 'rm -rf ...'.  Then it 
could also issue a specific warning if it was impossible to delete all of 
tempdir().  (That should be very rare.)
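
At the R level, the same idea (deleting in-process and checking the result rather than spawning 'rm -rf') can be sketched with unlink(). This is only an analogy to the proposed C-level change, not the actual implementation; the directory name is made up, and a scratch subdirectory is used so the session's own tempdir() is left alone.

```r
scratch <- file.path(tempdir(), "cleanup-demo")   # hypothetical scratch directory
dir.create(scratch)
writeLines("some data", file.path(scratch, "a.txt"))

# unlink() runs inside the R process, so no shell or subprocess is needed.
status <- unlink(scratch, recursive = TRUE)       # 0 for success, 1 for failure

if (status != 0L || dir.exists(scratch))
  warning("could not remove all of ", scratch)
```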


q("no")

Breakpoint 1, R_system (command=command@entry=0x7fffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
311 {
(gdb) where
#0  R_system (command=command@entry=0x7fffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
#1  0x557c30ec in R_CleanTempDir () at sys-std.c:1178
#2  0x557c31d7 in Rstd_CleanUp (saveact=, status=0, runLast=) at sys-std.c:1243
#3  0x557c593d in R_CleanUp (saveact=saveact@entry=SA_NOSAVE, status=status@entry=0, runLast=) at system.c:87
#4  0x556cc85e in do_quit (call=, op=, args=0x57813f90, rho=) at main.c:1393

-Bill

On Mon, Nov 23, 2020 at 3:15 AM Tomas Kalibera  wrote:

On 11/21/20 6:51 PM, Jan Gorecki wrote:

Dear R-developers,

Some of the larger scripts I am running (50+ GB of memory used by R)
quit with q("no", status=0) when they finish.
Quite often, extra stderr output is produced
at the very end, which looks like this:

Warning message:
In .Internal(quit(save, status, runLast)) :
system call failed: Cannot allocate memory

Is there any way to avoid this kind of warning? I am using stderr
output for detecting failures in scripts and this warning is a false
positive of a failure.

Maybe the quit function could wait a little bit longer trying to allocate
before it raises this warning?

If you see this warning, some call to system() or system2() or similar,
which executes an external program, failed to even run a shell to run
that external program, because there was not enough memory. You should
be able to find out where it happens by checking the exit status of
system().

Tomas



Best regards,
Jan Gorecki


Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

2020-11-25 Thread luke-tierney

 a subprocess call 'rm -rf ...'.  Then it 
could also issue a specific warning if it was impossible to delete all of 
tempdir().  (That should be very rare.)


q("no")

Breakpoint 1, R_system (command=command@entry=0x7fffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
311 {
(gdb) where
#0  R_system (command=command@entry=0x7fffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
#1  0x557c30ec in R_CleanTempDir () at sys-std.c:1178
#2  0x557c31d7 in Rstd_CleanUp (saveact=, status=0, runLast=) at sys-std.c:1243
#3  0x557c593d in R_CleanUp (saveact=saveact@entry=SA_NOSAVE, status=status@entry=0, runLast=) at system.c:87
#4  0x556cc85e in do_quit (call=, op=, args=0x57813f90, rho=) at main.c:1393

-Bill

On Mon, Nov 23, 2020 at 3:15 AM Tomas Kalibera  wrote:

On 11/21/20 6:51 PM, Jan Gorecki wrote:

Dear R-developers,

Some of the larger scripts I am running (50+ GB of memory used by R)
quit with q("no", status=0) when they finish.
Quite often, extra stderr output is produced
at the very end, which looks like this:

Warning message:
In .Internal(quit(save, status, runLast)) :
system call failed: Cannot allocate memory

Is there any way to avoid this kind of warning? I am using stderr
output for detecting failures in scripts and this warning is a false
positive of a failure.

Maybe the quit function could wait a little bit longer trying to allocate
before it raises this warning?

If you see this warning, some call to system() or system2() or similar,
which executes an external program, failed to even run a shell to run
that external program, because there was not enough memory. You should
be able to find out where it happens by checking the exit status of
system().

Tomas



Best regards,
Jan Gorecki


Re: [Rd] [External] exists, get and get0 accept silently inputs of length > 1

2020-11-23 Thread luke-tierney


Thanks for the suggestion.

In R-devel (as of r79474) exists(), get(), and get0() now signal an
error if the first argument has length > 1. This will cause about 30
CRAN packages and possibly a couple of Bioconductor packages to fail
under R-devel.

getS3method() now also signals an error if the class argument has
length > 1. Calls of the form getS3method(generic, class(x)) will now
fail if class(x) has length > 1. I believe most CRAN package issues
related to this change have already been resolved, but a few may
remain.
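
Code that relied on the old behaviour can usually be adapted by being explicit about which element is meant, or by looping. The sketch below shows two such adaptations; find_method is a hypothetical helper, not part of R, and it does not handle implicit classes.

```r
# Vectorize explicitly instead of passing a longer vector to exists().
nms <- c("c", "no_such_object_for_demo")
found <- vapply(nms, exists, logical(1))

# Walk the class vector instead of passing all of class(x) to getS3method().
find_method <- function(generic, x) {
  for (cl in c(class(x), "default")) {
    m <- getS3method(generic, cl, optional = TRUE)
    if (!is.null(m)) return(m)
  }
  NULL
}

x <- Sys.time()                     # class(x) is c("POSIXct", "POSIXt")
m <- find_method("format", x)       # first match along the class vector
```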

Best,

luke

On Fri, 13 Nov 2020, Antoine Fabri wrote:


Dear R-devel,

The doc of exists, get and get0 is unambiguous, x should be an object given
as a character string. However these accept longer inputs. It can lead a
careless user to think these functions are vectorized when they're not,
and generally lets through bugs that one might have preferred to trigger
earlier failure.

``` r
exists("d")
#> [1] FALSE
exists(c("c", "d"))
#> [1] TRUE
get(c("c", "d"))
#> function (...)  .Primitive("c")
get0(c("c", "d"))
#> function (...)  .Primitive("c")
```

I believe these should either fail, or be vectorized, probably the former.

Thanks,

Antoine


Re: [Rd] [External] Two ALTREP questions

2020-11-21 Thread luke-tierney


On Sat, 21 Nov 2020, Jiefei Wang wrote:


Hello,

I have two related ALTREP questions. It seems like there is no way to
assign attributes to an ALTREP vector without using C++ code. To be more
specific, I want to make an ALTREP matrix. I have tried the following R
code but none of it works.
```
.Internal(inspect(1:6))
.Internal(inspect(matrix(1:6, 2,3)))
.Internal(inspect(as.matrix(1:6)))
.Internal(inspect(structure(1:6, dim = c(2L,3L))))
.Internal(inspect({x <- 1:6;attr(x, "dim") <- c(2L,3L);x}))
.Internal(inspect({x <- 1:6;attributes(x)<- list(dim = c(2L,3L));x}))
```


Some things that may help you:

- Try with 1:6 replaced by as.character(1:6), and look at the REF
  values in both cases.

- In particular, look at what this gives you:

x <- as.character(1:6)
attr(x, "dim") <- c(2, 3)

- Things can be a little different with larger vectors; try variants
  of your examples for more than 64 elements.


This also brings
my second question, it seems like the ALTREP coercion function does not
handle attributes correctly.  After the coercion, the ALTREP object will
lose its attributes.
```
coerceFunc <- inline::cxxfunction( signature(x = "SEXP", attr = "SEXP" ) , '
SET_ATTRIB(x,attr);
return(Rf_coerceVector(x, REALSXP));
')

coerceFunc(1:6, pairlist(dim = c(2L, 3L)))

[1] 1 2 3 4 5 6

coerceFunc(1:6 + 0L, pairlist(dim = c(2L, 3L)))

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
```
The problem is that the coercion function is directly dispatched to the
user-defined ALTREP coercion function, so the user is responsible to attach
the attributes after the coercion. If he forgets to do so, then the result
is a plain vector. Similar to the `Duplicate` and `DuplicateEX` functions
where the former one will attach the attributes by default, I feel that the
`Coerce` function should only return a plain vector and there should be a
`CoerceEx` function to do the attribute assignment, so the logic in the
no-EX ALTREP functions can be consistent. I do not know how drastic the
change would be, so maybe this is too hard to do.


Since you raised this earlier I have been looking at it and also think
that this needs to be handled along the lines of
Duplicate/DuplicateEx. I need to find some time to think that through
and implement it; hopefully I'll get to it before the end of the year.


BTW, is there any way to contribute to the R source? I know R has limited
resources, so if possible, I will be happy to fix the matrix issue myself
and make some minor contributions to the R community.


You can find the suggested process for contributing described in the
'Reporting Bugs' link on the R home page https://www.r-project.org/

Best,

luke


Best,
Jiefei


Re: [Rd] [External] exists, get and get0 accept silently inputs of length > 1

2020-11-16 Thread luke-tierney

Come on, folks. There is no NSE involved in calls to get(): it's
standard evaluation all the way into the C code. Prior to the change a
first argument that is anything other than a character vector would
produce an error. After the change, passing in a symbol will do the
obvious thing. Code that worked previously without error (i.e. called
get() with string values) will continue to work exactly as it did
before.

It's a little more convenient and a little more efficient for some
computations on the language not to have to call as.character on
symbols before passing them to get(). Hence the change expanding the
domain of get().
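
As a sketch of what the change saves: code that computes on the language previously had to convert a symbol by hand before the lookup. The helper name `lookup_symbol()` is hypothetical; the explicit `as.character()` used here works both before and after the change.

```r
## Hypothetical helper: look up the value bound to a symbol.
lookup_symbol <- function(sym, envir = parent.frame()) {
  stopifnot(is.symbol(sym))
  ## Explicit conversion; with the change described above, passing the
  ## symbol itself to get() is meant to make this step unnecessary.
  get(as.character(sym), envir = envir)
}

e <- quote(f(x, y))
x <- 10
val <- lookup_symbol(e[[2]])   # e[[2]] is the symbol `x`
```

Calls that pass a string, such as `get(j)` inside a loop over `ls()`, are unaffected: the string is still used as the name to look up.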

luke

On Tue, 17 Nov 2020, Gabriel Becker wrote:

Hi all,
I have used variable values in get() as well, and including, I think, in
package code (though pretty infrequently).
Perhaps a character.only argument similar to library?

~G

On Mon, Nov 16, 2020 at 5:31 PM Hugh Parsonage 
wrote:
  I noticed the recent commit to R-dev (r79434).  Is this wise? I've
  often used get() in constructions like

  for (j in ls()) if (is.numeric(x <- get(j))) ...

  (and often interactively, rather than in a package)

  Am I to understand that get(j) will now be equivalent to `j` even if j
  is a string referring putatively to another object?

  On Sat, 14 Nov 2020 at 01:34,  wrote:
  >
  > Worth looking into. It would probably cause some check failures, so
  > would probably be a good idea to run a check across BIOC/CRAN.  At the
  > same time it would be worth allowing name objects (type "symbol") so
  > they don't have to be converted to character for the call and then
  > back to names internally for the environment lookup.
  >
  > Best,
  >
  > luke
  >
  > On Fri, 13 Nov 2020, Antoine Fabri wrote:
  >
  > > Dear R-devel,
  > >
  > > The doc of exists, get and get0 is unambiguous, x should be an object given
  > > as a character string. However these accept longer inputs. It can lead an
  > > uncareful user to think these functions are vectorized when they're not,
  > > and generally lets through bugs that one might have preferred to trigger
  > > earlier failure.
  > >
  > > ``` r
  > > exists("d")
  > > #> [1] FALSE
  > > exists(c("c", "d"))
  > > #> [1] TRUE
  > > get(c("c", "d"))
  > > #> function (...)  .Primitive("c")
  > > get0(c("c", "d"))
  > > #> function (...)  .Primitive("c")
  > > ```
  > >
  > > I believe these should either fail, or be vectorized, probably the former.
  > >
  > > Thanks,
  > >
  > > Antoine
  > >

Re: [Rd] [External] exists, get and get0 accept silently inputs of length > 1

2020-11-13 Thread luke-tierney


Worth looking into. It would probably cause some check failures, so
would probably be a good idea to run a check across BIOC/CRAN.  At the
same time it would be worth allowing name objects (type "symbol") so
they don't have to be converted to character for the call and then
back to names internally for the environment lookup.

Best,

luke

On Fri, 13 Nov 2020, Antoine Fabri wrote:


Dear R-devel,

The doc of exists, get and get0 is unambiguous, x should be an object given
as a character string. However these accept longer inputs. It can lead an
uncareful user to think these functions are vectorized when they're not,
and generally lets through bugs that one might have preferred to trigger
earlier failure.

``` r
exists("d")
#> [1] FALSE
exists(c("c", "d"))
#> [1] TRUE
get(c("c", "d"))
#> function (...)  .Primitive("c")
get0(c("c", "d"))
#> function (...)  .Primitive("c")
```

I believe these should either fail, or be vectorized, probably the former.

Thanks,

Antoine


Re: [Rd] [External] Re: Change to I() in R 4.1

2020-10-30 Thread luke-tierney


On Fri, 30 Oct 2020, Pages, Herve wrote:



On 10/29/20 23:08, Pages, Herve wrote:
...


I can think of 2 ways to move forward:

1. Keep I()'s current implementation but suppress the warning. We'll
make the necessary adjustments to DataFrame() to repair columns supplied
as I() objects. Note that we would still be in the situation where
I() objects break validObject() but we've been in that situation for
years and so far we've managed to work around it. However this doesn't
mean that validObject() shouldn't be fixed. Note that print(I())
would also need to be fixed (it says "" which is
misleading). Anyways, these 2 issues are separated from the main issue
and can be dealt with later.


1b. A variant of the above could be to use the old implementation for S4
objects only:

  I <- function(x)
  {
  if (isS4(x)) {
  structure(x, class = unique.default(c("AsIs", oldClass(x))))
  } else {
  `class<-`(x, unique.default(c("AsIs", oldClass(x))))
  }
  }

That is probably a good compromise for now.


Not really. The underlying problem is that class<- and attributes<-
(which is what structure() uses) handle the 'class' attribute
differently, and that needs to be fixed. I don't have a strong opinion
on what either should do, but they should do the same thing.

It's probably worth re-thinking the I() mechanism. Modifying the
value, whether by changing the class or an attribute, is going to be
brittle. A little less so for an attribute, but using an attribute
rules out dispatch on the AsIs property.

Best,

luke



I would also suggest that the "package" attribute of the S4 class be
kept around so the code that we use to restore the original object has a
way to restore it exactly, including its full class specification. Right
now, and also with the previous implementation, we cannot do that
because attr(class(x), "package") is lost. So something like this:

  I <- function(x)
  {
  if (isS4(x)) {
  x_class <- class(x)
  new_classes <- c("AsIs", x_class)
  attr(new_classes, "package") <- attr(x_class, "package")
  structure(x, class=new_classes)
  } else {
      `class<-`(x, unique.default(c("AsIs", oldClass(x))))
  }
  }

Thanks,
H.






Re: [Rd] [External] Something is wrong with the unserialize function

2020-10-29 Thread luke-tierney


I found that also; fixed in r79386 in the trunk. Will port to R-patched
shortly.

Best,

luke

On Thu, 29 Oct 2020, Martin Morgan wrote:


This

Index: src/main/altrep.c
===
--- src/main/altrep.c   (revision 79385)
+++ src/main/altrep.c   (working copy)
@@ -275,10 +275,11 @@
SEXP psym = ALTREP_SERIALIZED_CLASS_PKGSYM(info);
SEXP class = LookupClass(csym, psym);
if (class == NULL) {
-   SEXP pname = ScalarString(PRINTNAME(psym));
+   SEXP pname = PROTECT(ScalarString(PRINTNAME(psym)));
R_tryCatchError(find_namespace, pname,
handle_namespace_error, NULL);
class = LookupClass(csym, psym);
+   UNPROTECT(1);
}
return class;
}

seems to remove the warning; I'm guessing that the other SEXP already exist so 
don't need protecting?

Martin Morgan


On 10/29/20, 12:47 PM, "R-devel on behalf of luke-tier...@uiowa.edu" 
 wrote:

   Thanks for the report. Will look into it when I get a chance unless
   someone else gets there first.

   A simpler reprex:

   ## create and serialize a memory-mapped file object
   filePath <- "x.dat"
   con <- file(filePath, "wrb")
   writeBin(rep(0.0,10),con)
   close(con)

   library(simplemmap)
   x <- mmap(filePath, "double")
   saveRDS(x, file = "x.Rds")

   ## in a separate R process:
   gctorture()
   readRDS("x.Rds")

   Looks like a missing PROTECT somewhere.

   Best,

   luke

   On Thu, 29 Oct 2020, Jiefei Wang wrote:

   > Hi all,
   >
   > I am not able to export an ALTREP object when `gctorture` is on in the
   > worker. The package simplemmap can be used to reproduce the problem. See
   > the example below
   > ```
   > ## Create a temporary file
   > filePath <- tempfile()
   > con <- file(filePath, "wrb")
   > writeBin(rep(0.0,10),con)
   > close(con)
   >
   > library(simplemmap)
   > library(parallel)
   > cl <- makeCluster(1)
   > x <- mmap(filePath, "double")
   > ## Turn gctorture on
   > clusterEvalQ(cl, gctorture())
   > clusterExport(cl, "x")
   > ## x is an 0-length vector on the worker
   > clusterEvalQ(cl, x)
   > stopCluster(cl)
   > ```
   >
   > you can find more info on the problem if you manually build a connection
   > between two R processes and export the ALTREP object. See output below
   > ```
   >> con <- socketConnection(port = 1234,server = FALSE)
   >> gctorture()
   >> x <- unserialize(con)
   > Warning message:
   > In unserialize(con) :
   >  cannot unserialize ALTVEC object of class 'mmap_real' from package
   > 'simplemmap'; returning length zero vector
   > ```
   > It seems like  simplemmap did not get loaded correctly on the worker. If
   > you run `library( simplemmap)` before unserializing the ALTREP, there will
   > be no problem. But I suppose we should be able to unserialize objects
   > without preloading the library?
   >
   > This issue can be reproduced on Ubuntu with R version 4.0.2 (2020-06-22)
   > and Windows with R Under development (unstable) (2020-09-03 r79126).
   >
   > Here is the link to simplemmap:
   > https://github.com/ALTREP-examples/Rpkg-simplemmap
   >
   > Best,
   > Jiefei
   >

Re: [Rd] [External] Something is wrong with the unserialize function

2020-10-29 Thread luke-tierney


Thanks for the report. Will look into it when I get a chance unless
someone else gets there first.

A simpler reprex:

## create and serialize a memory-mapped file object
filePath <- "x.dat"
con <- file(filePath, "wrb")
writeBin(rep(0.0,10),con)
close(con)

library(simplemmap)
x <- mmap(filePath, "double")
saveRDS(x, file = "x.Rds")

## in a separate R process:
gctorture()
readRDS("x.Rds")

Looks like a missing PROTECT somewhere.

Best,

luke

On Thu, 29 Oct 2020, Jiefei Wang wrote:


Hi all,

I am not able to export an ALTREP object when `gctorture` is on in the
worker. The package simplemmap can be used to reproduce the problem. See
the example below
```
## Create a temporary file
filePath <- tempfile()
con <- file(filePath, "wrb")
writeBin(rep(0.0,10),con)
close(con)

library(simplemmap)
library(parallel)
cl <- makeCluster(1)
x <- mmap(filePath, "double")
## Turn gctorture on
clusterEvalQ(cl, gctorture())
clusterExport(cl, "x")
## x is an 0-length vector on the worker
clusterEvalQ(cl, x)
stopCluster(cl)
```

you can find more info on the problem if you manually build a connection
between two R processes and export the ALTREP object. See output below
```

con <- socketConnection(port = 1234,server = FALSE)
gctorture()
x <- unserialize(con)

Warning message:
In unserialize(con) :
 cannot unserialize ALTVEC object of class 'mmap_real' from package
'simplemmap'; returning length zero vector
```
It seems like  simplemmap did not get loaded correctly on the worker. If
you run `library( simplemmap)` before unserializing the ALTREP, there will
be no problem. But I suppose we should be able to unserialize objects
without preloading the library?

This issue can be reproduced on Ubuntu with R version 4.0.2 (2020-06-22)
and Windows with R Under development (unstable) (2020-09-03 r79126).

Here is the link to simplemmap:
https://github.com/ALTREP-examples/Rpkg-simplemmap

Best,
Jiefei


Re: [Rd] [External] Re: Coercion function does not work for the ALTREP object

2020-10-08 Thread luke-tierney


For larger atomic vectors (currently >= 64 elements) the complex
assignment process tries to avoid duplicating when only attributes are
updated. This is done with an ALTREP wrapper. The differences in
whether the Duplicate method is called for smaller and larger vectors
are therefore as intended. Ideally there should be no difference for
Coerce. There is a difference because wrappers currently don't
delegate the Coerce method when the wrapped object is an ALTREP. I'll
look into whether that can be addressed without breaking things.
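
A minimal sketch for observing this from R, assuming the 64-element threshold quoted above (it may differ between R versions). The variable names are illustrative, and the output of `.Internal(inspect())` is an internal diagnostic whose format is not guaranteed, so no output is shown here.

```r
## Attribute-only updates on a short and a long vector.
small <- as.numeric(1:10)
large <- as.numeric(1:1000)

attr(small, "dim") <- c(2L, 5L)
attr(large, "dim") <- c(10L, 100L)

## At the R level both behave the same ...
stopifnot(identical(dim(small), c(2L, 5L)),
          identical(dim(large), c(10L, 100L)))

## ... any difference is only visible in the internal representation.
.Internal(inspect(small))
.Internal(inspect(large))
```

Comparing the two inspect() listings is one way to see whether the longer vector is held through a wrapper rather than duplicated.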

Best,

luke

On Thu, 8 Oct 2020, Jiefei Wang wrote:


Hi Gabriel, here is a simple package for reproducing the problem.

https://github.com/Jiefei-Wang/testPkg

Best,
Jiefei

On Thu, Oct 8, 2020 at 5:04 AM Gabriel Becker  wrote:


Jiefei,

Where does the code for your altrep class live?

Thanks,
~G

On Wed, Oct 7, 2020 at 4:25 AM Jiefei Wang  wrote:


Hi all,

The coercion function defined for the ALTREP object will not be called by
R
when an assignment operation implicitly introduces coercion for a large
ALTREP object.

For example, If I create a vector of length 10, the ALTREP coercion
function seems to work fine.
```

x <- 1:10
y <- wrap_altrep(x)
.Internal(inspect(y))

@0x1f9271c0 13 INTSXP g0c0 [REF(2)] I am altrep

y[1] <- 1.0

Duplicating object
Coercing object

.Internal(inspect(y))

@0x1f927c08 14 REALSXP g0c0 [REF(1)] I am altrep
```

However, if I create a vector of length 1024, R will give me a normal
real-type vector
```

x <- 1:1024
y <- wrap_altrep(x)
.Internal(inspect(y))

@0x1f8ddb20 13 INTSXP g0c0 [REF(2)] I am altrep

y[1] <- 1.0
.Internal(inspect(y))

@0x1f0d72a0 14 REALSXP g0c7 [REF(1)] (len=1024, tl=0)
1,2,3,4,5,...
```

Note that the duplicate function is also called for the first example. It
seems like R completely ignores my ALTREP functions in the second example.
I feel this might be designed on purpose, but I do not understand the
reason behind it. Is there any reason why we are not consistent here? Here
is my session info

sessionInfo()
R Under development (unstable) (2020-09-03 r79126)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Best,
Jiefei


Re: [Rd] [External] Thread-safe R functions

2020-09-13 Thread luke-tierney


You should assume that NO functions or macros in the R API are
thread-safe.  If some happen to be now, on some platforms, they are
not guaranteed to be in the future. Even if you use a global lock you
need to keep in mind that any function in the R API can signal an
error and execute a longjmp, so you need to make sure you have set a
top level context in your thread.

Best,

luke

On Sun, 13 Sep 2020, Jiefei Wang wrote:


Hi,

I am curious about whether there exist thread-safe functions in
`Rinternals.h`.  I know that R is single-threaded designed, but for the
simple and straightforward functions like `DATAPTR` and `INTEGER_GET_REGION`,
are these functions safe to call in a multi-thread environment?

Best,
Jiefei


Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread luke-tierney


On Tue, 8 Sep 2020, Martin Maechler wrote:


luke-tierney
on Tue, 8 Sep 2020 09:42:43 -0500 (CDT) writes:


   > On Tue, 8 Sep 2020, Martin Maechler wrote:
   >>>>>>> Martin Maechler
   >>>>>>> on Tue, 8 Sep 2020 10:40:24 +0200 writes:
   >>
   >>>>>>> Hugh Parsonage
   >>>>>>> on Tue, 8 Sep 2020 18:08:11 +1000 writes:
   >>
   >> >> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):
   >>
   >> >> $> R --vanilla
   >> >> x <- c(0L, -2e9:2e9)
   >>
   >> >> # > Segmentation fault
   >>
   >> >> Tried to reproduce on Linux but the above worked as expected. Not an
   >> >> issue merely with the length of the vector; for example, x <-
   >> >> rep_len(1:10, 1e10) works, though the altrep vector must be long to
   >> >> reproduce:
   >>
   >> >> x <- c(0L, -1e9:1e9)  #ok
   >>
   >> >> Segmentation faults occur with the following too:
   >>
   >> >> x <- (-2e9:2e9) + 1L
   >>
   >> > Your operation would "need" (not in theory, but in practice)
   >> > to go from altrep to regular vectors.
   >> > I guess the segfault occurs because of something like this :
   >>
   >> > R asks Windows to hand it a huge amount of memory and Windows replies
   >> > "ok, here is the memory pointer"
   >> > and then R tries to write to there, but illegally (because
   >> > Windows should have told R that it does not really have enough
   >> > memory for that ..).
   >>
   >> > I cannot reproduce the segmentation fault .. but I can confirm
   >> > there is a bug there that shows for me on Windows but not on
   >> > Linux:
   >>
   >> > "My" Windows is on a terminalserver not with too many GB of memory
   >> > (but then in a version of Windows that recognizes that it cannot
   >> > get so much memory):
   >>
   >> > - Here some transcript (thanks to
   >> > using Emacs w/ ESS also on Windows) --
   >>
   >> > R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered Consequences"
   >> > Copyright (C) 2020 The R Foundation for Statistical Computing
   >> > Platform: x86_64-w64-mingw32/x64 (64-bit)
   >>
   >> > R is free software and comes with ABSOLUTELY NO WARRANTY.
   >> > You are welcome to redistribute it under certain conditions.
   >> > Type 'license()' or 'licence()' for distribution details.
   >>
   >> > R is a collaborative project with many contributors.
   >> > Type 'contributors()' for more information and
   >> > 'citation()' on how to cite R or R packages in publications.
   >>
   >> > Type 'demo()' for some demos, 'help()' for on-line help, or
   >> > 'help.start()' for an HTML browser interface to help.
   >> > Type 'q()' to quit R.
   >>
   >> >> x <- (-2e9:2e9) + 1L
   >> > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
   >> >> y <- c(0L, -2e9:2e9)
   >> > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
   >> >> Sys.setenv(LANGUAGE="en")
   >> >> y <- c(0L, -2e9:2e9)
   >> > Error: cannot allocate vector of size 14.9 Gb
   >> >> y <- -1e9:4e9
   >> >> .Internal(inspect(y))
   >> > @0x195a6808 14 REALSXP g0c0 [REF(65535)]  -1000000000 : -294967296 (compact)
   >> >> .Machine$integer.max / 1e9
   >> > [1] 2.147484
   >> >> y <- -1e6:2.2e9
   >> >> .Internal(inspect(y))
   >> > @0x0a11a5d8 14 REALSXP g0c0 [REF(65535)]  -1000000 : -2094967296 (compact)
   >> >> y <- -1e6:2e9
   >> >> .Internal(inspect(y))
   >> > @0x0a13adf0 13 INTSXP g0c0 [REF(65535)]  -1000000 : 2000000000 (compact)
   >> >>
   >> > - end of transcript ---
   >>
   >> > So indeed, no seg.fault, R notices that it can't get 15 GB of
   >> > memory.
   >>
   >> > But the bug is bad news:  We have *silent* integer overflow happening
   >> > according to what  .Internal(inspect(y)) shows...
   >>
   >> >  less bad news: Probably the bug is only in the 'internal inspect' code
   >> > where a format specifier is used in C's printf() that does not work

Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread luke-tierney

nal(inspect(y))
 @0x0a285648 13 INTSXP g0c0 [REF(65535)]  -1000 : 2000000000 (compact)
 > y <- -1e3:2.1e9 ;.Internal(inspect(y))
 @0x19925930 13 INTSXP g0c0 [REF(65535)]  -1000 : 2100000000 (compact)

and here, y is correct, just the printing from
.Internal(inspect(y)) is buggy (probably prints the double as an integer):


It's a '%ld' that probably needs to be '%lld' for Windows. Will fix
sometime soon.
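
The value -2094967296 shown for an end point of 2.2e9 is consistent with the number being truncated to 32 bits, which is the width of `long` on 64-bit Windows. A small arithmetic sketch in R, using a hypothetical helper `wrap32()` that mimics two's-complement wrap-around (not the actual C code), reproduces it:

```r
## Hypothetical helper: interpret x modulo 2^32 as a signed 32-bit value.
wrap32 <- function(x) {
  r <- x %% 2^32
  ifelse(r >= 2^31, r - 2^32, r)
}

wrap32(2.2e9)  # -2094967296
wrap32(4e9)    # -294967296
wrap32(2e9)    # 2e+09, below 2^31 and therefore unchanged
```

This only explains the printed number; as noted above, the vector itself holds the correct values.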

Best,

luke



 > y <- -1e3:2.2e9 ; .Internal(inspect(y))
 @0x195c0178 14 REALSXP g0c0 [REF(65535)]  -1000 : -2094967296 (compact)
 > length(y)
 [1] 2200001001
 > tail(y)
 [1] 2.2e+09 2.2e+09 2.2e+09 2.2e+09 2.2e+09 2.2e+09
 > tail(y) - 2.2e9
 [1] -5 -4 -3 -2 -1  0
 >


Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread luke-tierney

ms much better ... until I do find a bug, may again
   > only in the C code underlying .Internal(inspect(.)) :

   >> y <- -1e9:2e9 ; .Internal(inspect(y))
   > @7d86ac0 13 INTSXP g0c0 [REF(65535)] Error: long vectors not supported yet: ../../../R/src/main/altclasses.c:139
   >>

Indeed, the purported "integer overflow" (above) does not
happen.
It is "only" a  'printf' related bug inside .Internal(inspect(.)) on Windows.

*interestingly*, the above bug I've noticed on (64-bit) Linux
does *not* show on Windows (64-bit), at least not for that case:

On Windows, things are fine as long as they remain (compacted
aka 'ALTREP') INTSXP:

 > y <- -1e3:2e9 ;.Internal(inspect(y))
  @0x0a285648 13 INTSXP g0c0 [REF(65535)]  -1000 : 2000000000 (compact)
 > y <- -1e3:2.1e9 ;.Internal(inspect(y))
  @0x19925930 13 INTSXP g0c0 [REF(65535)]  -1000 : 2100000000 (compact)

and here, y is correct, just the printing from
.Internal(inspect(y)) is buggy (probably prints the double as an integer):

 > y <- -1e3:2.2e9 ; .Internal(inspect(y))
  @0x195c0178 14 REALSXP g0c0 [REF(65535)]  -1000 : -2094967296 (compact)
 > length(y)
  [1] 2200001001
 > tail(y)
  [1] 2.2e+09 2.2e+09 2.2e+09 2.2e+09 2.2e+09 2.2e+09
 > tail(y) - 2.2e9
  [1] -5 -4 -3 -2 -1  0
 >




Re: [Rd] [External] Re: some questions about R internal SEXP types

2020-09-08 Thread luke-tierney


On Tue, 8 Sep 2020, Hadley Wickham wrote:


On Tue, Sep 8, 2020 at 4:12 AM Tomas Kalibera  wrote:



The general principle is that R packages are only allowed to use what is
documented in the R help (? command) and in Writing R Extensions. The
former covers what is allowed from R code in extensions, the latter
mostly what is allowed from C code in extensions (with some references
to Fortran).


Could you clarify what you mean by "documented"? For example,
Rf_allocVector() is mentioned several times in R-exts, but I don't see
anywhere where the inputs and output are precisely described (which is
what I would consider to be documented). Is Rf_allocVector() part of
the API?


For now, documented means mentioned as something extension writers can
use.  Details are in the header files, Rinternals.h for
Rf_allocVector().

Ideally someone would find the time to refactor the header files,
Rinternals.h in particular, so everything in installed headers is
considered in the API and everything else is considered private and
subject to change. Unfortunately that would take a lot of effort, both
technical and political, and I don't see it happening soon. But I'm
happy to be proved wrong.

Best,

luke



Hadley





[Rd] HELPWANTED keyword in bugs.r-project.org

2020-08-05 Thread luke-tierney


Just a quick note to mention that we have added a HELPWANTED keyword
on bugs.r-project.org for tagging bugs and issues where a good
well-tested patch would be particularly appreciated.  You can find the
HELPWANTED issues by selecting the keyword in the search interface or at

https://bugs.r-project.org/bugzilla/buglist.cgi?keywords=HELPWANTED

This URL shows both open and resolved HELPWANTED issues.

At the moment only a handful of issues have been tagged, but there
will be more over time. One of these may be a good place to start if
you are looking for ways to contribute. The technical level varies;
some might be resolved with a small amount of R code; others might
need more extensive changes at the C level.

Best,

luke



Re: [Rd] [External] Re: Invisible names problem

2020-07-22 Thread luke-tierney

p(1:4, length.out=2)
   k <- c(a=1, b=2, c=3, d=4)

   x1 <- unname(k[i])
   x2 <- k[i]
   x2 <- unname(x2)

Are they identical?

   identical(x1,x2) # TRUE

but no

   identical(serialize(x1,NULL),serialize(x2,NULL)) # FALSE

But the problem is with serialization version 3, because:

   identical(serialize(x1,NULL,version = 2),serialize(x2,NULL,version = 2)) # TRUE

It seems that the second one keeps names somewhere invisibly.

Some functions can lose them, e.g. head:

   identical(serialize(head(x1, 20001),NULL),serialize(head(x2, 20001),NULL)) # TRUE

But saveRDS does not (so files are bigger); the tibble family keeps them,
while base data.frame seems to drop them.

In my tests, invisible names appear in the following cases:

  x1 <- k[i] %>% unname()
  x3 <- k[i]; x3 <- unname(x3)
  x5 <- k[i]; x5 <- `names<-`(x5, NULL)
  x6 <- k[i]; x6 <- unname(x6)

but not in these:
  x2 <- unname(k[i])
  x4 <- k[i]; names(x4) <- NULL

What kind of magic is that?

It hit us when we upgraded from 3.5 (when serialization changed) and had an
impact on parallelization (because serialized objects were bigger).
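
A minimal sketch for checking whether the effect shows up on a given R
version. The definition of `i` is cut off above, so the long index vector
used here is an assumption, and the sizes printed will depend on the R
version, so none are stated.

```r
k <- c(a = 1, b = 2, c = 3, d = 4)
# Assumption: a long index vector; the original definition is truncated above.
i <- rep(1:4, length.out = 20000)

x1 <- unname(k[i])   # names dropped within the same expression
x2 <- k[i]
x2 <- unname(x2)     # names dropped in a separate step

# identical() compares the visible values and attributes
same_value <- identical(x1, x2)

# Sizes of the serialized forms, default format versus version 2
sizes <- c(
  x1_default = length(serialize(x1, NULL)),
  x2_default = length(serialize(x2, NULL)),
  x1_v2      = length(serialize(x1, NULL, version = 2)),
  x2_v2      = length(serialize(x2, NULL, version = 2))
)
print(same_value)
print(sizes)
```

A gap between the two default sizes that disappears with version = 2 would
match the report above.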


Re: [R-pkg-devel] [External] Re: incomplete gamma function Fortran subroutine

2020-07-21 Thread luke-tierney


Looking at this section of WRE may help:

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Numerical-analysis-subroutines

Best,

luke

On Tue, 21 Jul 2020, Wang, Zhu wrote:


Thanks to Ben and John. Perhaps the program should call pgamma in R and pass
the number to Fortran. Calling some Fortran subroutines older than R can
trigger concerns when submitting the package to CRAN.

Best,
Zhu

-----Original Message-----
From: John P. Nolan
Sent: Tuesday, July 21, 2020 1:43 PM
To: Wang, Zhu; Ben Bolker; r-package-devel@r-project.org
Subject: RE: [R-pkg-devel] incomplete gamma function Fortran subroutine

As others have said, built-in function pgamma is a (normalized) version of the 
incomplete gamma function!   John

-----Original Message-----
From: R-package-devel On Behalf Of Wang, Zhu
Sent: Tuesday, July 21, 2020 2:16 PM
To: Ben Bolker; r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] incomplete gamma function Fortran subroutine

Sorry for not making myself clear: The Fortran subroutine in an R package needs
to call the incomplete gamma function.

-----Original Message-----
From: R-package-devel On Behalf Of Ben Bolker
Sent: Tuesday, July 21, 2020 12:54 PM
To: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] incomplete gamma function Fortran subroutine

Is there a reason not to use pgamma(), possibly adjusted by a 
(de-)normalizing constant?   (See detailed notes in ?pgamma)
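
As a sketch of the adjustment mentioned here: pgamma(x, a) is the lower
incomplete gamma function divided by gamma(a), so multiplying by gamma(a)
recovers the unnormalized value. The helper name lower_inc_gamma is made up
for illustration.

```r
# Lower incomplete gamma: integral of t^(a-1) * exp(-t) over [0, x]
lower_inc_gamma <- function(a, x) {
  pgamma(x, shape = a) * gamma(a)
}

# Compare with direct numerical integration for one (a, x) pair
a <- 2.5
x <- 1.7
direct <- integrate(function(t) t^(a - 1) * exp(-t), lower = 0, upper = x)$value
all.equal(lower_inc_gamma(a, x), direct, tolerance = 1e-5)
```

For large a, gamma(a) overflows; working on the log scale with
pgamma(..., log.p = TRUE) and lgamma(a) is one way around that.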

On 7/21/20 1:44 PM, Wang, Zhu wrote:

Hello,

In an R function within a package, I would like to call a Fortran subroutine to
compute the lower incomplete gamma function. Any advice will be appreciated.

Thanks!

Zhu Wang

R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel




Re: [R-pkg-devel] [External] Re: Interpret feedback: not write testthat-tests in examples

2020-07-16 Thread luke-tierney


On Thu, 16 Jul 2020, Ben Bolker wrote:

FWIW/in defense of the OP, this is a *very* common idiom in the base R code 
base.  There may be some false positives, but


 find . -name "*.Rd" -exec grep -Fl "stopifnot(" {} \; | grep -v doc | wc

lists 187 files, e.g. from src/library/utils/man/object.size.Rd


And I probably wrote some of them, but I don't think I would now.
As a rule, I think the documentation is clearer without the tests.

On the other hand, we don't all agree on these things.



stopifnot(identical( ## assert that all three are the same :
 unique(substr(as.vector(fsl), 1,5)),
 format(round(as.vector(sl)/1024, 1


On 7/16/20 2:02 PM, luke-tier...@uiowa.edu wrote:

On Thu, 16 Jul 2020, Henrik Bengtsson wrote:


If the point of having, say,

stopifnot(add(1, 2) == sum(c(1, 2)))

is to make it explicit to the reader that your add() function gives
the same results as sum(), then I argue that is valid to use in an
example.  I'm pretty sure I've used that in some of my examples.  There
should be no reason why you can't use other "assert" functions for this
purpose, e.g.

testthat::expect_equal(add(1, 2), sum(c(1, 2)))


If the point is to communicate this to users I would write something like

## The following evaluates to TRUE:
add(1, 2) == sum(c(1, 2))

Using stopifnot just adds clutter that obscures the message for a
human reader; testthat::expect_equal even more so.
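
To make the contrast concrete, here is a sketch with a made-up add()
standing in for the package function. The first form is what would go in
the Rd examples; the second is the same check written as an assertion, of
the kind that would sit under tests/ instead.

```r
# Hypothetical function from the thread: adds two numbers.
add <- function(x, y) x + y

## Example style: show the call and let the reader see the result.
## The following evaluates to TRUE:
add(1, 2) == sum(c(1, 2))

## Test style: silent on success, stops with an error on failure.
stopifnot(add(1, 2) == sum(c(1, 2)))
```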

Best,

luke



Now, if the point of your "assert" statement is only to validate your
package/code, then I agree it should not be in the example code
because it adds clutter.  Such validation should be in a package test.

So, if the former, I suggest you reply to the CRAN Team and explain this.

/Henrik

On Thu, Jul 16, 2020 at 6:28 AM Richel Bilderbeek
 wrote:


Dear R package developers,

I would enjoy some help regarding some feedback I got on my package from 
a CRAN volunteer, as I am unsure how to interpret this correctly.


This is the feedback I got (I added '[do]'):


Please [do] not write testthat-tests in your examples.


I wonder if this is about using `testthat` or using tests in general.

To simplify the context, say I wrote a package with a function called 
`add`, that adds two numbers. My example code would then be something 
like this:


```
library(testthat)

expect_equal(add(1, 2), 3)
```

The first interpretation is about using `testthat`: maybe I should use 
base R (`stopifnot`) or another testing library (`testit`) or hand-craft 
it myself?


The second interpretation is about using tests in example code. I like 
to actively demonstrate that my code works as expected. I checked the 
policies regarding examples, and I could not find a rule that I should 
refrain from doing so.


What is the correct response to this feedback?

Thanks for your guidance, Richel Bilderbeek


Re: [R-pkg-devel] [External] Re: Interpret feedback: not write testthat-tests in examples

2020-07-16 Thread luke-tierney


On Thu, 16 Jul 2020, Henrik Bengtsson wrote:


If the point of having, say,

stopifnot(add(1, 2) == sum(c(1, 2)))

is to make it explicit to the reader that your add() function gives
the same results as sum(), then I argue that is valid to use in an
example.  I'm pretty sure I've used that in some of my examples.  There
should be no reason why you can't use other "assert" functions for this
purpose, e.g.

testthat::expect_equal(add(1, 2), sum(c(1, 2)))


If the point is to communicate this to users I would write something like

## The following evaluates to TRUE:
add(1, 2) == sum(c(1, 2))

Using stopifnot just adds clutter that obscures the message for a
human reader; testthat::expect_equal even more so.

Best,

luke



Now, if the point of your "assert" statement is only to validate your
package/code, then I agree it should not be in the example code
because it adds clutter.  Such validation should be in a package test.

So, if the former, I suggest you reply to the CRAN Team and explain this.

/Henrik

On Thu, Jul 16, 2020 at 6:28 AM Richel Bilderbeek
 wrote:


Dear R package developers,

I would enjoy some help regarding some feedback I got on my package from a CRAN 
volunteer, as I am unsure how to interpret this correctly.

This is the feedback I got (I added '[do]'):


Please [do] not write testthat-tests in your examples.


I wonder if this is about using `testthat` or using tests in general.

To simplify the context, say I wrote a package with a function called `add`, 
that adds two numbers. My example code would then be something like this:

```
library(testthat)

expect_equal(add(1, 2), 3)
```

The first interpretation is about using `testthat`: maybe I should use base R 
(`stopifnot`) or another testing library (`testit`) or hand-craft it myself?

The second interpretation is about using tests in example code. I like to 
actively demonstrate that my code works as expected. I checked the policies 
regarding examples, and I could not find a rule that I should refrain from 
doing so.

What is the correct response to this feedback?

Thanks for your guidance, Richel Bilderbeek


Re: [Rd] [External] Re: R-devel internal errors during check produce?

2020-06-30 Thread luke-tierney

Thanks. Fixed in R-devel in r78754. This was related to a fix for
PR#17809, not the change to unique.default.

Best,

luke

On Tue, 30 Jun 2020, Jan Gorecki wrote:

No packages are being loaded, or even installed.
Did you try running the example on R-devel built with flags I have
provided in this email?
I checked now and it is required to use --enable-strict-barrier to
reproduce the issue.

On Tue, Jun 30, 2020 at 9:02 AM Martin Maechler
 wrote:

Kurt Hornik
on Tue, 30 Jun 2020 06:20:57 +0200 writes:

Jan Gorecki writes:

   >> Thank you both, You are absolutely correct that example
   >> should be minimal, so here it is.

   >> l = list(a=new.env(), b=new.env())
   >> unique(l)

   >> Just for completeness, env_list during check that raises
   >> error

   >> env_list <- list(baseenv(),
   >>   as.environment("package:graphics"),
   >>   as.environment("package:stats"),
   >>   as.environment("package:utils"),
   >>   as.environment("package:methods") )

   >> unique(env_list)

   > Thanks ... but the above work fine for me.  E.g.,

R> l = list(a=new.env(), b=new.env())
R> unique(l)
   > [[1]] 

   > [[2]] 

   > Best -k

Ditto here;  also your (Jan) 2nd example works fine.

So, you must have loaded some (untidy) packages / code which redefine
standard base R behavior ?

Martin


Re: [Rd] [External] Possible ABI change in R 4.0.1

2020-06-29 Thread luke-tierney


EXTPTR_PTR is not in the API so it is not guaranteed to even exist in
the future. The API function for accessing the pointer address is
R_ExternalPtrAddr. See Section 5.13 in WRE.

Sometimes internals need to be changed. In this case a change was made
to deal with a segfault; the commit notice tells you the PR this
addressed.

As it says in Writing R Extensions about defining USE_RINTERNALS:

Also be prepared to adjust your code should R internals change.

The same goes for any use of non-API macros and functions.

Best,

luke


On Mon, 29 Jun 2020, Gábor Csárdi wrote:


Hi all,

it seems that from R 4.0.1 EXTPTR_PTR can be either a macro or a
function, depending on whether USE_RINTERNALS is requested.

Jeroen helped me find that this was in 78592:
https://github.com/wch/r-source/commit/c634fec5214e73747b44d7c0e6f047fefe44667d

This is a problem, because binary packages that are built on R 4.0.1
or R 4.0.2 will potentially not load on R 4.0.0, if they use the
EXTPTR_PTR function.

E.g. this is R 4.0.0 on Linux:


library(Rcpp)

Error: package or namespace load failed for ‘Rcpp’ in dyn.load(file,
DLLpath = DLLpath, ...):
unable to load shared object '/usr/local/lib/R/library/Rcpp/libs/Rcpp.so':
 Error relocating /usr/local/lib/R/library/Rcpp/libs/Rcpp.so:
EXTPTR_PTR: symbol not found
In addition: Warning message:
package ‘Rcpp’ was built under R version 4.0.1

It is easiest to reproduce this on Windows, because the CRAN binaries
are now built on R 4.0.2, so if you install Rcpp on R 4.0.0 from CRAN,
and try to load it you'll get:


library(Rcpp)

Error: package or namespace load failed for 'Rcpp' in inDL(x,
as.logical(local), as.logical(now), ...):
unable to load shared object
'C:/Users/csard/R/win-library/4.0/Rcpp/libs/x64/Rcpp.dll':
 LoadLibrary failure:  The specified procedure could not be found.
In addition: Warning message:
package 'Rcpp' was built under R version 4.0.2

I suppose this change was not intended?
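
One way to spot binaries at risk of this kind of mismatch is to look for
installed packages built under a newer R than the one running. This is only
a sketch: the helper name built_under_newer_r is made up, and it relies on
the "Built" column returned by installed.packages().

```r
# List installed packages whose "Built" field names a newer R version
# than the R session running this code.
built_under_newer_r <- function(lib.loc = NULL) {
  ip <- installed.packages(lib.loc = lib.loc)
  built <- ip[, "Built"]
  ok <- !is.na(built)
  newer <- rep(FALSE, length(built))
  newer[ok] <- numeric_version(built[ok]) > getRversion()
  ip[newer, c("Package", "Version", "Built"), drop = FALSE]
}

res <- built_under_newer_r()
print(res)
```

A package listed here is not necessarily broken; it is just a candidate
for reinstalling from source if loading fails as shown above.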

Best,
Gabor

