from:"\"luke\""

Re: [Rd] [External] API for converting LANGSXP to LISTSXP?

2024-07-06 Thread luke-tierney--- via R-devel


We have long been discouraging the use of pairlists. So no, we will
not do anything to facilitate this conversion; if anything the
opposite. SET_TYPEOF is used more than it should be in the sources.
It is something I would like us to fix sometime, but isn't high
priority.

Best,

luke

On Fri, 5 Jul 2024, Kevin Ushey wrote:


Hi,

A common idiom in the R sources is to convert objects between LANGSXP
and LISTSXP by using SET_TYPEOF. However, this is soon going to be
disallowed in packages. From what I can see, there isn't currently a
direct way to convert between these two object types using the
available API. At the R level, one can convert calls to pairlists
with:


as.call(pairlist(as.symbol("rnorm"), 42))

rnorm(42)

However, the reverse is not possible:


as.pairlist(call("rnorm", 42))

Error in as.pairlist(call("rnorm", 42)) :
 'language' object cannot be coerced to type 'pairlist'

One can do such a conversion via conversion to e.g. an intermediate R
list (VECSXP), but that seems wasteful. Would it make sense to permit
this coercion? Or, is there some other relevant API I'm missing?

For completeness, Rf_coerceVector() also emits the same error above
since it uses the same code path.

Thanks,
Kevin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Non-API updates

2024-06-25 Thread luke-tierney--- via R-devel


On Tue, 25 Jun 2024, Josiah Parry wrote:


Hey folks,

I'm sure many of you all woke to the same message I did: "Please correct
before 2024-07-09 to safely retain your package on CRAN" caused by Non-API
changes to CRAN.

This is quite unexpected as Luke Tierney's June 6th email writes (emphasis
mine):

"An *experimental* *effort* is underway to add annotations to the WRE..."

"*Once things have gelled a bit more *I hope to turn this into a blog
post that will include some examples of moving non-API entry point
uses into compliance."

Since then there has not been any indication of stabilization of the
Non-API changes nor has there been a blog post outlining how to migrate. As
things have been coming and going from the Non-API changes for quite some
time now, we (the royal we, here) have been waiting for an official
announcement from CRAN on the stabilizing changes.


I posted an update to this list a few days ago. If you missed it you
can find it in the archive.


*Can we extend this very short notice to handle the Non-API changes before
removal from CRAN? *


Timing decisions are up to CRAN.


In the case of the 3 packages I have to fix within 2 weeks, these are all
using Rust. These changes require upstream changes to the extendr library.
There are other packages that are also affected here. Making these changes
is a delicate act and requires care and focus. All of the extendr
developers work full time and cannot make addressing these changes their
only priority for the next 2 weeks.


Using non-API entry points is a choice that comes with risks. The ones
leading to WARNINGs for your packages (PRSEEN and SYMVALUE)have been
receiving NOTEs in check results for several weeks. Using
tools:::checkPkgAPI you can see that your packages are referencing a
lot of non-API entry points. Some of these may be added to the API,
but most will not. This may be a good time to look into that.

To minimize disruption we have been adding entry points to the API as
long as it is safe to do so, in some cases against our better
judgment. But ones that are unsafe to use will not be
added. Eventually their declarations will be removed from public
header files and they will be hidden when that is possible. Packages
that have chosen to use these non-API entry points will have to adapt
if they want to pass R CMD check. For now, we will try to first have
use of these entry points result in NOTEs, and then WARNINGs. Once
their declarations are removed and they are hidden, packages using
them will fail to install.


Additionally, a blog post with "examples of moving non-API entry point uses
into compliance" would be very helpful in this endeavor.


WRE now contains a section 'Moving into C API compliance'; that seems
a better option for the moment given that things are still very much
in flux. We will try to add to this section as needed. For the
specific entry points generating WARNINGs for your packages the advice
is simple: stop using them.

Best,

luke



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] non-API entry point Rf_findVarInFrame3 will be removed

2024-06-19 Thread luke-tierney--- via R-devel


The non-API entry point Rf_findVarInFrame3 used by some packages will
be removed as it is not needed in one use case and not working as
intended in the other.

The most common use case, Rf_findVarInFrame3(rho, sym, TRUE), is
equivalent to the simpler Rf_findVarInFrame(rho, sym).

The less common use case is to test for existence of a binding with

findVarInFrame(rho, sym, FALSE) != R_UnboundValue

The intent is that this have no side effects, but that is not the
case: if the binding exists and is an active binding, then its
function will be called to produce a value. This usage should be
replaced with R_existsVarInFrame(rho, sym).

R_existsVarInFrame has been marked as part of the experimental API.
It is not yet clear whether Rf_findVarInFrame will become part of an
API.  If it does, then its semantics will likely have to change; if it
does not, an alternate interface will be provided.

Best,

luke


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] clarifying and adjusting the C API for R

2024-06-18 Thread luke-tierney--- via R-devel


Another quick update:

Over 100 entry points used in packages for which it was safe to do so
have now been marked as part of an API (in some cases after adding
error checking of arguments). These can be used in package C code,
with caveats for ones considered experimental or intended for embedded
use.

The remaining 100 or so non-API entry points used in packages will
require changes in package C code. In some cases the API already
provides safe alternatives to unsafe internal entry points.  In most
other cases it should be possible to develop safer interfaces that
allow packages to accomplish what they need to do in a more robust
way, while giving R maintainers and developers the freedom to make
needed internal changes without disrupting package space.

It will take some time to develop these new interfaces. 'Writing R
extensions' now has a new section 'Moving into C API compliance' that
should help with adapting to these changes.

Best,

luke

On Thu, 6 Jun 2024, luke-tier...@uiowa.edu wrote:


This is an update on some current work on the C API for use in R
extensions.

The internal R implementation makes use of tens of thousands of C
entry points. On Linux and Windows, which support visibility
restrictions, most of these are visible only within the R executble or
shared library. About 1500 are not hidden and are visible to
dynamically loaded shared libraries, such as ones in packages, and to
embedding applications.

There are two main reasons for limiting access to entry points in a
software framework:

- Some entry points are very easy to use in ways that corrupt internal
 data, leading to segfaults or, worse, incorrect computations without
 segfaults.

- Some entry point expose internal structure and other implementation
 details, which makes it hard to make improvements without breaking
 client code that has come to depend on these details.

The API of C entry points that can be used in R extensions, both for
packages and embedding, has evolved organically over many years. The
definition for the current release expressed in the Writing R
Extensions manual (WRE) is roughly:

   An entry point can be used if (1) it is declared in a header file
   in R.home("include"), and (2) if it is documented for use in WRE.

Ideally, (1) would be necessary and sufficient, but for a variety of
reasons that isn't achievable, at least not in the near term. (2) can
be challenging to determine; in particular, it is not amenable to a
computational answer.

An experimental effort is underway to add annotations to the WRE
Texinfo source to allow (2) to be answered unambiguously. The
annotations so far mostly reflect my reading or WRE and may be revised
as they are reviewed by others. The annotated document can be used for
programmatically identifying what is currently considered part of the C
API. The result so far is an experimental function tools:::funAPI():

   > head(tools:::funAPI())
 nameloc apitype
   1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.heapi
   2alloc3DArrayWRE api
   3  allocArrayWRE api
   4   allocLangWRE api
   5   allocListWRE api
   6 allocMatrixWRE api

The 'apitype' field has three possible levels

   | api  | stable (ideally) API |
   | eapi | experimental API |
   | emb  | embedding API|

Entry points in the embedded API would typically only be used in
applications embedding R or providing new front ends, but might be
reasonable to use in packages that support embedding.

The 'loc' field indicates how the entry point is identified as part of
an API: explicit mention in WRE, or declaration in a header file
identified as fully part of an API.

[tools:::funAPI() may not be completely accurate as it relies on
regular expressions for examining header files considered part of the
API rather than proper parsing. But it seems to be pretty close to
what can be achieved with proper parsing.  Proper parsing would add
dependencies on additional tools, which I would like to avoid for
now. One dependency already present is that a C compiler has to be on
the search path and cc -E has to run the C pre-processor.]

Two additional experimental functions are available for analyzing
package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
These examine installed packages.

[These may produce some false positives on macOS; they may or may not
work on Windows at this point.]

Using these tools initially showed around 200 non-API entry points
used across packages on CRAN and BIOC. Ideally this number should be
reduced to zero. This will require a combination of additions to the
API and changes in packages.

Some entry points can safely be added to the API. Around 40 have
already been added to WRE with API annotations; another 40 or so

Re: [Rd] [External] Re: changes in R-devel and zero-extent objects in Rcpp

2024-06-08 Thread luke-tierney--- via R-devel


On Sat, 8 Jun 2024, Ben Bolker wrote:

 The ASAN errors occur *even if the zero-length object is not actually 
accessed*/is used in a perfectly correct manner, i.e. it's perfectly legal in 
base R to define `m <- numeric(0)` or `m <- matrix(nrow = 0, ncol = 0)`, 
whereas doing the equivalent in Rcpp will (now) lead to an ASAN error.


 i.e., these are *not* previously cryptic out-of-bounds accesses that are 
now being revealed, but instead sensible and previously legal definitions of 
zero-length objects that are now causing problems.


  I'm pretty sure I'm right about this, but it's absolutely possible that 
I'm just confused at this point; I don't have a super-simple example to show 
you at the moment. The closest is this example by Mikael Jagan: 
https://github.com/lme4/lme4/issues/794#issuecomment-2155093049


 which shows that if x is a pointer to a zero-length vector (in plain C++ 
for R, no Rcpp is involved), DATAPTR(x) and REAL(x) evaluate to different 
values.


 Mikael further points out that "Rcpp seems to cast a (void *) returned by 
DATAPTR to (double *) when constructing a Vector from a SEXP, rather 
than using the (double *) returned by REAL." So perhaps R-core doesn't want 
to guarantee that these operations give identical answers, in which case Rcpp 
will have to change the way it does things ...


It looks like REAL and friends should also get this check, but it's
not high priority at this point, at least to me. DATAPTR has been
using this check for a while in a barrier build, so you might want to
test there as well. I expect we will activate more integrity checks
from the barrier build on the API client side as things are tidied up.

However: DATAPTR is not in the API and can't be at least in this form:
It allows access to a writable pointer to STRSXP and VECSXP data and
that is too dangerous for memory manager integrity. I'm not sure
exactly how this will be resolve, but be prepared for changes.

Best,

luke



 cheers
  Ben



On 2024-06-08 6:39 p.m., Kevin Ushey wrote:

IMHO, this should be changed in both Rcpp and downstream packages:

1. Rcpp could check for out-of-bounds accesses in cases like these, and 
emit an R warning / error when such an access is detected;


2. The downstream packages unintentionally making these out-of-bounds 
accesses should be fixed to avoid doing that.


That is, I think this is ultimately a bug in the affected packages, but 
Rcpp could do better in detecting and handling this for client packages 
(avoiding a segfault).


Best,
Kevin


On Sat, Jun 8, 2024, 3:06 PM Ben Bolker <mailto:bbol...@gmail.com>> wrote:



     A change to R-devel (SVN r86629 or
https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250 
<https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250>

has changed the handling of pointers to zero-length objects, leading to
ASAN issues with a number of Rcpp-based packages (the commit message
reads, in part, "Also define STRICT_TYPECHECK when compiling
inlined.c.")

    I'm interested in discussion from the community.

    Details/diagnosis for the issues in the lme4 package are here:
https://github.com/lme4/lme4/issues/794
<https://github.com/lme4/lme4/issues/794>, 
with a bit more discussion

about how zero-length objects should be handled.

    The short(ish) version is that r86629 enables the
CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro
<https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104 
<https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104>>,

which returns a trivial pointer (rather than the data pointer that
would
be returned in the normal control flow) if an object has length 0:

/* Attempts to read or write elements of a zero length vector will
     result in a segfault, rather than read and write random memory.
     Returning NULL would be more natural, but Matrix seems to assume
     that even zero-length vectors have non-NULL data pointers, so
     return (void *) 1 instead. Zero-length CHARSXP objects still have 
a

     trailing zero byte so they are not handled. */

    In the Rcpp context this leads to an inconsistency, where `REAL(x)`
is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn
leads to ASAN warnings like

runtime error: reference binding to misaligned address 0x0001
for type 'const double', which requires 8 byte alignment
0x0001: note: pointer points here

     I'm in over my head and hoping for insight into whether this
problem
should be resolved by changing R, Rcpp, or downstream Rcpp packages ...

    cheers
     Ben

Re: [Rd] [External] Re: clarifying and adjusting the C API for R

2024-06-07 Thread luke-tierney--- via R-devel


On Sat, 8 Jun 2024, Reed A. Cartwright wrote:


[You don't often get email from racartwri...@gmail.com. Learn why this is 
important at https://aka.ms/LearnAboutSenderIdentification ]

Would it be reasonable to move the non-API stuff that cannot be hidden
into header files inside a "details" directory (or some other specific
naming scheme)?

That's what I use when I need to separate a public API from an internal API.


As do I, as does everyone else. As I wrote originally: " ... for a
variety of reasons that isn't achievable, at least not in the near
term." Can we leave it at that please?

luke



On Fri, Jun 7, 2024 at 7:30 AM luke-tierney--- via R-devel
 wrote:


On Fri, 7 Jun 2024, Steven Dirkse wrote:


You don't often get email from sdir...@gams.com. Learn why this is important
Thanks for sharing this overview of an interesting and much-needed project.
You mention that R exports about 1500 symbols (on platforms supporting
visibility) but this subject isn't mentioned explicitly again in your note,
so I'm wondering how things tie together.  Un-exported symbols cannot be
part of the API - how would people use them in this case?  In a perfect
world the set of exported symbols could define the API or match it exactly,
but I guess that isn't the case at present.  So I conclude that R exports
extra (i.e. non-API) symbols.  Is part of the goal to remove these extra
exports?


No. We'll hide what we can, but base packages for one need access to
some entry points that should not be in the API, so those have to stay
un-hidden.

Best,

luke



-Steve

On Thu, Jun 6, 2024 at 10:47 AM luke-tierney--- via R-devel
 wrote:
  This is an update on some current work on the C API for use in R
  extensions.

  The internal R implementation makes use of tens of thousands of
  C
  entry points. On Linux and Windows, which support visibility
  restrictions, most of these are visible only within the R
  executble or
  shared library. About 1500 are not hidden and are visible to
  dynamically loaded shared libraries, such as ones in packages,
  and to
  embedding applications.

  There are two main reasons for limiting access to entry points
  in a
  software framework:

  - Some entry points are very easy to use in ways that corrupt
  internal
 data, leading to segfaults or, worse, incorrect computations
  without
 segfaults.

  - Some entry point expose internal structure and other
  implementation
 details, which makes it hard to make improvements without
  breaking
 client code that has come to depend on these details.

  The API of C entry points that can be used in R extensions, both
  for
  packages and embedding, has evolved organically over many years.
  The
  definition for the current release expressed in the Writing R
  Extensions manual (WRE) is roughly:

   An entry point can be used if (1) it is declared in a
  header file
   in R.home("include"), and (2) if it is documented for use
  in WRE.

  Ideally, (1) would be necessary and sufficient, but for a
  variety of
  reasons that isn't achievable, at least not in the near term.
  (2) can
  be challenging to determine; in particular, it is not amenable
  to a
  computational answer.

  An experimental effort is underway to add annotations to the WRE
  Texinfo source to allow (2) to be answered unambiguously. The
  annotations so far mostly reflect my reading or WRE and may be
  revised
  as they are reviewed by others. The annotated document can be
  used for
  programmatically identifying what is currently considered part
  of the C
  API. The result so far is an experimental function
  tools:::funAPI():

  > head(tools:::funAPI())
   nameloc apitype
   1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.heapi
   2alloc3DArrayWRE api
   3  allocArrayWRE api
   4   allocLangWRE api
   5   allocListWRE api
   6 allocMatrixWRE api

  The 'apitype' field has three possible levels

   | api  | stable (ideally) API |
   | eapi | experimental API |
   | emb  | embedding API|

  Entry points in the embedded API would typically only be used in
  applications embedding R or providing new front ends, but might
  be
  reasonable to use in packages that support embedding.

  The 'loc' field indicates how the entry point is identified as
  part of
  an API: explicit mention in WRE, or declaration in a header file
  identifie

Re: [Rd] [External] Re: clarifying and adjusting the C API for R

2024-06-07 Thread luke-tierney--- via R-devel


On Fri, 7 Jun 2024, Hadley Wickham wrote:


Thanks for working on this Luke! We appreciate your efforts to make it
easier to tell what's in the exported API and we're very happy to work with
you on any changes needed to tidyverse/r-lib packages.
Hadley


Thanks. Glad to hear -- I may be reminding you when we hit some of the
tougher challenges down the road :-)

Best,

luke



On Thu, Jun 6, 2024 at 9:47 AM luke-tierney--- via R-devel
 wrote:
  This is an update on some current work on the C API for use in R
  extensions.

  The internal R implementation makes use of tens of thousands of
  C
  entry points. On Linux and Windows, which support visibility
  restrictions, most of these are visible only within the R
  executble or
  shared library. About 1500 are not hidden and are visible to
  dynamically loaded shared libraries, such as ones in packages,
  and to
  embedding applications.

  There are two main reasons for limiting access to entry points
  in a
  software framework:

  - Some entry points are very easy to use in ways that corrupt
  internal
     data, leading to segfaults or, worse, incorrect computations
  without
     segfaults.

  - Some entry point expose internal structure and other
  implementation
     details, which makes it hard to make improvements without
  breaking
     client code that has come to depend on these details.

  The API of C entry points that can be used in R extensions, both
  for
  packages and embedding, has evolved organically over many years.
  The
  definition for the current release expressed in the Writing R
  Extensions manual (WRE) is roughly:

       An entry point can be used if (1) it is declared in a
  header file
       in R.home("include"), and (2) if it is documented for use
  in WRE.

  Ideally, (1) would be necessary and sufficient, but for a
  variety of
  reasons that isn't achievable, at least not in the near term.
  (2) can
  be challenging to determine; in particular, it is not amenable
  to a
  computational answer.

  An experimental effort is underway to add annotations to the WRE
  Texinfo source to allow (2) to be answered unambiguously. The
  annotations so far mostly reflect my reading or WRE and may be
  revised
  as they are reviewed by others. The annotated document can be
  used for
  programmatically identifying what is currently considered part
  of the C
  API. The result so far is an experimental function
  tools:::funAPI():

       > head(tools:::funAPI())
                       name                    loc apitype
       1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
       2        alloc3DArray                    WRE     api
       3          allocArray                    WRE     api
       4           allocLang                    WRE     api
       5           allocList                    WRE     api
       6         allocMatrix                    WRE     api

  The 'apitype' field has three possible levels

       | api  | stable (ideally) API |
       | eapi | experimental API     |
       | emb  | embedding API        |

  Entry points in the embedded API would typically only be used in
  applications embedding R or providing new front ends, but might
  be
  reasonable to use in packages that support embedding.

  The 'loc' field indicates how the entry point is identified as
  part of
  an API: explicit mention in WRE, or declaration in a header file
  identified as fully part of an API.

  [tools:::funAPI() may not be completely accurate as it relies on
  regular expressions for examining header files considered part
  of the
  API rather than proper parsing. But it seems to be pretty close
  to
  what can be achieved with proper parsing.  Proper parsing would
  add
  dependencies on additional tools, which I would like to avoid
  for
  now. One dependency already present is that a C compiler has to
  be on
  the search path and cc -E has to run the C pre-processor.]

  Two additional experimental functions are available for
  analyzing
  package compliance: tools:::checkPkgAPI and
  tools:::checkAllPkgsAPI.
  These examine installed packages.

  [These may produce some false positives on macOS; they may or
  may not
  work on Windows at this point.]

  Using these tools initially showed around 200 non-API entry
  points
  used across packages on CRAN and BIOC. Ideally this number
  should be
  reduced to zero. This will require a combination of additions to
  the
  API and changes in packages.

  Some entry points can safely be added to the API. Around 40 have
  already be

Re: [Rd] [External] Re: clarifying and adjusting the C API for R

2024-06-07 Thread luke-tierney--- via R-devel


On Fri, 7 Jun 2024, Steven Dirkse wrote:


You don't often get email from sdir...@gams.com. Learn why this is important
Thanks for sharing this overview of an interesting and much-needed project.
You mention that R exports about 1500 symbols (on platforms supporting
visibility) but this subject isn't mentioned explicitly again in your note,
so I'm wondering how things tie together.  Un-exported symbols cannot be
part of the API - how would people use them in this case?  In a perfect
world the set of exported symbols could define the API or match it exactly,
but I guess that isn't the case at present.  So I conclude that R exports
extra (i.e. non-API) symbols.  Is part of the goal to remove these extra
exports?


No. We'll hide what we can, but base packages for one need access to
some entry points that should not be in the API, so those have to stay
un-hidden.

Best,

luke



-Steve

On Thu, Jun 6, 2024 at 10:47 AM luke-tierney--- via R-devel
 wrote:
  This is an update on some current work on the C API for use in R
  extensions.

  The internal R implementation makes use of tens of thousands of
  C
  entry points. On Linux and Windows, which support visibility
  restrictions, most of these are visible only within the R
  executble or
  shared library. About 1500 are not hidden and are visible to
  dynamically loaded shared libraries, such as ones in packages,
  and to
  embedding applications.

  There are two main reasons for limiting access to entry points
  in a
  software framework:

  - Some entry points are very easy to use in ways that corrupt
  internal
     data, leading to segfaults or, worse, incorrect computations
  without
     segfaults.

  - Some entry point expose internal structure and other
  implementation
     details, which makes it hard to make improvements without
  breaking
     client code that has come to depend on these details.

  The API of C entry points that can be used in R extensions, both
  for
  packages and embedding, has evolved organically over many years.
  The
  definition for the current release expressed in the Writing R
  Extensions manual (WRE) is roughly:

       An entry point can be used if (1) it is declared in a
  header file
       in R.home("include"), and (2) if it is documented for use
  in WRE.

  Ideally, (1) would be necessary and sufficient, but for a
  variety of
  reasons that isn't achievable, at least not in the near term.
  (2) can
  be challenging to determine; in particular, it is not amenable
  to a
  computational answer.

  An experimental effort is underway to add annotations to the WRE
  Texinfo source to allow (2) to be answered unambiguously. The
  annotations so far mostly reflect my reading or WRE and may be
  revised
  as they are reviewed by others. The annotated document can be
  used for
  programmatically identifying what is currently considered part
  of the C
  API. The result so far is an experimental function
  tools:::funAPI():

       > head(tools:::funAPI())
                       name                    loc apitype
       1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
       2        alloc3DArray                    WRE     api
       3          allocArray                    WRE     api
       4           allocLang                    WRE     api
       5           allocList                    WRE     api
       6         allocMatrix                    WRE     api

  The 'apitype' field has three possible levels

       | api  | stable (ideally) API |
       | eapi | experimental API     |
       | emb  | embedding API        |

  Entry points in the embedded API would typically only be used in
  applications embedding R or providing new front ends, but might
  be
  reasonable to use in packages that support embedding.

  The 'loc' field indicates how the entry point is identified as
  part of
  an API: explicit mention in WRE, or declaration in a header file
  identified as fully part of an API.

  [tools:::funAPI() may not be completely accurate as it relies on
  regular expressions for examining header files considered part
  of the
  API rather than proper parsing. But it seems to be pretty close
  to
  what can be achieved with proper parsing.  Proper parsing would
  add
  dependencies on additional tools, which I would like to avoid
  for
  now. One dependency already present is that a C compiler has to
  be on
  the search path and cc -E has to run the C pre-processor.]

  Two additional experimental functions are available for
  analyzing
  package compliance: tools:::checkPkgAPI and
  tools:::checkAl

[Rd] clarifying and adjusting the C API for R

2024-06-06 Thread luke-tierney--- via R-devel


This is an update on some current work on the C API for use in R
extensions.

The internal R implementation makes use of tens of thousands of C
entry points. On Linux and Windows, which support visibility
restrictions, most of these are visible only within the R executble or
shared library. About 1500 are not hidden and are visible to
dynamically loaded shared libraries, such as ones in packages, and to
embedding applications.

There are two main reasons for limiting access to entry points in a
software framework:

- Some entry points are very easy to use in ways that corrupt internal
  data, leading to segfaults or, worse, incorrect computations without
  segfaults.

- Some entry point expose internal structure and other implementation
  details, which makes it hard to make improvements without breaking
  client code that has come to depend on these details.

The API of C entry points that can be used in R extensions, both for
packages and embedding, has evolved organically over many years. The
definition for the current release expressed in the Writing R
Extensions manual (WRE) is roughly:

An entry point can be used if (1) it is declared in a header file
in R.home("include"), and (2) if it is documented for use in WRE.

Ideally, (1) would be necessary and sufficient, but for a variety of
reasons that isn't achievable, at least not in the near term. (2) can
be challenging to determine; in particular, it is not amenable to a
computational answer.

An experimental effort is underway to add annotations to the WRE
Texinfo source to allow (2) to be answered unambiguously. The
annotations so far mostly reflect my reading or WRE and may be revised
as they are reviewed by others. The annotated document can be used for
programmatically identifying what is currently considered part of the C
API. The result so far is an experimental function tools:::funAPI():

> head(tools:::funAPI())
 nameloc apitype
1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.heapi
2alloc3DArrayWRE api
3  allocArrayWRE api
4   allocLangWRE api
5   allocListWRE api
6 allocMatrixWRE api

The 'apitype' field has three possible levels

| api  | stable (ideally) API |
| eapi | experimental API |
| emb  | embedding API|

Entry points in the embedded API would typically only be used in
applications embedding R or providing new front ends, but might be
reasonable to use in packages that support embedding.

The 'loc' field indicates how the entry point is identified as part of
an API: explicit mention in WRE, or declaration in a header file
identified as fully part of an API.

[tools:::funAPI() may not be completely accurate as it relies on
regular expressions for examining header files considered part of the
API rather than proper parsing. But it seems to be pretty close to
what can be achieved with proper parsing.  Proper parsing would add
dependencies on additional tools, which I would like to avoid for
now. One dependency already present is that a C compiler has to be on
the search path and cc -E has to run the C pre-processor.]

Two additional experimental functions are available for analyzing
package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
These examine installed packages.

[These may produce some false positives on macOS; they may or may not
work on Windows at this point.]

Using these tools initially showed around 200 non-API entry points
used across packages on CRAN and BIOC. Ideally this number should be
reduced to zero. This will require a combination of additions to the
API and changes in packages.

Some entry points can safely be added to the API. Around 40 have
already been added to WRE with API annotations; another 40 or so can
probably be added after review.

The remainder mostly fall into two groups:

- Entry points that should never be used in packages, such as
  SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for that
  matter) that can create inconsistent or corrupt internal state.

- Entry points that depend on the existence of internal structure that
  might be subject to change, such as the existence of promise objects
  or internal structure of environments.

Many, if not most, of these seem to be used in idioms that can either
be accomplished with existing higher-level functions already in the
API, or by new higher level functions that can be created and
added. Working through these will take some time and coordination
between R-core and maintainers of affected packages.

Once things have gelled a bit more I hope to turn this into a blog
post that will include some examples of moving non-API entry point
uses into compliance.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Ma

Re: [Rd] [External] R hang/bug with circular references and promises

2024-05-13 Thread luke-tierney--- via R-devel

On Mon, 13 May 2024, Ivan Krylov wrote:

[You don't often get email from ikry...@disroot.org. Learn why this is 
important at https://aka.ms/LearnAboutSenderIdentification ]

On Mon, 13 May 2024 09:54:27 -0500 (CDT)
luke-tierney--- via R-devel  wrote:

Looks like I added that warning 22 years ago, so that should be enough
notice :-). I'll look into removing it now.

For now I have just changed the internal code to throw an error
if the change would produce a cycle (r86545). This gives

> e <- new.env()
> parent.env(e) <- e
Error in `parent.env<-`(`*tmp*`, value = ) :
  cycles in parent chains are not allowed

Dear Luke,

I've got a somewhat niche use case: as a way of protecting myself
against rogue *.rds files and vulnerabilities in the C code, I've been
manually unserializing "plain" data objects (without anything
executable), including environments, in R [1].

I would try using two passes: create the environments in the first pass
and in a second pass, either over the file or a new object with place holders, 
fill them in.

I see that SET_ENCLOS() is already commented as "not API and probably
should not be <...> used". Do you think there is a way to recreate an
environment, taking the REFSXP entries into account, without
`parent.env<-`?  Would you recommend to abandon the folly of
unserializing environments manually?

SET_ENCLOS is one of a number of SET... functions that are not in the
API and should not be since they are potentially unsafe to use. (One
that is in the API and needs to be removed is SET_TYPEOF). So we would
like to move them out of installed headers and not export them as
entry points. For this particular case most uses I see are something
like

env = allocSExp(ENVSXP);
SET_FRAME(env, R_NilValue);
SET_ENCLOS(env, parent);
SET_HASHTAB(env, R_NilValue);
SET_ATTRIB(env, R_NilValue);

which could just use

 env = R_NewEnv(parent, FALSE, 0);

Best,

luke

--
Best regards,
Ivan

[1]
https://codeberg.org/aitap/unserializeData/src/commit/33d72705c1ee265349b3e369874ce4b47f9cd358/R/unserialize.R#L289-L313

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] R hang/bug with circular references and promises

2024-05-13 Thread luke-tierney--- via R-devel


On Sat, 11 May 2024, Peter Langfelder wrote:


On Sat, May 11, 2024 at 9:34 AM luke-tierney--- via R-devel
 wrote:


On Sat, 11 May 2024, Travers Ching wrote:


The following code snippet causes R to hang. This example might be a
bit contrived as I was experimenting and trying to understand
promises, but uses only base R.


This has nothing to do with promises. You created a cycle in the
environment chain. A simpler variant:

e <- new.env()
parent.env(e) <- e
get("x", e)

This will hang and is not interruptable -- loops searching up
environment chains are too speed-critical to check for interrupts.  It
is, however, pretty easy to check whether the parent change would
create a cycle and throw an error if it would. Need to think a bit
about exactly where the check should go.


FWIW, the help for parent.env already explicitly warns against using
parent.env <-:

The replacement function ‘parent.env<-’ is extremely dangerous as
it can be used to destructively change environments in ways that
violate assumptions made by the internal C code.  It may be
removed in the near future.


Looks like I added that warning 22 years ago, so that should be enough
notice :-). I'll look into removing it now.

Best,

luke



Peter



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] R hang/bug with circular references and promises

2024-05-10 Thread luke-tierney--- via R-devel


On Sat, 11 May 2024, Travers Ching wrote:


The following code snippet causes R to hang. This example might be a
bit contrived as I was experimenting and trying to understand
promises, but uses only base R.

It looks like it is looking for "not_a_variable" recursively but since
it doesn't exist it goes on indefinitely.

x0 <- new.env()
x1 <- new.env(parent = x0)
parent.env(x0) <- x1
delayedAssign("v", not_a_variable, eval.env=x1)
delayedAssign("w", v, assign.env=x1, eval.env=x0)
x1$w


This has nothing to do with promises. You created a cycle in the
environment chain. A simpler variant:

e <- new.env()
parent.env(e) <- e
get("x", e)

This will hang and is not interruptable -- loops searching up
environment chains are too speed-critical to check for interrupts.  It
is, however, pretty easy to check whether the parent change would
create a cycle and throw an error if it would. Need to think a bit
about exactly where the check should go.

Best,

luke



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Patches for CVE-2024-27322

2024-04-30 Thread Tierney, Luke via R-devel

That should do it

Sent from my iPad

On Apr 30, 2024, at 9:57 AM, Iñaki Ucar  wrote:


Many thanks both. I'll wait for Luke's confirmation to trigger the update with 
the backported fix.

Iñaki

On Tue, 30 Apr 2024 at 12:42, Dirk Eddelbuettel 
mailto:e...@debian.org>> wrote:

On 30 April 2024 at 11:59, peter dalgaard wrote:
| svn diff -c 86235 ~/r-devel/R

Which is also available as
  
https://github.com/r-devel/r-svn/commit/f7c46500f455eb4edfc3656c3fa20af61b16abb7

Dirk

| (or 86238 for the port to the release branch) should be easily backported.
|
| (CC Luke in case there is more to it)
|
| - pd
|
| > On 30 Apr 2024, at 11:28 , Iñaki Ucar 
mailto:iu...@fedoraproject.org>> wrote:
| >
| > Dear R-core,
| >
| > I just received notification of CVE-2024-27322 [1] in RedHat's Bugzilla. We
| > updated R to v4.4.0 in Fedora rawhide, F40, EPEL9 and EPEL8, so no problem
| > there. However, F38 and F39 will stay at v4.3.3, and I was wondering if
| > there's a specific patch available, or if you could point me to the commits
| > that fixed the issue, so that we can cherry-pick them for F38 and F39.
| > Thanks.
| >
| > [1] https://nvd.nist.gov/vuln/detail/CVE-2024-27322
| >
| > Best,
| > --
| > Iñaki Úcar
| >
| > [[alternative HTML version deleted]]
| >
| > __
| > R-devel@r-project.org<mailto:R-devel@r-project.org> mailing list
| > https://stat.ethz.ch/mailman/listinfo/r-devel
|
| --
| Peter Dalgaard, Professor,
| Center for Statistics, Copenhagen Business School
| Solbjerg Plads 3, 2000 Frederiksberg, Denmark
| Phone: (+45)38153501
| Office: A 4.23
| Email: pd@cbs.dk<mailto:pd@cbs.dk>  Priv: 
pda...@gmail.com<mailto:pda...@gmail.com>
|
| __
| R-devel@r-project.org<mailto:R-devel@r-project.org> mailing list
| https://stat.ethz.ch/mailman/listinfo/r-devel

--
dirk.eddelbuettel.com<http://dirk.eddelbuettel.com/> | @eddelbuettel | 
e...@debian.org<mailto:e...@debian.org>


--
Iñaki Úcar

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] View() segfaulting ...

2024-04-25 Thread luke-tierney--- via R-devel


I saw it also on some of my Ubuntu builds, but the issue went away
after a make clean/make, so maybe give that a try.

Best,

luke

On Wed, 24 Apr 2024, Ben Bolker wrote:

 I'm using bleeding-edge R-devel, so maybe my build is weird. Can anyone 
else reproduce this?


 View() seems to crash on just about anything.

View(1:3)
*** stack smashing detected ***: terminated
Aborted (core dumped)

 If I debug(View) I get to the last line of code with nothing obviously 
looking pathological:


Browse[1]>
debug: invisible(.External2(C_dataviewer, x, title))
Browse[1]> x
$x
[1] "1" "2" "3"

Browse[1]> title
[1] "Data: 1:3"
Browse[1]>
*** stack smashing detected ***: terminated
Aborted (core dumped)




R Under development (unstable) (2024-04-24 r86483)
Platform: x86_64-pc-linux-gnu
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS/LAPACK: 
/usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK 
version 3.10.0


locale:
[1] LC_CTYPE=en_CA.UTF-8   LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8   LC_NAME=C
[9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

time zone: America/Toronto
tzcode source: system (glibc)

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.5.0

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Is ALTREP "non-API"?

2024-04-24 Thread luke-tierney--- via R-devel


On Wed, 24 Apr 2024, Hadley Wickham wrote:


A few more thoughts based on a simple question: how do you determine the
length of a vector?

Rf_length() is used in example code in R-exts, but I don't think it's
formally documented anywhere (although it's possible I missed it). Is using
in an example sufficient to consider a function to be part of the public
API? If so, SET_TYPEOF() is used in a number of examples, and hence used by
CRAN packages, but is no longer considered part of the public API.

Rf_xlength() doesn't appear to be mentioned anywhere in R-exts. Does this
imply that long vectors are not part of the exported API? Or is there some
other way we should be determining the length of such vectors?

Are the macro variants LENGTH and XLENGTH part of the exported API? Are we
supposed to use them or avoid them?

Relatedly, I presume that LOGICAL() is the way we're supposed to extract
logical values from a vector, but it isn't documented in R-exts, suggesting
that it's not part of the public API?


My pragmatic approach to deciding if an entry point is usable in a
package is to

grep for it in the installed headers

grep for it in WRE

if those are good, check the text in both places to make sure it
doesn't tell me not to use is

The first two can be automated; the text reading can't for now.

One place this runs into trouble is when the prose in WRE doesn't
explicitly mention the entry point, but says something like 'this one
and similar ones are OK'. A couple of years ago I worked on improving
some of those by explicitly adding some of those implicit ones, which
did sometimes make the text more cumbersome. I'm pretty sure I added
LOGICAL() and RAW() at that point (but may be mis-remebering); they
are there now. In some other cases I left the text alone but added
index entries. That makes them findable with a text search. I think I
got most that can be handled that way, but there may be some others
left. Far from ideal, but at least a step forward.



---

It's also worth pointing out where R-exts does well, with the documentation
of utility functions (
https://cran.r-project.org/doc/manuals/R-exts.html#Utility-functions). I
think this is what most people would consider documentation to imply, i.e.
a list of input arguments/types, the output type, and basic notes on their
operation.
---

Finally, it's worth noting that there's some lingering ill feelings over
how the connections API was treated. It was documented in R-exts only to be
later removed, including expunging mentions of it in the news. That's
obviously water under the bridge, but I do believe that there is
the potential for the R core team to build goodwill with the community if
they are willing to engage a bit more with the users of their APIs.


As you well know R-core is not a monolith. There are several R-core
members who also are not happy about how that played out and where
that stands now. But there was and is no viable option other than to
agree to disagree. There is really no upside to re-litigating this
now.

Best,

luke



Hadley

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Is ALTREP "non-API"?

2024-04-24 Thread luke-tierney--- via R-devel

o learning that we weren't
supposed to.


Making a list and hoping that it will remain up to date is not
realistic.  The only way that would work reliably is if the list could
be programmatically generated, for example by parsing installed
headers for declarations and caveats as above. Which would be possible
with changes like the ones listed above.

Best,

luke



Hadley





--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu/
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Calling applyClosure from a package?

2024-04-14 Thread luke-tierney--- via R-devel


On Sun, 14 Apr 2024, Matthew Kay wrote:


[You don't often get email from matthew@u.northwestern.edu. Learn why this 
is important at https://aka.ms/LearnAboutSenderIdentification ]

Hi,

Short version of my question: Rf_applyClosure was marked
attribute_hidden in Oct 2023, and I am curious why and if there is an
alternative interface to it planned.


applyClosure has never been part of the API and was/is not intended
for use by packages. Keeping things like this internal is essential to
give us flexibility to make needed improvements to the basic engine.
Moving this out of the installed headers and marking it as not to be
exported merely clarifies that it is internal.


Long version:

I have been toying with building a package that makes it easier to do
non-standard evaluation directly using promises, rather than wrapping
these in a custom type (like e.g. rlang does). The advantage of this
approach is that it should be fully compatible with functions that use
the standard R functions for NSE and inspecting function context, like
substitute(), match.call(), or parent.frame(). And indeed, it works!
-- in R 4.3, that is. The prototype version of the package is here:
https://github.com/mjskay/uneval  (the relevant function to my
question is probably do_invoke, in R/invoke.R).

While testing on R-devel, I noticed that Rf_applyClosure(), which used
to be exported, is now marked with attribute_hidden. I traced the
change to this commit in Oct 2023:
https://github.com/r-devel/r-svn/commit/57dbe8ad471c8a34314ee74362ad479db03c033a

However, the commit message did not give me clarity on the reason for
the change, and I have not been able to find mention of this change in
R-devel, R-package-devel, or the R bug tracker.
So, I am curious why this function is no longer exported and if there
is an alternative function planned to take its place.

Neither Rf_eval nor do.call can do what I need to fully support
rlang-style NSE using base R. The problem is that I need to be able to
manually set up the list of promises provided as arguments to the
function.

I fully understand that the answer to my question might be "don't do
that" ;).


That would be my advice: Don't do that. The API does not provide an
interface for working with promises; in fact the existence of promises
is not guaranteed in the future. Some packages have unfortunately made
use of some internal functions related to promises. For the ones on
CRAN we will work with the maintainers to find alternate
approaches. This may mean adding some functions to the API for dealing
with some lazy-evaluation-related features at a higher level.

Best,

luke


But I will humbly suggest that it would be really nice to be
able to do NSE that can capture expressions with heterogeneous
environments and pass these to functions in a way that is compatible
with existing R functions that do NSE. The basic tools to do it are
there in R 4.3, I think...

Thanks for the help!

---Matt

--
Matthew Kay
Associate Professor
Computer Science & Communication Studies
Northwestern University
matthew@u.northwestern.edu
http://www.mjskay.com/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Repeated library() of one package with different include.only= entries

2024-04-11 Thread luke-tierney--- via R-devel

On Thu, 11 Apr 2024, Duncan Murdoch wrote:

On 11/04/2024 7:04 a.m., Martin Maechler wrote:

Michael Chirico
 on Mon, 8 Apr 2024 10:19:29 -0700 writes:

 > Right now, attaching the same package with different include.only= 
has no

 > effect:

 > library(Matrix, include.only="fac2sparse")
 > library(Matrix)
 > ls("package:Matrix")
 > # [1] "fac2sparse"

 > ?library does not cover this case -- what is covered is the 
_loading_

 > behavior of repeated calls:

 >> [library and require] check and update the list of currently 
attached

 > packages and do not reload a namespace which is already loaded

 > But here we're looking at the _attach_ behavior of repeated calls.

 > I am particularly interested in allowing the exports of a package to 
be

 > built up gradually:

 > library(Matrix, include.only="fac2sparse")
 > library(Matrix, include.only="isDiagonal") # want: 
ls("package:Matrix") -->

 > c("fac2sparse", "isDiagonal")
 > ...

 > It seems quite hard to accomplish this at the moment. Is the 
behavior to

 > ignore new inclusions intentional? Could there be an argument to get
 > different behavior?

As you did not get an answer yet, ..., some remarks by an
R-corer who has tweaked library() behavior in the past :

- The `include.only = *` argument to library() has been a
   *relatively* recent addition {given the 25+ years of R history}:

   It was part of the extensive new features by Luke Tierney for
   R 3.6.0  [r76248 | luke | 2019-03-18 17:29:35 +0100], with NEWS entry

 • library() and require() now allow more control over handling
   search path conflicts when packages are attached. The policy is
   controlled by the new conflicts.policy option.

- I haven't seen these (then) new features been used much, unfortunately,
   also not from R-core members, but I'd be happy to be told a different 
story.

For the above reasons, it could well be that the current
implementation {of these features} has not been exercised a lot
yet, and limitations as you found them haven't been noticed yet,
or at least not noticed on the public R mailing lists, nor
otherwise by R-core (?).

Your implicitly proposed new feature (or even *changed*
default behavior) seems to make sense to me -- but as alluded
to, above, I haven't been a conscious user of any
'library(.., include.only = *)' till now.

I don't think it makes sense.  I would assume that

 library(Matrix, include.only="isDiagonal")

implies that only `isDiagonal` ends up on the search path, i.e. 
"include.only" means "include only", not "include in addition to whatever 
else has already been attached".

I think a far better approach to solve Michael's problem is simply to use

 fac2sparse <- Matrix::fac2sparse
 isDiagonal <- Matrix::isDiagonal

instead of messing around with the user's search list, which may have been 
intentionally set to include only one of those.

So I'd suggest changing the docs to say

"[library and require] check and update the list of currently attached
packages and do not reload a namespace which is already loaded.  If a package 
is already attached, no change will be made."

?library could also mention using detach() followed by library() or
attachNamespace() with a new include.only specification.

Best,

luke

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu/
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Bug in out-of-bounds assignment of list object to expression() vector

2024-04-05 Thread luke-tierney--- via R-devel


Thanks for the report. Fixed in R-devel and R-patched (both
R-4-4-branch and R-4-3-branch).

On Fri, 5 Apr 2024, June Choe wrote:


[You don't often get email from jchoe...@gmail.com. Learn why this is important 
at https://aka.ms/LearnAboutSenderIdentification ]

There seems to be a bug in out-of-bounds assignment of list objects to an
expression() vector. Tested on release and devel. (Many thanks to folks
over at Mastodon for the help narrowing down this bug)

When assigning a list into an existing index, it correctly errors on
incompatible type, and the expression vector is unchanged:

```
x <- expression(a,b,c)
x[[3]] <- list() # Error
x
#> expression(a, b, c)
```

When assigning a list to an out of bounds index (ex: the next, n+1 index),
it errors the same but now changes the values of the vector to NULL:

```
x <- expression(a,b,c)
x[[4]] <- list() # Error
x
#> expression(NULL, NULL, NULL)
```

Curiously, this behavior disappears if a prior attempt is made at assigning
to the same index, using a different incompatible object that does not
share this bug (like a function):

```
x <- expression(a,b,c)
x[[4]] <- base::sum # Error
x[[4]] <- list() # Error
x
#> expression(a, b, c)
```

That "protection" persists until x[[4]] is evaluated, at which point the
bug can be produced again:

```
x[[4]] # Error
x[[4]] <- list() # Error
x
#> expression(NULL, NULL, NULL)
```

Note that `x` has remained a 3-length vector throughout.

Best,
June

   [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Bug in out-of-bounds assignment of list object to expression() vector

2024-04-05 Thread luke-tierney--- via R-devel


On Fri, 5 Apr 2024, Ivan Krylov via R-devel wrote:


On Fri, 5 Apr 2024 08:15:20 -0400
June Choe  wrote:


When assigning a list to an out of bounds index (ex: the next, n+1
index), it errors the same but now changes the values of the vector
to NULL:

```
x <- expression(a,b,c)
x[[4]] <- list() # Error
x
#> expression(NULL, NULL, NULL)
```

Curiously, this behavior disappears if a prior attempt is made at
assigning to the same index, using a different incompatible object
that does not share this bug (like a function)


Here's how the problem happens:

1. The call lands in src/main/subassign.c, do_subassign2_dflt().

2. do_subassign2_dflt() calls SubassignTypeFix() to prepare the operand
for the assignment.

3. Since the assignment is "stretching", SubassignTypeFix() calls
EnlargeVector() to provide the space for the assignment.

The bug relies on `x` not being IS_GROWABLE(), which may explain
why a plain x[[4]] <- list() sometimes doesn't fail.

The future assignment result `x` is now expression(a, b, c, NULL), and
the old `x` set to expression(NULL, NULL, NULL) by SET_VECTOR_ELT(newx,
i, VECTOR_ELT(x, i)); CLEAR_VECTOR_ELT(x, i); during EnlargeVector().

4. But then the assignment fails, raising the error back in
do_subassign2_dflt(), because the assignment kind is invalid: there is
no way to put data.frames into an expression vector. The new resized
`x` is lost, and the old overwritten `x` stays there.

Not sure what the right way to fix this is. It's desirable to avoid
shallow_duplicate(x) for the overwriting assignments, but then the
sub-assignment must either succeed or leave the operand untouched.
Is there a way to perform the type check before overwriting the operand?


Yes. There are two places where there are some checks, one early and
the other late. The early one is explicitly letting this one through
and shouldn't. So a one line change would address this particular
problem. But it would be a good idea to review why we the late checks
are needed at all and maybe change that. I'll look into it.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Ordered comparison operators on language objects will signal errors

2024-03-04 Thread luke-tierney--- via R-devel


Comparison operators == and != can be used on language objects
(i.e. call objects and symbols). The == operator in particular often
seems to be used as a shorthand for calling identical(). The current
implementation involves comparing deparsed calls as strings. This has
a number of drawbacks and we would like to transition to a more robust
and efficient implementation. As a first step, R-devel will soon be
modified to signal an error when the ordered comparison operators <,
<=, >, >= are used on language objects. A small number of CRAN and
BIOC packages will fail after this change. If you want to check your
packages or code before the change is committed you can run the
current R-devel with the environment variable setting

_R_COMPARE_LANG_OBJECTS=eqonly

where using such a comparison now produces

> quote(x + y) > 1
Error in quote(x + y) > 1 :
  comparison (>) is not possible for language types

Best,

luke


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall      email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Get list of active calling handlers?

2024-02-07 Thread luke-tierney--- via R-devel


On Tue, 6 Feb 2024, Duncan Murdoch wrote:

The SO post https://stackoverflow.com/q/77943180 tried to call 
globalCallingHandlers() from a function, and it failed with the error message 
"should not be called with handlers on the stack".  A much simpler 
illustration of the same error comes from this line:


 try(globalCallingHandlers(warning = function(e) e))

The problem here is that try() sets an error handler, and 
globalCallingHandlers() sees it and aborts.


If I call globalCallingHandlers() with no arguments, I get a list of 
currently active global handlers.  Is there also a way to get a list of 
active handlers, including non-global ones (like the one try() added in the 
line above)?


There is not. The internal stack is not safe to allow to escape to the
R level.  It would be possible to write a reflection function to
provide some information, but it would be a fair bit of work to design
and I don't think would be of enough value to justify that.

The original SO question would be better addressed to
Posit/RStudio. Someone with enough motivation might also be able to
figure out an answer by looking at the source code at
https://github.com/rstudio/rstudio.

Best,

luke




Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] readChar() could read the whole file by default?

2024-01-26 Thread luke-tierney--- via R-devel


On Fri, 26 Jan 2024, Michael Chirico wrote:


I am curious why readLines() has a default (n=-1L) to read the full
file while readChar() has no default for nchars= (i.e., readChar(file)
is an error). Is there a technical reason for this?

I often[1] see code like paste(readLines(f), collapse="\n") which
would be better served by readChar(), especially given issues with the
global string cache I've come across[2]. But lacking the default, the
replacement might come across less clean.


The string cache seems like a very dark pink herring to me. The fact
that the lines are allocated on the heap might create an issue; the
cache isn't likely to add much to that. In any case I would need to
see a realistic example to convince me this is worth addressing on
performance grounds.

I don't see any reason in principle not to have readChar and readBin
read the entire file if n = -1 (others might) but someone would need
to write a patch to implement that.

Best,

luke


For my own purposes the incantation readChar(file, file.size(file)) is
ubiquitous. Taking CRAN code[3] as a sample[4], 41% of readChar()
calls use either readChar(f, file.info(f)$size) or readChar(f,
file.size(f))[5].

Thanks for the consideration and feedback,
Mike C

[1] e.g. a quick search shows O(100) usages in CRAN packages:
https://github.com/search?q=org%3Acran+%2Fpaste%5B%28%5D%5Cs*readLines%5B%28%5D.*%5B%29%5D%2C%5Cs*collapse%5Cs*%3D%5Cs*%5B%27%22%5D%5B%5C%5C%5D%2F+lang%3AR&type=code,
and O(1000) usages generally on GitHub:
https://github.com/search?q=lang%3AR+%2Fpaste%5B%28%5D%5Cs*readLines%5B%28%5D.*%5B%29%5D%2C%5Cs*collapse%5Cs*%3D%5Cs*%5B%27%22%5D%5B%5C%5C%5D%2F+lang%3AR&type=code
[2] AIUI the readLines() approach "pollutes" the global string cache
with potentially 1000s/1s of strings for each line, only to get
them gc()'d after combining everything with paste(collapse="\n")
[3] The mirror on GitHub, which includes archived packages as well as
current (well, eventually-consistent) versions.
[4] Note that usage in packages is likely not representative of usage
in scripts, e.g. I often saw readChar(f, 1), or eol-finders like
readChar(f, 500) + grep("[\n\r]"), which makes more sense to me as
something to find in package internals than in analysis scripts. FWIW
I searched an internal codebase (scripts and packages) and found 70%
of usages reading the full file.
[5] repro: 
https://gist.github.com/MichaelChirico/247ea9500460dca239f031e74bdcf76b
requires GitHub PAT in env GITHUB_PAT for API permissions.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-18 Thread luke-tierney--- via R-devel


On Thu, 18 Jan 2024, Ivan Krylov via R-devel wrote:


В Tue, 16 Jan 2024 14:16:19 -0500
Dipterix Wang  пишет:


Could you recommend any packages/functions that compute hash such
that the source references and sexpinfo_struct are ignored? Basically
a version of `serialize` that convert R objects to raw without
storing the ancillary source reference and sexpinfo.


I can show how this can be done, but it's not currently on CRAN or even
a well-defined package API. I have adapted a copy of R's serialize()
[*] with the following changes:

* Function bytecode and flags are ignored:

f <- function() invisible()
depcache:::hash(f, 2) # This is plain FNV1a-64 of serialize() output
# [1] "9b7a1af5468deba4"
.Call(depcache:::C_hash2, f) # This is the new hash
[1] 91 5f b8 a1 b0 6b cb 40
f() # called once: function gets the MAYBEJIT_MASK flag
depcache:::hash(f, 2)
# [1] "7d30e05546e7a230"
.Call(depcache:::C_hash2, f)
# [1] 91 5f b8 a1 b0 6b cb 40
f() # called twice: function now has bytecode
depcache:::hash(f, 2)
# [1] "2a2cba4150e722b8"
.Call(depcache:::C_hash2, f)
# [1] 91 5f b8 a1 b0 6b cb 40 # new hash stays the same

* Source references are ignored:

.Call(depcache:::C_hash2, \( ) invisible( ))
# [1] 91 5f b8 a1 b0 6b cb 40 # compare vs. above

# For quoted function definitions, source references have to be handled
# differently
.Call(depcache:::C_hash2, quote(function(){}))
[1] 58 0d 44 8e d4 fd 37 6f
.Call(depcache:::C_hash2, quote(\( ){  }))
[1] 58 0d 44 8e d4 fd 37 6f

* ALTREP is ignored:

identical(1:10, 1:10+0L)
# [1] TRUE
identical(serialize(1:10, NULL), serialize(1:10+0L, NULL))
# [1] FALSE
identical(
.Call(depcache:::C_hash2, 1:10),
.Call(depcache:::C_hash2, 1:10+0L)
)
# [1] TRUE

* Strings not marked as bytes are encoded into UTF-8:

identical('\uff', iconv('\uff', 'UTF-8', 'latin1'))
# [1] TRUE
identical(
serialize('\uff', NULL),
serialize(iconv('\uff', 'UTF-8', 'latin1'), NULL)
)
# [1] FALSE
identical(
.Call(depcache:::C_hash2, '\uff'),
.Call(depcache:::C_hash2, iconv('\uff', 'UTF-8', 'latin1'))
)
# [1] TRUE

* NaNs with different payloads (except NA_numeric_) are replaced by
  R_NaN.

One of the many downsides to the current approach is that we rely on
the non-API entry point getPRIMNAME() in order to hash builtins.
Looking at the source code for identical() is no help here, because it
uses the private PRIMOFFSET macro.

The bitstream being hashed is also, unfortunately, not exactly
compatible with R serialization format version 2: I had to ignore the
LEVELS of the language objects being hashed both because identical()
seems to ignore those and because I was missing multiple private
definitions (e.g. the MAYBEJIT flag) to handle them properly.

Then there's also the problem of immediate bindings [**]: I've seen bits
of vctrs, rstudio, rlang blow up when calling CAR() on SEXP objects that
are not safe to handle this way, but R_expand_binding_value() (used by
serialize()) is again a private function that is not accessible from
packages. identical() won't help here, because it compares reference
objects (which may or may not contain such immediate bindings) by their
pointer values instead of digging down into them.


What does 'blow up' mean? If it is anything other than signal a "bad
binding access" error then it would be good to have more details.

Best,

luke


Dropping the (already violated) requirement to be compatible with R
serialization bitstream will make it possible to simplify the code
further.

Finally:

a <- new.env()
b <- new.env()
a$x <- b$x <- 42
identical(a, b)
# [1] FALSE
.Call(depcache:::C_hash2, a)
# [1] 44 21 f1 36 5d 92 03 1b
.Call(depcache:::C_hash2, b)
# [1] 44 21 f1 36 5d 92 03 1b

...but that's unavoidable when looking at frozen object contents
instead of their live memory layout.

If you're interested, here's the development version of the package:
install.packages('depcache',contriburl='https://aitap.github.io/Rpackages')




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] UseMethod forwarding of local variables

2023-10-20 Thread luke-tierney


UseMethod has since the beginning had the 'feature' that local
variables in the generic are added to the environment in which the
method body is evaluated. This is documented in ?UseMethod and
R-lang.texi, but use of this 'feature' has been explicitly discouraged
in R-lang.texi for many years.

This is an unfortunate design decision for a number of reasons (see
below), so the plan is to remove this 'feature' in the next major
release.

Fortunately only a small number of packages on CRAN (see below) seem
to make use of this feature directly; a few more as reverse
dependencies.  The maintainers of the directly affected packages will
be notified separately.

Current R-devel allows you to set the environment variable
R_USEMETHOD_FORWARD_LOCALS=none to run R without this feature or
R_USEMETHOD_FORWARD_LOCALS=error to signal an error when a forwarded
variable's value is used.

Some more details:

An example:

> foo <- function(x) { yyy <- 77; UseMethod("foo") }
> foo.bar <- function(x) yyy
> foo(structure(1, class = "bar"))
[1] 77

Some reasons the design is a bad idea:

- You can't determine what a method does without knowing what the
  generic it will be called from looks like.

- Code analysis (codetools, the compiler) can't analyze method
  code reliably.

- You can't debug a method on its own. For the foo() example,

> foo.bar(structure(1, class = "bar"))
Error in foo.bar(structure(1, class = "bar")) : object 'yyy' not found

- A method relying on these variables won't work when reached via NextMethod:

> foo.baz <- function(x) NextMethod("foo")
> foo(structure(2, class = c("baz", "bar")))
Error in foo.bar(structure(2, class = c("baz", "bar"))) :
  object 'yyy' not found

The directly affected CRAN packages I have identified are:

- actuar
- quanteda
- optmatch
- rlang
- saeRobust
- Sim.DiffProc
- sugrrants
- texmex

Some of these fail with the environment set to 'error' but not to
'none', so they are getting a value from somewhere else that may or
may not be right.

Affected as revdeps of optmatch:

- cobalt
- htetree
- jointVIP
- MatchIt
- PCAmatchR
- rcbalance
- rcbsubset
- RItools
- stratamatch

Affected as revdeps of texmex:

- lax
- mobirep

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] On PRINTNAME() encoding, EncodeChar(), and being painted into a corner

2023-09-22 Thread luke-tierney

n/eval.c
+++ src/main/eval.c
@@ -1161,7 +1161,7 @@ SEXP eval(SEXP e, SEXP rho)
const char *n = CHAR(PRINTNAME(e));
-   if(*n) errorcall(getLexicalCall(rho),
+   if(*n) errorcall_cpy(getLexicalCall(rho),
 _("argument \"%s\" is missing, with no default"),
-CHAR(PRINTNAME(e)));
+EncodeChar(PRINTNAME(e)));
else errorcall(getLexicalCall(rho),
   _("argument is missing, with no default"));
}

--- src/main/match.c
+++ src/main/match.c
@@ -229,7 +229,7 @@ attribute_hidden SEXP matchArgs_NR(SEXP
  if (fargused[arg_i] == 2)
- errorcall(call,
+ errorcall_cpy(call,
  _("formal argument \"%s\" matched by multiple actual 
arguments"),
- CHAR(PRINTNAME(TAG(f;
+ EncodeChar(PRINTNAME(TAG(f;
  if (ARGUSED(b) == 2)
  errorcall(call,
  _("argument %d matches multiple formal 
arguments"),
@@ -272,12 +271,12 @@ attribute_hidden SEXP matchArgs_NR(SEXP
if (fargused[arg_i] == 1)
-   errorcall(call,
+   errorcall_cpy(call,
_("formal argument \"%s\" matched by multiple actual 
arguments"),
-   CHAR(PRINTNAME(TAG(f;
+   EncodeChar(PRINTNAME(TAG(f;
if (R_warn_partial_match_args) {
warningcall(call,
_("partial argument match of '%s' to 
'%s'"), CHAR(PRINTNAME(TAG(b))),
CHAR(PRINTNAME(TAG(f))) );
}
SETCAR(a, CAR(b));
if (CAR(b) != R_MissingArg) SET_MISSING(a, 0);

The changes become more complicated with a plain error() (have to
figure out the current call and provide it to errorcall_cpy), still
more complicated with warnings (there's currently no warningcall_cpy(),
though one can be implemented) and even more complicated when multiple
symbols are used in the same warning or error, like in the last
warningcall() above (EncodeChar() can only be called once at a time).

The only solution to the latter problem is an EncodeChar() variant that
allocates its memory dynamically. Would R_alloc() be acceptable in this
context? With errors, the allocation stack would be quickly reset
(except when withCallingHandlers() is in effect?), but with warnings,
the code would have to restore it manually every time.


Or allow/require a buffer to be provided. So replacing the calls like

   CHAR(PRINTNAME(sym))

with

   EncodeSymbol(sym, buf, buf_size)


Is it even worth
the effort to try to handle the (pretty rare) non-syntactic symbol names
while constructing error messages? Other languages (like Lua or SQLite)
provide a special printf specifier (typically %q) to create
quoted/escaped string representations, but we're not yet at the point
of providing a C-level printf implementation.


Not clear it is worth it. But the situation now is not good, because
sometimes we encode and sometimes we don't. It would be better to be
consistent, both for the end user and for maintainers who now have to
spend time figuring out which way to go.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Calling a replacement function in a custom environment

2023-08-27 Thread luke-tierney


On Sun, 27 Aug 2023, Duncan Murdoch wrote:

I think there isn't a way to make this work other than calling `is.na<-` 
explicitly:


 x <- b$`is.na<-`(x, TRUE)



Replacement functions are not intended to be called directly. Calling
a replacement function directly may produce an error, or may just do
the wrong thing in terms of mutation.


It seems like a reasonable suggestion to make

 b$is.na(x) <- TRUE

work as long as b is an environment.



I do not think it is a reasonable suggestion. The reasons a::b and
a:::b were made to work is that many users read these as a single
symbol, not a call to a binary operator. So supporting this helped to
reduce confusion.

Allowing $<- to "work" on environments was probably a mistake since
environments behave differently with respect to
duplication. Disallowing it entirely may be too disruptive at this
point, but disallowing it in complex assignment expressions may be
necessary to prevent mutations that should not happen. (There are open
bug reports that boil down to this.)

In any case, complicating the complex assignment code, which is
already barely maintainable, would be a very bad idea.

Best,

luke

If you wanted it to work when b was a list, it would be more problematic 
because of partial name matching.  E.g. suppose b was a list containing 
functions partial(), partial<-(), and part<-(), and I call


 b$part(x) <- 1

what would be called?

Duncan Murdoch

On 27/08/2023 10:59 a.m., Konrad Rudolph wrote:

Hello all,

I am wondering whether it’s at all possible to call a replacement function
in a custom environment. From my experiments this appears not to be the
case, and I am wondering whether that restriction is intentional.

To wit, the following works:

x = 1
base::is.na(x) = TRUE

However, the following fails:

x = 1
b = baseenv()
b$is.na(x) = TRUE

The error message is "invalid function in complex assignment". Grepping the
R code for this error message reveals that this behaviour seems to be
hard-coded in function `applydefine` in src/main/eval.c: the function
explicitly checks for `::` and :::` and permits those assignments, but has
no equivalent treatment for `$`.

Am I overlooking something to make this work? And if not — unless there’s a
concrete reason against it, could it be considered to add support for this
syntax, i.e. for calling a replacement function by `$`-subsetting the
defining environment, as shown above?

Cheers,
Konrad



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Time to drop globalenv() from searches in package code?

2022-09-17 Thread luke-tierney


On Sat, 17 Sep 2022, Kurt Hornik wrote:


luke-tierney  writes:



On Thu, 15 Sep 2022, Duncan Murdoch wrote:

The author of this Stackoverflow question
https://stackoverflow.com/q/73722496/2554330 got confused because a typo in
his code didn't trigger an error in normal circumstances, but it did when he
ran his code in pkgdown.

The typo was to use "x" in a test, when the local variable was named ".x".
There was no "x" defined locally or in the package or its imports, so the
search got all the way to the global environment and found one.  (The very
confusing part for this user was that it found the right variable.)

This author had suppressed the "R CMD check" check for use of global
variables.  Obviously he shouldn't have done that, but he's working with
tidyverse NSE, and that causes so many false positives that it is somewhat
understandable he would suppress one too many.

The pkgdown simulation of code in examples doesn't do perfect mimicry of
running it at top level; the fake global environment never makes it onto the
search list.  Some might call this a bug, but I'd call it the right search
strategy.

My suggestion is that the search for variables in package code should never
get to globalenv().  The chain of environments should stop after handling the
imports.  (Probably base package functions should also be implicitly
imported, but nothing else.)




This was considered and discussed when I added namespaces. Basically
it would mean making the parent of the base namespace environment be
the empty environment instead of the global environment. As a design
this is cleaner, and it would be a one-line change in eval.c.  But
there were technical reasons this was not a viable option at the time,
also a few political reasons. The technical reasons mostly had to do
with S3 dispatch.



Changes over the years, mostly from work Kurt has done, to S3 dispatch
for methods defined and registered in packages might make this more
viable in principle, but there would still be a lot of existing code
that would stop working. For example, 'make check' with the one-line
change fails in a base example that defines an S3 method. It might be
possible to fiddle with the dispatch to keep most of that code
working, but I suspect that would be a lot of work. Seeing what it
would take to get 'make check' to succeed would be a first step if
anyone wants to take a crack at it.


Luke,

Can you please share the one-line change so that I can take a closer
look?


Index: src/main/envir.c
===
--- src/main/envir.c(revision 82861)
+++ src/main/envir.c(working copy)
@@ -683,7 +683,7 @@
 R_GlobalCachePreserve = CONS(R_GlobalCache, R_NilValue);
 R_PreserveObject(R_GlobalCachePreserve);
 #endif
-R_BaseNamespace = NewEnvironment(R_NilValue, R_NilValue, R_GlobalEnv);
+R_BaseNamespace = NewEnvironment(R_NilValue, R_NilValue, R_EmptyEnv);
 R_PreserveObject(R_BaseNamespace);
 SET_SYMVALUE(install(".BaseNamespaceEnv"), R_BaseNamespace);
 R_BaseNamespaceName = ScalarString(mkChar("base"));

-

For S3 the dispatch will have to be changed to explicitly search
.GlobalEnv and parents after the namespace if we don't want to break
too much.

Another idiom that will be broken is

if (require("foo"))
   bar(...)

with bar exported from foo. I don't know if that is already warned
about.  Moving away from this is arguably good in principle but also
probably fairly disruptive. We might need to add some cleaner
use-if-available mechanism, or maybe just adjust some checking code.

Best,

luke



Best
-k


I suspect this change would reveal errors in lots of packages, but the number
of legitimate uses of the current search strategy has got to be pretty small
nowadays, since we've been getting warnings for years about implicit imports
from other standard packages.



Your definition of 'legitimate' is probably quite similar to mine, but
there is likely to be a small but vocal minority with very different
views :-).



Best,



luke



Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of St

Re: [Rd] [External] assignment

2021-12-27 Thread luke-tierney


On Mon, 27 Dec 2021, Gabor Grothendieck wrote:


In a recent SO post this came up (changed example to simplify it
here).  It seems that `test` still has the value sin.

 test <- sin
 environment(test)$test <- cos
 test(0)
 ## [1] 0

It appears to be related to the double use of `test` in `$<-` since if
we break it up it works as expected:

 test <- sin
 e <- environment(test)
 e$test <- cos
 test(0)
 ## [1] 1

`assign` also works:

 test <- sin
 assign("test", cos, environment(test))
 test(0)
 ## [1] 1

Can anyone shed some light on this?


See my response in

https://bugs.r-project.org/show_bug.cgi?id=18269

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: hashtab address arg

2021-12-22 Thread luke-tierney


On Wed, 22 Dec 2021, Ivan Krylov wrote:


On Sat, 18 Dec 2021 11:50:54 +0100
Arnaud FELD  wrote:


However, I'm a bit troubled about the "address" argument. What is it
intended for since (as far as I know) "address equality" is until now
something that isn't really let for the user to decide within R.


Using the words from "Extending R" by John M. Chambers, the concept of
address identity could be related to the question:


If some of the data in the object has changed, is this still the
same object?


Most objects in R are defined by their content. If you had a 100x100
matrix and changed an element at [50,50], it's now a different matrix,
even if it's stored in the same variable. If you create another 100x100
matrix in a different variable but fill it with the same numbers, it
should still compare equal to your original matrix.

Not all types of R objects are like that. Environments are good
candidates for pointer equality comparison. For example, the contents
of the global environment change every time you assign some variable in
the R command line, but it remains the same global environment. Indeed,
identical() for environments just compares their pointers: even if two
different environments only contain objects that compare equal, they
cannot be considered the same environment, because different closures
might be referring to them. Similar are data.tables: if you had a giant
dataset and, as part of cleaning it up, removed some outliers, perhaps
it should be considered the same dataset, even if the contents aren't
strictly the same any more. Same goes for reference class and R6
objects: unlike the pass-by-value semantics associated with most
objects in R, these are assumed to carry global state within them, and
modifications to them are reflected everywhere they are referenced, not
limited to the current function call.


This is still experimental and the 'address' option may not survive at
the R level. There are some C level applications where it can be
useful; maybe it will only be retained there.


I *think* that most (if not all) objects with reference semantics
already use pointer comparison when being compared by identical(), so
the default of "identical" is, as the help page says, almost always the
right choice, but if it matters to your code whether the objects are
actually stored in the same area in the memory, use hashes of type
"address".


Unfortunately not all: External pointer objects are reference objects
but by default are not compared based on object address. Fixing the
default is not an option in the short term as it breaks too much code
(mostly through dependencies on a few packages).


(Perhaps this topic could be a better fit for R-help.)


R-devel is the right place for this.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Status of '=>'

2021-12-20 Thread luke-tierney


It's still work in progress. Probably => will be dropped in favor of
limited use of _ for non-first-argument passing.

Best,

luke

On Mon, 20 Dec 2021, Dirk Eddelbuettel wrote:



R 4.1.0 brought the native pipe and the related ability to use '=>' if one
opted into it by setting _R_USE_PIPEBIND_. I often forget about '=>' and
sadly can never find anything in the docs either (particularly no 'see als'
from '|>' docs) which is not all that heplful.

Can we anticipate a change with R 4.2.0, or will it remain as is, somewhat
available but not really documented or enabled? Clarifications welcome,
otherwise 'time will tell' as usual.

Thanks,  Dirk




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] DOCS: Exactly when, in the signaling process, is option 'warn' applied?

2021-11-18 Thread luke-tierney


On Thu, 18 Nov 2021, Henrik Bengtsson wrote:


Hi,

the following question sprung out of a package settings option warn=-1
to silence warnings, but those warnings were still caught by
withCallingHandlers(..., warning), which the package author did not
anticipate. The package has been updated to use suppressWarnings()
instead, but as I see a lot of packages on CRAN [1] use
options(warn=-1) to temporarily silence warnings, I wanted to bring
this one up. Even base R itself [2] does this, e.g.
utils::assignInMyNamespace().

Exactly when is the value of 'warn' options used when calling warning("boom")?



In the default handler; it doesn't affect signaling.

Much of the documentation pre-dates the condition system; happy to
consider patches.

Best,

luke


I think the docs, including ?options, would benefit from clarifying
that. To the best of my understanding, it should also mention that
options 'warn' is meant to be used by end-users, and not in package
code where suppressWarnings() should be used.

To clarify, if we do:


options(warn = -1)
tryCatch(warning("boom"), warning = function(w) stop("Caught warning: ", 
conditionMessage(w), call. = FALSE))

Error: Caught warning: boom

we see that the warning is indeed signaled.  However, in Section '8.2
warning' of the 'R Language Definition' [3], we can read:

"The function `warning` takes a single argument that is a character
string. The behaviour of a call to `warning` depends on the value of
the option `"warn"`. If `"warn"` is negative warnings are ignored.
[...]"

The way this is written, it may suggest that warnings are
ignored/silences already early on when calling warning(), but the
above example shows that that is not the case.

From the same section, we can also read:

"[...] If it is zero, they are stored and printed after the top-level
function has completed. [...]"

which may hint at the 'warn' option is applied only when a warning
condition is allowed to "bubble up" all the way to the top level.
(FWIW, this is how always though it worked, but it's only now I looked
into the docs and see it's ambiguous on this).

/Henrik

[1] 
https://github.com/search?q=org%3Acran+language%3Ar+R%2F+in%3Afile%2Cpath+options+warn+%22-1%22&type=Code
[2] 
https://github.com/wch/r-source/blob/0a31ab2d1df247a4289efca5a235dc45b511d04a/src/library/utils/R/objects.R#L402-L405
[3] https://cran.r-project.org/doc/manuals/R-lang.html#warning

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone:     319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] GC: speeding-up the CHARSXP cache maintenance, 2nd try

2021-11-04 Thread luke-tierney


Can you please submit this as a wishlist item to bugzilla? it is
easier to keep track of there. You could also submit your threads
based suggestion there, again to keep it easier to keep track of and
possibly get back to in the future.

I will have a look at your approach when I get a chance, but I am
exploring a different approach to avoid scanning old generations that
may be simpler.

Best,

luke

On Wed, 3 Nov 2021, Andreas Kersting wrote:


Hi,

In https://stat.ethz.ch/pipermail/r-devel/2021-October/081147.html I proposed 
to speed up the CHARSXP cache maintenance during GC using threading. This was 
rejected by Luke in 
https://stat.ethz.ch/pipermail/r-devel/2021-October/081172.html.

Here I want to propose an alternative approach to significantly speed up 
CHARSXP cache maintenance during partial GCs. A patch which passes `make 
check-devel` is attached. Compared to R devel (revision 81110) I get the 
following performance improvements on my system:

Elapsed time for five non-full gc in a session after

x <- as.character(runif(5e7))[]
gc(full = TRUE)

+20sec -> ~1sec.


This patch introduces (theoretical) overheads to mkCharLenCE() and full GCs. 
However, I did not measure dramatic differences:

y <- "old_CHARSXP"

after

x <- "old_CHARSXP"; gc(); gc()

takes a median 32 nanoseconds with and without the patch.


gc(full = TRUE)

in a new session takes a median 16 milliseconds with and 14 without the patch.


The basic idea is to maintain the CHARSXP cache using subtables in R_StringHash, one for 
each of the (NUM_GC_GENERATIONS := NUM_OLD_GENERATIONS + 1) GC generations. New CHARSXPs 
are added by mkCharLenCE() to the subtable of the youngest generation. After a partial 
GC, only the chains anchored at the subtables of the youngest (num_old_gens_to_collect + 
1) generations need to be searched for and cleaned of unmarked nodes. Afterwards, these 
chains need to be merged into those of the respective next generation, if any. This 
approach relies on the fact that an object/CHARSXP can never become younger again. It is 
OK though if an object/CHARSXP "skips" a GC generation.

R_StringHash, which is now of length (NUM_GC_GENERATIONS * char_hash_size), is 
structured such that the chains for the same hashcode but for different 
generations are anchored at slots of R_StringHash which are next to each other 
in memory. This is because we often need to access two or more (i.e. currently 
all three) of them for one operation and this avoids cache misses.

HASHPRI, i.e. the number of occupied primary slots, is computed and stored as 
NUM_GC_GENERATIONS times the number of slots which are occupied in at least one 
of the subtables. This is done because in mkCharLenCE() we need to iterate 
through one or more chains if and only if there is a chain for the particular 
hashcode in at least one subtable.

I tried to keep the patch as minimal as possible. In particular, I did not add 
long vector support to R_StringHash. I rather reduced the max value of 
char_hash_size from 2^30 to 2^29, assuming that NUM_OLD_GENERATIONS is (not 
larger than) 2. I also did not yet adjust do_show_cache() and do_write_cache(), 
but I could do so if the patch is accepted.

Thanks for your consideration and feedback.

Regards,
Andreas


P.S. I had a hard time to get the indentation right in the patch due the mix of 
tabs and spaces. Sorry, if I screwed this up.


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Wrong number of names?

2021-11-02 Thread luke-tierney

On Mon, 1 Nov 2021, Martin Maechler wrote:

Duncan Murdoch
on Mon, 1 Nov 2021 06:36:17 -0400 writes:

   > The StackOverflow post
   > https://stackoverflow.com/a/69767361/2554330 discusses a
   > dataframe which has a named numeric column of length 1488
   > that has 744 names. I don't think this is ever legal, but
   > am I wrong about that?

   > The `dat.rds` file mentioned in the post is temporarily
   > available online in case anyone else wants to examine it.

   > Assuming that the file contains a badly formed object, I
   > wonder if readRDS() should do some sanity checks as it
   > reads.

   > Duncan Murdoch

Good question.

In the mean time, I've also added a bit on the SO page
above.. e.g.

---

d <- readRDS("<.>dat.rds")
str(d)
## 'data.frame':1488 obs. of  4 variables:
##  $ facet_var: chr  "AUT" "AUT" "AUT" "AUT" ...
##  $ date : Date, format: "2020-04-26" "2020-04-27" ...
##  $ variable : Factor w/ 2 levels "arima","prophet": 1 1 1 1 1 1 1 1 1 1 ...
##  $ score: Named num  2.74e-06 2.41e-06 2.48e-06 2.39e-06 2.79e-06 ...
##   ..- attr(*, "names")= chr [1:744] "new_confirmed10" "new_confirmed10" 
"new_confirmed10" "new_confirmed10" ...

ds <- d$score
c(length(ds), length(names(ds)))
## 1488   744

dput(ds) # ->

##  *** caught segfault ***
## address (nil), cause 'memory not mapped'

If I'm reading this right then dput is where the segfault is
happening, so that could use some more bulletproofing.

Best,

luke

---

Hence  "proving" that the dat.rds  really contains an invalid object,
when simple  dput(.) directly gives a segmentation fault.

I think we are aware that using C code and say .Call(..)  one
can create all kinds of invalid objects "easily".. and I think
it's clear that it's not feasible to check for validity of such
objects "everwhere".

Your proposal to have at least our deserialization code used in
readRDS() do (at least *some*) validity checks seems good, but
maybe we should think of more cases, and / or  do such validity
checks already during serialization { <-> saveRDS() here } ?

.. Such questions then really are for those who understand more than
me about (de)serialization in R, its performance bottlenecks etc.
Given the speed impact we should probably have such checks *optional*
but have them *on* by default e.g., at least for saveRDS() ?

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] GC: improving the marking performance for STRSXPs

2021-10-18 Thread luke-tierney


Thanks. I have committed a modified version, also incorporating the
handling of R_StringHash from your other post, in r81073. I prefer to
be more conservative in the GC. for example not assume without
checking that STRSXP elements are CHARSXP. This does add some
overhead, but the change is still beneficial.

I don't think we would want to add the complexity of threading at this
point, though it might be worth considering at a later time. There are
a few other possible modifications that I'll explore that might
provide comparable improvements to the ones seen with your patch
without adding the complexity of threads.

Best,

luke

On Thu, 7 Oct 2021, Andreas Kersting wrote:


Hi all,

in GC (in src/main/memory.c), FORWARD_CHILDREN() (called by PROCESS_NODES()) 
treats STRSXPs just like VECSXPs, i.e. it calls FORWARD_NODE() for all its 
children. I claim that this is unnecessarily inefficient since the children of 
a STRSXP can legitimately only be (atomic) CHARSXPs and could hence be marked 
directly in the call of FORWARD_CHILDREN() on the STRSXP.

Attached patch (atomic_CHARSXP.diff) implements this and gives the following 
performance improvements on my system compared to R devel (revision 81008):

Elapsed time for two full gc in a session after

x <- as.character(runif(5e7))[]

19sec -> 15sec.

This is the best-case scenario for the patch: very many unique/unmarked CHARSXP 
in the STRSXP. For already marked CHARSXP there is no performance gain since 
FORWARD_NODE() is a no-op for them.

The relative performance gain is even bigger if iterating through the STRSXP 
produces many cache misses, as e.g. after

x <- as.character(runif(5e7))[]
x <- sample(x, length(x))

Elapsed time for two full gc here: 83sec -> 52sec. This is because we have less 
cache misses per CHARSXP.

This patch additionally also assumes that the ATTRIBs of a CHARSXP are not to 
be traced because they are just used for maintaining the CHARSXP hash chains.

The second attached patch (atomic_CHARSXP_safe_unlikely.diff) checks both 
assumptions and calls gc_error() if they are violated and is still noticeably faster 
than R devel: 19sec -> 17sec and 83sec -> 54sec, respectively.

Attached gc_test.R is the script I used to get the previously mentioned and 
more gc timings.

Do you think that this is a reasonable change? It does make the code more 
complex and I am not sure if there might be situations in which the assumptions 
are violated, even though SET_STRING_ELT() and installAttrib() do enforce 
them.

Best regards,
Andreas


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Workaround very slow NAN/Infinities arithmetic?

2021-09-30 Thread luke-tierney


On Thu, 30 Sep 2021, brodie gaslam via R-devel wrote:



André,

I'm not an R core member, but happen to have looked a little bit at this
issue myself.  I've seen similar things on Skylake and Coffee Lake 2
(9700, one generation past your latest) too.  I think it would make sense
to have some handling of this, although I would want to show the trade-off
with performance impacts on CPUs that are not affected by this, and on
vectors that don't actually have NAs and similar.  I think the performance
impact is likely to be small so long as branch prediction is active, but
since branch prediction is involved you might need to check with different
ratios of NAs (not for your NA bailout branch, but for e.g. interaction
of what you add and the existing `na.rm=TRUE` logic).


I would want to see realistic examples where this matters, not
microbenchmarks, before thinking about complicating the code. Not all
but most cases where sum(x) returns NaN/NA would eventually result in
an error; getting to the error faster is not likely to be useful.

My understanding is that arm64 does not support proper long doubles
(they are the same as regular doubles). So code using long doubles
isn't getting the hoped-for improved precision. Since that
architecture is becoming more common we should probably be looking at
replacing uses of long doubles with better algorithms that can work
with regular doubles, e.g Kahan summation or variants for sum.


You'll also need to think of cases such as c(Inf, NA), c(NaN, NA), etc.,
which might complicate the logic a fair bit.

Presumably the x87 FPU will remain common for a long time, but if there
was reason to think otherwise, then the value of this becomes
questionable.

Either way, I would probably wait to see what R Core says.

For reference this 2012 blog post[1] discusses some aspects of the issue,
including that at least "historically" AMD was not affected.

Since we're on the topic I want to point out that the default NA in R
starts off as a signaling NA:

    example(numToBits)   # for `bitC`
    bitC(NA_real_)
    ## [1] 0 111 | 00100010
    bitC(NA_real_ + 0)
    ## [1] 0 111 | 10100010

Notice the leading bit of the significant starts off as zero, which marks
it as a signaling NA, but becomes 1, i.e. non-signaling, after any
operation[2].

This is meaningful because the mere act of loading a signaling NA into the
x87 FPU is sufficient to trigger the slowdowns, even if the NA is not
actually used in arithmetic operations.  This happens sometimes under some
optimization levels.  I don't now of any benefit of starting off with a
signaling NA, especially since the encoding is lost pretty much as soon as
it is used.  If folks are interested I can provide patch to turn the NA
quiet by default.


In principle this might be a good idea, but the current bit pattern is
unfortunately baked into a number of packages and documents on
internals, as well as serialized objects. The work needed to sort that
out is probably not worth the effort.

It also doesn't seem to affect the performance issue here since
setting b[1] <- NA_real_ + 0 produces the same slowdown (at least on
my current Intel machine).

Best,

luke



Best,

B.

[1]: 
https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/
[2]: https://en.wikipedia.org/wiki/NaN#Encoding






On Thursday, September 30, 2021, 06:52:59 AM EDT, GILLIBERT, Andre 
 wrote:

Dear R developers,

By default, R uses the "long double" data type to get extra precision for 
intermediate computations, with a small performance tradeoff.

Unfortunately, on all Intel x86 computers I have ever seen, long doubles 
(implemented in the x87 FPU) are extremely slow whenever a special 
representation (NA, NaN or infinities) is used; probably because it triggers 
poorly optimized microcode in the CPU firmware. A function such as sum() 
becomes more than hundred times slower!
Test code:
a=runif(1e7);system.time(for(i in 1:100)sum(a))
b=a;b[1]=NA;system.time(sum(b))

The slowdown factors are as follows on a few intel CPU:

1)  Pentium Gold G5400 (Coffee Lake, 8th generation) with R 64 bits : 140 
times slower with NA

2)  Pentium G4400 (Skylake, 6th generation) with R 64 bits : 150 times 
slower with NA

3)  Pentium G3220 (Haswell, 4th generation) with R 64 bits : 130 times 
slower with NA

4)  Celeron J1900 (Atom Silvermont) with R 64 bits : 45 times slower with NA

I do not have access to more recent Intel CPUs, but I doubt that it has 
improved much.

Recent AMD CPUs have no significant slowdown.
There is no significant slowdown on Intel CPUs (more recent than Sandy Bridge) 
for 64 bits floating point calculations based on SSE2. Therefore, operators 
using doubles, such as '+' are unaffected.

I do not know whether recent ARM CPUs have slowdowns

Re: [Rd] [External] Re: Is it a good choice to increase the NCONNECTION value?

2021-08-24 Thread luke-tierney


We do need to be careful about using too many file descriptors.  The
standard soft limit on Linux is fairly low (1024; the hard limit is
usually quite a bit higher). Hitting that limit, e.g. with runaway
with code allocating lots of connections, can cause other things, like
loading packages, to fail with hard to diagnose error messages. A
static connection limit is a crude way to guard against that. Doing
anything substantially better is probably a lot of work. A simple
option that may be worth pursuing is to allow the limit to be adjusted
at runtime. Users who want to go higher would do so at their own risk
and may need to know how to adjust the soft limit on the process.

Best,

luke

On Wed, 25 Aug 2021, Simon Urbanek wrote:



Martin,

I don't think static connection limit is sensible. Recall that connections can 
be anything, not just necessarily sockets or file descriptions so they are not 
linked to the system fd limit. For example, if you use a codec then you will 
need twice the number of connections than the fds. To be honest the connection 
limit is one of the main reasons why in our big data applications we have 
always avoided R connections and used C-level sockets instead (others were lack 
of control over the socket flags, but that has been addressed in the last 
release). So I'd vote for at the very least increasing the limit significantly 
(at least 1k if not more) and, ideally, make it dynamic if memory footprint is 
an issue.

Cheers,
Simon



On Aug 25, 2021, at 8:53 AM, Martin Maechler  wrote:


GILLIBERT, Andre
   on Tue, 24 Aug 2021 09:49:52 + writes:



RConnection is a pointer to a Rconn structure. The Rconn
structure must be allocated independently (e.g. by
malloc() in R_new_custom_connection).  Therefore,
increasing NCONNECTION to 1024 should only use 8
kilobytes on 64-bits platforms and 4 kilobytes on 32
bits platforms.


You are right indeed, and I was wrong.


Ideally, it should be dynamically allocated : either as
a linked list or as a dynamic array
(malloc/realloc). However, a simple change of
NCONNECTION to 1024 should be enough for most uses.


There is one important other problem I've been made aware
(similarly to the number of open DLL libraries, an issue 1-2
years ago) :

The OS itself has limits on the number of open files
(yes, I know that there are other connections than files) and
these limits may quite differ from platform to platform.

On my Linux laptop, in a shell, I see

 $ ulimit -n
 1024

which is barely conformant with your proposed 1024 NCONNECTION.

Now if NCONNCECTION is larger than the max allowed number of
open files and if R opens more files than the OS allowed, the
user may get quite unpleasant behavior, e.g. R being terminated brutally
(or behaving crazily) without good R-level warning / error messages.

It's also not at all sufficient to check for the open files
limit at compile time, but rather at R process startup time

So this may need considerably more work than you / we have
hoped, and it's probably hard to find a safe number that is
considerably larger than 128  and less than the smallest of all
non-crazy platforms' {number of open files limit}.


Sincerely
Andr� GILLIBERT


 []

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Update on rtools4 and ucrt support

2021-08-23 Thread luke-tierney


On Mon, 23 Aug 2021, Duncan Murdoch wrote:


On 23/08/2021 8:15 a.m., jan Vitek via R-devel wrote:

Hi Jeroen,

I mostly lurk on this list, but I was struck by your combative tone.

To pick on two random bits:


… a 6gb tarball with manually built things on his personal machine…



… a black-box system that is so opaque and complex that only one person
knows how it works, and would make it much more difficult for
students, universities, and other organisations to build R packages
and libraries on Windows…



Tomas’ tool chain isn't a blackbox, it has copious documentation (see [1])
and builds on any machine thanks to the provided docker container.

This is not to criticise your work which has its unique strengths, but to
state the obvious: these strengths are best discussed without passion
based on factually accurate descriptions.


I agree with Jan.  I'm not sure a discussion in this forum would be fruitful, 
but I really wish Jeroen and Tomas would get together, aiming to merge their 
toolchains, keeping the best aspects of both.


I haven't been involved in the development of either one, but have been a 
"victim" of the two chain rivalry, because the rgl package is not easy to 
build.  I get instructions from each of them on how to do the build, and 
those instructions for one toolchain generally break the build on the other 
one.  While it is probably possible to detect the toolchain and have the 
build adapt to whichever one is in use, it would be a lot easier for me (and 
I imagine every other maintainer of a package using external libs) if I just 
had to follow one set of instructions.


Duncan Murdoch


Here are just a few comments from my perspective (I am an R-core
member, but am not part of the CRAN team and do only very limited work
on Windows). Other R-core members may have different perspectives and
insights.

One bit of background: dealing with encoding issues on Windows has
been taking an unsustainable amount of R-core resources for some time
now. Tomas Kalibera has been taking the lead on trying to address
these issues in the existing framework, but this means he has not had
the time to make any of the many other valuable and important
contributions he could make. The only viable way forward is to move to
a Windows tool chain that supports UTF-8 as the C library current
encoding via the Windows UCRT framework.

Tomas Kalibera has, on behalf of all of R core and in
coordination with CRAN, been looking for a way forward for some
time and has reported on the progress in several blog posts at
https://developer.r-project.org/Blog/public/. This has lead to
the development of the MXE-based UCRT tool chain, which is now
well tested and ready for deployment.  Checks using the UCRT tool
chain have been part of the CRAN check process for a while. I
believe CRAN plans to switch R-devel checks and builds to the
UCRT tool chain during the upcoming CRAN downtime. I expect there
will be some communication from CRAN on this soon, including on
any issues in supporting binaries for both R-devel and R-patched.

In putting together something as large as a tool chain there will
always be many choices, each with advantages and disadvantages.  Some
things may be advantages in some settings and not others. Taking just
one case in point: Cross compilation. This is likely to be a better
approach for CRAN in the future and is supported by the MXE framework
on which the new tool chain is based.

The much more recent changes in rtools4 to support UCRT are at this
point not yet as well tested as the new tool chain. Once these changes
to rtools4 mature, and if binary compatibility can be assured, then
having a second tool chain may be useful in some cases.  But if there
are incompatibilities then it will be up to rtools4 to keep up with
the tool chain used by CRAN. On the other, contributing to improving
the MXE-based tool chain may be a better investment of time.

Best,

luke



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: JIT compiler does not compile closures with custom environments

2021-08-18 Thread luke-tierney


On Wed, 18 Aug 2021, Duncan Murdoch wrote:


On 18/08/2021 9:00 a.m., Taras Zakharko wrote:
I have encountered a behavior of R’s JIT compiler that I can’t quite figure 
out. Consider the following code:



f_global <- function(x) {
  for(i in 1:1) x <- x + 1
  x
}

f_env <- local({
 function(x) {
   for(i in 1:1) x <- x + 1
   x
 }
})

compiler::enableJIT(3)

   bench::mark(f_global(0), f_env(0))
   # 1 f_global(0)103µs 107.61µs 8770.11.4KB  04384 
0
   # 2 f_env(0)   1.1ms   1.42ms  712.0B 66.3   290 
27
   Inspecting the closures shows that f_global has been byte-compiled while 
f_env has not been byte-compiled. Furthermore, if I assign a new 
environment to f_global (e.g. via environment(f_global) <- new.env()), it 
won’t be byte-compiled either.


However, if I have a function returning a closure, that closure does get 
byte-compiled:


   f_closure <- (function() {
 function(x) {
   for(i in 1:1) x <- x + 1
  x
}
   })()

   bench::mark(f_closure(0))
   # 1 f_closure(0)105µs109µs 8625.0B 2.01  4284 
1  497ms


What is going on here? Both f_closure and f_env have non-global 
environments. Why is one JIT-compiled, but not the other? Is there a way to 
ensure that functions defined in environments will be JIT-compiled?


About what is going on in f_closure:  I think the anonymous factory

function() {
 function(x) {
   for(i in 1:1) x <- x + 1
  x
}
   }

got byte compiled before first use, and that compiled its result.  That seems 
to be what this code indicates:


 f_closure <- (function() {
 res <- function(x) {
 for(i in 1:1) x <- x + 1
 x
 }; print(res); res
 })()
 #> function(x) {
 #> for(i in 1:1) x <- x + 1
 #> x
 #> }
 #> 
 #> 


That is right.

But even if that's true, it doesn't address the bigger question of why 
f_global and f_env are treated differently.


There are various heuristics in the JIT code to avoid spending too
much time in the JIT. The current details are in the source
code. Mostly this is to deal with usually ill-advised coding practices
that programmatically build many small functions.  Hopefully these
heuristics can be reduced or eliminated over time.

For now, putting the code in a package, where the default is to byte
compile on source install, or explicitly calling compiler::cmpfun are
options.

Best,

luke



Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] svd For Large Matrix

2021-08-13 Thread luke-tierney


[copying the list]

svd() does support matrices with long vector data. Your example works
fine for me on a machine with enough memory with either the reference
BLAS/LAPACK or the BLAS/LAPACK used on Fedora 33 (flexiblas backed, I
believe, by a version of openBLAS). Take a look at sessionInfo() to
see what you are using and consider switching to another BLAS/LAPACK
if necessary. Running under gdb may help tracking down where the issue
is and reporting it for the BLAS/LAPACK you are using.

Best,

luke

On Fri, 13 Aug 2021, Dario Strbenac via R-devel wrote:


Good day,

I have a real scenario involving 45 million biological cells (samples) and 60 
proteins (variables) which leads to a segmentation fault for svd. I thought 
this might be a good example of why it might benefit from a long vector upgrade.

test <- matrix(rnorm(4500*60), ncol = 60)
testSVD <- svd(test)

*** caught segfault ***
address 0x7fe93514d618, cause 'memory not mapped'

Traceback:
1: La.svd(x, nu, nv)
2: svd(test)

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall      email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] difference of m1 <- lm(f, data) and update(m1, formula=f)

2021-08-11 Thread luke-tierney

 putting in a new
expression for the formula argument. It so happens that putting in a
formula object actually works: The only difference between the AST for
a call of `~` and the formula such a call produces when evaluated is
the class and environment attributes the call adds, and most code that
works with expressions, like eval(), ignores attributes.

It would seem somewhat more consistent if update.default put the
expression that would produce the formula into the call (i.e. stripped
out the two attributes).

But I do not know if there is logic in base R code, never mind package
code, that takes advantage of the attributes on the formula expression
in if they are found. formula() looks in the 'terms' component so would
not be affects, but I don't know if something else might be.

Best,

luke




Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] problem with pipes, textConnection and read.dcf

2021-08-10 Thread luke-tierney


Not an issue with pipes. The pipe just rewrites the expression to a
nested call and that is then evaluated. The call this produces is


quote(L |>

+gsub(pattern = " ", replacement = "") |>
+gsub(pattern = " ", replacement = "") |>
+textConnection() |>
+read.dcf())
read.dcf(textConnection(gsub(gsub(L, pattern = " ", replacement = ""),
pattern = " ", replacement = "")))

If you run that expression, or just the argument to read.dcf, then you
get the error you report. So the issue is somewhere in textConnection().
This produces a similar message:

read.dcf(textConnection(c(L, "aa", "", 
"", "ddd")))

File a bug report and someone who understands the textConnection()
internals better than I do can take a look.

Best,

luke

On Tue, 10 Aug 2021, Gabor Grothendieck wrote:


This gives an error bit if the first gsub line is commented out then there is no
error even though it is equivalent code.

 L <- c("Variable:id", "Length:112630 ")

 L |>
   gsub(pattern = " ", replacement = "") |>
   gsub(pattern = " ", replacement = "") |>
   textConnection() |>
   read.dcf()
 ## Error in textConnection(gsub(gsub(L, pattern = " ", replacement = ""),  :
 ##  argument 'object' must deparse to a single character string

That is this works:

 L |>
   # gsub(pattern = " ", replacement = "") |>
   gsub(pattern = " ", replacement = "") |>
   textConnection() |>
   read.dcf()
 ##  Variable Length
 ## [1,] "id" "112630"

 R.version.string
 ## [1] "R version 4.1.0 RC (2021-05-16 r80303)"
 win.version()
 ## [1] "Windows 10 x64 (build 19042)"




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: [R-pkg-devel] Tracking down inconsistent errors and notes across operating systems

2021-07-22 Thread luke-tierney


Thanks; fix committed in r80654.

Best,

luke

On Thu, 22 Jul 2021, Bill Dunlap wrote:


A small example of the problem is 
#define USE_RINTERNALS 1
#include 
#include 
#include 
static s_object* obj = NULL;

Prior to 2021-07-20, with svn 80639, this compiled but after, svn 80647,
that I get

$ gcc -I"/mnt/c/R/R-svn/trunk/src/include" -I.   -I/usr/local/include  
-fpic  -g -O2 -flto -c s_object.c 2>&1
In file included from s_object.c:5:
/mnt/c/R/R-svn/trunk/src/include/Rdefines.h:168:33: error: unknown type name
‘SEXPREC’
  168 | #define s_object                SEXPREC
      |                                 ^~~
s_object.c:7:8: note: in expansion of macro ‘s_object’
    7 | static s_object* obj = NULL;
      |        ^~~~



On Thu, Jul 22, 2021 at 10:18 AM Bill Dunlap 
wrote:
  I think the problem with RPostgreSQL/sec/RS-DBI.c comes from
  some changes to Defn.h and Rinternals.h in RHOME/include that
  Luke made recently (2021-07-20, svn 80647).  Since then the
  line   #define s_object SEXPREC
in Rdefines.h causes problems.  Should it now be 'struct SEXPREC'?

-Bill


On Thu, Jul 22, 2021 at 7:04 AM Iñaki Ucar 
wrote:
  Hi,

  On Thu, 22 Jul 2021 at 15:51, Hannah Owens
   wrote:
  >
  > Hi all,
  > I am working on an update to a package I have on CRAN
  called occCite. My
  > latest release attempt didn’t pass incoming automated
  checks, because there
  > is an outstanding error. Additionally, there are some
  weird notes I would
  > like to get rid of, if anyone has suggestions.
  >
  > The killing error is in r-devel-linux-x86_64-debian-gcc,
  which is: Packages
  > required but not available: 'BIEN', 'taxize',
  ‘RPostgreSQL'
  >
  > I don’t understand this, as it is the only system that
  throws this error,
  > and the packages mentioned are available via CRAN. Any
  suggestions?

  This kind of message usually arises when there is some
  problem with
  those packages on CRAN. Indeed,

  https://cran.r-project.org/web/checks/check_results_BIEN.html
  https://cran.r-project.org/web/checks/check_results_taxize.html
  https://cran.r-project.org/web/checks/check_results_RPostgreSQL.html

  the three of them have ERRORs in that platform. No issue
  on your end.
  You reply pointing to that.

  > Additionally, there are multiple platforms
  > (r-devel-linux-x86_64-fedora-clang;
  r-devel-linux-x86_64-fedora-gcc;
  > r-devel-windows-x86_64-gcc10-UCRT;
  r-patched-solaris-x86;
  > r-release-macos-arm64; r-release-macos-x86_64;
  r-oldrel-macos-x86_64) where
  > two notes pop up:
  >
  > NOTE 1: Namespace in Imports field not imported from:
  ‘bit64’ All declared
  > Imports should be used.
  >
  > The package does use bit64. Any tips on how to address
  this note?

  Are you sure? Your NAMESPACE file does not import(bit64)
  nor
  importFrom(bit64,) anything.

  > NOTE 2: Found 6 marked UTF-8 strings.
  >
  > I presume this is thrown because of the small sample
  dataset I’ve included
  > in the package, but why is it not thrown for all the
  platforms?

  Not all the checks are necessarily done in all the
  platforms. You can
  silence this NOTE by converting the offending strings in
  your datasets
  to ASCII and resaving them.

  --
  Iñaki Úcar

  __
  r-package-de...@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-package-devel





--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] changes in some header files

2021-07-20 Thread luke-tierney


We are working on rearranging some of our header files with the goal
of making the installed headers correspond more closely to the C API
available to packages. Packages that only use entry points and
definitions that are part of the API as specified in Chapter 6 of
Writing R Extensions should not be affected.

I have committed an initial set of changes to R-devel in
r80644. About 10 CRAN packages that use non-API features will fail
under R-devel after these changes and their maintainers have been
notified.

If you are currently using non-API features in a package it would be a
good idea to review what you are doing and to try to revise your code
to work within the API. If you feel there are features missing in the
API then you can suggest additions on this mailing list or bugzilla.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Clearing attributes returns ALTREP, serialize still saves them

2021-07-03 Thread luke-tierney


Please do not cross post. You have already rased this on bugzilla. I
will follow up there later today.

luke

On Sat, 3 Jul 2021, Zafer Barutcuoglu wrote:


Hi all,

Setting names/dimnames on vectors/matrices of length>=64 returns an ALTREP 
wrapper which internally still contains the names/dimnames, and calling 
base::serialize on the result writes them out. They are unserialized in the same 
way, with the names/dimnames hidden in the ALTREP wrapper, so the problem is not 
obvious except in wasted time, bandwidth, or disk space.

Example:
  v1 <- setNames(rnorm(64), paste("element name", 1:64))
  v2 <- unname(v1)
  names(v2)
  # NULL
  length(serialize(v1, NULL))
  # [1] 2039
  length(serialize(v2, NULL))
  # [1] 2132
  length(serialize(v2[TRUE], NULL))
  # [1] 543

  con <- rawConnection(raw(), "w")
  serialize(v2, con)
  v3 <- unserialize(rawConnectionValue(con))
  names(v3)
  # NULL
  length(serialize(v3, NULL))
  # 2132

  # Similarly for matrices:
  m1 <- matrix(rnorm(64), 8, 8, dimnames=list(paste("row name", 1:8), paste("col 
name", 1:8)))
  m2 <- unname(m1)
  dimnames(m2)
  # NULL
  length(serialize(m1, NULL))
  # [1] 918
  length(serialize(m2, NULL))
  # [1] 1035
  length(serialize(m2[TRUE, TRUE], NULL))
  # 582

Previously discussed here, too:
https://r.789695.n4.nabble.com/Invisible-names-problem-td4764688.html

This happens with other attributes as well, but less predictably:
  x1 <- structure(rnorm(100), data=rnorm(100))
  x2 <- structure(x1, data=NULL)
  length(serialize(x1, NULL))
  # [1] 8000952
  length(serialize(x2, NULL))
  # [1] 924

  x1b <- rnorm(100)
  attr(x1b, "data") <- rnorm(100)
  x2b <- x1b
  attr(x2b, "data") <- NULL
  length(serialize(x1b, NULL))
  # [1] 8000863
  length(serialize(x2b, NULL))
  # [1] 8000956

This is pretty severe, trying to track down why serializing a small object 
kills the network, because of which large attributes it may have once had 
during its lifetime around the codebase that are still secretly tagging along.

Is there a plan to resolve this? Any suggestions for maybe a C++ workaround 
until then? Or an alternative performant serialization solution?

Best,
--
Zafer


[[alternative HTML version deleted]]

______
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and    Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] SET_COMPLEX_ELT and SET_RAW_ELT missing from Rinternals.h

2021-07-02 Thread luke-tierney

On Thu, 1 Jul 2021, Konrad Siek wrote:

Thanks!

So what would be the prescribed way of assigning elements to a CPLXSXP if I
needed to?

The first question is whether you need to do this. Or, more to the
point, whether it is safe to do this. In R objects should behave as if
they are not mutable. Mutation in C code may be OK if the objects are
not reachable from any R variables, but that almost always means they
are private to your code so yo can use what you know about internal
structure.

If it is legitimate to mutate you can use SET_COMPLEX_ELT. I've added
the declaration to Rinternals in R-devel and R-patched.

For SET_COMPLEX_ELT(x, in v) is equivalent to COMPLEX(sexp)[index] = value,
but that could change in the future it Set methods are supported.

This does materialize a potentially compact object, but again the most
important question is whether mutation is legitimate at all.

One way I see is to do what most of the code inside the interpreter does and
grab the vector's data pointer:

    COMPLEX(sexp)[index] = value;
    COMPLEX0(sexp)[index] = value;

COMPLEX0 is not in the API; it will probably be removed from the
installed header files as we clean these up.

This will materialize an ALTREP CPLXSXP though, so maybe the best way would
be to mirror what SET_COMPLEX_ELT does in Rinlinedfuns.h?

    if (ALTREP(sexp)) ALTCOMPLEX_SET_ELT(sexp, index, value); else
COMPLEX0(sexp)[index] = vector;

ALTCOMPLEX_SET_ELT is an internal implementation feature and not in the API.
Again, it will probably be removed from the installed headers.

Best,

luke

This seems better, but it's not used in the interpreter anywhere as far as I
can tell, presumably because of the setter interface not being complete, as
you point out. But should I be avoiding this second approach for some
reaosn?

k

On Tue, Jun 29, 2021 at 4:06 AM  wrote:
  The setter interface for atomic types is not yer implemented. It
  may
  be some day.

  Best,

  luke

  On Fri, 25 Jun 2021, Konrad Siek wrote:

  > Hello,
  >
  > I am working on a package that works with various types of R
  vectors,
  > implemented in C. My code has a lot of SET_*_ELT operations in
  it for
  > various types of vectors, including for CPLXSXPs and RAWSXPs.
  >
  > I noticed SET_COMPLEX_ELT and SET_RAW_ELT are defined in
  Rinlinedfuns.h but
  > not declared in Rinternals.h, so they cannot be used in
  packages. I was
  > going to re-implement them or extern them in my package,
  however,
  > interestingly, ALTCOMPLEX_SET_ELT and ALTRAW_SET_ELT  are both
  declared in
  > Rinternals.h, making me think SET_COMPLEX_ELT and SET_RAW_ELT
  could be
  > purposefully obscured. Otherwise it may just be an oversight
  and I should
  > bring it to someone's attention anyway.
  >
  > I have three questions that I hope R-devel could help me with.
  >
  > 1. Is this an oversight, or are SET_COMPLEX_ELT and
  SET_RAW_ELT not exposed
  > on purpose? 2. If they are not exposed on purpose, I was
  wondering why.
  > 3. More importantly, what would be good ways to set elements
  of these
  > vectors while playing nice with ALTREP and avoiding whatever
  pitfalls
  > caused these functions to be obscured in the first place?
  >
  > Best regards,
  > Konrad,
  >
  >       [[alternative HTML version deleted]]
  >
  > __
  > R-devel@r-project.org mailing list
  > https://stat.ethz.ch/mailman/listinfo/r-devel
  >

  --
  Luke Tierney
  Ralph E. Wareham Professor of Mathematical Sciences
  University of Iowa                  Phone:           
   319-335-3386
  Department of Statistics and        Fax:             
   319-335-3017
      Actuarial Science
  241 Schaeffer Hall                  email: 
   luke-tier...@uiowa.edu
  Iowa City, IA 52242                 WWW: 
  http://www.stat.uiowa.edu

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior

2021-06-29 Thread luke-tierney


Call the R sum() function, either before going to C code or by calling
back into R. You may only want to do this if the vector is long enough
for e possible savings to be worth while.


On Tue, 29 Jun 2021, Sebastian Martin Krantz wrote:


Thanks both. Is there a suggested way I can get this speedup in a package?
Or just leave it for now?

Thanks also for the clarification Bill. The issue I have with that is that
in my C code ALTREP(x) evaluates to true even after adding and removing
dimensions (otherwise it would be handled by the normal sum method and I’d
be fine).


When you use a longer vector


Also .Internal(inspect(x)) still shows the compact
representation.


A different representation (wrapper around a compact sequence).

Best,

luke



-Sebastian

On Tue 29. Jun 2021 at 19:43, Bill Dunlap  wrote:


Adding the dimensions attribute takes away the altrep-ness.  Removing
dimensions
does not make it altrep.  E.g.,


a <- 1:10
am <- a ; dim(am) <- c(2L,5L)
amn <- am ; dim(amn) <- NULL
.Call("is_altrep", a)

[1] TRUE

.Call("is_altrep", am)

[1] FALSE

.Call("is_altrep", amn)

[1] FALSE

where is_altrep() is defined by the following C code:

#include 
#include 

SEXP is_altrep(SEXP x)
{
return Rf_ScalarLogical(ALTREP(x));
}


-Bill

On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
sebastian.kra...@graduateinstitute.ch> wrote:


Hello together, I'm working on some custom (grouped, weighted) sum, min
and
max functions and I want them to support the special case of plain integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):

SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
  int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
  if(ALTREP(x) && ng == 0 && nwl) {
switch(tx) {
case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
default: error("ALTREP object must be integer or real typed");
}
  }
// ...
}

when I let x <- 1:1e8, fsum(x) works fine and returns the correct value.
If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
into a vector again, dim(x) <- NULL, fsum(x) gives  NULL and a warning
message 'converting NULL pointer to R NULL'. For functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how
do
I make this a robust implementation?

Best regards,

Sebastian Krantz

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





[[alternative HTML version deleted]]

______
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior

2021-06-29 Thread luke-tierney


It depends on the size. For a larger vector adding dim will create a
wrapper ALTREP.

Currently the wrapper does not try to use the payload's sum method;
this could be added.

Best,

luke

On Tue, 29 Jun 2021, Bill Dunlap wrote:


Adding the dimensions attribute takes away the altrep-ness.  Removing
dimensions
does not make it altrep.  E.g.,


a <- 1:10
am <- a ; dim(am) <- c(2L,5L)
amn <- am ; dim(amn) <- NULL
.Call("is_altrep", a)

[1] TRUE

.Call("is_altrep", am)

[1] FALSE

.Call("is_altrep", amn)

[1] FALSE

where is_altrep() is defined by the following C code:

#include 
#include 

SEXP is_altrep(SEXP x)
{
   return Rf_ScalarLogical(ALTREP(x));
}

-Bill

On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
sebastian.kra...@graduateinstitute.ch> wrote:


Hello together, I'm working on some custom (grouped, weighted) sum, min and
max functions and I want them to support the special case of plain integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):

SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
  int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
  if(ALTREP(x) && ng == 0 && nwl) {
switch(tx) {
case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
default: error("ALTREP object must be integer or real typed");
}
  }
// ...
}

when I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
into a vector again, dim(x) <- NULL, fsum(x) gives  NULL and a warning
message 'converting NULL pointer to R NULL'. For functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how do
I make this a robust implementation?

Best regards,

Sebastian Krantz

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior

2021-06-29 Thread luke-tierney


ALTINTEGER_SUM and friends are _not_ intended for use in package code.
Once we get some time to clean up headers they will no longer be
visible to packages.

Best,

luke

On Tue, 29 Jun 2021, Sebastian Martin Krantz wrote:


Hello together, I'm working on some custom (grouped, weighted) sum, min and
max functions and I want them to support the special case of plain integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):

SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
 int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
   narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
 if(ALTREP(x) && ng == 0 && nwl) {
   switch(tx) {
   case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
   case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
   case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
   default: error("ALTREP object must be integer or real typed");
   }
 }
// ...
}

when I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
into a vector again, dim(x) <- NULL, fsum(x) gives  NULL and a warning
message 'converting NULL pointer to R NULL'. For functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how do
I make this a robust implementation?

Best regards,

Sebastian Krantz

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] SET_COMPLEX_ELT and SET_RAW_ELT missing from Rinternals.h

2021-06-28 Thread luke-tierney


The setter interface for atomic types is not yer implemented. It may
be some day.

Best,

luke

On Fri, 25 Jun 2021, Konrad Siek wrote:


Hello,

I am working on a package that works with various types of R vectors,
implemented in C. My code has a lot of SET_*_ELT operations in it for
various types of vectors, including for CPLXSXPs and RAWSXPs.

I noticed SET_COMPLEX_ELT and SET_RAW_ELT are defined in Rinlinedfuns.h but
not declared in Rinternals.h, so they cannot be used in packages. I was
going to re-implement them or extern them in my package, however,
interestingly, ALTCOMPLEX_SET_ELT and ALTRAW_SET_ELT  are both declared in
Rinternals.h, making me think SET_COMPLEX_ELT and SET_RAW_ELT could be
purposefully obscured. Otherwise it may just be an oversight and I should
bring it to someone's attention anyway.

I have three questions that I hope R-devel could help me with.

1. Is this an oversight, or are SET_COMPLEX_ELT and SET_RAW_ELT not exposed
on purpose? 2. If they are not exposed on purpose, I was wondering why.
3. More importantly, what would be good ways to set elements of these
vectors while playing nice with ALTREP and avoiding whatever pitfalls
caused these functions to be obscured in the first place?

Best regards,
Konrad,

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Possible ALTREP bug

2021-06-17 Thread luke-tierney


On Thu, 17 Jun 2021, Toby Hocking wrote:


Oliver, for clarification that section in writing R extensions mentions
VECTOR_ELT and REAL but not REAL_ELT nor any other *_ELT functions. I was
looking for an explanation of all the *_ELT functions (which are apparently
new), not just VECTOR_ELT.
Thanks Simon that response was very helpful.
One more question: are there any circumstances in which one should use
REAL_ELT(x,i) rather than REAL(x)[i] or vice versa? Or can they be used
interchangeably?


For a single call it is better to use REAL_ELT(x, i) since it doesn't
force allocating a possibly large object in order to get a pointer to
its data with REAL(x).  If you are iterating over a whole object you
may want to get data in chunks. There are iteration macros that
help. Some examples are in src/main/summary.c.

Best,

luke



On Wed, Jun 16, 2021 at 4:29 PM Simon Urbanek 
wrote:
  The usual quote applies: "use the source, Luke":

  $ grep _ELT *.h | sort
  Rdefines.h:#define SET_ELEMENT(x, i, val)     
   SET_VECTOR_ELT(x, i, val)
  Rinternals.h:   The function STRING_ELT is used as an argument
  to arrayAssign even
  Rinternals.h:#define VECTOR_ELT(x,i)    ((SEXP *) DATAPTR(x))[i]
  Rinternals.h://SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:Rbyte (RAW_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:Rbyte ALTRAW_ELT(SEXP x, R_xlen_t i);
  Rinternals.h:Rcomplex (COMPLEX_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:Rcomplex ALTCOMPLEX_ELT(SEXP x, R_xlen_t i);
  Rinternals.h:SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:SEXP (VECTOR_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:SEXP ALTSTRING_ELT(SEXP, R_xlen_t);
  Rinternals.h:SEXP SET_VECTOR_ELT(SEXP x, R_xlen_t i, SEXP v);
  Rinternals.h:double (REAL_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:double ALTREAL_ELT(SEXP x, R_xlen_t i);
  Rinternals.h:int (INTEGER_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:int (LOGICAL_ELT)(SEXP x, R_xlen_t i);
  Rinternals.h:int ALTINTEGER_ELT(SEXP x, R_xlen_t i);
  Rinternals.h:int ALTLOGICAL_ELT(SEXP x, R_xlen_t i);
  Rinternals.h:void ALTCOMPLEX_SET_ELT(SEXP x, R_xlen_t i,
  Rcomplex v);
  Rinternals.h:void ALTINTEGER_SET_ELT(SEXP x, R_xlen_t i, int v);
  Rinternals.h:void ALTLOGICAL_SET_ELT(SEXP x, R_xlen_t i, int v);
  Rinternals.h:void ALTRAW_SET_ELT(SEXP x, R_xlen_t i, Rbyte v);
  Rinternals.h:void ALTREAL_SET_ELT(SEXP x, R_xlen_t i, double v);
  Rinternals.h:void ALTSTRING_SET_ELT(SEXP, R_xlen_t, SEXP);
  Rinternals.h:void SET_INTEGER_ELT(SEXP x, R_xlen_t i, int v);
  Rinternals.h:void SET_LOGICAL_ELT(SEXP x, R_xlen_t i, int v);
  Rinternals.h:void SET_REAL_ELT(SEXP x, R_xlen_t i, double v);
  Rinternals.h:void SET_STRING_ELT(SEXP x, R_xlen_t i, SEXP v);

  So the indexing is with R_xlen_t and they return the value
  itself as one would expect.

  Cheers,
  Simon


  > On Jun 17, 2021, at 2:22 AM, Toby Hocking 
  wrote:
  >
  > By the way, where is the documentation for INTEGER_ELT,
  REAL_ELT, etc? I
  > looked in Writing R Extensions and R Internals but I did not
  see any
  > mention.
  > REAL_ELT is briefly mentioned on
  > https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
  > Would it be possible to please add some mention of them to
  Writing R
  > Extensions?
  > - how many of these _ELT functions are there? INTEGER, REAL,
  ... ?
  > - in what version of R were they introduced?
  > - I guess input types are always SEXP and int?
  > - What are the output types for each?
  >
  > On Fri, May 28, 2021 at 5:16 PM 
  wrote:
  >
  >> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly
  new it may
  >> be possible to check that places where they are used allow
  for them to
  >> allocate. I have fixed the one that got caught by Gabor's
  example, and
  >> a rchk run might be able to pick up others if rchk knows
  these could
  >> allocate. (I may also be forgetting other places where the
  _ELt
  >> methods are used.)  Fixing all call sites for REAL, INTEGER,
  etc, was
  >> never realistic so there GC has to be suspended during the
  method
  >> call, and that is done in the dispatch mechanism.
  >>
  >> The bigger problem is jumps from inside things that existing
  code
  >> assumes will not do that. Catching those jumps is possible
  but
  >> expensive; doing anything sensible if one is caught is really
  not
  >> possible.
  >>
  >> Best,
  >>
  >> luke
  >>
  >> On Fri, 28 May 2021, Gabriel Becker wrote:
  >>
  >>> Hi Jim et al,
  >>> Just t

Re: [Rd] [External] Possible ALTREP bug

2021-05-28 Thread luke-tierney

Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may
be possible to check that places where they are used allow for them to
allocate. I have fixed the one that got caught by Gabor's example, and
a rchk run might be able to pick up others if rchk knows these could
allocate. (I may also be forgetting other places where the _ELt
methods are used.)  Fixing all call sites for REAL, INTEGER, etc, was
never realistic so there GC has to be suspended during the method
call, and that is done in the dispatch mechanism.

The bigger problem is jumps from inside things that existing code
assumes will not do that. Catching those jumps is possible but
expensive; doing anything sensible if one is caught is really not
possible.

Best,

luke

On Fri, 28 May 2021, Gabriel Becker wrote:

Hi Jim et al,
Just to hopefully add a bit to what Luke already answered, from what I am
recalling looking back at that bioconductor thread Elt methods are used in
places where there are hard implicit assumptions that no garbage collection
will occur (ie they are called on things that aren't PROTECTed), and beyond
that, in places where there are hard assumptions that no error (longjmp)
will occur. I could be wrong, but I don't know that suspending garbage
collection would protect from the second one. Ie it is possible that an
error *ever* being raised from R code that implements an elt method could
cause all hell to break loose.

Luke or Tomas Kalibera would know more.

I was disappointed that implementing ALTREPs in R code was not in the cards
(it was in my original proposal back in 2016 to the DSC) but I trust Luke
that there are important reasons we can't safely allow that.

Best,
~G

On Fri, May 28, 2021 at 8:31 AM Jim Hester  wrote:
  From reading the discussion on the Bioconductor issue tracker it
  seems like
  the reason the GC is not suspended for the non-string ALTREP Elt
  methods is
  primarily due to performance concerns.

  If this is the case perhaps an additional flag could be added to
  the
  `R_set_altrep_*()` functions so ALTREP authors could indicate if
  GC should
  be halted when that particular method is called for that
  particular ALTREP
  class.

  This would avoid the performance hit (other than a boolean
  check) for the
  standard case when no allocations are expected, but allow
  authors to
  indicate that R should pause GC if needed for methods in their
  class.

  On Fri, May 28, 2021 at 9:42 AM  wrote:

  > integer and real Elt methods are not expected to allocate. You
  would
  > have to suspend GC to be able to do that. This currently can't
  be done
  > from package code.
  >
  > Best,
  >
  > luke
  >
  > On Fri, 28 May 2021, Gábor Csárdi wrote:
  >
  > > I have found some weird SEXP corruption behavior with
  ALTREP, which
  > > could be a bug. (Or I could be doing something wrong.)
  > >
  > > I have an integer ALTREP vector that calls back to R from
  the Elt
  > > method. When this vector is indexed in a lapply(), its first
  element
  > > gets corrupted. Sometimes it's just a type change to
  logical, but
  > > sometimes the corruption causes a crash.
  > >
  > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small
  package
  > > that demonstrates this:
  https://github.com/gaborcsardi/redfish
  > >
  > > The R callback in this package calls
  `loadNamespace("Matrix")`, but
  > > the same crash happens for other packages as well, and
  sometimes it
  > > also happens if I don't load any packages at all. (But that
  example
  > > was much more complicated, so I went with the package
  loading.)
  > >
  > > It is somewhat random, and sometimes turning off the JIT
  avoids the
  > > crash, but not always.
  > >
  > > Hopefully I am just doing something wrong in the ALTREP code
  (see
  > >
  https://github.com/gaborcsardi/redfish/blob/main/src/test.c),
  and it
  > > is not actually a bug.
  > >
  > > Thanks,
  > > Gabor
  > >
  > > __
  > > R-devel@r-project.org mailing list
  > > https://stat.ethz.ch/mailman/listinfo/r-devel
  > >
  >
  > --
  > Luke Tierney
  > Ralph E. Wareham Professor of Mathematical Sciences
  > University of Iowa                  Phone:           
   319-335-3386
  > Department of Statistics and        Fax:             
   319-335-3017
  >     Actuarial Science
  > 241 Schaeffer Hall

Re: [Rd] [External] Possible ALTREP bug

2021-05-28 Thread luke-tierney


integer and real Elt methods are not expected to allocate. You would
have to suspend GC to be able to do that. This currently can't be done
from package code.

Best,

luke

On Fri, 28 May 2021, Gábor Csárdi wrote:


I have found some weird SEXP corruption behavior with ALTREP, which
could be a bug. (Or I could be doing something wrong.)

I have an integer ALTREP vector that calls back to R from the Elt
method. When this vector is indexed in a lapply(), its first element
gets corrupted. Sometimes it's just a type change to logical, but
sometimes the corruption causes a crash.

I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package
that demonstrates this: https://github.com/gaborcsardi/redfish

The R callback in this package calls `loadNamespace("Matrix")`, but
the same crash happens for other packages as well, and sometimes it
also happens if I don't load any packages at all. (But that example
was much more complicated, so I went with the package loading.)

It is somewhat random, and sometimes turning off the JIT avoids the
crash, but not always.

Hopefully I am just doing something wrong in the ALTREP code (see
https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it
is not actually a bug.

Thanks,
Gabor

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread luke-tierney


On Tue, 25 May 2021, Adrian Dușa wrote:


Dear Avi,

Thank you so much for the extended messages, I read them carefully.
While partially offering a solution (I've already been there), it creates
additional work for the user, and some of that is unnecessary.

What I am trying to achieve is best described in this draft vignette:

devtools::install_github("dusadrian/mixed")
vignette("mixed")

Once a value is declared to be missing, the user should not do anything
else about it. Despite being present, the value should automatically be
treated as missing by the software. That is the way it's done in all major
statistical packages like SAS, Stata and even SPSS.

My end goal is to make R attractive for my faculty peers (and beyond),
almost all of whom are massively using SPSS and sometimes Stata. But in
order to convince them to (finally) make the switch, I need to provide
similar functionality, not additional work.

Re. your first part of the message, I am definitely not trying to change
the R internals. The NA will still be NA, exactly as currently defined.
My initial proposal was based on the observation that the 1954 payload was
stored as an unsigned int (thus occupying 32 bits) when it is obvious it
doesn't need more than 16. That was the only proposed modification, and
everything else stays the same.

I now learned, thanks to all contributors in this list, that building
something around that payload is risky because we do not know exactly what
the compilers will do. One possible solution that I can think of, while
(still) maintaining the current functionality around the NA, is to use a
different high word for the NA that would not trigger compilation issues.
But I have absolutely no idea what that implies for the other inner
workings of R.

I very much trust the R core will eventually find a robust solution,
they've solved much more complicated problems than this. I just hope the
current thread will push the idea of tagged NAs on the table, for when they
will discuss this.

Once that will be solved, and despite the current advice discouraging this
route, I believe tagging NAs is a valuable idea that should not be
discarded.


Yes, it should be discarded.

You can of course do what you like in code you keep to yourself. But
please do not distribute code that does this. via CRAN or any other
means. It will only create problems for those maintaining R.


After all, the NA is nothing but a tagged NaN.


And we are now paying a price for what was, in hindsight, an
unfortunate decision.

Best,

luke


All the best,
Adrian


On Tue, May 25, 2021 at 7:05 AM Avi Gross via R-devel 
wrote:


I was thinking about how one does things in a language that is properly
object-oriented versus R that makes various half-assed attempts at being
such.

Clearly in some such languages you can make an object that is a wrapper
that allows you to save an item that is the main payload as well as
anything else you want. You might need a way to convince everything else to
allow you to make things like lists and vectors and other collections of
the objects and perhaps automatically unbox them for many purposes. As an
example in a language like Python, you might provide methods so that adding
A and B actually gets the value out of A and/or B and adds them properly.
But there may be too many edge cases to handle and some software may not
pay attention to what you want including some libraries written in other
languages.

I mention Python for the odd reason that it is now possible to combine
Python and R in the same program and sort of switch back and forth between
data representations. This may provide some openings for preserving and
accessing metadata when needed.

Realistically, if R was being designed from scratch TODAY, many things
might be done differently. But I recall it being developed at Bell Labs for
purposes where it was sort of revolutionary at the time (back when it was
S) and designed to do things in a vectorized way and probably primarily for
the kinds of scientific and mathematical operations where a single NA (of
several types depending on the data) was enough when augmented by a few
things like a Nan and Inf and -Inf. I doubt they seriously saw a need for
an unlimited number of NA that were all the same AND also all different
that they felt had to be built-in. As noted, had they had a reason to make
it fully object-oriented too and made the base types such as integer into
full-fledged objects with room for additional metadata, then things may be
different. I note I have seen languages which have both a data type called
integer as lower case and Integer as upper case. One of them is regularly
boxed and unboxed automagically when used in a context that needs the
other. As far as efficiency goes, this invisibly adds many steps. So do
languages that sometimes take a variable that is a pointer and invisibly
reference it to provide the underlying field rather than make you do extra
t

Re: [Rd] [External] Re: 1954 from NA

2021-05-24 Thread luke-tierney


On Mon, 24 May 2021, Adrian Dușa wrote:


On Mon, May 24, 2021 at 2:11 PM Greg Minshall  wrote:


[...]
if you have 500 columns of possibly-NA'd variables, you could have one
column of 500 "bits", where each bit has one of N values, N being the
number of explanations the corresponding column has for why the NA
exists.



PLEASE DO NOT DO THIS!

It will not work reliably, as has been explained to you ad nauseam in
this thread.

If you distribute code that does this it will only lead to bug reports
on R that will waste R-core time.

As Alex explained, you can use attributes for this. If you need
operations to preserve attributes across subsetting you can define
subsetting methods that do that.

If you are dead set on doing something in C you can try to develop an
ALTREP class that provides augmented missing value information.

Best,

luke





The mere thought of implementing something like that gives me shivers. Not
to mention such a solution should also be robust when subsetting,
splitting, column and row binding, etc. and everything can be lost if the
user deletes that particular column without realising its importance.

Social science datasets are much more alive and complex than one might
first think: there are multi-wave studies with tens of countries, and
aggregating such data is already a complex process to add even more
complexity on top of that.

As undocumented as they may be, or even subject to change, I think the R
internals are much more reliable that this.

Best wishes,
Adrian




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Pipe bind restored in R 4.1.0?

2021-04-17 Thread luke-tierney


No. We need more time to resolve issues revealed in testing.

Best,

luke

On Sat, 17 Apr 2021, Brenton Wiernik wrote:


Is the pipe bind `=>` operator likely to be restored by default in time for the 
4.1 release?

Brenton


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-08 Thread luke-tierney


Looks like this is an unavoidable interaction between the way source
references and lazy loading are implemented. The link back to the
crash_dumps environment comes though source references on an
unevaluated argument promise. Creating a fresh environment is
.onLoad() avoids this and is probably your best bet.

Having an option to serialize without source references might be nice
but would probably not be high enough on anyone's priority list to get
done anytime soon.

Best,

luke

On Thu, 8 Apr 2021, luke-tier...@uiowa.edu wrote:


I see that now also. Not sure yet what is going on.

One work-around that may work for you is to create a fresh crash dump
in a .onLoad function; somehting like

crash_dumps <- NULL
.onLoad <- function(...) crash_dumps <<- new.env()

Best,

luke

On Wed, 7 Apr 2021, Andreas Kersting wrote:


Hi Dirk, hi Luke,

Thanks for checking!

I could narrow it down further. I have the issue only if I install 
--with-keep.source, i.e.


R CMD INSTALL --with-keep.source dumpTest

Since this is the default in RStudio when clicking "Install and Restart", I 
was always having the issue - also from base R. If I install using e.g. 
devtools::install_github() directly it is also fine for me.


Could you please confirm? Thanks!

Regards,
Andreas

2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" :


On 7 April 2021 at 16:06, Andreas Kersting wrote:
| Hi Luke,
|
| Please see https://github.com/akersting/dumpTest for the package.
|
| Here a session showing my issue:
|
| > library(dumpTest)
| > sessionInfo()
| R version 4.0.5 (2021-03-31)
| Platform: x86_64-pc-linux-gnu (64-bit)
| Running under: Debian GNU/Linux 10 (buster)
|
| Matrix products: default
| BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
| LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
|
| locale:
|  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
|  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
|  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
|  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
|  [9] LC_ADDRESS=C   LC_TELEPHONE=C
| [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
|
| attached base packages:
| [1] stats graphics  grDevices utils datasets  methods   base
|
| other attached packages:
| [1] dumpTest_0.1.0
|
| loaded via a namespace (and not attached):
| [1] compiler_4.0.5
| > for (i in 1:100) {
| +   print(i)
| +   print(system.time(f()))
| + }
| [1] 1
|user  system elapsed
|   0.028   0.004   0.034
| [1] 2
|user  system elapsed
|   0.067   0.008   0.075
| [1] 3
|user  system elapsed
|   0.176   0.000   0.176
| [1] 4
|user  system elapsed
|   0.335   0.012   0.349
| [1] 5
|user  system elapsed
|   0.745   0.023   0.770
| [1] 6
|user  system elapsed
|   1.495   0.060   1.572
| [1] 7
|user  system elapsed
|   2.902   0.136   3.040
| [1] 8
|user  system elapsed
|   5.753   0.272   6.034
| [1] 9
|user  system elapsed
|  11.807   0.708  12.597
| [1] 10
| ^C
| Timing stopped at: 6.638 0.549 7.214
|
| I had to interrupt in iteration 10 because I was running low on RAM.

No issue here.  Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build
off my Debian package, hence instrumentation as in the Debian package.

edd@rob:~$ installGithub.r akersting/dumpTest
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo akersting/dumpTest@HEAD
✔  checking for file 
‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ...

─  preparing ‘dumpTest’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘dumpTest_0.1.0.tar.gz’

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
* installing *source* package ‘dumpTest’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
No man pages found in package  ‘dumpTest’
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation 
path

* DONE (dumpTest)
edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})'
   user  system elapsed
  0.481   0.019   0.500
edd@rob:~$

(I also ran the variant you showed with the dual print statements, it just
consumes more screen real estate and ends on

[...]
[1] 97
   user  system elapsed
  0.004   0.000   0.005
[1] 98
   user  system elapsed
  0.004   0.000   0.005
[1] 99
   user  system elapsed
  0.004   0.000   0.004
[1] 100
   user  system elapsed
  0.005   0.000   0.005
edd@rob:~$ )

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org






--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa

Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-08 Thread luke-tierney


I see that now also. Not sure yet what is going on.

One work-around that may work for you is to create a fresh crash dump
in a .onLoad function; somehting like

crash_dumps <- NULL
.onLoad <- function(...) crash_dumps <<- new.env()

Best,

luke

On Wed, 7 Apr 2021, Andreas Kersting wrote:


Hi Dirk, hi Luke,

Thanks for checking!

I could narrow it down further. I have the issue only if I install 
--with-keep.source, i.e.

R CMD INSTALL --with-keep.source dumpTest

Since this is the default in RStudio when clicking "Install and Restart", I was 
always having the issue - also from base R. If I install using e.g. 
devtools::install_github() directly it is also fine for me.

Could you please confirm? Thanks!

Regards,
Andreas

2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" :


On 7 April 2021 at 16:06, Andreas Kersting wrote:
| Hi Luke,
|
| Please see https://github.com/akersting/dumpTest for the package.
|
| Here a session showing my issue:
|
| > library(dumpTest)
| > sessionInfo()
| R version 4.0.5 (2021-03-31)
| Platform: x86_64-pc-linux-gnu (64-bit)
| Running under: Debian GNU/Linux 10 (buster)
|
| Matrix products: default
| BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
| LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
|
| locale:
|  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
|  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
|  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
|  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
|  [9] LC_ADDRESS=C   LC_TELEPHONE=C
| [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
|
| attached base packages:
| [1] stats graphics  grDevices utils datasets  methods   base
|
| other attached packages:
| [1] dumpTest_0.1.0
|
| loaded via a namespace (and not attached):
| [1] compiler_4.0.5
| > for (i in 1:100) {
| +   print(i)
| +   print(system.time(f()))
| + }
| [1] 1
|user  system elapsed
|   0.028   0.004   0.034
| [1] 2
|user  system elapsed
|   0.067   0.008   0.075
| [1] 3
|user  system elapsed
|   0.176   0.000   0.176
| [1] 4
|user  system elapsed
|   0.335   0.012   0.349
| [1] 5
|user  system elapsed
|   0.745   0.023   0.770
| [1] 6
|user  system elapsed
|   1.495   0.060   1.572
| [1] 7
|user  system elapsed
|   2.902   0.136   3.040
| [1] 8
|user  system elapsed
|   5.753   0.272   6.034
| [1] 9
|user  system elapsed
|  11.807   0.708  12.597
| [1] 10
| ^C
| Timing stopped at: 6.638 0.549 7.214
|
| I had to interrupt in iteration 10 because I was running low on RAM.

No issue here.  Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build
off my Debian package, hence instrumentation as in the Debian package.

edd@rob:~$ installGithub.r akersting/dumpTest
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo akersting/dumpTest@HEAD
✔  checking for file 
‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ...
─  preparing ‘dumpTest’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘dumpTest_0.1.0.tar.gz’

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
* installing *source* package ‘dumpTest’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
No man pages found in package  ‘dumpTest’
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (dumpTest)
edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})'
   user  system elapsed
  0.481   0.019   0.500
edd@rob:~$

(I also ran the variant you showed with the dual print statements, it just
consumes more screen real estate and ends on

[...]
[1] 97
   user  system elapsed
  0.004   0.000   0.005
[1] 98
   user  system elapsed
  0.004   0.000   0.005
[1] 99
   user  system elapsed
  0.004   0.000   0.004
[1] 100
   user  system elapsed
  0.005   0.000   0.005
edd@rob:~$ )

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-07 Thread luke-tierney


No issues here with that either. Looks like something is different on
your end.

Best,

luke

On Wed, 7 Apr 2021, Andreas Kersting wrote:


Hi Luke,

Please see https://github.com/akersting/dumpTest for the package.

Here a session showing my issue:


library(dumpTest)
sessionInfo()

R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0

locale:
[1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8   LC_NAME=C
[9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] dumpTest_0.1.0

loaded via a namespace (and not attached):
[1] compiler_4.0.5

for (i in 1:100) {

+   print(i)
+   print(system.time(f()))
+ }
[1] 1
  user  system elapsed
 0.028   0.004   0.034
[1] 2
  user  system elapsed
 0.067   0.008   0.075
[1] 3
  user  system elapsed
 0.176   0.000   0.176
[1] 4
  user  system elapsed
 0.335   0.012   0.349
[1] 5
  user  system elapsed
 0.745   0.023   0.770
[1] 6
  user  system elapsed
 1.495   0.060   1.572
[1] 7
  user  system elapsed
 2.902   0.136   3.040
[1] 8
  user  system elapsed
 5.753   0.272   6.034
[1] 9
  user  system elapsed
11.807   0.708  12.597
[1] 10
^C
Timing stopped at: 6.638 0.549 7.214

I had to interrupt in iteration 10 because I was running low on RAM.

Regards,
Andreas

2021-04-07 15:28 GMT+02:00 luke-tier...@uiowa.edu:

On Wed, 7 Apr 2021, Andreas Kersting wrote:


Hi,

please consider the following minimal reproducible example:

Create a new R package which just contains the following two (exported) objects:


I would not expect this behavior and I don't see it when I make such a
package (in R 4.0.3 or R-devel on Ubuntu).  You will need to provide a
more complete reproducible example if you want help with what you are
trying to do; also sessionInfo() would help.

Best,

luke




crash_dumps <- new.env()

f <- function() {
 x <- runif(1e5)
 dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
 assign("last.dump", dump, crash_dumps)
}


WARNING: the following will probably eat all your RAM!

Attach this package and run:

for (i in 1:100) {
 print(i)
 f()
}

You will notice that with each iteration the execution of f() slows down 
significantly while the memory consumption of the R process (v4.0.5 on Linux) 
quickly explodes.

I am having a hard time to understand what exactly is happening here. Something 
w.r.t. too deeply nested environments? Could someone please enlighten me? 
Thanks!

Regards,
Andreas


Background:
In an R package I store crash dumps on error in a parallel processes in a way 
similar to what I have just shown (hence the (un)serialize(), which happens as 
part of returning the objects to the parent process). The first 2 or 3 times I 
do so in a session everything is fine, but afterwards it takes very long and I 
soon run out of memory.

Some more observations:
- If I omit `x <- runif(1e5)`, the issues seem to be less pronounced.
- If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue 
- probably because .GlobalEnv is not included in sys.frames(), while 
crash_dumps is indirectly via the namespace of the package being the parent.env 
of some of the sys.frames()!?
- If I omit the lapply(...), i.e. use `dump <- 
unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. 
The immediate consequence is that there are less sys.frames and - in particular - 
there is no frame which has the base namespace as its parent.env.
- If I make crash_dumps a list and use assignInMyNamespace() to store the dump 
in it, there also seems to be no issue. I will probably use this as a 
workaround:

crash_dumps <- list()

f <- function() {
 x <- runif(1e5)
 dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
 crash_dumps[["last.dump"]] <- dump
 assignInMyNamespace("crash_dumps", crash_dumps)
}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics

Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-07 Thread luke-tierney




On Wed, 7 Apr 2021, Andreas Kersting wrote:


Hi,

please consider the following minimal reproducible example:

Create a new R package which just contains the following two (exported) objects:


I would not expect this behavior and I don't see it when I make such a
package (in R 4.0.3 or R-devel on Ubuntu).  You will need to provide a
more complete reproducible example if you want help with what you are
trying to do; also sessionInfo() would help.

Best,

luke




crash_dumps <- new.env()

f <- function() {
 x <- runif(1e5)
 dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
 assign("last.dump", dump, crash_dumps)
}


WARNING: the following will probably eat all your RAM!

Attach this package and run:

for (i in 1:100) {
 print(i)
 f()
}

You will notice that with each iteration the execution of f() slows down 
significantly while the memory consumption of the R process (v4.0.5 on Linux) 
quickly explodes.

I am having a hard time to understand what exactly is happening here. Something 
w.r.t. too deeply nested environments? Could someone please enlighten me? 
Thanks!

Regards,
Andreas


Background:
In an R package I store crash dumps on error in a parallel processes in a way 
similar to what I have just shown (hence the (un)serialize(), which happens as 
part of returning the objects to the parent process). The first 2 or 3 times I 
do so in a session everything is fine, but afterwards it takes very long and I 
soon run out of memory.

Some more observations:
- If I omit `x <- runif(1e5)`, the issues seem to be less pronounced.
- If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue 
- probably because .GlobalEnv is not included in sys.frames(), while 
crash_dumps is indirectly via the namespace of the package being the parent.env 
of some of the sys.frames()!?
- If I omit the lapply(...), i.e. use `dump <- 
unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. 
The immediate consequence is that there are less sys.frames and - in particular - 
there is no frame which has the base namespace as its parent.env.
- If I make crash_dumps a list and use assignInMyNamespace() to store the dump 
in it, there also seems to be no issue. I will probably use this as a 
workaround:

crash_dumps <- list()

f <- function() {
 x <- runif(1e5)
 dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
 crash_dumps[["last.dump"]] <- dump
 assignInMyNamespace("crash_dumps", crash_dumps)
}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] brief update on the pipe operator in R-devel

2021-01-12 Thread luke-tierney

After some discussions we've settled on a syntax of the form

mtcars |> subset(cyl == 4) |> d => lm(mpg ~ disp, data = d)

to handle cases where the pipe lhs needs to be passed to an argument
other than the first of the function called on the rhs. This seems a
to be a reasonable balance between making these non-standard cases
easy to see but still easy to write. This is now committed to R-devel.

Best,

luke

On Tue, 22 Dec 2020, luke-tier...@uiowa.edu wrote:

It turns out that allowing a bare function expression on the
right-hand side (RHS) of a pipe creates opportunities for confusion
and mistakes that are too risky. So we will be dropping support for
this from the pipe operator.

The case of a RHS call that wants to receive the LHS result in an
argument other than the first can be handled with just implicit first
argument passing along the lines of

   mtcars |> subset(cyl == 4) |> (\(d) lm(mpg ~ disp, data = d))()

It was hoped that allowing a bare function expression would make this
more convenient, but it has issues as outlined below. We are exploring
some alternatives, and will hopefully settle on one soon after the
holidays.

The basic problem, pointed out in a comment on Twitter, is that in
expressions of the form

   1 |> \(x) x + 1 -> y
   1 |> \(x) x + 1 |> \(y) x + y

everything after the \(x) is parsed as part of the body of the
function.  So these are parsed along the lines of

   1 |> \(x) { x + 1 -> y }
   1 |> \(x) { x + 1 |> \(y) x + y }

In the first case the result is assigned to a (useless) local
variable.  Someone writing this is more likely to have intended to
assign the result to a global variable, as this would:

   (1 |> \(x) x + 1) -> y

In the second case the 'x' in 'x + y' refers to the local variable 'x'
in the first RHS function. Someone writing this is more likely to have
meant

   (1 |> \(x) x + 1) |> \(y) x + y

with 'x' in 'x + y' now referring to a global variable:

   > x <- 2
   > 1 |> \(x) x + 1 |> \(y) x + y
   [1] 3
   > (1 |> \(x) x + 1) |> \(y) x + y
   [1] 4

These issues arise with any approach in R that allows a bare function
expression on the RHS of a pipe operation. It also arises in other
languages with pipe operators. For example, here is the last example
in Julia:

   julia> x = 2
   2
   julia> 1 |> x -> x + 1 |> y -> x + y
   3
   julia> ( 1 |> x -> x + 1 ) |> y -> x + y
   4

Even though proper use of parentheses can work around these issues,
the likelihood of making mistakes that are hard to track down is too
high. So we will disallow the use of bare function expressions on the
right hand side of a pipe.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] brief update on the pipe operator in R-devel

2020-12-22 Thread luke-tierney


It turns out that allowing a bare function expression on the
right-hand side (RHS) of a pipe creates opportunities for confusion
and mistakes that are too risky. So we will be dropping support for
this from the pipe operator.

The case of a RHS call that wants to receive the LHS result in an
argument other than the first can be handled with just implicit first
argument passing along the lines of

mtcars |> subset(cyl == 4) |> (\(d) lm(mpg ~ disp, data = d))()

It was hoped that allowing a bare function expression would make this
more convenient, but it has issues as outlined below. We are exploring
some alternatives, and will hopefully settle on one soon after the
holidays.

The basic problem, pointed out in a comment on Twitter, is that in
expressions of the form

1 |> \(x) x + 1 -> y
1 |> \(x) x + 1 |> \(y) x + y

everything after the \(x) is parsed as part of the body of the
function.  So these are parsed along the lines of

1 |> \(x) { x + 1 -> y }
1 |> \(x) { x + 1 |> \(y) x + y }

In the first case the result is assigned to a (useless) local
variable.  Someone writing this is more likely to have intended to
assign the result to a global variable, as this would:

(1 |> \(x) x + 1) -> y

In the second case the 'x' in 'x + y' refers to the local variable 'x'
in the first RHS function. Someone writing this is more likely to have
meant

(1 |> \(x) x + 1) |> \(y) x + y

with 'x' in 'x + y' now referring to a global variable:

> x <- 2
> 1 |> \(x) x + 1 |> \(y) x + y
[1] 3
> (1 |> \(x) x + 1) |> \(y) x + y
[1] 4

These issues arise with any approach in R that allows a bare function
expression on the RHS of a pipe operation. It also arises in other
languages with pipe operators. For example, here is the last example
in Julia:

julia> x = 2
2
julia> 1 |> x -> x + 1 |> y -> x + y
3
julia> ( 1 |> x -> x + 1 ) |> y -> x + y
4

Even though proper use of parentheses can work around these issues,
the likelihood of making mistakes that are hard to track down is too
high. So we will disallow the use of bare function expressions on the
right hand side of a pipe.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone:     319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] setting .libPaths() with parallel::clusterCall

2020-12-22 Thread luke-tierney


On Tue, 22 Dec 2020, Mark van der Loo wrote:


Dear all,

It is not possible to set library paths on worker nodes with
parallel::clusterCall (or snow::clusterCall) and I wonder if this is
intended behavior.

Example.

library(parallel)
libdir <- "./tmplib"
if (!dir.exists(libdir)) dir.create("./tmplib")

cl <- makeCluster(2)
clusterCall(cl, .libPaths, c(libdir, .libPaths()) )

The output is as expected with the extra libdir returned for each worker
node. However, running

clusterEvalQ(cl, .libPaths())

Shows that the library paths have not been set.


Use this:

clusterCall(cl, ".libPaths", c(libdir, .libPaths()) )

This will find the function .libPaths on the workers.

Your clusterCall sends across a serialized copy of your process'
.libPaths and calls that. Usually that is equivalent to calling the
function found by the name you used on the workers, but not when the
function has an enclosing environment that the function modifies by
assignment.

Alternate implementations of .libPaths that are more
serialization-friendly are possible in principle but probably not
practical given limitations of the base package.

The distinction between providing a function value or a character
string as the function argument to clusterCall and others could
probably use a paragraph in the help file; happy to consider a patch
if anyone wants to take a crack at it.

Best,

luke



If this is indeed a bug, I'm happy to file it at bugzilla. Tested on R
4.0.3 and r-devel.

Best,
Mark
ps: a workaround is documented here:
https://www.markvanderloo.eu/yaRb/2020/12/17/how-to-set-library-path-on-a-parallel-r-cluster/



sessionInfo()

R Under development (unstable) (2020-12-21 r79668)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /home/mark/projects/Rdev/R-devel/lib/libRblas.so
LAPACK: /home/mark/projects/Rdev/R-devel/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
[3] LC_TIME=nl_NL.UTF-8LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=nl_NL.UTF-8LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=nl_NL.UTF-8   LC_NAME=C
[9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
[8] base

loaded via a namespace (and not attached):
[1] compiler_4.1.0

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] R crashes when using huge data sets with character string variables

2020-12-12 Thread luke-tierney


If R is receiving a kill signal there is nothing it can do about it.

I am guessing you are running into a memory over-commit issue in your OS.
https://en.wikipedia.org/wiki/Memory_overcommitment
https://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/

If you have to run this close to your physical memory limits you might
try using your shell's facility (ulimit for bash, limit for some
others) to limit process memory/virtual memory use to your available
physical memory. You can also try setting the R_MAX_VSIZE environment
variable mentioned in ?Memory; that only affects the R heap, not
malloc() done elsewhere.

Best,

luke

On Sat, 12 Dec 2020, Arne Henningsen wrote:


When working with a huge data set with character string variables, I
experienced that various commands let R crash. When I run R in a
Linux/bash console, R terminates with the message "Killed". When I use
RStudio, I get the message "R Session Aborted. R encountered a fatal
error. The session was terminated. Start New Session". If an object in
the R workspace needs too much memory, I would expect that R would not
crash but issue an error message "Error: cannot allocate vector of
size ...".  A minimal reproducible example (at least on my computer)
is:

nObs <- 1e9

date <- paste( round( runif( nObs, 1981, 2015 ) ), round( runif( nObs,
1, 12 ) ), round( runif( nObs, 1, 31 ) ), sep = "-" )

Is this a bug or a feature of R?

Some information about my R version, OS, etc:

R> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_DK.UTF-8   LC_NUMERIC=C
[3] LC_TIME=en_DK.UTF-8LC_COLLATE=en_DK.UTF-8
[5] LC_MONETARY=en_DK.UTF-8LC_MESSAGES=en_DK.UTF-8
[7] LC_PAPER=en_DK.UTF-8   LC_NAME=C
[9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.3

/Arne




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: New pipe operator

2020-12-07 Thread luke-tierney


On Mon, 7 Dec 2020, Peter Dalgaard wrote:





On 7 Dec 2020, at 17:35 , Duncan Murdoch  wrote:

On 07/12/2020 11:18 a.m., peter dalgaard wrote:

Hmm,
I feel a bit bad coming late to this, but I think I am beginning to side with those who want  
"... |> head" to work. And yes, that has to happen at the expense of |> head().


Just curious, how would you express head(df, 10)?  Currently it is

df |> head(10)

Would I have to write it as

df |> function(d) head(d, 10)


It could be

df |> ~ head(_, 10)

which in a sense is "yes" to your question.




As I think it was Gabor points out, the current structure goes down a 
nonstandard evaluation route, which may be difficult to explain and departs 
from usual operator evaluation paradigms by being an odd mix of syntax and 
semantics. R lets you do these sorts of thing, witness ggplot and tidyverse, 
but the transparency of the language tends to suffer.


I wouldn't call it non-standard evaluation.  There is no function corresponding to |>, so there's no evaluation at 
all.  It is more like the way "x -> y" is parsed as "y <- x", or "if (x) y" is 
transformed to `if`(x, y).


That's a point, but maybe also my point. Currently, the parser is inserting the 
LHS as the 1st argument of the RHS, right? Things might be simpler if it was 
more like a simple binop.


It can only be a simple binop if you only allow RHS functions of one argument.
Which would require currying along the lines Duncan showed. Something like:

`%>>%` <- function(x, f) f(x)
C1 <- function(f, ...) function(x) f(x, ...)

mtcars %>>% head
mtcars %>>% C1(head, 2)
mtcars %>>% C1(subset, cyl == 4) %>>% \(d) lm(mpg ~ disp, data = d)

This might fly if we lived in a world where most RHS functions take
one argument and only a few needed currying. That is the case in many
functional languages, but not for R. Making the common case of
multiple arguments easy means you have to work at the source level,
either in the parser or with some form of NSE.

Best,

luke



-pd


Duncan Murdoch


It would be neater if it was simply so that the class/type of the object on the 
right hand side decided what should happen. So we could have a rule that we 
could have an object, an expression, and possibly an unevaluated call on the 
RHS. Or maybe a formula, I.e., we could hav
... |> head
but not
... |> head()
because head() does not evaluate to anything useful. Instead, we could have 
some of these
... |> quote(head())
... |> expression(head())
... |> ~ head()
... |> \(_) head(_)
possibly also using a placeholder mechanism for the three first ones. I kind of 
like the idea that the ~ could be equivalent to \(_).
(And yes, I am kicking myself a bit for not using ~ in the NSE arguments in 
subset() and transform())
-pd

On 7 Dec 2020, at 16:20 , Deepayan Sarkar  wrote:

On Mon, Dec 7, 2020 at 6:53 PM Gabor Grothendieck
 wrote:


On Mon, Dec 7, 2020 at 5:41 AM Duncan Murdoch  wrote:

I agree it's all about call expressions, but they aren't all being
treated equally:

x |> f(...)

expands to f(x, ...), while

x |> `function`(...)

expands to `function`(...)(x).  This is an exception to the rule for
other calls, but I think it's a justified one.


This admitted inconsistency is justified by what?  No argument has been
presented.  The justification seems to be implicitly driven by implementation
concerns at the expense of usability and language consistency.


Sorry if I have missed something, but is your consistency argument
basically that if

foo <- function(x) x + 1

then

x |> foo
x |> function(x) x + 1

should both work the same? Suppose it did. Would you then be OK if

x |> foo()

no longer worked as it does now, and produced foo()(x) instead of foo(x)?

If you are not OK with that and want to retain the current behaviour,
what would you want to happen with the following?

bar <- function(x) function(n) rnorm(n, mean = x)

10 |> bar(runif(1))() # works 'as expected' ~ bar(runif(1))(10)
10 |> bar(runif(1)) # currently bar(10, runif(1))

both of which you probably want. But then

baz <-  bar(runif(1))
10 |> baz

(not currently allowed) will not be the same as what you would want from

10 |> bar(runif(1))

which leads to a different kind of inconsistency, doesn't it?

-Deepayan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel







--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] anonymous functions

2020-12-07 Thread luke-tierney


I don't disagree in principle, but the reality is users want shortcuts
and as a result various packages, in particular tidyverse, have been
providing them. Mostly based on formulas, mostly with significant
issues since formulas weren't designed for this, and mostly
incompatible (tidyverse ones are compatible within tidyverse but not
with others). And of course none work in sapply or lapply. Providing a
shorthand in base may help to improve this. You don't have to use it
if you don't want to, and you can establish coding standards that
disallow it if you like.

Best,

luke

On Mon, 7 Dec 2020, Therneau, Terry M., Ph.D. via R-devel wrote:

“The shorthand form \(x) x + 1 is parsed as function(x) x + 1. It may be 
helpful in making code containing simple function expressions more readable.”


Color me unimpressed.
Over the decades I've seen several "who can write the shortest code" threads: 
in Fortran, in C, in Splus, ...   The same old idea that "short" is a synonym 
for either elegant, readable, or efficient is now being recylced in the 
tidyverse.   The truth is that "short" is actually an antonym for all of 
these things, at least for anyone else reading the code; or for the original 
coder 30-60 minutes after the "clever" lines were written.  Minimal use of 
the spacebar and/or the return key isn't usually held up as a goal, but 
creeps into many practiioner's code as well.


People are excited by replacing "function(" with "\("?  Really?   Are people 
typing code with their thumbs?
I am ambivalent about pipes: I think it is a great concept, but too many of 
my colleagues think that using pipes = no need for any comments.


As time goes on, I find my goal is to make my code less compact and more 
readable.  Every bug fix or new feature in the survival package now adds more 
lines of comments or other documentation than lines of code.  If I have to 
puzzle out what a line does, what about the poor sod who inherits the 
maintainance?






--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: New pipe operator

2020-12-07 Thread luke-tierney


Or, keeping dplyr but with R-devel pipe and function shorthand:

DF <- "myfile.csv" %>%
   readLines() |>
   \(.) gsub(r'{(c\(.*?\)|integer\(0\))}', r'{"\1"}', .) |>
   \(.) read.csv(text = .) |>
   mutate(across(2:3, \(col) lapply(col, \(x) eval(parse(text = x)

Using named arguments to redirect to the implicit first does work,
also in magrittr, but for me at least it is the kind of thing I would
probably regret a month later when trying to figure out the code.

Best,

luke

On Mon, 7 Dec 2020, Gabor Grothendieck wrote:


On Sat, Dec 5, 2020 at 1:19 PM  wrote:

Let's get some experience


Here is my last SO post using dplyr rewritten to use R 4.1 devel.  Seems
not too bad.  Was able to work around the placeholder for gsub by specifying
the arg names and used \(...)... elsewhere.  This does not address the
inconsistency discussed though.  I have indented by 2 spaced in case the
email wraps around.  The objective is to read myfile.csv including columns that
contain c(...) and integer(0), parsing and evaluating them.


 # taken from:
 # 
https://stackoverflow.com/questions/65174764/reading-in-a-csv-that-contains-vectors-cx-y-in-r/65175172#65175172

 # create input file for testing
 Lines <- 
"\"col1\",\"col2\",\"col3\"\n\"a\",1,integer(0)\n\"c\",c(3,4),5\n\"e\",6,7\n"
 cat(Lines, file = "myfile.csv")

 #
 # base R 4.1 (devel)
 DF <- "myfile.csv" |>
   readLines() |>
   gsub(pattern = r'{(c\(.*?\)|integer\(0\))}', replacement = r'{"\1"}') |>
   \(.) read.csv(text = .) |>
   \(.) replace(., 2:3, lapply(.[2:3], \(col) lapply(col, \(x)
eval(parse(text = x)

 #
 # dplyr/magrittr
 library(dplyr)

 DF <- "myfile.csv" %>%
   readLines %>%
   gsub(r'{(c\(.*?\)|integer\(0\))}', r'{"\1"}', .) %>%
   { read.csv(text = .) } %>%
   mutate(across(2:3, ~ lapply(., function(x) eval(parse(text = x)



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: New pipe operator

2020-12-06 Thread luke-tierney


On Sun, 6 Dec 2020, Gabor Grothendieck wrote:


Why is that ambiguous?  It works in magrittr.


For now, all functions marked internally as syntactically special are
disallowed. Not all of these lead to ambiguities.

Best,

luke




library(magrittr)
1 %>% `+`()

[1] 1

On Sun, Dec 6, 2020 at 1:09 PM  wrote:


On Sun, 6 Dec 2020, Gabor Grothendieck wrote:


The following gives an error.

  1 |> `+`(2)
  ## Error: function '+' is not supported in RHS call of a pipe

  1 |> `+`()
  ## Error: function '+' is not supported in RHS call of a pipe

but this does work:

  1 |> (`+`)(2)
  ## [1] 3

  1 |> (`+`)()
  ## [1] 1

The error message suggests that this was intentional.
It isn't mentioned in ?"|>"


?"|>" says:

  To avoid ambiguities, functions in ‘rhs’ calls may not
  be syntactically special, such as ‘+’ or ‘if’.

(used to say lhs; fixed now).

Best,

luke



On Sat, Dec 5, 2020 at 1:19 PM  wrote:


We went back and forth on this several times. The key advantage of
requiring parentheses is to keep things simple and consistent.  Let's
get some experience with that. If experience shows requiring
parentheses creates too many issues then we can add the option of
dropping them later (with special handling of :: and :::). It's easier
to add flexibility and complexity than to restrict it after the fact.

Best,

luke

On Sat, 5 Dec 2020, Hugh Parsonage wrote:


I'm surprised by the aversion to

mtcars |> nrow

over

mtcars |> nrow()

and I think the decision to disallow the former should be
reconsidered.  The pipe operator is only going to be used when the rhs
is a function, so there is no ambiguity with omitting the parentheses.
If it's disallowed, it becomes inconsistent with other treatments like
sapply(mtcars, typeof) where sapply(mtcars, typeof()) would just be
noise.  I'm not sure why this decision was taken

If the only issue is with the double (and triple) colon operator, then
ideally `mtcars |> base::head` should resolve to `base::head(mtcars)`
-- in other words, demote the precedence of |>

Obviously (looking at the R-Syntax branch) this decision was
considered, put into place, then dropped, but I can't see why
precisely.

Best,


Hugh.







On Sat, 5 Dec 2020 at 04:07, Deepayan Sarkar  wrote:


On Fri, Dec 4, 2020 at 7:35 PM Duncan Murdoch  wrote:


On 04/12/2020 8:13 a.m., Hiroaki Yutani wrote:

  Error: function '::' not supported in RHS call of a pipe


To me, this error looks much more friendly than magrittr's error.
Some of them got too used to specify functions without (). This
is OK until they use `::`, but when they need to use it, it takes
hours to figure out why

mtcars %>% base::head
#> Error in .::base : unused argument (head)

won't work but

mtcars %>% head

works. I think this is a too harsh lesson for ordinary R users to
learn `::` is a function. I've been wanting for magrittr to drop the
support for a function name without () to avoid this confusion,
so I would very much welcome the new pipe operator's behavior.
Thank you all the developers who implemented this!


I agree, it's an improvement on the corresponding magrittr error.

I think the semantics of not evaluating the RHS, but treating the pipe
as purely syntactical is a good decision.

I'm not sure I like the recommended way to pipe into a particular argument:

   mtcars |> subset(cyl == 4) |> \(d) lm(mpg ~ disp, data = d)

or

   mtcars |> subset(cyl == 4) |> function(d) lm(mpg ~ disp, data = d)

both of which are equivalent to

   mtcars |> subset(cyl == 4) |> (function(d) lm(mpg ~ disp, data = d))()

It's tempting to suggest it should allow something like

   mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .)


Which is really not that far off from

mtcars |> subset(cyl == 4) |> \(.) lm(mpg ~ disp, data = .)

once you get used to it.

One consequence of the implementation is that it's not clear how
multiple occurrences of the placeholder would be interpreted. With
magrittr,

sort(runif(10)) %>% ecdf(.)(.)
## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

This is probably what you would expect, if you expect it to work at all, and not

ecdf(sort(runif(10)))(sort(runif(10)))

There would be no such ambiguity with anonymous functions

sort(runif(10)) |> \(.) ecdf(.)(.)

-Deepayan


which would be expanded to something equivalent to the other versions:
but that makes it quite a bit more complicated.  (Maybe _ or \. should
be used instead of ., since those are not legal variable names.)

I don't think there should be an attempt to copy magrittr's special
casing of how . is used in determining whether to also include the
previous value as first argument.

Duncan Murdoch




Best,
Hiroaki Yutani

2020年12月4日(金) 20:51 Duncan Murdoch :


Just saw this on the R-devel news:


R now provides a simple native pipe sy

Re: [Rd] [External] Re: New pipe operator

2020-12-06 Thread luke-tierney


On Sun, 6 Dec 2020, Gabor Grothendieck wrote:


The following gives an error.

  1 |> `+`(2)
  ## Error: function '+' is not supported in RHS call of a pipe

  1 |> `+`()
  ## Error: function '+' is not supported in RHS call of a pipe

but this does work:

  1 |> (`+`)(2)
  ## [1] 3

  1 |> (`+`)()
  ## [1] 1

The error message suggests that this was intentional.
It isn't mentioned in ?"|>"


?"|>" says:

 To avoid ambiguities, functions in ‘rhs’ calls may not
 be syntactically special, such as ‘+’ or ‘if’.

(used to say lhs; fixed now).

Best,

luke



On Sat, Dec 5, 2020 at 1:19 PM  wrote:


We went back and forth on this several times. The key advantage of
requiring parentheses is to keep things simple and consistent.  Let's
get some experience with that. If experience shows requiring
parentheses creates too many issues then we can add the option of
dropping them later (with special handling of :: and :::). It's easier
to add flexibility and complexity than to restrict it after the fact.

Best,

luke

On Sat, 5 Dec 2020, Hugh Parsonage wrote:


I'm surprised by the aversion to

mtcars |> nrow

over

mtcars |> nrow()

and I think the decision to disallow the former should be
reconsidered.  The pipe operator is only going to be used when the rhs
is a function, so there is no ambiguity with omitting the parentheses.
If it's disallowed, it becomes inconsistent with other treatments like
sapply(mtcars, typeof) where sapply(mtcars, typeof()) would just be
noise.  I'm not sure why this decision was taken

If the only issue is with the double (and triple) colon operator, then
ideally `mtcars |> base::head` should resolve to `base::head(mtcars)`
-- in other words, demote the precedence of |>

Obviously (looking at the R-Syntax branch) this decision was
considered, put into place, then dropped, but I can't see why
precisely.

Best,


Hugh.







On Sat, 5 Dec 2020 at 04:07, Deepayan Sarkar  wrote:


On Fri, Dec 4, 2020 at 7:35 PM Duncan Murdoch  wrote:


On 04/12/2020 8:13 a.m., Hiroaki Yutani wrote:

  Error: function '::' not supported in RHS call of a pipe


To me, this error looks much more friendly than magrittr's error.
Some of them got too used to specify functions without (). This
is OK until they use `::`, but when they need to use it, it takes
hours to figure out why

mtcars %>% base::head
#> Error in .::base : unused argument (head)

won't work but

mtcars %>% head

works. I think this is a too harsh lesson for ordinary R users to
learn `::` is a function. I've been wanting for magrittr to drop the
support for a function name without () to avoid this confusion,
so I would very much welcome the new pipe operator's behavior.
Thank you all the developers who implemented this!


I agree, it's an improvement on the corresponding magrittr error.

I think the semantics of not evaluating the RHS, but treating the pipe
as purely syntactical is a good decision.

I'm not sure I like the recommended way to pipe into a particular argument:

   mtcars |> subset(cyl == 4) |> \(d) lm(mpg ~ disp, data = d)

or

   mtcars |> subset(cyl == 4) |> function(d) lm(mpg ~ disp, data = d)

both of which are equivalent to

   mtcars |> subset(cyl == 4) |> (function(d) lm(mpg ~ disp, data = d))()

It's tempting to suggest it should allow something like

   mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .)


Which is really not that far off from

mtcars |> subset(cyl == 4) |> \(.) lm(mpg ~ disp, data = .)

once you get used to it.

One consequence of the implementation is that it's not clear how
multiple occurrences of the placeholder would be interpreted. With
magrittr,

sort(runif(10)) %>% ecdf(.)(.)
## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

This is probably what you would expect, if you expect it to work at all, and not

ecdf(sort(runif(10)))(sort(runif(10)))

There would be no such ambiguity with anonymous functions

sort(runif(10)) |> \(.) ecdf(.)(.)

-Deepayan


which would be expanded to something equivalent to the other versions:
but that makes it quite a bit more complicated.  (Maybe _ or \. should
be used instead of ., since those are not legal variable names.)

I don't think there should be an attempt to copy magrittr's special
casing of how . is used in determining whether to also include the
previous value as first argument.

Duncan Murdoch




Best,
Hiroaki Yutani

2020年12月4日(金) 20:51 Duncan Murdoch :


Just saw this on the R-devel news:


R now provides a simple native pipe syntax ‘|>’ as well as a shorthand
notation for creating functions, e.g. ‘\(x) x + 1’ is parsed as
‘function(x) x + 1’. The pipe implementation as a syntax transformation
was motivated by suggestions from Jim Hester and Lionel Henry. These
features are experimental and may change prior to release.


This is a good

Re: [Rd] [External] Re: New pipe operator

2020-12-05 Thread luke-tierney

We went back and forth on this several times. The key advantage of
requiring parentheses is to keep things simple and consistent.  Let's
get some experience with that. If experience shows requiring
parentheses creates too many issues then we can add the option of
dropping them later (with special handling of :: and :::). It's easier
to add flexibility and complexity than to restrict it after the fact.

Best,

luke

On Sat, 5 Dec 2020, Hugh Parsonage wrote:

I'm surprised by the aversion to

mtcars |> nrow

over

mtcars |> nrow()

and I think the decision to disallow the former should be
reconsidered.  The pipe operator is only going to be used when the rhs
is a function, so there is no ambiguity with omitting the parentheses.
If it's disallowed, it becomes inconsistent with other treatments like
sapply(mtcars, typeof) where sapply(mtcars, typeof()) would just be
noise.  I'm not sure why this decision was taken

If the only issue is with the double (and triple) colon operator, then
ideally `mtcars |> base::head` should resolve to `base::head(mtcars)`
-- in other words, demote the precedence of |>

Obviously (looking at the R-Syntax branch) this decision was
considered, put into place, then dropped, but I can't see why
precisely.

Best,

Hugh.

On Sat, 5 Dec 2020 at 04:07, Deepayan Sarkar  wrote:

On Fri, Dec 4, 2020 at 7:35 PM Duncan Murdoch  wrote:

On 04/12/2020 8:13 a.m., Hiroaki Yutani wrote:

  Error: function '::' not supported in RHS call of a pipe

To me, this error looks much more friendly than magrittr's error.
Some of them got too used to specify functions without (). This
is OK until they use `::`, but when they need to use it, it takes
hours to figure out why

mtcars %>% base::head
#> Error in .::base : unused argument (head)

won't work but

mtcars %>% head

works. I think this is a too harsh lesson for ordinary R users to
learn `::` is a function. I've been wanting for magrittr to drop the
support for a function name without () to avoid this confusion,
so I would very much welcome the new pipe operator's behavior.
Thank you all the developers who implemented this!

I agree, it's an improvement on the corresponding magrittr error.

I think the semantics of not evaluating the RHS, but treating the pipe
as purely syntactical is a good decision.

I'm not sure I like the recommended way to pipe into a particular argument:

   mtcars |> subset(cyl == 4) |> \(d) lm(mpg ~ disp, data = d)

or

   mtcars |> subset(cyl == 4) |> function(d) lm(mpg ~ disp, data = d)

both of which are equivalent to

   mtcars |> subset(cyl == 4) |> (function(d) lm(mpg ~ disp, data = d))()

It's tempting to suggest it should allow something like

   mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .)

Which is really not that far off from

mtcars |> subset(cyl == 4) |> \(.) lm(mpg ~ disp, data = .)

once you get used to it.

One consequence of the implementation is that it's not clear how
multiple occurrences of the placeholder would be interpreted. With
magrittr,

sort(runif(10)) %>% ecdf(.)(.)
## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

This is probably what you would expect, if you expect it to work at all, and not

ecdf(sort(runif(10)))(sort(runif(10)))

There would be no such ambiguity with anonymous functions

sort(runif(10)) |> \(.) ecdf(.)(.)

-Deepayan

which would be expanded to something equivalent to the other versions:
but that makes it quite a bit more complicated.  (Maybe _ or \. should
be used instead of ., since those are not legal variable names.)

I don't think there should be an attempt to copy magrittr's special
casing of how . is used in determining whether to also include the
previous value as first argument.

Duncan Murdoch

Best,
Hiroaki Yutani

2020年12月4日(金) 20:51 Duncan Murdoch :

Just saw this on the R-devel news:

R now provides a simple native pipe syntax ‘|>’ as well as a shorthand
notation for creating functions, e.g. ‘\(x) x + 1’ is parsed as
‘function(x) x + 1’. The pipe implementation as a syntax transformation
was motivated by suggestions from Jim Hester and Lionel Henry. These
features are experimental and may change prior to release.

This is a good addition; by using "|>" instead of "%>%" there should be
a chance to get operator precedence right.  That said, the ?Syntax help
topic hasn't been updated, so I'm not sure where it fits in.

There are some choices that take a little getting used to:

 > mtcars |> head
Error: The pipe operator requires a function call or an anonymous
function expression as RHS

(I need to say mtcars |> head() instead.)  This sometimes leads to error
messages that are somewhat confusing:

 > mtcars |> magrittr::debug_pipe |> head
Error: function '::' not supported in RHS call of a pipe

but

mtcars |> magrittr::debu

Re: [Rd] [External] Re: New pipe operator

2020-12-04 Thread luke-tierney


On Sat, 5 Dec 2020, Duncan Murdoch wrote:


On 04/12/2020 2:26 p.m., luke-tier...@uiowa.edu wrote:

On Fri, 4 Dec 2020, Dénes Tóth wrote:



On 12/4/20 3:05 PM, Duncan Murdoch wrote:

...

It's tempting to suggest it should allow something like

    mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .)

which would be expanded to something equivalent to the other versions: 
but
that makes it quite a bit more complicated.  (Maybe _ or \. should be 
used

instead of ., since those are not legal variable names.)


I support the idea of using an underscore (_) as the placeholder symbol.


I strongly oppose adding a placeholder. Allowing for an optional
placeholder significantly complicates both implementing and explaining
the semantics. For a simple syntax transformation to be viable it
would also require some restrictions, such as only allowing a
placeholder as a top level argument and only once. Checking that these
restrictions are met, and accurately signaling when they are not with
reasonable error messages, is essentially an unsolvable problem given
R's semantics.


I don't think you read my suggestion, but that's okay:  you're maintaining 
it, not me.


I thought I did but maybe I missed something. You are right that
supporting a placeholder makes things a lot more complicated. For
being able to easily recognize the non-standard cases _ is better than
. but for me at least not by much.

We did try a number of variations; the code is in the R-syntax branch.
At the root of that branch are two .md files with some notes as of
around useR20. Once things settle down I may update those and look
into turning them into a blog post.

Best,

luke



Duncan Murdoch



The case where the LHS is to be passed as something other than the
first argument is unusual. For me, having that case stand out by using
a function expression makes it much easier to see and so makes the
code easier to understand. As a wearer of progressive bifocals
and someone whose screen is not always free of small dust particles,
having to spot the non-standard pipe stages by seeing a placeholder,
especially a . placeholder, is be a bug, not a feature.

Best,

luke

Syntactic sugars work the the best if 1) they require less keystrokes 
and/or
2) are easier to read compared to the "normal" syntax, and 3) can not lead 
to
unexpected bugs (which is a major problem with the magrittr pipe). Using 
'_'
fulfills all of these criteria since '_' can not clash with any variable 
in

the environment.

Denes

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel








--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: New pipe operator

2020-12-04 Thread luke-tierney


On Fri, 4 Dec 2020, Dénes Tóth wrote:



On 12/4/20 3:05 PM, Duncan Murdoch wrote:

...

It's tempting to suggest it should allow something like

   mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .)

which would be expanded to something equivalent to the other versions: but 
that makes it quite a bit more complicated.  (Maybe _ or \. should be used 
instead of ., since those are not legal variable names.)


I support the idea of using an underscore (_) as the placeholder symbol.


I strongly oppose adding a placeholder. Allowing for an optional
placeholder significantly complicates both implementing and explaining
the semantics. For a simple syntax transformation to be viable it
would also require some restrictions, such as only allowing a
placeholder as a top level argument and only once. Checking that these
restrictions are met, and accurately signaling when they are not with
reasonable error messages, is essentially an unsolvable problem given
R's semantics.

The case where the LHS is to be passed as something other than the
first argument is unusual. For me, having that case stand out by using
a function expression makes it much easier to see and so makes the
code easier to understand. As a wearer of progressive bifocals
and someone whose screen is not always free of small dust particles,
having to spot the non-standard pipe stages by seeing a placeholder,
especially a . placeholder, is be a bug, not a feature.

Best,

luke

Syntactic sugars work the the best if 1) they require less keystrokes and/or 
2) are easier to read compared to the "normal" syntax, and 3) can not lead to 
unexpected bugs (which is a major problem with the magrittr pipe). Using '_' 
fulfills all of these criteria since '_' can not clash with any variable in 
the environment.


Denes

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

2020-12-01 Thread luke-tierney


The fact that your max resident size isn't affected looks odd.  Are
you setting the environment variable outside R? When I run

env R_MAX_VSIZE=16Gb /usr/bin/time bin/Rscript jg.R 1e9 2e0 0 0

(your code in jg.R). I get a quick failure with 11785524maxresident)k

Best,

luke

On Tue, 1 Dec 2020, Jan Gorecki wrote:


Thank you Luke,

I tried your suggestion about R_MAX_VSIZE but I am not able to get the
error you are getting.
I tried recent R devel as I have seen you made a change to GC there.
My machine is 128GB, free -h reports 125GB available. I tried to set
128, 125 and 100. In all cases the result is "Command terminated by
signal 9". Each took around 6-6.5h.
Details below, if it tells you anything how could I optimize it (or
raise an exception early) please do let me know.

R 4.0.3

unset R_MAX_VSIZE
   User time (seconds): 40447.92
   System time (seconds): 4034.37
   Percent of CPU this job got: 201%
   Elapsed (wall clock) time (h:mm:ss or m:ss): 6:07:59
   Maximum resident set size (kbytes): 127261184
   Major (requiring I/O) page faults: 72441
   Minor (reclaiming a frame) page faults: 3315491751
   Voluntary context switches: 381446
   Involuntary context switches: 529554
   File system inputs: 108339200
   File system outputs: 120

R-devel 2020-11-27 r79522

unset R_MAX_VSIZE
   User time (seconds): 40713.52
   System time (seconds): 4039.52
   Percent of CPU this job got: 198%
   Elapsed (wall clock) time (h:mm:ss or m:ss): 6:15:52
   Maximum resident set size (kbytes): 127254796
   Major (requiring I/O) page faults: 72810
   Minor (reclaiming a frame) page faults: 3433589848
   Voluntary context switches: 384363
   Involuntary context switches: 609024
   File system inputs: 108467064
   File system outputs: 112

R_MAX_VSIZE=128Gb
   User time (seconds): 40411.13
   System time (seconds): 4227.99
   Percent of CPU this job got: 198%
   Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14:01
   Maximum resident set size (kbytes): 127249316
   Major (requiring I/O) page faults: 88500
   Minor (reclaiming a frame) page faults: 3544520527
   Voluntary context switches: 384117
   Involuntary context switches: 545397
   File system inputs: 111675896
   File system outputs: 120

R_MAX_VSIZE=125Gb
   User time (seconds): 40246.83
   System time (seconds): 4042.76
   Percent of CPU this job got: 201%
   Elapsed (wall clock) time (h:mm:ss or m:ss): 6:06:56
   Maximum resident set size (kbytes): 127254200
   Major (requiring I/O) page faults: 63867
   Minor (reclaiming a frame) page faults: 3449493803
   Voluntary context switches: 370753
   Involuntary context switches: 614607
   File system inputs: 106322880
   File system outputs: 112

R_MAX_VSIZE=100Gb
   User time (seconds): 41837.10
   System time (seconds): 3979.57
   Percent of CPU this job got: 192%
   Elapsed (wall clock) time (h:mm:ss or m:ss): 6:36:34
   Maximum resident set size (kbytes): 127256940
   Major (requiring I/O) page faults: 66829
   Minor (reclaiming a frame) page faults: 3357778594
   Voluntary context switches: 391149
   Involuntary context switches: 646410
   File system inputs: 106605648
   File system outputs: 120

On Fri, Nov 27, 2020 at 10:18 PM  wrote:


On Thu, 26 Nov 2020, Jan Gorecki wrote:


Thank you Luke for looking into it. Your knowledge of gc is definitely
helpful here. I put comments inline below.

Best,
Jan

On Wed, Nov 25, 2020 at 10:38 PM  wrote:


On Tue, 24 Nov 2020, Jan Gorecki wrote:


As for other calls to system. I avoid calling system. In the past I
had some (to get memory stats from OS), but they were failing with
exactly the same issue. So yes, if I would add call to system before
calling quit, I believe it would fail with the same error.
At the same time I think (although I am not sure) that new allocations
made in R are working fine. So R seems to reserve some memory and can
continue to operate, while external call like system will fail. Maybe
it is like this by design, don't know.


Thanks for the report on quit(). We're exploring how to make the
cleanup on exit more robust to low memory situations like these.



Aside from this problem that is easy to report due to the warning
message, I think that gc() is choking at the same time. I tried to
make reproducible example for that, multiple times but couldn't, let
me try one more time.
It happens to manifest when there is 4e8+ unique characters/factors in
an R session. I am able to reproduce it using data.table and dplyr
(0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy
because of the size. I described briefly problem in:
https://github.com/h2oai/db-benchmark/issues/110


Because of the design of R's character vectors, with each element
allocated separately, R is never going t

Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

2020-11-27 Thread luke-tierney


On Thu, 26 Nov 2020, Jan Gorecki wrote:


Thank you Luke for looking into it. Your knowledge of gc is definitely
helpful here. I put comments inline below.

Best,
Jan

On Wed, Nov 25, 2020 at 10:38 PM  wrote:


On Tue, 24 Nov 2020, Jan Gorecki wrote:


As for other calls to system. I avoid calling system. In the past I
had some (to get memory stats from OS), but they were failing with
exactly the same issue. So yes, if I would add call to system before
calling quit, I believe it would fail with the same error.
At the same time I think (although I am not sure) that new allocations
made in R are working fine. So R seems to reserve some memory and can
continue to operate, while external call like system will fail. Maybe
it is like this by design, don't know.


Thanks for the report on quit(). We're exploring how to make the
cleanup on exit more robust to low memory situations like these.



Aside from this problem that is easy to report due to the warning
message, I think that gc() is choking at the same time. I tried to
make reproducible example for that, multiple times but couldn't, let
me try one more time.
It happens to manifest when there is 4e8+ unique characters/factors in
an R session. I am able to reproduce it using data.table and dplyr
(0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy
because of the size. I described briefly problem in:
https://github.com/h2oai/db-benchmark/issues/110


Because of the design of R's character vectors, with each element
allocated separately, R is never going to be great at handling huge
numbers of distinct strings. But it can do an adequate job given
enough memory to work with.

When I run your GitHub issue example on a machine with around 500 Gb
of RAM it seems to run OK; /usr/bin/time reports

2706.89user 161.89system 37:10.65elapsed 128%CPU (0avgtext+0avgdata 
92180796maxresident)k
0inputs+103450552outputs (0major+38716351minor)pagefaults 0swaps

So the memory footprint is quite large. Using gc.time() it looks like
about 1/3 of the time is in GC. Not ideal, and maybe could be improved
on a bit, but probably not by much. The GC is basically doing an
adequate job, given enough RAM.


Agree, 1/3 is a lot but still acceptable. So this strictly is not
something that requires intervention.
PS. I wasn't aware of gc.time(), it may be worth linking it from
SeeAlso in gc() manual.



If you run this example on a system without enough RAM, or with other
programs competing for RAM, you are likely to end up fighting with
your OS/hardware's virtual memory system. When I try to run it on a
16Gb system it churns for an hour or so before getting killed, and
/usr/bin/time reports a huge number of page faults:

312523816inputs+0outputs (24761285major+25762068minor)pagefaults 0swaps

You are probably experiencing something similar.


Yes, this is exactly what I am experiencing.
The machine is a bare metal machine of 128GB mem, csv size 50GB,
data.frame size 74GB.
In my case it churns for ~3h before it gets killed with SIGINT from
the parent R process which uses 3h as a timeout for this script.
This is something I would like to be addressed because gc time is far
bigger than actual computation time. This is not really acceptable, I
would prefer to raise an exception instead.



There may be opportunities for more tuning of the GC to better handle
running this close to memory limits, but I doubt the payoff would be
worth the effort.


If you don't have plans/time to work on that anytime soon, then I can
fill bugzilla for this problem so it won't get lost in the mailing
list.


I'm not convinced anything useful can be done that would work well for
your application without working badly for others.

If you want to drive this close to your memory limits you are probably
going to have to take responsibility for some tuning at your end. One
option in ?Memory you might try is the R_MAX_VSIZE environment
variable. On my 16Gb machine with R_MAX_VSIZE=16Gb your example fails
very quickly with

Error: vector memory exhausted (limit reached?)

rather than churning for an hour trying to make things work. Setting
memory and/or virtual memory limits in your shell is another option.

Best,

luke






Best,

luke


It would help if gcinfo() could take FALSE/TRUE/2L where 2L will print
even more information about gc, like how much time the each gc()
process took, how many objects it has to check on each level.

Best regards,
Jan



On Tue, Nov 24, 2020 at 1:05 PM Tomas Kalibera  wrote:


On 11/24/20 11:27 AM, Jan Gorecki wrote:

Thanks Bill for checking that.
It was my impression that warnings are raised from some internal
system calls made when quitting R. At that point I don't have much
control over checking the return status of those.
Your suggestion looks good to me.

Tomas, do you think this could help? could this be implemented?


I think this is a good suggestion. Deleting files on Unix was changed
from system("rm&q

Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

2020-11-25 Thread luke-tierney


On Tue, 24 Nov 2020, Jan Gorecki wrote:


As for other calls to system. I avoid calling system. In the past I
had some (to get memory stats from OS), but they were failing with
exactly the same issue. So yes, if I would add call to system before
calling quit, I believe it would fail with the same error.
At the same time I think (although I am not sure) that new allocations
made in R are working fine. So R seems to reserve some memory and can
continue to operate, while external call like system will fail. Maybe
it is like this by design, don't know.


Thanks for the report on quit(). We're exploring how to make the
cleanup on exit more robust to low memory situations like these.



Aside from this problem that is easy to report due to the warning
message, I think that gc() is choking at the same time. I tried to
make reproducible example for that, multiple times but couldn't, let
me try one more time.
It happens to manifest when there is 4e8+ unique characters/factors in
an R session. I am able to reproduce it using data.table and dplyr
(0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy
because of the size. I described briefly problem in:
https://github.com/h2oai/db-benchmark/issues/110


Because of the design of R's character vectors, with each element
allocated separately, R is never going to be great at handling huge
numbers of distinct strings. But it can do an adequate job given
enough memory to work with.

When I run your GitHub issue example on a machine with around 500 Gb
of RAM it seems to run OK; /usr/bin/time reports

2706.89user 161.89system 37:10.65elapsed 128%CPU (0avgtext+0avgdata 
92180796maxresident)k
0inputs+103450552outputs (0major+38716351minor)pagefaults 0swaps

So the memory footprint is quite large. Using gc.time() it looks like
about 1/3 of the time is in GC. Not ideal, and maybe could be improved
on a bit, but probably not by much. The GC is basically doing an
adequate job, given enough RAM.

If you run this example on a system without enough RAM, or with other
programs competing for RAM, you are likely to end up fighting with
your OS/hardware's virtual memory system. When I try to run it on a
16Gb system it churns for an hour or so before getting killed, and
/usr/bin/time reports a huge number of page faults:

312523816inputs+0outputs (24761285major+25762068minor)pagefaults 0swaps

You are probably experiencing something similar.

There may be opportunities for more tuning of the GC to better handle
running this close to memory limits, but I doubt the payoff would be
worth the effort.

Best,

luke


It would help if gcinfo() could take FALSE/TRUE/2L where 2L will print
even more information about gc, like how much time the each gc()
process took, how many objects it has to check on each level.

Best regards,
Jan



On Tue, Nov 24, 2020 at 1:05 PM Tomas Kalibera  wrote:


On 11/24/20 11:27 AM, Jan Gorecki wrote:

Thanks Bill for checking that.
It was my impression that warnings are raised from some internal
system calls made when quitting R. At that point I don't have much
control over checking the return status of those.
Your suggestion looks good to me.

Tomas, do you think this could help? could this be implemented?


I think this is a good suggestion. Deleting files on Unix was changed
from system("rm") to doing that in C, and deleting the session directory
should follow.

It might also help diagnosing your problem, but I don't think it would
solve it. If the diagnostics in R works fine and the OS was so
hopelessly out of memory that it couldn't run any more external
processes, then really this is not a problem of R, but of having
exhausted the resources. And it would be a coincidence that just this
particular call to "system" at the end of the session did not work.
Anything else could break as well close to the end of the script. This
seems the most likely explanation to me.

Do you get this warning repeatedly, reproducibly at least in slightly
different scripts at the very end, with this warning always from quit()?
So that the "call" part of the warning message has .Internal(quit) like
in the case you posted? Would adding another call to "system" before the
call to "q()" work - with checking the return value? If it is always
only the last call to "system" in "q()", then it is suspicious, perhaps
an indication that some diagnostics in R is not correct. In that case, a
reproducible example would be the key - so either if you could diagnose
on your end what is the problem, or create a reproducible example that
someone else can use to reproduce and debug.

Best
Tomas



On Mon, Nov 23, 2020 at 7:10 PM Bill Dunlap  wrote:

The call to system() probably is an internal call used to delete the session's 
tempdir().  This sort of failure means that a potentially large amount of disk 
space is not being recovered when R is done.  Perhaps R_CleanTempD

Re: [Rd] [External] exists, get and get0 accept silently inputs of length > 1

2020-11-23 Thread luke-tierney


Thanks for the suggestion.

In R-devel (as of r79474) exists(), get(), and get0() now signal an
error if the first argument has length > 1. This will cause about 30
CRAN packages and possibly a couple of Bioconductor packages to fail
under R-devel.

getS3method() now also signals an error if the class argument has
length > 1. Calls of the form getS2method(generic, class(x)) will now
fail if class(x) has length > 1. I believe most CRAN package issues
related to this change have already been resolved, but a few may
remain.

Best,

luke

On Fri, 13 Nov 2020, Antoine Fabri wrote:


Dear R-devel,

The doc of exists, get and get0 is unambiguous, x should be an object given
as a character string. However these accept longer inputs. It can lead an
uncareful user to think these functions are vectorized when they're not,
and generally lets through bugs that one might have preferred to trigger
earlier failure.

``` r
exists("d")
#> [1] FALSE
exists(c("c", "d"))
#> [1] TRUE
get(c("c", "d"))
#> function (...)  .Primitive("c")
get0(c("c", "d"))
#> function (...)  .Primitive("c")
```

I believe these should either fail, or be vectorized, probably the former.

Thanks,

Antoine

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Two ALTREP questions

2020-11-21 Thread luke-tierney


On Sat, 21 Nov 2020, Jiefei Wang wrote:


Hello,

I have two related ALTREP questions. It seems like there is no way to
assign attributes to an ALTREP vector without using C++ code. To be more
specifically, I want to make an ALTREP matrix, I have tried the following R
code but none of them work.
```
.Internal(inspect(1:6))
.Internal(inspect(matrix(1:6, 2,3)))
.Internal(inspect(as.matrix(1:6)))
.Internal(inspect(structure(1:6, dim = c(2L,3L
.Internal(inspect({x <- 1:6;attr(x, "dim") <- c(2L,3L);x}))
.Internal(inspect({x <- 1:6;attributes(x)<- list(dim = c(2L,3L));x}))
```


Some things that my help you:

- Try with 1:6 replaced by as.character(1:6), and look at the REF
  values in both cases.

- In particular, look at what this gives you:

x <- as.character(1:6)
attr(x, "dim") <- c(2, 3)

- Things can be a little different with larger vectors; try variants
  of your examples for more than 64 elements.


This also brings
my second question, it seems like the ALTREP coercion function does not
handle attributes correctly.  After the coercion, the ALTREP object will
lose its attributes.
```
coerceFunc <- inline::cxxfunction( signature(x = "SEXP", attr = "SEXP" ) , '
SET_ATTRIB(x,attr);
return(Rf_coerceVector(x, REALSXP));
')

coerceFunc(1:6, pairlist(dim = c(2L, 3L)))

[1] 1 2 3 4 5 6

coerceFunc(1:6 + 0L, pairlist(dim = c(2L, 3L)))

[,1] [,2] [,3]
[1,]135
[2,]246
```
The problem is that the coercion function is directly dispatched to the
user-defined ALTREP coercion function, so the user is responsible to attach
the attributes after the coercion. If he forgets to do so, then the result
is a plain vector. Similar to the `Duplicate` and `DuplicateEX` functions
where the former one will attach the attributes by default, I feel that the
`Coerce` function should only return a plain vector and there should be a
`CoerceEx` function to do the attribute assignment, so the logic in the
no-EX ALTREP functions can be consistent. I do not know how dramastic the
change would be, so maybe this is too hard to do.


Since you raised this earlier I have been looking at it and also think
that this needs to he handled along the lines of
Duplicate/DuplicateEx. I need to find some time to think that through
and implement it; hopefully I'll get to it before the end of the year.


BTW, is there any way to contribute to the R source? I know R has a limited
resouces, so if possible, I will be happy to fix the matrix issue myself
and make some minor contributions to the R community.


You can find the suggested process for contributing described in the
'Reporting Bugs' link on the R home page https://www.r-project.org/

Best,

luke


Best,
Jiefei

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] exists, get and get0 accept silently inputs of length > 1

2020-11-16 Thread luke-tierney

Come on, folks. There is no NSE involved in calls to get(): it's
standard evaluation all the way into the C code. Prior to the change a
first argument that is anything other than a character vector would
produce an error. After the change, passing in a symbol will do the
obvious thing. Code that worked previously without error (i.e. called
get() with string values) will continue to work exactly as it did
before.

It's a little more convenient and a little more efficient for some
computations on the language not to have to call as.character on
symbols before passing them to get(). Hence the change expanding the
domain of get().

luke

On Tue, 17 Nov 2020, Gabriel Becker wrote:

Hi all,
I have used variable values in get() as well, and including, I think, in
package code (though pretty infrequently).
Perhaps a character.only argument similar to library?

~G

On Mon, Nov 16, 2020 at 5:31 PM Hugh Parsonage 
wrote:
  I noticed the recent commit to R-dev (r79434).  Is this wise?
  I've
  often used get() in constructions like

  for (j in ls()) if (is.numeric(x <- get(j))) ...

  (and often interactively, rather than in a package)

  Am I to understand that get(j) will now be equivalent to `j`
  even if j
  is a string referring putatively to another object?

  On Sat, 14 Nov 2020 at 01:34,  wrote:
  >
  > Worth looking into. It would probably cause some check
  failures, so
  > would probably be a good idea to run a check across
  BIOC/CRAN.  At the
  > same time it would be worth allowing name objects (type
  "symbol") so
  > thee don't have to be converted to character for the call and
  then
  > back to names internally for the environment lookup.
  >
  > Best,
  >
  > luke
  >
  > On Fri, 13 Nov 2020, Antoine Fabri wrote:
  >
  > > Dear R-devel,
  > >
  > > The doc of exists, get and get0 is unambiguous, x should be
  an object given
  > > as a character string. However these accept longer inputs.
  It can lead an
  > > uncareful user to think these functions are vectorized when
  they're not,
  > > and generally lets through bugs that one might have
  preferred to trigger
  > > earlier failure.
  > >
  > > ``` r
  > > exists("d")
  > > #> [1] FALSE
  > > exists(c("c", "d"))
  > > #> [1] TRUE
  > > get(c("c", "d"))
  > > #> function (...)  .Primitive("c")
  > > get0(c("c", "d"))
  > > #> function (...)  .Primitive("c")
  > > ```
  > >
  > > I believe these should either fail, or be vectorized,
  probably the former.
  > >
  > > Thanks,
  > >
  > > Antoine
  > >
  > >       [[alternative HTML version deleted]]
  > >
  > > __
  > > R-devel@r-project.org mailing list
  > > https://stat.ethz.ch/mailman/listinfo/r-devel
  > >
  >
  > --
  > Luke Tierney
  > Ralph E. Wareham Professor of Mathematical Sciences
  > University of Iowa                  Phone:
   319-335-3386
  > Department of Statistics and        Fax:
   319-335-3017
  >     Actuarial Science
  > 241 Schaeffer Hall                  email:
   luke-tier...@uiowa.edu
  > Iowa City, IA 52242                 WWW:
  http://www.stat.uiowa.edu
  >
  > __
  > R-devel@r-project.org mailing list
  > https://stat.ethz.ch/mailman/listinfo/r-devel

  ______
  R-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-devel

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] exists, get and get0 accept silently inputs of length > 1

2020-11-13 Thread luke-tierney


Worth looking into. It would probably cause some check failures, so
would probably be a good idea to run a check across BIOC/CRAN.  At the
same time it would be worth allowing name objects (type "symbol") so
thee don't have to be converted to character for the call and then
back to names internally for the environment lookup.

Best,

luke

On Fri, 13 Nov 2020, Antoine Fabri wrote:


Dear R-devel,

The doc of exists, get and get0 is unambiguous, x should be an object given
as a character string. However these accept longer inputs. It can lead an
uncareful user to think these functions are vectorized when they're not,
and generally lets through bugs that one might have preferred to trigger
earlier failure.

``` r
exists("d")
#> [1] FALSE
exists(c("c", "d"))
#> [1] TRUE
get(c("c", "d"))
#> function (...)  .Primitive("c")
get0(c("c", "d"))
#> function (...)  .Primitive("c")
```

I believe these should either fail, or be vectorized, probably the former.

Thanks,

Antoine

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Change to I() in R 4.1

2020-10-30 Thread luke-tierney


On Fri, 30 Oct 2020, Pages, Herve wrote:



On 10/29/20 23:08, Pages, Herve wrote:
...


I can think of 2 ways to move forward:

1. Keep I()'s current implementation but suppress the warning. We'll
make the necessary adjustments to DataFrame() to repair columns supplied
as I() objects. Note that we would still be in the situation where
I() objects break validObject() but we've been in that situation for
years and so far we've managed to work around it. However this doesn't
mean that validObject() shouldn't be fixed. Note that print(I())
would also need to be fixed (it says "" which is
misleading). Anyways, these 2 issues are separated from the main issue
and can be dealt with later.


1b. A variant of the above could be to use the old implementation for S4
objects only:

  I <- function(x)
  {
  if (isS4(x)) {
  structure(x, class = unique.default(c("AsIs", oldClass(x
  } else {
  `class<-`(x, unique.default(c("AsIs", oldClass(x
  }
  }

That is probably a good compromise for now.


Not really. The underlying problem is that class<- and attributes<-
(which is what structure() uses) handle the 'class' attribute
differently, and that needs to be fixed. I don't have a strong opinion
on what either should do, but they should do the same thing.

It's probably worth re-thinking the I() mechanism. ?Modifying the
value, whether by changing the class or an attribute, is going to be
brittle. A little less so for an attribute, but using an attribute
rules out dispatch on the AsIs property.

Best,

luke



I would also suggest that the "package" attribute of the S4 class be
kept around so the code that we use to restore the original object has a
way to restore it exactly, including its full class specification. Right
now, and also with the previous implementation, we cannot do that
because attr(class(x), "package") is lost. So something like this:

  I <- function(x)
  {
  if (isS4(x)) {
  x_class <- class(x)
  new_classes <- c("AsIs", x_class)
  attr(new_classes, "package") <- attr(x_class, "package")
  structure(x, class=new_classes)
  } else {
  `class<-`(x, unique.default(c("AsIs", oldClass(x
  }
  }

Thanks,
H.





--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Something is wrong with the unserialize function

2020-10-29 Thread luke-tierney


I found that also; fixed in r79386 in the trunk. Will port to R-patched
shortly.

Best,

luke

On Thu, 29 Oct 2020, Martin Morgan wrote:


This

Index: src/main/altrep.c
===
--- src/main/altrep.c   (revision 79385)
+++ src/main/altrep.c   (working copy)
@@ -275,10 +275,11 @@
SEXP psym = ALTREP_SERIALIZED_CLASS_PKGSYM(info);
SEXP class = LookupClass(csym, psym);
if (class == NULL) {
-   SEXP pname = ScalarString(PRINTNAME(psym));
+   SEXP pname = PROTECT(ScalarString(PRINTNAME(psym)));
R_tryCatchError(find_namespace, pname,
handle_namespace_error, NULL);
class = LookupClass(csym, psym);
+   UNPROTECT(1);
}
return class;
}

seems to remove the warning; I'm guessing that the other SEXP already exist so 
don't need protecting?

Martin Morgan


On 10/29/20, 12:47 PM, "R-devel on behalf of luke-tier...@uiowa.edu" 
 wrote:

   Thanks for the report. Will look into it when I get a chance unless
   someone else gets there first.

   A simpler reprex:

   ## create and serialize a memmory-mapped file object
   filePath <- "x.dat"
   con <- file(filePath, "wrb")
   writeBin(rep(0.0,10),con)
   close(con)

   library(simplemmap)
   x <- mmap(filePath, "double")
   saveRDS(x, file = "x.Rds")

   ## in a separate R process:
   gctorture()
   readRDS("x.Rds")

   Looks like a missing PROTECT somewhere.

   Best,

   luke

   On Thu, 29 Oct 2020, Jiefei Wang wrote:

   > Hi all,
   >
   > I am not able to export an ALTREP object when `gctorture` is on in the
   > worker. The package simplemmap can be used to reproduce the problem. See
   > the example below
   > ```
   > ## Create a temporary file
   > filePath <- tempfile()
   > con <- file(filePath, "wrb")
   > writeBin(rep(0.0,10),con)
   > close(con)
   >
   > library(simplemmap)
   > library(parallel)
   > cl <- makeCluster(1)
   > x <- mmap(filePath, "double")
   > ## Turn gctorture on
   > clusterEvalQ(cl, gctorture())
   > clusterExport(cl, "x")
   > ## x is an 0-length vector on the worker
   > clusterEvalQ(cl, x)
   > stopCluster(cl)
   > ```
   >
   > you can find more info on the problem if you manually build a connection
   > between two R processes and export the ALTREP object. See output below
   > ```
   >> con <- socketConnection(port = 1234,server = FALSE)
   >> gctorture()
   >> x <- unserialize(con)
   > Warning message:
   > In unserialize(con) :
   >  cannot unserialize ALTVEC object of class 'mmap_real' from package
   > 'simplemmap'; returning length zero vector
   > ```
   > It seems like  simplemmap did not get loaded correctly on the worker. If
   > you run `library( simplemmap)` before unserializing the ALTREP, there will
   > be no problem. But I suppose we should be able to unserialize objects
   > without preloading the library?
   >
   > This issue can be reproduced on Ubuntu with R version 4.0.2 (2020-06-22)
   > and Windows with R Under development (unstable) (2020-09-03 r79126).
   >
   > Here is the link to simplemmap:
   > https://github.com/ALTREP-examples/Rpkg-simplemmap
   >
   > Best,
   > Jiefei
   >
   > [[alternative HTML version deleted]]
   >
   > __
   > R-devel@r-project.org mailing list
   > https://stat.ethz.ch/mailman/listinfo/r-devel
   >

   --
   Luke Tierney
   Ralph E. Wareham Professor of Mathematical Sciences
   University of Iowa  Phone: 319-335-3386
   Department of Statistics andFax:   319-335-3017
   Actuarial Science
   241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
   Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

   __
   R-devel@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Something is wrong with the unserialize function

2020-10-29 Thread luke-tierney


Thanks for the report. Will look into it when I get a chance unless
someone else gets there first.

A simpler reprex:

## create and serialize a memmory-mapped file object
filePath <- "x.dat"
con <- file(filePath, "wrb")
writeBin(rep(0.0,10),con)
close(con)

library(simplemmap)
x <- mmap(filePath, "double")
saveRDS(x, file = "x.Rds")

## in a separate R process:
gctorture()
readRDS("x.Rds")

Looks like a missing PROTECT somewhere.

Best,

luke

On Thu, 29 Oct 2020, Jiefei Wang wrote:


Hi all,

I am not able to export an ALTREP object when `gctorture` is on in the
worker. The package simplemmap can be used to reproduce the problem. See
the example below
```
## Create a temporary file
filePath <- tempfile()
con <- file(filePath, "wrb")
writeBin(rep(0.0,10),con)
close(con)

library(simplemmap)
library(parallel)
cl <- makeCluster(1)
x <- mmap(filePath, "double")
## Turn gctorture on
clusterEvalQ(cl, gctorture())
clusterExport(cl, "x")
## x is an 0-length vector on the worker
clusterEvalQ(cl, x)
stopCluster(cl)
```

you can find more info on the problem if you manually build a connection
between two R processes and export the ALTREP object. See output below
```

con <- socketConnection(port = 1234,server = FALSE)
gctorture()
x <- unserialize(con)

Warning message:
In unserialize(con) :
 cannot unserialize ALTVEC object of class 'mmap_real' from package
'simplemmap'; returning length zero vector
```
It seems like  simplemmap did not get loaded correctly on the worker. If
you run `library( simplemmap)` before unserializing the ALTREP, there will
be no problem. But I suppose we should be able to unserialize objects
without preloading the library?

This issue can be reproduced on Ubuntu with R version 4.0.2 (2020-06-22)
and Windows with R Under development (unstable) (2020-09-03 r79126).

Here is the link to simplemmap:
https://github.com/ALTREP-examples/Rpkg-simplemmap

Best,
Jiefei

[[alternative HTML version deleted]]

______
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Coercion function does not work for the ALTREP object

2020-10-08 Thread luke-tierney


For larger atomic vectors (currently >= 64 elements) the complex
assignment process tries to avoid duplicating when only attributes are
updated, This is done with an ALTREP wrapper. The differences in
whether the Duplicate method are called for smaller and larger vectors
are therefore as intended, Ideally there should be no difference for
Coerce. There is a difference because wrappers currently don't
delegate the Coerce method when the wrapped object is an ALTREP. I'll
look into whether that can be addressed without breaking things.

Best,

luke

On Thu, 8 Oct 2020, Jiefei Wang wrote:


Hi Gabriel, here is a simple package for reproducing the problem.

https://github.com/Jiefei-Wang/testPkg

Best,
Jiefei

On Thu, Oct 8, 2020 at 5:04 AM Gabriel Becker  wrote:


Jiefei,

Where does the code for your altrep class live?

Thanks,
~G

On Wed, Oct 7, 2020 at 4:25 AM Jiefei Wang  wrote:


Hi all,

The coercion function defined for the ALTREP object will not be called by
R
when an assignment operation implicitly introduces coercion for a large
ALTREP object.

For example, If I create a vector of length 10, the ALTREP coercion
function seems to work fine.
```

x <- 1:10
y <- wrap_altrep(x)
.Internal(inspect(y))

@0x1f9271c0 13 INTSXP g0c0 [REF(2)] I am altrep

y[1] <- 1.0

Duplicating object
Coercing object

.Internal(inspect(y))

@0x1f927c08 14 REALSXP g0c0 [REF(1)] I am altrep
```

However, if I create a vector of length 1024, R will give me a normal
real-type vector
```

x <- 1:1024
y <- wrap_altrep(x)
.Internal(inspect(y))

@0x1f8ddb20 13 INTSXP g0c0 [REF(2)] I am altrep

y[1] <- 1.0
.Internal(inspect(y))

@0x1f0d72a0 14 REALSXP g0c7 [REF(1)] (len=1024, tl=0)
1,2,3,4,5,...
```

Note that the duplicate function is also called for the first example. It
seems like R completely ignores my ALTREP functions in the second example.
I feel this might be designed on purpose, but I do not understand the
reason behind it. Is there any reason why we are not consistent here? Here
is my session info

sessionInfo()
R Under development (unstable) (2020-09-03 r79126)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Best,
Jiefei

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Thread-safe R functions

2020-09-13 Thread luke-tierney


You should assume that NO functions or macros in the R API are
thread-safe.  If some happen to be now, on some platforms, they are
not guaranteed to be in the future. Even if you use a global lock you
need to keep in mind that any function in the R API can signal an
error and execute a longjmp, so you need to make sure you have set a
top level context in your thread.

Best,

luke

On Sun, 13 Sep 2020, Jiefei Wang wrote:


Hi,

I am curious about whether there exist thread-safe functions in
`Rinternals.h`.  I know that R is single-threaded designed, but for the
simple and straightforward functions like `DATAPTR` and `INTEGER_GET_REGION`,
are these functions safe to call in a multi-thread environment?

Best,
Jiefei

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread luke-tierney


On Tue, 8 Sep 2020, Martin Maechler wrote:


luke-tierney
on Tue, 8 Sep 2020 09:42:43 -0500 (CDT) writes:


   > On Tue, 8 Sep 2020, Martin Maechler wrote:
   >>>>>>> Martin Maechler
   >>>>>>> on Tue, 8 Sep 2020 10:40:24 +0200 writes:
   >>
   >>>>>>> Hugh Parsonage
   >>>>>>> on Tue, 8 Sep 2020 18:08:11 +1000 writes:
   >>
   >> >> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):
   >>
   >> >> $> R --vanilla
   >> >> x <- c(0L, -2e9:2e9)
   >>
   >> >> # > Segmentation fault
   >>
   >> >> Tried to reproduce on Linux but the above worked as expected. Not an
   >> >> issue merely with the length of the vector; for example, x <-
   >> >> rep_len(1:10, 1e10) works, though the altrep vector must be long to
   >> >> reproduce:
   >>
   >> >> x <- c(0L, -1e9:1e9)  #ok
   >>
   >> >> Segmentation faults occur with the following too:
   >>
   >> >> x <- (-2e9:2e9) + 1L
   >>
   >> > Your operation would "need" (not in theory, but in practice)
   >> > to go from altrep to regular vectors.
   >> > I guess the segfault occurs because of something like this :
   >>
   >> > R asks Windows to hand it a huge amount of memory and Windows replies
   >> > "ok, here is the memory pointer"
   >> > and then R tries to write to there, but illegally (because
   >> > Windows should have told R that it does not really have enough
   >> > memory for that ..).
   >>
   >> > I cannot reproduce the segmentation fault .. but I can confirm
   >> > there is a bug there that shows for me on Windows but not on
   >> > Linux:
   >>
   >> > "My" Windows is on a terminalserver not with too many GB of memory
   >> > (but then in a version of Windows that recognizes that it cannot
   >> > get so much memory):
   >>
   >> > - Here some transcript (thanks to
   >> > using Emacs w/ ESS also on Windows) --
   >>
   >> > R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered 
Consequences"
   >> > Copyright (C) 2020 The R Foundation for Statistical Computing
   >> > Platform: x86_64-w64-mingw32/x64 (64-bit)
   >>
   >> > R ist freie Software und kommt OHNE JEGLICHE GARANTIE.
   >> > Sie sind eingeladen, es unter bestimmten Bedingungen weiter zu 
verbreiten.
   >> > Tippen Sie 'license()' or 'licence()' für Details dazu.
   >>
   >> > R ist ein Gemeinschaftsprojekt mit vielen Beitragenden.
   >> > Tippen Sie 'contributors()' für mehr Information und 'citation()',
   >> > um zu erfahren, wie R oder R packages in Publikationen zitiert werden 
können.
   >>
   >> > Tippen Sie 'demo()' für einige Demos, 'help()' für on-line Hilfe, oder
   >> > 'help.start()' für eine HTML Browserschnittstelle zur Hilfe.
   >> > Tippen Sie 'q()', um R zu verlassen.
   >>
   >> >> x <- (-2e9:2e9) + 1L
   >> > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
   >> >> y <- c(0L, -2e9:2e9)
   >> > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
   >> >> Sys.setenv(LANGUAGE="en")
   >> >> y <- c(0L, -2e9:2e9)
   >> > Error: cannot allocate vector of size 14.9 Gb
   >> >> y <- -1e9:4e9
   >> >> .Internal(inspect(y))
   >> > @0x195a6808 14 REALSXP g0c0 [REF(65535)]  -10 : 
-294967296 (compact)
   >> >> .Machine$integer.max / 1e9
   >> > [1] 2.147484
   >> >> y <- -1e6:2.2e9
   >> >> .Internal(inspect(y))
   >> > @0x0a11a5d8 14 REALSXP g0c0 [REF(65535)]  -100 : 
-2094967296 (compact)
   >> >> y <- -1e6:2e9
   >> >> .Internal(inspect(y))
   >> > @0x0a13adf0 13 INTSXP g0c0 [REF(65535)]  -100 : 20 
(compact)
   >> >>
   >> > - end of transcript 
---
   >>
   >> > So indeed, no seg.fault, R notices that it can't get 15 GB of
   >> > memory.
   >>
   >> > But the bug is bad news:  We have *silent* integer overflow happening
   >> > according to what  .Internal(inspect(y)) shows...
   >>
   >> >  less bad new: Probably the bug is only in the 'internal inspect' 
code

Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread luke-tierney

On Windows, things are fine as long as they remain (compacted
aka 'ALTREP') INTSXP:

 > y <- -1e3:2e9 ;.Internal(inspect(y))
 @0x0a285648 13 INTSXP g0c0 [REF(65535)]  -1000 : 20 (compact)
 > y <- -1e3:2.1e9 ;.Internal(inspect(y))
 @0x19925930 13 INTSXP g0c0 [REF(65535)]  -1000 : 21 (compact)

and here, y is correct, just the printing from
.Internal(inspect(y)) is bugous (probably prints the double as an integer):


It's a '%ld' that probably needs to be '%lld' for Windows. Will fix
sometime soon.

Best,

luke



 > y <- -1e3:2.2e9 ; .Internal(inspect(y))
 @0x195c0178 14 REALSXP g0c0 [REF(65535)]  -1000 : -2094967296 (compact)
 > length(y)
 [1] 221001
 > tail(y)
 [1] 2.2e+09 2.2e+09 2.2e+09 2.2e+09 2.2e+09 2.2e+09
 > tail(y) - 2.2e9
 [1] -5 -4 -3 -2 -1  0
 >

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread luke-tierney


On Tue, 8 Sep 2020, Hugh Parsonage wrote:


Thanks Martin.  On further testing, it seems that the segmentation
fault can only occur when the amount of obtainable memory is
sufficiently high. On my machine (admittedly with other processes
running):

$ R --vanilla --max-mem-size=30G -e "x <- c(0L, -2e9:2e9)"
Segmentation fault

$ R --vanilla --max-mem-size=29G -e "x <- c(0L, -2e9:2e9)"
Error: cannot allocate vector of size 14.9 Gb
Execution halted


Unfortunately I don't have access to a Windows machine with enough
memory to get to the point of failure. If you have rtools and gdb
installed can you run in gdb and see where the segfault is happening?

Best,

luke



On Tue, 8 Sep 2020 at 18:52, Martin Maechler  wrote:



Martin Maechler
on Tue, 8 Sep 2020 10:40:24 +0200 writes:



Hugh Parsonage
on Tue, 8 Sep 2020 18:08:11 +1000 writes:


   >> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):

   >> $> R --vanilla
   >> x <- c(0L, -2e9:2e9)

   >> # > Segmentation fault

   >> Tried to reproduce on Linux but the above worked as expected. Not an
   >> issue merely with the length of the vector; for example, x <-
   >> rep_len(1:10, 1e10) works, though the altrep vector must be long to
   >> reproduce:

   >> x <- c(0L, -1e9:1e9)  #ok

   >> Segmentation faults occur with the following too:

   >> x <- (-2e9:2e9) + 1L

   > Your operation would "need" (not in theory, but in practice)
   > to go from altrep to regular vectors.
   > I guess the segfault occurs because of something like this :

   > R asks Windows to hand it a huge amount of memory and Windows replies
   > "ok, here is the memory pointer"
   > and then R tries to write to there, but illegally (because
   > Windows should have told R that it does not really have enough
   > memory for that ..).

   > I cannot reproduce the segmentation fault .. but I can confirm
   > there is a bug there that shows for me on Windows but not on
   > Linux:

   > "My" Windows is on a terminalserver not with too many GB of memory
   > (but then in a version of Windows that recognizes that it cannot
   > get so much memory):

   > - Here some transcript (thanks to
   > using Emacs w/ ESS also on Windows) --

   > R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered 
Consequences"
   > Copyright (C) 2020 The R Foundation for Statistical Computing
   > Platform: x86_64-w64-mingw32/x64 (64-bit)

   > R ist freie Software und kommt OHNE JEGLICHE GARANTIE.
   > Sie sind eingeladen, es unter bestimmten Bedingungen weiter zu verbreiten.
   > Tippen Sie 'license()' or 'licence()' für Details dazu.

   > R ist ein Gemeinschaftsprojekt mit vielen Beitragenden.
   > Tippen Sie 'contributors()' für mehr Information und 'citation()',
   > um zu erfahren, wie R oder R packages in Publikationen zitiert werden 
können.

   > Tippen Sie 'demo()' für einige Demos, 'help()' für on-line Hilfe, oder
   > 'help.start()' für eine HTML Browserschnittstelle zur Hilfe.
   > Tippen Sie 'q()', um R zu verlassen.

   >> x <- (-2e9:2e9) + 1L
   > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
   >> y <- c(0L, -2e9:2e9)
   > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
   >> Sys.setenv(LANGUAGE="en")
   >> y <- c(0L, -2e9:2e9)
   > Error: cannot allocate vector of size 14.9 Gb
   >> y <- -1e9:4e9
   >> .Internal(inspect(y))
   > @0x195a6808 14 REALSXP g0c0 [REF(65535)]  -10 : -294967296 
(compact)
   >> .Machine$integer.max / 1e9
   > [1] 2.147484
   >> y <- -1e6:2.2e9
   >> .Internal(inspect(y))
   > @0x0a11a5d8 14 REALSXP g0c0 [REF(65535)]  -100 : -2094967296 
(compact)
   >> y <- -1e6:2e9
   >> .Internal(inspect(y))
   > @0x0a13adf0 13 INTSXP g0c0 [REF(65535)]  -100 : 20 
(compact)
   >>
   > - end of transcript 
---

   > So indeed, no seg.fault, R notices that it can't get 15 GB of
   > memory.

   > But the bug is bad news:  We have *silent* integer overflow happening
   > according to what  .Internal(inspect(y)) shows...

   >  less bad new: Probably the bug is only in the 'internal inspect' code
   > where a format specifier is used in C's printf() that does not work
   > correctly on Windows, at least the way it is currently compiled ..


   > On (64-bit) Linux, I get

   >> y <- -1e9:4e9 ; .Internal(inspect(y))
   > @7d86388 14 REALSXP g0c0 [REF(65535)]  -10 : 40 (compact)

Re: [Rd] [External] Re: some questions about R internal SEXP types

2020-09-08 Thread luke-tierney


On Tue, 8 Sep 2020, Hadley Wickham wrote:


On Tue, Sep 8, 2020 at 4:12 AM Tomas Kalibera  wrote:



The general principle is that R packages are only allowed to use what is
documented in the R help (? command) and in Writing R Extensions. The
former covers what is allowed from R code in extensions, the latter
mostly what is allowed from C code in extensions (with some references
to Fortran).


Could you clarify what you mean by "documented"? For example,
Rf_allocVector() is mentioned several times in R-exts, but I don't see
anywhere where the inputs and output are precisely described (which is
what I would consider to be documented). Is Rf_allocVector() part of
the API?


For now, documented means mentioned as something extension writers can
use.  Details are in the header files, Rinternals.h for
Rf_allocVector().

Ideally someone would find the time to refactor the header files,
Rinternals.h in particular, so everything in installed headers is
considered in the API and everything else is considered private and
subject to change. Unfortunately that would take a lot of effort, both
technical and political, and I don't see it happening soon. But I'm
happy to be proved wrong.

Best,

luke



Hadley




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] HELPWANTED keyword in bugs.r-project.org

2020-08-05 Thread luke-tierney


Just a quick note to mention that we have added a HELPWANTED keyword
on bugs.r-project.org for tagging bugs and issues where a good
well-tested patch would be particularly appreciated.  You can find the
HELPWANTED issues by selecting the keyword in the search interface or at

https://bugs.r-project.org/bugzilla/buglist.cgi?keywords=HELPWANTED

This URL shows both open and resolved HELPWANTED issues.

At the moment only a handful of issues have been tagged, but there
will be more over time. One of these may be a good place to start if
you are looking for ways to contribute. The techincal level varies;
some might be resolved with a small amount of R code; others might
need more extensive changes at the C level.

Best,

luke


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Invisible names problem

2020-07-22 Thread luke-tierney


On Wed, 22 Jul 2020, Simon Urbanek wrote:


Very interesting:


.Internal(inspect(k[i]))

@10a4bc000 14 REALSXP g0c7 [ATT] (len=2, tl=0) 1,2,3,4,1,...
ATTRIB:
 @7fa24f07fa58 02 LISTSXP g0c0 [REF(1)]
   TAG: @7fa24b803e90 01 SYMSXP g0c0 [MARK,REF(5814),LCK,gp=0x6000] "names" 
(has value)
   @10a4e4000 16 STRSXP g0c7 [REF(1)] (len=2, tl=0)
 @7fa24ba575c8 09 CHARSXP g0c1 [MARK,REF(35005),gp=0x61] [ASCII] [cached] 
"a"
 @7fa24be24428 09 CHARSXP g0c1 [MARK,REF(35010),gp=0x61] [ASCII] [cached] 
"b"
 @7fa24b806ec0 09 CHARSXP g0c1 [MARK,REF(35082),gp=0x61] [ASCII] [cached] 
"c"
 @7fa24bcc6af0 09 CHARSXP g0c1 [MARK,REF(35003),gp=0x61] [ASCII] [cached] 
"d"
 @7fa24ba575c8 09 CHARSXP g0c1 [MARK,REF(35005),gp=0x61] [ASCII] [cached] 
"a"
 ...


.Internal(inspect(unname(k[i])))

@10a50c000 14 REALSXP g0c7 [] (len=2, tl=0) 1,2,3,4,1,...


.Internal(inspect(x2))

@7fa24fc692d8 14 REALSXP g0c0 [REF(1)]  wrapper [srt=-2147483648,no_na=0]
 @10a228000 14 REALSXP g0c7 [REF(1),ATT] (len=2, tl=0) 1,2,3,4,1,...
 ATTRIB:
   @7fa24fc69850 02 LISTSXP g0c0 [REF(1)]
 TAG: @7fa24b803e90 01 SYMSXP g0c0 [MARK,REF(5797),LCK,gp=0x4000] "names" 
(has value)
 @10a25 16 STRSXP g0c7 [REF(65535)] (len=2, tl=0)
@7fa24ba575c8 09 CHARSXP g0c1 [MARK,REF(10005),gp=0x61] [ASCII] [cached] 
"a"
@7fa24be24428 09 CHARSXP g0c1 [MARK,REF(10010),gp=0x61] [ASCII] [cached] 
"b"
@7fa24b806ec0 09 CHARSXP g0c1 [MARK,REF(10077),gp=0x61] [ASCII] [cached] 
"c"
@7fa24bcc6af0 09 CHARSXP g0c1 [MARK,REF(10003),gp=0x61] [ASCII] [cached] 
"d"
@7fa24ba575c8 09 CHARSXP g0c1 [MARK,REF(10005),gp=0x61] [ASCII] [cached] 
"a"
...

If you don't assign the intermediate result things are simple as R knows there are no 
references so the names can be simply removed. However, if you assign the result that 
is not possible as there is still the reference in x2 at the time when unname() 
creates its own local temporary variable obj to do what probably most of us would use 
which is names(obj) <- NULL (i.e. names(x2) <- NULL avoids that problem.since 
you don't need both x2 and obj).

To be precise, when you use unname() on an assigned object, R has to technically keep two copies - one 
for the existing x2 and a second in unname() for obj so it can call names(obj)<-NULL for the 
modification. To avoid that R instead creates a wrapper for the original x2 which says "like x2 
but names are NULL". The rationale is that for large vector it is better to keep records of 
metadata changes rather than duplicating the object. This way the vector is stored only once. However, 
as you blow way the original x2, all that is left is k[I] with the extra information "don't use 
the names". Unfortunately, R cannot know that you will eventually only keep the version without 
the names - at which point it could strip the names since they are not referenced anymore.

I'm not sure what is the best solution here. In theory, if the wrapper found 
out that the object it is wrapping has no more references it could remove the 
names, but I'm sure that would only solve some cases (what if you duplicated 
the wrapper and thus there were multiple wrappers referencing it?) and not sure 
if it has a way to find out. The other way to deal with that would be at 
serialization time if it could be detected such that it can remove the wrapper. 
Since the intersection of serialization experts and ALTREP experts is exactly 
one, I'll leave it to that set to comment further ;).


Currently the wrapper serialization mechanism just serializes the
wrapped object and unserialize re-wraps it at the other end.

If there is only one reference to the wrapped value then we know the
attributes can't be accessed from the R level anymore, so it would be
safe to remove the attributes before passing it off for serializing.
Unless I'm missing something that would be an easy change. But it
would be good to know if it would really make a difference in
realistic situations.

[Dropping attributes could be done at other times as well if there is
only one reference, e.g. on accessing the data, but that is not likely
to be worth while within a single R session.]

If there is more than one reference to the wrapped object, then things
is more complicated. We could duplicate the payload and send that off
for serialization (and install it in the wrapper), but that could be a
bad idea of the object is large.

A tighter integration of ALTREP serialization with the serialization
internals might allow and ALTREP's serialization method to write
directly to the serialization stream, but that would make things much
harder to maintain.

Best,

luke



Cheers,
Simon




On Jul 23, 2020, at 07:29, Pan Domu  wrote:

I ran into strange behavior when removing names

Re: [Rd] [External] Re: R-devel internal errors during check produce?

2020-06-30 Thread luke-tierney

Thanks. Fixed in R-devel in r78754. This was related to a fix for
PR#17809, not the change to unique.default.

Best,

luke

On Tue, 30 Jun 2020, Jan Gorecki wrote:

No packages are being loaded, or even installed.
Did you try running the example on R-devel built with flags I have
provided in this email?
I checked now and it is required to use --enable-strict-barrier to
reproduce the issue.

On Tue, Jun 30, 2020 at 9:02 AM Martin Maechler
 wrote:

Kurt Hornik
on Tue, 30 Jun 2020 06:20:57 +0200 writes:

Jan Gorecki writes:

   >> Thank you both, You are absolutely correct that example
   >> should be minimal, so here it is.

   >> l = list(a=new.env(), b=new.env()) unique(l)

   >> Just for completeness, env_list during check that raises
   >> error

   >> env_list <- list(baseenv(),
   >>   as.environment("package:graphics"),
   >>   as.environment("package:stats"),
   >>   as.environment("package:utils"),
   >>   as.environment("package:methods") )

   >> unique(env_list)

   > Thanks ... but the above work fine for me.  E.g.,

R> l = list(a=new.env(), b=new.env())
R> unique(l)
   > [[1]] 

   > [[2]] 

   > Best -k

Ditto here;  also your (Jan) 2nd example works fine.

So, you must have loaded some (untidy) packages / code which redefine
standard base R behavior ?

Martin

______
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Possible ABI change in R 4.0.1

2020-06-29 Thread luke-tierney


EXTPTR_PTR is not in the API so it is not guaranteed to even exist in
the future. The API function for accessing the pointer address is
R_ExternalPtrAddr. See Section 5.13 in WRE.

Sometimes internals need to be changed, In this case a change was made
to deal with a segfault; the commit notice tells you the PR this
addressed.

As it says in Writing R Extensions about defining USE_RINTERNALS:

Also be prepared to adjust your code should R internals change.

The same goes for any use of non-API macros and functions.

Best,

luke


On Mon, 29 Jun 2020, Gábor Csárdi wrote:


Hi all,

it seems that from R 4.0.1 EXTPTR_PTR can be either a macro or a
function, depending on whether USE_RINTERNALS is requested.

Jeroen helped me find that this was in 78592:
https://github.com/wch/r-source/commit/c634fec5214e73747b44d7c0e6f047fefe44667d

This is a problem, because binary packages that are built on R 4.0.1
or R 4.0.2 will potentially not load on R 4.0.0, if they use the
EXTPTR_PTR function.

E.g. this is R 4.0.0 on Linux:


library(Rcpp)

Error: package or namespace load failed for ‘Rcpp’ in dyn.load(file,
DLLpath = DLLpath, ...):
unable to load shared object '/usr/local/lib/R/library/Rcpp/libs/Rcpp.so':
 Error relocating /usr/local/lib/R/library/Rcpp/libs/Rcpp.so:
EXTPTR_PTR: symbol not found
In addition: Warning message:
package ‘Rcpp’ was built under R version 4.0.1

It is easiest to reproduce this on Windows, because the CRAN binaries
are now built on R 4.0.2, so if you install Rcpp on R 4.0.0 from CRAN,
and try to load it you'll get:


library(Rcpp)

Error: package or namespace load failed for 'Rcpp' in inDL(x,
as.logical(local), as.logical(now), ...):
unable to load shared object
'C:/Users/csard/R/win-library/4.0/Rcpp/libs/x64/Rcpp.dll':
 LoadLibrary failure:  The specified procedure could not be found.
In addition: Warning message:
package 'Rcpp' was built under R version 4.0.2

I suppose this change was not intended?

Best,
Gabor

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Change in lapply's missing argument passing

2020-06-27 Thread luke-tierney


Yes, to resolve

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=15199

Best,

luke

On Fri, 26 Jun 2020, William Dunlap via R-devel wrote:


Consider the following expression, in which we pass 'i=', with no value
given for the 'i' argument, to lapply.
   lapply("x", function(i, j) c(i=missing(i),j=missing(j), i=)
From R-2.14.0 (2011-10-31) through R-3.4.4 (2018-03-15) this evaluated to
c(i=TRUE, j=FALSE).  From R-3.5.0 (2018-04-23) through R-4.0.0 (2020-04-24)
this evaluated to c(i=FALSE, j=TRUE).

Was this change intentional?

Bill Dunlap
TIBCO Software
wdunlap tibco.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Unexpected Error Handling by Generic in R 4.0.1

2020-06-25 Thread luke-tierney

Thanks for the report.

This is due to a change restoring behavior that was disabled
temporarily to work around a bug
(https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16111).
So it is again working as originally designed.

There are a number of places in the S4 dispatch code where errors are
caught and re-signaled with some additional information about the
dispatch context that might be helpful. Unfortunately all that is
retained from the original error is the message. It would be better to
signal a structured error object that includes the original error in a
slot.

Some options:

Create and signal a structured error wrapping the original error in
these cases.

Revert the argument evaluation to not wrap errors.

Drop wrapping of error from all other cases.

I don't have strong views on which way to go. But wrapping and
re-signaling from C would take a decent amount of effort so isn't likely
to happen without someone contributing a well-tested patch.

Best,

luke

On Thu, 25 Jun 2020, Matthew Carlucci wrote:

Hello R-devel community,

I posted a new R 4.0.1 behaviour to stack overflow
(https://stackoverflow.com/questions/62327810/inconsistent-error-handling-of-function-and-s4-generics-on-r-4-0-1),
where I think it is an undesired or unexpected change in 4.0.1. Attributes of
errors seem to be lost or obscured when encountered in an S4 generic context.

An example of this being undesirable comes in shiny applications where
my_reactive (an unevaluated reactive object) returns a shiny.silent.error
attribute which is lost upon error within an S4 generic function. The lack of
this attribute causes the entire application to exit with an error (with no
stack trace available). For example, within a shiny context:

foo <- try(nrow(my_reactive()))

attr(foo,"condition")

Where the S4 generic returns:

bar <- try(BiocGenerics::nrow(my_reactive()))

attr(bar,"condition")

From what I can tell from the release notes of 4.0.1, this does not appear to
be an expected breaking change so I am hesitant to update old code and shiny
applications to account for this behaviour. Any guidance would be appreciated.

Thank you,

Matthew Carlucci

CONFIDENTIALITY NOTICE: This e-mail message, including a...{{dropped:18}}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics andFax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] numericDeriv alters result of eval in R 4.0.1

2020-06-16 Thread luke-tierney


The eval() call could also throw an error that would leave the input
environment modified. Better change along the lines described in the
bug report at

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17831

Best,

luke

On Tue, 16 Jun 2020, Raimundo Neto wrote:


Dear all

As far as I could trace, looking at the function C function numeric_deriv,
this unwanted behavior comes from the inner most loop in, at the very end of
the function,
for(i = 0, start = 0; i < LENGTH(theta); i++) {
  for(j = 0; j < LENGTH(VECTOR_ELT(pars, i)); j++, start += LENGTH(ans)) {
    SEXP ans_del;
    double origPar, xx, delta;

    origPar = REAL(VECTOR_ELT(pars, i))[j];
    xx = fabs(origPar);
    delta = (xx == 0) ? eps : xx*eps;
    REAL(VECTOR_ELT(pars, i))[j] += rDir[i] * delta;
    PROTECT(ans_del = eval(expr, rho));
    if(!isReal(ans_del)) ans_del = coerceVector(ans_del, REALSXP);
    UNPROTECT(1);
    for(k = 0; k < LENGTH(ans); k++) {
      if (!R_FINITE(REAL(ans_del)[k]))
        error(_("Missing value or an infinity produced when evaluating the
model"));
      REAL(gradient)[start + k] = rDir[i] * (REAL(ans_del)[k] -
REAL(ans)[k])/delta;
    }
    REAL(VECTOR_ELT(pars, i))[j] = origPar;
  }
}
Maybe a (naive?) fix is change the if statement in the inner most loop to

if (!R_FINITE(REAL(ans_del)[k])) {
  REAL(VECTOR_ELT(pars, i))[j] = origPar;
  error(_("Missing value or an infinity produced when evaluating the
model"));
}


Regards,
Raimundo Neto


Em ter., 16 de jun. de 2020 às 11:31,  escreveu:
  Thanks; definitely a bug. I've submitted it to the bug tracker
  at

  https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17831

  Best,

  luke

  On Mon, 15 Jun 2020, Raimundo Neto wrote:

  > Dear R developers,
  >
  > I've run into a weird behavior of the numericDeriv function
  (from the stats
  > package) which I also posted on StackOverflow (question has
  same title as
  > this email, except for the version of R).
  >
  > Running the code bellow we can see that the numericDeriv
  function gives an
  > error as the derivative of x^a wrt a is x^a * log(x) and log
  is not defined
  > for negative numbers. However, seems like the function changes
  the value of
  > env1$a from 3 to 3.00044703483581543. If x is a vector of
  positive
  > values numericDeriv function completes the task without
  errors  and env1$a
  > remains unchanged as expected.
  >
  > This happened to me running R 4.0.1 on Ubuntu 20.04 and also
  to another
  > StackOverflow user using running the same version of R on
  Windows 10. I
  > wonder, is this an intended behavior of the function or really
  a bug?
  >
  > options(digits=22)
  > env1 = new.env()
  > env1$x = rnorm(10)
  > env1$a = 3
  > eval(quote(x^a), env1)
  > numericDeriv(quote(x^a), "a", env1)
  > eval(quote(x^a), env1)
  > env1$a
  >
  > Thank you!
  > Raimundo Neto
  >
  >       [[alternative HTML version deleted]]
  >
  > ______
  > R-devel@r-project.org mailing list
  > https://stat.ethz.ch/mailman/listinfo/r-devel
  >

  --
  Luke Tierney
  Ralph E. Wareham Professor of Mathematical Sciences
  University of Iowa                  Phone:           
   319-335-3386
  Department of Statistics and        Fax:             
   319-335-3017
      Actuarial Science
  241 Schaeffer Hall                  email: 
   luke-tier...@uiowa.edu
  Iowa City, IA 52242                 WWW: 
  http://www.stat.uiowa.edu





--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone:     319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] numericDeriv alters result of eval in R 4.0.1

2020-06-16 Thread luke-tierney


Thanks; definitely a bug. I've submitted it to the bug tracker at

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17831

Best,

luke

On Mon, 15 Jun 2020, Raimundo Neto wrote:


Dear R developers,

I've run into a weird behavior of the numericDeriv function (from the stats
package) which I also posted on StackOverflow (question has same title as
this email, except for the version of R).

Running the code bellow we can see that the numericDeriv function gives an
error as the derivative of x^a wrt a is x^a * log(x) and log is not defined
for negative numbers. However, seems like the function changes the value of
env1$a from 3 to 3.00044703483581543. If x is a vector of positive
values numericDeriv function completes the task without errors  and env1$a
remains unchanged as expected.

This happened to me running R 4.0.1 on Ubuntu 20.04 and also to another
StackOverflow user using running the same version of R on Windows 10. I
wonder, is this an intended behavior of the function or really a bug?

options(digits=22)
env1 = new.env()
env1$x = rnorm(10)
env1$a = 3
eval(quote(x^a), env1)
numericDeriv(quote(x^a), "a", env1)
eval(quote(x^a), env1)
env1$a

Thank you!
Raimundo Neto

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall      email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: use of the tcltk package crashes R 4.0.1 for Windows

2020-06-07 Thread luke-tierney


I've committed the change to use Free instead of free in tcltk.c and
sys-std.c (r78652 for R-devel, r78653 for R-patched).

We might consider either moving Calloc/Free out of the Windows
remapping or moving the remapping into header files so everything
seeing our header files uses our calloc/free. Either would be less
brittle that the current status.

Best,

luke

On Sun, 7 Jun 2020, peter dalgaard wrote:





On 7 Jun 2020, at 18:59 , Jeroen Ooms  wrote:

On Sun, Jun 7, 2020 at 5:53 PM  wrote:


On Sun, 7 Jun 2020, peter dalgaard wrote:


So this wasn't tested for a month?

Anyways, Free() is just free() with a check that we're not freeing a null 
pointer, followed by setting the pointer to NULL. At that point of tcltk.c, we 
have

 for (objc = i = 0; i < length(avec); i++){
  const char *s;
  char *tmp;
  if (!isNull(nm) && strlen(s = translateChar(STRING_ELT(nm, i{
  //  tmp = calloc(strlen(s)+2, sizeof(char));
  tmp = Calloc(strlen(s)+2, char);
  *tmp = '-';
  strcpy(tmp+1, s);
  objv[objc++] = Tcl_NewStringObj(tmp, -1);
  free(tmp);
  }
  if (!isNull(t = VECTOR_ELT(avec, i)))
  objv[objc++] = (Tcl_Obj *) R_ExternalPtrAddr(t);
  }

and I can't see how tmp can be NULL at the free(), nor can I see it mattering 
if it is not set to NULL (notice that it goes out of scope with the for loop).


Right. And the calloc->Calloc change doesn't look like an issue either
-- just checking for a NULL.

If the crash is happening in free() then that most likely means
corrupted malloc data structures. Unfortunately that could be
happening anywhere.


Writing R extensions, section 6.1.2 says: "Do not assume that memory
allocated by Calloc/Realloc comes from the same pool as used by
malloc: in particular do not use free or strdup with it."

I think the reason is that R uses dlmalloc for Calloc on Windows:
https://github.com/wch/r-source/blob/c634fec5214e73747b44d7c0e6f047fefe44667d/src/main/memory.c#L94-L103


But that section #defines calloc and free to Rm_... counterparts in lockstep? 
(I assume that is where dlmalloc comes in?)

Anyways, does it actually work to change free() to Free()? If so, then all this 
post mortem analysis is rather a moot point.

-pd




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: use of the tcltk package crashes R 4.0.1 for Windows

2020-06-07 Thread luke-tierney


There is one other possibility:

It may be that the calloc/free pair picked up by the tcltk package DLL
is different from the pair picked up when building base R. (We provide
our own malloc framework, but if the macros aren't quite right it may
be that the system malloc is picked up in some cases). In that case
using Calloc and free would be mismatching the malloc systems and
probably segfault.

If that is indeed happening we should fix it, but using Free with
Calloc should cure the immediate symptom.

Best,

luke

On Sun, 7 Jun 2020, luke-tier...@uiowa.edu wrote:


On Sun, 7 Jun 2020, peter dalgaard wrote:


So this wasn't tested for a month?

Anyways, Free() is just free() with a check that we're not freeing a null 
pointer, followed by setting the pointer to NULL. At that point of tcltk.c, 
we have


  for (objc = i = 0; i < length(avec); i++){
   const char *s;
   char *tmp;
   if (!isNull(nm) && strlen(s = translateChar(STRING_ELT(nm, i{
   //  tmp = calloc(strlen(s)+2, sizeof(char));
   tmp = Calloc(strlen(s)+2, char);
   *tmp = '-';
   strcpy(tmp+1, s);
   objv[objc++] = Tcl_NewStringObj(tmp, -1);
   free(tmp);
   }
   if (!isNull(t = VECTOR_ELT(avec, i)))
   objv[objc++] = (Tcl_Obj *) R_ExternalPtrAddr(t);
   }

and I can't see how tmp can be NULL at the free(), nor can I see it 
mattering if it is not set to NULL (notice that it goes out of scope with 
the for loop).


Right. And the calloc->Calloc change doesn't look like an issue either
-- just checking for a NULL.

If the crash is happening in free() then that most likely means
corrupted malloc data structures. Unfortunately that could be
happening anywhere.

Best bet to narrow this down is for someone with a good Windows setup
who can reproduce this to bisect the svn commits and see at what
commit this started happening. Unfortunately my office Windows machine
isn't responding and it will probably take some time to get that
fixed.

Best,

luke



-pd



On 7 Jun 2020, at 16:00 , Jeroen Ooms  wrote:

On Sun, Jun 7, 2020 at 3:13 AM Fox, John  wrote:


Hi,

The following code, from the examples in ?TkWidgets , immediately crashes 
R 4.0.1 for Windows:


- snip 
library("tcltk")
tt <- tktoplevel()
label.widget <- tklabel(tt, text = "Hello, World!")
button.widget <- tkbutton(tt, text = "Push",
command = function()cat("OW!\n"))
tkpack(label.widget, button.widget) # geometry manager
- snip 



I can reproduce this. The backtrace shows the crash happens in
dotTclObjv  [/src/library/tcltk/src/tcltk.c@243 ]. This looks like a
bug that was introduced by commit 78408/78409 about a month ago. I
think the problem is that this commit changes 'calloc' to 'Calloc'
without changing the corresponding 'free' to 'Free'.

This has nothing to do with the Windows build or installation. Nothing
has changed in the windows build procedure between 4.0.0 and 4.0.1.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel








--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: use of the tcltk package crashes R 4.0.1 for Windows

2020-06-07 Thread luke-tierney


On Sun, 7 Jun 2020, peter dalgaard wrote:


So this wasn't tested for a month?

Anyways, Free() is just free() with a check that we're not freeing a null 
pointer, followed by setting the pointer to NULL. At that point of tcltk.c, we 
have

  for (objc = i = 0; i < length(avec); i++){
   const char *s;
   char *tmp;
   if (!isNull(nm) && strlen(s = translateChar(STRING_ELT(nm, i{
   //  tmp = calloc(strlen(s)+2, sizeof(char));
   tmp = Calloc(strlen(s)+2, char);
   *tmp = '-';
   strcpy(tmp+1, s);
   objv[objc++] = Tcl_NewStringObj(tmp, -1);
   free(tmp);
   }
   if (!isNull(t = VECTOR_ELT(avec, i)))
   objv[objc++] = (Tcl_Obj *) R_ExternalPtrAddr(t);
   }

and I can't see how tmp can be NULL at the free(), nor can I see it mattering 
if it is not set to NULL (notice that it goes out of scope with the for loop).


Right. And the calloc->Calloc change doesn't look like an issue either
-- just checking for a NULL.

If the crash is happening in free() then that most likely means
corrupted malloc data structures. Unfortunately that could be
happening anywhere.

Best bet to narrow this down is for someone with a good Windows setup
who can reproduce this to bisect the svn commits and see at what
commit this started happening. Unfortunately my office Windows machine
isn't responding and it will probably take some time to get that
fixed.

Best,

luke



-pd



On 7 Jun 2020, at 16:00 , Jeroen Ooms  wrote:

On Sun, Jun 7, 2020 at 3:13 AM Fox, John  wrote:


Hi,

The following code, from the examples in ?TkWidgets , immediately crashes R 
4.0.1 for Windows:

- snip 
library("tcltk")
tt <- tktoplevel()
label.widget <- tklabel(tt, text = "Hello, World!")
button.widget <- tkbutton(tt, text = "Push",
command = function()cat("OW!\n"))
tkpack(label.widget, button.widget) # geometry manager
- snip 



I can reproduce this. The backtrace shows the crash happens in
dotTclObjv  [/src/library/tcltk/src/tcltk.c@243 ]. This looks like a
bug that was introduced by commit 78408/78409 about a month ago. I
think the problem is that this commit changes 'calloc' to 'Calloc'
without changing the corresponding 'free' to 'Free'.

This has nothing to do with the Windows build or installation. Nothing
has changed in the windows build procedure between 4.0.0 and 4.0.1.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: Surpising behavior when using an active binding as loop index in R 4.0.0

2020-05-24 Thread luke-tierney


On Sun, 24 May 2020, Deepayan Sarkar wrote:


A shorter reproducible example:

example(makeActiveBinding)
for (fred in 1:3) { 0 }
ls()

Both problems go away if you first do

compiler::enableJIT(2)

So looks like a bug in compiling the for loop.


Not in compiling but in the byte code interpreter. It was not handling
active bindings for the loop variable properly. This was fixed
yesterday in R--devel and R-patched, so will be fixed in R 4.0.1.

Best,

luke



-Deepayan

On Sat, May 23, 2020 at 5:45 PM Thomas Friedrichsmeier via R-devel
 wrote:


Possibly just a symptom of the earlier behavior, but I'll amend my
example, below, with an even more disturbing observation:

Am Sat, 23 May 2020 13:19:24 +0200
schrieb Thomas Friedrichsmeier via R-devel :
[...]

Consider the code below:

makeActiveBinding("i",
  function(value) {
  if (missing(value)) {
  x
  } else {
  print("set")
  x <<- value
  }
  }, globalenv())

i <- 1 # output "set"
print(i)   # output [1] 1

# Surprising behavior starts here:
for(i in 2:3) print(i) # output [1] "set"
   #NULL
   #NULL

print(i)   # output NULL
print(x)   # output NULL

i <- 4 # output "set"
print(i)   # ouput [1] 4
print(x)   # ouput [1] 4


ls()
# Error in ls() :
#  Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'

Regards
Thomas
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

1 2 3 4 5 6 >

1 - 100 of 506 matches

Mail list logo