Re: [Rd] [External] API for converting LANGSXP to LISTSXP?
We have long been discouraging the use of pairlists. So no, we will not do anything to facilitate this conversion; if anything the opposite. SET_TYPEOF is used more than it should be in the sources. It is something I would like us to fix sometime, but isn't high priority. Best, luke On Fri, 5 Jul 2024, Kevin Ushey wrote: Hi, A common idiom in the R sources is to convert objects between LANGSXP and LISTSXP by using SET_TYPEOF. However, this is soon going to be disallowed in packages. From what I can see, there isn't currently a direct way to convert between these two object types using the available API. At the R level, one can convert calls to pairlists with: as.call(pairlist(as.symbol("rnorm"), 42)) rnorm(42) However, the reverse is not possible: as.pairlist(call("rnorm", 42)) Error in as.pairlist(call("rnorm", 42)) : 'language' object cannot be coerced to type 'pairlist' One can do such a conversion via conversion to e.g. an intermediate R list (VECSXP), but that seems wasteful. Would it make sense to permit this coercion? Or, is there some other relevant API I'm missing? For completeness, Rf_coerceVector() also emits the same error above since it uses the same code path. Thanks, Kevin __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Non-API updates
On Tue, 25 Jun 2024, Josiah Parry wrote: Hey folks, I'm sure many of you all woke to the same message I did: "Please correct before 2024-07-09 to safely retain your package on CRAN" caused by Non-API changes to CRAN. This is quite unexpected as Luke Tierney's June 6th email writes (emphasis mine): "An *experimental* *effort* is underway to add annotations to the WRE..." "*Once things have gelled a bit more *I hope to turn this into a blog post that will include some examples of moving non-API entry point uses into compliance." Since then there has not been any indication of stabilization of the Non-API changes nor has there been a blog post outlining how to migrate. As things have been coming and going from the Non-API changes for quite some time now, we (the royal we, here) have been waiting for an official announcement from CRAN on the stabilizing changes. I posted an update to this list a few days ago. If you missed it you can find it in the archive. *Can we extend this very short notice to handle the Non-API changes before removal from CRAN? * Timing decisions are up to CRAN. In the case of the 3 packages I have to fix within 2 weeks, these are all using Rust. These changes require upstream changes to the extendr library. There are other packages that are also affected here. Making these changes is a delicate act and requires care and focus. All of the extendr developers work full time and cannot make addressing these changes their only priority for the next 2 weeks. Using non-API entry points is a choice that comes with risks. The ones leading to WARNINGs for your packages (PRSEEN and SYMVALUE)have been receiving NOTEs in check results for several weeks. Using tools:::checkPkgAPI you can see that your packages are referencing a lot of non-API entry points. Some of these may be added to the API, but most will not. This may be a good time to look into that. To minimize disruption we have been adding entry points to the API as long as it is safe to do so, in some cases against our better judgment. But ones that are unsafe to use will not be added. Eventually their declarations will be removed from public header files and they will be hidden when that is possible. Packages that have chosen to use these non-API entry points will have to adapt if they want to pass R CMD check. For now, we will try to first have use of these entry points result in NOTEs, and then WARNINGs. Once their declarations are removed and they are hidden, packages using them will fail to install. Additionally, a blog post with "examples of moving non-API entry point uses into compliance" would be very helpful in this endeavor. WRE now contains a section 'Moving into C API compliance'; that seems a better option for the moment given that things are still very much in flux. We will try to add to this section as needed. For the specific entry points generating WARNINGs for your packages the advice is simple: stop using them. Best, luke [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] non-API entry point Rf_findVarInFrame3 will be removed
The non-API entry point Rf_findVarInFrame3 used by some packages will be removed as it is not needed in one use case and not working as intended in the other. The most common use case, Rf_findVarInFrame3(rho, sym, TRUE), is equivalent to the simpler Rf_findVarInFrame(rho, sym). The less common use case is to test for existence of a binding with findVarInFrame(rho, sym, FALSE) != R_UnboundValue The intent is that this have no side effects, but that is not the case: if the binding exists and is an active binding, then its function will be called to produce a value. This usage should be replaced with R_existsVarInFrame(rho, sym). R_existsVarInFrame has been marked as part of the experimental API. It is not yet clear whether Rf_findVarInFrame will become part of an API. If it does, then its semantics will likely have to change; if it does not, an alternate interface will be provided. Best, luke -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] clarifying and adjusting the C API for R
Another quick update: Over 100 entry points used in packages for which it was safe to do so have now been marked as part of an API (in some cases after adding error checking of arguments). These can be used in package C code, with caveats for ones considered experimental or intended for embedded use. The remaining 100 or so non-API entry points used in packages will require changes in package C code. In some cases the API already provides safe alternatives to unsafe internal entry points. In most other cases it should be possible to develop safer interfaces that allow packages to accomplish what they need to do in a more robust way, while giving R maintainers and developers the freedom to make needed internal changes without disrupting package space. It will take some time to develop these new interfaces. 'Writing R extensions' now has a new section 'Moving into C API compliance' that should help with adapting to these changes. Best, luke On Thu, 6 Jun 2024, luke-tier...@uiowa.edu wrote: This is an update on some current work on the C API for use in R extensions. The internal R implementation makes use of tens of thousands of C entry points. On Linux and Windows, which support visibility restrictions, most of these are visible only within the R executble or shared library. About 1500 are not hidden and are visible to dynamically loaded shared libraries, such as ones in packages, and to embedding applications. There are two main reasons for limiting access to entry points in a software framework: - Some entry points are very easy to use in ways that corrupt internal data, leading to segfaults or, worse, incorrect computations without segfaults. - Some entry point expose internal structure and other implementation details, which makes it hard to make improvements without breaking client code that has come to depend on these details. The API of C entry points that can be used in R extensions, both for packages and embedding, has evolved organically over many years. The definition for the current release expressed in the Writing R Extensions manual (WRE) is roughly: An entry point can be used if (1) it is declared in a header file in R.home("include"), and (2) if it is documented for use in WRE. Ideally, (1) would be necessary and sufficient, but for a variety of reasons that isn't achievable, at least not in the near term. (2) can be challenging to determine; in particular, it is not amenable to a computational answer. An experimental effort is underway to add annotations to the WRE Texinfo source to allow (2) to be answered unambiguously. The annotations so far mostly reflect my reading or WRE and may be revised as they are reviewed by others. The annotated document can be used for programmatically identifying what is currently considered part of the C API. The result so far is an experimental function tools:::funAPI(): > head(tools:::funAPI()) nameloc apitype 1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.heapi 2alloc3DArrayWRE api 3 allocArrayWRE api 4 allocLangWRE api 5 allocListWRE api 6 allocMatrixWRE api The 'apitype' field has three possible levels | api | stable (ideally) API | | eapi | experimental API | | emb | embedding API| Entry points in the embedded API would typically only be used in applications embedding R or providing new front ends, but might be reasonable to use in packages that support embedding. The 'loc' field indicates how the entry point is identified as part of an API: explicit mention in WRE, or declaration in a header file identified as fully part of an API. [tools:::funAPI() may not be completely accurate as it relies on regular expressions for examining header files considered part of the API rather than proper parsing. But it seems to be pretty close to what can be achieved with proper parsing. Proper parsing would add dependencies on additional tools, which I would like to avoid for now. One dependency already present is that a C compiler has to be on the search path and cc -E has to run the C pre-processor.] Two additional experimental functions are available for analyzing package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI. These examine installed packages. [These may produce some false positives on macOS; they may or may not work on Windows at this point.] Using these tools initially showed around 200 non-API entry points used across packages on CRAN and BIOC. Ideally this number should be reduced to zero. This will require a combination of additions to the API and changes in packages. Some entry points can safely be added to the API. Around 40 have already been added to WRE with API annotations; another 40 or so
Re: [Rd] [External] Re: changes in R-devel and zero-extent objects in Rcpp
On Sat, 8 Jun 2024, Ben Bolker wrote: The ASAN errors occur *even if the zero-length object is not actually accessed*/is used in a perfectly correct manner, i.e. it's perfectly legal in base R to define `m <- numeric(0)` or `m <- matrix(nrow = 0, ncol = 0)`, whereas doing the equivalent in Rcpp will (now) lead to an ASAN error. i.e., these are *not* previously cryptic out-of-bounds accesses that are now being revealed, but instead sensible and previously legal definitions of zero-length objects that are now causing problems. I'm pretty sure I'm right about this, but it's absolutely possible that I'm just confused at this point; I don't have a super-simple example to show you at the moment. The closest is this example by Mikael Jagan: https://github.com/lme4/lme4/issues/794#issuecomment-2155093049 which shows that if x is a pointer to a zero-length vector (in plain C++ for R, no Rcpp is involved), DATAPTR(x) and REAL(x) evaluate to different values. Mikael further points out that "Rcpp seems to cast a (void *) returned by DATAPTR to (double *) when constructing a Vector from a SEXP, rather than using the (double *) returned by REAL." So perhaps R-core doesn't want to guarantee that these operations give identical answers, in which case Rcpp will have to change the way it does things ... It looks like REAL and friends should also get this check, but it's not high priority at this point, at least to me. DATAPTR has been using this check for a while in a barrier build, so you might want to test there as well. I expect we will activate more integrity checks from the barrier build on the API client side as things are tidied up. However: DATAPTR is not in the API and can't be at least in this form: It allows access to a writable pointer to STRSXP and VECSXP data and that is too dangerous for memory manager integrity. I'm not sure exactly how this will be resolve, but be prepared for changes. Best, luke cheers Ben On 2024-06-08 6:39 p.m., Kevin Ushey wrote: IMHO, this should be changed in both Rcpp and downstream packages: 1. Rcpp could check for out-of-bounds accesses in cases like these, and emit an R warning / error when such an access is detected; 2. The downstream packages unintentionally making these out-of-bounds accesses should be fixed to avoid doing that. That is, I think this is ultimately a bug in the affected packages, but Rcpp could do better in detecting and handling this for client packages (avoiding a segfault). Best, Kevin On Sat, Jun 8, 2024, 3:06 PM Ben Bolker <mailto:bbol...@gmail.com>> wrote: A change to R-devel (SVN r86629 or https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250 <https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250> has changed the handling of pointers to zero-length objects, leading to ASAN issues with a number of Rcpp-based packages (the commit message reads, in part, "Also define STRICT_TYPECHECK when compiling inlined.c.") I'm interested in discussion from the community. Details/diagnosis for the issues in the lme4 package are here: https://github.com/lme4/lme4/issues/794 <https://github.com/lme4/lme4/issues/794>, with a bit more discussion about how zero-length objects should be handled. The short(ish) version is that r86629 enables the CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro <https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104 <https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104>>, which returns a trivial pointer (rather than the data pointer that would be returned in the normal control flow) if an object has length 0: /* Attempts to read or write elements of a zero length vector will result in a segfault, rather than read and write random memory. Returning NULL would be more natural, but Matrix seems to assume that even zero-length vectors have non-NULL data pointers, so return (void *) 1 instead. Zero-length CHARSXP objects still have a trailing zero byte so they are not handled. */ In the Rcpp context this leads to an inconsistency, where `REAL(x)` is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn leads to ASAN warnings like runtime error: reference binding to misaligned address 0x0001 for type 'const double', which requires 8 byte alignment 0x0001: note: pointer points here I'm in over my head and hoping for insight into whether this problem should be resolved by changing R, Rcpp, or downstream Rcpp packages ... cheers Ben
Re: [Rd] [External] Re: clarifying and adjusting the C API for R
On Sat, 8 Jun 2024, Reed A. Cartwright wrote: [You don't often get email from racartwri...@gmail.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] Would it be reasonable to move the non-API stuff that cannot be hidden into header files inside a "details" directory (or some other specific naming scheme)? That's what I use when I need to separate a public API from an internal API. As do I, as does everyone else. As I wrote originally: " ... for a variety of reasons that isn't achievable, at least not in the near term." Can we leave it at that please? luke On Fri, Jun 7, 2024 at 7:30 AM luke-tierney--- via R-devel wrote: On Fri, 7 Jun 2024, Steven Dirkse wrote: You don't often get email from sdir...@gams.com. Learn why this is important Thanks for sharing this overview of an interesting and much-needed project. You mention that R exports about 1500 symbols (on platforms supporting visibility) but this subject isn't mentioned explicitly again in your note, so I'm wondering how things tie together. Un-exported symbols cannot be part of the API - how would people use them in this case? In a perfect world the set of exported symbols could define the API or match it exactly, but I guess that isn't the case at present. So I conclude that R exports extra (i.e. non-API) symbols. Is part of the goal to remove these extra exports? No. We'll hide what we can, but base packages for one need access to some entry points that should not be in the API, so those have to stay un-hidden. Best, luke -Steve On Thu, Jun 6, 2024 at 10:47 AM luke-tierney--- via R-devel wrote: This is an update on some current work on the C API for use in R extensions. The internal R implementation makes use of tens of thousands of C entry points. On Linux and Windows, which support visibility restrictions, most of these are visible only within the R executble or shared library. About 1500 are not hidden and are visible to dynamically loaded shared libraries, such as ones in packages, and to embedding applications. There are two main reasons for limiting access to entry points in a software framework: - Some entry points are very easy to use in ways that corrupt internal data, leading to segfaults or, worse, incorrect computations without segfaults. - Some entry point expose internal structure and other implementation details, which makes it hard to make improvements without breaking client code that has come to depend on these details. The API of C entry points that can be used in R extensions, both for packages and embedding, has evolved organically over many years. The definition for the current release expressed in the Writing R Extensions manual (WRE) is roughly: An entry point can be used if (1) it is declared in a header file in R.home("include"), and (2) if it is documented for use in WRE. Ideally, (1) would be necessary and sufficient, but for a variety of reasons that isn't achievable, at least not in the near term. (2) can be challenging to determine; in particular, it is not amenable to a computational answer. An experimental effort is underway to add annotations to the WRE Texinfo source to allow (2) to be answered unambiguously. The annotations so far mostly reflect my reading or WRE and may be revised as they are reviewed by others. The annotated document can be used for programmatically identifying what is currently considered part of the C API. The result so far is an experimental function tools:::funAPI(): > head(tools:::funAPI()) nameloc apitype 1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.heapi 2alloc3DArrayWRE api 3 allocArrayWRE api 4 allocLangWRE api 5 allocListWRE api 6 allocMatrixWRE api The 'apitype' field has three possible levels | api | stable (ideally) API | | eapi | experimental API | | emb | embedding API| Entry points in the embedded API would typically only be used in applications embedding R or providing new front ends, but might be reasonable to use in packages that support embedding. The 'loc' field indicates how the entry point is identified as part of an API: explicit mention in WRE, or declaration in a header file identifie
Re: [Rd] [External] Re: clarifying and adjusting the C API for R
On Fri, 7 Jun 2024, Hadley Wickham wrote: Thanks for working on this Luke! We appreciate your efforts to make it easier to tell what's in the exported API and we're very happy to work with you on any changes needed to tidyverse/r-lib packages. Hadley Thanks. Glad to hear -- I may be reminding you when we hit some of the tougher challenges down the road :-) Best, luke On Thu, Jun 6, 2024 at 9:47 AM luke-tierney--- via R-devel wrote: This is an update on some current work on the C API for use in R extensions. The internal R implementation makes use of tens of thousands of C entry points. On Linux and Windows, which support visibility restrictions, most of these are visible only within the R executble or shared library. About 1500 are not hidden and are visible to dynamically loaded shared libraries, such as ones in packages, and to embedding applications. There are two main reasons for limiting access to entry points in a software framework: - Some entry points are very easy to use in ways that corrupt internal data, leading to segfaults or, worse, incorrect computations without segfaults. - Some entry point expose internal structure and other implementation details, which makes it hard to make improvements without breaking client code that has come to depend on these details. The API of C entry points that can be used in R extensions, both for packages and embedding, has evolved organically over many years. The definition for the current release expressed in the Writing R Extensions manual (WRE) is roughly: An entry point can be used if (1) it is declared in a header file in R.home("include"), and (2) if it is documented for use in WRE. Ideally, (1) would be necessary and sufficient, but for a variety of reasons that isn't achievable, at least not in the near term. (2) can be challenging to determine; in particular, it is not amenable to a computational answer. An experimental effort is underway to add annotations to the WRE Texinfo source to allow (2) to be answered unambiguously. The annotations so far mostly reflect my reading or WRE and may be revised as they are reviewed by others. The annotated document can be used for programmatically identifying what is currently considered part of the C API. The result so far is an experimental function tools:::funAPI(): > head(tools:::funAPI()) name loc apitype 1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h eapi 2 alloc3DArray WRE api 3 allocArray WRE api 4 allocLang WRE api 5 allocList WRE api 6 allocMatrix WRE api The 'apitype' field has three possible levels | api | stable (ideally) API | | eapi | experimental API | | emb | embedding API | Entry points in the embedded API would typically only be used in applications embedding R or providing new front ends, but might be reasonable to use in packages that support embedding. The 'loc' field indicates how the entry point is identified as part of an API: explicit mention in WRE, or declaration in a header file identified as fully part of an API. [tools:::funAPI() may not be completely accurate as it relies on regular expressions for examining header files considered part of the API rather than proper parsing. But it seems to be pretty close to what can be achieved with proper parsing. Proper parsing would add dependencies on additional tools, which I would like to avoid for now. One dependency already present is that a C compiler has to be on the search path and cc -E has to run the C pre-processor.] Two additional experimental functions are available for analyzing package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI. These examine installed packages. [These may produce some false positives on macOS; they may or may not work on Windows at this point.] Using these tools initially showed around 200 non-API entry points used across packages on CRAN and BIOC. Ideally this number should be reduced to zero. This will require a combination of additions to the API and changes in packages. Some entry points can safely be added to the API. Around 40 have already be
Re: [Rd] [External] Re: clarifying and adjusting the C API for R
On Fri, 7 Jun 2024, Steven Dirkse wrote: You don't often get email from sdir...@gams.com. Learn why this is important Thanks for sharing this overview of an interesting and much-needed project. You mention that R exports about 1500 symbols (on platforms supporting visibility) but this subject isn't mentioned explicitly again in your note, so I'm wondering how things tie together. Un-exported symbols cannot be part of the API - how would people use them in this case? In a perfect world the set of exported symbols could define the API or match it exactly, but I guess that isn't the case at present. So I conclude that R exports extra (i.e. non-API) symbols. Is part of the goal to remove these extra exports? No. We'll hide what we can, but base packages for one need access to some entry points that should not be in the API, so those have to stay un-hidden. Best, luke -Steve On Thu, Jun 6, 2024 at 10:47 AM luke-tierney--- via R-devel wrote: This is an update on some current work on the C API for use in R extensions. The internal R implementation makes use of tens of thousands of C entry points. On Linux and Windows, which support visibility restrictions, most of these are visible only within the R executble or shared library. About 1500 are not hidden and are visible to dynamically loaded shared libraries, such as ones in packages, and to embedding applications. There are two main reasons for limiting access to entry points in a software framework: - Some entry points are very easy to use in ways that corrupt internal data, leading to segfaults or, worse, incorrect computations without segfaults. - Some entry point expose internal structure and other implementation details, which makes it hard to make improvements without breaking client code that has come to depend on these details. The API of C entry points that can be used in R extensions, both for packages and embedding, has evolved organically over many years. The definition for the current release expressed in the Writing R Extensions manual (WRE) is roughly: An entry point can be used if (1) it is declared in a header file in R.home("include"), and (2) if it is documented for use in WRE. Ideally, (1) would be necessary and sufficient, but for a variety of reasons that isn't achievable, at least not in the near term. (2) can be challenging to determine; in particular, it is not amenable to a computational answer. An experimental effort is underway to add annotations to the WRE Texinfo source to allow (2) to be answered unambiguously. The annotations so far mostly reflect my reading or WRE and may be revised as they are reviewed by others. The annotated document can be used for programmatically identifying what is currently considered part of the C API. The result so far is an experimental function tools:::funAPI(): > head(tools:::funAPI()) name loc apitype 1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h eapi 2 alloc3DArray WRE api 3 allocArray WRE api 4 allocLang WRE api 5 allocList WRE api 6 allocMatrix WRE api The 'apitype' field has three possible levels | api | stable (ideally) API | | eapi | experimental API | | emb | embedding API | Entry points in the embedded API would typically only be used in applications embedding R or providing new front ends, but might be reasonable to use in packages that support embedding. The 'loc' field indicates how the entry point is identified as part of an API: explicit mention in WRE, or declaration in a header file identified as fully part of an API. [tools:::funAPI() may not be completely accurate as it relies on regular expressions for examining header files considered part of the API rather than proper parsing. But it seems to be pretty close to what can be achieved with proper parsing. Proper parsing would add dependencies on additional tools, which I would like to avoid for now. One dependency already present is that a C compiler has to be on the search path and cc -E has to run the C pre-processor.] Two additional experimental functions are available for analyzing package compliance: tools:::checkPkgAPI and tools:::checkAl
[Rd] clarifying and adjusting the C API for R
This is an update on some current work on the C API for use in R extensions. The internal R implementation makes use of tens of thousands of C entry points. On Linux and Windows, which support visibility restrictions, most of these are visible only within the R executble or shared library. About 1500 are not hidden and are visible to dynamically loaded shared libraries, such as ones in packages, and to embedding applications. There are two main reasons for limiting access to entry points in a software framework: - Some entry points are very easy to use in ways that corrupt internal data, leading to segfaults or, worse, incorrect computations without segfaults. - Some entry point expose internal structure and other implementation details, which makes it hard to make improvements without breaking client code that has come to depend on these details. The API of C entry points that can be used in R extensions, both for packages and embedding, has evolved organically over many years. The definition for the current release expressed in the Writing R Extensions manual (WRE) is roughly: An entry point can be used if (1) it is declared in a header file in R.home("include"), and (2) if it is documented for use in WRE. Ideally, (1) would be necessary and sufficient, but for a variety of reasons that isn't achievable, at least not in the near term. (2) can be challenging to determine; in particular, it is not amenable to a computational answer. An experimental effort is underway to add annotations to the WRE Texinfo source to allow (2) to be answered unambiguously. The annotations so far mostly reflect my reading or WRE and may be revised as they are reviewed by others. The annotated document can be used for programmatically identifying what is currently considered part of the C API. The result so far is an experimental function tools:::funAPI(): > head(tools:::funAPI()) nameloc apitype 1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.heapi 2alloc3DArrayWRE api 3 allocArrayWRE api 4 allocLangWRE api 5 allocListWRE api 6 allocMatrixWRE api The 'apitype' field has three possible levels | api | stable (ideally) API | | eapi | experimental API | | emb | embedding API| Entry points in the embedded API would typically only be used in applications embedding R or providing new front ends, but might be reasonable to use in packages that support embedding. The 'loc' field indicates how the entry point is identified as part of an API: explicit mention in WRE, or declaration in a header file identified as fully part of an API. [tools:::funAPI() may not be completely accurate as it relies on regular expressions for examining header files considered part of the API rather than proper parsing. But it seems to be pretty close to what can be achieved with proper parsing. Proper parsing would add dependencies on additional tools, which I would like to avoid for now. One dependency already present is that a C compiler has to be on the search path and cc -E has to run the C pre-processor.] Two additional experimental functions are available for analyzing package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI. These examine installed packages. [These may produce some false positives on macOS; they may or may not work on Windows at this point.] Using these tools initially showed around 200 non-API entry points used across packages on CRAN and BIOC. Ideally this number should be reduced to zero. This will require a combination of additions to the API and changes in packages. Some entry points can safely be added to the API. Around 40 have already been added to WRE with API annotations; another 40 or so can probably be added after review. The remainder mostly fall into two groups: - Entry points that should never be used in packages, such as SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for that matter) that can create inconsistent or corrupt internal state. - Entry points that depend on the existence of internal structure that might be subject to change, such as the existence of promise objects or internal structure of environments. Many, if not most, of these seem to be used in idioms that can either be accomplished with existing higher-level functions already in the API, or by new higher level functions that can be created and added. Working through these will take some time and coordination between R-core and maintainers of affected packages. Once things have gelled a bit more I hope to turn this into a blog post that will include some examples of moving non-API entry point uses into compliance. Best, luke -- Luke Tierney Ralph E. Wareham Professor of Ma
Re: [Rd] [External] R hang/bug with circular references and promises
On Mon, 13 May 2024, Ivan Krylov wrote: [You don't often get email from ikry...@disroot.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] On Mon, 13 May 2024 09:54:27 -0500 (CDT) luke-tierney--- via R-devel wrote: Looks like I added that warning 22 years ago, so that should be enough notice :-). I'll look into removing it now. For now I have just changed the internal code to throw an error if the change would produce a cycle (r86545). This gives > e <- new.env() > parent.env(e) <- e Error in `parent.env<-`(`*tmp*`, value = ) : cycles in parent chains are not allowed Dear Luke, I've got a somewhat niche use case: as a way of protecting myself against rogue *.rds files and vulnerabilities in the C code, I've been manually unserializing "plain" data objects (without anything executable), including environments, in R [1]. I would try using two passes: create the environments in the first pass and in a second pass, either over the file or a new object with place holders, fill them in. I see that SET_ENCLOS() is already commented as "not API and probably should not be <...> used". Do you think there is a way to recreate an environment, taking the REFSXP entries into account, without `parent.env<-`? Would you recommend to abandon the folly of unserializing environments manually? SET_ENCLOS is one of a number of SET... functions that are not in the API and should not be since they are potentially unsafe to use. (One that is in the API and needs to be removed is SET_TYPEOF). So we would like to move them out of installed headers and not export them as entry points. For this particular case most uses I see are something like env = allocSExp(ENVSXP); SET_FRAME(env, R_NilValue); SET_ENCLOS(env, parent); SET_HASHTAB(env, R_NilValue); SET_ATTRIB(env, R_NilValue); which could just use env = R_NewEnv(parent, FALSE, 0); Best, luke -- Best regards, Ivan [1] https://codeberg.org/aitap/unserializeData/src/commit/33d72705c1ee265349b3e369874ce4b47f9cd358/R/unserialize.R#L289-L313 -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] R hang/bug with circular references and promises
On Sat, 11 May 2024, Peter Langfelder wrote: On Sat, May 11, 2024 at 9:34 AM luke-tierney--- via R-devel wrote: On Sat, 11 May 2024, Travers Ching wrote: The following code snippet causes R to hang. This example might be a bit contrived as I was experimenting and trying to understand promises, but uses only base R. This has nothing to do with promises. You created a cycle in the environment chain. A simpler variant: e <- new.env() parent.env(e) <- e get("x", e) This will hang and is not interruptable -- loops searching up environment chains are too speed-critical to check for interrupts. It is, however, pretty easy to check whether the parent change would create a cycle and throw an error if it would. Need to think a bit about exactly where the check should go. FWIW, the help for parent.env already explicitly warns against using parent.env <-: The replacement function ‘parent.env<-’ is extremely dangerous as it can be used to destructively change environments in ways that violate assumptions made by the internal C code. It may be removed in the near future. Looks like I added that warning 22 years ago, so that should be enough notice :-). I'll look into removing it now. Best, luke Peter -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] R hang/bug with circular references and promises
On Sat, 11 May 2024, Travers Ching wrote: The following code snippet causes R to hang. This example might be a bit contrived as I was experimenting and trying to understand promises, but uses only base R. It looks like it is looking for "not_a_variable" recursively but since it doesn't exist it goes on indefinitely. x0 <- new.env() x1 <- new.env(parent = x0) parent.env(x0) <- x1 delayedAssign("v", not_a_variable, eval.env=x1) delayedAssign("w", v, assign.env=x1, eval.env=x0) x1$w This has nothing to do with promises. You created a cycle in the environment chain. A simpler variant: e <- new.env() parent.env(e) <- e get("x", e) This will hang and is not interruptable -- loops searching up environment chains are too speed-critical to check for interrupts. It is, however, pretty easy to check whether the parent change would create a cycle and throw an error if it would. Need to think a bit about exactly where the check should go. Best, luke __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Patches for CVE-2024-27322
That should do it Sent from my iPad On Apr 30, 2024, at 9:57 AM, Iñaki Ucar wrote: Many thanks both. I'll wait for Luke's confirmation to trigger the update with the backported fix. Iñaki On Tue, 30 Apr 2024 at 12:42, Dirk Eddelbuettel mailto:e...@debian.org>> wrote: On 30 April 2024 at 11:59, peter dalgaard wrote: | svn diff -c 86235 ~/r-devel/R Which is also available as https://github.com/r-devel/r-svn/commit/f7c46500f455eb4edfc3656c3fa20af61b16abb7 Dirk | (or 86238 for the port to the release branch) should be easily backported. | | (CC Luke in case there is more to it) | | - pd | | > On 30 Apr 2024, at 11:28 , Iñaki Ucar mailto:iu...@fedoraproject.org>> wrote: | > | > Dear R-core, | > | > I just received notification of CVE-2024-27322 [1] in RedHat's Bugzilla. We | > updated R to v4.4.0 in Fedora rawhide, F40, EPEL9 and EPEL8, so no problem | > there. However, F38 and F39 will stay at v4.3.3, and I was wondering if | > there's a specific patch available, or if you could point me to the commits | > that fixed the issue, so that we can cherry-pick them for F38 and F39. | > Thanks. | > | > [1] https://nvd.nist.gov/vuln/detail/CVE-2024-27322 | > | > Best, | > -- | > Iñaki Úcar | > | > [[alternative HTML version deleted]] | > | > __ | > R-devel@r-project.org<mailto:R-devel@r-project.org> mailing list | > https://stat.ethz.ch/mailman/listinfo/r-devel | | -- | Peter Dalgaard, Professor, | Center for Statistics, Copenhagen Business School | Solbjerg Plads 3, 2000 Frederiksberg, Denmark | Phone: (+45)38153501 | Office: A 4.23 | Email: pd@cbs.dk<mailto:pd@cbs.dk> Priv: pda...@gmail.com<mailto:pda...@gmail.com> | | __ | R-devel@r-project.org<mailto:R-devel@r-project.org> mailing list | https://stat.ethz.ch/mailman/listinfo/r-devel -- dirk.eddelbuettel.com<http://dirk.eddelbuettel.com/> | @eddelbuettel | e...@debian.org<mailto:e...@debian.org> -- Iñaki Úcar [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] View() segfaulting ...
I saw it also on some of my Ubuntu builds, but the issue went away after a make clean/make, so maybe give that a try. Best, luke On Wed, 24 Apr 2024, Ben Bolker wrote: I'm using bleeding-edge R-devel, so maybe my build is weird. Can anyone else reproduce this? View() seems to crash on just about anything. View(1:3) *** stack smashing detected ***: terminated Aborted (core dumped) If I debug(View) I get to the last line of code with nothing obviously looking pathological: Browse[1]> debug: invisible(.External2(C_dataviewer, x, title)) Browse[1]> x $x [1] "1" "2" "3" Browse[1]> title [1] "Data: 1:3" Browse[1]> *** stack smashing detected ***: terminated Aborted (core dumped) R Under development (unstable) (2024-04-24 r86483) Platform: x86_64-pc-linux-gnu Running under: Pop!_OS 22.04 LTS Matrix products: default BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0 locale: [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_CA.UTF-8LC_COLLATE=en_CA.UTF-8 [5] LC_MONETARY=en_CA.UTF-8LC_MESSAGES=en_CA.UTF-8 [7] LC_PAPER=en_CA.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C time zone: America/Toronto tzcode source: system (glibc) attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_4.5.0 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Is ALTREP "non-API"?
On Wed, 24 Apr 2024, Hadley Wickham wrote: A few more thoughts based on a simple question: how do you determine the length of a vector? Rf_length() is used in example code in R-exts, but I don't think it's formally documented anywhere (although it's possible I missed it). Is using in an example sufficient to consider a function to be part of the public API? If so, SET_TYPEOF() is used in a number of examples, and hence used by CRAN packages, but is no longer considered part of the public API. Rf_xlength() doesn't appear to be mentioned anywhere in R-exts. Does this imply that long vectors are not part of the exported API? Or is there some other way we should be determining the length of such vectors? Are the macro variants LENGTH and XLENGTH part of the exported API? Are we supposed to use them or avoid them? Relatedly, I presume that LOGICAL() is the way we're supposed to extract logical values from a vector, but it isn't documented in R-exts, suggesting that it's not part of the public API? My pragmatic approach to deciding if an entry point is usable in a package is to grep for it in the installed headers grep for it in WRE if those are good, check the text in both places to make sure it doesn't tell me not to use is The first two can be automated; the text reading can't for now. One place this runs into trouble is when the prose in WRE doesn't explicitly mention the entry point, but says something like 'this one and similar ones are OK'. A couple of years ago I worked on improving some of those by explicitly adding some of those implicit ones, which did sometimes make the text more cumbersome. I'm pretty sure I added LOGICAL() and RAW() at that point (but may be mis-remebering); they are there now. In some other cases I left the text alone but added index entries. That makes them findable with a text search. I think I got most that can be handled that way, but there may be some others left. Far from ideal, but at least a step forward. --- It's also worth pointing out where R-exts does well, with the documentation of utility functions ( https://cran.r-project.org/doc/manuals/R-exts.html#Utility-functions). I think this is what most people would consider documentation to imply, i.e. a list of input arguments/types, the output type, and basic notes on their operation. --- Finally, it's worth noting that there's some lingering ill feelings over how the connections API was treated. It was documented in R-exts only to be later removed, including expunging mentions of it in the news. That's obviously water under the bridge, but I do believe that there is the potential for the R core team to build goodwill with the community if they are willing to engage a bit more with the users of their APIs. As you well know R-core is not a monolith. There are several R-core members who also are not happy about how that played out and where that stands now. But there was and is no viable option other than to agree to disagree. There is really no upside to re-litigating this now. Best, luke Hadley [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Is ALTREP "non-API"?
o learning that we weren't supposed to. Making a list and hoping that it will remain up to date is not realistic. The only way that would work reliably is if the list could be programmatically generated, for example by parsing installed headers for declarations and caveats as above. Which would be possible with changes like the ones listed above. Best, luke Hadley -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Calling applyClosure from a package?
On Sun, 14 Apr 2024, Matthew Kay wrote: [You don't often get email from matthew@u.northwestern.edu. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] Hi, Short version of my question: Rf_applyClosure was marked attribute_hidden in Oct 2023, and I am curious why and if there is an alternative interface to it planned. applyClosure has never been part of the API and was/is not intended for use by packages. Keeping things like this internal is essential to give us flexibility to make needed improvements to the basic engine. Moving this out of the installed headers and marking it as not to be exported merely clarifies that it is internal. Long version: I have been toying with building a package that makes it easier to do non-standard evaluation directly using promises, rather than wrapping these in a custom type (like e.g. rlang does). The advantage of this approach is that it should be fully compatible with functions that use the standard R functions for NSE and inspecting function context, like substitute(), match.call(), or parent.frame(). And indeed, it works! -- in R 4.3, that is. The prototype version of the package is here: https://github.com/mjskay/uneval (the relevant function to my question is probably do_invoke, in R/invoke.R). While testing on R-devel, I noticed that Rf_applyClosure(), which used to be exported, is now marked with attribute_hidden. I traced the change to this commit in Oct 2023: https://github.com/r-devel/r-svn/commit/57dbe8ad471c8a34314ee74362ad479db03c033a However, the commit message did not give me clarity on the reason for the change, and I have not been able to find mention of this change in R-devel, R-package-devel, or the R bug tracker. So, I am curious why this function is no longer exported and if there is an alternative function planned to take its place. Neither Rf_eval nor do.call can do what I need to fully support rlang-style NSE using base R. The problem is that I need to be able to manually set up the list of promises provided as arguments to the function. I fully understand that the answer to my question might be "don't do that" ;). That would be my advice: Don't do that. The API does not provide an interface for working with promises; in fact the existence of promises is not guaranteed in the future. Some packages have unfortunately made use of some internal functions related to promises. For the ones on CRAN we will work with the maintainers to find alternate approaches. This may mean adding some functions to the API for dealing with some lazy-evaluation-related features at a higher level. Best, luke But I will humbly suggest that it would be really nice to be able to do NSE that can capture expressions with heterogeneous environments and pass these to functions in a way that is compatible with existing R functions that do NSE. The basic tools to do it are there in R 4.3, I think... Thanks for the help! ---Matt -- Matthew Kay Associate Professor Computer Science & Communication Studies Northwestern University matthew@u.northwestern.edu http://www.mjskay.com/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Repeated library() of one package with different include.only= entries
On Thu, 11 Apr 2024, Duncan Murdoch wrote: On 11/04/2024 7:04 a.m., Martin Maechler wrote: Michael Chirico on Mon, 8 Apr 2024 10:19:29 -0700 writes: > Right now, attaching the same package with different include.only= has no > effect: > library(Matrix, include.only="fac2sparse") > library(Matrix) > ls("package:Matrix") > # [1] "fac2sparse" > ?library does not cover this case -- what is covered is the _loading_ > behavior of repeated calls: >> [library and require] check and update the list of currently attached > packages and do not reload a namespace which is already loaded > But here we're looking at the _attach_ behavior of repeated calls. > I am particularly interested in allowing the exports of a package to be > built up gradually: > library(Matrix, include.only="fac2sparse") > library(Matrix, include.only="isDiagonal") # want: ls("package:Matrix") --> > c("fac2sparse", "isDiagonal") > ... > It seems quite hard to accomplish this at the moment. Is the behavior to > ignore new inclusions intentional? Could there be an argument to get > different behavior? As you did not get an answer yet, ..., some remarks by an R-corer who has tweaked library() behavior in the past : - The `include.only = *` argument to library() has been a *relatively* recent addition {given the 25+ years of R history}: It was part of the extensive new features by Luke Tierney for R 3.6.0 [r76248 | luke | 2019-03-18 17:29:35 +0100], with NEWS entry • library() and require() now allow more control over handling search path conflicts when packages are attached. The policy is controlled by the new conflicts.policy option. - I haven't seen these (then) new features been used much, unfortunately, also not from R-core members, but I'd be happy to be told a different story. For the above reasons, it could well be that the current implementation {of these features} has not been exercised a lot yet, and limitations as you found them haven't been noticed yet, or at least not noticed on the public R mailing lists, nor otherwise by R-core (?). Your implicitly proposed new feature (or even *changed* default behavior) seems to make sense to me -- but as alluded to, above, I haven't been a conscious user of any 'library(.., include.only = *)' till now. I don't think it makes sense. I would assume that library(Matrix, include.only="isDiagonal") implies that only `isDiagonal` ends up on the search path, i.e. "include.only" means "include only", not "include in addition to whatever else has already been attached". I think a far better approach to solve Michael's problem is simply to use fac2sparse <- Matrix::fac2sparse isDiagonal <- Matrix::isDiagonal instead of messing around with the user's search list, which may have been intentionally set to include only one of those. So I'd suggest changing the docs to say "[library and require] check and update the list of currently attached packages and do not reload a namespace which is already loaded. If a package is already attached, no change will be made." ?library could also mention using detach() followed by library() or attachNamespace() with a new include.only specification. Best, luke Duncan Murdoch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Bug in out-of-bounds assignment of list object to expression() vector
Thanks for the report. Fixed in R-devel and R-patched (both R-4-4-branch and R-4-3-branch). On Fri, 5 Apr 2024, June Choe wrote: [You don't often get email from jchoe...@gmail.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] There seems to be a bug in out-of-bounds assignment of list objects to an expression() vector. Tested on release and devel. (Many thanks to folks over at Mastodon for the help narrowing down this bug) When assigning a list into an existing index, it correctly errors on incompatible type, and the expression vector is unchanged: ``` x <- expression(a,b,c) x[[3]] <- list() # Error x #> expression(a, b, c) ``` When assigning a list to an out of bounds index (ex: the next, n+1 index), it errors the same but now changes the values of the vector to NULL: ``` x <- expression(a,b,c) x[[4]] <- list() # Error x #> expression(NULL, NULL, NULL) ``` Curiously, this behavior disappears if a prior attempt is made at assigning to the same index, using a different incompatible object that does not share this bug (like a function): ``` x <- expression(a,b,c) x[[4]] <- base::sum # Error x[[4]] <- list() # Error x #> expression(a, b, c) ``` That "protection" persists until x[[4]] is evaluated, at which point the bug can be produced again: ``` x[[4]] # Error x[[4]] <- list() # Error x #> expression(NULL, NULL, NULL) ``` Note that `x` has remained a 3-length vector throughout. Best, June [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Bug in out-of-bounds assignment of list object to expression() vector
On Fri, 5 Apr 2024, Ivan Krylov via R-devel wrote: On Fri, 5 Apr 2024 08:15:20 -0400 June Choe wrote: When assigning a list to an out of bounds index (ex: the next, n+1 index), it errors the same but now changes the values of the vector to NULL: ``` x <- expression(a,b,c) x[[4]] <- list() # Error x #> expression(NULL, NULL, NULL) ``` Curiously, this behavior disappears if a prior attempt is made at assigning to the same index, using a different incompatible object that does not share this bug (like a function) Here's how the problem happens: 1. The call lands in src/main/subassign.c, do_subassign2_dflt(). 2. do_subassign2_dflt() calls SubassignTypeFix() to prepare the operand for the assignment. 3. Since the assignment is "stretching", SubassignTypeFix() calls EnlargeVector() to provide the space for the assignment. The bug relies on `x` not being IS_GROWABLE(), which may explain why a plain x[[4]] <- list() sometimes doesn't fail. The future assignment result `x` is now expression(a, b, c, NULL), and the old `x` set to expression(NULL, NULL, NULL) by SET_VECTOR_ELT(newx, i, VECTOR_ELT(x, i)); CLEAR_VECTOR_ELT(x, i); during EnlargeVector(). 4. But then the assignment fails, raising the error back in do_subassign2_dflt(), because the assignment kind is invalid: there is no way to put data.frames into an expression vector. The new resized `x` is lost, and the old overwritten `x` stays there. Not sure what the right way to fix this is. It's desirable to avoid shallow_duplicate(x) for the overwriting assignments, but then the sub-assignment must either succeed or leave the operand untouched. Is there a way to perform the type check before overwriting the operand? Yes. There are two places where there are some checks, one early and the other late. The early one is explicitly letting this one through and shouldn't. So a one line change would address this particular problem. But it would be a good idea to review why we the late checks are needed at all and maybe change that. I'll look into it. Best, luke -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Ordered comparison operators on language objects will signal errors
Comparison operators == and != can be used on language objects (i.e. call objects and symbols). The == operator in particular often seems to be used as a shorthand for calling identical(). The current implementation involves comparing deparsed calls as strings. This has a number of drawbacks and we would like to transition to a more robust and efficient implementation. As a first step, R-devel will soon be modified to signal an error when the ordered comparison operators <, <=, >, >= are used on language objects. A small number of CRAN and BIOC packages will fail after this change. If you want to check your packages or code before the change is committed you can run the current R-devel with the environment variable setting _R_COMPARE_LANG_OBJECTS=eqonly where using such a comparison now produces > quote(x + y) > 1 Error in quote(x + y) > 1 : comparison (>) is not possible for language types Best, luke -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Get list of active calling handlers?
On Tue, 6 Feb 2024, Duncan Murdoch wrote: The SO post https://stackoverflow.com/q/77943180 tried to call globalCallingHandlers() from a function, and it failed with the error message "should not be called with handlers on the stack". A much simpler illustration of the same error comes from this line: try(globalCallingHandlers(warning = function(e) e)) The problem here is that try() sets an error handler, and globalCallingHandlers() sees it and aborts. If I call globalCallingHandlers() with no arguments, I get a list of currently active global handlers. Is there also a way to get a list of active handlers, including non-global ones (like the one try() added in the line above)? There is not. The internal stack is not safe to allow to escape to the R level. It would be possible to write a reflection function to provide some information, but it would be a fair bit of work to design and I don't think would be of enough value to justify that. The original SO question would be better addressed to Posit/RStudio. Someone with enough motivation might also be able to figure out an answer by looking at the source code at https://github.com/rstudio/rstudio. Best, luke Duncan Murdoch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] readChar() could read the whole file by default?
On Fri, 26 Jan 2024, Michael Chirico wrote: I am curious why readLines() has a default (n=-1L) to read the full file while readChar() has no default for nchars= (i.e., readChar(file) is an error). Is there a technical reason for this? I often[1] see code like paste(readLines(f), collapse="\n") which would be better served by readChar(), especially given issues with the global string cache I've come across[2]. But lacking the default, the replacement might come across less clean. The string cache seems like a very dark pink herring to me. The fact that the lines are allocated on the heap might create an issue; the cache isn't likely to add much to that. In any case I would need to see a realistic example to convince me this is worth addressing on performance grounds. I don't see any reason in principle not to have readChar and readBin read the entire file if n = -1 (others might) but someone would need to write a patch to implement that. Best, luke For my own purposes the incantation readChar(file, file.size(file)) is ubiquitous. Taking CRAN code[3] as a sample[4], 41% of readChar() calls use either readChar(f, file.info(f)$size) or readChar(f, file.size(f))[5]. Thanks for the consideration and feedback, Mike C [1] e.g. a quick search shows O(100) usages in CRAN packages: https://github.com/search?q=org%3Acran+%2Fpaste%5B%28%5D%5Cs*readLines%5B%28%5D.*%5B%29%5D%2C%5Cs*collapse%5Cs*%3D%5Cs*%5B%27%22%5D%5B%5C%5C%5D%2F+lang%3AR&type=code, and O(1000) usages generally on GitHub: https://github.com/search?q=lang%3AR+%2Fpaste%5B%28%5D%5Cs*readLines%5B%28%5D.*%5B%29%5D%2C%5Cs*collapse%5Cs*%3D%5Cs*%5B%27%22%5D%5B%5C%5C%5D%2F+lang%3AR&type=code [2] AIUI the readLines() approach "pollutes" the global string cache with potentially 1000s/1s of strings for each line, only to get them gc()'d after combining everything with paste(collapse="\n") [3] The mirror on GitHub, which includes archived packages as well as current (well, eventually-consistent) versions. [4] Note that usage in packages is likely not representative of usage in scripts, e.g. I often saw readChar(f, 1), or eol-finders like readChar(f, 500) + grep("[\n\r]"), which makes more sense to me as something to find in package internals than in analysis scripts. FWIW I searched an internal codebase (scripts and packages) and found 70% of usages reading the full file. [5] repro: https://gist.github.com/MichaelChirico/247ea9500460dca239f031e74bdcf76b requires GitHub PAT in env GITHUB_PAT for API permissions. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Choices to remove `srcref` (and its buddies) when serializing objects
On Thu, 18 Jan 2024, Ivan Krylov via R-devel wrote: В Tue, 16 Jan 2024 14:16:19 -0500 Dipterix Wang пишет: Could you recommend any packages/functions that compute hash such that the source references and sexpinfo_struct are ignored? Basically a version of `serialize` that convert R objects to raw without storing the ancillary source reference and sexpinfo. I can show how this can be done, but it's not currently on CRAN or even a well-defined package API. I have adapted a copy of R's serialize() [*] with the following changes: * Function bytecode and flags are ignored: f <- function() invisible() depcache:::hash(f, 2) # This is plain FNV1a-64 of serialize() output # [1] "9b7a1af5468deba4" .Call(depcache:::C_hash2, f) # This is the new hash [1] 91 5f b8 a1 b0 6b cb 40 f() # called once: function gets the MAYBEJIT_MASK flag depcache:::hash(f, 2) # [1] "7d30e05546e7a230" .Call(depcache:::C_hash2, f) # [1] 91 5f b8 a1 b0 6b cb 40 f() # called twice: function now has bytecode depcache:::hash(f, 2) # [1] "2a2cba4150e722b8" .Call(depcache:::C_hash2, f) # [1] 91 5f b8 a1 b0 6b cb 40 # new hash stays the same * Source references are ignored: .Call(depcache:::C_hash2, \( ) invisible( )) # [1] 91 5f b8 a1 b0 6b cb 40 # compare vs. above # For quoted function definitions, source references have to be handled # differently .Call(depcache:::C_hash2, quote(function(){})) [1] 58 0d 44 8e d4 fd 37 6f .Call(depcache:::C_hash2, quote(\( ){ })) [1] 58 0d 44 8e d4 fd 37 6f * ALTREP is ignored: identical(1:10, 1:10+0L) # [1] TRUE identical(serialize(1:10, NULL), serialize(1:10+0L, NULL)) # [1] FALSE identical( .Call(depcache:::C_hash2, 1:10), .Call(depcache:::C_hash2, 1:10+0L) ) # [1] TRUE * Strings not marked as bytes are encoded into UTF-8: identical('\uff', iconv('\uff', 'UTF-8', 'latin1')) # [1] TRUE identical( serialize('\uff', NULL), serialize(iconv('\uff', 'UTF-8', 'latin1'), NULL) ) # [1] FALSE identical( .Call(depcache:::C_hash2, '\uff'), .Call(depcache:::C_hash2, iconv('\uff', 'UTF-8', 'latin1')) ) # [1] TRUE * NaNs with different payloads (except NA_numeric_) are replaced by R_NaN. One of the many downsides to the current approach is that we rely on the non-API entry point getPRIMNAME() in order to hash builtins. Looking at the source code for identical() is no help here, because it uses the private PRIMOFFSET macro. The bitstream being hashed is also, unfortunately, not exactly compatible with R serialization format version 2: I had to ignore the LEVELS of the language objects being hashed both because identical() seems to ignore those and because I was missing multiple private definitions (e.g. the MAYBEJIT flag) to handle them properly. Then there's also the problem of immediate bindings [**]: I've seen bits of vctrs, rstudio, rlang blow up when calling CAR() on SEXP objects that are not safe to handle this way, but R_expand_binding_value() (used by serialize()) is again a private function that is not accessible from packages. identical() won't help here, because it compares reference objects (which may or may not contain such immediate bindings) by their pointer values instead of digging down into them. What does 'blow up' mean? If it is anything other than signal a "bad binding access" error then it would be good to have more details. Best, luke Dropping the (already violated) requirement to be compatible with R serialization bitstream will make it possible to simplify the code further. Finally: a <- new.env() b <- new.env() a$x <- b$x <- 42 identical(a, b) # [1] FALSE .Call(depcache:::C_hash2, a) # [1] 44 21 f1 36 5d 92 03 1b .Call(depcache:::C_hash2, b) # [1] 44 21 f1 36 5d 92 03 1b ...but that's unavoidable when looking at frozen object contents instead of their live memory layout. If you're interested, here's the development version of the package: install.packages('depcache',contriburl='https://aitap.github.io/Rpackages') -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] UseMethod forwarding of local variables
UseMethod has since the beginning had the 'feature' that local variables in the generic are added to the environment in which the method body is evaluated. This is documented in ?UseMethod and R-lang.texi, but use of this 'feature' has been explicitly discouraged in R-lang.texi for many years. This is an unfortunate design decision for a number of reasons (see below), so the plan is to remove this 'feature' in the next major release. Fortunately only a small number of packages on CRAN (see below) seem to make use of this feature directly; a few more as reverse dependencies. The maintainers of the directly affected packages will be notified separately. Current R-devel allows you to set the environment variable R_USEMETHOD_FORWARD_LOCALS=none to run R without this feature or R_USEMETHOD_FORWARD_LOCALS=error to signal an error when a forwarded variable's value is used. Some more details: An example: > foo <- function(x) { yyy <- 77; UseMethod("foo") } > foo.bar <- function(x) yyy > foo(structure(1, class = "bar")) [1] 77 Some reasons the design is a bad idea: - You can't determine what a method does without knowing what the generic it will be called from looks like. - Code analysis (codetools, the compiler) can't analyze method code reliably. - You can't debug a method on its own. For the foo() example, > foo.bar(structure(1, class = "bar")) Error in foo.bar(structure(1, class = "bar")) : object 'yyy' not found - A method relying on these variables won't work when reached via NextMethod: > foo.baz <- function(x) NextMethod("foo") > foo(structure(2, class = c("baz", "bar"))) Error in foo.bar(structure(2, class = c("baz", "bar"))) : object 'yyy' not found The directly affected CRAN packages I have identified are: - actuar - quanteda - optmatch - rlang - saeRobust - Sim.DiffProc - sugrrants - texmex Some of these fail with the environment set to 'error' but not to 'none', so they are getting a value from somewhere else that may or may not be right. Affected as revdeps of optmatch: - cobalt - htetree - jointVIP - MatchIt - PCAmatchR - rcbalance - rcbsubset - RItools - stratamatch Affected as revdeps of texmex: - lax - mobirep Best, luke -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] On PRINTNAME() encoding, EncodeChar(), and being painted into a corner
n/eval.c +++ src/main/eval.c @@ -1161,7 +1161,7 @@ SEXP eval(SEXP e, SEXP rho) const char *n = CHAR(PRINTNAME(e)); - if(*n) errorcall(getLexicalCall(rho), + if(*n) errorcall_cpy(getLexicalCall(rho), _("argument \"%s\" is missing, with no default"), -CHAR(PRINTNAME(e))); +EncodeChar(PRINTNAME(e))); else errorcall(getLexicalCall(rho), _("argument is missing, with no default")); } --- src/main/match.c +++ src/main/match.c @@ -229,7 +229,7 @@ attribute_hidden SEXP matchArgs_NR(SEXP if (fargused[arg_i] == 2) - errorcall(call, + errorcall_cpy(call, _("formal argument \"%s\" matched by multiple actual arguments"), - CHAR(PRINTNAME(TAG(f; + EncodeChar(PRINTNAME(TAG(f; if (ARGUSED(b) == 2) errorcall(call, _("argument %d matches multiple formal arguments"), @@ -272,12 +271,12 @@ attribute_hidden SEXP matchArgs_NR(SEXP if (fargused[arg_i] == 1) - errorcall(call, + errorcall_cpy(call, _("formal argument \"%s\" matched by multiple actual arguments"), - CHAR(PRINTNAME(TAG(f; + EncodeChar(PRINTNAME(TAG(f; if (R_warn_partial_match_args) { warningcall(call, _("partial argument match of '%s' to '%s'"), CHAR(PRINTNAME(TAG(b))), CHAR(PRINTNAME(TAG(f))) ); } SETCAR(a, CAR(b)); if (CAR(b) != R_MissingArg) SET_MISSING(a, 0); The changes become more complicated with a plain error() (have to figure out the current call and provide it to errorcall_cpy), still more complicated with warnings (there's currently no warningcall_cpy(), though one can be implemented) and even more complicated when multiple symbols are used in the same warning or error, like in the last warningcall() above (EncodeChar() can only be called once at a time). The only solution to the latter problem is an EncodeChar() variant that allocates its memory dynamically. Would R_alloc() be acceptable in this context? With errors, the allocation stack would be quickly reset (except when withCallingHandlers() is in effect?), but with warnings, the code would have to restore it manually every time. Or allow/require a buffer to be provided. So replacing the calls like CHAR(PRINTNAME(sym)) with EncodeSymbol(sym, buf, buf_size) Is it even worth the effort to try to handle the (pretty rare) non-syntactic symbol names while constructing error messages? Other languages (like Lua or SQLite) provide a special printf specifier (typically %q) to create quoted/escaped string representations, but we're not yet at the point of providing a C-level printf implementation. Not clear it is worth it. But the situation now is not good, because sometimes we encode and sometimes we don't. It would be better to be consistent, both for the end user and for maintainers who now have to spend time figuring out which way to go. Best, luke -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Calling a replacement function in a custom environment
On Sun, 27 Aug 2023, Duncan Murdoch wrote: I think there isn't a way to make this work other than calling `is.na<-` explicitly: x <- b$`is.na<-`(x, TRUE) Replacement functions are not intended to be called directly. Calling a replacement function directly may produce an error, or may just do the wrong thing in terms of mutation. It seems like a reasonable suggestion to make b$is.na(x) <- TRUE work as long as b is an environment. I do not think it is a reasonable suggestion. The reasons a::b and a:::b were made to work is that many users read these as a single symbol, not a call to a binary operator. So supporting this helped to reduce confusion. Allowing $<- to "work" on environments was probably a mistake since environments behave differently with respect to duplication. Disallowing it entirely may be too disruptive at this point, but disallowing it in complex assignment expressions may be necessary to prevent mutations that should not happen. (There are open bug reports that boil down to this.) In any case, complicating the complex assignment code, which is already barely maintainable, would be a very bad idea. Best, luke If you wanted it to work when b was a list, it would be more problematic because of partial name matching. E.g. suppose b was a list containing functions partial(), partial<-(), and part<-(), and I call b$part(x) <- 1 what would be called? Duncan Murdoch On 27/08/2023 10:59 a.m., Konrad Rudolph wrote: Hello all, I am wondering whether it’s at all possible to call a replacement function in a custom environment. From my experiments this appears not to be the case, and I am wondering whether that restriction is intentional. To wit, the following works: x = 1 base::is.na(x) = TRUE However, the following fails: x = 1 b = baseenv() b$is.na(x) = TRUE The error message is "invalid function in complex assignment". Grepping the R code for this error message reveals that this behaviour seems to be hard-coded in function `applydefine` in src/main/eval.c: the function explicitly checks for `::` and :::` and permits those assignments, but has no equivalent treatment for `$`. Am I overlooking something to make this work? And if not — unless there’s a concrete reason against it, could it be considered to add support for this syntax, i.e. for calling a replacement function by `$`-subsetting the defining environment, as shown above? Cheers, Konrad __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Time to drop globalenv() from searches in package code?
On Sat, 17 Sep 2022, Kurt Hornik wrote: luke-tierney writes: On Thu, 15 Sep 2022, Duncan Murdoch wrote: The author of this Stackoverflow question https://stackoverflow.com/q/73722496/2554330 got confused because a typo in his code didn't trigger an error in normal circumstances, but it did when he ran his code in pkgdown. The typo was to use "x" in a test, when the local variable was named ".x". There was no "x" defined locally or in the package or its imports, so the search got all the way to the global environment and found one. (The very confusing part for this user was that it found the right variable.) This author had suppressed the "R CMD check" check for use of global variables. Obviously he shouldn't have done that, but he's working with tidyverse NSE, and that causes so many false positives that it is somewhat understandable he would suppress one too many. The pkgdown simulation of code in examples doesn't do perfect mimicry of running it at top level; the fake global environment never makes it onto the search list. Some might call this a bug, but I'd call it the right search strategy. My suggestion is that the search for variables in package code should never get to globalenv(). The chain of environments should stop after handling the imports. (Probably base package functions should also be implicitly imported, but nothing else.) This was considered and discussed when I added namespaces. Basically it would mean making the parent of the base namespace environment be the empty environment instead of the global environment. As a design this is cleaner, and it would be a one-line change in eval.c. But there were technical reasons this was not a viable option at the time, also a few political reasons. The technical reasons mostly had to do with S3 dispatch. Changes over the years, mostly from work Kurt has done, to S3 dispatch for methods defined and registered in packages might make this more viable in principle, but there would still be a lot of existing code that would stop working. For example, 'make check' with the one-line change fails in a base example that defines an S3 method. It might be possible to fiddle with the dispatch to keep most of that code working, but I suspect that would be a lot of work. Seeing what it would take to get 'make check' to succeed would be a first step if anyone wants to take a crack at it. Luke, Can you please share the one-line change so that I can take a closer look? Index: src/main/envir.c === --- src/main/envir.c(revision 82861) +++ src/main/envir.c(working copy) @@ -683,7 +683,7 @@ R_GlobalCachePreserve = CONS(R_GlobalCache, R_NilValue); R_PreserveObject(R_GlobalCachePreserve); #endif -R_BaseNamespace = NewEnvironment(R_NilValue, R_NilValue, R_GlobalEnv); +R_BaseNamespace = NewEnvironment(R_NilValue, R_NilValue, R_EmptyEnv); R_PreserveObject(R_BaseNamespace); SET_SYMVALUE(install(".BaseNamespaceEnv"), R_BaseNamespace); R_BaseNamespaceName = ScalarString(mkChar("base")); - For S3 the dispatch will have to be changed to explicitly search .GlobalEnv and parents after the namespace if we don't want to break too much. Another idiom that will be broken is if (require("foo")) bar(...) with bar exported from foo. I don't know if that is already warned about. Moving away from this is arguably good in principle but also probably fairly disruptive. We might need to add some cleaner use-if-available mechanism, or maybe just adjust some checking code. Best, luke Best -k I suspect this change would reveal errors in lots of packages, but the number of legitimate uses of the current search strategy has got to be pretty small nowadays, since we've been getting warnings for years about implicit imports from other standard packages. Your definition of 'legitimate' is probably quite similar to mine, but there is likely to be a small but vocal minority with very different views :-). Best, luke Duncan Murdoch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of St
Re: [Rd] [External] assignment
On Mon, 27 Dec 2021, Gabor Grothendieck wrote: In a recent SO post this came up (changed example to simplify it here). It seems that `test` still has the value sin. test <- sin environment(test)$test <- cos test(0) ## [1] 0 It appears to be related to the double use of `test` in `$<-` since if we break it up it works as expected: test <- sin e <- environment(test) e$test <- cos test(0) ## [1] 1 `assign` also works: test <- sin assign("test", cos, environment(test)) test(0) ## [1] 1 Can anyone shed some light on this? See my response in https://bugs.r-project.org/show_bug.cgi?id=18269 Best, luke -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: hashtab address arg
On Wed, 22 Dec 2021, Ivan Krylov wrote: On Sat, 18 Dec 2021 11:50:54 +0100 Arnaud FELD wrote: However, I'm a bit troubled about the "address" argument. What is it intended for since (as far as I know) "address equality" is until now something that isn't really let for the user to decide within R. Using the words from "Extending R" by John M. Chambers, the concept of address identity could be related to the question: If some of the data in the object has changed, is this still the same object? Most objects in R are defined by their content. If you had a 100x100 matrix and changed an element at [50,50], it's now a different matrix, even if it's stored in the same variable. If you create another 100x100 matrix in a different variable but fill it with the same numbers, it should still compare equal to your original matrix. Not all types of R objects are like that. Environments are good candidates for pointer equality comparison. For example, the contents of the global environment change every time you assign some variable in the R command line, but it remains the same global environment. Indeed, identical() for environments just compares their pointers: even if two different environments only contain objects that compare equal, they cannot be considered the same environment, because different closures might be referring to them. Similar are data.tables: if you had a giant dataset and, as part of cleaning it up, removed some outliers, perhaps it should be considered the same dataset, even if the contents aren't strictly the same any more. Same goes for reference class and R6 objects: unlike the pass-by-value semantics associated with most objects in R, these are assumed to carry global state within them, and modifications to them are reflected everywhere they are referenced, not limited to the current function call. This is still experimental and the 'address' option may not survive at the R level. There are some C level applications where it can be useful; maybe it will only be retained there. I *think* that most (if not all) objects with reference semantics already use pointer comparison when being compared by identical(), so the default of "identical" is, as the help page says, almost always the right choice, but if it matters to your code whether the objects are actually stored in the same area in the memory, use hashes of type "address". Unfortunately not all: External pointer objects are reference objects but by default are not compared based on object address. Fixing the default is not an option in the short term as it breaks too much code (mostly through dependencies on a few packages). (Perhaps this topic could be a better fit for R-help.) R-devel is the right place for this. Best, luke -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Status of '=>'
It's still work in progress. Probably => will be dropped in favor of limited use of _ for non-first-argument passing. Best, luke On Mon, 20 Dec 2021, Dirk Eddelbuettel wrote: R 4.1.0 brought the native pipe and the related ability to use '=>' if one opted into it by setting _R_USE_PIPEBIND_. I often forget about '=>' and sadly can never find anything in the docs either (particularly no 'see als' from '|>' docs) which is not all that heplful. Can we anticipate a change with R 4.2.0, or will it remain as is, somewhat available but not really documented or enabled? Clarifications welcome, otherwise 'time will tell' as usual. Thanks, Dirk -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] DOCS: Exactly when, in the signaling process, is option 'warn' applied?
On Thu, 18 Nov 2021, Henrik Bengtsson wrote: Hi, the following question sprung out of a package settings option warn=-1 to silence warnings, but those warnings were still caught by withCallingHandlers(..., warning), which the package author did not anticipate. The package has been updated to use suppressWarnings() instead, but as I see a lot of packages on CRAN [1] use options(warn=-1) to temporarily silence warnings, I wanted to bring this one up. Even base R itself [2] does this, e.g. utils::assignInMyNamespace(). Exactly when is the value of 'warn' options used when calling warning("boom")? In the default handler; it doesn't affect signaling. Much of the documentation pre-dates the condition system; happy to consider patches. Best, luke I think the docs, including ?options, would benefit from clarifying that. To the best of my understanding, it should also mention that options 'warn' is meant to be used by end-users, and not in package code where suppressWarnings() should be used. To clarify, if we do: options(warn = -1) tryCatch(warning("boom"), warning = function(w) stop("Caught warning: ", conditionMessage(w), call. = FALSE)) Error: Caught warning: boom we see that the warning is indeed signaled. However, in Section '8.2 warning' of the 'R Language Definition' [3], we can read: "The function `warning` takes a single argument that is a character string. The behaviour of a call to `warning` depends on the value of the option `"warn"`. If `"warn"` is negative warnings are ignored. [...]" The way this is written, it may suggest that warnings are ignored/silences already early on when calling warning(), but the above example shows that that is not the case. From the same section, we can also read: "[...] If it is zero, they are stored and printed after the top-level function has completed. [...]" which may hint at the 'warn' option is applied only when a warning condition is allowed to "bubble up" all the way to the top level. (FWIW, this is how always though it worked, but it's only now I looked into the docs and see it's ambiguous on this). /Henrik [1] https://github.com/search?q=org%3Acran+language%3Ar+R%2F+in%3Afile%2Cpath+options+warn+%22-1%22&type=Code [2] https://github.com/wch/r-source/blob/0a31ab2d1df247a4289efca5a235dc45b511d04a/src/library/utils/R/objects.R#L402-L405 [3] https://cran.r-project.org/doc/manuals/R-lang.html#warning __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] GC: speeding-up the CHARSXP cache maintenance, 2nd try
Can you please submit this as a wishlist item to bugzilla? it is easier to keep track of there. You could also submit your threads based suggestion there, again to keep it easier to keep track of and possibly get back to in the future. I will have a look at your approach when I get a chance, but I am exploring a different approach to avoid scanning old generations that may be simpler. Best, luke On Wed, 3 Nov 2021, Andreas Kersting wrote: Hi, In https://stat.ethz.ch/pipermail/r-devel/2021-October/081147.html I proposed to speed up the CHARSXP cache maintenance during GC using threading. This was rejected by Luke in https://stat.ethz.ch/pipermail/r-devel/2021-October/081172.html. Here I want to propose an alternative approach to significantly speed up CHARSXP cache maintenance during partial GCs. A patch which passes `make check-devel` is attached. Compared to R devel (revision 81110) I get the following performance improvements on my system: Elapsed time for five non-full gc in a session after x <- as.character(runif(5e7))[] gc(full = TRUE) +20sec -> ~1sec. This patch introduces (theoretical) overheads to mkCharLenCE() and full GCs. However, I did not measure dramatic differences: y <- "old_CHARSXP" after x <- "old_CHARSXP"; gc(); gc() takes a median 32 nanoseconds with and without the patch. gc(full = TRUE) in a new session takes a median 16 milliseconds with and 14 without the patch. The basic idea is to maintain the CHARSXP cache using subtables in R_StringHash, one for each of the (NUM_GC_GENERATIONS := NUM_OLD_GENERATIONS + 1) GC generations. New CHARSXPs are added by mkCharLenCE() to the subtable of the youngest generation. After a partial GC, only the chains anchored at the subtables of the youngest (num_old_gens_to_collect + 1) generations need to be searched for and cleaned of unmarked nodes. Afterwards, these chains need to be merged into those of the respective next generation, if any. This approach relies on the fact that an object/CHARSXP can never become younger again. It is OK though if an object/CHARSXP "skips" a GC generation. R_StringHash, which is now of length (NUM_GC_GENERATIONS * char_hash_size), is structured such that the chains for the same hashcode but for different generations are anchored at slots of R_StringHash which are next to each other in memory. This is because we often need to access two or more (i.e. currently all three) of them for one operation and this avoids cache misses. HASHPRI, i.e. the number of occupied primary slots, is computed and stored as NUM_GC_GENERATIONS times the number of slots which are occupied in at least one of the subtables. This is done because in mkCharLenCE() we need to iterate through one or more chains if and only if there is a chain for the particular hashcode in at least one subtable. I tried to keep the patch as minimal as possible. In particular, I did not add long vector support to R_StringHash. I rather reduced the max value of char_hash_size from 2^30 to 2^29, assuming that NUM_OLD_GENERATIONS is (not larger than) 2. I also did not yet adjust do_show_cache() and do_write_cache(), but I could do so if the patch is accepted. Thanks for your consideration and feedback. Regards, Andreas P.S. I had a hard time to get the indentation right in the patch due the mix of tabs and spaces. Sorry, if I screwed this up. -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Wrong number of names?
On Mon, 1 Nov 2021, Martin Maechler wrote: Duncan Murdoch on Mon, 1 Nov 2021 06:36:17 -0400 writes: > The StackOverflow post > https://stackoverflow.com/a/69767361/2554330 discusses a > dataframe which has a named numeric column of length 1488 > that has 744 names. I don't think this is ever legal, but > am I wrong about that? > The `dat.rds` file mentioned in the post is temporarily > available online in case anyone else wants to examine it. > Assuming that the file contains a badly formed object, I > wonder if readRDS() should do some sanity checks as it > reads. > Duncan Murdoch Good question. In the mean time, I've also added a bit on the SO page above.. e.g. --- d <- readRDS("<.>dat.rds") str(d) ## 'data.frame':1488 obs. of 4 variables: ## $ facet_var: chr "AUT" "AUT" "AUT" "AUT" ... ## $ date : Date, format: "2020-04-26" "2020-04-27" ... ## $ variable : Factor w/ 2 levels "arima","prophet": 1 1 1 1 1 1 1 1 1 1 ... ## $ score: Named num 2.74e-06 2.41e-06 2.48e-06 2.39e-06 2.79e-06 ... ## ..- attr(*, "names")= chr [1:744] "new_confirmed10" "new_confirmed10" "new_confirmed10" "new_confirmed10" ... ds <- d$score c(length(ds), length(names(ds))) ## 1488 744 dput(ds) # -> ## *** caught segfault *** ## address (nil), cause 'memory not mapped' If I'm reading this right then dput is where the segfault is happening, so that could use some more bulletproofing. Best, luke --- Hence "proving" that the dat.rds really contains an invalid object, when simple dput(.) directly gives a segmentation fault. I think we are aware that using C code and say .Call(..) one can create all kinds of invalid objects "easily".. and I think it's clear that it's not feasible to check for validity of such objects "everwhere". Your proposal to have at least our deserialization code used in readRDS() do (at least *some*) validity checks seems good, but maybe we should think of more cases, and / or do such validity checks already during serialization { <-> saveRDS() here } ? .. Such questions then really are for those who understand more than me about (de)serialization in R, its performance bottlenecks etc. Given the speed impact we should probably have such checks *optional* but have them *on* by default e.g., at least for saveRDS() ? Martin __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] GC: improving the marking performance for STRSXPs
Thanks. I have committed a modified version, also incorporating the handling of R_StringHash from your other post, in r81073. I prefer to be more conservative in the GC. for example not assume without checking that STRSXP elements are CHARSXP. This does add some overhead, but the change is still beneficial. I don't think we would want to add the complexity of threading at this point, though it might be worth considering at a later time. There are a few other possible modifications that I'll explore that might provide comparable improvements to the ones seen with your patch without adding the complexity of threads. Best, luke On Thu, 7 Oct 2021, Andreas Kersting wrote: Hi all, in GC (in src/main/memory.c), FORWARD_CHILDREN() (called by PROCESS_NODES()) treats STRSXPs just like VECSXPs, i.e. it calls FORWARD_NODE() for all its children. I claim that this is unnecessarily inefficient since the children of a STRSXP can legitimately only be (atomic) CHARSXPs and could hence be marked directly in the call of FORWARD_CHILDREN() on the STRSXP. Attached patch (atomic_CHARSXP.diff) implements this and gives the following performance improvements on my system compared to R devel (revision 81008): Elapsed time for two full gc in a session after x <- as.character(runif(5e7))[] 19sec -> 15sec. This is the best-case scenario for the patch: very many unique/unmarked CHARSXP in the STRSXP. For already marked CHARSXP there is no performance gain since FORWARD_NODE() is a no-op for them. The relative performance gain is even bigger if iterating through the STRSXP produces many cache misses, as e.g. after x <- as.character(runif(5e7))[] x <- sample(x, length(x)) Elapsed time for two full gc here: 83sec -> 52sec. This is because we have less cache misses per CHARSXP. This patch additionally also assumes that the ATTRIBs of a CHARSXP are not to be traced because they are just used for maintaining the CHARSXP hash chains. The second attached patch (atomic_CHARSXP_safe_unlikely.diff) checks both assumptions and calls gc_error() if they are violated and is still noticeably faster than R devel: 19sec -> 17sec and 83sec -> 54sec, respectively. Attached gc_test.R is the script I used to get the previously mentioned and more gc timings. Do you think that this is a reasonable change? It does make the code more complex and I am not sure if there might be situations in which the assumptions are violated, even though SET_STRING_ELT() and installAttrib() do enforce them. Best regards, Andreas -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Workaround very slow NAN/Infinities arithmetic?
On Thu, 30 Sep 2021, brodie gaslam via R-devel wrote: André, I'm not an R core member, but happen to have looked a little bit at this issue myself. I've seen similar things on Skylake and Coffee Lake 2 (9700, one generation past your latest) too. I think it would make sense to have some handling of this, although I would want to show the trade-off with performance impacts on CPUs that are not affected by this, and on vectors that don't actually have NAs and similar. I think the performance impact is likely to be small so long as branch prediction is active, but since branch prediction is involved you might need to check with different ratios of NAs (not for your NA bailout branch, but for e.g. interaction of what you add and the existing `na.rm=TRUE` logic). I would want to see realistic examples where this matters, not microbenchmarks, before thinking about complicating the code. Not all but most cases where sum(x) returns NaN/NA would eventually result in an error; getting to the error faster is not likely to be useful. My understanding is that arm64 does not support proper long doubles (they are the same as regular doubles). So code using long doubles isn't getting the hoped-for improved precision. Since that architecture is becoming more common we should probably be looking at replacing uses of long doubles with better algorithms that can work with regular doubles, e.g Kahan summation or variants for sum. You'll also need to think of cases such as c(Inf, NA), c(NaN, NA), etc., which might complicate the logic a fair bit. Presumably the x87 FPU will remain common for a long time, but if there was reason to think otherwise, then the value of this becomes questionable. Either way, I would probably wait to see what R Core says. For reference this 2012 blog post[1] discusses some aspects of the issue, including that at least "historically" AMD was not affected. Since we're on the topic I want to point out that the default NA in R starts off as a signaling NA: example(numToBits) # for `bitC` bitC(NA_real_) ## [1] 0 111 | 00100010 bitC(NA_real_ + 0) ## [1] 0 111 | 10100010 Notice the leading bit of the significant starts off as zero, which marks it as a signaling NA, but becomes 1, i.e. non-signaling, after any operation[2]. This is meaningful because the mere act of loading a signaling NA into the x87 FPU is sufficient to trigger the slowdowns, even if the NA is not actually used in arithmetic operations. This happens sometimes under some optimization levels. I don't now of any benefit of starting off with a signaling NA, especially since the encoding is lost pretty much as soon as it is used. If folks are interested I can provide patch to turn the NA quiet by default. In principle this might be a good idea, but the current bit pattern is unfortunately baked into a number of packages and documents on internals, as well as serialized objects. The work needed to sort that out is probably not worth the effort. It also doesn't seem to affect the performance issue here since setting b[1] <- NA_real_ + 0 produces the same slowdown (at least on my current Intel machine). Best, luke Best, B. [1]: https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/ [2]: https://en.wikipedia.org/wiki/NaN#Encoding On Thursday, September 30, 2021, 06:52:59 AM EDT, GILLIBERT, Andre wrote: Dear R developers, By default, R uses the "long double" data type to get extra precision for intermediate computations, with a small performance tradeoff. Unfortunately, on all Intel x86 computers I have ever seen, long doubles (implemented in the x87 FPU) are extremely slow whenever a special representation (NA, NaN or infinities) is used; probably because it triggers poorly optimized microcode in the CPU firmware. A function such as sum() becomes more than hundred times slower! Test code: a=runif(1e7);system.time(for(i in 1:100)sum(a)) b=a;b[1]=NA;system.time(sum(b)) The slowdown factors are as follows on a few intel CPU: 1) Pentium Gold G5400 (Coffee Lake, 8th generation) with R 64 bits : 140 times slower with NA 2) Pentium G4400 (Skylake, 6th generation) with R 64 bits : 150 times slower with NA 3) Pentium G3220 (Haswell, 4th generation) with R 64 bits : 130 times slower with NA 4) Celeron J1900 (Atom Silvermont) with R 64 bits : 45 times slower with NA I do not have access to more recent Intel CPUs, but I doubt that it has improved much. Recent AMD CPUs have no significant slowdown. There is no significant slowdown on Intel CPUs (more recent than Sandy Bridge) for 64 bits floating point calculations based on SSE2. Therefore, operators using doubles, such as '+' are unaffected. I do not know whether recent ARM CPUs have slowdowns
Re: [Rd] [External] Re: Is it a good choice to increase the NCONNECTION value?
We do need to be careful about using too many file descriptors. The standard soft limit on Linux is fairly low (1024; the hard limit is usually quite a bit higher). Hitting that limit, e.g. with runaway with code allocating lots of connections, can cause other things, like loading packages, to fail with hard to diagnose error messages. A static connection limit is a crude way to guard against that. Doing anything substantially better is probably a lot of work. A simple option that may be worth pursuing is to allow the limit to be adjusted at runtime. Users who want to go higher would do so at their own risk and may need to know how to adjust the soft limit on the process. Best, luke On Wed, 25 Aug 2021, Simon Urbanek wrote: Martin, I don't think static connection limit is sensible. Recall that connections can be anything, not just necessarily sockets or file descriptions so they are not linked to the system fd limit. For example, if you use a codec then you will need twice the number of connections than the fds. To be honest the connection limit is one of the main reasons why in our big data applications we have always avoided R connections and used C-level sockets instead (others were lack of control over the socket flags, but that has been addressed in the last release). So I'd vote for at the very least increasing the limit significantly (at least 1k if not more) and, ideally, make it dynamic if memory footprint is an issue. Cheers, Simon On Aug 25, 2021, at 8:53 AM, Martin Maechler wrote: GILLIBERT, Andre on Tue, 24 Aug 2021 09:49:52 + writes: RConnection is a pointer to a Rconn structure. The Rconn structure must be allocated independently (e.g. by malloc() in R_new_custom_connection). Therefore, increasing NCONNECTION to 1024 should only use 8 kilobytes on 64-bits platforms and 4 kilobytes on 32 bits platforms. You are right indeed, and I was wrong. Ideally, it should be dynamically allocated : either as a linked list or as a dynamic array (malloc/realloc). However, a simple change of NCONNECTION to 1024 should be enough for most uses. There is one important other problem I've been made aware (similarly to the number of open DLL libraries, an issue 1-2 years ago) : The OS itself has limits on the number of open files (yes, I know that there are other connections than files) and these limits may quite differ from platform to platform. On my Linux laptop, in a shell, I see $ ulimit -n 1024 which is barely conformant with your proposed 1024 NCONNECTION. Now if NCONNCECTION is larger than the max allowed number of open files and if R opens more files than the OS allowed, the user may get quite unpleasant behavior, e.g. R being terminated brutally (or behaving crazily) without good R-level warning / error messages. It's also not at all sufficient to check for the open files limit at compile time, but rather at R process startup time So this may need considerably more work than you / we have hoped, and it's probably hard to find a safe number that is considerably larger than 128 and less than the smallest of all non-crazy platforms' {number of open files limit}. Sincerely Andr� GILLIBERT [] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Update on rtools4 and ucrt support
On Mon, 23 Aug 2021, Duncan Murdoch wrote: On 23/08/2021 8:15 a.m., jan Vitek via R-devel wrote: Hi Jeroen, I mostly lurk on this list, but I was struck by your combative tone. To pick on two random bits: … a 6gb tarball with manually built things on his personal machine… … a black-box system that is so opaque and complex that only one person knows how it works, and would make it much more difficult for students, universities, and other organisations to build R packages and libraries on Windows… Tomas’ tool chain isn't a blackbox, it has copious documentation (see [1]) and builds on any machine thanks to the provided docker container. This is not to criticise your work which has its unique strengths, but to state the obvious: these strengths are best discussed without passion based on factually accurate descriptions. I agree with Jan. I'm not sure a discussion in this forum would be fruitful, but I really wish Jeroen and Tomas would get together, aiming to merge their toolchains, keeping the best aspects of both. I haven't been involved in the development of either one, but have been a "victim" of the two chain rivalry, because the rgl package is not easy to build. I get instructions from each of them on how to do the build, and those instructions for one toolchain generally break the build on the other one. While it is probably possible to detect the toolchain and have the build adapt to whichever one is in use, it would be a lot easier for me (and I imagine every other maintainer of a package using external libs) if I just had to follow one set of instructions. Duncan Murdoch Here are just a few comments from my perspective (I am an R-core member, but am not part of the CRAN team and do only very limited work on Windows). Other R-core members may have different perspectives and insights. One bit of background: dealing with encoding issues on Windows has been taking an unsustainable amount of R-core resources for some time now. Tomas Kalibera has been taking the lead on trying to address these issues in the existing framework, but this means he has not had the time to make any of the many other valuable and important contributions he could make. The only viable way forward is to move to a Windows tool chain that supports UTF-8 as the C library current encoding via the Windows UCRT framework. Tomas Kalibera has, on behalf of all of R core and in coordination with CRAN, been looking for a way forward for some time and has reported on the progress in several blog posts at https://developer.r-project.org/Blog/public/. This has lead to the development of the MXE-based UCRT tool chain, which is now well tested and ready for deployment. Checks using the UCRT tool chain have been part of the CRAN check process for a while. I believe CRAN plans to switch R-devel checks and builds to the UCRT tool chain during the upcoming CRAN downtime. I expect there will be some communication from CRAN on this soon, including on any issues in supporting binaries for both R-devel and R-patched. In putting together something as large as a tool chain there will always be many choices, each with advantages and disadvantages. Some things may be advantages in some settings and not others. Taking just one case in point: Cross compilation. This is likely to be a better approach for CRAN in the future and is supported by the MXE framework on which the new tool chain is based. The much more recent changes in rtools4 to support UCRT are at this point not yet as well tested as the new tool chain. Once these changes to rtools4 mature, and if binary compatibility can be assured, then having a second tool chain may be useful in some cases. But if there are incompatibilities then it will be up to rtools4 to keep up with the tool chain used by CRAN. On the other, contributing to improving the MXE-based tool chain may be a better investment of time. Best, luke __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: JIT compiler does not compile closures with custom environments
On Wed, 18 Aug 2021, Duncan Murdoch wrote: On 18/08/2021 9:00 a.m., Taras Zakharko wrote: I have encountered a behavior of R’s JIT compiler that I can’t quite figure out. Consider the following code: f_global <- function(x) { for(i in 1:1) x <- x + 1 x } f_env <- local({ function(x) { for(i in 1:1) x <- x + 1 x } }) compiler::enableJIT(3) bench::mark(f_global(0), f_env(0)) # 1 f_global(0)103µs 107.61µs 8770.11.4KB 04384 0 # 2 f_env(0) 1.1ms 1.42ms 712.0B 66.3 290 27 Inspecting the closures shows that f_global has been byte-compiled while f_env has not been byte-compiled. Furthermore, if I assign a new environment to f_global (e.g. via environment(f_global) <- new.env()), it won’t be byte-compiled either. However, if I have a function returning a closure, that closure does get byte-compiled: f_closure <- (function() { function(x) { for(i in 1:1) x <- x + 1 x } })() bench::mark(f_closure(0)) # 1 f_closure(0)105µs109µs 8625.0B 2.01 4284 1 497ms What is going on here? Both f_closure and f_env have non-global environments. Why is one JIT-compiled, but not the other? Is there a way to ensure that functions defined in environments will be JIT-compiled? About what is going on in f_closure: I think the anonymous factory function() { function(x) { for(i in 1:1) x <- x + 1 x } } got byte compiled before first use, and that compiled its result. That seems to be what this code indicates: f_closure <- (function() { res <- function(x) { for(i in 1:1) x <- x + 1 x }; print(res); res })() #> function(x) { #> for(i in 1:1) x <- x + 1 #> x #> } #> #> That is right. But even if that's true, it doesn't address the bigger question of why f_global and f_env are treated differently. There are various heuristics in the JIT code to avoid spending too much time in the JIT. The current details are in the source code. Mostly this is to deal with usually ill-advised coding practices that programmatically build many small functions. Hopefully these heuristics can be reduced or eliminated over time. For now, putting the code in a package, where the default is to byte compile on source install, or explicitly calling compiler::cmpfun are options. Best, luke Duncan Murdoch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] svd For Large Matrix
[copying the list] svd() does support matrices with long vector data. Your example works fine for me on a machine with enough memory with either the reference BLAS/LAPACK or the BLAS/LAPACK used on Fedora 33 (flexiblas backed, I believe, by a version of openBLAS). Take a look at sessionInfo() to see what you are using and consider switching to another BLAS/LAPACK if necessary. Running under gdb may help tracking down where the issue is and reporting it for the BLAS/LAPACK you are using. Best, luke On Fri, 13 Aug 2021, Dario Strbenac via R-devel wrote: Good day, I have a real scenario involving 45 million biological cells (samples) and 60 proteins (variables) which leads to a segmentation fault for svd. I thought this might be a good example of why it might benefit from a long vector upgrade. test <- matrix(rnorm(4500*60), ncol = 60) testSVD <- svd(test) *** caught segfault *** address 0x7fe93514d618, cause 'memory not mapped' Traceback: 1: La.svd(x, nu, nv) 2: svd(test) -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] difference of m1 <- lm(f, data) and update(m1, formula=f)
putting in a new expression for the formula argument. It so happens that putting in a formula object actually works: The only difference between the AST for a call of `~` and the formula such a call produces when evaluated is the class and environment attributes the call adds, and most code that works with expressions, like eval(), ignores attributes. It would seem somewhat more consistent if update.default put the expression that would produce the formula into the call (i.e. stripped out the two attributes). But I do not know if there is logic in base R code, never mind package code, that takes advantage of the attributes on the formula expression in if they are found. formula() looks in the 'terms' component so would not be affects, but I don't know if something else might be. Best, luke Martin __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] problem with pipes, textConnection and read.dcf
Not an issue with pipes. The pipe just rewrites the expression to a nested call and that is then evaluated. The call this produces is quote(L |> +gsub(pattern = " ", replacement = "") |> +gsub(pattern = " ", replacement = "") |> +textConnection() |> +read.dcf()) read.dcf(textConnection(gsub(gsub(L, pattern = " ", replacement = ""), pattern = " ", replacement = ""))) If you run that expression, or just the argument to read.dcf, then you get the error you report. So the issue is somewhere in textConnection(). This produces a similar message: read.dcf(textConnection(c(L, "aa", "", "", "ddd"))) File a bug report and someone who understands the textConnection() internals better than I do can take a look. Best, luke On Tue, 10 Aug 2021, Gabor Grothendieck wrote: This gives an error bit if the first gsub line is commented out then there is no error even though it is equivalent code. L <- c("Variable:id", "Length:112630 ") L |> gsub(pattern = " ", replacement = "") |> gsub(pattern = " ", replacement = "") |> textConnection() |> read.dcf() ## Error in textConnection(gsub(gsub(L, pattern = " ", replacement = ""), : ## argument 'object' must deparse to a single character string That is this works: L |> # gsub(pattern = " ", replacement = "") |> gsub(pattern = " ", replacement = "") |> textConnection() |> read.dcf() ## Variable Length ## [1,] "id" "112630" R.version.string ## [1] "R version 4.1.0 RC (2021-05-16 r80303)" win.version() ## [1] "Windows 10 x64 (build 19042)" -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: [R-pkg-devel] Tracking down inconsistent errors and notes across operating systems
Thanks; fix committed in r80654. Best, luke On Thu, 22 Jul 2021, Bill Dunlap wrote: A small example of the problem is #define USE_RINTERNALS 1 #include #include #include static s_object* obj = NULL; Prior to 2021-07-20, with svn 80639, this compiled but after, svn 80647, that I get $ gcc -I"/mnt/c/R/R-svn/trunk/src/include" -I. -I/usr/local/include -fpic -g -O2 -flto -c s_object.c 2>&1 In file included from s_object.c:5: /mnt/c/R/R-svn/trunk/src/include/Rdefines.h:168:33: error: unknown type name ‘SEXPREC’ 168 | #define s_object SEXPREC | ^~~ s_object.c:7:8: note: in expansion of macro ‘s_object’ 7 | static s_object* obj = NULL; | ^~~~ On Thu, Jul 22, 2021 at 10:18 AM Bill Dunlap wrote: I think the problem with RPostgreSQL/sec/RS-DBI.c comes from some changes to Defn.h and Rinternals.h in RHOME/include that Luke made recently (2021-07-20, svn 80647). Since then the line #define s_object SEXPREC in Rdefines.h causes problems. Should it now be 'struct SEXPREC'? -Bill On Thu, Jul 22, 2021 at 7:04 AM Iñaki Ucar wrote: Hi, On Thu, 22 Jul 2021 at 15:51, Hannah Owens wrote: > > Hi all, > I am working on an update to a package I have on CRAN called occCite. My > latest release attempt didn’t pass incoming automated checks, because there > is an outstanding error. Additionally, there are some weird notes I would > like to get rid of, if anyone has suggestions. > > The killing error is in r-devel-linux-x86_64-debian-gcc, which is: Packages > required but not available: 'BIEN', 'taxize', ‘RPostgreSQL' > > I don’t understand this, as it is the only system that throws this error, > and the packages mentioned are available via CRAN. Any suggestions? This kind of message usually arises when there is some problem with those packages on CRAN. Indeed, https://cran.r-project.org/web/checks/check_results_BIEN.html https://cran.r-project.org/web/checks/check_results_taxize.html https://cran.r-project.org/web/checks/check_results_RPostgreSQL.html the three of them have ERRORs in that platform. No issue on your end. You reply pointing to that. > Additionally, there are multiple platforms > (r-devel-linux-x86_64-fedora-clang; r-devel-linux-x86_64-fedora-gcc; > r-devel-windows-x86_64-gcc10-UCRT; r-patched-solaris-x86; > r-release-macos-arm64; r-release-macos-x86_64; r-oldrel-macos-x86_64) where > two notes pop up: > > NOTE 1: Namespace in Imports field not imported from: ‘bit64’ All declared > Imports should be used. > > The package does use bit64. Any tips on how to address this note? Are you sure? Your NAMESPACE file does not import(bit64) nor importFrom(bit64,) anything. > NOTE 2: Found 6 marked UTF-8 strings. > > I presume this is thrown because of the small sample dataset I’ve included > in the package, but why is it not thrown for all the platforms? Not all the checks are necessarily done in all the platforms. You can silence this NOTE by converting the offending strings in your datasets to ASCII and resaving them. -- Iñaki Úcar __ r-package-de...@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] changes in some header files
We are working on rearranging some of our header files with the goal of making the installed headers correspond more closely to the C API available to packages. Packages that only use entry points and definitions that are part of the API as specified in Chapter 6 of Writing R Extensions should not be affected. I have committed an initial set of changes to R-devel in r80644. About 10 CRAN packages that use non-API features will fail under R-devel after these changes and their maintainers have been notified. If you are currently using non-API features in a package it would be a good idea to review what you are doing and to try to revise your code to work within the API. If you feel there are features missing in the API then you can suggest additions on this mailing list or bugzilla. Best, luke -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Clearing attributes returns ALTREP, serialize still saves them
Please do not cross post. You have already rased this on bugzilla. I will follow up there later today. luke On Sat, 3 Jul 2021, Zafer Barutcuoglu wrote: Hi all, Setting names/dimnames on vectors/matrices of length>=64 returns an ALTREP wrapper which internally still contains the names/dimnames, and calling base::serialize on the result writes them out. They are unserialized in the same way, with the names/dimnames hidden in the ALTREP wrapper, so the problem is not obvious except in wasted time, bandwidth, or disk space. Example: v1 <- setNames(rnorm(64), paste("element name", 1:64)) v2 <- unname(v1) names(v2) # NULL length(serialize(v1, NULL)) # [1] 2039 length(serialize(v2, NULL)) # [1] 2132 length(serialize(v2[TRUE], NULL)) # [1] 543 con <- rawConnection(raw(), "w") serialize(v2, con) v3 <- unserialize(rawConnectionValue(con)) names(v3) # NULL length(serialize(v3, NULL)) # 2132 # Similarly for matrices: m1 <- matrix(rnorm(64), 8, 8, dimnames=list(paste("row name", 1:8), paste("col name", 1:8))) m2 <- unname(m1) dimnames(m2) # NULL length(serialize(m1, NULL)) # [1] 918 length(serialize(m2, NULL)) # [1] 1035 length(serialize(m2[TRUE, TRUE], NULL)) # 582 Previously discussed here, too: https://r.789695.n4.nabble.com/Invisible-names-problem-td4764688.html This happens with other attributes as well, but less predictably: x1 <- structure(rnorm(100), data=rnorm(100)) x2 <- structure(x1, data=NULL) length(serialize(x1, NULL)) # [1] 8000952 length(serialize(x2, NULL)) # [1] 924 x1b <- rnorm(100) attr(x1b, "data") <- rnorm(100) x2b <- x1b attr(x2b, "data") <- NULL length(serialize(x1b, NULL)) # [1] 8000863 length(serialize(x2b, NULL)) # [1] 8000956 This is pretty severe, trying to track down why serializing a small object kills the network, because of which large attributes it may have once had during its lifetime around the codebase that are still secretly tagging along. Is there a plan to resolve this? Any suggestions for maybe a C++ workaround until then? Or an alternative performant serialization solution? Best, -- Zafer [[alternative HTML version deleted]] ______ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] SET_COMPLEX_ELT and SET_RAW_ELT missing from Rinternals.h
On Thu, 1 Jul 2021, Konrad Siek wrote: Thanks! So what would be the prescribed way of assigning elements to a CPLXSXP if I needed to? The first question is whether you need to do this. Or, more to the point, whether it is safe to do this. In R objects should behave as if they are not mutable. Mutation in C code may be OK if the objects are not reachable from any R variables, but that almost always means they are private to your code so yo can use what you know about internal structure. If it is legitimate to mutate you can use SET_COMPLEX_ELT. I've added the declaration to Rinternals in R-devel and R-patched. For SET_COMPLEX_ELT(x, in v) is equivalent to COMPLEX(sexp)[index] = value, but that could change in the future it Set methods are supported. This does materialize a potentially compact object, but again the most important question is whether mutation is legitimate at all. One way I see is to do what most of the code inside the interpreter does and grab the vector's data pointer: COMPLEX(sexp)[index] = value; COMPLEX0(sexp)[index] = value; COMPLEX0 is not in the API; it will probably be removed from the installed header files as we clean these up. This will materialize an ALTREP CPLXSXP though, so maybe the best way would be to mirror what SET_COMPLEX_ELT does in Rinlinedfuns.h? if (ALTREP(sexp)) ALTCOMPLEX_SET_ELT(sexp, index, value); else COMPLEX0(sexp)[index] = vector; ALTCOMPLEX_SET_ELT is an internal implementation feature and not in the API. Again, it will probably be removed from the installed headers. Best, luke This seems better, but it's not used in the interpreter anywhere as far as I can tell, presumably because of the setter interface not being complete, as you point out. But should I be avoiding this second approach for some reaosn? k On Tue, Jun 29, 2021 at 4:06 AM wrote: The setter interface for atomic types is not yer implemented. It may be some day. Best, luke On Fri, 25 Jun 2021, Konrad Siek wrote: > Hello, > > I am working on a package that works with various types of R vectors, > implemented in C. My code has a lot of SET_*_ELT operations in it for > various types of vectors, including for CPLXSXPs and RAWSXPs. > > I noticed SET_COMPLEX_ELT and SET_RAW_ELT are defined in Rinlinedfuns.h but > not declared in Rinternals.h, so they cannot be used in packages. I was > going to re-implement them or extern them in my package, however, > interestingly, ALTCOMPLEX_SET_ELT and ALTRAW_SET_ELT are both declared in > Rinternals.h, making me think SET_COMPLEX_ELT and SET_RAW_ELT could be > purposefully obscured. Otherwise it may just be an oversight and I should > bring it to someone's attention anyway. > > I have three questions that I hope R-devel could help me with. > > 1. Is this an oversight, or are SET_COMPLEX_ELT and SET_RAW_ELT not exposed > on purpose? 2. If they are not exposed on purpose, I was wondering why. > 3. More importantly, what would be good ways to set elements of these > vectors while playing nice with ALTREP and avoiding whatever pitfalls > caused these functions to be obscured in the first place? > > Best regards, > Konrad, > > [[alternative HTML version deleted]] > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior
Call the R sum() function, either before going to C code or by calling back into R. You may only want to do this if the vector is long enough for e possible savings to be worth while. On Tue, 29 Jun 2021, Sebastian Martin Krantz wrote: Thanks both. Is there a suggested way I can get this speedup in a package? Or just leave it for now? Thanks also for the clarification Bill. The issue I have with that is that in my C code ALTREP(x) evaluates to true even after adding and removing dimensions (otherwise it would be handled by the normal sum method and I’d be fine). When you use a longer vector Also .Internal(inspect(x)) still shows the compact representation. A different representation (wrapper around a compact sequence). Best, luke -Sebastian On Tue 29. Jun 2021 at 19:43, Bill Dunlap wrote: Adding the dimensions attribute takes away the altrep-ness. Removing dimensions does not make it altrep. E.g., a <- 1:10 am <- a ; dim(am) <- c(2L,5L) amn <- am ; dim(amn) <- NULL .Call("is_altrep", a) [1] TRUE .Call("is_altrep", am) [1] FALSE .Call("is_altrep", amn) [1] FALSE where is_altrep() is defined by the following C code: #include #include SEXP is_altrep(SEXP x) { return Rf_ScalarLogical(ALTREP(x)); } -Bill On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz < sebastian.kra...@graduateinstitute.ch> wrote: Hello together, I'm working on some custom (grouped, weighted) sum, min and max functions and I want them to support the special case of plain integer sequences using ALTREP. I thereby encountered some behavior I cannot explain to myself. The head of my fsum C function looks like this (g is optional grouping vector, w is optional weights vector): SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) { int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng), narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w); if(ALTREP(x) && ng == 0 && nwl) { switch(tx) { case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm); case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm); case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm); default: error("ALTREP object must be integer or real typed"); } } // ... } when I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a warning message 'converting NULL pointer to R NULL'. For functions fmin and fmax (similarly defined using ALTINTEGER_MIN/MAX), I get this error right away e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R NULL'. So what is going on here? What do these functions return? And how do I make this a robust implementation? Best regards, Sebastian Krantz [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel [[alternative HTML version deleted]] ______ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior
It depends on the size. For a larger vector adding dim will create a wrapper ALTREP. Currently the wrapper does not try to use the payload's sum method; this could be added. Best, luke On Tue, 29 Jun 2021, Bill Dunlap wrote: Adding the dimensions attribute takes away the altrep-ness. Removing dimensions does not make it altrep. E.g., a <- 1:10 am <- a ; dim(am) <- c(2L,5L) amn <- am ; dim(amn) <- NULL .Call("is_altrep", a) [1] TRUE .Call("is_altrep", am) [1] FALSE .Call("is_altrep", amn) [1] FALSE where is_altrep() is defined by the following C code: #include #include SEXP is_altrep(SEXP x) { return Rf_ScalarLogical(ALTREP(x)); } -Bill On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz < sebastian.kra...@graduateinstitute.ch> wrote: Hello together, I'm working on some custom (grouped, weighted) sum, min and max functions and I want them to support the special case of plain integer sequences using ALTREP. I thereby encountered some behavior I cannot explain to myself. The head of my fsum C function looks like this (g is optional grouping vector, w is optional weights vector): SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) { int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng), narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w); if(ALTREP(x) && ng == 0 && nwl) { switch(tx) { case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm); case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm); case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm); default: error("ALTREP object must be integer or real typed"); } } // ... } when I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a warning message 'converting NULL pointer to R NULL'. For functions fmin and fmax (similarly defined using ALTINTEGER_MIN/MAX), I get this error right away e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R NULL'. So what is going on here? What do these functions return? And how do I make this a robust implementation? Best regards, Sebastian Krantz [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior
ALTINTEGER_SUM and friends are _not_ intended for use in package code. Once we get some time to clean up headers they will no longer be visible to packages. Best, luke On Tue, 29 Jun 2021, Sebastian Martin Krantz wrote: Hello together, I'm working on some custom (grouped, weighted) sum, min and max functions and I want them to support the special case of plain integer sequences using ALTREP. I thereby encountered some behavior I cannot explain to myself. The head of my fsum C function looks like this (g is optional grouping vector, w is optional weights vector): SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) { int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng), narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w); if(ALTREP(x) && ng == 0 && nwl) { switch(tx) { case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm); case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm); case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm); default: error("ALTREP object must be integer or real typed"); } } // ... } when I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a warning message 'converting NULL pointer to R NULL'. For functions fmin and fmax (similarly defined using ALTINTEGER_MIN/MAX), I get this error right away e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R NULL'. So what is going on here? What do these functions return? And how do I make this a robust implementation? Best regards, Sebastian Krantz [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] SET_COMPLEX_ELT and SET_RAW_ELT missing from Rinternals.h
The setter interface for atomic types is not yer implemented. It may be some day. Best, luke On Fri, 25 Jun 2021, Konrad Siek wrote: Hello, I am working on a package that works with various types of R vectors, implemented in C. My code has a lot of SET_*_ELT operations in it for various types of vectors, including for CPLXSXPs and RAWSXPs. I noticed SET_COMPLEX_ELT and SET_RAW_ELT are defined in Rinlinedfuns.h but not declared in Rinternals.h, so they cannot be used in packages. I was going to re-implement them or extern them in my package, however, interestingly, ALTCOMPLEX_SET_ELT and ALTRAW_SET_ELT are both declared in Rinternals.h, making me think SET_COMPLEX_ELT and SET_RAW_ELT could be purposefully obscured. Otherwise it may just be an oversight and I should bring it to someone's attention anyway. I have three questions that I hope R-devel could help me with. 1. Is this an oversight, or are SET_COMPLEX_ELT and SET_RAW_ELT not exposed on purpose? 2. If they are not exposed on purpose, I was wondering why. 3. More importantly, what would be good ways to set elements of these vectors while playing nice with ALTREP and avoiding whatever pitfalls caused these functions to be obscured in the first place? Best regards, Konrad, [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Possible ALTREP bug
On Thu, 17 Jun 2021, Toby Hocking wrote: Oliver, for clarification that section in writing R extensions mentions VECTOR_ELT and REAL but not REAL_ELT nor any other *_ELT functions. I was looking for an explanation of all the *_ELT functions (which are apparently new), not just VECTOR_ELT. Thanks Simon that response was very helpful. One more question: are there any circumstances in which one should use REAL_ELT(x,i) rather than REAL(x)[i] or vice versa? Or can they be used interchangeably? For a single call it is better to use REAL_ELT(x, i) since it doesn't force allocating a possibly large object in order to get a pointer to its data with REAL(x). If you are iterating over a whole object you may want to get data in chunks. There are iteration macros that help. Some examples are in src/main/summary.c. Best, luke On Wed, Jun 16, 2021 at 4:29 PM Simon Urbanek wrote: The usual quote applies: "use the source, Luke": $ grep _ELT *.h | sort Rdefines.h:#define SET_ELEMENT(x, i, val) SET_VECTOR_ELT(x, i, val) Rinternals.h: The function STRING_ELT is used as an argument to arrayAssign even Rinternals.h:#define VECTOR_ELT(x,i) ((SEXP *) DATAPTR(x))[i] Rinternals.h://SEXP (STRING_ELT)(SEXP x, R_xlen_t i); Rinternals.h:Rbyte (RAW_ELT)(SEXP x, R_xlen_t i); Rinternals.h:Rbyte ALTRAW_ELT(SEXP x, R_xlen_t i); Rinternals.h:Rcomplex (COMPLEX_ELT)(SEXP x, R_xlen_t i); Rinternals.h:Rcomplex ALTCOMPLEX_ELT(SEXP x, R_xlen_t i); Rinternals.h:SEXP (STRING_ELT)(SEXP x, R_xlen_t i); Rinternals.h:SEXP (VECTOR_ELT)(SEXP x, R_xlen_t i); Rinternals.h:SEXP ALTSTRING_ELT(SEXP, R_xlen_t); Rinternals.h:SEXP SET_VECTOR_ELT(SEXP x, R_xlen_t i, SEXP v); Rinternals.h:double (REAL_ELT)(SEXP x, R_xlen_t i); Rinternals.h:double ALTREAL_ELT(SEXP x, R_xlen_t i); Rinternals.h:int (INTEGER_ELT)(SEXP x, R_xlen_t i); Rinternals.h:int (LOGICAL_ELT)(SEXP x, R_xlen_t i); Rinternals.h:int ALTINTEGER_ELT(SEXP x, R_xlen_t i); Rinternals.h:int ALTLOGICAL_ELT(SEXP x, R_xlen_t i); Rinternals.h:void ALTCOMPLEX_SET_ELT(SEXP x, R_xlen_t i, Rcomplex v); Rinternals.h:void ALTINTEGER_SET_ELT(SEXP x, R_xlen_t i, int v); Rinternals.h:void ALTLOGICAL_SET_ELT(SEXP x, R_xlen_t i, int v); Rinternals.h:void ALTRAW_SET_ELT(SEXP x, R_xlen_t i, Rbyte v); Rinternals.h:void ALTREAL_SET_ELT(SEXP x, R_xlen_t i, double v); Rinternals.h:void ALTSTRING_SET_ELT(SEXP, R_xlen_t, SEXP); Rinternals.h:void SET_INTEGER_ELT(SEXP x, R_xlen_t i, int v); Rinternals.h:void SET_LOGICAL_ELT(SEXP x, R_xlen_t i, int v); Rinternals.h:void SET_REAL_ELT(SEXP x, R_xlen_t i, double v); Rinternals.h:void SET_STRING_ELT(SEXP x, R_xlen_t i, SEXP v); So the indexing is with R_xlen_t and they return the value itself as one would expect. Cheers, Simon > On Jun 17, 2021, at 2:22 AM, Toby Hocking wrote: > > By the way, where is the documentation for INTEGER_ELT, REAL_ELT, etc? I > looked in Writing R Extensions and R Internals but I did not see any > mention. > REAL_ELT is briefly mentioned on > https://svn.r-project.org/R/branches/ALTREP/ALTREP.html > Would it be possible to please add some mention of them to Writing R > Extensions? > - how many of these _ELT functions are there? INTEGER, REAL, ... ? > - in what version of R were they introduced? > - I guess input types are always SEXP and int? > - What are the output types for each? > > On Fri, May 28, 2021 at 5:16 PM wrote: > >> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may >> be possible to check that places where they are used allow for them to >> allocate. I have fixed the one that got caught by Gabor's example, and >> a rchk run might be able to pick up others if rchk knows these could >> allocate. (I may also be forgetting other places where the _ELt >> methods are used.) Fixing all call sites for REAL, INTEGER, etc, was >> never realistic so there GC has to be suspended during the method >> call, and that is done in the dispatch mechanism. >> >> The bigger problem is jumps from inside things that existing code >> assumes will not do that. Catching those jumps is possible but >> expensive; doing anything sensible if one is caught is really not >> possible. >> >> Best, >> >> luke >> >> On Fri, 28 May 2021, Gabriel Becker wrote: >> >>> Hi Jim et al, >>> Just t
Re: [Rd] [External] Possible ALTREP bug
Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly new it may be possible to check that places where they are used allow for them to allocate. I have fixed the one that got caught by Gabor's example, and a rchk run might be able to pick up others if rchk knows these could allocate. (I may also be forgetting other places where the _ELt methods are used.) Fixing all call sites for REAL, INTEGER, etc, was never realistic so there GC has to be suspended during the method call, and that is done in the dispatch mechanism. The bigger problem is jumps from inside things that existing code assumes will not do that. Catching those jumps is possible but expensive; doing anything sensible if one is caught is really not possible. Best, luke On Fri, 28 May 2021, Gabriel Becker wrote: Hi Jim et al, Just to hopefully add a bit to what Luke already answered, from what I am recalling looking back at that bioconductor thread Elt methods are used in places where there are hard implicit assumptions that no garbage collection will occur (ie they are called on things that aren't PROTECTed), and beyond that, in places where there are hard assumptions that no error (longjmp) will occur. I could be wrong, but I don't know that suspending garbage collection would protect from the second one. Ie it is possible that an error *ever* being raised from R code that implements an elt method could cause all hell to break loose. Luke or Tomas Kalibera would know more. I was disappointed that implementing ALTREPs in R code was not in the cards (it was in my original proposal back in 2016 to the DSC) but I trust Luke that there are important reasons we can't safely allow that. Best, ~G On Fri, May 28, 2021 at 8:31 AM Jim Hester wrote: From reading the discussion on the Bioconductor issue tracker it seems like the reason the GC is not suspended for the non-string ALTREP Elt methods is primarily due to performance concerns. If this is the case perhaps an additional flag could be added to the `R_set_altrep_*()` functions so ALTREP authors could indicate if GC should be halted when that particular method is called for that particular ALTREP class. This would avoid the performance hit (other than a boolean check) for the standard case when no allocations are expected, but allow authors to indicate that R should pause GC if needed for methods in their class. On Fri, May 28, 2021 at 9:42 AM wrote: > integer and real Elt methods are not expected to allocate. You would > have to suspend GC to be able to do that. This currently can't be done > from package code. > > Best, > > luke > > On Fri, 28 May 2021, Gábor Csárdi wrote: > > > I have found some weird SEXP corruption behavior with ALTREP, which > > could be a bug. (Or I could be doing something wrong.) > > > > I have an integer ALTREP vector that calls back to R from the Elt > > method. When this vector is indexed in a lapply(), its first element > > gets corrupted. Sometimes it's just a type change to logical, but > > sometimes the corruption causes a crash. > > > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package > > that demonstrates this: https://github.com/gaborcsardi/redfish > > > > The R callback in this package calls `loadNamespace("Matrix")`, but > > the same crash happens for other packages as well, and sometimes it > > also happens if I don't load any packages at all. (But that example > > was much more complicated, so I went with the package loading.) > > > > It is somewhat random, and sometimes turning off the JIT avoids the > > crash, but not always. > > > > Hopefully I am just doing something wrong in the ALTREP code (see > > https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it > > is not actually a bug. > > > > Thanks, > > Gabor > > > > __ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > -- > Luke Tierney > Ralph E. Wareham Professor of Mathematical Sciences > University of Iowa Phone: 319-335-3386 > Department of Statistics and Fax: 319-335-3017 > Actuarial Science > 241 Schaeffer Hall
Re: [Rd] [External] Possible ALTREP bug
integer and real Elt methods are not expected to allocate. You would have to suspend GC to be able to do that. This currently can't be done from package code. Best, luke On Fri, 28 May 2021, Gábor Csárdi wrote: I have found some weird SEXP corruption behavior with ALTREP, which could be a bug. (Or I could be doing something wrong.) I have an integer ALTREP vector that calls back to R from the Elt method. When this vector is indexed in a lapply(), its first element gets corrupted. Sometimes it's just a type change to logical, but sometimes the corruption causes a crash. I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package that demonstrates this: https://github.com/gaborcsardi/redfish The R callback in this package calls `loadNamespace("Matrix")`, but the same crash happens for other packages as well, and sometimes it also happens if I don't load any packages at all. (But that example was much more complicated, so I went with the package loading.) It is somewhat random, and sometimes turning off the JIT avoids the crash, but not always. Hopefully I am just doing something wrong in the ALTREP code (see https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it is not actually a bug. Thanks, Gabor __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: 1954 from NA
On Tue, 25 May 2021, Adrian Dușa wrote: Dear Avi, Thank you so much for the extended messages, I read them carefully. While partially offering a solution (I've already been there), it creates additional work for the user, and some of that is unnecessary. What I am trying to achieve is best described in this draft vignette: devtools::install_github("dusadrian/mixed") vignette("mixed") Once a value is declared to be missing, the user should not do anything else about it. Despite being present, the value should automatically be treated as missing by the software. That is the way it's done in all major statistical packages like SAS, Stata and even SPSS. My end goal is to make R attractive for my faculty peers (and beyond), almost all of whom are massively using SPSS and sometimes Stata. But in order to convince them to (finally) make the switch, I need to provide similar functionality, not additional work. Re. your first part of the message, I am definitely not trying to change the R internals. The NA will still be NA, exactly as currently defined. My initial proposal was based on the observation that the 1954 payload was stored as an unsigned int (thus occupying 32 bits) when it is obvious it doesn't need more than 16. That was the only proposed modification, and everything else stays the same. I now learned, thanks to all contributors in this list, that building something around that payload is risky because we do not know exactly what the compilers will do. One possible solution that I can think of, while (still) maintaining the current functionality around the NA, is to use a different high word for the NA that would not trigger compilation issues. But I have absolutely no idea what that implies for the other inner workings of R. I very much trust the R core will eventually find a robust solution, they've solved much more complicated problems than this. I just hope the current thread will push the idea of tagged NAs on the table, for when they will discuss this. Once that will be solved, and despite the current advice discouraging this route, I believe tagging NAs is a valuable idea that should not be discarded. Yes, it should be discarded. You can of course do what you like in code you keep to yourself. But please do not distribute code that does this. via CRAN or any other means. It will only create problems for those maintaining R. After all, the NA is nothing but a tagged NaN. And we are now paying a price for what was, in hindsight, an unfortunate decision. Best, luke All the best, Adrian On Tue, May 25, 2021 at 7:05 AM Avi Gross via R-devel wrote: I was thinking about how one does things in a language that is properly object-oriented versus R that makes various half-assed attempts at being such. Clearly in some such languages you can make an object that is a wrapper that allows you to save an item that is the main payload as well as anything else you want. You might need a way to convince everything else to allow you to make things like lists and vectors and other collections of the objects and perhaps automatically unbox them for many purposes. As an example in a language like Python, you might provide methods so that adding A and B actually gets the value out of A and/or B and adds them properly. But there may be too many edge cases to handle and some software may not pay attention to what you want including some libraries written in other languages. I mention Python for the odd reason that it is now possible to combine Python and R in the same program and sort of switch back and forth between data representations. This may provide some openings for preserving and accessing metadata when needed. Realistically, if R was being designed from scratch TODAY, many things might be done differently. But I recall it being developed at Bell Labs for purposes where it was sort of revolutionary at the time (back when it was S) and designed to do things in a vectorized way and probably primarily for the kinds of scientific and mathematical operations where a single NA (of several types depending on the data) was enough when augmented by a few things like a Nan and Inf and -Inf. I doubt they seriously saw a need for an unlimited number of NA that were all the same AND also all different that they felt had to be built-in. As noted, had they had a reason to make it fully object-oriented too and made the base types such as integer into full-fledged objects with room for additional metadata, then things may be different. I note I have seen languages which have both a data type called integer as lower case and Integer as upper case. One of them is regularly boxed and unboxed automagically when used in a context that needs the other. As far as efficiency goes, this invisibly adds many steps. So do languages that sometimes take a variable that is a pointer and invisibly reference it to provide the underlying field rather than make you do extra t
Re: [Rd] [External] Re: 1954 from NA
On Mon, 24 May 2021, Adrian Dușa wrote: On Mon, May 24, 2021 at 2:11 PM Greg Minshall wrote: [...] if you have 500 columns of possibly-NA'd variables, you could have one column of 500 "bits", where each bit has one of N values, N being the number of explanations the corresponding column has for why the NA exists. PLEASE DO NOT DO THIS! It will not work reliably, as has been explained to you ad nauseam in this thread. If you distribute code that does this it will only lead to bug reports on R that will waste R-core time. As Alex explained, you can use attributes for this. If you need operations to preserve attributes across subsetting you can define subsetting methods that do that. If you are dead set on doing something in C you can try to develop an ALTREP class that provides augmented missing value information. Best, luke The mere thought of implementing something like that gives me shivers. Not to mention such a solution should also be robust when subsetting, splitting, column and row binding, etc. and everything can be lost if the user deletes that particular column without realising its importance. Social science datasets are much more alive and complex than one might first think: there are multi-wave studies with tens of countries, and aggregating such data is already a complex process to add even more complexity on top of that. As undocumented as they may be, or even subject to change, I think the R internals are much more reliable that this. Best wishes, Adrian -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Pipe bind restored in R 4.1.0?
No. We need more time to resolve issues revealed in testing. Best, luke On Sat, 17 Apr 2021, Brenton Wiernik wrote: Is the pipe bind `=>` operator likely to be restored by default in time for the 4.1 release? Brenton [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()
Looks like this is an unavoidable interaction between the way source references and lazy loading are implemented. The link back to the crash_dumps environment comes though source references on an unevaluated argument promise. Creating a fresh environment is .onLoad() avoids this and is probably your best bet. Having an option to serialize without source references might be nice but would probably not be high enough on anyone's priority list to get done anytime soon. Best, luke On Thu, 8 Apr 2021, luke-tier...@uiowa.edu wrote: I see that now also. Not sure yet what is going on. One work-around that may work for you is to create a fresh crash dump in a .onLoad function; somehting like crash_dumps <- NULL .onLoad <- function(...) crash_dumps <<- new.env() Best, luke On Wed, 7 Apr 2021, Andreas Kersting wrote: Hi Dirk, hi Luke, Thanks for checking! I could narrow it down further. I have the issue only if I install --with-keep.source, i.e. R CMD INSTALL --with-keep.source dumpTest Since this is the default in RStudio when clicking "Install and Restart", I was always having the issue - also from base R. If I install using e.g. devtools::install_github() directly it is also fine for me. Could you please confirm? Thanks! Regards, Andreas 2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" : On 7 April 2021 at 16:06, Andreas Kersting wrote: | Hi Luke, | | Please see https://github.com/akersting/dumpTest for the package. | | Here a session showing my issue: | | > library(dumpTest) | > sessionInfo() | R version 4.0.5 (2021-03-31) | Platform: x86_64-pc-linux-gnu (64-bit) | Running under: Debian GNU/Linux 10 (buster) | | Matrix products: default | BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 | LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 | | locale: | [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C | [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 | [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 | [7] LC_PAPER=en_US.UTF-8 LC_NAME=C | [9] LC_ADDRESS=C LC_TELEPHONE=C | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C | | attached base packages: | [1] stats graphics grDevices utils datasets methods base | | other attached packages: | [1] dumpTest_0.1.0 | | loaded via a namespace (and not attached): | [1] compiler_4.0.5 | > for (i in 1:100) { | + print(i) | + print(system.time(f())) | + } | [1] 1 |user system elapsed | 0.028 0.004 0.034 | [1] 2 |user system elapsed | 0.067 0.008 0.075 | [1] 3 |user system elapsed | 0.176 0.000 0.176 | [1] 4 |user system elapsed | 0.335 0.012 0.349 | [1] 5 |user system elapsed | 0.745 0.023 0.770 | [1] 6 |user system elapsed | 1.495 0.060 1.572 | [1] 7 |user system elapsed | 2.902 0.136 3.040 | [1] 8 |user system elapsed | 5.753 0.272 6.034 | [1] 9 |user system elapsed | 11.807 0.708 12.597 | [1] 10 | ^C | Timing stopped at: 6.638 0.549 7.214 | | I had to interrupt in iteration 10 because I was running low on RAM. No issue here. Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build off my Debian package, hence instrumentation as in the Debian package. edd@rob:~$ installGithub.r akersting/dumpTest Using github PAT from envvar GITHUB_PAT Downloading GitHub repo akersting/dumpTest@HEAD ✔ checking for file ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ... ─ preparing ‘dumpTest’: ✔ checking DESCRIPTION meta-information ... ─ checking for LF line-endings in source and make files and shell scripts ─ checking for empty or unneeded directories ─ building ‘dumpTest_0.1.0.tar.gz’ Installing package into ‘/usr/local/lib/R/site-library’ (as ‘lib’ is unspecified) * installing *source* package ‘dumpTest’ ... ** using staged installation ** R ** byte-compile and prepare package for lazy loading ** help No man pages found in package ‘dumpTest’ *** installing help indices ** building package indices ** testing if installed package can be loaded from temporary location ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (dumpTest) edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})' user system elapsed 0.481 0.019 0.500 edd@rob:~$ (I also ran the variant you showed with the dual print statements, it just consumes more screen real estate and ends on [...] [1] 97 user system elapsed 0.004 0.000 0.005 [1] 98 user system elapsed 0.004 0.000 0.005 [1] 99 user system elapsed 0.004 0.000 0.004 [1] 100 user system elapsed 0.005 0.000 0.005 edd@rob:~$ ) Dirk -- https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa
Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()
I see that now also. Not sure yet what is going on. One work-around that may work for you is to create a fresh crash dump in a .onLoad function; somehting like crash_dumps <- NULL .onLoad <- function(...) crash_dumps <<- new.env() Best, luke On Wed, 7 Apr 2021, Andreas Kersting wrote: Hi Dirk, hi Luke, Thanks for checking! I could narrow it down further. I have the issue only if I install --with-keep.source, i.e. R CMD INSTALL --with-keep.source dumpTest Since this is the default in RStudio when clicking "Install and Restart", I was always having the issue - also from base R. If I install using e.g. devtools::install_github() directly it is also fine for me. Could you please confirm? Thanks! Regards, Andreas 2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" : On 7 April 2021 at 16:06, Andreas Kersting wrote: | Hi Luke, | | Please see https://github.com/akersting/dumpTest for the package. | | Here a session showing my issue: | | > library(dumpTest) | > sessionInfo() | R version 4.0.5 (2021-03-31) | Platform: x86_64-pc-linux-gnu (64-bit) | Running under: Debian GNU/Linux 10 (buster) | | Matrix products: default | BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 | LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 | | locale: | [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C | [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 | [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 | [7] LC_PAPER=en_US.UTF-8 LC_NAME=C | [9] LC_ADDRESS=C LC_TELEPHONE=C | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C | | attached base packages: | [1] stats graphics grDevices utils datasets methods base | | other attached packages: | [1] dumpTest_0.1.0 | | loaded via a namespace (and not attached): | [1] compiler_4.0.5 | > for (i in 1:100) { | + print(i) | + print(system.time(f())) | + } | [1] 1 |user system elapsed | 0.028 0.004 0.034 | [1] 2 |user system elapsed | 0.067 0.008 0.075 | [1] 3 |user system elapsed | 0.176 0.000 0.176 | [1] 4 |user system elapsed | 0.335 0.012 0.349 | [1] 5 |user system elapsed | 0.745 0.023 0.770 | [1] 6 |user system elapsed | 1.495 0.060 1.572 | [1] 7 |user system elapsed | 2.902 0.136 3.040 | [1] 8 |user system elapsed | 5.753 0.272 6.034 | [1] 9 |user system elapsed | 11.807 0.708 12.597 | [1] 10 | ^C | Timing stopped at: 6.638 0.549 7.214 | | I had to interrupt in iteration 10 because I was running low on RAM. No issue here. Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build off my Debian package, hence instrumentation as in the Debian package. edd@rob:~$ installGithub.r akersting/dumpTest Using github PAT from envvar GITHUB_PAT Downloading GitHub repo akersting/dumpTest@HEAD ✔ checking for file ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ... ─ preparing ‘dumpTest’: ✔ checking DESCRIPTION meta-information ... ─ checking for LF line-endings in source and make files and shell scripts ─ checking for empty or unneeded directories ─ building ‘dumpTest_0.1.0.tar.gz’ Installing package into ‘/usr/local/lib/R/site-library’ (as ‘lib’ is unspecified) * installing *source* package ‘dumpTest’ ... ** using staged installation ** R ** byte-compile and prepare package for lazy loading ** help No man pages found in package ‘dumpTest’ *** installing help indices ** building package indices ** testing if installed package can be loaded from temporary location ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (dumpTest) edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})' user system elapsed 0.481 0.019 0.500 edd@rob:~$ (I also ran the variant you showed with the dual print statements, it just consumes more screen real estate and ends on [...] [1] 97 user system elapsed 0.004 0.000 0.005 [1] 98 user system elapsed 0.004 0.000 0.005 [1] 99 user system elapsed 0.004 0.000 0.004 [1] 100 user system elapsed 0.005 0.000 0.005 edd@rob:~$ ) Dirk -- https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()
No issues here with that either. Looks like something is different on your end. Best, luke On Wed, 7 Apr 2021, Andreas Kersting wrote: Hi Luke, Please see https://github.com/akersting/dumpTest for the package. Here a session showing my issue: library(dumpTest) sessionInfo() R version 4.0.5 (2021-03-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux 10 (buster) Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] dumpTest_0.1.0 loaded via a namespace (and not attached): [1] compiler_4.0.5 for (i in 1:100) { + print(i) + print(system.time(f())) + } [1] 1 user system elapsed 0.028 0.004 0.034 [1] 2 user system elapsed 0.067 0.008 0.075 [1] 3 user system elapsed 0.176 0.000 0.176 [1] 4 user system elapsed 0.335 0.012 0.349 [1] 5 user system elapsed 0.745 0.023 0.770 [1] 6 user system elapsed 1.495 0.060 1.572 [1] 7 user system elapsed 2.902 0.136 3.040 [1] 8 user system elapsed 5.753 0.272 6.034 [1] 9 user system elapsed 11.807 0.708 12.597 [1] 10 ^C Timing stopped at: 6.638 0.549 7.214 I had to interrupt in iteration 10 because I was running low on RAM. Regards, Andreas 2021-04-07 15:28 GMT+02:00 luke-tier...@uiowa.edu: On Wed, 7 Apr 2021, Andreas Kersting wrote: Hi, please consider the following minimal reproducible example: Create a new R package which just contains the following two (exported) objects: I would not expect this behavior and I don't see it when I make such a package (in R 4.0.3 or R-devel on Ubuntu). You will need to provide a more complete reproducible example if you want help with what you are trying to do; also sessionInfo() would help. Best, luke crash_dumps <- new.env() f <- function() { x <- runif(1e5) dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) assign("last.dump", dump, crash_dumps) } WARNING: the following will probably eat all your RAM! Attach this package and run: for (i in 1:100) { print(i) f() } You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes. I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks! Regards, Andreas Background: In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory. Some more observations: - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced. - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!? - If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env. - If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround: crash_dumps <- list() f <- function() { x <- runif(1e5) dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) crash_dumps[["last.dump"]] <- dump assignInMyNamespace("crash_dumps", crash_dumps) } __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics
Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()
On Wed, 7 Apr 2021, Andreas Kersting wrote: Hi, please consider the following minimal reproducible example: Create a new R package which just contains the following two (exported) objects: I would not expect this behavior and I don't see it when I make such a package (in R 4.0.3 or R-devel on Ubuntu). You will need to provide a more complete reproducible example if you want help with what you are trying to do; also sessionInfo() would help. Best, luke crash_dumps <- new.env() f <- function() { x <- runif(1e5) dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) assign("last.dump", dump, crash_dumps) } WARNING: the following will probably eat all your RAM! Attach this package and run: for (i in 1:100) { print(i) f() } You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes. I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks! Regards, Andreas Background: In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory. Some more observations: - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced. - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!? - If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env. - If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround: crash_dumps <- list() f <- function() { x <- runif(1e5) dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) crash_dumps[["last.dump"]] <- dump assignInMyNamespace("crash_dumps", crash_dumps) } __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] brief update on the pipe operator in R-devel
After some discussions we've settled on a syntax of the form mtcars |> subset(cyl == 4) |> d => lm(mpg ~ disp, data = d) to handle cases where the pipe lhs needs to be passed to an argument other than the first of the function called on the rhs. This seems a to be a reasonable balance between making these non-standard cases easy to see but still easy to write. This is now committed to R-devel. Best, luke On Tue, 22 Dec 2020, luke-tier...@uiowa.edu wrote: It turns out that allowing a bare function expression on the right-hand side (RHS) of a pipe creates opportunities for confusion and mistakes that are too risky. So we will be dropping support for this from the pipe operator. The case of a RHS call that wants to receive the LHS result in an argument other than the first can be handled with just implicit first argument passing along the lines of mtcars |> subset(cyl == 4) |> (\(d) lm(mpg ~ disp, data = d))() It was hoped that allowing a bare function expression would make this more convenient, but it has issues as outlined below. We are exploring some alternatives, and will hopefully settle on one soon after the holidays. The basic problem, pointed out in a comment on Twitter, is that in expressions of the form 1 |> \(x) x + 1 -> y 1 |> \(x) x + 1 |> \(y) x + y everything after the \(x) is parsed as part of the body of the function. So these are parsed along the lines of 1 |> \(x) { x + 1 -> y } 1 |> \(x) { x + 1 |> \(y) x + y } In the first case the result is assigned to a (useless) local variable. Someone writing this is more likely to have intended to assign the result to a global variable, as this would: (1 |> \(x) x + 1) -> y In the second case the 'x' in 'x + y' refers to the local variable 'x' in the first RHS function. Someone writing this is more likely to have meant (1 |> \(x) x + 1) |> \(y) x + y with 'x' in 'x + y' now referring to a global variable: > x <- 2 > 1 |> \(x) x + 1 |> \(y) x + y [1] 3 > (1 |> \(x) x + 1) |> \(y) x + y [1] 4 These issues arise with any approach in R that allows a bare function expression on the RHS of a pipe operation. It also arises in other languages with pipe operators. For example, here is the last example in Julia: julia> x = 2 2 julia> 1 |> x -> x + 1 |> y -> x + y 3 julia> ( 1 |> x -> x + 1 ) |> y -> x + y 4 Even though proper use of parentheses can work around these issues, the likelihood of making mistakes that are hard to track down is too high. So we will disallow the use of bare function expressions on the right hand side of a pipe. Best, luke -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] brief update on the pipe operator in R-devel
It turns out that allowing a bare function expression on the right-hand side (RHS) of a pipe creates opportunities for confusion and mistakes that are too risky. So we will be dropping support for this from the pipe operator. The case of a RHS call that wants to receive the LHS result in an argument other than the first can be handled with just implicit first argument passing along the lines of mtcars |> subset(cyl == 4) |> (\(d) lm(mpg ~ disp, data = d))() It was hoped that allowing a bare function expression would make this more convenient, but it has issues as outlined below. We are exploring some alternatives, and will hopefully settle on one soon after the holidays. The basic problem, pointed out in a comment on Twitter, is that in expressions of the form 1 |> \(x) x + 1 -> y 1 |> \(x) x + 1 |> \(y) x + y everything after the \(x) is parsed as part of the body of the function. So these are parsed along the lines of 1 |> \(x) { x + 1 -> y } 1 |> \(x) { x + 1 |> \(y) x + y } In the first case the result is assigned to a (useless) local variable. Someone writing this is more likely to have intended to assign the result to a global variable, as this would: (1 |> \(x) x + 1) -> y In the second case the 'x' in 'x + y' refers to the local variable 'x' in the first RHS function. Someone writing this is more likely to have meant (1 |> \(x) x + 1) |> \(y) x + y with 'x' in 'x + y' now referring to a global variable: > x <- 2 > 1 |> \(x) x + 1 |> \(y) x + y [1] 3 > (1 |> \(x) x + 1) |> \(y) x + y [1] 4 These issues arise with any approach in R that allows a bare function expression on the RHS of a pipe operation. It also arises in other languages with pipe operators. For example, here is the last example in Julia: julia> x = 2 2 julia> 1 |> x -> x + 1 |> y -> x + y 3 julia> ( 1 |> x -> x + 1 ) |> y -> x + y 4 Even though proper use of parentheses can work around these issues, the likelihood of making mistakes that are hard to track down is too high. So we will disallow the use of bare function expressions on the right hand side of a pipe. Best, luke -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] setting .libPaths() with parallel::clusterCall
On Tue, 22 Dec 2020, Mark van der Loo wrote: Dear all, It is not possible to set library paths on worker nodes with parallel::clusterCall (or snow::clusterCall) and I wonder if this is intended behavior. Example. library(parallel) libdir <- "./tmplib" if (!dir.exists(libdir)) dir.create("./tmplib") cl <- makeCluster(2) clusterCall(cl, .libPaths, c(libdir, .libPaths()) ) The output is as expected with the extra libdir returned for each worker node. However, running clusterEvalQ(cl, .libPaths()) Shows that the library paths have not been set. Use this: clusterCall(cl, ".libPaths", c(libdir, .libPaths()) ) This will find the function .libPaths on the workers. Your clusterCall sends across a serialized copy of your process' .libPaths and calls that. Usually that is equivalent to calling the function found by the name you used on the workers, but not when the function has an enclosing environment that the function modifies by assignment. Alternate implementations of .libPaths that are more serialization-friendly are possible in principle but probably not practical given limitations of the base package. The distinction between providing a function value or a character string as the function argument to clusterCall and others could probably use a paragraph in the help file; happy to consider a patch if anyone wants to take a crack at it. Best, luke If this is indeed a bug, I'm happy to file it at bugzilla. Tested on R 4.0.3 and r-devel. Best, Mark ps: a workaround is documented here: https://www.markvanderloo.eu/yaRb/2020/12/17/how-to-set-library-path-on-a-parallel-r-cluster/ sessionInfo() R Under development (unstable) (2020-12-21 r79668) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.1 LTS Matrix products: default BLAS: /home/mark/projects/Rdev/R-devel/lib/libRblas.so LAPACK: /home/mark/projects/Rdev/R-devel/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=nl_NL.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=nl_NL.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=nl_NL.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base loaded via a namespace (and not attached): [1] compiler_4.1.0 [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] R crashes when using huge data sets with character string variables
If R is receiving a kill signal there is nothing it can do about it. I am guessing you are running into a memory over-commit issue in your OS. https://en.wikipedia.org/wiki/Memory_overcommitment https://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/ If you have to run this close to your physical memory limits you might try using your shell's facility (ulimit for bash, limit for some others) to limit process memory/virtual memory use to your available physical memory. You can also try setting the R_MAX_VSIZE environment variable mentioned in ?Memory; that only affects the R heap, not malloc() done elsewhere. Best, luke On Sat, 12 Dec 2020, Arne Henningsen wrote: When working with a huge data set with character string variables, I experienced that various commands let R crash. When I run R in a Linux/bash console, R terminates with the message "Killed". When I use RStudio, I get the message "R Session Aborted. R encountered a fatal error. The session was terminated. Start New Session". If an object in the R workspace needs too much memory, I would expect that R would not crash but issue an error message "Error: cannot allocate vector of size ...". A minimal reproducible example (at least on my computer) is: nObs <- 1e9 date <- paste( round( runif( nObs, 1981, 2015 ) ), round( runif( nObs, 1, 12 ) ), round( runif( nObs, 1, 31 ) ), sep = "-" ) Is this a bug or a feature of R? Some information about my R version, OS, etc: R> sessionInfo() R version 4.0.3 (2020-10-10) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.1 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 locale: [1] LC_CTYPE=en_DK.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_DK.UTF-8LC_COLLATE=en_DK.UTF-8 [5] LC_MONETARY=en_DK.UTF-8LC_MESSAGES=en_DK.UTF-8 [7] LC_PAPER=en_DK.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_4.0.3 /Arne -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: New pipe operator
On Mon, 7 Dec 2020, Peter Dalgaard wrote: On 7 Dec 2020, at 17:35 , Duncan Murdoch wrote: On 07/12/2020 11:18 a.m., peter dalgaard wrote: Hmm, I feel a bit bad coming late to this, but I think I am beginning to side with those who want "... |> head" to work. And yes, that has to happen at the expense of |> head(). Just curious, how would you express head(df, 10)? Currently it is df |> head(10) Would I have to write it as df |> function(d) head(d, 10) It could be df |> ~ head(_, 10) which in a sense is "yes" to your question. As I think it was Gabor points out, the current structure goes down a nonstandard evaluation route, which may be difficult to explain and departs from usual operator evaluation paradigms by being an odd mix of syntax and semantics. R lets you do these sorts of thing, witness ggplot and tidyverse, but the transparency of the language tends to suffer. I wouldn't call it non-standard evaluation. There is no function corresponding to |>, so there's no evaluation at all. It is more like the way "x -> y" is parsed as "y <- x", or "if (x) y" is transformed to `if`(x, y). That's a point, but maybe also my point. Currently, the parser is inserting the LHS as the 1st argument of the RHS, right? Things might be simpler if it was more like a simple binop. It can only be a simple binop if you only allow RHS functions of one argument. Which would require currying along the lines Duncan showed. Something like: `%>>%` <- function(x, f) f(x) C1 <- function(f, ...) function(x) f(x, ...) mtcars %>>% head mtcars %>>% C1(head, 2) mtcars %>>% C1(subset, cyl == 4) %>>% \(d) lm(mpg ~ disp, data = d) This might fly if we lived in a world where most RHS functions take one argument and only a few needed currying. That is the case in many functional languages, but not for R. Making the common case of multiple arguments easy means you have to work at the source level, either in the parser or with some form of NSE. Best, luke -pd Duncan Murdoch It would be neater if it was simply so that the class/type of the object on the right hand side decided what should happen. So we could have a rule that we could have an object, an expression, and possibly an unevaluated call on the RHS. Or maybe a formula, I.e., we could hav ... |> head but not ... |> head() because head() does not evaluate to anything useful. Instead, we could have some of these ... |> quote(head()) ... |> expression(head()) ... |> ~ head() ... |> \(_) head(_) possibly also using a placeholder mechanism for the three first ones. I kind of like the idea that the ~ could be equivalent to \(_). (And yes, I am kicking myself a bit for not using ~ in the NSE arguments in subset() and transform()) -pd On 7 Dec 2020, at 16:20 , Deepayan Sarkar wrote: On Mon, Dec 7, 2020 at 6:53 PM Gabor Grothendieck wrote: On Mon, Dec 7, 2020 at 5:41 AM Duncan Murdoch wrote: I agree it's all about call expressions, but they aren't all being treated equally: x |> f(...) expands to f(x, ...), while x |> `function`(...) expands to `function`(...)(x). This is an exception to the rule for other calls, but I think it's a justified one. This admitted inconsistency is justified by what? No argument has been presented. The justification seems to be implicitly driven by implementation concerns at the expense of usability and language consistency. Sorry if I have missed something, but is your consistency argument basically that if foo <- function(x) x + 1 then x |> foo x |> function(x) x + 1 should both work the same? Suppose it did. Would you then be OK if x |> foo() no longer worked as it does now, and produced foo()(x) instead of foo(x)? If you are not OK with that and want to retain the current behaviour, what would you want to happen with the following? bar <- function(x) function(n) rnorm(n, mean = x) 10 |> bar(runif(1))() # works 'as expected' ~ bar(runif(1))(10) 10 |> bar(runif(1)) # currently bar(10, runif(1)) both of which you probably want. But then baz <- bar(runif(1)) 10 |> baz (not currently allowed) will not be the same as what you would want from 10 |> bar(runif(1)) which leads to a different kind of inconsistency, doesn't it? -Deepayan __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] anonymous functions
I don't disagree in principle, but the reality is users want shortcuts and as a result various packages, in particular tidyverse, have been providing them. Mostly based on formulas, mostly with significant issues since formulas weren't designed for this, and mostly incompatible (tidyverse ones are compatible within tidyverse but not with others). And of course none work in sapply or lapply. Providing a shorthand in base may help to improve this. You don't have to use it if you don't want to, and you can establish coding standards that disallow it if you like. Best, luke On Mon, 7 Dec 2020, Therneau, Terry M., Ph.D. via R-devel wrote: “The shorthand form \(x) x + 1 is parsed as function(x) x + 1. It may be helpful in making code containing simple function expressions more readable.” Color me unimpressed. Over the decades I've seen several "who can write the shortest code" threads: in Fortran, in C, in Splus, ... The same old idea that "short" is a synonym for either elegant, readable, or efficient is now being recylced in the tidyverse. The truth is that "short" is actually an antonym for all of these things, at least for anyone else reading the code; or for the original coder 30-60 minutes after the "clever" lines were written. Minimal use of the spacebar and/or the return key isn't usually held up as a goal, but creeps into many practiioner's code as well. People are excited by replacing "function(" with "\("? Really? Are people typing code with their thumbs? I am ambivalent about pipes: I think it is a great concept, but too many of my colleagues think that using pipes = no need for any comments. As time goes on, I find my goal is to make my code less compact and more readable. Every bug fix or new feature in the survival package now adds more lines of comments or other documentation than lines of code. If I have to puzzle out what a line does, what about the poor sod who inherits the maintainance? -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: New pipe operator
Or, keeping dplyr but with R-devel pipe and function shorthand: DF <- "myfile.csv" %>% readLines() |> \(.) gsub(r'{(c\(.*?\)|integer\(0\))}', r'{"\1"}', .) |> \(.) read.csv(text = .) |> mutate(across(2:3, \(col) lapply(col, \(x) eval(parse(text = x) Using named arguments to redirect to the implicit first does work, also in magrittr, but for me at least it is the kind of thing I would probably regret a month later when trying to figure out the code. Best, luke On Mon, 7 Dec 2020, Gabor Grothendieck wrote: On Sat, Dec 5, 2020 at 1:19 PM wrote: Let's get some experience Here is my last SO post using dplyr rewritten to use R 4.1 devel. Seems not too bad. Was able to work around the placeholder for gsub by specifying the arg names and used \(...)... elsewhere. This does not address the inconsistency discussed though. I have indented by 2 spaced in case the email wraps around. The objective is to read myfile.csv including columns that contain c(...) and integer(0), parsing and evaluating them. # taken from: # https://stackoverflow.com/questions/65174764/reading-in-a-csv-that-contains-vectors-cx-y-in-r/65175172#65175172 # create input file for testing Lines <- "\"col1\",\"col2\",\"col3\"\n\"a\",1,integer(0)\n\"c\",c(3,4),5\n\"e\",6,7\n" cat(Lines, file = "myfile.csv") # # base R 4.1 (devel) DF <- "myfile.csv" |> readLines() |> gsub(pattern = r'{(c\(.*?\)|integer\(0\))}', replacement = r'{"\1"}') |> \(.) read.csv(text = .) |> \(.) replace(., 2:3, lapply(.[2:3], \(col) lapply(col, \(x) eval(parse(text = x) # # dplyr/magrittr library(dplyr) DF <- "myfile.csv" %>% readLines %>% gsub(r'{(c\(.*?\)|integer\(0\))}', r'{"\1"}', .) %>% { read.csv(text = .) } %>% mutate(across(2:3, ~ lapply(., function(x) eval(parse(text = x) -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: New pipe operator
On Sun, 6 Dec 2020, Gabor Grothendieck wrote: Why is that ambiguous? It works in magrittr. For now, all functions marked internally as syntactically special are disallowed. Not all of these lead to ambiguities. Best, luke library(magrittr) 1 %>% `+`() [1] 1 On Sun, Dec 6, 2020 at 1:09 PM wrote: On Sun, 6 Dec 2020, Gabor Grothendieck wrote: The following gives an error. 1 |> `+`(2) ## Error: function '+' is not supported in RHS call of a pipe 1 |> `+`() ## Error: function '+' is not supported in RHS call of a pipe but this does work: 1 |> (`+`)(2) ## [1] 3 1 |> (`+`)() ## [1] 1 The error message suggests that this was intentional. It isn't mentioned in ?"|>" ?"|>" says: To avoid ambiguities, functions in ‘rhs’ calls may not be syntactically special, such as ‘+’ or ‘if’. (used to say lhs; fixed now). Best, luke On Sat, Dec 5, 2020 at 1:19 PM wrote: We went back and forth on this several times. The key advantage of requiring parentheses is to keep things simple and consistent. Let's get some experience with that. If experience shows requiring parentheses creates too many issues then we can add the option of dropping them later (with special handling of :: and :::). It's easier to add flexibility and complexity than to restrict it after the fact. Best, luke On Sat, 5 Dec 2020, Hugh Parsonage wrote: I'm surprised by the aversion to mtcars |> nrow over mtcars |> nrow() and I think the decision to disallow the former should be reconsidered. The pipe operator is only going to be used when the rhs is a function, so there is no ambiguity with omitting the parentheses. If it's disallowed, it becomes inconsistent with other treatments like sapply(mtcars, typeof) where sapply(mtcars, typeof()) would just be noise. I'm not sure why this decision was taken If the only issue is with the double (and triple) colon operator, then ideally `mtcars |> base::head` should resolve to `base::head(mtcars)` -- in other words, demote the precedence of |> Obviously (looking at the R-Syntax branch) this decision was considered, put into place, then dropped, but I can't see why precisely. Best, Hugh. On Sat, 5 Dec 2020 at 04:07, Deepayan Sarkar wrote: On Fri, Dec 4, 2020 at 7:35 PM Duncan Murdoch wrote: On 04/12/2020 8:13 a.m., Hiroaki Yutani wrote: Error: function '::' not supported in RHS call of a pipe To me, this error looks much more friendly than magrittr's error. Some of them got too used to specify functions without (). This is OK until they use `::`, but when they need to use it, it takes hours to figure out why mtcars %>% base::head #> Error in .::base : unused argument (head) won't work but mtcars %>% head works. I think this is a too harsh lesson for ordinary R users to learn `::` is a function. I've been wanting for magrittr to drop the support for a function name without () to avoid this confusion, so I would very much welcome the new pipe operator's behavior. Thank you all the developers who implemented this! I agree, it's an improvement on the corresponding magrittr error. I think the semantics of not evaluating the RHS, but treating the pipe as purely syntactical is a good decision. I'm not sure I like the recommended way to pipe into a particular argument: mtcars |> subset(cyl == 4) |> \(d) lm(mpg ~ disp, data = d) or mtcars |> subset(cyl == 4) |> function(d) lm(mpg ~ disp, data = d) both of which are equivalent to mtcars |> subset(cyl == 4) |> (function(d) lm(mpg ~ disp, data = d))() It's tempting to suggest it should allow something like mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .) Which is really not that far off from mtcars |> subset(cyl == 4) |> \(.) lm(mpg ~ disp, data = .) once you get used to it. One consequence of the implementation is that it's not clear how multiple occurrences of the placeholder would be interpreted. With magrittr, sort(runif(10)) %>% ecdf(.)(.) ## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 This is probably what you would expect, if you expect it to work at all, and not ecdf(sort(runif(10)))(sort(runif(10))) There would be no such ambiguity with anonymous functions sort(runif(10)) |> \(.) ecdf(.)(.) -Deepayan which would be expanded to something equivalent to the other versions: but that makes it quite a bit more complicated. (Maybe _ or \. should be used instead of ., since those are not legal variable names.) I don't think there should be an attempt to copy magrittr's special casing of how . is used in determining whether to also include the previous value as first argument. Duncan Murdoch Best, Hiroaki Yutani 2020年12月4日(金) 20:51 Duncan Murdoch : Just saw this on the R-devel news: R now provides a simple native pipe sy
Re: [Rd] [External] Re: New pipe operator
On Sun, 6 Dec 2020, Gabor Grothendieck wrote: The following gives an error. 1 |> `+`(2) ## Error: function '+' is not supported in RHS call of a pipe 1 |> `+`() ## Error: function '+' is not supported in RHS call of a pipe but this does work: 1 |> (`+`)(2) ## [1] 3 1 |> (`+`)() ## [1] 1 The error message suggests that this was intentional. It isn't mentioned in ?"|>" ?"|>" says: To avoid ambiguities, functions in ‘rhs’ calls may not be syntactically special, such as ‘+’ or ‘if’. (used to say lhs; fixed now). Best, luke On Sat, Dec 5, 2020 at 1:19 PM wrote: We went back and forth on this several times. The key advantage of requiring parentheses is to keep things simple and consistent. Let's get some experience with that. If experience shows requiring parentheses creates too many issues then we can add the option of dropping them later (with special handling of :: and :::). It's easier to add flexibility and complexity than to restrict it after the fact. Best, luke On Sat, 5 Dec 2020, Hugh Parsonage wrote: I'm surprised by the aversion to mtcars |> nrow over mtcars |> nrow() and I think the decision to disallow the former should be reconsidered. The pipe operator is only going to be used when the rhs is a function, so there is no ambiguity with omitting the parentheses. If it's disallowed, it becomes inconsistent with other treatments like sapply(mtcars, typeof) where sapply(mtcars, typeof()) would just be noise. I'm not sure why this decision was taken If the only issue is with the double (and triple) colon operator, then ideally `mtcars |> base::head` should resolve to `base::head(mtcars)` -- in other words, demote the precedence of |> Obviously (looking at the R-Syntax branch) this decision was considered, put into place, then dropped, but I can't see why precisely. Best, Hugh. On Sat, 5 Dec 2020 at 04:07, Deepayan Sarkar wrote: On Fri, Dec 4, 2020 at 7:35 PM Duncan Murdoch wrote: On 04/12/2020 8:13 a.m., Hiroaki Yutani wrote: Error: function '::' not supported in RHS call of a pipe To me, this error looks much more friendly than magrittr's error. Some of them got too used to specify functions without (). This is OK until they use `::`, but when they need to use it, it takes hours to figure out why mtcars %>% base::head #> Error in .::base : unused argument (head) won't work but mtcars %>% head works. I think this is a too harsh lesson for ordinary R users to learn `::` is a function. I've been wanting for magrittr to drop the support for a function name without () to avoid this confusion, so I would very much welcome the new pipe operator's behavior. Thank you all the developers who implemented this! I agree, it's an improvement on the corresponding magrittr error. I think the semantics of not evaluating the RHS, but treating the pipe as purely syntactical is a good decision. I'm not sure I like the recommended way to pipe into a particular argument: mtcars |> subset(cyl == 4) |> \(d) lm(mpg ~ disp, data = d) or mtcars |> subset(cyl == 4) |> function(d) lm(mpg ~ disp, data = d) both of which are equivalent to mtcars |> subset(cyl == 4) |> (function(d) lm(mpg ~ disp, data = d))() It's tempting to suggest it should allow something like mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .) Which is really not that far off from mtcars |> subset(cyl == 4) |> \(.) lm(mpg ~ disp, data = .) once you get used to it. One consequence of the implementation is that it's not clear how multiple occurrences of the placeholder would be interpreted. With magrittr, sort(runif(10)) %>% ecdf(.)(.) ## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 This is probably what you would expect, if you expect it to work at all, and not ecdf(sort(runif(10)))(sort(runif(10))) There would be no such ambiguity with anonymous functions sort(runif(10)) |> \(.) ecdf(.)(.) -Deepayan which would be expanded to something equivalent to the other versions: but that makes it quite a bit more complicated. (Maybe _ or \. should be used instead of ., since those are not legal variable names.) I don't think there should be an attempt to copy magrittr's special casing of how . is used in determining whether to also include the previous value as first argument. Duncan Murdoch Best, Hiroaki Yutani 2020年12月4日(金) 20:51 Duncan Murdoch : Just saw this on the R-devel news: R now provides a simple native pipe syntax ‘|>’ as well as a shorthand notation for creating functions, e.g. ‘\(x) x + 1’ is parsed as ‘function(x) x + 1’. The pipe implementation as a syntax transformation was motivated by suggestions from Jim Hester and Lionel Henry. These features are experimental and may change prior to release. This is a good
Re: [Rd] [External] Re: New pipe operator
We went back and forth on this several times. The key advantage of requiring parentheses is to keep things simple and consistent. Let's get some experience with that. If experience shows requiring parentheses creates too many issues then we can add the option of dropping them later (with special handling of :: and :::). It's easier to add flexibility and complexity than to restrict it after the fact. Best, luke On Sat, 5 Dec 2020, Hugh Parsonage wrote: I'm surprised by the aversion to mtcars |> nrow over mtcars |> nrow() and I think the decision to disallow the former should be reconsidered. The pipe operator is only going to be used when the rhs is a function, so there is no ambiguity with omitting the parentheses. If it's disallowed, it becomes inconsistent with other treatments like sapply(mtcars, typeof) where sapply(mtcars, typeof()) would just be noise. I'm not sure why this decision was taken If the only issue is with the double (and triple) colon operator, then ideally `mtcars |> base::head` should resolve to `base::head(mtcars)` -- in other words, demote the precedence of |> Obviously (looking at the R-Syntax branch) this decision was considered, put into place, then dropped, but I can't see why precisely. Best, Hugh. On Sat, 5 Dec 2020 at 04:07, Deepayan Sarkar wrote: On Fri, Dec 4, 2020 at 7:35 PM Duncan Murdoch wrote: On 04/12/2020 8:13 a.m., Hiroaki Yutani wrote: Error: function '::' not supported in RHS call of a pipe To me, this error looks much more friendly than magrittr's error. Some of them got too used to specify functions without (). This is OK until they use `::`, but when they need to use it, it takes hours to figure out why mtcars %>% base::head #> Error in .::base : unused argument (head) won't work but mtcars %>% head works. I think this is a too harsh lesson for ordinary R users to learn `::` is a function. I've been wanting for magrittr to drop the support for a function name without () to avoid this confusion, so I would very much welcome the new pipe operator's behavior. Thank you all the developers who implemented this! I agree, it's an improvement on the corresponding magrittr error. I think the semantics of not evaluating the RHS, but treating the pipe as purely syntactical is a good decision. I'm not sure I like the recommended way to pipe into a particular argument: mtcars |> subset(cyl == 4) |> \(d) lm(mpg ~ disp, data = d) or mtcars |> subset(cyl == 4) |> function(d) lm(mpg ~ disp, data = d) both of which are equivalent to mtcars |> subset(cyl == 4) |> (function(d) lm(mpg ~ disp, data = d))() It's tempting to suggest it should allow something like mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .) Which is really not that far off from mtcars |> subset(cyl == 4) |> \(.) lm(mpg ~ disp, data = .) once you get used to it. One consequence of the implementation is that it's not clear how multiple occurrences of the placeholder would be interpreted. With magrittr, sort(runif(10)) %>% ecdf(.)(.) ## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 This is probably what you would expect, if you expect it to work at all, and not ecdf(sort(runif(10)))(sort(runif(10))) There would be no such ambiguity with anonymous functions sort(runif(10)) |> \(.) ecdf(.)(.) -Deepayan which would be expanded to something equivalent to the other versions: but that makes it quite a bit more complicated. (Maybe _ or \. should be used instead of ., since those are not legal variable names.) I don't think there should be an attempt to copy magrittr's special casing of how . is used in determining whether to also include the previous value as first argument. Duncan Murdoch Best, Hiroaki Yutani 2020年12月4日(金) 20:51 Duncan Murdoch : Just saw this on the R-devel news: R now provides a simple native pipe syntax ‘|>’ as well as a shorthand notation for creating functions, e.g. ‘\(x) x + 1’ is parsed as ‘function(x) x + 1’. The pipe implementation as a syntax transformation was motivated by suggestions from Jim Hester and Lionel Henry. These features are experimental and may change prior to release. This is a good addition; by using "|>" instead of "%>%" there should be a chance to get operator precedence right. That said, the ?Syntax help topic hasn't been updated, so I'm not sure where it fits in. There are some choices that take a little getting used to: > mtcars |> head Error: The pipe operator requires a function call or an anonymous function expression as RHS (I need to say mtcars |> head() instead.) This sometimes leads to error messages that are somewhat confusing: > mtcars |> magrittr::debug_pipe |> head Error: function '::' not supported in RHS call of a pipe but mtcars |> magrittr::debu
Re: [Rd] [External] Re: New pipe operator
On Sat, 5 Dec 2020, Duncan Murdoch wrote: On 04/12/2020 2:26 p.m., luke-tier...@uiowa.edu wrote: On Fri, 4 Dec 2020, Dénes Tóth wrote: On 12/4/20 3:05 PM, Duncan Murdoch wrote: ... It's tempting to suggest it should allow something like mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .) which would be expanded to something equivalent to the other versions: but that makes it quite a bit more complicated. (Maybe _ or \. should be used instead of ., since those are not legal variable names.) I support the idea of using an underscore (_) as the placeholder symbol. I strongly oppose adding a placeholder. Allowing for an optional placeholder significantly complicates both implementing and explaining the semantics. For a simple syntax transformation to be viable it would also require some restrictions, such as only allowing a placeholder as a top level argument and only once. Checking that these restrictions are met, and accurately signaling when they are not with reasonable error messages, is essentially an unsolvable problem given R's semantics. I don't think you read my suggestion, but that's okay: you're maintaining it, not me. I thought I did but maybe I missed something. You are right that supporting a placeholder makes things a lot more complicated. For being able to easily recognize the non-standard cases _ is better than . but for me at least not by much. We did try a number of variations; the code is in the R-syntax branch. At the root of that branch are two .md files with some notes as of around useR20. Once things settle down I may update those and look into turning them into a blog post. Best, luke Duncan Murdoch The case where the LHS is to be passed as something other than the first argument is unusual. For me, having that case stand out by using a function expression makes it much easier to see and so makes the code easier to understand. As a wearer of progressive bifocals and someone whose screen is not always free of small dust particles, having to spot the non-standard pipe stages by seeing a placeholder, especially a . placeholder, is be a bug, not a feature. Best, luke Syntactic sugars work the the best if 1) they require less keystrokes and/or 2) are easier to read compared to the "normal" syntax, and 3) can not lead to unexpected bugs (which is a major problem with the magrittr pipe). Using '_' fulfills all of these criteria since '_' can not clash with any variable in the environment. Denes __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: New pipe operator
On Fri, 4 Dec 2020, Dénes Tóth wrote: On 12/4/20 3:05 PM, Duncan Murdoch wrote: ... It's tempting to suggest it should allow something like mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .) which would be expanded to something equivalent to the other versions: but that makes it quite a bit more complicated. (Maybe _ or \. should be used instead of ., since those are not legal variable names.) I support the idea of using an underscore (_) as the placeholder symbol. I strongly oppose adding a placeholder. Allowing for an optional placeholder significantly complicates both implementing and explaining the semantics. For a simple syntax transformation to be viable it would also require some restrictions, such as only allowing a placeholder as a top level argument and only once. Checking that these restrictions are met, and accurately signaling when they are not with reasonable error messages, is essentially an unsolvable problem given R's semantics. The case where the LHS is to be passed as something other than the first argument is unusual. For me, having that case stand out by using a function expression makes it much easier to see and so makes the code easier to understand. As a wearer of progressive bifocals and someone whose screen is not always free of small dust particles, having to spot the non-standard pipe stages by seeing a placeholder, especially a . placeholder, is be a bug, not a feature. Best, luke Syntactic sugars work the the best if 1) they require less keystrokes and/or 2) are easier to read compared to the "normal" syntax, and 3) can not lead to unexpected bugs (which is a major problem with the magrittr pipe). Using '_' fulfills all of these criteria since '_' can not clash with any variable in the environment. Denes __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory
The fact that your max resident size isn't affected looks odd. Are you setting the environment variable outside R? When I run env R_MAX_VSIZE=16Gb /usr/bin/time bin/Rscript jg.R 1e9 2e0 0 0 (your code in jg.R). I get a quick failure with 11785524maxresident)k Best, luke On Tue, 1 Dec 2020, Jan Gorecki wrote: Thank you Luke, I tried your suggestion about R_MAX_VSIZE but I am not able to get the error you are getting. I tried recent R devel as I have seen you made a change to GC there. My machine is 128GB, free -h reports 125GB available. I tried to set 128, 125 and 100. In all cases the result is "Command terminated by signal 9". Each took around 6-6.5h. Details below, if it tells you anything how could I optimize it (or raise an exception early) please do let me know. R 4.0.3 unset R_MAX_VSIZE User time (seconds): 40447.92 System time (seconds): 4034.37 Percent of CPU this job got: 201% Elapsed (wall clock) time (h:mm:ss or m:ss): 6:07:59 Maximum resident set size (kbytes): 127261184 Major (requiring I/O) page faults: 72441 Minor (reclaiming a frame) page faults: 3315491751 Voluntary context switches: 381446 Involuntary context switches: 529554 File system inputs: 108339200 File system outputs: 120 R-devel 2020-11-27 r79522 unset R_MAX_VSIZE User time (seconds): 40713.52 System time (seconds): 4039.52 Percent of CPU this job got: 198% Elapsed (wall clock) time (h:mm:ss or m:ss): 6:15:52 Maximum resident set size (kbytes): 127254796 Major (requiring I/O) page faults: 72810 Minor (reclaiming a frame) page faults: 3433589848 Voluntary context switches: 384363 Involuntary context switches: 609024 File system inputs: 108467064 File system outputs: 112 R_MAX_VSIZE=128Gb User time (seconds): 40411.13 System time (seconds): 4227.99 Percent of CPU this job got: 198% Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14:01 Maximum resident set size (kbytes): 127249316 Major (requiring I/O) page faults: 88500 Minor (reclaiming a frame) page faults: 3544520527 Voluntary context switches: 384117 Involuntary context switches: 545397 File system inputs: 111675896 File system outputs: 120 R_MAX_VSIZE=125Gb User time (seconds): 40246.83 System time (seconds): 4042.76 Percent of CPU this job got: 201% Elapsed (wall clock) time (h:mm:ss or m:ss): 6:06:56 Maximum resident set size (kbytes): 127254200 Major (requiring I/O) page faults: 63867 Minor (reclaiming a frame) page faults: 3449493803 Voluntary context switches: 370753 Involuntary context switches: 614607 File system inputs: 106322880 File system outputs: 112 R_MAX_VSIZE=100Gb User time (seconds): 41837.10 System time (seconds): 3979.57 Percent of CPU this job got: 192% Elapsed (wall clock) time (h:mm:ss or m:ss): 6:36:34 Maximum resident set size (kbytes): 127256940 Major (requiring I/O) page faults: 66829 Minor (reclaiming a frame) page faults: 3357778594 Voluntary context switches: 391149 Involuntary context switches: 646410 File system inputs: 106605648 File system outputs: 120 On Fri, Nov 27, 2020 at 10:18 PM wrote: On Thu, 26 Nov 2020, Jan Gorecki wrote: Thank you Luke for looking into it. Your knowledge of gc is definitely helpful here. I put comments inline below. Best, Jan On Wed, Nov 25, 2020 at 10:38 PM wrote: On Tue, 24 Nov 2020, Jan Gorecki wrote: As for other calls to system. I avoid calling system. In the past I had some (to get memory stats from OS), but they were failing with exactly the same issue. So yes, if I would add call to system before calling quit, I believe it would fail with the same error. At the same time I think (although I am not sure) that new allocations made in R are working fine. So R seems to reserve some memory and can continue to operate, while external call like system will fail. Maybe it is like this by design, don't know. Thanks for the report on quit(). We're exploring how to make the cleanup on exit more robust to low memory situations like these. Aside from this problem that is easy to report due to the warning message, I think that gc() is choking at the same time. I tried to make reproducible example for that, multiple times but couldn't, let me try one more time. It happens to manifest when there is 4e8+ unique characters/factors in an R session. I am able to reproduce it using data.table and dplyr (0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy because of the size. I described briefly problem in: https://github.com/h2oai/db-benchmark/issues/110 Because of the design of R's character vectors, with each element allocated separately, R is never going t
Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory
On Thu, 26 Nov 2020, Jan Gorecki wrote: Thank you Luke for looking into it. Your knowledge of gc is definitely helpful here. I put comments inline below. Best, Jan On Wed, Nov 25, 2020 at 10:38 PM wrote: On Tue, 24 Nov 2020, Jan Gorecki wrote: As for other calls to system. I avoid calling system. In the past I had some (to get memory stats from OS), but they were failing with exactly the same issue. So yes, if I would add call to system before calling quit, I believe it would fail with the same error. At the same time I think (although I am not sure) that new allocations made in R are working fine. So R seems to reserve some memory and can continue to operate, while external call like system will fail. Maybe it is like this by design, don't know. Thanks for the report on quit(). We're exploring how to make the cleanup on exit more robust to low memory situations like these. Aside from this problem that is easy to report due to the warning message, I think that gc() is choking at the same time. I tried to make reproducible example for that, multiple times but couldn't, let me try one more time. It happens to manifest when there is 4e8+ unique characters/factors in an R session. I am able to reproduce it using data.table and dplyr (0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy because of the size. I described briefly problem in: https://github.com/h2oai/db-benchmark/issues/110 Because of the design of R's character vectors, with each element allocated separately, R is never going to be great at handling huge numbers of distinct strings. But it can do an adequate job given enough memory to work with. When I run your GitHub issue example on a machine with around 500 Gb of RAM it seems to run OK; /usr/bin/time reports 2706.89user 161.89system 37:10.65elapsed 128%CPU (0avgtext+0avgdata 92180796maxresident)k 0inputs+103450552outputs (0major+38716351minor)pagefaults 0swaps So the memory footprint is quite large. Using gc.time() it looks like about 1/3 of the time is in GC. Not ideal, and maybe could be improved on a bit, but probably not by much. The GC is basically doing an adequate job, given enough RAM. Agree, 1/3 is a lot but still acceptable. So this strictly is not something that requires intervention. PS. I wasn't aware of gc.time(), it may be worth linking it from SeeAlso in gc() manual. If you run this example on a system without enough RAM, or with other programs competing for RAM, you are likely to end up fighting with your OS/hardware's virtual memory system. When I try to run it on a 16Gb system it churns for an hour or so before getting killed, and /usr/bin/time reports a huge number of page faults: 312523816inputs+0outputs (24761285major+25762068minor)pagefaults 0swaps You are probably experiencing something similar. Yes, this is exactly what I am experiencing. The machine is a bare metal machine of 128GB mem, csv size 50GB, data.frame size 74GB. In my case it churns for ~3h before it gets killed with SIGINT from the parent R process which uses 3h as a timeout for this script. This is something I would like to be addressed because gc time is far bigger than actual computation time. This is not really acceptable, I would prefer to raise an exception instead. There may be opportunities for more tuning of the GC to better handle running this close to memory limits, but I doubt the payoff would be worth the effort. If you don't have plans/time to work on that anytime soon, then I can fill bugzilla for this problem so it won't get lost in the mailing list. I'm not convinced anything useful can be done that would work well for your application without working badly for others. If you want to drive this close to your memory limits you are probably going to have to take responsibility for some tuning at your end. One option in ?Memory you might try is the R_MAX_VSIZE environment variable. On my 16Gb machine with R_MAX_VSIZE=16Gb your example fails very quickly with Error: vector memory exhausted (limit reached?) rather than churning for an hour trying to make things work. Setting memory and/or virtual memory limits in your shell is another option. Best, luke Best, luke It would help if gcinfo() could take FALSE/TRUE/2L where 2L will print even more information about gc, like how much time the each gc() process took, how many objects it has to check on each level. Best regards, Jan On Tue, Nov 24, 2020 at 1:05 PM Tomas Kalibera wrote: On 11/24/20 11:27 AM, Jan Gorecki wrote: Thanks Bill for checking that. It was my impression that warnings are raised from some internal system calls made when quitting R. At that point I don't have much control over checking the return status of those. Your suggestion looks good to me. Tomas, do you think this could help? could this be implemented? I think this is a good suggestion. Deleting files on Unix was changed from system("rm&q
Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory
On Tue, 24 Nov 2020, Jan Gorecki wrote: As for other calls to system. I avoid calling system. In the past I had some (to get memory stats from OS), but they were failing with exactly the same issue. So yes, if I would add call to system before calling quit, I believe it would fail with the same error. At the same time I think (although I am not sure) that new allocations made in R are working fine. So R seems to reserve some memory and can continue to operate, while external call like system will fail. Maybe it is like this by design, don't know. Thanks for the report on quit(). We're exploring how to make the cleanup on exit more robust to low memory situations like these. Aside from this problem that is easy to report due to the warning message, I think that gc() is choking at the same time. I tried to make reproducible example for that, multiple times but couldn't, let me try one more time. It happens to manifest when there is 4e8+ unique characters/factors in an R session. I am able to reproduce it using data.table and dplyr (0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy because of the size. I described briefly problem in: https://github.com/h2oai/db-benchmark/issues/110 Because of the design of R's character vectors, with each element allocated separately, R is never going to be great at handling huge numbers of distinct strings. But it can do an adequate job given enough memory to work with. When I run your GitHub issue example on a machine with around 500 Gb of RAM it seems to run OK; /usr/bin/time reports 2706.89user 161.89system 37:10.65elapsed 128%CPU (0avgtext+0avgdata 92180796maxresident)k 0inputs+103450552outputs (0major+38716351minor)pagefaults 0swaps So the memory footprint is quite large. Using gc.time() it looks like about 1/3 of the time is in GC. Not ideal, and maybe could be improved on a bit, but probably not by much. The GC is basically doing an adequate job, given enough RAM. If you run this example on a system without enough RAM, or with other programs competing for RAM, you are likely to end up fighting with your OS/hardware's virtual memory system. When I try to run it on a 16Gb system it churns for an hour or so before getting killed, and /usr/bin/time reports a huge number of page faults: 312523816inputs+0outputs (24761285major+25762068minor)pagefaults 0swaps You are probably experiencing something similar. There may be opportunities for more tuning of the GC to better handle running this close to memory limits, but I doubt the payoff would be worth the effort. Best, luke It would help if gcinfo() could take FALSE/TRUE/2L where 2L will print even more information about gc, like how much time the each gc() process took, how many objects it has to check on each level. Best regards, Jan On Tue, Nov 24, 2020 at 1:05 PM Tomas Kalibera wrote: On 11/24/20 11:27 AM, Jan Gorecki wrote: Thanks Bill for checking that. It was my impression that warnings are raised from some internal system calls made when quitting R. At that point I don't have much control over checking the return status of those. Your suggestion looks good to me. Tomas, do you think this could help? could this be implemented? I think this is a good suggestion. Deleting files on Unix was changed from system("rm") to doing that in C, and deleting the session directory should follow. It might also help diagnosing your problem, but I don't think it would solve it. If the diagnostics in R works fine and the OS was so hopelessly out of memory that it couldn't run any more external processes, then really this is not a problem of R, but of having exhausted the resources. And it would be a coincidence that just this particular call to "system" at the end of the session did not work. Anything else could break as well close to the end of the script. This seems the most likely explanation to me. Do you get this warning repeatedly, reproducibly at least in slightly different scripts at the very end, with this warning always from quit()? So that the "call" part of the warning message has .Internal(quit) like in the case you posted? Would adding another call to "system" before the call to "q()" work - with checking the return value? If it is always only the last call to "system" in "q()", then it is suspicious, perhaps an indication that some diagnostics in R is not correct. In that case, a reproducible example would be the key - so either if you could diagnose on your end what is the problem, or create a reproducible example that someone else can use to reproduce and debug. Best Tomas On Mon, Nov 23, 2020 at 7:10 PM Bill Dunlap wrote: The call to system() probably is an internal call used to delete the session's tempdir(). This sort of failure means that a potentially large amount of disk space is not being recovered when R is done. Perhaps R_CleanTempD
Re: [Rd] [External] exists, get and get0 accept silently inputs of length > 1
Thanks for the suggestion. In R-devel (as of r79474) exists(), get(), and get0() now signal an error if the first argument has length > 1. This will cause about 30 CRAN packages and possibly a couple of Bioconductor packages to fail under R-devel. getS3method() now also signals an error if the class argument has length > 1. Calls of the form getS2method(generic, class(x)) will now fail if class(x) has length > 1. I believe most CRAN package issues related to this change have already been resolved, but a few may remain. Best, luke On Fri, 13 Nov 2020, Antoine Fabri wrote: Dear R-devel, The doc of exists, get and get0 is unambiguous, x should be an object given as a character string. However these accept longer inputs. It can lead an uncareful user to think these functions are vectorized when they're not, and generally lets through bugs that one might have preferred to trigger earlier failure. ``` r exists("d") #> [1] FALSE exists(c("c", "d")) #> [1] TRUE get(c("c", "d")) #> function (...) .Primitive("c") get0(c("c", "d")) #> function (...) .Primitive("c") ``` I believe these should either fail, or be vectorized, probably the former. Thanks, Antoine [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Two ALTREP questions
On Sat, 21 Nov 2020, Jiefei Wang wrote: Hello, I have two related ALTREP questions. It seems like there is no way to assign attributes to an ALTREP vector without using C++ code. To be more specifically, I want to make an ALTREP matrix, I have tried the following R code but none of them work. ``` .Internal(inspect(1:6)) .Internal(inspect(matrix(1:6, 2,3))) .Internal(inspect(as.matrix(1:6))) .Internal(inspect(structure(1:6, dim = c(2L,3L .Internal(inspect({x <- 1:6;attr(x, "dim") <- c(2L,3L);x})) .Internal(inspect({x <- 1:6;attributes(x)<- list(dim = c(2L,3L));x})) ``` Some things that my help you: - Try with 1:6 replaced by as.character(1:6), and look at the REF values in both cases. - In particular, look at what this gives you: x <- as.character(1:6) attr(x, "dim") <- c(2, 3) - Things can be a little different with larger vectors; try variants of your examples for more than 64 elements. This also brings my second question, it seems like the ALTREP coercion function does not handle attributes correctly. After the coercion, the ALTREP object will lose its attributes. ``` coerceFunc <- inline::cxxfunction( signature(x = "SEXP", attr = "SEXP" ) , ' SET_ATTRIB(x,attr); return(Rf_coerceVector(x, REALSXP)); ') coerceFunc(1:6, pairlist(dim = c(2L, 3L))) [1] 1 2 3 4 5 6 coerceFunc(1:6 + 0L, pairlist(dim = c(2L, 3L))) [,1] [,2] [,3] [1,]135 [2,]246 ``` The problem is that the coercion function is directly dispatched to the user-defined ALTREP coercion function, so the user is responsible to attach the attributes after the coercion. If he forgets to do so, then the result is a plain vector. Similar to the `Duplicate` and `DuplicateEX` functions where the former one will attach the attributes by default, I feel that the `Coerce` function should only return a plain vector and there should be a `CoerceEx` function to do the attribute assignment, so the logic in the no-EX ALTREP functions can be consistent. I do not know how dramastic the change would be, so maybe this is too hard to do. Since you raised this earlier I have been looking at it and also think that this needs to he handled along the lines of Duplicate/DuplicateEx. I need to find some time to think that through and implement it; hopefully I'll get to it before the end of the year. BTW, is there any way to contribute to the R source? I know R has a limited resouces, so if possible, I will be happy to fix the matrix issue myself and make some minor contributions to the R community. You can find the suggested process for contributing described in the 'Reporting Bugs' link on the R home page https://www.r-project.org/ Best, luke Best, Jiefei [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] exists, get and get0 accept silently inputs of length > 1
Come on, folks. There is no NSE involved in calls to get(): it's standard evaluation all the way into the C code. Prior to the change a first argument that is anything other than a character vector would produce an error. After the change, passing in a symbol will do the obvious thing. Code that worked previously without error (i.e. called get() with string values) will continue to work exactly as it did before. It's a little more convenient and a little more efficient for some computations on the language not to have to call as.character on symbols before passing them to get(). Hence the change expanding the domain of get(). luke On Tue, 17 Nov 2020, Gabriel Becker wrote: Hi all, I have used variable values in get() as well, and including, I think, in package code (though pretty infrequently). Perhaps a character.only argument similar to library? ~G On Mon, Nov 16, 2020 at 5:31 PM Hugh Parsonage wrote: I noticed the recent commit to R-dev (r79434). Is this wise? I've often used get() in constructions like for (j in ls()) if (is.numeric(x <- get(j))) ... (and often interactively, rather than in a package) Am I to understand that get(j) will now be equivalent to `j` even if j is a string referring putatively to another object? On Sat, 14 Nov 2020 at 01:34, wrote: > > Worth looking into. It would probably cause some check failures, so > would probably be a good idea to run a check across BIOC/CRAN. At the > same time it would be worth allowing name objects (type "symbol") so > thee don't have to be converted to character for the call and then > back to names internally for the environment lookup. > > Best, > > luke > > On Fri, 13 Nov 2020, Antoine Fabri wrote: > > > Dear R-devel, > > > > The doc of exists, get and get0 is unambiguous, x should be an object given > > as a character string. However these accept longer inputs. It can lead an > > uncareful user to think these functions are vectorized when they're not, > > and generally lets through bugs that one might have preferred to trigger > > earlier failure. > > > > ``` r > > exists("d") > > #> [1] FALSE > > exists(c("c", "d")) > > #> [1] TRUE > > get(c("c", "d")) > > #> function (...) .Primitive("c") > > get0(c("c", "d")) > > #> function (...) .Primitive("c") > > ``` > > > > I believe these should either fail, or be vectorized, probably the former. > > > > Thanks, > > > > Antoine > > > > [[alternative HTML version deleted]] > > > > __ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > -- > Luke Tierney > Ralph E. Wareham Professor of Mathematical Sciences > University of Iowa Phone: 319-335-3386 > Department of Statistics and Fax: 319-335-3017 > Actuarial Science > 241 Schaeffer Hall email: luke-tier...@uiowa.edu > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel ______ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] exists, get and get0 accept silently inputs of length > 1
Worth looking into. It would probably cause some check failures, so would probably be a good idea to run a check across BIOC/CRAN. At the same time it would be worth allowing name objects (type "symbol") so thee don't have to be converted to character for the call and then back to names internally for the environment lookup. Best, luke On Fri, 13 Nov 2020, Antoine Fabri wrote: Dear R-devel, The doc of exists, get and get0 is unambiguous, x should be an object given as a character string. However these accept longer inputs. It can lead an uncareful user to think these functions are vectorized when they're not, and generally lets through bugs that one might have preferred to trigger earlier failure. ``` r exists("d") #> [1] FALSE exists(c("c", "d")) #> [1] TRUE get(c("c", "d")) #> function (...) .Primitive("c") get0(c("c", "d")) #> function (...) .Primitive("c") ``` I believe these should either fail, or be vectorized, probably the former. Thanks, Antoine [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Change to I() in R 4.1
On Fri, 30 Oct 2020, Pages, Herve wrote: On 10/29/20 23:08, Pages, Herve wrote: ... I can think of 2 ways to move forward: 1. Keep I()'s current implementation but suppress the warning. We'll make the necessary adjustments to DataFrame() to repair columns supplied as I() objects. Note that we would still be in the situation where I() objects break validObject() but we've been in that situation for years and so far we've managed to work around it. However this doesn't mean that validObject() shouldn't be fixed. Note that print(I()) would also need to be fixed (it says "" which is misleading). Anyways, these 2 issues are separated from the main issue and can be dealt with later. 1b. A variant of the above could be to use the old implementation for S4 objects only: I <- function(x) { if (isS4(x)) { structure(x, class = unique.default(c("AsIs", oldClass(x } else { `class<-`(x, unique.default(c("AsIs", oldClass(x } } That is probably a good compromise for now. Not really. The underlying problem is that class<- and attributes<- (which is what structure() uses) handle the 'class' attribute differently, and that needs to be fixed. I don't have a strong opinion on what either should do, but they should do the same thing. It's probably worth re-thinking the I() mechanism. ?Modifying the value, whether by changing the class or an attribute, is going to be brittle. A little less so for an attribute, but using an attribute rules out dispatch on the AsIs property. Best, luke I would also suggest that the "package" attribute of the S4 class be kept around so the code that we use to restore the original object has a way to restore it exactly, including its full class specification. Right now, and also with the previous implementation, we cannot do that because attr(class(x), "package") is lost. So something like this: I <- function(x) { if (isS4(x)) { x_class <- class(x) new_classes <- c("AsIs", x_class) attr(new_classes, "package") <- attr(x_class, "package") structure(x, class=new_classes) } else { `class<-`(x, unique.default(c("AsIs", oldClass(x } } Thanks, H. -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Something is wrong with the unserialize function
I found that also; fixed in r79386 in the trunk. Will port to R-patched shortly. Best, luke On Thu, 29 Oct 2020, Martin Morgan wrote: This Index: src/main/altrep.c === --- src/main/altrep.c (revision 79385) +++ src/main/altrep.c (working copy) @@ -275,10 +275,11 @@ SEXP psym = ALTREP_SERIALIZED_CLASS_PKGSYM(info); SEXP class = LookupClass(csym, psym); if (class == NULL) { - SEXP pname = ScalarString(PRINTNAME(psym)); + SEXP pname = PROTECT(ScalarString(PRINTNAME(psym))); R_tryCatchError(find_namespace, pname, handle_namespace_error, NULL); class = LookupClass(csym, psym); + UNPROTECT(1); } return class; } seems to remove the warning; I'm guessing that the other SEXP already exist so don't need protecting? Martin Morgan On 10/29/20, 12:47 PM, "R-devel on behalf of luke-tier...@uiowa.edu" wrote: Thanks for the report. Will look into it when I get a chance unless someone else gets there first. A simpler reprex: ## create and serialize a memmory-mapped file object filePath <- "x.dat" con <- file(filePath, "wrb") writeBin(rep(0.0,10),con) close(con) library(simplemmap) x <- mmap(filePath, "double") saveRDS(x, file = "x.Rds") ## in a separate R process: gctorture() readRDS("x.Rds") Looks like a missing PROTECT somewhere. Best, luke On Thu, 29 Oct 2020, Jiefei Wang wrote: > Hi all, > > I am not able to export an ALTREP object when `gctorture` is on in the > worker. The package simplemmap can be used to reproduce the problem. See > the example below > ``` > ## Create a temporary file > filePath <- tempfile() > con <- file(filePath, "wrb") > writeBin(rep(0.0,10),con) > close(con) > > library(simplemmap) > library(parallel) > cl <- makeCluster(1) > x <- mmap(filePath, "double") > ## Turn gctorture on > clusterEvalQ(cl, gctorture()) > clusterExport(cl, "x") > ## x is an 0-length vector on the worker > clusterEvalQ(cl, x) > stopCluster(cl) > ``` > > you can find more info on the problem if you manually build a connection > between two R processes and export the ALTREP object. See output below > ``` >> con <- socketConnection(port = 1234,server = FALSE) >> gctorture() >> x <- unserialize(con) > Warning message: > In unserialize(con) : > cannot unserialize ALTVEC object of class 'mmap_real' from package > 'simplemmap'; returning length zero vector > ``` > It seems like simplemmap did not get loaded correctly on the worker. If > you run `library( simplemmap)` before unserializing the ALTREP, there will > be no problem. But I suppose we should be able to unserialize objects > without preloading the library? > > This issue can be reproduced on Ubuntu with R version 4.0.2 (2020-06-22) > and Windows with R Under development (unstable) (2020-09-03 r79126). > > Here is the link to simplemmap: > https://github.com/ALTREP-examples/Rpkg-simplemmap > > Best, > Jiefei > > [[alternative HTML version deleted]] > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Something is wrong with the unserialize function
Thanks for the report. Will look into it when I get a chance unless someone else gets there first. A simpler reprex: ## create and serialize a memmory-mapped file object filePath <- "x.dat" con <- file(filePath, "wrb") writeBin(rep(0.0,10),con) close(con) library(simplemmap) x <- mmap(filePath, "double") saveRDS(x, file = "x.Rds") ## in a separate R process: gctorture() readRDS("x.Rds") Looks like a missing PROTECT somewhere. Best, luke On Thu, 29 Oct 2020, Jiefei Wang wrote: Hi all, I am not able to export an ALTREP object when `gctorture` is on in the worker. The package simplemmap can be used to reproduce the problem. See the example below ``` ## Create a temporary file filePath <- tempfile() con <- file(filePath, "wrb") writeBin(rep(0.0,10),con) close(con) library(simplemmap) library(parallel) cl <- makeCluster(1) x <- mmap(filePath, "double") ## Turn gctorture on clusterEvalQ(cl, gctorture()) clusterExport(cl, "x") ## x is an 0-length vector on the worker clusterEvalQ(cl, x) stopCluster(cl) ``` you can find more info on the problem if you manually build a connection between two R processes and export the ALTREP object. See output below ``` con <- socketConnection(port = 1234,server = FALSE) gctorture() x <- unserialize(con) Warning message: In unserialize(con) : cannot unserialize ALTVEC object of class 'mmap_real' from package 'simplemmap'; returning length zero vector ``` It seems like simplemmap did not get loaded correctly on the worker. If you run `library( simplemmap)` before unserializing the ALTREP, there will be no problem. But I suppose we should be able to unserialize objects without preloading the library? This issue can be reproduced on Ubuntu with R version 4.0.2 (2020-06-22) and Windows with R Under development (unstable) (2020-09-03 r79126). Here is the link to simplemmap: https://github.com/ALTREP-examples/Rpkg-simplemmap Best, Jiefei [[alternative HTML version deleted]] ______ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Coercion function does not work for the ALTREP object
For larger atomic vectors (currently >= 64 elements) the complex assignment process tries to avoid duplicating when only attributes are updated, This is done with an ALTREP wrapper. The differences in whether the Duplicate method are called for smaller and larger vectors are therefore as intended, Ideally there should be no difference for Coerce. There is a difference because wrappers currently don't delegate the Coerce method when the wrapped object is an ALTREP. I'll look into whether that can be addressed without breaking things. Best, luke On Thu, 8 Oct 2020, Jiefei Wang wrote: Hi Gabriel, here is a simple package for reproducing the problem. https://github.com/Jiefei-Wang/testPkg Best, Jiefei On Thu, Oct 8, 2020 at 5:04 AM Gabriel Becker wrote: Jiefei, Where does the code for your altrep class live? Thanks, ~G On Wed, Oct 7, 2020 at 4:25 AM Jiefei Wang wrote: Hi all, The coercion function defined for the ALTREP object will not be called by R when an assignment operation implicitly introduces coercion for a large ALTREP object. For example, If I create a vector of length 10, the ALTREP coercion function seems to work fine. ``` x <- 1:10 y <- wrap_altrep(x) .Internal(inspect(y)) @0x1f9271c0 13 INTSXP g0c0 [REF(2)] I am altrep y[1] <- 1.0 Duplicating object Coercing object .Internal(inspect(y)) @0x1f927c08 14 REALSXP g0c0 [REF(1)] I am altrep ``` However, if I create a vector of length 1024, R will give me a normal real-type vector ``` x <- 1:1024 y <- wrap_altrep(x) .Internal(inspect(y)) @0x1f8ddb20 13 INTSXP g0c0 [REF(2)] I am altrep y[1] <- 1.0 .Internal(inspect(y)) @0x1f0d72a0 14 REALSXP g0c7 [REF(1)] (len=1024, tl=0) 1,2,3,4,5,... ``` Note that the duplicate function is also called for the first example. It seems like R completely ignores my ALTREP functions in the second example. I feel this might be designed on purpose, but I do not understand the reason behind it. Is there any reason why we are not consistent here? Here is my session info sessionInfo() R Under development (unstable) (2020-09-03 r79126) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 18362) Best, Jiefei [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Thread-safe R functions
You should assume that NO functions or macros in the R API are thread-safe. If some happen to be now, on some platforms, they are not guaranteed to be in the future. Even if you use a global lock you need to keep in mind that any function in the R API can signal an error and execute a longjmp, so you need to make sure you have set a top level context in your thread. Best, luke On Sun, 13 Sep 2020, Jiefei Wang wrote: Hi, I am curious about whether there exist thread-safe functions in `Rinternals.h`. I know that R is single-threaded designed, but for the simple and straightforward functions like `DATAPTR` and `INTEGER_GET_REGION`, are these functions safe to call in a multi-thread environment? Best, Jiefei [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows
On Tue, 8 Sep 2020, Martin Maechler wrote: luke-tierney on Tue, 8 Sep 2020 09:42:43 -0500 (CDT) writes: > On Tue, 8 Sep 2020, Martin Maechler wrote: >>>>>>> Martin Maechler >>>>>>> on Tue, 8 Sep 2020 10:40:24 +0200 writes: >> >>>>>>> Hugh Parsonage >>>>>>> on Tue, 8 Sep 2020 18:08:11 +1000 writes: >> >> >> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2): >> >> >> $> R --vanilla >> >> x <- c(0L, -2e9:2e9) >> >> >> # > Segmentation fault >> >> >> Tried to reproduce on Linux but the above worked as expected. Not an >> >> issue merely with the length of the vector; for example, x <- >> >> rep_len(1:10, 1e10) works, though the altrep vector must be long to >> >> reproduce: >> >> >> x <- c(0L, -1e9:1e9) #ok >> >> >> Segmentation faults occur with the following too: >> >> >> x <- (-2e9:2e9) + 1L >> >> > Your operation would "need" (not in theory, but in practice) >> > to go from altrep to regular vectors. >> > I guess the segfault occurs because of something like this : >> >> > R asks Windows to hand it a huge amount of memory and Windows replies >> > "ok, here is the memory pointer" >> > and then R tries to write to there, but illegally (because >> > Windows should have told R that it does not really have enough >> > memory for that ..). >> >> > I cannot reproduce the segmentation fault .. but I can confirm >> > there is a bug there that shows for me on Windows but not on >> > Linux: >> >> > "My" Windows is on a terminalserver not with too many GB of memory >> > (but then in a version of Windows that recognizes that it cannot >> > get so much memory): >> >> > - Here some transcript (thanks to >> > using Emacs w/ ESS also on Windows) -- >> >> > R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered Consequences" >> > Copyright (C) 2020 The R Foundation for Statistical Computing >> > Platform: x86_64-w64-mingw32/x64 (64-bit) >> >> > R ist freie Software und kommt OHNE JEGLICHE GARANTIE. >> > Sie sind eingeladen, es unter bestimmten Bedingungen weiter zu verbreiten. >> > Tippen Sie 'license()' or 'licence()' für Details dazu. >> >> > R ist ein Gemeinschaftsprojekt mit vielen Beitragenden. >> > Tippen Sie 'contributors()' für mehr Information und 'citation()', >> > um zu erfahren, wie R oder R packages in Publikationen zitiert werden können. >> >> > Tippen Sie 'demo()' für einige Demos, 'help()' für on-line Hilfe, oder >> > 'help.start()' für eine HTML Browserschnittstelle zur Hilfe. >> > Tippen Sie 'q()', um R zu verlassen. >> >> >> x <- (-2e9:2e9) + 1L >> > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren >> >> y <- c(0L, -2e9:2e9) >> > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren >> >> Sys.setenv(LANGUAGE="en") >> >> y <- c(0L, -2e9:2e9) >> > Error: cannot allocate vector of size 14.9 Gb >> >> y <- -1e9:4e9 >> >> .Internal(inspect(y)) >> > @0x195a6808 14 REALSXP g0c0 [REF(65535)] -10 : -294967296 (compact) >> >> .Machine$integer.max / 1e9 >> > [1] 2.147484 >> >> y <- -1e6:2.2e9 >> >> .Internal(inspect(y)) >> > @0x0a11a5d8 14 REALSXP g0c0 [REF(65535)] -100 : -2094967296 (compact) >> >> y <- -1e6:2e9 >> >> .Internal(inspect(y)) >> > @0x0a13adf0 13 INTSXP g0c0 [REF(65535)] -100 : 20 (compact) >> >> >> > - end of transcript --- >> >> > So indeed, no seg.fault, R notices that it can't get 15 GB of >> > memory. >> >> > But the bug is bad news: We have *silent* integer overflow happening >> > according to what .Internal(inspect(y)) shows... >> >> > less bad new: Probably the bug is only in the 'internal inspect' code
Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows
On Windows, things are fine as long as they remain (compacted aka 'ALTREP') INTSXP: > y <- -1e3:2e9 ;.Internal(inspect(y)) @0x0a285648 13 INTSXP g0c0 [REF(65535)] -1000 : 20 (compact) > y <- -1e3:2.1e9 ;.Internal(inspect(y)) @0x19925930 13 INTSXP g0c0 [REF(65535)] -1000 : 21 (compact) and here, y is correct, just the printing from .Internal(inspect(y)) is bugous (probably prints the double as an integer): It's a '%ld' that probably needs to be '%lld' for Windows. Will fix sometime soon. Best, luke > y <- -1e3:2.2e9 ; .Internal(inspect(y)) @0x195c0178 14 REALSXP g0c0 [REF(65535)] -1000 : -2094967296 (compact) > length(y) [1] 221001 > tail(y) [1] 2.2e+09 2.2e+09 2.2e+09 2.2e+09 2.2e+09 2.2e+09 > tail(y) - 2.2e9 [1] -5 -4 -3 -2 -1 0 > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows
On Tue, 8 Sep 2020, Hugh Parsonage wrote: Thanks Martin. On further testing, it seems that the segmentation fault can only occur when the amount of obtainable memory is sufficiently high. On my machine (admittedly with other processes running): $ R --vanilla --max-mem-size=30G -e "x <- c(0L, -2e9:2e9)" Segmentation fault $ R --vanilla --max-mem-size=29G -e "x <- c(0L, -2e9:2e9)" Error: cannot allocate vector of size 14.9 Gb Execution halted Unfortunately I don't have access to a Windows machine with enough memory to get to the point of failure. If you have rtools and gdb installed can you run in gdb and see where the segfault is happening? Best, luke On Tue, 8 Sep 2020 at 18:52, Martin Maechler wrote: Martin Maechler on Tue, 8 Sep 2020 10:40:24 +0200 writes: Hugh Parsonage on Tue, 8 Sep 2020 18:08:11 +1000 writes: >> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2): >> $> R --vanilla >> x <- c(0L, -2e9:2e9) >> # > Segmentation fault >> Tried to reproduce on Linux but the above worked as expected. Not an >> issue merely with the length of the vector; for example, x <- >> rep_len(1:10, 1e10) works, though the altrep vector must be long to >> reproduce: >> x <- c(0L, -1e9:1e9) #ok >> Segmentation faults occur with the following too: >> x <- (-2e9:2e9) + 1L > Your operation would "need" (not in theory, but in practice) > to go from altrep to regular vectors. > I guess the segfault occurs because of something like this : > R asks Windows to hand it a huge amount of memory and Windows replies > "ok, here is the memory pointer" > and then R tries to write to there, but illegally (because > Windows should have told R that it does not really have enough > memory for that ..). > I cannot reproduce the segmentation fault .. but I can confirm > there is a bug there that shows for me on Windows but not on > Linux: > "My" Windows is on a terminalserver not with too many GB of memory > (but then in a version of Windows that recognizes that it cannot > get so much memory): > - Here some transcript (thanks to > using Emacs w/ ESS also on Windows) -- > R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered Consequences" > Copyright (C) 2020 The R Foundation for Statistical Computing > Platform: x86_64-w64-mingw32/x64 (64-bit) > R ist freie Software und kommt OHNE JEGLICHE GARANTIE. > Sie sind eingeladen, es unter bestimmten Bedingungen weiter zu verbreiten. > Tippen Sie 'license()' or 'licence()' für Details dazu. > R ist ein Gemeinschaftsprojekt mit vielen Beitragenden. > Tippen Sie 'contributors()' für mehr Information und 'citation()', > um zu erfahren, wie R oder R packages in Publikationen zitiert werden können. > Tippen Sie 'demo()' für einige Demos, 'help()' für on-line Hilfe, oder > 'help.start()' für eine HTML Browserschnittstelle zur Hilfe. > Tippen Sie 'q()', um R zu verlassen. >> x <- (-2e9:2e9) + 1L > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren >> y <- c(0L, -2e9:2e9) > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren >> Sys.setenv(LANGUAGE="en") >> y <- c(0L, -2e9:2e9) > Error: cannot allocate vector of size 14.9 Gb >> y <- -1e9:4e9 >> .Internal(inspect(y)) > @0x195a6808 14 REALSXP g0c0 [REF(65535)] -10 : -294967296 (compact) >> .Machine$integer.max / 1e9 > [1] 2.147484 >> y <- -1e6:2.2e9 >> .Internal(inspect(y)) > @0x0a11a5d8 14 REALSXP g0c0 [REF(65535)] -100 : -2094967296 (compact) >> y <- -1e6:2e9 >> .Internal(inspect(y)) > @0x0a13adf0 13 INTSXP g0c0 [REF(65535)] -100 : 20 (compact) >> > - end of transcript --- > So indeed, no seg.fault, R notices that it can't get 15 GB of > memory. > But the bug is bad news: We have *silent* integer overflow happening > according to what .Internal(inspect(y)) shows... > less bad new: Probably the bug is only in the 'internal inspect' code > where a format specifier is used in C's printf() that does not work > correctly on Windows, at least the way it is currently compiled .. > On (64-bit) Linux, I get >> y <- -1e9:4e9 ; .Internal(inspect(y)) > @7d86388 14 REALSXP g0c0 [REF(65535)] -10 : 40 (compact)
Re: [Rd] [External] Re: some questions about R internal SEXP types
On Tue, 8 Sep 2020, Hadley Wickham wrote: On Tue, Sep 8, 2020 at 4:12 AM Tomas Kalibera wrote: The general principle is that R packages are only allowed to use what is documented in the R help (? command) and in Writing R Extensions. The former covers what is allowed from R code in extensions, the latter mostly what is allowed from C code in extensions (with some references to Fortran). Could you clarify what you mean by "documented"? For example, Rf_allocVector() is mentioned several times in R-exts, but I don't see anywhere where the inputs and output are precisely described (which is what I would consider to be documented). Is Rf_allocVector() part of the API? For now, documented means mentioned as something extension writers can use. Details are in the header files, Rinternals.h for Rf_allocVector(). Ideally someone would find the time to refactor the header files, Rinternals.h in particular, so everything in installed headers is considered in the API and everything else is considered private and subject to change. Unfortunately that would take a lot of effort, both technical and political, and I don't see it happening soon. But I'm happy to be proved wrong. Best, luke Hadley -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] HELPWANTED keyword in bugs.r-project.org
Just a quick note to mention that we have added a HELPWANTED keyword on bugs.r-project.org for tagging bugs and issues where a good well-tested patch would be particularly appreciated. You can find the HELPWANTED issues by selecting the keyword in the search interface or at https://bugs.r-project.org/bugzilla/buglist.cgi?keywords=HELPWANTED This URL shows both open and resolved HELPWANTED issues. At the moment only a handful of issues have been tagged, but there will be more over time. One of these may be a good place to start if you are looking for ways to contribute. The techincal level varies; some might be resolved with a small amount of R code; others might need more extensive changes at the C level. Best, luke -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Invisible names problem
On Wed, 22 Jul 2020, Simon Urbanek wrote: Very interesting: .Internal(inspect(k[i])) @10a4bc000 14 REALSXP g0c7 [ATT] (len=2, tl=0) 1,2,3,4,1,... ATTRIB: @7fa24f07fa58 02 LISTSXP g0c0 [REF(1)] TAG: @7fa24b803e90 01 SYMSXP g0c0 [MARK,REF(5814),LCK,gp=0x6000] "names" (has value) @10a4e4000 16 STRSXP g0c7 [REF(1)] (len=2, tl=0) @7fa24ba575c8 09 CHARSXP g0c1 [MARK,REF(35005),gp=0x61] [ASCII] [cached] "a" @7fa24be24428 09 CHARSXP g0c1 [MARK,REF(35010),gp=0x61] [ASCII] [cached] "b" @7fa24b806ec0 09 CHARSXP g0c1 [MARK,REF(35082),gp=0x61] [ASCII] [cached] "c" @7fa24bcc6af0 09 CHARSXP g0c1 [MARK,REF(35003),gp=0x61] [ASCII] [cached] "d" @7fa24ba575c8 09 CHARSXP g0c1 [MARK,REF(35005),gp=0x61] [ASCII] [cached] "a" ... .Internal(inspect(unname(k[i]))) @10a50c000 14 REALSXP g0c7 [] (len=2, tl=0) 1,2,3,4,1,... .Internal(inspect(x2)) @7fa24fc692d8 14 REALSXP g0c0 [REF(1)] wrapper [srt=-2147483648,no_na=0] @10a228000 14 REALSXP g0c7 [REF(1),ATT] (len=2, tl=0) 1,2,3,4,1,... ATTRIB: @7fa24fc69850 02 LISTSXP g0c0 [REF(1)] TAG: @7fa24b803e90 01 SYMSXP g0c0 [MARK,REF(5797),LCK,gp=0x4000] "names" (has value) @10a25 16 STRSXP g0c7 [REF(65535)] (len=2, tl=0) @7fa24ba575c8 09 CHARSXP g0c1 [MARK,REF(10005),gp=0x61] [ASCII] [cached] "a" @7fa24be24428 09 CHARSXP g0c1 [MARK,REF(10010),gp=0x61] [ASCII] [cached] "b" @7fa24b806ec0 09 CHARSXP g0c1 [MARK,REF(10077),gp=0x61] [ASCII] [cached] "c" @7fa24bcc6af0 09 CHARSXP g0c1 [MARK,REF(10003),gp=0x61] [ASCII] [cached] "d" @7fa24ba575c8 09 CHARSXP g0c1 [MARK,REF(10005),gp=0x61] [ASCII] [cached] "a" ... If you don't assign the intermediate result things are simple as R knows there are no references so the names can be simply removed. However, if you assign the result that is not possible as there is still the reference in x2 at the time when unname() creates its own local temporary variable obj to do what probably most of us would use which is names(obj) <- NULL (i.e. names(x2) <- NULL avoids that problem.since you don't need both x2 and obj). To be precise, when you use unname() on an assigned object, R has to technically keep two copies - one for the existing x2 and a second in unname() for obj so it can call names(obj)<-NULL for the modification. To avoid that R instead creates a wrapper for the original x2 which says "like x2 but names are NULL". The rationale is that for large vector it is better to keep records of metadata changes rather than duplicating the object. This way the vector is stored only once. However, as you blow way the original x2, all that is left is k[I] with the extra information "don't use the names". Unfortunately, R cannot know that you will eventually only keep the version without the names - at which point it could strip the names since they are not referenced anymore. I'm not sure what is the best solution here. In theory, if the wrapper found out that the object it is wrapping has no more references it could remove the names, but I'm sure that would only solve some cases (what if you duplicated the wrapper and thus there were multiple wrappers referencing it?) and not sure if it has a way to find out. The other way to deal with that would be at serialization time if it could be detected such that it can remove the wrapper. Since the intersection of serialization experts and ALTREP experts is exactly one, I'll leave it to that set to comment further ;). Currently the wrapper serialization mechanism just serializes the wrapped object and unserialize re-wraps it at the other end. If there is only one reference to the wrapped value then we know the attributes can't be accessed from the R level anymore, so it would be safe to remove the attributes before passing it off for serializing. Unless I'm missing something that would be an easy change. But it would be good to know if it would really make a difference in realistic situations. [Dropping attributes could be done at other times as well if there is only one reference, e.g. on accessing the data, but that is not likely to be worth while within a single R session.] If there is more than one reference to the wrapped object, then things is more complicated. We could duplicate the payload and send that off for serialization (and install it in the wrapper), but that could be a bad idea of the object is large. A tighter integration of ALTREP serialization with the serialization internals might allow and ALTREP's serialization method to write directly to the serialization stream, but that would make things much harder to maintain. Best, luke Cheers, Simon On Jul 23, 2020, at 07:29, Pan Domu wrote: I ran into strange behavior when removing names
Re: [Rd] [External] Re: R-devel internal errors during check produce?
Thanks. Fixed in R-devel in r78754. This was related to a fix for PR#17809, not the change to unique.default. Best, luke On Tue, 30 Jun 2020, Jan Gorecki wrote: No packages are being loaded, or even installed. Did you try running the example on R-devel built with flags I have provided in this email? I checked now and it is required to use --enable-strict-barrier to reproduce the issue. On Tue, Jun 30, 2020 at 9:02 AM Martin Maechler wrote: Kurt Hornik on Tue, 30 Jun 2020 06:20:57 +0200 writes: Jan Gorecki writes: >> Thank you both, You are absolutely correct that example >> should be minimal, so here it is. >> l = list(a=new.env(), b=new.env()) unique(l) >> Just for completeness, env_list during check that raises >> error >> env_list <- list(baseenv(), >> as.environment("package:graphics"), >> as.environment("package:stats"), >> as.environment("package:utils"), >> as.environment("package:methods") ) >> unique(env_list) > Thanks ... but the above work fine for me. E.g., R> l = list(a=new.env(), b=new.env()) R> unique(l) > [[1]] > [[2]] > Best -k Ditto here; also your (Jan) 2nd example works fine. So, you must have loaded some (untidy) packages / code which redefine standard base R behavior ? Martin ______ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Possible ABI change in R 4.0.1
EXTPTR_PTR is not in the API so it is not guaranteed to even exist in the future. The API function for accessing the pointer address is R_ExternalPtrAddr. See Section 5.13 in WRE. Sometimes internals need to be changed, In this case a change was made to deal with a segfault; the commit notice tells you the PR this addressed. As it says in Writing R Extensions about defining USE_RINTERNALS: Also be prepared to adjust your code should R internals change. The same goes for any use of non-API macros and functions. Best, luke On Mon, 29 Jun 2020, Gábor Csárdi wrote: Hi all, it seems that from R 4.0.1 EXTPTR_PTR can be either a macro or a function, depending on whether USE_RINTERNALS is requested. Jeroen helped me find that this was in 78592: https://github.com/wch/r-source/commit/c634fec5214e73747b44d7c0e6f047fefe44667d This is a problem, because binary packages that are built on R 4.0.1 or R 4.0.2 will potentially not load on R 4.0.0, if they use the EXTPTR_PTR function. E.g. this is R 4.0.0 on Linux: library(Rcpp) Error: package or namespace load failed for ‘Rcpp’ in dyn.load(file, DLLpath = DLLpath, ...): unable to load shared object '/usr/local/lib/R/library/Rcpp/libs/Rcpp.so': Error relocating /usr/local/lib/R/library/Rcpp/libs/Rcpp.so: EXTPTR_PTR: symbol not found In addition: Warning message: package ‘Rcpp’ was built under R version 4.0.1 It is easiest to reproduce this on Windows, because the CRAN binaries are now built on R 4.0.2, so if you install Rcpp on R 4.0.0 from CRAN, and try to load it you'll get: library(Rcpp) Error: package or namespace load failed for 'Rcpp' in inDL(x, as.logical(local), as.logical(now), ...): unable to load shared object 'C:/Users/csard/R/win-library/4.0/Rcpp/libs/x64/Rcpp.dll': LoadLibrary failure: The specified procedure could not be found. In addition: Warning message: package 'Rcpp' was built under R version 4.0.2 I suppose this change was not intended? Best, Gabor __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Change in lapply's missing argument passing
Yes, to resolve https://bugs.r-project.org/bugzilla/show_bug.cgi?id=15199 Best, luke On Fri, 26 Jun 2020, William Dunlap via R-devel wrote: Consider the following expression, in which we pass 'i=', with no value given for the 'i' argument, to lapply. lapply("x", function(i, j) c(i=missing(i),j=missing(j), i=) From R-2.14.0 (2011-10-31) through R-3.4.4 (2018-03-15) this evaluated to c(i=TRUE, j=FALSE). From R-3.5.0 (2018-04-23) through R-4.0.0 (2020-04-24) this evaluated to c(i=FALSE, j=TRUE). Was this change intentional? Bill Dunlap TIBCO Software wdunlap tibco.com [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Unexpected Error Handling by Generic in R 4.0.1
Thanks for the report. This is due to a change restoring behavior that was disabled temporarily to work around a bug (https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16111). So it is again working as originally designed. There are a number of places in the S4 dispatch code where errors are caught and re-signaled with some additional information about the dispatch context that might be helpful. Unfortunately all that is retained from the original error is the message. It would be better to signal a structured error object that includes the original error in a slot. Some options: Create and signal a structured error wrapping the original error in these cases. Revert the argument evaluation to not wrap errors. Drop wrapping of error from all other cases. I don't have strong views on which way to go. But wrapping and re-signaling from C would take a decent amount of effort so isn't likely to happen without someone contributing a well-tested patch. Best, luke On Thu, 25 Jun 2020, Matthew Carlucci wrote: Hello R-devel community, I posted a new R 4.0.1 behaviour to stack overflow (https://stackoverflow.com/questions/62327810/inconsistent-error-handling-of-function-and-s4-generics-on-r-4-0-1), where I think it is an undesired or unexpected change in 4.0.1. Attributes of errors seem to be lost or obscured when encountered in an S4 generic context. An example of this being undesirable comes in shiny applications where my_reactive (an unevaluated reactive object) returns a shiny.silent.error attribute which is lost upon error within an S4 generic function. The lack of this attribute causes the entire application to exit with an error (with no stack trace available). For example, within a shiny context: foo <- try(nrow(my_reactive())) attr(foo,"condition") Where the S4 generic returns: bar <- try(BiocGenerics::nrow(my_reactive())) attr(bar,"condition") From what I can tell from the release notes of 4.0.1, this does not appear to be an expected breaking change so I am hesitant to update old code and shiny applications to account for this behaviour. Any guidance would be appreciated. Thank you, Matthew Carlucci CONFIDENTIALITY NOTICE: This e-mail message, including a...{{dropped:18}} __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] numericDeriv alters result of eval in R 4.0.1
The eval() call could also throw an error that would leave the input environment modified. Better change along the lines described in the bug report at https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17831 Best, luke On Tue, 16 Jun 2020, Raimundo Neto wrote: Dear all As far as I could trace, looking at the function C function numeric_deriv, this unwanted behavior comes from the inner most loop in, at the very end of the function, for(i = 0, start = 0; i < LENGTH(theta); i++) { for(j = 0; j < LENGTH(VECTOR_ELT(pars, i)); j++, start += LENGTH(ans)) { SEXP ans_del; double origPar, xx, delta; origPar = REAL(VECTOR_ELT(pars, i))[j]; xx = fabs(origPar); delta = (xx == 0) ? eps : xx*eps; REAL(VECTOR_ELT(pars, i))[j] += rDir[i] * delta; PROTECT(ans_del = eval(expr, rho)); if(!isReal(ans_del)) ans_del = coerceVector(ans_del, REALSXP); UNPROTECT(1); for(k = 0; k < LENGTH(ans); k++) { if (!R_FINITE(REAL(ans_del)[k])) error(_("Missing value or an infinity produced when evaluating the model")); REAL(gradient)[start + k] = rDir[i] * (REAL(ans_del)[k] - REAL(ans)[k])/delta; } REAL(VECTOR_ELT(pars, i))[j] = origPar; } } Maybe a (naive?) fix is change the if statement in the inner most loop to if (!R_FINITE(REAL(ans_del)[k])) { REAL(VECTOR_ELT(pars, i))[j] = origPar; error(_("Missing value or an infinity produced when evaluating the model")); } Regards, Raimundo Neto Em ter., 16 de jun. de 2020 às 11:31, escreveu: Thanks; definitely a bug. I've submitted it to the bug tracker at https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17831 Best, luke On Mon, 15 Jun 2020, Raimundo Neto wrote: > Dear R developers, > > I've run into a weird behavior of the numericDeriv function (from the stats > package) which I also posted on StackOverflow (question has same title as > this email, except for the version of R). > > Running the code bellow we can see that the numericDeriv function gives an > error as the derivative of x^a wrt a is x^a * log(x) and log is not defined > for negative numbers. However, seems like the function changes the value of > env1$a from 3 to 3.00044703483581543. If x is a vector of positive > values numericDeriv function completes the task without errors and env1$a > remains unchanged as expected. > > This happened to me running R 4.0.1 on Ubuntu 20.04 and also to another > StackOverflow user using running the same version of R on Windows 10. I > wonder, is this an intended behavior of the function or really a bug? > > options(digits=22) > env1 = new.env() > env1$x = rnorm(10) > env1$a = 3 > eval(quote(x^a), env1) > numericDeriv(quote(x^a), "a", env1) > eval(quote(x^a), env1) > env1$a > > Thank you! > Raimundo Neto > > [[alternative HTML version deleted]] > > ______ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] numericDeriv alters result of eval in R 4.0.1
Thanks; definitely a bug. I've submitted it to the bug tracker at https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17831 Best, luke On Mon, 15 Jun 2020, Raimundo Neto wrote: Dear R developers, I've run into a weird behavior of the numericDeriv function (from the stats package) which I also posted on StackOverflow (question has same title as this email, except for the version of R). Running the code bellow we can see that the numericDeriv function gives an error as the derivative of x^a wrt a is x^a * log(x) and log is not defined for negative numbers. However, seems like the function changes the value of env1$a from 3 to 3.00044703483581543. If x is a vector of positive values numericDeriv function completes the task without errors and env1$a remains unchanged as expected. This happened to me running R 4.0.1 on Ubuntu 20.04 and also to another StackOverflow user using running the same version of R on Windows 10. I wonder, is this an intended behavior of the function or really a bug? options(digits=22) env1 = new.env() env1$x = rnorm(10) env1$a = 3 eval(quote(x^a), env1) numericDeriv(quote(x^a), "a", env1) eval(quote(x^a), env1) env1$a Thank you! Raimundo Neto [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: use of the tcltk package crashes R 4.0.1 for Windows
I've committed the change to use Free instead of free in tcltk.c and sys-std.c (r78652 for R-devel, r78653 for R-patched). We might consider either moving Calloc/Free out of the Windows remapping or moving the remapping into header files so everything seeing our header files uses our calloc/free. Either would be less brittle that the current status. Best, luke On Sun, 7 Jun 2020, peter dalgaard wrote: On 7 Jun 2020, at 18:59 , Jeroen Ooms wrote: On Sun, Jun 7, 2020 at 5:53 PM wrote: On Sun, 7 Jun 2020, peter dalgaard wrote: So this wasn't tested for a month? Anyways, Free() is just free() with a check that we're not freeing a null pointer, followed by setting the pointer to NULL. At that point of tcltk.c, we have for (objc = i = 0; i < length(avec); i++){ const char *s; char *tmp; if (!isNull(nm) && strlen(s = translateChar(STRING_ELT(nm, i{ // tmp = calloc(strlen(s)+2, sizeof(char)); tmp = Calloc(strlen(s)+2, char); *tmp = '-'; strcpy(tmp+1, s); objv[objc++] = Tcl_NewStringObj(tmp, -1); free(tmp); } if (!isNull(t = VECTOR_ELT(avec, i))) objv[objc++] = (Tcl_Obj *) R_ExternalPtrAddr(t); } and I can't see how tmp can be NULL at the free(), nor can I see it mattering if it is not set to NULL (notice that it goes out of scope with the for loop). Right. And the calloc->Calloc change doesn't look like an issue either -- just checking for a NULL. If the crash is happening in free() then that most likely means corrupted malloc data structures. Unfortunately that could be happening anywhere. Writing R extensions, section 6.1.2 says: "Do not assume that memory allocated by Calloc/Realloc comes from the same pool as used by malloc: in particular do not use free or strdup with it." I think the reason is that R uses dlmalloc for Calloc on Windows: https://github.com/wch/r-source/blob/c634fec5214e73747b44d7c0e6f047fefe44667d/src/main/memory.c#L94-L103 But that section #defines calloc and free to Rm_... counterparts in lockstep? (I assume that is where dlmalloc comes in?) Anyways, does it actually work to change free() to Free()? If so, then all this post mortem analysis is rather a moot point. -pd -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: use of the tcltk package crashes R 4.0.1 for Windows
There is one other possibility: It may be that the calloc/free pair picked up by the tcltk package DLL is different from the pair picked up when building base R. (We provide our own malloc framework, but if the macros aren't quite right it may be that the system malloc is picked up in some cases). In that case using Calloc and free would be mismatching the malloc systems and probably segfault. If that is indeed happening we should fix it, but using Free with Calloc should cure the immediate symptom. Best, luke On Sun, 7 Jun 2020, luke-tier...@uiowa.edu wrote: On Sun, 7 Jun 2020, peter dalgaard wrote: So this wasn't tested for a month? Anyways, Free() is just free() with a check that we're not freeing a null pointer, followed by setting the pointer to NULL. At that point of tcltk.c, we have for (objc = i = 0; i < length(avec); i++){ const char *s; char *tmp; if (!isNull(nm) && strlen(s = translateChar(STRING_ELT(nm, i{ // tmp = calloc(strlen(s)+2, sizeof(char)); tmp = Calloc(strlen(s)+2, char); *tmp = '-'; strcpy(tmp+1, s); objv[objc++] = Tcl_NewStringObj(tmp, -1); free(tmp); } if (!isNull(t = VECTOR_ELT(avec, i))) objv[objc++] = (Tcl_Obj *) R_ExternalPtrAddr(t); } and I can't see how tmp can be NULL at the free(), nor can I see it mattering if it is not set to NULL (notice that it goes out of scope with the for loop). Right. And the calloc->Calloc change doesn't look like an issue either -- just checking for a NULL. If the crash is happening in free() then that most likely means corrupted malloc data structures. Unfortunately that could be happening anywhere. Best bet to narrow this down is for someone with a good Windows setup who can reproduce this to bisect the svn commits and see at what commit this started happening. Unfortunately my office Windows machine isn't responding and it will probably take some time to get that fixed. Best, luke -pd On 7 Jun 2020, at 16:00 , Jeroen Ooms wrote: On Sun, Jun 7, 2020 at 3:13 AM Fox, John wrote: Hi, The following code, from the examples in ?TkWidgets , immediately crashes R 4.0.1 for Windows: - snip library("tcltk") tt <- tktoplevel() label.widget <- tklabel(tt, text = "Hello, World!") button.widget <- tkbutton(tt, text = "Push", command = function()cat("OW!\n")) tkpack(label.widget, button.widget) # geometry manager - snip I can reproduce this. The backtrace shows the crash happens in dotTclObjv [/src/library/tcltk/src/tcltk.c@243 ]. This looks like a bug that was introduced by commit 78408/78409 about a month ago. I think the problem is that this commit changes 'calloc' to 'Calloc' without changing the corresponding 'free' to 'Free'. This has nothing to do with the Windows build or installation. Nothing has changed in the windows build procedure between 4.0.0 and 4.0.1. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: use of the tcltk package crashes R 4.0.1 for Windows
On Sun, 7 Jun 2020, peter dalgaard wrote: So this wasn't tested for a month? Anyways, Free() is just free() with a check that we're not freeing a null pointer, followed by setting the pointer to NULL. At that point of tcltk.c, we have for (objc = i = 0; i < length(avec); i++){ const char *s; char *tmp; if (!isNull(nm) && strlen(s = translateChar(STRING_ELT(nm, i{ // tmp = calloc(strlen(s)+2, sizeof(char)); tmp = Calloc(strlen(s)+2, char); *tmp = '-'; strcpy(tmp+1, s); objv[objc++] = Tcl_NewStringObj(tmp, -1); free(tmp); } if (!isNull(t = VECTOR_ELT(avec, i))) objv[objc++] = (Tcl_Obj *) R_ExternalPtrAddr(t); } and I can't see how tmp can be NULL at the free(), nor can I see it mattering if it is not set to NULL (notice that it goes out of scope with the for loop). Right. And the calloc->Calloc change doesn't look like an issue either -- just checking for a NULL. If the crash is happening in free() then that most likely means corrupted malloc data structures. Unfortunately that could be happening anywhere. Best bet to narrow this down is for someone with a good Windows setup who can reproduce this to bisect the svn commits and see at what commit this started happening. Unfortunately my office Windows machine isn't responding and it will probably take some time to get that fixed. Best, luke -pd On 7 Jun 2020, at 16:00 , Jeroen Ooms wrote: On Sun, Jun 7, 2020 at 3:13 AM Fox, John wrote: Hi, The following code, from the examples in ?TkWidgets , immediately crashes R 4.0.1 for Windows: - snip library("tcltk") tt <- tktoplevel() label.widget <- tklabel(tt, text = "Hello, World!") button.widget <- tkbutton(tt, text = "Push", command = function()cat("OW!\n")) tkpack(label.widget, button.widget) # geometry manager - snip I can reproduce this. The backtrace shows the crash happens in dotTclObjv [/src/library/tcltk/src/tcltk.c@243 ]. This looks like a bug that was introduced by commit 78408/78409 about a month ago. I think the problem is that this commit changes 'calloc' to 'Calloc' without changing the corresponding 'free' to 'Free'. This has nothing to do with the Windows build or installation. Nothing has changed in the windows build procedure between 4.0.0 and 4.0.1. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] Re: Surpising behavior when using an active binding as loop index in R 4.0.0
On Sun, 24 May 2020, Deepayan Sarkar wrote: A shorter reproducible example: example(makeActiveBinding) for (fred in 1:3) { 0 } ls() Both problems go away if you first do compiler::enableJIT(2) So looks like a bug in compiling the for loop. Not in compiling but in the byte code interpreter. It was not handling active bindings for the loop variable properly. This was fixed yesterday in R--devel and R-patched, so will be fixed in R 4.0.1. Best, luke -Deepayan On Sat, May 23, 2020 at 5:45 PM Thomas Friedrichsmeier via R-devel wrote: Possibly just a symptom of the earlier behavior, but I'll amend my example, below, with an even more disturbing observation: Am Sat, 23 May 2020 13:19:24 +0200 schrieb Thomas Friedrichsmeier via R-devel : [...] Consider the code below: makeActiveBinding("i", function(value) { if (missing(value)) { x } else { print("set") x <<- value } }, globalenv()) i <- 1 # output "set" print(i) # output [1] 1 # Surprising behavior starts here: for(i in 2:3) print(i) # output [1] "set" #NULL #NULL print(i) # output NULL print(x) # output NULL i <- 4 # output "set" print(i) # ouput [1] 4 print(x) # ouput [1] 4 ls() # Error in ls() : # Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL' Regards Thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel