Re: [Rd] API for converting LANGSXP to LISTSXP?

2024-07-06 Thread Ivan Krylov via R-devel
On Fri, 5 Jul 2024 15:27:50 +0800
Kevin Ushey  wrote:

> A common idiom in the R sources is to convert objects between LANGSXP
> and LISTSXP by using SET_TYPEOF. However, this is soon going to be
> disallowed in packages.

Would you mind providing an example where a package needs to take an
existing LISTSXP and convert it to a LANGSXP (or vice versa)? I think
that Luke Tierney intended to replace the uses of
SET_TYPEOF(allocList(...), LANGSXP) with allocLang(...).

At least it's easy to manually convert between the two by replacing the
head of the list using LCONS(CAR(list), CDR(list)) or CONS(CAR(lang),
CDR(lang)): in a call, the rest of the arguments are ordinary LISTSXPs.
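At the R level, the distinction (and the round trip between the two types) can be sketched as follows; this is an illustration of the types discussed above, not code from the original message:

```r
# A call is a LANGSXP; as.pairlist() produces a LISTSXP ("pairlist").
cl <- quote(sum(1, 2))
typeof(cl)                      # "language" (LANGSXP)
pl <- as.pairlist(as.list(cl))
typeof(pl)                      # "pairlist" (LISTSXP)
# Rebuild the call from list form, much as LCONS(CAR(list), CDR(list))
# would do at the C level:
cl2 <- as.call(as.list(pl))
eval(cl2)                       # 3
```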

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R FAQ 2.6, 7.21

2024-07-04 Thread Ivan Krylov via R-devel
Hello R-devel,

I would like to suggest a couple of updates for the R FAQ.

https://CRAN.R-project.org/bin/linux/suse is currently empty and the
directory has mtime from 2012, so it probably doesn't help to reference
it in FAQ 2.6.

There seems to be increased interest in using variables as variable
names [1,2], so it might be useful to expand 7.21 a little. Can an R
FAQ entry link to R-intro section 6.1?

Index: doc/manual/R-FAQ.texi
===
--- doc/manual/R-FAQ.texi   (revision 86871)
+++ doc/manual/R-FAQ.texi   (working copy)
@@ -503,9 +503,6 @@
 @abbr{RPM}s for @I{RedHat Enterprise Linux} and compatible distributions (e.g.,
 @I{Centos}, Scientific Linux, Oracle Linux).
 
-See @url{https://CRAN.R-project.org/bin/linux/suse/README.html} for
-information about @abbr{RPM}s for openSUSE.
-
 No other binary distributions are currently publicly available via
 @CRAN{}.
 
@@ -2624,8 +2621,31 @@
 @end example
 
 @noindent
-without any of this messing about.
+without any of this messing about. This becomes especially true if you
+are finding yourself creating and trying to programmatically access
+groups of related variables such as @code{result1}, @code{result2},
+@code{result3}, and so on: instead of fighting against the language to
+use
 
+@example
+# 'i'th result <- process('i'th dataset)
+assign(paste0("result", i), process(get(paste0("dataset", i))))
+@end example
+
+it is much easier to put the related variables in lists and use
+
+@example
+result[[i]] <- process(dataset[[i]])
+@end example
+
+and, eventually,
+
+@example
+result <- lapply(dataset, process)
+@end example
+
+which is easy to replace with @code{parLapply} for parallel processing.
+
 @node Why do lattice/trellis graphics not work?
 @section Why do lattice/trellis graphics not work?
 


-- 
Best regards,
Ivan



Re: [Rd] Large vector support in data.frames

2024-07-02 Thread Ivan Krylov via R-devel
On Wed, 19 Jun 2024 09:52:20 +0200
Jan van der Laan  wrote:

> What is the status of supporting long vectors in data.frames (e.g. 
> data.frames with more than 2^31 records)? Is this something that is 
> being worked on? Is there a time line for this? Is this something I
> can contribute to?

Apologies if you've already received a better answer off-list.

From my limited understanding, the problem with supporting
larger-than-(2^31-1) dimensions has multiple facets:

 - In many parts of R code, there's the assumption that dim() is
   of integer type. That wouldn't be a problem by itself, except...

 - R currently lacks a native 64-bit integer type. About a year ago
   Gabe Becker mentioned that Luke Tierney has been considering
   improvements in this direction, but it's hard to introduce 64-bit
   integers without making the user worry even more about data types
   (numeric != integer != 64-bit integer) or introducing a lot of
   overhead (64-bit integers being twice as large as 32-bit ones and,
   depending on the workload, frequently redundant).

 - Two-dimensional objects eventually get transformed into matrices and
   handed to LAPACK for linear algebra operations. Currently, the
   interface used by R to talk to BLAS and LAPACK only supports 32-bit
   signed integers for lengths. 64-bit BLASes and LAPACKs do exist
   (e.g. OpenBLAS can be compiled with 64-bit lengths), but we haven't
   taught R to use them.

   (This isn't limited to array dimensions, by the way. If you try to
   svd() a sufficiently large matrix, it'll try to ask for temporary
   memory with a length that overflows a signed 32-bit integer, get a much
   shorter allocation instead, promptly overflow the buffer and
   crash the process.)

As you see, it's interconnected; work on one thing will involve the
other two.
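The integer-type limitation in the first point can be seen from R itself; a minimal sketch (not from the original message):

```r
# dim() is assumed to return integers, and R integers are 32-bit:
.Machine$integer.max                # 2147483647, i.e. 2^31 - 1
# Values beyond that range cannot be represented as an R integer:
suppressWarnings(as.integer(2^31))  # NA
# Long atomic vectors store their length as a double instead, but
# data.frame dimensions still go through the integer dim() path.
```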

-- 
Best regards,
Ivan



[Rd] Making use of win32_segv

2024-06-30 Thread Ivan Krylov via R-devel
Hello R-devel,

When checking packages on Windows, a crash of the process looks like a
sudden stop in the output of the child process, which can be very
perplexing for package maintainers (e.g. [1,2]), especially if the
stars are only right on Win-Builder but not on the maintainer's PC.

On Unix-like systems, we get a loud message from the SIGSEGV handler
and (if we're lucky and the memory manager is still mostly intact) an
R-level traceback. A similar signal handler, win32_segv(), is defined
in src/main.c for use on Windows, but I am not seeing a way it could be
called in the current codebase. The file src/gnuwin32/psignal.c that's
responsible for signals on Windows handles Ctrl+C but does not emit
SIGSEGV or SIGILL. Can we make use of vectored exception handling [3]
to globally catch unhandled exceptions in the Win32 process and
transform them into raise(SIGSEGV)?

One potential source of problems is threading. The normal Unix-like
sigactionSegv(...) doesn't care; if a non-main thread causes a crash
with SIGSEGV unblocked, it will run in that thread's context and call R
API from there. On Windows, where a simple REprintf() may go into a GUI
window, the crash handler may be written in a more cautious manner:

 - Only set up the crash handler in Rterm, because this is mostly for
   the benefit of the people reading the R CMD check output
 - Compare GetCurrentThreadId() against a saved value and don't call
   R_Traceback() if it doesn't match
 - Rewrite win32_segv in terms of StringCbPrintf to static storage and
   WriteFile(GetStdHandle(STD_ERROR_HANDLE), ...), which may be too much

Attached is a crude first draft to see if the approach is viable. If it
turns out to be a good idea, I can add the Rterm or thread ID checks, a
reentrancy guard, declare and export a special struct win32_segvinfo
from psignal.c, put all the crash reporting in win32_segv(), and move
the VEH setup into psignal.c's signal(). (Just don't want to waste the
effort if this proves ill-advised.)

Without the patch:
User@WIN-LGTSPJA3F1V MSYS /c/R/R-svn/src/gnuwin32
$ cat crash.c
void crash(void) { *(double*)42 = 42; }

User@WIN-LGTSPJA3F1V MSYS /c/R/R-svn/src/gnuwin32
$ ../../bin/R -q -s -e 'dyn.load("crash.dll"); .C("crash")'
Segmentation fault # <-- printed by MSYS2 shell

With the patch:
User@WIN-LGTSPJA3F1V MSYS /c/R/R-svn/src/gnuwin32
$ ../../bin/R -q -s -e 'dyn.load("crash.dll"); .C("crash")'
*** caught access violation at program counter 0x7ff911c61387 ***
accessing address 0x002a, action: write

Traceback:
 1: .C("crash")
Segmentation fault

With the patch applied, I am not seeing changes in make check-devel or
package checks for V8. Couldn't test rJava yet. 

-- 
Best regards,
Ivan

[1]
https://stat.ethz.ch/pipermail/r-package-devel/2024q2/010919.html

[2]
https://stat.ethz.ch/pipermail/r-package-devel/2024q2/010872.html

[3]
https://learn.microsoft.com/en-us/windows/win32/debug/vectored-exception-handling
Index: src/gnuwin32/sys-win32.c
===
--- src/gnuwin32/sys-win32.c(revision 86850)
+++ src/gnuwin32/sys-win32.c(working copy)
@@ -26,6 +26,8 @@
 #include 
 #endif
 
+#include 
+
 #include 
 #include 
 #include 
@@ -384,3 +386,69 @@
return rval;
 }
 }
+
+static LONG WINAPI veh_report_and_raise(PEXCEPTION_POINTERS ei)
+{
+int signal = 0;
+const char *exception = 0;
+switch (ei->ExceptionRecord->ExceptionCode) {
+case EXCEPTION_ILLEGAL_INSTRUCTION:
+   exception = "illegal instruction";
+   signal = SIGILL;
+   break;
+case EXCEPTION_ACCESS_VIOLATION:
+   exception = "access violation";
+   signal = SIGSEGV;
+   break;
+case EXCEPTION_ARRAY_BOUNDS_EXCEEDED:
+   exception = "array bounds overflow";
+   signal = SIGSEGV;
+   break;
+case EXCEPTION_DATATYPE_MISALIGNMENT:
+   exception = "datatype misalignment";
+   signal = SIGSEGV;
+   break;
+case EXCEPTION_IN_PAGE_ERROR:
+   exception = "page load failure";
+   signal = SIGSEGV;
+   break;
+case EXCEPTION_PRIV_INSTRUCTION:
+   exception = "privileged instruction";
+   signal = SIGILL;
+   break;
+case EXCEPTION_STACK_OVERFLOW:
+   exception = "stack overflow";
+   signal = SIGSEGV;
+   break;
+default: /* do nothing */ ;
+}
+if (signal) {
+   REprintf("*** caught %s at program counter %p ***\n",
+exception, ei->ExceptionRecord->ExceptionAddress);
+   /* just two more special cases */
+   switch (ei->ExceptionRecord->ExceptionCode) {
+   case EXCEPTION_ACCESS_VIOLATION:
+   case EXCEPTION_IN_PAGE_ERROR:
+   {
+   const char * action;
+   switch (ei->ExceptionRecord->ExceptionInformation[0]) {
+   case 0: action = "read"; break;
+   case 1: action = "write"; break;
+   case 8: action = "execute"; break;
+   default: 

Re: [Rd] write.csv problems

2024-06-28 Thread Ivan Krylov via R-devel
On Fri, 28 Jun 2024 11:02:12 -0500
Spencer Graves  wrote:

> df1 <- data.frame(x=1)
> class(df1) <- c('findFn', 'data.frame')
> write.csv(df1, 'df1.csv')
> # Error in x$Package : $ operator is invalid for atomic vectors

Judging by the traceback, only data frames that have a Package column
should have a findFn class:

9: PackageSummary(xi)
8: `[.findFn`(x, needconv)
7: x[needconv]
6: lapply(x[needconv], as.character)
5: utils::write.table(df1, "df1.csv", col.names = NA, sep = ",",
   dec = ".", qmethod = "double")

write.table sees columns that aren't of type character yet and tries to
convert them one by one, subsetting the data frame as a list. The call
lands in sos:::`[.findFn`

if (missing(j)) {
xi <- x[i, ]
attr(xi, "PackageSummary") <- PackageSummary(xi)
class(xi) <- c("findFn", "data.frame")
return(xi)
}

Subsetting methods are hard. For complex structures like data.frames,
`[.class` must handle all of x[rows,cols]; x[rows,]; x[,cols];
x[columns]; x[], and also respect the drop argument:
https://stat.ethz.ch/pipermail/r-help/2021-December/473207.html

I think that the `[.findFn` method mistakes x[needconv] for
x[needconv,] when it should instead perform x[,needconv].
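A minimal sketch (with a hypothetical class "demo", not part of 'sos') of how a subsetting method can tell the two forms apart: nargs() counts the missing j in x[i, ], so it distinguishes x[i] from x[i, ]:

```r
# nargs() inside the method counts missing arguments too:
# x[i] calls `[.demo`(x, i)    -> nargs() == 2
# x[i, ] calls `[.demo`(x, i, ) -> nargs() == 3
`[.demo` <- function(x, i, j, ...) {
  if (nargs() == 2L) "column-style x[i]" else "matrix-style x[i, j]"
}
d <- structure(list(a = 1:3), class = "demo")
d[1]    # "column-style x[i]"
d[1, ]  # "matrix-style x[i, j]"
```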

-- 
Best regards,
Ivan



Re: [Rd] Fixing a CRAN note

2024-06-26 Thread Ivan Krylov via R-devel
On 26 June 2024 16:42:39 GMT+03:00, "Therneau, Terry M., Ph.D. via R-devel"
 wrote:
>What is it complaining about -- that it doesn't like my name?

>* checking CRAN incoming feasibility ... [7s/18s] NOTE
>Maintainer: ‘Terry Therneau ’
>
>Found the following \keyword or \concept entries
>which likely give several index terms:
>   File ‘deming.Rd’:
>     \keyword{models, regression}
I think that the check points out that in order to specify multiple keywords, 
you need to use \keyword{models} and \keyword{regression} separately, not 
\keyword{models, regression} in one Rd command.
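For clarity, a sketch of the markup the check expects (one standard keyword per \keyword entry in deming.Rd):

\keyword{models}
\keyword{regression}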

-- 
Best regards,
Ivan



[Rd] API documentation for R

2024-06-26 Thread Ivan Krylov via R-devel
On Thu, 25 Apr 2024 10:10:44 -0700
Kevin Ushey  wrote:

> I'm guessing the most welcome kinds of contributions would be
> documentation? IMHO, "documenting an API" and "describing how an API
> can be used" are somewhat separate endeavors. I believe R-exts does an
> excellent job of the latter, but may not be the right vehicle for the
> former. To that end, I believe it would be helpful to have some
> structured API documentation as a separate R-api document.

Now that we have a machine-readable list of APIs in the form of
system.file('wre.txt', package = 'tools') (which is not yet an API
itself, but I trust we'll be able to adapt to ongoing changes), it's
possible to work on such an R-api document.

I've put a proof of concept that checks its Texinfo indices against the
list of @apifun entries in wre.txt at 
with a rendered version at . I've
tried to address Agner's concerns [*] about R_NO_REMAP by showing the
declarations available with or without this preprocessor symbol
defined.

34 vaguely documented entry points out of 538 lines in wre.txt is
obviously not enough, but I'm curious whether this is the right
direction. Should we keep to a strict structure, like in Rd files, with
a table for every argument and the return value? Can we group functions
together, or should there be a separate @node for every function and
variable? Is Rd (and Henrik's earlier work [**]) a better format than
Texinfo for a searchable C API reference?

-- 
Best regards,
Ivan

[*] https://stat.ethz.ch/pipermail/r-package-devel/2024q2/010913.html

[**] https://github.com/HenrikBengtsson/RNativeAPI



Re: [Rd] Creating a text-based device/output format

2024-06-25 Thread Ivan Krylov via R-devel
On Tue, 25 Jun 2024 09:42:59 +
David McArthur  wrote:

> ggplot(data, aes(x=body_mass_g,fill = species)) +
>   geom _histogram()
> 
> Could output something like:
> 
> title "The body mass (g) of penguin species"
> x-axis "Body mass (g)" 3000 --> 5550
> y-axis "Count" 0 --> 2
> histogram
>   Adelie [3000, 3250, 3400]
>   ChinStrap [3250, 3600]
>   Gentoo [4300, 5050, 5200, 5300, 5450]
> 
> How should I go about this in R?

R graphics devices are very low-level:
https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Graphics-devices

Instead of drawing histograms, they are asked to draw points, lines,
polygons, and text. If you're curious what it's like to implement such
a device, packages such as 'devEMF' and 'ragg' will provide examples.

You could instead go the 'txtplot' route and implement your own
functions that would return text in mermaid syntax, unrelated to
existing plotting engines. An alternative would be to carefully
deconstruct 'ggplot2' and 'lattice' objects and translate what you can
into mermaid diagrams, but that will always be limited to the
intersection of the source and target featuresets.
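A rough sketch of the 'txtplot' route, building mermaid-like text straight from the data with hist() and bypassing the graphics engine entirely (the output syntax here is illustrative, not actual mermaid):

```r
# Summarise the data ourselves and emit text, instead of implementing
# a low-level graphics device.
to_text_hist <- function(x, label = deparse(substitute(x))) {
  h <- hist(x, plot = FALSE)
  paste0("x-axis \"", label, "\" ", min(h$breaks), " --> ", max(h$breaks),
         "\ny-axis \"Count\" 0 --> ", max(h$counts),
         "\nhistogram\n",
         # one line per bin: [lower, upper): count
         paste0("  [", head(h$breaks, -1), ", ", h$breaks[-1], "): ",
                h$counts, collapse = "\n"))
}
cat(to_text_hist(c(3000, 3250, 3400, 4300, 5050)), "\n")
```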

-- 
Best regards,
Ivan



Re: [Rd] Hard crash of lme4 in R-devel

2024-06-15 Thread Ivan Krylov via R-devel
On Sat, 15 Jun 2024 02:04:31 +
"Therneau, Terry M., Ph.D. via R-devel"  wrote:

> other attached packages:
> [1] lme4_1.1-35.1  Matrix_1.7-0 

I see you have a new Matrix (1.7-0 from 2024-04-26 with a new ABI) but
an older lme4 (1.1-35.1 from 2023-11-05).

I reproduced the crash and the giant backtrace by first installing
latest lme4 and then updating Matrix. With the latest version of lme4,
this results in a warning:

library(lme4)
# Loading required package: Matrix
# Warning message:
# In check_dep_version() : ABI version mismatch:
# lme4 was built with Matrix ABI version 1
# Current Matrix ABI version is 2
# Please re-install lme4 from source or restore original 'Matrix'
# package

The version of lme4 that you have installed doesn't have this check
because it only appeared in March 2024:
https://github.com/lme4/lme4/commit/8be641b7a1fd5b6e6ac962552add13e29bb5ff5b

The crash should go away if you update or at least reinstall lme4 from
source.

-- 
Best regards,
Ivan



Re: [Rd] Mismatches for methods registered for non-generic:

2024-05-27 Thread Ivan Krylov via R-devel
On Mon, 27 May 2024 10:52:26 +
"Koenker, Roger W"  wrote:

> that have been fine until now and on my fresh R version 4.4.0
> (2024-04-24) are still ok with R CMD check —as-cran

This extra check requires the environment variable
_R_CHECK_S3_METHODS_SHOW_POSSIBLE_ISSUES_ to be set to TRUE to show the
issues.

> but CRAN checking reveals, e.g.
> 
> Check: S3 generic/method consistency, Result: NOTE
>  Mismatches for methods registered for non-generic:
>  as:
>function(object, Class, strict, ext)
>  as.matrix.coo:
>function(x, nrow, ncol, eps, …)
> 
> which I interpret as regarding  my generics as just S3 methods for
> the non-generic “as”.

There are calls to S3method("as", ...) and S3method("is", ...) at the
end of the NAMESPACE for the current CRAN version of SparseM. If I
comment them out, the package passes R CMD check --as-cran without the
NOTE and seemingly with no extra problems.
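As a sketch of the fix (illustrative, based only on the method name in the NOTE above), the NAMESPACE should register methods for the real generics rather than for the non-generic "as":

# Register for the actual generic as.matrix, not S3method("as", ...)
S3method(as.matrix, coo)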

-- 
Best regards,
Ivan



Re: [Rd] confint Attempts to Use All Server CPUs by Default

2024-05-21 Thread Ivan Krylov via R-devel
On Tue, 21 May 2024 08:00:11 +
Dario Strbenac via R-devel  wrote:

> Would a less resource-intensive value, such as 1, be a safer default
> CPU value for confint?

Which confint() method do you have in mind? There are at least four of
them by default in R, and many additional classes could make use of
stats:::confint.default by implementing vcov().

> Also, there is no mention of such parallel processing in ?confint, so
> it was not clear at first where to look for performance degradation.
> It could at least be described in the manual page so that users would
> know that export OPENBLAS_NUM_THREADS=1 is a solution.

There isn't much R can do about the behaviour of the BLAS, because
there is no standard interface to set the number of threads. Some BLASes
(like ATLAS) don't even offer it as a tunable number at all [*].

A system administrator could link the installation of R against
FlexiBLAS [**], provide safe defaults in the environment variables and
educate the users about its tunables [***], but that's a choice just
like it had been a choice to link R against a parallel variant of
OpenBLAS on a shared computer. This is described in R Installation and
Administration, section A.3.1 [****].

-- 
Best regards,
Ivan

[*]
https://math-atlas.sourceforge.net/faq.html#tnum

[**]
https://www.mpi-magdeburg.mpg.de/projects/flexiblas

[***]
https://search.r-project.org/CRAN/refmans/flexiblas/html/flexiblas-threads.html

[****]
https://cran.r-project.org/doc/manuals/R-admin.html#BLAS



Re: [Rd] FR: Customize background colour of row and column headers for the View output

2024-05-16 Thread Ivan Krylov via R-devel
The change suggested by Iago Giné Vázquez is indeed very simple. It
sets the background colour of the row and column headers to the
background of the rest of the dataentry window. With this patch, R
passes 'make check'. As Duncan Murdoch mentions, the X11 editor already
behaves this way.

If it's not acceptable to make the row and column headers the same
colour as the rest of the text, let's make it into a separate setting.

--- src/library/utils/src/windows/dataentry.c   (revision 86557)
+++ src/library/utils/src/windows/dataentry.c   (working copy)
@@ -1474,7 +1474,7 @@
 resize(DE->de, r);
 
 DE->CellModified = DE->CellEditable = FALSE;
-bbg = dialog_bg();
+bbg = guiColors[dataeditbg];
 /* set the active cell to be the upper left one */
 DE->crow = 1;
 DE->ccol = 1;

-- 
Best regards,
Ivan



Re: [Rd] [External] R hang/bug with circular references and promises

2024-05-13 Thread Ivan Krylov via R-devel
On Mon, 13 May 2024 09:54:27 -0500 (CDT)
luke-tierney--- via R-devel  wrote:

> Looks like I added that warning 22 years ago, so that should be enough
> notice :-). I'll look into removing it now.

Dear Luke,

I've got a somewhat niche use case: as a way of protecting myself
against rogue *.rds files and vulnerabilities in the C code, I've been
manually unserializing "plain" data objects (without anything
executable), including environments, in R [1].

I see that SET_ENCLOS() is already commented as "not API and probably
should not be <...> used". Do you think there is a way to recreate an
environment, taking the REFSXP entries into account, without
`parent.env<-`?  Would you recommend to abandon the folly of
unserializing environments manually?

-- 
Best regards,
Ivan

[1]
https://codeberg.org/aitap/unserializeData/src/commit/33d72705c1ee265349b3e369874ce4b47f9cd358/R/unserialize.R#L289-L313



Re: [Rd] max on numeric_version with long components

2024-04-27 Thread Ivan Krylov via R-devel
On Sat, 27 Apr 2024 13:56:58 -0500
Jonathan Keane  wrote:

> In devel:
> > max(numeric_version(c("1.0.1.1", "1.0.3.1", "1.0.2.1")))
> [1] ‘1.0.1.1’
> > max(numeric_version(c("1.0.1.1000", "1.0.3.1000", "1.0.2.1000")))
> [1] ‘1.0.3.1000’

Thank you Jon for spotting this!

This is an unintended consequence of
https://bugs.r-project.org/show_bug.cgi?id=18697.

The old behaviour of max() was to call
which.max(xtfrm(x)), which first produced a permutation that sorted the
entire .encode_numeric_version(x). The new behavioiur is to call
which.max directly on .encode_numeric_version(x), which is faster (only
O(length(x)) instead of a sort).

What do the encoded version strings look like?

x <- numeric_version(c(
 "1.0.1.1", "1.0.3.1", "1.0.2.1"
))
# Ignore the attributes
(e <- as.vector(.encode_numeric_version(x)))
# [1] "101575360400"
# [2] "103575360400"
# [3] "102575360400"

# order(), xtfrm(), sort() all agree that e[2] is the maximum:
order(e)
# [1] 1 3 2
xtfrm(e)
# [1] 1 3 2
sort(e)
# [1] "101575360400"
# [2] "102575360400"
# [3] "103575360400"

# but not which.max:
which.max(e)
# [1] 1

This happens because which.max() converts its argument to double, which
loses precision:

(n <- as.numeric(e))
# [1] 1e+27 1e+27 1e+27
identical(n[1], n[2])
# [1] TRUE
identical(n[3], n[2])
# [1] TRUE

Will be curious to know if there is a clever way to keep both the O(N)
complexity and the full arbitrary precision.
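Not an authoritative fix, but one direction to explore: if, as the output above suggests, the encoded strings within one vector share a fixed width, then a lexicographic character comparison already gives the right answer in O(N) without converting to double:

```r
# max() on character data compares lexicographically, which matches
# numeric order for equal-width, zero-padded digit strings.
e <- c("101575360400", "103575360400", "102575360400")
which(e == max(e))  # 2
```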

-- 
Best regards,
Ivan



Re: [Rd] R 4.4.0 has version of Matrix 1.7-0, but it's not available on CRAN

2024-04-26 Thread Ivan Krylov via R-devel
On Fri, 26 Apr 2024 13:15:47 +0200
Gábor Csárdi  wrote:

> That's not how this worked in the past AFAIR. Simply, the packages in
> the x.y.z/Recommended directories were included in
> src/contrib/PACKAGES*, metadata, with the correct R version
> dependencies, in the correct order, so that `install.packages()`
> automatically installed the correct version without having to add
> extra repositories or manually search for package files.

That's great, then there is no need to patch anything. Thanks for
letting me know.

Should we be asking c...@r-project.org to add 4.4.0/Recommended to the
index, then?

-- 
Best regards,
Ivan



Re: [Rd] R 4.4.0 has version of Matrix 1.7-0, but it's not available on CRAN

2024-04-26 Thread Ivan Krylov via R-devel
On Fri, 26 Apr 2024 12:32:59 +0200
Martin Maechler  wrote:

> Finally, I'd think it definitely would be nice for
> install.packages("Matrix") to automatically get the correct
> Matrix version from CRAN ... so we (R-core) would be grateful
> for a patch to install.packages() to achieve this

Since the binaries offered on CRAN are already of the correct version
(1.7-0 for -release and -devel), only source package installation needs
to concern itself with the Recommended subdirectory.

Would it be possible to generate the PACKAGES* index files in the
4.4.0/Recommended subdirectory? Then on the R side it would be needed
to add a new repo (adjusting chooseCRANmirror() to set it together with
repos["CRAN"]) and keep the rest of the machinery intact.

-- 
Best regards,
Ivan



Re: [Rd] Big speedup in install.packages() by re-using connections

2024-04-25 Thread Ivan Krylov via R-devel
On Thu, 25 Apr 2024 14:45:04 +0200
Jeroen Ooms  wrote:

> Thoughts?

How verboten would it be to create an empty external pointer object,
add it to the preserved list, and set an on-exit finalizer to clean up
the curl multi-handle? As far as I can tell, the internet module is not
supposed to be unloaded, so this would not introduce an opportunity to
jump to an unmapped address. This makes it possible to avoid adding a
CurlCleanup() function to the internet module:

Index: src/modules/internet/libcurl.c
===
--- src/modules/internet/libcurl.c  (revision 86484)
+++ src/modules/internet/libcurl.c  (working copy)
@@ -55,6 +55,47 @@
 
 static int current_timeout = 0;
 
+// The multi-handle is shared between downloads for reusing connections
+static CURLM *shared_mhnd = NULL;
+static SEXP mhnd_sentinel = NULL;
+
+static void cleanup_mhnd(SEXP ignored)
+{
+if(shared_mhnd){
+curl_multi_cleanup(shared_mhnd);
+shared_mhnd = NULL;
+}
+curl_global_cleanup();
+}
+static void rollback_mhnd_sentinel(void* sentinel) {
+// Failed to allocate memory while registering a finalizer,
+// therefore must release the object
+R_ReleaseObject((SEXP)sentinel);
+}
+static CURLM *get_mhnd(void)
+{
+if (!mhnd_sentinel) {
+  SEXP sentinel = PROTECT(R_MakeExternalPtr(NULL, R_NilValue, R_NilValue));
+  R_PreserveObject(sentinel);
+  UNPROTECT(1);
+  // Avoid leaking the sentinel before setting the finalizer
+  RCNTXT cntxt;
+  begincontext(&cntxt, CTXT_CCODE, R_NilValue, R_BaseEnv, R_BaseEnv,
+   R_NilValue, R_NilValue);
+  cntxt.cend = &rollback_mhnd_sentinel;
+  cntxt.cenddata = sentinel;
+  R_RegisterCFinalizerEx(sentinel, cleanup_mhnd, TRUE);
+  // Succeeded, no need to clean up if endcontext() fails allocation
+  mhnd_sentinel = sentinel;
+  cntxt.cend = NULL;
+  endcontext();
+}
+if(!shared_mhnd) {
+  shared_mhnd = curl_multi_init();
+}
+return shared_mhnd;
+}
+
 # if LIBCURL_VERSION_MAJOR < 7 || (LIBCURL_VERSION_MAJOR == 7 && LIBCURL_VERSION_MINOR < 28)
 
 // curl/curl.h includes  and headers it requires.
@@ -565,8 +606,6 @@
if (c->hnd && c->hnd[i])
curl_easy_cleanup(c->hnd[i]);
 }
-if (c->mhnd)
-   curl_multi_cleanup(c->mhnd);
 if (c->headers)
curl_slist_free_all(c->headers);
 
@@ -668,7 +707,7 @@
c.headers = headers = tmp;
 }
 
-CURLM *mhnd = curl_multi_init();
+CURLM *mhnd = get_mhnd();
 if (!mhnd)
error(_("could not create curl handle"));
 c.mhnd = mhnd;


-- 
Best regards,
Ivan



Re: [Rd] [External] Re: Is ALTREP "non-API"?

2024-04-25 Thread Ivan Krylov via R-devel
On Wed, 24 Apr 2024 15:31:39 -0500 (CDT)
luke-tierney--- via R-devel  wrote:

> We would be better off (in my view, not necessarily shared by others
> in R-core) if we could get to a point where:
> 
>  all entry points listed in installed header files can be used in
>  packages, at least with some caveats;
> 
>  the caveats are expressed in a standard way that is searchable,
>  e.g. with a standardized comment syntax at the header file or
>  individual declaration level.

This sounds almost like Doxygen, although the exact syntax used to
denote the entry points and the necessary comments is far from the most
important detail at this point.

> There are some 500 entry points in the R shared library that are in
> the installed headers but not mentioned in WRE. These would need to
> be reviewed and adjusted.

Is there a way for outsiders to help? For example, would it help to
produce the linking graph (package P links to entry points X, Y)? I
understand that an entry point being unpopular doesn't mean it
shouldn't be public (and the other way around), but combined with a
list of entry points that are listed in WRE, such a graph could be
useful to direct effort or estimate impact from interface changes.

-- 
Best regards,
Ivan



Re: [Rd] View() segfaulting ...

2024-04-25 Thread Ivan Krylov via R-devel
On Wed, 24 Apr 2024 19:35:42 -0400
Ben Bolker  wrote:

>  I'm using bleeding-edge R-devel, so maybe my build is weird. Can 
> anyone else reproduce this?
> 
>View() seems to crash on just about anything.

Not for me, sorry.

If you have a sufficiently new processor, you can use `rr` [*] to
capture the crash, set a breakpoint in in_R_X11_dataviewer and rewind,
then set a watchpoint on the stack canary and run the program forward
again:
https://www.redhat.com/en/blog/debugging-stack-protector-failures

If you can't locate the canary, try setting watchpoints on large local
variables. Without `rr`, the procedure is probably the same, but
without rewinding: set a breakpoint in in_R_X11_dataviewer, set some
watchpoints, see if they fire when they shouldn't, start from scratch
if you get past the watchpoints and the process crashes.

I think that either an object file didn't get rebuilt when it
should have, or a shared library used by something downstream from
View() got an ABI-breaking update. If this still reproduces with a clean
rebuild of R, it's definitely worth investigating further, perhaps using
AddressSanitizer. Valgrind may be lacking the information about the
stack canary and thus failing to distinguish between overwriting the
canary and normal access to a stack variable via a pointer.

-- 
Best regards,
Ivan

[*] https://rr-project.org/
Edit distance of one from the domain name of the R project!

Use rr replay -g $EVENT_NUMBER to debug past the initial execve()
from the shell wrapper: https://github.com/rr-debugger/rr/wiki/FAQ



Re: [Rd] Wish: a way to track progress of parallel operations

2024-04-09 Thread Ivan Krylov via R-devel
Dear Henrik (and everyone else):

Here's a patch implementing support for immediateConditions in
'parallel' socket clusters. What do you think?

I've tried to make the feature backwards-compatible in the sense that
an older R starting a newer cluster worker will not pass the flag
enabling condition passing and so will avoid being confused by packets
with type = 'CONDITION'.

In order to propagate the conditions in a timely manner, all 'parallel'
functions that currently use recvData() on individual nodes will have
to switch to calling recvOneData(). I've already adjusted
staticClusterApply(), but e.g. clusterCall() would still postpone
immediateConditions from nodes later in the list (should they appear).

If this is deemed a good way forward, I can prepare a similar patch for
the MPI and socket clusters implemented in the 'snow' package.

-- 
Best regards,
Ivan
Index: src/library/parallel/R/clusterApply.R
===
--- src/library/parallel/R/clusterApply.R	(revision 86373)
+++ src/library/parallel/R/clusterApply.R	(working copy)
@@ -28,8 +28,12 @@
 end <- min(n, start + p - 1L)
 	jobs <- end - start + 1L
 for (i in 1:jobs)
-sendCall(cl[[i]], fun, argfun(start + i - 1L))
-val[start:end] <- lapply(cl[1:jobs], recvResult)
+sendCall(cl[[i]], fun, argfun(start + i - 1L),
+ tag = start + i - 1L)
+for (i in 1:jobs) {
+d <- recvOneResult(cl)
+val[d$tag] <- list(d$value)
+}
 start <- start + jobs
 }
 checkForRemoteErrors(val)
Index: src/library/parallel/R/snow.R
===
--- src/library/parallel/R/snow.R	(revision 86373)
+++ src/library/parallel/R/snow.R	(working copy)
@@ -120,7 +120,8 @@
 rprog = file.path(R.home("bin"), "R"),
 snowlib = .libPaths()[1],
 useRscript = TRUE, # for use by snow clusters
-useXDR = TRUE)
+useXDR = TRUE,
+forward_conditions = TRUE)
 defaultClusterOptions <<- addClusterOptions(emptyenv(), options)
 }
 
Index: src/library/parallel/R/snowSOCK.R
===
--- src/library/parallel/R/snowSOCK.R	(revision 86373)
+++ src/library/parallel/R/snowSOCK.R	(working copy)
@@ -32,6 +32,7 @@
 methods <- getClusterOption("methods", options)
 useXDR <- getClusterOption("useXDR", options)
 homogeneous <- getClusterOption("homogeneous", options)
+forward_conditions <- getClusterOption('forward_conditions', options)
 
 ## build the local command for starting the worker
 env <- paste0("MASTER=", master,
@@ -40,7 +41,8 @@
  " SETUPTIMEOUT=", setup_timeout,
  " TIMEOUT=", timeout,
  " XDR=", useXDR,
- " SETUPSTRATEGY=", setup_strategy)
+ " SETUPSTRATEGY=", setup_strategy,
+ " FORWARDCONDITIONS=", forward_conditions)
 ## Should cmd be run on a worker with R <= 4.0.2,
 ## .workRSOCK will not exist, so fallback to .slaveRSOCK
 arg <- "tryCatch(parallel:::.workRSOCK,error=function(e)parallel:::.slaveRSOCK)()"
@@ -130,17 +132,26 @@
 sendData.SOCKnode <- function(node, data) serialize(data, node$con)
 sendData.SOCK0node <- function(node, data) serialize(data, node$con, xdr = FALSE)
 
-recvData.SOCKnode <- recvData.SOCK0node <- function(node) unserialize(node$con)
+recvData.SOCKnode <- recvData.SOCK0node <- function(node) repeat {
+val <- unserialize(node$con)
+if (val$type != 'CONDITION') return(val)
+signalCondition(val$value)
+}
 
 recvOneData.SOCKcluster <- function(cl)
 {
 socklist <- lapply(cl, function(x) x$con)
 repeat {
-ready <- socketSelect(socklist)
-if (length(ready) > 0) break;
+repeat {
+ready <- socketSelect(socklist)
+if (length(ready) > 0) break;
+}
+n <- which.max(ready) # may need rotation or some such for fairness
+value <- unserialize(socklist[[n]])
+if (value$type != 'CONDITION')
+return(list(node = n, value = value))
+signalCondition(value$value)
 }
-n <- which.max(ready) # may need rotation or some such for fairness
-list(node = n, value = unserialize(socklist[[n]]))
 }
 
 makePSOCKcluster <- function(names, ...)
@@ -349,6 +360,7 @@
 timeout <- 2592000L   # wait 30 days for new cmds before failing
 useXDR <- TRUE# binary serialization
 setup_strategy <- "sequential"
+forward_conditions <- FALSE
 
 for (a in commandArgs(TRUE)) {
 ## Or use strsplit?
@@ -365,6 +377,9 @@
SETUPSTRATEGY = {
setup_strategy <- match.arg(value,
c("sequential", 

Re: [Rd] Bug in out-of-bounds assignment of list object to expression() vector

2024-04-05 Thread Ivan Krylov via R-devel
On Fri, 5 Apr 2024 08:15:20 -0400
June Choe  wrote:

> When assigning a list to an out of bounds index (ex: the next, n+1
> index), it errors the same but now changes the values of the vector
> to NULL:
> 
> ```
> x <- expression(a,b,c)
> x[[4]] <- list() # Error
> x
> #> expression(NULL, NULL, NULL)  
> ```
> 
> Curiously, this behavior disappears if a prior attempt is made at
> assigning to the same index, using a different incompatible object
> that does not share this bug (like a function)

Here's how the problem happens:

1. The call lands in src/main/subassign.c, do_subassign2_dflt().

2. do_subassign2_dflt() calls SubassignTypeFix() to prepare the operand
for the assignment.

3. Since the assignment is "stretching", SubassignTypeFix() calls
EnlargeVector() to provide the space for the assignment.

The bug relies on `x` not being IS_GROWABLE(), which may explain 
why a plain x[[4]] <- list() sometimes doesn't fail.

The future assignment result `x` is now expression(a, b, c, NULL), and
the old `x` is set to expression(NULL, NULL, NULL) by SET_VECTOR_ELT(newx,
i, VECTOR_ELT(x, i)); CLEAR_VECTOR_ELT(x, i); during EnlargeVector().

4. But then the assignment fails, raising the error back in
do_subassign2_dflt(), because the assignment kind is invalid: there is
no way to put data.frames into an expression vector. The new resized
`x` is lost, and the old overwritten `x` stays there.

Not sure what the right way to fix this is. It's desirable to avoid
shallow_duplicate(x) for the overwriting assignments, but then the
sub-assignment must either succeed or leave the operand untouched.
Is there a way to perform the type check before overwriting the operand?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] hand compile; link to MKL fails at BLAS zdotu

2024-03-30 Thread Ivan Krylov via R-devel
On Sat, 30 Mar 2024 20:31:25 +0300
Ivan Krylov via R-devel  wrote:

> It seems to crash inside MKL!

Should have read some more about mkl_gf_lp64 before posting. According
to the Intel forums, it is indeed required in order to work with the
GFortran calling convention, but if you're linking against it, you also
have to add the rest of the linker command line, i.e.:

-lmkl_gf_lp64 -lmkl_core -lmkl_sequential 
-Wl,--no-as-needed -lpthread -lm -ldl

https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/ARPACK-with-MKL-crashes-when-calling-zdotc/m-p/1054316

Maybe it's even documented somewhere, but Intel makes it too annoying
to read their documentation, and they definitely don't mention it in
the link line advisor. There's also the ominous comment saying that

>> you cannot call standard BLAS [c,z]dot[c,u] functions from C/C++
>> because the interface library that is linked is specific for
>> GFortran which has a different calling convention of returning a
>> Complex type and would cause issues

I'm not seeing any calls to [c,z]dot[c,u] from inside R's C code (which
is why R seems to work when running with libmkl_rt.so), and the
respective declarations in R_ext/BLAS.h have an appropriate warning:

>> WARNING!  The next two return a value that may not be compatible
>> between C and Fortran, and even if it is, this might not be the
>> right translation to C.

...so it's likely that everything will keep working.

Indeed, R configured with

--with-blas='-lmkl_gf_lp64 -lmkl_core -lmkl_sequential'
--with-lapack='-lmkl_gf_lp64 -lmkl_core -lmkl_sequential'

seems to work with MKL.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] hand compile; link to MKL fails at BLAS zdotu

2024-03-30 Thread Ivan Krylov via R-devel
On Sat, 30 Mar 2024 10:55:48 +
Ramón Fallon  wrote:

> In contrast to Dirk's solution, I've found R's configure script
> doesn't recognise the update-alternatives system on debian/ubuntu, if
> it's MKL.

It ought to work if configured with --with-blas=-lblas
--with-lapack=-llapack, but, as you found out (and I can confirm), if
libblas.so and liblapack.so already point to MKL, ./configure somehow
fails the test for zdotu and falls back to bundled Rblas and Rlapack.

If you'd like the built R to work with the update-alternatives system,
the workaround that seems to help is to temporarily switch the alternatives
to reference BLAS & LAPACK, configure and build R, and then switch the
alternatives back to MKL.
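On Debian and derivatives, that dance might look roughly like this (the exact alternative names vary between distributions, so treat this as a sketch):

```shell
# Temporarily point the alternatives at the reference BLAS/LAPACK
sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu
sudo update-alternatives --config liblapack.so.3-x86_64-linux-gnu

# Configure and build R against them
./configure --with-blas --with-lapack
make

# Then switch the alternatives back to MKL
```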

> appending "-lmkl_gf_lp64" to the --with-blas option does not help
> (that's suggested by several posts out there).

MKL has an official "link line advisor" on Intel's website,
which may suggest a completely different set of linker options
depending on what it is told. Here's how R's zdotu test always fails
when linking directly with MKL:

# pre-configure some variables
echo '#define HAVE_F77_UNDERSCORE 1' > confdefs.h
FC=gfortran
FFLAGS='-g -Og'
CC=gcc
CFLAGS='-g -Og'
CPPFLAGS=-I/usr/local/include
MAIN_LDFLAGS='-Wl,--export-dynamic -fopenmp'
LDFLAGS='-L/usr/local/lib'
LIBM=-lm
FLIBS=' -lgfortran -lm -lquadmath'
# copied & pasted from the Intel web page
BLAS_LIBS='-lmkl_rt -Wl,--no-as-needed -lpthread -lm -ldl'

# R prepares to call zdotu from Fortran...
cat > conftestf.f <<EOF
  <...>
  if (<...> > 1.0d-10) then
iflag = 1
  else
iflag = 0
  endif
  end
EOF
${FC} ${FFLAGS} -c conftestf.f

# and then call the Fortran subroutine from the C runner...
cat > conftest.c <<EOF
#include <stdlib.h>
#include "confdefs.h"
#ifdef HAVE_F77_UNDERSCORE
# define F77_SYMBOL(x)   x ## _
#else
# define F77_SYMBOL(x)   x
#endif
extern void F77_SYMBOL(test1)(int *iflag);

int main () {
  int iflag;
  F77_SYMBOL(test1)(&iflag);
  exit(iflag);
}
EOF
${CC} ${CPPFLAGS} ${CFLAGS} -c conftest.c

# and then finally link and execute the program
${CC} ${CPPFLAGS} ${CFLAGS} ${LDFLAGS} ${MAIN_LDFLAGS} \
 -o conftest conftest.o conftestf.o \
 ${BLAS_LIBS} ${FLIBS} ${LIBM}
./conftest

It seems to crash inside MKL!

rax=cccd rbx=5590ee102008 rcx=7ffdab2ddb20 
rdx=5590ee102008 
rsi=7ffdab2ddb18 rdi=5590ee10200c rbp=7ffdab2dd910 
rsp=7ffdab2db600 
 r8=5590ee102008  r9=7ffdab2ddb28 r10=7f4086a99178 
r11=7f4086e02490 
r12=5590ee10200c r13=7ffdab2ddb20 r14=5590ee102008 
r15=7ffdab2ddb28 
ip = 7f4086e02a60, sp = 7ffdab2db600 [mkl_blas_zdotu+1488]
ip = 7f4085dc5250, sp = 7ffdab2dd920 [zdotu+256]
ip = 5590ee1011cc, sp = 7ffdab2ddb40 [test1_+91]
ip = 5590ee101167, sp = 7ffdab2ddb70 [main+14]

It's especially strange that R does seem to work if you just
update-alternatives after linking it with the reference BLAS, but
./conftest starts crashing again in the same place. This is with
Debian's MKL version 2020.4.304-4, by the way.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] paths capability FALSE on devel?

2024-03-27 Thread Ivan Krylov via R-devel
On Wed, 27 Mar 2024 11:28:17 +0100
Alexandre Courtiol  wrote:

> after installing R-devel the output of
> grDevices::dev.capabilities()$paths is FALSE, while it is TRUE for R
> 4.3.3

Your system must be missing Cairo development headers, making x11()
fall back to type = 'Xlib':

$ R-devel -q -s -e 'x11(); grDevices::dev.capabilities()$paths'
 [1] TRUE
$ R-devel -q -s -e \
 'x11(type="Xlib"); grDevices::dev.capabilities()$paths'
 [1] FALSE

If that's not the case and capabilities()['cairo'] is TRUE in your
build of R-devel, please show us the sessionInfo() from your build of
R-devel.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wish: a way to track progress of parallel operations

2024-03-26 Thread Ivan Krylov via R-devel
Henrik,

Thank you for taking the time to read and reply to my message!

On Mon, 25 Mar 2024 10:19:38 -0700
Henrik Bengtsson  wrote:

> * Target a solution that works the same regardless whether we run in
> parallel or not, i.e. the code/API should look the same regardless of
> using, say, parallel::parLapply(), parallel::mclapply(), or
> base::lapply(). The solution should also work as-is in other parallel
> frameworks.

You are absolutely right about mclapply(): it suffers from the same
problem where the task running inside it has no reliable mechanism of
reporting progress. Just like on a 'parallel' cluster (which can be
running on top of an R connection, MPI, the 'mirai' package, a server
pretending to be multiple cluster nodes, or something completely
different), there is currently no documented interface for the task to
report any additional data except the result of the computation.

> I argue the end-user should be able to decided whether they want to
> "see" progress updates or not, and the developer should focus on
> where to report on progress, but not how and when.

Agreed. As a package developer, I don't even want to bother calling
setTxtProgressBar(...), but it gets most of the job done at zero
dependency cost, and the users don't complain. The situation could
definitely be improved.

> It is possible to use the existing PSOCK socket connections to send
> such 'immediateCondition':s.

Thanks for pointing me towards ClusterFuture, that's a great hack, and
conditions are a much better fit for progress tracking than callbacks.

It would be even better if 'parallel' clusters could "officially"
handle immediateConditions and re-signal them in the main R session.
Since R-4.4 exports (but does not yet document) sendData, recvData and
recvOneData generics from 'parallel', we are still in a position to
codify and implement the change to the 'parallel' cluster back-end API.

It shouldn't be too hard to document the requirement that recvData() /
recvOneData() must signal immediateConditions arriving from the nodes
and patch the existing cluster types (socket and MPI). Not sure how
hard it will be to implement for 'mirai' clusters.

> I honestly think we could arrive at a solution where base-R proposes
> a very light, yet powerful, progress API that handles all of the
> above. The main task is to come up with a standard API/protocol -
> then the implementation does not matter.

Since you've already given it a lot of thought, which parts of
progressr would you suggest for inclusion into R, besides 'parallel'
clusters and mclapply() forwarding immediateConditions from the worker
processes?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Wish: a way to track progress of parallel operations

2024-03-25 Thread Ivan Krylov via R-devel
Hello R-devel,

A function to be run inside lapply() or one of its friends is trivial
to augment with side effects to show a progress bar. When the code is
intended to be run on a 'parallel' cluster, it generally cannot rely on
its own side effects to report progress.

I've found three approaches to progress bars for parallel processes on
CRAN:

 - Importing 'snow' (not 'parallel') internals like sendCall and
   implementing parallel processing on top of them (doSNOW). This has
   the downside of having to write higher-level code from scratch
   using undocumented interfaces.

 - Splitting the workload into length(cluster)-sized chunks and
   processing them in separate parLapply() calls between updating the
   progress bar (pbapply). This approach trades off parallelism against
   the precision of the progress information: the function has to wait
   until all chunk elements have been processed before updating the
   progress bar and submitting a new portion; dynamic load balancing
   becomes much less efficient.

 - Adding local side effects to the function and detecting them while
   the parallel function is running in a child process (parabar). A
   clever hack, but much harder to extend to distributed clusters.

With recvData and recvOneData becoming exported in R-4.4 [*], another
approach becomes feasible: wrap the cluster object (and all nodes) into
another class, attach the progress callback as an attribute, and let
recvData / recvOneData call it. This makes it possible to give wrapped
cluster objects to unchanged code, but requires knowing the precise
number of chunks that the workload will be split into.

Could it be feasible to add an optional .progress argument after the
ellipsis to parLapply() and its friends? We can require it to be a
function accepting (done_chunk, total_chunks, ...). If not a new
argument, what other interfaces could be used to get accurate progress
information from staticClusterApply and dynamicClusterApply?
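To make the suggestion concrete, usage could look something like this (the `.progress` argument does not exist in 'parallel'; this is the proposed interface, with made-up task names):

```r
## Hypothetical sketch of the proposed .progress argument
pb <- txtProgressBar(max = 1, style = 3)
res <- parLapply(
  cl, inputs, slow_function,
  .progress = function(done_chunks, total_chunks, ...)
    setTxtProgressBar(pb, done_chunks / total_chunks)
)
close(pb)
```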

I understand that the default parLapply() behaviour is not very
amenable to progress tracking, but when running clusterMap(.scheduling
= 'dynamic') spanning multiple hours if not whole days, having progress
information sets the mind at ease.

I would be happy to prepare code and documentation. If there is no time
now, we can return to it after R-4.4 is released.

-- 
Best regards,
Ivan

[*] https://bugs.r-project.org/show_bug.cgi?id=18587

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-17 Thread Ivan Krylov via R-devel
On Fri, 15 Mar 2024 11:24:22 +0100
Martin Maechler  wrote:

> I think just adding
> 
>  removeGeneric('as.data.frame')
> 
> is appropriate here as it is self-explaining and should not leave
> much traces.

Thanks for letting me know! I'll make sure to use removeGeneric() in
similar cases in the future.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-14 Thread Ivan Krylov via R-devel
On Thu, 14 Mar 2024 10:41:54 +0100
Martin Maechler  wrote:

> Anybody trying S7 examples and see if they work w/o producing
> wrong warnings?

It looks like this is not applicable to S7. If I overwrite
as.data.frame with a newly created S7 generic, it fails to dispatch on
existing S3 classes:

new_generic('as.data.frame', 'x')(factor(1))
# Error: Can't find method for `as.data.frame(S3)`.

But there is no need to overwrite the generic, because S7 classes
should work with existing S3 generics:

foo <- new_class('foo', parent = class_double)
method(as.data.frame, foo) <- function(x) structure(
 # this is probably not generally correct
 list(x),
 names = deparse1(substitute(x)),
 row.names = seq_len(length(x)),
 class = 'data.frame'
)
str(as.data.frame(foo(pi)))
# 'data.frame':   1 obs. of  1 variable:
#  $ x:  num 3.14

So I think there is nothing to break, because S7 methods for
as.data.frame will rely on S3 for dispatch.

> > The patch passes make check-devel, but I'm not sure how to safely
> > put setGeneric('as.data.frame'); as.data.frame(factor(1:10)) in a
> > regression test.  
> 
> {What's the danger/problem?  we do have "similar" tests in both
>   src/library/methods/tests/*.R
>   tests/reg-S4.R
> 
>  -- maybe we can discuss bi-laterally  (or here, as you prefer)
> }

This might be educational for other people wanting to add a regression
test to their patch. I see that tests/reg-tests-1e.R is already running
under options(warn = 2), so if I add the following near line 750
("Deprecation of *direct* calls to as.data.frame.")...

# Should not warn for a call from a derivedDefaultMethod to the raw
# S3 method -- implementation detail of S4 dispatch
setGeneric('as.data.frame')
as.data.frame(factor(1))

...then as.data.frame will remain an S4 generic. Should the test then
rm(as.data.frame) and keep going? (Or even keep the S4 generic?) Is
there any hidden state I may be breaking for the rest of the test this
way? The test does pass like this, so this may be worrying about
nothing.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-13 Thread Ivan Krylov via R-devel
On Tue, 12 Mar 2024 12:33:17 -0700
Hervé Pagès  wrote:

> The acrobatics that as.data.frame.factor() is going thru in order to 
> recognize a direct call don't play nice if as.data.frame() is an S4 
> generic:
> 
>      df <- as.data.frame(factor(11:12))
> 
>      suppressPackageStartupMessages(library(BiocGenerics))
>      isGeneric("as.data.frame")
>      # [1] TRUE
> 
>      df <- as.data.frame(factor(11:12))
>      # Warning message:
>      # In as.data.frame.factor(factor(11:12)) :
>      #   Direct call of 'as.data.frame.factor()' is deprecated.

How about something like the following:

Index: src/library/base/R/zzz.R
===
--- src/library/base/R/zzz.R(revision 86109)
+++ src/library/base/R/zzz.R(working copy)
@@ -681,7 +681,14 @@
 bdy <- body(as.data.frame.vector)
 bdy <- bdy[c(1:2, seq_along(bdy)[-1L])] # taking [(1,2,2:n)] to insert at [2]:
 ## deprecation warning only when not called by method dispatch from as.data.frame():
-bdy[[2L]] <- quote(if((sys.nframe() <= 1L || !identical(sys.function(-1L), as.data.frame)))
+bdy[[2L]] <- quote(if((sys.nframe() <= 1L || !(
+   identical(sys.function(-1L), as.data.frame) || (
+   .isMethodsDispatchOn() &&
+   methods::is(sys.function(-1L), 'derivedDefaultMethod') &&
+   identical(
+   sys.function(-1L)@generic,
+   structure('as.data.frame', package = 'base')
+   )))))
 .Deprecated(
     msg = gettextf(
         "Direct call of '%s()' is deprecated.  Use '%s()' or '%s()' instead",

The patch passes make check-devel, but I'm not sure how to safely put
setGeneric('as.data.frame'); as.data.frame(factor(1:10)) in a
regression test.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Never exporting .__global__ and .__suppressForeign__?

2024-03-06 Thread Ivan Krylov via R-devel
Hello,

(Dear Richard, I hope you don't mind being Cc:'d on this thread in
R-devel. This is one of the ways we can prevent similar problems from
happening in the future.)

Sometimes, package authors who use both exportPattern('.') and
utils::globalVariables(...) get confusing WARNINGs about undocumented
exports:
https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010531.html
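A minimal reproduction of that confusion, assuming a package that combines the two (names made up):

```r
## NAMESPACE:
##   exportPattern(".")

## R/globals.R:
utils::globalVariables(c("x", "y"))

## globalVariables() stores its argument in a variable called .__global__
## in the package namespace. The regular expression "." matches that name
## as well, so it gets exported, and R CMD check then warns about an
## undocumented export.
```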

I would like to suggest adding the variables used by
utils::globalVariables and utils::suppressForeignCheck to the list of
things that should never be exported:

Index: src/library/base/R/namespace.R
===
--- src/library/base/R/namespace.R  (revision 86054)
+++ src/library/base/R/namespace.R  (working copy)
@@ -806,7 +806,8 @@
 if (length(exports)) {
 stoplist <- c(".__NAMESPACE__.", ".__S3MethodsTable__.",
   ".packageName", ".First.lib", ".onLoad",
-  ".onAttach", ".conflicts.OK", ".noGenerics")
+  ".onAttach", ".conflicts.OK", ".noGenerics",
+  ".__global__", ".__suppressForeign__")
 exports <- exports[! exports %in% stoplist]
 }
if(lev > 2L) message("--- processing exports for ", dQuote(package))

(Indeed, R CMD check is very careful to only access these variables
using the interface functions in the utils package, so there doesn't
seem to be any code that depends on them being exported, and they
usually aren't.)

Alternatively (or maybe additionally), it may be possible to enhance
the R CMD check diagnostics by checking whether the name of the
undocumented object starts with a dot and asking the user whether it
was intended to be exported. This is not as easy to implement due to
tools:::.check_packages working with the log output from
tools::undoc(), not the object itself. Would a change to
tools:::format.undoc be warranted?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] How to avoid the Markdown code block bug on R Bugzilla

2024-02-27 Thread Ivan Krylov via R-devel
Hello,

There's a rare but annoying bug in Bugzilla 5.1.2...5.3.2+ where a
Markdown code block inside a comment may be replaced by U+F111 or
U+F222, and then the following code blocks may end up being replaced by
the preceding ones. For example, the problem can be seen in PR16158:
https://bugs.r-project.org/show_bug.cgi?id=16158.

Here's how to avoid it:

1. If no code blocks have been already swallowed by Bugzilla, use the
comment preview to make sure yours won't be swallowed either. If you do
see a  or a  instead of your code block in the preview tab, try:
 - starting the comment with an empty line
 - removing the colons from the starting sentence
 - if all else fails, switching Markdown off

2. If you would like to post some code into a bug where this has
already happened, the preview won't be enough. Bugzilla::Markdown has
separate queues for fenced code blocks and indented code blocks, so if
one was swallowed, it may be possible to post the other. Unfortunately,
you won't know whether it'll fail until you post the comment, and by
then it may be a part of the problem. The only safe way to continue is
to switch Markdown off for the comment.

A technical analysis of the bug has been reported upstream,
but it may take a while to get this fixed on the Bugzilla side.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Tcl socket server (tcltk) does not work any more on R 4.3.2

2024-02-21 Thread Ivan Krylov via R-devel
On Wed, 21 Feb 2024 08:01:16 +0100
"webmail.gandi.net"  wrote:

> Since the {tcltk} package was working fine with  "while
> (Tcl_DoOneEvent(TCL_DONT_WAIT) && max_ev) max_ev—;", unless there is
> a clear performance enhancement with "while (i-- &&
> Tcl_ServiceAll())", it would perhaps be wise to revert this back.

I forgot to mention the comment in the new version of the function
explaining the switch:

>> [Tcl_DoOneEvent(TCL_DONT_WAIT)] <...> causes infinite recursion with
>> R handlers that have a re-entrancy guard, when TclSpinLoop is
>> invoked from such a handler (seen with Rhttp server)

The difference between Tcl_ServiceAll() and Tcl_DoOneEvent() is that
the latter calls Tcl_WaitForEvent(). The comments say that it is called
for the side effect of queuing the events detected by select(). The
function can indeed be observed to access the fileHandlers via the
thread-specific data pointer, which contain the file descriptors and
the instructions saying what to do with them.

Without Tcl_WaitForEvent, the only event sources known to Tcl are
RTcl_{setup,check}Proc (which only checks file descriptors owned by R),
Display{Setup,Check}Proc (which seems to be owned by Tk), and
Timer{Setup,Check}Proc (for which there doesn't seem to be any timers
by default).

As far as I understand the problem, while the function
worker_input_handler() from src/modules/internet/Rhttpd.c is running,
TclHandler() might be invoked, causing Tcl_DoOneEvent() to call
RTcl_checkProc() and therefore trying to run worker_input_handler()
again. The Rhttpd handler prevents this and doesn't clear the
condition, which causes the event loop to keep calling it. Is that
correct? Are there easy ways to reproduce the problem?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Tcl socket server (tcltk) does not work any more on R 4.3.2

2024-02-20 Thread Ivan Krylov via R-devel
On Tue, 20 Feb 2024 12:27:35 +0100
"webmail.gandi.net"  wrote:

> When R process #1 is R 4.2.3, it works as expected (whatever version
> of R #2). When R process #1 is R 4.3.2, nothing is sent or received
> through the socket apparently, but no error is issued and process #2
> seems to be able to connect to the socket.

The difference is related to the change in
src/library/tcltk/src/tcltk_unix.c.

In R-4.2.1, the function static void TclSpinLoop(void *data) says:

int max_ev = 100;
/* Tcl_ServiceAll is not enough here, for reasons that escape me */
while (Tcl_DoOneEvent(TCL_DONT_WAIT) && max_ev) max_ev--;

In R-devel, the function instead says:

int i = R_TCL_SPIN_MAX; 
while (i-- && Tcl_ServiceAll())
;

Manually calling Tcl_DoOneEvent(0) from the debugger at this point
makes the Tcl code respond to the connection. Tcl_ServiceAll() seems to
be still not enough. I'll try reading Tcl documentation to investigate
this further.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] certain pipe() use cases not working in r-devel

2024-02-15 Thread Ivan Krylov via R-devel
On Wed, 14 Feb 2024 14:43:12 -0800
Jennifer Bryan  wrote:

> But in r-devel on macOS, this is silent no-op, i.e. "hello, world"
> does not print:
> 
> > R.version.string  
> [1] "R Under development (unstable) (2024-02-13 r85895)"
> > con <- pipe("cat")
> > writeLines("hello, world", con)  

I can reproduce this on 64-bit Linux.

I think that this boils down to problems with cleanup in R_pclose_pg
[*]. The FILE* fp corresponding to the child process pipe is created
using fdopen() in R_popen_pg(), but R_pclose_pg() only performs close()
on the file descriptor returned by fileno(). The FILE* itself is
leaked, and any buffered content waiting to be written out is lost.

One of the last few lines in the strace output before the process
terminates is the standard C library cleaning up the FILE* object and
trying to flush the buffer:

$ strace -f bin/R -q -s \
 -e 'writeLines("hello", x <- pipe("cat")); close(x)'
...skip...
write(5, "hello\n", 6)  = -1 EBADF (Bad file descriptor)
exit_group(0)   = ?
+++ exited with 0 +++

There is a comment saying "see timeout_wait for why not to use fclose",
which I think references a different function, R_pclose_timeout():

>> Do not use fclose, because on Solaris it sets errno to "Invalid
>> seek" when the pipe is already closed (e.g. because of timeout).
>> fclose would not return an error, but it would set errno and the
>> non-zero errno would then be reported by R's "system" function.

(There are no comments about fclose() in timeout_wait() itself.)

Is there a way to work around the errno problem without letting the
FILE* leak?

-- 
Best regards,
Ivan

[*] Introduced in https://bugs.r-project.org/show_bug.cgi?id=17764#c6
to run child processes in a separate process group, safe from
interrupts aimed at R.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Difficult debug

2024-02-07 Thread Ivan Krylov via R-devel
On Wed, 07 Feb 2024 14:01:44 -0600
"Therneau, Terry M., Ph.D. via R-devel"  wrote:

>  > test2 <- mysurv(fit2, pbc2$bili4, p0= 4:0/10, fit2, x0 =50)  
> ==31730== Invalid read of size 8
> ==31730==    at 0x298A07: Rf_allocVector3 (memory.c:2861)
> ==31730==    by 0x299B2C: Rf_allocVector (Rinlinedfuns.h:595)
> ==31730==    by 0x299B2C: R_alloc (memory.c:2330)
> ==31730==    by 0x3243C6: do_which (summary.c:1152)
<...>
> ==31730==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
<...>
>   *** caught segfault ***
> address 0x10, cause 'memory not mapped'

An unrelated allocation function suddenly dereferencing a null pointer
is likely indication of heap corruption. Valgrind may be silent about
it because the C heap (that it knows how to override and track) is still
intact, but the R memory management metadata got corrupted (which looks
like a valid memory access to Valgrind).

An easy solution could be brought by more instrumentation.

R can tell Valgrind to consider some memory accesses invalid if you
configure it using --with-valgrind-instrumentation [*], but I'm not
sure it will be able to trap overwriting GC metadata, so let's set it
aside for now.

If you compile your own R, you can configure it with -fsanitize=address
added to the compiler and linker flags [**]. I'm not sure whether the
bounds checks performed by AddressSanitizer would be sufficient to
catch the problem, but it's worth a try. Instead of compiling R with
sanitizers, it should be also possible to use the container image
docker.io/rocker/r-devel-san.
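For reference, an instrumented build can be configured along these lines (the flags are indicative only; the manual section in [**] has the canonical recipe):

```shell
# Illustrative AddressSanitizer build of R-devel
CC="gcc -fsanitize=address -fno-omit-frame-pointer" \
CFLAGS="-g -O1" \
./configure --without-recommended-packages
make
```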

The hard option is left if no instrumentation lets you pinpoint the
error. Since the first (as far as Valgrind is concerned) memory error
already happens to result in a SIGSEGV, you can run R in a regular
debugger and try to work backwards from the local variables at the
location of the crash. Maybe there's a way to identify the block
containing the pointer that gets overwritten and set a watchpoint on
it for the next run of R. Maybe you can read the overwritten value as
double and guess where the number came from. If your processor is
sufficiently new, you can try `rr`, the time-travelling debugger [***],
to rewind the process execution back to the point where the pointer gets
overwritten.

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/R-exts.html#Using-valgrind

[**]
https://cran.r-project.org/doc/manuals/R-exts.html#Using-Address-Sanitizer

[***]
https://rr-project.org
Judging by the domain name, it's practically designed to fix troublesome
bugs in R packages!

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Advice debugging M1Mac check errors

2024-02-05 Thread Ivan Krylov via R-devel
On Sun, 4 Feb 2024 20:41:51 +0100
Holger Hoefling  wrote:

> I wanted to ask if people have good advice on how to debug M1Mac
> package check errors when you don´t have a Mac?

Apologies for not answering the question you asked, but is this about
hdf5r and problems printing R_xlen_t [*] that appeared in 1.3.8 and you
tried to solve in 1.3.9?

We had a thread about this last November:
https://stat.ethz.ch/pipermail/r-package-devel/2023q4/010123.html

To summarise, there is no single standard C format specifier that can be
used to print R_xlen_t. As an implementation detail, it can be defined
as int or ptrdiff_t (or something completely different in the future),
and ptrdiff_t itself is usually defined as long or long long (or, also,
something completely different on a weirder platform). All three basic
types can have different widths and cause painful stack-related
problems when a mismatch happens.

In R-4.4, there will be a macro R_PRIdXLEN_T defining a compatible
printf specifier. Until then (and for compatibility with R-4.3 and
lower), it's relatively safe to cast to (long long) or (ptrdiff_t) and
then use the corresponding specifier, but that's not 100% future-proof.
Also, mind the warnings that mingw compilers sometimes emit for "new"
printf specifiers even though UCRT is documented to support them.

-- 
Best regards,
Ivan

[*] https://www.stats.ox.ac.uk/pub/bdr/M1mac/hdf5r.out



Re: [Rd] [External] Re: Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-18 Thread Ivan Krylov via R-devel
On Thu, 18 Jan 2024 09:59:31 -0600 (CST)
luke-tier...@uiowa.edu wrote:

> What does 'blow up' mean? If it is anything other than signal a "bad
> binding access" error then it would be good to have more details.

My apologies for not being precise enough. I meant the "bad binding
access" error in all such cases.

-- 
Best regards,
Ivan



Re: [Rd] Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-18 Thread Ivan Krylov via R-devel
On Tue, 16 Jan 2024 14:16:19 -0500
Dipterix Wang  wrote:

> Could you recommend any packages/functions that compute hash such
> that the source references and sexpinfo_struct are ignored? Basically
> a version of `serialize` that convert R objects to raw without
> storing the ancillary source reference and sexpinfo.

I can show how this can be done, but it's not currently on CRAN or even
a well-defined package API. I have adapted a copy of R's serialize()
[*] with the following changes:

 * Function bytecode and flags are ignored:

f <- function() invisible()
depcache:::hash(f, 2) # This is plain FNV1a-64 of serialize() output
# [1] "9b7a1af5468deba4"
.Call(depcache:::C_hash2, f) # This is the new hash
# [1] 91 5f b8 a1 b0 6b cb 40
f() # called once: function gets the MAYBEJIT_MASK flag
depcache:::hash(f, 2)
# [1] "7d30e05546e7a230"
.Call(depcache:::C_hash2, f)
# [1] 91 5f b8 a1 b0 6b cb 40
f() # called twice: function now has bytecode
depcache:::hash(f, 2)
# [1] "2a2cba4150e722b8"
.Call(depcache:::C_hash2, f)
# [1] 91 5f b8 a1 b0 6b cb 40 # new hash stays the same

 * Source references are ignored:

.Call(depcache:::C_hash2, \( ) invisible( ))
# [1] 91 5f b8 a1 b0 6b cb 40 # compare vs. above

# For quoted function definitions, source references have to be handled
# differently 
.Call(depcache:::C_hash2, quote(function(){}))
# [1] 58 0d 44 8e d4 fd 37 6f
.Call(depcache:::C_hash2, quote(\( ){  }))
# [1] 58 0d 44 8e d4 fd 37 6f

 * ALTREP is ignored:

identical(1:10, 1:10+0L)
# [1] TRUE
identical(serialize(1:10, NULL), serialize(1:10+0L, NULL))
# [1] FALSE
identical(
 .Call(depcache:::C_hash2, 1:10),
 .Call(depcache:::C_hash2, 1:10+0L)
)
# [1] TRUE

 * Strings not marked as bytes are encoded into UTF-8:

identical('\uff', iconv('\uff', 'UTF-8', 'latin1'))
# [1] TRUE
identical(
 serialize('\uff', NULL),
 serialize(iconv('\uff', 'UTF-8', 'latin1'), NULL)
)
# [1] FALSE
identical(
 .Call(depcache:::C_hash2, '\uff'),
 .Call(depcache:::C_hash2, iconv('\uff', 'UTF-8', 'latin1'))
)
# [1] TRUE

 * NaNs with different payloads (except NA_numeric_) are replaced by
   R_NaN.

One of the many downsides to the current approach is that we rely on
the non-API entry point getPRIMNAME() in order to hash builtins.
Looking at the source code for identical() is no help here, because it
uses the private PRIMOFFSET macro.

The bitstream being hashed is also, unfortunately, not exactly
compatible with R serialization format version 2: I had to ignore the
LEVELS of the language objects being hashed both because identical()
seems to ignore those and because I was missing multiple private
definitions (e.g. the MAYBEJIT flag) to handle them properly.

Then there's also the problem of immediate bindings [**]: I've seen bits
of vctrs, rstudio, rlang blow up when calling CAR() on SEXP objects that
are not safe to handle this way, but R_expand_binding_value() (used by
serialize()) is again a private function that is not accessible from
packages. identical() won't help here, because it compares reference
objects (which may or may not contain such immediate bindings) by their
pointer values instead of digging down into them.

Dropping the (already violated) requirement to be compatible with R
serialization bitstream will make it possible to simplify the code
further.

Finally:

a <- new.env()
b <- new.env()
a$x <- b$x <- 42
identical(a, b)
# [1] FALSE
.Call(depcache:::C_hash2, a)
# [1] 44 21 f1 36 5d 92 03 1b
.Call(depcache:::C_hash2, b)
# [1] 44 21 f1 36 5d 92 03 1b

...but that's unavoidable when looking at frozen object contents
instead of their live memory layout.

If you're interested, here's the development version of the package:
install.packages('depcache',contriburl='https://aitap.github.io/Rpackages')

-- 
Best regards,
Ivan

[*]
https://github.com/aitap/depcache/blob/serialize_canonical/src/serialize.c

[**]
https://svn.r-project.org/R/trunk/doc/notes/immbnd.md



Re: [Rd] Sys.which() caching path to `which`

2024-01-12 Thread Ivan Krylov via R-devel
On Thu, 11 Jan 2024 09:30:55 +1300
Simon Urbanek  wrote:

> That said, WHICH is a mess - it may make sense to switch to the
> command -v built-in which is part of POSIX (where available - which
> is almost everywhere today) which would not require an external tool

This is a bit tricky to implement. I've prepared the patch at the end
of this e-mail, tested it on GNU/Linux and tried to test on OpenBSD [*]
(I cannot test on a Mac), but then I realised one crucial detail:
unlike `which`, `command -v` returns names of shell builtins if
something is both an executable and a builtin. So for things like `[`,
Sys.which would behave differently if changed to use command -v:

$ sh -c 'which ['
/usr/bin/[
$ sh -c 'command -v ['
[

R checks the returned string with file.exists(), so the new
Sys.which('[') returns an empty string instead of /usr/bin/[. That's
probably undesirable, isn't it?

Index: configure
===
--- configure   (revision 85802)
+++ configure   (working copy)
@@ -949,7 +949,6 @@
 PDFTEX
 TEX
 PAGER
-WHICH
 SED
 INSTALL_DATA
 INSTALL_SCRIPT
@@ -5390,66 +5389,6 @@
 done
 test -n "$SED" || SED="/bin/sed"
 
-
-## 'which' is not POSIX, and might be a shell builtin or alias
-##  (but should not be in 'sh')
-for ac_prog in which
-do
-  # Extract the first word of "$ac_prog", so it can be a program name with args.
-set dummy $ac_prog; ac_word=$2
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-printf %s "checking for $ac_word... " >&6; }
-if test ${ac_cv_path_WHICH+y}
-then :
-  printf %s "(cached) " >&6
-else $as_nop
-  case $WHICH in
-  [\\/]* | ?:[\\/]*)
-  ac_cv_path_WHICH="$WHICH" # Let the user override the test with a path.
-  ;;
-  *)
-  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  case $as_dir in #(((
-'') as_dir=./ ;;
-*/) ;;
-*) as_dir=$as_dir/ ;;
-  esac
-for ac_exec_ext in '' $ac_executable_extensions; do
-  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
-ac_cv_path_WHICH="$as_dir$ac_word$ac_exec_ext"
-printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
-break 2
-  fi
-done
-  done
-IFS=$as_save_IFS
-
-  ;;
-esac
-fi
-WHICH=$ac_cv_path_WHICH
-if test -n "$WHICH"; then
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $WHICH" >&5
-printf "%s\n" "$WHICH" >&6; }
-else
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-fi
-
-
-  test -n "$WHICH" && break
-done
-test -n "$WHICH" || WHICH="which"
-
-if test "${WHICH}" = which ; then
-  ## needed to build and run R
-  ## ends up hard-coded in the utils package
-  as_fn_error $? "which is required but missing" "$LINENO" 5
-fi
-
 ## Make
 : ${MAKE=make}
 
Index: configure.ac
===
--- configure.ac   (revision 85802)
+++ configure.ac   (working copy)
@@ -680,15 +680,6 @@
 ## we would like a POSIX sed, and need one on Solaris
 AC_PATH_PROGS(SED, sed, /bin/sed, [/usr/xpg4/bin:$PATH])
 
-## 'which' is not POSIX, and might be a shell builtin or alias
-##  (but should not be in 'sh')
-AC_PATH_PROGS(WHICH, which, which)
-if test "${WHICH}" = which ; then
-  ## needed to build and run R
-  ## ends up hard-coded in the utils package
-  AC_MSG_ERROR([[which is required but missing]])
-fi
-
 ## Make
 : ${MAKE=make}
 AC_SUBST(MAKE)
Index: src/library/base/Makefile.in
===
--- src/library/base/Makefile.in   (revision 85802)
+++ src/library/base/Makefile.in   (working copy)
@@ -28,7 +28,7 @@
 all: Makefile DESCRIPTION
@$(ECHO) "building package '$(pkg)'"
@$(MKINSTALLDIRS) $(top_builddir)/library/$(pkg)
-   @WHICH="@WHICH@" $(MAKE) mkRbase mkdesc2 mkdemos2
+   @$(MAKE) mkRbase mkdesc2 mkdemos2
@$(INSTALL_DATA) $(srcdir)/inst/CITATION $(top_builddir)/library/$(pkg)
 
 include $(top_srcdir)/share/make/basepkg.mk
@@ -45,12 +45,12 @@
 mkR: mkRbase
 
 Rsimple:
-   @WHICH="@WHICH@" $(MAKE) mkRbase mkRsimple
+   @$(MAKE) mkRbase mkRsimple
 
 ## Remove files to allow this to be done repeatedly
 Rlazy:
-@rm -f  $(top_builddir)/library/$(pkg)/R/$(pkg)*
-   @WHICH="@WHICH@" $(MAKE) mkRbase
+   @$(MAKE) mkRbase
@cat $(srcdir)/makebasedb.R | \
  R_DEFAULT_PACKAGES=NULL LC_ALL=C $(R_EXE) > /dev/null
@$(INSTALL_DATA) $(srcdir)/baseloader.R \
@@ -57,4 +57,4 @@
  $(top_builddir)/library/$(pkg)/R/$(pkg)
 
 Rlazycomp:
-   @WHICH="@WHICH@" $(MAKE) mkRbase mklazycomp
+   @$(MAKE) mkRbase mklazycomp
Index: src/library/base/R/unix/system.unix.R
===
--- src/library/base/R/unix/system.unix.R   (revision 85802)
+++ src/library/base/R/unix/system.unix.R   (working copy)
@@ -114,23 +114,14 @@
 Sys.which <- function(names)
 {
 res <- 

Re: [Rd] Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-12 Thread Ivan Krylov via R-devel
On Fri, 12 Jan 2024 00:11:45 -0500
Dipterix Wang  wrote:

> I wonder how hard it would be to have options to discard source when
> serializing R objects? 

> Currently my analyses heavily depend on digest function to generate
> file caches and automatically schedule pipelines (to update cache)
> when changes are detected.

Source references may be the main problem here, but not the only one.
There are also string encodings and function bytecode (which may or may
not be present and probably changes between R versions). I've been
collecting the ways that the objects that are identical() to each other
can serialize() differently in my package 'depcache'; I'm sure I missed
a few.

Admittedly, string encodings are less important nowadays (except on
older Windows and weirdly set up Unix-like systems). Thankfully, the
digest package already knows to skip the serialization header (which
contains the current version of R).

serialize() only knows about basic types [*], and source references are
implemented on top of these as objects of class 'srcref'. Sometimes
they are attached as attributes to other objects, other times (e.g. in
quote(function(){}), [**]) just sitting there as arguments to a call.

Sometimes you can hash the output of deparse(x) instead of serialize(x)
[***]. Text representations aren't without their own problems (e.g.
IEEE floating-point numbers not being representable as decimal
fractions), but at least deparsing both ignores the source references
and punts the encoding problem to the abstraction layer above it:
deparse() is the same for both '\uff' and iconv('\uff', 'UTF-8',
'latin1'): just "ÿ".

Unfortunately, this doesn't solve the environment problem. For these,
you really need a way to canonicalize the reference-semantics objects
before serializing them without changing the originals, even in cases
like a <- new.env(); b <- new.env(); a$x <- b; b$x <- a. I'm not sure
that reference hooks can help with that. In order to implement it
properly, the fixup process will have to rely on global state and keep
weak references to the environments it visits and creates shadow copies
of.

I think it's not impossible to implement
serialize_to_canonical_representation() for an R package, but it will
be a lot of work to decide which parts are canonical and which should
be discarded.

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/R-ints.html#Serialization-Formats

[**]
https://bugs.r-project.org/show_bug.cgi?id=18638

[***]
https://stat.ethz.ch/pipermail/r-devel/2023-March/082505.html



Re: [Rd] using Paraview "in-situ" with R?

2024-01-09 Thread Ivan Krylov via R-devel
On Tue, 9 Jan 2024 14:20:17 +
Mike Marchywka  wrote:

> it seems like an excellent tool to interface to R allowing
> visualization without a bunch of temp files or 
> 
> Is anyone aware of anyone doing this interface or reasons its  a
> boondoggle?

This sounds like it's better suited for r-package-de...@r-project.org,
not R-devel itself.

In theory, nothing should prevent you from writing C++ code interfacing
with ParaView (via its "adios" streaming library) and with R. The Rcpp
package will likely help you bring the semantics of the two languages
closer together. (Memory allocation and error handling are the two
major topics where R and C++ especially disagree.)

On the R side, make an object with reference semantics (i.e. an
external pointer) and use callbacks to update it with new information
while R code is running. On the R extension side, translate these
callbacks into necessary calls to the adios library to transfer the
data to ParaView.

For more information, see Writing R Extensions at
<https://cran.r-project.org/doc/manuals/R-exts.html> and Rcpp
documentation at <https://www.rcpp.org/>.

-- 
Best regards,
Ivan
