Re: [Rd] using Paraview "in-situ" with R?

2024-01-09 Thread Ivan Krylov via R-devel
On Tue, 9 Jan 2024 14:20:17 +
Mike Marchywka  wrote:

> it seems like an excellent tool to interface to R allowing
> visualization without a bunch of temp files or 
> 
> Is anyone aware of anyone doing this interface, or reasons it's a
> boondoggle?

This sounds like it's better suited for r-package-de...@r-project.org,
not R-devel itself.

In theory, nothing should prevent you from writing C++ code interfacing
with ParaView (via its "adios" streaming library) and with R. The Rcpp
package will likely help you bring the semantics of the two languages
closer together. (Memory allocation and error handling are the two
major topics where R and C++ especially disagree.)

On the R side, make an object with reference semantics (i.e. an
external pointer) and use callbacks to update it with new information
while R code is running. On the R extension side, translate these
callbacks into necessary calls to the adios library to transfer the
data to ParaView.

For more information, see Writing R Extensions at
 and Rcpp
documentation at .

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-12 Thread Ivan Krylov via R-devel
On Fri, 12 Jan 2024 00:11:45 -0500
Dipterix Wang  wrote:

> I wonder how hard it would be to have options to discard source when
> serializing R objects? 

> Currently my analyses heavily depend on digest function to generate
> file caches and automatically schedule pipelines (to update cache)
> when changes are detected.

Source references may be the main problem here, but not the only one.
There are also string encodings and function bytecode (which may or may
not be present and probably changes between R versions). I've been
collecting the ways in which objects that are identical() to each other
can serialize() differently in my package 'depcache'; I'm sure I missed
a few.

Admittedly, string encodings are less important nowadays (except on
older Windows and weirdly set up Unix-like systems). Thankfully, the
digest package already knows to skip the serialization header (which
contains the current version of R).

serialize() only knows about basic types [*], and source references are
implemented on top of these as objects of class 'srcref'. Sometimes
they are attached as attributes to other objects, other times (e.g. in
quote(function(){}) [**]) just sitting there as arguments to a call.

Sometimes you can hash the output of deparse(x) instead of serialize(x)
[***]. Text representations aren't without their own problems (e.g.
IEEE floating-point numbers not being representable as decimal
fractions), but at least deparsing both ignores the source references
and punts the encoding problem to the abstraction layer above it:
deparse() is the same for both '\uff' and iconv('\uff', 'UTF-8',
'latin1'): just "ÿ".

Unfortunately, this doesn't solve the environment problem. For these,
you really need a way to canonicalize the reference-semantics objects
before serializing them without changing the originals, even in cases
like a <- new.env(); b <- new.env(); a$x <- b; b$x <- a. I'm not sure
that reference hooks can help with that. In order to implement it
properly, the fixup process will have to rely on global state and keep
weak references to the environments it visits and creates shadow copies
of.

I think it's not impossible to implement
serialize_to_canonical_representation() for an R package, but it will
be a lot of work to decide which parts are canonical and which should
be discarded.

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/R-ints.html#Serialization-Formats

[**]
https://bugs.r-project.org/show_bug.cgi?id=18638

[***]
https://stat.ethz.ch/pipermail/r-devel/2023-March/082505.html



Re: [Rd] Sys.which() caching path to `which`

2024-01-12 Thread Ivan Krylov via R-devel
On Thu, 11 Jan 2024 09:30:55 +1300
Simon Urbanek  wrote:

> That said, WHICH is a mess - it may make sense to switch to the
> command -v built-in which is part of POSIX (where available - which
> is almost everywhere today) which would not require an external tool

This is a bit tricky to implement. I've prepared the patch at the end
of this e-mail, tested it on GNU/Linux and tried to test on OpenBSD [*]
(I cannot test on a Mac), but then I realised one crucial detail:
unlike `which`, `command -v` returns names of shell builtins if
something is both an executable and a builtin. So for things like `[`,
Sys.which would behave differently if changed to use command -v:

$ sh -c 'which ['
/usr/bin/[
$ sh -c 'command -v ['
[

R checks the returned string with file.exists(), so the new
Sys.which('[') returns an empty string instead of /usr/bin/[. That's
probably undesirable, isn't it?
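One conceivable workaround — a sketch, not a proposed patch — is to accept the `command -v` answer only when it is an absolute path, and otherwise scan $PATH manually, which is roughly what `which` does:

```shell
# If `command -v` reports a builtin (a non-absolute result, as for `[`),
# fall back to searching $PATH for an executable file ourselves.
resolve_path() {
  p=$(command -v "$1" 2>/dev/null) || return 1
  case $p in
    /*) printf '%s\n' "$p" ;;
    *)
      oldifs=$IFS; IFS=:
      for d in $PATH; do
        if [ -f "${d:-.}/$1" ] && [ -x "${d:-.}/$1" ]; then
          printf '%s\n' "${d:-.}/$1"
          IFS=$oldifs
          return 0
        fi
      done
      IFS=$oldifs
      return 1
      ;;
  esac
}

resolve_path sh    # prints an absolute path, e.g. /bin/sh
```

This keeps the POSIX-only property while restoring the on-disk answer for names like `[`, at the cost of reimplementing the PATH walk.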

Index: configure
===
--- configure   (revision 85802)
+++ configure   (working copy)
@@ -949,7 +949,6 @@
 PDFTEX
 TEX
 PAGER
-WHICH
 SED
 INSTALL_DATA
 INSTALL_SCRIPT
@@ -5390,66 +5389,6 @@
 done
 test -n "$SED" || SED="/bin/sed"
 
-
-## 'which' is not POSIX, and might be a shell builtin or alias
-##  (but should not be in 'sh')
-for ac_prog in which
-do
-  # Extract the first word of "$ac_prog", so it can be a program name with args.
-set dummy $ac_prog; ac_word=$2
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-printf %s "checking for $ac_word... " >&6; }
-if test ${ac_cv_path_WHICH+y}
-then :
-  printf %s "(cached) " >&6
-else $as_nop
-  case $WHICH in
-  [\\/]* | ?:[\\/]*)
-  ac_cv_path_WHICH="$WHICH" # Let the user override the test with a path.
-  ;;
-  *)
-  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  case $as_dir in #(((
-'') as_dir=./ ;;
-*/) ;;
-*) as_dir=$as_dir/ ;;
-  esac
-for ac_exec_ext in '' $ac_executable_extensions; do
-  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
-ac_cv_path_WHICH="$as_dir$ac_word$ac_exec_ext"
-printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
-break 2
-  fi
-done
-  done
-IFS=$as_save_IFS
-
-  ;;
-esac
-fi
-WHICH=$ac_cv_path_WHICH
-if test -n "$WHICH"; then
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $WHICH" >&5
-printf "%s\n" "$WHICH" >&6; }
-else
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-fi
-
-
-  test -n "$WHICH" && break
-done
-test -n "$WHICH" || WHICH="which"
-
-if test "${WHICH}" = which ; then
-  ## needed to build and run R
-  ## ends up hard-coded in the utils package
-  as_fn_error $? "which is required but missing" "$LINENO" 5
-fi
-
 ## Make
 : ${MAKE=make}
 
Index: configure.ac
===
--- configure.ac(revision 85802)
+++ configure.ac(working copy)
@@ -680,15 +680,6 @@
 ## we would like a POSIX sed, and need one on Solaris
 AC_PATH_PROGS(SED, sed, /bin/sed, [/usr/xpg4/bin:$PATH])
 
-## 'which' is not POSIX, and might be a shell builtin or alias
-##  (but should not be in 'sh')
-AC_PATH_PROGS(WHICH, which, which)
-if test "${WHICH}" = which ; then
-  ## needed to build and run R
-  ## ends up hard-coded in the utils package
-  AC_MSG_ERROR([[which is required but missing]])
-fi
-
 ## Make
 : ${MAKE=make}
 AC_SUBST(MAKE)
Index: src/library/base/Makefile.in
===
--- src/library/base/Makefile.in(revision 85802)
+++ src/library/base/Makefile.in(working copy)
@@ -28,7 +28,7 @@
 all: Makefile DESCRIPTION
@$(ECHO) "building package '$(pkg)'"
@$(MKINSTALLDIRS) $(top_builddir)/library/$(pkg)
-   @WHICH="@WHICH@" $(MAKE) mkRbase mkdesc2 mkdemos2
+   @$(MAKE) mkRbase mkdesc2 mkdemos2
@$(INSTALL_DATA) $(srcdir)/inst/CITATION $(top_builddir)/library/$(pkg)
 
 include $(top_srcdir)/share/make/basepkg.mk
@@ -45,12 +45,12 @@
 mkR: mkRbase
 
 Rsimple:
-   @WHICH="@WHICH@" $(MAKE) mkRbase mkRsimple
+   @$(MAKE) mkRbase mkRsimple
 
 ## Remove files to allow this to be done repeatedly
 Rlazy:
-@rm -f  $(top_builddir)/library/$(pkg)/R/$(pkg)*
-   @WHICH="@WHICH@" $(MAKE) mkRbase
+   @$(MAKE) mkRbase
@cat $(srcdir)/makebasedb.R | \
  R_DEFAULT_PACKAGES=NULL LC_ALL=C $(R_EXE) > /dev/null
@$(INSTALL_DATA) $(srcdir)/baseloader.R \
@@ -57,4 +57,4 @@
  $(top_builddir)/library/$(pkg)/R/$(pkg)
 
 Rlazycomp:
-   @WHICH="@WHICH@" $(MAKE) mkRbase mklazycomp
+   @$(MAKE) mkRbase mklazycomp
Index: src/library/base/R/unix/system.unix.R
===
--- src/library/base/R/unix/system.unix.R   (revision 85802)
+++ src/library/base/R/unix/system.unix.R   (working copy)
@@ -114,23 +114,14 @@
 Sys.which <- function(names)
 {
 res <- characte

Re: [Rd] Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-18 Thread Ivan Krylov via R-devel
On Tue, 16 Jan 2024 14:16:19 -0500
Dipterix Wang  wrote:

> Could you recommend any packages/functions that compute hash such
> that the source references and sexpinfo_struct are ignored? Basically
> a version of `serialize` that convert R objects to raw without
> storing the ancillary source reference and sexpinfo.

I can show how this can be done, but it's not currently on CRAN or even
a well-defined package API. I have adapted a copy of R's serialize()
[*] with the following changes:

 * Function bytecode and flags are ignored:

f <- function() invisible()
depcache:::hash(f, 2) # This is plain FNV1a-64 of serialize() output
# [1] "9b7a1af5468deba4"
.Call(depcache:::C_hash2, f) # This is the new hash
# [1] 91 5f b8 a1 b0 6b cb 40
f() # called once: function gets the MAYBEJIT_MASK flag
depcache:::hash(f, 2)
# [1] "7d30e05546e7a230"
.Call(depcache:::C_hash2, f)
# [1] 91 5f b8 a1 b0 6b cb 40
f() # called twice: function now has bytecode
depcache:::hash(f, 2)
# [1] "2a2cba4150e722b8"
.Call(depcache:::C_hash2, f)
# [1] 91 5f b8 a1 b0 6b cb 40 # new hash stays the same

 * Source references are ignored:

.Call(depcache:::C_hash2, \( ) invisible( ))
# [1] 91 5f b8 a1 b0 6b cb 40 # compare vs. above

# For quoted function definitions, source references have to be handled
# differently 
.Call(depcache:::C_hash2, quote(function(){}))
# [1] 58 0d 44 8e d4 fd 37 6f
.Call(depcache:::C_hash2, quote(\( ){  }))
# [1] 58 0d 44 8e d4 fd 37 6f

 * ALTREP is ignored:

identical(1:10, 1:10+0L)
# [1] TRUE
identical(serialize(1:10, NULL), serialize(1:10+0L, NULL))
# [1] FALSE
identical(
 .Call(depcache:::C_hash2, 1:10),
 .Call(depcache:::C_hash2, 1:10+0L)
)
# [1] TRUE

 * Strings not marked as bytes are encoded into UTF-8:

identical('\uff', iconv('\uff', 'UTF-8', 'latin1'))
# [1] TRUE
identical(
 serialize('\uff', NULL),
 serialize(iconv('\uff', 'UTF-8', 'latin1'), NULL)
)
# [1] FALSE
identical(
 .Call(depcache:::C_hash2, '\uff'),
 .Call(depcache:::C_hash2, iconv('\uff', 'UTF-8', 'latin1'))
)
# [1] TRUE

 * NaNs with different payloads (except NA_numeric_) are replaced by
   R_NaN.

One of the many downsides to the current approach is that we rely on
the non-API entry point getPRIMNAME() in order to hash builtins.
Looking at the source code for identical() is no help here, because it
uses the private PRIMOFFSET macro.

The bitstream being hashed is also, unfortunately, not exactly
compatible with R serialization format version 2: I had to ignore the
LEVELS of the language objects being hashed both because identical()
seems to ignore those and because I was missing multiple private
definitions (e.g. the MAYBEJIT flag) to handle them properly.

Then there's also the problem of immediate bindings [**]: I've seen bits
of vctrs, rstudio, rlang blow up when calling CAR() on SEXP objects that
are not safe to handle this way, but R_expand_binding_value() (used by
serialize()) is again a private function that is not accessible from
packages. identical() won't help here, because it compares reference
objects (which may or may not contain such immediate bindings) by their
pointer values instead of digging down into them.

Dropping the (already violated) requirement to be compatible with R
serialization bitstream will make it possible to simplify the code
further.

Finally:

a <- new.env()
b <- new.env()
a$x <- b$x <- 42
identical(a, b)
# [1] FALSE
.Call(depcache:::C_hash2, a)
# [1] 44 21 f1 36 5d 92 03 1b
.Call(depcache:::C_hash2, b)
# [1] 44 21 f1 36 5d 92 03 1b

...but that's unavoidable when looking at frozen object contents
instead of their live memory layout.

If you're interested, here's the development version of the package:
install.packages('depcache',contriburl='https://aitap.github.io/Rpackages')

-- 
Best regards,
Ivan

[*]
https://github.com/aitap/depcache/blob/serialize_canonical/src/serialize.c

[**]
https://svn.r-project.org/R/trunk/doc/notes/immbnd.md



Re: [Rd] [External] Re: Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-18 Thread Ivan Krylov via R-devel
On Thu, 18 Jan 2024 09:59:31 -0600 (CST)
luke-tier...@uiowa.edu wrote:

> What does 'blow up' mean? If it is anything other than signal a "bad
> binding access" error then it would be good to have more details.

My apologies for not being precise enough. I meant the "bad binding
access" error in all such cases.

-- 
Best regards,
Ivan



Re: [Rd] Advice debugging M1Mac check errors

2024-02-05 Thread Ivan Krylov via R-devel
On Sun, 4 Feb 2024 20:41:51 +0100
Holger Hoefling  wrote:

> I wanted to ask if people have good advice on how to debug M1Mac
> package check errors when you don´t have a Mac?

Apologies for not answering the question you asked, but is this about
hdf5r and problems printing R_xlen_t [*] that appeared in 1.3.8 and you
tried to solve in 1.3.9?

We had a thread about this last November:
https://stat.ethz.ch/pipermail/r-package-devel/2023q4/010123.html

To summarise, there is no single standard C format specifier that can be
used to print R_xlen_t. As an implementation detail, it can be defined
as int or ptrdiff_t (or something completely different in the future),
and ptrdiff_t itself is usually defined as long or long long (or, also,
something completely different on a weirder platform). All three basic
types can have different widths and cause painful stack-related
problems when a mismatch happens.

In R-4.4, there will be a macro R_PRIdXLEN_T defining a compatible
printf specifier. Until then (and for compatibility with R-4.3 and
lower), it's relatively safe to cast to (long long) or (ptrdiff_t) and
then use the corresponding specifier, but that's not 100% future-proof.
Also, mind the warnings that MinGW compilers sometimes emit for "new"
printf specifiers even though UCRT is documented to support them.

-- 
Best regards,
Ivan

[*] https://www.stats.ox.ac.uk/pub/bdr/M1mac/hdf5r.out



Re: [Rd] Difficult debug

2024-02-07 Thread Ivan Krylov via R-devel
On Wed, 07 Feb 2024 14:01:44 -0600
"Therneau, Terry M., Ph.D. via R-devel"  wrote:

>  > test2 <- mysurv(fit2, pbc2$bili4, p0= 4:0/10, fit2, x0 =50)  
> ==31730== Invalid read of size 8
> ==31730==    at 0x298A07: Rf_allocVector3 (memory.c:2861)
> ==31730==    by 0x299B2C: Rf_allocVector (Rinlinedfuns.h:595)
> ==31730==    by 0x299B2C: R_alloc (memory.c:2330)
> ==31730==    by 0x3243C6: do_which (summary.c:1152)
<...>
> ==31730==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
<...>
>   *** caught segfault ***
> address 0x10, cause 'memory not mapped'

An unrelated allocation function suddenly dereferencing a null pointer
is a likely indication of heap corruption. Valgrind may be silent about
it because the C heap (that it knows how to override and track) is still
intact, but the R memory management metadata got corrupted (which looks
like a valid memory access to Valgrind).

More instrumentation could make the culprit much easier to spot.

R can tell Valgrind to consider some memory accesses invalid if you
configure it using --with-valgrind-instrumentation [*], but I'm not
sure it will be able to trap overwriting GC metadata, so let's set it
aside for now.

If you compile your own R, you can configure it with -fsanitize=address
added to the compiler and linker flags [**]. I'm not sure whether the
bounds checks performed by AddressSanitizer would be sufficient to
catch the problem, but it's worth a try. Instead of compiling R with
sanitizers, it should be also possible to use the container image
docker.io/rocker/r-devel-san.

The hard option is left if no instrumentation lets you pinpoint the
error. Since the first (as far as Valgrind is concerned) memory error
already happens to result in a SIGSEGV, you can run R in a regular
debugger and try to work backwards from the local variables at the
location of the crash. Maybe there's a way to identify the block
containing the pointer that gets overwritten and set a watchpoint on
it for the next run of R. Maybe you can read the overwritten value as
double and guess where the number came from. If your processor is
sufficiently new, you can try `rr`, the time-travelling debugger [***],
to rewind the process execution back to the point where the pointer gets
overwritten.

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/R-exts.html#Using-valgrind

[**]
https://cran.r-project.org/doc/manuals/R-exts.html#Using-Address-Sanitizer

[***]
https://rr-project.org
Judging by the domain name, it's practically designed to fix troublesome
bugs in R packages!



Re: [Rd] certain pipe() use cases not working in r-devel

2024-02-15 Thread Ivan Krylov via R-devel
On Wed, 14 Feb 2024 14:43:12 -0800
Jennifer Bryan  wrote:

> But in r-devel on macOS, this is a silent no-op, i.e. "hello, world"
> does not print:
> 
> > R.version.string  
> [1] "R Under development (unstable) (2024-02-13 r85895)"
> > con <- pipe("cat")
> > writeLines("hello, world", con)  

I can reproduce this on 64-bit Linux.

I think that this boils down to problems with cleanup in R_pclose_pg
[*]. The FILE* fp corresponding to the child process pipe is created
using fdopen() in R_popen_pg(), but R_pclose_pg() only performs close()
on the file descriptor returned by fileno(). The FILE* itself is
leaked, and any buffered content waiting to be written out is lost.

One of the last few lines in the strace output before the process
terminates is the standard C library cleaning up the FILE* object and
trying to flush the buffer:

$ strace -f bin/R -q -s \
 -e 'writeLines("hello", x <- pipe("cat")); close(x)'
...skip...
write(5, "hello\n", 6)  = -1 EBADF (Bad file descriptor)
exit_group(0)   = ?
+++ exited with 0 +++

There is a comment saying "see timeout_wait for why not to use fclose",
which I think references a different function, R_pclose_timeout():

>> Do not use fclose, because on Solaris it sets errno to "Invalid
>> seek" when the pipe is already closed (e.g. because of timeout).
>> fclose would not return an error, but it would set errno and the
>> non-zero errno would then be reported by R's "system" function.

(There are no comments about fclose() in timeout_wait() itself.)

Is there a way to work around the errno problem without letting the
FILE* leak?

-- 
Best regards,
Ivan

[*] Introduced in https://bugs.r-project.org/show_bug.cgi?id=17764#c6
to run child processes in a separate process group, safe from
interrupts aimed at R.



Re: [Rd] Tcl socket server (tcltk) does not work any more on R 4.3.2

2024-02-20 Thread Ivan Krylov via R-devel
On Tue, 20 Feb 2024 12:27:35 +0100
"webmail.gandi.net"  wrote:

> When R process #1 is R 4.2.3, it works as expected (whatever version
> of R #2). When R process #1 is R 4.3.2, nothing is sent or received
> through the socket apparently, but no error is issued and process #2
> seems to be able to connect to the socket.

The difference is related to the change in
src/library/tcltk/src/tcltk_unix.c.

In R-4.2.1, the function static void TclSpinLoop(void *data) says:

int max_ev = 100;
/* Tcl_ServiceAll is not enough here, for reasons that escape me */
while (Tcl_DoOneEvent(TCL_DONT_WAIT) && max_ev) max_ev--;

In R-devel, the function instead says:

int i = R_TCL_SPIN_MAX; 
while (i-- && Tcl_ServiceAll())
;

Manually calling Tcl_DoOneEvent(0) from the debugger at this point
makes the Tcl code respond to the connection. Tcl_ServiceAll() seems to
be still not enough. I'll try reading Tcl documentation to investigate
this further.

-- 
Best regards,
Ivan



Re: [Rd] Tcl socket server (tcltk) does not work any more on R 4.3.2

2024-02-21 Thread Ivan Krylov via R-devel
On Wed, 21 Feb 2024 08:01:16 +0100
"webmail.gandi.net"  wrote:

> Since the {tcltk} package was working fine with "while
> (Tcl_DoOneEvent(TCL_DONT_WAIT) && max_ev) max_ev--;", unless there is
> a clear performance enhancement with "while (i-- &&
> Tcl_ServiceAll())", it would perhaps be wise to revert this back.

I forgot to mention the comment in the new version of the function
explaining the switch:

>> [Tcl_DoOneEvent(TCL_DONT_WAIT)] <...> causes infinite recursion with
>> R handlers that have a re-entrancy guard, when TclSpinLoop is
>> invoked from such a handler (seen with Rhttp server)

The difference between Tcl_ServiceAll() and Tcl_DoOneEvent() is that
the latter calls Tcl_WaitForEvent(). The comments say that it is called
for the side effect of queuing the events detected by select(). The
function can indeed be observed to access the fileHandlers via the
thread-specific data pointer, which contain the file descriptors and
the instructions saying what to do with them.

Without Tcl_WaitForEvent, the only event sources known to Tcl are
RTcl_{setup,check}Proc (which only checks file descriptors owned by R),
Display{Setup,Check}Proc (which seems to be owned by Tk), and
Timer{Setup,Check}Proc (for which there doesn't seem to be any timers
by default).

As far as I understand the problem, while the function
worker_input_handler() from src/modules/internet/Rhttpd.c is running,
TclHandler() might be invoked, causing Tcl_DoOneEvent() to call
RTcl_checkProc() and therefore trying to run worker_input_handler()
again. The Rhttpd handler prevents this and doesn't clear the
condition, which causes the event loop to keep calling it. Is that
correct? Are there easy ways to reproduce the problem?

-- 
Best regards,
Ivan



[Rd] How to avoid the Markdown code block bug on R Bugzilla

2024-02-27 Thread Ivan Krylov via R-devel
Hello,

There's a rare but annoying bug in Bugzilla 5.1.2...5.3.2+ where a
Markdown code block inside a comment may be replaced by U+F111 or
U+F222, and then the following code blocks may end up being replaced by
the preceding ones. For example, the problem can be seen in PR16158:
https://bugs.r-project.org/show_bug.cgi?id=16158.

Here's how to avoid it:

1. If no code blocks have been already swallowed by Bugzilla, use the
comment preview to make sure yours won't be swallowed either. If you do
see a U+F111 or a U+F222 instead of your code block in the preview tab, try:
 - starting the comment with an empty line
 - removing the colons from the starting sentence
 - if all else fails, switching Markdown off

2. If you would like to post some code into a bug where this has
already happened, the preview won't be enough. Bugzilla::Markdown has
separate queues for fenced code blocks and indented code blocks, so if
one was swallowed, it may be possible to post the other. Unfortunately,
you won't know whether it'll fail until you post the comment, and by
then it may be a part of the problem. The only safe way to continue is
to switch Markdown off for the comment.

A technical analysis of the bug is available at
,
but it may take a while to get this fixed on the Bugzilla side.

-- 
Best regards,
Ivan



[Rd] Never exporting .__global__ and .__suppressForeign__?

2024-03-06 Thread Ivan Krylov via R-devel
Hello,

(Dear Richard, I hope you don't mind being Cc:'d on this thread in
R-devel. This is one of the ways we can prevent similar problems from
happening in the future.)

Sometimes, package authors who use both exportPattern('.') and
utils::globalVariables(...) get confusing WARNINGs about undocumented
exports:
https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010531.html

I would like to suggest adding the variables used by
utils::globalVariables and utils::suppressForeignCheck to the list of
things that should never be exported:

Index: src/library/base/R/namespace.R
===
--- src/library/base/R/namespace.R  (revision 86054)
+++ src/library/base/R/namespace.R  (working copy)
@@ -806,7 +806,8 @@
 if (length(exports)) {
 stoplist <- c(".__NAMESPACE__.", ".__S3MethodsTable__.",
   ".packageName", ".First.lib", ".onLoad",
-  ".onAttach", ".conflicts.OK", ".noGenerics")
+  ".onAttach", ".conflicts.OK", ".noGenerics",
+  ".__global__", ".__suppressForeign__")
 exports <- exports[! exports %in% stoplist]
 }
if(lev > 2L) message("--- processing exports for ", dQuote(package))

(Indeed, R CMD check is very careful to only access these variables
using the interface functions in the utils package, so there doesn't
seem to be any code that depends on them being exported, and they
usually aren't.)

Alternatively (or maybe additionally), it may be possible to enhance
the R CMD check diagnostics by checking whether the name of the
undocumented object starts with a dot and asking the user whether it
was intended to be exported. This is not as easy to implement due to
tools:::.check_packages working with the log output from
tools::undoc(), not the object itself. Would a change to
tools:::format.undoc be warranted?

-- 
Best regards,
Ivan



Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-13 Thread Ivan Krylov via R-devel
On Tue, 12 Mar 2024 12:33:17 -0700
Hervé Pagès  wrote:

> The acrobatics that as.data.frame.factor() is going through in order to
> recognize a direct call don't play nice if as.data.frame() is an S4 
> generic:
> 
>      df <- as.data.frame(factor(11:12))
> 
>      suppressPackageStartupMessages(library(BiocGenerics))
>      isGeneric("as.data.frame")
>      # [1] TRUE
> 
>      df <- as.data.frame(factor(11:12))
>      # Warning message:
>      # In as.data.frame.factor(factor(11:12)) :
>      #   Direct call of 'as.data.frame.factor()' is deprecated.

How about something like the following:

Index: src/library/base/R/zzz.R
===
--- src/library/base/R/zzz.R(revision 86109)
+++ src/library/base/R/zzz.R(working copy)
@@ -681,7 +681,14 @@
 bdy <- body(as.data.frame.vector)
bdy <- bdy[c(1:2, seq_along(bdy)[-1L])] # taking [(1,2,2:n)] to insert at [2]:
 ## deprecation warning only when not called by method dispatch from as.data.frame():
-bdy[[2L]] <- quote(if((sys.nframe() <= 1L || !identical(sys.function(-1L), as.data.frame)))
+bdy[[2L]] <- quote(if((sys.nframe() <= 1L || !(
+   identical(sys.function(-1L), as.data.frame) || (
+   .isMethodsDispatchOn() &&
+   methods::is(sys.function(-1L), 'derivedDefaultMethod') &&
+   identical(
+   sys.function(-1L)@generic,
+   structure('as.data.frame', package = 'base')
+   )
.Deprecated(
msg = gettextf(
"Direct call of '%s()' is deprecated.  Use '%s()' or
'%s()' instead",

The patch passes make check-devel, but I'm not sure how to safely put
setGeneric('as.data.frame'); as.data.frame(factor(1:10)) in a
regression test.

-- 
Best regards,
Ivan



Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-14 Thread Ivan Krylov via R-devel
On Thu, 14 Mar 2024 10:41:54 +0100
Martin Maechler  wrote:

> Anybody trying S7 examples and see if they work w/o producing
> wrong warnings?

It looks like this is not applicable to S7. If I overwrite
as.data.frame with a newly created S7 generic, it fails to dispatch on
existing S3 classes:

new_generic('as.data.frame', 'x')(factor(1))
# Error: Can't find method for `as.data.frame(S3)`.

But there is no need to overwrite the generic, because S7 classes
should work with existing S3 generics:

foo <- new_class('foo', parent = class_double)
method(as.data.frame, foo) <- function(x) structure(
 # this is probably not generally correct
 list(x),
 names = deparse1(substitute(x)),
 row.names = seq_len(length(x)),
 class = 'data.frame'
)
str(as.data.frame(foo(pi)))
# 'data.frame':   1 obs. of  1 variable:
#  $ x:  num 3.14

So I think there is nothing to break, because S7 methods for
as.data.frame will rely on S3 for dispatch.

> > The patch passes make check-devel, but I'm not sure how to safely
> > put setGeneric('as.data.frame'); as.data.frame(factor(1:10)) in a
> > regression test.  
> 
> {What's the danger/problem?  we do have "similar" tests in both
>   src/library/methods/tests/*.R
>   tests/reg-S4.R
> 
>  -- maybe we can discuss bi-laterally  (or here, as you prefer)
> }

This might be educational for other people wanting to add a regression
test to their patch. I see that tests/reg-tests-1e.R is already running
under options(warn = 2), so if I add the following near line 750
("Deprecation of *direct* calls to as.data.frame.")...

# Should not warn for a call from a derivedDefaultMethod to the raw
# S3 method -- implementation detail of S4 dispatch
setGeneric('as.data.frame')
as.data.frame(factor(1))

...then as.data.frame will remain an S4 generic. Should the test then
rm(as.data.frame) and keep going? (Or even keep the S4 generic?) Is
there any hidden state I may be breaking for the rest of the test this
way? The test does pass like this, so this may be worrying about
nothing.

-- 
Best regards,
Ivan



Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-17 Thread Ivan Krylov via R-devel
On Fri, 15 Mar 2024 11:24:22 +0100
Martin Maechler  wrote:

> I think just adding
> 
>  removeGeneric('as.data.frame')
> 
> is appropriate here as it is self-explaining and should not leave
> much traces.

Thanks for letting me know! I'll make sure to use removeGeneric() in
similar cases in the future.

-- 
Best regards,
Ivan



[Rd] Wish: a way to track progress of parallel operations

2024-03-25 Thread Ivan Krylov via R-devel
Hello R-devel,

A function to be run inside lapply() or one of its friends is trivial
to augment with side effects to show a progress bar. When the code is
intended to be run on a 'parallel' cluster, it generally cannot rely on
its own side effects to report progress.

I've found three approaches to progress bars for parallel processes on
CRAN:

 - Importing 'snow' (not 'parallel') internals like sendCall and
   implementing parallel processing on top of them (doSNOW). This has
   the downside of having to write higher-level code from scratch
   using undocumented interfaces.

 - Splitting the workload into length(cluster)-sized chunks and
   processing them in separate parLapply() calls between updating the
   progress bar (pbapply). This approach trades off parallelism against
   the precision of the progress information: the function has to wait
   until all chunk elements have been processed before updating the
   progress bar and submitting a new portion; dynamic load balancing
   becomes much less efficient.

 - Adding local side effects to the function and detecting them while
   the parallel function is running in a child process (parabar). A
   clever hack, but much harder to extend to distributed clusters.

With recvData and recvOneData becoming exported in R-4.4 [*], another
approach becomes feasible: wrap the cluster object (and all nodes) into
another class, attach the progress callback as an attribute, and let
recvData / recvOneData call it. This makes it possible to give wrapped
cluster objects to unchanged code, but requires knowing the precise
number of chunks that the workload will be split into.
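
To make the wrapping idea concrete, here is a rough sketch. It assumes an
R >= 4.4 build where recvData() is exported from 'parallel'; the
"progressNode" class, the "progress" attribute and with_progress() are
invented names for illustration, and the total number of chunks still has
to be known up front:

```r
library(parallel)

# Wrap every node of a cluster so that each piece of data received from
# it bumps a shared counter and triggers a user-supplied callback.
with_progress <- function(cl, total,
                          cb = function(done, total)
                            cat(sprintf("\rchunk %d of %d", done, total))) {
  state <- new.env()
  state$done <- 0L
  cl[] <- lapply(cl, function(node) {
    attr(node, "progress") <- function() {
      state$done <- state$done + 1L
      cb(state$done, total)
    }
    class(node) <- c("progressNode", class(node))
    node
  })
  cl
}

# S3 method picked up by recvData() dispatch: receive as the underlying
# node class would, then report one more completed chunk.
recvData.progressNode <- function(node) {
  val <- NextMethod()
  attr(node, "progress")()
  val
}
```

Unchanged code such as parLapply(with_progress(cl, total), x, fun) would
then report progress, subject to the chunk-counting caveat above.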

Could it be feasible to add an optional .progress argument after the
ellipsis to parLapply() and its friends? We can require it to be a
function accepting (done_chunk, total_chunks, ...). If not a new
argument, what other interfaces could be used to get accurate progress
information from staticClusterApply and dynamicClusterApply?

I understand that the default parLapply() behaviour is not very
amenable to progress tracking, but when running clusterMap(.scheduling
= 'dynamic') spanning multiple hours if not whole days, having progress
information sets the mind at ease.

I would be happy to prepare code and documentation. If there is no time
now, we can return to it after R-4.4 is released.

-- 
Best regards,
Ivan

[*] https://bugs.r-project.org/show_bug.cgi?id=18587



Re: [Rd] Wish: a way to track progress of parallel operations

2024-03-26 Thread Ivan Krylov via R-devel
Henrik,

Thank you for taking the time to read and reply to my message!

On Mon, 25 Mar 2024 10:19:38 -0700
Henrik Bengtsson  wrote:

> * Target a solution that works the same regardless whether we run in
> parallel or not, i.e. the code/API should look the same regardless of
> using, say, parallel::parLapply(), parallel::mclapply(), or
> base::lapply(). The solution should also work as-is in other parallel
> frameworks.

You are absolutely right about mclapply(): it suffers from the same
problem where the task running inside it has no reliable mechanism of
reporting progress. Just like on a 'parallel' cluster (which can be
running on top of an R connection, MPI, the 'mirai' package, a server
pretending to be multiple cluster nodes, or something completely
different), there is currently no documented interface for the task to
report any additional data except the result of the computation.

> I argue the end-user should be able to decided whether they want to
> "see" progress updates or not, and the developer should focus on
> where to report on progress, but not how and when.

Agreed. As a package developer, I don't even want to bother calling
setTxtProgressBar(...), but it gets most of the job done at zero
dependency cost, and the users don't complain. The situation could
definitely be improved.

> It is possible to use the existing PSOCK socket connections to send
> such 'immediateCondition':s.

Thanks for pointing me towards ClusterFuture, that's a great hack, and
conditions are a much better fit for progress tracking than callbacks.

It would be even better if 'parallel' clusters could "officially"
handle immediateConditions and re-signal them in the main R session.
Since R-4.4 exports (but not yet documents) sendData, recvData and
recvOneData generics from 'parallel', we are still in a position to
codify and implement the change to the 'parallel' cluster back-end API.

It shouldn't be too hard to document the requirement that recvData() /
recvOneData() must signal immediateConditions arriving from the nodes
and patch the existing cluster types (socket and MPI). Not sure how
hard it will be to implement for 'mirai' clusters.
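
For what it's worth, the protocol is easy to prototype in a single process.
The "progressUpdate" class below is an invented name; on a real cluster the
signalCondition() call would run on the worker and the calling handler on
the master:

```r
# Build and signal a condition inheriting from "immediateCondition",
# the class that (per the discussion above) deserves immediate forwarding.
report_progress <- function(i, n) {
  cond <- structure(
    class = c("progressUpdate", "immediateCondition", "condition"),
    list(message = sprintf("done %d of %d", i, n), call = NULL)
  )
  signalCondition(cond)
}

seen <- character()
withCallingHandlers(
  for (i in 1:3) report_progress(i, 3),
  progressUpdate = function(c) seen <<- c(seen, conditionMessage(c))
)
seen
# [1] "done 1 of 3" "done 2 of 3" "done 3 of 3"
```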

> I honestly think we could arrive at a solution where base-R proposes
> a very light, yet powerful, progress API that handles all of the
> above. The main task is to come up with a standard API/protocol -
> then the implementation does not matter.

Since you've already given it a lot of thought, which parts of
progressr would you suggest for inclusion into R, besides 'parallel'
clusters and mclapply() forwarding immediateConditions from the worker
processes?

-- 
Best regards,
Ivan



Re: [Rd] paths capability FALSE on devel?

2024-03-27 Thread Ivan Krylov via R-devel
On Wed, 27 Mar 2024 11:28:17 +0100
Alexandre Courtiol wrote:

> after installing R-devel the output of
> grDevices::dev.capabilities()$paths is FALSE, while it is TRUE for R
> 4.3.3

Your system must be missing Cairo development headers, making x11()
fall back to type = 'Xlib':

$ R-devel -q -s -e 'x11(); grDevices::dev.capabilities()$paths'
 [1] TRUE
$ R-devel -q -s -e \
 'x11(type="Xlib"); grDevices::dev.capabilities()$paths'
 [1] FALSE

If that's not the case and capabilities()['cairo'] is TRUE in your
build of R-devel, please show us the sessionInfo() from your build of
R-devel.

-- 
Best regards,
Ivan



Re: [Rd] hand compile; link to MKL fails at BLAS zdotu

2024-03-30 Thread Ivan Krylov via R-devel
On Sat, 30 Mar 2024 10:55:48 +
Ramón Fallon wrote:

> In contrast to Dirk's solution, I've found R's configure script
> doesn't recognise the update-alternatives system on debian/ubuntu, if
> it's MKL.

It ought to work if configured with --with-blas=-lblas
--with-lapack=-llapack, but, as you found out (and I can confirm), if
libblas.so and liblapack.so already point to MKL, ./configure somehow
fails the test for zdotu and falls back to bundled Rblas and Rlapack.

If you'd like the built R to work with the update-alternatives system,
the workaround that seems to help is to temporarily switch the alternatives
to reference BLAS & LAPACK, configure and build R, and then switch the
alternatives back to MKL.

> appending "-lmkl_gf_lp64" to the --with-blas option does not help
> (that's suggested by several posts out there).

MKL has an official "link line advisor" at
,
which may suggest a completely different set of linker options
depending on what it is told. Here's how R's zdotu test always fails
when linking directly with MKL:

# pre-configure some variables
echo '#define HAVE_F77_UNDERSCORE 1' > confdefs.h
FC=gfortran
FFLAGS='-g -Og'
CC=gcc
CFLAGS='-g -Og'
CPPFLAGS=-I/usr/local/include
MAIN_LDFLAGS='-Wl,--export-dynamic -fopenmp'
LDFLAGS='-L/usr/local/lib'
LIBM=-lm
FLIBS=' -lgfortran -lm -lquadmath'
# copied & pasted from the Intel web page
BLAS_LIBS='-lmkl_rt -Wl,--no-as-needed -lpthread -lm -ldl'

# R prepares to call zdotu from Fortran...
cat > conftestf.f <<EOF
      subroutine test1(iflag)
      double complex zx(3), ztemp, zres, zdotu
      integer iflag
      zx(1) = (1,1)
      zx(2) = (2,2)
      zx(3) = (3,3)
      zres = zdotu(3, zx, 1, zx, 1)
      ztemp = 0.0d0
      do 10 i = 1,3
 10   ztemp = ztemp + zx(i)*zx(i)
      if(abs(zres - ztemp) > 1.0d-10) then
        iflag = 1
      else
        iflag = 0
      endif
      end
EOF
${FC} ${FFLAGS} -c conftestf.f

# and then call the Fortran subroutine from the C runner...
cat > conftest.c <<EOF
#include <stdlib.h>
#include "confdefs.h"
#ifdef HAVE_F77_UNDERSCORE
# define F77_SYMBOL(x)   x ## _
#else
# define F77_SYMBOL(x)   x
#endif
extern void F77_SYMBOL(test1)(int *iflag);

int main () {
  int iflag;
  F77_SYMBOL(test1)(&iflag);
  exit(iflag);
}
EOF
${CC} ${CPPFLAGS} ${CFLAGS} -c conftest.c

# and then finally link and execute the program
${CC} ${CPPFLAGS} ${CFLAGS} ${LDFLAGS} ${MAIN_LDFLAGS} \
 -o conftest conftest.o conftestf.o \
 ${BLAS_LIBS} ${FLIBS} ${LIBM}
./conftest

It seems to crash inside MKL!

rax=cccd rbx=5590ee102008 rcx=7ffdab2ddb20 rdx=5590ee102008
rsi=7ffdab2ddb18 rdi=5590ee10200c rbp=7ffdab2dd910 rsp=7ffdab2db600
 r8=5590ee102008  r9=7ffdab2ddb28 r10=7f4086a99178 r11=7f4086e02490
r12=5590ee10200c r13=7ffdab2ddb20 r14=5590ee102008 r15=7ffdab2ddb28
ip = 7f4086e02a60, sp = 7ffdab2db600 [mkl_blas_zdotu+1488]
ip = 7f4085dc5250, sp = 7ffdab2dd920 [zdotu+256]
ip = 5590ee1011cc, sp = 7ffdab2ddb40 [test1_+91]
ip = 5590ee101167, sp = 7ffdab2ddb70 [main+14]

It's especially strange that R does seem to work if you just
update-alternatives after linking it with the reference BLAS, but
./conftest starts crashing again in the same place. This is with
Debian's MKL version 2020.4.304-4, by the way.

-- 
Best regards,
Ivan



Re: [Rd] hand compile; link to MKL fails at BLAS zdotu

2024-03-30 Thread Ivan Krylov via R-devel
On Sat, 30 Mar 2024 20:31:25 +0300
Ivan Krylov via R-devel wrote:

> It seems to crash inside MKL!

Should have read some more about mkl_gf_lp64 before posting. According
to the Intel forums, it is indeed required in order to work with the
GFortran calling convention, but if you're linking against it, you also
have to add the rest of the linker command line, i.e.:

-lmkl_gf_lp64 -lmkl_core -lmkl_sequential 
-Wl,--no-as-needed -lpthread -lm -ldl

https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/ARPACK-with-MKL-crashes-when-calling-zdotc/m-p/1054316

Maybe it's even documented somewhere, but Intel makes it too annoying
to read their documentation, and they definitely don't mention it in
the link line advisor. There's also the ominous comment saying that

>> you cannot call standard BLAS [c,z]dot[c,u] functions from C/C++
>> because the interface library that is linked is specific for
>> GFortran which has a different calling convention of returning a
>> Complex type and would cause issues

I'm not seeing any calls to [c,z]dot[c,u] from inside R's C code (which
is why R seems to work when running with libmkl_rt.so), and the
respective declarations in R_ext/BLAS.h have an appropriate warning:

>> WARNING!  The next two return a value that may not be compatible
>> between C and Fortran, and even if it is, this might not be the
>> right translation to C.

...so it's likely that everything will keep working.

Indeed, R configured with

--with-blas='-lmkl_gf_lp64 -lmkl_core -lmkl_sequential'
--with-lapack='-lmkl_gf_lp64 -lmkl_core -lmkl_sequential'

seems to work with MKL.

-- 
Best regards,
Ivan



Re: [Rd] Bug in out-of-bounds assignment of list object to expression() vector

2024-04-05 Thread Ivan Krylov via R-devel
On Fri, 5 Apr 2024 08:15:20 -0400
June Choe  wrote:

> When assigning a list to an out of bounds index (ex: the next, n+1
> index), it errors the same but now changes the values of the vector
> to NULL:
> 
> ```
> x <- expression(a,b,c)
> x[[4]] <- list() # Error
> x
> #> expression(NULL, NULL, NULL)  
> ```
> 
> Curiously, this behavior disappears if a prior attempt is made at
> assigning to the same index, using a different incompatible object
> that does not share this bug (like a function)

Here's how the problem happens:

1. The call lands in src/main/subassign.c, do_subassign2_dflt().

2. do_subassign2_dflt() calls SubassignTypeFix() to prepare the operand
for the assignment.

3. Since the assignment is "stretching", SubassignTypeFix() calls
EnlargeVector() to provide the space for the assignment.

The bug relies on `x` not being IS_GROWABLE(), which may explain 
why a plain x[[4]] <- list() sometimes doesn't fail.

The future assignment result `x` is now expression(a, b, c, NULL), and
the old `x` is set to expression(NULL, NULL, NULL) by SET_VECTOR_ELT(newx,
i, VECTOR_ELT(x, i)); CLEAR_VECTOR_ELT(x, i); during EnlargeVector().

4. But then the assignment fails, raising the error back in
do_subassign2_dflt(), because the assignment kind is invalid: there is
no way to put data.frames into an expression vector. The new resized
`x` is lost, and the old overwritten `x` stays there.

Not sure what the right way to fix this is. It's desirable to avoid
shallow_duplicate(x) for the overwriting assignments, but then the
sub-assignment must either succeed or leave the operand untouched.
Is there a way to perform the type check before overwriting the operand?

-- 
Best regards,
Ivan



Re: [Rd] Wish: a way to track progress of parallel operations

2024-04-09 Thread Ivan Krylov via R-devel
Dear Henrik (and everyone else):

Here's a patch implementing support for immediateConditions in
'parallel' socket clusters. What do you think?

I've tried to make the feature backwards-compatible in the sense that
an older R starting a newer cluster worker will not pass the flag
enabling condition passing and so will avoid being confused by packets
with type = 'CONDITION'.

In order to propagate the conditions in a timely manner, all 'parallel'
functions that currently use recvData() on individual nodes will have
to switch to calling recvOneData(). I've already adjusted
staticClusterApply(), but e.g. clusterCall() would still postpone
immediateConditions from nodes later in the list (should they appear).

If this is deemed a good way forward, I can prepare a similar patch for
the MPI and socket clusters implemented in the 'snow' package.

-- 
Best regards,
Ivan
Index: src/library/parallel/R/clusterApply.R
===
--- src/library/parallel/R/clusterApply.R	(revision 86373)
+++ src/library/parallel/R/clusterApply.R	(working copy)
@@ -28,8 +28,12 @@
 end <- min(n, start + p - 1L)
 	jobs <- end - start + 1L
 for (i in 1:jobs)
-sendCall(cl[[i]], fun, argfun(start + i - 1L))
-val[start:end] <- lapply(cl[1:jobs], recvResult)
+sendCall(cl[[i]], fun, argfun(start + i - 1L),
+ tag = start + i - 1L)
+for (i in 1:jobs) {
+d <- recvOneResult(cl)
+val[d$tag] <- list(d$value)
+}
 start <- start + jobs
 }
 checkForRemoteErrors(val)
Index: src/library/parallel/R/snow.R
===
--- src/library/parallel/R/snow.R	(revision 86373)
+++ src/library/parallel/R/snow.R	(working copy)
@@ -120,7 +120,8 @@
 rprog = file.path(R.home("bin"), "R"),
 snowlib = .libPaths()[1],
 useRscript = TRUE, # for use by snow clusters
-useXDR = TRUE)
+useXDR = TRUE,
+forward_conditions = TRUE)
 defaultClusterOptions <<- addClusterOptions(emptyenv(), options)
 }
 
Index: src/library/parallel/R/snowSOCK.R
===
--- src/library/parallel/R/snowSOCK.R	(revision 86373)
+++ src/library/parallel/R/snowSOCK.R	(working copy)
@@ -32,6 +32,7 @@
 methods <- getClusterOption("methods", options)
 useXDR <- getClusterOption("useXDR", options)
 homogeneous <- getClusterOption("homogeneous", options)
+forward_conditions <- getClusterOption('forward_conditions', options)
 
 ## build the local command for starting the worker
 env <- paste0("MASTER=", master,
@@ -40,7 +41,8 @@
  " SETUPTIMEOUT=", setup_timeout,
  " TIMEOUT=", timeout,
  " XDR=", useXDR,
- " SETUPSTRATEGY=", setup_strategy)
+ " SETUPSTRATEGY=", setup_strategy,
+ " FORWARDCONDITIONS=", forward_conditions)
 ## Should cmd be run on a worker with R <= 4.0.2,
 ## .workRSOCK will not exist, so fallback to .slaveRSOCK
 arg <- "tryCatch(parallel:::.workRSOCK,error=function(e)parallel:::.slaveRSOCK)()"
@@ -130,17 +132,26 @@
 sendData.SOCKnode <- function(node, data) serialize(data, node$con)
 sendData.SOCK0node <- function(node, data) serialize(data, node$con, xdr = FALSE)
 
-recvData.SOCKnode <- recvData.SOCK0node <- function(node) unserialize(node$con)
+recvData.SOCKnode <- recvData.SOCK0node <- function(node) repeat {
+val <- unserialize(node$con)
+if (val$type != 'CONDITION') return(val)
+signalCondition(val$value)
+}
 
 recvOneData.SOCKcluster <- function(cl)
 {
 socklist <- lapply(cl, function(x) x$con)
 repeat {
-ready <- socketSelect(socklist)
-if (length(ready) > 0) break;
+repeat {
+ready <- socketSelect(socklist)
+if (length(ready) > 0) break;
+}
+n <- which.max(ready) # may need rotation or some such for fairness
+value <- unserialize(socklist[[n]])
+if (value$type != 'CONDITION')
+return(list(node = n, value = value))
+signalCondition(value$value)
 }
-n <- which.max(ready) # may need rotation or some such for fairness
-list(node = n, value = unserialize(socklist[[n]]))
 }
 
 makePSOCKcluster <- function(names, ...)
@@ -349,6 +360,7 @@
 timeout <- 2592000L   # wait 30 days for new cmds before failing
 useXDR <- TRUE# binary serialization
 setup_strategy <- "sequential"
+forward_conditions <- FALSE
 
 for (a in commandArgs(TRUE)) {
 ## Or use strsplit?
@@ -365,6 +377,9 @@
SETUPSTRATEGY = {
setup_strategy <- match.arg(value,
c("sequential", "parallel")

Re: [Rd] View() segfaulting ...

2024-04-24 Thread Ivan Krylov via R-devel
On Wed, 24 Apr 2024 19:35:42 -0400
Ben Bolker  wrote:

>  I'm using bleeding-edge R-devel, so maybe my build is weird. Can 
> anyone else reproduce this?
> 
>View() seems to crash on just about anything.

Not for me, sorry.

If you have a sufficiently new processor, you can use `rr` [*] to
capture the crash, set a breakpoint in in_R_X11_dataviewer and rewind,
then set a watchpoint on the stack canary and run the program forward
again:
https://www.redhat.com/en/blog/debugging-stack-protector-failures

If you can't locate the canary, try setting watchpoints on large local
variables. Without `rr`, the procedure is probably the same, but
without rewinding: set a breakpoint in in_R_X11_dataviewer, set some
watchpoints, see if they fire when they shouldn't, start from scratch
if you get past the watchpoints and the process crashes.

I think that that either an object file didn't get rebuilt when it
should have, or a shared library used by something downstream from
View() got an ABI-breaking update. If this still reproduces with a clean
rebuild of R, it's definitely worth investigating further, perhaps using
AddressSanitizer. Valgrind may be lacking the information about the
stack canary and thus failing to distinguish between overwriting the
canary and normal access to a stack variable via a pointer.

-- 
Best regards,
Ivan

[*] https://rr-project.org/
Edit distance of one from the domain name of the R project!

Use rr replay -g $EVENT_NUMBER to debug past the initial execve()
from the shell wrapper: https://github.com/rr-debugger/rr/wiki/FAQ



Re: [Rd] [External] Re: Is ALTREP "non-API"?

2024-04-25 Thread Ivan Krylov via R-devel
On Wed, 24 Apr 2024 15:31:39 -0500 (CDT)
luke-tierney--- via R-devel  wrote:

> We would be better off (in my view, not necessarily shared by others
> in R-core) if we could get to a point where:
> 
>  all entry points listed in installed header files can be used in
>  packages, at least with some caveats;
> 
>  the caveats are expressed in a standard way that is searchable,
>  e.g. with a standardized comment syntax at the header file or
>  individual declaration level.

This sounds almost like Doxygen, although the exact syntax used to
denote the entry points and the necessary comments is far from the most
important detail at this point.

> There are some 500 entry points in the R shared library that are in
> the installed headers but not mentioned in WRE. These would need to
> be reviewed and adjusted.

Is there a way for outsiders to help? For example, would it help to
produce the linking graph (package P links to entry points X, Y)? I
understand that an entry point being unpopular doesn't mean it
shouldn't be public (and the other way around), but combined with a
list of entry points that are listed in WRE, such a graph could be
useful to direct effort or estimate impact from interface changes.

-- 
Best regards,
Ivan



Re: [Rd] Big speedup in install.packages() by re-using connections

2024-04-25 Thread Ivan Krylov via R-devel
On Thu, 25 Apr 2024 14:45:04 +0200
Jeroen Ooms  wrote:

> Thoughts?

How verboten would it be to create an empty external pointer object,
add it to the preserved list, and set an on-exit finalizer to clean up
the curl multi-handle? As far as I can tell, the internet module is not
supposed to be unloaded, so this would not introduce an opportunity to
jump to an unmapped address. This makes it possible to avoid adding a
CurlCleanup() function to the internet module:

Index: src/modules/internet/libcurl.c
===
--- src/modules/internet/libcurl.c  (revision 86484)
+++ src/modules/internet/libcurl.c  (working copy)
@@ -55,6 +55,47 @@
 
 static int current_timeout = 0;
 
+// The multi-handle is shared between downloads for reusing connections
+static CURLM *shared_mhnd = NULL;
+static SEXP mhnd_sentinel = NULL;
+
+static void cleanup_mhnd(SEXP ignored)
+{
+if(shared_mhnd){
+curl_multi_cleanup(shared_mhnd);
+shared_mhnd = NULL;
+}
+curl_global_cleanup();
+}
+static void rollback_mhnd_sentinel(void* sentinel) {
+// Failed to allocate memory while registering a finalizer,
+// therefore must release the object
+R_ReleaseObject((SEXP)sentinel);
+}
+static CURLM *get_mhnd(void)
+{
+if (!mhnd_sentinel) {
+  SEXP sentinel = PROTECT(R_MakeExternalPtr(NULL, R_NilValue, R_NilValue));
+  R_PreserveObject(sentinel);
+  UNPROTECT(1);
+  // Avoid leaking the sentinel before setting the finalizer
+  RCNTXT cntxt;
+  begincontext(&cntxt, CTXT_CCODE, R_NilValue, R_BaseEnv, R_BaseEnv,
+   R_NilValue, R_NilValue);
+  cntxt.cend = &rollback_mhnd_sentinel;
+  cntxt.cenddata = sentinel;
+  R_RegisterCFinalizerEx(sentinel, cleanup_mhnd, TRUE);
+  // Succeeded, no need to clean up if endcontext() fails allocation
+  mhnd_sentinel = sentinel;
+  cntxt.cend = NULL;
+  endcontext(&cntxt);
+}
+if(!shared_mhnd) {
+  shared_mhnd = curl_multi_init();
+}
+return shared_mhnd;
+}
+
 # if LIBCURL_VERSION_MAJOR < 7 || (LIBCURL_VERSION_MAJOR == 7 && 
LIBCURL_VERSION_MINOR < 28)
 
 // curl/curl.h includes  and headers it requires.
@@ -565,8 +606,6 @@
if (c->hnd && c->hnd[i])
curl_easy_cleanup(c->hnd[i]);
 }
-if (c->mhnd)
-   curl_multi_cleanup(c->mhnd);
 if (c->headers)
curl_slist_free_all(c->headers);
 
@@ -668,7 +707,7 @@
c.headers = headers = tmp;
 }
 
-CURLM *mhnd = curl_multi_init();
+CURLM *mhnd = get_mhnd();
 if (!mhnd)
error(_("could not create curl handle"));
 c.mhnd = mhnd;


-- 
Best regards,
Ivan



Re: [Rd] R 4.4.0 has version of Matrix 1.7-0, but it's not available on CRAN

2024-04-26 Thread Ivan Krylov via R-devel
On Fri, 26 Apr 2024 12:32:59 +0200
Martin Maechler  wrote:

> Finally, I'd think it definitely would be nice for
> install.packages("Matrix") to automatically get the correct
> Matrix version from CRAN ... so we (R-core) would be grateful
> for a patch to install.packages() to achieve this

Since the binaries offered on CRAN are already of the correct version
(1.7-0 for -release and -devel), only source package installation needs
to concern itself with the Recommended subdirectory.

Would it be possible to generate the PACKAGES* index files in the
4.4.0/Recommended subdirectory? Then on the R side one would only need
to add a new repo (adjusting chooseCRANmirror() to set it together with
repos["CRAN"]) and keep the rest of the machinery intact.

-- 
Best regards,
Ivan



Re: [Rd] R 4.4.0 has version of Matrix 1.7-0, but it's not available on CRAN

2024-04-26 Thread Ivan Krylov via R-devel
On Fri, 26 Apr 2024 13:15:47 +0200
Gábor Csárdi  wrote:

> That's not how this worked in the past AFAIR. Simply, the packages in
> the x.y.z/Recommended directories were included in
> src/contrib/PACKAGES*, metadata, with the correct R version
> dependencies, in the correct order, so that `install.packages()`
> automatically installed the correct version without having to add
> extra repositories or manually search for package files.

That's great, then there is no need to patch anything. Thanks for
letting me know.

Should we be asking c...@r-project.org to add 4.4.0/Recommended to the
index, then?

-- 
Best regards,
Ivan



Re: [Rd] max on numeric_version with long components

2024-04-27 Thread Ivan Krylov via R-devel
On Sat, 27 Apr 2024 13:56:58 -0500
Jonathan Keane wrote:

> In devel:
> > max(numeric_version(c("1.0.1.1", "1.0.3.1",  
> "1.0.2.1")))
> [1] ‘1.0.1.1’
> > max(numeric_version(c("1.0.1.1000", "1.0.3.1000",  
> "1.0.2.1000")))
> [1] ‘1.0.3.1000’

Thank you Jon for spotting this!

This is an unintended consequence of
https://bugs.r-project.org/show_bug.cgi?id=18697.

The old behaviour of max() was to call
which.max(xtfrm(x)), which first produced a permutation that sorted the
entire .encode_numeric_version(x). The new behaviour is to call
which.max directly on .encode_numeric_version(x), which is faster (only
O(length(x)) instead of a sort).

What do the encoded version strings look like?

x <- numeric_version(c(
 "1.0.1.1", "1.0.3.1", "1.0.2.1"
))
# Ignore the attributes
(e <- as.vector(.encode_numeric_version(x)))
# [1] "101575360400"
# [2] "103575360400"
# [3] "102575360400"

# order(), xtfrm(), sort() all agree that e[2] is the maximum:
order(e)
# [1] 1 3 2
xtfrm(e)
# [1] 1 3 2
sort(e)
# [1] "101575360400"
# [2] "102575360400"
# [3] "103575360400"

# but not which.max:
which.max(e)
# [1] 1

This happens because which.max() converts its argument to double, which
loses precision:

(n <- as.numeric(e))
# [1] 1e+27 1e+27 1e+27
identical(n[1], n[2])
# [1] TRUE
identical(n[3], n[2])
# [1] TRUE

Will be curious to know if there is a clever way to keep both the O(N)
complexity and the full arbitrary precision.
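
One possible direction (a sketch, not a tested patch): keep the
O(length(x)) scan but perform each comparison through the numeric_version
Ops method, which compares the encoded strings and therefore never loses
precision:

```r
# Running maximum over a numeric_version vector: O(N) comparisons,
# each one exact, at the cost of an R-level loop.
max_version <- function(x) {
  best <- 1L
  for (i in seq_along(x)[-1L])
    if (x[i] > x[best]) best <- i
  x[best]
}

x <- numeric_version(c("1.0.1.1", "1.0.3.1", "1.0.2.1"))
max_version(x)   # 1.0.3.1, also for the short components above
```

Each x[i] > x[best] re-encodes two elements, so the constant factor is
worse than a vectorised which.max(), but the complexity stays linear.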

-- 
Best regards,
Ivan



Re: [Rd] [External] R hang/bug with circular references and promises

2024-05-13 Thread Ivan Krylov via R-devel
On Mon, 13 May 2024 09:54:27 -0500 (CDT)
luke-tierney--- via R-devel  wrote:

> Looks like I added that warning 22 years ago, so that should be enough
> notice :-). I'll look into removing it now.

Dear Luke,

I've got a somewhat niche use case: as a way of protecting myself
against rogue *.rds files and vulnerabilities in the C code, I've been
manually unserializing "plain" data objects (without anything
executable), including environments, in R [1].

I see that SET_ENCLOS() is already commented as "not API and probably
should not be <...> used". Do you think there is a way to recreate an
environment, taking the REFSXP entries into account, without
`parent.env<-`?  Would you recommend to abandon the folly of
unserializing environments manually?

-- 
Best regards,
Ivan

[1]
https://codeberg.org/aitap/unserializeData/src/commit/33d72705c1ee265349b3e369874ce4b47f9cd358/R/unserialize.R#L289-L313



Re: [Rd] FR: Customize background colour of row and column headers for the View output

2024-05-16 Thread Ivan Krylov via R-devel
The change suggested by Iago Giné Vázquez is indeed very simple. It
sets the background colour of the row and column headers to the
background of the rest of the dataentry window. With this patch, R
passes 'make check'. As Duncan Murdoch mentions, the X11 editor already
behaves this way.

If it's not acceptable to make the row and column headers the same
colour as the rest of the text, let's make it into a separate setting.

--- src/library/utils/src/windows/dataentry.c   (revision 86557)
+++ src/library/utils/src/windows/dataentry.c   (working copy)
@@ -1474,7 +1474,7 @@
 resize(DE->de, r);
 
 DE->CellModified = DE->CellEditable = FALSE;
-bbg = dialog_bg();
+bbg = guiColors[dataeditbg];
 /* set the active cell to be the upper left one */
 DE->crow = 1;
 DE->ccol = 1;

-- 
Best regards,
Ivan



Re: [Rd] confint Attempts to Use All Server CPUs by Default

2024-05-21 Thread Ivan Krylov via R-devel
On Tue, 21 May 2024 08:00:11 +
Dario Strbenac via R-devel wrote:

> Would a less resource-intensive value, such as 1, be a safer default
> CPU value for confint?

Which confint() method do you have in mind? There are at least four of
them by default in R, and many additional classes could make use of
stats:::confint.default by implementing vcov().

> Also, there is no mention of such parallel processing in ?confint, so
> it was not clear at first where to look for performance degradation.
> It could at least be described in the manual page so that users would
> know that export OPENBLAS_NUM_THREADS=1 is a solution.

There isn't much R can do about the behaviour of the BLAS, because
there is no standard interface to set the number of threads. Some BLASes
(like ATLAS) don't even offer it as a tunable number at all [*].

A system administrator could link the installation of R against
FlexiBLAS [**], provide safe defaults in the environment variables and
educate the users about its tunables [***], but that's a choice just
like it had been a choice to link R against a parallel variant of
OpenBLAS on a shared computer. This is described in R Installation and
Administration, section A.3.1 [****].
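
For users who only need the workaround, the exact variable name depends on
which BLAS the R build is linked against; a sketch covering the common
implementations (to be exported before R starts):

```shell
# Limit the usual multi-threaded BLAS implementations to one thread.
export OPENBLAS_NUM_THREADS=1   # OpenBLAS
export MKL_NUM_THREADS=1        # Intel MKL
export BLIS_NUM_THREADS=1       # BLIS
export OMP_NUM_THREADS=1        # OpenMP-based builds fall back to this
```

These take effect at library load time, so they must be set in the
environment of the R process rather than from within a running session.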

-- 
Best regards,
Ivan

[*]
https://math-atlas.sourceforge.net/faq.html#tnum

[**]
https://www.mpi-magdeburg.mpg.de/projects/flexiblas

[***]
https://search.r-project.org/CRAN/refmans/flexiblas/html/flexiblas-threads.html

[****]
https://cran.r-project.org/doc/manuals/R-admin.html#BLAS



Re: [Rd] Mismatches for methods registered for non-generic:

2024-05-27 Thread Ivan Krylov via R-devel
On Mon, 27 May 2024 10:52:26 +
"Koenker, Roger W" wrote:

> that have been fine until now and on my fresh R version 4.4.0
> (2024-04-24) are still ok with R CMD check --as-cran

This extra check requires the environment variable
_R_CHECK_S3_METHODS_SHOW_POSSIBLE_ISSUES_ to be set to TRUE to show the
issues.

> but CRAN checking reveals, e.g.
> 
> Check: S3 generic/method consistency, Result: NOTE
>  Mismatches for methods registered for non-generic:
>  as:
>function(object, Class, strict, ext)
>  as.matrix.coo:
>function(x, nrow, ncol, eps, ...)
> 
> which I interpret as regarding  my generics as just S3 methods for
> the non-generic “as”.

There are calls to S3method("as", ...) and S3method("is", ...) at the
end of the NAMESPACE for the current CRAN version of SparseM. If I
comment them out, the package passes R CMD check --as-cran without the
NOTE and seemingly with no extra problems.

-- 
Best regards,
Ivan



Re: [Rd] Hard crash of lme4 in R-devel

2024-06-15 Thread Ivan Krylov via R-devel
On Sat, 15 Jun 2024 02:04:31 +
"Therneau, Terry M., Ph.D. via R-devel" wrote:

> other attached packages:
> [1] lme4_1.1-35.1  Matrix_1.7-0 

I see you have a new Matrix (1.7-0 from 2024-04-26 with a new ABI) but
an older lme4 (1.1-35.1 from 2023-11-05).

I reproduced the crash and the giant backtrace by first installing
latest lme4 and then updating Matrix. With the latest version of lme4,
this results in a warning:

library(lme4)
# Loading required package: Matrix
# Warning message:
# In check_dep_version() : ABI version mismatch:
# lme4 was built with Matrix ABI version 1
# Current Matrix ABI version is 2
# Please re-install lme4 from source or restore original 'Matrix'
# package

The version of lme4 that you have installed doesn't have this check
because it only appeared in March 2024:
https://github.com/lme4/lme4/commit/8be641b7a1fd5b6e6ac962552add13e29bb5ff5b

The crash should go away if you update or at least reinstall lme4 from
source.

-- 
Best regards,
Ivan



Re: [Rd] Creating a text-based device/output format

2024-06-25 Thread Ivan Krylov via R-devel
On Tue, 25 Jun 2024 09:42:59 +
David McArthur wrote:

> ggplot(data, aes(x=body_mass_g,fill = species)) +
>   geom_histogram()
> 
> Could output something like:
> 
> title "The body mass (g) of penguin species"
> x-axis "Body mass (g)" 3000 --> 5550
> y-axis "Count" 0 --> 2
> histogram
>   Adelie [3000, 3250, 3400]
>   ChinStrap [3250, 3600]
>   Gentoo [4300, 5050, 5200, 5300, 5450]
> 
> How should I go about this in R?

R graphics devices are very low-level:
https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Graphics-devices

Instead of drawing histograms, they are asked to draw points, lines,
polygons, and text. If you're curious what it's like to implement such
a device, packages such as 'devEMF' and 'ragg' will provide examples.

You could instead go the 'txtplot' route and implement your own
functions that would return text in mermaid syntax, unrelated to
existing plotting engines. An alternative would be to carefully
deconstruct 'ggplot2' and 'lattice' objects and translate what you can
into mermaid diagrams, but that will always be limited to the
intersection of the source and target featuresets.
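
As a rough illustration of the 'txtplot' route, a hypothetical helper (not
part of any package) could bin the data in R and emit text in the
mermaid-like syntax from the question above:

```r
# Hypothetical sketch: bin a numeric vector per group and print a
# mermaid-style histogram description.
mermaid_hist <- function(x, group, binwidth = 250, title = "") {
  bin <- binwidth * floor(x / binwidth)
  body <- vapply(
    split(bin, group),
    function(b) paste0("[", paste(sort(b), collapse = ", "), "]"),
    character(1)
  )
  cat(sprintf('title "%s"', title),
      sprintf('x-axis "Value" %g --> %g', min(bin), max(bin)),
      "histogram",
      paste0("  ", names(body), " ", body),
      sep = "\n")
}
mermaid_hist(c(3000, 3250, 3400, 3250, 3600),
             c("Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap"),
             title = "Body mass (g) of penguin species")
```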

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] API documentation for R

2024-06-26 Thread Ivan Krylov via R-devel
В Thu, 25 Apr 2024 10:10:44 -0700
Kevin Ushey  пишет:

> I'm guessing the most welcome kinds of contributions would be
> documentation? IMHO, "documenting an API" and "describing how an API
> can be used" are somewhat separate endeavors. I believe R-exts does an
> excellent job of the latter, but may not be the right vehicle for the
> former. To that end, I believe it would be helpful to have some
> structured API documentation as a separate R-api document.

Now that we have a machine-readable list of APIs in the form of
system.file('wre.txt', package = 'tools') (which is not yet an API
itself, but I trust we'll be able to adapt to ongoing changes), it's
possible to work on such an R-api document.

I've put a proof of concept that checks its Texinfo indices against the
list of @apifun entries in wre.txt at 
with a rendered version at . I've
tried to address Agner's concerns [*] about R_NO_REMAP by showing the
declarations available with or without this preprocessor symbol
defined.

34 vaguely documented entry points out of 538 lines in wre.txt is
obviously not enough, but I'm curious whether this is the right
direction. Should we keep to a strict structure, like in Rd files, with
a table for every argument and the return value? Can we group functions
together, or should there be a separate @node for every function and
variable? Is Rd (and Henrik's earlier work [**]) a better format than
Texinfo for a searchable C API reference?

-- 
Best regards,
Ivan

[*] https://stat.ethz.ch/pipermail/r-package-devel/2024q2/010913.html

[**] https://github.com/HenrikBengtsson/RNativeAPI

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Fixing a CRAN note

2024-06-26 Thread Ivan Krylov via R-devel
On 26 June 2024 16:42:39 GMT+03:00, "Therneau, Terry M., Ph.D. via R-devel" 
 wrote:
>What is it complaining about -- that it doesn't like my name?

>* checking CRAN incoming feasibility ... [7s/18s] NOTE
>Maintainer: ‘Terry Therneau ’
>
>Found the following \keyword or \concept entries
>which likely give several index terms:
>   File ‘deming.Rd’:
>     \keyword{models, regression}
I think that the check points out that in order to specify multiple keywords, 
you need to use \keyword{models} and \keyword{regression} separately, not 
\keyword{models, regression} in one Rd command.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] write.csv problems

2024-06-28 Thread Ivan Krylov via R-devel
On Fri, 28 Jun 2024 11:02:12 -0500
Spencer Graves  wrote:

> df1 <- data.frame(x=1)
> class(df1) <- c('findFn', 'data.frame')
> write.csv(df1, 'df1.csv')
> # Error in x$Package : $ operator is invalid for atomic vectors

Judging by the traceback, only data frames that have a Package column
should have a findFn class:

9: PackageSummary(xi)
8: `[.findFn`(x, needconv)
7: x[needconv]
6: lapply(x[needconv], as.character)
5: utils::write.table(df1, "df1.csv", col.names = NA, sep = ",",
   dec = ".", qmethod = "double")

write.table sees columns that aren't of type character yet and tries to
convert them one by one, subsetting the data frame as a list. The call
lands in sos:::`[.findFn`

if (missing(j)) {
xi <- x[i, ]
attr(xi, "PackageSummary") <- PackageSummary(xi)
class(xi) <- c("findFn", "data.frame")
return(xi)
}

Subsetting methods are hard. For complex structures like data.frames,
`[.class` must handle all of x[rows,cols]; x[rows,]; x[,cols];
x[columns]; x[], and also respect the drop argument:
https://stat.ethz.ch/pipermail/r-help/2021-December/473207.html

I think that the `[.findFn` method mistakes x[needconv] for
x[needconv,] when it should instead perform x[,needconv].
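
For what it's worth, here is a sketch (not the actual sos code) of how such
a method can tell the single-index form apart from the two-index forms:
nargs() counts the empty argument in x[i, ], so it distinguishes the calls
even though missing(j) is TRUE in both. A real method would also restore
the class and summary attributes afterwards.

```r
`[.findFn2` <- function(x, i, j, ...) {
  y <- x
  class(y) <- "data.frame"            # fall back to the data.frame methods
  if (nargs() < 3L)       y[i]        # x[cols]: list-style subsetting
  else if (missing(j))    y[i, , ...] # x[rows, ]
  else if (missing(i))    y[, j, ...] # x[, cols]
  else                    y[i, j, ...]
}

df <- data.frame(Package = "sos", Score = 1)
class(df) <- c("findFn2", "data.frame")
df["Score"]  # dispatches with nargs() == 2: a column subset, no row branch
df[1, ]      # two-index form: a row subset
```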

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Making use of win32_segv

2024-06-30 Thread Ivan Krylov via R-devel
Hello R-devel,

When checking packages on Windows, a crash of the process looks like a
sudden stop in the output of the child process, which can be very
perplexing for package maintainers (e.g. [1,2]), especially if the
stars are only right on Win-Builder but not on the maintainer's PC.

On Unix-like systems, we get a loud message from the SIGSEGV handler
and (if we're lucky and the memory manager is still mostly intact) an
R-level traceback. A similar signal handler, win32_segv(), is defined
in src/main.c for use on Windows, but I am not seeing a way it could be
called in the current codebase. The file src/gnuwin32/psignal.c that's
responsible for signals on Windows handles Ctrl+C but does not emit
SIGSEGV or SIGILL. Can we make use of vectored exception handling [3]
to globally catch unhandled exceptions in the Win32 process and
transform them into raise(SIGSEGV)?

One potential source of problems is threading. The normal Unix-like
sigactionSegv(...) doesn't care; if a non-main thread causes a crash
with SIGSEGV unblocked, it will run in that thread's context and call R
API from there. On Windows, where a simple REprintf() may go into a GUI
window, the crash handler may be written in a more cautious manner:

 - Only set up the crash handler in Rterm, because this is mostly for
   the benefit of the people reading the R CMD check output
 - Compare GetCurrentThreadId() against a saved value and don't call
   R_Traceback() if it doesn't match
 - Rewrite win32_segv in terms of StringCbPrintf to static storage and
   WriteFile(GetStdHandle(STD_ERROR_HANDLE), ...), which may be too much

Attached is a crude first draft to see if the approach is viable. If it
turns out to be a good idea, I can add the Rterm or thread ID checks, a
reentrancy guard, declare and export a special struct win32_segvinfo
from psignal.c, put all the crash reporting in win32_segv(), and move
the VEH setup into psignal.c's signal(). (Just don't want to waste the
effort if this proves ill-advised.)

Without the patch:
User@WIN-LGTSPJA3F1V MSYS /c/R/R-svn/src/gnuwin32
$ cat crash.c
void crash(void) { *(double*)42 = 42; }

User@WIN-LGTSPJA3F1V MSYS /c/R/R-svn/src/gnuwin32
$ ../../bin/R -q -s -e 'dyn.load("crash.dll"); .C("crash")'
Segmentation fault # <-- printed by MSYS2 shell

With the patch:
User@WIN-LGTSPJA3F1V MSYS /c/R/R-svn/src/gnuwin32
$ ../../bin/R -q -s -e 'dyn.load("crash.dll"); .C("crash")'
*** caught access violation at program counter 0x7ff911c61387 ***
accessing address 0x002a, action: write

Traceback:
 1: .C("crash")
Segmentation fault

With the patch applied, I am not seeing changes in make check-devel or
package checks for V8. Couldn't test rJava yet. 

-- 
Best regards,
Ivan

[1]
https://stat.ethz.ch/pipermail/r-package-devel/2024q2/010919.html

[2]
https://stat.ethz.ch/pipermail/r-package-devel/2024q2/010872.html

[3]
https://learn.microsoft.com/en-us/windows/win32/debug/vectored-exception-handling
Index: src/gnuwin32/sys-win32.c
===
--- src/gnuwin32/sys-win32.c(revision 86850)
+++ src/gnuwin32/sys-win32.c(working copy)
@@ -26,6 +26,8 @@
 #include 
 #endif
 
+#include 
+
 #include 
 #include 
 #include 
@@ -384,3 +386,69 @@
return rval;
 }
 }
+
+static LONG WINAPI veh_report_and_raise(PEXCEPTION_POINTERS ei)
+{
+int signal = 0;
+const char *exception = 0;
+switch (ei->ExceptionRecord->ExceptionCode) {
+case EXCEPTION_ILLEGAL_INSTRUCTION:
+   exception = "illegal instruction";
+   signal = SIGILL;
+   break;
+case EXCEPTION_ACCESS_VIOLATION:
+   exception = "access violation";
+   signal = SIGSEGV;
+   break;
+case EXCEPTION_ARRAY_BOUNDS_EXCEEDED:
+   exception = "array bounds overflow";
+   signal = SIGSEGV;
+   break;
+case EXCEPTION_DATATYPE_MISALIGNMENT:
+   exception = "datatype misalignment";
+   signal = SIGSEGV;
+   break;
+case EXCEPTION_IN_PAGE_ERROR:
+   exception = "page load failure";
+   signal = SIGSEGV;
+   break;
+case EXCEPTION_PRIV_INSTRUCTION:
+   exception = "privileged instruction";
+   signal = SIGILL;
+   break;
+case EXCEPTION_STACK_OVERFLOW:
+   exception = "stack overflow";
+   signal = SIGSEGV;
+   break;
+default: /* do nothing */ ;
+}
+if (signal) {
+   REprintf("*** caught %s at program counter %p ***\n",
+exception, ei->ExceptionRecord->ExceptionAddress);
+   /* just two more special cases */
+   switch (ei->ExceptionRecord->ExceptionCode) {
+   case EXCEPTION_ACCESS_VIOLATION:
+   case EXCEPTION_IN_PAGE_ERROR:
+   {
+   const char * action;
+   switch (ei->ExceptionRecord->ExceptionInformation[0]) {
+   case 0: action = "read"; break;
+   case 1: action = "write"; break;
+   case 8: action = "execute"; break;
+   default: acti

Re: [Rd] Large vector support in data.frames

2024-07-02 Thread Ivan Krylov via R-devel
On Wed, 19 Jun 2024 09:52:20 +0200
Jan van der Laan  wrote:

> What is the status of supporting long vectors in data.frames (e.g. 
> data.frames with more than 2^31 records)? Is this something that is 
> being worked on? Is there a time line for this? Is this something I
> can contribute to?

Apologies if you've already received a better answer off-list.

From my limited understanding, the problem with supporting
larger-than-(2^31-1) dimensions has multiple facets:

 - In many parts of R code, there's the assumption that dim() is
   of integer type. That wouldn't be a problem by itself, except...

 - R currently lacks a native 64-bit integer type. About a year ago
   Gabe Becker mentioned that Luke Tierney has been considering
   improvements in this direction, but it's hard to introduce 64-bit
   integers without making the user worry even more about data types
   (numeric != integer != 64-bit integer) or introducing a lot of
   overhead (64-bit integers being twice as large as 32-bit ones and,
   depending on the workload, frequently redundant).

 - Two-dimensional objects eventually get transformed into matrices and
   handed to LAPACK for linear algebra operations. Currently, the
   interface used by R to talk to BLAS and LAPACK only supports 32-bit
   signed integers for lengths. 64-bit BLASes and LAPACKs do exist
   (e.g. OpenBLAS can be compiled with 64-bit lengths), but we haven't
   taught R to use them.

   (This isn't limited to array dimensions, by the way. If you try to
   svd() a sufficiently large matrix, it'll try to ask for temporary memory
   with length that overflows a signed 32-bit integer, get a much
   shorter allocation instead, promptly overflow the buffer and
   crash the process.)

As you see, it's interconnected; work on one thing will involve the
other two.
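
The first two points can be seen from R itself (assuming a 64-bit build
with a few spare GB of RAM for the long vector):

```r
x <- raw(2^31)        # long vectors of atomic types already work
length(x)             # too big for an integer, so returned as a double
is.integer(length(x)) # FALSE

# ...but dimensions must still be integers, so no 2^31-row matrices:
try(dim(x) <- c(2^31, 1))
```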

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R FAQ 2.6, 7.21

2024-07-04 Thread Ivan Krylov via R-devel
Hello R-devel,

I would like to suggest a couple of updates for the R FAQ.

https://CRAN.R-project.org/bin/linux/suse is currently empty and the
directory has mtime from 2012, so it probably doesn't help to reference
it in FAQ 2.6.

There seems to be increased interest in using variables as variable
names [1,2], so it might be useful to expand 7.21 a little. Can an R
FAQ entry link to R-intro section 6.1?

Index: doc/manual/R-FAQ.texi
===
--- doc/manual/R-FAQ.texi   (revision 86871)
+++ doc/manual/R-FAQ.texi   (working copy)
@@ -503,9 +503,6 @@
 @abbr{RPM}s for @I{RedHat Enterprise Linux} and compatible distributions (e.g.,
 @I{Centos}, Scientific Linux, Oracle Linux).
 
-See @url{https://CRAN.R-project.org/bin/linux/suse/README.html} for
-information about @abbr{RPM}s for openSUSE.
-
 No other binary distributions are currently publicly available via
 @CRAN{}.
 
@@ -2624,8 +2621,31 @@
 @end example
 
 @noindent
-without any of this messing about.
+without any of this messing about. This becomes especially true if you
+are finding yourself creating and trying to programmatically access
+groups of related variables such as @code{result1}, @code{result2},
+@code{result3}, and so on: instead of fighting against the language to
+use
 
+@example
+# 'i'th result <- process('i'th dataset)
+assign(paste0("result", i), process(get(paste0("dataset", i))))
+@end example
+
+it is much easier to put the related variables in lists and use
+
+@example
+result[[i]] <- process(dataset[[i]])
+@end example
+
+and, eventually,
+
+@example
+result <- lapply(dataset, process)
+@end example
+
+which is easy to replace with @code{parLapply} for parallel processing.
+
 @node Why do lattice/trellis graphics not work?
 @section Why do lattice/trellis graphics not work?
 


-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] API for converting LANGSXP to LISTSXP?

2024-07-06 Thread Ivan Krylov via R-devel
On Fri, 5 Jul 2024 15:27:50 +0800
Kevin Ushey  wrote:

> A common idiom in the R sources is to convert objects between LANGSXP
> and LISTSXP by using SET_TYPEOF. However, this is soon going to be
> disallowed in packages.

Would you mind providing an example where a package needs to take an
existing LISTSXP and convert it to a LANGSXP (or vice versa)? I think
that Luke Tierney intended to replace the uses of
SET_TYPEOF(allocList(...), LANGSXP) with allocLang(...).

At least it's easy to manually convert between the two by replacing the
head of the list using LCONS(CAR(list), CDR(list)) or CONS(CAR(lang),
CDR(lang)): in a call, the rest of the arguments are ordinary LISTSXPs.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] xftrm is more than 100x slower for AsIs than for character vectors

2024-07-14 Thread Ivan Krylov via R-devel
On Fri, 12 Jul 2024 17:35:19 +0200
Hilmar Berger via R-devel  wrote:

> This can be finally traced to base::rank() (called from
> xtfrm.default), where I found that
> 
> "NB: rank is not itself generic but xtfrm is, and rank(xtfrm(x), )
> will have the desired result if there is a xtfrm method. Otherwise,
> rank will make use of ==, >, is.na and extraction methods for classed
> objects, possibly rather slowly. "

The problem is indeed that the vector reaches base::rank in both cases,
but since it has a class, the function has to construct and evaluate a
call to .gt every time it wants to compare two elements.

xtfrm.AsIs even tries to remove the 'AsIs' class before continuing the
method dispatch process:

>> if (length(cl <- class(x)) > 1) oldClass(x) <- cl[-1L]

It doesn't work in the (very contrived) case when 'AsIs' is not the
first class and it doesn't remove 'AsIs' as the only class (making
static int equal(...) take the slower branch). What's going to break if
we allow removing the class attribute altogether? This seems to speed
up xtfrm(I(x)) and survive LC_ALL=C.UTF-8 make check-devel:

Index: src/library/base/R/sort.R
===
--- src/library/base/R/sort.R   (revision 86895)
+++ src/library/base/R/sort.R   (working copy)
@@ -297,7 +297,8 @@
 
 xtfrm.AsIs <- function(x)
 {
-if(length(cl <- class(x)) > 1) oldClass(x) <- cl[-1L]
+cl <- oldClass(x)
+oldClass(x) <- cl[cl != 'AsIs']
 NextMethod("xtfrm")
 }
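
A rough way to reproduce the report and check the effect of the patch
(timings are machine-dependent; before the change the AsIs case goes
through the slow rank() path):

```r
x <- sample(letters, 2e4, replace = TRUE)
system.time(xtfrm(x))     # fast: plain character method
system.time(xtfrm(I(x)))  # much slower before the patch
```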
 

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Minor inconsistencies in tools:::funAPI()

2024-07-15 Thread Ivan Krylov via R-devel
Hi all,

I've noticed some peculiarities in the tools:::funAPI output that
complicate its programmatic use a bit.

 - Is it for remapped symbol names (with Rf_ or the Fortran
   underscore), or for unmapped names (without Rf_ or the underscore)?

I see that the functions marked in WRE are almost all (except
Rf_installChar and Rf_installTrChar) unmapped. This makes a lot of
sense because some of those interfaces (e.g. CONS(), CHAR(),
NOT_SHARED()) are C preprocessor macros, not functions. I also see that
installTrChar is not explicitly marked.

Are we allowed to call tools:::unmap(tools:::funAPI()$name) and
consider the return value to be the list of all unmapped APIs, despite,
e.g., installTrChar not being explicitly marked?

 - Should R_PV be an @apifun if it's currently caught by checks in
   sotools.R?

 - Should R_FindSymbol be commented /* Not API */ if it's marked as
   @apifun in WRE and not caught by sotools.R? It is currently used by 8
   CRAN packages.

 - The names 'select', 'delztg' from R_ext/Lapack.h are function
   pointer arguments, not functions or type declarations. They are
   being found because funcRegexp is written to match incomplete
   function declarations (e.g. when they end up being split over
   multiple lines, like in R_ext/Lapack.h), and function pointer
   argument declarations look sufficiently similar.

A relatively compact (but still brittle) way to match function
declarations in C header files is shown at the end of this message. I
have confirmed that compared to tools:::getFunsHdr, the only extraneous
symbols that it finds in preprocessed headers are "R_SetWin32",
"user_unif_rand", "user_unif_init", "user_unif_nseed",
"user_unif_seedloc" "user_norm_rand", which are special-cased in
tools:::getFunsHdr, and the only symbols it doesn't find are "select"
and "delztg" in R_ext/Lapack.h, which we should not be finding.

# "Bird's eye" view, gives unmapped names on non-preprocessed headers
getdecl <- function(file, lines = readLines(file)) {
# have to combine to perform multi-line matches
lines <- paste(c(lines, ''), collapse = '\n')
# first eat the C comments, dotall but non-greedy match
lines <- gsub('(?s)/\\*.*?\\*/', '', lines, perl = TRUE)
# C++-style comments too, multiline not dotall
lines <- gsub('(?m)//.*$', '', lines, perl = TRUE)
# drop all preprocessor directives
lines <- gsub('(?m)^\\s*#.*$', '', lines, perl = TRUE)

rx <- r"{(?xs)
(?!typedef)(?
readLines() |>
grep('^\\s*#\\s*error', x = _, value = TRUE, invert = TRUE) |>
tools:::ccE() |>
getdecl(lines = _)

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Question about regexp edge case

2024-07-29 Thread Ivan Krylov via R-devel
On Sun, 28 Jul 2024 20:02:21 -0400
Duncan Murdoch  wrote:

> gsub("^([0-9]{,5}).*","\\1","123456789")  
> [1] "123456"

This is in TRE itself: for "^([0-9]{,1})" tre_regexecb returns {.rm_so
= 0, .rm_eo = 1}, matching "1", but for "^([0-9]{,2})" and above it
returns an off-by-one result, {.rm_so = 0, .rm_eo = 3}.

Compiling with TRE_DEBUG, I see it parsed correctly:

catenation, sub 0, 0 tags
  assertions: bol
  iteration {-1, 2}, sub -1, 0 tags, greedy
literal (0, 9) (48, 57), pos 0, sub -1, 0 tags

...but after tre_expand_ast I see

catenation, sub 0, 1 tags
  assertions: bol
  catenation, sub -1, 1 tags
tag 0
union, sub -1, 0 tags
  literal empty
  catenation, sub -1, 0 tags
literal (0, 9) (48, 57), pos 2, sub -1, 0 tags
union, sub -1, 0 tags
  literal empty
  catenation, sub -1, 0 tags
literal (0, 9) (48, 57), pos 1, sub -1, 0 tags
union, sub -1, 0 tags
  literal empty
  literal (0, 9) (48, 57), pos 0, sub -1, 0 tags

...which has one too many copies of "literal (0,9)". I think it's due
to the expansion loop on line 942 of src/extra/tre/tre-compile.c being

for (j = iter->min; j < iter->max; j++)

...where 'min' is -1 to denote no minimum. This is further confirmed by
"{0,3}", "{1,3}", "{2,3}", "{3,3}" all working correctly.

Neither TRE documentation [1] nor POSIX [2] specify the {,n} syntax:
from my reading, it looks like if the upper boundary is specified, the
lower boundary must be specified too. But if we do want to fix this, it
will have to be a special case for iter->min == -1.
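
Until that happens, spelling out the lower bound sidesteps the bug, since
the {m,n} forms with an explicit minimum work correctly:

```r
gsub("^([0-9]{0,5}).*", "\\1", "123456789")
# [1] "12345"
```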

-- 
Best regards,
Ivan

[1]
https://laurikari.net/tre/documentation/regex-syntax/

[2]
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_03_06

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Minor inconsistencies in tools:::funAPI()

2024-07-30 Thread Ivan Krylov via R-devel
On Mon, 29 Jul 2024 16:29:42 -0400
Toby Hocking  wrote:

> Can you please clarify what input files should be used with your
> proposed function? I tried a few files in r-svn/src/include and one of
> them gave me an error.

This is a good illustration of the brittleness of the regexp approach.
I focused on the header files marked as API: 

> tools:::funAPI()$loc |> unique() |> setdiff('WRE')
 [1] "R_ext/GraphicsDevice.h" "Rmath.h"
 [3] "R_ext/GraphicsEngine.h" "R_ext/BLAS.h"
 [5] "R_ext/Lapack.h" "R_ext/Linpack.h"
 [7] "Rembedded.h""Rinterface.h"
 [9] "R_ext/Altrep.h" "R_ext/Memory.h"
[11] "R_ext/RStartup.h"   "R_ext/Arith.h"
[13] "R_ext/Random.h" "R_ext/Error.h"

I also wanted the function not to crash with Rinternals.h, but getdecl
/ getdecl2 / tools:::getFunsHdr all give different answers for it.

I think this can be done in a more reliable manner using a recursive
descent parser, but that would take some screenfuls of R that will need
to be very carefully written.

Speaking of discrepancies, here are a few functions declared in API
headers but marked with attribute_hidden:

R_ext/Error.h:NORET void WrongArgCount(const char *);
R_ext/Memory.h:int  R_gc_running(void);

And some minor headaches for people who would like a full
programmatic list of entry points:

 - The functions [dpq]norm are unconditionally remapped to dnorm4,
   pnorm5, qnorm5, and the header file parser only picks up the
   numbered function names.

 - 'optimfn', 'optimgr', 'integr_fn' are marked in WRE as @apifun
   despite not directly being functions or symbol names exported by R
   binaries. May I suggest a separate category for types?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Another issue using multi-processing linear algebra libraries

2024-08-06 Thread Ivan Krylov via R-devel
On Tue, 6 Aug 2024 10:19:25 -0400
Rob Steele via R-devel  wrote:

> Would it make sense to add a parameter somewhere, to mclapply(), say,
> telling R to not use multiprocessing libraries?

It would be great if we had a way to limit all kinds of multiprocessing
(child processes, OpenMP threads, pthreads, forks, MPI, PVM, 'parallel'
clusters, ...) in a single setting, but
there is currently no such setting, and it may be impossible to
implement. Particularly problematic may be nested parallellism:
sometimes desirable (e.g. 4-machine cluster, each machine in it using
OpenMP threads), sometimes undesired (e.g. your case). A single setting
is probably far from enough.

> Does R even know whether a linked library is doing multi-processing?

Unfortunately, no, there is no standard interface for that. Best I can
recommend is to link your R installation with FlexiBLAS and then use the
'flexiblas' CRAN package to talk to it.

> Does R build its own BLAS and LAPACK if its also linking external
> ones?

I think it doesn't.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Another issue using multi-processing linear algebra libraries

2024-08-08 Thread Ivan Krylov via R-devel
On Wed, 7 Aug 2024 07:47:38 -0400
Dipterix Wang  wrote:

> I wonder if R initiates a system environment or options to instruct
> the packages on the number of cores to use?

A lot of thought and experience with various HPC systems went into
availableCores(), a function from the zero-dependency 'parallelly'
package by Henrik Bengtsson:
https://search.r-project.org/CRAN/refmans/parallelly/html/availableCores.html
If you cannot accept a pre-created cluster object or 'future' plan or
'BiocParallel' parameters or the number of OpenMP threads from the
user, this must be a safer default than parallel::detectCores().
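
A hedged sketch of what that can look like in package code (the function
and parameter names are illustrative, not an established convention):

```r
n_workers <- function(n = NULL) {
  if (!is.null(n)) return(as.integer(n))    # always let the user override
  if (requireNamespace("parallelly", quietly = TRUE))
    parallelly::availableCores(omit = 1L)   # respects cgroups, Slurm, etc.
  else
    1L                                      # conservative fallback
}
cl <- parallel::makeCluster(n_workers())
parallel::stopCluster(cl)
```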

Building such a limiter into R poses a number of problems. Here is a
summary from a previous discussion on R-pkg-devel [1] with wise
contributions from Dirk Eddelbuettel, Reed A. Cartwright, Vladimir
Dergachev, and Andrew Robbins.

 - R is responsible for the BLAS it is linked to and therefore must
   actively manage the BLAS threads when the user sets the thread
   limit. This requires writing BLAS-specific code to talk to the
   libraries, like done in FlexiBLAS and the RhpcBLASctl package. Some
   BLASes (like ATLAS) only have a compile-time thread limit. R should
   somehow give all threads to BLAS by default but take them away when
   some other form of parallelism is requested.

 - Should R be managing the OpenMP thread limit by itself? If not,
   that's a lot of extra work for every OpenMP-using package developer.
   If yes, R is now responsible for initialising OpenMP.

 - Managing the BLAS and OpenMP thread limits is already a hard problem
   because some BLASes may or may not be following the OpenMP thread
   limits.

 - What if two packages both consult the thread limit and create N^2
   processes as a result of one calling the other? Dividing a single
   computer between BLAS threads, OpenMP threads, child processes and
   their threads needs a very reliable global inter-process semaphore.
   R would have to grow a jobserver like in GNU Make, a separate
   process because the main R thread will be blocked waiting for the
   computation result, especially if we want to automatically recover
   job slots from crashed processes. That's probably not impossible,
   but involves a lot of OS-specific code.

 - What happens with the thread limit when starting remote R processes?
   It's best to avoid having to set it manually. If multiple people
   unknowingly start R on a shared server, how to avoid the R instances
   competing for the CPU (or the ownership of the semaphore)?

 - It will take a lot of political power to actually make this scheme
   work. The limiter can only be cooperative (unless you override the
   clone() syscall and make it fail? I expect everything to crash after
   that), so it takes one piece of software to unknowingly ignore the
   limit and break everything.

-- 
Best regards,
Ivan

[1] https://stat.ethz.ch/pipermail/r-package-devel/2023q4/009956.html

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R-patched on CRAN is R-4.3.3

2024-08-09 Thread Ivan Krylov via R-devel
On Fri, 9 Aug 2024 10:28:19 +0200
Gábor Csárdi  wrote:

> Possibly related to this, it seems that
> https://cran.r-project.org/src/base-prerelease/R-latest.tar.gz
> is not available any more.

I think it's now R-patched.tar.?z:
https://github.com/r-devel/r-dev-web/commit/8e146a769206924ec60ae08e2841910ac8e23083

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Certificates are not trusted

2024-08-15 Thread Ivan Krylov via R-devel
On Thu, 15 Aug 2024 14:57:13 +0200
Ben Engbers  wrote:

GPG key at 
https://download.copr.fedorainfracloud.org/results/iucar/cran/pubkey.gpg 
(0x1A3B4456) is already installed
> error: Verifying a signature using certificate 
> 3124D2EF76DA4D972F6BE4AC9D60CBB71A3B4456 (iucar_cran (None) 
> ):
1. Certificate 9D60CBB71A3B4456 invalid: certificate is not alive
>because: The primary key is not live
>because: Expired on 2024-08-13T00:46:08Z

The copy of Iñaki Ucar's public key stored on your computer has
expired. Meanwhile, the public key at
https://download.copr.fedorainfracloud.org/results/iucar/cran/pubkey.gpg
has not expired (the current file will expire in 2028).

The suggested workaround is to download the public key anew and import
it manually: https://github.com/fedora-copr/copr/issues/2894

Either way, this is a problem related to Fedora Copr, not an R-devel
problem. May be a better fit for r-sig-fed...@r-project.org.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] specials and ::

2024-08-26 Thread Ivan Krylov via R-devel
On Mon, 26 Aug 2024 09:42:10 -0500
"Therneau, Terry M., Ph.D. via R-devel"  wrote:

> For instance
>    fit <- survival::survdiff( survival::Surv(time, status) ~
> ph.karno + survival::strata(inst),  data= survival::lung)
> 
> This fails to give the correct answer because it fools terms(formula,
> specials= "strata").

Apologies if the following has no chance to work for reasons obvious to
everyone else, but *currently*, terms(formula, specials= c('strata',
'survival::strata')) seems to recognise `survival::strata`. Would it be
possible to then post-process the terms object and retain only one kind
of 'strata' special?

Having said that, if https://bugs.r-project.org/show_bug.cgi?id=18568
is merged, this will probably break and will instead require
recognising `::` as a special and then manually figuring out which
function is being imported from which package.


-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Can gzfile be given the same method option as file

2024-09-12 Thread Ivan Krylov via R-devel
On Thu, 12 Sep 2024 12:01:54 +
Simon Andrews via R-devel  wrote:

> readRDS('https://seurat.nygenome.org/azimuth/references/homologs.rds')
> Error in gzfile(file, "rb") : cannot open the connection

I don't think that gzfile works with URLs. gzcon(), on the other hand,
does work with url() connections, which accepts the 'method' argument
and the getOption('url.method') default.

h <- readRDS(url(
 'https://seurat.nygenome.org/azimuth/references/homologs.rds'
))

But that only works with gzip-compressed files. For example, CRAN's
PACKAGES.rds is xz-compressed, and I don't see a way to read it the
same way:

readBin(
 index <- file.path(
  contrib.url(getOption('repos')['CRAN']),
  'PACKAGES.rds'
 ), raw(), 5
) |> rawToChar()
# [1] "\xfd7zXZ" <-- note the "7zXZ" header
readRDS(url(index))
# Error in readRDS(url(index)) : unknown input format
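
One workaround for such files is to download to a temporary file first:
readRDS() on a file path goes through gzfile(), which detects gzip, bzip2
and xz compression from the file header:

```r
tf <- tempfile(fileext = ".rds")
download.file(index, tf, mode = "wb", quiet = TRUE)
pkgs <- readRDS(tf)  # xz-compressed serialization is fine from a file
unlink(tf)
```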

>   2.  Given the warnings we're getting when using wininet, are there
> plans to make Windows certificates be supported in another way?

What does libcurlVersion() return for you? In theory, it should be
possible to make libcurl use schannel and therefore the system
certificate store for TLS verification purposes.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Can gzfile be given the same method option as file

2024-09-12 Thread Ivan Krylov via R-devel
On Thu, 12 Sep 2024 15:06:50 +
Simon Andrews  wrote:

> > download.file('https://seurat.nygenome.org/azimuth/references/homologs.rds',
> > destfile = "c:/Users/andrewss/homologs.rds", method="libcurl")  
<...>
> status was 'SSL connect error'
> 
> > download.file('https://seurat.nygenome.org/azimuth/references/homologs.rds',
> > destfile = "c:/Users/andrewss/homologs.rds", method="curl")  
<...>
> curl: (35) schannel: next InitializeSecurityContext
> failed: CRYPT_E_NO_REVOCATION_CHECK (0x80092012) - The revocation
> function was unable to check revocation for the certificate.

This extra error code is useful, thank you for trying the "curl"
method. https://github.com/curl/curl/issues/14315 suggests a libcurl
option and a curl command line option.

Does download.file(method = 'curl', extra = '--ssl-no-revoke') work for
you?

Since R-4.2.2, R understands the R_LIBCURL_SSL_REVOKE_BEST_EFFORT
environment variable. Does it help to set it to "TRUE" (e.g. in the
.Renviron file) before invoking download.file(method = "libcurl")?
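
If the variable turns out to be consulted at request time (I have not
verified this on Windows), it could even be set from within the session
instead of .Renviron:

```r
Sys.setenv(R_LIBCURL_SSL_REVOKE_BEST_EFFORT = "TRUE")
download.file("https://seurat.nygenome.org/azimuth/references/homologs.rds",
              destfile = "homologs.rds", mode = "wb", method = "libcurl")
```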

Some extra context can be found in
news(grepl('R_LIBCURL_SSL_REVOKE_BEST_EFFORT', Text)) and
.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] WRE about R_strtod

2024-10-08 Thread Ivan Krylov via R-devel
Hello,

This is what "Writing R extensions" currently says about R_atof and
R_strtod:

>> Function: void R_atof (const char* str)
>> Function: void R_strtod (const char* str, char ** end)
>>
>> Implementations of the C99/POSIX functions atof and strtod which
>> guarantee platform-dependent behaviour, including always using the
>> period as the decimal point aka ‘radix character’ and converting
>> "NA" to R’s NA_REAL_ . 

Besides the easily fixable return type (void -> double), shouldn't the
documentation mention the fact that, unlike the standard C library
functions, R's parser returns NA_REAL instead of 0 when no conversion
is performed (including for the "NA" string, *end == str)?

Index: doc/manual/R-exts.texi
===
--- doc/manual/R-exts.texi  (revision 87211)
+++ doc/manual/R-exts.texi  (working copy)
@@ -16482,12 +16482,12 @@
 
 @apifun R_atof
 @apifun R_strtod
-@deftypefun void R_atof (const char* @var{str})
-@deftypefunx void R_strtod (const char* @var{str}, char ** @var{end})
+@deftypefun double R_atof (const char* @var{str})
+@deftypefunx double R_strtod (const char* @var{str}, char ** @var{end})
 Implementations of the C99/POSIX functions @code{atof} and @code{strtod}
 which guarantee platform-dependent behaviour, including always using the
-period as the decimal point @emph{aka} `@I{radix character}' and converting
-@code{"NA"} to R's @code{NA_REAL_} .
+period as the decimal point @emph{aka} `@I{radix character}' and returning
+R's @code{NA_REAL_} for all unconverted strings, including @code{"NA"}.
 @end deftypefun
 

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] invalid permissions

2024-10-22 Thread Ivan Krylov via R-devel
Dear Prof. Roger Koenker,

On Tue, 22 Oct 2024 09:08:12 +
"Koenker, Roger W"  wrote:

> > fN <- rqss(y~qss(x,constraint="N")+z)  
> 
>  *** caught segfault ***
> address 0x0, cause 'invalid permissions'

Given a freshly produced quantreg.Rcheck directory, I was able to
reproduce this crash by running

R -d gdb
# make sure that the package version under check will be loaded
.libPaths(c("quantreg.Rcheck", .libPaths()))
library(quantreg)
example(plot.rqss)

The crash happens in the Fortran code:

Thread 1 "R" received signal SIGSEGV, Segmentation fault.
0x73d77bd4 in pchol (m=5, n=1, xpnt=..., x=..., 
mxdiag=6971508156.8648586, ntiny=0, iflag=0,
smxpy=0x73d75b80 ,
tiny=,
large=) 
at cholesky.f:4927
4927    IF (DIAG .LE. tiny * MXDIAG) THEN
(gdb) bt
#0  0x73d77bd4 in pchol
(m=5, n=1, xpnt=..., x=..., mxdiag=6971508156.8648586, ntiny=0, iflag=0, 
smxpy=0x73d75b80 , tiny=Cannot access memory at address 0xe
#1  0x73d77d7a in chlsup
(m=5, n=1, split=..., xpnt=..., x=..., mxdiag=6971508156.8648586, ntiny=0, 
iflag=0, mmpyn=0x73d7
9d90 , smxpy=0x73d75b80 , tiny=Cannot access memory at 
address 0xe
#2  0x73d7849c in blkfc2
(nsuper=, xsuper=..., snode=..., split=..., xlindx=..., 
lindx=..., xlnz=..., lnz=...,
 link=..., length=..., indmap=..., relind=..., tmpsiz=10, temp=..., iflag=0, 
mmpyn=0x73d79d90 , smxpy=0x73d75b80 , tiny=Cannot access memory at address 
>0xe
#3  0x73d78bad in blkfct
(neqns=, nsuper=, xsuper=..., snode=..., 
split=..., xlindx=..., lindx=
..., xlnz=..., lnz=..., iwsiz=796, iwork=..., tmpsiz=10, tmpvec=..., iflag=0, 
mmpyn=0x73d79d90 , smxpy=0x73d75b80 , tiny=Cannot access memory at address 
0xe
#4  0x73d7516d in chlfct
(m=201, xlindx=..., lindx=..., invp=..., perm=..., iwork=..., nnzdsub=1588, 
jdsub=..., colcnt=..., n
super=197, snode=..., xsuper=..., nnzlmax=197231, nsubmax=2615, xlnz=..., 
lnz=..., id=..., jd=..., d=...
, cachsz=64, tmpmax=100244, level=8, tmpvec=..., split=..., ierr=0, it=1, 
timewd=...) at chlfct.f:125
#5  0x73d8bfdf in slpfn
(n=398, m=, nnza=1193, a=..., ja=..., ia=..., ao=..., 
jao=..., iao=..., nnzdmax=1193,
 d=..., jd=..., id=..., dsub=..., jdsub=..., nsubmax=2615, lindx=..., 
xlindx=..., nnzlmax=197231, lnz=..
., xlnz=..., invp=..., perm=..., iwmax=1410, iwork=..., colcnt=..., snode=..., 
xsuper=..., split=..., tm
pmax=100244, tmpvec=..., newrhs=..., cachsz=64, level=8, x=..., s=..., u=..., 
c=..., y=..., b=..., r=...
, z=..., w=..., q=..., nnzemax=1789, e=..., je=..., ie=..., dy=..., dx=..., 
ds=..., dz=..., dw=..., dxdz
=..., dsdw=..., xi=..., xinv=..., sinv=..., ww1=..., ww2=..., 
small=9.9995e-07, ierr=0, maxi
t=100, timewd=...) at srqfn.f:238
#6  0x73d8ccdb in srqfn
(n=, m=, nnza=1193, a=..., ja=..., ia=..., 
ao=..., jao=..., iao=..., n
nzdmax=1193, d=..., jd=..., id=..., dsub=..., jdsub=..., nnzemax=1789, e=..., 
je=..., ie=..., nsubmax=26
15, lindx=..., xlindx=..., nnzlmax=197231, lnz=..., xlnz=..., iw=..., 
iwmax=1410, iwork=..., xsuper=...,
 tmpmax=100244, tmpvec=..., wwm=..., wwn=..., cachsz=64, level=8, x=..., s=..., 
u=..., c=..., y=..., b=.
.., small=9.9995e-07, ierr=0, maxit=100, timewd=...) at srqfn.f:27
#7  0x77b037a2 in do_dotCode # <-- R code starts here
(call=, op=, args=,
env=)

So both TINY and LARGE are invalid pointers at this point, suspiciously
small ones at that (on my 64-bit Linux, a typical pointer looks like
0x7f?? or 0x, with a few more non-zero digits).
Where do they come from?

At chlfct.f (frame 4 above) lines 124-125 we have a function call:

124  call blkfct(m,nsuper,xsuper,snode,split,xlindx,lindx,xlnz,
125  &   lnz,iwsiz,iwork,tmpsiz,tmpvec,ierr,mmpy8,smxpy8)

The function is defined in cholesky.f:

623   SUBROUTINE  BLKFCT (  NEQNS , NSUPER, XSUPER, SNODE , SPLIT ,
624  &  XLINDX, LINDX , XLNZ  , LNZ   , IWSIZ ,
625  &  IWORK , TMPSIZ, TMPVEC, IFLAG , MMPYN ,
626  &  SMXPY,  tiny, Large )

It has two more arguments (tiny and Large) than chlfct gives to it.
That must be the source of the error. Adding the missing arguments to
the function calls avoids the crash:

--- quantreg/src/chlfct.f2019-08-06 15:30:35.0 +0300
+++ quantreg/src/chlfct.f 2024-10-22 12:35:55.0 +0300
@@ -113,16 +113,20 @@
   timbeg = gtimer()
   if (level .eq. 1) then
  call blkfct(m,nsuper,xsuper,snode,split,xlindx,lindx,xlnz,
- &   lnz,iwsiz,iwork,tmpsiz,tmpvec,ierr,mmpy1,smxpy1)
+ &   lnz,iwsiz,iwork,tmpsiz,tmpvec,ierr,mmpy1,smxpy1,
+ &   tiny, large)
   elseif (level .eq. 2) then
  call blkfct(m,nsuper,xsuper,snode,split,xlindx,lindx,xlnz,
- &   lnz,iwsiz,iwork,tmpsiz,tmp

[Rd] Could .Primitive("[") stop forcing R_Visible = TRUE?

2024-10-24 Thread Ivan Krylov via R-devel
Hello,

The "[" primitive operator currently has the 'eval' flag set to 0 in
src/main/names.c. This means that the result of subsetting, whether
R-native or implemented by a method, will never be invisible().

This is a very reasonable default: if the user goes as far as to subset
a value, they probably want to see the result. Unfortunately, there
also exists at least one counter-example to that: data.table's
modification by reference using the `:=` operator from inside the `[`
operator.

If a user creates a data.table object `x` and evaluates x[,foo := bar],
the desired outcome is to return x invisibly, both to allow chained
updates by reference (x[,foo := bar][,bar := baz]) and to avoid
cluttering the screen by printing the whole object after updating a few
columns. Since .Primitive("[") forces visibility on, the data.table
developers had to come up with their own visibility flag [1] and check
it from inside the print() method when it looks like it originates from
auto-printing [2]. Since the auto-printing detection works by looking
at the call stack, this recently broke after a knitr update (but can
be reliably repaired [3]) and doesn't work for sub-classes of
data.table [4].

Is it feasible for R to consider allowing methods for `[` to set their
own visibility flag at this point? The change is deceptively small: set
'eval' to 200 in names.c and R_Visible = TRUE before returning from the
non-method-dispatch branch in do_subset(). This results in one change
in the saved output of R's own tests/reg-S4.R [5]. Or is the potential
breakage for existing code too prohibitive?

-- 
Best regards,
Ivan

[1]
https://github.com/Rdatatable/data.table/blob/e5b845e5cbc6be826558d11d601243240abe7a72/R/print.data.table.R#L164-L169

[2]
https://github.com/Rdatatable/data.table/blob/e5b845e5cbc6be826558d11d601243240abe7a72/R/print.data.table.R#L24-L41

[3]
https://github.com/Rdatatable/data.table/pull/6589

[4]
https://github.com/Rdatatable/data.table/issues/3029

[5] A method for `[` that runs cat() used to return NULL visibly.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Could .Primitive("[") stop forcing R_Visible = TRUE?

2024-10-25 Thread Ivan Krylov via R-devel
On Thu, 24 Oct 2024 13:23:56 -0400
Toby Hocking  wrote:

> The patch you are proposing to base R is
> https://github.com/Rdatatable/data.table/issues/6566#issuecomment-2428912338
> right?

Yes, it's this one, thank you for providing the link.

Surprisingly, a very cursory check of 100 packages most downloaded from
cloud.r-project.org in the last month resulted in only one change for
the worse: data.table's own test of auto-printing behaviour. But
there might still be breakage we don't see yet.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Could .Primitive("[") stop forcing R_Visible = TRUE?

2024-10-25 Thread Ivan Krylov via R-devel
On Fri, 25 Oct 2024 08:39:39 -0400
Duncan Murdoch  wrote:

> Surely you or they should be the ones to run the test across all of
> CRAN?

That's fair. The question is, is there a fundamental reason I
overlooked to deny such a change? Except for positioning and
whitespace, the line has been in names.c since SVN revision 2. The
one regression test touched by the change has been there since 2010.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] model.matrix() may be misleading for "lme" models

2024-09-21 Thread Ivan Krylov via R-devel
Dear Prof. John Fox,

On Sat, 21 Sep 2024 12:47:49 -0400
John Fox  wrote:

>  NextMethod(formula(object),  data=eval(object$call$data),
> contrasts.arg=object$contrasts)

The use of NextMethod worries me a bit. It will work as intended as
long as everyone gives fully-named arguments to the generic, without
relying on positional or partial matching, but may give unexpected
results otherwise:

foo <- \(x, ...) UseMethod('foo')
foo.default <- \(x, foo = 'default', baz = 'baz', ...)
 list(foo = foo, baz = baz, '...' = list(...))
# try to override the argument to the default method
foo.bar <- \(x, ...) NextMethod(x, foo = 'override')
x <- structure(list(), class = 'bar')
foo(x) # works, gives the right argument to foo.default
# $foo
# [1] "override"
# 
# $baz
# [1] "baz"
# 
# $...
# list()

# this used to work with foo.default, but now doesn't:
foo(x, fo = 'bar') # not matched to foo=
# $foo
# [1] "override"
# 
# $baz
# [1] "baz"
# 
# $...
# $...$fo
# [1] "bar"

foo(x, 'bar') # not matched to foo=, given to baz=
# $foo
# [1] "override"
# 
# $baz
# [1] "bar"
# 
# $...
# list()

This happens because NextMethod() overwrites named arguments already
present in the call, but any other arguments just get appended, without
any regard to whether they had already matched an argument before the
call was modified. In fact, I'm not seeing a way to safely override
some of the arguments for the next S3 method. The "attempt 4" described
by Henrik Bengtsson at [1] seems to work only if an argument is given
as part of the call:

foo.bar <- \(x, foo, ...) { foo <- 'override'; NextMethod() }
foo(x) # doesn't work
# $foo
# [1] "default"
# 
# $baz
# [1] "baz"
# 
# $...
# list()

foo(x, 1) # does work
# $foo
# [1] "override"
# 
# $baz
# [1] "baz"
#
# $...
# list()

Evaluating object$call$data in the environment of the suggested
nlme:::model.matrix.lme function may also not work right. Without an
explicit copy of the data, the best environment to evaluate it in would
be parent.frame().

-- 
Best regards,
Ivan

[1] https://github.com/HenrikBengtsson/Wishlist-for-R/issues/44

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] model.matrix() may be misleading for "lme" models

2024-09-23 Thread Ivan Krylov via R-devel
On Sun, 22 Sep 2024 10:23:50 -0400
John Fox  wrote:

> > Evaluating object$call$data in the environment of the suggested
> > nlme:::model.matrix.lme function may also not work right. Without an
> > explicit copy of the data, the best environment to evaluate it in
> > would be parent.frame().  
> 
> I'm afraid that I don't understand the suggestion. Isn't
> parent.frame() the default for the envir argument of eval()? Do you
> mean the parent frame of the call to model.matrix.lme()?

Yes, I do mean the parent frame of the model.matrix.lme() function
call. While eval()'s default for the 'envir' argument is
parent.frame(), this default value is evaluated in the context of the
eval() call. Letting model.matrix.lme() call eval() results in the
'envir' being the eval()'s parent, the model.matrix.lme() call frame.

In most cases, model.matrix.lme() works as intended: either lme() has
been given the 'data' argument, so object$data is not NULL and the
branch to eval() is not taken, or 'data' has not been given, so both
object$data and object$call$data are NULL, and NULL doesn't cause any
harm when evaluated in any environment. In the latter case
model.matrix.default() can access the variables in the environment of
the formula.

With keep.data = FALSE, the function may evaluate object$call$data in
the wrong environment:

maybe_model_matrix <- function(X)
 model.matrix(lme(distance ~ Sex, random = ~ 1 | Subject, X,
  contrasts=list(Sex=contr.sum), keep.data=FALSE))

maybe_model_matrix(Orthodont)
# Error in eval(object$call$data) : object 'X' not found

...but then model.matrix.default doesn't work on such objects either,
and if the user wanted the data to be accessible, they could have set
keep.data = TRUE. I can't tell whether evaluating object$call$data in
environment(object$formula) is a better or worse idea than
parent.frame().

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Possible update to survival

2024-09-24 Thread Ivan Krylov via R-devel
On Sun, 15 Sep 2024 00:43:31 +
"Therneau, Terry M., Ph.D. via R-devel"  wrote:

>   2.  Before calling model.frame(), insert my copy of strata into the
> calling chain coxenv <- new.env(parent= environment(formula))
> assign("strata", survival::strata, envir= coxenv)
> environment(formula) <- coxenv

<...>

> For ultimate safety, I am thinking of extending the above to all of
> the internal survival functions that might be used in a formula:
> Surv, strata, pspline, cluster, ratetable  (I think that's all).   An
> initial limited test looks okay, but before anything migrates to
> CRAN I am looking for any feedback.

What do you think of the following approach?

When changing the environment of the formula, construct the following
environment chain:

1. Top: your 'coxenv' environment with the special survival functions
2. Enclosing environment: list2env(data)
3. Enclosing^2 environment: original environment(formula)

Since the environment chain constructed by eval() when called by
model.frame() looks different (top: data -> enclosure:
environment(formula)), someone truly determined to shoot themselves in
the foot could still sneak a 'strata' function inside their 'data'
argument.

Having said that, what you are planning to implement may be already
reliable enough.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Disabling S4 primitive dispatch during method resolution affects namespace load actions

2024-09-27 Thread Ivan Krylov via R-devel
Hello,

This problem originally surfaced as an interaction between 'brms',
'rstan' and 'Rcpp' [1]: a call to dimnames() from the 'brms' package on
an object of an S4 class owned by the 'rstan' package tried to load its
namespace. rstan:::.onLoad needs to load Rcpp modules, which uses load
actions and reference classes. Since methods:::.findInheritedMethods
temporarily disables primitive S4 dispatch [2], reference classes break
and the namespace fails to load. I have prepared a small reproduction
package [3], which will need to be installed to show the problem:

R -q -s -e "saveRDS(repro::mk_external(), 'foo.rds')"
R -q -s -e "readRDS('foo.rds')"
# Loading required package: repro
# Error: package or namespace load failed for ‘repro’ in
# .doLoadActions(where, attach):
#  error in load action .__A__.1 for package repro: bar$foo(): attempt
#  to apply non-function
# Error in .requirePackage(package) : unable to find required package
# ‘repro’
# Calls:  ... .findInheritedMethods -> getClass ->
# getClassDef -> .requirePackage
# Execution halted

(Here it has to be a show() call to trigger the package load, not just
dimnames().)

I have verified that the following patch prevents the failure in
loading the namespace, but which other problems could it introduce?

Index: src/library/methods/R/RClassUtils.R
===
--- src/library/methods/R/RClassUtils.R (revision 87194)
+++ src/library/methods/R/RClassUtils.R (working copy)
@@ -1812,6 +1812,9 @@
 
 ## real version of .requirePackage
 ..requirePackage <- function(package, mustFind = TRUE) {
+# we may be called from .findInheritedMethods, which disables S4 primitive dispatch
+primMethods <- .allowPrimitiveMethods(TRUE)
+on.exit(.allowPrimitiveMethods(primMethods))
 value <- package
 if(nzchar(package)) {
 ## lookup as lightning fast as possible:

The original change to disable S4 primitive dispatch during method
resolution was done in r50609 (2009); this may be the first documented
instance of it causing a problem. The comment says "At the moment, this
is just for efficiency, but in principle it could be needed to avoid
recursive calls to findInheritedMethods."

-- 
Best regards,
Ivan

[1]
https://stat.ethz.ch/pipermail/r-package-devel/2024q3/011097.html

[2]
https://github.com/r-devel/r-svn/blob/776045d4601ed3ac7b8041e94c665bbfe9709191/src/library/methods/R/methodsTable.R#L457

[3]
https://codeberg.org/aitap/S4_vs_onLoad

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: Could .Primitive("[") stop forcing R_Visible = TRUE?

2024-10-25 Thread Ivan Krylov via R-devel
Dear Luke,

Thank you very much for taking the time to write an exhaustive answer!
They are always a pleasure to read on the R mailing lists.

On Fri, 25 Oct 2024 11:50:34 -0500 (CDT)
luke-tier...@uiowa.edu wrote:

> So there is a discrepancy between interpreted and compiled code which
> is a bug that ideally should be resolved. I suspect changing the
> compiled code behavior would be more disruptive than changing the
> interpreted code behavior, but that would need some looking into.
> 
> Filing a bug report on the discrepancy would be a good next step.

PR18813 is now filed. It is unfortunate that invisible [.data.table and
consistent behaviour of R operators are mutually exclusive, but at
least one positive change could come out of this.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Good practice for packages with Fortran and C code

2024-10-25 Thread Ivan Krylov via R-devel
On Fri, 25 Oct 2024 15:03:54 -0500
f...@fharrell.com wrote:

> Now I find that I can get rid of init.c, and  change NAMESPACE to use
> useDynLib(package name, list of compiled routine names) (without
> .registration and .fixes) after making sure the Fortran and C
> routines are not named the same as the package name.  The routines
> are called using .Fortran(binary module name, args) or .Call(C binary
> module name, ...).
> 
> Can anyone see any problem with streamlining in this way?

"Writing R Extensions" 1.5.4 says:

>> this approach is nowadays deprecated in favour of supplying
>> registration information

...which is the init.c approach you have been using.

With useDynLib(package name, list of compiled routine names), R has to
ask the operating system's dynamic loader to find exported functions in
the DLL by their names. With function registration, it is enough for
the DLL to export one function (R_init_<package name>) and then directly
provide function pointers together with their desired names to R. The
latter is considered more reliable than talking to the operating
system's dynamic loader. It also provides an opportunity for R to check
the number of arguments and their types.

> I'm also using Fortran's dynamic array allocation instead of passing
> working vectors from R.

This is quite reasonable, with the following two things to care about:

1. If the stat= argument is absent from the allocate statement and the
allocation is unsuccessful, program execution stops. Stopping the whole
R process may lose unsaved data belonging to the user, so it's best to
always provide the stat= argument and handle allocation failures
somehow.

2. Calls to R API from within the Fortran program may signal R errors
and jump away from the function call without deallocating the memory
that R knows nothing about. From Fortran code, it's most simple not to
call R API while having allocated memory, although it's not impossible
to use Fortran 2003 C interoperability to call R's error handling
interfaces (described in Writing R Extensions 6.12).

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Depends: R (>= 4.1) for packages that use |> and \(...)

2025-02-06 Thread Ivan Krylov via R-devel
On Thu, 23 Jan 2025 11:16:48 +0100
Kurt Hornik  wrote:

> My guess would be that the new syntax is particularly prominently used
> in examples: if so, it would be good to also have coverage for this.

In today's CRAN snapshot, there turned out to be 198 packages that use
4.1 syntax in examples but not in code, 5 packages that use 4.2 syntax
in examples but 4.1 in the code, and 3 packages that use 4.2 syntax in
examples but not the code. This may be slightly imprecise because I
don't have some of the Rd macro packages installed and run
Rd2ex(stages=NULL) on manually-parsed Rd files without installing the
packages.

Attaching a patch that checks the syntax used in Rd examples at the
same time as the main R code, not necessarily the best way to perform
this check. Is it perhaps worth separating R/* checks from man/*.Rd
checks? Should R CMD check try to reuse the Rd database from the
installed copy of the package?

-- 
Best regards,
Ivan
Index: src/library/tools/R/utils.R
===
--- src/library/tools/R/utils.R	(revision 87694)
+++ src/library/tools/R/utils.R	(working copy)
@@ -2103,6 +2103,38 @@
 out
 }
 
+### ** .source_file_using_R_4.x_syntax
+
+.source_file_using_R_4.x_syntax <- function(f)
+{
+x <- utils::getParseData(parse(f, keep.source = TRUE))
+i1 <- which(x$token %in% c("PIPE", "'\\\\('"))
+i2 <- which(x$token == "PLACEHOLDER")
+if(length(i1) || length(i2)) {
+xi <- x$id
+xp <- x$parent
+n1 <- rep_len("4.1.0", length(i1))
+## Detect experimental placeholder feature as the head of a
+## chain of extractions by looking at the first child of the
+## grandparent of the placeholder: if it is the placeholder
+## expression then we have the 4.3.0 syntax.
+n2 <- ifelse(vapply(i2,
+function(j) {
+u <- xp[j]
+v <- xp[xi %in% u]
+min(xi[xp %in% v]) == u
+},
+NA),
+ "4.3.0",
+ "4.2.0")
+i <- c(i1, i2)
+data.frame(token = x$token[i],
+   needs = c(n1, n2),
+   text = utils::getParseText(x, xp[i]))
+} else
+NULL
+}
+
 ### ** .package_code_using_R_4.x_syntax
 
 .package_code_using_R_4.x_syntax <-
@@ -2109,43 +2141,31 @@
 function(dir)
 {
 dir <- file_path_as_absolute(dir)
-wrk <- function(f) {
-p <- file.path(dir, "R", f)
-x <- utils::getParseData(parse(p, keep.source = TRUE))
-i1 <- which(x$token %in% c("PIPE", "'\\\\('"))
-i2 <- which(x$token == "PLACEHOLDER")
-if(length(i1) || length(i2)) {
-xi <- x$id
-xp <- x$parent
-n1 <- rep_len("4.1.0", length(i1))
-## Detect experimental placeholder feature as the head of a
-## chain of extractions by looking at the first child of the
-## grandparent of the placeholder: if it is the placeholder
-## expression then we have the 4.3.0 syntax.
-n2 <- ifelse(vapply(i2,
-function(j) {
-u <- xp[j]
-v <- xp[xi %in% u]
-min(xi[xp %in% v]) == u
-},
-NA),
- "4.3.0",
- "4.2.0")
-i <- c(i1, i2)
-data.frame(token = x$token[i],
-   needs = c(n1, n2),
-   text = utils::getParseText(x, xp[i]),
-   file = rep_len(f, length(i)))
-} else
-NULL
+wrk.R <- function(f)
+{
+ret <- .source_file_using_R_4.x_syntax(file.path(dir, "R", f))
+if (!is.null(ret)) cbind(ret, file = f)
 }
-one <- function(f)
-tryCatch(wrk(f), error = function(e) NULL)
-
-files <- list_files_with_type(file.path(dir, "R"), "code",
+one.R <- function(f)
+tryCatch(wrk.R(f), error = function(e) NULL)
+files.R <- list_files_with_type(file.path(dir, "R"), "code",
   full.names = FALSE,
   OS_subdirs = c("unix", "windows"))
-do.call(rbind, lapply(files, one))
+
+db <- Rd_db(dir = dir)
+wrk.Rd <- function(Rd, f)
+{
+exfile <- tempfile()
+on.exit(unlink(exfile))
+Rd2ex(Rd, exfile)
+ret <- .source_file_using_R_4.x_syntax(exfile)
+if (!is.null(ret)) cbind(ret, file = f)
+}
+one.Rd <- function(Rd, f)
+tryCatch(wrk.Rd(Rd, f), error = function(e) NULL)
+
+do.call(rbind, c(lapply(files.R, one.R),
+ Map(one.Rd, db, names(db))))
 }
 
 ## ** .package_depends_on_R_at_least
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] A few problems with Sys.setLanguage()

2025-02-11 Thread Ivan Krylov via R-devel
Hello R-devel,

Currently, Sys.setLanguage() interprets an empty/absent environment
variable LANGUAGE to mean unset="en", which disagrees with gettext():
it defaults to the LC_MESSAGES category of the current locale [1]. As a
result, on systems with $LANGUAGE normally unset, Sys.setLanguage(...)
returns "en" instead of the language previously in effect. I would like
to suggest making the default unset = Sys.getlocale("LC_MESSAGES")
instead of "en" so that Sys.setLanguage(Sys.setLanguage(anything))
would not reset language to English. Making Sys.setLanguage() accept an
empty string or NA to reset or remove LANGUAGE (and allowing
Sys.setLanguage() to return that value) could also be an option.

Additionally, there is a number of problems with the way
Sys.setLanguage() handles R having started up in the C locale, some of
them easier to solve than others.

gettext() disables translation lookup only when the LC_MESSAGES locale
category is "C" or "POSIX", so the current test for identical("C",
Sys.getlocale()) will miss the situations when not all locale
categories are set to "C". I think the correct test should be
Sys.getlocale("LC_MESSAGES") %in% c("C", "POSIX", "C.UTF-8", "C.utf8").
(On my GNU/Linux system, setting a "POSIX" locale returns it as "C",
but I don't think that's guaranteed to happen everywhere.)

So what should Sys.setLanguage(lang, force=TRUE) do when the current
LC_MESSAGES locale category disables translation? "en_US.UTF-8" is not
guaranteed to be present on a given system. POSIX documents 'locale -a'
to list available locales [2], so R could attempt something like:

# any locales except C.*/POSIX which disable translation?
system("locale -a", intern = TRUE) |>
 setdiff(c("C", "C.UTF-8", "C.utf8", "POSIX")) -> candidates
locale <- if (any(mask <- startsWith(candidates, lang))) {
 candidates[mask][[1]]
} else if (length(candidates)) {
 candidates[[1]]
} else {
 "en_US.UTF-8" # maybe it's available despite 'locale -a' failing?
}
lcSet <- Sys.setlocale("LC_MESSAGES", locale)

Unfortunately, that's not all: translations are also affected by the
LC_CTYPE category of the current locale, and gettext() will try to
convert the translations into that locale's encoding before returning
them. What about LC_CTYPE being "C"? Sometimes gettext() is able to
transliterate:

$ LC_CTYPE=C LANGUAGE=ru R -q -s -e 'foo'
Oshibka: ob``ekt 'foo' ne najden
Vy`polnenie ostanovleno

And sometimes it's not:

$ LC_CTYPE=C LANGUAGE=zh_CN R -q -s -e 'foo'
??: ?'foo'
 # <-- these are \x3F question marks, not replacement characters

There doesn't seem to be a portable way to determine a locale with an
encoding that would be appropriate in the current session. For example,
on my system, only 4 locales out of 11 listed by 'locale -a' use UTF-8
as their encoding (and sometimes UTF-8 is the wrong choice when I'm
using 'luit' with a non-UTF-8 environment).

R could try to force the same locale for LC_CTYPE as it sets
LC_MESSAGES, or force a UTF-8 locale if it finds one, or leave LC_CTYPE
as it is. All of these options have their downsides. How helpful is
Sys.setLanguage(force = TRUE) in practice? 

-- 
Best regards,
Ivan

[1] The environment variables used for gettext() are listed at the
following resources:
https://www.gnu.org/software/gettext/manual/html_node/Locale-Environment-Variables.html
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html#tag_08_02
The exact lookup procedure is also documented here:
https://pubs.opengroup.org/onlinepubs/9799919799/functions/dngettext.html
In short, if the LC_MESSAGES category of the current locale is
"C" or "POSIX", gettext() does not translate. (GNU gettext additionally
disables translation for "C.UTF-8".) Otherwise it consults the LANGUAGE
environment variable. If that variable is absent or empty, it uses the
LC_MESSAGES category of the current locale. When a program calls
setlocale(category, ""), $LANG provides the default value for all
categories, which is overridden by the $LC_* variables for individual
categories, which are all overridden by $LC_ALL.

[2]
https://pubs.opengroup.org/onlinepubs/9799919799/utilities/locale.html

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Suggestion to emphasize Rboolean is unrelated to LGLSXP in R-exts

2025-02-02 Thread Ivan Krylov via R-devel
The good news is that without a C23-enabled compiler, the problem will
only happen to source files that #include <stdbool.h>. The bad news is
that such a source file will technically disagree with the rest of R
about the type of Rboolean, including the prototypes of the API
functions that accept Rboolean:

#include <stdbool.h>
#include <Rinternals.h>
typedef void (*pordervector1)(int *, int, SEXP, Rboolean, Rboolean);
// ...
pordervector1 f = R_orderVector1;
f(pindx, length(indx), arg, nalast, decreasing);

foo.c:27:17: runtime error: call to function R_orderVector1 through
pointer to incorrect function type 'void (*)(int *, int, struct SEXPREC
*, bool, bool)'
/tmp/R-devel/src/main/sort.c:1135: note: R_orderVector1 defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior foo.c:27:17

With sanitizers disabled, this doesn't seem to cause any real problems
thanks to the calling convention, where both 'enum's and 'bool's are
passed and returned in a register.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [SPAM Warning!] Suggestion to emphasize Rboolean is unrelated to LGLSXP in R-exts

2025-02-01 Thread Ivan Krylov via R-devel
On Thu, 30 Jan 2025 13:07:31 -0800
Michael Chirico  wrote:

> There are at least dozens of other cases on CRAN [2],[3].

Some of these involve casting an int to Rboolean. Best case, the int is
compared against NA_LOGICAL beforehand, avoiding any mistake (there's
at least one like that). Worst case, NA_LOGICAL is not considered before
the cast, so NA will now be interpreted as TRUE. This is hard to check
without actually reading the code.

Some packages compare an Rboolean expression against NA_LOGICAL [1].
This implies having stored an int in an Rboolean value as in the
previous paragraph. I think that it wasn't disallowed according to the
C standard to store NA_LOGICAL in an enumeration type wide enough to
fit it (and it evidently worked in practice). With typedef bool
Rboolean, storing NA_LOGICAL in an Rboolean converts it to 'true', so
the comparison will definitely fail:

DPQ src/pnchisq-it.c:530,532
Rmpfr src/convert.c:535
checkmate src/helper.c:102
chron src/unpaste.c:21
collapse src/data.table_rbindlist.c:208,258,383,384,408,431
data.table (many; fixed in Git)
ff src/ordermerge.c:5074 (one declaration, many comparisons)
networkDynamic src/Rinit.c:209 src/is.active.c:75,76,96-98
slam src/util.c:258
this.path src/get_file_from_closure.h:13,43 src/thispath.c:14,17,19,39
 src/ext.c:25 src/setsyspath.c:8 src/get_file_from_closure.h:13,43

Four packages cast int* pointers returned by LOGICAL() to Rboolean* or
use sizeof(Rboolean) to calculate buffer sizes in calls to memcpy()
with LOGICAL() buffers [2]. With typedef bool Rboolean, this is a
serious mistake, because the memory layout of the types is no longer
compatible:

bit64 src/integer64.c:576,603,914,929,942,955,968,981,994
collapse src/data.table_rbindlist.c:19,67,105
data.table (many; fixed in Git)
kit src/utils.c:390

I don't know Coccinelle that well and there may be additional cases I
failed to consider. At which point is it appropriate to start notifying
maintainers of the bugs not caught by their test suites?

-- 
Best regards,
Ivan

[1] Coccinelle script:
@@
typedef Rboolean;
Rboolean E;
@@
* E == NA_LOGICAL

[2] Coccinelle scripts:

@@
typedef Rboolean;
int* E;
@@
* (Rboolean*)E

This one will offer a diff to fix the bug:

@@
int *E1;
int *E2;
typedef Rboolean;
@@
(
 memcpy
|
 memmove
)
 (E1, E2,
 <+...
-sizeof(Rboolean)
+sizeof(int)
 ...+>
 )

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] binomial()$linkinv no longer accepts integer values

2025-01-08 Thread Ivan Krylov via R-devel
On Wed, 8 Jan 2025 10:57:47 -0500
Ben Bolker  wrote:

> I haven't done the archaeology to figure out when this broke/exactly 
> what change in the R code base broke it: it happened within the last 
> month or so

binomial() itself exhibits this property even in R-4.2.2 from more than
two years ago:

R -q -s -e 'getRversion(); binomial()$linkinv(1L)'
# [1] ‘4.2.2’
# Error in binomial()$linkinv(1L) : 
#   REAL() can only be applied to a 'numeric', not a 'integer'

It's the `etas` [1] that suddenly became integer due to a change in
seq.int():

R -q -s -e 'str(seq.int(-8, 8, by=1))'
# num [1:17] -8 -7 -6 -5 -4 -3 -2 -1 0 1 ...
R-devel -q -s -e 'str(seq.int(-8, 8, by=1))'
# int [1:17] -8 -7 -6 -5 -4 -3 -2 -1 0 1 ...
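
A caller-side workaround, regardless of which side eventually gets fixed, is to coerce before calling (a sketch, not a proposed fix):

```r
# binomial()$linkinv calls REAL() on its argument in C code, so it needs
# a double vector; as.numeric() sidesteps the integer 'etas'
fam <- binomial()
etas <- seq.int(-8, 8, by = 1)        # integer on R-devel at the time
mus <- fam$linkinv(as.numeric(etas))  # plogis() of each eta
stopifnot(is.double(mus), all(mus > 0), all(mus < 1))
```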

-- 
Best regards,
Ivan

[1]
https://github.com/lme4/lme4/blob/54c54a320c23b34fea2f7e613928d1ebe7a3fd37/tests/testthat/test-glmFamily.R#L10C5-L10C25

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Package compression benchmarks for zstd vs gzip

2025-01-12 Thread Ivan Krylov via R-devel
On Sat, 11 Jan 2025 16:05:46 -0800
Henrik Bengtsson  wrote:

> It's probably also worth looking at package compression with 'xz'
> compression. In [1], Mike FC has a graph where 'bzip2' and 'xz' seem
> to give the best compression ratios, at least for RDS files.

'bzip2' can be surprisingly good on very repetitive payloads. It
compresses 0x80000000 zero bytes to only 1.5 KiB, much better than 'xz
-9' with 305 KiB (with compression settings not making much
difference), although the compression is not perfect. One terabyte of
zeros can be compressed to 697202 bytes of repetitive compressed stream
that can be bzipped further to 248 bytes.
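
The scale of the effect can be checked from R itself with memCompress(); the exact byte counts depend on the library versions, so they are deliberately not hard-coded here:

```r
# compare gzip / bzip2 / xz on a maximally repetitive payload
x <- raw(2^20)   # 1 MiB of zero bytes
sizes <- vapply(c("gzip", "bzip2", "xz"),
                function(m) length(memCompress(x, m)),
                integer(1))
print(sizes)     # bzip2 usually wins by a wide margin on this input
```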

Binary packages are probably the most obvious target for new
compression methods because there is no need to install them on older
versions of R.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Depends: R (>= 4.1) for packages that use |> and \(...)

2025-01-23 Thread Ivan Krylov via R-devel
Many thanks to Henrik for remembering the report in Bugzilla and to
Kurt for implementing the change and finding out the true number of
affected packages.

On Wed, 22 Jan 2025 15:34:41 -0500
Ian Farm  wrote:

> Would packages using the underscore placeholder with the native pipe
> need to also depend on R >= 4.2.0?

That's a good find! For the R >= 4.2 syntax, we only need to check for
getParseData(...)$token %in% 'PLACEHOLDER'. The R >= 4.3 syntax feature
is harder to test for:

>> As an experimental feature the placeholder _ can now also be used in
>> the rhs of a forward pipe |> expression as the first argument in an
>> extraction call, such as _$coef. More generally, it can be used as
>> the head of a chain of extractions, such as _$coef[[2]]. 

I think it might be possible to parse(text = paste('PLACEHOLDER |>',
grandparent_expression)) and then look at the top-level function in the
call, but that feels quite fragile:

x <- utils::getParseData(parse(f, keep.source = TRUE))
i <- x$token %in% "PLACEHOLDER"
pi <- x[i, "parent"]
ppi <- x[x$id %in% pi, "parent"]
placeholder_expressions <- utils::getParseText(x, ppi)
extractor_used <- vapply(placeholder_expressions, function(src) {
 toplevel <- parse(text = paste("PLACEHOLDER |> ", src))[[1]][[1]]
 identical(toplevel, quote(`$`)) ||
  identical(toplevel, quote(`[`)) ||
  identical(toplevel, quote(`[[`))
}, FALSE)

Alternatively, we may find the first child of the grandparent of the
placeholder. If it's the placeholder expression, then the pipe must be
of the form ...|> _..., which is the R >= 4.3 syntax:

x <- utils::getParseData(parse(f, keep.source = TRUE))
i <- x$token %in% "PLACEHOLDER"
vapply(which(i), function(i) {
 pi <- x[i, "parent"]
 ppi <- x[x$id %in% pi, "parent"]
 cppi <- x[x$parent %in% ppi, "id"]
 min(cppi) == pi
}, FALSE)
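
As a quick sanity check, the simpler R >= 4.2 detection can be exercised on a throwaway file (the example code written to it is invented here); note that parsing it requires an R new enough to accept the syntax:

```r
# detection of the R >= 4.2 placeholder: any PLACEHOLDER token at all
uses_placeholder <- function(f) {
  pd <- utils::getParseData(parse(f, keep.source = TRUE))
  any(pd$token %in% "PLACEHOLDER")
}
f <- tempfile(fileext = ".R")
writeLines("mtcars |> lm(mpg ~ cyl, data = _)", f)
uses_placeholder(f)  # TRUE (on R < 4.2 the file fails to parse instead)
```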

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] How setClass() may introduce a binary dependency between packages

2025-01-18 Thread Ivan Krylov via R-devel
Hello R-devel,

Since Pavel has mentioned ABI-level dependencies between packages [1],
it may be relevant to revisit the related problem mentioned ~1.5 years
ago by Dirk [2].

While the current version of SeuratObject doesn't exhibit this problem,
a combination of package versions described by Dirk still breaks each
other on R-devel:

1. Install Matrix_1.5-1
2. Install SeuratObject_4.1.3 from source
3. Install Matrix_1.6-0
4. SeuratObject is now broken until reinstalled from source

The problem is actually slightly worse, because loading SeuratObject
from step (2) breaks sparse matrices for everyone until Matrix is
reloaded:

library(Matrix); sparseMatrix(1,1)
# 1 x 1 sparse Matrix of class "ngCMatrix"
# 
# [1,] |
suppressPackageStartupMessages(library(SeuratObject))
sparseMatrix(1,1)
# 1 x 1 sparse Matrix of class "ngCMatrix"
# Error in validityMethod(as(object, superClass)) :
#   object 'Csparse_validate' not found
detach('package:SeuratObject', unload = TRUE); sparseMatrix(1,1)
# 1 x 1 sparse Matrix of class "ngCMatrix"
# Error in validityMethod(as(object, superClass)) :
#   object 'Csparse_validate' not found
detach('package:Matrix', unload = TRUE); library(Matrix)
sparseMatrix(1,1)
# 1 x 1 sparse Matrix of class "ngCMatrix"
# 
# [1,] |

In turn, this can be traced to a copy of the CsparseMatrix class from
Matrix_1.5-1 remaining in the namespace and the lazy-load database of
SeuratObject:

readRDS('SeuratObject/R/SeuratObject.rdx')$variables |> names() |>
grep('sparseM', x = _, value = TRUE)
# [1] ".__C__CsparseMatrix" ".__C__dsparseMatrix" ".__C__sparseMatrix" 
SeuratObject:::.__C__CsparseMatrix@validity
# function (object) 
# .Call(Csparse_validate, object) # <-- missing in Matrix_1.6-0
# 
# 

When the SeuratObject namespace is loaded, methods::cacheMetaData sees
the 1.5-1 class definition after the 1.6-0 definition and overwrites
the cache entry.

Why do these objects appear in the namespace and not the imports
environment together with the actually imported .__C__dgCMatrix?

(gdb) p Rf_install(".__C__CsparseMatrix")
$1 = (struct SEXPREC *) 0x57888c28
(gdb) b Rf_defineVar if symbol == (SEXP)0x57888c28
Breakpoint 1 at 0x77b1bcd0: file envir.c, line 1624.

file.copy(
 'SeuratObject-collated.R', 'SeuratObject/R/SeuratObject',
 overwrite=TRUE
)
Sys.setenv('_R_TRACE_LOADNAMESPACE_'='5')
tools:::makeLazyLoading('SeuratObject')

Eventually, after two hits during loading Matrix code and exports:

-- done processing imports for “SeuratObject”
-- loading code for “SeuratObject”
Thread 1 "R" hit Breakpoint 1, Rf_defineVar (symbol=0x58753e18, 
value=0x5d0602f8, rho=0x58906630) at envir.c:1624
1624if (value == R_UnboundValue)
(gdb) call Rf_PrintValue(R_NamespaceEnvSpec(rho))
  nameversion
"SeuratObject""4.1.3"
(gdb) call Rf_PrintValue(symbol)
.__C__CsparseMatrix
(gdb) call Rf_PrintValue(R_GlobalContext->call)
assign(mname, def, where)
(gdb) call Rf_PrintValue(R_GlobalContext->nextcontext->call)
assignClassDef(class2, classDef2, where2, TRUE)
(gdb) call Rf_PrintValue(R_GlobalContext->nextcontext->nextcontext->call)
setIs(class2, cli, extensionObject = obji, doComplete = FALSE, 
where = where)
(gdb) call 
Rf_PrintValue(R_GlobalContext->nextcontext->nextcontext->nextcontext->call)
completeSubclasses(classDef2, class1, obj, where)
(gdb) call 
Rf_PrintValue(R_GlobalContext->nextcontext->nextcontext->nextcontext->nextcontext->call)
setIs(Class, class2, classDef = classDef, where = where)
(gdb) call 
Rf_PrintValue(R_GlobalContext->nextcontext->nextcontext->nextcontext->nextcontext->nextcontext->nextcontext->nextcontext->nextcontext->nextcontext->nextcontext->nextcontext->call)
setClass(Class = "Graph", contains = "dgCMatrix", slots = list(assay.used = 
"OptionalCharacter"))

In other words, setIs("Graph", "dgCMatrix", ...) implies setIs("Graph",
"CsparseMatrix", ...), which needs to update the definition of
CsparseMatrix in some environment. In the current version of
SeuratObject, methods:::.findOrCopyClass() succeeds in finding the
class to update in the _imports_ of SeuratObject because the relevant
classes are now imported [3]:

findClass('CsparseMatrix', loadNamespace('SeuratObject'))
# [[1]]
# 
# attr(,"name")
# [1] "imports:SeuratObject"

In SeuratObject_4.1.3, the class was not imported, so
methods:::.findOrCopyClass() used the SeuratObject _namespace_ as the
environment to assign the class definition in.

Are there ways to prevent this problem (by importing more classes?) or
at least warn about it at package check time? How prevalent is class
copying on CRAN? Out of 358 packages installed on my machine, many no
doubt outdated, only six copy foreign S4 classes into their own
namespaces:

installed.packages() |> rownames() |> setNames(nm = _) |> lapply(\(n) {
 ns <- loadNamespace(n)
 ls(ns, pattern = '^[.]__C__', all.names = TRUE) |>
  setNames(nm = _) |> lapply(get, ns) |>
  vapply(at

Re: [Rd] Creating a long list triggers billions of messages

2025-01-21 Thread Ivan Krylov via R-devel
On Tue, 21 Jan 2025 16:51:34 +1100
Hugh Parsonage  writes:

> x <- vector("list", 2^31)
> 
> which triggers (presumably) billions of error messages like
> Error: long vectors are not supported yet ../include/Rinlinedfuns.h

I couldn't reproduce this with some released versions of R or a recent
R-devel. Would you mind sharing your sessionInfo()?

If you'd like to, could you please run

R -q -s -e 'x <- vector("list", 2^31)' -d gdb

...then set a breakpoint on Rf_errorcall, run the program and collect a
backtrace when it fires?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Depends: R (>= 4.1) for packages that use |> and \(...)

2025-01-16 Thread Ivan Krylov via R-devel
Hello R-devel,

Approximately [*] the following CRAN packages make use of the pipe
syntax in their source code or examples without depending on R >= 4.1:

 [1] "biplotEZ"   "CaseBasedReasoning" "collinear"
 [4] "cubble" "disk.frame" "duckdbfs"
 [7] "eia""feltr"  "flattabler"
[10] "geodimension"   "hgnc"   "himach"
[13] "lay""lidR"   "locateip"
[16] "particles"  "photosynthesis" "pivotea"
[19] "planr"  "rtrek"  "satres"
[22] "sdtmval""selenider"  "sewage"
[25] "stminsights""tabr"   "tidygraph"
[28] "tidywikidatar"  "USgas"  "washi"
[31] "zctaCrosswalk"

Since we have checks in place to automatically set Depends: R (>=
2.10.0) for data files compressed with xz or bzip2 and >= 3.5.0 for
data files serialized with format version 3, would it make sense to
automatically add Depends: R (>= 4.1) for such packages?

The patch at the end of this message adds the R version dependency
during R CMD build:

R-devel CMD build --no-build-vignettes .
* checking for file ‘./DESCRIPTION’ ... OK
* preparing ‘biplotEZ’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* checking vignette meta-information ... OK
* checking for LF line-endings in source and make files and shell
scripts
* checking for empty or unneeded directories
  NB: this package now depends on R (>= 4.1.0)
  WARNING: Added dependency on R >= 4.1.0 because some of the source
  files use the new syntax constructs.
Files making use of R >= 4.1 pipe |> or function shorthand \(...):
  biplotEZ/R/biplot.R biplotEZ/R/translate_axes.R
* building ‘biplotEZ_2.2.tar.gz’

A more extensive test could also look at the tests, demos, and
\examples{}, but that may take longer and open the door for false
positives. A package that uses |> in one example would still be useful
on R-4.0.0, while a package that uses |> in the source code would fail
to parse and install.
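
The parse-time nature of the failure is easy to demonstrate; this snippet reports whether the parser accepts the pipe without ever evaluating the piped code:

```r
# |> breaks at parse time: the expression never needs to be run
parses_ok <- function(code)
  tryCatch({ parse(text = code); TRUE }, error = function(e) FALSE)

parses_ok("stop('never run') |> identity()")  # TRUE on R >= 4.1 only
parses_ok("1 + ")                             # FALSE everywhere
```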

Index: src/library/tools/R/QC.R
===
--- src/library/tools/R/QC.R(revision 87545)
+++ src/library/tools/R/QC.R(working copy)
@@ -10367,7 +10367,29 @@
 }
 }
 
+.check_use_of_R41_syntax <-
+function(files)
+{
+out <- vapply(files,
+   function(f) tryCatch(
+   any(getParseData(parse(f, keep.source = TRUE))$token %in% 
c("PIPE", "'\\\\'")),
+   error = function(e) FALSE
+   ),
+   FALSE)
+out <- files[out]
+class(out) <- "check_use_of_R41_syntax"
+out
+}
 
+format.check_use_of_R41_syntax <-
+function(x, ...)
+{
+if (length(x)) {
+c("Files making use of R >= 4.1 pipe |> or function shorthand 
\\(...):",
+  .strwrap22(x, " "))
+} else character()
+}
+
 ### Local variables: ***
 ### mode: outline-minor ***
 ### outline-regexp: "### [*]+" ***
Index: src/library/tools/R/build.R
===
--- src/library/tools/R/build.R (revision 87545)
+++ src/library/tools/R/build.R (working copy)
@@ -1165,9 +1165,11 @@
 desc <- .read_description(file.path(pkgname, "DESCRIPTION"))
 Rdeps <- .split_description(desc)$Rdepends2
 hasDep350 <- FALSE
+hasDep410 <- FALSE
 for(dep in Rdeps) {
 if(dep$op != '>=') next
 if(dep$version >= "3.5.0") hasDep350 <- TRUE
+if(dep$version >= "4.1.0") hasDep410 <- TRUE
 }
 if (!hasDep350) {
 ## re-read files after exclusions have been applied
@@ -1189,6 +1191,23 @@
  "\n")
 }
 }
+if (!hasDep410) {
+uses410 <- .check_use_of_R41_syntax(dir(file.path(pkgname, "R"),
+full.names = TRUE,
+pattern = "[.]R$",
+ignore.case = TRUE))
+if (length(uses410)) {
+fixup_R_dep(pkgname, "4.1.0")
+msg <- paste("WARNING: Added dependency on R >= 4.1.0 because",
+ "some of the source files use the new syntax",
+ "constructs.")
+printLog(Log,
+ paste(c(strwrap(msg, indent = 2L, exdent = 2L),
+ format(uses410)),
+   collapse = "\n"),
+   "\n")
+}
+}
 
## add NAMESPACE if the author didn't write one
if(!file.exists(namespace <- file.path(pkgname, "NAMESPACE")) ) {


-- 
Best regards,
Ivan

[*] Based on the following GitHub search, which requires logging in:
https://github.com/search?q=org%3Acran%20path%3A%2F%5B.%5D%5BRr%5Dd%3F%24%2F%20%2F%5Cs%5C%7C%3E%2F&type=code
There's currently no REST API support for regexp search, so the list
was ob

Re: [Rd] UTF-8 encoding issue with R CMD check with install-args="--latex"

2025-01-16 Thread Ivan Krylov via R-devel
On Thu, 16 Jan 2025 18:09:25 +0100
Peter Ruckdeschel via R-devel  writes:

> this is to report some minor UTF-8 encoding issue with R CMD check
> with option --install-args="--latex" (and possibly more install-args).

Thank you for a very detailed report!

This doesn't happen on R-4.2.2 or 4.3.1, but it does happen on R-devel.

Comparing the calls from R CMD check to R CMD Rd2pdf, I see no
difference in the environment variables or any significant difference
in the command lines. The command being run ends up being equivalent to

R CMD Rd2pdf .Rcheck/

...and the source of the difference is the presence (or absence) of the
.Rcheck//latex directory. If I temporarily move it
away during an R CMD check --install-args=--latex run, the command
succeeds.

Indeed, tools:::.pkg2tex says

>> ## First check for a latex dir (from R CMD INSTALL --latex).
>> ## Second guess is this is a >= 2.10.0 package with stored .rds
>> ## files.
>> ## If it does not exist, guess this is a source package.
>> latexdir <- file.path(pkgdir, "latex")

The individual *.tex files in the latex/ subdirectory of the installed
package all do start with an "\inputencoding{utf8}" line.

When the latex/ subdirectory doesn't exist, the !dir.exists(latexdir)
branch is taken, where Rd2latex(...) is called with writeEncoding =
FALSE, thus avoiding the problem.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R-devel, bool, and C23

2025-03-15 Thread Ivan Krylov via R-devel
On Fri, 14 Mar 2025 22:25:54 -0500
Dirk Eddelbuettel  wrote:

> An older package I looked at apparently currently fails to build under
> r-devel (and with that my thanks to R-universe for giving us a
> 'broad' range of builds for free -- off our development sources) over
> 'bool' related changes and enum definitions.
> 
> I can get it to behave and build by declaring
> 
>   PKG_CFLAGS = -std=gnu23

Could you please share the compilation failure messages? The -std=gnu23
flag causes R_ext/Boolean.h to _not_ #include  before
defining enum Rboolean, so the problem is likely related to that.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R-devel, bool, and C23

2025-03-15 Thread Ivan Krylov via R-devel
On Sat, 15 Mar 2025 07:51:11 -0500
Dirk Eddelbuettel  wrote:

>   /usr/local/lib/R-devel/lib/R/include/R_ext/Boolean.h:62:16: warning: ISO C 
> does not support specifying ‘enum’ underlying types before C23 [-Wpedantic]
>  62 |   typedef enum :int { FALSE = 0, TRUE } Rboolean;  // so NOT NA
>     |                ^

I think that the configure test [1] succeeds in non-C23 mode because
the test program compiles successfully (despite the warning), causing
the enum-related warning for any compilation units that include
R_ext/Boolean.h.

Since there may be no portable way to specify CFLAGS=-Werror for the
AC_RUN_IFELSE(...) test, perhaps the configure test should also test
for the reported C standard version? But that of course could also be
wrong...

-- 
Best regards,
Ivan

[1]
https://github.com/r-devel/r-svn/blob/886ba5f282a82d6f327211b08b4fa502641c7ef8/m4/R.m4#L5442-L5459

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R does not build with conda libcurl

2025-04-14 Thread Ivan Krylov via R-devel
On Mon, 14 Apr 2025 14:10:56 +0200
Toby Hocking  wrote:

> /usr/bin/ld : ../../lib/libR.so : undefined reference to
> « u_getVersion_73 »

Strange that it's complaining about symbols from libicu when the
problem is due to libcurl-related flags. What was the command line used
to link libR.so somewhere above in the log? I think it's not being
correctly linked with libicu, but since shared libraries are allowed to
have undefined imports in them, this is only found out later, when
linking the R.bin executable.

> It seems that the libcurl package in conda provides the curl-config
> command line program, which R is using to get this flag:
> -I/home/local/USHERBROOKE/hoct2726/miniconda3/include

With libcurl installed from conda, what do the following commands print?

curl-config --built-shared
curl-config --static-libs
curl-config --libs

> To fix the build, I did "conda remove libcurl" and then "make clean"
> and then "configure" and "make" worked.

It should also be possible to override the path to curl-config using
the CURL_CONFIG environment variable.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] sqrt(.Machine$double.xmax)^2 == Inf, but only on Windows in R

2025-04-29 Thread Ivan Krylov via R-devel
On Tue, 29 Apr 2025 12:00:09 +0200
Martin Maechler  wrote:

> Would you (or anybody else) know if this is new behaviour or it
> also happened e.g. in R 4.4.x versions on  Windows?

R-4.3.1 on Windows 7 in a virtual machine gives:

dput(sqrt(.Machine$double.xmax), control = 'hex')
# 0x1p+512
sqrt(.Machine$double.xmax)^2
# [1] Inf

...which differs from R on Linux:

dput(.Machine$double.xmax, control = 'hex')
# 0x1.fp+1023
dput(sqrt(.Machine$double.xmax), control = 'hex')
# 0x1.fp+511
dput(sqrt(.Machine$double.xmax)^2, control = 'hex')
# 0x1.ep+1023
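
The hex dumps explain the asymmetry: on Windows sqrt() apparently rounds up to exactly 2^512, whose square overflows, while the correctly rounded 0x1.f...p+511 squares to a finite value. The boundary can be shown without sqrt() at all:

```r
# 2^512 is the smallest double whose square overflows to Inf
x <- 2^512
stopifnot(x^2 == Inf)

# the next representable double below 2^512 still squares finitely
y <- x * (1 - .Machine$double.eps / 2)
stopifnot(is.finite(y^2))
```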

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Depends: R (>= 4.1) for packages that use |> and \(...)

2025-03-05 Thread Ivan Krylov via R-devel
On Wed, 5 Mar 2025 07:47:04 -0600
Hadley Wickham  writes:

> Unfortunately your test generates a false positive for httr2 (
> https://cran.r-project.org/web/checks/check_results_httr2.html) and
> other tidyverse packages where we use the base pipe in examples, but
> carefully disable them for older versions of R.

Please accept my apologies. Indeed, the script [1] called at
configuration time does replace the examples section to avoid parse
errors for example(...). And having example() do nothing (with all code
and an explanation in the help page) is a small cost for having the
rest of the package work on R versions as old as 3.5.

I don't see a way to take this into account, since the workaround is
completely invisible to sufficiently new versions of R. It might be
that the best way forward is to revert the Rd example check.

-- 
Best regards,
Ivan

[1] https://github.com/r-lib/httr2/blob/main/tools/examples.R

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Customizing width of input in Rterm

2025-02-26 Thread Ivan Krylov via R-devel
On Fri, 21 Feb 2025 11:52:41 +0100
Iago Giné-Vázquez  writes:

> When using Rterm.exe (on Windows, I didn’t check Linux) only a
> limited number of characters is displayed in the input lines,
> independent on the |width| option, and, when navigating through the
> command history this becomes very uncomfortable, since a command is
> not fully displayed and, for example, it is very difficult to know
> which parts of a command are being edited.

The patch at the end of this message reads the console width on startup
and handles the console resize events while getline() is running. The
automatic resizing doesn't look nice due to gl_redraw() starting a new
line newline before redrawing the prompt, but it works.

This may have accessibility implications since it changes the behaviour
of Rterm.exe, which is what A. Jonathan R. Godfrey recommends for blind
Windows users [1]. I've experimented with NVDA and didn't notice
anything breaking in Windows 10 terminal or mintty.exe, but it's hard
to be sure without the real experience of using a screen reader.

Is this approach worth adopting? Is it better to erase the current line
instead of starting a new one? options(setWidthOnResize) could be
implemented similarly but may require more care due to
R_SetOptionWidth(...) evaluating R code.

Index: src/gnuwin32/getline/getline.c
===
--- src/gnuwin32/getline/getline.c  (revision 87795)
+++ src/gnuwin32/getline/getline.c  (working copy)
@@ -25,6 +25,7 @@
 int(*gl_in_hook)(char *) = 0;
 int(*gl_out_hook)(char *) = 0;
 int(*gl_tab_hook)(char *, int, int *) = gl_tab;
+static int  do_setwidth(int w);
 
 #include 
 #include 
@@ -214,6 +215,10 @@
  The bug still exists in Windows 10, and thus we now call
  GetConsoleInputW to get uchar.UnicodeChar. */
   ReadConsoleInputW(Win32InputStream, &r, 1, &a);
+  if (r.EventType == WINDOW_BUFFER_SIZE_EVENT) {
+if (do_setwidth(r.Event.WindowBufferSizeEvent.dwSize.X))
+  gl_redraw();
+  }
   if (!(r.EventType == KEY_EVENT)) break;
   st = r.Event.KeyEvent.dwControlKeyState;
   vk = r.Event.KeyEvent.wVirtualKeyCode;
@@ -487,6 +492,11 @@
 gl_w2e_map = gl_realloc(NULL, 0, BUF_SIZE, sizeof(size_t)); 
 
 gl_char_init();
+
+CONSOLE_SCREEN_BUFFER_INFO csb;
+GetConsoleScreenBufferInfo(Win32OutputStream, &csb);
+do_setwidth(csb.dwSize.X);
+
 gl_init_done = 1;
 }
 
@@ -536,13 +546,21 @@
 BUF_SIZE = newsize;
 }
 
+static int
+do_setwidth(int w)
+{
+/* may be called from gl_getc if a resize event is received */
+if (w > 20) {
+   gl_w_termw = w;
+   return 1;
+}
+return 0;
+}
+
 void
 gl_setwidth(int w)
 {
-/* not used in R; should arrange for redraw */
-if (w > 20) 
-   gl_w_termw = w;
-else 
+if (!do_setwidth(w))
gl_error("\n*** Error: minimum screen width is 21\n");
 }
 


-- 
Best regards,
Ivan

[1]
https://r-resources.massey.ac.nz/lurnblind/LURNBlindch2.html#x4-80002.3

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Suggestion: Install packages on non-appendable file systems (e.g. databricks volumes)

2025-04-02 Thread Ivan Krylov via R-devel
On Thu, 27 Mar 2025 13:26:47 +0100
Sergio Oller  writes:

> Our current workaround kind of works, but when users expect to be able
> to install packages
> using renv or other tools that use install.packages to work; our
> wrapper is not that convenient.

Here's an idea that might help: since R CMD INSTALL launches R without
--vanilla, you can provide a system-wide startup file to your users that
would patch the installation function at runtime without patching the R
source code:

if (nzchar(Sys.getenv("R_LOCKDIR_PREFIX", "")))
invisible(suppressMessages(trace(
tools:::.install_packages,
tracer = quote(local({
prefix <- path.expand(Sys.getenv("R_LOCKDIR_PREFIX", ""))
lockdir <- NULL
# prepend the prefix when asked for lockdir and it's non-empty
makeActiveBinding(
"lockdir",
function (newval) {
if (!missing(newval)) lockdir <<- newval
if (nzchar(lockdir)) file.path(prefix, lockdir) else ""
},
parent.env(environment())
)
})),
print = FALSE
)))

This is the same workaround as you have originally suggested, making
the installation on a non-appendable filesystem possible at the cost of
losing atomicity, with the added downside that an update of R could
change the internals and break the patch.

In theory, the same approach could be used to wrap the function that
installs source packages [*] in order to first install the package in a
non-databrick directory, then populate a temporary directory on the
databrick with the package contents, then rename it to the intended
installation directory. This would be even more brittle and harder to
implement because the function is created during the runtime of
tools:::.install_packages.

It might be easier to support a patched version of R installed on
databricks than such an Rprofile.

-- 
Best regards,
Ivan

[*]
https://github.com/r-devel/r-svn/blob/0a905442c27b538c7626b21e262939873523f209/src/library/tools/R/install.R#L544

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Is it possible to gracefully interrupt a child R process on MS Windows?

2025-05-11 Thread Ivan Krylov via R-devel
On Sun, 11 May 2025 10:58:18 -0700
Henrik Bengtsson  wrote:

> Is it possible to gracefully interrupt a child R process on MS
> Windows, e.g. a PSOCK cluster node?

Not in the general case (I think, based on the code paths leading to
Rf_onintr() on Windows), but PSOCK cluster nodes are instances of
Rscript.exe running the terminal front-end, and the terminal front-end
interrupts R upon receipt of Ctrl+C and Ctrl+Break console events:
https://learn.microsoft.com/en-us/windows/console/setconsolectrlhandler

This makes it possible for ps::ps_interrupt() to spawn a child process
to attach to the console where Rscript.exe is running and generate this
event:
https://github.com/r-lib/ps/blob/042d4836ac584c95a59985171fdfa3b6baf2fa6c/src/interrupt.c#L33-L35

This probably needs specially written code in the children that expects
to be interrupted in order to work reliably, but interrupting the
children and then interrupting the parent and submitting another job to
the PSOCK cluster seems to have worked for me on R-4.5.0.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] array-bound error with GCC 13/14

2025-05-09 Thread Ivan Krylov via R-devel
On Fri, 9 May 2025 11:09:22 +1000
Stephen Wade  writes:

> inlined from ‘std::vector literanger::adjust_pvalues(const
> std::vector&)’ at ../src/literanger/utility_math.h:99:48:
> /usr/include/c++/13/bits/stl_algobase.h:437:30: warning: ‘void*
> __builtin_memmove(void*, const void*, long unsigned int)’ writing
> between 9 and 9223372036854775807 bytes into a region of size 8
> overflows the destination [-Wstringop-overflow=]
>   437 |  __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);

The same problem (different reproducer, slightly different warning, but
same place in the standard library and similar circumstances) has been
reported as a false positive in GCC, Bug 109717:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109717

There are comments by GCC developers acknowledging that the overflow
detection may "detect" overflows in code paths that cannot be taken,
but they don't see an easy way to fix the warnings on the compiler side.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Time to revisit ifelse ?

2025-07-11 Thread Ivan Krylov via R-devel
On Fri, 11 Jul 2025 04:41:13 -0400
Mikael Jagan  wrote:

> But perhaps we should aim for consensus on a few issues beforehand.

Thank you for raising this topic!

> (Sorry if these have been discussed to death already elsewhere. In
> that case, links to relevant threads would be helpful ...)

The data.table::fifelse issue [1] comes to mind together with the vctrs
article section about the need for a less strict ifelse() [2]. 

>  1. Should the type and class attribute of the return value be
> exactly the type and class attribute of c(yes[0L], no[0L]),
> independent of 'test'? Or something else?

Can we afford an escape hatch for cases when one of the ifelse()
branches is NA or other special value handled by the '[<-' method
belonging to the class of the other branch? data.table::fifelse() has a
not exactly documented special case where it coerces NA_LOGICAL to the
appropriate type, so that data.table::fifelse(runif(10) < .5,
Sys.Date(), NA) works as intended, and dplyr::if_else also supports
this case, but none of the other ifelses I tested do that.
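
A sketch of that special case (all names invented; NA values in 'test' are ignored here for brevity) could look like:

```r
# coerce a bare logical NA branch to the class of the other branch,
# then fill by subassignment so the class's '[<-' method is dispatched
ifelse3 <- function(test, yes, no) {
  if (identical(no, NA))  no  <- yes[NA_integer_]  # typed NA of yes's class
  if (identical(yes, NA)) yes <- no[NA_integer_]
  out <- yes[rep(NA_integer_, length(test))]
  out[test]  <- yes[if (length(yes) == 1L) 1L else which(test)]
  out[!test] <- no[if (length(no) == 1L) 1L else which(!test)]
  out
}
ifelse3(c(TRUE, FALSE), Sys.Date(), NA)  # Date vector: today, then NA
```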

Can we say that if only some of the 'yes' / 'no' / 'na' arguments have
classes, those must match and they determine the class of the return
value? It could be convenient, and it also could be a source of bugs.

>  2. What should be the attributes of the return value (other than
> 'class')?

data.table::fifelse (and kit::iif, which shares a lot of the code) also
preserve the names, but neither dplyr nor hutils do. I think it would
be reasonable to preserve the 'dim' attribute and thus the 'dimnames'
attribute too.

>  3. Should the new function be stricter and/or more verbose?
> E.g., should it signal a condition if length(yes) or length(no) is
> not equal to 1 nor length(test)?

Leaning towards yes, but only because I haven't met any uses for
recycling of non-length-1 inputs myself. An allow.recycle=FALSE option
is probably overkill, right?

>  4. Should the most common case, in which neither 'yes' nor 'no'
> has a 'class' attribute, be handled in C?

This could be a very reasonable performance-correctness trade-off.

> FWIW, my first (and untested) approximation of an ifelse2 is just
> this:
> 
>  function (test, yes, no)

I think a widely asked-for feature is a separate 'na' branch.

-- 
Best regards,
Ivan

[1] https://github.com/rdatatable/data.table/issues/3657

[2] https://vctrs.r-lib.org/articles/stability.html#ifelse

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug in prettyNum

2025-05-23 Thread Ivan Krylov via R-devel
On Fri, 23 May 2025 11:47:33 +
Marttila Mikko via R-devel  writes:

> When called with a numeric vector, the `replace.zero` argument is
> disregarded.
> 
> > prettyNum(0, zero.print = "- ", replace.zero = TRUE)  
> [1] "-"
> Warning message:
> In .format.zeros(x, zero.print, replace = replace.zero) :
>   'zero.print' is truncated to fit into formatted zeros; consider
> 'replace=TRUE'

> Please see below a patch which I believe would fix this.

Surprisingly, it's not enough. The 'replace' argument to .format.zeros
needs to be "threaded" through both the call to vapply(x, format, ...)
and the internal call from format.default(...) to prettyNum(...):

R> options(warn = 2, error = recover)
R> prettyNum(0, zero.print = "--", replace.zero = TRUE)
Error in .format.zeros(x, zero.print, replace = replace.zero) :
  (converted from warning) 'zero.print' is truncated to fit into formatted zeros; consider 'replace=TRUE'

Enter a frame number, or 0 to exit

 1: prettyNum(0, zero.print = "--", replace.zero = TRUE)
 2: vapply(x, format, "", big.mark = big.mark, big.interval = big.interval, sma
 3: FUN(X[[i]], ...)
 4: format.default(X[[i]], ...)
 5: prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3, na.encode, sc
 6: .format.zeros(x, zero.print, replace = replace.zero)
 7: warning("'zero.print' is truncated to fit into formatted zeros; consider 'r
<...omitted...>
Selection: 6
<...>
Browse[1]> ls.str()
i0 :  logi TRUE
ind0 :  int 1
nc :  int 1
nx :  num 0
nz :  int 2
replace :  logi FALSE
warn.non.fitting :  logi TRUE
x :  chr "0"
zero.print :  chr "--"

Since prettyNum() accepts ... and thus ignores unknown arguments, it
seems to be safe to forward the ellipsis from format.default() to
prettyNum(). The patch survives LANGUAGE=en TZ=UTC make check-devel.

Index: src/library/base/R/format.R
===
--- src/library/base/R/format.R (revision 88229)
+++ src/library/base/R/format.R (working copy)
@@ -73,7 +73,7 @@
 decimal.mark = decimal.mark, input.d.mark = decimal.mark,
 zero.print = zero.print, drop0trailing = drop0trailing,
 is.cmplx = is.complex(x),
-preserve.width = if (trim) "individual" else "common"),
+preserve.width = if (trim) "individual" else "common", ...),
   ## all others (for now):
   stop(gettextf("Found no format() method for class \"%s\"",
 class(x)), domain = NA))
@@ -338,7 +338,8 @@
big.mark=big.mark, big.interval=big.interval,
small.mark=small.mark, small.interval=small.interval,
decimal.mark=decimal.mark, zero.print=zero.print,
-   drop0trailing=drop0trailing, ...)
+   drop0trailing=drop0trailing, replace.zero=replace.zero,
+   ...)
 }
 ## be fast in trivial case, when all options have their default, or "match"
 nMark <- big.mark == "" && small.mark == "" && (notChar || decimal.mark == input.d.mark)


-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Depricated to Defunct

2025-07-30 Thread Ivan Krylov via R-devel
On Wed, 30 Jul 2025 09:04:39 -0500
"Therneau, Terry M., Ph.D. via R-devel"  wrote:

> In the survival package the survConcordance function was replaced by
> concordance a while ago, the latter works for any ordered response
> (continuous, binary, survival, ...). I deprecated the old one a
> couple of years ago.

For a less painful switch from Deprecated to Defunct, it should help to
minimise its use in existing CRAN and Bioconductor packages.

On CRAN, it's still used by the following packages:

 - 'arsenal', one vignette, CRAN version from 2021, last change on
   GitHub in 2024
 - 'blockForest', R code, CRAN version from 2023
 - 'distcomp', R code, CRAN version from 2022
 - 'grpreg', \link to "survival-deprecated" in documentation
 - 'MTLR', R code, CRAN version from 2019, last change on GitHub in 2020

(A few remaining search results consist of commented out code and news
entries about having avoided the deprecation warning.)

On Bioconductor, it's two more packages, 'iNETgrate' and 'messina' (both
of which seem to be stale imports, not actual uses of the function):
https://code.bioconductor.org/search/search?q=survConcordance

The maintainers may need some help removing these last few uses of the
function.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] overriding built in Rbuildignore values

2025-08-07 Thread Ivan Krylov via R-devel
On Thu, 7 Aug 2025 14:11:08 -0500
Dirk Eddelbuettel  wrote:

> One suggestion is to use '*.sw[g-p]' to spare .swf files for Flash

On Unix-like systems (and, empirically, on Windows too, despite what
':help swap-file' says about the dots being replaced), Vim prepends a
dot to the name of the swap file. So how about only matching files that
start with a dot and end with .sw[certain letters]?

grepl(
 '(^|/)[.][^/]+[.]sw[a-p]$',
 c('.swap.file.swp', 'subdir/.swapfile.swn', 'not-a-swapfile.swc'),
 perl = TRUE
)
# [1]  TRUE TRUE FALSE

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Non-ASCII citation keys prevent compiling with LC_ALL=C

2025-08-16 Thread Ivan Krylov via R-devel
Hello R-devel,

I've been watching the development of automatic Rd bibliography
generation with great interest and I'm looking forward to using
\bibcitet{...} and \bibshow{*} in my packages. Currently, non-ASCII
characters used in the citation keys prevent R from successfully
compiling when the current locale encoding is unable to represent them:

% touch src/library/stats/man/factanal.Rd && LC_ALL=C make
...
installing parsed Rd
make[3]: Entering directory '.../src/library'
  base
Error: factanal.Rd:99: (converted from warning) Could not find
bibentries for the following keys: %s
  'R:Jreskog:1963'
Execution halted
make[3]: *** [Makefile:76: stats.Rdts] Error 1

But as long as the locale encoding can represent the key, it's fine:

% touch src/library/stats/man/factanal.Rd && \
 LC_ALL=en_GB.iso885915 luit make
(works well without a UTF-8 locale)

I think this can be made to work by telling tools:::process_Rd() ->
tools:::processRdChunk() to parse character strings in R code as UTF-8:

Index: src/library/tools/R/RdConv2.R
===
--- src/library/tools/R/RdConv2.R   (revision 88617)
+++ src/library/tools/R/RdConv2.R   (working copy)
@@ -229,8 +229,8 @@
code <- structure(code[tags != "COMMENT"],
  srcref = codesrcref) # retain for error locations
chunkexps <- tryCatch(
-   parse(text = sub("\n$", "", as.character(code)),
- keep.source = options$keep.source),
+   parse(text = sub("\n$", "", enc2utf8(as.character(code))),
+ keep.source = options$keep.source, encoding = "UTF-8"),
error = function (e) stopRd(code, Rdfile, conditionMessage(e))
)
 
That enc2utf8() may be extraneous, since tools::parse_Rd() is
documented to convert text to UTF-8 while parsing. The downsides are,
of course, parse(encoding=...) not working with MBCS locales and the
ever-present danger of breaking some user code that depends on the
current behaviour (this was tested using 'make check-devel', not on
CRAN packages).
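A quick way to check the idea behind the patch from any locale (the
non-ASCII key is typed with a \u escape so this snippet itself stays
ASCII; it merely stands in for the real citation key):

```r
# With encoding = "UTF-8", parse() marks non-ASCII string literals as
# UTF-8 regardless of the current locale.
key <- "R:J\u00f6reskog:1963"
e <- parse(text = paste0('"', enc2utf8(key), '"'), encoding = "UTF-8")[[1]]
Encoding(e)        # expected to be "UTF-8"
identical(e, key)
```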

Should R compile under LC_ALL=C? Maybe it's time for people whose
builds are failing to switch the continuous integration containers from
C to C.UTF-8?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] overriding built in Rbuildignore values

2025-08-10 Thread Ivan Krylov via R-devel
On Thu, 7 Aug 2025 16:57:20 -0500
Dirk Eddelbuettel  wrote:

> I trust you checked that 'perl = TRUE' applies also to these entries
> from tools:::get_exclude_patterns() ?

It does: tools:::inRbuildignore() uses perl = TRUE and I've tested the
pattern in an .Rbuildignore file with an older version of R.

For completeness, this should cover all swap files that Vim could
create with the 'shortname' option unset:

Index: src/library/tools/R/build.R
===
--- src/library/tools/R/build.R (revision 88556)
+++ src/library/tools/R/build.R (working copy)
@@ -54,8 +54,11 @@
 c("^\\.Rbuildignore$",
   "(^|/)\\.DS_Store$",
   "^\\.(RData|Rhistory)$",
-  "~$", "\\.bak$", "\\.sw.$",
+  "~$", "\\.bak$",
   "(^|/)\\.#[^/]*$", "(^|/)#[^/]*#$",
+  ## Vim
+  "(^|/)([.][^/]+|_)?[.]sw[a-p]$",
+  "(^|/)([.][^/]+|_)?[.]s[a-v][a-z]$",
   ## Outdated ...
   "^TITLE$", "^data/00Index$",
   "^inst/doc/00Index\\.dcf$",

On Windows, swapfiles for unnamed buffers are named _.swp (and so on)
instead of .swp (and so on), even with the 'shortname' option unset.

This gives us test cases:

.swp
.foo.txt.swo
src/.bar.c.swn
inst/not-a-swapfile.swc
inst/.not-a-swapfile.swc.swc
.saa
_.saa

With the 'shortname' option set, collisions are possible: a swapfile
could be named foo_txt.swc. Hopefully nobody develops R packages using
Vim on an 8.3 filesystem.
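The test cases can be run against the proposed patterns directly; the
expected results below reflect my reading of the regular expressions:

```r
# Checking the proposed .Rbuildignore defaults against the test cases.
pats <- c("(^|/)([.][^/]+|_)?[.]sw[a-p]$",
          "(^|/)([.][^/]+|_)?[.]s[a-v][a-z]$")
files <- c(".swp", ".foo.txt.swo", "src/.bar.c.swn",
           "inst/not-a-swapfile.swc", "inst/.not-a-swapfile.swc.swc",
           ".saa", "_.saa")
matched <- Reduce(`|`, lapply(pats, grepl, x = files, perl = TRUE))
setNames(matched, files)
# Only inst/not-a-swapfile.swc should come out FALSE. Note that
# inst/.not-a-swapfile.swc.swc *does* match: it is exactly what Vim
# would create for a file named not-a-swapfile.swc.
```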

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Non-ASCII citation keys prevent compiling with LC_ALL=C

2025-08-18 Thread Ivan Krylov via R-devel
On Sun, 17 Aug 2025 08:01:04 +0200
Kurt Hornik  wrote:

> There were 10 non-ASCII keys so far: I have for now changed them to
> all ASCII.

Thank you very much, this fixes my problem!

> (My regular checks use C.UTF-8, but I am not sure how universally
> available this is?)

'locale -a' says that C.UTF-8 is available with glibc and musl on
Linux, also FreeBSD and OpenBSD, but not macOS (and setlocale(LC_ALL,
"C.UTF-8") indeed fails on the latter).

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] compiler::cmpfile output contains source code

2025-09-17 Thread Ivan Krylov via R-devel
On Wed, 10 Sep 2025 16:13:25 -0700
helgasoft  wrote:

> Command compiler::cmpfile(infile) outputs a binary (.Rc) file.
> The infile source code is contained in this output file.
> Is the source code required, and if not, is it possible to make it 
> optional?

The source references are optional. Since cmpfile() calls parse(),
setting options(keep.source=FALSE) should prevent it from adding source
references.
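A minimal sketch of that suggestion (the file names here are temporary
placeholders):

```r
# Compile a file to bytecode without storing source references.
src <- tempfile(fileext = ".R")
out <- tempfile(fileext = ".Rc")
writeLines("f <- function(x) x + 1", src)
op <- options(keep.source = FALSE)  # cmpfile() -> parse() honours this
compiler::cmpfile(src, out)
options(op)
# Loading the result back shows no srcref attribute on f:
e <- new.env()
compiler::loadcmp(out, envir = e)
is.null(attr(e$f, "srcref"))  # TRUE
```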

On the other hand, by calling compiler::disassemble() on the bytecode
values, you will still be able to see the language objects from the
file, so a somewhat lossy representation of the source code is still
present. These language objects are used at least for error handling
and sometimes for internal consistency checks [*].

What's your use case?

-- 
Best regards,
Ivan

[*]
https://coolbutuseless.github.io/book/rbytecodebook/15-stored-expressions.html

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel