Re: [Rd] accelerating matrix multiply

2017-01-16 Thread Tomas Kalibera


Hi Robert,

thanks for the report and your suggestions on how to make the NaN checks 
faster.


Based on my experiments it seems that the "break" in the loop actually 
can have a positive impact on performance even in the common case when we 
don't have NaNs. With gcc on Linux (corei7), where isnan is inlined, the 
"break" version uses a conditional jump while the "nobreak" version uses 
a conditional move. The conditional jump is faster because it takes 
advantage of branch prediction. Neither of the two versions is 
vectorized (only scalar SSE instructions are used).


How do you run R on Xeon Phi? Do you offload the NaN checks to the Phi 
coprocessor? So far I have tried without offloading to Phi: icc could 
vectorize the "nobreak" version, but its performance was the same 
as that of "break".


For my experiments I extracted NaN checks into a function. This was the 
"break" version (same performance as the current code):


static __attribute__ ((noinline)) Rboolean hasNA(double *x, int n) {
  for (R_xlen_t i = 0; i < n; i++)
    if (ISNAN(x[i])) return TRUE;
  return FALSE;
}

And this was the "nobreak" version:

static __attribute__ ((noinline)) Rboolean hasNA(double *x, int n) {
  Rboolean has = FALSE;
  for (R_xlen_t i = 0; i < n; i++)
    if (ISNAN(x[i])) has=TRUE;
  return has;
}
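
For readers who want to try the two variants outside of R, the following is a minimal standalone sketch. It assumes the C99 isnan macro from <math.h> in place of R's ISNAN, and the names has_nan_break and has_nan_nobreak are made up for illustration. Whether either loop gets vectorized depends on the compiler and flags, so this only shows the shape of the two loops, not their performance.

```c
#include <math.h>
#include <stddef.h>

/* Early-exit variant: stops scanning at the first NaN. */
int has_nan_break(const double *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (isnan(x[i])) return 1;
    return 0;
}

/* No-early-exit variant: always scans all n elements. */
int has_nan_nobreak(const double *x, size_t n)
{
    int has = 0;
    for (size_t i = 0; i < n; i++)
        if (isnan(x[i])) has = 1;
    return has;
}
```

Both functions return the same answer for any input; they differ only in how much of the array they touch when a NaN is present.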

Thanks,
Tomas

On 01/11/2017 02:28 PM, Cohn, Robert S wrote:

Do you have R code (including set.seed(.) if relevant) to show on how to 
generate the large square matrices you've mentioned in the beginning?  So we 
get to some reproducible benchmarks?


Hi Martin,

Here is the program I used. I only generate 2 random numbers and reuse them to 
make the benchmark run faster. Let me know if there is something I can do to 
help--alternate benchmarks, tests, experiments with compilers other than icc.

MKL LAPACK behavior is undefined for NaN's so I left the check in, just made it 
more efficient on a CPU with SIMD. Thanks for looking at this.

set.seed (1)
m <- 3
n <- 3
A <- matrix (runif(2),nrow=m,ncol=n)
B <- matrix (runif(2),nrow=m,ncol=n)
print(typeof(A[1,2]))
print(A[1,2])

# Matrix multiply
system.time (C <- B %*% A)
system.time (C <- B %*% A)
system.time (C <- B %*% A)

-Original Message-
From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
Sent: Tuesday, January 10, 2017 8:59 AM
To: Cohn, Robert S 
Cc: r-devel@r-project.org
Subject: Re: [Rd] accelerating matrix multiply


Cohn, Robert S 
 on Sat, 7 Jan 2017 16:41:42 + writes:

I am using R to multiply some large (30k x 30k double) matrices on a
64 core machine (xeon phi).  I added some timers to src/main/array.c
to see where the time is going. All of the time is being spent in the
matprod function, most of that time is spent in dgemm. 15 seconds is
in matprod in some code that is checking if there are NaNs.

system.time (C <- B %*% A)

nancheck: wall time 15.240282s
dgemm: wall time 43.111064s
  matprod: wall time 58.351572s
 user   system  elapsed
2710.154   20.999   58.398

The NaN checking code is not being vectorized because of the early
exit when NaN is detected:

/* Don't trust the BLAS to handle NA/NaNs correctly: PR#4582
 * The test is only O(n) here.
 */
for (R_xlen_t i = 0; i < NRX*ncx; i++)
    if (ISNAN(x[i])) {have_na = TRUE; break;}
if (!have_na)
    for (R_xlen_t i = 0; i < NRY*ncy; i++)
        if (ISNAN(y[i])) {have_na = TRUE; break;}

I tried deleting the 'break'. By inspecting the asm code, I verified
that the loop was not being vectorized before, but now is vectorized.
Total time goes down:

system.time (C <- B %*% A)
nancheck: wall time  1.898667s
dgemm: wall time 43.913621s
  matprod: wall time 45.812468s
 user   system  elapsed
2727.877   20.723   45.859

The break accelerates the case when there is a NaN, at the expense of
the much more common case when there isn't a NaN. If a NaN is
detected, it doesn't call dgemm and calls its own matrix multiply,
which makes the NaN check time insignificant so I doubt the early exit
provides any benefit.

I was a little surprised that the O(n) NaN check is costly compared to
the O(n**2) dgemm that follows. I think the reason is that the NaN check
is single-threaded and not vectorized, and my machine can do 2048
floating point ops/cycle when you consider the cores/dual issue/8 way
SIMD/muladd, and the constant factor will be significant even for
large matrices.
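
As a rough check of the numbers above, the 2048 figure is the product of the four factors listed (64 cores, dual issue, 8-way SIMD, and 2 operations per multiply-add). The sketch below multiplies them out and also derives the scan rate implied by the 15.24 s NaN check, assuming "30k" means exactly 30000 so that the two matrices hold 2 x 9e8 doubles; that works out to roughly 1.2e8 doubles per second. The function names are illustrative only.

```c
/* Peak floating point operations per cycle:
 * cores x issue width x SIMD lanes x 2 (a multiply-add counts as two ops). */
unsigned peak_flops_per_cycle(unsigned cores, unsigned issue, unsigned simd_lanes)
{
    return cores * issue * simd_lanes * 2u;
}

/* Elements scanned per second by the NaN check, given the measured wall time. */
double nan_check_rate(double elems_per_matrix, double n_matrices, double seconds)
{
    return elems_per_matrix * n_matrices / seconds;
}
```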

Would you consider deleting the breaks? I can submit a patch if that
will help. Thanks.

Robert

Thank you Robert for bringing the issue up ("again", possibly).
Within R core, some have seen somewhat similar timing on some platforms (gcc) 
.. but much less dramatic differences e.g. on macOS with clang.

As seen in the source code you cite above, the current implementation was 
triggered by a nasty BLAS bug .. actually also showing up only on some 
platforms, possibly depending on runtime libraries in addition to the compilers 
used.

Do you have R code (inclu

Re: [Rd] accelerating matrix multiply

2017-01-17 Thread Tomas Kalibera

Hi Robert,

I've run more experiments (and yes, the code is probably too long for 
the list). The tradeoffs are platform dependent. The "nobreak" version 
is slower than "break" on a corei7 (i7-3610QM), it is faster on opteron 
(6282) and it is about the same on Xeon (E5-2640, E5-2670), although it 
was seen to be slower there for big vectors.


It may be hard to get a universally better version. Still, a version 
that performs fastest on platforms I checked, and sometimes by a lot - 
about 2x faster than default - is


Rboolean hasNaN_pairsum(double *x, R_xlen_t n)
{
    if ((n&1) != 0 && ISNAN(x[0]))
        return TRUE;
    /* i must be R_xlen_t, not int, so that long vectors do not overflow it */
    for (R_xlen_t i = n&1; i < n; i += 2)
        if (ISNAN(x[i]+x[i+1])) /* may also return TRUE for +-Inf */
            return TRUE;
    return FALSE;
}

It may also return "true" when some elements are Inf, but that is 
safe/conservative for this purpose, and actually the MKL disclaimer 
suggests we should be checking for Inf anyway.
This version is from pqR (except that pqR would also check the 
individual arguments of the sum, if the sum is found to be NaN).
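
To make the behaviour of the pair-sum trick concrete, here is a standalone sketch using the C99 isnan macro instead of R's ISNAN (the name has_nan_pairsum and the use of size_t are assumptions made for this example). It relies on NaN propagating through addition, so one isnan call covers two elements. The only false positives come from pairs whose sum is NaN although neither element is, which under IEEE 754 means +Inf paired with -Inf.

```c
#include <math.h>
#include <stddef.h>

/* Conservative check: returns 1 if x contains a NaN, or if some pair
 * sums to NaN (for example +Inf together with -Inf). */
int has_nan_pairsum(const double *x, size_t n)
{
    size_t start = n & 1;            /* odd length: check x[0] on its own */
    if (start && isnan(x[0]))
        return 1;
    for (size_t i = start; i < n; i += 2)
        if (isnan(x[i] + x[i + 1]))  /* NaN in either operand gives a NaN sum */
            return 1;
    return 0;
}
```

A pair such as {Inf, 1.0} sums to Inf, not NaN, so it is not flagged by this variant; a caller that also needs to reject infinities would have to test for finiteness instead.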

Does it perform well on Knights Landing?

Best
Tomas


On 01/16/2017 06:32 PM, Cohn, Robert S wrote:

Hi Tomas,

Can you share the full code for your benchmark, compiler options, and 
performance results so that I can try to reproduce them? There are a lot of 
variables that can affect the results. Private email is fine if it is too much 
for the mailing list.

I am measuring on Knight's Landing (KNL) that was released in November. KNL is 
not a co-processor so no offload is necessary. R executes directly on the Phi, 
which looks like a multi-core machine with 64 cores.

Robert


Re: [Rd] NaN behavior of cumsum

2017-01-24 Thread Tomas Kalibera

Hi Lukas,

thanks for the report. I've changed cumsum so that it is now consistent 
with cumprod wrt NA/NaN propagation.

Now NaN is not turned into NA unnecessarily.

Still, please be aware that generally NaNs may become NAs in R (on some 
platforms/compilers).

?NaN says

"Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’:
which of those two is not guaranteed and may depend on the R
platform (since compilers may re-order computations)."
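
Some background on why the result is not guaranteed: as far as I understand R's sources, NA_real_ is itself an IEEE NaN that carries a specific payload (low 32-bit word equal to 1954), and an operation on two NaNs may propagate the payload of either operand. The sketch below is an illustration under that assumption, not R's actual code; make_na and is_na are hypothetical names.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Bit pattern assumed for R's NA_real_: exponent all ones, low word 1954. */
#define NA_BITS UINT64_C(0x7FF00000000007A2)

/* Build a double with the assumed NA bit pattern. */
double make_na(void)
{
    uint64_t bits = NA_BITS;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* 1 if d is a NaN whose low 32 bits equal 1954, i.e. it looks like NA. */
int is_na(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return isnan(d) && (uint32_t)(bits & UINT64_C(0xFFFFFFFF)) == 1954u;
}
```

An ordinary NaN fails is_na because its payload differs, which is what distinguishes "NaN" from "NA" when R prints a value.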

Best
Tomas

On 01/20/2017 02:52 PM, Lukas Stadler wrote:

Hi!

I noticed that cumsum behaves differently than the other cumulative functions 
wrt. NaN values:

values <- c(1,2,NaN,1)
for ( f in c(cumsum, cumprod, cummin, cummax)) print(f(values))

[1]  1  3 NA NA
[1]   1   2 NaN NaN
[1]   1   1 NaN NaN
[1]   1   2 NaN NaN

The reason is that cumsum (in cum.c:33) contains an explicit check for ISNAN.
Is that intentional?
IMHO, ISNA would be better (because it would make the behavior consistent with 
the other functions).

- Lukas
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [Rd] Ancient C /Fortran code linpack error

2017-02-09 Thread Tomas Kalibera

On 02/09/2017 05:05 PM, Berend Hasselman wrote:

On 9 Feb 2017, at 16:00, Göran Broström  wrote:

In my package 'glmmML' I'm using old C code and linpack in the optimizing 
procedure. Specifically, one part of the code looks like this:

F77_CALL(dpoco)(*hessian, &bdim, &bdim, &rcond, work, info);
if (*info == 0){
F77_CALL(dpodi)(*hessian, &bdim, &bdim, det, &job);


This usually works OK, but with an ill-conditioned data set (from a user of 
glmmML) it happened that the hessian was all NaN. However, dpoco returned *info 
= 0 (no error!) and then the call to dpodi hung R!

I googled for C and nan and found a work-around: Change 'if ...' to

   if (*info == 0 & (hessian[0][0] == hessian[0][0])){

which works as a test of hessian[0][0] (not) being NaN.
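
The self-comparison idiom works because, under IEEE 754, NaN is the only value that compares unequal to itself. A small sketch (the function names are made up for this example) shows the idiom and extends it to report where the first NaN sits in a flat array, rather than testing only element [0][0]. Note that it does not catch infinities, and that aggressive floating point options such as -ffast-math may optimize the comparison away.

```c
#include <stddef.h>

/* x != x is true only when x is NaN (IEEE 754 semantics). */
int is_nan_selfcmp(double x)
{
    return x != x;
}

/* Index of the first NaN in x[0..n-1], or -1 if there is none. */
long first_nan_index(const double *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (is_nan_selfcmp(x[i]))
            return (long)i;
    return -1;
}
```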

I'm using the .C interface for calling C code.

Any thoughts on how to best handle the situation? Is this a bug in dpoco? Is 
there a simple way to test for any NaNs in a vector?

You should/could use macro R_FINITE to test each entry of the hessian.
In package nleqslv I test for a "correct" jacobian like this in file nleqslv.c 
in function fcnjac:

 for (j = 0; j < *n; j++)
     for (i = 0; i < *n; i++) {
         if( !R_FINITE(REAL(sexp_fjac)[(*n)*j + i]) )
             error("non-finite value(s) returned by jacobian (row=%d,col=%d)",i+1,j+1);
         rjac[(*ldr)*j + i] = REAL(sexp_fjac)[(*n)*j + i];
     }

There may be a more compact way with a macro in the R headers.
I feel that if other code can't handle non-finite values, then test for them.

Berend Hasselman

And if performance was of importance, you could use the trick from 
mayHaveNaNOrInf in array.c (originally from pqR), but be careful to test 
the individual operands of the sum.
mayHaveNaNOrInf does not test the operands, for performance reasons, and as a 
result it can have false positives.


Rboolean hasNaNOrInf(double *x, R_xlen_t n)
{
    if ((n&1) != 0 && !R_FINITE(x[0]))
        return TRUE;
    for (R_xlen_t i = n&1; i < n; i += 2)
        /* the sum is a cheap filter; the operands give the exact answer */
        if (!R_FINITE(x[i]+x[i+1]) && (!R_FINITE(x[i]) || !R_FINITE(x[i+1])))
            return TRUE;
    return FALSE;
}
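
To illustrate why the operands need to be tested, here is a standalone sketch with the C99 isfinite macro standing in for R_FINITE (both function names are made up for this example). The sum of two finite values can overflow to Inf, so a check on the sum alone reports a false positive that the operand test removes.

```c
#include <math.h>
#include <stddef.h>

/* Sum-only filter: fast, but may report non-finite values that are not there. */
int may_have_nan_or_inf(const double *x, size_t n)
{
    size_t start = n & 1;
    if (start && !isfinite(x[0]))
        return 1;
    for (size_t i = start; i < n; i += 2)
        if (!isfinite(x[i] + x[i + 1]))
            return 1;
    return 0;
}

/* Exact variant: a suspicious sum is confirmed by testing both operands. */
int has_nan_or_inf(const double *x, size_t n)
{
    size_t start = n & 1;
    if (start && !isfinite(x[0]))
        return 1;
    for (size_t i = start; i < n; i += 2)
        if (!isfinite(x[i] + x[i + 1]) &&
            (!isfinite(x[i]) || !isfinite(x[i + 1])))
            return 1;
    return 0;
}
```

For the pair {1e308, 1e308} the sum overflows, so the first function reports a problem while the second correctly reports none.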

Tomas



Re: [Rd] Debugging tools and practices in Windows?

2017-02-23 Thread Tomas Kalibera

The R for Windows FAQ suggests "make DEBUG=T" and has some more hints
https://cran.r-project.org/bin/windows/base/rw-FAQ.html

Tomas


On 02/23/2017 08:10 PM, Javier Luraschi wrote:

Right, I'm talking about C code.

Do you remember if you had to set specific CFLAGS or other settings to get
gdb working? I wasn't able to get gdb working with the standard build
settings.

Otherwise, Rprintf() would work for sure.

Thank you!



On Thu, Feb 23, 2017 at 10:55 AM, Duncan Murdoch 
wrote:


On 23/02/2017 1:36 PM, Javier Luraschi wrote:


Hello r-devel, could someone share the tools and practices they use to
debug the core R sources in Windows?

For instance, I would like to set a breakpoint in `gl_loadhistory` and
troubleshoot from there.


You're talking about debugging the C code rather than the R code, I think.

These days I mostly avoid debugging in Windows, but when I have to do it,
I use gdb.  There used to be a front end for it (Insight) that worked in
Windows, but I don't think it works with our current gdb build.  Google
names lots of other front ends, but I haven't tried any of them in Windows.

The other choice is the old fashioned method:  add lots of Rprintf()
statements to the source and recompile.

Duncan Murdoch





Re: [Rd] Error: memory exhausted (limit reached?)

2017-03-15 Thread Tomas Kalibera

Hi Guillaume,

the error "C stack usage is too close to the limit" is usually caused by 
infinite recursion.
It can also be caused by an external library that corrupts the C stack 
(such as Java on Linux, e.g. when using rJava).


I cannot repeat the problem on my machine.

To rule out the second option, you can try in a fresh R session without 
loading any packages (check with sessionInfo).
To diagnose the first option one would need to know if and where the 
infinite recursion happens, which can be found with gdb on a machine 
where the problem can be repeated.
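
For background, a "C stack usage" check of this kind can be approximated by comparing the address of a local variable in the current frame with an address recorded near program start. The sketch below is a hypothetical illustration of that idea, not R's actual implementation; all names are made up, and the measured values are only approximate because compilers are free to lay out frames as they like.

```c
#include <stddef.h>
#include <stdint.h>

static uintptr_t stack_base;   /* address of a local in a shallow frame */

/* Call once, early, from a shallow frame. */
void record_stack_base(void)
{
    volatile char marker = 0;
    stack_base = (uintptr_t)&marker;
}

/* Approximate number of bytes of C stack in use relative to the base. */
size_t stack_usage(void)
{
    volatile char marker = 0;
    uintptr_t here = (uintptr_t)&marker;
    return here > stack_base ? (size_t)(here - stack_base)
                             : (size_t)(stack_base - here);
}

/* Recurse `depth` levels and report the usage seen in the deepest frame. */
size_t usage_at_depth(unsigned depth)
{
    volatile char pad[128];              /* give each frame a visible size */
    pad[0] = (char)depth;
    size_t used = depth ? usage_at_depth(depth - 1) : stack_usage();
    /* pad is read after the call, so this cannot become a tail call */
    return used + (size_t)(pad[0] != (char)depth);
}
```

A runtime can compare such a usage figure with the limit reported by the operating system (8192 kB in the ulimit output quoted below) and raise an error before the stack actually overflows. If a library changes the usable stack behind the runtime's back, that comparison is no longer reliable.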


Best
Tomas


On 03/15/2017 12:01 PM, Guillaume MULLER wrote:

Hi,

I first posted this message on r-help, but got redirected here.

I encounter a strange memory error, and I'd like some help to determine if I'm 
doing something wrong or if there's a bug in recent R versions...

I'm currently working on a DeepNet project @home, with an old PC with 4Gb RAM, 
running Ubuntu 16.04.

For efficiency reasons, I preprocessed my dataset and stored it as a csv file 
with write.csv() so that I can reload it at will with read.csv(). I did it 
several times, and everything was working fine.

A few days ago, I tried to pursue my work on another machine @work. I wanted to use 
a more recent & powerful machine with 8Gb of RAM running under Ubuntu 16.10, 
but I ran into a strange error:

$ R
16:05:12 R > trainSet <- read.csv("trainSetWhole.csv")
Error: memory exhausted (limit reached?)
Error: C stack usage  7970548 is too close to the limit


I read a few fora on the Internet and found a potential workaround, consisting 
in increasing the stack size using ulimit. Unfortunately, it doesn't work for 
me:

$ ulimit -s
8192
$ ulimit -s $((100*$(ulimit -s)))
$ R --vanilla
16:05:12 R > trainSet <- read.csv("trainSetWhole.csv")
Error: memory exhausted (limit reached?)


This was under Ubuntu 16.10 with R version 3.3.1 (2016-06-21) "Bug in Your Hair"

Yesterday, I upgraded my Ubuntu to 17.04 (R version 3.3.2 (2016-10-31) "Sincere 
Pumpkin Patch") and tried again. This resulted in the exact same error.

How is it possible that a 513MB file cannot be read on a machine with 8GB RAM?
Also, how is it possible that a machine with twice the RAM as the previous one 
cannot load the same file?

Since the only other difference is the Ubuntu version (thus R version), I 
assume there's a bug in R/csv loader, but I don't know where to look for...

If anyone has an idea I would be glad to hear it...


GM


For the sake of completeness, I share my Trainset at the following link:

https://mega.nz/#!ZMs0TSRJ!47DCZCnE6_FnICUp8MVS2R9eY_GdVIyGZ5O9TiejHfc

FYI, it loads perfectly on 2 machines with Ubuntu 16.04 and R version 3.2.3 (2015-12-10) -- "Wooden 
Christmas-Tree". But it exceeds stack memory under Ubuntu 16.10's R version 3.3.1 (2016-06-21) -- "Bug 
in Your Hair" and Ubuntu 17.04's R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch".



Re: [Rd] failure of make check-all

2017-04-07 Thread Tomas Kalibera


The warning that dummy_ii returns the address of a local variable is benign; 
this function is intended to do so.

I've silenced it in R-devel.

Tomas


On 04/06/2017 03:51 PM, Therneau, Terry M., Ph.D. wrote:

Peter,
  Retry how much of it?  That is, where does it go in the sequence 
from svn up to make check?  I'll update my notes so as to do it 
correctly.


In any case, I put it first and reran the whole command chain.  I had 
recently upgraded linux from 14.xx LTS to 16.04 LTS so it makes sense 
to start over.  This removed the large diff in base-Ex.Rout from the 
first part of the log, but the terminal error still remains:


make[3]: Leaving directory '/usr/local/src/R-devel/tests'
make[3]: Entering directory '/usr/local/src/R-devel/tests'
running code in 'reg-tests-3.R' ... OK
  comparing 'reg-tests-3.Rout' to './reg-tests-3.Rout.save' ... OK
running code in 'reg-examples3.R' ...Makefile.common:98: recipe for 
target 'reg-examples3.Rout' failed

make[3]: *** [reg-examples3.Rout] Error 1
make[3]: Leaving directory '/usr/local/src/R-devel/tests'
Makefile.common:273: recipe for target 'test-Reg' failed

Here are lines 97-100 of tests/Makefile.common:

.R.Rout:
@rm -f $@ $@.fail $@.log
@$(ECHO) $(ECHO_N) "running code in '$<' ...$(ECHO_C)" > $@.log
@$(R) < $< > $@.fail 2>&1 || (cat $@.log && rm $@.log && exit 1)


There is one compiler warning message, it prints in pink so as not to 
miss it!


main.c: In function ‘dummy_ii’:
main.c:1669:12: warning: function returns address of local variable [-Wreturn-local-addr]
 return (uintptr_t) &ii;

-

So as to be more complete I did "cd tests; R" and 
source("reg-examples3.R"), and lo and behold the error is



source('reg-examples3.R')

Loading required package: MASS
Loading required package: survival
Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
  object 'Ccoxmart2' not found

Looking at src/coxmart2.c and src/init.c I don't see anything 
different than the other dozen .C routines in my survival package.  
The file tests/book7.R in the package exercises this routine, and CMD 
check passes.


Hints?

Terry T.






On 04/06/2017 07:52 AM, peter dalgaard wrote:
You may want to retry that after a make distclean, in case anything 
changed in the toolchain.


-pd




Re: [Rd] Crash after (wrongly) applying product operator on S4 object that derives from list

2017-04-19 Thread Tomas Kalibera


We're working on a workaround for the JVM issue, it should be available 
in rJava soon.
(the JVM issue is only on Linux and it turns infinite/deep recursion 
into a crash of R; it also effectively reduces the R stack size)


Best
Tomas

On 04/19/2017 02:56 PM, Michael Lawrence wrote:

I think this is a known issue with Java messing with the stack, see
e.g. 
http://r.789695.n4.nabble.com/Error-memory-exhausted-limit-reached-td4729708.html.

I'll fix the infinite recursion caused by the methods package.

Michael

On Wed, Apr 19, 2017 at 1:12 AM, Wolfgang Huber  wrote:

Dear Hilmar

Perhaps this gives an indication of why the infinite recursion happens:

## after calling `*` on ma and a matrix:


showMethods(classes=class(ma), includeDefs=TRUE, inherited = TRUE)


Function: * (package base)
e1="FOOCLASS", e2="matrix"
 (inherited from: e1="vector", e2="structure")
 (definition from function "Ops")
function (e1, e2)
{
 value <- callGeneric(e1, e2@.Data)
 if (length(value) == length(e2)) {
 e2@.Data <- value
 e2
 }
 else value
}




is(ma, "vector")

[1] TRUE

I got that in a fresh session of

sessionInfo()

R Under development (unstable) (2017-04-18 r72542)
Platform: x86_64-apple-darwin16.5.0 (64-bit)
Running under: macOS Sierra 10.12.4

Best wishes
Wolfgang

19.4.17 10:01, Hilmar Berger scripsit:

Hi,

following up on my own question, I found a smaller example that does not
require LIMMA:

setClass("FOOCLASS",
  representation("list")
)
ma = new("FOOCLASS", list(M=matrix(rnorm(300), 30,10)))


ma * ma$M

Error: C stack usage  7970512 is too close to the limit


library(xlsx)

Loading required package: rJava
Loading required package: xlsxjars

ma * ma$M

---> Crash

xlsx seems to act like a catalyst here, with the product operator
running in a deeply nested recursion, exhausting the stack. Valgrind shows
thousands of invalid stack accesses when loading xlsx, which might
contribute to the problem. Package xlsx has not been updated since 2014,
so it might fail with more current versions of R or Java (I'm using
Oracle Java 8).

Still, even if xlsx was the package to be blamed for the crash, I fail
to understand what exactly the product operator is trying to do in the
multiplication of the matrix with the object.

Best regards,
Hilmar

On 18/04/17 18:57, Hilmar Berger wrote:

Hi,

this is a problem that occurs in the presence of two libraries (limma,
xlsx) and leads to a crash of R. The problematic code is the wrong
application of sweep or the product ("*") function on an LIMMA MAList
object. To my knowledge, limma does not define a "*" method for MAList
objects.

If only LIMMA is loaded but not package xlsx, the code does not crash
but rather produces an error ("Error: C stack usage  7970512 is too
close to the limit"). Loading only package rJava instead of xlsx does
also not produce the crash but the error message instead. Note that
xlsx functions are not explicitly used.

It could be reproduced on two different Linux machines running
R-3.2.5, R-3.3.0 and R-3.3.2.

Code to reproduce the problem:
-
library(limma)
library(xlsx)

# a MAList
ma = new("MAList", list(A=matrix(rnorm(300), 30,10),
M=matrix(rnorm(300), 30,10)))

# This should actually be sweep(ma$M, ...) for functional code, but I
# omitted the $M...
#sweep(ma, 2, c(1:10), "*")
# sweep will crash when doing the final operation of applying the
# function over the input matrix, which in this case is function "*"

f = match.fun("*")
# This is not exactly the same as in sweep but it also tries to
# multiply the MAList object with a matrix of same size and leads to the
# crash
f(ma, ma$M)
# ma * ma$M has the same effect
-

My output:

R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

   Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.


library(limma)
library(xlsx)

Loading required package: rJava
Loading required package: xlsxjars

sessionInfo()

R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
  [1] LC_CTYPE=en_US.UTF-8  LC_NUMERIC=C LC_TIME=en_US.UTF-8
  [4] LC_COLLATE=en_US.UTF-8LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8  LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
[10] LC_TELEPHONE=en_US.UTF-8  LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8

a

Re: [Rd] Crash after (wrongly) applying product operator on object from LIMMA package

2017-04-24 Thread Tomas Kalibera


Yes, the mentioned pull request for rJava is to work around a JVM problem 
that is Linux-only.

I am not aware of any related problem on OSX.

In general, you can currently get two kinds of messages with infinite 
recursion:


Error: evaluation nested too deeply: infinite recursion / 
options(expressions=)?

Error: C stack usage XXX is too close to the limit

but in some situations you could get a segfault, an obvious one is 
infinite recursion in native code of a package.
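
The first kind of message comes from a nesting counter rather than from the C stack itself. As a hypothetical sketch of that mechanism (not R's actual evaluator; MAX_DEPTH, eval_depth and guarded_recurse are invented names, and 5000 is just an illustrative limit), an interpreter can count how deeply it is nested and refuse to go further, which turns runaway recursion into an ordinary error. Native code that recurses without passing through such a guard gets no protection, which is how a segfault can still occur.

```c
/* Illustrative nesting limit for a hypothetical evaluator. */
#define MAX_DEPTH 5000

static int eval_depth = 0;

/* Recurse `remaining` more levels; returns 0 on success,
 * -1 if the nesting limit was reached first. */
int guarded_recurse(int remaining)
{
    if (eval_depth >= MAX_DEPTH)
        return -1;                 /* "evaluation nested too deeply" */
    eval_depth++;
    int rc = remaining > 0 ? guarded_recurse(remaining - 1) : 0;
    eval_depth--;                  /* restore the counter on the way out */
    return rc;
}
```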


If you're getting a segfault it is probably best to check with a 
debugger - infinite recursion would be easily identified in the stacktrace.


Best
Tomas


On 04/24/2017 11:58 AM, Hilmar Berger wrote:

Hi January,

I believe the root of the xlsx issue has been identified and a fix
suggested by Tomas Kalibera (see https://github.com/s-u/rJava/pull/102).
In a nutshell, Oracle Java on Linux modifies the stack in a way that
makes it smaller and at the same time makes it impossible for R to
detect this change, leading to segfaults. It is not clear to me that the
same problem would occur on Mac, since the behavior of Oracle Java seems to
be Linux specific. Possibly even Linux users on OpenJDK might not
encounter any problems (not tested).

So possibly the next release of rJava should also fix the xlsx problems
with other packages.

Best regards,
Hilmar


On 24/04/17 11:46, January W. wrote:

Hi Hilmar,

weird. The memory problem seems to be due to recursion (my R, version
3.3.3, says: Error: evaluation nested too deeply: infinite recursion /
options(expressions=)?, just write traceback() to see how it happens),
but why does it segfault with xlsx? Nb xlsx is the culprit: neither
rJava nor xlsxjars cause the problem.

On the other hand, quick googling for r+xlsx+segfault returns tons of
reports of how xlsx crashes in dozens of situations. See for example
http://r.789695.n4.nabble.com/segfault-in-gplots-heatmap-2-td4641808.html.
Also, the problem might be platform-specific. It would be interesting
to see whether anyone with a Mac can reproduce it.

kind regards,

j.






Re: [Rd] tempdir() may be deleted during long-running R session

2017-04-26 Thread Tomas Kalibera


I agree this should be solved in configuration of 
systemd/tmpreaper/whatever tmp cleaner - the cleanup must be prevented 
in configuration files of these tools. Moving session directories under 
/var/run (XDG_RUNTIME_DIR) does not seem like a good solution to me, 
sooner or later someone might come with auto-cleaning that directory too.


It might still be useful if R could sometimes detect when automated 
cleanup happened and warn the user. Perhaps a simple way could be to 
always create an empty file inside the session directory, like 
".tmp_cleaner_trap". R would never touch this file, but would check its 
existence from time to time. If it gets deleted, R would issue a warning and 
ask the user to check the tmp cleaner configuration. The idea is that this 
file will be the oldest one in the session directory, so it would get 
cleaned up first.
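
A minimal sketch of the trap-file idea in C, assuming a POSIX system: the file name ".tmp_cleaner_trap" is the one suggested above, while trap_create and trap_present are hypothetical helpers invented for this example and are not part of R.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Create the empty sentinel file <dir>/.tmp_cleaner_trap; 0 on success. */
int trap_create(const char *dir)
{
    char path[4096];
    int len = snprintf(path, sizeof path, "%s/.tmp_cleaner_trap", dir);
    if (len < 0 || (size_t)len >= sizeof path)
        return -1;                       /* path too long */
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    return fclose(f) == 0 ? 0 : -1;
}

/* 1 if the sentinel still exists, 0 if it is gone (a cleaner was here). */
int trap_present(const char *dir)
{
    char path[4096];
    struct stat st;
    int len = snprintf(path, sizeof path, "%s/.tmp_cleaner_trap", dir);
    if (len < 0 || (size_t)len >= sizeof path)
        return 0;
    return stat(path, &st) == 0;
}
```

A session would call trap_create once at startup and poll trap_present occasionally, warning the user when it returns 0.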


Tomas


On 04/26/2017 02:29 PM, Duncan Murdoch wrote:

On 26/04/2017 4:21 AM, Martin Maechler wrote:

  
on Tue, 25 Apr 2017 21:13:59 -0700 writes:


> On Tue, Apr 25, 2017 at 02:41:58PM +, Cook, Malcolm wrote:
>> Might this combination serve the purpose:
>> * R session keeps an open handle on the tempdir it creates,
>> * whatever tempdir harvesting cron job the user has be made sensitive enough not to delete open files (including open directories)


I also agree that the above would be ideal - if possible.

> Good suggestion but doesn't work with the (increasingly popular)
> "Systemd":

> $ mkdir /tmp/somedir
> $ touch -d "12 days ago" /tmp/somedir/
> $ cd /tmp/somedir/
> $ sudo systemd-tmpfiles --clean
> $ ls /tmp/somedir/
> ls: cannot access '/tmp/somedir/': No such file or directory

Something like your example is what I'd expect is always a
possibility on some platforms, all of course depending on low-level
things such as  root/sysadmin/...  "permission" to clean up etc.

Jeroen mentioned the fact that tempdir()s also can disappear
for other reasons {his was multicore child processes
.. buggily(?) implemented}.
Further reasons may be race conditions / user code bugs / user
errors, etc.
Note that the R process which created the tempdir on startup
always has the permission to remove it again.  But you can also
think of a full file system, etc.

Current R-devel's  tempdir(check = TRUE)  would create a new
one or give an error (and then the user should be able to use
Sys.setenv("TEMPDIR" ...)
to a directory she has write-permission )

Gabe's point of course is important too: If you have a long
running process that uses a tempfile,
and if  "big brother"  has removed the full tempdir() you will
be "unhappy" in any case.
Trying to prevent big brother from doing that in all cases seems
"not easy" in any case.

I did want to provide an easy solution to the OP situation:
Suddenly tempdir() is gone, and quite a few things stop working
in the current R process {he mentioned  help(), e.g.}.
With the new   tempdir(check=TRUE)  facility, code could be changed
to replace

   tempfile("foo")

either by
   tempfile("foo", tmpdir=tempdir(check=TRUE))

or by something like

   tryCatch(tempfile("foo"),
            error = function(e)
                tempfile("foo", tmpdir=tempdir(check=TRUE)))

or be even more sophisticated.

We could also consider allowing   check =  TRUE | NA | FALSE

and make  NA  the default and have that correspond to
check =TRUE  but additionally do the equivalent of
   warning("tempdir() has become invalid and been recreated")
in case the tempdir() had been invalid.

> I would advocate just changing 'tempfile()' so that it recreates the
> directory where the file is (the "dirname") before returning the file
> path. This would have fixed the issue I ran into. Changing 'tempdir()'
> to recreate the directory is another option.

In the end I had decided that

  tempfile("foo", tmpdir = tempdir(check = TRUE))

is actually better self-documenting than

  tempfile("foo", checkDir = TRUE)

which was my first inclination.

Note again that currently, the checking is _off_ by default.
I've just provided a tool -- which was relatively easy and
platform independent! --- to do more (real and thought)
experiments.


This seems like the wrong approach.  The problem occurs as soon as the 
tempdir() gets cleaned up:  there could be information in temp files 
that gets lost at that point.  So the solution should be to prevent 
the cleanup, not to continue on after it has occurred (as "check = 
TRUE" does).  This follows the principle that it's better for the 
process to always die than to sometimes silently produce incorrect 
results.


Frederick posted the way to do this in systems using systemd.  We 
should be putting that in place, or the equivalent on systems using 
other tempfile cleanups.  This looks to me like something that "make 
install" should do, or perhaps it should be done by people putting 
together packages for specific systems.


Duncan Murdoch


Re: [Rd] Byte compilation with window<- in R3.4.0

2017-05-02 Thread Tomas Kalibera

Thanks for the report, fixed in R-devel and R-patched.

Best
Tomas

On 04/30/2017 10:48 PM, Christoph Sax wrote:

Hi,

I am running into a problem when I use the window<- replacement function in R
3.4.0. It will lead to an error when it is called inside a loop, probably
the result of the byte compiler now enabled by default.

When I turn it off, it works again, as in older versions of R. I tested on Win,
Linux and Mac, and the problem occurs everywhere.

Here is a reproducible example:

z <- AirPassengers

# this works
window(z, start =  c(1955, 1), end =  c(1955, 1)) <- NA

# but this does not
for (i in 1) {
  window(z, start =  c(1955, 1), end =  c(1955, 1)) <- NA
}
# Error in stats::window(x = `*tmp*`, start = c(1955, 1), end = c(1955,  :
#   object '*tmp*' not found

# turning off the byte compiler makes it work again (as in older R versions)
compiler::enableJIT(0)
for (i in 1) {
  window(z, start =  c(1955, 1), end =  c(1955, 1)) <- NA
}

Any help is very much appreciated.

Thanks,
Christoph

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [Rd] complex tests failure

2017-05-04 Thread Tomas Kalibera


As a quick fix, you can undefine HAVE_CTANH in complex.c, somewhere 
after including config.h

An internal substitute, which is implemented inside complex.c, will be used.

Best
Tomas



On 05/04/2017 02:57 PM, Kasper Daniel Hansen wrote:

For a while I have been finding that the complex tests fail on RHEL 6.
The specific issue has to do with tanh (see below for full output from
complex.Rout.fail).

This is both with the stock compiler (GCC 4.4.7) and a compiler supplied
through the conda project (GCC 4.8.5).  The compiler supplied through conda
ends up linking R to certain system files, so the binary is not completely
independent (although most dynamically linked libraries are coming from the
conda installation).

A search on R-devel reveals a discussion in April on an issue reported on
Windows with a bug in tanh in old versions of the GNU C standard library;
this seems relevant.  The discussion by Martin Maechler suggests "using R's
internal substitute".  So how do I enable this?  Or does this require
updating the C standard library?

** From complex.Rout.fail


stopifnot(identical(tanh(356+0i), 1+0i))

Error: identical(tanh(356 + (0+0i)), 1 + (0+0i)) is not TRUE
In addition: Warning message:
In tanh(356 + (0+0i)) : NaNs produced in function "tanh"
Execution halted

Best,
Kasper



Re: [Rd] complex tests failure

2017-05-04 Thread Tomas Kalibera

There is no way to control this at runtime.
We will probably have to add a configure test.

Best,
Tomas

On 05/04/2017 03:23 PM, Kasper Daniel Hansen wrote:
> Thanks.
>
> I assume there is no way to control this via environment variables or 
> configure settings?  Obviously that would be great for something like 
> this, which affects tests and seems to be a known problem for older C 
> standard libraries.
>
> Best,
> Kasper




Re: [Rd] complex tests failure

2017-05-05 Thread Tomas Kalibera
Thanks for the report, handled in configure in 72661 (R-devel).
I'll also port to R-patched.

Best
Tomas

On 05/04/2017 03:49 PM, Tomas Kalibera wrote:
>
> There is no way to control this at runtime.
> We will probably have to add a configure test.
>
> Best,
> Tomas
>




Re: [Rd] R-3.3.3/R-3.4.0 change in sys.call(sys.parent())

2017-05-10 Thread Tomas Kalibera


The difference in the outputs between 3.3 and 3.4 is in how call 
expressions are selected in presence of .Internals. R is asked for a 
call expression for "eval". In 3.3 one gets the arguments for the call 
expression from the .Internal that implements eval. In 3.4 one gets the 
arguments for the call expression from the closure wrapper of "eval", 
which is less surprising. See e.g.


(3.4)
> evalq()
Error in evalq() : argument is missing, with no default

vs

(3.3)
> evalq()
Error in eval(substitute(expr), envir, enclos) :
  argument is missing, with no default

(and yes, these examples work with sys.call() and lattice originally 
used it in xyplot - perhaps it'd be best to submit a bug report/issue 
for lattice)


Tomas


On 05/09/2017 11:06 PM, William Dunlap via R-devel wrote:

Some formula methods for S3 generic functions use the idiom
 returnValue$call <- sys.call(sys.parent())
to show how to recreate the returned object or to use as a label on a
plot.  It is often followed by
  returnValue$call[[1]] <- quote(myName)
E.g., I see it in packages "latticeExtra" and "leaps", and I suspect it is
used in "lattice" as well.

This idiom has not done good things for quite a while (ever?) but I noticed
while running tests that it acts differently in R-3.4.0 than in R-3.3.3.
Neither the old nor the new behavior is nice.  E.g., in R-3.3.3 we get


parseEval <- function(text, envir) eval(parse(text=text), envir=envir)
parseEval('lattice::xyplot(mpg~hp, data=datasets::mtcars)$call',
          envir=new.env())
xyplot(expr, envir, enclos)

and

evalInEnvir <- function(call, envir) eval(call, envir=envir)
evalInEnvir(quote(lattice::xyplot(mpg~hp, data=datasets::mtcars)$call),
            envir=new.env())
xyplot(expr, envir, enclos)

while in R-3.4.0 we get

parseEval <- function(text, envir) eval(parse(text=text), envir=envir)
parseEval('lattice::xyplot(mpg~hp, data=datasets::mtcars)$call',
          envir=new.env())
xyplot(parse(text = text), envir = envir)

and

evalInEnvir <- function(call, envir) eval(call, envir=envir)
evalInEnvir(quote(lattice::xyplot(mpg~hp, data=datasets::mtcars)$call),
            envir=new.env())
xyplot(call, envir = envir)

Should these packages be fixed up to use just sys.call()?

Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [Rd] Null pointer dereference?

2017-05-19 Thread Tomas Kalibera
Thanks, the tool is indeed right, this is a real error. Although it is 
unlikely to trigger and unlikely to cause new problems (R would fail 
soon anyway if out of memory), it is clearly something to be fixed and 
something to be classified as "true positive".


I've fixed this in a way that is consistent with coding style in that file.
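
For readers who want to see the shape of such a fix, here is a minimal 
self-contained sketch. It is not the change committed to devPS.c: the 
struct definitions are simplified stand-ins (assumptions), and fprintf() 
stands in for R's warning(_("...")). The point is only that the NULL check 
has to come before the first dereference.

```c
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for the real types in devPS.c (assumptions). */
typedef struct { void *KernPairs; } FontMetricInfo;
typedef struct { FontMetricInfo metrics; } Type1FontInfo, *type1fontinfo;

static type1fontinfo makeType1Font(void)
{
    type1fontinfo font = (Type1FontInfo *) malloc(sizeof(Type1FontInfo));
    if (font) {
        /* Initialise KernPairs to NULL so that cleanup code knows
           not to free it if loading the font fails later. */
        font->metrics.KernPairs = NULL;
    } else {
        /* devPS.c reports this through warning(); fprintf is a stand-in. */
        fprintf(stderr, "failed to allocate Type 1 font info\n");
    }
    return font;
}
```

Callers still have to handle a NULL return value, as they did before.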

Best,
Tomas

On 05/19/2017 06:12 PM, Zubin Mevawalla wrote:

I was curious if this was a real null pointer dereference issue in
R-devel/src/library/grDevices/src/devPS.c on line 1009?

1000: static type1fontinfo makeType1Font()
1001: {
1002: type1fontinfo font = (Type1FontInfo *) malloc(sizeof(Type1FontInfo));
1003: /*
1004:  * Initialise font->metrics.KernPairs to NULL
1005:  * so that we know NOT to free it if we fail to
1006:  * load this font and have to
1007:  * bail out and free this type1fontinfo
1008:  */
1009: font->metrics.KernPairs = NULL;
1010: if (!font)
1011: warning(_("failed to allocate Type 1 font info"));
1012: return font;
1013: }

`font` is conceivably null because there is a null check on line 1010,
but is dereferenced on 1009.

CodeAi, an automated repair tool being developed at Qbit logic,
suggested an if-guard as a fix:

@@ -1006,7 +1006,9 @@ static type1fontinfo makeType1Font()
   * load this font and have to
   * bail out and free this type1fontinfo
   */
-font->metrics.KernPairs = NULL;
+if(font) {
+font->metrics.KernPairs = NULL;
+}
  if (!font)
 warning(_("failed to allocate Type 1 font info"));
  return font;

Could I submit this as a patch if it looks alright?

Thanks so much,

Zubin



Re: [Rd] Inconsistency in handling of numeric input with %d by sprintf

2017-05-25 Thread Tomas Kalibera

Thanks, fixed in 72737 (R-devel).

Best
Tomas

On 05/23/2017 06:32 PM, Evan Cortens wrote:

Yes, what Joris posts about is exactly what I noted in my March 9th post to
R-devel. The behaviour is sort of documented, but not in the clearest
manner (in my opinion). Like I say, my ultimate conclusion was that the
silent coercion of numerics to integers by sprintf() was a handy
convenience, but not one that should be relied upon to always work
predictably.

On Tue, May 23, 2017 at 10:02 AM, Michael Chirico 
wrote:
https://github.com/Rdatatable/data.table/issues/2171

The fix was easy, it's just surprising to see the behavior change almost
on a whim. Just wanted to point it out in case this is unknown behavior,
but Evan seems to have found this as well.

On Tue, May 23, 2017 at 12:00 PM, Michael Chirico <
michaelchiri...@gmail.com> wrote:


Astute observation. And of course we should be passing integer when we
use %d. It's an edge case in how we printed ITime objects in data.table:


On Tue, May 23, 2017 at 11:53 AM, Joris Meys  wrote:


I initially thought this is "documented behaviour". ?sprintf says:

Numeric variables with __exactly integer__ values will be coerced to
integer. (emphasis mine).

Turns out this only works when the first value is numeric and not NA, as
shown by the following example:


sprintf("%d", as.numeric(c(NA,1)))

Error in sprintf("%d", as.numeric(c(NA, 1))) :
   invalid format '%d'; use format %f, %e, %g or %a for numeric objects

sprintf("%d", as.numeric(c(1,NA)))

[1] "1"  "NA"

So the safest thing is indeed passing the right type, but the behaviour
is indeed confusing. I checked this on both Windows and Debian, and on both
systems I get the exact same response.

Cheers
Joris

On Tue, May 23, 2017 at 4:53 PM, Evan Cortens 
wrote:


Hi Michael,

I posted something on this topic to R-devel several weeks ago, but never
got a response. My ultimate conclusion is that sprintf() isn't super
consistent in how it handles coercion: sometimes it'll coerce real to
integer without complaint, other times it won't. (My particular email had
to do with vectors longer than 1 and their positioning vis-a-vis the
format string.) The safest thing is just to pass the right type. In this
case, sprintf('%d', as.integer(NA_real_)) works.

Best,

Evan

On Fri, May 19, 2017 at 9:23 AM, Michael Chirico <
michaelchiri...@gmail.com>
wrote:


Consider

#as.numeric for emphasis
sprintf('%d', as.numeric(1))
# [1] "1"

vs.

sprintf('%d', NA_real_)


Error in sprintf("%d", NA_real_) :
  invalid format '%d'; use format %f, %e, %g or %a for numeric objects

I understand the error is correct, but if it works for other numeric input,
why doesn't R just coerce NA_real_ to NA_integer_?

Michael Chirico





--
Evan Cortens, PhD
Institutional Analyst - Office of Institutional Analysis
Mount Royal University
403-440-6529





--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79 <+32%209%20264%2061%2079>
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php






Re: [Rd] Bug in R 3.4.0: R CMD Sweave return value is 1 on success (instead of 0)

2017-05-26 Thread Tomas Kalibera

This bug has been fixed in R-devel 72612 and R-patched 72614.

Best
Tomas

On 05/26/2017 02:59 PM, Roman Kiselev wrote:

Dear all,

after an update from R 3.3.x to R 3.4.0 I cannot build Sweave documents
using make, because make checks the exit code of the `R CMD Sweave` and
stops if it is not zero. I believe that this is a bug and that the
return code should be 0 on success and any other value in case of error.

Regards
Roman Kiselev





Re: [Rd] Interpreting R memory profiling statistics from Rprof() and gc()

2017-05-29 Thread Tomas Kalibera

On 05/18/2017 06:54 PM, Joy wrote:

Sorry, this might be a really basic question, but I'm trying to interpret
the results from memory profiling, and I have a few questions (marked by
*Q#*).

 From the summaryRprof() documentation, it seems that the four columns of
statistics that are reported when setting memory.profiling=TRUE are
- vector memory in small blocks on the R heap
- vector memory in large blocks (from malloc)
- memory in nodes on the R heap
- number of calls to the internal function duplicate in the time interval
(*Q1:* Are the units of the first 3 stats in bytes?)

In Rprof.out, vector memory in small and large blocks is given in 8-byte 
units (for historical reasons), but memory in nodes is given in bytes - 
this is not documented or guaranteed. In 
summaryRprof(memory="both"), memory usage is given in megabytes as 
documented.
For summaryRprof(memory="stats") and summaryRprof(memory="tseries") I 
clarified this in r72743: memory usage is now in bytes and it is documented.


and from the gc() documentation, the two rows represent
- ‘"Ncells"’ (_cons cells_), usually 28 bytes each on 32-bit systems and 56
bytes on 64-bit systems,
- ‘"Vcells"’ (_vector cells_, 8 bytes each)
(*Q2:* how are Ncells and Vcells related to small heap/large heap/memory in
nodes?)

Ncells describe memory in nodes (Ncells is the number of nodes).

Vcells describe memory in "small heap" + "large heap". A Vcell today 
does not have much meaning, it is shown for historical reasons, but the 
interesting thing is that Vcells*8 gives the number of bytes in 
"small heap"+"large heap" objects. Similarly, Ncells*56 (or Ncells*28 on 
32-bit systems) gives the number of bytes in nodes.
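
As a worked form of the arithmetic, using the cell sizes quoted from the 
gc() documentation above (usually 56 or 28 bytes per Ncell, 8 bytes per 
Vcell), the conversion to bytes can be sketched as below. The function 
names are made up for illustration, and the node sizes are the "usual" 
values, not guaranteed ones.

```c
#include <stddef.h>

/* Bytes held in nodes: usually 56 bytes per cons cell on 64-bit
   systems and 28 bytes on 32-bit systems (per ?gc). */
size_t ncells_to_bytes(size_t ncells, int is_64bit)
{
    return ncells * (is_64bit ? 56u : 28u);
}

/* Bytes held in "small heap" + "large heap" vectors:
   each Vcell is 8 bytes. */
size_t vcells_to_bytes(size_t vcells)
{
    return vcells * 8u;
}
```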



And I guess the question that led to these other questions is - *Q3:* I'd
like to plot out the total amount of memory used over time, and I don't
think Rprofmem() gives me what I'd like to know because, as I'm
understanding it, Rprofmem() records the amount of memory allocated with
each call, but this doesn't tell me the total amount of memory R is using,
or am I mistaken?

Rprof controls a sampling profiler which regularly asks the GC how much 
memory is currently in use on the R heap (but beware, some of 
that memory is no longer reachable but has not yet been collected - 
running gc more frequently helps, and some of the memory may still be 
reachable but will not be used anymore). You can get this data by 
summaryRprof(memory="tseries") and plot it - add columns 1+2 or 1+2+3 
depending on what you want. In 72743 or more recent the values are in 
bytes; in older versions you need to multiply columns 1 and 2 by 8. To 
run the GC more frequently you can use gctorture.
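
The column arithmetic described above can be written down as a small 
sketch. The struct and function names are hypothetical; the only facts 
taken from the text are that columns 1 and 2 are in 8-byte units before 
r72743 and that column 3 (nodes) is in bytes.

```c
/* One sample (row) of the memory time series, columns 1 to 3. */
typedef struct {
    double vsize_small;  /* vector memory in small blocks */
    double vsize_large;  /* vector memory in large blocks */
    double nodes;        /* memory in nodes, in bytes */
} MemSample;

/* Total bytes for one sample (columns 1+2+3).
   legacy_units != 0 means columns 1 and 2 are in 8-byte units,
   as in versions before r72743. */
double sample_total_bytes(MemSample s, int legacy_units)
{
    double scale = legacy_units ? 8.0 : 1.0;
    return (s.vsize_small + s.vsize_large) * scale + s.nodes;
}
```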


Or if you are happy modifying your own R code and you don't insist on 
querying the memory size very frequently, you can also explicitly call 
gc(verbose=T) repeatedly. For this you won't need to use the profiler.


If you were looking instead at how much memory the whole R instance was 
using (that is, including memory allocated by the R gc but not presently 
used for R objects, including memory outside R heap), the easiest way 
would be to use facilities of your OS.


Rprofmem is a different thing and won't help you.

Best
Tomas



Thanks in advance!

Joy


Re: [Rd] translateChar in NewName in bind.c

2017-06-13 Thread Tomas Kalibera

Thanks, fixed in R-devel.
Best
Tomas

On 06/11/2017 02:30 PM, Suharto Anggono Suharto Anggono via R-devel wrote:

I see another thing in function 'NewName' in bind.c. In
else if (*CHAR(tag)) ,
'ans' is basically copied from 'tag'. Could the whole thing there be just the 
following?
ans = tag;
It seems to me that it can also replace
ans = R_BlankString;
in 'else'; so,
else if (*CHAR(tag))
and
else
can be merged to be just
else .




  Subject: translateChar in NewName in bind.c
  To: r-devel@r-project.org
  Date: Saturday, 10 June, 2017, 9:14 PM
  
  In function 'NewName' in bind.c (https://svn.r-project.org/R/trunk/src/main/bind.c), in

  else if (*CHAR(base)) ,
  'translateChar' is used. Should it be
  'translateCharUTF8' instead? The end result is marked as
  UTF-8:
  mkCharCE(cbuf, CE_UTF8)
  Other cases already use
  'translateCharUTF8'.



Re: [Rd] [Bug Fix] Default values not applied to ... arguments

2017-07-06 Thread Tomas Kalibera

Thanks for the report, I've fixed 15199 in the AST interpreter in 72664, 
I will fix it in the byte-code interpreter as well.

If you ever needed to disable the JIT, there is API for that, see 
?compiler. Note though that by disabling the JIT you won't disable the 
byte-code interpreter, because code also gets compiled when packages are 
installed or when the compiler is invoked explicitly.

Best,
Tomas

On 07/06/2017 04:40 PM, Sahil Kang wrote:
> Hi Duncan, Martin
>
> Here's a small patch that fixes bug 15199 reported at:
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15199
>
> I was able to reproduce the bug as Duncan had outlined just fine, but 
> I did notice that when we debug(f), the problem went away.
> I later realized that f(1,,3) behaved correctly the first time it was 
> executed, but misbehaved as documented on subsequent calls.
> This narrowed the problem down to the byte-compilation that occurs on 
> subsequent function calls.
>
> This patch prevents byte-compilation on closure objects.
> Although it's a less than ideal solution, this patch fixes the bug at 
> least until the underlying byte-compilation issue can be found (I'm 
> currently scrutinizing the promiseArgs function at eval.c:2771).
>
> Thanks,
> Sahil
>
>





Re: [Rd] symantec treating R-devel r73003 for Windows as malware

2017-08-02 Thread Tomas Kalibera
Hi Petr,

I can repeat this behavior (or similar) with Norton Security:

R-devel-win.exe is reported by Download Insight
R-3.4.1patched-win.exe is reported by Download Insight
R-3.4.1-win.exe is fine

And the report is that the binary is very new (released less than 1 week 
ago) and has very few users (less than 5 in "Norton Community"), which 
is not surprising given that these files are updated frequently. There 
is no other report/warning. With Norton Security, one has the option to 
allow execution of the file and to check "Always allow this file (if Run 
is chosen)".

Best
Tomas

On 08/02/2017 11:46 AM, PIKAL Petr wrote:
> Hallo Martin
>
> The result is the same. The R exe file is downloaded with some note about 
> possible danger. When I try to run it, it is inspected and deleted from its 
> location.
>
> Symantec message is enclosed.
>
> Our IT said it has something to do with installer type, but it is beyond my 
> expertise.
>
> Petr Pikal
>
>
>> -Original Message-
>> From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
>> Sent: Wednesday, August 2, 2017 10:09 AM
>> To: PIKAL Petr 
>> Cc: r-devel@r-project.org
>> Subject: Re: [Rd] symantec treating R-devel r73003 for Windows as malware
>>
>>> PIKAL Petr 
>>>  on Wed, 2 Aug 2017 07:01:55 + writes:
>>  > Dear all
>>  > I am not sure if this is appropriate for the list, but I have recently
>>  > found that Symantec blocks R devel from installation. Enclosed is a copy
>>  > of the Symantec message.
>>
>> Thank you, Petr.   I do think it is quite appropriate for this list.
>>
>>  > Our IT installed it but I wont be probably alone with this problem.
>>
>> We will see...
>>
>>  >> sessionInfo()
>>  > R Under development (unstable) (2017-07-31 r73003)
>>  > Platform: x86_64-w64-mingw32/x64 (64-bit)
>>  > Running under: Windows 10 x64 (build 14393)
>>
>>  [..]
>>
>> I'm appending a (less compressed but notably correctly cropped) version of 
>> the
>> screen shot (also with a slightly more descriptive name) because I needed
>> several minutes before I could correctly view your first attachment
>>
>> What happens if you try "R 3.4.1-patched" for Windows?
>> I think they are built with the same tools... and that one is much more
>> important as closer to release.
>>
>> Thank you (and others) for diagnosing and giving feedback on this.
>>
>> Martin Maechler,
>> ETH Zurich
>>
>

Re: [Rd] attributes on symbols

2017-08-11 Thread Tomas Kalibera


Thanks for spotting this issue. The short answer is yes, adding 
attributes to a symbol is a bad idea and will be turned into a runtime 
error soon. Maintainers of packages that add attributes to symbols have 
been notified and some have already fixed their code.


At least in one case the package is not working properly, even in 
isolation, because of the global effect of adding an attribute to a 
symbol. Think about an expression "x - x", adding sign 1 to the "first 
x" and then sign -1 to the "second x" ends up with (both) "x" having 
sign "-1", because it is the same "x". The package would need something 
like a symbol, but passed by value (suggestions below).


By design in R symbols are represented by singleton objects registered 
in a global symbol table. Symbols are passed by reference and are fully 
represented by their name or pointer, so they can be quickly compared by 
pointer comparison and they can be used for bindings (naming variables, 
functions). Symbols thus cannot have attributes attached and must be 
treated as immutable. For this reason also attributes on symbols are not 
preserved on serialization (as Radford pointed out).


In some cases one needs to add an attribute to something similar to a 
symbol, but something passed by value. There are multiple ways to do it 
(credits for suggestions to Peter, Michael and others):


- wrap a symbol into an object passed by value, add an attribute to that 
object; such an object can be a list, an S3 or S4 object, an expression, 
etc; in "x - x", there will be two different wrappers of "x"


- encapsulate a symbol and needed meta-data (what would be in the 
attribute) together into an object passed by value, e.g. into S3/S4 
object or a list; in "x - x", there will again be two different objects 
encapsulating "x"


- store the meta-data (what would be in the attribute) in a user 
environment created by new.env(); the meta-data could be conveniently 
looked up by the symbol and the environment can be hashed for fast 
lookup; from Peter:

attrib <- new.env()
attributes(sym)   becomes  attrib$sym
attr(sym, "foo")  becomes  attrib$sym[["foo"]]

(the last suggestion will not work for the example "x - x", but may work 
for other cases where referential semantics is needed, now within a 
well-defined scope)
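
The suggestions above can be sketched roughly as follows. This is only an illustration, not code from any of the affected packages: the helper signed_symbol(), the class name and the "foo" entry are made up for the example.

```r
# Hypothetical helper: keep a symbol together with its meta-data in a
# list-based S3 object, which is passed by value.
signed_symbol <- function(name, sign) {
  structure(list(symbol = as.name(name), sign = sign),
            class = "signed_symbol")
}

# The two occurrences of "x" in "x - x" get two independent wrappers.
first_x  <- signed_symbol("x",  1)
second_x <- signed_symbol("x", -1)

first_x$sign                                 # 1
second_x$sign                                # -1
identical(first_x$symbol, second_x$symbol)   # TRUE: still the same symbol

# Environment-based variant: meta-data is looked up by the symbol's name,
# and the symbol itself is left untouched.
attrib <- new.env(hash = TRUE)
sym <- as.name("x")
attrib[[as.character(sym)]] <- list(foo = "bar")
attrib[[as.character(sym)]][["foo"]]         # "bar"
attributes(sym)                              # NULL
```

With the wrapper, each occurrence carries its own sign; with the environment, all occurrences of "x" share one entry, which is the referential behaviour described in the parenthetical note.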



Best,
Tomas


On 07/07/2017 03:06 PM, Torsten Hothorn wrote:


Here is a simpler example:


ex <- as.name("a")
attr(ex, "test") <- 1
quote(a)

a
attr(,"test")
[1] 1

Torsten

On Thu, 6 Jul 2017, William Dunlap wrote:

The multcomp package has code in multcomp:::expression2coef that 
attaches the 'coef' attribute to symbols.  Since there is only one 
symbol object in a session with a given name, this means that this 
attaching has a global effect.  Should this be quietly allowed or 
should there be a warning or an error?
E.g.,

str(quote(Education))
# symbol Education
lmod <- stats::lm(Fertility ~ ., data = datasets::swiss)
glmod <- multcomp::glht(lmod, c("Agriculture=0", "Education=0"))
str(quote(Education))
# symbol Education
# - attr(*, "coef")= num 1

Bill Dunlap
TIBCO Software
wdunlap tibco.com



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [Rd] Problem with a regular expression.

2017-08-17 Thread Tomas Kalibera


The problem is in the TRE library, in regcomp, while compiling the 
regular expression.


This is enough to trigger it in R (to do this without a reboot: ulimit -v 
50 ):

> strsplit("", ")")

To repeat in TRE, one can build TRE from sources and run
> ./src/agrep ")" README.md
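
Until a patched R is available, a literal closing parenthesis can be split on without handing the unbalanced pattern ")" to TRE. A minimal sketch, relying only on the documented `fixed` argument of strsplit() and on an escaped pattern; the input vector is made up for illustration:

```r
# Illustrative input (not taken from the original report).
oldterm <- c("A", "B)", "A", "*", "B")

# fixed = TRUE treats ")" as a literal string, so no regular expression
# is compiled at all.
parts_fixed <- strsplit(oldterm, ")", fixed = TRUE)

# Alternatively, escape the parenthesis so that the pattern is a valid
# regular expression.
parts_escaped <- strsplit("a)b", "\\)")

print(parts_fixed[[2]])
print(parts_escaped[[1]])
```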

Tomas


On 08/17/2017 09:45 AM, Moshe Olshansky via R-devel wrote:

I tried this on a Linux (Ubuntu) server invoking R from the command line and 
the result was the same, except that I could kill the R session from another 
terminal window.


   From: Rui Barradas 
  To: Chris Triggs ; "r-devel@r-project.org" 

Cc: Thomas Lumley 
  Sent: Thursday, 17 August 2017, 17:26
  Subject: Re: [Rd] Problem with a regular expression.

Hello,


This seems to be serious.
RGui.exe, fresh session. I've clicked File > New Script and wrote

Oldterm <- c("A", "B", "A", "*", "B")
strsplit(Oldterm, ")" )

Ran each instruction at a time with Ctrl+r and with the strsplit call
the system froze.

Ctrl+Alt+Del didn't work, I had to go for the power switch button.

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252  LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.1


Rui Barradas

On 16-08-2017 23:31, Chris Triggs wrote:

Hi...

I have come upon a problem with a regular expression which causes base-R to 
freeze.  I have reproduced the phenomenon on several machines running R under 
Windows 10, and also under OSX  on different Apple MACs.

The minimal example is:-
Oldterm is a vector of characters, e.g. "A", "B", "A", "*", "B"
The regular expression is ")"

The call which freezes R is
strsplit(Oldterm, ")" )

Thomas - after he had reproduced the problem - suggested that I submit it to 
r-devel.

Best wishes
 Chris Triggs




Re: [Rd] strange behaviour read.table and clipboard

2017-08-17 Thread Tomas Kalibera
Thank you for the report, it is a bug in buffering in R (not specific to 
Windows) and will be fixed.


Best
Tomas

On 08/17/2017 10:37 AM, PIKAL Petr wrote:

Hi


-Original Message-
From: Robert Baer [mailto:rb...@atsu.edu]
Sent: Wednesday, August 16, 2017 3:04 PM
To: PIKAL Petr ; Duncan Murdoch

Cc: r-devel@r-project.org
Subject: Re: [Rd] strange behaviour read.table and clipboard

You said, "put a name in the cell".  Does that mean you forgot a header =
TRUE?

No

for read.delim header=TRUE is default option.

The issue first appears somewhere between R-devel r71964 and r73003.

I cannot narrow this range, as I do not have other versions from this 
period available.

I tested other read.* functions and all seem to work as expected.

The problem occurs **only** when reading from the clipboard. Maybe it is a 
Windows issue, but I cannot see anything weird when copying e.g. from Excel 
to Notepad.

Cheers
Petr




On 8/16/2017 1:25 AM, PIKAL Petr wrote:

Hi Duncan

The simplest spreadsheet is:

Put a name in the cell, let say "a1"
Put number e.g. 1 below "a1"
Copy the number to enough rows
Select this column and press ctrl-c

result is


temp<- read.delim("clipboard")
str(temp)

'data.frame':   1513 obs. of  1 variable:
   $ a1: Factor w/ 2 levels "1","a1": 1 1 1 1 1 1 1 1 1 1 ...

which(temp$a1=="a1")

[1] 1365
I tested it in vanilla R


sessionInfo()

R Under development (unstable) (2017-07-31 r73003)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

Matrix products: default

locale:
[1] LC_COLLATE=Czech_Czech Republic.1250  LC_CTYPE=Czech_Czech Republic.1250
[3] LC_MONETARY=Czech_Czech Republic.1250 LC_NUMERIC=C
[5] LC_TIME=Czech_Czech Republic.1250

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.0

Excel 16 or 15, I am not sure.

R-devel 2015 (69443) works as expected, so it started, I believe, around 
May or June this year, when I installed a new R version.

I hope this helps to trace the problem. If I can help any further, let me 
know.

Best regards
Petr




-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Wednesday, August 16, 2017 12:35 AM
To: PIKAL Petr ; r-devel@r-project.org
Subject: Re: [Rd] strange behaviour read.table and clipboard

On 15/08/2017 10:03 AM, PIKAL Petr wrote:

Dear all

I used to transfer data from Excel to R by a simple ctrl-c and 
read.delim("clipboard") construction. I know it is bad practice, but it is 
easy and for quick exploratory work it is OK. However, after changing to a 
new R-devel a few days ago I encountered weird behaviour. I tried one or 
two columns.

You haven't posted something that is reproducible.  I don't have Excel, 
but I can cut and paste from Libreoffice, and I don't see this.
However, it's not the same spreadsheet as you used, so I wouldn't be
comfortable saying I did what you did.

Please reduce the size of your spreadsheet if you can, and then post
instructions for how to construct it, and what to cut and paste from it.
Then others can try what you did and see if this is specific to your 
machine, to that particular version of R-devel, to Excel, etc.

Duncan Murdoch



In case of 2 columns, header is repeated after 526 items

mar<-read.delim("clipboard")
which(mar$a2=="a1")

[1]  525 1051 1577

diff(which(mar$a2=="a1"))

[1] 526 526
and only first header item is repeated.

In case of one column, header is repeated after 1107 items


mar<-read.delim("clipboard")
diff(which(mar$a2=="a2"))

[1] 1107 1107

And all items in object are therefore changed to factor.

BTW, readxl package works on same excel file smoothly.

I will try to download the most recent R version to check it, but it could

take

some time due to our IT issues.

Best regards
Petr


version

               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status         Under development (unstable)
major          3
minor          5.0
year           2017
month          07
day            31
svn rev        73003
language       R
version.string R Under development (unstable) (2017-07-31 r73003)
nickname       Unsuffered Consequences




Re: [Rd] Control multi-threading in standard matrix product

2017-08-21 Thread Tomas Kalibera

Hi Ghislain,

I think you might be comparing two versions of R with different BLAS 
implementations, one that is single-threaded (is your 3.3.2 used with 
the reference BLAS?) and one that is multi-threaded (3.4.1 with OpenBLAS). 
Could you check with "perf"? E.g. run your benchmark with "perf record" 
in both cases and you should see the names of the hot BLAS functions, 
which should reveal the BLAS implementation (look for dgemm).


In Ubuntu, if you install R from the package system, whenever you run it 
it will use the BLAS currently installed via the package system. However 
if you build R from source on Ubuntu, by default, it will use the 
reference BLAS which is distributed with R. Section "Linear algebra" of 
"R Installation and Administration" has details on how to build R with 
different BLAS/LAPACK implementations.


Sadly there is no standard way to specify the number of BLAS worker 
threads. RhpcBLASctl has specific code for several existing 
implementations, but R itself does not attempt to control BLAS multi 
threading in any way. It is expected the user/system administrator will 
configure their BLAS implementation of choice to use the number of 
threads they need. A similar problem exists in other internally 
multi-threaded third-party libraries, used by packages - R cannot 
control how many threads they run.
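
As a rough illustration of the points above, the BLAS in use can be inspected from within R, and the thread count can be limited through the RhpcBLASctl package mentioned in this thread. This is a sketch under assumptions: sessionInfo()$BLAS and La_library() are only available in R 3.4.0 and later, RhpcBLASctl may not be installed (hence the guard), and the matrix sizes are arbitrary.

```r
# Which BLAS/LAPACK libraries is this R session linked against?
si <- sessionInfo()
cat("BLAS:  ", si$BLAS, "\n")
cat("LAPACK:", La_library(), "\n")

# Limit the BLAS worker threads, if the optional package is installed.
# Whether this has any effect depends on the BLAS implementation in use.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  RhpcBLASctl::blas_set_num_threads(1)
}

# A small matrix product; sizes chosen only to keep the example quick.
set.seed(1)
A <- matrix(runif(200 * 100), nrow = 200, ncol = 100)
B <- matrix(runif(100 * 50),  nrow = 100, ncol = 50)
C <- A %*% B
dim(C)
```

Implementations such as OpenBLAS can also be configured from outside R, for example through environment variables set before R starts; which mechanism applies depends on the BLAS installed on the system.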


Best
Tomas

On 08/21/2017 02:55 PM, Ghislain Durif wrote:

Dear R Core Team,

I wish to report what can be viewed as a bug or at least a strange
behavior in R-3.4.1. I ask my question here (as recommended on
https://www.r-project.org/bugs.html) since I am not member of the R's
Bugzilla.

When running 'R --vanilla' from the command line, the standard matrix
product is by default based on BLAS and multi-threaded on all cores
available on the machine, c.f. following examples:

n=1
p=1000
q=5000
A = matrix(runif(n*p),nrow=n, ncol=p)
B = matrix(runif(p*q),nrow=p, ncol=q)
C = A %*% B # multi-threaded matrix product


However, the default behavior of using all available cores can be an
issue, especially on shared computing resources or when the matrix
product is used in a parallelized section of code (for instance with
'mclapply' from the 'parallel' package). The default matrix product is
single-threaded in R-3.3.2 (I ran a test on my machine), so this new
behavior will deeply affect existing R packages that use other
multi-threading solutions.

Thanks to this stackoverflow question
(https://stackoverflow.com/questions/45794290/in-r-how-to-control-multi-threading-in-blas-parallel-matrix-product),
I now know that it is possible to control the number of BLAS threads
thanks to the package 'RhpcBLASctl'. However, being able to control the
number of threads should maybe not require to use an additional package.

In addition, the doc 'matmult' does not mention this point, it points to
the 'options' doc page and especially the 'matprod' section, in which
the multi-threading is not discussed.


Here is the results of the 'sessionInfo()' function on my machine for
R-3.4.1:
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=fr_FR.utf8       LC_NUMERIC=C
 [3] LC_TIME=fr_FR.utf8        LC_COLLATE=fr_FR.utf8
 [5] LC_MONETARY=fr_FR.utf8    LC_MESSAGES=fr_FR.utf8
 [7] LC_PAPER=fr_FR.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.1



and for R-3.3.2:
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

locale:
 [1] LC_CTYPE=fr_FR.utf8       LC_NUMERIC=C
 [3] LC_TIME=fr_FR.utf8        LC_COLLATE=fr_FR.utf8
 [5] LC_MONETARY=fr_FR.utf8    LC_MESSAGES=fr_FR.utf8
 [7] LC_PAPER=fr_FR.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


Thanks in advance,
Best regards


Re: [Rd] Problem with a regular expression.

2017-08-22 Thread Tomas Kalibera

Thanks, I patched the TRE sources in R-devel 73107 and closed the PR.
Tomas

On 08/17/2017 11:06 AM, Wollschlaeger, Daniel wrote:

The issue seems related to R bug report 15012:

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15012

As mentioned in the comments there, a pull request to the TRE library has 
recently been made, but I don't know about its status.

Daniel




Re: [Rd] Possible repeat{} / break function bug in R 3.4.1

2017-08-23 Thread Tomas Kalibera

It is a bug in the byte-code compiler. I will fix it.
Tomas

On 08/23/2017 09:22 AM, Lionel Henry wrote:

I don't think that's a bug. source() uses eval(), and eval() creates a
new function-like context frame. In a way expecting `break` to work
inside source() is like expecting `break` to cross stack frames:

 my_break <- function() break
 repeat(my_break())

Lionel



On 23 August 2017, at 09:17, Martin Maechler  wrote:


Martin Maechler 
on Wed, 23 Aug 2017 09:10:20 +0200 writes:
Peter Bosa 
on Tue, 22 Aug 2017 14:39:50 + writes:

Hello, I've noticed the following error using repeat{} / break in R 3.4.1 
running on Windows 10 and Windows Server 2008 (both 64-bit environments).

When running a repeat function, the break command causes an error message if 
the repeat command refers to code within a file, but does not produce an error 
if the code is contained within the repeat{} command.

For example, the following code runs fine:

x <- 1
y <- 5

repeat {
  if(x < y) {
    print("No Break Dance :-(")
    x = x + 1
  } else {
    print("Break Dance!")
    break
  }
}

[1] "No Break Dance :("
[1] "No Break Dance :("
[1] "No Break Dance :("
[1] "No Break Dance :("
[1] "No Break Dance :("
[1] "Break Dance!"
However, if I take the loop contents of the repeat{} function, and save them to 
a file (breakTest.R) that contains the following:

if(x < y) {
  print("No Break Dance :-(")
  x = x + 1
} else {
  print("Break Dance!")
  break
}

And then run the following code:

x <- 1
y <- 5

repeat{
  source("./breakTest.R")
}

I get the following error:

[1] "No Break Dance :("
[1] "No Break Dance :("
[1] "No Break Dance :("
[1] "No Break Dance :("
[1] "No Break Dance :("
[1] "Break Dance!"
Error in eval(ei, envir) : no loop for break/next, jumping to top level
This was not an issue with previous versions of R that I have used, including 
3.3.3.

Any suggestions? Is this a known bug with 3.4.1?

Thank you, Peter!
I can confirm what you are seeing (on Linux) in R version 3.4.0,
3.4.1, and "R devel", and also that this had worked w/o a
problem in earlier versions of R, where I've looked at
R version 3.3.3 and 3.2.5.
I do think this is a bug, but it was not known till now.
For ease of use, I attach the two R files to easily reproduce.
Note I use  writeLines() instead of print() as its output is "nicer".
Best regards,
Martin Maechler, ETH Zurich

Trying again with the two attachments.  Yes, I of all people (!!)
should know that they must have an allowed MIME type; in this
case  text/plain !

Martin

## see ./break-source_R341.R
if(x < y) {
  writeLines("No Break Dance :-(")
  x <- x + 1
} else {
  writeLines("Break Dance!")
  break
}
## From: Peter Bosa 
## To: "R-devel@r-project.org" 
## Subject: [Rd] Possible repeat{} / break function bug in R 3.4.1
## Date: Tue, 22 Aug 2017 14:39:50 +

## Hello, I've noticed the following error using repeat{} / break in R 3.4.1
## running on Windows 10 and Windows Server 2008 (both 64-bit environments).

## When running a repeat function, the break command causes an error message if
## the repeat command refers to code within a file, but does not produce an error
## if the code is contained within the repeat{} command.

## For example, the following code runs fine:

x <- 1
y <- 5
repeat {
  if(x < y) {
    writeLines("No Break Dance :-(")
    x <- x + 1
  } else {
    writeLines("Break Dance!")
    break
  }
}
## No Break Dance :(
## No Break Dance :(
## No Break Dance :(
## No Break Dance :(
## No Break Dance :(
## Break Dance!
## >

## However, if I take the loop contents of the repeat{} function, and save
## them to a file (breakTest.R) that contains the following:
## ^^^
##__SEE THAT FILE__
## if(x < y) {
##   writeLines("No Break Dance :-(")
##   x = x + 1
## } else {
##   writeLines("Break Dance!")
##   break
## }

## And then run the following code:

x <- 1
y <- 5
repeat{
  source("./breakTest.R")
}
cat("successfully finished\n")

## I get the following error:

## No Break Dance :(
## No Break Dance :(
## No Break Dance :(
## No Break Dance :(
## No Break Dance :(
## Break Dance!
## Error in eval(ei, envir) : no loop for break/next, jumping to top level
## 


## This was not an issue with previous versions of R that I have used,
## including 3.3.3.

## MM: It does work in R 3.3.3, indeed
## --  it fails in R 3.4.0 and later


## Any suggestions? Is this a known bug with 3.4.1?

## Cheers-
## Peter


## 
## peter bosa
## metro
## modeling services
## 600 ne grand ave
## portland, or  97232

## peter.b...@oregonmetro.gov

Re: [Rd] Possible repeat{} / break function bug in R 3.4.1

2017-08-23 Thread Tomas Kalibera

Fixed in 73112.

If you need to run this code in unpatched versions of R, you can 
disable the problematic compiler optimization in the loop, for instance 
by adding "eval(NULL)" to the body of the loop. However, please do not 
forget to remove this for future versions of R, and specifically do not 
assume that it would turn off any particular compiler optimization in 
future versions.
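
For illustration, the workaround could look roughly like this. It is a sketch only: the loop body is written to a temporary file instead of ./breakTest.R so that the example is self-contained, and local = TRUE is an addition of this sketch (the original report used the default) so that the sourced code is evaluated in the same environment as the loop even when the example is run inside a function.

```r
# Write the loop body from the report to a temporary file.
body_file <- tempfile(fileext = ".R")
writeLines(c(
  "if (x < y) {",
  "  writeLines(\"No Break Dance :-(\")",
  "  x <- x + 1",
  "} else {",
  "  writeLines(\"Break Dance!\")",
  "  break",
  "}"
), body_file)

x <- 1
y <- 5
repeat {
  eval(NULL)                       # workaround for unpatched R 3.4.0/3.4.1
  source(body_file, local = TRUE)  # local = TRUE: assumption of this sketch
}
cat("successfully finished, x =", x, "\n")

unlink(body_file)
```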


Best
Tomas




On 08/23/2017 09:24 AM, Tomas Kalibera wrote:

It is a bug in the byte-code compiler. I will fix
Tomas

On 08/23/2017 09:22 AM, Lionel Henry wrote:

I don't think that's a bug. source() uses eval(), and eval() creates a
new function-like context frame. In a way expecting `break` to work
inside source() is like expecting `break` to cross stack frames:

 my_break <- function() break
 repeat(my_break())

Lionel


Re: [Rd] Possible repeat{} / break function bug in R 3.4.1

2017-08-23 Thread Tomas Kalibera


return() can be used to set the return value of an expression evaluated 
by eval():


expr <- quote(if (x) return(1) else return(2))
x <- FALSE
eval(expr) #2
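
The difference in reach between return() and break under eval(), as discussed in this thread, can be sketched as follows. The function f and the counter n are made up for the illustration.

```r
# return() inside eval() only ends the evaluation of the expression:
# eval() yields the value and the calling function carries on.
f <- function() {
  v <- eval(quote(return(1)))
  v + 1
}
result <- f()

# break inside eval() reaches the enclosing loop, as observed earlier
# in the thread with repeat(eval(quote(break))).
n <- 0
repeat {
  n <- n + 1
  if (n >= 3) eval(quote(break))
}

result
n
```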

Tomas

On 08/23/2017 09:46 AM, Lionel Henry wrote:

oops, I should have tried it:

 expr <- quote(break)
 repeat(eval(expr))


So eval() has hybrid semantics where `break` has more reach than
return(), weird.

 expr <- quote(return())
 repeat(eval(expr))  # infloop

Lionel



On 23 August 2017, at 09:24, Tomas Kalibera  wrote:

It is a bug in the byte-code compiler. I will fix
Tomas

On 08/23/2017 09:22 AM, Lionel Henry wrote:

I don't think that's a bug. source() uses eval(), and eval() creates a
new function-like context frame. In a way expecting `break` to work
inside source() is like expecting `break` to cross stack frames:

 my_break <- function() break
 repeat(my_break())

Lionel




Re: [Rd] readLines() segfaults on large file & question on how to work around

2017-09-04 Thread Tomas Kalibera

As of R-devel 72925 one gets a proper error message instead of the crash.

Tomas


On 09/04/2017 08:46 AM, rh...@eoos.dds.nl wrote:
Although the problem can apparently be avoided in this case, readLines 
causing a segfault still seems unwanted behaviour to me. I can 
replicate this with the example below (sessionInfo is further down):



# Generate an example file
l <- paste0(sample(c(letters, LETTERS), 1E6, replace = TRUE),
  collapse="")
con <- file("test.txt", "wt")
for (i in seq_len(2500)) {
  writeLines(l, con, sep ="")
}
close(con)


# Causes segfault:
readLines("test.txt")

The error reported by readr is also reproduced (a more 
informative error message and checking for integer overflows would be 
nice). I will report this to readr.


library(readr)
read_file("test.txt")
# Error in read_file_(ds, locale) : negative length vectors are not
# allowed


--
Jan








> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.04

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.7.0
LAPACK: /usr/lib/lapack/liblapack.so.3.7.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=nl_NL.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C


attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

other attached packages:
[1] readr_1.1.1

loaded via a namespace (and not attached):
[1] compiler_3.4.1 R6_2.2.2 hms_0.3 tools_3.4.1 tibble_1.3.3 Rcpp_0.12.12 rlang_0.1.2








On 03-09-17 20:50, Jennifer Lyon wrote:

Jeroen:

Thank you for pointing me to ndjson, which I had not heard of and is
exactly my case.

My experience:
jsonlite::stream_in - segfaults
ndjson::stream_in - my fault, I am running Ubuntu 14.04 and it is too old
   so it won't compile the package
corpus::read_ndjson - works!!! Of course it does a different simplification
  than jsonlite::fromJSON, so I have to change some code, but it works
  beautifully at least in simple tests. The memory-map option may be of
  use in the future.

Another correspondent said that strings in R can only be 2^31-1 long, which
is why any "solution" that tries to load the whole file into R first as a
string, will fail.

Thanks for suggesting a path forward for me!

Jen
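
The 2^31-1 limit mentioned above equals .Machine$integer.max in R. One way to
stay below it is to walk through a file in fixed-size pieces instead of
loading it as a single string. The following is only a minimal sketch under
that assumption: the helper name count_bytes_chunked and the chunk size are
made up for illustration, and the loop merely counts bytes where a real
workflow would parse each chunk.

```r
# Hypothetical helper (not from the thread): read a file in fixed-size raw
# chunks so that no single R object has to hold the whole file.
count_bytes_chunked <- function(path, chunk_size = 1e6) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  total <- 0                         # double, so it can exceed .Machine$integer.max
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break   # end of file reached
    total <- total + length(chunk)
  }
  total
}

# Small self-check on a temporary file of 3000 bytes.
tmp <- tempfile()
writeLines(rep("abc", 1000), tmp, sep = "")
n_bytes <- count_bytes_chunked(tmp, chunk_size = 256)
unlink(tmp)
```

For newline-delimited JSON, the packages named in this thread remain the
more practical route; the sketch only shows why chunking avoids the limit.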

On Sun, Sep 3, 2017 at 2:15 AM, Jeroen Ooms wrote:

On Sat, Sep 2, 2017 at 8:58 PM, Jennifer Lyon wrote:

I have a 2.1GB JSON file. Typically I use readLines() and
jsonlite::fromJSON() to extract data from a JSON file.


If your data consists of one JSON object per line, this is called
'ndjson'. There are several packages specialized to read ndjson files:

  - corpus::read_ndjson
  - ndjson::stream_in
  - jsonlite::stream_in

In particular the 'corpus' package handles large files really well
because it has an option to memory-map the file instead of reading all
of its data into memory.

If the data is too large to read, you can preprocess it using
https://stedolan.github.io/jq/ to extract the fields that you need.

You really don't need hadoop/spark/etc for this.




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





Re: [Rd] strange behaviour read.table and clipboard

2017-09-07 Thread Tomas Kalibera

Fixed in R-devel 73212 (and 73121).

Best
Tomas

On 08/17/2017 11:58 AM, Tomas Kalibera wrote:
Thank you for the report, it is a bug in buffering in R (not specific 
to Windows) and will be fixed.


Best
Tomas

On 08/17/2017 10:37 AM, PIKAL Petr wrote:

Hi


-Original Message-
From: Robert Baer [mailto:rb...@atsu.edu]
Sent: Wednesday, August 16, 2017 3:04 PM
To: PIKAL Petr ; Duncan Murdoch

Cc: r-devel@r-project.org
Subject: Re: [Rd] strange behaviour read.table and clipboard

You said, "put a name in the cell".  Does that mean you forgot a header = TRUE?

No, for read.delim header=TRUE is the default option.

The mentioned issue starts between R-devel r71964 and r73003.

I cannot narrow this range, as I do not have other versions from this 
date range available.


I tested other read.* functions and all seems to work as expected.

The problem is connected **only** with reading from the clipboard. Maybe 
it is a Windows issue, but I cannot see anything weird when 
copying e.g. from Excel to Notepad.


Cheers
Petr




On 8/16/2017 1:25 AM, PIKAL Petr wrote:

Hi Duncan

The simplest spreadsheet is:

Put a name in the cell, let's say "a1"
Put a number, e.g. 1, below "a1"
Copy the number to enough rows
Select this column and press ctrl-c

result is


temp<- read.delim("clipboard")
str(temp)

'data.frame':   1513 obs. of  1 variable:
   $ a1: Factor w/ 2 levels "1","a1": 1 1 1 1 1 1 1 1 1 1 ...

which(temp$a1=="a1")

[1] 1365
I tested it in vanilla R


sessionInfo()

R Under development (unstable) (2017-07-31 r73003)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

Matrix products: default

locale:
[1] LC_COLLATE=Czech_Czech Republic.1250  LC_CTYPE=Czech_Czech Republic.1250
[3] LC_MONETARY=Czech_Czech Republic.1250 LC_NUMERIC=C
[5] LC_TIME=Czech_Czech Republic.1250

attached base packages:
[1] stats graphics  grDevices utils datasets methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.0
Excel 16 or 15, I am not sure.

R-devel 2015 (69443) works as expected, so it started, I believe, around May
or June this year, when I installed a new R version.
I hope this helps to trace the problem. If I can help any further, let me
know.

Best regards
Petr
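
The symptom described above (a stray copy of the header inside the data, and
the column turning into a factor) can be imitated without Excel or the
clipboard. This is only a sketch of the effect, not a reproduction of the
buffering bug itself: the simulated input and the variable names are
assumptions, and stringsAsFactors = TRUE is set explicitly to match the
default behaviour of R at the time.

```r
# Simulated clipboard contents: header "a1", then a column of 1s with the
# header text appearing once more in the middle of the data.
lines <- c("a1", rep("1", 5), "a1", rep("1", 5))
temp <- read.delim(text = paste(lines, collapse = "\n"),
                   stringsAsFactors = TRUE)

is.factor(temp$a1)              # the column is no longer numeric
stray <- which(temp$a1 == "a1") # row index of the stray header

# Dropping the stray rows and converting back recovers the numbers.
clean <- as.numeric(as.character(temp$a1[temp$a1 != "a1"]))
```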




-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Wednesday, August 16, 2017 12:35 AM
To: PIKAL Petr ; r-devel@r-project.org
Subject: Re: [Rd] strange behaviour read.table and clipboard

On 15/08/2017 10:03 AM, PIKAL Petr wrote:

Dear all

I used to transfer data from excel to R by simple ctrl-c and
read.delim("clipboard") construction. I know it is a bad practice but it is easy
and for quick exploratory work it is OK. However, after changing to a new R-devel
a few days ago I encountered weird behaviour. I tried one or two columns.


You haven't posted something that is reproducible.  I don't have Excel, but I
can cut and paste from Libreoffice, and I don't see this.
However, it's not the same spreadsheet as you used, so I wouldn't be
comfortable saying I did what you did.

Please reduce the size of your spreadsheet if you can, and then post
instructions for how to construct it, and what to cut and paste from it.
Then others can try what you did and see if this is specific to your machine,
to that particular version of R-devel, to Excel, etc.

Duncan Murdoch



In case of 2 columns, header is repeated after 526 items

mar<-read.delim("clipboard")
which(mar$a2=="a1")

[1]  525 1051 1577

diff(which(mar$a2=="a1"))

[1] 526 526
and only first header item is repeated.

In case of one column, header is repeated after 1107 items


mar<-read.delim("clipboard")
diff(which(mar$a2=="a2"))

[1] 1107 1107

And all items in object are therefore changed to factor.

BTW, readxl package works on same excel file smoothly.

I will try to download the most recent R version to check it, but it could
take some time due to our IT issues.

Best regards
Petr


version

               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status         Under development (unstable)
major          3
minor          5.0
year           2017
month          07
day            31
svn rev        73003
language       R
version.string R Under development (unstable) (2017-07-31 r73003)
nickname       Unsuffered Consequences




Re: [Rd] Change of variable address due to GC

2017-09-08 Thread Tomas Kalibera


I think you might get a more useful answer if you say what you want to 
achieve.


"address(x)" does not give you an address of variable "x" but of an 
object bound to x. The GC in R is non-moving, it does not relocate 
objects. However, a number of things can happen that will change the 
binding of "x" to point to a (modified) copy of the original object 
(such as replacement operations like x[3] <- 4 or class(x) <- "foo"). 
When these copies are made is implementation dependent and may change 
between R versions or may become unpredictable to the R program. An R 
program should only depend on values stored in an object, not on the 
location of that object, and this is also why "address(x)" is not part 
of the base packages.


Best
Tomas
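
The point about replacement operations can be seen from values alone, without
looking at any address. The sketch below is a minimal illustration with
made-up variable names; it shows only the observable value semantics, and
when exactly a copy is made remains implementation dependent, as described
above.

```r
x <- 1:1024
y <- x            # x and y may refer to the same underlying object here

x[3] <- 4L        # replacement: x now refers to a modified object ...
                  # ... while y still holds the original values

x_changed <- x[3]
y_kept    <- y[3]
```

A program that relies only on x_changed and y_kept behaves the same no matter
where the objects live in memory.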

On 09/08/2017 04:08 PM, lille stor wrote:

Hi,
  
I would like to know if the Garbage Collector (GC) changes the address of a variable in R. In other words, assuming the following code:
  
   library(pryr)
  
   x <- 1:1024
  
   addr <- address(x)  # save address of variable "x" in "addr"


   .
   .
   .
   (execution of operations that create/destroy many small/big objects in 
memory, which will likely make the GC to be called)
   .
   .
   .
  
   if (addr != address(x))

   {
   print("Address of x changed!")
   }
  
  
Will the message "Address of x changed!" ever be printed?
  
Thank you!




Re: [Rd] unpackPkgZip: "unable to move temporary installation" due to antivirus

2017-09-21 Thread Tomas Kalibera

This windows/anti-virus problem has been worked around in R-devel 73329.
Thanks to Mike for reporting this and testing the changes.

Best
Tomas

On 09/13/2017 01:40 AM, Mike Toews wrote:

Hi,

An office colleague and I, both on Microsoft Windows 10 PCs, are having
difficulty installing any package. This is a recent issue for us, and
we suspect our McAfee antivirus has been modified by our IT department.
Let's take, for example, install.packages("mypackage"); here is the
output:

package ‘mypackage’ successfully unpacked and MD5 sums checked
Warning in install.packages :
   unable to move temporary installation
‘C:\Users\mtoews\Documents\R\win-library\3.3\file382064842da2\mypackage’
to ‘C:\Users\mtoews\Documents\R\win-library\3.3\mypackage’

Debugging, I found the issue around here:
https://github.com/wch/r-source/blob/980c15af89d99c04e09a40708512a57c49d1c6ee/src/library/utils/R/windows/install.packages.R#L173-L174

## To avoid anti-virus interference, wait a little
Sys.sleep(0.5)

As indicated by an answer
(https://stackoverflow.com/a/44256437/327026), debugging slows down
the function to allow the package to be installed. A simple fix is to
increase the sleep time to a time that is longer than 0.5 seconds.
(I've tried testing new times, but I can't seem to overload this
function). Or use a different strategy, such as using a few attempts
with increasing wait times, or using a custom unlink function.

Happy to help out or test more on this issue. Also, if any R Core
member could add me to R's Bugzilla members, that would be convenient
for me.

Cheers,
Mike

R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
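
The "few attempts with increasing wait times" idea suggested above could look
roughly like the sketch below. It is an illustration only: the helper name
rename_with_retry and its defaults are hypothetical, and it is not claimed to
be what R itself does in install.packages or in the R-devel change mentioned
in the reply.

```r
# Hypothetical helper: try file.rename() a few times, doubling the pause
# after each failed attempt. Names and defaults are illustrative only.
rename_with_retry <- function(from, to, attempts = 5, wait = 0.5) {
  for (i in seq_len(attempts)) {
    ok <- suppressWarnings(file.rename(from, to))
    if (ok) return(TRUE)
    if (i < attempts) {
      Sys.sleep(wait)     # give a scanner time to release the file
      wait <- wait * 2    # back off before the next attempt
    }
  }
  FALSE
}

# Usage on temporary files.
src <- tempfile()
dst <- tempfile()
file.create(src)
moved <- rename_with_retry(src, dst, attempts = 3, wait = 0.01)
```

Compared with a single fixed Sys.sleep(), this returns immediately when the
first rename succeeds and only waits longer when it has to.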


Re: [Rd] calling R API functions after engine shutdown

2017-09-21 Thread Tomas Kalibera


Calling R_ReleaseObject in a C++ destructor is not reliable - it can be 
bypassed by a non-local return, such as an error. Generally in R one 
cannot use C++ destructors reliably for anything that the R runtime 
wouldn't do on its own in case of a non-local return.


A destructor that calls just UNPROTECT, in a way that balances out the 
protection stack (e.g. Rcpp Shield), is safe because R runtime balances 
the protection stack on non-local return. Destructors used in code that 
will never call into any R API (such as in a third party library) are 
safe, because the R runtime could not do non-local return. All other 
destructors are a problem.


UNPROTECT will work even during R shutdown.

In some cases cleanup code that would be put in C++ destructors, if they 
were safe with R, can instead be put into R finalizers.


Tomas
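
For the last point, an R-level finalizer can be registered with
reg.finalizer(). The sketch below is a minimal illustration with made-up
names: the environment res stands in for an object that owns some resource,
and the finalizer only records that it ran. Whether this fits a given package
depends on what the C++ destructor was meant to clean up.

```r
state <- new.env()
state$closed <- FALSE

res <- new.env()   # stands in for an object that owns some resource
reg.finalizer(res,
              function(obj) assign("closed", TRUE, envir = state),
              onexit = TRUE)   # also run the finalizer when R exits

rm(res)            # drop the last reference to the object
invisible(gc())    # a collection normally runs pending finalizers
```

The onexit = TRUE argument asks R to run the finalizer at the end of the
session if the object has not been collected before then.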



On 09/21/2017 04:41 PM, Lukas Stadler wrote:

Hi!

We’ve recently come across an example where a package (minqa) creates an Rcpp 
Function object in a static variable.
This causes R_ReleaseObject to be called by the destructor at a very late point 
in time - as part of the system exit function:

static Function cf("c");

I’m wondering if that is considered to be “safe”?
Is the R engine supposed to stay in a state where calls to API functions are 
valid, even after it has shut down?
It probably only ever happens with the ReleaseObject function, but even that 
could cause problems, e.g., with more elaborate refcounting schemes.

- Lukas

Re: [Rd] socketSelect(..., timeout): non-integer timeouts in (0, 2) (?) equal infinite timeout on Linux - weird

2017-10-05 Thread Tomas Kalibera

Fixed in 73470

Best,
Tomas

On 10/05/2017 06:11 AM, Henrik Bengtsson wrote:

I'd like to follow up on this bug, which causes the timeout to fail
for socketSelect() on Unix.  It is still there in R 3.4.2 and
R-devel.  I've identified the bug in the R source code: it is due to
floating-point precision and a comparison using >=.  See
PR17203 (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17203)
for details and a patch.  I've just reverified that the patch still
solves the problem on trunk (SVN r73463).

Thanks,

/Henrik
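
As general background on how floating-point precision and a >= comparison can
interact, here is a generic sketch. It is not the code from the R sources and
not the patch in PR17203; the variable names and the tolerance are
assumptions chosen for illustration.

```r
timeout <- 1
step    <- 0.1
used    <- 0
for (i in 1:10) used <- used + step   # ten steps of 0.1 "should" reach 1

reached <- used >= timeout            # FALSE with IEEE 754 doubles
print(used, digits = 17)

# A tolerance-based comparison is one common remedy.
reached_tol <- used >= timeout - 1e-9
```

A loop that waits until such a test becomes TRUE can overshoot or, depending
on how the remaining time is computed, never terminate.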

On Sat, Oct 1, 2016 at 1:11 PM, Henrik Bengtsson
 wrote:

There's something weird going on for certain non-integer values of
argument 'timeout' to base::socketSelect().  For such values, there is
no timeout and you effectively end up with an infinite timeout.   I
can reproduce this on R 3.3.1 on Ubuntu 16.04 and RedHat 6.6, but not
on Windows (via Linux Wine).

# 1. In R master session

con <- socketConnection('localhost', port = 11001, server = TRUE, blocking = 
TRUE, open = 'a+b')

# 2. In R servant session (connect to the above master socket)

con <- socketConnection('localhost', port = 11001, server = FALSE, blocking = 
TRUE, open = 'a+b')

# 3. In R master session (check if there's something available on connection)
# Wait at most 0 seconds

t <- system.time(r <- socketSelect(list(con), write = FALSE, timeout = 0)); 
print(t); print(r)

user  system elapsed
   0   0   0
[1] FALSE

# Wait at most 1 seconds

t <- system.time(r <- socketSelect(list(con), write = FALSE, timeout = 1)); 
print(t); print(r)

user  system elapsed
   0.000   0.000   1.002
[1] FALSE

# Wait at most 2 seconds

t <- system.time(r <- socketSelect(list(con), write = FALSE, timeout = 2)); 
print(t); print(r)

user  system elapsed
   0.000   0.000   2.002
[1] FALSE

# Wait at most 2.5 seconds

t <- system.time(r <- socketSelect(list(con), write = FALSE, timeout = 2.5)); 
print(t); print(r)

user  system elapsed
   0.000   0.000   2.502
[1] FALSE

# Wait at most 2.1 seconds

t <- system.time(r <- socketSelect(list(con), write = FALSE, timeout = 2.1)); 
print(t); print(r)

user  system elapsed
   0.000   0.000   2.101
[1] FALSE

However, here are some weird cases where the value of the 'timeout'
argument is ignored:

# Wait at most 1.9 seconds

t <- system.time(r <- socketSelect(list(con), write = FALSE, timeout = 1.9)); 
print(t); print(r)

^C   user  system elapsed
   3.780  14.888  20.594


t <- system.time(r <- socketSelect(list(con), write = FALSE, timeout = 0.1)); 
print(t); print(r)

^C   user  system elapsed
   2.596  11.208  13.907
[1] FALSE

Note how I had to signal a user interrupt (Ctrl-C) to exit
socketSelect().  Also, note that it still works with the timeout values
chosen above, e.g.


t <- system.time(r <- socketSelect(list(con), write = FALSE, timeout = 0)); 
print(t); print(r)

user  system elapsed
   0   0   0
[1] FALSE

t <- system.time(r <- socketSelect(list(con), write = FALSE, timeout = 1)); 
print(t); print(r)

user  system elapsed
   0.000   0.000   1.001
[1] FALSE


t <- system.time(r <- socketSelect(list(con), write = FALSE, timeout = 2.1)); 
print(t); print(r)

user  system elapsed
   0.000   0.000   2.103
[1] FALSE

It's almost as if there is something special with non-integer values
in (0,2).  Not saying these are the only cases, but that's what I've
observed by trial and error.  Weird.  The fact that it works on
Windows may suggest it is Unix-specific.  Anyone with macOS who
wants to confirm?

/Henrik

Session information details:

# Ubuntu 16.04

sessionInfo()

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.1

# RedHat 6.6:

sessionInfo()

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.1

# Windows via Wine on Linux

sessionInfo()

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows XP x64 (build 2600) Service Pack 3

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=C LC_NUMERIC=C
[5] LC_TIME=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.1


Re: [Rd] Bug: Issues on Windows with SFN disabled

2017-10-17 Thread Tomas Kalibera

Hi Zach,

thanks for the report, I can reproduce the problem and confirm it is a 
bug in R and will be fixed.


Hopefully it affects only a few users now. The workaround is to create the 
short name for the directory where R is installed, using "fsutil file 
setshortname" (for all elements of the path that contain a space in their 
name). One can revert this by setting the short name to an empty string 
(""). At least for the latter, one may need to boot in safe mode.


Best
Tomas


On 09/17/2017 08:23 PM, Zach Bjornson wrote:

Hello,

R appears to assume that Windows drives have short file names (SFN, 8.3)
enabled; for example, that "C:/Program Files/..." is addressable as
"C:/Progra~1/...". Newer versions of Windows have SFN disabled on non-OS
drives, however.

This means that if you install R on a non-OS drive, you
- can't start R.exe from the command line.
- consequently, anything that attempts to spawn a new R process also fails.
This includes a lot of the commands from the popular devtools package. More
discussion and background: https://github.com/hadley/devtools/issues/1514

I don't have access to bugzilla to file this there.

Thanks and best,
Zach



Re: [Rd] Bug: Issues on Windows with SFN disabled

2017-10-20 Thread Tomas Kalibera


This has now been mostly fixed in R-devel. What remains to be resolved 
is that some packages with custom make files cannot be installed from 
source (when R is installed into a directory with a space in its name and 
short file names are not available).


Tomas



On 10/17/2017 10:37 AM, Tomas Kalibera wrote:

Hi Zach,

thanks for the report, I can reproduce the problem and confirm it is a 
bug in R and will be fixed.


Hopefully it affects only a few users now. The workaround is to create 
the short name for the directory where R is installed, using "fsutil 
file setshortname" (for all elements of the path that contain a space in 
their name). One can revert this by setting the short name to an empty 
string (""). At least for the latter, one may need to boot in safe mode.


Best
Tomas


On 09/17/2017 08:23 PM, Zach Bjornson wrote:

Hello,

R appears to assume that Windows drives have short file names (SFN, 8.3)
enabled; for example, that "C:/Program Files/..." is addressable as
"C:/Progra~1/...". Newer versions of Windows have SFN disabled on non-OS
drives, however.

This means that if you install R on a non-OS drive, you
- can't start R.exe from the command line.
- consequently, anything that attempts to spawn a new R process also fails.
This includes a lot of the commands from the popular devtools package. More
discussion and background: https://github.com/hadley/devtools/issues/1514


I don't have access to bugzilla to file this there.

Thanks and best,
Zach


Re: [Rd] Illegal Logical Values

2017-10-23 Thread Tomas Kalibera

On 10/21/2017 04:14 PM, Radford Neal wrote:

On Fri, 2017-10-20 at 14:01 +, brodie gaslam via R-devel wrote:

I'm thinking of this passage:


Logical values are sent as 0 (FALSE), 1 (TRUE) or INT_MIN =
-2147483648 (NA, but only if NAOK is true), and the compiled code
should return one of these three values. (Non-zero values other
than INT_MIN are mapped to TRUE.)

The parenthetical seems to suggest that something like 'LOGICAL(x)[0]
= 2;' will be treated as TRUE, which it sometimes is, and sometimes
isn't:

From: Martyn Plummer 
The title of Section 5.2 is "Interface functions .C and .Fortran" and
the text above refers to those interfaces. It explains how logical
vectors are mapped to C integer arrays on entry and back again on exit.

This does work as advertised.


Not always.  As I reported on bugzilla three years ago (PR#15878), it
only works if the logical argument does not have to be copied.  The
bug has been fixed in pqR since pqR-2014-09-30.

Radford Neal

Thanks, that's indeed a bug - now fixed in 73583.
Tomas





Re: [Rd] Memory address of character datatype

2017-11-02 Thread Tomas Kalibera
If you are curious about the hidden details of the memory layout in R, 
the best reference is the source code. In your example, you are not 
getting to your string because there is one more pointer in the way: "x" 
is a vector of strings, and each string is represented by a pointer.


At C level, there is an API for getting an address of the value, e.g. 
INTEGER(x) or CHAR(STRING_ELT(x, 0)).

At R level, there is no such API.

You should never bypass these APIs.  The restrictions of the APIs allow 
us to change details of the memory layout between svn versions or even 
as the program executes (altrep), in order to save memory or improve 
performance. Also, it means that the layout can be slightly different 
between platforms, e.g. 32-bit vs 64-bit.


Unfortunately address(x) from pryr bypasses the APIs - you should never 
use address(x) in your programs and I wish address(x) did not exist. If 
you have a concrete problem at hand that you want to solve with 
"address(x)", feel free to ask for a viable solution.


Best
Tomas




On 11/01/2017 07:37 PM, lille stor wrote:

Hi,
  
To get the memory address of where the value of variable "x" (of datatype "numeric") is stored one does the following in R (in 32 bit):

       library(pryr)

       x <- 1024
       # 24 is needed to jump the variable info and point to the data itself (i.e. 1024)
       addr <- as.numeric(address(x)) + 24
  
The question now is what is the value of the jump so that one can obtain the memory address of where the value of variable "x" (of datatype "character") is stored:

       library(pryr)
       x <- "abc"
       # what should be the value of the jump so that it points to the data of variable "x" (i.e. abc)?
       addr <- as.numeric(address(x)) + ??
  
Thank you in advance!



Re: [Rd] range function with finite=T and logical parameters

2017-11-07 Thread Tomas Kalibera

FYI this has been fixed in R-devel by Martin
Tomas

On 10/23/2017 06:36 PM, Martin Maechler wrote:

Lukas Stadler
 on Mon, 23 Oct 2017 15:56:55 +0200 writes:

 > Hi!
 > I was wondering about the behavior of the range function wrt. logical NAs:

 >> range(c(0L, 1L, NA), finite=T)
 > [1] 0 1
 >> range(c(F, T, NA), finite=T)
 > [1] NA NA

 > The documentation is quite clear that "finite = TRUE includes na.rm = TRUE",
 > so that I would have assumed that these two snippets would produce the same
 > result.

 > - Lukas


I agree.  Further, another informal "rule" would require that the two calls

  range(L, *)
  range(as.numeric(L), *)

are equivalent for logical vectors L without attributes.
I'll look into fixing this by an obvious change to (R-level)
range.default().

--

Note for the more advanced ones -- i.e. typical R-devel readers :

T and F are variables in R.  For that reason, using the language
keywords TRUE and FALSE is much preferred in such cases.  For
some tests we'd even use

 T <- FALSE

or even

 delayedAssign("F", stop("do not use 'F'  when programming with R"))

before running the tests -- just to ensure that the code to be
tested does not use these short forms.


Thank you, Lukas,  for the report!

Best,
Martin
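
Both points above can be checked in a few lines. This is a minimal sketch
with made-up variable names: coercing with as.numeric() first sidesteps the
question of how range() treats logical input, and the small function shows
that T, unlike TRUE, is an ordinary variable that can be rebound.

```r
L <- c(FALSE, TRUE, NA)

# Coercing first gives the numeric behaviour regardless of the R version.
r <- range(as.numeric(L), finite = TRUE)

# T and F are ordinary variables and can be rebound; TRUE and FALSE cannot.
f <- function() {
  T <- FALSE          # legal, and silently changes what T means here
  isTRUE(T)
}
t_is_true <- f()
```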



Re: [Rd] Rscript Bug Report (improper parsing of [args])

2017-11-08 Thread Tomas Kalibera

Thanks for the report, fixed in R-devel.
Tomas

On 10/20/2017 08:09 PM, Trevor Davis wrote:

Hi,

A user of my `optparse` package discovered a bug in Rscript's parsing of
[args]. (https://github.com/trevorld/optparse/issues/24)

I've reproduced the bug on my machine including compiling and checking the
development version of R.  I couldn't find a mention of it in the Bug
Tracker or New Features.

Can be minimally reproduced on the UNIX command line with following
commands:

 bash$ touch test.R
 bash$ Rscript test.R -g 5

 WARNING: unknown gui '5', using X11

This is a bug because according to the documentation in ?Rscript besides
`-e` the only [options] Rscript should attempt to parse should

1) Come before the file, i.e. `Rscript -g X11 test.R` and not `Rscript
test.R -g X11`
2) Begin with two dashes and not one, i.e. `--` and not `-`, as in `Rscript
--gui=X11 test.R` and not `Rscript -g X11 test.R` (although I'm not sure if
the command-line Rscript even needs to be supporting the gui option).

Thanks,

Trevor
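
From the script's side, arguments placed after the file name are expected to
arrive untouched in commandArgs(trailingOnly = TRUE). The sketch below shows
how a script such as test.R might look up the value of "-g" itself. The
helper arg_value is hypothetical and is not how optparse is implemented; the
vector expected_args is an assumption about what the script should receive.

```r
# Hypothetical helper: return the value following `flag`, or NULL if the
# flag is absent or has no value after it.
arg_value <- function(flag, args = commandArgs(trailingOnly = TRUE)) {
  pos <- match(flag, args)
  if (is.na(pos) || pos == length(args)) return(NULL)
  args[pos + 1L]
}

# What test.R would be expected to receive for `Rscript test.R -g 5`:
expected_args <- c("-g", "5")
g <- arg_value("-g", expected_args)
```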



Re: [Rd] check does not check that package examples remove tempdir()

2017-11-10 Thread Tomas Kalibera

Please note there is parallel::mcparallel/mccollect in R, which provides 
similar functionality: mcparallel starts a new job and mccollect allows 
one to wait for it.

You are right about _exit, but there are additional issues which cannot 
be solved independently in an external package, and such a low-level 
interface cannot be used from R without race conditions anyway.

Best
Tomas
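
A minimal usage sketch of the mcparallel/mccollect pair mentioned above. The
wrapper name run_in_child is made up for illustration, and the fallback to
running inline on non-Unix platforms is an assumption added so that the
snippet runs everywhere.

```r
# Hypothetical wrapper: evaluate fn() in a forked child and wait for the result.
run_in_child <- function(fn) {
  if (.Platform$OS.type != "unix") {
    return(fn())                      # forking is not available on Windows
  }
  job <- parallel::mcparallel(fn())   # start the job in a child process
  parallel::mccollect(job)[[1]]       # block until the child has finished
}

total <- run_in_child(function() sum(1:10))
```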

On 11/09/2017 02:55 AM, danlrobertso...@gmail.com wrote:
>> tempdir(). I think it happens because the forked process shares the
>> value of tempdir() with the parent process and removes it when it
>> exits.
> This is very likely the case. Pretty much the entire library can be
> summed up by bfork_fork, which is the following.
>
>  SEXP res;
>  pid_t pid;
>  if((pid = fork()) == 0) {
>  PROTECT(res = eval(lang1(fn), R_GlobalEnv));
>  PROTECT(res = eval(lang2(install("q"), mkString("no")), 
> R_GlobalEnv));
>  UNPROTECT(2);
>  }
>
>  return ScalarInteger(pid);
>
> I wrote this lib when I was still in school and can see several issues
> with the implementation of `bfork_fork`. This issue happens because
> we do not exit with _exit, but by essentially calling q("no").
>
> Cheers,
>
> Dan
>
>


Re: [Rd] command line arguments are parsed differently on windows, from 3.4.3

2017-12-05 Thread Tomas Kalibera

Thanks, will fix this
Best
Tomas

On 12/05/2017 12:44 PM, Gábor Csárdi wrote:

I wonder if this is intended.

Thanks,
Gabor


C:\Users\rhub>"c:\Program Files\R\R-3.4.2\bin\R" -q -e "1 + 1"

1 + 1

[1] 2



C:\Users\rhub>"c:\Program Files\R\R-3.4.3\bin\R" -q -e "1 + 1"
'c:\PROGRA~1\R\R-34~1.3\bin\x64\R.exe" -q -e "1' is not recognized as an interna
l or external command,
operable program or batch file.

C:\Users\rhub>"c:\Program Files\R\R-devel\bin\R" -q -e "1 + 1"
'c:\PROGRA~1\R\R-devel\bin\x64\R.exe" -q -e "1' is not recognized as an internal
  or external command,
operable program or batch file.

C:\Users\rhub>"c:\Program Files\R\R-3.4.3patched\bin\R" -q -e "1 + 1"
'c:\PROGRA~1\R\R-34~1.3PA\bin\x64\R.exe" -q -e "1' is not recognized as an inter
nal or external command,
operable program or batch file.

C:\Users\rhub>"c:\Program Files\R\R-devel\bin\R" -q -e "1+1"

1+1

[1] 2




Re: [Rd] command line arguments are parsed differently on windows, from 3.4.3

2017-12-05 Thread Tomas Kalibera
A quick workaround, if you need to execute R expressions on Windows, is 
to call RTerm directly.

But a fix should be available soon.

Tomas

On 12/05/2017 05:51 PM, Henrik Bengtsson wrote:

Sorry for not reading carefully and thanks for confirming problem with
Rscript too.

On Dec 5, 2017 08:47, "Gábor Csárdi"  wrote:


On Tue, Dec 5, 2017 at 4:40 PM, Henrik Bengtsson
 wrote:

One comment:
For your R devel example you didn't use spaces in the expression, i.e. maybe
that's broken too with spaces?

I did. There are two R-devel examples, one with spaces (buggy) and one
without spaces (works).
To show that spaces are the problem.


Three questions:
Does it work if you avoid spaces?

It seems so.


Does it work if you use single quotes?

It does not, single quotes are not special characters for windows, so
you'll get a different error. In
R -q -e '1 + 1'
there are three arguments after the -e: '1 and + and 1'


Does this also occur for Rscript?

It seems so indeed.

Gabor


Thxs

Henrik


On Dec 5, 2017 03:44, "Gábor Csárdi"  wrote:

I wonder if this is intended.

Thanks,
Gabor


C:\Users\rhub>"c:\Program Files\R\R-3.4.2\bin\R" -q -e "1 + 1"

1 + 1

[1] 2



C:\Users\rhub>"c:\Program Files\R\R-3.4.3\bin\R" -q -e "1 + 1"
'c:\PROGRA~1\R\R-34~1.3\bin\x64\R.exe" -q -e "1' is not recognized as an
interna
l or external command,
operable program or batch file.

C:\Users\rhub>"c:\Program Files\R\R-devel\bin\R" -q -e "1 + 1"
'c:\PROGRA~1\R\R-devel\bin\x64\R.exe" -q -e "1' is not recognized as an
internal
  or external command,
operable program or batch file.

C:\Users\rhub>"c:\Program Files\R\R-3.4.3patched\bin\R" -q -e "1 + 1"
'c:\PROGRA~1\R\R-34~1.3PA\bin\x64\R.exe" -q -e "1' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\rhub>"c:\Program Files\R\R-devel\bin\R" -q -e "1+1"

1+1

[1] 2





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] command line arguments are parsed differently on windows, from 3.4.3

2017-12-07 Thread Tomas Kalibera

Fixed in R-devel and R-patched.

Tomas





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Bug: Issues on Windows with SFN disabled

2017-12-07 Thread Tomas Kalibera


An update on this. Writing R Extensions recommends against having a 
space character in R_HOME. This means that on Windows one should either 
have SFN enabled (which is still the common case), or install into a 
directory that does not have a space in its name (so specifically not 
into "Program Files"). This recommendation unfortunately needs to stay 
for now.


WRE recommends that Makefiles are written to be robust against space 
characters inside R_HOME. All path names passed from a Makefile to the 
shell should be quoted at least if they include R_HOME. Make "include" 
directives should not be used on path names that are derived from 
R_HOME, but one should instead use the "-f" option multiple times when 
recursively invoking make. Maintainers of packages that use "include" 
with R_HOME have been notified. Unfortunately, the number of packages 
that do not quote pathnames with R_HOME in Makefiles is rather large, so 
fixing will take some time.


Currently, R-devel should build fine on Windows with R_HOME containing a 
space, including all base and recommended packages, and tests for these 
packages should pass, even though this is not regularly tested. If you 
find a case where this does not work, please submit a bug report.


Thanks
Tomas


On 10/20/2017 04:29 PM, Tomas Kalibera wrote:


This has now been mostly fixed in R-devel. What remains to be resolved 
is that some packages with custom make files cannot be installed from 
source (when R is installed into a directory with a space in its name 
and short file names are not available).


Tomas



On 10/17/2017 10:37 AM, Tomas Kalibera wrote:

Hi Zach,

thanks for the report, I can reproduce the problem and confirm it is 
a bug in R and will be fixed.


Hopefully it only impacts a few users now. The workaround is to create 
the short name for the directory where R is installed, using "fsutil 
file setshortname" (for all elements of the path that contain space 
in their name). One can revert this by setting the shortname to an 
empty string (""). At least for the latter one may need to boot in 
safe mode.


Best
Tomas


On 09/17/2017 08:23 PM, Zach Bjornson wrote:

Hello,

R appears to assume that Windows drives have short file names (SFN, 8.3)
enabled; for example, that "C:/Program Files/..." is addressable as
"C:/Progra~1/...". Newer versions of Windows have SFN disabled on non-OS
drives, however.

This means that if you install R on a non-OS drive, you
- can't start R.exe from the command line.
- consequently, anything that attempts to spawn a new R process also fails.
This includes a lot of the commands from the popular devtools package. More
discussion and background: https://github.com/hadley/devtools/issues/1514


I don't have access to bugzilla to file this there.

Thanks and best,
Zach








__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Bug: Issues on Windows with SFN disabled

2017-12-08 Thread Tomas Kalibera

On 12/07/2017 06:15 PM, Dirk Eddelbuettel wrote:

On 7 December 2017 at 17:56, Tomas Kalibera wrote:
|
| An update on this. Writing R Extensions does not recommend to have a
| space character in R_HOME. This means that on Windows one either should
| have SFN enabled (which is still the common case), or install into a
| directory that does not have a space in its name (so specifically not
| into "Program Files"). This recommendation unfortunately needs to stay
| for now.
|
| WRE recommends that Makefiles are written to be robust against space
| characters inside R_HOME. All path names passed from a Makefile to the
| shell should be quoted at least if they include R_HOME. Make "include"
| directives should not be used on path names that are derived from
| R_HOME, but one should instead use the "-f" option multiple times when
| recursively invoking make. Maintainers of packages that use "include"
| with R_HOME have been notified. Unfortunately, the number of packages
| that do not quote pathnames with R_HOME in Makefiles is rather large, so
| fixing will take some time.
|
| Currently, R-devel should build fine on Windows with R_HOME including
| space, including all base and recommended packages, and tests for these
| packages should pass even though this is not regularly tested. If you
| find a case when this does not work, please submit a bug report.

Why does the Windows installer default to using a directory with spaces?

It's a convention on Windows and I guess there may be problems with 
permissions on other directories. My hope is we can make R work reliably 
without SFN just in time before SFN becomes disabled by default; after 
all, quoting pathnames in Makefiles (or shell scripts for that matter) 
is good practice anyway, and avoiding "include" is not a big problem as 
very few packages are affected.


But thanks for opening this and I am happy for insights from any Windows 
experts on the issue. I would not want to violate the convention for all 
users when just a few of them have SFN disabled, and as I hope this will 
be fixed on the R/packages side, but maybe the installer could at least 
detect the problem (when "Program Files" or another specified target 
directory does not have a short name). Or perhaps also suggest a 
different default. Certainly R could print a warning when it starts.


Tomas

Related (but moderately more advanced), why does R still install "everything"
under one (versioned) directory so that uninformed users on upgrade "miss"
all previously installed packages?

Why not (with space for exposition here, imagine s/ // everywhere below)

 $SOMEROOTDIR / R /
   R-a.b.c/  # before
   R-a.b.d/  # d > c, here
   site-library/ # with .libPaths having this preset?

I don't really care as I manage to work mostly / entirely on another OS, but
I just don't understand why we do not put two and two together. But I am
likely unaware of some salient issues.




In any event, I appreciate the thankless work of those taking care of Windoze
(ie Tomas, Jeroen, Duncan (now ex-officio), Brian, ...)

Dirk



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wish List: base::source() + Add Execution Time Argument

2018-01-02 Thread Tomas Kalibera
There is a simple way to achieve something similar: one can add a time 
stamp to each line of output, e.g. using "ts" command from "moreutils".


Tomas

On 12/21/2017 06:45 PM, Jim Hester wrote:

R does provide the addTaskCallback / taskCallbackManager to run a
callback function after every top level command. However there is not
an equivalent interface that would be run _before_ each command, which
would make it possible to time top level calls and provide other
execution measurements.

On Thu, Dec 21, 2017 at 11:31 AM, William Dunlap via R-devel
 wrote:

Is source() the right place for this?  It may be, but we've had customers
who would like this sort of thing done for commands entered by hand.  And
there are those who want a description of any "non-trivial" objects created
in .GlobalEnv by each expression, ...
Do they need a way to wrap each expression evaluated in envir=.GlobalEnv
with a function of their choice, one that would print times, datasets
created, etc.?

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Dec 21, 2017 at 3:46 AM, Juan Telleria  wrote:


Dear R Developers,

Adding to the source() base function a timer which indicates the execution
time of the source code would be a very welcome feature, and in my opinion
not difficult to implement as an additional function argument.

The source(timing = TRUE) function would internally execute the following
code for each statement:

old <- Sys.time() # get start time at the beginning of source()
# source code
# print elapsed time
new <- Sys.time() - old # calculate difference
print(new) # print in nice format

Thank you.

Kind regards,

Juan Telleria



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Fixed BLAS tests for external BLAS library

2018-01-04 Thread Tomas Kalibera

In practical terms, failing tests are not preventing anyone from using 
an optimized BLAS/LAPACK implementation they trust. Building R with 
dynamically linked BLAS on Unix is supported, documented and easy for 
anyone who builds R from source. It is also how Debian/Ubuntu R packages 
are built by default, so R uses whichever BLAS is installed in the 
system and the user does not have to build from source. There is no 
reason not to do the same thing with another optimized BLAS on 
another OS/distribution.

You may be right that reg-BLAS is too strict (it is testing matrix 
products, expecting equivalence to the naive three-loop algorithm; only 
part of it really uses BLAS). I just wanted a concrete example to think 
about, as I can't reproduce it (e.g. it passes with openblas), but maybe 
someone else will be able to reproduce it and possibly adjust the test.
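
To illustrate why a bit-for-bit comparison can fail across BLAS implementations even when every result is numerically fine: floating-point addition is not associative, so a library that accumulates the same products in a different order may differ from the naive loop in the last bits. The C sketch below is only an illustration under that assumption. It does not reproduce the reg-BLAS.R tests, the helper names are hypothetical, and the tolerance of 1.5e-8 is chosen in the spirit of the all.equal default.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product accumulated left to right, as a naive loop would do it. */
static double dot_forward(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* The same products, accumulated right to left. The result may differ
 * from dot_forward in the last bits because of rounding. */
static double dot_backward(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = n; i > 0; i--)
        s += x[i - 1] * y[i - 1];
    return s;
}

/* Absolute value without needing the math library. */
static double abs_d(double v)
{
    return v < 0.0 ? -v : v;
}

/* Relative comparison: true when a and b agree to within tol,
 * scaled by the larger magnitude of the two. */
static int nearly_equal(double a, double b, double tol)
{
    double scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    assert(tol > 0.0);
    if (scale == 0.0)
        return 1;   /* both values are zero */
    return abs_d(a - b) <= tol * scale;
}
```

A test written with nearly_equal(dot_forward(x, y, n), dot_backward(x, y, n), 1.5e-8) accepts both summation orders, whereas an exact == comparison may reject one of them.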

Tomas

On 01/04/2018 09:23 PM, Simon Guest wrote:
> Hi Tomas,
>
> Thanks for your reply.
>
> I find your response curious, however.  Surely the identical() test is 
> simply incorrect when catering for possibly different BLAS 
> implementations?  Or is it the case that conformant BLAS 
> implementations all produce bit-identical results, which seems 
> unlikely?  (Sorry, I am unfamiliar with the BLAS spec.)  Although 
> whatever the answer to this theoretical question, the CentOS 7 
> external BLAS library evidently doesn't produce bit-identical results.
>
> If you don't agree that replacing identical() with all.equal() is 
> clearly the right thing to do, as demonstrated by the CentOS 7 
> external BLAS library failing the test, then I think I will give up 
> now trying to help improve the R sources.  I simply can't justify to 
> my client more time spent on making this work, when we already have a 
> local solution (which I hoped others would be able to benefit from).  
> Ah well.
>
> cheers,
> Simon
>
> On 5 January 2018 at 00:07, Tomas Kalibera wrote:
>
> Hi Simon,
>
> we'd need more information to consider this - particularly which
> expression gives an imprecise result with ACML and what are the
> computed values, differences. It is not common for optimized BLAS
> implementations to fail reg-BLAS.R tests, but it is common for
> them to report numerical differences in tests of various
> recommended packages where more complicated computations are done
> (e.g. nlme), on various platforms.
>
> Best
> Tomas
>
>
> On 12/18/2017 08:56 PM, Simon Guest wrote:
>
> We build R with dynamically linked BLAS and LAPACK libraries, in order
> to use the AMD Core Math Library (ACML) multi-threaded implementation
> of these routines on our 64 core servers.  This is great, and our
> users really appreciate it.
>
> However, when building like this, make check fails on the reg-BLAS.R
> test.  The reason for this is that the expected test output is checked
> using identical.  By changing all uses of identical in this file to
> all.equal, the tests pass.
>
> Specifically, I run this command before make check:
>
> $ sed -i -e 's/identical/all.equal/g' tests/reg-BLAS.R
>
> I suggest that the test is fixed like this in the R source code.
>
> Here is the configure command I use, on CentOS 7:
> $ ./configure --with-system-zlib --with-system-bzlib --with-system-pcre \
>      --with-blas \
>      --with-lapack \
>      --with-tcl-config=/usr/lib64/tclConfig.sh \
>      --with-tk-config=/usr/lib64/tkConfig.sh \
>      --enable-R-shlib \
>      --enable-prebuilt-html
>
> cheers,
> Simon
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Fwd: R/MKL Intel 2018 Compatibility

2018-01-08 Thread Tomas Kalibera
Hi Guillaume,

In principle, the mycrossprod function does not need to PROTECT "ans", 
because it does not call any allocating function after allocating "ans" 
("dgemm" in particular should not allocate from the R heap). So it is 
surprising that PROTECTion makes a difference in your case. I agree 
there is no harm protecting defensively. R itself calls dgemm with the R 
object for the result protected when calculating matrix products, but 
there it is needed because there is further allocation when setting up 
attributes for the result.

Best
Tomas


On 01/08/2018 02:41 PM, Guillaume Collange wrote:
> Dear all,
>
> I would like to report an issue that we are facing.
>
> In our environment, we speed up some mathematical calculations in R code,
> such as matrix products, using the Intel MKL libraries
> ( https://software.intel.com/en-us/mkl ).
>
> With the latest version of the MKL libraries (Intel 2018), we are facing an
> issue with all INTERNAL commands executed in R. The R console freezes,
> with a process running at 100% that never stops. It is really an issue
> for us.
>
> As an example, we can reproduce the error with crossprod, which is
> a wrapper of BLAS GEMM (optimized with the MKL libraries). In this function
> it seems that variables are not protected ( PROTECT(); UNPROTECT() ), which
> is a recommendation for external commands; see the screenshots below:
>
> [Picture 1]
>
> RECOMMENDATION
>
> [Picture 2]
>
> Code of crossprod
>
> [Picture 3]
>
> If we recode the crossprod function with PROTECT, there are no more issues.
>
> Do you have any idea how to solve this bug? Any recommendations?
>
> Thank you in advance for your help.
>
> Best regards,
>
> Guillaume Collange
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R CMD build then check fails on R-devel due to serialization version

2018-01-12 Thread Tomas Kalibera
To reduce difficulties for people relying on automated tests set up to 
build&"check --as-cran" using R-devel (e.g. travis-ci), the default 
serialization version has been temporarily switched back to 2. Thank you 
for your patience - according to svn history, the last change of the 
serialization format happened 16 years ago, and unsurprisingly some 
practices that developed since did not anticipate such change and have 
to be adapted.


CRAN is now protected against packages containing serialized files in 
format 3 (which not only is not readable by 3.4.x and older, but could 
still change - the 'devel' in 'R-devel'). These new checks have to stay 
but we are looking at improving package-maintainer-friendliness. That 
turned out to take more than just 1-2 days, hence the temporary switch 
back to version 2.


Best
Tomas

On 01/11/2018 02:47 PM, luke-tier...@uiowa.edu wrote:

As things stand now, package tarballs with vignettes that are built
with R-devel will not install in R 3.4.x, so CRAN can't accept them
and someone running R CMD check --as-cran should be told that. A
WARNING is appropriate.

Most likely what will change soon is that build/version.rds will be
saved with serialization version = 2 and this warning will not be
triggered just by having a vignette. It will still be triggered by
data files serialized with R-devel's default version = 3.

Please do remember that the 'devel' in R-devel means exactly that:
things will at times be unstable. There are currently a lot of balls
flying around with changes in R-devel and also Bioconductor, and the
CRAN maintainers are working hard to keep things all up in the
air. Please be patient.

Best,

luke

On Thu, 11 Jan 2018, Jim Hester wrote:


This change poses difficulties for automated build systems such as
travis-ci, which is widely used in the R community. In particular
because this is a WARNING and not a NOTE this causes all R-devel
builds with vignettes to fail, as the default settings fail the build
if R CMD check issues a WARNING.

The simplest change would be for R-core to change this message to be a
NOTE rather than a WARNING, the serialization could still be tested
and there would be a check against vignettes built with R-devel, but
it would not cause these builds to fail.

On Wed, Jan 10, 2018 at 3:52 PM, Duncan Murdoch
 wrote:

On 10/01/2018 1:26 PM, Neal Richardson wrote:


Hi,
Since yesterday I'm seeing `R CMD check --as-cran` failures on the
R-devel daily build (specifically, R Under development (unstable)
(2018-01-09 r74100)) for multiple packages:

* checking serialized R objects in the sources ... WARNING
Found file(s) with version 3 serialization:
‘build/vignette.rds’
Such files are only readable in R >= 3.5.0.
Recreate them with R < 3.5.0 or save(version = 2) or saveRDS(version = 2)
as appropriate

As far as I can tell, revision 74099
(https://github.com/wch/r-source/commit/d9530001046a582ff6a43ca834d6c3586abd0a97),
which changes the default serialization format to 3, clashes with
revision 73973
(https://github.com/wch/r-source/commit/885764eb74f2211a547b13727f2ecc5470c3dd00),
which checks that serialized R objects are _not_ version 3. It seems
that with the current development version of R, if you `R CMD build`
and then run `R CMD check --as-cran` on the built package, it will
fail.



I think the message basically says:  don't do that.  You should build with
R-release for now.  You always need to check with R-devel, so life is
complicated.

If you build with R-devel without forcing the old format, nobody using
R-release will be able to use your tarball.

Eventually I guess the new format will be accepted by CRAN, but it will
likely be a while:  nobody expects everyone to instantly upgrade to a new R
release, let alone to an unreleased development version.

Presumably that particular file (build/vignette.rds) could be automatically
built in the old format for now, but the new format needs testing, so it
makes sense to me to leave it as a default, even if it makes it more
complicated to submit a package to CRAN.

Duncan Murdoch






__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Buffer overflow in cairoBM.c line 402

2018-01-19 Thread Tomas Kalibera
Thanks for reporting - there is no need to reproduce this, it is an 
obvious error.
I'll probably fix by throwing an error - like it is done in devX11.c 
when the file names are too long.
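
For illustration, rejecting an over-long name rather than truncating it could look like the C sketch below. This is not the actual change made to cairoBM.c or the code in devX11.c; the helper name copy_filename and its return convention are assumptions. It is also worth noting that strncpy(dst, src, PATH_MAX) leaves dst without a terminating '\0' whenever src has PATH_MAX or more characters, so a length check is needed in either case.

```c
#include <assert.h>
#include <string.h>

/* Copy src into dst (capacity dstsize bytes, including the terminator).
 * Returns 0 on success, or -1 without touching dst when src does not fit,
 * so the caller can report an error instead of silently truncating. */
static int copy_filename(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= dstsize)
        return -1;              /* too long: no room for the '\0' */
    memcpy(dst, src, len + 1);  /* len + 1 copies the terminating '\0' too */
    return 0;
}
```

A caller would check the return value and raise an error for the user when it is -1.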


Tomas

On 01/19/2018 09:41 PM, Omri Schwarz wrote:

Hi, all.

Testing a change to that line to
 strncpy(xd->filename, filename,PATH_MAX);
right now.

The bug itself I've yet to reproduce in anything that doesn't involve
my employer's proprietary code, but strcpy is strcpy, after all.



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Fortran programs not writing stdout on windows

2018-01-26 Thread Tomas Kalibera
For reference, this has been fixed in R-devel, 74168. The problem only 
exists on Windows with RGui.


As a workaround for older versions of R, one can unset environment 
variables "GFORTRAN_STDOUT_UNIT" and "GFORTRAN_STDERR_UNIT" for the 
duration of system/system2 calls that invoke external Fortran programs.


Please note that while external code, which is run via system/system2, 
can be implemented in Fortran and use Fortran I/O to write to standard 
error and standard output, code that is linked against R is not allowed 
to do that. Mixing C and Fortran I/O is dangerous due to potentially 
incompatible runtimes. On Windows, with RGui, there is an additional 
problem that the output needs to go to the RGui console. So all output, 
at the lowest level, has to go through C functions Rprintf/REprintf. 
More details are in Writing R Extensions (Printing, Printing from FORTRAN).


Tomas

On 06/20/2017 12:52 PM, Jeroen Ooms wrote:

A user has reported an issue that appears when a fortran executable is
called via R on Windows. I am unsure if this is expected behavior or a
bug in Fortran or in how R calls Windows executables.

The problem is that when the fortran program is called from R, stdout
gets written to a file "fort.6" instead of stdout. When the same
executable is called from the terminal it works fine and prints to
stdout. This unexpected behavior is unfortunate for R wrappers that
rely on captured output.

A minimal example is available from github [1] and can be installed with

devtools::install_github("jeroen/ftest")
ftest::hello()

When running ftest::hello() on linux, R will properly capture output.
However on Windows it will return an empty string, and a file 'fort.6'
gets created in the working directory instead.

The executables can be found in: system.file("bin", package = "ftest")

Interestingly if we open a command line terminal and run the same
executable it prints output to stdout. So perhaps it has to do with
the way R invokes CreateProcess() on Windows?

[1] https://github.com/jeroen/ftest



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error message: 'Rscript' should not be used without a path

2018-02-01 Thread Tomas Kalibera

Hi Michal,

On 02/01/2018 09:23 AM, Michal Burda wrote:

Dear R-devel members,

recently, I ran into the following error message (R-devel 2018-01-31):

'Rscript' should not be used without a path -- see par. 1.6 of the manual

I would like to know more about it, why is it required to run Rscript with
a path, and where is that par. 1.6 of the manual.

The manual is "Writing R Extensions"
https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Writing-portable-packages

"
Do not invoke R by plain R, Rscript or (on Windows) Rterm in your 
examples, tests, vignettes, makefiles or other scripts. As pointed out 
in several places earlier in this manual, use something like

"$(R_HOME)/bin/Rscript"
"$(R_HOME)/bin$(R_ARCH_BIN)/Rterm"
with appropriate quotes (as, although not recommended, R_HOME can 
contain spaces).

"

This is needed to make sure that one does not run Rscript from a 
different version of R installed in the system. The quotes are important 
and it works on all platforms supported by R.
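
As an illustration of the quoting advice, here is a minimal C sketch of a launcher that builds the command line for Rscript from the value of R_HOME. The helper name build_rscript_cmd is hypothetical; the point is only that the quotes around the program path keep a space in R_HOME from splitting the path into two arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the command line:  "<r_home>/bin/Rscript" -e "<expr>"
 * Returns 0 on success, -1 if r_home is empty or buf is too small.
 * Note: expr is inserted as-is; an expression that itself contains
 * double quotes would need extra escaping, which this sketch omits. */
static int build_rscript_cmd(char *buf, size_t bufsize,
                             const char *r_home, const char *expr)
{
    assert(buf != NULL && r_home != NULL && expr != NULL);
    if (strlen(r_home) == 0)
        return -1;   /* refuse to fall back to a plain "Rscript" */
    int n = snprintf(buf, bufsize, "\"%s/bin/Rscript\" -e \"%s\"",
                     r_home, expr);
    return (n >= 0 && (size_t)n < bufsize) ? 0 : -1;
}
```

With r_home set to C:/Program Files/R/R-devel, the resulting string starts with the whole path inside one pair of double quotes, so the space in "Program Files" stays inside a single argument.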


(for similar questions perhaps R-package-devel is a bit better list)

Best
Tomas



I get this error message during Travis r-devel build of my package for
generating makefiles. I am developing a makefile generator package, which
contains testthat unit tests that generate and run various makefiles in
/tmp. These makefiles run several "Rscript -e" commands. Everything works
OK on R-stable on Linux as well as on Windows, the only problem is with
R-devel on that Travis cloud builder. Could someone give me more
information about that error? Is there any workaround or do I really need
to obtain somehow the full path of Rscript and put it into the makefiles
(as it may be tricky to make such makefiles work on Linux, macOS and Windows)?

Thanks, in advance.


Michal Burda



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Warning from Sys.junction when using network drive.

2018-02-08 Thread Tomas Kalibera


Unfortunately, junctions cannot link to a network drive; they can only 
link to directories on the same computer (possibly on different local 
volumes). This is a limitation imposed by Windows. I have updated the 
documentation for Sys.junction in R-devel accordingly.


Tomas

On 02/06/2018 10:50 PM, MARK BANGHART wrote:

I am running 3.4.3 on a windows server and I ran the code in a new session.


I get a warning when running packrat::init() on a project that is located on a 
windows network drive.
The warning I get is


Warning message:
cannot set reparse point 'U:/packrat5/packrat/lib-R/base', reason 'Access is 
denied'


The warning originates inside the function .Internal(mkjunction(fr, link)) 
which is called from Sys.junction().  I have run Sys.junction inside the 
RStudio debugger and I checked that the 'U:/packrat5/packrat/lib-R/base'
could be accessed via the windows file explorer before the 
.Internal(mkjunction(fr, link)) call is made.  Looking at the code for 
do_mkjunction(), the warning looks to be thrown based on the return status from 
DeviceIoControl.


I setup a project on the C: drive and tried the same packrat::init() code.

The call to .Internal(mkjunction(fr, link)) did not produce an error.


I would appreciate any help you can provide on this issue.

Thanks,

Mark



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] saveRDS() overwrites file when object is not found

2018-02-08 Thread Tomas Kalibera
Thanks, this has already been reported as bug 17358. Addressed in 
R-devel 74238. R may still create a corrupt file, though, in other 
circumstances (e.g. if it runs out of memory or is interrupted during 
serialization, etc).
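
One common defence against half-written files, independent of the fix mentioned above, is to write to a temporary file and rename it over the target only after the write has succeeded. The C sketch below shows that general pattern. It is not how saveRDS is implemented, and the function name write_file_safely and the ".tmp" suffix are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write len bytes to path via a temporary file "<path>.tmp".
 * The existing file at path is replaced only after all data has been
 * written and the temporary file has been closed successfully.
 * Returns 0 on success, -1 on failure (the old file is left untouched). */
static int write_file_safely(const char *path, const void *data, size_t len)
{
    char tmp[1024];
    assert(path != NULL && (data != NULL || len == 0));
    if (strlen(path) + 5 > sizeof tmp)   /* room for ".tmp" and '\0' */
        return -1;
    snprintf(tmp, sizeof tmp, "%s.tmp", path);

    FILE *f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, f);
    int close_failed = (fclose(f) != 0);
    if (written != len || close_failed) {
        remove(tmp);                     /* discard the partial file */
        return -1;
    }
    /* On POSIX systems rename() replaces an existing target atomically.
     * On Windows it fails if the target exists, so a real implementation
     * would need a platform-specific call there. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

With this pattern, a failure before the final rename leaves the previous contents of the target file in place.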


Tomas

On 02/07/2018 04:14 PM, Kenny Bell wrote:

I ran into this behaviour when accidentally running a line of code that I
shouldn't have.

When saving over an rds with an object that's not found, I would have
expected saveRDS to not touch the file.

saveRDS(iris, "test.rds")
file.size("test.rds")
#> [1] 1080
saveRDS(no_object_here, "test.rds")
#> Error in saveRDS(no_object_here, "test.rds"): object 'no_object_here' not found
file.size("test.rds")
#> [1] 20
file.remove("test.rds")



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R Compilation gets stuck on Windows 64

2018-02-09 Thread Tomas Kalibera
Please note that building R on Windows is documented in "R Installation 
and Administration", including links to external software. Particularly 
there is a link to texinfo which is part of Rtools. The documentation is 
maintained and it is a sufficient source of information for building R 
on Windows.


https://cran.r-project.org/doc/manuals/r-release/R-admin.html
https://cran.r-project.org/bin/windows/base/rw-FAQ.html

Tomas

On 02/09/2018 08:16 AM, Indrajit Sen Gupta wrote:

Hi Avraham,

A quick question - I realized I did not have *Perl* installed. So I
installed *ActiveState Perl* right now. Also I see I need *texinfo* and
*texi2any*. I was able to install *texinfo* from here:
http://gnuwin32.sourceforge.net/packages/texinfo.htm. But not sure where to
get *texi2any*. Can you guide me in this step?

Regards,
Indrajit

On Fri, Feb 9, 2018 at 11:58 AM, Indrajit Sen Gupta 
wrote:


Hi Avraham,

What a coincidence, I have been following this post of yours:
https://www.avrahamadler.com/2013/10/24/an-openblas-based-rblas-for-windows-64-step-by-step/

Looks like this post is slightly older than what you have shared
previously. It is strange that you did not get the attachments. I am
pasting the contents of the MkRules.local here:

---

#-*- Makefile -*-

## This is only used when building R itself but it does customize
## etc/*/Makeconf using LOCAL_SOFT, BINPREF[64], IMPLIB and R_ARCH

## Customize by copying to MkRules.local and uncommenting and editing
## some of the definitions there.
##

## === configuration macros for building packages 
# Absolute path to '/usr/local' software collection.  The versions used
# on CRAN can be found at https://www.stats.ox.ac.uk/pub/Rtools/libs.html
# It can be interrogated by 'R CMD config LOCAL_SOFT'
# Use 'make rsync-extsoft' to populate the default directory.
# LOCAL_SOFT = D:/R64/extsoft

## == configuration macros for building R ===

# Path of library directory containing zlib, bzlib, liblzma, pcre,
# libpng, libjpeg, libtiff.
# Use 'make rsync-extsoft' to populate the default directory.
EXT_LIBS = D:/R64/extsoft

# an alternative is to use -gstabs here, if the debugger supports only stabs.
# G_FLAG = -gdwarf-2

# Set to YES and specify the path if you want to use the ATLAS BLAS.
USE_ATLAS = YES
ATLAS_PATH =D:/home/thread0

# Support for the ACML and Goto BLASes has been withdrawn: see R-admin.html

# Define to use svnversion to set SVN-REVISION (slow, and requires a clean
# checkout with no modifications).
# USE_SVNVERSION = YES

# With the previously recommended gcc 4.6.3 toolchain, set this to 32 or 64
# MULTI = 64
# If the toolchain's bin directory is not in your path, set this to the path
# (including the trailing /, and use / not \).
# TOOL_PATH =
# for other toolchains leave these empty and set the more detailed options below

# With the recommended gcc 4.9.3 toolchain or another toolchain, set
# BINPREF and BINPREF64 (below) to the respective bin directories.
# Include the trailing /, and use / not \.
# Do this in the more detailed options below
# Set this to 32 or 64
WIN = 64


### BEGIN more detailed options
# Some of the toolchains have prefixes for e.g. ar, gcc.
# This can also be used to give the full path to the compiler,
# including a trailing / .
# BINPREF = c:/Rtools/mingw_32/bin/
# prefix for 64-bit:
BINPREF64 = D:/Rtools/mingw_64/bin/
# Set this to indicate a non-gcc compiler and version
# COMPILED_BY = 

# Others use a -m64 or -m32 option to select architectures
# M_ARCH = -m64
# and for as (--32 or --64)
# AS_ARCH = --64
# and for windres (-F pe-i386 or pe-x86-64)
# RC_ARCH = pe-x86-64
# and for dlltool ("-m i386 --as-flags --32" vs "-m i386:x86-64 --as-flags --64")
DT_ARCH = -m i386:x86-64 --as-flags --64

# 32- or 64-bit Windows?
WIN = 64

# The gcc 4.9.3 64 bit toolchain is set up for the 'medium code' model and needs
# to remove the .refptr and .weak entries from the exports list; this is the default
# when WIN = 64, with blank for WIN = 32:
NM_FILTER = | $(SED) -e '/[.]refptr[.]/d' -e '/[.]weak[.]/d'

# We normally link directly against DLLs,
# but this macro forces the use of import libs
# Has been needed for some versions of MinGW-w64
USE_IMPLIBS = YES

### END more detailed options


# set to use ICU
USE_ICU = YES
# path to parent of ICU headers
ICU_PATH = D:/home/ICU
ICU_LIBS = -lsicuin -lsicuuc -lsicudt -lstdc++

# set to use libcurl
USE_LIBCURL = YES
# path to parent of libcurl headers
CURL_PATH = D:/home/curl
# libs: for 32-bit
# CURL_LIBS = -lcurl -lrtmp -lssl -lssh2 -lcrypto -lgdi32 -lcrypt32 -lz -lws2_32 -lgdi32 -lcrypt32 -lwldap32 -lwinmm -lidn
# libs: for 64-bit
CURL_LIBS = -lcurl -lrtmp -lssl -lssh2 -lcrypto -lgdi32 -lcrypt32 -lz -lws2_32 -lgdi32 -lcrypt32 -lwldap32 -lwinmm

# For the cairographics devices
# Optionally use a static build of cairographics from
#   https://www.rforge.net/Cairo/files/cairo-current-

Re: [Rd] makeCluster hangs

2018-02-12 Thread Tomas Kalibera
Also using R-devel might help - the forking support in parallel has been 
made more robust against race conditions, but the changes are probably 
too substantial to port to 3.4.x. If you find how to cause a race 
condition using parallel/forking in R-devel, a report would be greatly 
appreciated.


Tomas

On 02/11/2018 09:51 PM, T. Florian Jaeger wrote:

Dear Henrik,

thank you, for the quick reply. Bizarrely enough, the problem vanished when
I woke the computer from sleep (I had previously replicated the problem
after several restarts of both R and the MacOS).

I will follow-up if I can again replicate the problem.

Florian


On 2/10/18 4:39 PM, Henrik Bengtsson wrote:

A few quick comments:

* You mention R --vanilla, but make sure to try with
parallel::makeCluster(), so that you don't happen to pick up
snow::makeCluster() if 'snow' is attached and ahead of parallel on the
search() path.

* Try creating a single background worker, i.e. parallel::makeCluster(1L).

* Try with cl <- future::makeClusterPSOCK(1L, verbose = TRUE), which
gives the same thing, but it also shows you some details on what it
does internally; that may give some clues where it stalls.
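
The first point can be checked without starting any workers. This is a minimal sketch only; the variable name which_pkg is made up for illustration, and the snippet just inspects the search path.

```r
library(parallel)

# Namespace that the unqualified name makeCluster resolves to.
# "parallel" is expected unless another attached package (e.g. snow) masks it.
which_pkg <- environmentName(environment(makeCluster))
print(which_pkg)

# All attached packages that provide a makeCluster, in search() order.
print(find("makeCluster"))
```

If the first value is not "parallel", calling parallel::makeCluster() explicitly avoids the masking.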

/Henrik

On Sat, Feb 10, 2018 at 12:11 PM, T. Florian Jaeger
 wrote:

Hi all,

I can't get the functionality of the package parallel to work. Specifically,
makeCluster() hangs when I run it. I first noticed the problem when trying
to run Rstan with multiple cores and then traced it back to the core package
parallel. The following results in R hanging after the call to makeCluster.

library(parallel)

# Calculate the number of cores
no_cores <- detectCores() - 1

# Initiate cluster
cl <- makeCluster(no_cores)

I'm running MacOS High Sierra 10.13.3 (17D47) on a MacbookPro 2017 laptop
with 4 cores.

platform       x86_64-apple-darwin15.6.0
arch           x86_64
os             darwin15.6.0
system         x86_64, darwin15.6.0
status
major          3
minor          4.3
year           2017
month          11
day            30
svn rev        73796
language       R
version.string R version 3.4.3 (2017-11-30)
nickname       Kite-Eating Tree

The problem replicates in R --vanilla


I've spent hours googling for solutions but can't find any reports of this
problem. Any help would be appreciated.


Florian




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Setting the path to Rtools for package compilation on Windows

2018-02-13 Thread Tomas Kalibera
Thanks for the report - this has been already reported as bug 17376, it 
is caused by scripts that build the Windows binaries and by now has been 
fixed in R-patched and R-devel snapshot builds. So as a solution that 
works now I would recommend using R-patched.


Tomas


On 02/13/2018 08:33 AM, Søren Højsgaard wrote:

I can confirm the behaviour that you report.

Usually I put Rtools in c:\programs\Rtools and modify the path
accordingly. Recently (don't recall for how long) I have encountered
the same problems as you have and I have resorted to moving Rtools to
c:\Rtools

I have no idea as how to proceed; perhaps it could be worth trying an
older version of Rtools (though that may cause other problems).

Regards
Søren


On Mon, 2018-02-12 at 22:45 -0800, Peter Langfelder wrote:

Hi all,

I'm trying to set up the Windows Rtools toolset for building packages
with compiled code. I installed R-3.4.3 for Windows from CRAN and
Rtools-3.4 in custom locations, M:\R\R-3.4.3 and M:\R\Rtools-3.4.

Following the instructions, in shell, I set
Path=M:\R\Rtools-3.4\bin;M:\R\Rtools-3.4\gcc-4.6.3\bin;M:\R\R-3.4.3\bin;...
(the ... are other paths irrelevant for R/Rtools).

When I run

M:\Work\RLibs>R.exe CMD INSTALL --build WGCNA

I get the following output:

In R CMD INSTALL
* installing to library 'M:/R/R-3.4.3/library'
* installing *source* package 'WGCNA' ...
** libs

*** arch - i386
c:/Rtools/mingw_32/bin/g++  -I"M:/R/R-3.4.3/include" -DNDEBUG -O2 -Wall  -mtune=generic -c bucketApproxSort.cc -o bucketApproxSort.o
c:/Rtools/mingw_32/bin/g++: not found
make: *** [bucketApproxSort.o] Error 127
Warning: running command 'make -f "Makevars.win" -f "M:/R/R-3.4.3/etc/i386/Makeconf" -f "M:/R/R-3.4.3/share/make/winshlib.mk" SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' SHLIB="WGCNA.dll" OBJECTS="bucketApproxSort.o corFunctions-common.o corFunctions-unified.o networkFunctions.o pivot.o quantileC.o"' had status 2
ERROR: compilation failed for package 'WGCNA'
* removing 'M:/R/R-3.4.3/library/WGCNA'
* restoring previous 'M:/R/R-3.4.3/library/WGCNA'


Apparently the install is looking for Rtools in c:\Rtools. I am a
perpetual Windows newbie and would be really thankful for any pointers
as to how to proceed.

Peter



Re: [Rd] Fix minor typo in error message from grDevices

2018-02-13 Thread Tomas Kalibera
Fixed, thanks,
Tomas

On 02/12/2018 09:33 PM, John Blischak wrote:
> Hi,
>
> I fixed a minor typo in an error message from grDevices. Please see
> attached for a patch to revision 74246.
>
> Thanks,
>
> John
>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel





Re: [Rd] [R-win] Bug 17159 - recursive dir.create() fails on windows shares due to permissions (MMaechler: Resending to R-windows@R-pr..)

2018-02-16 Thread Tomas Kalibera
Bug 17159 has been fixed (in R-devel), but there may be more issues left 
with UNC paths.

Tomas

On 01/17/2018 01:37 PM, Joris Meys wrote:

Hi Peter,

I share your experience with trying to help IT departments setting things
up. The network directory of the students is mapped to a drive, but R still
uses the unc path instead of the drive when attempting to create that user
library. Unless I do it manually of course. The only solution I see right
now is to set the HOME or R_LIBS_USER environment variable in Renviron, but
that should be done each time a new student logs into the computer. Or is
there a way to ensure R uses the mapped drive instead of the network unc
path, either using an R setting or by messing with Windows itself?

Cheers
Joris



On Wed, Jan 17, 2018 at 1:21 PM, Peter Dalgaard  wrote:


I can easily believe that. It was mainly for Joris, that it might not be
necessary to reinstall.

-pd


On 17 Jan 2018, at 11:55, Thompson, Pete wrote:

That solution works fine for the use case where each user has a network
based home directory and needs to run R from there, but doesn’t help with
my situation. I need to be able to support arbitrary network based paths in
arbitrary numbers – so mapping drives isn’t an option. I have found a
workaround using symbolic links to the network share created within the
temporary folder, but would much prefer that R support UNC paths – it seems
a reasonably simple fix.

Cheers
Pete


On 17/01/2018, 10:52, "Peter Dalgaard"  wrote:

I usually draw a complete blank if I try to assist our IT department
with such issues (we really need better documentation than the Admin manual
for large-system installs by non-experts in R).

However, it is my impression that there are also options involving
environment variables and LFS naming. E.g., map the networked user
directory to, say, a P: "drive" and make sure that the environment is set
up to reflect this.

-pd


On 16 Jan 2018, at 17:52 , Joris Meys  wrote:

Hi all,

I ran into this exact issue yesterday during the exam of statistical
computing. Users can install packages in a user library that R tries to
create automatically on the network drive of the student. But that doesn't
happen as the unc path is not read correctly, leading to R attempting to
create a local directory and being told it has no right to do so.

That is an older version of R though (3.3), but I'm wondering whether I
would ask our IT department to just update R on all these computers to the
latest version, or if we have to look for another solution.

Cheers
Joris

On Mon, Jan 8, 2018 at 1:43 PM, Thompson, Pete wrote:

Hi, I’d like to ask about bug 17159:

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17159

I can confirm that I see exactly this bug when using dir.create on paths
of UNC form (\\server\share\xxx), with the recursive flag set. I’m seeing
this when attempting to use install.packages with such a path (which I know
isn’t supported, but would be great if it was!). I can see that a patch has
been suggested for the problem and from looking at the source code I
believe it’s a correct fix. Is there a possibility of getting this patch
included?

The existing logic for Windows recursive dir.create (platform.c lines
2209-22203) appears to be:
- Skip over any \\share at the start of the directory name
- Loop while there are pieces of directory name left (i.e. we haven’t hit
  the last \ character)
  = Find the next portion of the directory name (up to the next \ character)
  = Attempt to create the directory (unless it is of the form x: - i.e. a
    drive name)
  = Ignore any ‘already exists’ errors, otherwise throw an error

This logic appears flawed in that it skips \\share which isn’t a valid
path format (according to
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx ).
Dredging my memory, it’s possible that \\share was a supported format in
very old versions of Windows, but it’s been a long time since the UNC format
came in. It’s also possible that \\share is a valid format in some odd
environments, but the UNC format is far more widely used.

The patch suggested by Evan Cortens is simply to change the skip logic to
skip over \\server\share instead of \\share. This will certainly fix the
common use case of using UNC paths, but doesn’t attempt to deal with all
the more complex options in Microsoft’s documentation. I doubt many users
would ask for the complex cases, but the basic UNC format would be of wide
applicability.
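
The patched skip logic can be sketched in R rather than C. This is an illustration only: the function name unc_dirs_to_create and the example path are hypothetical, and the real code in platform.c works on C strings and actually creates the directories. The sketch treats \\server\share as a unit that is never created and then lists each directory level a recursive create would attempt.

```r
unc_dirs_to_create <- function(path) {
  # Split on single backslashes; a UNC path yields "", "", server, share, dirs...
  parts <- strsplit(path, "\\", fixed = TRUE)[[1]]
  stopifnot(length(parts) >= 4, parts[1] == "", parts[2] == "")

  # Treat \\server\share as the root that is skipped, not created
  current <- paste0("\\\\", parts[3], "\\", parts[4])

  out <- character(0)
  for (piece in parts[-(1:4)]) {
    current <- paste0(current, "\\", piece)
    out <- c(out, current)
  }
  out
}

dirs <- unc_dirs_to_create("\\\\server\\share\\xxx\\yyy")
print(dirs)
```

With the old behaviour (skipping only \\server), the first directory attempted would be \\server\share itself, which is where the permission error described in the bug report arises.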

Thanks
Pete Thompson
Director, Information Technology
Head of Spotfire Centre of Excellence
IQVIA






Re: [Rd] writeLines argument useBytes = TRUE still making conversions

2018-02-19 Thread Tomas Kalibera


I think it is as Kevin described in an earlier response - the garbled 
output is because a UTF-8 encoded string is assumed to be native 
encoding (which happens not to be UTF-8 on the platform where this is 
observed) and converted again to UTF-8.


I think the documentation is consistent with the observed behavior


   tmp <- 'é'
   tmp <- iconv(tmp, to = 'UTF-8')
   print(Encoding(tmp))
   print(charToRaw(tmp))
   tmpfilepath <- tempfile()
   writeLines(tmp, con = file(tmpfilepath, encoding = 'UTF-8'), useBytes = TRUE)

[1] "UTF-8"
[1] c3 a9

Raw text as hex: c3 83 c2 a9
useBytes=TRUE in writeLines means that the UTF-8 string will be passed 
byte-by-byte to the connection. encoding="UTF-8" tells the connection to 
convert the bytes to UTF-8 (from native encoding). So the second step is 
converting a string which is assumed to be in native encoding, but in 
fact it is in UTF-8.


The documentation describes "useBytes=TRUE" as being for expert use only. It 
can be useful for avoiding unnecessary conversions in some special 
cases, but one then has to make sure that no further conversions are 
attempted (for instance, by using "" as the encoding in "file"). In 
short, the advice would be not to use useBytes=TRUE with writeLines, and 
to rely on the default behavior instead.
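
The double conversion can be reproduced without any re-encoding connection. This is a minimal sketch: "latin1" stands in for a non-UTF-8 native encoding (an assumption, since the reporter's native encoding is not stated here), and the variable names are made up.

```r
# UTF-8 bytes of "é", built from raw so the result does not depend on the locale
utf8_bytes <- as.raw(c(0xc3, 0xa9))
s <- rawToChar(utf8_bytes)

# Misinterpret those bytes as latin1 and convert "again" to UTF-8
twice <- iconv(s, from = "latin1", to = "UTF-8")
print(charToRaw(twice))   # c3 83 c2 a9

# Byte-preserving write: useBytes = TRUE plus a connection that does not re-encode
path <- tempfile()
con <- file(path, open = "wb", encoding = "native.enc")
writeLines(s, con, useBytes = TRUE)
close(con)
written <- readBin(path, what = "raw", n = 16)
print(written)            # starts with c3 a9, followed by the line terminator
```

The first result matches the garbled bytes shown above; the second shows that the original two bytes survive when the connection is left at its native encoding.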


Tomas


On 02/17/2018 11:24 PM, Kevin Ushey wrote:

Of course, right after writing this e-mail I tested on my Windows
machine and did not see what I expected:


charToRaw(before)

[1] c3 a9

charToRaw(after)

[1] e9

so obviously I'm misunderstanding something as well.

Best,
Kevin

On Sat, Feb 17, 2018 at 2:19 PM, Kevin Ushey  wrote:

 From my understanding, translation is implied in this line of ?file (from the
Encoding section):

 The encoding of the input/output stream of a connection can be specified
 by name in the same way as it would be given to iconv: see that help page
 for how to find out what encoding names are recognized on your platform.
 Additionally, "" and "native.enc" both mean the ‘native’ encoding, that is
 the internal encoding of the current locale and hence no translation is
 done.

This is also hinted at in the documentation in ?readLines for its 'encoding'
argument, which has a different semantic meaning from the 'encoding' argument
as used with R connections:

 encoding to be assumed for input strings. It is used to mark character
 strings as known to be in Latin-1 or UTF-8: it is not used to re-encode
 the input. To do the latter, specify the encoding as part of the
 connection con or via options(encoding=): see the examples.

It might be useful to augment the documentation in ?file with something like:

 The 'encoding' argument is used to request the translation of strings when
 writing to a connection.

and, perhaps to further drive home the point about not translating when
encoding = "native.enc":

 Note that R will not attempt translation of strings when encoding is
 either "" or "native.enc" (the default, as per getOption("encoding")).
 This implies that attempting to write, for example, UTF-8 encoded content
 to a connection opened using "native.enc" will retain its original UTF-8
 encoding -- it will not be translated.

It is a bit surprising that 'native.enc' means "do not translate" rather than
"attempt translation to the encoding associated with the current locale", but
those are the semantics and they are not bound to change.

This is the code I used to convince myself of that case:

 conn <- file(tempfile(), encoding = "native.enc", open = "w+")

 before <- iconv('é', to = "UTF-8")
 cat(before, file = conn, sep = "\n")
 after <- readLines(conn)

 charToRaw(before)
 charToRaw(after)

with output:

 > charToRaw(before)
 [1] c3 a9
 > charToRaw(after)
 [1] c3 a9

Best,
Kevin


On Thu, Feb 15, 2018 at 9:16 AM, Ista Zahn  wrote:

On Thu, Feb 15, 2018 at 11:19 AM, Kevin Ushey  wrote:

I suspect your UTF-8 string is being stripped of its encoding before
write, and so assumed to be in the system native encoding, and then
re-encoded as UTF-8 when written to the file. You can see something
similar with:

 > tmp <- 'é'
 > tmp <- iconv(tmp, to = 'UTF-8')
 > Encoding(tmp) <- "unknown"
 > charToRaw(iconv(tmp, to = "UTF-8"))
 [1] c3 83 c2 a9

It's worth saying that:

 file(..., encoding = "UTF-8")

means "attempt to re-encode strings as UTF-8 when writing to this
file". However, if you already know your text is UTF-8, then you
likely want to avoid opening a connection that might attempt to
re-encode the input. Conversely (assuming I'm understanding the
documentation correctly)

 file(..., encoding = "native.enc")

means "assume that strings are in the native encoding, and hence
translation is unnecessary". Note that it does not mean "attempt to
translate strings to the native encoding".

If all that is true I think ?file needs some attention. I've read it
several times now and I just don't see how

Re: [Rd] readLines interaction with gsub different in R-dev

2018-02-19 Thread Tomas Kalibera

Thank you for the report and analysis. Now fixed in R-devel.
Tomas

On 02/17/2018 08:24 PM, William Dunlap via R-devel wrote:

I think the problem in R-devel happens when there are non-ASCII characters
in any of the strings passed to gsub.

txt <- vapply(list(as.raw(c(0x41, 0x6d, 0xc3, 0xa9, 0x6c, 0x69, 0x65)),
as.raw(c(0x41, 0x6d, 0x65, 0x6c, 0x69, 0x61))), rawToChar, "")
txt
#[1] "Amélie" "Amelia"
Encoding(txt)
#[1] "unknown" "unknown"
gsub(perl=TRUE, "(\\w)(\\w)", "<\\L\\1\\U\\2>", txt)
#[1] "", txt[1])
#[1] "", txt[2])
#[1] ""

I can change the Encoding to "latin1" or "UTF-8" and get similar results
from gsub.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, Feb 17, 2018 at 7:35 AM, Hugh Parsonage 
wrote:


| Confirmed for R-devel (current) on Ubuntu 17.10.  But ... isn't the regexp
| you use wrong, ie isn't R-devel giving the correct answer?

No, I don't think R-devel is correct (or at least consistent with the
documentation). My interpretation of gsub("(\\w)", "\\U\\1", entry,
perl = TRUE) is "Take every word character and replace it with itself,
converted to uppercase."
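
For reference, a small sketch of how "\U" and "\E" are documented to behave, using ASCII-only input, which is not affected by the problem reported here. The input strings and variable names are made up.

```r
# \U upper-cases the back-reference that follows it
up <- gsub("(\\w)", "\\U\\1", "author: amelie", perl = TRUE)
print(up)     # "AUTHOR: AMELIE"

# \E ends the case conversion, so the second group is left unchanged
mixed <- gsub("^(\\w+): (\\w)", "\\U\\1\\E: \\2", "author: amelie", perl = TRUE)
print(mixed)  # "AUTHOR: amelie"
```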

Perhaps my example was too minimal. Consider the following:

R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
[1] "A"

R> gsub("(\\w)", "\\1", entry, perl = TRUE)
[1] "author: Amélie"   # OK, but very different to 'A', despite only not specifying uppercase

R> gsub("(\\w)", "\\U\\1", "author: Amelie", perl = TRUE)
[1] "AUTHOR: AMELIE"  # OK, but very different to 'A'

R> gsub("^(\\w+?): (\\w)", "\\U\\1\\E: \\2", entry, perl = TRUE)
[1] "AUTHOR"  # Where did everything after the first group go?

I should note the following example too:
R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE, useBytes = TRUE)
[1] "AUTHOR: AMéLIE"  # latin1 encoding


A call to `readLines` (possibly `scan()` and `read.table` and friends)
is essential.




On 18 February 2018 at 02:15, Dirk Eddelbuettel  wrote:

On 17 February 2018 at 21:10, Hugh Parsonage wrote:
| I was told to re-raise this issue with R-dev:
|
| In the documentation of R-dev and R-3.4.3, under ?gsub
|
| > replacement
| >... For perl = TRUE only, it can also contain "\U" or "\L" to convert
| > the rest of the replacement to upper or lower case and "\E" to end
| > case conversion.
|
| However, the following code runs differently:
|
| tempf <- tempfile()
| writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
| entry <- readLines(tempf, encoding = "UTF-8")
| gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
|
|
| "AUTHOR: AMÉLIE"  # R-3.4.3
|
| "A"  # R-dev

Confirmed for R-devel (current) on Ubuntu 17.10.  But ... isn't the regexp
you use wrong, ie isn't R-devel giving the correct answer?

R> tempf <- tempfile()
R> writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
R> entry <- readLines(tempf, encoding = "UTF-8")
R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
[1] "A"
R> gsub("(\\w+)", "\\U\\1", entry, perl = TRUE)
[1] "AUTHOR"
R> gsub("(.*)", "\\U\\1", entry, perl = TRUE)
[1] "AUTHOR: AMÉLIE"
R>

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [Rd] Unnecessary lines in stem.c?

2018-02-20 Thread Tomas Kalibera

Thanks! Cleaned up in R-devel,
Tomas

On 02/16/2018 05:03 PM, S Ellison wrote:

A discussion on r-help led me to look at stem.c at
https://github.com/wch/r-source/blob/trunk/src/library/graphics/src/stem.c

Lines 76-77 appear superfluous. They sit inside a condition, and set mu, as 
follows:
if (k*(k-4)*(k-8) == 0) mu = 5;
if ((k-1)*(k-5)*(k-6) == 0) mu = 20;

But mu is set unconditionally to 10 on line 84, and that is followed by 
conditional assignments (on lines 85-86) identical to lines 76-77.
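
To make the product conditions easier to read, here is a sketch in R (not the C source itself): each product is zero exactly when k equals one of its roots, so the two tests select k in {0, 4, 8} and k in {1, 5, 6}. The function name choose_mu and the range 0:9 for k are assumptions for illustration.

```r
choose_mu <- function(k) {
  mu <- 10                                        # unconditional default
  if (k * (k - 4) * (k - 8) == 0) mu <- 5         # k is 0, 4 or 8
  if ((k - 1) * (k - 5) * (k - 6) == 0) mu <- 20  # k is 1, 5 or 6
  mu
}

mus <- sapply(0:9, choose_mu)
print(mus)  # 5 20 10 10 5 20 20 10 5 10
```

Because the default and both conditional assignments always run, an earlier copy of the same two assignments has no effect on the final value.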

It looks like a couple of lines got left inside a condition that are no longer 
needed there. If that is correct, is it worth removing the superfluous lines, 
for future coders' benefit?

S Ellison
  





Re: [Rd] How to modify dots and dispatch NextMethod

2018-02-22 Thread Tomas Kalibera


The example is invoking NextMethod via an anonymous function, which is 
not allowed (see documentation for NextMethod). Normally one gets a 
runtime error "'NextMethod' called from an anonymous function", but not 
here as the anonymous function is called via do.call. I will fix this so 
that there is a runtime error in this case as well; thanks for uncovering 
this problem.


I don't think there is a way to replace (unnamed) arguments in dots for 
NextMethod.
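
For contrast, a minimal sketch of the supported pattern, where NextMethod is called directly from the method body and the arguments are passed on unchanged. The class name "foo" follows the example in this thread; the object x is made up.

```r
# NextMethod() is called directly from the method, not from a nested function
c.foo <- function(..., recursive = FALSE) {
  structure(NextMethod(), class = "foo")
}

x <- structure(1, class = "foo")
y <- c(x, x)
print(class(y))    # "foo"
print(unclass(y))  # 1 1
```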


Tomas

On 02/21/2018 02:16 PM, Iñaki Úcar wrote:

I've set up a repo with a reproducible example of the issue described
in my last email:

https://github.com/Enchufa2/dispatchS3dots

Iñaki

2018-02-20 19:33 GMT+01:00 Iñaki Úcar :

Hi all,

Not sure if this belongs to R-devel or R-package-devel. Anyways...

Suppose we have objects of class c("foo", "bar"), and there are two S3
methods c.foo, c.bar. In c.foo, I'm trying to modify the dots and
forward the dispatch using NextMethod without any success. This is
what I've tried so far:

c.foo <- function(..., recursive=FALSE) {
  dots <- list(...)
  # inspect and modify dots
  # ...
  do.call(
    function(..., recursive=FALSE) structure(NextMethod("c"), class="foo"),
    c(dots, recursive=recursive)
  )
}

foobar <- 1
class(foobar) <- c("foo", "bar")
c(foobar, foobar)
Error: C stack usage  7970788 is too close to the limit

There's recursion (!). But the funniest thing is that if c.foo is
exported by one package and c.bar is exported by another one, there's
no recursion, but c.bar is never called (!!). Why is the same code
behaving in a different way depending on whether these functions are
defined in the .GlobalEnv or in two attached packages? (BTW,
isS3method() is TRUE, for c.foo and c.bar in both cases).

I'm blocked here. Am I missing something? Is there a way of doing
this? Thanks in advance.

Regards,
Iñaki







Re: [Rd] How to modify dots and dispatch NextMethod

2018-02-22 Thread Tomas Kalibera

On 02/22/2018 12:07 PM, Iñaki Úcar wrote:

2018-02-22 10:29 GMT+01:00 Tomas Kalibera :

The example is invoking NextMethod via an anonymous function, which is not
allowed (see documentation for NextMethod).

Thanks for your response. I definitely missed that bit.


Normally one gets a runtime
error "'NextMethod' called from an anonymous function", but not here as the
anonymous function is called via do.call. I will fix so that there is a
runtime error in this case as well, thanks for uncovering this problem.

Then I did well choosing this list! Please also note that you could
take that anonymous function out of the method and name it, and the
behaviour would be the same. So maybe this case should issue an error
too.

I am not sure I understand how, but if you find a way to bypass the new 
check for an anonymous function (I intend to commit tomorrow), I will be 
happy to have a look if you provide a reproducible example.

I don't think there is a way to replace (unnamed) arguments in dots for
NextMethod.

That's a pity. IMHO, there should be some mechanism for that, but dots
are special in inscrutable ways.

Anyway, for anyone interested, I found a workaround:

https://github.com/Enchufa2/dispatchS3dots#workaround

Even though technically this won't be too hard, I don't think NextMethod 
should be made any more complex than it is now. There should always be a 
way to implement special dispatch scenarios in R and your workaround 
shows it is possible specifically in your scenario.


Tomas


Tomas



Iñaki




Re: [Rd] How to modify dots and dispatch NextMethod

2018-02-22 Thread Tomas Kalibera

On 02/22/2018 02:31 PM, Iñaki Úcar wrote:

2018-02-22 12:39 GMT+01:00 Tomas Kalibera :

On 02/22/2018 12:07 PM, Iñaki Úcar wrote:

2018-02-22 10:29 GMT+01:00 Tomas Kalibera :

The example is invoking NextMethod via an anonymous function, which is
not
allowed (see documentation for NextMethod).

Thanks for your response. I definitely missed that bit.


Normally one gets a runtime
error "'NextMethod' called from an anonymous function", but not here as
the
anonymous function is called via do.call. I will fix so that there is a
runtime error in this case as well, thanks for uncovering this problem.

Then I did well choosing this list! Please also note that you could
take that anonymous function out of the method and name it, and the
behaviour would be the same. So maybe this case should issue an error
too.

I am not sure I understand how, but if you find a way to bypass the new
check for an anonymous function (I intend to commit tomorrow), I will be
happy to have a look if you provide a reproducible example.

I meant with a named function inside do.call, instead of an anonymous
one. For example:

c.foo <- function(..., recursive=FALSE) {
  message("calling c.foo...")
  dots <- list(...)
  # inspect and modify dots; for example:
  if (length(dots) > 1)
    dots[[2]] <- 2
  do.call(
    c.foo.proxy,
    c(dots, recursive=recursive)
  )
}

c.foo.proxy <- function(..., recursive=FALSE)
  structure(NextMethod("c"), class="foo")

Right now, the effect of the code above is the same as with the
anonymous function. Shouldn't it issue a similar error then?

Yes, it will also result in runtime error after the change is committed:

calling c.foo...
Error in NextMethod("c") : 'NextMethod' called from an anonymous function

I don't think there is a way to replace (unnamed) arguments in dots for
NextMethod.

That's a pity. IMHO, there should be some mechanism for that, but dots
are special in inscrutable ways.

Anyway, for anyone interested, I found a workaround:

https://github.com/Enchufa2/dispatchS3dots#workaround

Even though technically this won't be too hard, I don't think NextMethod
should be made any more complex than it is now. There should always be a way
to implement special dispatch scenarios in R and your workaround shows it is
possible specifically in your scenario.

My only concern about this workaround is that it triggers the dispatch
stack again from the beginning of the class hierarchy, which seems
neither very elegant nor efficient.

There may be a more elegant way, but that'd be a question for R-help and 
it might be worth giving a broader context for what you want to achieve. 
Also please note that S3 dispatch is done on the first argument, and c() 
gives no special meaning to its first argument. What if, e.g., the second 
argument is of class "foo" but the first is not? Is S3/NextMethod 
really a good fit here?
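
The point about the first argument can be seen in a small sketch. The method body here is a stand-in that only reports that it was called, and the object x is made up.

```r
# Stand-in method: returns a marker string instead of combining anything
c.foo <- function(..., recursive = FALSE) "c.foo called"

x <- structure(1, class = "foo")

first  <- c(x, 2)  # first argument has class "foo": c.foo is used
second <- c(2, x)  # first argument is a plain number: the internal default is used

print(first)   # "c.foo called"
print(second)  # 2 1
```

In the second call the "foo" object is combined by the default method and its class is dropped, so c.foo never gets a chance to inspect the arguments.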


Tomas




Tomas


Iñaki




Re: [Rd] Possible typo in the C source code of R

2018-02-26 Thread Tomas Kalibera
Thank you, Martin, for spotting this. It is clearly a bug: originally a 
conformance check was intended here, and time series were defined using 
integers, so exact comparison would have made sense. Now time series are 
defined using doubles, and exact comparison could be too strict with 
rounding errors. Moreover, it is not clear whether a conformance check 
at this low level is a good thing, so the check has been removed 
completely, keeping the current behavior of R (except for NaNs in the 
definition).
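
Why exact comparison of doubles can be too strict is easy to see at the R level. This is a generic illustration with made-up values, not the time series code itself; note also that the C expression x == x is false only for NaN, whereas the R-level comparison returns NA.

```r
a <- 0.1 + 0.2
b <- 0.3

exact    <- (a == b)                 # FALSE: the two doubles differ in the last bits
tolerant <- isTRUE(all.equal(a, b))  # TRUE: equal within the default tolerance
self     <- (NaN == NaN)             # NA at the R level

print(c(exact = exact, tolerant = tolerant, self = self))
```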

Best
Tomas

On 02/07/2018 04:12 PM, Martin Bodin wrote:
> Good morning,
>
> I am Martin Bodin, a postdoc at the CMM in Santiago de Chile, and I am
> currently in the process of formalising (a part of) the R language into
> the Coq proof assistant. This work makes me look frequently at the
> source code of R.
>
> I have noticed a strange line in the file src/main/util.c of the trunk
> branch:
> https://github.com/wch/r-source/blob/e42531eff56ee6582d7dc6a46f242af5447c633e/src/main/util.c#L70
>
> Line 70, “REAL(x)[0] == REAL(x)[0]”, doesn’t make any sense to me:
> are we looking for NaN values here? I think that it should be
> “REAL(x)[0] == REAL(y)[0]” instead (and the same applies to the next
> two lines).
>
> I haven’t searched for any R program that may be affected by this typo,
> but I have the feeling that it may lead to unexpected behaviours.
>
>  From what I understood, the bug reporting tool for R is closed for
> non-members, and I am thus sending this possible bug report in this
> list. Please redirect me if I am not reporting in the right place.
>
> Best regards,
> Martin Bodin.
>
>
>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel





Re: [Rd] [parallel] fixes load balancing of parLapplyLB

2018-02-26 Thread Tomas Kalibera

Dear Christian and Henrik,

thank you for spotting the problem and for the suggested fix. We'll 
probably add a chunk.size argument to parLapplyLB and parLapply to 
follow OpenMP terminology, which has already been an inspiration for the 
present code (parLapply already implements static scheduling via the 
internal function staticClusterApply, yet with a fixed chunk size; 
parLapplyLB already implements dynamic scheduling via the internal function 
dynamicClusterApply, but with a fixed chunk size set to an unlucky value 
so that it behaves like static scheduling). The default chunk size for 
parLapplyLB will be set so that there is some dynamism in the 
schedule even by default. I am now testing a patch with these changes.
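
The difference between the two schedules can be sketched with a small simulation in C. This is only an illustration under simplifying assumptions (known task runtimes, no communication overhead); the function names and the MAX_WORKERS limit are hypothetical and are not part of the parallel package. A dynamic schedule hands each chunk to whichever worker becomes free first; with one chunk per worker it degenerates into a static schedule.

```c
#include <stddef.h>

#define MAX_WORKERS 64  /* arbitrary limit chosen for this sketch */

/* Time at which the last worker finishes when the n tasks are cut into
   chunks of `chunk` consecutive tasks and every chunk is handed to the
   worker that becomes free first. Returns -1.0 on invalid arguments. */
double dynamic_makespan(const double *runtime, size_t n,
                        size_t workers, size_t chunk)
{
    double busy[MAX_WORKERS] = {0.0};  /* accumulated work per worker */
    double makespan = 0.0;

    if (workers == 0 || workers > MAX_WORKERS || chunk == 0)
        return -1.0;

    for (size_t start = 0; start < n; start += chunk) {
        size_t end = (n - start < chunk) ? n : start + chunk;
        size_t w = 0;
        for (size_t k = 1; k < workers; k++)  /* pick the least loaded worker */
            if (busy[k] < busy[w]) w = k;
        for (size_t i = start; i < end; i++)
            busy[w] += runtime[i];
    }
    for (size_t k = 0; k < workers; k++)
        if (busy[k] > makespan) makespan = busy[k];
    return makespan;
}

/* One chunk per worker: there is nothing left to rebalance, so the
   dynamic schedule behaves like a static one. */
double static_makespan(const double *runtime, size_t n, size_t workers)
{
    if (workers == 0) return -1.0;
    size_t chunk = (n + workers - 1) / workers;  /* ceil(n / workers) */
    if (chunk == 0) chunk = 1;
    return dynamic_makespan(runtime, n, workers, chunk);
}
```

For one long task followed by seven short ones on two workers, the static split leaves the first worker with the long task plus three short ones, whereas a chunk size of 1 lets the second worker absorb all the short tasks.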


Best
Tomas


On 02/20/2018 11:45 AM, Christian Krause wrote:

Dear Henrik,

The rationale is just that it is within these extremes and that it is really 
simple to calculate, without making any assumptions and knowing that it won't 
be perfect.

The extremes A and B you are mentioning are special cases based on assumptions. 
Case A assumes that the function has a long or varying runtime, in which case 
you are likely to get the best load balancing with really small chunks. Case B 
assumes that the function runtime is the same for each list element, in which 
case you don't actually need load balancing and can just use `parLapply`.

This new default is **not the best one**. It's just a better one than we had 
before. There is no best one we can use as default because **we don't know the 
function runtime and how it varies**. The user needs to decide that because 
he/she knows the function. As mentioned before, I will write a patch that makes 
the chunk size an optional argument, so the user can decide because only he/she 
has all the information to choose the best chunk size, just like you did with 
the `future.scheduling` parameter.

Best Regards

On February 19, 2018 10:11:04 PM GMT+01:00, Henrik Bengtsson 
 wrote:

Hi, I'm trying to understand the rationale for your proposed amount of
splitting and more precisely why that one is THE one.

If I put labels on your example numbers in one of your previous posts:

nbrOfElements <- 97
nbrOfWorkers <- 5

With these, there are two extremes in how you can split up the
processing in chunks such that all workers are utilized:

(A) Each worker, called multiple times, processes one element each
time:


nbrOfElements <- 97
nbrOfWorkers <- 5
nbrOfChunks <- nbrOfElements
sapply(parallel:::splitList(1:nbrOfElements, nbrOfChunks), length)

[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[30] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[59] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[88] 1 1 1 1 1 1 1 1 1 1


(B) Each worker, called once, processes multiple elements:


nbrOfElements <- 97
nbrOfWorkers <- 5
nbrOfChunks <- nbrOfWorkers
sapply(parallel:::splitList(1:nbrOfElements, nbrOfChunks), length)

[1] 20 19 19 19 20

I understand that neither of these two extremes may be the best when
it comes to orchestration overhead and load balancing. Instead, the
best might be somewhere in-between, e.g.

(C) Each worker, called multiple times, processing multiple elements:


nbrOfElements <- 97
nbrOfWorkers <- 5
nbrOfChunks <- nbrOfElements / nbrOfWorkers
sapply(parallel:::splitList(1:nbrOfElements, nbrOfChunks), length)

[1] 5 5 5 5 4 5 5 5 5 5 4 5 5 5 5 4 5 5 5 5

However, there are multiple alternatives between the two extremes, e.g.


nbrOfChunks <- scale * nbrOfElements / nbrOfWorkers

So, is there a reason why you argue for scale = 1.0 to be optimal?

FYI, In future.apply::future_lapply(X, FUN, ...) there is a
'future.scheduling' scale factor(*) argument where default
future.scheduling = 1 corresponds to (B) and future.scheduling = +Inf
to (A).  Using future.scheduling = 4 achieves the amount of
load-balancing you propose in (C).   (*) Different definition from the
above 'scale'. (Disclaimer: I'm the author)
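
The chunk counts discussed above translate into chunk lengths as in the following C sketch. It is a hypothetical helper, not the algorithm used by parallel:::splitList: it simply gives the first n % nchunks chunks one extra element, so the lengths match the quoted outputs in total and in spread but not necessarily in order.

```c
#include <stddef.h>

/* Split n elements into nchunks contiguous chunks whose lengths differ by
   at most one. Writes the lengths to len[0..nchunks-1].
   Returns 0 on success, -1 if nchunks is zero or larger than n. */
int chunk_lengths(size_t n, size_t nchunks, size_t *len)
{
    if (nchunks == 0 || nchunks > n) return -1;
    size_t base = n / nchunks;   /* minimum length of every chunk */
    size_t extra = n % nchunks;  /* number of chunks that get one more */
    for (size_t k = 0; k < nchunks; k++)
        len[k] = base + (k < extra ? 1 : 0);
    return 0;
}
```

With 97 elements, 5 chunks give lengths of 20 and 19 (extreme B) and 97 chunks give lengths of 1 (extreme A).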

/Henrik

On Mon, Feb 19, 2018 at 10:21 AM, Christian Krause
 wrote:

Dear R-Devel List,

I have installed R 3.4.3 with the patch applied on our cluster and
ran a *real-world* job of one of our users to confirm that the patch
works to my satisfaction. Here are the results.

The original was a series of jobs, all essentially doing the same
stuff using bootstrapped data, so for the original there is more data
and I show the arithmetic mean with standard deviation. The
confirmation with the patched R was only a single instance of that
series of jobs.

## Job Efficiency

The job efficiency is defined as (this is what the `qacct-efficiency`
tool below does):

```
efficiency = cputime / cores / wallclocktime * 100%
```

In simpler words: how well did the job utilize its CPU cores. It
shows the percentage of time the job was actually doing stuff, as
opposed to the difference:

```
wasted = 100% - efficiency
```

... which, essentially, tells us how much of

Re: [Rd] [parallel] fixes load balancing of parLapplyLB

2018-03-13 Thread Tomas Kalibera


Chunk size support has been added in R-devel 74353. Please let me know 
if you find any problem.


Thanks,
Tomas

On 03/01/2018 09:19 AM, Christian Krause wrote:

Dear Tomas,

Thanks for your commitment to fix this issue and also to add the chunk size as 
an argument. If you want our input, let us know ;)

Best Regards


Re: [Rd] clusterApply arguments

2018-03-15 Thread Tomas Kalibera

On 03/15/2018 05:25 PM, Henrik Bengtsson wrote:

On Thu, Mar 15, 2018 at 3:39 AM,   wrote:

Thank you for your answer!
I agree with you except for the third (Error) example, and
I realize now I should have started with that in the explanation.

From my point of view
parLapply(cl = clu, X = 1:2, fun = fun, c = 1)
shouldn't give an error.

This could be easily avoided by using all the argument
names in the clusterApply call of parLapply, which means changing

parLapply <- function(cl = NULL, X, fun, ...)  {
 cl <- defaultCluster(cl)
 do.call(c, clusterApply(cl, x = splitList(X, length(cl)),
 fun = lapply, fun, ...), quote = TRUE)
}

to

parLapply <- function (cl = NULL, X, fun, ...)  {
 cl <- defaultCluster(cl)
 do.call(c, clusterApply(cl = cl, x = splitList(X, length(cl)),
 fun = lapply, fun, ...), quote = TRUE)
}

Oh... sorry I missed that point.  Yes, I agree, this should be a
trivial fix to the 'parallel' package.

/Henrik

Yes, thanks for the report. I am testing a fix for this (and for other 
missing argument names in calls involving ...) in parallel.

Tomas




Best regards,
Florian



Sent: Wednesday, 14 March 2018 at 19:05
From: "Henrik Bengtsson" 
To: "Florian Schwendinger" 
Cc: fschw...@wu.ac.at, R-devel 
Subject: Re: [Rd] clusterApply arguments
This is nothing specific to parallel::clusterApply() per se. It is the
default behavior of R where it allows for partial argument names. I
don't think there's much that can be done here except always using
fully named arguments to the "apply" function itself as you show.

You can "alert" yourself when there's a mistake by using:

options(warnPartialMatchArgs = TRUE)

e.g.


clusterApply(clu, x = 1:2, fun = fun, c = 1) ## Error

Warning in clusterApply(clu, x = 1:2, fun = fun, c = 1) :
partial argument match of 'c' to 'cl'
Error in checkCluster(cl) : not a valid cluster

It's still only a warning, but an informative one.

/Henrik

On Wed, Mar 14, 2018 at 8:50 AM, Florian Schwendinger
 wrote:

Hi!

I noticed that the argument matching of clusterApply (and therefore parLapply) goes wrong when one of the 
arguments of the function is called "c". In this case, the argument "c" is used as the 
cluster and the functions give the following error message: "Error in checkCluster(cl) : not a valid 
cluster".

Of course, "c" is for many reasons an unfortunate argument name, and this can be 
easily fixed on the user side.

See below for a small example.

library(parallel)

clu <- makeCluster(2, "PSOCK")

fun <- function(x0, x1) (x0 + x1)
clusterApply(clu, x = 1:2, fun = fun, x1 = 1) ## OK
parLapply(cl = clu, X = 1:2, fun = fun, x1 = 1) ## OK


fun <- function(b, c) (b + c)
clusterApply(clu, x = 1:2, fun = fun, c = 1) ## Error
clusterApply(cl = clu, x = 1:2, fun = fun, c = 1) ## OK
parLapply(cl = clu, X = 1:2, fun = fun, c = 1) ## Error

stopCluster(clu)


I used "R version 3.4.3 Patched (2018-01-07 r74099)".


Best regards,
Florian



Re: [Rd] Discrepancy: R sum() VS C or Fortran sum

2018-03-16 Thread Tomas Kalibera
R uses long double type for the accumulator (on platforms where it is 
available). This is also mentioned in ?sum:


"Where possible extended-precision accumulators are used, typically well 
supported with C99 and newer, but possibly platform-dependent."


Tomas

On 03/16/2018 06:08 PM, Pierre Chausse wrote:
My simple functions were to compare the result with the gfortran 
compiler sum() function.  I thought that the Fortran sum could not be 
less precise than R. I was wrong. I am impressed. The R sum does in 
fact match the result if we use the Kahan algorithm.


P.

I am glad to see that R sum() is more accurate than the gfortran 
compiler sum.


On 16/03/18 11:37 AM, luke-tier...@uiowa.edu wrote:

Install the gmp package, run your code, and then try this:

bu <- gmp::as.bigq(u)
bs4 <- bu[1] + bu[2] + bu[3] + bu[4] + bu[5]
s4 <- as.double(bs4)
s1 - s4
##  [1] 0
s2[[2]] - s4
##  [1] 7.105427e-15
s3 - s4
##  [1] 7.105427e-15
identical(s1, s4)
##  [1] TRUE

`bs4` is the exact sum of the binary rationals in your `u` vector;
`s4` is the closest double-precision value to this exact sum.

Looking at the C source code for sum() will show you that it makes
some extra efforts to get a more accurate sum than your simple
version.
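
As a rough illustration of the "extra effort" in C, the sketch below contrasts a plain double accumulator with a long double accumulator (the approach described for R's sum() elsewhere in this thread) and with Kahan compensated summation, which is mentioned in the thread as giving a matching result. The function names are hypothetical and this is not R's source code; on platforms where long double is no wider than double, the second function behaves like the first.

```c
#include <stddef.h>

/* Plain double accumulator, as in the sumc() routine quoted in this thread. */
double sum_double(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i];
    return s;
}

/* Extended-precision accumulator, rounded to double once at the end. */
double sum_long_double(const double *x, size_t n)
{
    long double s = 0.0L;
    for (size_t i = 0; i < n; i++) s += x[i];
    return (double) s;
}

/* Kahan compensated summation: c carries the rounding error of each step. */
double sum_kahan(const double *x, size_t n)
{
    double s = 0.0, c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;  /* apply the correction from the last step */
        double t = s + y;     /* low-order bits of y may be lost here */
        c = (t - s) - y;      /* recover what was lost */
        s = t;
    }
    return s;
}
```

Summing ten copies of 0.1 is a convenient way to compare the three variants. Note that aggressive floating-point options such as -ffast-math may let a compiler optimize the Kahan compensation away.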

Best,

luke

On Fri, 16 Mar 2018, Pierre Chausse wrote:


Hi all,

I found a discrepancy between the sum() in R and a sum done 
in either C or Fortran for a vector of just 5 elements. The difference is 
very small, but this is a very small part of a much larger numerical 
problem in which first and second derivatives are computed 
numerically. This is part of a numerical methods course I am teaching, 
in which I want to compare the speed of R versus Fortran (we solve a 
general equilibrium problem entirely numerically, if you want to know). 
Because of this discrepancy, the Jacobian and Hessian in R versus in 
Fortran are quite different, which results in the Newton method 
producing a different solution (for a given stopping rule). Since 
the solution computed in Fortran is almost identical to the 
analytical solution, I suspect that the sum in Fortran may be more 
accurate (that's just a guess). Most of the time the sum produces 
identical results, but for some numbers it is different. The 
following example shows what happens:


set.seed(12233)
n <- 5
a <- runif(n,1,5)
e <- runif(n, 5*(1:n),10*(1:n))
s <- runif(1, 1.2, 4)
p <- runif(5, 3, 10)
x <- c(e[-5], (sum(e*p)-sum(e[-5]*p[-5]))/p[5])
u <- a^(1/s)*x^((s-1)/s)
dyn.load("sumF.so")

u[1] <- u[1]+.0001 ### If we do not add .0001, all differences are 0
s1 <- sum(u)
s2 <- .Fortran("sumf", as.double(u), as.integer(n), sf1=double(1),
  sf2=double(1))[3:4]
s3 <- .C("sumc", as.double(u), as.integer(n), sC=double(1))[[3]]

s1-s2[[1]] ## R versus compiler sum() (Fortran)

[1] -7.105427e-15

s1-s2[[2]] ## R versus manual sum (Fortran)

[1] -7.105427e-15

s1-s3 ## R Versus manual sum in C

[1] -7.105427e-15

s2[[2]]-s2[[1]] ## manual sum versus compiler sum() (Fortran)

[1] 0

s3-s2[[2]] ## Fortran versus C

[1] 0

My sumf and sumc are

 subroutine sumf(x, n, sx1, sx2)
 integer i, n
 double precision x(n), sx1, sx2
 sx1 = sum(x)
 sx2 = 0.0d0
 do i=1,n
    sx2 = sx2+x(i)
 end do
 end

void sumc(double *x, int *n, double *sum)
{
 int i;
 double sum1 = 0.0;
 for (i=0; i< *n; i++) {
   sum1 += x[i];
 }
 *sum = sum1;
}

Can that be a bug?  Thanks.










Re: [Rd] clusterApply arguments

2018-03-16 Thread Tomas Kalibera

Fixed in R-devel (74418).
Tomas


Re: [Rd] BUG: tools::pskill() returns incorrect values or non-initated garbage values [PATCH]

2018-03-19 Thread Tomas Kalibera

Thanks for spotting this, fixed in R-devel (including the Windows version).
Tomas


On 03/18/2018 09:53 PM, Henrik Bengtsson wrote:

For the record, I've just filed the following bug report with a patch
to https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17395:

tools::pskill() returns either random garbage or incorrect values,
because the internal ps_kill() (a) does not initialize the returned
logical, and (b) assigns to the returned logical the 0/-1 value of C's
kill().
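
The two problems can be sketched in C as follows. This is a hypothetical illustration, not the code of ps_kill(): the helper names and the `attempted` array are assumptions made for the example. Point (a) is addressed by giving every result a defined value before anything else happens, point (b) by mapping the 0/-1 status of kill() to a proper logical.

```c
#include <stdbool.h>
#include <stddef.h>

/* POSIX kill() returns 0 on success and -1 on failure;
   a logical result should be true only on success. */
bool status_to_logical(int kill_status)
{
    return kill_status == 0;
}

/* Fill res[0..n-1] from the recorded kill() statuses. Entries for which no
   signal was attempted stay false instead of being left uninitialized. */
void fill_results(const int *status, const bool *attempted,
                  bool *res, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        res[i] = false;  /* defined default for every entry */
        if (attempted[i])
            res[i] = status_to_logical(status[i]);
    }
}
```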

# Example 1: returns garbage due to uninitialized allocation


as.integer(tools::pskill(0))

[1] 44764824

as.integer(tools::pskill(0))

[1] 41609736

as.integer(tools::pskill(0))

[1] 45003984


# Example 2: returns 0 on success and -1 on failure


p <- parallel::mcparallel({ Sys.sleep(3600); 42L })
res <- tools::pskill(pid = p$pid, signal = tools::SIGKILL)
as.integer(res)

[1] 0


p <- parallel::mcparallel({ Sys.sleep(3600); 42L })
res <- tools::pskill(pid = p$pid, signal = -1) ## invalid signal
as.integer(res)

[1] -1

/Henrik



Re: [Rd] Inconsistency, may be bug in read.delim ?

2018-03-21 Thread Tomas Kalibera

On 03/19/2018 02:23 PM, Detlef Steuer wrote:

Dear friends,

I stumbled upon behaviour of read.delim which I would consider a bug
or at least an inconsistency that should be improved upon.

Recently we had to work with data that used "", two double quotes, as
symbol to start and end character input.

Essentially the data looked like this

data.csv

V1, V2, V3
""data"", 3, """"

The last sequence of """" indicating a missing value.

After processing the quotes, this is internally parsed as

data 3 "

Which I think is correct; in particular, """" represents a single quote. 
This is correct and it conforms to RFC 4180. "" in contrast represents 
an empty string.


Based on my reading of RFC4180, ""data"" is not a valid field, but not 
every CSV file follows that RFC, and R supports this pattern as expected 
in your data. So you should be fine here.
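
The RFC 4180 quoting rule referred to here can be sketched in C: a quoted field is wrapped in double quotes, and a literal double quote inside it is written as two double quotes. Under that rule a field made of four quote characters decodes to a single quote, a field made of two decodes to the empty string, and a field such as ""data"" is malformed. The function below is a hypothetical decoder written for illustration and is unrelated to R's own parser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Decode one RFC 4180 quoted field. `field` must start and end with '"'.
   The decoded text is written to `out`, which needs room for at least
   strlen(field) bytes. Returns false if the field is malformed. */
bool csv_unquote(const char *field, char *out)
{
    size_t n = strlen(field);
    size_t j = 0;

    if (n < 2 || field[0] != '"' || field[n - 1] != '"')
        return false;

    for (size_t i = 1; i + 1 < n; i++) {
        if (field[i] == '"') {
            /* inside the quotes, a quote is only valid when doubled */
            if (i + 2 < n && field[i + 1] == '"') {
                out[j++] = '"';
                i++;  /* skip the second quote of the pair */
            } else {
                return false;
            }
        } else {
            out[j++] = field[i];
        }
    }
    out[j] = '\0';
    return true;
}
```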



One obvious solution to read in this data is using some gsub(),
but that's not the point I want to make.

Consider this case we found during tests:

test.csv

V1, V2, V3, V4
"""", """", 3, ""

and read it with

read.delim("test.csv", sep=",", header=TRUE, na.strings="\"")

After processing the quotes, this is internally parsed as
" " 3 

which is again I think correct (and conforms to RFC 4180)


you get the following

   V1 V2 V3 V4
1 NA  "  3 NA

(and a warning)


I do not get the warning on my system. The reason why the second " is 
not translated to NA by na.strings is white space after the comma in the 
CSV file, this works more consistently:


> read.delim("test.csv", sep=",", header=TRUE, na.strings="\"", 
strip.white=TRUE)

  V1 V2 V3 V4
1 NA NA  3 NA

If one needed to differentiate between " and , then it 
might be necessary to run without the na.strings argument.


Best
Tomas


I would have expected to get some error message or at
least the same result for both appearances of """" in the
input file.
(the setting na.strings="\"" turned out to be working for
  a colleague and his specific data, while I think it shouldn't)

My main concern is the different interpretation for the two """" 
sequences.

Real bug? Minor inconsistency? I don't know.

All the best
Detlef






Re: [Rd] Objects not gc'ed due to caching (?) in R's S3 dispatch mechanism

2018-03-27 Thread Tomas Kalibera

On 03/27/2018 09:51 AM, Iñaki Úcar wrote:

2018-03-27 6:02 GMT+02:00  :

This has nothing to do with printing or dispatch per se. It is the
result of an internal register (R_ReturnedValue) being protected. It
gets rewritten whenever there is a jump, e.g. by an explicit return
call. So a simplified example is

new_foo <- function() {
   e <- new.env()
   reg.finalizer(e, function(e) message("Finalizer called"))
   e
}

bar <- function(x) return(x)

bar(new_foo())
gc() # still in .Last.value
gc() # nothing

UseMethod essentially does a return call so you see the effect there.

Understood. Thanks for the explanation, Luke.


The R_ReturnedValue register could probably be safely cleared in more
places but it isn't clear exactly where. As things stand it will be
cleared on the next use of a non-local transfer of control, and those
happen frequently enough that I'm not convinced this is worth
addressing, at least not at this point in the release cycle.

I barely know the R internals, and I'm sure there's a good reason
behind this change (R 3.2.3 does not show this behaviour), but IMHO
it's, at the very least, confusing. When .Last.value is cleared, that
object loses the last reference, and I'd expect it to be eligible for
gc.

In my case, I was using an object that internally generates a bunch of
data. I discovered this because I was benchmarking the execution, and
I was running out of memory because the memory wasn't being freed as it
was supposed to. So I spent half of the day on this because I thought
I had a memory leak. :-\ (Not blaming anyone here, of course; just
making a case to show that this may be worth addressing at some
point). :-)

From the perspective of the R user/programmer/package developer, please 
do not make any assumptions about when finalizers will be run, only that 
they won't be run while the object is still alive. Similarly, it 
is not good to assume that "gc()" will actually run a 
collection (or a particular type of collection, or that it will run 
immediately, etc.). Such guarantees would restrict the design 
space and potential optimizations on the R internals side too much, and 
for this reason they are typically not given in other managed languages 
either. I've seen R examples where most of the time was wasted tracing 
live objects because an explicit "gc()" was run in a tight loop. Note 
that in Java, for instance, an explicit call to gc() was eventually 
turned into a hint only.


Once you start debugging when objects are collected, you are debugging R 
internals - and surprises/changes between svn versions/etc should be 
expected as well as changes in behavior caused very indirectly by code 
changes somewhere else. I work on R internals and spend most of my time 
debugging - that is unfortunately normal when you work on a language 
runtime. Indeed, the runtime should try not to keep references to 
objects for too long, but it remains to be seen whether and at what 
cost this could be fixed with R_ReturnedValue.


Best
Tomas



Regards,
Iñaki


Best,

luke




Re: [Rd] Objects not gc'ed due to caching (?) in R's S3 dispatch mechanism

2018-03-27 Thread Tomas Kalibera

On 03/27/2018 11:53 AM, Iñaki Úcar wrote:

2018-03-27 11:11 GMT+02:00 Tomas Kalibera :

On 03/27/2018 09:51 AM, Iñaki Úcar wrote:

2018-03-27 6:02 GMT+02:00  :

This has nothing to do with printing or dispatch per se. It is the
result of an internal register (R_ReturnedValue) being protected. It
gets rewritten whenever there is a jump, e.g. by an explicit return
call. So a simplified example is

new_foo <- function() {
   e <- new.env()
   reg.finalizer(e, function(e) message("Finalizer called"))
   e
}

bar <- function(x) return(x)

bar(new_foo())
gc() # still in .Last.value
gc() # nothing

UseMethod essentially does a return call so you see the effect there.

Understood. Thanks for the explanation, Luke.


The R_ReturnedValue register could probably be safely cleared in more
places but it isn't clear exactly where. As things stand it will be
cleared on the next use of a non-local transfer of control, and those
happen frequently enough that I'm not convinced this is worth
addressing, at least not at this point in the release cycle.

I barely know the R internals, and I'm sure there's a good reason
behind this change (R 3.2.3 does not show this behaviour), but IMHO
it's, at the very least, confusing. When .Last.value is cleared, that
object loses the last reference, and I'd expect it to be eligible for
gc.

In my case, I was using an object that internally generates a bunch of
data. I discovered this because I was benchmarking the execution, and
I was running out of memory because the memory wasn't being freed as it
was supposed to. So I spent half of the day on this because I thought
I had a memory leak. :-\ (Not blaming anyone here, of course; just
making a case to show that this may be worth addressing at some
point). :-)

From the perspective of the R user/programmer/package developer, please do
not make any assumptions about when finalizers will be run, only that they
won't be run while the object is still alive. Similarly, it is not
good to assume that "gc()" will actually run a collection (or a
particular type of collection, or that it will run immediately, etc.). Such
guarantees would restrict the design space and potential
optimizations on the R internals side too much, and for this reason they are
typically not given in other managed languages either. I've seen R examples
where most of the time was wasted tracing live objects because an explicit
"gc()" was run in a tight loop. Note that in Java, for instance, an explicit
call to gc() was eventually turned into a hint only.

Once you start debugging when objects are collected, you are debugging R
internals - and surprises/changes between svn versions/etc should be
expected as well as changes in behavior caused very indirectly by code
changes somewhere else. I work on R internals and spend most of my time
debugging - that is unfortunately normal when you work on a language
runtime. Indeed, the runtime should try not to keep references to objects
for too long, but it remains to be seen whether and at what cost this could
be fixed with R_ReturnedValue.

To be precise, I was not debugging *when* objects were collected, I
was debugging *whether* objects were collected. And for that, I
necessarily need some hint about the *when*.

They would be collected eventually if you were running a non-trivial 
program (because there would be a jump inside).

But I think that's another discussion. My point is that, as an R user
and package developer, I expect consistency, and currently

new_foo <- function() {
   e <- new.env()
   reg.finalizer(e, function(e) message("Finalizer called"))
   e
}

bar <- function(x) return(x)

bar(new_foo())
gc() # still in .Last.value
gc() # nothing

behaves differently than

new_foo <- function() {
   e <- new.env()
   reg.finalizer(e, function(e) message("Finalizer called"))
   e
}

bar <- function(x) x

bar(new_foo())
gc() # still in .Last.value
gc() # Finalizer called!

And such a difference is not explained (AFAIK) in the documentation.
At least the help page for 'return' does not make me think that I
should not expect exactly the same behaviour if I write (or not) an
explicit 'return'.

As an R user and package developer, you should have consistency in 
_documented_ behavior. If not, it is a bug and has to be fixed either in 
the documentation or in the code. You should never depend on 
undocumented behavior, because that can change at any time. You cannot 
expect different versions of R to behave exactly the same, not 
even the svn versions; that is not possible and would not be possible 
even if we did not change any code in the R implementation, because even the 
OS, C compiler, hardware, and third-party libraries have their specified 
and unspecified behavior.


Best
Tomas


Regards,
Iñaki


Best
Tomas





Re: [Rd] Typo in src/extra/tzone/registryTZ.c

2018-03-27 Thread Tomas Kalibera

Thanks! Fixed in R-devel,
Tomas

On 03/26/2018 03:22 PM, Korpela Mikko (MML) wrote:

I stumbled upon a typo in a time zone name: Irtutsk should be Irkutsk.
A patch is attached. I also checked that this is the only bug of its
kind in this file, i.e., all the other Olson time zones occurring in
the file can also be found in Unicode Common Locale Data Repository.

- Mikko Korpela

Index: src/extra/tzone/registryTZ.c
===
--- src/extra/tzone/registryTZ.c (revision 74465)
+++ src/extra/tzone/registryTZ.c (working copy)
@@ -303,7 +303,7 @@
  { L"Russia Time Zone 4", "Asia/Yekaterinburg" },
  { L"Russia Time Zone 5", "Asia/Novosibirsk" },
  { L"Russia Time Zone 6", "Asia/Krasnoyarsk" },
-{ L"Russia Time Zone 7", "Asia/Irtutsk" },
+{ L"Russia Time Zone 7", "Asia/Irkutsk" },
  { L"Russia Time Zone 8", "Asia/Yakutsk" },
  { L"Russia Time Zone 9", "Asia/Magadan" },
  { L"Russia Time Zone 10", "Asia/Srednekolymsk" },



Re: [Rd] Possible `substr` bug in UTF-8 Corner Case

2018-03-29 Thread Tomas Kalibera
Thanks, fixed in R-devel (by checking validity of UTF-8 strings for 
substr/substring).

Tomas

On 03/29/2018 03:53 AM, brodie gaslam via R-devel wrote:

I think there is a memory bug in `substr` that is triggered by a UTF-8 corner 
case: an incomplete UTF-8 byte sequence at the end of a string.  With a 
valgrind level 2 instrumented build of R-devel I get:


string <- "abc\xEE"    # \xEE indicates the start of a 3 byte UTF-8 sequence
Encoding(string) <- "UTF-8"
substr(string, 1, 10)

==15375== Invalid read of size 1
==15375==    at 0x45B3F0: substr (character.c:286)
==15375==    by 0x45B3F0: do_substr (character.c:342)
==15375==    by 0x4CFCB9: bcEval (eval.c:6775)
==15375==    by 0x4D95AF: Rf_eval (eval.c:624)
==15375==    by 0x4DAD12: R_execClosure (eval.c:1764)
==15375==    by 0x4D9561: Rf_eval (eval.c:747)
==15375==    by 0x507008: Rf_ReplIteration (main.c:258)
==15375==    by 0x5073E7: R_ReplConsole (main.c:308)
==15375==    by 0x507494: run_Rmainloop (main.c:1082)
==15375==    by 0x41A8E6: main (Rmain.c:29)
==15375==  Address 0xb9e518d is 3,869 bytes inside a block of size 7,960 alloc'd
==15375==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15375==    by 0x51033E: GetNewPage (memory.c:888)
==15375==    by 0x511FC0: Rf_allocVector3 (memory.c:2691)
==15375==    by 0x4657AC: Rf_allocVector (Rinlinedfuns.h:577)
==15375==    by 0x4657AC: Rf_ScalarString (Rinlinedfuns.h:1007)
==15375==    by 0x4657AC: coerceToVectorList (coerce.c:892)
==15375==    by 0x4657AC: Rf_coerceVector (coerce.c:1293)
==15375==    by 0x4660EB: ascommon (coerce.c:1369)
==15375==    by 0x4667C0: do_asvector (coerce.c:1544)
==15375==    by 0x4CFCB9: bcEval (eval.c:6775)
==15375==    by 0x4D95AF: Rf_eval (eval.c:624)
==15375==    by 0x4DAD12: R_execClosure (eval.c:1764)
==15375==    by 0x515EF7: dispatchMethod (objects.c:408)
==15375==    by 0x516379: Rf_usemethod (objects.c:458)
==15375==    by 0x516694: do_usemethod (objects.c:543)
==15375==
[1] "abc"

Here is a patch for the native version of `substr` that highlights the problem and a possible fix.  
Basically `substr` computes the byte width of a UTF-8 character based on the leading byte 
("\xEE" here, which implies 3 bytes) and reads/writes that entire byte width irrespective 
of whether the string actually ends before the theoretical end of the UTF-8 "character".

Index: src/main/character.c
===
--- src/main/character.c (revision 74482)
+++ src/main/character.c (working copy)
@@ -283,7 +283,7 @@
     for (i = 0; i < so && str < end; i++) {
         int used = utf8clen(*str);
         if (i < sa - 1) { str += used; continue; }
-        for (j = 0; j < used; j++) *buf++ = *str++;
+        for (j = 0; j < used && str < end; j++) *buf++ = *str++;
     }
 } else if (ienc == CE_LATIN1 || ienc == CE_BYTES) {
     for (str += (sa - 1), i = sa; i <= so; i++) *buf++ = *str++;

The change above removed the valgrind error for me.  I re-built R with the change and ran 
"make check" which seemed to work fine. I also ran some simple checks on UTF-8 
strings and things seem to work okay.
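
A minimal, self-contained sketch of the bounded-copy idea in the patch above. It is not R's character.c: lead_byte_len() is a simplified stand-in for utf8clen(), and bounded_substr() is a hypothetical helper name. As an extra assumption beyond the patch, the skip branch is clamped as well, so the read pointer never moves past end.

```c
#include <assert.h>
#include <stddef.h>

/* Expected byte length of a UTF-8 sequence, judged from its lead byte only.
   Simplified stand-in for utf8clen(); unexpected lead bytes count as 1. */
static int lead_byte_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Copy characters sa..so (1-based, inclusive) of the UTF-8 bytes in
   [str, end) into buf, never reading at or beyond end.
   buf must have room for (end - str) bytes. Returns bytes written. */
static size_t bounded_substr(const char *str, const char *end,
                             int sa, int so, char *buf)
{
    assert(str <= end && sa >= 1);
    char *out = buf;
    for (int i = 0; i < so && str < end; i++) {
        int used = lead_byte_len((unsigned char) *str);
        if (i < sa - 1) {
            /* skip this character, but stop at the end of the string */
            for (int j = 0; j < used && str < end; j++) str++;
            continue;
        }
        /* the "str < end" test is the guard the patch adds */
        for (int j = 0; j < used && str < end; j++) *out++ = *str++;
    }
    return (size_t) (out - buf);
}
```

With the 4-byte string from the report ("abc" followed by 0xEE), bounded_substr(s, s + 4, 1, 10, buf) copies only the 4 bytes that exist instead of reading 2 bytes past the end.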

I have very limited experience making changes to R (this is my first attempt at 
a patch) so please take all of the above with extreme skepticism.

Apologies in advance if this turns out to be a false alarm caused by an error 
on my part.

Best,

Brodie.

PS: apologies also if the formatting of this e-mail is bad.  I have not figured 
out how to get plaintext working properly with yahoo.



Re: [Rd] as.pairlist does not convert call objects

2018-03-29 Thread Tomas Kalibera


On 03/28/2018 08:31 AM, Jialin Ma wrote:

Dear all,

It seems that as.pairlist does not convert call objects, producing
results like the following:


is.pairlist(as.pairlist(quote(x + y)))

[1] FALSE

Should this behavior be expected?

The documentation says that the behavior of as.pairlist is undocumented 
in this case:


"
 ‘as.pairlist’ is implemented as ‘as.vector(x, "pairlist")’, and
 hence will dispatch methods for the generic function ‘as.vector’.
 Lists are copied element-by-element into a pairlist and the names
 of the list used as tags for the pairlist: the return value for
 other types of argument is undocumented.
"

The as.pairlist implementation currently does nothing for a language 
object (because it is internally represented using a linked list). The 
is.pairlist implementation checks whether its argument is a user-level 
pairlist, which a language object is not, so it returns FALSE in the example.


These functions are rather low-level and should not be needed in user 
programs. Certainly programs should not depend on undocumented behavior.


Tomas

Thanks,
Jialin



sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Tumbleweed

Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
[7] base

other attached packages:
[1] magrittr_1.5



Re: [Rd] utils::unzip ignores overwrite argument, effectively

2018-04-04 Thread Tomas Kalibera

Thanks, fixed in R-devel.
Tomas

On 12/20/2017 02:38 PM, Gábor Csárdi wrote:

It does give a warning, but then it overwrites the files, anyway.
Reproducible example below.

This is R 3.4.3, but it does not seem to be fixed in R-devel:
https://github.com/wch/r-source/blob/4a9ca3e5ac6b19d7faa7c9290374f7604bf0ef64/src/main/dounzip.c#L171-L174

FYI,
Gábor

dir.create(tmp <- tempfile())
setwd(tmp)

cat("old1\n", file = "file1")
cat("old2\n", file = "file2")

utils::zip("files.zip", c("file1", "file2"))
#>   adding: file1 (stored 0%)
#>   adding: file2 (stored 0%)

unlink("file2")
cat("new1\n", file = "file1")
readLines("file1")
#> [1] "new1"

utils::unzip("files.zip", overwrite = FALSE)
#> Warning message:
#> In utils::unzip("files.zip", overwrite = FALSE) :
#>   not overwriting file './file1

readLines("file1")
#> [1] "old1"



Re: [Rd] [Bug report] Chinese characters are not handled correctly in Rterm for Windows

2018-04-05 Thread Tomas Kalibera

Thank you for the report and initial debugging. I am not sure what is 
going wrong, we may have to rely on your help to debug this (I do not 
have a system to reproduce on). A user-targeted advice would be to use 
RGui (Rgui.exe).

Does the problem also exist in R-devel?
https://cran.r-project.org/bin/windows/base/rdevel.html

Your example  print("ABC\u4f60\u597dDEF") is printing two Chinese 
characters, right? The first one is C4E3 in CP936 (4F60 in Unicode) and 
the second one is BAC3 in CP936 (597D in Unicode)? Could you reproduce 
the problem with printing just one of the characters, say 
print("ABC\u4f60DEF") ?

As a sanity check - does this display the correct characters in RGui? It 
should, and does on my system, as RGui uses Unicode internally. By 
correct I mean the characters shown e.g. here

https://msdn.microsoft.com/en-us/library/cc194923.aspx
https://msdn.microsoft.com/en-us/library/cc194920.aspx

What is the output of "chcp" in the terminal, before you run R.exe? It 
may be different from what Sys.getlocale() gives in R.

If you take the sequence of the "fputc" commands you captured by the 
debugger, and create a trivial console application to just run them - 
would the characters display correctly in the same terminal from which 
you run R.exe?
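
Such a test could look roughly like the C sketch below. The byte values are the ones listed in the report; replay_fputc() is a hypothetical helper name, and the two fflush calls from the captured trace are reduced to a single flush at the end.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes passed to fputc in the captured trace: [1] "ABC<C4 E3 BA C3>DEF"\n */
static const unsigned char captured[] = {
    91, 49, 93, 32, 34, 65, 66, 67,
    196, 227, 186, 195,              /* the two characters in CP936 */
    68, 69, 70, 34, 10
};

/* Write the bytes one at a time with fputc, as in the trace.
   Returns the number of calls (including the final flush) that failed. */
static int replay_fputc(FILE *out, const unsigned char *bytes, size_t n)
{
    assert(out != NULL);
    int failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (fputc(bytes[i], out) == EOF) {
            failures++;
            clearerr(out);   /* keep going so later bytes are still tried */
        }
    }
    if (fflush(out) == EOF) failures++;
    return failures;
}
```

Calling replay_fputc(stdout, captured, sizeof captured) from main() in the same cmd window would show whether the four CP936 bytes fail outside of R as well.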

Thanks
Tomas


On 03/08/2018 06:54 PM, Azure wrote:
> Hello everyone,
>
> I am new to R and I have experienced some bugs when using Rterm on Windows.
> Chinese characters in the console output are discarded by Rterm, and trying
> to type them into the console will crash the Rterm application.
>
> ---ENVIRONMENT---
> Platform = x86_64-w64-mingw32
> OS = Windows 10 Pro 1709 chs
> R version = 3.4.3
> Active code page = 936 (Simplified Chinese)
>
> ---STEPS TO REPRODUCE---
> 1. Run cmd and start bin\x64\R.exe
>
> 2. Note that all Chinese characters in the startup banner are missing
>
> 3. > Sys.getlocale()
> [1] "LC_COLLATE=Chinese (Simplified)_China.936;LC_CTYPE=Chinese (Simplified)_China.936;LC_MONETARY=Chinese (Simplified)_China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_China.936"
>
> 4. > print("ABC\u4f60\u597dDEF")
> [1] "ABCDEF"
> (Unicode code points for "你好")
>
> 5. Use Microsoft Pinyin IME to type "你好" into the console. An error message 
> appeared:
>> invalid multibyte character in mbcs_get_next
> Then the program crashed. My debugger reported a heap corruption, displayed 
> as follows:
> 0x7FFE2F3687BB (ntdll.dll) (Rterm.exe ��)δ�쳣: 0xC374: 
> �𻵡� (: 0x7FFE2F3CC6E0)��
> However, if the text is pasted into the console, it will not crash.
>
> ---ADDITIONAL INFO---
> Both 32-bit and 64-bit versions have the same problem.
> I attached a debugger to observe Rterm's behavior. The command in step 4
> produced the following calling sequence of C library function "fputc":
>
> fputc ( 91, 0x7ffe2d1aea40 ) //'['
> fputc ( 49, 0x7ffe2d1aea40 ) //'1'
> fputc ( 93, 0x7ffe2d1aea40 ) //']'
> //fflush ( 0x7ffe2d1aea40 )
> fputc ( 32, 0x7ffe2d1aea40 ) //' '
> fputc ( 34, 0x7ffe2d1aea40 ) //'\"'
> fputc ( 65, 0x7ffe2d1aea40 ) //'A'
> fputc ( 66, 0x7ffe2d1aea40 ) //'B'
> fputc ( 67, 0x7ffe2d1aea40 ) //'C'
> fputc ( 196, 0x7ffe2d1aea40 ) //FAILED!
> fputc ( 227, 0x7ffe2d1aea40 ) //FAILED!
> fputc ( 186, 0x7ffe2d1aea40 ) //FAILED!
> fputc ( 195, 0x7ffe2d1aea40 ) //FAILED!
> fputc ( 68, 0x7ffe2d1aea40 ) //'D'
> fputc ( 69, 0x7ffe2d1aea40 ) //'E'
> fputc ( 70, 0x7ffe2d1aea40 ) //'F'
> fputc ( 34, 0x7ffe2d1aea40 ) //'\"'
> //fflush ( 0x7ffe2d1aea40 )
> fputc ( 10, 0x7ffe2d1aea40 ) //'\n'
>
> {196, 227, 186, 195} or {C4 E3 BA C3} is multi-byte-encoded "你好" in GBK 
> (Code page 936).
> These calls failed with a Windows error code 28 (No space left on device), 
> while the subsequent calls to fputc succeeded.
>
> Then I used C++ to implement a terminal front-end with REmbedded facilities. 
> R outputs were simply printf-ed to stdout. Everything worked as expected:
>
> Initializing R environment
> R version 3.4.3 detected
>> print("��ã�һ���й�ѧR is great!")
> [1] "��ã�һ���й�ѧR is great!"
>> Sys.getlocale()
> [1] "LC_COLLATE=Chinese (Simplified)_China.936;LC_CTYPE=Chinese (Simplified)_China.936;LC_MONETARY=Chinese (Simplified)_China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_China.936"
>
> I hope this information is helpful.
>
> Best regards,
> AzureFx
>


Re: [Rd] potential file.copy() or documentation bug when copy.date = TRUE

2018-04-06 Thread Tomas Kalibera

Thanks for the report, fixed in R-devel.

Best,
Tomas

On 04/05/2018 05:01 PM, Gábor Csárdi wrote:

This is a recent R-devel. file.copy() is not vectorized if multiple
destinations succeed:

cat("foo1\n", file = "foo1")
cat("foo2\n", file = "foo2")
unlink(c("copy1", "copy2"), recursive = TRUE)

file.copy(c("foo1", "foo2"), c("copy1", "copy2"), copy.date = TRUE)

#> Error in Sys.setFileTime(to[okay], fi$mtime) : invalid 'path' argument

# But the files were still copied:
file.exists(c("copy1", "copy2"))
#> [1] TRUE TRUE

file.copy(c("foo1", "foo2"), c("copy1", "copy2"))
#> [1] FALSE FALSE

file.copy(c("foo1", "foo2"), c("copy1", "copy2"), copy.date = TRUE)
#> [1] FALSE FALSE

file.copy(c("foo1", "foo2"), c("copy1", "copy2"), copy.date = TRUE,
overwrite = TRUE)
#> Error in Sys.setFileTime(to[okay], fi$mtime) : invalid 'path' argument



Re: [Rd] [bug report] Cyrillic letter "я" interrupts script execution via R source function

2018-04-09 Thread Tomas Kalibera

Hi Vladimir,

thanks for your report - this was really a bug, now fixed in R-devel and 
to appear in 3.5.0.


Apart from the bug, having source files in UTF-8 and reading them into R 
on Windows is perfectly fine, you just need to specify that they are in 
UTF-8. You also need to make sure R is running in Russian locale 
(CP1251) if that is not the default. On my system, this works fine


Sys.setlocale(locale="Russian")
source("russian_utf8.R", encoding="UTF-8")

Best
Tomas


On 08/28/2017 11:27 AM, Владимир Панфилов wrote:

Hello,

I do not have an account on R Bugzilla, so I will post my bug report here.
I want to report a very old bug in base R *source()* function. It relates
to sourcing some R scripts in UTF-8 encoding on Windows machines. For some
reason if the UTF-8 script is containing cyrillic letter *"я"*, the script
execution is interrupted directly on this letter (btw the same scripts are
sourcing fine when they are encoded in the systems CP1251 encoding).

Let's consider the following script that prints random russian words:




print("Осень")
print("Ёжик")
print("трясина")
print("тест")


When this script is sourced we get INCOMPLETE_STRING error:






source('D:/R code/test_cyr_letter.R', encoding = 'UTF-8', echo=TRUE)
Error in source("D:/R code/test_cyr_letter.R", encoding = "UTF-8", echo = TRUE) :
  D:/R code/test_cyr_letter.R:3:7: unexpected INCOMPLETE_STRING
2: print("Ёжик")
3: print("тр
         ^


Note that this bug is not triggered when the same file is executed using
*eval(parse(...))*:





> eval(parse('D:/R code/test_cyr_letter.R', encoding="UTF-8"))
[1] "Осень"
[1] "Ёжик"
[1] "трясина"
[1] "тест"


I did some research and noticed that the *source* and *parse* functions have
similar parts of code for reading files. After analyzing the code of the
*source()* function I found out that commenting out one line fixes this bug
and the overridden function works fine. See this part of the *source()*
function code:

...
filename <- file
file <- file(filename, "r")
# on.exit(close(file))   COMMENT THIS LINE
if (isTRUE(keep.source)) {
  lines <- scan(file, what="character", encoding = encoding, sep = "\n")
  on.exit()
  close(file)
  srcfile <- srcfilecopy(filename, lines, file.mtime(filename)[1],
                         isFile = TRUE)
}
...



I do not fully understand this weird behaviour, so I ask help of R Core
developers to fix this annoying bug that prevents using unicode scripts
with cyrillic on Windows.
Maybe you should make that part of *source()* function read files like
*parse()* function?

*Session and encoding info:*


sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=Russian_Russia.1251  LC_CTYPE=Russian_Russia.1251    LC_MONETARY=Russian_Russia.1251
[4] LC_NUMERIC=C                    LC_TIME=Russian_Russia.1251
attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1




l10n_info()

$MBCS
[1] FALSE
$`UTF-8`
[1] FALSE
$`Latin-1`
[1] FALSE
$codepage
[1] 1251



Re: [Rd] [bug report] Cyrillic letter "я" interrupts script execution via R source function

2018-04-09 Thread Tomas Kalibera

Hi Patrick,

thanks for your comments on the bug, just to clarify - one could 
reproduce the bug simply using file() and readLines(). The parser saw a 
real end of file as (incorrectly) communicated to it by lower level 
connections code - there is no design issue related in the parser (nor 
elsewhere), it was a bug in connections code and is now fixed.


You can specify source encoding in "file()" or "source()" to tell R that 
the source file is in that given encoding. R will convert the file 
contents to the current native encoding of the R session. If in doubt, 
please check the documentation ?file, ?source, ?readLines, ?Encoding for 
the details.


The observation that "я" is represented as 0xff (-1 as signed char) and 
R_EOF/EOF is -1 (but integer) was related to the bug, well spotted.
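
The general pitfall can be shown in a few lines of C. This sketch is not the connections code; the function names are made up for illustration, and it assumes the usual platform where EOF is -1 and converting 0xFF to signed char yields -1.

```c
#include <assert.h>
#include <stdio.h>

/* Problematic pattern: the byte has been narrowed to a signed char before
   the comparison, so the data byte 0xFF becomes -1 and equals EOF. */
static int is_eof_after_narrowing(signed char c)
{
    return c == EOF;
}

/* Safe pattern: keep the value as an int in 0..255 (as fgetc does), so a
   data byte can never compare equal to the negative EOF marker. */
static int is_eof_as_int(int c)
{
    return c == EOF;
}

/* Widen a raw byte to int without sign extension. */
static int byte_as_int(unsigned char b)
{
    return b;
}
```

Here is_eof_after_narrowing((signed char) 0xFF) reports end of file for what is really a data byte (the CP1251 encoding of "я"), while is_eof_as_int(byte_as_int(0xFF)) does not.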


Best
Tomas

On 08/28/2017 02:24 PM, Patrick Perry wrote:

My understanding (which could be wrong) is that when you source a file,
it first gets translated to your native locale and then parsed. When you
parse a character vector, it does not get translated.

In your locale, every "я" character  (U+044F) gets replaced by the byte
"\xFF":


  iconv("\u044f", "UTF-8", "Windows-1251")

[1] "\xff"

I suspect that particular value causes trouble for the R parser, which
uses a stack of previously-seen characters (include/Defn.h):

LibExtern char R_ParseContext[PARSE_CONTEXT_SIZE] INI_as("");

And at various places checks whether the context character is EOF. That
character is defined as

#define R_EOF -1

Which, when cast to a char, is 0xFF.

I suspect that your example is revealing two bugs:

1) The R parser seems to have trouble with native characters encoded as
0xFF. It's possible that, since R strings can't contain 0x00, this can
be fixed by changing the definition of R_EOF to

#define R_EOF 0


2) The other bug is that, as I understand the situation, "source" will
fail if the file contains a character that cannot be represented in your
native locale. This is a harder bug to tackle because of the way file()
and the other connection methods are designed, where they translate the
input to your native locale. I don't know if it's possible to override
this behavior, and have them translate input to UTF-8 instead.



Patrick



Re: [Rd] file.copy(from=Directory, to=File) oddity

2018-04-09 Thread Tomas Kalibera
Thanks for the report, fixed in R-devel. Now we get a warning when 
copying a directory over a non-directory file is attempted. The target 
(non-directory) file is left alone.


Tomas

On 09/08/2017 06:54 PM, William Dunlap via R-devel wrote:

When I mistakenly use file.copy() with a directory for the 'from' argument
and a non-directory for the 'to' and overwrite=TRUE, file.copy returns
FALSE, meaning it could not do the copying.  However, it also replaces the
'to' file with a zero-length file.

dir.create( fromDir <- tempfile() )
cat(file = toFile <- tempfile(), "existing file\n")
readLines(toFile)
#[1] "existing file"
file.copy(fromDir, toFile, recursive=FALSE, overwrite=TRUE)
#[1] FALSE
readLines(toFile)
#character(0)

or, with recursive=TRUE,

dir.create( fromDir <- tempfile() )
cat(file = toFile <- tempfile(), "existing file\n")
readLines(toFile)
#[1] "existing file"
file.copy(fromDir, toFile, recursive=TRUE, overwrite=TRUE)
#[1] FALSE
#Warning message:
#In file.copy(fromDir, toFile, recursive = TRUE, overwrite = TRUE) :
#  'recursive' will be ignored as 'to' is not a single existing directory
readLines(toFile)
#character(0)

Is this behavior intended?

Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [Rd] R Bug: write.table for matrix of more than 2, 147, 483, 648 elements

2018-04-19 Thread Tomas Kalibera

On 04/19/2018 02:06 AM, Duncan Murdoch wrote:

On 18/04/2018 5:08 PM, Tousey, Colton wrote:

Hello,

I want to report a bug in R that is limiting my capabilities to 
export a matrix with write.csv or write.table with over 2,147,483,648 
elements (C's int limit). I found this bug already reported about 
before: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17182. 
However, there appears to be no solution or fixes in upcoming R 
version releases.


The error message is coming from the writetable part of the utils 
package in the io.c source 
code(https://svn.r-project.org/R/trunk/src/library/utils/src/io.c):

    /* quick integrity check */
    if(XLENGTH(x) != (R_len_t)nr * nc)
        error(_("corrupt matrix -- dims not not match length"));


The issue is that nr*nc is an integer and the size of my matrix, 2.8 
billion elements, exceeds C's limit, so the check forces the code to 
fail.


Yes, looks like a typo:  R_len_t is an int, and that's how nr was 
declared.  It should be R_xlen_t, which is bigger on machines that 
support big vectors.
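
The effect of the cast can be sketched in standalone C. The typedefs below are stand-ins, not R's headers: len_t plays the role of R_len_t (a 32-bit int) and xlen_t the role of R_xlen_t on a 64-bit build. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t len_t;    /* stand-in for R_len_t */
typedef int64_t xlen_t;   /* stand-in for R_xlen_t on a 64-bit platform */

/* Integrity check with the product computed in the wide type:
   casting nr first makes the whole multiplication 64-bit. */
static int dims_match(xlen_t length, len_t nr, len_t nc)
{
    assert(nr >= 0 && nc >= 0);
    return length == (xlen_t) nr * nc;
}

/* Does nr * nc still fit in a 32-bit int? If not, a product computed as
   int cannot equal the true length, so the check would always fail. */
static int product_fits_int(len_t nr, len_t nc)
{
    return (int64_t) nr * nc <= INT32_MAX;
}
```

For a 70000 x 40000 matrix (2.8 billion elements), product_fits_int() is false while dims_match(2800000000, 70000, 40000) is true.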


I haven't tested the change; there may be something else in that 
function that assumes short vectors.

Indeed, I think the function won't work for long vectors because of 
EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be 
changed, including their signatures.


Tomas





Duncan Murdoch



My version:

R.Version()

$platform
[1] "x86_64-w64-mingw32"

$arch
[1] "x86_64"

$os
[1] "mingw32"

$system
[1] "x86_64, mingw32"

$status
[1] ""

$major
[1] "3"

$minor
[1] "4.3"

$year
[1] "2017"

$month
[1] "11"

$day
[1] "30"

$`svn rev`
[1] "73796"

$language
[1] "R"

$version.string
[1] "R version 3.4.3 (2017-11-30)"

$nickname
[1] "Kite-Eating Tree"

Thank you,
Colton


Colton Tousey
Research Associate II
P: 816.585.0300   E: colton.tou...@kc.frb.org
FEDERAL RESERVE BANK OF KANSAS CITY
1 Memorial Drive   *   Kansas City, Missouri 64198   * 
www.kansascityfed.org




Re: [Rd] R Bug: write.table for matrix of more than 2, 147, 483, 648 elements

2018-04-19 Thread Tomas Kalibera

On 04/19/2018 11:47 AM, Serguei Sokol wrote:

Le 19/04/2018 à 09:30, Tomas Kalibera a écrit :

On 04/19/2018 02:06 AM, Duncan Murdoch wrote:

On 18/04/2018 5:08 PM, Tousey, Colton wrote:

Hello,

I want to report a bug in R that is limiting my capabilities to 
export a matrix with write.csv or write.table with over 
2,147,483,648 elements (C's int limit). I found this bug already 
reported about before: 
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17182. However, 
there appears to be no solution or fixes in upcoming R version 
releases.


The error message is coming from the writetable part of the utils 
package in the io.c source 
code(https://svn.r-project.org/R/trunk/src/library/utils/src/io.c):

    /* quick integrity check */
    if(XLENGTH(x) != (R_len_t)nr * nc)
        error(_("corrupt matrix -- dims not not match length"));


The issue is that nr*nc is an integer and the size of my matrix, 
2.8 billion elements, exceeds C's limit, so the check forces the 
code to fail.


Yes, looks like a typo:  R_len_t is an int, and that's how nr was 
declared.  It should be R_xlen_t, which is bigger on machines that 
support big vectors.


I haven't tested the change; there may be something else in that 
function that assumes short vectors.

Indeed, I think the function won't work for long vectors because of 
EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be 
changed, including their signatures.


That would be a definite fix, but before such a deep rewrite is 
undertaken, maybe the following small fix (in addition to "(R_xlen_t)nr * 
nc") would be sufficient for cases where nr and nc are in int range but 
their product can reach the long vector limit:


replace
    tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
                         &strBuf, sdec);
by
    tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0,
                         quote_col[j], qmethod, &strBuf, sdec);


Unfortunately we can't do that: x is a matrix of an atomic vector type. 
VECTOR_ELT takes elements of a generic vector, so it cannot be 
applied to "x". But even if we extracted a single element from "x" (e.g. 
via a type-switch), we would not be able to pass it to 
EncodeElement0, which expects a full atomic vector (that is, including 
its header). Instead we would have to call functions like EncodeInteger, 
EncodeReal0, etc. on the individual elements, which is then the same as 
changing EncodeElement0 or implementing a new version of it. This does 
not seem that hard to fix, it is just not as trivial as changing the cast.


Tomas




Serguei




Re: [Rd] R CMD build then check fails on R-devel due to serialization version

2018-04-24 Thread Tomas Kalibera


An update on the adoption of serialization format 3:

R 3.5.0 (released yesterday) supports serialization format 3, but the 
default is still format 2 to make the transition easier.


It is expected that the default will soon be changed to 3 in R-devel. 
Packages thus must not depend on what the default serialization 
format is (several packages used to have hard-coded assumptions that 
the format was 2).


As of R 3.5.0, one can test the effect of the expected change to 
serialization format 3 by setting environment variables 
"R_DEFAULT_SAVE_VERSION=3" and "R_DEFAULT_SERIALIZE_VERSION=3". One can 
also set these to 2 to ensure the default format is 2 even after the 
expected change.


Packages that include serialization files in format 3 in their sources 
cannot be used by versions of R older than 3.5.0, so they must declare a 
dependency on R>=3.5.0. Package authors need to make sure of this, but 
they only need to worry if they added serialization files explicitly - 
package meta-data will still be saved in format 2 automatically. Note 
that the current release, R 3.5.0, has the default at 2, so no extra 
precaution is needed when creating such serialized files using the 
current release 3.5.0, but authors will have to be careful when creating 
such files using even newer versions of R.


As a safeguard, "R CMD build" automatically adds a dependency on 
R>=3.5.0 when it detects a serialization file in format 3, but package 
authors should not depend on this, because the detection is not and can 
not be fully precise. As another safeguard, "R CMD check --as-cran" will 
reject packages with serialization format 3 unless they have a 
dependency on at least R>=3.5.0.


This should give enough options also for testing frameworks based on 
running build&"check --as-cran" using the same version of R (most 
packages don't include serialize files at all, R CMD build would 
otherwise add dependency on R>=3.5.0 to make "R CMD check --as-cran" 
happy, one could also set R_DEFAULT_SAVE_VERSION=2 and 
R_DEFAULT_SERIALIZE_VERSION=2). These frameworks could also be tested 
before the change by running with R_DEFAULT_SAVE_VERSION=3 and 
R_DEFAULT_SERIALIZE_VERSION=3.


Best
Tomas

On 01/13/2018 01:35 AM, Tomas Kalibera wrote:
To reduce difficulties for people relying on automated tests set up to 
build&"check --as-cran" using R-devel (e.g. travis-ci), the default 
serialization version has been temporarily switched back to 2. Thank 
you for your patience - according to svn history, the last change of 
the serialization format happened 16 years ago, and unsurprisingly 
some practices that developed since did not anticipate such change and 
have to be adapted.


CRAN is now protected against packages containing serialized files in 
format 3 (which not only is not readable by 3.4.x and older, but could 
still change - the 'devel' in 'R-devel'). These new checks have to 
stay but we are looking at improving package-maintainer-friendliness. 
It turned out more difficult than just 1-2 days, hence the temporary 
switch back to version 2.


Best
Tomas

On 01/11/2018 02:47 PM, luke-tier...@uiowa.edu wrote:

As things stand now, package tarballs with vignettes that are built
with R-devel will not install in R 3.4.x, so CRAN can't accept them
and someone running R CMD check --as-cran should be told that. A
WARNING is appropriate.

Most likely what will change soon is that build/version.rds will be
saved with serialization version = 2 and this warning will not be
triggered just by having a vignette. It will still be triggered by
data files serialized with R-devel's default version = 3.

Please do remember that the 'devel' in R-devel means exactly that:
things will at times be unstable. There are currently a lot of balls
flying around with changes in R-devel and also Bioconductor, and the
CRAN maintainers are working hard to keep things all up in the
air. Please be patient.

Best,

luke

On Thu, 11 Jan 2018, Jim Hester wrote:


This change poses difficulties for automated build systems such as
travis-ci, which is widely used in the R community. In particular,
because this is a WARNING and not a NOTE, it causes all R-devel
builds with vignettes to fail, as the default settings fail the build
if R CMD check issues a WARNING.

The simplest change would be for R-core to change this message to be a
NOTE rather than a WARNING. The serialization could still be tested
and there would still be a check against vignettes built with R-devel,
but it would not cause these builds to fail.

On Wed, Jan 10, 2018 at 3:52 PM, Duncan Murdoch
 wrote:

On 10/01/2018 1:26 PM, Neal Richardson wrote:


Hi,
Since yesterday I'm seeing `R CMD check --as-cran` failures on the
R-devel daily build (specifically, R Under development (unstable)
(2018-01-09 r74100)) for multiple packages:

* checking serialized R objects in t

Re: [Rd] Bug in RScript.exe for 3.5.0

2018-04-25 Thread Tomas Kalibera
Thanks for the report. A quick workaround before this gets fixed is to 
add an extra first argument that has no space in it, e.g.


Rscript --vanilla "foo bar.R"

The problem exists on all systems, not just Windows.

Best
Tomas

On 04/25/2018 09:55 PM, Kerry Jackson wrote:

Hi R Developers,
I have found what I think is a bug in the RScript.exe in version 3.5.0 of R for 
Windows.
When I call Rscript.exe for Version 3.5 of R, it is unable to open the file if 
the file name or path has a space in it.
As an example of what happens, I saved 2 files with the code:
cat("What do you get when you multiply 6 * 9?")
as C:\foo bar.R and as C:\foo_bar.R
When I try to run these in a DOS command window using versions 3.4.3 and 3.5.0:
C:\>"C:\Program Files\R\R-3.4.3\bin\x64\Rscript.exe" "C:\foo bar.R"
What do you get when you multiply 6 * 9?
C:\>"C:\Program Files\R\R-3.4.3\bin\x64\Rscript.exe" "C:\foo_bar.R"
What do you get when you multiply 6 * 9?
C:\>"C:\Program Files\R\R-3.5.0\bin\x64\Rscript.exe" "C:\foo bar.R"
Fatal error: cannot open file 'C:\foo': No such file or directory


C:\>"C:\Program Files\R\R-3.5.0\bin\x64\Rscript.exe" "C:\foo_bar.R"
What do you get when you multiply 6 * 9?
C:\>
When I try to run the file with a space in the name in version 3.5.0 of R, 
there is a fatal error saying there is no such file.


Kerry Jackson
Job title: Senior Account Manager, Ipsos Connect US RA Testing GMU
Phone: (203) 840-3443


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [Rd] Bug in RScript.exe for 3.5.0

2018-04-26 Thread Tomas Kalibera

Fixed in R-devel. I will port to R-patched after more testing.
Tomas



Re: [Rd] Bug in RScript.exe for 3.5.0

2018-04-26 Thread Tomas Kalibera

On 04/26/2018 02:23 PM, Kerry Jackson wrote:

Thanks Tomas.

I confirm the quick workaround works for me in the DOS prompt, and when having 
a shortcut to RScript in SendTo, and when used in the Task Scheduler.  I have 
not tested the R-devel version, due to my unfamiliarity with installing from 
source code.

Thanks, Kerry.

There are binary builds for daily snapshots of R-devel 
(development/unstable version of R) at

https://cran.r-project.org/bin/windows/base/rdevel.html

At this time the build should already have the fix.

Best
Tomas





Re: [Rd] Bug in RScript.exe for 3.5.0

2018-04-26 Thread Tomas Kalibera
Thanks. Actually, this is because the snapshot build is still one version 
behind (74642; the fix is in 74643). When I build my own installer and 
install it, it seems to be working fine. Sorry for the confusion.


Tomas

On 04/26/2018 02:49 PM, Kerry Jackson wrote:

Hi Tomas,

Thanks for the info about the binary builds; I did install it, however the bug 
still seems to be there in the current build.  The workaround you suggested 
does work:

C:\>"C:\Program Files\R\R-devel\bin\x64\Rscript.exe" "C:\foo bar.R"
Fatal error: cannot open file 'C:\foo': No such file or directory


C:\>"C:\Program Files\R\R-devel\bin\x64\Rscript.exe" --vanilla "C:\foo bar.R"
What do you get when you multiply 6 * 9?
C:\>



Re: [Rd] Bug in RScript.exe for 3.5.0

2018-04-28 Thread Tomas Kalibera


I don't have an opinion on whether this requires 3.5.1 to be released 
soon(er), but I have ported the fix to R-patched now. The bug existed in 
R-devel for a year without being spotted, which is quite a long time - but 
it may be that such bugs are hard to find before release, because people 
testing and using an unreleased version of R would not use spaces in file 
names. As users of released versions apparently do, we should do better 
about testing, perhaps test regularly with spaces in path names on a system 
that supports it. I think normal regression tests should not depend on 
such support.
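
A regular check of this kind could be sketched as below. This is only an illustration, not a test from the R sources: the script name and its contents are made up, and it assumes Rscript is found under R.home("bin") and that the system allows spaces in file names.

```r
# Sketch of a regression check: run a script whose file name contains a
# space, with no option before the file name (the case that failed in 3.5.0).
script <- file.path(tempdir(), "foo bar.R")     # illustrative file name
writeLines('cat("ok\\n")', script)              # the script prints "ok"

rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(rscript, shQuote(script), stdout = TRUE)
print(out)
```

On an affected build the call is expected to fail with the "cannot open file" error from the report instead of printing the script's output.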


In either case, I would recommend that users avoid spaces in file names 
to be safe. One known problem is that some packages will not build when 
RHOME has a space in its name (on Windows, this is not a problem on drives 
with short file names supported, but that is not always the case) - CRAN 
is now checking the most common source of this issue, but there may be more.


Tomas


On 04/28/2018 07:23 PM, Yihui Xie wrote:

It seems the fix has not been ported to the patched version of R on
Windows yet. I just tested R version 3.5.0 Patched (2018-04-27
r74667).

IMHO this bug is so bad that it is worth a new release R 3.5.1 before
it starts biting more users like this one
https://stackoverflow.com/q/50077412/559676. BTW, although the bug has
been fixed (https://github.com/wch/r-source/commit/c29f694), I think
it will be even better if a corresponding test is added at the same
time to prevent this from happening again in the future.

Thanks!

Yihui

On Fri, Apr 27, 2018 at 7:03 AM, Kerry Jackson  wrote:

Thanks Tomas,

I confirm the R Under development (unstable) (2018-04-26 r74651) version works 
for Rscript when the file name has a space, and no arguments are specified.

C:\>"C:\Program Files\R\R-devel\bin\x64\Rscript.exe" "C:\foo bar.R"
R Under development (unstable) (2018-04-26 r74651)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.6.0

C:\>


Re: [Rd] Bug in RScript.exe for 3.5.0

2018-04-29 Thread Tomas Kalibera
Hi Simone,

On 04/29/2018 10:12 PM, Simone Giannerini wrote:
>
> Dear Tomas,
>
> thank you for fixing the bug, I still do not find it mentioned in
> the changelog though (neither R-patched nor R-devel), also, see
> inline below
>
As Henrik pointed out, the fix is mentioned in the NEWS file (now in both 
R-devel and R-patched). The usual procedure is to fix in R-devel, give some 
time for further testing/review, also by others (how long depends on the 
severity of the bug and the risk of introducing new bugs by the fix), then 
update NEWS in R-devel, then eventually port both the fix and the NEWS 
entry to R-patched.
>
>
> On Sat, Apr 28, 2018 at 11:36 PM, Tomas Kalibera wrote:
>
>
> I don't have an opinion if this requires 3.5.1 to be released
> soon(er), but I have ported to R-patched now. The bug existed
> in R-devel for a year without being spotted, which is quite a
> long time - but it may be these bugs are hard to find before
> release, because people testing and using an unreleased
> version of R would not use space in file names. As apparently
> users of released versions do, we should do better about
> testing, perhaps test regularly with space in path names on a
> system that supports it. I think normal regression tests
> should not depend on such support.
>
> In either case, I would recommend users to avoid space in file
> names to be safe. 
>
>
> note that sometimes users have little or no control over this. For
> instance, the bug broke the RManager interface between R, knitr
> and Winedt since Winedt installs itself and its data in
> directories with spaces in the filename/path and RManager calls
> are something of the kind
>
> Rscript.exe "%b\Exec\R\Knitr.R" filename.Rnw
>
> where %b is the local Winedt directory that by default has spaces
> in its path. Before you suggested the workaround I had to convert
> manually the paths to the dos 8.3 format in order to make it work
> again.
>
Indeed, the space could also originate from a path name set up by the 
system administrator (such as the first and last name of a user) or from a 
third party application. It may also be a third party application that 
is not robust against spaces in file names (e.g. Make, where it is by 
design and essentially cannot be fixed). I think ideally we should, in 
all roles (administrator, user, package developer, R internals 
developer), try to avoid spaces in file names, try to educate people to 
avoid spaces in file names, but also try to make systems work even with 
spaces in file names. When you find a bug in R itself that prevents 
something from working with a space in a file name, please file a bug 
report. Especially when a bug has been introduced between R versions, it 
is usually easy to fix - of course there is no guarantee, but you may 
easily be given a workaround within a few hours. If you find such an error 
in a package, please report it to the package author (and if repeatedly 
unresponsive, please report to the repository maintainer - e.g. CRAN).

Best
Tomas


>
> Ciao,
>
> Simone
>

Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-03 Thread Tomas Kalibera

On 05/03/2018 11:14 PM, Henrik Bengtsson wrote:

Also, as mentioned in my
https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html, when
not specifying the mode argument, the default on Windows is mode = "w"
*except* for certain, case-sensitive, filename extensions:

 if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$", url)))
 mode <- "wb"

Just like the need for mode = "wb" on Windows, the above
special-file-extension-hack is only happening on Windows, and is only
documented in ?download.file if you're on Windows; so someone who's on
Linux/macOS trying to help someone on Windows may not be aware of
this. This adds to even more confusion, e.g. "works for me".

If we were designing the API today, it would probably make more sense 
not to convert any line endings by default. Today's editors _usually_ 
can cope with different line endings and it is probably easier to detect 
that a text file has incorrect line endings rather than detecting that a 
binary file has been corrupted by an attempt to convert line endings. 
But whether to change existing, documented behavior is a different 
question. In order to help users and programmers who do not read the 
documentation carefully we would create problems for users and 
programmers who do. The current heuristic/hack is in line with the 
compatibility approach: it detects files that are obviously binary, so 
it changes the default behavior only for cases when it would obviously 
cause damage.
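
The effect of the quoted heuristic can be sketched as follows. This is an illustration only: pick_mode() is a hypothetical helper that mimics the quoted condition (it is not part of R), and the URLs are made up.

```r
# Sketch: mimic the Windows-only default for 'mode' when it is not given.
# pick_mode() is a hypothetical helper, not an R function.
pick_mode <- function(url) {
  if (grepl("\\.(gz|bz2|xz|tgz|zip|rda|RData)$", url)) "wb" else "w"
}

urls <- c("https://example.org/data.csv.gz",     # matched
          "https://example.org/data.csv.GZ",     # upper case: not matched
          "https://example.org/data.zip?dl=1",   # query string: not matched
          "https://example.org/model.rds")       # .rds is not in the list
modes <- vapply(urls, pick_mode, character(1), USE.NAMES = FALSE)
print(modes)
```

Only the first URL gets the binary default, so scripts meant to run on Windows as well are safer passing mode = "wb" to download.file() explicitly for any binary file.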


Tomas




/Henrik

On Thu, May 3, 2018 at 7:27 AM, Joris Meys  wrote:

Thank you Henrik and Martin for explaining what was going on. Very
insightful!

On Thu, May 3, 2018 at 4:21 PM, Jeroen Ooms  wrote:

On Thu, May 3, 2018 at 2:42 PM, Henrik Bengtsson
 wrote:

Use mode="wb" when you download the file. See
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.

R core, and others, is there a good argument for why we are not making
this the default download mode? It seems like such a simple fix to such a
common "mistake".

I'd like to second this feature request. This default behaviour is
unexpected and often leads to r scripts that were written on
mac/linux, to produce corrupted files on windows, checksum mismatches,
etc.

Even for text files, the default should be to download the file as-is.
Trying to "fix" line-endings should be opt-in, never the default.
Downloading a file via a browser or ftp client on windows also doesn't
change the file, why should R?


I third the feature request.




On Thu, May 3, 2018 at 3:02 PM, Duncan Murdoch 
wrote:

Many downloads are text files (HTML, CSV, etc.), and if those are
downloaded
in binary, a Windows user might end up with a file that Notepad can't
handle, because it would have Unix-style line endings.

True but I don't think this is relevant. The same holds e.g. for the R
files in source packages, which also have unix line endings. Most
Windows users will use an actual editor that understands both types of
line endings, or can convert between the two.

Downloading-file should do just that.


Again, I agree. In my (limited) experience the only program that fails to
properly display \n as a line ending, is Notepad. But it can still open the
file regardless. If line ending conflicts cause bugs, it's almost always a
unix-like OS struggling with Windows-style endings. I have yet to meet the
first one the other way around.

Cheers
Joris


--
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)



Re: [Rd] [Bug report] Chinese characters are not handled correctly in Rterm for Windows

2018-05-04 Thread Tomas Kalibera


Thanks for the update. I believe I've fixed a part of the problem you 
have reported, the crash while entering Chinese characters to the 
console (e.g. via Pinyin, the error message about invalid multibyte 
character in mbcs_get_next). The fix is in R-devel 74693. The cause is 
that the Windows function ReadConsoleInputA no longer works with 
multibyte characters; this is not documented and is probably a Windows 
bug. According to reports online the problem has existed since Windows 8, 
but I only reproduced and tested it on Windows 10. Could you please 
verify the crash is no longer happening on your system?


Re the other problem, Chinese characters not being displayed: I found 
this is caused by R calling setlocale(LC_CTYPE, *). Setting this to 
"Chinese" and variants (code page 936) causes the problem, but running 
in the "C" locale as per default works fine. This is easily reproduced 
by the external program below - when setlocale() is called, the Chinese 
character disappears from the output. A workaround is to run R with the 
environment variable LC_CTYPE=C. Could you please verify the printed 
characters are ok with this setting? Would you have an explanation for 
this behavior? It seems a bit odd - why would the CRT remove characters 
valid in the console code page, when both the console code page and the 
"setlocale" code page are 936?


Thanks
Tomas

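    #include <stdio.h>
    #include <locale.h>

    int main(int argc, char **argv) {
        /* Uncomment the next line to reproduce: with the locale set,
           the Chinese character disappears from the output. */
        //if (!setlocale(LC_CTYPE, "Chinese")) fprintf(stderr, "setlocale failed\n");

        /* 'C', then bytes 0xC4 0xE3 (U+4F60 in CP936), then 'D' */
        int chars[] = { 67, 196, 227, 68 };
        for (int i = 0; i < 4; i++) fputc(chars[i], stdout);
        fprintf(stdout, "\n");
        return 0;
    }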

On 04/28/2018 04:53 PM, Azure wrote:

Hi Tomas,

Sorry for the delayed response. I have tested the problem on the latest R-devel 
build (2018-04-27 r74651), and it still exists. RGui is always fine with 
Chinese characters, but some IDEs rely on the CLI version of R (e.g. Visual 
Studio Code with R plugin).


Your example  print("ABC\u4f60\u597dDEF") is printing two Chinese characters, 
right?

Yes. U+4F60, U+597D or C4E3, BAC3 in CP936.


Could you reproduce the problem with printing just one of the characters, say 
print("ABC\u4f60DEF") ?

Yes. The console output is pasted in [ https://paste.ubuntu.com/p/TYgZWhdgXK/ ] 
(to avoid gibberish in e-mail).
The Active Code Page is 936 before and after running Rterm.


As a sanity check - does this display the correct characters in RGui?

Yes.


If you take the sequence of the "fputc" commands you captured by the debugger, 
and create a trivial console application to just run them - would the characters display 
correctly in the same terminal from which you run R.exe?

Yes. I created a Win32 Console Application in VS [ 
https://paste.ubuntu.com/p/h3NFV6nQvs/ ], and all the characters were displayed 
correctly in two ways. The WriteConsoleA variant uses the current console CP 
settings, and it should behave like fputc.

I guess Rterm uses its own console I/O mechanism, so the 2nd parameter of 
fputc is not stdout's handle. (I tried to read the source but was unable to 
figure out how it works.) The crash in mbcs_get_next, which is also mentioned 
in the previous post, may be related to this mechanism.

If you need further information, please let me know.

Thanks,
i...@azurefx.name


Tomas Kalibera  2018/4/5 22:42


Thank you for the report and initial debugging. I am not sure what is going 
wrong, we may have to rely on your help to debug this (I do not have a system 
to reproduce on). A user-targeted advice would be to use RGui (Rgui.exe).

Does the problem also exist in R-devel?
https://cran.r-project.org/bin/windows/base/rdevel.html

Your example  print("ABC\u4f60\u597dDEF") is printing two Chinese characters, right? The 
first one is C4E3 in CP936 (4F60 in Unicode) and the second one is BAC3 in CP936 (597D in Unicode)? 
Could you reproduce the problem with printing just one of the characters, say 
print("ABC\u4f60DEF") ?

As a sanity check - does this display the correct characters in RGui? It 
should, and does on my system, as RGui uses Unicode internally. By correct I 
mean the characters shown e.g. here

https://msdn.microsoft.com/en-us/library/cc194923.aspx
https://msdn.microsoft.com/en-us/library/cc194920.aspx

What is the output of "chcp" in the terminal, before you run R.exe? It may be 
different from what Sys.getlocale() gives in R.

If you take the sequence of the "fputc" commands you captured by the debugger, 
and create a trivial console application to just run them - would the characters display 
correctly in the same terminal from which you run R.exe?

Thanks
Tomas





Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-09 Thread Tomas Kalibera

On 05/08/2018 05:15 PM, Hadley Wickham wrote:

On Thu, May 3, 2018 at 11:34 PM, Tomas Kalibera
 wrote:

On 05/03/2018 11:14 PM, Henrik Bengtsson wrote:

Also, as mentioned in my
https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html, when
not specifying the mode argument, the default on Windows is mode = "w"
*except* for certain, case-sensitive, filename extensions:

  if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$",
url)))
  mode <- "wb"

Just like the need for mode = "wb" on Windows, the above
special-file-extension-hack is only happening on Windows, and is only
documented in ?download.file if you're on Windows; so someone who's on
Linux/macOS trying to help someone on Windows may not be aware of
this. This adds to even more confusions, e.g. "works for me".

If we were designing the API today, it would probably make more sense not to
convert any line endings by default. Today's editors _usually_ can cope with
different line endings and it is probably easier to detect that a text file
has incorrect line endings rather than detecting that a binary file has been
corrupted by an attempt to convert line endings. But whether to change
existing, documented behavior is a different question. In order to help
users and programmers who do not read the documentation carefully we would
create problems for users and programmers who do. The current heuristic/hack
is in line with the compatibility approach: it detects files that are
obviously binary, so it changes the default behavior only for cases when it
would obviously cause damage.

From a purely utilitarian standpoint, there are far more users who do
not carefully read the documentation than users who do ;)

And for that reason the behavior should be as intuitive as possible when 
designed. What was intuitive 15-20 years ago may not be intuitive now, 
but that should probably not be a justification for a change in 
documented behavior.

(I'd also argue that basing the decision on the file extension is
suboptimal, and it would be better to use the mime type if provided by
the server)

Yes, that would be nice. Also some binary files could be detected via 
magic numbers (yet not all, e.g. RDS do not have them). It won't be as 
trivial as decoding the URL, though.
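
For one format, detection by magic number could look like the sketch below. It is not what download.file() does: looks_gzipped() is a hypothetical helper, and the check covers only gzip, whose streams start with the bytes 0x1f 0x8b.

```r
# Sketch: detect gzip content by its two-byte magic number (0x1f 0x8b).
# looks_gzipped() is a hypothetical helper, not part of R.
looks_gzipped <- function(path) {
  con <- file(path, open = "rb", raw = TRUE)   # raw: no transparent decompression
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 2L)
  length(magic) == 2L && identical(magic, as.raw(c(0x1f, 0x8b)))
}

gz <- tempfile(fileext = ".gz")
con <- gzfile(gz, "w"); writeLines("hello", con); close(con)
txt <- tempfile(fileext = ".txt")
writeLines("hello", txt)

is_gz  <- looks_gzipped(gz)    # gzip-compressed file
is_txt <- looks_gzipped(txt)   # plain text file
print(c(is_gz, is_txt))
```

Unlike the extension check, such a test needs the first bytes of the content, so it could only run after the download has started.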


Tomas



Hadley





Re: [Rd] Possible bug in package installation when R_ICU_LOCALE is set

2018-05-16 Thread Tomas Kalibera

Thanks for the report, fixed in 74706.

Best,
Tomas

On 04/26/2018 08:43 AM, Korpela Mikko (MML) wrote:

(Belated) thanks for the confirmation, Ista. I just reported this issue on the 
R bug tracker:
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17412

Best regards,

- Mikko

-Original Message-
From: Ista Zahn [mailto:istaz...@gmail.com]
Sent: February 7, 2018 17:05
To: Korpela Mikko (MML)
Cc: r-devel@r-project.org
Subject: Re: [Rd] Possible bug in package installation when R_ICU_LOCALE is set

I can reproduce this on Linux, so it is not Windows-specific.


sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblas_haswellp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 rmsfact_0.0.3  tools_3.4.3    cowsay_0.5.0   fortunes_1.5-4


On Wed, Feb 7, 2018 at 8:38 AM, Korpela Mikko (MML) wrote:

On a Windows computer (other platforms not tested), installing a
package from source may fail if the environment variable R_ICU_LOCALE
is set, depending on the package and the locale.

For example, after setting R_ICU_LOCALE to "fi_FI",

   install.packages("seriation", type = "source")

(package version 1.2-3) fails with the following error:

** preparing package for lazy loading
Error in set_criterion_method("dist", "AR_events", criterion_ar_events,  :
   could not find function "set_criterion_method"
Error : unable to load R code in package 'seriation'

Package "Epi" (version 2.24) fails similarly:

** preparing package for lazy loading
Error in eval(exprs[i], envir) : object 'Relevel.default' not found
Error : unable to load R code in package 'Epi'

Whether R_ICU_LOCALE is set before R is launched or during the session
doesn't matter: installation of these two example packages fails
either way. If R_ICU_LOCALE is unset, calling

   icuSetCollate(locale = "fi_FI")

is harmless. Browsing through the R manuals, I did not find warnings
against using R_ICU_LOCALE, or any indication why package installation
should fail with the variable being set. About the collation order of
R code files, "Writing R Extensions" says:


The default is to collate according to the 'C' locale.

I interpret this (and the surrounding text) as a "promise" to package
developers that no matter what the end user does, the developer should
be able to rely on the collation order being 'C' unless the developer
defines another order.
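As a small illustration of why the collation order matters here, the sketch below compares a 'C'-locale ordering of some made-up file names with the ordering the current session would use. The file names are hypothetical; sort(method = "radix") is used only because it always collates character vectors in the 'C' locale, and this is not a claim about how the installation code orders files internally.

```r
# Hypothetical R source file names from a package's R/ directory.
files <- c("zzz.R", "AllGenerics.R", "methods.R", "Zeta.R")

# Radix sort always collates character vectors in the 'C' locale, so
# uppercase letters come before lowercase ones regardless of the session.
c_order <- sort(files, method = "radix")
print(c_order)

# The default sort follows the session's collation (LC_COLLATE, or ICU when
# it is in use), so this result can differ between machines and settings.
locale_order <- sort(files)
print(locale_order)

# If the two orders differ, code that assumes one file is loaded before
# another may break when files are collated in the session's order.
identical(c_order, locale_order)
```

A package whose files must be loaded in a particular order therefore depends on the 'C' ordering (or on an order it defines itself) being respected during installation.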


sessionInfo()

R version 3.4.3 Patched (2018-02-03 r74231)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Finnish_Finland.1252  LC_CTYPE=Finnish_Finland.1252
[3] LC_MONETARY=Finnish_Finland.1252 LC_NUMERIC=C
[5] LC_TIME=Finnish_Finland.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3

--
Mikko Korpela
Chief Expert, Valuations
National Land Survey of Finland
Opastinsilta 12 C, FI-00520 Helsinki, Finland
+358 50 462 6082
www.maanmittauslaitos.fi

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


