Re: [Rd] Package compiler - efficiency problem

2018-08-16 Thread Iñaki Ucar
> > [...] a way to disable compilation of these generated functions. There is not a
> > documented way to do that and maybe we could add it (and technically it is
> > trivial), but I have been reluctant so far - in some cases, compilation
> > even of these functions may be beneficial - if the speedup is 5-10x and we
> > run very many times. But once the generated code includes some pragma
> > preventing compilation, it will never be compiled. Also, the trade-offs may
> > change as the compiler evolves, perhaps not in this case, but in others
> > where such a pragma may be used.
> >
> > Well so the short answer would be that these functions should not be
> > generated in the first place. If it were too much work rewriting, perhaps
> > the generator could just be improved to produce vectorized operations.
> >
> > Best
> > Tomas
> > On 12.8.2018 21:31, Karol Podemski wrote:
> >
> >  Dear R team,
> >
> > I am a co-author and maintainer of one of R packages distributed by R-forge
> > (gEcon). One of the gEcon package users found a strange behaviour of the package
> > (R froze for a couple of minutes) and reported it to me. I traced the strange
> > behaviour to the compiler package. I attach a short demonstration of the problem
> > to this mail (the demonstration makes use of the compiler and tictoc packages only).
> >
> > In short, the compiler package has problems in compiling large functions -
> > their compilation and execution may take much longer than direct execution
> > of an uncompiled function. Such functions are generated by gEcon package as
> > they describe steady state for economy.
> >
> > I am curious if you are aware of such problems and plan to handle the
> > efficiency issues. On one of the boards I saw that there were efficiency
> > issues in the rpart package but they have been resolved. Or would you advise
> > turning off JIT on package load (the package heavily uses such long functions,
> > generated whenever a new model is created)?
> >
> > Best regards,
> > Karol Podemski
> >
> >
> >
> >
> >
> >
>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
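A minimal sketch of the workaround Karol asks about, disabling JIT when the package loads, assuming the trade-off really favours interpreted execution (the hook code and package context below are hypothetical, not from gEcon):

```r
# Sketch: turn JIT byte-compilation off while the package is loaded and
# restore the user's previous JIT level on unload (hypothetical hooks).
.jit_state <- new.env(parent = emptyenv())

.onLoad <- function(libname, pkgname) {
  # enableJIT() sets the level (0 = off, 3 = maximum) and returns the old one
  .jit_state$prev <- compiler::enableJIT(0)
}

.onUnload <- function(libpath) {
  compiler::enableJIT(.jit_state$prev)
}
```

Note that enableJIT acts on the whole session, not only on the package's generated functions, which is part of why a per-function mechanism is discussed above.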



-- 
Iñaki Ucar



[Rd] Consider setting RTLD_GLOBAL when loading packages in LinkingTo

2018-08-20 Thread Iñaki Ucar
Hi everyone,

Some of you probably received the following thread from the Rcpp-devel
mailing list:

http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2018-August/010072.html

Summing up, the issue described is the following: pkg1 provides type1
in pkg1.so building on some headers. pkg2 links to pkg1 (BTW,
LinkingTo is actually misleading, because it doesn't really link to
it), i.e., provides type1 in pkg2.so building on the same headers.
Now, pkg2 creates an external pointer to type1 using pkg1.so, and
dynamically casts it and manipulates it using functions in pkg2.so.

This works perfectly, because type1 is exactly the same in pkg1.so and
pkg2.so. *But* UBSAN sanitizers give a runtime error, which arguably
is a false positive. Real example on CRAN:

https://www.stats.ox.ac.uk/pub/bdr/memtests/gcc-UBSAN/ldat/ldat-Ex.Rout

A solution to this would be to dlopen pkg1.so with RTLD_GLOBAL,
instead of RTLD_LOCAL, i.e., dyn.load(local=FALSE). So my proposal is
to automatically set RTLD_GLOBAL for those packages that are listed at
the same time in Depends/Imports/Suggests and LinkingTo, at least for
those machines on CRAN running UBSAN tests.
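For reference, a sketch of how a package can opt into this behaviour today from its own load hook (the package name "pkg1" is taken from the example above; this is an illustration, not a change to R itself):

```r
# Hypothetical .onLoad for pkg1: load its shared object with
# local = FALSE, i.e. RTLD_GLOBAL where the platform supports it, so
# that pkg2.so resolves the same type symbols instead of private copies.
.onLoad <- function(libname, pkgname) {
  library.dynam("pkg1", pkgname, libname, local = FALSE)
}
```

library.dynam forwards the extra local = FALSE argument on to dyn.load.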

Regards,
-- 
Iñaki Ucar



Re: [Rd] compairing doubles

2018-08-31 Thread Iñaki Ucar
On Fri, 31 Aug 2018 at 15:10, Felix Ernst
() wrote:
>
> Dear all,
>
> I am a bit unsure whether this qualifies as a bug, but it is definitely a
> strange behaviour. That's why I wanted to discuss it.
>
> With the following function, I want to test for evenly spaced numbers,
> starting from anywhere.
>
> .is_continous_evenly_spaced <- function(n){
>   if(length(n) < 2) return(FALSE)
>   n <- n[order(n)]
>   n <- n - min(n)
>   step <- n[2] - n[1]
>   test <- seq(from = min(n), to = max(n), by = step)
>   if(length(n) == length(test) &&
>  all(n == test)){
> return(TRUE)
>   }
>   return(FALSE)
> }
>
> > .is_continous_evenly_spaced(c(1,2,3,4))
> [1] TRUE
> > .is_continous_evenly_spaced(c(1,3,4,5))
> [1] FALSE
> > .is_continous_evenly_spaced(c(1,1.1,1.2,1.3))
> [1] FALSE
>
> I expect the result for 1 and 2, but not for 3. Upon investigation it turns
> out that n == test is TRUE for every pair, but not for the pair at 0.2.
>
> The types reported are always double; however, n[2] == 0.1 reports FALSE as
> well.
>
> The whole problem is solved by switching from all(n == test) to
> all(as.character(n) == as.character(test)). However, that is weird, isn't it?
>
> Does this work as intended? Thanks for any help, advice and suggestions in
> advance.

I guess this has something to do with how the sequence is built and
the inherent error of floating point arithmetic. In fact, if you
return test minus n, you'll get:

[1] 0.00e+00 0.00e+00 2.220446e-16 0.00e+00

and the error gets bigger when you continue the sequence; e.g., this
is for c(1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7):

[1] 0.00e+00 0.00e+00 2.220446e-16 2.220446e-16 4.440892e-16
[6] 4.440892e-16 4.440892e-16 0.00e+00

So, independently of whether this is considered a bug or not, instead of

length(n) == length(test) && all(n == test)

I would use the following condition:

isTRUE(all.equal(n, test))
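A self-contained illustration of the difference (constructed for this reply, not Felix's exact data):

```r
# Exact comparison trips over accumulated floating point error, while
# all.equal() compares within a small numerical tolerance.
n    <- c(0, 0.1, 0.2, 0.3)    # typed-in decimals
test <- seq(0, 0.3, by = 0.1)  # computed as 0:3 * 0.1
all(n == test)                 # FALSE: 3 * 0.1 is not exactly 0.3 in binary
isTRUE(all.equal(n, test))     # TRUE: the difference is far below tolerance
```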

Iñaki

>
> Best regards,
> Felix
>



-- 
Iñaki Ucar



Re: [Rd] compairing doubles

2018-08-31 Thread Iñaki Ucar
FYI, more fun with floats:

> 0.1+0.1==0.2
[1] TRUE
> 0.1+0.1+0.1+0.1==0.4
[1] TRUE
> 0.1+0.1+0.1==0.3
[1] FALSE
> 0.1+0.1+0.1==0.1*3
[1] TRUE
> 0.3==0.1*3
[1] FALSE

¯\_(ツ)_/¯

But this is not R's fault. See: https://0.30000000000000004.com

Iñaki




-- 
Iñaki Ucar



Re: [Rd] compairing doubles

2018-08-31 Thread Iñaki Ucar
On Fri, 31 Aug 2018 at 16:00, Mark van der Loo
() wrote:
>
> how about
>
> is_evenly_spaced <- function(x,...) all.equal(diff(sort(x)),...)

This doesn't work, because

1. all.equal does *not* return FALSE. Use of isTRUE or identical(.,
TRUE) is required if you want a boolean.
2. all.equal compares two objects, not elements in a vector.
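A quick demonstration of point 1 (illustrative values):

```r
# all.equal() returns TRUE or a character description of the difference,
# never FALSE, so its result must be wrapped before use as a condition.
all.equal(1, 1.1)                # a string such as "Mean relative difference: 0.1"
isTRUE(all.equal(1, 1.1))        # FALSE
isTRUE(all.equal(1, 1 + 1e-10))  # TRUE: below the default tolerance
```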

Iñaki

>
> (use ellipsis to set tolerance if necessary)
>
>
> Op vr 31 aug. 2018 om 15:46 schreef Emil Bode :
>>
>> Agreed that it's rounding error, and all.equal would be the way to go.
>> I wouldn't call it a bug, it's simply part of working with floating point 
>> numbers, any language has the same issue.
>>
>> And while we're at it, I think the function can be a lot shorter:
>> .is_continous_evenly_spaced <- function(n){
>>   length(n) > 1 && isTRUE(all.equal(n[order(n)],
>>     seq(from = min(n), to = max(n), length.out = length(n))))
>> }
>>
>> Cheers, Emil
>>



-- 
Iñaki Ucar



Re: [Rd] compairing doubles

2018-08-31 Thread Iñaki Ucar
On Fri, 31 Aug 2018 at 17:08, Serguei Sokol
() wrote:
>
> On 31/08/2018 at 16:25, Mark van der Loo wrote:
> > Ah, my bad, you're right of course.
> >
> > sum(abs(diff(diff(sort(x))))) < eps
> >
> > for some reasonable eps then, would do as a oneliner, or
> >
> > all(abs(diff(diff(sort(x)))) < eps)
> >
> > or
> >
> > max(abs(diff(diff(sort(x))))) < eps
> Or with only four function calls:
> diff(range(diff(sort(x)))) < eps

We may have a winner... :)
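Wrapped into a runnable helper (a sketch built on Serguei's one-liner; the default tolerance is an assumption, not something agreed in the thread):

```r
# A vector is evenly spaced iff the gaps between sorted values are all
# equal within eps, i.e. the spread of the first differences is ~0.
is_evenly_spaced <- function(x, eps = sqrt(.Machine$double.eps)) {
  length(x) > 1 && diff(range(diff(sort(x)))) < eps
}

is_evenly_spaced(c(1, 1.1, 1.2, 1.3))  # TRUE, unlike the == based check
is_evenly_spaced(c(1, 3, 4, 5))        # FALSE
```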

Iñaki



Re: [Rd] True length - length(unclass(x)) - without having to call unclass()?

2018-09-05 Thread Iñaki Ucar
The bottom line here is that one can always call a base method,
inexpensively and without modifying the object, in, let's say,
*formal* OOP languages. In R, this is not possible in general. It
would be possible if there were always a foo.default, but primitives
use internal dispatch.

I was wondering whether it would be possible to provide a super(x, n)
function which simply causes the dispatching system to avoid "n"
classes in the hierarchy, so that:

> x <- structure(list(), class=c("foo", "bar"))
> length(super(x, 0)) # looks for a length.foo
> length(super(x, 1)) # looks for a length.bar
> length(super(x, 2)) # calls the default
> length(super(x, Inf)) # calls the default
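The proposed semantics can be emulated today, at the cost of exactly the copy the thread wants to avoid (super() below is hypothetical code, not an existing R function):

```r
# Emulate the proposed super(x, n): drop the first n classes so that
# dispatch starts further down the hierarchy. This copies the object,
# so it only illustrates the semantics, not the intended zero-copy cast.
super <- function(x, n) {
  cls <- class(x)
  n <- min(n, length(cls))
  if (n > 0) cls <- cls[-seq_len(n)]
  if (length(cls)) class(x) <- cls else x <- unclass(x)
  x
}

length.foo <- function(x) 99L
x <- structure(list(1, 2), class = c("foo", "bar"))
length(x)            # 99: internal dispatch finds length.foo
length(super(x, 2))  # 2: no classes left, default length of the list
```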

Iñaki

On Wed, 5 Sep 2018 at 10:09, Tomas Kalibera
() wrote:
>
> On 08/24/2018 07:55 PM, Henrik Bengtsson wrote:
> > Is there a low-level function that returns the length of an object 'x'
> > - the length that for instance .subset(x) and .subset2(x) see? An
> > obvious candidate would be to use:
> >
> > .length <- function(x) length(unclass(x))
> >
> > However, I'm concerned that calling unclass(x) may trigger an
> > expensive copy internally in some cases.  Is that concern unfounded?
> Unclass() will always copy when "x" is really a variable, because the
> value in "x" will be referenced; whether it is prohibitively expensive
> or not depends only on the workload - if "x" is a very long list and
> this function is called often then it could be, but at least to me this
> sounds unlikely. Unless you have a strong reason to believe it is the
> case, I would just use length(unclass(x)).
>
> If the copying is really a problem, I would think about why the
> underlying vector length is needed at R level - whether you really need
> to know the length without actually having the unclassed vector anyway
> for something else, so whether you are not paying for the copy anyway.
> Or, from the other end, if you need to do more without copying, and it
> is possible without breaking the value semantics, then you might need to
> switch to C anyway and for a bigger piece of code.
>
> If it were still just .length() you needed and it were performance
> critical, you could just switch to C and call Rf_length. That does not
> violate the semantics, just indeed it is not elegant as you are
> switching to C.
>
> If you stick to R and can live with the overhead of length(unclass(x))
> then there is a chance the overhead will decrease as R is optimized
> internally. This is possible in principle when the runtime knows that
> the unclassed vector is only needed to compute something that does not
> modify the vector. The current R cannot optimize this out, but it should
> be possible with ALTREP at some point (and as Radford mentioned pqR does
> it differently). Even with such internal optimizations indeed it is
> often necessary to make guesses about realistic workloads, so if you
> have a realistic workload where say length(unclass(x)) is critical, you
> are more than welcome to donate it as a benchmark.
>
> Obviously, if you use a C version calling Rf_length, after such R
> optimization your code would be unnecessarily non-elegant, but would
> still work and probably without overhead, because R can't do much less
> than Rf_length. In more complicated cases, though, hand-optimized C code
> to implement, say, 2 operations in sequence could be slower than what a
> better optimizing runtime could do by joining the effect of possibly
> more operations, which is in principle another danger of switching from
> R to C. But as far as the semantics is followed, there is no other danger.
>
> The temptation should be small anyway in this case when Rf_length()
> would be the simplest, but as I made it more than clear in the previous
> email, one should never violate the value semantics by temporarily
> modifying the object (temporarily removing the class attribute or
> temporarily remove the object bit). Violating semantics causes bugs, if
> not with the present then with future versions of R (where version may
> be an svn revision). A concrete recent example: modifying objects in
> place in violation of the semantics caused a lot of bugs with
> introduction of unification of constants in the byte-code compiler.
>
> Best
> Tomas
>
> >
> > Thxs,
> >
> > Henrik
> >



-- 
Iñaki Ucar



Re: [Rd] True length - length(unclass(x)) - without having to call unclass()?

2018-09-10 Thread Iñaki Ucar
On Mon, 10 Sep 2018 at 14:18, Tomas Kalibera
() wrote:
>
> On 09/05/2018 11:18 AM, Iñaki Ucar wrote:
> > The bottomline here is that one can always call a base method,
> > inexpensively and without modifying the object, in, let's say,
> > *formal* OOP languages. In R, this is not possible in general. It
> > would be possible if there was always a foo.default, but primitives
> > use internal dispatch.
> >
> > I was wondering whether it would be possible to provide a super(x, n)
> > function which simply causes the dispatching system to avoid "n"
> > classes in the hierarchy, so that:
> >
> >> x <- structure(list(), class=c("foo", "bar"))
> >> length(super(x, 0)) # looks for a length.foo
> >> length(super(x, 1)) # looks for a length.bar
> >> length(super(x, 2)) # calls the default
> >> length(super(x, Inf)) # calls the default
> I think that a cast should always be for a specific class, defined by
> the name of the class. Identifying classes by their inheritance index
> might be unnecessarily brittle - it would break if someone introduced a
> new ancestor class.

Agreed. But I just wanted to point out that, then, something like
super(x, "default") should always work to point to default methods,
even if a method is internal and there's no foo.default defined.
Otherwise, we would have the same problem.

Iñaki

> Apart from the syntax - supporting fast casts for S3
> dispatch in the current implementation would be quite a bit of work,
> probably not worth it, also it would probably slow down the internal
> dispatch in primitives. But a partial solution could be implemented at
> some point with ALTREP wrappers when one could without copying create a
> wrapper object with a modified class attribute.
>
> Tomas
> > Iñaki
> >

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Iñaki Ucar
On Wed, 19 Sep 2018 at 14:43, Duncan Murdoch
() wrote:
>
> On 18/09/2018 5:46 PM, Carl Boettiger wrote:
> > Dear list,
> >
> > It looks to me that R samples random integers using an intuitive but biased
> > algorithm by going from a random number on [0,1) from the PRNG to a random
> > integer, e.g.
> > https://github.com/wch/r-source/blob/tags/R-3-5-1/src/main/RNG.c#L808
> >
> > Many other languages use various rejection sampling approaches which
> > provide an unbiased method for sampling, such as in Go, python, and others
> > described here:  https://arxiv.org/abs/1805.10941 (I believe the biased
> > algorithm currently used in R is also described there).  I'm not an expert
> > in this area, but does it make sense for R to adopt one of the unbiased
> > random sampling algorithms outlined there and used in other languages?  Would
> > a patch providing such an algorithm be welcome? What concerns would need to
> > be addressed first?
> >
> > I believe this issue was also raised by Kellie & Philip in
> > http://r.789695.n4.nabble.com/Bug-in-sample-td4729483.html, and more
> > recently in
> > https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf,
> > pointing to the python implementation for comparison:
> > https://github.com/statlab/cryptorandom/blob/master/cryptorandom/cryptorandom.py#L265
>
> I think the analyses are correct, but I doubt if a change to the default
> is likely to be accepted as it would make it more difficult to reproduce
> older results.
>
> On the other hand, a contribution of a new function like sample() but
> not suffering from the bias would be good.  The normal way to make such
> a contribution is in a user contributed package.
>
> By the way, R code illustrating the bias is probably not very hard to
> put together.  I believe the bias manifests itself in sample() producing
> values with two different probabilities (instead of all equal
> probabilities).  Those may differ by as much as one part in 2^32.  It's

According to Kellie and Philip, in the attachment of the thread
referenced by Carl, "The maximum ratio of selection probabilities can
get as large as 1.5 if n is just below 2^31".
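For readers following along, a sketch of the rejection approach those references describe (illustrative R only; a real implementation would consume raw 32- or 64-bit generator words, which runif() merely stands in for here):

```r
# Unbiased sampling on 1..n by rejection: draws falling in the "uneven
# tail" of the 2^bits range are discarded, so every value in 1..n keeps
# exactly the same probability.
sample_one_unbiased <- function(n, bits = 32) {
  m <- 2^bits
  limit <- m - (m %% n)  # largest multiple of n not exceeding 2^bits
  repeat {
    r <- floor(runif(1) * m)  # stand-in for one raw generator word
    if (r < limit) return(1 + (r %% n))
  }
}

set.seed(42)
draws <- replicate(1000, sample_one_unbiased(6))
all(draws %in% 1:6)  # TRUE
```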

Iñaki

> very difficult to detect a probability difference that small, but if you
> define the partition of values into the high probability values vs the
> low probability values, you can probably detect the difference in a
> feasible simulation.
>
> Duncan Murdoch
>



Re: [Rd] Suggestion: Make CRAN source URLs immutable

2018-11-02 Thread Iñaki Ucar
On Wed, 24 Oct 2018 at 11:40, Kurt Hornik  wrote:
>
> >>>>> Kurt Wheeler writes:
>
> Try e.g.
>
> https://cran.r-project.org/package=httr&version=1.3.1
> https://cran.r-project.org/package=httr&version=1.3.0

This is a nice feature that I didn't know. I recently proposed
enforcing this scheme in Fedora's packaging guidelines, because in
this way, a SPEC would build correctly even if the package was updated
and the old version was archived (is this guaranteed to continue to
work in future? I assumed so...).

There is an odd thing about this format though, and that is the
absence of a file extension. This is a redirection, yes, but the
spectool can't trust the filename that is sent by the remote server,
and uses only the filename extracted from the URL.

Without extension, RPM doesn't know how to unpack the sources. So we
have to do the following similarly odd trick (note the "#"):

Source0: 
https://cran.r-project.org/package=%{packname}&version=%{version}#/%{packname}_%{version}.tar.gz

Did you consider this problem? Is there any alternate immutable URL
*with* extension? If not, is there any potential issue with the trick
above?

Regards,
--
Iñaki Ucar



Re: [Rd] Suggestion: Make CRAN source URLs immutable

2018-11-03 Thread Iñaki Ucar
On Sat, 3 Nov 2018 at 11:54, Joris Meys  wrote:
>
> FWIW, you can get the URL and extract the link with extension from there. 
> Archived packages are always tarballs, so that makes the following possible:
>
> url <- "https://cran.r-project.org/package=httr&version=1.3.0";
>
> library(RCurl)
>
> pkgurl <- gsub(".*(https://cran.+\\.tar.gz).*",
>"\\1",
>getURL(url))
>
> install.packages(pkgurl, type = "source", repos = NULL)

The proper way to do this would be to fetch just the headers and
extract the location field:

$ URL="https://cran.r-project.org/package=httr&version=1.3.0";
$ curl -sI $URL | awk '/^Location:/{print $2}'
https://cran.r-project.org/src/contrib/Archive/httr/httr_1.3.0.tar.gz

But that's not the point. I'm talking about RPM packaging, and the
point is, as I said, that the tool that expands and downloads sources
from specfiles simply doesn't do that, because it can't trust an
extension sent from a remote server. We need to provide it explicitly
in the URL.

--
Iñaki Úcar



Re: [Rd] Bug report: Function ppois(0:20, lambda=0.9) does not generate a non-decreasing result.

2018-12-04 Thread Iñaki Ucar
On Tue, 4 Dec 2018 at 11:12,  wrote:
>
> The function ppois calculates the CDF of the Poisson distribution, so it
> should generate a non-decreasing result, but what I got is:
>
> > any(diff(ppois(0:19,lambda=0.9))<0)
> [1] TRUE
>
> Actually,
>
> > ppois(18,lambda=0.9) > ppois(19,lambda=0.9)
> [1] TRUE
>
> which should never be TRUE.

This is just another manifestation of

0.1 * 3 > 0.3
#> [1] TRUE

This discussion returns to this list from time to time. TL;DR: this is
not an R issue, but an unavoidable floating point issue. Solution:
work with log-probabilities instead.

any(diff(ppois(0:40, lambda=0.9, log.p=TRUE))<0)
#> [1] FALSE
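If downstream code nevertheless needs a non-decreasing probability vector on the natural scale, one workaround (an illustration added here, not part of the original exchange) is to monotonize with cummax:

```r
# Force monotonicity after the fact; any adjustment is at most a few ulps.
p <- cummax(ppois(0:19, lambda = 0.9))
any(diff(p) < 0)  # FALSE by construction
```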

Iñaki



[Rd] Compiler + stopifnot bug

2019-01-03 Thread Iñaki Ucar
Hi,

I found the following issue in r-devel (2019-01-02 r75945):

`foo<-` <- function(x, value) {
  bar(x) <- value * x
  x
}

`bar<-` <- function(x, value) {
  stopifnot(all(value / x == 1))
  x + value
}

`foo<-` <- compiler::cmpfun(`foo<-`)
`bar<-` <- compiler::cmpfun(`bar<-`)

x <- c(2, 2)
foo(x) <- 1
x # should be c(4, 4)
#> [1] 3 3

If the functions are not compiled or the stopifnot call is removed,
the snippet works correctly. So it seems that something is messing
around with the references to "value" when the call to stopifnot gets
compiled, and the wrong "value" is modified. Note also that if "x <-
2", then the result is correct, 4.

Regards,
-- 
Iñaki Úcar



Re: [Rd] Compiler + stopifnot bug

2019-01-04 Thread Iñaki Ucar
I confirm it is fixed in r75946. Thanks.

Iñaki

On Fri, 4 Jan 2019 at 09:27, Tierney, Luke  wrote:
>
> Should be fixed in r75946.
>
> Best,
>
> luke
>
> On Fri, 4 Jan 2019, Tierney, Luke wrote:
>
> > Thanks for the reports. Will look into it soon and report back.
> >
> > Luke
> >
> > Sent from my iPhone
> >
> >> On Jan 3, 2019, at 2:15 PM, Martin Morgan  wrote:
> >>
> >> For what it's worth this also introduced
> >>
> >>> df = data.frame(v = package_version("1.2"))
> >>> rbind(df, df)$v
> >> [[1]]
> >> [1] 1 2
> >>
> >> [[2]]
> >> [1] 1 2
> >>
> >> instead of
> >>
> >>> rbind(df, df)$v
> >>[1] '1.2' '1.2'
> >>
> >> which shows up in Travis builds of Bioconductor packages
> >>
> >>  https://stat.ethz.ch/pipermail/bioc-devel/2019-January/014506.html
> >>
> >> and elsewhere
> >>
> >> Martin Morgan
> >>
> >> On 1/3/19, 7:05 PM, "R-devel on behalf of Duncan Murdoch" 
> >>  
> >> wrote:
> >>
> >>>On 03/01/2019 3:37 p.m., Duncan Murdoch wrote:
> >>> I see this too; by bisection, it seems to have first appeared in r72943.
> >>
> >>Sorry, that was a typo.  I meant r75943.
> >>
> >>Duncan Murdoch
> >>
> >>>
> >>> Duncan Murdoch
> >>>
> >>>> On 03/01/2019 2:18 p.m., Iñaki Ucar wrote:
> >>>> Hi,
> >>>>
> >>>> I found the following issue in r-devel (2019-01-02 r75945):
> >>>>
> >>>> `foo<-` <- function(x, value) {
> >>>>bar(x) <- value * x
> >>>>x
> >>>> }
> >>>>
> >>>> `bar<-` <- function(x, value) {
> >>>>stopifnot(all(value / x == 1))
> >>>>x + value
> >>>> }
> >>>>
> >>>> `foo<-` <- compiler::cmpfun(`foo<-`)
> >>>> `bar<-` <- compiler::cmpfun(`bar<-`)
> >>>>
> >>>> x <- c(2, 2)
> >>>> foo(x) <- 1
> >>>> x # should be c(4, 4)
> >>>> #> [1] 3 3
> >>>>
> >>>> If the functions are not compiled or the stopifnot call is removed,
> >>>> the snippet works correctly. So it seems that something is messing
> >>>> around with the references to "value" when the call to stopifnot gets
> >>>> compiled, and the wrong "value" is modified. Note also that if "x <-
> >>>> 2", then the result is correct, 4.
> >>>>
> >>>> Regards,
> >>>>
> >>>
> >>
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa  Phone: 319-335-3386
> Department of Statistics and   Fax:   319-335-3017
> Actuarial Science
> 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu



-- 
Iñaki Úcar



Re: [Rd] Runnable R packages

2019-01-07 Thread Iñaki Ucar
On Mon, 7 Jan 2019 at 22:09, Gergely Daróczi  wrote:
>
> Dear David, sharing some related (subjective) thoughts below.
>
> You can provide your app as a Docker image, so that the end-user
> simply calls a "docker pull" and then "docker run" -- that can be done
> from a user-friendly script as well.
> Of course, this requires Docker to be installed, but if that's a
> problem, probably better to "ship" the app as a web application and
> share a URL with the user, eg backed by shinyproxy.io

If Docker is a problem, you can also try podman: same usage,
compatible with Dockerfiles and daemon-less, no admin rights required.

https://podman.io/

Iñaki



Re: [Rd] Error: corrupted double-linked list

2019-01-14 Thread Iñaki Ucar
On Mon, 14 Jan 2019 at 10:58, Glen MacLachlan  wrote:
>
> Hello,
>
> Not sure if this is the right list or if this is a gdal/sf issue so I
> apologize but recently I've been seeing errors that crash R/3.5.1 and throw
> a double-linked list error (see below). Has anyone else come across this
> issue and if so is there a fix?

R-sig-geo is probably a better place for this. Also, a reproducible
example and/or the output from valgrind would be helpful.

Iñaki

>
>
>
> > rwhole <- st_transform(rwhole,st_crs(ele.map))
> *** Error in `/usr/local/lib64/R/bin/exec/R': corrupted double-linked list:
> 0x82787a00 ***
> === Backtrace: =
> /lib64/libc.so.6(+0x80aef)[0x7f774ceceaef]
> /lib64/libc.so.6(+0x8137e)[0x7f774cecf37e]
> /usr/local/lib/libproj.so.13(pj_dealloc+0xe)[0x7f773f13ddee]
> /usr/local/lib/libgdal.so.20(_ZN10OGRProj4CTD1Ev+0x77)[0x7f773f75a2f7]
> /usr/local/lib/libgdal.so.20(_ZN10OGRProj4CTD0Ev+0x9)[0x7f773f75a3f9]
> /usr/local/lib64/R
> /library/sf/libs/sf.so(_Z13CPL_transformN4Rcpp6VectorILi19ENS_15PreserveStorageEEENS0_ILi16ES1_EE+0x1e2)[0x7f7738566582]
> /usr/local/lib64/R
> /library/sf/libs/sf.so(_sf_CPL_transform+0x72)[0x7f7738556472]
> /usr/local/lib64/R/lib/libR.so(+0xf58ed)[0x7f774d97f8ed]
> /usr/local/lib64/R/lib/libR.so(+0x131b36)[0x7f774d9bbb36]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x348)[0x7f774d9caf78]
> /usr/local/lib64/R/lib/libR.so(+0x14256b)[0x7f774d9cc56b]
> /usr/local/lib64/R/lib/libR.so(+0x136e17)[0x7f774d9c0e17]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x348)[0x7f774d9caf78]
> /usr/local/lib64/R/lib/libR.so(+0x1416c6)[0x7f774d9cb6c6]
> /usr/local/lib64/R/lib/libR.so(+0x13d89c)[0x7f774d9c789c]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x348)[0x7f774d9caf78]
> /usr/local/lib64/R/lib/libR.so(+0x14256b)[0x7f774d9cc56b]
> /usr/local/lib64/R/lib/libR.so(+0x136e17)[0x7f774d9c0e17]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x348)[0x7f774d9caf78]
> /usr/local/lib64/R/lib/libR.so(+0x14256b)[0x7f774d9cc56b]
> /usr/local/lib64/R/lib/libR.so(+0x17f7b4)[0x7f774da097b4]
> /usr/local/lib64/R/lib/libR.so(+0x17fba7)[0x7f774da09ba7]
> /usr/local/lib64/R/lib/libR.so(+0x17ff93)[0x7f774da09f93]
> /usr/local/lib64/R/lib/libR.so(+0x130109)[0x7f774d9ba109]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x348)[0x7f774d9caf78]
> /usr/local/lib64/R/lib/libR.so(+0x14256b)[0x7f774d9cc56b]
> /usr/local/lib64/R/lib/libR.so(+0x136e17)[0x7f774d9c0e17]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x348)[0x7f774d9caf78]
> /usr/local/lib64/R/lib/libR.so(+0x14256b)[0x7f774d9cc56b]
> /usr/local/lib64/R/lib/libR.so(+0x17f7b4)[0x7f774da097b4]
> /usr/local/lib64/R/lib/libR.so(+0x17fc28)[0x7f774da09c28]
> /usr/local/lib64/R/lib/libR.so(+0x17ff93)[0x7f774da09f93]
> /usr/local/lib64/R/lib/libR.so(+0x130109)[0x7f774d9ba109]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x348)[0x7f774d9caf78]
> /usr/local/lib64/R/lib/libR.so(+0x14256b)[0x7f774d9cc56b]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x2fc)[0x7f774d9caf2c]
> /usr/local/lib64/R/lib/libR.so(+0x144e76)[0x7f774d9cee76]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x55d)[0x7f774d9cb18d]
> /usr/local/lib64/R/lib/libR.so(+0x14655b)[0x7f774d9d055b]
> /usr/local/lib64/R/lib/libR.so(+0x131b36)[0x7f774d9bbb36]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x348)[0x7f774d9caf78]
> /usr/local/lib64/R/lib/libR.so(+0x14256b)[0x7f774d9cc56b]
> /usr/local/lib64/R/lib/libR.so(+0x136e17)[0x7f774d9c0e17]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x348)[0x7f774d9caf78]
> /usr/local/lib64/R/lib/libR.so(+0x1416c6)[0x7f774d9cb6c6]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x800)[0x7f774d9cb430]
> /usr/local/lib64/R/lib/libR.so(+0x146841)[0x7f774d9d0841]
> /usr/local/lib64/R/lib/libR.so(+0x17d7f9)[0x7f774da077f9]
> /usr/local/lib64/R/lib/libR.so(+0x130109)[0x7f774d9ba109]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x348)[0x7f774d9caf78]
> /usr/local/lib64/R/lib/libR.so(+0x14256b)[0x7f774d9cc56b]
> /usr/local/lib64/R/lib/libR.so(+0x136e17)[0x7f774d9c0e17]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x348)[0x7f774d9caf78]
> /usr/local/lib64/R/lib/libR.so(+0x14256b)[0x7f774d9cc56b]
> /usr/local/lib64/R/lib/libR.so(Rf_eval+0x2fc)[0x7f774d9caf2c]
> /usr/local/lib64/R/lib/libR.so(Rf_ReplIteration+0x232)[0x7f774d9fada2]
> /usr/local/lib64/R/lib/libR.so(+0x171191)[0x7f774d9fb191]
> /usr/local/lib64/R/lib/libR.so(run_Rmainloop+0x4f)[0x7f774d9fb22f]
> /usr/local/lib64/R/bin/exec/R(main+0x1b)[0x40075b]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f774ce70445]
> /usr/local/lib64/R/bin/exec/R[0x40078b]
> === Memory map: 
> 0040-00401000 r-xp  07:01 70665
> /usr/local/lib64/R/bin/exec/R
> 0060-00601000 r--p  07:01 70665
> /usr/local/lib64/R/bin/exec/R
> 00601000-00602000 rw-p 1000 07:01 70665
> /usr/local/lib64/R/bin/exec/R
> 019c6000-a7898000 rw-p  00:00 0
> [heap]
> 7f773853-7f77385ae000 r-xp  07:01 68
>  /usr/local/lib64/R/library/sf/libs/sf.so
> 7f77385ae000-7f77387ad000 ---p 0007e000 07:01 68
>  /usr/local/lib64/R/library/sf/libs/sf.so
> 7f77387ad000-7f7

[Rd] Encoding issues

2019-02-18 Thread Iñaki Ucar
Hi,

We found a (to our eyes) strange behaviour that might be a bug. First
a little bit of context. The 'units' package allows us to set the unit
using both SE or NSE. E.g., these both work in the same way:

units::set_units(1:10, "μm")
#> Units: [μm]
#> [1]  1  2  3  4  5  6  7  8  9 10

units::set_units(1:10, μm)
#> Units: [μm]
#> [1]  1  2  3  4  5  6  7  8  9 10

That's micrometers, and works fine if the session charset is UTF-8.
Now the funny part comes with Windows. The first version, with quotes,
works fine, but the second one fails. This is easy to demonstrate from
Linux:

LC_CTYPE=en_US.iso88591 Rscript -e 'units::set_units(1:10, "μm")'
#> Units: [μm]
#> [1]  1  2  3  4  5  6  7  8  9 10

LC_CTYPE=en_US.iso88591 Rscript -e 'units::set_units(1:10, μm)'
#> Error: unexpected input in "units::set_units(1:10, μ"
#> Execution halted

However, if you use the first version, with quotes, in an example, and
the package is checked on Windows, it fails too (see
https://ci.appveyor.com/project/edzer/units/builds/22440023#L747). The
package declares UTF-8 encoding, so none of these errors should, in
principle, happen. Am I wrong?

Thanks in advance, regards,
Iñaki



Re: [Rd] Encoding issues

2019-02-18 Thread Iñaki Ucar
On Mon, 18 Feb 2019 at 17:27, Gábor Csárdi  wrote:
>
> From "Writing R Extensions":
>
> "Only ASCII characters (and the control characters tab, formfeed, LF
> and CR) should be used in code files."
>
> So I am afraid you cannot use μm.

Thanks, Gábor, I missed that bit. Then, is an .Rd file considered a
"code file"? Our surprise comes from the fact that the quoted version
works fine in a test file, but not in an example. Anyway, if such
characters cause this kind of documented trouble, the safest option
seems to be to avoid them in the first place.
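For reference (not from the thread): "Writing R Extensions" allows non-ASCII characters in character strings via \uxxxx escapes, which keep the source file itself ASCII-only. A minimal sketch, assuming the 'units' package is installed for the commented-out line:

```r
# ASCII-only source: build the UTF-8 string "μm" from a unicode escape.
# U+03BC is GREEK SMALL LETTER MU.
mu_m <- "\u03bcm"
print(mu_m)  # prints the micrometre symbol in a UTF-8 locale

# Hypothetical usage with the quoted form from the thread:
# units::set_units(1:10, "\u03bcm")
```

Since the escape is resolved by the parser at run time, the file passes the ASCII-only check regardless of the session charset.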

Iñaki



Re: [Rd] Improved Data Aggregation and Summary Statistics in R

2019-02-27 Thread Iñaki Ucar
On Wed, 27 Feb 2019 at 09:02, Sebastian Martin Krantz
 wrote:
>
> Dear Developers,
>
> Having spent time developing and thinking about how data aggregation and
> summary statistics can be enhanced in R, I would like to present my
> ideas/efforts in the form of two commands:
>
> The first, which for now I called 'collap', is an upgrade of aggregate that
> accommodates and extends the functionality of aggregate in various
> respects, most importantly to work with multilevel and multi-type data,
> multiple function calls, highly customized aggregation tasks, a much
> greater flexibility in the passing of inputs and tidy output.
>
> The second function, 'qsu', is an advanced and flexible summary command for
> cross-sectional and multilevel (panel) data (i.e. it can provide overall,
> between and within entities statistics, and allows for grouping, custom
> functions and transformations). It also provides a quick method to compute
> and output within-transformed data.
>
> Both commands are efficiently built from core R, but provide for optional
> integration with data.table, which renders them extremely fast on large
> datasets. An explanation of the syntax, a demonstration and benchmark
> results are provided in the attached vignette.

Looks interesting. Sorry if it's there and I didn't find it: is there
any package implementing these functions so that we can try them?

Iñaki



Re: [Rd] Intermittent crashes with inset `[<-` command

2019-02-27 Thread Iñaki Ucar
On Wed, 27 Feb 2019 at 09:51, Serguei Sokol  wrote:
>
> On 26/02/2019 05:18, Brian Montgomery via R-devel wrote:
> > The following code crashes after about 300 iterations on my 
> > x86_64-w64-mingw32 machine on R 3.5.2 --vanilla.
> > Others have duplicated this (see 
> > https://github.com/tidyverse/magrittr/issues/190 if necessary), but I don't 
> > know how machine/OS-dependent it may be.
> It crashes too on my Mageia6 (RPM based Linux distribution):
>   184 185 186 187
>   *** caught segfault ***
> address 0x70002, cause 'memory not mapped'

I can reproduce it too. This is the output from valgrind (sessionInfo below):

==3296== Invalid read of size 1
==3296==at 0x4A2D7F7: UnknownInlinedFun (Rinlinedfuns.h:542)
==3296==by 0x4A2D7F7: VectorAssign (subassign.c:658)
==3296==by 0x4A30540: do_subassign_dflt (subassign.c:1641)
==3296==by 0x4A338F2: do_subassign (subassign.c:1571)
==3296==by 0x49769A1: bcEval (eval.c:6795)
==3296==by 0x498B415: R_compileAndExecute (eval.c:1407)
==3296==by 0x498B985: do_for (eval.c:2185)
==3296==by 0x49848A8: Rf_eval (eval.c:691)
==3296==by 0x49B5131: Rf_ReplIteration (main.c:258)
==3296==by 0x49B5131: Rf_ReplIteration (main.c:198)
==3296==by 0x49B54F0: R_ReplConsole (main.c:308)
==3296==by 0x49B55AF: run_Rmainloop (main.c:1082)
==3296==by 0x1090AE: main (Rmain.c:29)
==3296==  Address 0x1dafab90 is 0 bytes inside a block of size 160,048 free'd
==3296==at 0x4839A0C: free (vg_replace_malloc.c:540)
==3296==by 0x49BCA56: ReleaseLargeFreeVectors (memory.c:1055)
==3296==by 0x49BCA56: RunGenCollect (memory.c:1825)
==3296==by 0x49BCA56: R_gc_internal (memory.c:2998)
==3296==by 0x49BCA56: R_gc_internal (memory.c:2964)
==3296==by 0x49BFB2C: Rf_allocVector3 (memory.c:2682)
==3296==by 0x49C09FC: UnknownInlinedFun (Rinlinedfuns.h:577)
==3296==by 0x49C09FC: R_alloc (memory.c:2197)
==3296==by 0x4A377F5: logicalSubscript (subscript.c:575)
==3296==by 0x4A377F5: logicalSubscript (subscript.c:503)
==3296==by 0x4A3A8D3: Rf_makeSubscript (subscript.c:994)
==3296==by 0x4A2D63D: VectorAssign (subassign.c:656)
==3296==by 0x4A30540: do_subassign_dflt (subassign.c:1641)
==3296==by 0x4A338F2: do_subassign (subassign.c:1571)
==3296==by 0x49769A1: bcEval (eval.c:6795)
==3296==by 0x498B415: R_compileAndExecute (eval.c:1407)
==3296==by 0x498B985: do_for (eval.c:2185)
==3296==  Block was alloc'd at
==3296==at 0x483880B: malloc (vg_replace_malloc.c:309)
==3296==by 0x49C0031: Rf_allocVector3 (memory.c:2713)
==3296==by 0x4A3B041: UnknownInlinedFun (Rinlinedfuns.h:577)
==3296==by 0x4A3B041: Rf_ExtractSubset (subset.c:115)
==3296==by 0x4A3DA8A: VectorSubset (subset.c:198)
==3296==by 0x4A3DA8A: do_subset_dflt (subset.c:823)
==3296==by 0x4A3FCAA: do_subset (subset.c:661)
==3296==by 0x49848A8: Rf_eval (eval.c:691)
==3296==by 0x4989600: Rf_evalListKeepMissing (eval.c:2955)
==3296==by 0x4A3390B: R_DispatchOrEvalSP (subassign.c:1535)
==3296==by 0x4A3390B: do_subassign (subassign.c:1567)
==3296==by 0x49769A1: bcEval (eval.c:6795)
==3296==by 0x498B415: R_compileAndExecute (eval.c:1407)
==3296==by 0x498B985: do_for (eval.c:2185)
==3296==by 0x49848A8: Rf_eval (eval.c:691)
==3296==
==3296== Invalid read of size 1
==3296==at 0x4A2E2C0: UnknownInlinedFun (Rinlinedfuns.h:189)
==3296==by 0x4A2E2C0: UnknownInlinedFun (Rinlinedfuns.h:554)
==3296==by 0x4A2E2C0: VectorAssign (subassign.c:658)
==3296==by 0x4A30540: do_subassign_dflt (subassign.c:1641)
==3296==by 0x4A338F2: do_subassign (subassign.c:1571)
==3296==by 0x49769A1: bcEval (eval.c:6795)
==3296==by 0x498B415: R_compileAndExecute (eval.c:1407)
==3296==by 0x498B985: do_for (eval.c:2185)
==3296==by 0x49848A8: Rf_eval (eval.c:691)
==3296==by 0x49B5131: Rf_ReplIteration (main.c:258)
==3296==by 0x49B5131: Rf_ReplIteration (main.c:198)
==3296==by 0x49B54F0: R_ReplConsole (main.c:308)
==3296==by 0x49B55AF: run_Rmainloop (main.c:1082)
==3296==by 0x1090AE: main (Rmain.c:29)
==3296==  Address 0x1dafab90 is 0 bytes inside a block of size 160,048 free'd
==3296==at 0x4839A0C: free (vg_replace_malloc.c:540)
==3296==by 0x49BCA56: ReleaseLargeFreeVectors (memory.c:1055)
==3296==by 0x49BCA56: RunGenCollect (memory.c:1825)
==3296==by 0x49BCA56: R_gc_internal (memory.c:2998)
==3296==by 0x49BCA56: R_gc_internal (memory.c:2964)
==3296==by 0x49BFB2C: Rf_allocVector3 (memory.c:2682)
==3296==by 0x49C09FC: UnknownInlinedFun (Rinlinedfuns.h:577)
==3296==by 0x49C09FC: R_alloc (memory.c:2197)
==3296==by 0x4A377F5: logicalSubscript (subscript.c:575)
==3296==by 0x4A377F5: logicalSubscript (subscript.c:503)
==3296==by 0x4A3A8D3: Rf_makeSubscript (subscript.c:994)
==3296==by 0x4A2D63D: VectorAssign (subassign.c:656)
==3296==by 0x4A30540: do_subassign_dflt (subassign.c:1641)
==3296==by 0x4A338F2: do_subassign (subassi

Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2019-04-12 Thread Iñaki Ucar
On Thu, 11 Apr 2019 at 22:07, Henrik Bengtsson
 wrote:
>
> ISSUE:
> Using *forks* for parallel processing in R is not always safe.
> [...]
> Comments?

Using fork() is never safe. The reference provided by Kevin [1] is
pretty compelling (I kindly encourage anyone who ever forked a process
to read it). Therefore, I'd go beyond Henrik's suggestion, and I'd
advocate for deprecating fork clusters and eventually removing them
from parallel.

[1] 
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf

-- 
Iñaki Úcar



Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2019-04-12 Thread Iñaki Ucar
On Fri, 12 Apr 2019 at 21:32, Travers Ching  wrote:
>
> Just throwing my two cents in:
>
> I think removing/deprecating fork would be a bad idea for two reasons:
>
> 1) There are no performant alternatives

"Performant"... in terms of what? If the cost of copying the data
dominates the computation time, maybe you didn't need
parallelization in the first place.

> 2) Removing fork would break existing workflows

I don't see why mclapply could not be rewritten using PSOCK clusters.
And as a side effect, this would enable those workflows on Windows,
which doesn't support fork.
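To illustrate the point, a minimal sketch (my own, not a drop-in replacement: it ignores mclapply's scheduling, seeding and error-handling options) of an mclapply-like interface on top of a PSOCK cluster:

```r
library(parallel)

# mclapply-like wrapper over a PSOCK cluster: inputs are serialized and
# shipped to fresh worker processes, so it also runs on Windows, at the
# cost of copying the data instead of sharing it copy-on-write.
psock_lapply <- function(X, FUN, ..., mc.cores = 2L) {
  cl <- makePSOCKcluster(mc.cores)
  on.exit(stopCluster(cl))      # always clean up the workers
  parLapply(cl, X, FUN, ...)
}

psock_lapply(1:4, function(i) i^2)
```

The explicit copy is exactly what makes the behaviour consistent: it either fits in memory or fails up front, independently of what the function does with its input.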

> Even if replaced with something using the same interface (e.g., a
> function that automatically detects variables to export as in the
> amazing `future` package), the lack of copy-on-write functionality
> would cause scripts everywhere to break.

To implement copy-on-write, Linux overcommits virtual memory, and this
is what causes scripts to break unexpectedly: everything works fine,
until you change a small unimportant bit and... boom, out of memory.
And in general, running forks in any GUI would cause things everywhere
to break.

> A simple example illustrating these two points:
> `x <- 5e8; mclapply(1:24, sum, x, 8)`
>
> Using fork, `mclapply` takes 5 seconds.  Using "psock", `clusterApply`
> does not complete.

I'm not sure how you set that up, but it does complete. Or do you
mean that you ran out of memory? Then try replacing "x" with, e.g.,
"x+1" in your mclapply example and see what happens (hint: save your
work first).

--
Iñaki Úcar



Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2019-04-13 Thread Iñaki Ucar
On Sat, 13 Apr 2019 at 03:51, Kevin Ushey  wrote:
>
> I think it's worth saying that mclapply() works as documented

Mostly, yes. But it says nothing about fork's copy-on-write and
memory overcommitment, nor that this means it may work nicely or fail
spectacularly depending on whether, e.g., you operate on a long
vector.

-- 
Iñaki Úcar



Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2019-04-13 Thread Iñaki Ucar
On Sat, 13 Apr 2019 at 18:41, Simon Urbanek  wrote:
>
> Sure, but that's a completely bogus argument because in that case it would fail 
> even more spectacularly with any other method like PSOCK because you would 
> *have to* allocate n times as much memory so unlike mclapply it is guaranteed 
> to fail. With mclapply it is simply much more efficient as it will share 
> memory as long as possible. It is rather obvious that any new objects you 
> create can no longer be shared as they now exist separately in each process.

The point was that PSOCK fails and succeeds *consistently*,
independently of what you do with the input in the function provided.
I think that's a good property.

-- 
Iñaki Úcar



Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2019-04-15 Thread Iñaki Ucar
On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera  wrote:
>
> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
> > On Sat, 13 Apr 2019 at 03:51, Kevin Ushey  wrote:
> >> I think it's worth saying that mclapply() works as documented
> > Mostly, yes. But it says nothing about fork's copy-on-write and memory
> > overcommitment, and that this means that it may work nicely or fail
> > spectacularly depending on whether, e.g., you operate on a long
> > vector.
>
> R cannot possibly replicate documentation of the underlying operating
> systems. It clearly says that fork() is used and readers who may not
> know what fork() is need to learn it from external sources.
> Copy-on-write is an elementary property of fork().

Just to be precise, copy-on-write is an optimization widely deployed
in most modern *nixes, particularly for the architectures in which R
usually runs. But it is not an elementary property; it is not even
possible without an MMU.

-- 
Iñaki Úcar



Re: [Rd] openblas

2019-05-08 Thread Iñaki Ucar
On Wed, 8 May 2019 at 04:52, Peter Langfelder
 wrote:
>
> (CCing the R-devel list, maybe someone will have a better answer.)
>
> To be honest, I don't know how to. I wasn't able to configure R to use
> OpenBLAS using the configure script and options on my Linux Fedora system.
> I configure it without external BLAS, then replace the libRblas.dylib (.so
> in my case) with a link to the OpenBLAS dynamic link library.

R on Fedora uses openblas by default since Fedora 23. In fact, there's
a specific package, openblas-Rblas, that provides libRblas.so.

$ ll /usr/lib64/R/lib/
total 64544
-rwxr-xr-x. 1 root root 60113776 feb 28 13:37 libRblas.so
-rwxr-xr-x. 1 root root  1961880 mar 11 20:37 libRlapack.so
-rwxr-xr-x. 1 root root   182304 mar 11 20:37 libRrefblas.so
-rwxr-xr-x. 1 root root  3828104 mar 11 20:37 libR.so

The R reference BLAS is installed as libRrefblas.so.
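Not mentioned in the thread, but recent versions of R can report which BLAS/LAPACK shared libraries are actually loaded, which is handy to verify a setup like this one (availability of these helpers depends on the R version):

```r
# Path of the BLAS shared library in use (recent R versions):
extSoftVersion()["BLAS"]   # e.g. ".../lib64/R/lib/libRblas.so"

# Path of the LAPACK library in use (available since R 3.5.3):
La_library()

# sessionInfo() also prints the BLAS/LAPACK paths in its header
# in recent R versions.
```

If the reported path resolves to the openblas-Rblas copy, R is using OpenBLAS even though the file is still named libRblas.so.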

Iñaki



Re: [Rd] openblas

2019-05-08 Thread Iñaki Ucar
On Wed, 8 May 2019 at 18:00, Ralf Stubner  wrote:
>
> On 08.05.19 09:34, Iñaki Ucar wrote:
> > On Wed, 8 May 2019 at 04:52, Peter Langfelder
> >  wrote:
> >>
> >> (CCing the R-devel list, maybe someone will have a better answer.)
> >>
> >> To be honest, I don't know how to. I wasn't able to configure R to use
> >> OpenBLAS using the configure script and options on my Linux Fedora system.
> >> I configure it without external BLAS, then replace the libRblas.dylib (.so
> >> in my case) with a link to the OpenBLAS dynamic link library.
> >
> > R on Fedora uses openblas by default since Fedora 23. In fact, there's
> > a specific package, openblas-Rblas, that provides libRblas.so.
>
> AFAIK a single-threaded OpenBLAS is used.

Of course it is the serial version. It wouldn't be a good policy to
ship a threaded shared library by default.

> When compiling R from source
> on a CentOS system I have used the configure option
> '--with-blas="-lopenblasp"' to link with the pthread version of OpenBLAS.

CentOS uses the reference BLAS by default instead. It's a long story.
But both CentOS and Fedora configure R with --enable-BLAS-shlib, so
you don't need to recompile it to use an optimized version.

Iñaki



Re: [Rd] [R-pkg-devel] Three-argument S3method declaration does not seem to affect dispatching from inside the package.

2019-05-14 Thread Iñaki Ucar
CCing r-devel.

On Tue, 14 May 2019 at 02:11, Pavel Krivitsky  wrote:
>
> Dear All,
>
> I've run into this while updating a package with unfortunately named
> legacy functions. It seems like something that might be worth changing
> in R, and I want to get a sense of whether this is a problem before
> submitting a report to the Bugzilla.
>
> It appears that the 3-argument form of S3method() in NAMESPACE controls
> dispatching when the generic is called from outside the package that
> defines it but not when it's called from inside the package that
> defines it.
>
> For example the attached minimal package has four R functions:
>
>    gen <- function(object, ...)
>      UseMethod("gen")
>
>    .gen.formula <- function(object, ...){
>      message("I am the S3method-declared method.")
>    }
>
>    gen.formula <- function(object, ...){
>      message("I am the function with an unfortunate name.")
>    }
>
>    test_me <- function(){
>      message("I am the tester. Which one will I call?")
>      gen(a~b)
>    }
>
> and the following NAMESPACE:
>
>    export(gen)
>    S3method(gen, formula, .gen.formula)
>    export(gen.formula)
>    export(test_me)
>
> Now,
>
>library(anRpackage)
>example(test_me)
>
> results in the following:
>
>test_m> test_me
>function(){
>  message("I am the tester. Which one will I call?")
>  gen(a~b)
>}
>
>
>
>test_m> test_me() # Calls gen.formula()
>I am the tester. Which one will I call?
>I am the function with an unfortunate name.
>
>test_m> gen(a~b) # Calls .gen.formula()
>I am the S3method-declared method.
>
> So, calling the same generic function with the same class results in
> different dispatching behaviour depending on whether the call is from
> within the package doing the export or from the outside.

It does not depend on whether you export gen.formula() or not. When
you call gen() inside your package, the S3 dispatch mechanism finds a
method gen.formula defined in that environment (the package's
namespace), so it is called.

> This behaviour appears to be as documented (
> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Method-dispatching
> ), but it seems to me that if S3method() is allowed to give the name of
> the method to be used, then it should should override the name-based
> dispatching both inside and outside the package.
>
> Any thoughts?

Note that disabling name-based dispatch implies two things: 1) the
inability to override your method by defining gen.formula in the
global environment, and 2) another package can break yours (i.e.,
internal calls to gen()) by registering an S3 method for gen() after
you. I don't think that's a good idea.

Iñaki



Re: [Rd] [R-pkg-devel] Three-argument S3method declaration does not seem to affect dispatching from inside the package.

2019-05-14 Thread Iñaki Ucar
On Tue, 14 May 2019 at 12:31, Pavel Krivitsky  wrote:
>
> > Note that disabling name-based dispatch implies two things: 1) the
> > inability to override your method by defining gen.formula in the
> > global environment, and 2) another package can break yours (i.e.,
> > internal calls to gen()) by registering an S3 method for gen() after
> > you.
>
> That's a good point.
>
>> library(anRpackage)
>> gen(a~b)
>I am the S3method-declared method.
>> gen.formula <- function(object, ...){message("I am the externally 
> declared method.")}
>> gen(a~b)
>I am the externally declared method.
>> test_me()
>I am the tester. Which one will I call?
>I am the function with an unfortunate name.
>
> In that case, I think that the least surprising behaviour would
> prioritise declarations and methods "nearer" to the caller over those
> "farther" from the caller (where "caller" is the caller of the generic,
> not the generic itself), and, within that, give precedence to S3method
> declarations over function names.

The thing is that, in R, "nearer" means "the calling environment" (and
then, other things). When you call test_me(), the calling environment
for gen() is the package namespace. When you call gen() directly, then
the calling environment is the global environment. So what happens
here follows the principle of least astonishment.

The issue here is that you are registering a function with a
non-standard name (.gen.formula) as the method for that generic, and
then defining what would be the standard name (gen.formula) for...
what purpose? IMHO, this is bad practice and should be avoided.

> That is, for a call from inside a package, the order of precedence
> would be as follows:
>1. S3method() in that package's NAMESPACE.
>2. Appropriately named function in that package (exported or not).
>3. Appropriately named function in calling environment (which may be
>   GlobalEnv).
>4. S3method() in other loaded packages' NAMESPACEs.
>5. Appropriately named functions exported by other loaded packages'
>   NAMESPACEs.
>
> For a call from outside a package, the precedence is the same, but 1
> and 2 are not relevant.
>
> As far as I can tell, this is the current behaviour except for the
> relative ordering of 1 and 2.

Nope. Current behaviour (see details in ?UseMethod) is:

"To support this, UseMethod and NextMethod search for methods in two
places: in the environment in which the generic function is called,
and in the registration data base for the environment in which the
generic is defined".

Changing this would probably break a lot of things out there.

Iñaki



Re: [Rd] [R-pkg-devel] Three-argument S3method declaration does not seem to affect dispatching from inside the package.

2019-05-19 Thread Iñaki Ucar
On Sat, 18 May 2019 at 23:34, Pavel Krivitsky  wrote:
>
> > The issue here is that you are registering a non-standard name
> > (.gen.formula) for that generic and then defining what would be the
> > standard name (gen.formula) for... what purpose? IMHO, this is a bad
> > practice and should be avoided.
>
> The situation initially arose when I wanted to soft-deprecate calling a
> particular method by its full name in order to clean up the package's
> namespace.
>
> To use our working example, I wanted calls to gen.formula() to issue a
> deprecation warning, but calls to gen(formula) not to. The simplest way
> to do that that I could find was to create a function, say,
> .gen.formula() that would implement the method and declare it as the S3
> export, and modify gen.formula() to issue the warning before passing on
> to .gen.formula(). Then, direct calls to gen.formula() would produce a
> warning, but gen(formula) would by pass it.

IMO the simplest way to do this is to check who the caller was:

foo <- function(x) UseMethod("foo")
foo.bar <- function(x) {
  sc <- sys.call(-1)
  if (is.null(sc) || sc[[1]] != "foo")
.Deprecated(msg="Calling 'foo.bar' directly is deprecated")
}

x <- 1
class(x) <- "bar"

foo(x)  # silent
foo.bar(x)  # a warning is issued

> > > That is, for a call from inside a package, the order of precedence
> > > would be as follows:
> > >1. S3method() in that package's NAMESPACE.
> > >2. Appropriately named function in that package (exported or
> > > not).
> > >3. Appropriately named function in calling environment (which
> > > may be
> > >   GlobalEnv).
> > >4. S3method() in other loaded packages' NAMESPACEs.
> > >5. Appropriately named functions exported by other loaded
> > > packages'
> > >   NAMESPACEs.
> > >
> > > For a call from outside a package, the precedence is the same, but
> > > 1 and 2 are not relevant.
> > >
> > > As far as I can tell, this is the current behaviour except for the
> > > relative ordering of 1 and 2.
> >
> > Nope. Current behaviour (see details in ?UseMethod) is:
> >
> > "To support this, UseMethod and NextMethod search for methods in two
> > places: in the environment in which the generic function is called,
> > and in the registration data base for the environment in which the
> > generic is defined".
>
> Can you be more specific where the sequence above contradicts the
> current implementation (except for swapping 1 and 2)? As far as I can
> tell, it's just a more concrete description of what's in the
> documentation.

The description in the documentation means that point 3) in your list
always goes first, which automatically implies 2) if the generic is
defined in the same package.

Iñaki



Re: [Rd] [R-pkg-devel] Three-argument S3method declaration does not seem to affect dispatching from inside the package.

2019-05-19 Thread Iñaki Ucar
On Sun, 19 May 2019 at 23:23, Pavel Krivitsky  wrote:
>
> Hi, Inaki,
>
> On Sun, 2019-05-19 at 16:59 +0200, Iñaki Ucar wrote:
> > IMO the simplest way to do this is to check who the caller was:
> >
> > foo <- function(x) UseMethod("foo")
> > foo.bar <- function(x) {
> >   sc <- sys.call(-1)
> >   if (is.null(sc) || sc[[1]] != "foo")
> > .Deprecated(msg="Calling 'foo.bar' directly is deprecated")
> > }
> >
> > x <- 1
> > class(x) <- "bar"
> >
> > foo(x)  # silent
> > foo.bar(x)  # a warning is issued
>
> f <- getS3method("foo","bar")
> f(x) # spurious warning
>
> foo.baz <- function(x) NextMethod("foo")
> class(x) <- c("baz","bar")
> foo(x) # spurious warning

Checking the enclosing environment, and whether the method was called
through NextMethod, respectively covers these cases too.
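For the record, a hedged sketch of one such refined check (my own, not verified against every corner case): when a method is entered via UseMethod or NextMethod, the dispatch machinery defines .Generic in the method's frame, so its absence signals a direct call by name:

```r
foo <- function(x) UseMethod("foo")

foo.bar <- function(x) {
  # .Generic is defined by UseMethod/NextMethod in the method's own
  # frame; if it is absent, the method was called directly by name.
  if (!exists(".Generic", inherits = FALSE))
    .Deprecated(msg = "Calling 'foo.bar' directly is deprecated")
  invisible(x)
}
foo.baz <- function(x) NextMethod()

x <- structure(1, class = c("baz", "bar"))
foo(x)      # silent: dispatched via UseMethod, then NextMethod
foo.bar(x)  # deprecation warning: direct call
```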

> > The description in the documentation means that point 3) in your list
> > goes always first, which automatically implies 2) if the generic is
> > defined in the same package.
>
> Are you sure which package defines the generic matters? I've just ran
> some tests with two packages and moving the generic around doesn't seem
> to affect things: the calling function determines whose method is used.

If package A defines generic foo and package B defines method foo.bar
without registering nor exporting it, then foo can't find foo.bar.

> It seems to me like there is no contradiction after all, except that I
> propose that the registered method should take precedence within a
> namespace.
>
> The only situation in which it would change R's behaviour would be when
> a package/namespace contains a function foo.bar() AND a NAMESPACE
> containing S3method(foo,bar,not.foo.bar) AND calls foo() on objects of
> type bar from inside the package. It is extremely unlikely to break any
> existing code.

To try to avoid changing current behaviour if foo.bar is found, R
would need to check whether the enclosing environment is identical to
the enclosing environment of the registered method, and in that case,
give precedence to the latter (which, BTW, is exactly what you need to
do to fix the first spurious warning above).

And still, funny things may happen. For example, pkgA defines generic
foo, exports foo.bar and registers other.foo.bar instead of foo.bar.
Following your proposal, if I load pkgA and call foo for an object of
class bar, other.foo.bar is called. Then I load pkgB, which registers
just another method for foo.bar, and call foo again. What happens is
that the registered method now belongs to pkgB, which is a different
namespace, so we get a different precedence, and foo.bar is called
instead.

Exceptions lead us to inconsistencies like this. I can't speak for R
core, but I don't think that the use case is compelling enough to take
that path.

Iñaki

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Making a package CITATION file from BibTeX

2019-05-29 Thread Iñaki Ucar
I believe r-package-devel is the proper list for this. Now in CC.

On Thu, 30 May 2019 at 00:16, Dr Gregory Jefferis
 wrote:
>
> Dear Colleagues,
>
> I would like to provide a CITATION file for my package nat.nblast [1].
>
> I have the correct citation in BibTeX format [2]. How can I convert this
> BibTeX to the format needed by R for a package CITATION file (I have a
> lot of other packages needing citations ...).
>
> I think what I need is the opposite of RefManageR::toBiblatex [3]. This
> seems like it should be a common need, so I feel sure I must be missing
> something, but I can't seem to google up any hints.

There's a specific section in the manual about this (1.9 CITATION
files), and lots of examples out there. Here's one:

https://github.com/r-simmer/simmer/blob/master/inst/CITATION
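For the record, a CITATION entry corresponding to a BibTeX @Article looks
roughly like this (a sketch with placeholder field values, not the actual
nat.nblast reference):

```r
citHeader("To cite this package in publications use:")

# utils::bibentry() takes the same fields as the BibTeX entry;
# every value below is a placeholder.
bibentry(
  bibtype = "Article",
  title   = "Title of the paper",
  author  = c(person("First", "Author"),
              person("Second", "Author")),
  journal = "Journal Name",
  year    = "2016",
  volume  = "1",
  pages   = "1--10",
  doi     = "10.0000/placeholder"
)
```

Translating from BibTeX is then mostly a field-by-field copy.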

Iñaki



Re: [Rd] Offer zip builds

2019-06-04 Thread Iñaki Ucar
FWIW, innoextract extracts the contents of the installer just fine.

Iñaki

On Tue, 4 Jun 2019 at 17:40, Steven Penny  wrote:
>
> On Mon, Jun 3, 2019 at 6:54 PM Marc Schwartz wrote:
> > I am on macOS primarily, albeit, I have run both Windows and Linux routinely
> > in years past.
>
> With all due respect, then you have no business in this thread.
>
> > That being said, these days, I do run Windows 10 under a Parallels VM on
> > macOS, as I have a single commercial application that I need to run for
> > clients now and then, and it sadly only runs on a real Windows install (e.g.
> > not with Wine).
>
> Further demonstrating my point. You run Windows in a virtual machine, meaning
> even if you encountered some bad installer, you could just revert to a
> snapshot or similar.
>
> > To your points:
> >
> > [bunch of links]
>
> I am sorry if I miscommunicated, I didn't and don't wish to be convinced
> about how well behaved the R installer is. I wish for R to offer zip builds.
> Many other programming languages do:
>
> - http://strawberryperl.com/releases.html
> - https://dotnet.microsoft.com/download/dotnet-core/2.2
> - https://golang.org/dl
> - https://nim-lang.org/install_windows.html
> - https://python.org/downloads/release/python-373
> - https://windows.php.net/download
>
> As I see it, the question isn't "should R offer zip builds", it's "why isn't
> R offering zip builds".
>
> > Unless you can make the case to them to expend the finite resources that
> > they have to support this as part of each version release process, in
> > light of the prior discussions, it is not clear that this appears to be a
> > priority.
>
> That's the point of my original post. If they choose to continue with only
> EXE, I will just keep using other programming languages. So you could see
> how it might be in R's interest to offer this, as no zip builds might be one
> of the reasons people avoid the language.
>



-- 
Iñaki Úcar



Re: [Rd] Halfway through writing an "IDE" with support for R; Proof of concept, and request for suggestions.

2019-06-14 Thread Iñaki Ucar
Honestly, I don't see the motivation for this. There are many similar
projects that are mature, so my feedback would be: don't reinvent the
wheel; contribute to those instead.

Iñaki


El vie., 14 jun. 2019 3:18, Abby Spurdle  escribió:

> I thought that I'd get more feedback.
> But it's ok, I understand.
>
> I wanted to note that I've moved symbyont to GitLab, which is where I
> should have put it, in the first place.
>
> Also, I'm not planning to start another thread.
> However, if anyone has suggestions six months from now (or six years from
> now...), you're still welcome to email me, and I will try to listen...
>
>




Re: [Rd] Halfway through writing an "IDE" with support for R; Proof of concept, and request for suggestions.

2019-06-14 Thread Iñaki Ucar
On Sat, 15 Jun 2019 at 01:24, Abby Spurdle  wrote:
>
> None of the tools that I've looked at satisfy these constraints.
> But if you know of some, I'd like to know... And I would consider 
> contributing...

What about Atom, VS Code and the like? Or what about taking a project
that meets most of the constraints and pushing to cover all of them,
or even forking it and modifying the parts you don't like?

Iñaki



Re: [Rd] making a vignette optional

2019-06-18 Thread Iñaki Ucar
On Tue, 18 Jun 2019 at 19:03, Therneau, Terry M., Ph.D. via R-devel
 wrote:
>
> I had added a vignette to the coxme package and all worked well locally, but
> it failed at CRAN. The issue is that the vignette involves using coxme for
> pedigree data, it doesn't work without the kinship2 package, and I hadn't
> put in the necessary "if (require(" logic.
>
> The question is, how do I make the entire vignette conditional? If the
> package isn't available, there is nothing to run. The latex itself will fail
> when it can't find the figures (I float them), and the parts that don't fail
> will end up as inane discussion of material that isn't there.

This is what I do in my packages:
https://github.com/r-simmer/simmer/blob/master/vignettes/simmer-08-philosophers.Rmd#L21

If \SweaveOpts accepts code in the same way, you can set something like

\SweaveOpts{eval=requireNamespace("kinship2", quietly=TRUE)}
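For an Rmd vignette, the linked example boils down to something like this in
the first chunk (a sketch; kinship2 is just the package from the question):

```r
# If the suggested package is missing, disable evaluation of all
# subsequent chunks; the vignette still builds, just without output.
can_eval <- requireNamespace("kinship2", quietly = TRUE)
knitr::opts_chunk$set(eval = can_eval)
```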

Iñaki



Re: [Rd] Fast way to call an R function from C++?

2019-06-18 Thread Iñaki Ucar
For reference, your benchmark using UNWIND_PROTECT:

> system.time(test(testFunc, evn$x))
   user  system elapsed
  0.331   0.000   0.331
> system.time(test(C_test1, testFunc, evn$x))
   user  system elapsed
  2.029   0.000   2.036
> system.time(test(C_test2, expr, evn))
   user  system elapsed
  2.307   0.000   2.313
> system.time(test(C_test3, testFunc, evn$x))
   user  system elapsed
  2.131   0.000   2.138

Iñaki

On Tue, 18 Jun 2019 at 20:35, Iñaki Ucar  wrote:
>
> On Tue, 18 Jun 2019 at 19:41, King Jiefei  wrote:
> >
> > [...]
> >
> > It is clear to see that calling an R function in R is the fast one, it is
> > about 5X faster than ` R_forceAndCall ` and ` Rf_eval`. the latter two
> > functions have a similar performance and using Rcpp is the worst one. Is it
> > expected? Why is calling an R function from C++ much slower than calling
> > the function from R? Is there any faster way to do the function call in C++?
>
> Yes, there is: enable fast evaluation by setting
> -DRCPP_USE_UNWIND_PROTECT, or alternatively, use
>
> // [[Rcpp::plugins(unwindProtect)]]
>
> Iñaki



-- 
Iñaki Úcar



Re: [Rd] Fast way to call an R function from C++?

2019-06-18 Thread Iñaki Ucar
On Tue, 18 Jun 2019 at 19:41, King Jiefei  wrote:
>
> [...]
>
> It is clear to see that calling an R function in R is the fast one, it is
> about 5X faster than ` R_forceAndCall ` and ` Rf_eval`. the latter two
> functions have a similar performance and using Rcpp is the worst one. Is it
> expected? Why is calling an R function from C++ much slower than calling
> the function from R? Is there any faster way to do the function call in C++?

Yes, there is: enable fast evaluation by setting
-DRCPP_USE_UNWIND_PROTECT, or alternatively, use

// [[Rcpp::plugins(unwindProtect)]]
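A minimal sketch of what enabling the plugin looks like in practice (the
function and variable names here are illustrative, not from the original
benchmark):

```r
code <- '
// [[Rcpp::plugins(unwindProtect)]]
#include <Rcpp.h>

// [[Rcpp::export]]
SEXP call_fun(Rcpp::Function f, SEXP x) {
  // with unwindProtect enabled, Rcpp can evaluate f via
  // R_UnwindProtect instead of a slower protected-evaluation path
  return f(x);
}
'
Rcpp::sourceCpp(code = code)
call_fun(function(x) x + 1, 1)
```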

Iñaki



Re: [Rd] Fast way to call an R function from C++?

2019-06-19 Thread Iñaki Ucar
On Wed, 19 Jun 2019 at 07:42, King Jiefei  wrote:
>
> Hello Kevin and Iñaki,
>
> Thanks for your quick responses. I sincerely appreciate them! I can see how
> complicated it is to interact with R in C. Iñaki's suggestion is very
> helpful: I saw a lot of performance gain after turning the flag on, but
> sadly the best performance it can offer still cannot beat R itself. It is
> interesting to see that C++ is worse than R in this special case, despite
> the common belief that C++ code is the fast one... Anyway, thanks again for
> your suggestions and references!

That is misleading. C++ code is faster, that's beyond doubt. But you
are not running C++ code here; you are running R code. So which is
faster: running R code directly, or executing something that then runs
R code? Obviously the former is the baseline, and from there it can
only get worse as you add more layers on top.

Iñaki



Re: [Rd] memory access question

2019-06-30 Thread Iñaki Ucar
It doesn't matter that you didn't use the value. An invalid read may
fail or not, depending on whether that memory portion was reclaimed by
the OS. When it was, you are trying to read from a memory region where
you don't have permission, so it crashes.
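A minimal C sketch of the pattern described in the quoted message below
(hypothetical code, not the actual survival sources):

```c
#include <stddef.h>

/* The reported pattern was roughly:
 *
 *     i = n - 1;
 *     while (1) {
 *         temp = input[i];     // executed once with i == -1 ...
 *         if (i < 0) break;    // ... before this test runs
 *         i--;
 *     }
 *
 * The stray read is undefined behaviour even though temp is unused:
 * ASAN reports it, and it can crash outright when input[-1] falls in
 * an unmapped page. The safe form tests the index before touching
 * the array: */
double sum_backwards(const double *input, size_t n)
{
    double acc = 0.0;
    for (size_t i = n; i-- > 0; )   /* visits n-1 .. 0, nothing past */
        acc += input[i];
    return acc;
}
```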

Iñaki

On Sun, 30 Jun 2019 at 04:27, Therneau, Terry M., Ph.D. via R-devel
 wrote:
>
> I had a problem with the latest iteration of the survival package (one that
> I hope to post to Github next week) where it would die in strange ways,
> i.e., work on one box and not on another, a vignette would compile if I
> invoked Sweave myself but fail in R CMD build, etc. The type of thing that
> says quite loudly that there is a memory issue somewhere in a C routine.
> The kind that has potential for making you tear your hair out.
>
> In any case, I finally built an ASAN aware version of R on my test box, and
> it failed on something that looked minor. I was reading one element past one
> of the input vectors, though I never used the value. In essence I had
> "temp = input[i]" one line in front of the "if() break" test for i. (The
> while loop for i was running from n-1 to 0; one often goes from largest to
> smallest time value in survival code, so i was -1 at the failure.) I
> repaired this of course, but with no real hope that it could be the actual
> issue causing my errors. And now the weird behavior seems to have gone
> away! The argument in question was about midway on the argument list BTW.
>
> My question is, should I have been as surprised as I am?
>
> And let me give a big thank you to the authors of the "debugging" section
> of the packages guide. Things that reliably die are one thing, but I don't
> know how I would have found this one without the help.
>
> Terry T.
>



-- 
Iñaki Úcar



Re: [Rd] Potential bug with data.frame replacement

2019-07-15 Thread Iñaki Ucar
On Mon, 15 Jul 2019 at 18:55, William Dunlap via R-devel
 wrote:
>
> This may be related to the size of the deparsed call in the error message
> that Brodie and Luke were discussing recently on R-devel (" Mitigating
> Stalls Caused by Call Deparse on Error").   I don't get a crash, but the
> error message itself doesn't show up after the deparsed call.

The example crashes with a buffer overflow in systems with
FORTIFY_SOURCE=2 (i.e., official R package in most Linux
distributions).

Iñaki



Re: [Rd] Rtools contains Python interpreter(s), and six copies?

2019-08-02 Thread Iñaki Ucar
On Sat, 3 Aug 2019 at 00:36, Abby Spurdle  wrote:
>
> I can't find one reference to Python in the documentation:

Maybe because it's *not* needed? There's a note here though:
https://github.com/rwinlib/gcc-4.9.3

Iñaki



[Rd] Is R-devel broken?

2019-08-21 Thread Iñaki Ucar
Hi,

I'm building r-devel using [1], and I see:

mv: './grid/vignettes/grid.Rnw-lattice' and
'./grid/vignettes/grid.Rnw' are the same file
make[1]: *** [Makefile:121: vignettes-no-lattice] Error 1

Regards,
Iñaki

[1] https://hub.docker.com/r/rocker/r-devel/dockerfile



Re: [Rd] Is R-devel broken?

2019-08-21 Thread Iñaki Ucar
Never mind. It seems to be a caching issue in the underlying
filesystem. Not sure how to solve it, though.

Iñaki

On Wed, 21 Aug 2019 at 13:21, Iñaki Ucar  wrote:
>
> Hi,
>
> I'm building r-devel using [1], and I see:
>
> mv: './grid/vignettes/grid.Rnw-lattice' and
> './grid/vignettes/grid.Rnw' are the same file
> make[1]: *** [Makefile:121: vignettes-no-lattice] Error 1
>
> Regards,
> Iñaki
>
> [1] https://hub.docker.com/r/rocker/r-devel/dockerfile



-- 
Iñaki Úcar



[Rd] Suggestions for improved checks on CRAN/R

2019-08-24 Thread Iñaki Ucar
Dear CRAN maintainers, R core team,

Here are some suggestions to prevent some issues I found in several
packages on CRAN. Some of these issues have been reported to their
maintainers, but I still believe it would be desirable to enforce
checks for them on CRAN or in the corresponding R CMD commands.

- Checks for undeclared sysreqs. There are packages that do not
declare some system requirement. E.g., bioacoustics requires fftw and
soxr, but no sysreqs are declared; ijtiff links against jpeg, but it's
not declared; there are a handful of packages linking against GSL and
not declaring it, such as BayesSAE, BayesVarSel, bnpmr...

Speaking of which... it would be *great* to have some standardization
in the way sysreqs are declared... But that's another story for
another day.

- Checks for buildroot path in the installed files. E.g., RUnit calls
system.file in man/checkFuncs.Rd, and as a result, the installed
manual contains the buildroot path, which should never happen. Another
example is TMB, but in this case the buildroot ends up in a binary
file, simple.so, that is compiled during the installation.

- Checks for incorrect NeedsCompilation. Some packages have this flag,
but nothing is compiled. E.g., reshape, analogueExtra, AGHmatrix...

- Checks for execution flags. The execution bit is set on many, many
files in many packages when it shouldn't be (i.e., there's no
shebang). An example that comes to mind: Javascript files under "inst"
in shinyAce.

- Checks for incorrect versions in dependencies. E.g., rtweet depends
on magrittr >= 1.5.0, and abstractr depends on gridExtra >= 2.3.0. It
should be 1.5 and 2.3 respectively. This may not be important on CRAN,
because version comparisons still work in R, but this fails in other
systems, such as RPM packaging.

- Checks for top-level directories. I suppose there are some already
in place, but e.g., adapr has a zero-length file called "data" in the
sources. It seems that the installation command simply ignores it, but
it shouldn't be there.
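On the dependency-version point, R itself treats the two forms as equal,
which is why the mismatch only bites downstream tools; a quick check:

```r
# numeric_version comparison pads missing components, so these match
# in R, and CRAN checks pass...
package_version("1.5") == package_version("1.5.0")  # TRUE
# ...but string-based version logic in packaging systems such as RPM
# can treat the declared ">= 1.5.0" differently from the published "1.5".
```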

Thanks for the already huge efforts to implement more thorough checks.
Hope this helps for the task.

Regards,
-- 
Iñaki Úcar



Re: [Rd] reverse dependency checks

2019-09-04 Thread Iñaki Ucar
On Tue, 3 Sep 2019 at 14:53, Therneau, Terry M., Ph.D. via R-devel
 wrote:
>
> I remember there was advice about a server that one could use for reverse
> dependency checks, but I forgot to write it down. (Or I did save the info
> and forgot where I saved it...) I have been doing the checks for survival
> myself, but the count is getting out of hand (663, not counting
> Bioconductor).
>
> Any pointers?

You could try Yihui's crandalf [1]. Locally, I don't know what you are
using, but there are a few alternatives. Notably, Dirk's prrd [2] (the
only one that I tried myself and I can thus recommend), Gabor's
revdepcheck [3] and R's brand new tools::check_packages_in_dir().

[1] https://github.com/yihui/crandalf
[2] https://github.com/eddelbuettel/prrd
[3] https://github.com/r-lib/revdepcheck

Iñaki



Re: [Rd] [EXTERNAL] RE: install_github and survival

2019-09-06 Thread Iñaki Ucar
On Fri, 6 Sep 2019 at 14:08, Therneau, Terry M., Ph.D. via R-devel
 wrote:
>
> Yes, that is exactly the problem.  The code found in the "config" script is 
> never run.
> But why doesn't it get run?

It should be called "configure", not "config".

Iñaki



Re: [Rd] should base R have a piping operator ?

2019-10-05 Thread Iñaki Ucar
On Sat, 5 Oct 2019 at 17:15, Hugh Marera  wrote:
>
> How is your argument different to, say,  "Should dplyr or data.table be
> part of base R as they are the most popular data science packages and they
> are used by a large number of users?"

Two packages with many features, dozens of functions and under heavy
development to fix bugs, add new features and improve performance, vs.
a single operator with a limited and well-defined functionality, and a
reference implementation that hasn't changed in years (but certainly
hackish in a way that probably could only be improved from R itself).

Can't you really spot the difference?

Iñaki



Re: [Rd] should base R have a piping operator ?

2019-10-05 Thread Iñaki Ucar
On Sat, 5 Oct 2019 at 18:10, Rui Barradas  wrote:
>
> R is a functional language, pipes are not.

How would you classify them? Pipes are analogous to function
composition. In that sense, they are more functional than classes, and
R does have classes.

Anyway, I don't see "purity" as a valid argument either in favour or
against any given feature. Language classification may be useful for
theorists, but certainly not for practitioners.
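In the composition sense, `x %>% f() %>% g()` is just `g(f(x))`:

```r
library(magrittr)

x <- c(1, 4, 9)
# the piped form and the nested composition are the same computation
identical(x %>% sum() %>% sqrt(), sqrt(sum(x)))
#> [1] TRUE
```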

Iñaki



Re: [Rd] should base R have a piping operator ?

2019-10-05 Thread Iñaki Ucar
On Sat, 5 Oct 2019 at 19:54, Hugh Marera  wrote:
>
> [...] it is not very difficult to find functions in dplyr or data.table or
> indeed other packages that one may wish to be in base R. Examples, for me,
> could include data.table::fread

You have utils::read.table and the like.

> dplyr::group_by & dplyr::summari[sZ]e combo

base::tapply, base::by, stats::aggregate.
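For instance, a grouped mean over the built-in iris data, in base R alone:

```r
# split-apply-combine without dplyr: two equivalent base idioms
aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
with(iris, tapply(Sepal.Length, Species, mean))
```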

> [...] Many R users don't even know that they are installing the magrittr
> package.

And that's one of the reasons why the proposal makes sense. Another
one is that the pipe plays well with many base R functions, such as
subset, transform, merge, aggregate and reshape.
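For example (a sketch using magrittr and the built-in mtcars data; the
derived column name is made up):

```r
library(magrittr)

mtcars %>%
  subset(cyl == 4) %>%                            # data is the 1st argument
  transform(wt_kg = wt * 453.6) %>%               # ditto
  aggregate(wt_kg ~ gear, data = ., FUN = mean)   # '.' marks the data slot
```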

Iñaki



Re: [Rd] should base R have a piping operator ?

2019-10-06 Thread Iñaki Ucar
On Sun, 6 Oct 2019 at 10:30, Joris Meys  wrote:
>
> I'm largely with Gabriel Becker on this one: if pipes enter base R, they
> should be a well thought out and integrated part of the language.
>
> I do see merit though in providing a pipe in base R. Reason is mainly that
> right now there's not a single pipe. A pipe function exists in different
> packages, and it's not impossible that at one point piping operators might
> behave slightly different depending on the package you load. So I hope
> someone from RStudio is reading this thread and decides to do the heavy
> lifting for R core. After all, it really is mainly their packages that
> would benefit from it.

Completely agree with Gabriel and Joris.

> I can't think of a non-tidyverse package that's
> easier to use with pipes than without.

I can give you one (disclaimer: it's one of my packages): simmer,
which is specifically designed to work with pipes, and has nothing to
do with the tidyverse.

Iñaki



Re: [Rd] Puzzled about a new method for "[".

2019-11-03 Thread Iñaki Ucar
On Sun, 3 Nov 2019 at 22:12, Rolf Turner  wrote:
>
>
> I recently tried to write a new method for "[", to be applied to data
> frames, so that the object returned would retain (all) attributes of the
> columns, including attributes that my code had created.
>
> I thrashed around for quite a while, and then got some help from Rui
> Barradas who showed me how to do it, in the following manner:
>
> `[.myclass` <- function(x, i, j, drop = if (missing(i)) TRUE else
> length(cols) == 1) {
> SaveAt <- lapply(x, attributes)
> x <- NextMethod()
> lX <- lapply(names(x),function(nm, x, Sat){
>   attributes(x[[nm]]) <- Sat[[nm]]
>   x[[nm]]}, x = x, Sat = SaveAt)
> names(lX) <- names(x)
> x <- as.data.frame(lX)
> x
> }
>
> If I set class(X) <- c("myclass",class(X)) and apply "[" to X (e.g.
> something like X[1:42,]) the attributes are retained as desired.
>
> OK.  All good.  Now we finally come to my question!  I want to put this
> new method into a package that I am building.  When I build the package
> and run R CMD check I get a complaint:
>
> ... no visible binding for global variable ‘cols’
>
> And indeed, there is no such variable.  At first I thought that maybe
> the code should be
>
> `[.myclass` <- function(x, i, j, drop = if (missing(i)) TRUE else
>length(j) == 1) {
>
> But I looked at "[.data.frame" and it has "cols" too; not "j".
>
> So why doesn't "[.data.frame" throw a warning when R gets built?
>
> Can someone please explain to me what's going on here?

The thing is...

test <- function(x = y * 2) {
  y <- 1
  x
}

test()
# 2

Lazy evaluation magic.

Iñaki



Re: [Rd] Puzzled about a new method for "[".

2019-11-05 Thread Iñaki Ucar
For testing, you can try a column of class 'errors', from the package
'errors'. Its attributes depend on the content in the way Hadley pointed
out.

Iñaki

El lun., 4 nov. 2019 23:19, Rolf Turner  escribió:

> On 5/11/19 10:54 AM, Duncan Murdoch wrote:
> > On 04/11/2019 4:40 p.m., Pages, Herve wrote:
> >> Hi Rolf,
> >>
> >> On 11/4/19 12:28, Rolf Turner wrote:
> >>>
> >>> On 5/11/19 3:41 AM, Hadley Wickham wrote:
> >>>
>  For what it's worth, I don't think this strategy can work in general,
>  because a class might have attributes that depend on its data/contents
>  (e.g.
> 
> https://vctrs.r-lib.org/articles/s3-vector.html#cached-sum
> 
>  ). I
>  don't think these are particularly common in practice, but it's
>  dangerous to assume that you can restore a class simply by restoring
>  its attributes after subsetting.
> >>>
> >>>
> >>> You're probably right that there are lurking perils in general, but I
> am
> >>> not trying to "restore a class".  I simply want to *retain* attributes
> >>> of columns in a data frame.
> >>>
> >>> * I have a data frame X
> >>> * I attach attributes to certain of its columns;
> >>>attr(X$melvin,"clyde") <- 42
> >>> (I *don't* change the class of X$melvin.)
> >>> * I form a subset of X:
> >>>   Y <- X[1:100,3:10]
> >>> * given that "melvin" is amongst columns 3 through 10 of X,
> >>>   I want Y$melvin to retain the attribute "clyde", i.e. I
> >>>   want attr(Y$melvin,"clyde") to return 42
> >>>
> >>> There is almost surely a better approach than the one that I've chosen
> >>> (isn't there always?) but it seems to work, and the perils certainly
> are
> >>> not immediately apparent to me.
> >>
> >> Maybe you've solved the problem for the columns that contain your
> >> objects but now you've introduced a potential problem for columns that
> >> contain objects with attributes whose value depend on content.
> >>
> >> Hadley it right that restoring the original attributes of a vector (list
> >> or atomic) after subsetting is unsafe.
> >
> > Right, so Rolf should only restore attributes that are ones he added in
> > the first place.  Unknown attributes should be left alone.
>
> Fair point.  And that gets fiddly.  I guess I'm going to have to rethink
> my strategy.
>
> cheers,
>
> Rolf
>
> --
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>




Re: [Rd] Another wish (?) for R 4.0.0: print(*, width = )

2020-01-07 Thread Iñaki Ucar
On Wed, 8 Jan 2020 at 02:05, Pages, Herve  wrote:
>
> On 1/7/20 06:13, brodie gaslam via R-devel wrote:
> ...
> > Happy new decade.
>
>   *** caught segfault ***
> conflicting decade boundaries

https://xkcd.com/2249/ ;-)

>
> Traceback:
>   1: new_decade <- 2020:2029
>   2: previous_decade <- 2011:2020
>   3: previous_previous_decade <- 2001:2010
>   4: current_millenium <- 2001:3000
>   5: previous_millenium <- 1001:2000
>   6: previous_previous_millenium <- 1:1000
>
> Cheers,
> H.

Iñaki



Re: [Rd] add jsslogo.jpg to R sources?

2020-01-08 Thread Iñaki Ucar
On Wed, 8 Jan 2020 at 19:21, Toby Hocking  wrote:
>
> Hi R-core, I was wondering if somebody could please add jsslogo.jpg to the
> R sources? (as I reported yesterday in this bug)
>
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17687
>
> R already includes jss.cls which is the document class file for Journal of
> Statistical Software. Actually, for the jss.cls file to be useful, it also
> requires jsslogo.jpg in order to compile JSS articles without error.
>
> This is an issue for me because I am writing a JSS paper that includes
> figures created using tikzDevice, which I am telling to use the jss class
> for computing metrics. On debian/ubuntu the R-src/share/texmf directory is
> copied to /usr/share/texmf/tex/latex/R, so tikzDevice is finding jss.cls in
> /usr/share/texmf/tex/latex/R/tex/latex/jss.cls but it is failing with a
> 'jsslogo not found' error -- the fix is to also include jsslogo.jpg in the
> R sources (in the same directory as jss.cls).

Why don't you just include jsslogo.jpg in your working directory?
jss.cls is included in the R sources because there are many vignettes
with the JSS style, but always *without* the logo. The logo should
only be used for actual JSS publication, so I think that the R sources
are no place for it.

-- 
Iñaki Úcar



Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2020-01-12 Thread Iñaki Ucar
On Sun, 12 Jan 2020 at 00:49, Henrik Bengtsson
 wrote:
>
> [snip]
>
> A final plead: Adding an option to disable forking, at least in the
> 'parallel' package only, will spare people (end users, developers,
> sysadms, ...) many many hours of troubleshooting and eventually trying
> to find workarounds. Those hours adds up quickly given the number of R
> users we have out there.  We have more important things to spend our
> time on.  I can easily count days wasted due to troubleshooting and
> helping others remotely on problems related to instability of forked
> processing. Being able to disable it, would have shortcut this quite a
> bit.

+1 to such an option. I don't see how this could be implemented in
another package. One could do something like

stop_on_fork <- inline::cfunction(
  body='pthread_atfork(stop, NULL, NULL);',
  includes='#include <pthread.h>', convention=".C",
  otherdefs='void stop() { Rf_error("Fork disabled"); }')
stop_on_fork()
parallel::mclapply(1:2, force)

which works nice in a standalone R session, but freezes RStudio.
Another workaround would be

unlockBinding("mclapply", getNamespace("parallel"))
assignInNamespace("mclapply", function(...) stop("Fork disabled"), "parallel")
parallel::mclapply(1:2, force)

(plus several more bindings to cover all the cases), but that's not
allowed, and shouldn't be allowed, on CRAN.

Iñaki



Re: [Rd] CRAN check fails if website is unavailable on Fedora platforms

2020-01-14 Thread Iñaki Ucar
On Tue, 14 Jan 2020 at 15:06, Siegfried Köstlmeier
 wrote:
>
> Hi all,
>
> I maintain the package „qrandom“, which is based on a web API. Recently the
> testthat tests failed because the website was down.
> I implemented the following code in v1.2.2 to ensure that tests are only run
> if the website is accessible, to avoid failing the CRAN checks:
>
> > library(testthat)
> > library(qrandom)
>
> > check_qrng <- function(){
> >   tryCatch(
> > expr = {
> >   req <- curl::curl_fetch_memory('https://qrng.anu.edu.au/index.php')
> >   req$status_code
> > },
> > error = function(e){
> >   -1
> > }
> >   )
> > }
>
> > ## test package separated with filter due to limited Travis-CI build time
> > ## HTTP status 200 indicates “OK”
> > if(curl::has_internet() & check_qrng() == 200){
> >test_check('qrandom', filter = "qrandom")
> >test_check('qrandom', filter = "qrandomunif")
> >test_check('qrandom', filter = "qrandomnorm")
> >test_check('qrandom', filter = "qUUID")
> >test_check('qrandom', filter = "qrandommaxint")
> > }
>
> I was informed that the check results had an error status for both flavors
> r-devel-linux-x86_64-fedora-clang and r-devel-linux-x86_64-fedora-gcc,
> while the other platforms showed the status "OK". Currently, the status is
> "OK" for all updated package versions 1.2.2 because the website is
> available again.
>
> Why does the above code not prevent the checks from being run when the
> website is not available, specifically on Fedora systems? May it be that
> curl is platform dependent, or are CRAN package checks run differently
> here? I would be pleased to avoid these check failures in future.

Since you use testthat, you can use skip_on_cran() to skip tests that
require an Internet connection. Yet it would be a better idea to mock
those tests on CRAN (see packages vcr, webmockr, httptest, and
probably others).
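A sketch of what that looks like with testthat (skip_on_cran() and
skip_if_offline() are real testthat helpers; the qrandom call is just
illustrative):

```r
library(testthat)

test_that("the QRNG service returns random numbers", {
  skip_on_cran()                                # never hit the API on CRAN
  skip_if_offline(host = "qrng.anu.edu.au")     # skip when unreachable
  x <- qrandom::qrandom(n = 5)
  expect_length(x, 5)
})
```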

Iñaki



Re: [Rd] as-cran issue ==> set _R_CHECK_LENGTH_1_* settings!

2020-01-15 Thread Iñaki Ucar
A bit off-topic, but...

On Wed, 15 Jan 2020 at 05:45, Abby Spurdle  wrote:
>
> > Which version of Fedora are you on?
>
> I've got Fedora 31.
> I just checked, and R 3.6.2 is available now.

R 3.6.2 was submitted a month ago for testing and reached stable 19
days ago [1]. At any time, you can see which version is available in
stable (updates repo) and in testing for all supported Fedora and EPEL
versions in [2].

> Progress...
> ...however, there's another problem.
>
> From the dependencies:
> R-java   x86_64 3.6.2-1.fc31   updates  10 k
> R-java-devel x86_64 3.6.2-1.fc31   updates 9.9 k
> java-1.8.0-openjdk   x86_64 1:1.8.0.232.b09-0.fc31 updates 281 k
> java-1.8.0-openjdk-devel x86_64 1:1.8.0.232.b09-0.fc31 updates 9.3 M
> java-1.8.0-openjdk-headless
>  x86_64 1:1.8.0.232.b09-0.fc31 updates  32 M
>
> So, Linux's R (or at least Fedora's R) is dependent on Java.
> -> Bad idea...

(Not so) fresh news: R officially supports Java [3], and many packages
on CRAN use Java (at least 170 by my count). So if you simply install
"R", you are requesting a *full* R installation, which of course
includes Java. However, if you don't want Java nor any of these
packages, R-core and R-core-devel do not depend on Java, as Ralf
pointed out.

> I'm using OpenJ9, so I can't install R like this without causing
> significant problems.
> (But please someone correct me if I'm wrong).

Note though that the R-java bits do not depend on any specific version
of Java. Several versions of Java can coexist, and then you can switch
between them using alternatives [4].

[1] https://bodhi.fedoraproject.org/updates/FEDORA-2019-3d6f517d22
[2] https://src.fedoraproject.org/rpms/R
[3] https://cran.r-project.org/doc/manuals/r-devel/R-admin.html#Java-support
[4] https://fedoraproject.org/wiki/Java

-- 
Iñaki Úcar

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] A bug understanding F relative to FALSE?

2020-01-15 Thread Iñaki Ucar
On Wed, 15 Jan 2020 at 15:14, IAGO GINÉ VÁZQUEZ  wrote:
>
> Hi all,
>
> Is the next behaviour suitable?
>
> identical(F,FALSE)
>
> ## [1] TRUE
>
> > utils::getParseData(parse(text = "c(F,FALSE)", keep.source = TRUE))
>
> > ##    line1 col1 line2 col2 id parent                token terminal  text
> > ## 14     1    1     1   10 14      0                 expr    FALSE
> > ## 1      1    1     1    1  1      3 SYMBOL_FUNCTION_CALL     TRUE     c
> > ## 3      1    1     1    1  3     14                 expr    FALSE
> > ## 2      1    2     1    2  2     14                  '('     TRUE     (
> > ## 4      1    3     1    3  4      6               SYMBOL     TRUE     F
> > ## 6      1    3     1    3  6     14                 expr    FALSE
> > ## 5      1    4     1    4  5     14                  ','     TRUE     ,
> > ## 9      1    5     1    9  9     10            NUM_CONST     TRUE FALSE
> > ## 10     1    5     1    9 10     14                 expr    FALSE
> > ## 11     1   10     1   10 11     14                  ')'     TRUE     )
>
> I would expect that token for F is the same as token for FALSE.

From the manual:

‘TRUE’ and ‘FALSE’ are reserved words denoting logical constants
in the R language, whereas ‘T’ and ‘F’ are global variables whose
initial values are set to these.
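
The practical consequence, shown with base R only: `F` can be reassigned (or
shadowed), while `FALSE` cannot, which is why the parser tokenizes them
differently:

```r
# TRUE/FALSE are reserved words; T/F are ordinary variables in the base
# package whose initial values are TRUE and FALSE, so they can be masked.
stopifnot(identical(F, FALSE))   # holds in a fresh session

F <- 1                           # legal: creates a global variable F
stopifnot(!identical(F, FALSE))  # F no longer means "false"!

# FALSE <- 1                     # error: cannot assign to a reserved word

rm(F)                            # drop the mask; base's F is visible again
stopifnot(identical(F, FALSE))
```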

Iñaki

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Plotmath on Fedora 31 broken with with pango >= 1.44 - workarounds?

2020-03-25 Thread Iñaki Ucar
On Wed, 25 Mar 2020 at 01:14, Gavin Simpson  wrote:
>
> Dear list
>
> On Fedora 31 the pango library has recently updated to version >= 1.44
> and in doing so has switched to using the HarfBuzz library (from
> FreeType) and dropped Adobe Type 1 font support. This causes problems
> with plotmath as all bar one of the glyphs don't render (see
> attached PNG image if it makes it through the list filters - if not I
> have shared a copy via my google drive:
> https://drive.google.com/file/d/1llFqKHD7LFKzQbVuq6sibY1UizRn7xxS/view?usp=sharing
> )
>
> I'm not the only person who has come across this, e.g.
> https://stackoverflow.com/q/60656445/429846 and the resulting reported
> bug on the RedHat Bugzilla:
> https://bugzilla.redhat.com/show_bug.cgi?id=1815128
>
> Beyond switching to  `type = 'Xlib'`, has anyone worked around this
> issue on a Fedora 31 or later system?

Adding de...@lists.fp.o to CC. A workaround is to avoid using PS fonts
for symbols. If you run the following, you'll see

$ fc-match Symbol
StandardSymbolsPS.t1: "Standard Symbols PS" "Regular"

So let's change this. Install a TTF symbol font, such as Symbola:

$ sudo dnf install gdouros-symbola-fonts

Then add the following to /etc/fonts/local.conf (system-wide) or
~/.fonts.conf (just for your user):

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
 <match target="pattern">
  <test name="family"><string>Symbol</string></test>
  <edit name="family" mode="assign" binding="same">
   <string>Symbola</string>
  </edit>
 </match>
</fontconfig>

Now you should see this:

$ fc-match Symbol
Symbola.ttf: "Symbola" "Regular"

and symbols should render correctly.

Iñaki

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Plotmath on Fedora 31 broken with with pango >= 1.44 - workarounds?

2020-03-27 Thread Iñaki Ucar
On Wed, 25 Mar 2020 at 12:25, Nicolas Mailhot
 wrote:
>
> 
>
> R brought this all on itself by hardcoding a Windows-only “Symbol” font
> family name in its default conf. Linux systems are UTF-8 by default for
> ~20 years now, they don’t need the forcing of magic font families to
> handle symbols not present in the 8-bit legacy Windows encodings.
>
> The actual effect of this conf is not the selection of font files with
> special and unusual symbols. It is to prioritize fonts that match the
> "Symbol" magic name. And those fonts are few and crumbling on Linux
> systems, because no one has needed to bother with them since Linux
> switched to UTF-8 last millennium.
>
> Just stop using “Symbol” in R and things will work a lot better.
> Alternatively, prepare to maintain the “Symbol” aliasing stack in
> fontconfig (and fight with wine for it), because *no* *one* *else*
> *cares* about this legacy Windows-specific stuff.

So, in the light of Nicolas' input (thanks!), I think that font
selection should be fixed upstream in R. I'd be happy to put all this
together in R's bugzilla, but I don't have an account. Could someone
please invite me?

Iñaki

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Plotmath on Fedora 31 broken with with pango >= 1.44 - workarounds?

2020-03-29 Thread Iñaki Ucar
Thanks, Paul. I've created a bug report to keep track of this
(https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17748), and taken
the liberty of adding you to CC. We'll need to cherry-pick the fix for
R 3.6.3 in Fedora 31.

Iñaki

On Sun, 29 Mar 2020 at 21:15, Paul Murrell  wrote:
>
> Hi
>
> Thanks for your input on this Iñaki and Nicolas.
>
> I am starting testing an R fix for this problem today.
>
> As suggested, the plan is to allow the R user to specify a font family
> other than "symbol" for plotmath output (or, more generally, in R
> parlance, for 'font=5' or 'fontface=5') on a Cairo-based graphics device.
>
> Paul
>
>
> On 27/03/20 11:30 pm, Iñaki Ucar wrote:
> > On Wed, 25 Mar 2020 at 12:25, Nicolas Mailhot
> >  wrote:
> >>
> >> 
> >>
> >> R brought this all on itself by hardcoding a Windows-only “Symbol” font
> >> family name in its default conf. Linux systems are UTF-8 by default for
> >> ~20 years now, they don’t need the forcing of magic font families to
> >> handle symbols not present in the 8-bit legacy Windows encodings.
> >>
> >> The actual effect of this conf is not the selection of font files with
> > special and unusual symbols. It is to prioritize fonts that match the
> >> "Symbol" magic name. And those fonts are few and crumbling on Linux
> >> systems, because no one has needed to bother with them since Linux
> > switched to UTF-8 last millennium.
> >>
> >> Just stop using “Symbol” in R and things will work a lot better.
> >> Alternatively, prepare to maintain the “Symbol” aliasing stack in
> >> fontconfig (and fight with wine for it), because *no* *one* *else*
> >> *cares* about this legacy Windows-specific stuff.
> >
> > So, in the light of Nicolas' input (thanks!), I think that font
> > selection should be fixed upstream in R. I'd be happy to put all this
> > together in R's bugzilla, but I don't have an account. Could someone
> > please invite me?
> >
> > Iñaki
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >



-- 
Iñaki Úcar

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Plotmath on Fedora 31 broken with with pango >= 1.44 - workarounds?

2020-03-30 Thread Iñaki Ucar
On Mon, 30 Mar 2020 at 04:24, Paul Murrell  wrote:
>
> Hi
>
> I have created an R branch that contains a potential fix ...
>
> https://svn.r-project.org/R/branches/R-symfam/
>
> This allows, for example, ...
>
> cairo_pdf(symbolfamily="OpenSymbol")
>
> ... to specify that the OpenSymbol family should be used as the "symbol"
> font (e.g., for "plotmath") in R.

Will this be a default on Linux? Or are you planning any mechanism
(env variable, option...) to make it the default? Because, otherwise,
as pango is updated across distributions, R graphics will be "broken"
by default unless the user explicitly calls the graphics device in
that way to set that option, which I would say is uncommon.

Iñaki

> This is just a separate branch for now because, while I have tested it
> under Ubuntu 18.04 and Fedora 31, I cannot even build R for Windows
> (right now) or Mac (ever) and I do not want to drop a bomb on R-devel at
> this stage of the release process for R 4.0.0.
>
> The attached file contains at least an outline of steps required to do a
> minimal test if anyone wants to try the fix on Linux.
>
> cc'ing Simon and Jeroen in case they are able to help with checking that
> this builds and works on Mac and/or Windows.
>
> NOTEs:
> - 'symbolfamily' can only be specified when a graphics device is opened,
> and it is then fixed for that device.
> - on Windows, for cairo-based devices, the "symbol" font is still
> hard-coded as "Standard Symbols L"
>
>
> Paul
>
> On 30/03/20 8:15 am, Paul Murrell wrote:
> > Hi
> >
> > Thanks for your input on this Iñaki and Nicolas.
> >
> > I am starting testing an R fix for this problem today.
> >
> > As suggested, the plan is to allow the R user to specify a font family
> > other than "symbol" for plotmath output (or, more generally, in R
> > parlance, for 'font=5' or 'fontface=5') on a Cairo-based graphics device.
> >
> > Paul
> >
> >
> > On 27/03/20 11:30 pm, Iñaki Ucar wrote:
> >> On Wed, 25 Mar 2020 at 12:25, Nicolas Mailhot
> >>  wrote:
> >>>
> >>> 
> >>>
> >>> R brought this all on itself by hardcoding a Windows-only “Symbol” font
> >>> family name in its default conf. Linux systems are UTF-8 by default for
> >>> ~20 years now, they don’t need the forcing of magic font families to
> >>> handle symbols not present in the 8-bit legacy Windows encodings.
> >>>
> >>> The actual effect of this conf is not the selection of font files with
> >>> special and unusual symbols. It is to prioritize fonts that match the
> >>> "Symbol" magic name. And those fonts are few and crumbling on Linux
> >>> systems, because no one has needed to bother with them since Linux
> >>> switched to UTF-8 last millennium.
> >>>
> >>> Just stop using “Symbol” in R and things will work a lot better.
> >>> Alternatively, prepare to maintain the “Symbol” aliasing stack in
> >>> fontconfig (and fight with wine for it), because *no* *one* *else*
> >>> *cares* about this legacy Windows-specific stuff.
> >>
> >> So, in the light of Nicolas' input (thanks!), I think that font
> >> selection should be fixed upstream in R. I'd be happy to put all this
> >> together in R's bugzilla, but I don't have an account. Could someone
> >> please invite me?
> >>
> >> Iñaki
> >>
> >> __
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
>
> --
> Dr Paul Murrell
> Department of Statistics
> The University of Auckland
> Private Bag 92019
> Auckland
> New Zealand
> 64 9 3737599 x85392
> p...@stat.auckland.ac.nz
> http://www.stat.auckland.ac.nz/~paul/



-- 
Iñaki Úcar

Re: [Rd] Plotmath on Fedora 31 broken with with pango >= 1.44 - workarounds?

2020-03-30 Thread Iñaki Ucar
On Mon, 30 Mar 2020 at 22:41, Paul Murrell  wrote:
>
> Hi
>
> On 30/03/20 10:43 pm, Iñaki Ucar wrote:
> > On Mon, 30 Mar 2020 at 04:24, Paul Murrell  wrote:
> >>
> >> Hi
> >>
> >> I have created an R branch that contains a potential fix ...
> >>
> >> https://svn.r-project.org/R/branches/R-symfam/
> >>
> >> This allows, for example, ...
> >>
> >> cairo_pdf(symbolfamily="OpenSymbol")
> >>
> >> ... to specify that the OpenSymbol family should be used as the "symbol"
> >> font (e.g., for "plotmath") in R.
> >
> > Will this be a default on Linux? Or are you planning any mechanism
> > (env variable, option...) to make it the default? Because, otherwise,
> > as pango is updated across distributions, R graphics will be "broken"
> > by default unless the user explicitly calls the graphics device in
> > that way to set that option, which I would say is uncommon.
>
> Good question.  Currently, for x11() (and png() etc) the default is
> taken from X11.options().  So it is possible to set this default for a
> session, or even for an installation via one of the ?Startup mechanisms
> (e.g., an R_HOME/etc/Rprofile.site file).
>
> For svg(), cairo_pdf(), and cairo_ps(), the default is hard-coded in the
> function arguments, but I *think* they are used less as default graphics
> devices.
>
> Another option would be to try to detect Fedora and set the default
> X11.options() differently there.  Two problems:  I am not sure there is
> a reliable R code chunk for detecting Fedora (sessionInfo()$running?)
> let alone Fedora >= 30;   what to set the default to?  (just has to be a
> font with a good Unicode coverage that is pretty much guaranteed to be
> in a default Fedora install).

As per Nicolas' comment (I failed to include him in CC in my last
email, and he's not in this list, sorry for that) any font installed
by default would have good symbol coverage, so there's really no need
to set a different font for symbols. According again to Nicolas (he's
one of the font experts in Fedora), the "sans-serif" or "monospace"
fontconfig defaults would work out of the box, and if a symbol is not
available, fontconfig should fallback gracefully to another font.

So maybe instead of a new "symbolfamily" argument, maybe it's better
to just use the "family" for all characters, including symbols, on
Linux, and fontconfig should take care of everything (if I understood
correctly your explanation, Nicolas; please correct me if I'm wrong).

-- 
Iñaki Úcar

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Plotmath on Fedora 31 broken with with pango >= 1.44 - workarounds?

2020-03-31 Thread Iñaki Ucar
On Tue, 31 Mar 2020 at 03:32, Paul Murrell  wrote:
>
> I think R will retain the idea of a separate symbol font in at least the
> short term because of backward compatibility and cross-platform support
> and support for a range of graphics devices.  So this fix is just for
> cairo-based devices on Linux at most (probably only Fedora).
>
> So this becomes just a decision about user interface and default settings.
>
> I did consider the option of allowing the existing "family" parameter to
> be length-two (with the second one being an optional symbol font
> specification), but because of the overlaps of X11/cairo and different
> cairo-based device interfaces, this became awkward.  Hence the separate
> "symbolfamily" interface.  And in any case, this still means a separate
> "symbol" font specification (for the reasons above).
>
> Regarding changing to a default symbolfamily=family on Linux generally
> (rather than just on Fedora), I have at least one counter-example (my
> Ubuntu 18.04) that shows that this would degrade output significantly.
> For one, the symbols are a LOT uglier, plus there are some incorrect
> glyphs.  So I think we have to stay with treating Fedora as a special
> case for now.

You can try Noto Sans Symbols (google-noto-sans-symbols-fonts) or
Symbola (gdouros-symbola-fonts). We could make the R package depend on
any of these fonts included in Fedora.

> Thanks for your point about just using symbolfamily=family as the Fedora
> default.  That seems reasonable (and definitely better than it just
> being completely broken!).
>
> That does still leave the problem of how to set the default value for
> "symbolfamily" JUST on Fedora.   I am not convinced we can use R code to
> detect Fedora >= 30 reliably (but happy to learn otherwise).  Is it a
> possibility for the Fedora distribution to include a .Rprofile.site file
> that sets the X11.options() ?

1. I don't think you need to detect you are in Fedora at all, just to
detect the version of pango, and apply this configuration if it's >=
1.44 (e.g., by executing pango-view --version; or better yet, at
building time 
https://developer.gnome.org/pango/stable/pango-Version-Checking.html).

2. Yes, we can include any custom configuration files or patches. In
fact we will need to patch R 3.6.3 for Fedora 31 at least, because
Fedora 32 is about to be released, and thus R 4.0.0 won't be included
in Fedora 31. The problem with the .Rprofile.site is that any
user-specific .Rprofile will prevent the default from being loaded,
right? And I'd say ~/.Rprofile is pretty common out there, and even
project-specific .Rprofile.
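
The pango check suggested in point 1 could also be done from R itself. A
sketch, assuming a build where `grDevices::grSoftVersion()` reports the pango
version (it may be the empty string in builds without pango support):

```r
# Decide at run time whether the pango >= 1.44 symbol-font workaround applies.
needs_symbol_workaround <- function(pango = grDevices::grSoftVersion()[["pango"]]) {
  nzchar(pango) && utils::compareVersion(pango, "1.44") >= 0
}

# The version comparison itself:
stopifnot(utils::compareVersion("1.44.7", "1.44") >= 0)  # workaround needed
stopifnot(utils::compareVersion("1.42.4", "1.44") <  0)  # old pango is fine
```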

-- 
Iñaki Úcar

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Help useRs to use R's own Time/Date objects more efficiently

2020-04-04 Thread Iñaki Ucar
On Sat, 4 Apr 2020 at 11:51, Martin Maechler  wrote:
>
> This is mostly a RFC  [but *not* about the many extra packages, please..]:
>
> Noticing to my chagrin  how my students work in a project,
> googling for R code and cut'n'pasting stuff together, accumulating
> this and that package on the way  all just for simple daily time series
> (though with partly missing parts),
> using chron, zoo, lubridate, ...  all for things that are very
> easy in base R *IF* you read help pages and start thinking on
> your own (...), I've noted once more that the above "if" is a
> very strong one, and seems to happen rarely nowadays by typical R users...
> (yes, I stop whining for now).

It's not my intention to sound harsh here, but just to provide
constructive criticism (I clarify this beforehand because, you know,
this is an email).

It's too easy to whine about this every now and then, and blame the
useRs for not being diligent enough, not patient enough and not
reading enough manual pages. But did you consider that maybe it's
the usability of this stuff in base R that leaves much to be desired,
and the lack of good and intuitive helpers that triggered the
development of so many related packages?

> In this case, I propose to slightly improve the situation ...
> by adding a few more lines to one help page [[how could that
> help in the age where "google"+"cut'n'paste" has replaced thinking ? .. ]] :

Google + cut'n'paste hasn't replaced thinking, but struggling. So no,
I don't think that more documentation (which I do think is already
great) improves the situation.

...snip...

> In the distant past / one of the last times I touched on people
> using (base) R's  Date / Time-Date  objects, I had started
> thinking if we should not provide some simple utilities to "base R"
> (not in the 'base' pkg, but rather 'utils') for "extracting" from
> {POSIX(ct), Date} objects ... and we may have discussed that
> within R Core 20 years ago,  and had always thought that this
> shouldn't be hard for useRs themselves to see how to do...

Never too late to change your mind.

> But then I see that "everybody" uses extension packages instead,
> even in the many situations where there's no gain doing so,
> but rather increases the dependency-complexity of the data analysis
> unnecessarily.

I do think there's gain. Again, it's not poor silly useRs not doing
their homework, it's a handful of developers that invested many many
hours of their time for years producing extension packages for a
functionality that is perfectly covered in base R. Maybe it's time to
think that it's not that well covered?
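
For reference, the kind of component "extraction" under discussion, using
nothing but base R: `as.POSIXlt()` exposes the broken-down time fields, with
the off-by-one conventions (0-based months, 1900-based years) that arguably
feed the demand for friendlier helpers:

```r
# Extract date/time components from a POSIXct object with base R only.
x <- as.POSIXct("2020-04-04 11:51:30", tz = "UTC")
lt <- as.POSIXlt(x)

stopifnot(lt$year + 1900 == 2020)  # years are counted from 1900
stopifnot(lt$mon + 1 == 4)         # months are 0-based
stopifnot(lt$mday == 4)            # day of the month
stopifnot(lt$wday == 6)            # 0 = Sunday, so 6 = Saturday
stopifnot(lt$hour == 11, lt$min == 51, lt$sec == 30)
```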

-- 
Iñaki Úcar

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Plotmath on Fedora 31 broken with with pango >= 1.44 - workarounds?

2020-04-06 Thread Iñaki Ucar
On Mon, 6 Apr 2020 at 04:59, Paul Murrell  wrote:
>
> Hi
>
> The R branch ...
>
> https://svn.r-project.org/R/branches/R-symfam/
>
> ... is now set up so that it works "out of the box" on Fedora by setting
> the default to be 'symbolfamily=cairoSymbolFont(family, usePUA=FALSE)'
> when grSoftVersion()["pango"] is greater than "1.44".

That is awesome, thanks! Will you port this to the R-3-6-branch?

-- 
Iñaki Úcar

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R 4.0.0 build error with sysdata.rda on ppc64el architecture

2020-04-30 Thread Iñaki Ucar
On Thu, 30 Apr 2020 at 02:49, Dirk Eddelbuettel  wrote:
>
>
> On 29 April 2020 at 11:22, peter dalgaard wrote:
> | Hum, at least it is not Apple, so maybe you can attach a debugger to the 
> running process? (gdb -p process_id or something like that --- haven't 
> actually done it for a decade). Then at least we can get a stack trace and a 
> clue about where it is looping. Diddling optimization options can also 
> sometimes provide a clue.
>
> (Missed this earlier as the conversation moved off-list.)
>
> And to keep the list abreast, this appears to be related to the long double
> issue on powerpc where needed an extra #define to ensure compilation. That
> commit is the difference in a bisection as I was able to demonstrate. The
> issue can also be circumvented by disabling long double support on the
> platform, but hopefully a better fix can be found.  Bryan Lewis was
> eagle-eyed on this and very helpful. The issue is now back in the hands of R
> Core and I and others will await the news.

Which reminds me that [1] was required for v3.6.2. Could be related?

[1] 
https://src.fedoraproject.org/rpms/R/blob/master/f/R-3.6.2-ppc64-no-const-long-double.patch

Iñaki

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Restrict package to load-only access - prevent attempts to attach it

2020-07-17 Thread Iñaki Ucar
Hi Henrik,

A bit late, but you can take a look at smbache's {import} package [1]
in case you didn't know it. I believe it does what you are describing.

[1] https://github.com/smbache/import
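
A base-R illustration of the load/attach distinction behind the question,
using the tools package (normally loaded but not attached) as a stand-in for
a load-only package:

```r
# loadNamespace() loads a package (running its .onLoad) without touching the
# search path; .onAttach only runs when the package is *attached*, so a
# load-only package can still be used via :: after loadNamespace().
loadNamespace("tools")
stopifnot(isNamespaceLoaded("tools"))
stopifnot(!"package:tools" %in% search())   # loaded, but not attached

# Exported functions remain reachable via :: (or getExportedValue()).
stopifnot(identical(tools::file_ext("script.R"), "R"))
```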

Iñaki

On Tue, 23 Jun 2020 at 22:21, Henrik Bengtsson
 wrote:
>
> Hi,
>
> I'm developing a package whose API is only meant to be used in other
> packages via imports or pkg::foo().  There should be no need to attach
> this package so that its API appears on the search() path. As a
> maintainer, I want to avoid having it appear in search() conflicts by
> mistake.
>
> This means that, for instance, other packages should declare this
> package under 'Imports' or 'Suggests' but never under 'Depends'.  I
> can document this and hope that's how it's going to be used.  But, I'd
> like to make it explicit that this API should be used via imports or
> ::.  One approach I've considered is:
>
> .onAttach <- function(libname, pkgname) {
>if (nzchar(Sys.getenv("R_CMD"))) return()
>stop("Package ", sQuote(pkgname), " must not be attached")
> }
>
> This would produce an error if the package is attached.  It's
> conditioned on the environment variable 'R_CMD' set by R itself
> whenever 'R CMD ...' runs.  This is done to avoid errors in 'R CMD
> INSTALL' and 'R CMD check' "load tests", which formally are *attach*
> tests.  The above approach passes all the tests and checks I'm aware
> of and on all platforms.
>
> Before I ping the CRAN team explicitly, does anyone know whether this
> is a valid approach?  Do you know if there are alternatives for
> asserting that a package is never attached.  Maybe this is more
> philosophical where the package "contract" is such that all packages
> should be attachable and, if not, then it's not a valid R package.
>
> This is a non-critical topic but if it can be done it would be useful.
>
> Thanks,
>
> Henrik
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Iñaki Úcar

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Garbage collection of seemingly PROTECTed pairlist

2020-09-12 Thread Iñaki Ucar
Hi,

In line 5, you are allocating a vector of length nc. Then, in line 12, you
are using nr as a limit, so if nr goes beyond nc, which is happening in
line 39, you are in trouble.

Iñaki

On Sat, 12 Sep 2020 at 03:30, Rory Nolan  wrote:

> I want to write an R function using R's C interface that takes a 2-column
> matrix of increasing, non-overlapping integer intervals and returns a list
> with those intervals plus some added intervals, such that there are no
> gaps. For example, it should take the matrix rbind(c(5L, 6L), c(7L, 10L),
> c(20L, 30L)) and return list(c(5L, 6L), c(7L, 10L), c(11L, 19L), c(20L,
> 30L)). Because the output is of variable length, I use a pairlist (because
> it is growable) and then I call Rf_PairToVectorList() at the end to make it
> into a regular list.
>
> I'm getting a strange garbage collection error. My PROTECTed pairlist prlst
> gets garbage collected away and causes a memory leak error when I try to
> access it.
>
> Here's my code.
>
> #include <Rinternals.h>
>
>
> SEXP C_int_mat_nth_row_nrnc(int *int_mat_int, int nr, int nc, int n) {
>   SEXP out = PROTECT(Rf_allocVector(INTSXP, nc));
>   int *out_int = INTEGER(out);
>   if (n <= 0 | n > nr) {
> for (int i = 0; i != nc; ++i) {
>   out_int[i] = NA_INTEGER;
> }
>   } else {
> for (int i = 0; i != nr; ++i) {
>   out_int[i] = int_mat_int[n - 1 + i * nr];
> }
>   }
>   UNPROTECT(1);
>   return out;}
>
> SEXP C_make_len2_int_vec(int first, int second) {
>   SEXP out = PROTECT(Rf_allocVector(INTSXP, 2));
>   int *out_int = INTEGER(out);
>   out_int[0] = first;
>   out_int[1] = second;
>   UNPROTECT(1);
>   return out;}
>
> SEXP C_fullocate(SEXP int_mat) {
>   int nr = Rf_nrows(int_mat), *int_mat_int = INTEGER(int_mat);
>   int last, row_num;  // row_num will be 1-indexed
>   SEXP prlst0cdr = PROTECT(C_int_mat_nth_row_nrnc(int_mat_int, nr, 2, 1));
>   SEXP prlst = PROTECT(Rf_list1(prlst0cdr));
>   SEXP prlst_tail = prlst;
>   last = INTEGER(prlst0cdr)[1];
>   row_num = 2;
>   while (row_num <= nr) {
> Rprintf("row_num: %i\n", row_num);
> SEXP row = PROTECT(C_int_mat_nth_row_nrnc(int_mat_int, nr, 2,
> row_num));
> Rf_PrintValue(prlst);  // This is where the error occurs
> int *row_int = INTEGER(row);
> if (row_int[0] == last + 1) {
>   Rprintf("here1");
>   SEXP next = PROTECT(Rf_list1(row));
>   prlst_tail = SETCDR(prlst_tail, next);
>   last = row_int[1];
>   UNPROTECT(1);
>   ++row_num;
> } else {
>   Rprintf("here2");
>   SEXP next_car = PROTECT(C_make_len2_int_vec(last + 1, row_int[0] -
> 1));
>   SEXP next = PROTECT(Rf_list1(next_car));
>   prlst_tail = SETCDR(prlst_tail, next);
>   last = row_int[0] - 1;
>   UNPROTECT(2);
> }
> UNPROTECT(1);
>   }
>   SEXP out = PROTECT(Rf_PairToVectorList(prlst));
>   UNPROTECT(3);
>   return out;}
>
> As you can see, I have some diagnostic print statements in there. The
> offending line is line 40, which I have marked with a comment of // This is
> where the error occurs. I have a minimal reproducible package at
> https://github.com/rorynolan/testpkg and I have run R CMD CHECK with
> valgrind using GitHub actions, the results of which are at
> https://github.com/rorynolan/testpkg/runs/1076595757?check_suite_focus=true
> .
> That's where I found out which line is causing the error. This function
> works as expected sometimes, and then sometimes this issue appears. This
> lends weight to the suspicion that it's a garbage collection issue.
>
> I really want to know what my mistake is. I'm not that interested in
> alternative implementations; I want to understand the mistake that I'm
> making so that I can avoid making it in future.
>
> I have asked the question on stackoverflow to little avail, but the
> discussion there may prove helpful.
>
> https://stackoverflow.com/questions/63759604/garbage-collection-of-seemingly-protected-pairlist
> .
>
>
>
> Thanks,
>
> Rory
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


-- 
Iñaki Úcar

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] parallel PSOCK connection latency is greater on Linux?

2020-11-02 Thread Iñaki Ucar
On Mon, 2 Nov 2020 at 02:22, Simon Urbanek  wrote:
>
> It looks like R sockets on Linux could do with TCP_NODELAY -- without (status 
> quo):

How many network packets are generated with and without it? If there
are many small writes and thus setting TCP_NODELAY causes many small
packets to be sent, it might make more sense to set TCP_QUICKACK
instead.

Iñaki

> Unit: microseconds
>expr  min   lq mean  median   uq  max
>  clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 48027.83
>  neval
>   1000
>
> exactly the same machine + R but with TCP_NODELAY enabled in R_SockConnect():
>
> Unit: microseconds
>expr min lq mean  median  uq  max neval
>  clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  1000
>
> Cheers,
> Simon
>
>
> > On 2/11/2020, at 3:39 AM, Jeff  wrote:
> >
> > I'm exploring latency overhead of parallel PSOCK workers and noticed that 
> > serializing/unserializing data back to the main R session is significantly 
> > slower on Linux than it is on Windows/MacOS with similar hardware. Is there 
> > a reason for this difference and is there a way to avoid the apparent 
> > additional Linux overhead?
> >
> > I attempted to isolate the behavior with a test that simply returns an 
> > existing object from the worker back to the main R session.
> >
> > library(parallel)
> > library(microbenchmark)
> > gcinfo(TRUE)
> > cl <- makeCluster(1)
> > (x <- microbenchmark(clusterEvalQ(cl, iris), times = 1000, unit = "us"))
> > plot(x$time, ylab = "microseconds")
> > head(x$time, n = 10)
> >
> > On Windows/MacOS, the test runs in 300-500 microseconds depending on 
> > hardware. A few of the 1000 runs are an order of magnitude slower but this 
> > can probably be attributed to garbage collection on the worker.
> >
> > On Linux, the first 5 or so executions run at comparable speeds but all 
> > subsequent executions are two orders of magnitude slower (~40 milliseconds).
> >
> > I see this behavior across various platforms and hardware combinations:
> >
> > Ubuntu 18.04 (Intel Xeon Platinum 8259CL)
> > Linux Mint 19.3 (AMD Ryzen 7 1800X)
> > Linux Mint 20 (AMD Ryzen 7 3700X)
> > Windows 10 (AMD Ryzen 7 4800H)
> > MacOS 10.15.7 (Intel Core i7-8850H)
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Iñaki Úcar

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] parallel PSOCK connection latency is greater on Linux?

2020-11-02 Thread Iñaki Ucar
On Mon, 2 Nov 2020 at 14:29, Jeff  wrote:
>
> Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that
> they might determine what is best for their potentially latency- or
> throughput-sensitive application?

I think it makes sense (with a sensible default). E.g., Julia does this [1-2].

[1] https://docs.julialang.org/en/v1/stdlib/Sockets/#Sockets.nagle
[2] https://docs.julialang.org/en/v1/stdlib/Sockets/#Sockets.quickack

-- 
Iñaki Úcar

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] parallel PSOCK connection latency is greater on Linux?

2020-11-04 Thread Iñaki Ucar
Please, check a tcpdump session on localhost while running the following script:

library(parallel)
library(tictoc)
cl <- makeCluster(1)
Sys.sleep(1)

for (i in 1:10) {
  tic()
  x <- clusterEvalQ(cl, iris)
  toc()
}

The initialization phase comprises 7 packets. Then, the 1-second sleep
will help you see where the evaluation starts. Each clusterEvalQ
generates 6 packets:

1. main -> worker PSH, ACK 1026 bytes
2. worker -> main ACK 66 bytes
3. worker -> main PSH, ACK 3758 bytes
4. main -> worker ACK 66 bytes
5. worker -> main PSH, ACK 2484 bytes
6. main -> worker ACK 66 bytes

The first two are the command and its ACK, the following are the data
back and their ACKs. In the first 4-5 iterations, I see no delay at
all. Then, in the following iterations, a 40 ms delay starts to happen
between packets 3 and 4, that is: the main process delays the ACK to
the first packet of the incoming result.

So I'd say Nagle is hardly to blame for this. It would be interesting
to see how many packets are generated with TCP_NODELAY on. If there
are still 6 packets, then we are fine. If we suddenly see a gazillion
packets, then TCP_NODELAY does more harm than good. On the other hand,
TCP_QUICKACK would surely solve the issue without any drawback. As
Nagle himself put it once, "set TCP_QUICKACK. If you find a case where
that makes things worse, let me know."

Iñaki

On Wed, 4 Nov 2020 at 04:34, Simon Urbanek  wrote:
>
> I'm not sure the user would know ;). This is a very system-specific issue just 
> because the Linux network stack behaves so differently from other OSes (for 
> purely historical reasons). That makes it hard to abstract as a "feature" for 
> the R sockets that are supposed to be platform-independent. At least 
> TCP_NODELAY is actually part of POSIX so it is on better footing, and 
> disabling delayed ACK is practically only useful to work around the other 
> side having Nagle on, so I would expect it to be rarely used.
>
> This is essentially an RFC since we don't have a mechanism for socket options 
> (well, almost, there is timeout and blocking already...) and I don't think we 
> want to expose low-level details so perhaps one idea would be to add 
> something like delay=NA to socketConnection() in order to not touch (NA), 
> enable (TRUE) or disable (FALSE) TCP_NODELAY. I wonder if there is any other 
> way we could infer the intention of the user to try to choose the right 
> approach...
>
> Cheers,
> Simon
>
>
> > On Nov 3, 2020, at 02:28, Jeff  wrote:
> >
> > Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that they 
> > might determine what is best for their potentially latency- or 
> > throughput-sensitive application?
> >
> > Best,
> > Jeff
> >
> > On Mon, Nov 2, 2020 at 14:05, Iñaki Ucar  wrote:
> >> On Mon, 2 Nov 2020 at 02:22, Simon Urbanek  
> >> wrote:
> >>> It looks like R sockets on Linux could do with TCP_NODELAY -- without 
> >>> (status quo):
> >> How many network packets are generated with and without it? If there
> >> are many small writes and thus setting TCP_NODELAY causes many small
> >> packets to be sent, it might make more sense to set TCP_QUICKACK
> >> instead.
> >> Iñaki
> >>> Unit: microseconds
> >>>expr  min   lq mean  median   uq  
> >>> max
> >>>  clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 
> >>> 48027.83
> >>>  neval
> >>>   1000
> >>> exactly the same machine + R but with TCP_NODELAY enabled in 
> >>> R_SockConnect():
> >>> Unit: microseconds
> >>>expr min lq mean  median  uq  max 
> >>> neval
> >>>  clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  
> >>> 1000
> >>> Cheers,
> >>> Simon
> >>> > On 2/11/2020, at 3:39 AM, Jeff  wrote:
> >>> >
> >>> > I'm exploring latency overhead of parallel PSOCK workers and noticed 
> >>> > that serializing/unserializing data back to the main R session is 
> >>> > significantly slower on Linux than it is on Windows/MacOS with similar 
> >>> > hardware. Is there a reason for this difference and is there a way to 
> >>> > avoid the apparent additional Linux overhead?
> >>> >
> >>> > I attempted to isolate the behavior with a test that simply returns an 
> >>> > existing object from the worker back to the main R session.
> >>> >
> >>> > library(parallel)
> &g

Re: [Rd] formatting issue with gcc 9.3.0 on Ubuntu on WSL2

2020-11-18 Thread Iñaki Ucar
On Wed, 18 Nov 2020 at 10:26, Tomas Kalibera  wrote:
>
> On 11/17/20 9:34 PM, Bill Dunlap wrote:
> > I just got a new Windows laptop (i7, 10th generation CPU), installed
> > 'Windows Subsystem for Linux 2' and then installed Ubuntu 20.04 and
> > used 'apt-get install' to install packages that the R build seems
> > to require.  In particular, I am using gcc version 9.3.0.   The
> > build went without a hitch but the tests showed that deparse(1e-16)
> > produced "1.00e-16" instead of the expected "1e-16".
> >
> > It looks like the problem is in src/main/format.c:scientific().  The
> > lowest two+ bytes in the fractional part of the long double (80-bit)
> > return value of powl(10.0L, -30L), seem to be corrupted.  I made a
> > standalong program to test powl and saw no problem - it gives the
> > same results for the fractional part as bc does.
> >
> >  bc: A2425FF7 5E14FC31 A125...
> > standalone: 22425FF7 5E14FC32
> >   R: 22425FF7 5E151800
> >
> > There are lots of other small numbers with the same problem:
> >
> >
> >   > grep(value=TRUE, "0e",
> > vapply((1+(0:1)/1000)*1e-15, deparse, ""))
> > [1] "8.56e-15" "8.717000e-15" "8.778000e-15"
> > [4] "8.935000e-15" "9.508000e-15" "9.838000e-15"
> > [7] "9.899000e-15" "9.934000e-15" "9.995000e-15"
> >> str(grep(value=TRUE, "0e", vapply((1+(0:1)/1000)*1e-14, deparse, "")))
> >   chr [1:295] "8.002000e-14" "8.005000e-14" ...
> >
> > Has anyone else seen this?  I am wondering if this is an oddity in WSL2
> > or Ubuntu's gcc-9.3.0.

I cannot reproduce this issue (version 20H2, build 19042.630; Ubuntu
20.04 installed from the store). Are you sure you are running on WSL2?
(You can check this with `wsl --list --verbose`).

> Almost surely it is Windows/WSL related, I'm not seeing this on Ubuntu
> 20.04.
>
> One thing to check might be the FPU control word. In a Windows build, R
> will set as it is on Unix, to use all 80 bits when values stay in FPU
> registers, which is not the Windows default. This should not matter with
> SSE anymore, but maybe something is still using the FPU. This is just
> using inline assembly, so one could enable it as experiment. In
> principle, this could be also due to some other things specific to
> Windows that R works around in Windows builds, but doesn't in Linux
> builds assuming they will not run on Windows.

It does run on Linux. WSL2 runs a modified version of the Linux kernel
on top of Hyper-V. Unless Bill is running WSL1, which runs on top of
the Windows kernel with a syscall translation layer.

> Other issues I had with WSL in the past (trying to build R and run
> checks) included time-zones and surprising encodings, but I didn't check
> recently. I would not use R on WSL unless my goal was to diagnose these
> issues and see if they could be overcome on the R side.
>
> Best
> Tomas

-- 
Iñaki Úcar



Re: [Rd] [External] R crashes when using huge data sets with character string variables

2020-12-13 Thread Iñaki Ucar
On Sun, 13 Dec 2020 at 04:27,  wrote:
>
> If R is receiving a kill signal there is nothing it can do about it.
>
> I am guessing you are running into a memory over-commit issue in your OS.
> https://en.wikipedia.org/wiki/Memory_overcommitment
> https://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/

Correct. And in particular, this is most probably the earlyoom [1]
service in action, which, I believe, is installed and enabled by
default in Ubuntu 20.04. It is a simple daemon that monitors memory,
and when some conditions are reached (e.g., the system is about to
start swapping), it looks for offending processes and kills them.

[1] https://github.com/rfjakob/earlyoom

Iñaki

> If you have to run this close to your physical memory limits you might
> try using your shell's facility (ulimit for bash, limit for some
> others) to limit process memory/virtual memory use to your available
> physical memory. You can also try setting the R_MAX_VSIZE environment
> variable mentioned in ?Memory; that only affects the R heap, not
> malloc() done elsewhere.
>
> Best,
>
> luke
>
> On Sat, 12 Dec 2020, Arne Henningsen wrote:
>
> > When working with a huge data set with character string variables, I
> > experienced that various commands let R crash. When I run R in a
> > Linux/bash console, R terminates with the message "Killed". When I use
> > RStudio, I get the message "R Session Aborted. R encountered a fatal
> > error. The session was terminated. Start New Session". If an object in
> > the R workspace needs too much memory, I would expect that R would not
> > crash but issue an error message "Error: cannot allocate vector of
> > size ...".  A minimal reproducible example (at least on my computer)
> > is:
> >
> > nObs <- 1e9
> >
> > date <- paste( round( runif( nObs, 1981, 2015 ) ), round( runif( nObs,
> > 1, 12 ) ), round( runif( nObs, 1, 31 ) ), sep = "-" )
> >
> > Is this a bug or a feature of R?
> >
> > Some information about my R version, OS, etc:
> >
> > R> sessionInfo()
> > R version 4.0.3 (2020-10-10)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: Ubuntu 20.04.1 LTS
> >
> > Matrix products: default
> > BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
> > LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
> >
> > locale:
> > [1] LC_CTYPE=en_DK.UTF-8   LC_NUMERIC=C
> > [3] LC_TIME=en_DK.UTF-8LC_COLLATE=en_DK.UTF-8
> > [5] LC_MONETARY=en_DK.UTF-8LC_MESSAGES=en_DK.UTF-8
> > [7] LC_PAPER=en_DK.UTF-8   LC_NAME=C
> > [9] LC_ADDRESS=C   LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics  grDevices utils datasets  methods   base
> >
> > loaded via a namespace (and not attached):
> > [1] compiler_4.0.3
> >
> > /Arne
> >
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa  Phone: 319-335-3386
> Department of Statistics andFax:   319-335-3017
> Actuarial Science
> 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
>



-- 
Iñaki Úcar



Re: [Rd] [External] brief update on the pipe operator in R-devel

2021-01-12 Thread Iñaki Ucar
On Tue, 12 Jan 2021 at 20:23,  wrote:
>
> After some discussions we've settled on a syntax of the form
>
>  mtcars |> subset(cyl == 4) |> d => lm(mpg ~ disp, data = d)
>
> to handle cases where the pipe lhs needs to be passed to an argument
> other than the first of the function called on the rhs. This seems a
> to be a reasonable balance between making these non-standard cases
> easy to see but still easy to write. This is now committed to R-devel.

Interesting. Is the use of "d =>" restricted to pipelines? In other
words, I think that it shouldn't be equivalent to "function(d)", i.e.,
that this:

x <- d => lm(mpg ~ disp, data = d)

shouldn't work.

-- 
Iñaki Úcar



Re: [Rd] R 4.0.4 scheduled for February 15

2021-01-21 Thread Iñaki Ucar
Minor question: wouldn't the new pipe syntax be worth a minor version
bump? A package planning to drop magrittr would end up depending on R
4.0.4, which sounds suboptimal. And (I don't find any reference to
this in the manual or in CRAN policies, but) if I remember correctly,
depending on a patch version was discouraged.

Iñaki


On Thu, 21 Jan 2021 at 11:59, Peter Dalgaard via R-devel
 wrote:
>
> Full schedule is available on https://developer.r-project.org (or 
> https://svn.r-project.org/R-dev-web/trunk/index.html for the impatient).
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com
>
>



-- 
Iñaki Úcar



Re: [Rd] R 4.0.4 scheduled for February 15

2021-01-21 Thread Iñaki Ucar
On Thu, 21 Jan 2021 at 14:19, Sebastian Meyer  wrote:
>
> Am 21.01.21 um 13:51 schrieb Iñaki Ucar:
> > Minor question: wouldn't the new pipe syntax be worth a minor version
> > bump?
>
> Yes. The NEWS mention the pipe syntax for R-devel not for R-patched.
>
> See the section "CHANGES IN R 4.0.3 patched" in
>
> https://stat.ethz.ch/R-manual/R-devel/doc/html/NEWS.html
>
> for what is currently included for R 4.0.4.

Understood, thanks!

-- 
Iñaki Úcar



Re: [Rd] boneheaded BLAS questions

2021-03-18 Thread Iñaki Ucar
On Thu, 18 Mar 2021 at 05:10, Dirk Eddelbuettel  wrote:
>
>
> On 17 March 2021 at 22:53, Ben Bolker wrote:
> |Thanks.  I know it's supposed to Just Work (and I definitely
> | appreciate all the work that's gone into making it Just Work 99% of the
> | time!).
>
> And for what it is worth, the aforementioned 'switching from within' solution
> is using FlexiBLAS (not BLIS as I had said in the previous email), and was
> described in an R application R here:
>
>   
> https://www.enchufa2.es/archives/switch-blas-lapack-without-leaving-your-r-session.html

Thanks, Dirk. Yes, since Fedora 33 (current release), we leverage this
excellent work by Martin Köhler et al. [1], so that every BLAS/LAPACK
consumer in Fedora is linked against FlexiBLAS, which enables
transparent live switching. And there are R and octave packages
providing bindings, as shown in the post above. Julia is in fact the
only component that is currently *not* using it due to the
particularities of their BLAS/LAPACK stack management, but they are
interested in FlexiBLAS too and some work is underway [2].

If you are interested in this, Ben, you could compile FlexiBLAS
yourself, docs are very clear and it's pretty straightforward. And
then you only need to tell R to link against libflexiblas. For that,
as previously described, we use (see lines 691-693 in [3]):

--with-lapack --with-blas=flexiblas

A small tweak in the configure is required though (see line 679 in
[3]; in fact, I should port a proper fix upstream, but I didn't find
the time yet). And if you have any issue Martin or myself could help.

[1] https://www.mpi-magdeburg.mpg.de/projects/flexiblas
[2] https://github.com/mpimd-csc/flexiblas/issues/12
[3] https://src.fedoraproject.org/rpms/R/blob/rawhide/f/R.spec

-- 
Iñaki Úcar



Re: [Rd] R does not start on Fedora 34

2021-04-29 Thread Iñaki Ucar
On Thu, 29 Apr 2021 at 14:36, Gábor Csárdi  wrote:
>
> Dear all,
>
> Fedora 34 was released two days ago, and with a fresh build of R I get
>
> [root@2dba8b3587c1 R-devel]# bin/R
> ERROR: R_HOME ('/tmp/R-devel') not found

This is known. It's a docker issue after a glibc change. See [1] and
references therein. A newer version of docker/libseccomp should work.

[1] https://stat.ethz.ch/pipermail/r-sig-fedora/2021-March/000732.html

>
> on it, coming from
> https://github.com/wch/r-source/blob/0f0092adf14b8bd17bcce1cac0ee26b928355dab/src/scripts/R.sh.in#L263
>
> Apparently `test -x` returns 1 for an existing 755 directory on Fedora 34:
> ❯ docker run -ti fedora:latest
> [root@f944f25b16b4 /]# test -x /tmp/
> [root@f944f25b16b4 /]# echo $?
> 1
>
> On Fedora 33 this was different:
> ❯ docker run -ti fedora:33
> [root@ea55a1b92215 /]# test -x /tmp/
> [root@ea55a1b92215 /]# echo $?
> 0
>
> A workaround would be to use `test -d` which still return 0 on Fedora 34.
>
> FYI,
> Gabor
>



-- 
Iñaki Úcar



Re: [Rd] R compilation on old(ish) CentOS

2021-04-29 Thread Iñaki Ucar
On Thu, 29 Apr 2021 at 15:59, Ben Bolker  wrote:
>
>I probably don't want to go down this rabbit hole very far, but if
> anyone has any *quick* ideas ...
>
>Attempting to build R from scratch with a fresh SVN checkout on a
> somewhat out-of-date CentOS system (for which I don't have root access,
> although I can bug people if I care enough).

Could you bug them to... update CentOS? :)

[snip]

> $ gcc --version
> gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)

Ouch. You definitely need to install and activate an appropriate
devtoolset as follows:

$ yum install centos-release-scl
$ yum install devtoolset-8

(Bug those people to at least install that). Then, put something like
this in your .bashrc:

$ source scl_source enable devtoolset-8

And you are ready to go with a fairly decent version of gcc, i.e.:

$ gcc --version
gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)

Iñaki

> $ lsb_release -a
> LSB Version:
> :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
> Distributor ID: CentOS
> Description:CentOS Linux release 7.8.2003 (Core)
> Release:7.8.2003
> Codename:   Core
>

-- 
Iñaki Úcar



Re: [Rd] R compilation on old(ish) CentOS

2021-05-01 Thread Iñaki Ucar
On Sat, 1 May 2021 at 03:41, Henrik Bengtsson
 wrote:
>
> Ben, it's most likely what Peter says.  I can confirm it works; I just
> installed https://cran.r-project.org/src/base-prerelease/R-latest.tar.gz
> on an up-to-date CentOS 7.9.2009 system using the vanilla gcc (GCC)
> 4.8.5 that comes with that version and R compiles just fine and it
> passes 'make check' too.

It's not that you can't compile R with gcc 4.8.5, it's that you'll
have a hard time installing many packages. And that's why EPEL 7 has R
3.6 and cannot be updated to 4.

> Since R is trying to move toward C++14 support by default, I agree
> with Iñaki, you might want to build and run R with a newer version of
> gcc.  gcc 4.8.5 will only give you C++11 support.  RedHat's Software
> Collections (SCL) devtoolsets are the easiest way to do this. I've
> done this too and can confirm that gcc 7.3.1 that comes with SCL
> devtoolset/7 is sufficient to get C++14 support.  I'm sharing my
> installation with lots of users, so I make it all transparent to the
> end-user with environment modules, i.e. 'module load r/4.1.0' is all
> the user needs to know.

--
Iñaki Úcar



[Rd] Possible bug in help file name generation

2021-06-24 Thread Iñaki Ucar
Hi,

I noticed that R 4.1 places html files into the packages' help
directory, compared to previous versions, which used an RDS. I found a
possible bug in the code that processes the aliases from the Rd files
and generates the names for these html files (I haven't identified
where this happens though).

To reproduce this, install e.g. the 'caper' package from CRAN and
inspect the help directory. I find the following file:

'pgls.confint'$'\n''.html'

which contains a special character. This comes from the fact that the
file caper/man/pgls.profile.Rd in caper's source code contains a
newline in the corresponding alias:

\name{pgls.profile}
\alias{pgls.profile}
\alias{plot.pgls.profile}
\alias{pgls.confint
}

and this ends up in the file name.

-- 
Iñaki Úcar



Re: [Rd] Possible bug in help file name generation

2021-06-24 Thread Iñaki Ucar
On Thu, 24 Jun 2021 at 14:21, Kurt Hornik  wrote:
>
> >>>>> Deepayan Sarkar writes:
>
> > On Thu, Jun 24, 2021 at 5:31 PM Iñaki Ucar  wrote:
> >>
> >> Hi,
> >>
> >> I noticed that R 4.1 places html files into the packages' help
> >> directory, compared to previous versions, which used an RDS. I found a
> >> possible bug in the code that processes the aliases from the Rd files
> >> and generates the names for these html files (I haven't identified
> >> where this happens though).
> >>
> >> To reproduce this, install e.g. the 'caper' package from CRAN and
> >> inspect the help directory. I find the following file:
> >>
> >> 'pgls.confint'$'\n''.html'
> >>
> >> which contains a special character. This comes from the fact that the
> >> file caper/man/pgls.profile.Rd in caper's source code contains a
> >> newline in the corresponding alias:
> >>
> >> \name{pgls.profile}
> >> \alias{pgls.profile}
> >> \alias{plot.pgls.profile}
> >> \alias{pgls.confint
> >> }
> >>
> >> and this ends up in the file name.
>
> > Yes, the code should probably do a trimws() somewhere, but this also
> > looks like something that maybe R CMD check should identify and
> > complain about.
>
> I'll take a look ...

Thanks. FYI, I was able to detect this thanks to build errors in
cran2copr, because RPM tools perform these kinds of checks. See the
complete log here:
https://download.copr.fedorainfracloud.org/results/iucar/cran/fedora-rawhide-x86_64/02296763-R-CRAN-caper/builder-live.log.gz

-- 
Iñaki Úcar



Re: [Rd] na.omit inconsistent with is.na on list

2021-08-13 Thread Iñaki Ucar
On Thu, 12 Aug 2021 at 22:20, Gabriel Becker  wrote:
>
> Hi Toby,
>
> This definitely appears intentional, the first  expression of
> stats:::na.omit.default is
>
>if (!is.atomic(object))
>
> return(object)

I don't follow your point. This only means that the *default* method
is not intended for non-atomic cases, but it doesn't mean that a
method for lists shouldn't exist.

> So it is explicitly just returning the object in non-atomic cases, which
> includes lists. I was not involved in this decision (obviously) but my
> guess is that it is due to the fact that what constitutes an observation
> "being complete" is unclear in the list case. What should
>
> na.omit(list(5, NA, c(NA, 5)))
>
> return? Just the first element, or the first and the last? It seems, at
> least to me, unclear. A small change to the documentation to add "atomic

> is.na(list(5, NA, c(NA, 5)))
[1] FALSE  TRUE FALSE

Following Toby's argument, it's clear to me: the first and the last.

Iñaki

> (in the sense of is.atomic returning \code{TRUE})" in front of "vectors"
> or similar  where what types of objects are supported seems justified,
> though, imho, as the current documentation is either ambiguous or
> technically incorrect, depending on what we take "vector" to mean.
>
> Best,
> ~G
>
> On Wed, Aug 11, 2021 at 10:16 PM Toby Hocking  wrote:
>
> > Also, the na.omit method for data.frame with list column seems to be
> > inconsistent with is.na,
> >
> > > L <- list(NULL, NA, 0)
> > > str(f <- data.frame(I(L)))
> > 'data.frame': 3 obs. of  1 variable:
> >  $ L:List of 3
> >   ..$ : NULL
> >   ..$ : logi NA
> >   ..$ : num 0
> >   ..- attr(*, "class")= chr "AsIs"
> > > is.na(f)
> >  L
> > [1,] FALSE
> > [2,]  TRUE
> > [3,] FALSE
> > > na.omit(f)
> >L
> > 1
> > 2 NA
> > 3  0
> >
> > On Wed, Aug 11, 2021 at 9:58 PM Toby Hocking  wrote:
> >
> > > na.omit is documented as "na.omit returns the object with incomplete
> > cases
> > > removed." and "At present these will handle vectors," so I expected that
> > > when it is used on a list, it should return the same thing as if we
> > subset
> > > via is.na; however I observed the following,
> > >
> > > > L <- list(NULL, NA, 0)
> > > > str(L[!is.na(L)])
> > > List of 2
> > >  $ : NULL
> > >  $ : num 0
> > > > str(na.omit(L))
> > > List of 3
> > >  $ : NULL
> > >  $ : logi NA
> > >  $ : num 0
> > >
> > > Should na.omit be fixed so that it returns a result that is consistent
> > > with is.na? I assume that is.na is the canonical definition of what
> > > should be considered a missing value in R.
> > >
> >
> > [[alternative HTML version deleted]]
> >
> >
>
> [[alternative HTML version deleted]]
>



-- 
Iñaki Úcar



Re: [Rd] Floating point issue

2022-07-10 Thread Iñaki Ucar
On Sun, 10 Jul 2022 at 16:28, GILLIBERT, Andre
 wrote:
>
> > No, that is how computers work (with floating point numbers).
>
>
> The fact that not all values are representable by floating point does not 
> mean that outputting a number with maximum accuracy, then reading it back, 
> should yield a different number.
>
>
> I would like to point that I cannot reproduce this "bug" on the official R 
> 4.2.0 Windows x86_64 build on an AMD Ryzen 1700 on Windows 10.

I cannot reproduce this on a 64-bit Linux build of R 4.1.3 either:

options(scipen = 999)
1e24
#> [1]  83222784
1e24 == 83222784
#> [1] TRUE

1e25
#> [1] 1905969664
1e25 == 1905969664
#> [1] TRUE

1905969664
#> [1] 1905969664

10003053453312
#> [1] 10003053453312

10 == 1e25
#> [1] TRUE

-- 
Iñaki Úcar



[Rd] Proposal to limit Internet access during package load

2022-09-23 Thread Iñaki Ucar
Hi all,

I'd like to open this debate here, because IMO this is a big issue.
Many packages do this for various reasons, some more legitimate than
others, but I think that this shouldn't be allowed, because it
basically means that installation fails in a machine without Internet
access (which happens e.g. in Linux distro builders for security
reasons).

Now, what if connection is suppressed during package load? There are
basically three use cases out there:

(1) The package requires additional files for the installation (e.g.
the source code of an external library) that cannot be bundled into
the package due to CRAN restrictions (size).
(2) The package requires additional files for using it (e.g.,
datasets, a JAR...) that cannot be bundled into the package due to
CRAN restrictions (size).
(3) Other spurious reasons (e.g. the maintainer decided that package
load was a good place to check an online service availability, etc.).

Again IMO, (3) shouldn't be allowed in any case; (2) should be a
separate function that the user actively calls to download the files,
and those files should be placed into the user dir, and (3) is the
only legitimate use, but then other mechanism should be provided to
avoid connections during package load.

My proposal to support (3) would be to add a new field in the
DESCRIPTION, "Additional_sources", which would be a comma separated
list of additional resources to download during R CMD INSTALL. Those
sources would be downloaded by R CMD INSTALL if not provided via an
option (to support offline installations), and would be placed in a
predefined place for the package to find and configure them (via an
environment variable or in a predefined subdirectory).

This proposal has several advantages. Apart from the obvious one
(Internet access during package load can be limited without losing
current functionalities), it gives more visibility to the resources
that packages are using during the installation phase, and thus makes
those installations more reproducible and more secure.

Best,
-- 
Iñaki Úcar



Re: [Rd] Proposal to limit Internet access during package load

2022-09-23 Thread Iñaki Ucar
On Fri, 23 Sept 2022 at 17:22, Iñaki Ucar  wrote:
>
> [snip]
> Now, what if connection is suppressed during package load? There are
> basically three use cases out there:
>
> (1) The package requires additional files for the installation (e.g.
> the source code of an external library) that cannot be bundled into
> the package due to CRAN restrictions (size).
> (2) The package requires additional files for using it (e.g.,
> datasets, a JAR...) that cannot be bundled into the package due to
> CRAN restrictions (size).
> (3) Other spurious reasons (e.g. the maintainer decided that package
> load was a good place to check an online service availability, etc.).
>
> Again IMO, (3) shouldn't be allowed in any case; (2) should be a
> separate function that the user actively calls to download the files,
> and those files should be placed into the user dir, and (3) is the
> only legitimate use, but then other mechanism should be provided to
> avoid connections during package load.
> [snip]

I meant "(1) is the only legitimate use" above.

-- 
Iñaki Úcar



Re: [Rd] Proposal to limit Internet access during package load

2022-09-26 Thread Iñaki Ucar
On Sat, 24 Sept 2022 at 01:55, Simon Urbanek
 wrote:
>
> Iñaki,
>
> I fully agree, this is a very common issue since the vast majority of server 
> deployments I have encountered don't allow internet access. In practice this 
> means that such packages are effectively banned.
>
> I would argue that not even (1) or (2) are really an issue, because in fact 
> the CRAN policy doesn't impose any absolute limits on size, it only states 
> that the package should be "of minimum necessary size" which means it 
> shouldn't waste space. If there is no way to reduce the size without 
> impacting functionality, it's perfectly fine.

"Packages should be of the minimum necessary size" is subject to
interpretation. And in practice, there is an issue with e.g. packages
that "bundle" big third-party libraries. There are also packages that
require downloading precompiled code, JARs... at installation time.

> That said, there are exceptions such as very large datasets (e.g., as 
> distributed by Bioconductor) which are orders of magnitude larger than what 
> is sustainable. I agree that it would be nice to have a mechanism for 
> specifying such sources. So yes, I like the idea, but I'd like to see more 
> real use cases to justify the effort.

"More real use cases" like in "more use cases" or like in "the
previous ones are not real ones"? :)

> The issue with any online downloads, though, is that there is no guarantee of 
> availability - which is real issue for reproducibility. So one could argue 
> that if such external sources are required then they should be on a 
> well-defined, independent, permanent storage such as Zenodo. This could be a 
> matter of policy as opposed to the technical side above which would be adding 
> such support to R CMD INSTALL.

Not necessarily. If the package declares the additional sources in the
DESCRIPTION (probably with hashes), that's a big improvement over the
current state of things, in which basically we don't know what the
package tries to download, then it may fail, and finally there's no
guarantee that it's what the author intended in the first place.

But on top of this, R could add a CMD to download those, and then some
lookaside storage could be used on CRAN. This is e.g. how RPM
packaging works: the spec declares all the sources, they are
downloaded once, hashed and stored in a lookaside cache. Then package
building doesn't need general Internet connectivity, just access to
the cache.

Iñaki

>
> Cheers,
> Simon
>
>
> > On Sep 24, 2022, at 3:22 AM, Iñaki Ucar  wrote:
> >
> > Hi all,
> >
> > I'd like to open this debate here, because IMO this is a big issue.
> > Many packages do this for various reasons, some more legitimate than
> > others, but I think that this shouldn't be allowed, because it
> > basically means that installation fails in a machine without Internet
> > access (which happens e.g. in Linux distro builders for security
> > reasons).
> >
> > Now, what if connection is suppressed during package load? There are
> > basically three use cases out there:
> >
> > (1) The package requires additional files for the installation (e.g.
> > the source code of an external library) that cannot be bundled into
> > the package due to CRAN restrictions (size).
> > (2) The package requires additional files for using it (e.g.,
> > datasets, a JAR...) that cannot be bundled into the package due to
> > CRAN restrictions (size).
> > (3) Other spurious reasons (e.g. the maintainer decided that package
> > load was a good place to check an online service availability, etc.).
> >
> > Again IMO, (3) shouldn't be allowed in any case; (2) should be a
> > separate function that the user actively calls to download the files,
> > and those files should be placed into the user dir; (1) is the only
> > legitimate use, but even then another mechanism should be provided to
> > avoid connections during package load.
> >
> > My proposal to support (1) would be to add a new field in the
> > DESCRIPTION, "Additional_sources", which would be a comma separated
> > list of additional resources to download during R CMD INSTALL. Those
> > sources would be downloaded by R CMD INSTALL if not provided via an
> > option (to support offline installations), and would be placed in a
> > predefined place for the package to find and configure them (via an
> > environment variable or in a predefined subdirectory).
> >
> > This proposal has several advantages. Apart from the obvious one
> > (Internet access during package load can be limited without losing
> > current functi

Re: [Rd] Proposal to limit Internet access during package load

2022-09-26 Thread Iñaki Ucar
On Mon, 26 Sept 2022 at 21:50, Simon Urbanek
 wrote:
>
> [snip]
> Sure, I fully agree that it would be a good first step, but I'm still waiting 
> for examples ;).

Oh, you want me to actually name specific packages? I thought that
this was a well-established fact from your initial statement "I fully
agree, this a very common issue [...]", so I preferred to avoid
pointing fingers.

But of course you can start by taking a look at [1], where all
packages marked as "internet" or "cargo" are downloading stuff at
install time. There are some others that are too important to get rid
of, so I just build them with an Internet connection from time to
time. Or have them patched to avoid such downloads.

And others have been fixed after I opened an issue because a package
blew up when I tried to build an RPM with it. But this is a game of
cat and mouse unless it is enforced somehow.

[1] https://github.com/Enchufa2/cran2copr/blob/master/excl-no-sysreqs.txt

-- 
Iñaki Úcar

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Proposal to limit Internet access during package load

2022-09-26 Thread Iñaki Ucar
On Mon, 26 Sept 2022 at 23:07, Simon Urbanek
 wrote:
>
> Iñaki,
>
> I'm not sure I understand - system dependencies are an entirely different 
> topic and I would argue a far more important one (very happy to start a 
> discussion about that), but that has nothing to do with declaring downloads. 
> I assumed your question was about large files in packages which packages 
> avoid to ship and download instead so declaring them would be useful.

Exactly. Maybe there's a misunderstanding, because I didn't talk about
system dependencies (although there are packages that try to download
things that are declared as system dependencies, as Gabe noted). :)

> And for that, the obvious answer is they shouldn't do that - if a package 
> needs a file to run, it should include it. So an easy solution is to disallow 
> it.

Then we completely agree. I proposed declaring additional sources
because, given how many packages do this, I expected strong opposition
to simply disallowing it. But if R Core / CRAN is ok with just
limiting net access at install time, then that's perfect for me. :)

Iñaki

> But so far all examples where just (ab)use of downloads for binary 
> dependencies which is an entirely different issue that needs a different 
> solution (in a naive way declaring such dependencies, but we know it's not 
> that simple - and download URLs don't help there).
>
> Cheers,
> Simon
>
>
> > > On 27/09/2022, at 8:25 AM, Iñaki Ucar  wrote:
> >
> > On Sat, 24 Sept 2022 at 01:55, Simon Urbanek
> >  wrote:
> >>
> >> Iñaki,
> >>
> >> I fully agree, this a very common issue since vast majority of server 
> >> deployments I have encountered don't allow internet access. In practice 
> >> this means that such packages are effectively banned.
> >>
> >> I would argue that not even (1) or (2) are really an issue, because in 
> >> fact the CRAN policy doesn't impose any absolute limits on size, it only 
> >> states that the package should be "of minimum necessary size" which means 
> >> it shouldn't waste space. If there is no way to reduce the size without 
> >> impacting functionality, it's perfectly fine.
> >
> > "Packages should be of the minimum necessary size" is subject to
> > interpretation. And in practice, there is an issue with e.g. packages
> > that "bundle" big third-party libraries. There are also packages that
> > require downloading precompiled code, JARs... at installation time.
> >
> >> That said, there are exceptions such as very large datasets (e.g., as 
> >> distributed by Bioconductor) which are orders of magnitude larger than 
> >> what is sustainable. I agree that it would be nice to have a mechanism for 
> >> specifying such sources. So yes, I like the idea, but I'd like to see more 
> >> real use cases to justify the effort.
> >
> > "More real use cases" like in "more use cases" or like in "the
> > previous ones are not real ones"? :)
> >
> >> The issue with any online downloads, though, is that there is no guarantee 
> >> of availability - which is real issue for reproducibility. So one could 
> >> argue that if such external sources are required then they should be on a 
> >> well-defined, independent, permanent storage such as Zenodo. This could be 
> >> a matter of policy as opposed to the technical side above which would be 
> >> adding such support to R CMD INSTALL.
> >
> > Not necessarily. If the package declares the additional sources in the
> > DESCRIPTION (probably with hashes), that's a big improvement over the
> > current state of things, in which basically we don't know what the
> > package tries download, then it may fail, and finally there's no
> > guarantee that it's what the author intended in the first place.
> >
> > But on top of this, R could add a CMD to download those, and then some
> > lookaside storage could be used on CRAN. This is e.g. how RPM
> > packaging works: the spec declares all the sources, they are
> > downloaded once, hashed and stored in a lookaside cache. Then package
> > building doesn't need general Internet connectivity, just access to
> > the cache.
> >
> > Iñaki
> >
> >>
> >> Cheers,
> >> Simon
> >>
> >>
> >>> On Sep 24, 2022, at 3:22 AM, Iñaki Ucar  wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I'd like to open this debate here, because IMO this is a big issue.

Re: [Rd] Proposal to limit Internet access during package load

2022-09-27 Thread Iñaki Ucar
El mar., 27 sept. 2022 4:22, Dirk Eddelbuettel  escribió:

>
> Regarding 'system' libraries: Packages like stringi and nloptr download the
> source of, respectively, libicu or libnlopt and build a library _if_ the
> library is not found locally.  If we outlaw this, more users may hit a
> brick
> wall because they cannot install system libraries (for lack of
> permissions),
> or don't know how to, or ...  These facilities were not added to run afoul
> of
> best practices -- they were added to help actual users. Something to keep
> in
> mind.


Yes, but then IMO Internet access should be explicitly enabled by the user
with a flag. By default, it should be disabled and packages on CRAN should
install as is.

Iñaki




Re: [Rd] Proposal to limit Internet access during package load

2022-09-27 Thread Iñaki Ucar
El mar., 27 sept. 2022 18:42, Blätte, Andreas 
escribió:

> Dear all,
>
> my apologies for a dull question. I think I do understand that unnoticed
> Internet access requires scrutiny and a more explicit approach.
>
> But I am not sure how this would impact on the practice on many Windows
> machines to download static libraries from one of the rwinlib repositories?
> See https://github.com/rwinlib, an approach taken by quite a few packages
> (src/Makevars.win triggers tools/winlibs.R for downloading a static
> library).
>
> I am asking because a package I maintain (RcppCWB) uses the approach, and
> am not sure whether and how the discussion has addressed this scenario. It
> may not be covered by Iñaki's initial three scenarios?


AFAIK, packages should compile on CRAN with the distribution of packages
that CRAN has for Windows, and thus offline. Then the majority of Windows
users just download precompiled binaries.

The rwinlib stuff is a nice-to-have feature for power users compiling
their own packages. Then again, those power users could enable Internet
access with the hypothetical flag I proposed.

Iñaki




Re: [Rd] Compiling R-devel on older Linux distributions, e.g. RHEL / CentOS 7

2023-02-08 Thread Iñaki Ucar
On Wed, 8 Feb 2023 at 07:05, Prof Brian Ripley  wrote:
>
> On 08/02/2023 00:13, Gábor Csárdi wrote:
> > As preparation for the next release, I am trying to compile R devel on
> > RHEL / CentOS 7, which is still supported by RedHat until 2024 June.

True, but with a big asterisk. Full updates ended on 2020-08-06, and
it's been in maintenance mode since then, meaning that only security
and critical fixes are released until EOL to facilitate a transition
to a newer version. So CentOS 7 users shouldn't expect new releases of
software to be available.

> > There are two issues.
> >
> > One is that the libcurl version in CentOS 7 is quite old, 7.29.0, and
> > R devel now requires 7.32.0, since 83715 about a week ago. This
> > requirement is here to stay for R 4.3.0, right?

I suppose that if R-devel doesn't use any libcurl API not available in
7.29, you could just patch out the version check. Otherwise, you would
need to build your own libcurl.

> Unless we revert it.  The comment in the manual says
>
> @c libcurl 7.32.0 was released in Aug 2013
>
> and Centos 7 was released in 2014-07-07, 11 months later.  Do they
> really never security-patch libcurl?

Oh, they do port all security fixes, but without changing the version,
which is the whole point of LTS. In fact, the current version is
7.29.0-59, and there are probably a hundred patches on top of those 59
builds.

> > The second is that the recommended packages are now installed with R
> > CMD INSTALL --use-C17, which fails on CentOS 7 with
> >
> > begin installing recommended package MASS
> > * installing *source* package 'MASS' ...
> > ** package 'MASS' successfully unpacked and MD5 sums checked
> > ** using non-staged installation
> > ** libs
> > Error: C17 standard requested but CC17 is not defined
> > * removing '/root/R-devel/library/MASS'
> >
> > CentOS 7 has GCC 4.8.5, which does not have a -std=gnu17 option.
> > However the commit message of this change in commit 83566 hints that
> > this requirement might be temporary. Hence my questions.
>
> It is temporary -- needed for survival (now updated) and mgcv (awaited).
>   However,
>
> 1) You should be able to set
>
> CC17="gcc -std=gnu11"
>
> in config.site, as C17 is a bug-fixed C11.
>
> 2) Centos 7 has later compilers available, and people are going to need
> them for C++.  The manual says
>
>  ... later compilers are available: for RHEL/Centos 7 look for
> ‘devtoolset’.

Exactly, here is the reference:
https://www.softwarecollections.org/en/scls/rhscl/devtoolset-7/

R 3.6.0 is the last version we support in EPEL, because EPEL is not
allowed to build on top of SCL. But you can enable SCL and install the
devtoolset available, which contains gcc version 7.3.1.

But anyway, I don't think that staying on an almost 10-year-old distro
in maintenance mode while at the same time expecting a cutting-edge
version of R (or any software) is reasonable.

Iñaki

> > Is the C17 requirement temporary or it will be a requirement for R 4.3.0?
> > Should I expect any problems if I remove the --use-C17 flag for
> > installing the recommended packages?
>
> Not with that compiler.
>
> >
> > There are a lot of R users still on RHEL 7, so it would be great to
> > know what to expect for the next release.
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
>



-- 
Iñaki Úcar



Re: [Rd] Compiling R-devel on older Linux distributions, e.g. RHEL / CentOS 7

2023-02-08 Thread Iñaki Ucar
On Wed, 8 Feb 2023 at 19:59, Henrik Bengtsson
 wrote:
>
> I just want to add a few reasons that I know of for why users are
> still on Red Hat/CentOS 7 and learned from being deeply involved with
> big academic and research high-performance compute (HPC) environments.
> These systems are not like your regular sailing boat, but more like a
> giant container ship; much harder to navigate, slower to react, you
> cannot just cruise around and pop into any harbor you like to, or when
> you like to. It takes much more efforts, more logistics, and more
> people to operate them. If you mess up, the damage is much bigger.

I'm fully aware of, and I understand, all the technical and
organizational reasons why there are CentOS 7 systems out there. I
only challenge a single point (cherry-picked from your list):

> * The majority of users and sysadmins prefer stability over the being
> able to run the latest tools.

This is simply not true. In general, sysadmins do prefer stability,
but users want the latest tools (otherwise, this very thread would not
exist, QED). And the former is hardly compatible with the latter
without containers, which brings us to the next point.

> * Although you might want to tell everyone to just run a new version
> via Linux containers, it's not the magic sauce for all of the above.
> Savvy users might be able to do it, but not your average users. Also,
> this basically puts the common sysadmin burden on the end-user, who
> now have to keep their container stacks up-to-date and in sync.  In
> contrast to a homogeneous environment, this strategy increases the
> support burden on sysadms, because they will get much more questions
> and request for troubleshooting on very specific setups.

How is that so? Let's say a user wants the latest version of R.
Nothing prevents a sysadmin from setting up a script called "R" in the PATH
that runs e.g. the r2u container [1] with the proper mounts. And
that's it: the user runs "R" and receives the latest version (and even
package installations seem to be blazing fast now!) without even
knowing that it's running inside a container.
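
A minimal sketch of such a wrapper (the image name, mounts, and flags
here are assumptions to illustrate the idea; adjust for your registry
and security policy):

```shell
#!/bin/sh
# Hypothetical /usr/local/bin/R: transparently runs R inside a
# container, so the user just types "R" and gets the latest version.
exec podman run --rm -it \
  --userns=keep-id \
  -v "$HOME":"$HOME":z \
  -w "$PWD" \
  docker.io/rocker/r2u:latest R "$@"
```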

I know, you are thinking "security", "permissions"...

$ yum install podman

Drop-in replacement for docker, but rootless, daemonless. There's also
a similar tool called Apptainer [2], formerly Singularity, which was
specifically designed with HPC in mind and is now part of the Linux
Foundation.

[1] https://github.com/eddelbuettel/r2u
[2] https://apptainer.org/

> What R Core, Gabor, and many others are doing, often silently in the
> background, is to make sure R works smoothly for many R users out
> there, whatever operating system and version they may be on. This is
> so essential to R's success, and, more importantly, for research and
> science to be able to move forward.

+1000

--
Iñaki Úcar



[Rd] Enable curl 8

2023-03-23 Thread Iñaki Ucar
Hi,

Just a heads-up that curl 8 landed in Fedora Rawhide a couple of days
ago. Note that R does **not** compile without a small fix [1] that
allows the configuration to continue if the major version is > 7.

[1] https://src.fedoraproject.org/rpms/R/blob/rawhide/f/R-4.2.3-curl-v8.patch

Best,
-- 
Iñaki Úcar



Re: [Rd] Unique ID for conditions to supress/rethrow selected conditions?

2023-04-16 Thread Iñaki Ucar
On Sun, 16 Apr 2023 at 12:58, nos...@altfeld-im.de  wrote:
>
> I am the author of the *tryCatchLog* package and want to
>
> - suppress selected conditions (warnings and messages)
> - rethrow  selected conditions (e.g a specific warning as a message or to 
> "rename" the condition text).
>
> I could not find any reliable unique identifier for each possible condition
>
> - that (base) R throws
> - that 3rd-party packages can throw (out of scope here).
>
> Is there any reliable way to identify each possible condition of base R?

I don't think so. As stated in the manual, "‘simpleError’ is the class
used by ‘stop’ and all internal error signals".

> Are there plans to implement such an identifier ("errno")?

I agree that something like this would be a nice addition. With the
current condition system, it would certainly be feasible (though quite
a lot of work) to define a hierarchy of built-in conditions, and then use
them consistently throughout base R. For example,

> 1 + "a"
Error in 1 + "a" : non-numeric argument to binary operator

could be a "typeError". And catching this is already possible:

> e <- errorCondition("non-numeric argument to binary operator",
+                     class = "typeError")
> tryCatch(stop(e), typeError = function(e) print("hello"))
[1] "hello"
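
A minimal sketch of what such built-in conditions could look like (the
"typeError" class name and constructor are invented here; base R does
not provide them):

```r
# Hypothetical constructor: every typeError is also a simpleError, so
# existing tryCatch(..., error = ...) handlers keep working unchanged.
typeError <- function(msg, call = sys.call(-1)) {
  structure(
    list(message = msg, call = call),
    class = c("typeError", "simpleError", "error", "condition")
  )
}

# The specific class can be caught selectively:
tryCatch(stop(typeError("non-numeric argument to binary operator")),
         typeError = function(e) message("caught: ", conditionMessage(e)))
```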

-- 
Iñaki Úcar



Re: [Rd] Correct use of tools::R_user_dir() in packages?

2023-06-29 Thread Iñaki Ucar
On Thu, 29 Jun 2023 at 01:34, Carl Boettiger  wrote:
>
> Thanks Simon, I was very much hoping that would be the case!  It may
> be that I just need to put the version requirement on 4.0 then.  I
> will be sure to add this version restriction to my packages (which
> technically I should be doing anyway since this function didn't exist
> in early versions of `tools`.)

In my experience, you *can* store stuff in those directories, but you
are required to clean up after yourself in CRAN checks. In other
words, if something is left behind when the check ends, CRAN won't be
happy.
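
A short sketch of the pattern ("mypkg" is a placeholder package name):

```r
# Cache an object under the package's user cache dir, then actively
# manage it, as the CRAN policy requires: nothing left behind after
# checks, outdated material removed.
cache_dir <- tools::R_user_dir("mypkg", which = "cache")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
f <- file.path(cache_dir, "results.rds")
saveRDS(mtcars, f)
# ... later, e.g. at the end of examples and tests:
unlink(f)
```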

Iñaki

>
> Cheers,
>
> Carl
>
> ---
> Carl Boettiger
> http://carlboettiger.info/
>
> On Wed, Jun 28, 2023 at 12:59 PM Simon Urbanek
>  wrote:
> >
> > Carl,
> >
> > I think your statement is false, the whole point of R_user_dir() is for 
> > packages to have a well-defined location that is allowed - from CRAN policy:
> >
> > "For R version 4.0 or later (hence a version dependency is required or only 
> > conditional use is possible), packages may store user-specific data, 
> > configuration and cache files in their respective user directories obtained 
> > from tools::R_user_dir(), provided that by default sizes are kept as small 
> > as possible and the contents are actively managed (including removing 
> > outdated material)."
> >
> > Cheers,
> > Simon
> >
> >
> > > On 28/06/2023, at 10:36 AM, Carl Boettiger  wrote:
> > >
> > > tools::R_user_dir() provides configurable directories for R packages
> > > to write persistent information consistent with standard best
> > > practices relative to each supported operating systems for
> > > applications to store data, config, and cache information
> > > respectively.  These standard best practices include writing to
> > > directories in the users home filespace, which is also specifically
> > > against CRAN policy.
> > >
> > > These defaults can be overridden by setting the environmental
> > > variables R_USER_DATA_DIR , R_USER_CONFIG_DIR, R_USER_CACHE_DIR,
> > > respectively.
> > >
> > > If R developers should be using the locations provided by
> > > tools::R_user_dir() in packages, why does CRAN's check procedure not
> > > set these three environmental variables to CRAN compliant location by
> > > default (e.g. tempdir())?
> > >
> > > In order to comply with CRAN policy, a package developer can obviously
> > > set these environmental variables themselves within the call for every
> > > example, every unit test, and every vignette.  Is this the recommended
> > > approach or is there a better technique?
> > >
> > > Thanks for any clarification!
> > >
> > > Regards,
> > >
> > > Carl
> > >
> > > ---
> > > Carl Boettiger
> > > http://carlboettiger.info/
> > >
> > >
> >
>



-- 
Iñaki Úcar



Re: [Rd] R packages to send plottable data to external apps

2023-08-27 Thread Iñaki Ucar
I think r-package-devel is a better place for this. CC'ing there.

On Sun, 27 Aug 2023 at 23:50, Mike Marchywka  wrote:
>
> I was curious what R packages, or indeed any other applications, exist
> to plot streamed data from arbitrary data generators. It need not
> be publication quality plotting but it should be easy to use  like
> an oscilloscope.

The last time I checked, there wasn't any R package suitable for
plotting high-throughput streaming data.

There's a nice command-line utility called trend [1] that I
extensively used in the past as an oscilloscope to visualize the
output from a DAQ card. I don't see any new development there, but it
does exactly what it promises; it's easy to use, quite configurable
and very fast. Old but gold.

I also explored VisPy [2], which is much more ambitious, but at that
time the API had a limitation that prevented me from achieving what I
required. I haven't looked at it since, but the project seems to be in
good shape.

[1] https://www.thregr.org/wavexx/software/trend/
[2] https://vispy.org/

Hope it helps,
Iñaki

> I was working on something called datascope that I
> am using for 1D finite difference monitoring and recently interfaced it
> to freefem. I also created an R package. If there is any interest in something
> like this I guess I could put it up somewhere when it is more usable
> or if you can suggest some similar popular packages that would be good
> too. Is there something I could drop-in to the attached code and get
> something like the attached output that could also be switched to other
> data sources?  This right now works via linux fifo and somewhat by UDP.
> It can queue data and stop making it if no one seems to be  consuming
> it depending on the channel.
>
> Thanks.
>
>  Mike Marchywka
> 44 Crosscreek Trail
> Jasper GA 30143
> was 306 Charles Cox Drive  Canton, GA 30115
> 470-758-0799
> 404-788-1216
>
>



-- 
Iñaki Úcar



[Rd] About FlexiBLAS in the R-admin docs

2023-09-27 Thread Iñaki Ucar
Hi,

Not sure if this is the right place for this. The "R Installation and
Administration" guide states:

> Apparently undocumented: FlexiBLAS on Fedora provides a complete LAPACK, but 
> not the enhanced routines from ATLAS or OpenBLAS.

I'm not sure what this means. FlexiBLAS does provide 100% of BLAS and
LAPACK, and if the active backend (say, OpenBLAS) implements an
enhanced LAPACK routine, then the call is redirected to the backend.
If the user switches to another backend and that routine is not
available there, then the original LAPACK routine is dispatched
instead.
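
A quick illustration of the run-time switching (backend names depend
on the system configuration; check the output of `flexiblas list`):

```shell
# Show the backends FlexiBLAS knows about on this system:
flexiblas list
# Run the same R code against different backends via the environment;
# La_version() reports which LAPACK ends up answering:
FLEXIBLAS=OPENBLAS Rscript -e 'La_version()'
FLEXIBLAS=NETLIB   Rscript -e 'La_version()'
```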

Best,
-- 
Iñaki Úcar

