Re: [R-pkg-devel] Additional_packages in drat repositories

2023-11-09 Thread Sokol Serguei

On 09/11/2023 at 08:01, Claborne, Daniel via R-package-devel wrote:

I have a data package 'pmartRdata' hosted in a drat repository here:  
https://github.com/pmartR/drat following the instructions here:  
https://cran.r-project.org/web/packages/drat/vignettes/DratStepByStep.html

The package installs fine via install.packages("pmartRdata", repos = 
"https://pmartR.github.io/drat").  I have included it under Suggests and added 
"https://pmartR.github.io/drat" to the Additional_repositories field in the 
DESCRIPTION of a package I am submitting to CRAN.  I'm getting failures on 
Debian/R-devel and Windows/R-devel when running examples that use this package:

`Error in library(pmartRdata) : there is no package called 'pmartRdata'`

Checks on rhub via `rhub::check_for_cran` for ubuntu, fedora, and windows do not 
have this problem.  What do I need to do to get it to install on CRAN 
submission machines?


Generally, you cannot assume that a suggested package is available on a CRAN 
(or any other) machine. Instead, put the code that uses the 
suggested package inside an 'if()' clause, like:


if (requireNamespace("pkg", quietly=TRUE)) {

   pkg::do_something()

}
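
The guard above can be wrapped so that an example degrades gracefully when the suggested package is absent. This is only a minimal sketch: has_pkg() and run_example() are hypothetical names introduced for illustration, and the body of the example is left as a placeholder.

```r
# Hypothetical helper: TRUE when `pkg` can be loaded (it is not attached).
has_pkg <- function(pkg) requireNamespace(pkg, quietly = TRUE)

# Hypothetical example runner: skips quietly when the package is missing.
run_example <- function(pkg = "pmartRdata") {
  if (!has_pkg(pkg)) {
    message("Package '", pkg, "' is not installed; skipping the example.")
    return(invisible(FALSE))
  }
  # ... code that calls functions from `pkg` via pkg:: goes here ...
  invisible(TRUE)
}

run_example()
```

With this shape the example never fails on a machine that lacks the suggested package; it just reports that it was skipped.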


Best,
Serguei.




Best,
-Daniel Claborne



__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel




Re: [Rd] Feature request: Limiting output of installed.packages()

2023-04-19 Thread Sokol Serguei via R-devel

On 19/04/2023 at 16:04, Chris Woelkers - NOAA Federal via R-devel wrote:

I'm going to ask this here before submitting it to Bugzilla. Is this a good
idea and worthy of being considered? I think it is but others may not feel
the same.
The installed.packages() function is very useful but can be more
information than needed. While the function includes the argument 'fields'
it seems to do nothing as all fields are outputted when running the
function with or without the 'fields' argument.
I suggest changing the functionality to do the following. When the 'fields'
argument is not present all fields are outputted as is the current
behaviour. But when the 'fields' argument is present only those fields that
are listed are outputted. This would help me greatly as I could then run
just the installed.packages(fields = ) function without having to add any
sorting after the fact.


Not so much typing saved in this case, is it?

   fields=c("Package", "Version")
   plist=installed.packages()[, fields]
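
For reference, a slightly expanded sketch of the same idea. As documented in ?installed.packages, the 'fields' argument adds DESCRIPTION fields to the default columns rather than restricting them, so the column subset is still done by indexing. The choice of "URL" as the extra field is only an illustration.

```r
# Keep only the columns of interest from the default result matrix.
fields <- c("Package", "Version")
plist <- installed.packages()[, fields, drop = FALSE]

# `fields` requests extra DESCRIPTION fields in addition to the defaults.
extra <- installed.packages(fields = "URL")
head(extra[, c("Package", "Version", "URL"), drop = FALSE])
```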

Best,
Serguei.




Thanks,

Chris Woelkers


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [R-pkg-devel] Advice on elegant way to alias function name

2022-09-01 Thread Sokol Serguei

On 01/09/2022 at 16:48, J C Nash wrote:

Hi,

I've a package where it has been suggested that one of the functions -- call
it "myfn()" -- should be called something else, say "thefn()". Of course, I'll
need to keep the old name around for a while.

Web search has suggested simple assignment of

    thefn <- myfn

but I cannot seem to get this to work with R CMD check when I put this in a .R
file in the code and put alias and usage stanzas in documentation.


Is there somewhere we can see the alias and usage stanzas you actually 
used?


Best,
Serguei.


I get alias
and missing argument type errors. I've tried a number of variations on this theme
without appreciable success.

A workaround is to copy the entire function with Roxygen2 documentation and
name change, but this seems inelegant.

Is there a better way e.g., using something like onLoad ? Pointers to working
examples in CRAN or Github packages would be welcome.

Best, JN
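
One possible arrangement, sketched under the assumption that the package is documented with Roxygen2, is to document the new name and let the old name share the same help page through @rdname while emitting a deprecation warning. The function bodies below are hypothetical placeholders; only the names myfn and thefn come from the message above.

```r
#' Double a numeric vector (hypothetical example function)
#'
#' @param x A numeric vector.
#' @return `x` multiplied by two.
#' @export
thefn <- function(x) 2 * x

#' @rdname thefn
#' @export
myfn <- function(x) {
  .Deprecated("thefn")   # warn callers that the old name is going away
  thefn(x)
}
```

Because both functions appear on one Rd page with their own usage entries, the old name stays documented without duplicating the function body.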



Re: [Rd] plogis (and other p* functions), vectorized lower.tail

2021-12-09 Thread Sokol Serguei

On 09/12/2021 16:55, Ben Bolker wrote:



On 12/9/21 10:03 AM, Martin Maechler wrote:

Matthias Gondan
 on Wed, 8 Dec 2021 19:37:09 +0100 writes:


 > Dear R developers,
 > I have seen that plogis silently ignores vector elements of lower.tail,


and also of 'log'.
This is indeed the case for all d*, p*, q* functions.

Yes, this has been on purpose and therefore documented, in the
case of plogis, e.g. in the 'Value' section of ?plogis :

  The length of the result is determined by ‘n’ for ‘rlogis’, and is
  the maximum of the lengths of the numerical arguments for the
  other functions.

  (note: *numerical* arguments: the logical ones are not recycled)

  The numerical arguments other than ‘n’ are recycled to the length
  of the result.  Only the first elements of the logical arguments
  are used.

  (above, we even explicitly mention the logical arguments ..)


Recycling happens for the first argument (x,p,q) of these
functions and for "parameters" of the distribution, but not for
lower.tail, log.p (or 'log').


 >> plogis(q=0.5, location=1, lower.tail=TRUE)
 > [1] 0.3775407
 >> plogis(q=0.5, location=1, lower.tail=FALSE)
 > [1] 0.6224593
 >> plogis(q=c(0.5, 0.5), location=1, lower.tail=c(TRUE, FALSE))
 > [1] 0.3775407 0.3775407

 > For those familiar with psychological measurement: A use case
 > of the above function is the so-called Rasch model, where the
 > probability that a person with some specific ability (q) makes a
 > correct (lower.tail=TRUE) or wrong response (lower.tail=FALSE) to an
 > item with a specific difficulty (location). A vectorized version of
 > plogis would enable to determine the likelihood of an entire response
 > vector in a single call. My current workaround is an intermediate
 > call to "Vectorize".


 > I am wondering if the logical argument of lower.tail can be
 > vectorized (?). I see that this may be a substantial change in many
 > places (basically, all p and q functions of probability
 > distributions), but in my understanding, it would not break existing
 > code which assumes lower.tail to be a single element. If that's not
 > possible/feasible, I suggest to issue a warning if a vector of
 > length > 1 is given in lower.tail. I am aware that the documentation
 > clearly states that lower.tail is a single boolean.


aah ok, here you say you know that the current behavior is documented.

 > Thank you for your consideration.


As you mention, changing this would be quite a large endeavor.
I had thought about doing that many years ago, not remembering
details, but seeing that in almost all situations you really
only need one of the two tails  (for Gaussian- or t- based confidence
intervals you also only need one, for symmetry reason).

Allowing the recycling there would make the intermediate C code
(which does the recycling) larger and probably slightly
slower because of conceptually two more for loops which would in
99.9% only have one case ..

I'd have found that ugly to add. ... ...
... but of course, if you can prove that the code bloat would not be large
and not deteriorate speed in a measurable way and if you'd find
someone to produce a comprehensive and tested patch ...

Martin


 > With best wishes,
 > Matthias






  I agree with everything said above, but think that adding a warning 
when length(lower.tail) > 1 (rather than silently ignoring) might be 
helpful ...  ??


  As for the vectorization, it seems almost trivial to do at the user 
level when needed (albeit it's probably a little bit inefficient):


pv <- Vectorize(plogis, c("q", "location", "scale", "lower.tail"))
pv(q=c(0.5, 0.5), location=1, lower.tail=c(TRUE, FALSE))
[1] 0.3775407 0.6224593


.. or directly use mapply()

mapply(plogis, q=c(0.5, 0.5), location=1, lower.tail=c(TRUE, FALSE))
[1] 0.3775407 0.6224593
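
Another user-level option, sketched here as an assumption rather than anything proposed in the thread, is to compute both tails with the ordinary vectorized calls and pick one per element. plogis_tails() is a hypothetical helper name.

```r
# Hypothetical helper, not part of base R: recycle `lower.tail` by hand.
plogis_tails <- function(q, location = 0, scale = 1, lower.tail = TRUE) {
  lower <- plogis(q, location, scale, lower.tail = TRUE)
  upper <- plogis(q, location, scale, lower.tail = FALSE)
  n <- max(length(lower), length(lower.tail))
  # Select the requested tail element by element.
  ifelse(rep_len(lower.tail, n), rep_len(lower, n), rep_len(upper, n))
}

plogis_tails(q = c(0.5, 0.5), location = 1, lower.tail = c(TRUE, FALSE))
```

This evaluates the distribution function twice, so it trades some speed for avoiding the per-element function calls made by Vectorize() or mapply().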



Re: [R-pkg-devel] S3 is.unsorted registration

2021-09-09 Thread Sokol Serguei

On 09/09/2021 18:23, Hugh Parsonage wrote:

I would like to register an S3 method for `is.unsorted` for my
package's class "factor256" but I'm having trouble honouring the
`strictly` argument.  I've defined

is.unsorted.factor256 <- function(x, na.rm = FALSE, strictly = FALSE) {
   strictly
}

i.e. the class is sorted iff strictly = TRUE

However, the strictly argument appears to be ignored

x <- integer(2)
class(x) <- "factor256"

is.unsorted(x)  # FALSE [expected]
is.unsorted(x, strictly = TRUE)  # FALSE [unexpected]

The method is definitely being dispatched as when I change the function to

is.unsorted.factor256 <- function(x, na.rm = FALSE, strictly = FALSE) {
   cat("dispatching\n")
   strictly
}

I see the dispatch:

is.unsorted(ff, strictly = T)
#> dispatching
#> [1] FALSE


Moreover, if I define

is.unsorted.factor256 <- function(x, na.rm = FALSE, strictly = FALSE) {
   cat("dispatching\n")
   print(match.call())
   strictly
}

I get:

is.unsorted(ff, strictly = T)


# dispatching
# is.unsorted.factor256(x = x, na.rm = strictly)
# [1] FALSE

While defining not a method but a classic function works as expected:

f=function(x, na.rm = FALSE, strictly = FALSE) {
   cat("f\n")
   print(match.call())
   strictly
}

f(ff, strictly=T)

# f
# f(x = ff, strictly = T)
# [1] TRUE
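
The match.call() output above suggests that the method receives 'strictly' as its second positional argument, which then lands in 'na.rm'. A minimal sketch of that matching rule, using plain functions as hypothetical stand-ins for the method (this does not reproduce the internal dispatch itself):

```r
# Stand-ins for the S3 method, differing only in the order of the formals.
m_original  <- function(x, na.rm = FALSE, strictly = FALSE) strictly
m_reordered <- function(x, strictly = FALSE, na.rm = FALSE) strictly

# Two positional arguments, as in the call that match.call() printed.
m_original(integer(2), TRUE)    # TRUE is matched to na.rm, so FALSE is returned
m_reordered(integer(2), TRUE)   # TRUE is matched to strictly, so TRUE is returned
```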

Just my 0.02 €
Serguei.



and


methods("is.unsorted")

[1] is.unsorted.factor256
see '?methods' for accessing help and source code

I note that if I omit the na.rm = argument I will get the intended
result (I was drawn to the solution by noting the .Internal call has
omitted it) but I'm wondering whether this is correct.


Best,


Hugh Parsonage.



Re: [Rd] dgTMatrix Segmentation Fault

2021-06-07 Thread Sokol Serguei

On 07/06/2021 at 09:00, Dario Strbenac wrote:

Good day,

I notice that summing rows of a large dgTMatrix fails.

library(Matrix)
aMatrix <- new("dgTMatrix",
 i = as.integer(sample(20, 1)-1), j = 
as.integer(sample(5, 1)-1), x = rnorm(1),
Dim = c(20L, 5L)
  )
totals <- rowSums(aMatrix == 0)  # Segmentation fault.


On my R v4.1 (Ubuntu 18), I don't have a segfault but I do have an error 
message:


Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for 
function 'rowSums': cannot allocate vector of size 372.5 Gb


And the reason for this is quite clear: the intermediate logical matrix 
'aMatrix == 0' is almost dense, having 20L*5L - 1L nonzero 
entries. That is a little bit too much ;) for my modest laptop. So I 
can propose a workaround:


    totals <- 5 - rowSums(aMatrix != 0)
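
The workaround can be checked on a small matrix, where the dense comparison is still affordable. The dimensions and entries below are made up for illustration, and ncol() replaces the literal column count.

```r
library(Matrix)

# Small triplet-form sparse matrix with two nonzero entries (made-up data).
m <- new("dgTMatrix",
         i = c(0L, 2L), j = c(1L, 3L), x = c(1.5, -2),
         Dim = c(4L, 5L))

# Count zeros per row without materialising the almost-dense `m == 0`.
zeros_sparse <- ncol(m) - rowSums(m != 0)

# Dense reference, only feasible because the matrix is tiny.
zeros_dense <- rowSums(as.matrix(m) == 0)
```

Here 'm != 0' stays sparse because only the stored nonzero entries become TRUE, which is what keeps the memory use small.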

Hoping it helps.

Best,
Serguei.



The server has 768 GB of RAM and it was never close to being consumed by this. 
Converting it to an ordinary matrix works fine.

big <- as.matrix(aMatrix)
totals <- rowSums(big == 0)  # Uses more RAM but there is no segmentation
                             # fault and result is returned.

May it be made more robust for dgTMatrix?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia



[R-pkg-devel] Fwd: Passing CRAN checks for a package linking to a system library on CRAN machines

2021-05-12 Thread Sokol Serguei

On 13/05/2021 at 07:06, SN248 wrote:

I am working on a package which provides an interface to the libsbml C++
library (http://sbml.org/Software/libSBML) in R. The source code of this
package (r2sbml) can be found at the following link

https://github.com/sn248/r2sbml

The package passes CRAN checks with `R CMD check` on my machine, but I do
have dependency (libsbml library) installed on my machine (OSX) with
headers and static libs at the usual locations, i.e., /usr/local/include
and /usr/local/lib. The package also passes CRAN check on a Windows machine
with libsbml installed using Rtools40 and msys2. The DESCRIPTION file lists
libsbml in SystemRequirements but `R CMD check` obviously fails on rhub
machines because there are no instructions to install libsbml first. As I
understand, I have the following options to pass checks on CRAN

1. Bundle the source code of libsbml into the package and make the static
libs on the fly. I don't really want to try this approach even though I
have used this approach before in another package as I think creating the
static lib is not as straightforward for this library because of the large
number of files and complex dependency chart.

2. Include header files in the `inst` folder and pull the static libs from
rwinlib github (assuming the libs can be posted there). I am not sure if
this approach will work on all platforms on which CRAN checks take place.

3. Somehow include instructions to install libsbml on CRAN machines (I have
no idea how to do this), or request CRAN maintainers to install libsbml
with header files and libs at usual locations (i.e., /usr/local/include and
/usr/local/lib).


I faced the same problem for my package r2sundials and in the end I 
opted for including the third-party source code in the package. 
However, a CRAN team member told me later that this kind of request 
(i.e. installing third-party software on CRAN machines) can be sent to the 
CRAN team. I am not sure they will accept, but I think you can start by 
asking, and if the request is rejected, you can try the other options.


Best,
Serguei.



I am sure some version of this question has been asked before as there are
many packages which interface with C/C++ libraries listed as
SystemRequirements, but I could not find a clear answer to this aspect,
i.e., passing checks on CRAN machines.

Any guidance here and pros/cons of the above mentioned approaches will be
very helpful.

Thanks
Satya



Re: [Rd] Unexpected behavior of '[' in an apply instruction

2021-02-12 Thread Sokol Serguei

On 12/02/2021 at 23:49, Sokol Serguei wrote:

On 12/02/2021 at 22:23, Rui Barradas wrote:

Hello,

Yes, although there is an accepted solution, I believe you should 
post this solution there. It's a base R solution, what the question 
asks for.


And thanks, I would never have remembered slice.index.


There is another approach -- produce a call to `[`() that puts the 
"required number of commas in their proper places" programmatically. 
Even if it does not lead to a very readable expression, I think it 
deserves a mention.


  x <- array(1:60, dim = c(10, 2, 3))
  ld=length(dim(x))
  i=1 # i.e. the first row but can be a slice 1:5, whatever
  do.call(`[`, c(alist(x, i), alist(,)[rep(1,ld-1)], alist(drop=FALSE)))


Or slightly shorter:

  do.call(`[`, alist(x, i, ,drop=FALSE)[c(1,2,rep(3,ld-1),4)])
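
The same construction can be wrapped in a small function that works for any dimension of an array with any number of dimensions. This is only a sketch; slice_dim() and its argument names are hypothetical.

```r
# Hypothetical helper: keep indices `i` along dimension `d`, all else whole.
slice_dim <- function(x, i, d = 1L) {
  n <- length(dim(x))
  idx <- alist(,)[rep(1L, n)]   # n empty ("missing") index arguments
  idx[[d]] <- i                 # fill in the one dimension being subset
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

x <- array(1:60, dim = c(10, 2, 3))
dim(slice_dim(x, 1:5))       # first five rows
dim(slice_dim(x, 2L, 3L))    # second slab along the third dimension
```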



Best,
Serguei.



Rui Barradas

At 20:45 on 12/02/21, robin hankin wrote:

Rui

 > x <- array(runif(60), dim = c(10, 2, 3))
 > array(x[slice.index(x,1) %in% 1:5],c(5,dim(x)[-1]))

(I don't see this on stackoverflow; should I post this there too?)  
Most of the magic package is devoted to handling arrays of arbitrary 
dimensions and this functionality might be good to include if anyone 
would find it useful.


HTH

Robin


On Sat, Feb 13, 2021 at 12:26 AM Rui Barradas wrote:


    Hello,

    This came up in this StackOverflow post [1].

    If x is an array with n dimensions, how to subset by just one dimension?
    If n is known, it's simple, add the required number of commas in their
    proper places.
    But what if the user doesn't know the value of n?

    The example below has n = 3, and subsets by the 1st dim. The apply loop
    solves the problem as expected but note that the index i has
    length(i) > 1.


    x <- array(1:60, dim = c(10, 2, 3))

    d <- 1L
    i <- 1:5
    apply(x, MARGIN = -d, '[', i)
    x[i, , ]


    If length(i) == 1, argument drop = FALSE doesn't work as I expected it
    to work, only the other way does:


    i <- 1L
    apply(x, MARGIN = -d, '[', i, drop = FALSE)
    x[i, , , drop = FALSE]


    What am I missing?

    [1]
https://stackoverflow.com/questions/66168564/is-there-a-native-r-syntax-to-extract-rows-of-an-array 



    Thanks in advance,

    Rui Barradas



Re: [Rd] Unexpected behavior of '[' in an apply instruction

2021-02-12 Thread Sokol Serguei

On 12/02/2021 at 22:23, Rui Barradas wrote:

Hello,

Yes, although there is an accepted solution, I believe you should post 
this solution there. It's a base R solution, what the question asks for.


And thanks, I would never have remembered slice.index.


There is another approach -- produce a call to `[`() that puts the 
"required number of commas in their proper places" programmatically. 
Even if it does not lead to a very readable expression, I think it 
deserves a mention.


  x <- array(1:60, dim = c(10, 2, 3))
  ld=length(dim(x))
  i=1 # i.e. the first row but can be a slice 1:5, whatever
  do.call(`[`, c(alist(x, i), alist(,)[rep(1,ld-1)], alist(drop=FALSE)))

Best,
Serguei.



Rui Barradas

At 20:45 on 12/02/21, robin hankin wrote:

Rui

 > x <- array(runif(60), dim = c(10, 2, 3))
 > array(x[slice.index(x,1) %in% 1:5],c(5,dim(x)[-1]))

(I don't see this on stackoverflow; should I post this there too?)  
Most of the magic package is devoted to handling arrays of arbitrary 
dimensions and this functionality might be good to include if anyone 
would find it useful.


HTH

Robin





On Sat, Feb 13, 2021 at 12:26 AM Rui Barradas wrote:


    Hello,

    This came up in this StackOverflow post [1].

    If x is an array with n dimensions, how to subset by just one dimension?
    If n is known, it's simple, add the required number of commas in their
    proper places.
    But what if the user doesn't know the value of n?

    The example below has n = 3, and subsets by the 1st dim. The apply loop
    solves the problem as expected but note that the index i has
    length(i) > 1.


    x <- array(1:60, dim = c(10, 2, 3))

    d <- 1L
    i <- 1:5
    apply(x, MARGIN = -d, '[', i)
    x[i, , ]


    If length(i) == 1, argument drop = FALSE doesn't work as I expected it
    to work, only the other way does:


    i <- 1L
    apply(x, MARGIN = -d, '[', i, drop = FALSE)
    x[i, , , drop = FALSE]


    What am I missing?

    [1]
https://stackoverflow.com/questions/66168564/is-there-a-native-r-syntax-to-extract-rows-of-an-array

    Thanks in advance,

    Rui Barradas



Re: [R-pkg-devel] Error occurring only when I submit to CRAN

2021-02-12 Thread Sokol Serguei
On 12/02/2021 at 08:45, Elysée Aristide wrote:
> Greetings,
>
> I submitted my package CDatanet .
> Before that I checked (as CRAN) locally on Linux and Windows and I did not
> get any error. I only get a note about my address mail (which is normal).
> However, when I submitted the package to CRAN, I got a warning and an error
> with the Window server:  
> https://win-builder.r-project.org/incoming_pretest/CDatanet_0.0.1_20210208_174258/Windows/00check.log
>
>
> * checking PDF version of manual ... WARNING
>
> LaTeX errors when creating PDF version.
> This typically indicates Rd problems.
> LaTeX errors found:
> ! Package inputenc Error: Unicode char ‐ (U+2010)

This is a frequent source of errors. Somewhere in your manual there is a 
Unicode U+2010 hyphen "‐" ( https://www.compart.com/fr/unicode/U+2010 ) 
instead of the plain ASCII hyphen "-". They look pretty similar, don't they? 
Maybe you did some copy/paste from e.g. a PDF file and thus got this 
character in your texts.

Supposing that it is not critical for your documentation to have the 
U+2010 char, you can replace it with a hyphen from your keyboard. To spot 
its exact place you can try tools::showNonASCIIfile().
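
A small sketch of how the search and the replacement could look. The Rd fragment and the temporary file are made up for illustration; in a real package the file would be one of the man/*.Rd files.

```r
# Hypothetical Rd fragment containing a U+2010 hyphen in its second line.
rd_lines <- c("\\title{Plain ASCII title}",
              "\\description{A non\u2010ASCII hyphen hides here.}")
rd_file <- tempfile(fileext = ".Rd")
writeLines(enc2utf8(rd_lines), rd_file, useBytes = TRUE)

# Print the offending lines, with the non-ASCII bytes shown as escapes.
tools::showNonASCIIfile(rd_file)

# Locate and replace the character programmatically.
lines <- readLines(rd_file, encoding = "UTF-8")
bad   <- grep("\u2010", lines, fixed = TRUE)       # indices of affected lines
fixed <- gsub("\u2010", "-", lines, fixed = TRUE)  # plain ASCII hyphen instead
```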

Best,
Serguei.

> (inputenc)not set up for use with LaTeX.
>
> See the inputenc package documentation for explanation.
> Type  H   for immediate help.
> * checking PDF version of manual without hyperrefs or index ... ERROR
> * checking for detritus in the temp directory ... OK
> * DONE
>
> Status: 1 ERROR, 1 WARNING, 1 NOTE
>
>
> while the Debian server is ok:  
> https://win-builder.r-project.org/incoming_pretest/CDatanet_0.0.1_20210208_174258/Debian/00check.log
>
> How can I fix this given that I am not able to reproduce the error locally?
>
> Thank you very much for your help.
>
> *-*
>
> *Aristide Elysée HOUNDETOUNGAN*
> *Ph.D. Candidate in Economics at Université Laval*
> *Personal website : *www.ahoundetoungan.com
>





[R-pkg-devel] Fwd: Subarchitectures

2021-02-09 Thread Sokol Serguei
On 09/02/2021 at 17:01, Duncan Murdoch wrote:
> I think the WRE manual says that an R_ARCH environment variable will 
> be set when subarchitectures are involved, but environment variables 
> aren't accessible in Makevars.
>
> Is there a standard way to get a value in Makevars which will match 
> .Platform$r_arch once R is running?

If a sub-shell is possible, would this be convenient?

   myvar=`R --slave --vanilla -e 'cat(.Platform$r_arch)'`

Best,
Serguei.

>
> Duncan Murdoch
>





Re: [Rd] Possible documentation problem/bug?

2020-04-30 Thread Sokol Serguei

On 30/04/2020 at 14:31, Dominic Littlewood wrote:

It seems like there is no obvious way in the documentation to convert 
the expressions in the dots argument to a list without evaluating 
them. Say, if you want to have a function that prints all its arguments:


If you wish to iterate through all the arguments (not only '...') then 
match.call() seems to be the most straightforward and explicit tool:


f=function(a, ...) {mc <- match.call(); print(as.list(mc)[-1])}
f(x,y[h],abc$d)
#$a
#x
#
#[[2]]
#y[h]
#
#[[3]]
#abc$d

Best,
Serguei.





foo(abc$de, fg[h], i)

abc$de
fg[h]
i

...then converting them to a list would be helpful.
Using substitute(...) was the first thing I tried, but that only gives
the *first* argument in dots. It turns out that there is a way to do this, using
substitute(...()), but this does not appear to be in either the substitute or
the dots help page.

In fact, there is a clue how to do this in the documentation, if you look
closely. Let me quote the substitute page:

"Substituting and quoting often cause confusion when the argument is
expression(...). The result is a call to the expression constructor
function and needs to be evaluated with eval to give the actual expression
object."

So this appears to give a way to turn the arguments into a list -
eval(substitute(expression(...))).  But that's quite long, and hard to
understand if you just come across it in some code - why are we using eval
here? why are we substituting expression? - and would definitely require an
explanatory comment. If the user just wants to iterate over the arguments,
substitute(...()) is better. In fact, you can get exactly the same effect
as the above code using as.expression(substitute(...())). Should the
documentation be updated?



Re: [Rd] suggestion: "." in [lsv]apply()

2020-04-20 Thread Sokol Serguei

On 19/04/2020 at 20:46, Gabor Grothendieck wrote:

You can get pretty close to that already using fn$ in the gsubfn package:

library(gsubfn)
fn$sapply(split(mtcars, mtcars$cyl), x ~ summary(lm(mpg ~ wt, x))$r.squared)
        4         6         8
0.5086326 0.4645102 0.4229655


Right, I thought about a similar syntax, but this implementation has flaws 
similar to those pointed out by Simon, i.e. it reduces the domain of valid 
inputs (though not on the same parameters). Take an example:


library(gsubfn)
fn$sapply(quote(x+y), as.character)
#Error in lapply(X = X, FUN = FUN, ...) : object 'x' not found

while

sapply(quote(x+y), as.character)
#[1] "+" "x" "y"

This makes me think that it could be advantageous to replace 
match.fun(FUN) in the *apply() family with as.function(FUN), with the obvious 
additional methods:

as.function.character <- function(x) match.fun(x)
as.function.name <- function(x) match.fun(x)

Such a replacement would leave the current usage of *apply() as is, but at the 
same time would leave enough room for users who want to adapt *apply() 
to their own objects, such as formulas or any class that currently cannot be 
converted to a function by match.fun().
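
At user level, the idea can be sketched with a wrapper instead of a change to sapply() itself. Everything below is an assumption for illustration: sapply2() is a hypothetical wrapper, base R defines no as.function() methods for character vectors or formulas, and reading a one-sided formula as a function of '.' is just one possible convention.

```r
# Hypothetical methods; base R defines none of these.
as.function.character <- function(x, ...) match.fun(x)
as.function.formula <- function(x, ...) {
  f <- function(.) NULL
  body(f) <- x[[length(x)]]         # right-hand side of the formula
  environment(f) <- environment(x)  # look up free variables where it was written
  f
}

# Hypothetical wrapper standing in for an sapply() that calls as.function().
sapply2 <- function(X, FUN, ...) sapply(X, as.function(FUN), ...)

sapply2(1:3, function(v) v + 1)   # plain functions pass through unchanged
sapply2(c(1, 4, 9), "sqrt")       # character names resolved by match.fun()
sapply2(1:3, ~ . + 1)             # formula turned into function(.) . + 1
```

Existing calls keep working because as.function() returns a function argument as is, while new classes only need their own method.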


Would it be possible?

Best,
Serguei.

It is not specific to sapply but rather fn$ can preface most 
functions. If the only free variables are the arguments to the 
function then you can omit the left hand side of the formula, i.e. the 
arguments to the function are implied by the free variables in the 
right hand side. Here x is the implied argument to the function 
because it is a free variable. We did not have to use the name x. Any 
name could be used. It is the fact that it is a free variable, not its 
name, that matters.

fn$sapply(split(mtcars, mtcars$cyl), ~ sum(dim(x)))
 4  6  8
22 18 25

On Fri, Apr 17, 2020 at 4:11 AM Sokol Serguei wrote:
Thanks Simon,

Now, I see better your argument.

On 16/04/2020 at 22:48, Simon Urbanek wrote:

... I'm not arguing against the principle, I'm arguing about your 
particular proposal as it is inconsistent and not general.

This sounds promising to me. Maybe in the (near?) future, R core will 
come up with a correct proposal for this principle?
Meanwhile, to avoid substitute(), I'll look into a variation on the formula 
syntax, as your example x ~> i + x suggested.

Best,
Serguei.

Personally, I find the current syntax much clearer and readable 
(defining anything by convention like . being the function variable 
seems arbitrary and "dirty" to me), but if you wanted to define a 
shorter syntax, you could use something like x ~> i + x. That said, 
I really don't see the value of not using function(x) [especially 
these days when people are arguing for long variable names with the 
justification that IDEs do all the work anyway], but as I said, my 
argument was against the actual proposal, not general ideas about 
syntax improvement.

Cheers,
Simon

On 17/04/2020, at 3:53 AM, Sokol Serguei wrote:

Simon,

Thanks for replying. In what follows I won't try to 
argue (I understood that you find this a bad idea) but I would like 
to clarify some of your points for myself (and maybe for others).

On 16/04/2020 at 16:48, Simon Urbanek wrote:

Serguei,

On 17/04/2020, at 2:24 AM, Sokol Serguei wrote:

Hi,

I would like to make a suggestion for a small 
syntactic modification of FUN argument in the family of functions 
[lsv]apply(). The idea is to allow one-liner expressions without 
typing "function(item) {...}" to surround them. The argument to 
the anonymous function is simply referred to as ".". Let's take an 
example. With this new feature, the following call

sapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt, d))$r.squared)
#        4         6         8
#0.5086326 0.4645102 0.4229655

could be rewritten as

sapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)

"Not a big saving in typing" you can say but 
multiplied by the number of [lsv]apply usage and a neater look, I 
think, the idea merits to be considered.

It's not in any way "neater", not only is it less readable, it's 
just plain wrong. What if the expression returned a function?

do you mean like in

l=sapply(1:3, function(i) function(x) i+x)
l[[1]](3)
# 4
l[[2]](3)
# 5

This is indeed a corner case but a pair 
of () or {} can keep wsapply() on course:

l=wsapply(1:3, (function(x) .+x))
l[[1]](3)
# 4
l[[2]](3)
# 5

How do you know that you don't want to apply the result of the call?

A small example (if it is significantly different from the one 
above) would be very helpful for me to understand this point.

For the same reason the implementation below won't work - very 
often you just pass a symbol that evaluates to a function and 
always an expression that returns a function and there is no way 
to distinguish that from your new proposed syntax.

Even with () or {} around such "dotted" expression?

Best,
Serguei.

When you feel compelled to use substitute() you should hear alarm 
bells that som

Re: [Rd] suggestion: "." in [lsv]apply()

2020-04-17 Thread Sokol Serguei

Thanks Simon,

Now, I see better your argument.

Le 16/04/2020 à 22:48, Simon Urbanek a écrit :
... I'm not arguing against the principle, I'm arguing about your 
particular proposal as it is inconsistent and not general.
This sounds promising for me. May be in a (new?) future, R core will 
come with a correct proposal for this principle?
Meanwhile, to avoid substitute(), I'll look on the side of formula 
syntax deviation as your example x ~> i + x suggested.


Best,
Serguei.

Personally, I find the current syntax much clearer and readable 
(defining anything by convention like . being the function variable 
seems arbitrary and "dirty" to me), but if you wanted to define a 
shorter syntax, you could use something like x ~> i + x. That said, I 
really don't see the value of not using function(x) [especially these 
days when people are arguing for long variable names with the 
justification that IDEs do all the work anyway], but as I said, my 
argument was against the actual proposal, not general ideas about 
syntax improvement. Cheers, Simon
On 17/04/2020, at 3:53 AM, Sokol Serguei  
wrote: Simon, Thanks for replying. In what follows I won't try to 
argue (I understood that you find this a bad idea) but I would like 
to make clearer some of your point for me (and may be for others). Le 
16/04/2020 à 16:48, Simon Urbanek a écrit :

Serguei,
On 17/04/2020, at 2:24 AM, Sokol Serguei wrote:

Hi,

I would like to make a suggestion for a small syntactic modification of 
FUN argument in the family of functions [lsv]apply(). The idea is to 
allow one-liner expressions without typing "function(item) {...}" to 
surround them. The argument to the anonymous function is simply 
referred as ".". Let take an example. With this new feature, the 
following call

sapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt, d))$r.squared)
#         4         6         8
# 0.5086326 0.4645102 0.4229655

could be rewritten as

sapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)

"Not a big saving in typing" you can say but multiplied by the number 
of [lsv]apply usage and a neater look, I think, the idea merits to be 
considered.

It's not in any way "neater", not only is it less readable, it's 
just plain wrong. What if the expression returned a function?

do you mean like in
l=sapply(1:3, function(i) function(x) i+x)
l[[1]](3)
# 4
l[[2]](3)
# 5

This is indeed a corner case, but a pair of () or {} can keep wsapply() 
on course:

l=wsapply(1:3, (function(x) .+x))
l[[1]](3)
# 4
l[[2]](3)
# 5

How do you know that you don't want to apply the result of the call?

A small example (if it is significantly different from the one above) 
would be very helpful for me to understand this point.

For the same reason the implementation below won't work - very often 
you just pass a symbol that evaluates to a function and always an 
expression that returns a function and there is no way to 
distinguish that from your new proposed syntax.

Even with () or {} around such "dotted" expression?

Best,
Serguei.

When you feel compelled to use substitute() you should hear alarm 
bells that something is wrong ;). You can certainly write a new 
function that uses a different syntax (and I'm sure someone has 
already done that in the package space), but what you propose is 
incompatible with *apply in R (and very much not R syntax).

Cheers,
Simon

To illustrate a possible implementation, I propose a wrapper 
example for sapply():

wsapply=function(l, fun, ...) {
    s=substitute(fun)
    if (is.name(s) || is.call(s) && s[[1]]==as.name("function")) {
        sapply(l, fun, ...) # legacy call
    } else {
        sapply(l, function(d) eval(s, list(.=d)), ...)
    }
}

Now, we can do:

wsapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)

or, traditional way:

wsapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt, d))$r.squared)

the both work.

How do you feel about that?

Best,
Serguei.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






Re: [Rd] suggestion: "." in [lsv]apply()

2020-04-16 Thread Sokol Serguei

Thanks Henrik,

Probably, it will be the solution I'll retain.

Best,
Serguei.

On 16/04/2020 at 18:50, Henrik Bengtsson wrote:

I'm sure this exists elsewhere, but, as a trade-off, could you achieve
what you want with a separate helper function F(expr) that constructs
the function you want to pass to [lsv]apply()?  Something that would
allow you to write:

sapply(split(mtcars, mtcars$cyl), F(summary(lm(mpg ~ wt,.))$r.squared))

Such an F() function would apply elsewhere too.

/Henrik
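
A minimal sketch of the kind of helper described above, under a hypothetical name dotfun() (used here instead of F so as not to mask R's built-in alias for FALSE). It captures the unevaluated expression and wraps it in a one-argument function whose parameter is ".". This is only an illustration of the idea, not an existing base R function.

```r
# Hypothetical helper (not part of base R): build function(.) <expr>
# from an unevaluated expression that refers to its argument as ".".
dotfun <- function(expr) {
  e <- substitute(expr)             # capture the expression without evaluating it
  f <- function(.) NULL             # one-argument template; "." is a valid name
  body(f) <- e                      # use the captured expression as the body
  environment(f) <- parent.frame()  # free variables resolve where dotfun() was called
  f
}

# Usage with the example from this thread, next to the traditional form
r_new <- sapply(split(mtcars, mtcars$cyl),
                dotfun(summary(lm(mpg ~ wt, .))$r.squared))
r_old <- sapply(split(mtcars, mtcars$cyl),
                function(d) summary(lm(mpg ~ wt, d))$r.squared)
```

Because the helper returns an ordinary function, sapply() itself is untouched and the same helper could be passed to lapply(), vapply() or Map().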

On Thu, Apr 16, 2020 at 9:30 AM Michael Mahoney wrote:

This syntax is already implemented in the {purrr} package, more or
less -- you need to add a tilde before your function call for it to
work exactly as written:

purrr::map_dbl(split(mtcars, mtcars$cyl), ~ summary(lm(wt ~ mpg, .))$r.squared)

is equivalent to

sapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt,
d))$r.squared)

Seems like using this package is probably an easier solution for this
wish than adding a reserved variable and adding additional syntax to
the apply family as a whole.

Thanks,

-Mike


From: Sokol Serguei 
Date: Thu, Apr 16, 2020 at 12:03 PM
Subject: Re: [Rd] suggestion: "." in [lsv]apply()
To: William Dunlap 
Cc: r-devel 


Thanks Bill,

Clearly, my first proposition for wsapply() is quick and dirty one.
However, if "." becomes a reserved variable with this new syntax,
wsapply() can be fixed (at least for your example and alike) as:

wsapply=function(l, fun, ...) {
  .=substitute(fun)
  if (is.name(.) || is.call(.) && .[[1]]==as.name("function")) {
  sapply(l, fun, ...)
  } else {
  sapply(l, function(d) eval(., list(.=d)), ...)
  }
}

Will it do the job?

Best,
Serguei.

On 16/04/2020 at 17:07, William Dunlap wrote:

Passing in a function passes not only an argument list but also an
environment from which to get free variables. Since your function
doesn't pay attention to the environment you get things like the
following.


wsapply(list(1,2:3), paste(., ":", deparse(s)))

[[1]]
[1] "1 : paste(., \":\", deparse(s))"

[[2]]
[1] "2 : paste(., \":\", deparse(s))" "3 : paste(., \":\", deparse(s))"

Bill Dunlap
TIBCO Software
wdunlap tibco.com
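
One way to address the environment problem illustrated above is to record the caller's frame and pass it to eval() as the enclosure. The following is only a sketch: the name wsapply2 is hypothetical, and the approach still relies on substitute(), so it does not remove the ambiguity between a "dotted" expression and an expression that returns a function.

```r
# Hypothetical variant of wsapply() that looks up free variables
# in the caller's environment rather than in the wrapper's own frame.
wsapply2 <- function(l, fun, ...) {
  s <- substitute(fun)
  caller <- parent.frame()   # environment from which wsapply2() was called
  if (is.name(s) || (is.call(s) && identical(s[[1]], as.name("function")))) {
    sapply(l, fun, ...)      # legacy call: fun is a symbol or a function literal
  } else {
    # "." is bound to the current item; every other name comes from the caller
    sapply(l, function(d) eval(s, list(. = d), enclos = caller), ...)
  }
}

s <- "outer"   # a caller-side variable that clashes with the wrapper's local "s"
res <- wsapply2(list(1, 2:3), paste(., ":", s))
```

Here the caller's s is used inside paste(), instead of the captured expression stored in the wrapper's local variable of the same name.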


On Thu, Apr 16, 2020 at 7:25 AM Sokol Serguei <so...@insa-toulouse.fr> wrote:

 Hi,

 I would like to make a suggestion for a small syntactic
 modification of
 FUN argument in the family of functions [lsv]apply(). The idea is to
 allow one-liner expressions without typing "function(item) {...}" to
 surround them. The argument to the anonymous function is simply
 referred
 as ".". Let take an example. With this new feature, the following call

 sapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt,
 d))$r.squared)
 #4 6 8
 #0.5086326 0.4645102 0.4229655


 could be rewritten as

 sapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)

 "Not a big saving in typing" you can say but multiplied by the
 number of
 [lsv]apply usage and a neater look, I think, the idea merits to be
 considered.
 To illustrate a possible implementation, I propose a wrapper
 example for
 sapply():

 wsapply=function(l, fun, ...) {
     s=substitute(fun)
     if (is.name(s) || is.call(s) && s[[1]]==as.name("function")) {
         sapply(l, fun, ...) # legacy call
     } else {
         sapply(l, function(d) eval(s, list(.=d)), ...)
     }
 }

 Now, we can do:

 wsapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)

 or, traditional way:

 wsapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt,
 d))$r.squared)

 the both work.

 How do you feel about that?

 Best,
 Serguei.

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel



 [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] suggestion: "." in [lsv]apply()

2020-04-16 Thread Sokol Serguei
Thanks Bill,

Clearly, my first proposition for wsapply() is a quick and dirty one.
However, if "." becomes a reserved variable with this new syntax, 
wsapply() can be fixed (at least for your example and the like) as:

wsapply=function(l, fun, ...) {
     .=substitute(fun)
     if (is.name(.) || is.call(.) && .[[1]]==as.name("function")) {
         sapply(l, fun, ...)
     } else {
         sapply(l, function(d) eval(., list(.=d)), ...)
     }
}

Will it do the job?

Best,
Serguei.

On 16/04/2020 at 17:07, William Dunlap wrote:
> Passing in a function passes not only an argument list but also an 
> environment from which to get free variables. Since your function 
> doesn't pay attention to the environment you get things like the 
> following.
>
> > wsapply(list(1,2:3), paste(., ":", deparse(s)))
> [[1]]
> [1] "1 : paste(., \":\", deparse(s))"
>
> [[2]]
> [1] "2 : paste(., \":\", deparse(s))" "3 : paste(., \":\", deparse(s))"
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Apr 16, 2020 at 7:25 AM Sokol Serguei <so...@insa-toulouse.fr> wrote:
>
> Hi,
>
> I would like to make a suggestion for a small syntactic
> modification of
> FUN argument in the family of functions [lsv]apply(). The idea is to
> allow one-liner expressions without typing "function(item) {...}" to
> surround them. The argument to the anonymous function is simply
> referred
> as ".". Let take an example. With this new feature, the following call
>
> sapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt,
> d))$r.squared)
> #    4 6 8
> #0.5086326 0.4645102 0.4229655
>
>
> could be rewritten as
>
> sapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)
>
> "Not a big saving in typing" you can say but multiplied by the
> number of
> [lsv]apply usage and a neater look, I think, the idea merits to be
> considered.
> To illustrate a possible implementation, I propose a wrapper
> example for
> sapply():
>
> wsapply=function(l, fun, ...) {
>     s=substitute(fun)
>     if (is.name(s) || is.call(s) && s[[1]]==as.name("function")) {
>         sapply(l, fun, ...) # legacy call
>     } else {
>         sapply(l, function(d) eval(s, list(.=d)), ...)
>     }
> }
>
> Now, we can do:
>
> wsapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)
>
> or, traditional way:
>
> wsapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt,
> d))$r.squared)
>
> the both work.
>
> How do you feel about that?
>
> Best,
> Serguei.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] suggestion: "." in [lsv]apply()

2020-04-16 Thread Sokol Serguei

Simon,

Thanks for replying. In what follows I won't try to argue (I understood 
that you find this a bad idea), but I would like to clarify some of 
your points for myself (and maybe for others).


On 16/04/2020 at 16:48, Simon Urbanek wrote:

Serguei,
On 17/04/2020, at 2:24 AM, Sokol Serguei wrote:

Hi,

I would like to make a suggestion for a small syntactic modification of 
FUN argument in the family of functions [lsv]apply(). The idea is to 
allow one-liner expressions without typing "function(item) {...}" to 
surround them. The argument to the anonymous function is simply 
referred as ".". Let take an example. With this new feature, the 
following call

sapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt, d))$r.squared)
#         4         6         8
# 0.5086326 0.4645102 0.4229655

could be rewritten as

sapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)

"Not a big saving in typing" you can say but multiplied by the number 
of [lsv]apply usage and a neater look, I think, the idea merits to be 
considered.

It's not in any way "neater", not only is it less readable, it's just 
plain wrong. What if the expression returned a function?

do you mean like in
l=sapply(1:3, function(i) function(x) i+x)
l[[1]](3)
# 4
l[[2]](3)
# 5

This is indeed a corner case, but a pair of () or {} can keep wsapply() 
on course:

l=wsapply(1:3, (function(x) .+x))

l[[1]](3)

# 4

l[[2]](3)

# 5

How do you know that you don't want to apply the result of the call?
A small example (if it is significantly different from the one above) 
would be very helpful for me to understand this point.


For the same reason the implementation below won't work - very often 
you just pass a symbol that evaluates to a function and always an 
expression that returns a function and there is no way to distinguish 
that from your new proposed syntax.

Even with () or {} around such "dotted" expression?

Best,
Serguei.
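
A small sketch of the ambiguity discussed above, using the wsapply() wrapper from the original post and a hypothetical helper pick() that returns a function. With sapply(), the returned function is applied to each element; with wsapply(), the same call is treated as a "dotted" expression, so each element yields the function itself rather than its value.

```r
# wsapply() as proposed in the original post
wsapply <- function(l, fun, ...) {
  s <- substitute(fun)
  if (is.name(s) || is.call(s) && s[[1]] == as.name("function")) {
    sapply(l, fun, ...) # legacy call
  } else {
    sapply(l, function(d) eval(s, list(. = d)), ...)
  }
}

# Hypothetical helper: an expression whose value is a function
pick <- function(name) match.fun(name)

r_base <- sapply(c(1, 4, 9), pick("sqrt"))   # sqrt is applied to each element
r_wrap <- wsapply(c(1, 4, 9), pick("sqrt"))  # pick("sqrt") is evaluated per element
```

In this sketch r_base is a numeric vector, while r_wrap is a list holding the sqrt function three times: the call pick("sqrt") contains no "." and still cannot be told apart from the proposed shorthand.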

When you feel compelled to use substitute() you should hear alarm 
bells that something is wrong ;). You can certainly write a new 
function that uses a different syntax (and I'm sure someone has 
already done that in the package space), but what you propose is 
incompatible with *apply in R (and very much not R syntax).

Cheers,
Simon

To illustrate a possible implementation, I propose a wrapper example 
for sapply():

wsapply=function(l, fun, ...) {
    s=substitute(fun)
    if (is.name(s) || is.call(s) && s[[1]]==as.name("function")) {
        sapply(l, fun, ...) # legacy call
    } else {
        sapply(l, function(d) eval(s, list(.=d)), ...)
    }
}

Now, we can do:

wsapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)

or, traditional way:

wsapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt, d))$r.squared)

the both work.

How do you feel about that?

Best,
Serguei.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






[Rd] suggestion: "." in [lsv]apply()

2020-04-16 Thread Sokol Serguei

Hi,

I would like to suggest a small syntactic modification of the 
FUN argument in the [lsv]apply() family of functions. The idea is to 
allow one-liner expressions without typing "function(item) {...}" 
around them. The argument to the anonymous function is simply referred 
to as ".". Let's take an example. With this new feature, the following call


sapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt, 
d))$r.squared)

#    4 6 8
#0.5086326 0.4645102 0.4229655


could be rewritten as

sapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)

"Not a big saving in typing", you might say, but multiplied by the number 
of [lsv]apply() uses, and with a neater look, I think the idea merits 
consideration.
To illustrate a possible implementation, I propose an example wrapper for 
sapply():


wsapply=function(l, fun, ...) {
    s=substitute(fun)
    if (is.name(s) || is.call(s) && s[[1]]==as.name("function")) {
        sapply(l, fun, ...) # legacy call
    } else {
        sapply(l, function(d) eval(s, list(.=d)), ...)
    }
}

Now, we can do:

wsapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)

or, traditional way:

wsapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt, 
d))$r.squared)


Both work.

How do you feel about that?

Best,
Serguei.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] suggestion: conda for third-party software

2020-01-07 Thread Sokol Serguei

Thanks for this hint.

On 07/01/2020 at 20:47, Kevin Ushey wrote:

The newest version of reticulate does something very similar: R
packages can declare their Python package dependencies in the
Config/reticulate field of a DESCRIPTION file, and reticulate can read
and use those dependencies to provision a Python environment for the
user when requested (currently using Miniconda).


If Miniconda is used, does it mean that not only Python packages but any 
conda package can be declared as a dependency?


And another question: do you know if Miniconda is installed on the CRAN 
check machines? (Without this I cannot see how your packages with conda 
dependencies could be tested during their submission.)


Best,

Serguei.



Similarly, rather than having this part of SystemRequirements, package
authors could declare these in a separate field called e.g.
Config/conda. Then, you could have an R package that knows how to read
and parse these configuration requests, and install those packages for
the user.

That said, maintaining a Conda installation and its environments is
non-trivial, and things do not always work as expected when mixing
Conda applications with non-Conda applications. Most notably, Conda
installations bundle their own copies of libraries; e.g. the C++
standard library, Qt, OpenSSL, and so on. If an application tries to
mix and match both system-provided and Conda-provided libraries in the
same process, bad things often happen. This was still the
lowest-friction way forward for us with reticulate, but it's worth
being aware that Conda is not a total panacea.

Best,
Kevin

On Tue, Jan 7, 2020 at 6:50 AM Serguei Sokol  wrote:

Best wishes for 2020!

I would like to suggest a new feature for R package management. Its aim
is to enable package developers and end-users to rely on conda (
https://docs.conda.io/en/latest/ ) for managing third-party software
(TPS) on major platforms: linux64, win64 and osx64. Currently, many R
packages include TPS as part of them thus bloating their sizes and often
duplicating files on a given system.  And even when TPS is not included
in an R package but is just installed on a system, it is not so obvious
to get the right path to it. Sometimes pkg-config helps but it is not
always present.

So, the new feature would be to let R package developers write in the
DESCRIPTION/SystemRequirements field something like
'conda:boost-cpp>=1.71' where 'boost-cpp' is an example of a conda
package and '>=1.71' is an optional version requirement. Having this
could allow install.packages() to install TPS on a testing CRAN machine
or on an end-user's one. (There is just one line to execute in a shell:
conda install <package>. It will install the package itself as well as
all its dependencies.)
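
As a rough sketch of how such a field might be read, the following parses 'conda:' entries out of a SystemRequirements string and prepares the arguments for a conda install call. The 'conda:' prefix is the convention proposed above, not something R currently understands; the function name parse_conda_reqs() and the comma-separated layout of the field are assumptions made for illustration only.

```r
# Hypothetical parser for the proposed 'conda:<package><constraint>' entries.
parse_conda_reqs <- function(sysreq) {
  # Assumption: entries in SystemRequirements are separated by commas
  items <- trimws(strsplit(sysreq, ",", fixed = TRUE)[[1]])
  items <- items[startsWith(items, "conda:")]
  specs <- sub("^conda:", "", items)
  # Split each spec into a package name and an optional version constraint
  m <- regmatches(specs, regexec("^([A-Za-z0-9_.-]+)(.*)$", specs))
  data.frame(
    package    = vapply(m, function(x) x[2], character(1)),
    constraint = vapply(m, function(x) trimws(x[3]), character(1)),
    stringsAsFactors = FALSE
  )
}

reqs <- parse_conda_reqs("GNU make, conda:boost-cpp>=1.71, conda:zlib")

# Arguments for the shell command; quoted because ">" is special in a shell.
args <- c("install", shQuote(paste0(reqs$package, reqs$constraint)))
# system2("conda", args)  # not run here: requires conda on the PATH
```

Entries without the 'conda:' prefix (here "GNU make") are simply ignored, so the field could keep serving its current free-text purpose.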

To my mind, this feature would have the following advantages:
   - savings in disk space, as the same TPS does not have to be included in
the R package itself and can be shared with other language wrappers, e.g.
Python;
   - easier flag configuration in Makevars, as paths to TPS will be well
known in advance;
   - CRAN machines could test packages relying on a wide range of TPS
without bothering with their manual installation;
   - TPS installation can become transparent for the end-user on major
platforms.
Note that even though R itself is available through conda (
https://anaconda.org/conda-forge/r-base ), it is not mandatory to use
conda's R version for this feature. Here, conda is just meant to
facilitate access to TPS. However, a minimal requirement is obviously to
have conda itself.

Does it look reasonable? appealing?
Best,
Serguei.

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel




Re: [Rd] [New Patch] Fix disk corruption when writing

2017-07-06 Thread Sokol Serguei

Duncan Murdoch wrote on Thu, 6 Jul 2017 13:58:10 -0400:

On 06/07/2017 5:21 AM, Serguei Sokol wrote:
I propose the following patch against the current 
R-devel/src/main/connections.c (cf. attached file).

It gives (on my linux box):
 > fc=file("/dev/full", "w")
 > write.csv("a", file=fc)
Error in writeLines(paste(col.names, collapse = sep), file, sep = eol) :
   system call failure on writing
 > close(fc)

Serguei.


I suspect that flushing on every write will slow things down too much.

That's quite plausible.



I think a better approach is to catch the error in the Rconn_printf 
calls (as R-devel currently does), and also catch errors on 
con->close(con).  This one requires more changes to the source, so it 
may be a day or two before I commit.

I have tested it in file_close() and it works (cf. attached patch):
> fc=file("/dev/full", "w")
> write.csv("a", file=fc)
> close(fc)
Error in close.connection(fc) : closing file failed



One thing I have to look into:  is anyone making use of the fact that 
the R-level close.connection() function can return -1 to signal an 
error?  Base R ignores that, which is why we need to do something, but 
if packages are using it, things need to be changed carefully.  I 
can't just change it to raise an error instead.

As you can see in the patch, there is no need to change the 
close.connection() function, but we have to add a test of con->status 
to all *_close() functions (gzfile_close() and co.)


Serguei.



On 05/07/2017 at 15:33, Serguei Sokol wrote:

On 05/07/2017 at 14:46, Serguei Sokol wrote:

On 05/07/2017 at 13:09, Duncan Murdoch wrote:

On 05/07/2017 5:26 AM, January W. wrote:

I tried the newest patch, but it does not seem to work for me (on
Linux). Despite the check in Rconn_printf, the write.csv happily writes
to /dev/full and does not report an error. When I added a 
printf("%d\n", res); to the Rconn_printf() definition, I see only 
positive values returned by the vfprintf call.



That's likely because you aren't writing enough to actually 
trigger a write to disk during the write.  Writes are buffered, 
and the error doesn't happen until the buffer is written.

I can confirm this behavior with vfprintf(). Small and medium-sized 
writes to /dev/full don't trigger an error, but a 1 MB write does.

But if fprintf() is used, it returns a negative value from the very 
first byte written.

I correct myself. In my test, fprintf() returned -1 for another 
reason (the connection was already closed at that moment).
However, if vfprintf(...) is followed by res=fflush(con), then res is -1
if we try to write to /dev/full. Maybe we have to use this to trigger
an error message in R.

Serguei.

  The regression test I put in had this problem; I'm working on 
MacOS and Windows, so I never got to actually try it before 
committing.


Unfortunately, it doesn't look possible to catch the final flush 
of the buffer when the connection is closed, so small writes won't 
trigger any error.


It's also possible that whatever system you're on doesn't signal 
an error when the write fails.


Duncan Murdoch


Cheers,

j.


On 4 July 2017 at 21:37, Duncan Murdoch wrote:

On 04/07/2017 11:50 AM, Jean-Sébastien Bevilacqua wrote:

Hello,
You can find here a patch to fix disk corruption.
When your disk is full, the write function exits without error, 
but the file is truncated.

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17243



Thanks.  I didn't see that when it came through (or did and 
forgot).

I'll probably move the error check to a lower level (in the
Rconn_printf function), if tests show that works.

Duncan Murdoch







--
 January Weiner --















Index: connections.c
===
--- connections.c	(revision 72894)
+++ connections.c	(working copy)
@@ -769,6 +769,8 @@
 #ifdef Win32
 if(this->anon_file) unlink(this->name);
 #endif
+if(con->status < 0)
+	error(_("closing file failed"));
 }
 
 static int file_vfprintf(Rconnection con, const char *format, va_list ap)
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel