Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-18 Thread Martin Maechler
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>> on Tue, 16 May 2017 16:37:45 +0000 writes:

> switch(i, ...)
> extracts 'i'-th argument in '...'. It is like
> eval(as.name(paste0("..", i))) .

Yes, that's neat.

It is only almost the same:  in the case of illegal 'i'
the switch() version returns
invisible(NULL)

whereas the version we'd want should signal an error, typically
the same error message as

  > t2 <- function(...) ..2
  > t2(1)
  Error in t2(1) (from #1) : the ... list does not contain 2 elements
  > 


> Just mentioning other things:
> - For 'n',
> n <- nargs()
> can be used.

I know .. [in this case, where '...' is the only formal argument of the function]

> - sys.call() can be used in place of match.call() .

Hmm... in many cases, yes; notably, as we do *not* want the
argument names here, I think you are right.
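
(A quick sketch of the difference:)

  f <- function(x, ...) list(sys = sys.call(), match = match.call())
  f(1, 2)$sys     # f(1, 2)      -- the call as typed
  f(1, 2)$match   # f(x = 1, 2)  -- arguments matched to their formal names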


> ---
>>>>> peter dalgaard 
>>>>> on Mon, 15 May 2017 16:28:42 +0200 writes:

>> I think Hervé's idea was just that if switch can evaluate arguments
>> selectively, so can stopifnot(). But switch() is .Primitive, so does it from C.

> if he just meant that, then "yes, of course" (but not so interesting).

>> I think it is almost a no-brainer to implement a sequential stopifnot if
>> dropping to C code is allowed. In R it gets trickier, but how about this:

> Something like this, yes, that's close to what Serguei Sokol had proposed
> (and of course I *do*  want to keep the current sophistication
> of stopifnot(), so this is really too simple)

>> Stopifnot <- function(...)
>> {
>>     n <- length(match.call()) - 1
>>     for (i in 1:n)
>>     {
>>         nm <- as.name(paste0("..", i))
>>         if (!eval(nm)) stop("not all true")
>>     }
>> }
>> Stopifnot(2+2==4)
>> Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!")
>> Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!")
>> Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T)


>>> On 15 May 2017, at 15:37 , Martin Maechler  wrote:
>>> 
>>> I'm still curious about Hervé's idea on using  switch()  for the
>>> issue.

>> -- 
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] problem running test on a system without /etc/localtime

2017-05-17 Thread Martin Maechler
> Henrik Bengtsson 
> on Tue, 16 May 2017 20:49:02 -0700 writes:

> On Tue, May 16, 2017 at 5:35 PM, Kirill Maslinsky wrote:
>> Hi all,
>> 
>> A problem with tests while building R.
>> 
>> I'm packaging R for Sisyphus repository and package build environment,
>> by design, doesn't have /etc/localtime file present. This causes failure
>> with Sys.timezone during test run:
>> 
>> [builder@localhost tests]$ ../bin/R --vanilla < reg-tests-1d.R
>> 
>>> ## PR#17186 - Sys.timezone() on some Debian-derived platforms
>>> (S.t <- Sys.timezone())
>> Error in normalizePath("/etc/localtime") :
>> (converted from warning) path[1]="/etc/localtime": No such file or directory
>> Calls: Sys.timezone -> normalizePath
>> Execution halted
>> 
>> This is caused by this code:
>> 
>>> Sys.timezone
>> function (location = TRUE)
>> {
>> tz <- Sys.getenv("TZ", names = FALSE)
>> if (!location || nzchar(tz))
>> return(Sys.getenv("TZ", unset = NA_character_))
>> lt <- normalizePath("/etc/localtime")
>> [remainder of the code skipped]
>> 
>> File /etc/localtime is optional and is not guaranteed to be present on
>> any platform. And anyway, it is a good idea to first check that file
>> exists before calling normalizePath.

> Looking at the code
> 
(https://github.com/wch/r-source/blob/R-3-4-branch/src/library/base/R/datetime.R#L26),
> could it be that mustWork = FALSE (instead of the default NA) avoids
> the warning that causes this check error?

Good idea.

Kirill, could you apply the minimal patch to the sources and
report back ?
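
(For reference, normalizePath()'s three 'mustWork' modes -- a quick sketch,
assuming a path that does not exist:)

  p <- "/nonexistent/localtime"            # hypothetical missing path
  normalizePath(p)                         # default mustWork = NA: warning
  normalizePath(p, mustWork = FALSE)       # silent; returns p unchanged
  try(normalizePath(p, mustWork = TRUE))   # error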


> Index: src/library/base/R/datetime.R
> ===================================================================
> --- src/library/base/R/datetime.R (revision 72684)
> +++ src/library/base/R/datetime.R (working copy)
> @@ -23,7 +23,7 @@
>  {
>      tz <- Sys.getenv("TZ", names = FALSE)
>      if(!location || nzchar(tz)) return(Sys.getenv("TZ", unset = NA_character_))
> -    lt <- normalizePath("/etc/localtime") # Linux, macOS, ...
> +    lt <- normalizePath("/etc/localtime", mustWork = FALSE) # Linux, macOS, ...
>      if (grepl(pat <- "^/usr/share/zoneinfo/", lt)) sub(pat, "", lt)
>      else if (lt == "/etc/localtime" && file.exists("/etc/timezone") &&
>               dir.exists("/usr/share/zoneinfo") &&

> /Henrik

>> 
>> Sure, this can be worked around by setting TZ environment variable, but
>> that causes tests to fail in another place:
>> 
>> [builder@localhost tests]$ TZ="GMT" ../bin/R --vanilla < reg-tests-1d.R
>> 
>>> ## format()ing invalid hand-constructed  POSIXlt  objects
>>> d <- as.POSIXlt("2016-12-06"); d$zone <- 1
>>> tools::assertError(format(d))
>> Error: Failed to get error in evaluating format(d)
>> Execution halted
>> 
>> It seems that the best solution will be to patch Sys.timezone.
>> 
>> --
>> KM
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-16 Thread Martin Maechler
>>>>>   <luke-tier...@uiowa.edu>
>>>>> on Tue, 16 May 2017 09:49:56 -0500 writes:

> On Tue, 16 May 2017, Martin Maechler wrote:
>>>>>>> Hervé Pagès <hpa...@fredhutch.org>
>>>>>>> on Mon, 15 May 2017 16:54:46 -0700 writes:
>> 
>> > Hi,
>> > On 05/15/2017 10:41 AM, luke-tier...@uiowa.edu wrote:
>> >> This is getting pretty convoluted.
>> >>
>> >> The current behavior is consistent with the description at the top of
>> >> the help page -- it does not promise to stop evaluation once the first
>> >> non-TRUE is found.  That seems OK to me -- if you want sequencing you
>> >> can use
>> >>
>> >> stopifnot(A)
>> >> stopifnot(B)
>> >>
>> >> or
>> >>
>> >> stopifnot(A && B)
>> 
>> > My main use case for using stopifnot() is argument checking. In that
>> > context, I like the conciseness of
>> 
>> > stopifnot(
>> >     A,
>> >     B,
>> >     ...
>> > )
>> 
>> > I think it's a common use case (and a pretty natural thing to do) to
>> > order/organize the expressions in a way such that it only makes sense
>> > to continue evaluating if all was OK so far e.g.
>> 
>> > stopifnot(
>> >     is.numeric(x),
>> >     length(x) == 1,
>> >     !is.na(x)
>> > )
>> 
>> I agree.  And that's how I have used stopifnot() in many cases
>> myself, sometimes even more "extremely" than the above example,
>> using assertions that only make sense if previous assertions
>> were fulfilled, such as
>> 
>> stopifnot(is.numeric(n), length(n) == 1, n == round(n), n >= 0)
>> 
>> or in the Matrix package, first checking some class properties
>> and then things that only make sense for objects with those properties.
>> 
>> 
>> > At least that's how things are organized in the stopifnot() calls that
>> > accumulated in my code over the years. That's because I was convinced
>> > that evaluation would stop at the first non-true expression (as
>> > suggested by the man page). Until recently when I got a warning issued
>> > by an expression located *after* the first non-true expression. This
>> > was pretty unexpected/confusing!
>> 
>> > If I can't rely on this "sequencing" feature, I guess I can always
>> > do
>> 
>> > stopifnot(A)
>> > stopifnot(B)
>> > ...
>> 
>> > but I lose the conciseness of calling stopifnot() only once.
>> > I could also use
>> 
>> > stopifnot(A && B && ...)
>> 
>> > but then I lose the conciseness of the error message i.e. it's going
>> > to be something like
>> 
>> > Error: A && B && ... is not TRUE
>> 
>> > which can be pretty long/noisy compared to the message that reports
>> > only the 1st error.
>> 
>> 
>> > Conciseness/readability of the single call to stopifnot() and
>> > conciseness of the error message are the features that made me
>> > adopt stopifnot() in the 1st place.
>> 
>> Yes, and that had been my design goal when I created it.
>> 
>> I do tend to agree with Hervé and Serguei here.
>> 
>> > If stopifnot() cannot be revisited
>> > to do "sequencing" then that means I will need to revisit all my calls
>> > to stopifnot().
>> 
>> >>
>> >> I could see an argument for a change that in the multiple argument
>> >> case reports _all_ that fail; that would seem more useful to me than
>> >> twisting the code into knots.
>> 
>> Interesting... but really differing from the current documentation,
>> 
>> > Why not. Still better than the current situation. But only if that
>> > semantic seems more useful to people. Would be sad if usefulness
>> > of one semantic or the other was decided based on trickiness of
>> > implementation.
>> 
>> Well, the trickiness  should definitely play a role.
>> Apart from functionality and semantics, long term maintenance

Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-16 Thread Martin Maechler
>>>>> Hervé Pagès <hpa...@fredhutch.org>
>>>>> on Mon, 15 May 2017 16:54:46 -0700 writes:

> Hi,
> On 05/15/2017 10:41 AM, luke-tier...@uiowa.edu wrote:
>> This is getting pretty convoluted.
>> 
>> The current behavior is consistent with the description at the top of
>> the help page -- it does not promise to stop evaluation once the first
>> non-TRUE is found.  That seems OK to me -- if you want sequencing you
>> can use
>> 
>> stopifnot(A)
>> stopifnot(B)
>> 
>> or
>> 
>> stopifnot(A && B)

> My main use case for using stopifnot() is argument checking. In that
> context, I like the conciseness of

> stopifnot(
>     A,
>     B,
>     ...
> )

> I think it's a common use case (and a pretty natural thing to do) to
> order/organize the expressions in a way such that it only makes sense
> to continue evaluating if all was OK so far e.g.

> stopifnot(
>     is.numeric(x),
>     length(x) == 1,
>     !is.na(x)
> )

I agree.  And that's how I have used stopifnot() in many cases
myself, sometimes even more "extremely" than the above example,
using assertions that only make sense if previous assertions
were fulfilled, such as

stopifnot(is.numeric(n), length(n) == 1, n == round(n), n >= 0)

or in the Matrix package, first checking some class properties
and then things that only make sense for objects with those properties.


> At least that's how things are organized in the stopifnot() calls that
> accumulated in my code over the years. That's because I was convinced
> that evaluation would stop at the first non-true expression (as
> suggested by the man page). Until recently when I got a warning issued
> by an expression located *after* the first non-true expression. This
> was pretty unexpected/confusing!

> If I can't rely on this "sequencing" feature, I guess I can always
> do

> stopifnot(A)
> stopifnot(B)
> ...

> but I lose the conciseness of calling stopifnot() only once.
> I could also use

> stopifnot(A && B && ...)

> but then I lose the conciseness of the error message i.e. it's going
> to be something like

> Error: A && B && ... is not TRUE

> which can be pretty long/noisy compared to the message that reports
> only the 1st error.


> Conciseness/readability of the single call to stopifnot() and
> conciseness of the error message are the features that made me
> adopt stopifnot() in the 1st place. 

Yes, and that had been my design goal when I created it.

I do tend to agree with Hervé and Serguei here.

> If stopifnot() cannot be revisited
> to do "sequencing" then that means I will need to revisit all my calls
> to stopifnot().

>> 
>> I could see an argument for a change that in the multiple argument
>> case reports _all_ that fail; that would seem more useful to me than
>> twisting the code into knots.

Interesting... but really differing from the current documentation,

> Why not. Still better than the current situation. But only if that
> semantic seems more useful to people. Would be sad if usefulness
> of one semantic or the other was decided based on trickiness of
> implementation.

Well, the trickiness  should definitely play a role.
Apart from functionality and semantics, long term maintenance
and code readability, even elegance have shown to be very
important aspects of good code in ca 30 years of S and R programming.

OTOH, as mentioned above, the creation of good error messages
has been an important design goal of  stopifnot()  and hence I'm
willing to accept the extra complexity of "patching up" the call
used in the error / warning messages.

Also, as a change to what I posted yesterday, I now plan to follow
Peter Dalgaard's suggestion of using
 eval( .. ) 
instead of   eval(cl[[i]], envir = <parent.frame(.)>)
as there may be cases where the former behaves better in lazy
evaluation situations.
(Other opinions on that ?)

Martin

> Thanks,
> H.

>> 
>> Best,
>> 
>> luke
>> 
>> On Mon, 15 May 2017, Martin Maechler wrote:
>> 
>>>>>>>> Serguei Sokol <so...@insa-toulouse.fr>
>>>>>>>> on Mon, 15 May 2017 16:32:20 +0200 writes:
>>> 
>>> > On 15/05/2017 at 15:37, Martin Maechler wrote:
>>> >>>>>>> Serguei Sokol <so...@insa-toulouse.fr>

Re: [Rd] [bug] droplevels() also drop object attributes (comment…)

2017-05-16 Thread Martin Maechler
> Serge Bibauw 
> on Mon, 15 May 2017 11:59:32 -0400 writes:

> Hi,

> Just reporting a small bug… not really a big deal, but I don’t think that
> is intended: droplevels() also drops all object’s attributes.

Yes.  The help page for droplevels (or the simple definition of
'droplevels.factor') clearly indicates that the method for
factors is really just a call to   factor(x, exclude = *)

and that _is_ quite an important base function whose semantic
should not be changed lightly. Still, let's continue :

Looking a bit, I see that the current behavior of factor() {and
hence droplevels} has been unchanged in this respect  for the
whole history of R, well, at least for more than 17 years (R 1.0.1, April 2000).

I'd agree there _is_ a bug, at least in the documentation which
does *not* mention that currently, all attributes are dropped but "names",
"levels" (and "class").

OTOH, factor() would only need a small change to make it
preserve all attributes (but "class" and "levels" which are set explicitly).

I'm sure this will break some checks in some packages.
Is it worth it?

e.g., our own R  QC checks currently check (the printing of) the
following (in tests/reg-tests-2.R ):

> ## some tests of factor matrices
> A <- factor(7:12)
> dim(A) <- c(2, 3)
> A
     [,1] [,2] [,3]
[1,] 7    9    11  
[2,] 8    10   12  
Levels: 7 8 9 10 11 12
> str(A)
 factor [1:2, 1:3] 7 8 9 10 ...
 - attr(*, "levels")= chr [1:6] "7" "8" "9" "10" ...
> A[, 1:2]
     [,1] [,2]
[1,] 7    9   
[2,] 8    10  
Levels: 7 8 9 10 11 12
> A[, 1:2, drop=TRUE]
[1] 7  8  9  10
Levels: 7 8 9 10

with the proposed change to factor(),
the last call would change its result:

> A[, 1:2, drop=TRUE]
     [,1] [,2]
[1,] 7    9   
[2,] 8    10  
Levels: 7 8 9 10

because 'drop=TRUE' calls factor(..) and that would also
preserve the "dim" attribute.
I would think that the changed behavior _is_ better, and is also
according to documentation, because the help page for
 [.factor
explains that 'drop = TRUE' drops levels, but _not_ that it
transforms a factor matrix into a factor (vector).
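
(In the meantime, user code can emulate the attribute-preserving behavior
with a small wrapper -- a sketch of the idea, not the proposed change to
factor() itself:)

  droplevels2 <- function(x) {
      keep <- attributes(x)
      keep <- keep[setdiff(names(keep), c("levels", "class"))]
      y <- factor(x)  # drops unused levels -- and, currently, most attributes
      for (nm in names(keep)) attr(y, nm) <- keep[[nm]]
      y
  }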


Martin


> Example:

>> > test <- c("hello", "something", "hi")
>> > test <- factor(test)
>> > comment(test) <- "this is a test"
>> > attr(test, "description") <- "this is another test"
>> > attributes(test)
>> $levels
>> [1] "hello"     "hi"        "something"
>> 
>> $class
>> [1] "factor"
>> 
>> $comment
>> [1] "this is a test"
>> 
>> $description
>> [1] "this is another test"
>> 
>> > test <- droplevels(test)
>> > attributes(test)
>> $levels
>> [1] "hello"     "hi"        "something"
>> 
>> $class
>> [1] "factor"


> Serge

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-15 Thread Martin Maechler
>>>>> Serguei Sokol <so...@insa-toulouse.fr>
>>>>> on Mon, 15 May 2017 16:32:20 +0200 writes:

> On 15/05/2017 at 15:37, Martin Maechler wrote:
>>>>>>> Serguei Sokol <so...@insa-toulouse.fr>
>>>>>>> on Mon, 15 May 2017 13:14:34 +0200 writes:
>> > I see in the archives that the attachment cannot pass.
>> > So, here is the code:
>> 
>> [... MM: I needed to reformat etc to match closely to
>> the current source code which is in
>> https://svn.r-project.org/R/trunk/src/library/base/R/stop.R
>> or its corresponding github mirror
>> https://github.com/wch/r-source/blob/trunk/src/library/base/R/stop.R
>> ]
>> 
>> > Best,
>> > Serguei.
>> 
>> Yes, something like that seems even simpler than Peter's
>> suggestion...
>> 
>> It currently breaks 'make check' in the R sources,
>> specifically in tests/reg-tests-2.R (lines 6574 ff),
>> the new code now gives
>> 
>> > ## error messages from (C-level) evalList
>> > tst <- function(y) { stopifnot(is.numeric(y)); y+ 1 }
>> > try(tst())
>> Error in eval(cl.i, pfr) : argument "y" is missing, with no default
>> 
>> whereas previously it gave
>> 
>> Error in stopifnot(is.numeric(y)) :
>> argument "y" is missing, with no default
>> 
>> 
>> But I think that change (of call stack in such an error case) is
>> unavoidable and not a big problem.

> It can be avoided but at the price of customizing error() and warning()
> calls with something like:
> wrn <- function(w) {w$call <- cl.i; warning(w)}
> err <- function(e) {e$call <- cl.i; stop(e)}
> ...
> tryCatch(r <- eval(cl.i, pfr), warning=wrn, error=err)

> Serguei.

Well, a good idea, but the 'warning' case is more complicated
(and the above incorrect): I do want the warning there, but
_not_ return the warning, but rather, the result of eval() :
So this needs even more sophistication, using  withCallingHandlers(.)
and maybe that really gets too sophisticated and no
more "readable" to 99.9% of the R users ... ?

I now do append my current version -- in case some may want to
comment or improve further.

Martin

stopifnot <- function(...)
{
    penv <- parent.frame()
    cl <- match.call(envir = penv)[-1]
    Dparse <- function(call, cutoff = 60L) {
        ch <- deparse(call, width.cutoff = cutoff)
        if(length(ch) > 1L) paste(ch[1L], "....") else ch
    }
    head <- function(x, n = 6L) ## basically utils:::head.default()
        x[seq_len(if(n < 0L) max(length(x) + n, 0L) else min(n, length(x)))]
    abbrev <- function(ae, n = 3L)
        paste(c(head(ae, n), if(length(ae) > n) "...."), collapse="\n  ")
    benv <- baseenv()
    for (i in seq_along(cl)) {
        cl.i <- cl[[i]]
        ## r <- eval(cl.i, envir = penv, enclos = benv)
        ##   but with correct warn/err messages:
        r <- withCallingHandlers(
            tryCatch(eval(cl.i, envir = penv, enclos = benv),
                     error = function(e) { e$call <- cl.i; stop(e) }),
            warning = function(w) { w$call <- cl.i; w })
        if (!(is.logical(r) && !anyNA(r) && all(r))) {
            msg <- ## special case for decently written 'all.equal(*)':
                if(is.call(cl.i) && identical(cl.i[[1]], quote(all.equal)) &&
                   (is.null(ni <- names(cl.i)) || length(cl.i) == 3L ||
                    length(cl.i <- cl.i[!nzchar(ni)]) == 3L))
                    sprintf(gettext("%s and %s are not equal:\n  %s"),
                            Dparse(cl.i[[2]]),
                            Dparse(cl.i[[3]]), abbrev(r))
                else
                    sprintf(ngettext(length(r),
                                     "%s is not TRUE",
                                     "%s are not all TRUE"),
                            Dparse(cl.i))
            stop(msg, call. = FALSE, domain = NA)
        }
    }
    invisible()
}
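
(For instance, with the version above, evaluation stops at the first
non-TRUE argument -- cf. Hervé's example and the stopifnot_new() output
further down this digest:)

  stopifnot(3 == 5, as.integer(2^32), a <- 12)
  ## Error: 3 == 5 is not TRUE
  ## -- no coercion warning from as.integer(2^32), and 'a' is never assigned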
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-15 Thread Martin Maechler
>>>>> peter dalgaard <pda...@gmail.com>
>>>>> on Mon, 15 May 2017 16:28:42 +0200 writes:

> I think Hervé's idea was just that if switch can evaluate arguments
> selectively, so can stopifnot(). But switch() is .Primitive, so does it from C.

if he just meant that, then "yes, of course" (but not so interesting).

> I think it is almost a no-brainer to implement a sequential stopifnot if
> dropping to C code is allowed. In R it gets trickier, but how about this:

Something like this, yes, that's close to what Serguei Sokol had proposed
(and of course I *do*  want to keep the current sophistication
 of stopifnot(), so this is really too simple)

> Stopifnot <- function(...)
> {
>     n <- length(match.call()) - 1
>     for (i in 1:n)
>     {
>         nm <- as.name(paste0("..", i))
>         if (!eval(nm)) stop("not all true")
>     }
> }
> Stopifnot(2+2==4)
> Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!")
> Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!")
> Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T)


>> On 15 May 2017, at 15:37 , Martin Maechler <maech...@stat.math.ethz.ch> wrote:
>> 
>> I'm still curious about Hervé's idea on using  switch()  for the
>> issue.

> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-15 Thread Martin Maechler
>>>>> Serguei Sokol <so...@insa-toulouse.fr>
>>>>> on Mon, 15 May 2017 13:14:34 +0200 writes:

> I see in the archives that the attachment cannot pass.
> So, here is the code:

[... MM: I needed to reformat etc to match closely to
 the current source code which is in
 https://svn.r-project.org/R/trunk/src/library/base/R/stop.R
 or its corresponding github mirror
https://github.com/wch/r-source/blob/trunk/src/library/base/R/stop.R
]

> Best,
> Serguei.

Yes, something like that seems even simpler than Peter's
suggestion...

It currently breaks 'make check' in the R sources,
specifically in tests/reg-tests-2.R (lines 6574 ff),
the new code now gives

  > ## error messages from (C-level) evalList
  > tst <- function(y) { stopifnot(is.numeric(y)); y+ 1 }
  > try(tst())
  Error in eval(cl.i, pfr) : argument "y" is missing, with no default

whereas previously it gave

  Error in stopifnot(is.numeric(y)) : 
 argument "y" is missing, with no default


But I think that change (of call stack in such an error case) is
unavoidable and not a big problem.

--

I'm still curious about Hervé's idea on using  switch()  for the
issue.

Martin


> On 15/05/2017 at 12:48, Serguei Sokol wrote:
>> Hello,
>> 
>> I am a new on this list, so I introduce myself very briefly:
>> my background is applied mathematics, more precisely scientific calculus
>> applied for modeling metabolic systems, I am author/maintainer of
>> few packages (Deriv, rmumps, arrApply).
>> 
>> Now, on the subject of this discussion, I must say that I don't really
>> understand Peter's argument:
>> 
>> >>> To do it differently, you would have to do something like
>> >>>
>> >>> dots <- match.call(expand.dots=FALSE)$...
>> >>>
>> >>> and then explicitly evaluate each argument in turn in the caller
>> >>> frame. This amount of nonstandard evaluation sounds like it would
>> >>> incur a performance penalty, which could be undesirable.
>> The first line of the current stopifnot()
>> n <- length(ll <- list(...))
>> already evaluates _all_ of the arguments
>> in the caller frame. So to do the same only
>> on a part of them (till the first FALSE or NA occurs)
>> cannot be more penalizing than the current version, right?
>> 
>> I attach here a slightly modified version called stopifnot_new()
>> which works in accordance with the man page and
>> where there are only two additional calls: parent.frame() and eval().
>> I don't think it can be considered as real performance penalty
>> as the same or bigger amount of (implicit) evaluations was
>> already done in the current version:
>> 
>>> source("stopifnot_new.R")
>>> stopifnot_new(3 == 5, as.integer(2^32), a <- 12)
>> Error: 3 == 5 is not TRUE
>>> a
>> Error: object 'a' not found
>> 
>> Best,
>> Serguei.
>> 
>> 
>> On 15/05/2017 at 10:39, Martin Maechler wrote:
>>>>>>>> Hervé Pagès <hpa...@fredhutch.org>
>>>>>>>> on Wed, 3 May 2017 12:08:26 -0700 writes:
>>> > On 05/03/2017 12:04 PM, Hervé Pagès wrote:
>>> >> Not sure why the performance penalty of nonstandard evaluation would
>>> >> be more of a concern here than for something like switch().
>>> 
>>> > which is actually a primitive. So it seems that there is at least
>>> > another way to go than 'dots <- match.call(expand.dots=FALSE)$...'
>>> 
>>> > Thanks, H.
>>> 
>>> >>
>>> >> If that can't/won't be fixed, what about fixing the man page so it's
>>> >> in sync with the current behavior?
>>> >>
>>> >> Thanks, H.
>>> 
>>> Being back from vacations,...
>>> I agree that something should be done here, if not to the code than at
>>> least to the man page.
>>> 
>>> For now, I'd like to look a bit longer into a possible change to the function.
>>> Peter mentioned a NSE way to fix the problem and you mentioned switch().
>>> 
>>> Originally, stopifnot() was only a few lines of code and meant to be
>>> "self-explaining" by just reading its definition, and I really would 
like
>>&

Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-15 Thread Martin Maechler
> Hervé Pagès 
> on Wed, 3 May 2017 12:08:26 -0700 writes:

> On 05/03/2017 12:04 PM, Hervé Pagès wrote:
>> Not sure why the performance penalty of nonstandard evaluation would
>> be more of a concern here than for something like switch().

> which is actually a primitive. So it seems that there is at least
> another way to go than 'dots <- match.call(expand.dots=FALSE)$...'

> Thanks, H.

>> 
>> If that can't/won't be fixed, what about fixing the man page so it's
>> in sync with the current behavior?
>> 
>> Thanks, H.

Being back from vacations,...
I agree that something should be done here, if not to the code than at
least to the man page.

For now, I'd like to look a bit longer into a possible change to the function.
Peter mentioned a NSE way to fix the problem and you mentioned switch().

Originally, stopifnot() was only a few lines of code and meant to be
"self-explaining" by just reading its definition, and I really would like
to not walk too much away from that original idea.
How did you (Herve) think to use  switch()  here?



>> On 05/03/2017 02:26 AM, peter dalgaard wrote:
>>> The first line of stopifnot is
>>> 
>>> n <- length(ll <- list(...))
>>> 
>>> which takes ALL arguments and forms a list of them. This implies
>>> evaluation, so explains the effect that you see.
>>> 
>>> To do it differently, you would have to do something like
>>> 
>>> dots <- match.call(expand.dots=FALSE)$...
>>> 
>>> and then explicitly evaluate each argument in turn in the caller
>>> frame. This amount of nonstandard evaluation sounds like it would
>>> incur a performance penalty, which could be undesirable.
>>> 
>>> If you want to enforce the order of evaluation, there is always
>>> 
>>> stopifnot(A)
>>> stopifnot(B)
>>> 
>>> -pd
>>> 
 On 3 May 2017, at 02:50 , Hervé Pagès wrote:
 
 Hi,
 
 It's surprising that stopifnot() keeps evaluating its arguments
 after it reaches the first one that is not TRUE:
 
 > stopifnot(3 == 5, as.integer(2^32), a <- 12)
 Error: 3 == 5 is not TRUE
 In addition: Warning message:
 In stopifnot(3 == 5, as.integer(2^32), a <- 12) :
   NAs introduced by coercion to integer range
 > a
 [1] 12
 
 The details section in its man page actually suggests that it
 should stop at the first non-TRUE argument:
 
 ‘stopifnot(A, B)’ is conceptually equivalent to
 
     { if(any(is.na(A)) || !all(A)) stop(...);
       if(any(is.na(B)) || !all(B)) stop(...) }
 
 Best, H.
 
 --
 Hervé Pagès
 
 Program in Computational Biology Division of Public Health
 Sciences Fred Hutchinson Cancer Research Center 1100 Fairview
 Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024
 
 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206)
 667-1319
 
 __
 R-devel@r-project.org mailing list
 
 https://stat.ethz.ch/mailman/listinfo/r-devel
 
>>> 
>> 

> -- Hervé Pagès

> Program in Computational Biology Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N,
> M1-B514 P.O. Box 19024 Seattle, WA 98109-1024

> E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206)
> 667-1319

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] tempdir() may be deleted during long-running R session

2017-04-26 Thread Martin Maechler
> Dirk Eddelbuettel 
> on Wed, 26 Apr 2017 08:40:38 -0500 writes:

> On 26 April 2017 at 08:29, Duncan Murdoch wrote:
> | This seems like the wrong approach.  The problem occurs as soon as the 
> | tempdir() gets cleaned up:  there could be information in temp files 
> | that gets lost at that point.  So the solution should be to prevent the 
> | cleanup, not to continue on after it has occurred (as "check = TRUE" 
> | does).  This follows the principle that it's better for the process to 
> | always die than to sometimes silently produce incorrect results.

> That is generally true, but also "hard" as we don't have a handle on the OS.

Indeed...
and that was the reason I've proposed the simple platform
agnostic tool which does not entirely solve the problem (in this sense I
agree with "wrong approach") but allows to mitigate it and (by
followup changes) to work around many use case problems.

> | Frederick posted the way to do this in systems using systemd.  We should

> While that was a very helpful post yet it may only apply to Arch Linux as
> stated.  My Ubuntu systems at home and work all run systemd too, but do 
> _not_ automatically remove tempfiles.

> Yet what he suggested is quite right: we should define a proper config file
> for this facility and then possibly also use the /run directory as many other
> services now and (of course) also either TEMPDIR or later the code to have
> /run be another fallback if TMP, TEMP, TMPDIR, ... are unset.

> Distribution maintainers such as yours truly could then include this
> configuration.

> | be putting that in place, or the equivalent on systems using other 
> | tempfile cleanups.  This looks to me like something that "make install" 
> | should do, or perhaps it should be done by people putting together 
> | packages for specific systems.

> Doesn't 'make install' only write to $RHOME/ and below, plus $PREFIX/bin ?

Also, 'make install' is optional for good reasons.
E.g., I never ever run 'make install': I typically always have many R
versions, all available in the shell and ESS (Emacs Speaks
Statistics) via symbolic links into a directory on PATH.

Dirk mentioned (as well) that this is all very platform specific
which I do think is important. From my typical OS point of view:
  Why should the user who runs R not have the right to delete the
  tempdir which was created by the process that she runs and hence owns ?

I agree it would be an improvement if we made such deletion much
harder than it is now, and yes, there may be great (almost)
cross-platform tools available to manage this much better than
we do now, e.g., via open files.

Before we are there, I would find it useful to have a new
'tempdir' (i.e. folder/directory for R's temporary files) to be
re-created manually or automagically in those cases it has
disappeared, and that is within easy reach via the proposed
tempdir() functionality.

OTOH, I typically live very well by quickly killing
and restarting R (from inside ESS).  

The OP issue was to help newbies and computer-non-experts, the
latter nowadays comprising more than 90% of R users (I'd guess ~
98% looking at our otherwise smart students).

These are typically "slightly" confused when they ask for help and
get a pretty severe error message:

  > ?lm
  Error in file(out, "wt") : cannot open the connection
  In addition: Warning message:
  In file(out, "wt") :
    cannot open file '/tmp/RtmpztK6f7/Rtxt36972b91938': No such file or directory


Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] tempdir() may be deleted during long-running R session

2017-04-26 Thread Martin Maechler
>   
> on Tue, 25 Apr 2017 21:13:59 -0700 writes:

> On Tue, Apr 25, 2017 at 02:41:58PM +, Cook, Malcolm wrote:
>> Might this combination serve the purpose: 
>> * R session keeps an open handle on the tempdir it creates, 
>> * whatever tempdir harvesting cron job the user has be made sensitive
>> enough not to delete open files (including open directories)

I also agree that the above would be ideal - if possible.

> Good suggestion but doesn't work with the (increasingly popular)
> "Systemd":

> $ mkdir /tmp/somedir
> $ touch -d "12 days ago" /tmp/somedir/
> $ cd /tmp/somedir/
> $ sudo systemd-tmpfiles --clean
> $ ls /tmp/somedir/
> ls: cannot access '/tmp/somedir/': No such file or directory

Something like your example is what I'd expect is always a
possibility on some platforms, all of course depending on low-level
things such as root/sysadmin/... "permission" to clean up etc.

Jeroen mentioned the fact that tempdir()s also can disappear
for other reasons {his was multicore child processes
.. bugously(?) implemented}.
Further reasons may be race conditions / user code bugs / user
errors, etc.
Note that the R process which created the tempdir on startup
always has the permission to remove it again.  But you can also
think of a full file system, etc.

Current R-devel's   tempdir(check = TRUE)   would create a new
one or give an error (and then the user should be able to use
   Sys.setenv("TEMPDIR" ...)
to point to a directory she has write-permission for).

Gabe's point of course is important too: If you have a long
running process that uses a tempfile,
and if  "big brother"  has removed the full tempdir() you will
be "unhappy" in any case.
Trying to prevent big brother from doing that in all cases seems
"not easy" in any case.

I did want to provide an easy solution to the OP situation:
Suddenly tempdir() is gone, and quite a few things stop working
in the current R process {he mentioned  help(), e.g.}.
With the new   tempdir(check=TRUE)  facility, code could be changed
to replace

   tempfile("foo")

either by

   tempfile("foo", tmpdir = tempdir(check = TRUE))

or by something like

   tryCatch(tempfile("foo"),
            error = function(e)
                tempfile("foo", tmpdir = tempdir(check = TRUE)))

or be even more sophisticated.
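
(Or wrap the idiom once -- a hypothetical helper, not part of R:)

   tempfile2 <- function(pattern = "file", fileext = "") {
       ## recreate the session temp directory first, if it has vanished:
       tempfile(pattern, tmpdir = tempdir(check = TRUE), fileext = fileext)
   }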

We could also consider allowing   check = TRUE | NA | FALSE

and make  NA  the default and have that correspond to
check = TRUE  but additionally do the equivalent of
   warning("tempdir() has become invalid and been recreated")
in case the tempdir() had been invalid.

> I would advocate just changing 'tempfile()' so that it recreates the
> directory where the file is (the "dirname") before returning the file
> path. This would have fixed the issue I ran into. Changing 'tempdir()'
> to recreate the directory is another option.

In the end I had decided that

  tempfile("foo", tmpdir = tempdir(check = TRUE))

is actually better self-documenting than

  tempfile("foo", checkDir = TRUE)

which was my first inclination.

Note again that currently, the checking is _off_ by default.
I've just provided a tool -- which was relatively easy and
platform independent! --- to do more (real and thought)
experiments.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] tempdir() may be deleted during long-running R session

2017-04-25 Thread Martin Maechler
>>>>> Jeroen Ooms <jeroeno...@gmail.com>
>>>>> on Tue, 25 Apr 2017 15:05:51 +0200 writes:

    > On Tue, Apr 25, 2017 at 1:00 PM, Martin Maechler
> <maech...@stat.math.ethz.ch> wrote:
>> As I've found it is not at all hard to add an option
>> which checks the existence and if the directory is no
>> longer "valid", tries to recreate it (and if it fails
>> doing that it calls the famous R_Suicide(), as it does
>> when R starts up and tempdir() cannot be initialized
>> correctly).

> Perhaps this can also fix the problem with mcparallel
> deleting the tempdir() when one of its children dies:

>   file.exists(tempdir())   # TRUE
>   parallel::mcparallel(q('no'))
>   file.exists(tempdir())   # FALSE

Thank you, Jeroen, for the extra example.

I now have committed the new feature... (completely back
compatible: in R's code tempdir() is not yet called with an
argument and the default is  check = FALSE ),
actually in a "suicide-free" way ...  which needed only slightly
more code.

In the worst case, one could save the R session by
   Sys.setenv(TEMPDIR = "")
if for instance /tmp/ suddenly became unwritable for the user.

What we could consider is making the default of 'check' settable
by an option, and experiment with setting the option to TRUE, so
all such problems would be auto-solved (says the incurable optimist ...).
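
(A sketch of that idea, with a hypothetical option name:)

   ## not implemented -- tempdir() taking its default from an option, e.g.
   ##   tempdir <- function(check = getOption("tempdir.check", FALSE)) ...
   ## so that  options(tempdir.check = TRUE)  turns checking on globally,
   ## without touching any caller of tempdir().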

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] tempdir() may be deleted during long-running R session

2017-04-25 Thread Martin Maechler
> Dirk Eddelbuettel 
> on Sun, 23 Apr 2017 09:15:18 -0500 writes:

> On 21 April 2017 at 10:34, frede...@ofb.net wrote:
> | Hi Mikko,
> | 
> | I was bitten by this recently and I think some of the replies are
> | missing the point. As I understand it, the problem consists of these
> | elements:
> | 
> | 1. When R starts, it creates a directory like /tmp/RtmpVIeFj4
> | 
> | 2. Right after R starts I can create files in this directory with no
> |error
> | 
> | 3. After some hours or days I can no longer create files in this
> |directory, because it has been deleted

> Nope. That is local to your system. 

Correct.  OTOH, Mikko and Frederik have a point in my view (below).

> Witness eg at my workstation:

> /tmp$ ls -ltGd Rtmp* 
> drwx------ 3 edd 4096 Apr 21 16:12 Rtmp9K6bSN
> drwx------ 3 edd 4096 Apr 21 11:48 RtmpRRbaMP
> drwx------ 3 edd 4096 Apr 21 11:28 RtmpFlguFy
> drwx------ 3 edd 4096 Apr 20 13:06 RtmpWJDF3U
> drwx------ 3 edd 4096 Apr 18 15:58 RtmpY7ZIS1
> drwx------ 3 edd 4096 Apr 18 12:12 Rtmpzr9W0v
> drwx------ 2 edd 4096 Apr 16 16:02 RtmpeD27El
> drwx------ 2 edd 4096 Apr 16 15:57 Rtmp572FHk
> drwx------ 3 edd 4096 Apr 13 11:08 RtmpqP0JSf
> drwx------ 3 edd 4096 Apr 10 18:47 RtmpzRzyFb
> drwx------ 3 edd 4096 Apr  6 15:21 RtmpQhvAUb
> drwx------ 3 edd 4096 Apr  6 11:24 Rtmp2lFKPz
> drwx------ 3 edd 4096 Apr  5 20:57 RtmprCeWUS
> drwx------ 2 edd 4096 Apr  3 15:12 Rtmp8xviDl
> drwx------ 3 edd 4096 Mar 30 16:50 Rtmp8w9n5h
> drwx------ 3 edd 4096 Mar 28 11:33 RtmpjAg6iY
> drwx------ 2 edd 4096 Mar 28 09:26 RtmpYHSgZG
> drwx------ 2 edd 4096 Mar 27 11:21 Rtmp0gSV4e
> drwx------ 2 edd 4096 Mar 27 11:21 RtmpOnneiY
> drwx------ 2 edd 4096 Mar 27 11:17 RtmpIWeiTJ
> drwx------ 3 edd 4096 Mar 22 08:51 RtmpJkVsSJ
> drwx------ 3 edd 4096 Mar 21 10:33 Rtmp9a5KxL
> /tmp$ 

> Clearly still there after a month. I tend to have some longer-running R
> sessions in either Emacs/ESS or RStudio.

> So what I wrote in my last message here *clearly* applies to you: a local
> issue for which you have to take local action as R cannot know.  You also
> have a choice of setting variables to affect this.

Thank you Dirk (and Brian).  That is all true, and of course I
have known about this myself "forever" as well.
 
> | If R expected the directory to be deleted at random, and if we expect
> | users to call dir.create every time they access tempdir, then why did
> | R create the directory for us at the beginning of the session? That's
> | just setting people up to get weird bugs, which only appear in
> | difficult-to-reproduce situations (i.e. after the session has been
> | open for a long time).

> I disagree. R has been doing this many years, possibly two decades.

Yes, R has been doing this for a long time, including all the
configuration options with environment variables, and yes this
is sufficient "in principle".
 
> | I think before we dismiss this we should think about possible in-R
> | solutions and why they are not feasible. 

Here Mikko and Frederik do have a point I think.

> | Are there any packages which
> | would break if a call to 'tempdir' automatically recreated this
> | directory? (Or would it be too much of a performance hit to have
> | 'tempdir' check and even just issue a warning when the directory is
> | found not to exist?)

> | Should we have a timer which periodically updates
> | the modification time of tempdir()? What do other long-running
> | programs do (e.g. screen, emacs)?

Valid questions, in my view.  Before answering, let's try to see
how hard it would be to make the tempdir() function in R more versatile.

As I've found it is not at all hard to add an option which
checks the existence and if the directory is no longer "valid",
tries to recreate it (and if it fails doing that it calls the
famous R_Suicide(), as it does when R starts up and tempdir()
cannot be initialized correctly).

The proposed entry in NEWS is

   • tempdir(check=TRUE) recreates the tempdir() if it is no longer valid.

and of course the default would be status quo, i.e.,  check = FALSE,
and once this is in R-devel, we (those who install R-devel) can
experiment with it.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] c() documentation after change; 'recursive' in "base" methods

2017-04-20 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Wed, 19 Apr 2017 22:50:41 +0000 writes:

> In R 3.4.0 RC, argument list of 'c' as S4 generic function has become
> (x, ...) .
> However, "S4 methods" section in documentation of 'c' (c.Rd) is not 
updated yet.

Thank you, I've committed a change (72564 & 72565).

> Also, in R 3.4.0 RC, 'c' method of class "Date" ('c.Date') is still not
> explicitly documented.

yes, but that's true for other S3 methods, see below.

This is a bigger issue.  Thank you for raising it!  Look at

---- R code ----------------------------------------------------------

(mc <- methods("c"))
## [1] c.bibentry*       c.Date            c.difftime        c.noquote         c.numeric_version
## [6] c.person*         c.POSIXct         c.POSIXlt         c.warnings       
## and from `lcNSnm` below, you can see that these are from 'base',
## apart from {bibentry, person} which are from 'utils'
lc <- lapply(mc, function(nm) { f <- getAnywhere(nm) })
names(lc) <- sapply(lc, `[[`, "name")
str(lcwh <- lapply(lc, `[[`, "where"))
lcNSnm <- sub("^namespace:", '', sapply(lcwh, function(v) v[length(v)]))
lcNS <- lapply(lcNSnm, asNamespace)
str(lcMeths <-
    sapply(names(lcNS), function(n) get(n, envir=lcNS[[n]], inherits=FALSE),
           simplify = FALSE))
## $ c.bibentry   :function (..., recursive = FALSE)
## $ c.Date   :function (..., recursive = FALSE)
## $ c.difftime   :function (..., recursive = FALSE)
## $ c.noquote:function (..., recursive = FALSE)
## $ c.numeric_version:function (..., recursive = FALSE)
## $ c.person :function (..., recursive = FALSE)
## $ c.POSIXct:function (..., recursive = FALSE)
## $ c.POSIXlt:function (..., recursive = FALSE)
## $ c.warnings   :function (..., recursive = FALSE)

---- .. ---------------------------------------------------------------

and from these, only the 'noquote' method has a "\usage{ . }"
documentation.

The reason actually is that I had *wanted* to consider
__removing__ the 'recursive' argument from most of these S3 methods,
since all but  c.numeric_version()  completely disregard it and
it would be nicer if they did not have it.

HOWEVER, if it is removed and a user / code has

val <- c(<...>, recursive = r)

then 'recursive' will become part of 'val' which is not desirable.
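
A tiny illustration of the pitfall, with a hypothetical class "foo"
(not from the original message):

  c.foo <- function(...)  # method *without* a 'recursive' formal
      structure(unlist(lapply(list(...), unclass)), class = "foo")
  x <- structure(1:2, class = "foo")
  names(unclass(c(x, recursive = TRUE)))
  ## [1] ""          ""          "recursive"   -- 'recursive' became data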

I had never thought more about this and if we should try or not to
remove it from the S3 methods in all those cases it is unused
... hoping that callers would also *not* set it.

As _one_ consequence I had decided rather *not* documenting it
for the S3 methods where it is (still ?!) part.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] "table(droplevels(aq)$Month)" in manual page of droplevels

2017-04-13 Thread Martin Maechler
>>>>> Rui Barradas <ruipbarra...@sapo.pt>
>>>>> on Wed, 12 Apr 2017 17:07:45 +0100 writes:

> Hello, Inline.

> Em 12-04-2017 16:40, Henric Winell escreveu:
>> (Let's keep the discussion on-list -- I've added back
>> R-devel.)
>> 
>> On 2017-04-12 16:39, Ulrich Windl wrote:
>> 
>>>>> Henric Winell <nilsson.hen...@gmail.com> wrote on
>> 12.04.2017
>>>>> at 15:35 in
>>> message
>>> <b66fe849-bb8d-f00d-87e5-553f866d5...@gmail.com>:
>>>> On 2017-04-12 14:40, Ulrich Windl wrote:
>>>> 
>>>>> The last line of the example in droplevels' manual
>>>>> page seems to be incorrect to me. I think it should
>>>>> read: "table(droplevels(aq$Month))". Amazingly (I
>>>>> don't understand) both variants seem to produce the
>>>>> same result (R 3.3.3): ---
>>>> 
>>>> The manual says that "The function 'droplevels' is used
>>>> to drop unused levels from a 'factor' or, more
>>>> commonly, from factors in a data frame." and, as
>>>> documented, the 'droplevels' generic has methods for
>>>> objects of class "data.frame" and "factor".  So, your
>>>> being amazed is a bit surprising given that 'aq' is a
>>>> data frame.
>>> 
>>> The "surprising" thing is the syntax: I was unaware that
>>> '$' is a generic operator that can be applied to the
>>> result of a function (i.e.: droplevels); I thought it's
>>> kind of a special variable syntax.
>> 
>> Then your surprise is unrelated to the use of
>> 'droplevels'.
>> 
>> Since the 'droplevels' method for objects of class
>> "data.frame" returns a data frame, the extraction
>> operator '$' works directly on the resulting object.  So,
>> 'droplevels(aq)$Month' is essentially the same as
>> 
>> aq <- droplevels(aq) aq$Month
>> 
>> > Isn't there also the syntax
>> ``droplevels(aq)["Month"]''?
>> 
>> Sure, and there are even more ways to do subsetting.  But
>> this is basic stuff and therefore off-topic for R-devel.
>> Please see the manual (?Extract) or, e.g., Chapter 3 of
>> Hadley Wickham's "Advanced R".

> But note that droplevels(aq)["Month"] and
> droplevels(aq)$Month are _not_ the same. The first returns
> a data.frame (with just one vector), the latter returns a
> vector. To return just a vector you could also use

> droplevels(aq)[["Month"]]

> which is preferable for programming, by the way. The '$'
> operator should be reserved for interactive use only.

> Hope this helps,

Indeed, we hope..  Thanks to the helpers!

Ulrich, please note that in the end this was all  because you're
still learning to understand R (e.g., data frames !) better.

As such this was completely inappropriate for R-devel and should
have gotten to the R help list  R-help.

With regards,
Martin Maechler, ETH Zurich

> Rui Barradas
>> 
>> 
>> Henric Winell
>>> 
>>> Regards, Ulrich
>>> 
>>>> 
>>>> 
>>>> Henric Winell
>>>> 
>>>> 
>>>> 
>>>>> aq <- transform(airquality,
>>>>>                 Month = factor(Month, labels = month.abb[5:9]))
>>>>> aq <- subset(aq, Month != "Jul")
>>>>> table(aq$Month)
>>>>>
>>>>> May Jun Jul Aug Sep
>>>>>  31  30   0  31  30
>>>>>
>>>>> table(droplevels(aq)$Month)
>>>>>
>>>>> May Jun Aug Sep
>>>>>  31  30  31  30
>>>>>
>>>>> table(droplevels(aq$Month))
>>>>>
>>>>> May Jun Aug Sep
>>>>>  31  30  31  30
>>>>> --- For the sake of learners, try to keep the examples
>>>>> simple and useful, even though you experts want to
>>>>> impress the newbies...
>>>>> 
>>>>> Ulrich
>>>>> 
>>>>> __
>>>>> R-devel@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug report: POSIX regular expression doesn't match for somewhat higher values of upper bound

2017-04-05 Thread Martin Maechler
>   
> on Tue, 4 Apr 2017 08:45:30 +0000 writes:

> Dear Sirs,
> while

>> regexpr('(.{1,2})\\1', 'foo')
> [1] 2
> attr(,"match.length")
> [1] 2
> attr(,"useBytes")
> [1] TRUE

> yields the correct match, an incremented upper bound in

>> regexpr('(.{1,3})\\1', 'foo')
> [1] -1
> attr(,"match.length")
> [1] -1
> attr(,"useBytes")
> [1] TRUE

> incorrectly yields no match.

Hmm, yes, I would also say that this is incorrect
(though I'm always cautious: The  ?regex  help page explicitly
 mentions greedy repetitions, and these can "bite you" ..)

The behavior is also different from the  perl=TRUE one which is
correct (according to the above understanding).

Using  grep() instead of regexpr() makes the behavior easier to parse.
The following code 
--

tx <- c("ab","abc", paste0("foo", c("", "b", "o", "bar", "oofy")))
setNames(nchar(tx), tx)
## ab abc foo foob fooo foobar foooofy 
##  2   3   3    4    4      6       7 

grep1r <- function(n, txt, ...) {
    pattern <- paste0('(.{1,', n, '})\\1', collapse="") ## can have empty n
    ans <- grep(pattern, txt, value=TRUE, ...)
    cat(sprintf("pattern '%s' : ", pattern)); print(ans, quote=FALSE)
    invisible(ans)
}

grep1r({}, tx) # '.{1,}' : because of _greedy_ matching there is __no__ repetition!
grep1r(100,tx) # i.e., these both give an empty match :  character(0)

## matching at most once:
grep1r(1, tx) # matches all 5 starting with "foo"
grep1r(2, tx) # ditto: all have more than 2 chars
grep1r(3, tx) # not "foo": those with more than 3 chars
grep1r(4, tx) # .. those with more than 4 characters
grep1r(5, tx) # .. those with more than 5 characters
grep1r(6, tx) # .. those with more than 6 characters
grep1r(7, tx) # NONE (= those with more than 7 characters)

for(p in c(FALSE,TRUE)) {
    cat("\ngrep(*, perl =", p, ") :\n")
    for(n in c(list(NULL), 1:7))
        grep1r(n, tx, perl = p)
}

--

ends with

> for(p in c(FALSE,TRUE)) {
+     cat("\ngrep(*, perl =", p, ") :\n")
+     for(n in c(list(NULL), 1:7))
+         grep1r(n, tx, perl = p)
+ }

grep(*, perl = FALSE ) :
pattern '(.{1,})\1' : character(0)
pattern '(.{1,1})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,2})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,3})\1' : [1] foob    fooo    foobar  foooofy
pattern '(.{1,4})\1' : [1] foobar  foooofy
pattern '(.{1,5})\1' : [1] foobar  foooofy
pattern '(.{1,6})\1' : [1] foooofy
pattern '(.{1,7})\1' : character(0)

grep(*, perl = TRUE ) :
pattern '(.{1,})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,1})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,2})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,3})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,4})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,5})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,6})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,7})\1' : [1] foo     foob    fooo    foobar  foooofy
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Very hard to reproduce bug (?) in R-devel

2017-04-05 Thread Martin Maechler
> Winston Chang 
> on Tue, 4 Apr 2017 15:29:40 -0500 writes:

> I've done some more investigation into the problem, and it is very
> difficult to pin down. What it looks like is happening is roughly like 
this:
> - `p` is an environment and `p$e` is also an environment.
> - There is a loop. In each iteration, it looks for one item in `p$e`, 
saves
> it in a variable `x`, then removes that item from `p$e`. Then it invokes
> `x()`. The loop runs again, until there are no more items in `p$e`.

> The problem is that `ls(p$e)` sometimes returns the wrong values -- it
> returns the values that it had in previous iterations of the loop. The
> behavior is very touchy. Almost any change to the code will slightly 
change
> the behavior; sometimes the `ls()` returns values from a different
> iteration of the loop, and sometimes the problem doesn't happen at all.

> I've put a  Dockerfile and instructions for reproducing the problem here:
> https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88

> I think that I've gotten about as far with this as I can, though I'd be
> happy to provide more information if anyone wants to take look at the
> problem.

Dear Winston,

While I agree this may very well be a bug in R(-devel), and hence
also R in 3.4.0 alpha and hence quite important to be dealt with,

your code still involves 3 non-trivial  packages (DBI, R6,
testthat) some of which have their own C code and notably load
a couple of other package's namespaces.
We've always made a point
  https://www.r-project.org/bugs.html
that bugs in R should be reproducible without extra
packages... and I think it would definitely help to pinpoint the
issue to be seen outside of your extra packages' world. 

Or have you been aware of that and are just asking for help
finding a bug in one of the extra packages involved, a bug that might only be 
triggered by recent changes in R ?

OTOH, what you describe above  (p ; p$e ; p$e$x ...)
should be reproducible in pure "base" R code, right?

I'm sorry not to be of more help
Martin

> -Winston

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Potential bug in utils::citation()

2017-04-04 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Mon, 3 Apr 2017 10:22:52 +0200 writes:

>>>>> Zhian Kamvar <zkam...@gmail.com>
>>>>> on Sun, 2 Apr 2017 16:26:37 -0500 writes:

>> Hi, I believe the function utils::citation() will fail if
>> the package specified has two or more citation entries in
>> the current R-devel. The following error is issued:

>> 'missing' can only be used for arguments

>> I have created a working example on github [0] that is
>> build using R-devel on travis-ci [1]. Jim Hester has
>> potentially identified [2] the source of the problem as
>> being from a commit on the 27th [3, 4]. I do not have
>> R-devel built on my machine, but I believe this error can
>> be reproduced on the current R-devel with:

>> if (require("boot") & require("utils"))
>     utils::citation("boot")

> Correct: it does reproduce the new bug 
> and that is due to a change by me, and I had started investigation
> on Friday (but not with your package and not having seen a
> straightforward example yet).

> This will be fixed ASAP, i.e., within hours.

In the end, it took two dozen hours. The change is
r72478 | maechler | 2017-04-04 11:41:51 +0200 

Martin


>> Background:

>> My package poppr suddenly started failing check on R-devel
>> during a weekly travis-ci job [5] due to the error
>> above. Another package of mine, ezec, passed [6]. Both
>> contain calls to utils::citation() within the vignettes,
>> but poppr has two citations and ezec only has one (called
>> from another package).

>> Thanks, Zhian

>> [0]: https://github.com/zkamvar/citest [1]:
>> https://travis-ci.org/zkamvar/citest/jobs/217874351 [2]:
>> 
https://github.com/wch/r-source/commit/7890e9e87d44f85ab76c0e786036a191eacd71d1
>> [3]: https://svn.r-project.org/R/trunk@72419 [4]:
>> 
https://github.com/wch/r-source/commit/7890e9e87d44f85ab76c0e786036a191eacd71d1
>> [5]:
>> https://travis-ci.org/grunwaldlab/poppr/jobs/216452458
>> [6]: https://travis-ci.org/grunwaldlab/ezec/jobs/216452916

>> -
>> Zhian N. Kamvar, Ph. D.  Postdoctoral Researcher (Everhart
>> Lab) Department of Plant Pathology University of
>> Nebraska-Lincoln

>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] complex NA's match(), etc: not back-compatible change proposal

2017-04-03 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Sat, 1 Apr 2017 14:10:06 +0000 writes:

> I am raising this again.

> With
> z <- complex(real = c(0,NaN,NaN), imaginary = c(NA,NA,0)) ,
> results of
> sapply(z, match, table = z)
> and
> match(z, z)
> are different in R 3.4.0 alpha. I think they should be the same.

> I suggest changing 'cequal' in unique.c such that a
> complex number that has both NA and NaN matches NA and
> doesn't match NaN, as such complex number is printed as NA.

Thank you very much, Suharto, for the reminder.

I have committed a change to R-devel yesterday, though
your suggestion above had not been 100% clear to me.

What I think we want and I decided to commit
  r72473 | maechler | 2017-04-02 22:23:56 +0200 (Sun, 02 Apr 2017)

was to entirely mimic how R format()s and prints() complex numbers:

1) If a complex number has a real or imaginary part which is NA then
   it is formatted / printed as "NA"
   ==> All such complex numbers should match()
   i.e. match(), unique(), duplicated() treat such complex
   numbers as "the same".

2) The picture is very different with (non-NA) NaN:
   There, R formats and prints  NaN+1i  or NaN+99i  or 0+1i*NaN
   differently, and [in R-devel only, planned in R 3.4.0 alpha
   in a day or two!]
   match(), unique(), duplicated() now treat them as different.

The change is more consistent and notably gives the same result

for   match(z,z)
and   sapply(z, match, table = z)  

for a variety of z (permutations).
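
For concreteness, a minimal sketch of the new consistency (assuming
the R >= 3.4.0 semantics described above; earlier versions differ):

  z <- complex(real = c(0, NaN, NaN), imaginary = c(NA, NA, 0))
  ## the two NA-printing numbers match each other; the pure-NaN one does not:
  match(z, z)
  sapply(z, match, table = z)  # now identical to match(z, z)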

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Potential bug in utils::citation()

2017-04-03 Thread Martin Maechler
>>>>> Zhian Kamvar <zkam...@gmail.com>
>>>>> on Sun, 2 Apr 2017 16:26:37 -0500 writes:

> Hi, I believe the function utils::citation() will fail if
> the package specified has two or more citation entries in
> the current R-devel. The following error is issued:

> 'missing' can only be used for arguments

> I have created a working example on github [0] that is
> built using R-devel on travis-ci [1]. Jim Hester has
> potentially identified [2] the source of the problem as
> being from a commit on the 27th [3, 4]. I do not have
> R-devel built on my machine, but I believe this error can
> be reproduced on the current R-devel with:

> if (require("boot") & require("utils"))
>utils::citation("boot")

Correct: it does reproduce the new bug 
and that is due to a change by me, and I had started investigation
on Friday (but not with your package and not having seen a
straightforward example yet).

This will be fixed ASAP, i.e., within hours.
Martin Maechler

> Background:

> My package poppr suddenly started failing check on R-devel
> during a weekly travis-ci job [5] due to the error
> above. Another package of mine, ezec, passed [6]. Both
> contain calls to utils::citation() within the vignettes,
> but poppr has two citations and ezec only has one (called
> from another package).

> Thanks, Zhian

> [0]: https://github.com/zkamvar/citest [1]:
> https://travis-ci.org/zkamvar/citest/jobs/217874351 [2]:
> 
https://github.com/wch/r-source/commit/7890e9e87d44f85ab76c0e786036a191eacd71d1
> [3]: https://svn.r-project.org/R/trunk@72419 [4]:
> 
https://github.com/wch/r-source/commit/7890e9e87d44f85ab76c0e786036a191eacd71d1
> [5]:
> https://travis-ci.org/grunwaldlab/poppr/jobs/216452458
> [6]: https://travis-ci.org/grunwaldlab/ezec/jobs/216452916

> -
> Zhian N. Kamvar, Ph. D.  Postdoctoral Researcher (Everhart
> Lab) Department of Plant Pathology University of
> Nebraska-Lincoln

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] `[` not recognized as a primitive in certain cases.

2017-03-29 Thread Martin Maechler
> Joris Meys 
> on Tue, 28 Mar 2017 15:19:14 +0200 writes:

> Thank you gents, I overlooked the subtle differences.

> On Tue, Mar 28, 2017 at 2:49 PM, Lukas Stadler 
> wrote:

>> “typeof” is your friend here:
>> 
>> > typeof(`[`)
>> [1] "special"
>> > typeof(mc[[1]])
>> [1] "symbol"
>> > typeof(mc2[[1]])
>> [1] "special"
>> 
>> so mc[[1]] is a symbol, and thus not a primitive.

or  str()  which should be better known to Joe Average useR

> mc <- call("[",iris,2,"Species")
> str(mc[[1]])
 symbol [
> str(`[`)
.Primitive("[") 
> 
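
and, as a minimal sketch of the distinction (the two calls evaluate the
same; only what is *stored* as the call's first element differs):

  mc  <- call("[", iris, 2, "Species")           # stores the *symbol* `[`
  mc2 <- as.call(list(`[`, iris, 2, "Species"))  # stores the primitive itself
  typeof(mc[[1]])                 # "symbol"
  typeof(mc2[[1]])                # "special"
  identical(eval(mc), eval(mc2))  # TRUE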


>> - Lukas
>> 
>> > On 28 Mar 2017, at 14:46, Michael Lawrence 
>> wrote:
>> >
>> > There is a difference between the symbol and the function (primitive
>> > or closure) to which it is bound.
>> >
>> > This:
>> > mc2 <- as.call(list(`[`,iris,2,"Species"))
>> >
>> > Evaluates `[` to its value, in this case the primitive object, and the
>> > primitive itself is incorporated into the returned call.
>> >
>> > If you were to do this:
>> > mc2 <- as.call(list(quote(`[`),iris,2,"Species"))
>> >
>> > The `[` would _not_ be evaluated, quote() would return the symbol, and
>> > the symbol would end up in the call.
>> >
>> > The two forms have virtually identical behavior as long as the call
>> > ends up getting evaluated in the same environment.
>> >
>> > On Tue, Mar 28, 2017 at 3:03 AM, Joris Meys  
wrote:
>> >> Dear,
>> >>
>> >> I have noticed this problem while looking at the following question on
>> >> Stackoverflow :
>> >>
>> >> http://stackoverflow.com/questions/42894213/s4-class-
>> subset-inheritance-with-additional-arguments
>> >>
>> >> While going through callNextMethod, I've noticed the following odd
>> >> behaviour:
>> >>
>> >> mc <- call("[",iris,2,"Species")
>> >>
>> >> mc[[1]]
>> >> ## `[`
>> >>
>> >> is.primitive(`[`)
>> >> ## [1] TRUE
>> >>
>> >> is.primitive(mc[[1]])
>> >> ## [1] FALSE
>> >> # Expected to be TRUE
>> >>
>> >> mc2 <- as.call(list(`[`,iris,2,"Species"))
>> >>
>> >> is.primitive(mc2[[1]])
>> >> ## [1] TRUE
>> >>
>> >> So depending on how I construct the call (using call() or as.call() ),
>> the
>> >> function `[` is or is not recognized as a primitive by is.primitive()
>> >>
>> >> The behaviour is counterintuitive and -unless I miss something obvious
>> >> here- likely to be a bug imho. I immediately admit that my C chops
>> aren't
>> >> sufficient to come up with a patch.
>> >>
>> >> Cheers
>> >> Joris
>> >>
>> >> --
>> >> Joris Meys
>> >> Statistical consultant
>> >>
>> >> Ghent University
>> >> Faculty of Bioscience Engineering
>> >> Department of Mathematical Modelling, Statistics and Bio-Informatics
>> >>
>> >> tel :  +32 (0)9 264 61 79
>> >> joris.m...@ugent.be
>> >> ---
>> >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>> >>
>> >>[[alternative HTML version deleted]]
>> >>
>> >> __
>> >> R-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>> >
>> > __
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> 


> -- 
> Joris Meys
> Statistical consultant

> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics

> tel :  +32 (0)9 264 61 79
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

> [[alternative HTML version deleted]]

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [R-pkg-devel] multiple bibentry()s in CITATION

2017-03-27 Thread Martin Maechler
>>>>> Fox, John <j...@mcmaster.ca>
>>>>> on Mon, 16 Jan 2017 15:44:05 + writes:

> Dear Martin,
> Thanks for addressing this question, if belatedly!

> After a little bit of thought, perhaps a default somewhere between 1 and
> Inf makes sense, along with an additional argument to citation:
> citation(package="pkg", bibtex.max=n), with default
> bibtex.max = getOption("citation.bibtex.max"), where the
> citation.bibtex.max option is initially set to something like 4. If the
> number of available citations exceeds bibtex.max, then a message like
> "there are additional BiBTeX citations, enter
> 'citation(package="pkg", bibtex.max=Inf)' to see all of them."

In the meantime, I have always used my proposed change.
I think any number between 1 and Inf is so arbitrary that,
in spite of your good thoughts, I kept the *new* default at Inf.

and because of this open question, I have forgotten to commit
the change to the development version of R !

I have done so now; however, I have not yet ported it to "R 3.4.0 alpha".
If not much surfaces (in CRAN / Bioc checks), we may port it in
time for 3.4.0.


Martin

> Best,
> John

>> -Original Message-
>> From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
>> Sent: Monday, January 16, 2017 10:02 AM
>> To: Fox, John <j...@mcmaster.ca>
>> Cc: r-package-devel@r-project.org
>> Subject: Re: [R-pkg-devel] multiple bibentry()s in CITATION
>> 
>> >>>>> Fox, John <j...@mcmaster.ca>
>> >>>>> on Fri, 2 Sep 2016 15:42:46 + writes:
>> 
>> (which is more than 4 months ago)
>> 
>> > Dear list members,
>> > I've noticed that citation(package="pkg") generates both a text
>> citation and a BiBTeX entry when the CITATION file contains a single
>> call to bibentry() or citEntry(), but that only text citations are shown
>> if there are multiple calls to bibentry() or citEntry().
>> 
>> > Is this behaviour intentional? In my opinion, it's useful always
>> to show the BiBTeX (although it's available through
>> toBibtex(citation(package="pkg")) ).
>> 
>> > The Writing R Extensions manual says, "A CITATION file will
>> contain *calls* [my emphasis] to function bibentry."
>> 
>> > Thanks,
>> > John
>> 
>> and you did not get a reply
>> I had wanted but forgotten about it ... two parts :
>> 
>> 1)  On November 24, 2012,  I had improved R with an option to get this
>> so this has been a "hidden gem" ;-) for a while in R:
>> 
>> > options(citation.bibtex.max = Inf)
>> > citation(package = "Rcmdr")
>> 
>> To cite the 'Rcmdr' package in publications use:
>> 
>> Fox, J., and Bouchet-Valat, M. (2017). Rcmdr: R Commander. R package
>> version 2.3-2.
>> 
>> A BibTeX entry for LaTeX users is
>> 
>> @Manual{,
>> title = {{Rcmdr: R Commander}},
>> author = {John Fox and Milan Bouchet-Valat},
>> year = {2017},
>> note = {R package version 2.3-2},
>> url = {http://socserv.socsci.mcmaster.ca/jfox/Misc/Rcmdr/},
>> }
>> 
>> Fox, J. (2017). Using the R Commander: A Point-and-Click Interface for
>> R. Boca Raton FL:
>> Chapman and Hall/CRC Press.
>> 
>> A BibTeX entry for LaTeX users is
>> 
>> @Book{,
>> title = {Using the {R Commander}: A Point-and-Click Interface for
>> {R}},
>> author = {John Fox},
>> year = {2017},
>> publisher = {Chapman and Hall/CRC Press},
>> address = {Boca Raton {FL}},
>> url = {http://socserv.mcmaster.ca/jfox/Books/RCommander/},
>> }
>> 
>> Fox, J. (2005). The R Commander: A Basic Statistics Graphical User
>> Interface to R.
>> Journal of Statistical Software, 14(9): 1--42.
>> 
>> A BibTeX entry for LaTeX users is
>> 
>> @Article{,
>> title = {The {R} {C}ommander: A Basic Statistics Graphical User
>> Interface to {R}},
>> author = {John Fox},
>> year = {2005},
>> journal = {Journal of Statistical Software},
>> volume = {14},
>> number = {9},
>> pages = {1--42},
>> url = {http://www.jstatsoft.org/v14/i09},
>> }
>> 
>> 
 

Re: [Rd] Documentation of model.frame() and get_all_vars()

2017-03-27 Thread Martin Maechler
> ... variable lengths differ (found for '(new)')

> But, maybe that's something for the "Details" section? (Or it's a bug
> - I don't really know.)

I would not want to change model.frame.default() currently as it's
too important a building block and it may be wise to require
that its callers should have done recycling.
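
A tiny sketch of the caller-side recycling meant here (data made up for
illustration; model.frame.default() itself keeps requiring equal lengths):

  d   <- data.frame(y = 1:4, x = 4:1)
  new <- 1                    # a length-1 variable would trigger the error
  d$new <- rep(new, nrow(d))  # so recycle *before* calling model.frame()
  model.frame(y ~ x + new, data = d)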

> Thanks in advance for your consideration.

Thank you Thomas for the suggested help file improvements!
Martin 

--
Martin Maechler
ETH Zurich

> Best,
> -Thomas

> Thomas J. Leeper
> http://www.thomasleeper.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error in documentation for ?legend

2017-03-27 Thread Martin Maechler
>>>>> POLITZER-AHLES, Stephen [CBS] <stephen.politzerah...@polyu.edu.hk>
>>>>> on Sat, 25 Mar 2017 13:25:32 + writes:

> Right, that's my point. The help page mentions a
> `title.cex`, like I said; saying that `cex` sets the
> default `title.cex` sure implies to me (and presumably to
> the other people whose discussion I linked) that a
> `title.cex` parameter exists. Since no such parameter
> exists, this bit in the documentation is misleading
> (suggesting that there is a `title.cex` parameter which
> can be set, when there really isn't). Regardless of
> whether we call it an "oddity" or what, I don't think it's
> controversial that this is misleading. If it's misleading,
> shouldn't it be removed?

Yes.
I've done so now,  thank you for the report!

(You did not understand Peter:  He *did* agree with you that
 there's no 'title.cex' argument  and explained why the oddity
 probably has happened in the distant past ..)

Martin Maechler
ETH Zurich
and R Core Team (as Peter Dalgaard)

> From: peter dalgaard <pda...@gmail.com>
> Sent: Saturday, March 25, 2017 9:10:57 PM
> To: POLITZER-AHLES, Stephen [CBS]
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Error in documentation for ?legend


>> On 25 Mar 2017, at 00:39 , POLITZER-AHLES, Stephen [CBS] 
<stephen.politzerah...@polyu.edu.hk> wrote:
>> 
>> To whom it may concern:
>> 
>> 
>> The help page for ?legend refers to a `title.cex` parameter, which 
suggests that the function has such a parameter.

> No it does not. All arguments are listed and documented, none of them is 
title.cex, and there's no "...".

> However, the documentation for "cex" has this oddity inside:

> cex: character expansion factor *relative* to current
> 'par("cex")'.  Used for text, and provides the default for
> 'pt.cex' and 'title.cex'.

> Checking the sources suggests that this is the last anyone has seen of 
title.cex:

> pd$ grep -r title.cex src
> src/library/graphics/man/legend.Rd:    \code{pt.cex} and \code{title.cex}.}
> pd$

> The text was inserted as part of the addition of the title.col (!) 
argument, so it looks like the author got some wires crossed.

> -pd

>> As far as I can tell, though, it doesn't; here's an example:
>> 
>> 
>>> plot(1,1)
>>> legend("topright",pch=1, legend="something", title="my legend", 
title.cex=2)
>> Error in legend("topright", pch = 1, legend = "something", title = "my 
legend",  :
>> unused argument (title.cex = 2)
>> 
>> 
>> This issue appears to have been discussed online before (e.g. here's a 
post from 2011 mentioning it: 
http://r.789695.n4.nabble.com/Change-the-text-size-of-the-title-in-a-legend-of-a-R-plot-td3482880.html)
 but I'm not sure if anyone ever reported it to R developers.
>> 
>> 
>> Is it possible for someone to update the ?legend documentation page so 
that it doesn't refer to a parameter that isn't usable?
>> 
>> Best,
>> 
>> Steve Politzer-Ahles
>> 
>> ---
>> Stephen Politzer-Ahles
>> The Hong Kong Polytechnic University
>> Department of Chinese and Bilingual Studies
>> http://www.mypolyuweb.hk/~sjpolit/<http://www.mypolyuweb.hk/%7Esjpolit/>
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] A question on stats::as.hclust.dendrogram

2017-03-24 Thread Martin Maechler
>>>>> Ma,Man Chun John <m...@mdanderson.org>
>>>>> on Thu, 23 Mar 2017 19:29:25 + writes:

> Hi all,
> This is the first time I'm writing to R-devel, and this time I'm just
> asking about the purpose of a certain line of code in
> stats::as.hclust.dendrogram, which comes up as I'm trying to fix dendextend.

"fix": where is it broken?
Do you mean the fact that in R <= 3.3.3, it is defined via
recursion and hence infeasible for "deep" dendrograms?

In any case, note that  NEWS  for the upcoming version of R,
R 3.4.0  contains 

• The str() and as.hclust() methods for "dendrogram" now also work
  for deeply nested dendrograms thanks to non-recursive
  implementations by Bradley Broom.

so the source code of  as.hclust.dendrogram  has been changed
substantially already.

Note that you **NEVER** see the "real" source code of a function
by printing it to the console.
The source code is in the source of the corresponding package,
in the case of 'stats', as part of the source code of R.

I.e., here,
 https://svn.r-project.org/R/trunk/src/library/stats/R/dendrogram.R


I think the following question has become irrelevant now,
but yes, dendrograms *are* implemented as nested lists.
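
A small illustration:

  d <- as.dendrogram(hclust(dist(c(1, 2, 10))))
  str(d)  # nested lists: one child is itself a list of two leaves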

Martin Maechler
ETH Zurich and R core team


> The line in question is at line 128 of dendrogram.R in R-3.3.3, at
> stats::as.hclust.dendrogram:

> stopifnot(length(s) == 2L, all( vapply(s, is.integer, NA) ))

> Is there any legitimate possibility that s is a nested list? Currently I
> have a case where a dendrogram object breaks at this line, because s is a
> nested list:

>> str (s)
> List of 2
> $ : int -779
> $ :List of 2
> ..$ : int -625
> ..$ : int 15

> I'm unsure if my dendrogram was malformed in the first place, since I was 
trying to use dendrapply.

> So, my question is: for that particular check, why use

> stopifnot(length(s) == 2L, all( vapply(s, is.integer, NA) ))

> instead of

> stopifnot(length(s) == 2L, all( vapply(unlist(s), is.integer, NA) ))?

> I appreciate your time and I'm looking forward to your response.

> Cheers,

> Man Chun John Ma, PhD
> Postdoctoral Fellow
> Unit 0903
> Dept Lymphoma & Myeloma Research
> 1515 Holcombe Blvd.
> Houston, TX 77030
> m...@mdanderson.org

> The information contained in this e-mail message may be ...{{dropped:14}}

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Bioc-devel] The story of tracing a derfinder bug on OSX that sometimes popped up, sometimes it didn't. Related to IRanges/S4Vectors '$<-'

2017-03-22 Thread Martin Maechler
> Andrzej Oleś 
> on Wed, 22 Mar 2017 10:29:57 +0100 writes:

> Just for the record, on R-3.3.2 Herve's code fails with the following 
error:
> Error in x[TRUE] <- new("A") :
> incompatible types (from S4 to logical) in subassignment type fix

yes, (of course) and I would be interested in a small
reproducible example which uses _valid_ code.
We have seen such examples with something (more complicated
than, but basically like)

  df <- data.frame(x=1:5, y=5:1, m=matrix(-pi*1:30, 5,6))
  M <- Matrix::Matrix(exp(0:3),2)
  df[1:2,1:2] <- M

which actually calls `[<-`, and then `[<-.data.frame`, and
always works for me but does segfault in the CRAN checks of
package FastImputation (on 3 of the dozen platforms,
https://cran.r-project.org/web/checks/check_results_FastImputation.html

one of them is

 
https://www.r-project.org/nosvn/R.check/r-devel-macos-x86_64-clang/FastImputation-00check.html

I strongly suspect this is the same bug as yours, but for a case
where the correct behavior is *not* giving an error.

I have also written and shown  Herve's example  to the R-core team.

Unfortunately, I have no platform where I can trigger the bug.
Martin



> Cheers,
> Andrzej



> On Wed, Mar 22, 2017 at 1:28 AM, Martin Morgan <
> martin.mor...@roswellpark.org> wrote:

>> On 03/21/2017 08:21 PM, Hervé Pagès wrote:
>> 
>>> Hi Leonardo,
>>> 
>>> Thanks for hunting down and isolating that bug! I tried to simplify
>>> your code even more and was able to get a segfault with just:
>>> 
>>> setClass("A", representation(stuff="numeric"))
>>> x <- logical(10)
>>> x[TRUE] <- new("A")
>>> 
>>> I get the segfault about 50% of the time on a fresh R session on Mac.
>>> I tried this with R 3.3.3 on Mavericks, and with R devel (r72372)
>>> on El Capitan. I get the segfault on both.
>>> 
>>> So it looks like a bug in the `[<-` primitive to me (subassignment).
>>> 
>> 
>> Any insight from
>> 
>> R -d valgrind -f herve.R
>> 
>> where herve.R contains the code above?
>> 
>> Martin
>> 
>> 
>> 
>>> Cheers,
>>> H.
>>> 
>>> On 03/21/2017 03:06 PM, Leonardo Collado Torres wrote:
>>> 
 Hi bioc-devel,
 
 This is a story about a bug that took me a long time to trace. The
 behaviour was really weird, so I'm sharing the story in case this
 helps others in the future. I was originally writing it to request
 help, but then I was able to find the issue ^^. The story ends right
 now with code that will reproduce the problem with '$<-' from
 IRanges/S4Vectors.
 
 
 
 
 During this Bioc cycle, frequently my package derfinder has failed to
 pass R CMD check in OSX. The error is always the same when it appears
 and sometimes it shows up in release, but not devel and vice versa.
 Right now (3/21/2017) it's visible in both
 http://bioconductor.org/checkResults/release/bioc-LATEST/derfinder/morelia-checksrc.html
 and
 http://bioconductor.org/checkResults/devel/bioc-LATEST/derfinder/toluca2-checksrc.html .
 The end of "test-all.Rout.fail" looks like this:
 
 Loading required package: foreach
 Loading required package: iterators
 Loading required package: locfit
 locfit 1.5-9.1 2013-03-22
 getSegments: segmenting
 getSegments: splitting
 2017-03-20 02:36:52 findRegions: smoothing
 2017-03-20 02:36:52 findRegions: identifying potential segments
 2017-03-20 02:36:52 findRegions: segmenting information
 2017-03-20 02:36:52 .getSegmentsRle: segmenting with cutoff(s)
 16.3681899295041
 2017-03-20 02:36:52 findRegions: identifying candidate regions
 2017-03-20 02:36:52 findRegions: identifying region clusters
 2017-03-20 02:36:52 findRegions: smoothing
 2017-03-20 02:36:52 findRegions: identifying potential segments
 2017-03-20 02:36:52 findRegions: segmenting information
 2017-03-20 02:36:52 .getSegmentsRle: segmenting with cutoff(s)
 19.7936614060235
 2017-03-20 02:36:52 findRegions: identifying candidate regions
 2017-03-20 02:36:52 findRegions: identifying region clusters
 2017-03-20 02:36:52 

Re: [Rd] IO error when writing to disk

2017-03-22 Thread Martin Maechler
>>>>> realitix  <reali...@gmail.com>
>>>>> on Wed, 22 Mar 2017 10:17:54 +0100 writes:

> Hello,
> I have sent a mail but I got no answer.

All work here happens on a volunteer basis... and it seems
everybody was busy or not interested.

> Can you create a bugzilla account for me.

I've done that now.

Note that your proposed patch did contain a bit too many "copy &
paste" repetitions...  which I personally would have liked to be
written differently, using a wrapper (function or macro).

Also, assuming Linux, would there be a way to create a
small, say 1 MB, temporary file system as a non-root user?
In that case, we could do all the testing from inside R ..

Best,
Martin Maechler

> Thanks,
> Jean-Sébastien Bevilacqua

> 2017-03-20 10:24 GMT+01:00 realitix <reali...@gmail.com>:

>> Hello,
>> Here a small improvement for R.
>> 
>> When you use the function write.table, if the disk is full for example,
>> the function doesn't return an error and the file is written but 
truncated.
>> 
>> It can be a source of mistakes because you can then copy the output file
>> and think everything is ok.
>> 
>> How to reproduce
>> -
>> 
>> >> write.csv(1:1000, 'path')
>> 
>> You must have a path with a small amount of disk available (on linux:
>> http://souptonuts.sourceforge.net/quota_tutorial.html)
>> 
>> I have joined the patch in this email.
>> Can you open a bugzilla account for me to keep track of this change.
>> 
>> Thanks,
>> Jean-Sébastien Bevilacqua
>> 

> [[alternative HTML version deleted]]

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Hyperbolic tangent different results on Windows and Mac

2017-03-21 Thread Martin Maechler
>>>>> Rodrigo Zepeda <rzeped...@gmail.com>
>>>>> on Fri, 17 Mar 2017 12:56:06 -0600 writes:

> Dear all,
> We seem to have found a "strange" behaviour in the hyperbolic tangent
> function tanh on Windows.
> When running tanh(356 + 0i) the Windows result is NaN+0i while on Mac
> the result is 1 + 0i. It doesn't seem to be a floating point error because
> on Mac it is possible to run arbitrarily large numbers (say tanh(
> 
99677873648767519238192348124812341234182374817239847812738481234871823+0i)
> ) and still get 1 + 0i as result. This seems to be related to the 
imaginary
> part as tanh(356) returns 1 in both Windows and Mac.

> We have obtained those results in:
> 1) Mac with El Capitan v 10.11.6 *processor: 2.7 GHz Intel Core i5*
> - 2) Mac with Sierra v 10.12.3 *processor: 3.2 GHz Intel Core i5*
> - 3) Windows 10 Home v 1607 *processor: Intel Core m3-SY30 CPU@ 0.90 GHz
> 1.51 GHz*
> - 4) Windows 7 Home Premium Service Pack 1 *processor: Intel Core i5-2410M
> CPU @2.30 GHz 2.30GHz.*

(The hardware should not matter).

Yes, there is a bug here on Windows only, (several Linux
versions work correctly too).

> ​In all cases we are using R version 3.3.3 (64 bits)​


> - *Does anybody have a clue on why is this happening?*

> ​PS: We have previously posted this issue in Stack Overflow (
> 
http://stackoverflow.com/questions/42847414/hyperbolic-tangent-in-r-throws-nan-in-windows-but-not-in-mac).
> A comment suggests it is related to a glibc bug.

Yes, that would have been my guess too... as indeed, R on
Windows which should work for quite old versions of Windows has
been using a relatively old (gcc / libc) toolchain.

The upcoming version of R 3.4.0 uses a considerably newer
toolchain *BUT* I've just checked the latest "R-devel" binary
and the bug is still present there.

Here's a slight extension of the answer I wrote to the
above SO question here:  http://stackoverflow.com/a/42923289/161921

... Windows uses somewhat old C libraries, and here it is the
"mathlib" part of glibc. 

More specifically, according to the CRAN download page for R-devel for Windows
https://cran.r-project.org/bin/windows/base/rdevel.html ,
the R 3.3.z series uses the gcc 4.6.3 (March 2012) toolchain, whereas
"R-devel", the upcoming (not yet released!) R 3.4.z series uses
the gcc 4.9.3 (June 2015) toolchain.

According to Ben Bolker's comment on SO, the bug in glibc should have
been fixed in 2012 -- and so the change from 4.6.3 to 4.9.3
should have helped,
*however*, I've just checked (installed the R-devel binary from CRAN on our 
Windows server virtual machine) and I see that the problem is still present 
there: In yesterday's version of R-devel, tanh(500+0i) still returns NaN+0i.

I now think a better solution would be to use R's internal
substitute (in R's src/main/complex.c): There, we have

#ifndef HAVE_CTANH
#define ctanh R_ctanh
static double complex ctanh(double complex z)
{
return -I * ctan(z * I); /* A 4.5.9 */
}
#endif

and we should use it (via "#undef HAVE_CTANH", or better via a
configure check of ctanh()), as I see that on Windows,
   R> -1i * tan((500+0i)*1i)
gives
   [1] 1+0i
as it should for tanh(500+0i) --- whereas tanh(500+0i) itself does not.
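
Until that is in place, a user-level workaround sketch via the same
identity (tanh(z) = -i * tan(i*z)):

  tanh2 <- function(z) -1i * tan(z * 1i)
  tanh2(356 + 0i)  # 1+0i, also on Windows builds where tanh() gives NaN+0i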

Martin Maechler
ETH Zurich and R Core

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] outer not applying a constant function

2017-03-21 Thread Martin Maechler
>>>>> William Dunlap <wdun...@tibco.com>
>>>>> on Mon, 20 Mar 2017 10:20:11 -0700 writes:

>> Or is this a bad idea?
> I don't like the proposal.  I have seen code like the following (in
> fact, I have written such code, where I had forgotten a function was
> not vectorized) where the error would have been discovered much later
> if outer() didn't catch it.

>> outer(1:3, 11:13, sum)
>  Error in outer(1:3, 11:13, sum) :
>    dims [product 9] do not match the length of object [1]

> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com

You are right, thank you!
Such a "convenience change" would not be a good idea.

Martin Maechler
ETH Zurich




> On Mon, Mar 20, 2017 at 6:36 AM, Martin Maechler
> <maech...@stat.math.ethz.ch> wrote:
>>>>>>> Gebhardt, Albrecht <albrecht.gebha...@aau.at>
>>>>>>> on Sun, 19 Mar 2017 09:14:56 + writes:
>> 
>> > Hi,
>> > the function outer can not apply a constant function as in the last 
line of the following example:
>> 
>> >> xg <- 1:4
>> >> yg <- 1:4
>> >> fxyg <- outer(xg, yg, function(x,y) x*y)
>> >> fconstg <- outer(xg, yg, function(x,y) 1.0)
>> > Error in outer(xg, yg, function(x, y) 1) :
>> > dims [product 16] do not match the length of object [1]
>> 
>> > Of course there are simpler ways to construct a constant matrix, that 
is not my point.
>> 
>> > It happens for me in the context of generating matrices of partial 
derivatives, and if one of these partial derivatives happens to be constant it 
fails.
>> 
>> > So e.g this works:
>> 
>> > library(Deriv)
>> > f <- function(x,y) (x-1.5)*(y-1)*(x-1.8)+(y-1.9)^2*(x-1.1)^3
>> > fx <- Deriv(f,"x")
>> > fy <- Deriv(f,"y")
>> > fxy <- Deriv(Deriv(f,"y"),"x")
>> > fxx <- Deriv(Deriv(f,"x"),"x")
>> > fyy <- Deriv(Deriv(f,"y"),"y")
>> 
>> > fg   <- outer(xg,yg,f)
>> > fxg  <- outer(xg,yg,fx)
>> > fyg  <- outer(xg,yg,fy)
>> > fxyg <- outer(xg,yg,fxy)
>> > fxxg <- outer(xg,yg,fxx)
>> > fyyg <- outer(xg,yg,fyy)
>> 
>> > And with
>> 
>> > f <- function(x,y) x+y
>> 
>> > it stops working. Of course I can manually fix this for that special 
case, but that's not my point. I simply thought "outer" should be able to handle 
constant functions.
>> 
>> ?outer   clearly states that  FUN  needs to be vectorized
>> 
>> but  function(x,y) 1  is not.
>> 
>> It is easy to solve by wrapping the function in Vectorize(.):
>> 
>>> x <- 1:3; y <- 1:4
>> 
>>> outer(x,y, function(x,y) 1)
>> Error in dim(robj) <- c(dX, dY) :
>> dims [product 12] do not match the length of object [1]
>> 
>>> outer(x,y, Vectorize(function(x,y) 1))
>>      [,1] [,2] [,3] [,4]
>> [1,]    1    1    1    1
>> [2,]    1    1    1    1
>> [3,]    1    1    1    1
>> 
>> 
>> 
>> So, your "should"  above must be read in the sense
>> 
>> "It really would be convenient here and
>> correspond to other "recycling" behavior of R"
>> 
>> and I agree with that, having experienced the same inconvenience
>> as you several times in the past.
>> 
>> outer() being a nice R-level function (i.e., no C speed up)
>> makes it easy to improve:
>> 
>> Adding something like the line
>> 
>> if(length(robj) == 1L) robj <- rep.int(robj, dX*dY)
>> 
>> before    dim(robj) <- c(dX, dY)   [which gave the error]
>> 
>> would solve the issue and not cost much (in the cases it is unneeded).
>> 
>> Or is this a bad idea?
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] outer not applying a constant function

2017-03-20 Thread Martin Maechler
> Gebhardt, Albrecht 
> on Sun, 19 Mar 2017 09:14:56 + writes:

> Hi,
> the function outer can not apply a constant function as in the last line 
of the following example:

>> xg <- 1:4
>> yg <- 1:4
>> fxyg <- outer(xg, yg, function(x,y) x*y)
>> fconstg <- outer(xg, yg, function(x,y) 1.0)
> Error in outer(xg, yg, function(x, y) 1) :
> dims [product 16] do not match the length of object [1]

> Of course there are simpler ways to construct a constant matrix, that is 
not my point.

> It happens for me in the context of generating matrices of partial 
derivatives, and if one of these partial derivatives happens to be constant it 
fails.

> So e.g this works:

> library(Deriv)
> f <- function(x,y) (x-1.5)*(y-1)*(x-1.8)+(y-1.9)^2*(x-1.1)^3
> fx <- Deriv(f,"x")
> fy <- Deriv(f,"y")
> fxy <- Deriv(Deriv(f,"y"),"x")
> fxx <- Deriv(Deriv(f,"x"),"x")
> fyy <- Deriv(Deriv(f,"y"),"y")

> fg   <- outer(xg,yg,f)
> fxg  <- outer(xg,yg,fx)
> fyg  <- outer(xg,yg,fy)
> fxyg <- outer(xg,yg,fxy)
> fxxg <- outer(xg,yg,fxx)
> fyyg <- outer(xg,yg,fyy)

> And with

> f <- function(x,y) x+y

> it stops working. Of course I can manually fix this for that special 
case, but that's not my point. I simply thought "outer" should be able to handle 
constant functions.

?outer   clearly states that  FUN  needs to be vectorized

but  function(x,y) 1  is not.

It is easy to solve by wrapping the function in Vectorize(.):

> x <- 1:3; y <- 1:4

> outer(x,y, function(x,y) 1)
Error in dim(robj) <- c(dX, dY) : 
  dims [product 12] do not match the length of object [1]

> outer(x,y, Vectorize(function(x,y) 1))
     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    1    1    1    1
[3,]    1    1    1    1



So, your "should"  above must be read in the sense

  "It really would be convenient here and
   correspond to other "recycling" behavior of R"

and I agree with that, having experienced the same inconvenience
as you several times in the past.

outer() being a nice R-level function (i.e., no C speed up)
makes it easy to improve:

Adding something like the line

if(length(robj) == 1L) robj <- rep.int(robj, dX*dY)

before    dim(robj) <- c(dX, dY)   [which gave the error]

would solve the issue and not cost much (in the cases it is unneeded).
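
As a user-level sketch of the same idea (the helper name 'outer1' is
just for illustration):

  outer1 <- function(X, Y, FUN, ...) {
    FUN <- match.fun(FUN)
    ## replicate as outer() does: X varies fastest, Y slowest
    robj <- FUN(rep(X, times = length(Y)), rep(Y, each = length(X)), ...)
    if(length(robj) == 1L)  # the constant-function case
      robj <- rep.int(robj, length(X) * length(Y))
    dim(robj) <- c(length(X), length(Y))
    robj
  }
  outer1(1:3, 1:4, function(x, y) 1)  # a 3 x 4 matrix of ones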

Or is this a bad idea?

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Support for user defined unary functions

2017-03-16 Thread Martin Maechler
> Jim Hester 
> on Thu, 16 Mar 2017 12:31:56 -0400 writes:

> Gabe,
> The unary functions have the same precedence as normal SPECIALS
> (although the new unary forms take precedence over binary SPECIALS).
> So they are lower precedence than unary + and -. Yes, both of your
> examples are valid with this patch, here are the results and quoted
> forms to see the precedence.

> `%chr%` <- function(x) as.character(x)

  [more efficient would be `%chr%` <- as.character]

> `%identical%` <- function(x, y) identical(x, y)
> quote("100" %identical% %chr% 100)
> #>  "100" %identical% (`%chr%`(100))

> "100" %identical% %chr% 100
> #> [1] TRUE

> `%num%` <- as.numeric
> quote(1 + - %num% "5")
> #> 1 + -(`%num%`("5"))

> 1 + - %num% "5"
> #> [1] -4

> Jim

I'm sorry to be a bit of a spoiler to "coolness", but
you may know that I like to  applaud Norm Matloff for his book
title "The Art of R Programming",
because for me good code should also be beautiful to some extent.

I really very much prefer

   f(x)
to   %f% x

and hence I really really really cannot see why anybody would prefer
the ugliness of

   1 + - %num% "5"
to
   1 + -num("5")

(after setting  num <- as.numeric )
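
i.e., the status quo already allows

  num <- as.numeric
  1 + -num("5")
  ## [1] -4

without any change to the grammar.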

Martin


> On Thu, Mar 16, 2017 at 12:01 PM, Gabriel Becker  
wrote:
>> Jim,
>> 
>> This seems cool. Thanks for proposing it. To be concrete, he user-defined
>> unary operations would be of the same precedence (or just slightly 
below?)
>> built-in unary ones? So
>> 
>> "100" %identical% %chr% 100
>> 
>> would work and return TRUE under your patch?
>> 
>> And  with %num% <- as.numeric, then
>> 
>> 1 + - %num% "5"
>> 
>> would also be legal (though quite ugly imo) and work?
>> 
>> Best,
>> ~G
>> 
>> On Thu, Mar 16, 2017 at 7:24 AM, Jim Hester 
>> wrote:
>>> 
>>> R has long supported user defined binary (infix) functions, defined
>>> with `%fun%`. A one line change [1] to R's grammar allows users to
>>> define unary (prefix) functions in the same manner.
>>> 
>>> `%chr%` <- function(x) as.character(x)
>>> `%identical%` <- function(x, y) identical(x, y)
>>> 
>>> %chr% 100
>>> #> [1] "100"
>>> 
>>> %chr% 100 %identical% "100"
>>> #> [1] TRUE
>>> 
>>> This seems a natural extension of the existing functionality and
>>> requires only a minor change to the grammar. If this change seems
>>> acceptable I am happy to provide a complete patch with suitable tests
>>> and documentation.
>>> 
>>> [1]:
>>> Index: src/main/gram.y
>>> ===================================================================
>>> --- src/main/gram.y (revision 72358)
>>> +++ src/main/gram.y (working copy)
>>> @@ -357,6 +357,7 @@
>>>     |   '+' expr %prec UMINUS   { $$ = xxunary($1,$2); setId( $$, @$); }
>>>     |   '!' expr %prec UNOT     { $$ = xxunary($1,$2); setId( $$, @$); }
>>>     |   '~' expr %prec TILDE    { $$ = xxunary($1,$2); setId( $$, @$); }
>>> +   |   SPECIAL expr            { $$ = xxunary($1,$2); setId( $$, @$); }
>>>     |   '?' expr                { $$ = xxunary($1,$2); setId( $$, @$); }
>>>
>>>     |   expr ':'  expr          { $$ = xxbinary($2,$1,$3); setId( $$, @$); }
>>> 
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> 
>> 
>> 
>> --
>> Gabriel Becker, PhD
>> Associate Scientist (Bioinformatics)
>> Genentech Research

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Extending an S3 method, but putting the package in Suggests?

2017-03-14 Thread Martin Maechler
>>>>> David Hugh-Jones <davidhughjo...@gmail.com>
>>>>> on Tue, 14 Mar 2017 09:26:49 + writes:

> Just out of interest, what would happen if I used the hacky solution of
> simply  exporting my own method like:

> as.FlexTable <- function(x, ...) UseMethod("as.FlexTable")

> I am fairly sure you will tell me that fire and brimstone will rain down…
> But it sure seems simple  compared to writing another package and getting
> it on CRAN...

no fire etc, but pretty close ;-)

You will have two (internal) methods tables for as.FlexTable,
one in ReporteRs, one in your package, and from a user point of
view there's a deep abyss in functionality between working with

  require(ReporteRs); require(huxtable)

and working with

  require(huxtable); require(ReporteRs)

This is undesirable and error prone and can be resolved by
correct imports as those mentioned.

Martin

> David

> On Tue, 14 Mar 2017 at 09:06, Martin Maechler <maech...@stat.math.ethz.ch>
> wrote:

>> >>>>> David Hugh-Jones <davidhughjo...@gmail.com>
>> >>>>> on Tue, 14 Mar 2017 02:46:35 + writes:
>> 
>> > Hi,
>> > Cross-posted from SO:
>> >
>> 
http://stackoverflow.com/questions/42776058/extending-an-s3-generic-from-an-optional-package
>> 
>> ((sent my answer there as well))
>> 
>> > I have a package which provides an as.FlexTable method for its
>> objects,
>> > extending the S3 generic from the ReporteRs package. So, my
>> NAMESPACE file,
>> > generated by roxygen, has lines:
>> 
>> > importFrom(ReporteRs,as.FlexTable)
>> > ...
>> > S3method(as.FlexTable,huxtable)
>> > ...
>> > export(as.FlexTable)
>> 
>> > I don't much want to put ReporteRs in Imports: in the DESCRIPTION
>> file,
>> > because it involves a big external dependency on Java. But, when I
>> put it
>> > into Suggests:, R CMD check gives me errors like "Namespace
>> dependency not
>> > required".
>> 
>> > Is there anyway I can extend the generic without making a hard
>> dependency?
>> 
>> No.  Importing is a hard dependency..
>> Some people do not import formally but use  '::'
>> instead, *and* conditionalize their code on the availability of
>> that namespace.
>> I don't recommend that at all, and particularly not for
>> extending a generic.
>> 
>> I recommend you talk with the maintainer of 'ReporteRs':
>> 1) You could use a common (yet-to-create) very small package say
>> 'flexS3generics'
>> which provides S3 generics (and S4 if ..) you want to use
>> both, and then both you and her/him import from that mini package.
>> You'd be both authors of that package.
>> 
>> 2) If your package is much smaller (in its footprint, incl
>> dependencies) than 'ReporteRs' she/he may agree to import the
>> S3 generic from your package instead of the other way around.
>> 
>> Both are clean solutions,
>> and both need some time-coordination when releasing to CRAN,
>> '1)' being easier: Once the 'flexS3generics' is released to
>> CRAN, change (both) your package(s) to
>> importFrom(flexS3generics,*) but these changes and CRAN
>> submissions are then independent of each other.
>> 
>> 
>> > Cheers,
>> > David
>> 
>> --
> Sent from Gmail Mobile

> [[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Re: [R-pkg-devel] Extending an S3 method, but putting the package in Suggests?

2017-03-14 Thread Martin Maechler
> David Hugh-Jones 
> on Tue, 14 Mar 2017 02:46:35 + writes:

> Hi,
> Cross-posted from SO:
> 
http://stackoverflow.com/questions/42776058/extending-an-s3-generic-from-an-optional-package

  ((sent my answer there as well))

> I have a package which provides an as.FlexTable method for its objects,
> extending the S3 generic from the ReporteRs package. So, my NAMESPACE 
file,
> generated by roxygen, has lines:

> importFrom(ReporteRs,as.FlexTable)
> ...
> S3method(as.FlexTable,huxtable)
> ...
> export(as.FlexTable)

> I don't much want to put ReporteRs in Imports: in the DESCRIPTION file,
> because it involves a big external dependency on Java. But, when I put it
> into Suggests:, R CMD check gives me errors like "Namespace dependency not
> required".

> Is there anyway I can extend the generic without making a hard dependency?

No.  Importing is a hard dependency..
Some people do not import formally but use  '::'
instead, *and* conditionalize their code on the availability of
that namespace.
I don't recommend that at all, and particularly not for
extending a generic.

I recommend you talk with the maintainer of 'ReporteRs':
1) You could use a common (yet-to-create) very small package say 
'flexS3generics'
   which provides S3 generics (and S4 if ..) you want to use
   both, and then both you and her/him import from that mini package.
   You'd be both authors of that package.

2) If your package is much smaller (in its footprint, incl
   dependencies) than 'ReporteRs' she/he may agree to import the
   S3 generic from your package instead of the other way around.

Both are clean solutions,
and both need some time-coordination when releasing to CRAN,
'1)' being easier: Once the 'flexS3generics' is released to
CRAN, change (both) your package(s) to
importFrom(flexS3generics,*) but these changes and CRAN
submissions are then independent of each other.
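
A minimal sketch of '1)' (the package name is just the proposal above;
nothing of this exists yet):

  ## In the tiny shared package 'flexS3generics', only the generic:
  as.FlexTable <- function(x, ...) UseMethod("as.FlexTable")

  ## Both ReporteRs and huxtable then declare in their NAMESPACE
  ##   importFrom(flexS3generics, as.FlexTable)
  ## and each registers its own method, e.g.
  ##   S3method(as.FlexTable, huxtable)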


> Cheers,
> David

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] named arguments in formula and terms

2017-03-13 Thread Martin Maechler
Dear Achim,

> Achim Zeileis 
> on Fri, 10 Mar 2017 15:02:38 +0100 writes:

> Hi, we came across the following unexpected (for us)
> behavior in terms.formula: When determining whether a term
> is duplicated, only the order of the arguments in function
> calls seems to be checked but not their names. Thus the
> terms f(x, a = z) and f(x, b = z) are deemed to be
> duplicated and one of the terms is thus dropped.

R> attr(terms(y ~ f(x, a = z) + f(x, b = z)), "term.labels")
> [1] "f(x, a = z)"

> However, changing the arguments or the order of arguments
> keeps both terms:

R> attr(terms(y ~ f(x, a = z) + f(x, b = zz)), "term.labels")
> [1] "f(x, a = z)" "f(x, b = zz)"
R> attr(terms(y ~ f(x, a = z) + f(b = z, x)), "term.labels")
> [1] "f(x, a = z)" "f(b = z, x)"

> Is this intended behavior or needed for certain terms?

> We came across this problem when setting up certain smooth
> regressors with different kinds of patterns. As a trivial
> simplified example we can generate the same kind of
> problem with rep(). Consider the two dummy variables rep(x
> = 0:1, each = 4) and rep(x = 0:1, times = 4). With the
> response y = 1:8 I get:

R> lm((1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

> Call:
> lm(formula = (1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))
>
> Coefficients:
>            (Intercept)  rep(x = 0:1, each = 4)
>                    2.5                     4.0

> So while the model is identified because the two
> regressors are not the same, terms.fomula does not
> recognize this and drops the second regressor.  What I
> would have wanted can be obtained by switching the
> arguments:

R> lm((1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

> Call:
> lm(formula = (1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))
>
> Coefficients:
>            (Intercept)  rep(each = 4, x = 0:1)  rep(x = 0:1, times = 4)
>                      2                       4                        1

> Of course, here I could avoid the problem by setting up
> proper factors etc. But to me this looks like a potential bug
> in terms.formula...

I agree that there is a bug.
According to https://www.r-project.org/bugs.html
I have generated an R bugzilla account for you so you can report
it there (for "book keeping", posteriority, etc).

> Thanks in advance for any insights, Z

and thank *you* (and Nikolaus ?) for the report!

Best regards,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug in nlm()

2017-03-08 Thread Martin Maechler
   {This was sent to me, MM, only, but for completeness should
have gone back to R-devel.

Further: I now *have* added Marie B to the members'of "R bugzilla"
-- M.Maechler}


I had already read the R bug reporting guide and I'm sure it is a bug.
The bug occurs when the user provides not only the analytic gradient but also 
the analytic Hessian of the objective function. In that case, the algorithm 
does not converge due to an erroneous implementation of the modified Cholesky 
decomposition of the Hessian matrix. It is actually a bug in the C-code called 
by nlm(), therefore it is hard to show that the non-convergence of the 
algorithm is really due to this bug with only an MRE.
However, a short example (optimizing the Rosenbrock banana valley function with 
and without analytic Hessian) is:

fg <- function(x) {  # objective with analytic gradient only
  gr <- function(x1, x2) c(-400*x1*(x2 - x1*x1) - 2*(1-x1), 200*(x2 - x1*x1))
  x1 <- x[1]; x2 <- x[2]
  res <- 100*(x2 - x1*x1)^2 + (1-x1)^2
  attr(res, "gradient") <- gr(x1, x2)
  return(res)
}
nlm.fg <- nlm(fg, c(-1.2, 1))   # converges fine

fgh <- function(x) {  # objective with analytic gradient *and* Hessian
  gr <- function(x1, x2) c(-400*x1*(x2 - x1*x1) - 2*(1-x1), 200*(x2 - x1*x1))
  h <- function(x1, x2) {
    a11 <- 2 - 400*x2 + 1200*x1*x1
    a21 <- -400*x1
    matrix(c(a11, a21, a21, 200), 2, 2)
  }
  x1 <- x[1]; x2 <- x[2]
  res <- 100*(x2 - x1*x1)^2 + (1-x1)^2
  attr(res, "gradient") <- gr(x1, x2)
  attr(res, "hessian") <- h(x1, x2)
  return(res)
}
nlm.fgh <- nlm(fgh, c(-1.2, 1))  # reportedly fails to converge (the bug)

I have almost finished a more detailed bug report, which I would like to submit.

Best,
Marie Boehnstedt

>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Fri, 3 Mar 2017 18:15:47 +0100 writes:

>>>>> Boehnstedt, Marie <boehnst...@demogr.mpg.de>
>>>>> on Fri, 3 Mar 2017 10:23:12 + writes:

>> Dear all, I have found a bug in nlm() and would like to
>> submit a report on this.  Since nlm() is in the
>> stats-package, which is maintained by the R Core team,
>> bug reports should be submitted to R's Bugzilla. However,
>> I'm not a member of Bugzilla. Could anyone be so kind to
>> add me to R's Bugzilla members or let me know to whom I
>> should send the bug report?

> Dear Marie,

> I can do this ... but are you really sure?  There is
> https://www.r-project.org/bugs.html which you should spend
> some time reading if you haven't already.

> I think you would post a MRE (Minimal Reproducible
> Example) here {or on stackoverflow or ...} if you'd follow
> what the 'R bugs' web page (above) recommends and only
> report a bug after some feedback from "the public".

> Of course, I could be wrong.. and happy if you explain /
> tell me why.

> Best, Martin Maechler

>> Thank you in advance.

>> Kind regards, Marie Böhnstedt


>> Marie Böhnstedt, MSc Research Scientist Max Planck
>> Institute for Demographic Research Konrad-Zuse-Str. 1,
>> 18057 Rostock, Germany
>> www.demogr.mpg.de<http://www.demogr.mpg.de/>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Please add me to bugzilla

2017-03-06 Thread Martin Maechler
>>>>> Bradley Broom <bmbr...@gmail.com>
>>>>> on Mon, 6 Mar 2017 06:55:35 -0600 writes:

> Apologies, I thought I was following exactly that sentence
> and trying to make a minimal post that would waste as
> little developer bandwidth as possible given the lack of a
> better system.

I understand.   My apologies now, as I was mistrusting, clearly
wrongly in this case.

> Anyway, I have been using R for like forever (20 years).

> In my current project, I have run into problems with stack
> overflows in R's dendrogram code when trying to use either
> str() or as.hclust() on very deep dendrograms.

I understand.  Indeed, bug PR#16424 
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16424
encountered the same problem in other dendrogram functions and
solved it by re-programming the relevant parts non-recursively,
too.

   [...]

> What should happen: Function completes without a stack
> overflow.

> 2nd bug: hh <- as.hclust(de)

> What happens: Error: C stack usage 7971248 is too close to the limit

> What should happen: Function completes without a stack
> overflow.

> A knowledgeable user might be able to increase R's limits
> to avoid these errors on this particular dendrogram, but
> a) my users aren't that knowledgeable about R and this is
> expected to be a common problem, and b) there will be
> bigger dendrograms (up to at least 25000 leaves).

Agreed.  The current help page warns about the problem and
gives advice (related to increasing the stack), but what you propose
is better, i.e., re-implementing relevant parts non-recursively.
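
To illustrate the idea (a sketch only, not your actual patch): recursion
is replaced by an explicit stack, e.g. for counting leaves,

  nLeaves <- function(d) {  # works for arbitrarily deep dendrograms
    n <- 0L
    stack <- list(d)
    while(length(stack)) {
      node <- stack[[length(stack)]]
      stack[[length(stack)]] <- NULL           # pop
      if(is.leaf(node)) n <- n + 1L
      else stack <- c(stack, unclass(node))    # push the children
    }
    n
  }
  nLeaves(as.dendrogram(hclust(dist(mtcars))))  # 32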

> Please see attached patch for non-recursive
> implementations.

Very well done, thank you a lot!
[and I will add you to bugzilla .. so you can use it for the
 next bug .. ;-)]

Best,
Martin

> Regards, Bradley



> On Mon, Mar 6, 2017 at 3:50 AM, Martin Maechler
> <maech...@stat.math.ethz.ch> wrote:

>> >>>>> Bradley Broom <bmbr...@gmail.com> >>>>> on Sun, 5
>> Mar 2017 16:03:30 -0600 writes:
>> 
>> > Please add me to R bugzilla.  Thanks, Bradley
>> 
>> Well, I will not do it just like that (meaning "after such a
>> minimal message").
>> 
>> I don't see any evidence as to your credentials,
>> knowledge of R, etc, as part of this request.  We are all
>> professionals, devoting part of our (work and free) time
>> to the R project (rather than employees of the company
>> you paid to serve you ...)
>> 
>> It may be that you have read
>> https://www.r-project.org/bugs.html
>> 
>> Notably this part
>> 
--> NOTE: due to abuse by spammers, since 2016-07-09 only
--> users who have
>> previously submitted bugs can submit new ones on R’s
>> Bugzilla. We’re working on a better system… In the mean
>> time, post (e-mail) to R-devel or ask an R Core member to
>> add you manually to R’s Bugzilla members.
>> 
>> The last sentence was *meant* to say you should post
>> (possibly parts, ideally a minimal reproducible example
>> of) your bug report to R-devel so others could comment on
>> it, agree or disagree with your assessment etc, __or__
>> ask an R-core member to add you to bugzilla (if you
>> really read the other parts of the 'R bugs' web page
>> above).
>> 
>> Posting to all 1000 R-devel readers with no content about
>> what you consider a bug is a waste of bandwidth for at
>> least 99% of these readers.
>> 
>> [Yes, I'm also using their time ... in the hope to
>> *improve* the quality of future such postings].
>> 
>> Martin Maechler ETH Zurich
>> 
> [attachment: dendro-non-recursive.patch (text/x-patch)]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Control statements with condition with greater than one should give error (not just warning) [PATCH]

2017-03-06 Thread Martin Maechler
>>>>> Michael Lawrence <lawrence.mich...@gene.com>
>>>>> on Sat, 4 Mar 2017 12:20:45 -0800 writes:

> Is there really a need for these complications? Packages
> emitting this warning are broken by definition and should be fixed. 

I agree and probably Henrik, too.

(Others may disagree to some extent .. and find it convenient
 that R does translate 'if(x)'  to  'if(x[1])'  for them albeit
 with a warning .. )

> Perhaps we could "flip the switch" in a test
> environment and see how much havoc is wreaked and whether
> authors are sufficiently responsive?

> Michael

As we have > 10'000 packages on CRAN alone,  and people have
started (mis)using suppressWarnings(.) in many places,  there
may be considerably more packages affected than we optimistically assume...

As the R core member who would "flip the switch", I'd typically then
have to be the one sending an e-mail to all affected package
maintainers, and in this case I'm very reluctant to volunteer
for that; so I'd prefer the environment variable, where R
core and others can decide how to use it .. for a while .. until
the switch is flipped for all.

or have I overlooked an issue?

Martin

> On Sat, Mar 4, 2017 at 12:04 PM, Martin Maechler
> <maech...@stat.math.ethz.ch
>> wrote:

>> >>>>> Henrik Bengtsson <henrik.bengts...@gmail.com> >>>>>
>> on Fri, 3 Mar 2017 10:10:53 -0800 writes:
>> 
>> > On Fri, Mar 3, 2017 at 9:55 AM, Hadley Wickham >
>> <h.wick...@gmail.com> wrote: >>> But, how you propose a
>> warning-to-error transition >>> should be made without
>> wreaking havoc?  Just flip the >>> switch in R-devel and
>> see CRAN and Bioconductor packages >>> break overnight?
>> Particularly Bioconductor devel might >>> become
>> non-functional (since at times it requires >>> R-devel).
>> For my own code / packages, I would be able >>> to handle
>> such a change, but I'm completely out of >>> control if
>> one of the package I'm depending on does not >>> provide
>> a quick fix (with the only option to remove >>> package
>> tests for those dependencies).
>> >>
>> >> Generally, a package can not be on CRAN if it has any
>> >> warnings, so I don't think this change would have any
>> >> impact on CRAN packages.  Isn't this also true for >>
>> bioconductor?
>> 
>> > Having a tests/warn.R file with:
>> 
>> > warning("boom")
>> 
>> > passes through R CMD check --as-cran unnoticed.
>> 
>> Yes, indeed.. you are right Henrik that many/most R
>> warning()s would not produce R CMD check 'WARNING's ..
>> 
>> I think Hadley and I fell into the same mental pit of
>> concluding that such warning()s from
>> if() ...  would not currently happen
>> in CRAN / Bioc packages and hence turning them to errors
>> would not have a direct effect.
>> 
>> With your 2nd e-mail of saying that you'd propose such an
>> option only for a few releases of R you've indeed
>> clarified your intent to me.  OTOH, I would prefer using
>> an environment variable (as you've proposed as an
>> alternative) which is turned "active" at the beginning
>> only manually or for the "CRAN incoming" checks of the
>> CRAN team (and bioconductor submission checks?)  and
>> later for '--as-cran' etc until it eventually becomes the
>> unconditional behavior of R (and the env.variable is no
>> longer used).
>> 
>> Martin
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 

>   [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Please add me to bugzilla

2017-03-06 Thread Martin Maechler
>>>>> Bradley Broom <bmbr...@gmail.com>
>>>>> on Sun, 5 Mar 2017 16:03:30 -0600 writes:

> Please add me to R bugzilla.  Thanks, Bradley

Well, I will not do it just like that (meaning "after such a
minimal message").

I don't see any evidence as to your credentials, knowledge of R,
etc, as part of this request.  We are all professionals,
devoting part of our (work and free) time to the R project
(rather than employees of the company you paid to serve you ...)

It may be that you have read   https://www.r-project.org/bugs.html

Notably this part

--> NOTE: due to abuse by spammers, since 2016-07-09 only users who have 
previously submitted bugs can submit new ones on R’s Bugzilla. We’re working on 
a better system… In the mean time, post (e-mail) to R-devel or ask an R Core 
member to add you manually to R’s Bugzilla members.

The last sentence was *meant* to say you should post (possibly
parts, ideally a minimal reproducible example of) your bug
report to R-devel so others could comment on it, agree or
disagree with your assessment etc,
__or__ ask an R-core member to add you to bugzilla (if you really read the
other parts of the 'R bugs' web page above).

Posting to all 1000 R-devel readers with no content about what
you consider a bug  is a waste of bandwidth for at least 99% of
these readers.

[Yes, I'm also using their time ... in the hope to *improve* the
 quality of future such postings].

Martin Maechler
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Control statements with condition with greater than one should give error (not just warning) [PATCH]

2017-03-04 Thread Martin Maechler
> Henrik Bengtsson 
> on Fri, 3 Mar 2017 10:10:53 -0800 writes:

> On Fri, Mar 3, 2017 at 9:55 AM, Hadley Wickham
>  wrote:
>>> But, how you propose a warning-to-error transition
>>> should be made without wreaking havoc?  Just flip the
>>> switch in R-devel and see CRAN and Bioconductor packages
>>> break overnight?  Particularly Bioconductor devel might
>>> become non-functional (since at times it requires
>>> R-devel).  For my own code / packages, I would be able
>>> to handle such a change, but I'm completely out of
>>> control if one of the package I'm depending on does not
>>> provide a quick fix (with the only option to remove
>>> package tests for those dependencies).
>> 
>> Generally, a package can not be on CRAN if it has any
>> warnings, so I don't think this change would have any
>> impact on CRAN packages.  Isn't this also true for
>> bioconductor?

> Having a tests/warn.R file with:

> warning("boom")

> passes through R CMD check --as-cran unnoticed.  

Yes, indeed ... you are right, Henrik, that many/most R warning()s would
not produce R CMD check 'WARNING's ..

I think Hadley and I fell into the same mental pit of concluding
that such warning()s  from   if()  ...
would not currently happen in CRAN / Bioc packages and hence
turning them to errors would not have a direct effect.

With your 2nd e-mail of saying that you'd propose such an option
only for a few releases of R you've indeed clarified your intent
to me.
OTOH, I would prefer using an environment variable (as you've
proposed as an alternative)  which is turned "active"  at the
beginning only manually or  for the  "CRAN incoming" checks of
the CRAN team (and bioconductor submission checks?)
and later for  '--as-cran'  etc until it eventually becomes the
unconditional behavior of R (and the env.variable is no longer used).

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Control statements with condition with greater than one should give error (not just warning) [PATCH]

2017-03-03 Thread Martin Maechler
>>>>> Henrik Bengtsson <henrik.bengts...@gmail.com>
>>>>> on Fri, 3 Mar 2017 00:52:16 -0800 writes:

> I'd like to propose that the whenever the length of condition passed
> to an if or a while statement differs from one, an error is produced
> rather than just a warning as today:

>> x <- 1:2
>> if (x == 1) message("x == 1")
> x == 1
> Warning message:
> In if (x == 1) message("x == 1") :
> the condition has length > 1 and only the first element will be used

> There are probably legacy reasons for why this is accepted by R in the
> first place, but I cannot imagine that anyone wants to use an if/while
> statement this way on purpose.  The warning about this misuse, was
> introduced in November 2002 (R-devel thread 'vector arguments to
> if()'; https://stat.ethz.ch/pipermail/r-devel/2002-November/025537.html).

yes, before, there was *no* warning at all and so the problem existed
in several partly important R packages.

Now is a different time, I agree, and I even tend to agree we
should make this an error... probably however not for the
upcoming R 3.4.0 (in April which is somewhat soon) but rather
for the next version.
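
In the meantime, a minimal sketch (base R only) of how code can avoid
the ambiguity today, by making the condition length-one itself:

  x <- 1:2
  if (all(x == 1)) message("all elements equal 1")  # length-one condition
  if (any(x == 1)) message("some element equals 1") # length-one condition
  if (isTRUE(all(x == 1))) message("TRUE, not NA")  # also guards against NA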


> Below is a patch (also attached) that introduces option
> 'check.condition' such that when TRUE, 

ouch ouch ouch!   There are many sayings starting with
  "The way to hell ..."

Here:

The way to R hell starts (or "widens", your choice) by
introducing options() that influence basic language semantics

!!

For robust code you will start to test all code of R for all
different possible combinations of these options set ... I am
sure you would not want this.

No --- don't even think of allowing an option for something such basic!

Martin Maechler
ETH Zurich (and R Core)

> it will generate an error
> rather than a warning (default).  This option allows for a smooth
> migration as it can be added to 'R CMD check --as-cran' and developers
> can give time to check and fix their packages.  Eventually,
> check.condition=TRUE can become the new default.

> With options(check.condition = TRUE), one gets:

>> x <- 1:2
>> if (x == 1) message("x == 1")
> Error in if (x == 1) message("x == 1") : the condition has length > 1

> and

>> while (x < 2) message("x < 2")
> Error in while (x < 2) message("x < 2") : the condition has length > 1


> Index: src/library/base/man/options.Rd
> ===
> --- src/library/base/man/options.Rd (revision 72298)
> +++ src/library/base/man/options.Rd (working copy)
> @@ -86,6 +86,11 @@
> vector (atomic or \code{\link{list}}) is extended, by something
> like \code{x <- 1:3; x[5] <- 6}.}

> +\item{\code{check.condition}:}{logical, defaulting to \code{FALSE}.  If
> +  \code{TRUE}, an error is produced whenever the condition to an
> +  \code{if} or a \code{while} control statement is of length greater
> +  than one.  If \code{FALSE}, a \link{warning} is produced.}
> +
> \item{\code{CBoundsCheck}:}{logical, controlling whether
> \code{\link{.C}} and \code{\link{.Fortran}} make copies to check for
> array over-runs on the atomic vector arguments.
> @@ -445,6 +450,7 @@
> \tabular{ll}{
> \code{add.smooth} \tab \code{TRUE}\cr
> \code{check.bounds} \tab \code{FALSE}\cr
> +\code{check.condition} \tab \code{FALSE}\cr
> \code{continue} \tab \code{"+ "}\cr
> \code{digits} \tab \code{7}\cr
> \code{echo} \tab \code{TRUE}\cr
> Index: src/library/utils/R/completion.R
> ===
> --- src/library/utils/R/completion.R (revision 72298)
> +++ src/library/utils/R/completion.R (working copy)
> @@ -1304,8 +1304,8 @@
> "plt", "ps", "pty", "smo", "srt", "tck", "tcl", "usr",
> "xaxp", "xaxs", "xaxt", "xpd", "yaxp", "yaxs", "yaxt")

> -options <- c("add.smooth", "browser", "check.bounds", "continue",
> - "contrasts", "defaultPackages", "demo.ask", "device",
> +options <- c("add.smooth", "browser", "check.bounds", "check.condition",
> +"continue", "contrasts", "defaultPackages", "demo.ask", "device",
> "di

Re: [Rd] Bug in nlm()

2017-03-03 Thread Martin Maechler
>>>>> Boehnstedt, Marie <boehnst...@demogr.mpg.de>
>>>>> on Fri, 3 Mar 2017 10:23:12 + writes:

> Dear all,
> I have found a bug in nlm() and would like to submit a report on this.
> Since nlm() is in the stats-package, which is maintained by the R Core 
team, bug reports should be submitted to R's Bugzilla. However, I'm not a 
member of Bugzilla. Could anyone be so kind to add me to R's Bugzilla members 
or let me know to whom I should send the bug report?

Dear Marie,

I can do this ... but  are you really sure?  There is
 https://www.r-project.org/bugs.html
which you should spend some time reading if you haven't already.

I think you would post a MRE (Minimal Reproducible Example) here
{or on stackoverflow or ...} if you'd follow what the 'R bugs' web
page (above) recommends and only report a bug after some
feedback from "the public".

Of course, I could be wrong.. and happy if you explain / tell me why.

Best,
Martin Maechler

> Thank you in advance.

> Kind regards,
> Marie Böhnstedt


> Marie Böhnstedt, MSc
> Research Scientist
> Max Planck Institute for Demographic Research
> Konrad-Zuse-Str. 1, 18057 Rostock, Germany
> www.demogr.mpg.de<http://www.demogr.mpg.de/>






> --
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [R-pkg-devel] tryCatch defensive programming guidance

2017-03-01 Thread Martin Maechler
> Berry Boessenkool 
> on Wed, 1 Mar 2017 14:52:10 + writes:

> Hi Glenn,


> Better late than never:
> couldn't you simply use try?

> result <- try( log("a") )

> The printing is horrible: people will think an error
> occurred (but the function didn't stop!)

> I tend to use it like this (which may be totally
> unintended):


> res <- try(log("a"), silent=TRUE)
> if(inherits(res, "try-error"))
> {
>   message("log failed: ",res,". Now continuing with res=0.")
>  res <- 0
> }

but if you ever looked: try() is just a wrapper around tryCatch(),
and using try(*, silent=TRUE) is even closer to a pretty simple tryCatch(.).

Historically,  tryCatch() did not exist, but try() did.
So much of "old code" still has try() calls in it.

I consider try() as convenience function for interactive use but
would always use tryCatch() for new code in my packages.
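
For instance, Berry's try() pattern above can be written directly with
tryCatch() -- a minimal sketch, using conditionMessage() to extract the
message from the condition object:

  res <- tryCatch(log("a"),
                  error = function(e) {
                      message("log failed: ", conditionMessage(e),
                              ". Now continuing with res = 0.")
                      0
                  })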


> See here for my version that captures errors/warnings/messages with call 
tracing:

> https://www.rdocumentation.org/packages/berryFunctions/topics/tryStack

I'd recommend you switch to tryCatch(): it is more flexible and
more directly "configurable" than try() -- which also got some historical
flexibility by the "hack" (not functional programming)
of depending on getOption("show.error.messages") in the
default case silent = FALSE {needed to suppress other
packages' try(.) calls printing error messages as if they were
*error*s, in spite of the fact that they were caught}.

 and back to the OP:

I think that's your only small problem that you chose
  error = function(e) print(e)

because that prints "as if" you had an error.

Martin


> Regards,

> Berry


> 
> From: R-package-devel  on behalf 
of Glenn Schultz 
> Sent: Saturday, February 25, 2017 15:50
> To: R Package Development
> Subject: [R-pkg-devel] tryCatch defensive programming guidance

> All,

> I have the following to create a class PriceTypes.  I use try catch on 
the function and it gives me the error

> price <- tryCatch(PriceTypes(price = "100")
> ,error = function(e) print(e)
> ,warning = function(w) print(w))

> 
>> 

> I read the section on tryCatch and withCallingHandlers as well as the manual,
but I am still not clear as to how to use tryCatch in the function below. I tried
> PriceTypes <- TryCatch(
> function(){}
> ), function(e) print(error)

> but this is obviously wrong as it did not work.  My question: can I use
tryCatch in the function itself, or only when I invoke the function?

> Best Regards,
> Glenn

> #' An S4 class representing a bond price
> #'
> #' This class is used to create and pass the price types reported to
> #' investors and used in analytics. For example price is often reported as
> #' decimal or fractions 32nds to investors but price basis (price/100) is
> #' used to calculate proceeds and compute metrics like yield, duration, 
and
> #' partial durations.
> #' @slot PriceDecimal A numeric value the price using decimal notation
> #' @slot Price32nds A character the price using 32nds notation
> #' @slot PriceBasis A numeric value price decimal notation in units of 100
> #' @slot PriceDecimalString A character the price using decimal notation
> #' @exportClass PriceTypes
> setClass("PriceTypes",
> representation(
> PriceDecimal = "numeric",
> Price32nds = "character",
> PriceBasis = "numeric",
> PriceDecimalString = "character")
> )

> setGeneric("PriceTypes", function(price = numeric())
> {standardGeneric("PriceTypes")})

> #' A standard generic function get the slot PriceDecimal
> #'
> #' @param object an S4 object
> #' @export PriceDecimal
> setGeneric("PriceDecimal", function(object)
> {standardGeneric("PriceDecimal")})

> #' A standard generic function to set the slot PriceDecimal
> #'
> #' @param object an S4 object
> #' @param value the replacement value of the slot
> #' @export PriceDecimal<-
> setGeneric("PriceDecimal<-", function(object, value)
> {standardGeneric("PriceDecimal<-")})

> #' A standard generic function to get the slot Price32nds
> #'
> #' @param object an S4 object
> #' @export Price32nds
> setGeneric("Price32nds", function(object)
> {standardGeneric("Price32nds")})

> #' A standard generic function to set the slot Price32nds
> #'
> #' @param object an S4 object
> #' @param value the replacement value of the slot
> #' @export Price32nds<-
> setGeneric("Price32nds<-", function(object, value)
> {standardGeneric("Price32nds<-")})

> #' A standard generic to get the slot PriceBasis
> #'
> #' @param object an S4 object
> #' @export PriceBasis
> setGeneric("PriceBasis", 

Re: [Rd] stats::median

2017-03-01 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Mon, 27 Feb 2017 10:42:19 +0100 writes:

>>>>> Rob J Hyndman <rob.hynd...@monash.edu>
>>>>> on Wed, 15 Feb 2017 21:48:56 +1100 writes:

>> The generic stats::median method is defined as median <-
>> function (x, na.rm = FALSE) {UseMethod("median")}

>> I suggest that this should become median <- function (x,
>> na.rm = FALSE, ...)  {UseMethod("median")}

>> This would allow additional S3 methods to be developed
>> with additional arguments.

> and S4 methods, too.

>> Currently I have to over-ride this generic definition in
>> the demography package because median.demogdata has
>> several other arguments.

>> This shouldn't break any code, and will make it easier
>> for new S3 methods to be developed. It is also consistent
>> with almost all other S3 methods which do include an
>> ellipsis.

> "shouldn't break any code" is almost always quite
> optimistic nowadays,

For CRAN, the change leads 13 packages (out of > 10'000) to
"regress" to status: WARN.

I've checked 10 of them, and all these define  median() S3
methods, and currently of course have not had the '...' in their
formal argument list(s).

They (and all other useRs who define median() S3 methods and
want their code to work both in R <= 3.3.x _and_ R >= 3.4.0)
could use code such as
(for package 'sets' in R/summary.R):

 
median.set <-
function(x, na.rm = FALSE, ...)
{
median(as.numeric(x), na.rm = na.rm, ...)
}

## drop '...' in R versions < 3.4.0 :
if(!any("..." == names(formals(median)))) {
    formals(median.set) <- formals(median.set)[names(formals(median.set)) != "..."]
    body(median.set)[[2]] <- body(median.set)[[2]][-4]
}

or simply
 
median.cset <-
if("..." %in% names(formals(median))) {
function(x, na.rm = FALSE, ...) median.gset(x, na.rm = na.rm, ...)
} else
function(x, na.rm = FALSE)  median.gset(x, na.rm = na.rm)


which is R code that will work fine in both current (and older)
R and in R-devel and future R versions.

For packages, however, this will leave an 'R CMD check'
warning (for now) because code and documentation mismatch
either in R-devel (and future R) or in current and previous R versions.

It is less clear what to do for the man (i.e., *.Rd) pages [if you
have them for your median method(s): note that they *are* optional for
registered S3 methods; package 'sets', e.g., documents 2 out of
its 4 median methods].

It may (or may not) make sense to tweak R-devel's own 'R CMD check'
to _not_ warn for the missing '...' in median methods for a
while and consequently you'd get away with continued use of no
'...' in the help page \usage{ ... } section.

One solution, of course, would be to wait a bit and then release
such a package only with

Depends: R (>= 3.4.0)

where you'd  use  '...' and keep the previous CRAN version of
the package for all earlier versions of R.
That is a maintenance pain, however, if you want to change your
package features, because then you'd have to start releasing two
versions of the package: an "old" one with

Depends: R (< 3.4.0)

and a "new" one with   R (>= 3.4.0).

Probably easiest would be to comment the \usage{.} / \arguments \item{...}
parts for the time being {as long as you don't want R (>= 3.4.0)
in your package DESCRIPTION "unconditionally"}.

--

Tweaking R-devel's tools::codoc() for this special case may
be a solution that package maintainers would like more.
OTOH, we can only change R-devel's version of codoc(), so it
would be that platform which would show slightly inaccurate
"Usage:" for these (by not showing "...")  which also seems a
kludgy solution.



> Actually it probably will break things when people start
> using the new R version which implements the above *AND*
> use packages installed with a previous version of R.  I
> agree that this does not count as "breaking any code".

> In spite of all that *and* the perennial drawback that a
> '...' will allow argument name typos to go unnoticed

> I agree you have a good argument nowadays, that median()
> should be the same as many similar "basic statistics" R
> functions and so I'll commit such a change to R-devel (to
> become R 3.4.0 in April).

> Thank you for the suggestion!  Martin Maechler, ETH Zurich

>> -
>> Rob J Hyndman Professor of Statistics, Monash University
>> www.robjhyndman.com

>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Test suite failures in R-devel_2017-02-25_r72256

2017-02-28 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Mon, 27 Feb 2017 16:08:40 +0100 writes:

>>>>> Peter Simons <sim...@nospf.cryp.to>
>>>>> on Mon, 27 Feb 2017 10:30:39 +0100 writes:

>> Hi, I tried compiling the latest pre-release for R 3.3.3
>> for the NixOS Linux distribution [1], but the build fails
>> during the "make check" phase because of the following 2
>> issues:

>> 1) The "tools" test in "tests/Examples" requires network
>> access, which it doesn't have in our build
>> environment. 

> One may argue that the 'make check' (or even 'make check-all')
> tests could / should be allowed more resources than the pure
> build environment.

>> Therefore, it fails as follows according to
>> "tools-Ex.Rout.fail":

>> | [...]
>> | > set.seed(11)
>> | > ## End(Don't show)
>> | > pdb <- CRAN_package_db()
>> | Warning in url(sprintf("%s/%s", cran, path), open = "rb") :
>> |   URL 'http://CRAN.R-project.org/web/packages/packages.rds': status 
was 'Couldn't resolve host name'
>> | Error in url(sprintf("%s/%s", cran, path), open = "rb") :
>> |   cannot open the connection to 
'http://CRAN.R-project.org/web/packages/packages.rds'
>> | Calls: CRAN_package_db -> as.data.frame -> read_CRAN_object -> gzcon 
-> url
>> | Execution halted

>> I'm wondering whether it would be possible to extend the test suite
>> with a configure-time flag that disable tests which depend on network
>> access? My experience is that most modern Linux distributions run
>> their builds in a restricted environment and therefore will run into
>> trouble if the suite assumes that it can access the Internet.

> [see above] Is it necessary to also run the 'make check' part in
> that restricted environment?  Or could that ("checking") not get
> more privileges?

> Note that you can only run  "make check" if you don't install
> recommended packages, whereas more thorough testing would
> include
>  make check-devel
> or even
>  make check-all
>
> but these do have quite a bit more requirements including
> recommended packages being present.

I have to correct myself:  The above paragraph may be misleading:

Much, if not all of
  make check-devel
and   make check-all
have worked well since R version 3.1.0  which had in its NEWS an entry

 • More of 'make check' will work if recommended packages are not
   installed: but recommended packages remain needed for thorough
   checking of an R build.

Further, the 'R-admin' manual (on 'make check-all' etc) contains

 |  Note that these checks are only run completely 
 |  if the recommended packages are installed.

so their presence is not required  but much recommended for
thorough testing.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Test suite failures in R-devel_2017-02-25_r72256

2017-02-27 Thread Martin Maechler
> Peter Simons 
> on Mon, 27 Feb 2017 10:30:39 +0100 writes:

> Hi, I tried compiling the latest pre-release for R 3.3.3
> for the NixOS Linux distribution [1], but the build fails
> during the "make check" phase because of the following 2
> issues:

> 1) The "tools" test in "tests/Examples" requires network
> access, which it doesn't have in our build
> environment. 

One may argue that the 'make check' (or even 'make check-all')
tests could / should be allowed more resources than the pure
build environment.

> Therefore, it fails as follows according to
> "tools-Ex.Rout.fail":

> | [...]
> | > set.seed(11)
> | > ## End(Don't show)
> | > pdb <- CRAN_package_db()
> | Warning in url(sprintf("%s/%s", cran, path), open = "rb") :
> |   URL 'http://CRAN.R-project.org/web/packages/packages.rds': status was 
'Couldn't resolve host name'
> | Error in url(sprintf("%s/%s", cran, path), open = "rb") :
> |   cannot open the connection to 
'http://CRAN.R-project.org/web/packages/packages.rds'
> | Calls: CRAN_package_db -> as.data.frame -> read_CRAN_object -> gzcon -> 
url
> | Execution halted

> I'm wondering whether it would be possible to extend the test suite
> with a configure-time flag that disable tests which depend on network
> access? My experience is that most modern Linux distributions run
> their builds in a restricted environment and therefore will run into
> trouble if the suite assumes that it can access the Internet.

[see above] Is it necessary to also run the 'make check' part in
that restricted environment?  Or could that ("checking") not get
more privileges?

Note that you can only run  "make check" if you don't install
recommended packages, whereas more thorough testing would
include
make check-devel
or even
make check-all

but these do have quite a bit more requirements including
recommended packages being present.


> 2) When R is compiled with the --without-recommended-packages flag
> (which is our preferred configuration), the "base" test in
> "tests/Examples" fails, apparently because it depends on MASS. I'm
> citing from "base-Ex.Rout.fail":

> | >  ## The string "foo" and the symbol 'foo' can be used interchangeably here:
> | >  stopifnot( identical(isNamespaceLoaded(  "foo"   ), FALSE),
> | + identical(isNamespaceLoaded(quote(foo)), FALSE),
> | + identical(isNamespaceLoaded(quote(stats)), statL))
> | >
> | > hasM <- isNamespaceLoaded("MASS") # (to restore if needed)
> | > Mns <- asNamespace("MASS") # loads it if not already
> | Error in loadNamespace(name) : there is no package called 'MASS'
> | Calls: asNamespace ... tryCatch -> tryCatchList -> tryCatchOne -> 

> | Execution halted

Yes, that example should not have assumed a recommended package
to be available unconditionally.

I've changed it, thank you!


> I hope this helps!

> Best regards,
> Peter



> [1] http://nixos.org/

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] stats::median

2017-02-27 Thread Martin Maechler
>>>>> Rob J Hyndman <rob.hynd...@monash.edu>
>>>>> on Wed, 15 Feb 2017 21:48:56 +1100 writes:

> The generic stats::median method is defined as median <-
> function (x, na.rm = FALSE) {UseMethod("median")}

> I suggest that this should become median <- function (x,
> na.rm = FALSE, ...)  {UseMethod("median")}

> This would allow additional S3 methods to be developed
> with additional arguments.

and  S4  methods, too.

> Currently I have to over-ride this generic definition in
> the demography package because median.demogdata has
> several other arguments.

> This shouldn't break any code, and will make it easier for
> new S3 methods to be developed. It is also consistent with
> almost all other S3 methods which do include an ellipsis.

"shouldn't break any code"   is almost always quite optimistic
   nowadays,

Actually it probably will break things when people start using
the new R version which implements the above *AND* use packages
installed with a previous version of R.
I agree that this does not count as "breaking any code".

In spite of all that  *and*
the perennial drawback that a '...' will allow argument name
typos to go unnoticed

I agree you have a good argument nowadays, that median() should
be the same as many similar "basic statistics" R functions and
so I'll commit such a change to R-devel (to become R 3.4.0 in April).
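
In effect, the committed change makes the generic exactly what was
suggested above:

  median <- function(x, na.rm = FALSE, ...) UseMethod("median")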

Thank you for the suggestion!
Martin Maechler,
ETH Zurich

> -
> Rob J Hyndman Professor of Statistics, Monash University
> www.robjhyndman.com

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] rep/rep.int: in NEWS, but not yet ported from trunk

2017-02-27 Thread Martin Maechler
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>> on Sun, 26 Feb 2017 12:02:44 + writes:

> According to "CHANGES IN R 3.3.2 patched" in NEWS, rep(x,
> times) and rep.int(x, times) also work when 'times' has
> length greater than one and has element larger than the
> maximal integer. In fact, it is still not the case in R
> 3.3.3 beta r72259. In seq.c
> (https://svn.r-project.org/R/branches/R-3-3-branch/src/main/seq.c),
> 'times' that is a vector with storage mode "double" and
> length greater than one is still changed first to storage
> mode "integer". Number in 'times' that represents an
> integer that is larger than the maximal integer becomes NA
> and error is issued for such 'times'.  
> I have put a comment,
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16932#c30 .

Thank you for noticing!

- I've changed the NEWS entry for R-patched (and moved the more
general statement to a new entry for R-devel). 
- The changes were quite substantial so I did not port them to
R-patched at the time..  We could have ported them later, but
not now, immediately before code freeze (of R 3.3.3).

- I would say   rep(5, list(6))  was never "meant to" work and had worked
  incidentally only.
  OTOH, you are correct with your comments 11 & 29 in the above
  bug report, and your proposal to make the simple case   rep(s, list(7))
  work as previously seems ok to me.

However, for all this, we will concentrate on R-devel (to become
R 3.4.0).

Best regards,
Martin Maechler

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Update copyright year in manuals

2017-02-25 Thread Martin Maechler

> On Thu, Feb 23, 2017 at 03:23:10PM +0100, Martin Maechler wrote:
> > >>>>> Mikko Korpela <mikko.korp...@helsinki.fi>
> > >>>>> on Thu, 23 Feb 2017 14:02:58 +0200 writes:
> > 
> > > With new R releases soon to come, I suggest updating the
> > > Rcopyright macro in "doc/manual/R-defs.texi" to use year
> > > 2017.
> > 
> > Now this is an e-mail that *REALLY* does not fit to the R-devel
> > mailing list ... even though it is very very slightly related to
> > the R sources.
> > 
> > We do *not* want noise on R-devel, please.
> > 
> > (and let's continue this issue in private if you want)
> > Martin

> Martin, I'm confused. Should Mikko have sent his suggestion to the
> bugtracker, or just left it to the R Core Team to figure out for
> themselves? Are we not grateful for valid suggestions? What's the
> lesson here? Maybe I'm the only slow one on the list, but to me your
> reply would have been a bit more helpful if it told us what to do, not
> just what not to do...

Well, I really basically meant to do "nothing".

Mikko and I had continued the conversation in private, but here
is an excerpt of what I wrote to him:

 I think this is a case you should not even have reported at all,
 because
 
 - it is really not important: AFAIK, if you have
   copyright, the updating to new years is not important for legal
   reasons, but only for "propaganda".
 
 - (==> ) We don't ever update copyrights if we don't change the document,
   and even then do it often but not always.
 
 - more importantly: We don't clutter the SCM log files by *only*
   just changing such dates.  Some (even most?) of us, do update
   the copyright, if we do some real (non-trivial) change to the document,
   *and* notice that the last copyright is smaller than the
   current year.

I also told him that yes indeed we are grateful to learn about
errors including typos in manuals ambigous statements there, etc.

Note that sometimes even these may not need to go to R-devel
(with its 1000's of subscribers rather interested in the
development of R, future features, bug fixes or changes of R..)
and rather be reported to R core or individuals whom you may have
identified (via svn logs) as working with these documents recently.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Update copyright year in manuals

2017-02-23 Thread Martin Maechler
> Mikko Korpela 
> on Thu, 23 Feb 2017 14:02:58 +0200 writes:

> With new R releases soon to come, I suggest updating the
> Rcopyright macro in "doc/manual/R-defs.texi" to use year
> 2017.

Now this is an e-mail that *REALLY* does not fit to the R-devel
mailing list ... even though it is very very slightly related to
the R sources.

We do *not* want noise on R-devel, please.

(and let's continue this issue in private if you want)
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] possible improvement to ?with examples

2017-02-21 Thread Martin Maechler
> Ben Bolker 
> on Thu, 16 Feb 2017 15:37:13 -0500 writes:

> A querent on StackOverflow asked about the with() function
> http://stackoverflow.com/questions/42283479/why-when-to-use-with-function#42283479

> and asked about the example in ?with

> library(MASS)
> with(anorexia, {
>  anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
> family = gaussian)
>  summary(anorex.1)
> })

> which saves little or no typing relative to

> anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
> family = gaussian, data=anorexia)

> (I would argue that the latter is better practice anyway).

> Could we have something more sensible like

> with(mtcars,mpg[cyl==8 & disp>350])

> ?  (It could be contrasted directly with

> mtcars$mpg[mtcars$cyl==8 & mtcars$disp>350]

> )

I now have done something like the above, and have added a
\note{ .. } to warn about "over-use" of with().

Also added a link to Thomas Lumley's paper
  Thomas Lumley (2003)  \emph{Standard nonstandard evaluation rules}.
  \url{http://developer.r-project.org/nonstandard-eval.pdf}

> I'm happy to submit a bug report/patch if that seems appropriate.

Thank you, Ben!
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wish List: Extensions to the derivatives table

2017-02-20 Thread Martin Maechler
> 
> The issue is that without an extensible derivative table or the proposed 
> extensions, it is not possible to automatically produce (without manual 
> modification of the deriv3 output) a function that avoids catastrophic 
> cancellation regardless of the working range.
> Manual modification is not onerous as a one-time exercise, but can be time 
> consuming when it must be done numerous times, for example when evaluating 
> the impact of different parameterizations on parameter effects curvature.  
> The alternative of more flexible differentiation does not seem to be a 
> difficult addition to R.  In S+ (which does not have deriv3) it would simply 
> involve adding the following lines to the switch statement in D
> 
>   expm1 = make.call("*", make.call("exp", expr[[2]]), D(expr[[2]], name)),
>   log1p = make.call("/", D(expr[[2]], name), make.call("+", 1., expr[[2]])),
>   log2 = make.call("/", make.call("/", D(expr[[2]], name), expr[[2]]), 
> quote(log(2)) ),
>   log10 = make.call("/", make.call("/", D(expr[[2]], name), expr[[2]]), 
> quote(log(10)) ),
>   cospi = make.call("*", make.call("*", make.call("sinpi", expr[[2]]), 
> make.call("-", D(expr[[2]], name))), quote(pi)),
>   sinpi = make.call("*", make.call("*", make.call("cospi", expr[[2]]), 
> D(expr[[2]], name)), quote(pi)),
>   tanpi = make.call("/", make.call("*", D(expr[[2]], name), quote(pi)), 
> make.call("^", make.call("cospi", expr[[2]]), 2)),
> 
> Jerry

You are right, Jerry, it would be nice if R's derivative table
could be extended by the useR  using simple R code.
As Duncan Murdoch has mentioned already, this is now provided as
a byproduct of the functionality in the CRAN package 'nlsr'
{after that is tweaked, as you mentioned}, which is nice and
good to know (for all of us).

As one person who knows how important it may be to avoid cancellation,
I still would be willing to add to the "derivatives table" in
R's C source  if people like you provided  a (tested!) patch to
the source, which is in
https://svn.r-project.org/R/trunk/src/library/stats/src/deriv.c

Martin


> From: Avraham Adler [mailto:avraham.ad...@gmail.com]
> Sent: Friday, February 17, 2017 4:16 PM
> To: Jerry Lewis; r-devel@r-project.org
> Subject: Re: [Rd] Wish List: Extensions to the derivatives table
> 
> Hi.
> 
> Unless I'm misremembering, log, exp, sin, cos, and tan are all handled in 
> deriv3. The functions listed are  specially coded slightly more accurate 
> versions but can be substituted with native ones for which deriv/deriv3 will 
> work automatically. I believe that if you  write your functions using log(a + 
> 1) instead of log1p(a) or log(x) / log(2) instead of log2(x) deriv3 will work 
> fine.
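
A minimal illustration of that substitution (in R 3.3.x, i.e. before any
extension of the derivatives table):

  deriv(~ log(a + 1), "a")  # works; the derivative is 1/(a + 1)
  ## deriv(~ log1p(a), "a") # fails:
  ##   Function 'log1p' is not in the derivatives table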


> Thanks,
> Avi
> 
> On Fri, Feb 17, 2017 at 2:02 PM Jerry Lewis 
> > wrote:
> The derivative table resides in the function D.  In S+ that table is 
> extensible because it is written in the S language.  R is faster but less 
> flexible, since that table is programmed in C.  It would be useful if R 
> provided a mechanism for extending the derivative table, or barring that, 
> provided a broader table.  Currently unsupported mathematical functions of 
> one argument include expm1, log1p, log2, log10, cospi, sinpi, and tanpi.
> 
> While manual differentiation of these proposed additions is straight-forward, 
> their absence complicates what otherwise could be much simpler, such as using 
> deriv() or deriv3() to generate functions, for example to use as an nls model.
> 
> Thanks,

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] another fix for R crashes under enable-strict-barrier, lto, trunk@72156

2017-02-20 Thread Martin Maechler
> Hin-Tak Leung 
> on Sat, 11 Feb 2017 19:30:26 + writes:

> I haven't touched R for some 18 months, and so I have no
> idea if this is a recent problems or not; but it certainly
> did not segfault two years ago.  Since it has been
> crashing (segfault) under 'make check-all' for over a
> month, I reckon I'll have to look at it myself, to have it
> fixed.

> I have been having the ' --enable-memory-profiling 
--enable-strict-barrier --with-valgrind-instrumentation=2" options

> for perhaps a decade - because I work(ed) with people who
> like to write buggy code :-(. And I also ran 'make
> check-all' from time to time until two years ago.

> ./configure --enable-memory-profiling --enable-strict-barrier 
--enable-byte-compiled-packages --with-valgrind-instrumentation=2 --enable-lto

> current R dev crashes in make check-all . The fix is this:


> --- a/src/main/memory.c
> +++ b/src/main/memory.c
> @@ -3444,7 +3444,7 @@ R_xlen_t (XTRUELENGTH)(SEXP x) { return XTRUELENGTH(CHK2(x)); }
>  int  (IS_LONG_VEC)(SEXP x) { return IS_LONG_VEC(CHK2(x)); }

>  const char *(R_CHAR)(SEXP x) {
> -if(TYPEOF(x) != CHARSXP)
> +if(x && (TYPEOF(x) != CHARSXP))
> error("%s() can only be applied to a '%s', not a '%s'",
>   "CHAR", "CHARSXP", type2char(TYPEOF(x)));
>  return (const char *)CHAR(x);


> It is a fairly obvious fix to a bug since

> include/Rinternals.h:#define TYPEOF(x) ((x)->sxpinfo.type)

> and it was trying to de-reference "0->sxpinfo.type" (under
> --enable-strict-barrier I think).

Thank you  Hin-Tak!

I did not yet try to reproduce the segfault, and I am not
the expert here.  Just some remarks and a follow up question:

Typically, the above R_CHAR() is equivalent to the CHAR()
macro which is used in many places.  I _think_ that the bug is
that this is called with '0' instead of a proper SEXP in your
case, and the bug fix may be more appropriate "upstream", i.e.,
at the place where that call happens, rather than inside
R_CHAR.

Any chance you saw or can get more info about the location of
the crash, such as a stack trace ? 

The idiom  if(TYPEOF(x) == <T>SXP)  (for the various SEXP types)
is used in many places in the R sources, and I think we never
prepend it with an  'x && '  as you propose above.




> So there.

> While I subscribe to R-devel, I switched off delivery, so
> please CC if a response is required.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Grapics Device Resolution Limits

2017-02-10 Thread Martin Maechler
>>>>> Dario Strbenac <dstr7...@uni.sydney.edu.au>
>>>>> on Fri, 10 Feb 2017 02:00:08 + writes:

> Good day,
> Could the documentation of graphics devices give some explanation of how 
big the bitmap limits are? For example,

>> png("Figure1A.png", h = 7, w = 7, res = 1000, units = "cm")

> Results in Error: unable to start png() device, 

This is amazing to me.  I see

--
> png("Figure1A.png", h = 7, w = 7, res = 1000, units = "cm")
> plot(1)
> dev.off()
null device 
  1 
> file.info("Figure1A.png")[1:5]
  size isdir mode   mtime   ctime
Figure1A.png 41272 FALSE  644 2017-02-10 17:40:42 2017-02-10 17:40:42
> 
--

in three different versions of R I've tried (all were 64-bit Linux).
Note how *small* the file is.
Now, I've also tried a 32-bit version of Linux (Ubuntu 14.04 LTS) and get 
a similar result (not exactly the same number of bytes for the file size).
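
(For reference, the requested bitmap is not outlandishly large; in
pixels, each side is

  round(7 / 2.54 * 1000)  # 7 cm at 1000 ppi, i.e. 2756 pixels

so roughly a 2756 x 2756 image.)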


> but the help page of devices doesn't explain that there are any limits or how 
> they are determined. The wording of the error message could also be improved, 
> to explain that the resolution is too high or the dimensions are too large.

If one/some of those who can reproduce the problem in their
versions of R  provide (concise and not hard to read) patches to
the source of R, we'd probably gratefully accept them..

Martin Maechler

>> sessionInfo()
> R version 3.3.2 Patched (2017-02-07 r72138)
> Platform: i386-w64-mingw32/i386 (32-bit)
> Running under: Windows 7 (build 7601) Service Pack 1

> --
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD check error (interfacing to C API of other pkg)

2017-02-10 Thread Martin Maechler
> Therneau, Terry M , Ph D 
> on Thu, 9 Feb 2017 12:56:17 -0600 writes:

> Martyn,
> No, that didn't work.
> One other thing in the mix (which I don't think is the issue) is that I 
call one of the 
> C-entry points of expm.  So the DESCRIPTION file imports expm, the 
NAMESPACE file imports 
> expm, and the init.c file is

> #include "R.h"
> #include "R_ext/Rdynload.h"

> /* Interface to expm package. */
> typedef enum {Ward_2, Ward_1, Ward_buggy_octave} precond_type;
> void (*expm)(double *x, int n, double *z, precond_type precond_kind);
> void R_init_hmm(DllInfo *dll)
> {
>  expm = (void (*)) R_GetCCallable("expm", "expm");
> }

> I don't expect that this is the problem since I stole the
> above almost verbatim from the msm package.

> Terry T.

Hmm.  Yes, I can see that the CRAN package  msm  does do this, indeed.

It is interesting whether (and why) that does not produce any notes or even
warnings.
In principle, if you use the C API of 'expm'  you should use
  'LinkingTo: expm'

see *the* manual, specifically the section


https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Linking-to-native-routines-in-other-packages

and that section does mention that (unfortunately in my view)
you also should use 'Imports:' or 'Depends:' in addition to the 'LinkingTo:'
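
Concretely, that would amount to something like the following fields in
the package's DESCRIPTION (a sketch; the package name 'hmm' is only
inferred from the R_init_hmm above):

  Imports: expm
  LinkingTo: expm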

Note however that 'expm' would not have to be mentioned
in the NAMESPACE file unless your R functions do use some of
expm's R level functionality.


Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Ancient C /Fortran code linpack error

2017-02-09 Thread Martin Maechler

> > On 9 Feb 2017, at 16:00, Göran Broström <goran.brost...@umu.se> wrote:
> > 
> > In my package 'glmmML' I'm using old C code and linpack in the optimizing 
> > procedure. Specifically, one part of the code looks like this:
> > 
> >F77_CALL(dpoco)(*hessian, , , , work, info);
> >if (*info == 0){
> >F77_CALL(dpodi)(*hessian, , , det, );
> >
> > 
> > This usually works OK, but with an ill-conditioned data set (from a user of 
> > glmmML) it happened that the hessian was all nan. However, dpoco returned 
> > *info = 0 (no error!) and then the call to dpodi hanged R!
> > 
> > I googled for C and nan and found a work-around: Change 'if ...' to
> > 
> >   if (*info == 0 & (hessian[0][0] == hessian[0][0])){
> > 
> > which works as a test of hessian[0][0] (not) being NaN.
> > 
> > I'm using the .C interface for calling C code.
> > 
> > Any thoughts on how to best handle the situation? Is this a bug in dpoco? 
> > Is there a simple way to test for any NaNs in a vector?

> You should/could use macro R_FINITE to test each entry of the hessian.
> In package nleqslv I test for a "correct" jacobian like this in file 
> nleqslv.c in function fcnjac:

> for (j = 0; j < *n; j++)
> for (i = 0; i < *n; i++) {
> if( !R_FINITE(REAL(sexp_fjac)[(*n)*j + i]) )
> error("non-finite value(s) returned by jacobian 
> (row=%d,col=%d)",i+1,j+1);
> rjac[(*ldr)*j + i] = REAL(sexp_fjac)[(*n)*j + i];
> }

A minor hint  on that:  While REAL(.)  (or INTEGER(.) ...)  is really cheap in
the R sources themselves, that is not the case in package code.

Hence, not only nicer to read but even faster is

  double *fj = REAL(sexp_fjac);
  for (j = 0; j < *n; j++)
      for (i = 0; i < *n; i++) {
          if( !R_FINITE(fj[(*n)*j + i]) )
              error("non-finite value(s) returned by jacobian (row=%d,col=%d)", i+1, j+1);
          rjac[(*ldr)*j + i] = fj[(*n)*j + i];
      }


> There may be a more compact way with a macro in the R headers.
> I feel that If other code can't handle non-finite values: then test.

> Berend Hasselman

Indeed: do test.
Much better safe than going for speed and losing in rare cases! 

The latter is a recipe for airplanes falling out of the sky  ( ;-\ )
and is unfortunately used by some (in)famous "optimized" (fast but
sometimes wrong!!) Lapack/BLAS libraries.

The NEWS about the next version of R (3.4.0 due in April) has
a new 2-paragraph entry related to this:

-

  SIGNIFICANT USER-VISIBLE CHANGES:

[...]

* Matrix products now consistently bypass BLAS when the inputs have
  NaN/Inf values. Performance of the check of inputs has been
  improved. Performance when BLAS is used is improved for
  matrix/vector and vector/matrix multiplication (DGEMV is now used
  instead of DGEMM).

  One can now choose from alternative matrix product
  implementations via options(matprod = ).  The "internal"
  implementation is unoptimized but consistent in precision with
  other summation in R (uses long double accumulators).  "blas"
  calls BLAS directly for best performance, yet usually with
  undefined behavior for inputs with NaN/Inf.

-----
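
At the R level that choice looks like, e.g. (a sketch; see ?options in
R >= 3.4.0 for the full set of values):

  options(matprod = "internal") # consistent with R's other summations
  options(matprod = "blas")     # fastest; NaN/Inf behavior may be undefined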


Last but not least :

If you are not afraid of +/- Inf, but really only of NA/NaN's (as the OP said), 
then note that "THE manual" (= "Writing R Extensions") does mention
ISNAN(.) almost in the same place as the first occurrence of
R_FINITE(.).

Best regards,
Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Lack of 'seq_len' in 'head' in 'stopifnot'

2017-02-04 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Sat, 4 Feb 2017 10:18:33 + writes:

> Function 'stopifnot' in R devel r72104 has this.
>   head <- function(x, n = 6L) ## basically utils:::head.default()
> x[if(n < 0L) max(length(x) + n, 0L) else min(n, length(x))]

> If definition like utils:::head.default is intended, the index of 'x' 
should be wrapped in seq_len(...):
> x[seq_len(...)]
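
Spelled out, the amended helper then presumably reads:

  head <- function(x, n = 6L) ## basically utils:::head.default()
      x[seq_len(if(n < 0L) max(length(x) + n, 0L) else min(n, length(x)))]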

You are right ... that was "lost in translation".

As seq_len(1) is 1, and that seems to have been the only case
much exercised, nobody seems to have noticed the problem till
now ((this assumes people *would* report it if they noticed;
  yes, "hope dies last" ;-))

Thank you, this is amended now.
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-02-04 Thread Martin Maechler
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>> on Wed, 1 Feb 2017 16:17:06 + writes:

> On 'aggregate data.frame', the URL should be
> https://stat.ethz.ch/pipermail/r-help/2016-May/438631.html .

thank you. Yes, using 'drop' makes sense there where the result
is always "linear(ized)" or "one-dimensional".
For tapply() that's only the case for 1D-index.

> vector(typeof(ans)) (or vector(storage.mode(ans))) has
> length zero and can be used to initialize array.  

Yes ... unless in the case where ans is NULL.
You have convinced me, that is nicer.

> Instead of if(missing(default)) , if(identical(default,
> NA)) could be used. The documentation could then say, for
> example: "If default = NA (the default), NA of appropriate
> storage mode (0 for raw) is automatically used."

After some thought (and experiments), I have reverted and no
longer use if(missing). You are right that it is not needed
(and even potentially confusing) here.

Changes are in svn c72106.
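
A small illustration of the resulting behavior (a sketch, using the
R-devel version of tapply()):

  f <- factor(c("a","b","a"), levels = c("a","b","c"))
  tapply(1:3, f, sum)               # cell "c" is NA (the default)
  tapply(1:3, f, sum, default = 0L) # cell "c" is 0L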

Martin Maechler


> 
> On Wed, 1/2/17, Martin Maechler
> <maech...@stat.math.ethz.ch> wrote:

>  Subject: Re: [Rd] RFC: tapply(*, ..., init.value = NA)

>  Cc: R-devel@r-project.org Date: Wednesday, 1 February,
> 2017, 12:14 AM
 
>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Tue, 31 Jan 2017 15:43:53 + writes:

>> Function 'aggregate.data.frame' in R has taken a
>> different route. With drop=FALSE, the function is also
>> applied to subset corresponding to combination of
>> grouping variables that doesn't appear in the data
>> (example 2 in
>> https://stat.ethz.ch/pipermail/r-devel/2017-January/073678.html).

> Interesting point (I couldn't easily find 'the example 2'
> though).  However, aggregate.data.frame() is a
> considerably more sophisticated function and one goal was
> to change tapply() as little as possible for compatibility
> (and maintenance!) reasons .

> [snip]

>> With the code using if(missing(default)) , I consider the
>> stated default value of 'default', default = NA ,
>> misleading because the code doesn't use it.

> I know and I also had thought about it and decided to keep
> it in the spirit of "self documentation" because "in
> spirit", the default still *is* NA.

>> Also, tapply(1:3, 1:3, as.raw) is not the same as
>> tapply(1:3, 1:3, as.raw, default = NA) .  The accurate
>> statement is the code in if(missing(default)) , but it
>> involves the local variable 'ans'.

> exactly.  But putting that whole expression in there would
> look confusing to those using str(tapply), args(tapply) or
> similar inspection to quickly get a glimpse of the
> function user "interface".  That's why we typically don't
> do that and rather slightly cheat with the formal default,
> for the above "didactical" purposes.

> If you are puristic about this, then missing() should
> almost never be used when the function argument has a
> formal default.

> I don't have a too strong opinion here, and we do have
> quite a few other cases, where the formal default argument
> is not always used because of if(missing(.))  clauses.

> I think I could be convinced to drop the '= NA' from the
> formal argument list..


>> As far as I know, the result of function 'array' in is
>> not a classed object and the default method of `[<-` will
>> be used in the 'tapply' code portion.

>> As far as I know, the result of 'lapply' is a list
>> without class. So, 'unlist' applied to it uses the
>> default method and the 'unlist' result is a vector or a
>> factor.

> You may be right here ((or not: If a package author makes
> array() into an S3 generic and defines S3method(array, *)
> and she or another make tapply() into a generic with
> methods, are we really sure that this code would not be
> used ??))

> still, the as.raw example did not easily work without a
> warning when using as.vector() .. or similar.

>> With the change, the result of

>> tapply(1:3, 1:3, factor, levels=3:1)

>> is of mode "character". The value is from the internal
>> code, not from the factor levels. It is worse than before
>> the change, where it is really the internal code,
>> integer.

> I agree that this change is not desirable.  One could
> argue that it was quite a

Re: [Rd] Typos in manuals

2017-02-02 Thread Martin Maechler
> Mikko Korpela 
> on Wed, 1 Feb 2017 12:16:49 +0200 writes:

> I found some trivial typos, mostly unmatched parentheses, in the R 
> manuals. More information and suggested fixes are in the attached diff 
file.
Thank you very much!
They have been applied to the R-devel and R-patched source now.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unexpected EOF in R-patched_2017-01-30

2017-02-01 Thread Martin Maechler
> Avraham Adler 
> on Tue, 31 Jan 2017 16:07:20 -0500 writes:

> On Tue, Jan 31, 2017 at 3:30 PM, peter dalgaard  wrote:
>> 
>>> On 31 Jan 2017, at 18:56 , Avraham Adler  
wrote:
>>> 
>>> Hello.
>>> 
>>> When trying to unpack today's version of R-patched,
>> 
>> From which source? The files from cran.r-project.org seem OK, both
those in src/base-prerelease and those from ETHZ.

>> Also, is it not "tar -xfz" when reading a compressed file?

Recent (for several years) versions of tar (on Linux at least)
do not need the compression flag ('z') anymore: they guess the
compression correctly from the file.

>> 
>> -pd

>> From 

The last two of the daily R-patched*.tar.gz
unpack flawlessly for me as well.

Could it be that your Windows(?) version of tar (or the file
system or ???) is the problem?

Or the file was corrupted during download?

Here are the md5sum s  from the server itself for the last three snapshots:

388b607afe732c92442dbb49845fe377  /ftp/R/R-patched_2017-01-31.tar.gz
7daea59067454311818df1c75971a485  /ftp/R/R-patched_2017-01-30.tar.gz
9ddad833a455973631920c70b6da5d6e  /ftp/R/R-patched_2017-01-29.tar.gz



> Also, while passing z is not in the instructions given in Installation
> and Administration [1], I tried passing -xzf and it did not work. I
> believe f has to be last if the file name follows immediately.

> [1]  


> Thanks,

> Avi

>>> I get the following error:
>>> 
>>> C:\R>tar -xf R-patched_2017-01-30.tar.gz
>>> 
>>> gzip: stdin: unexpected end of file
>>> tar: Unexpected EOF in archive
>>> tar: Unexpected EOF in archive
>>> tar: Error is not recoverable: exiting now
>>> 
>>> I got the same error for R-patched_2017-01-30.tar.gz but not for 
R-3.3.2.tar.gz.
>>> 
>>> Thank you,
>>> 
>>> Avi
>>> 
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: pd@cbs.dk  Priv: pda...@gmail.com
>> 

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-31 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Tue, 31 Jan 2017 15:43:53 + writes:

> Function 'aggregate.data.frame' in R has taken a different route. With 
drop=FALSE, the function is also applied to subset corresponding to combination 
of grouping variables that doesn't appear in the data (example 2 in 
https://stat.ethz.ch/pipermail/r-devel/2017-January/073678.html).

Interesting point (I couldn't easily find 'the example 2' though).
However, aggregate.data.frame() is a considerably more
sophisticated function and one goal was to change tapply() as
little as possible for compatibility (and maintenance!) reasons .

> Because 'default' is used only when simplification happens, putting 'default' 
> after 'simplify' in the argument list may be more logical. 

Yes, from this point of view, you are right; I had thought about
that too; on the other hand, it belongs "closely" to the 'FUN'
and I think that's why I had decided not to change the proposal..

> Anyway, it doesn't affect call to 'tapply' because the argument 'default' 
> must be specified by name.

Exactly.. so we keep the order as is.

> With the code using
>if(missing(default)) ,
> I consider the stated default value of 'default',
>default = NA ,
> misleading because the code doesn't use it. 

I know and I also had thought about it and decided to keep it 
in the spirit of "self documentation" because  "in spirit", the
default still *is* NA.

> Also,
>  tapply(1:3, 1:3, as.raw)
> is not the same as
>  tapply(1:3, 1:3, as.raw, default = NA) .
> The accurate statement is the code in
> if(missing(default)) ,
> but it involves the local variable 'ans'.

exactly.  But putting that whole expression in there would look
confusing to those using  str(tapply), args(tapply) or similar
inspection to quickly get a glimpse of the function user "interface".
That's why we typically don't do that and rather slightly cheat
with the formal default, for the above "didactical" purposes.

If you are puristic about this, then missing() should almost never
be used when the function argument has a formal default.

I don't have a too strong opinion here, and we do have quite a
few other cases, where the formal default argument is not always
used because of   if(missing(.))  clauses.

I think I could be convinced to drop the '= NA' from the formal
argument list..


> As far as I know, the result of function 'array' in is not a classed 
object and the default method of  `[<-` will be used in the 'tapply' code 
portion.

> As far as I know, the result of 'lapply' is a list without class. So, 
'unlist' applied to it uses the default method and the 'unlist' result is a 
vector or a factor.

You may be right here
  ((or not:  If a package author makes array() into an S3 generic and defines
S3method(array, *) and she or another make tapply() into a
generic with methods,  are we really sure that this code
would not be used ??))

still, the as.raw example did not easily work without a warning
when using as.vector() .. or similar.

> With the change, the result of

> tapply(1:3, 1:3, factor, levels=3:1)

> is of mode "character". The value is from the internal code, not from the 
factor levels. It is worse than before the change, where it is really the 
internal code, integer.

I agree that this change is not desirable.
One could argue that it was quite a "lucky coincidence" that the previous
code returned the internal integer codes though..


> In the documentation, the description of argument 'simplify' says: "If 
'TRUE' (the default), then if 'FUN' always returns a scalar, 'tapply' returns 
an array with the mode of the scalar."


> To initialize array, a zero-length vector can also be used.

yes, of course; but my  ans[0L][1L]  had the purpose to get the
correct mode specific version of NA .. which works for raw (by
getting '00' because "raw" has *no* NA!).

So it seems I need an additional   !is.factor(ans)  there ...
a bit ugly.


-

> For 'xtabs', I think that it is better if the result has storage mode 
> "integer" if 'sum' results are of storage mode "integer", as in R 3.3.2. 

you are right, that *is* preferable

>  As 'default' argument for 'tapply', 'xtabs' can use 0L, or use 0L or 0 
> depending on storage mode of the summed quantity.

indeed, that will be an improvement there!
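
A sketch of the point: with integer sums, an integer 'default' keeps the
result's storage mode "integer", whereas the double 0 coerces it:

  typeof(tapply(1:6, rep(1:2, 3), sum, default = 0L)) # "integer"
  typeof(tapply(1:6, rep(1:2, 3), sum, default = 0 )) # "double"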

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] no visible global function definition for ‘par’

2017-01-31 Thread Martin Maechler
>>>>> Dirk Eddelbuettel <e...@debian.org>
>>>>> on Mon, 30 Jan 2017 20:50:19 -0600 writes:

> On 30 January 2017 at 09:58, Kevin Ushey wrote:
> | The correct thing to do is indeed import any functions from any R 
packages
> | you use, base or otherwise. The simplest fix, if you don't want to
> | selectively import such a large range of functions, is to simply add 
e.g.
> | 
> | import(utils)
> | import(stats)
> | ... etc ...
> | 
> | to your NAMESPACE file.

> Or do what R CMD check suggested and import the ones used, rather than 
all.

> Which is what I had quoted earlier:

> | Consider adding
> | 
> |   importFrom("grDevices", "as.raster", "dev.cur", "dev.off", "gray",
> |  "heat.colors", "jpeg", "palette", "pdf", "png", "rainbow",
> |  "terrain.colors", "tiff")
> |   importFrom("graphics", "abline", "axis", "barplot", "box", "boxplot",
> |  "image", "layout", "legend", "lines", "mtext", "par",
> |  "plot", "plot.new", "points", "rasterImage", "strwidth",
> |  "text", "title")
> |   importFrom("stats", "TukeyHSD", "acf", "aov", "ccf", "coefficients",
> |  "drop1", "end", "fft", "median", "model.tables",
> |  "na.action", "na.omit", "pf", "ts", "var")
> |   importFrom("utils", "read.table", "str", "tail", "write.table")
> | 
> | to your NAMESPACE file.

> I find this preferable and quite appreciate that R CMD check provides it.
> Dirk

yes, and that is not only Dirk :

It *is* highly preferable and recommended, also in *the*
reference manual ("Writing R Extensions", aka WRE) for reasons
of
  - efficiency,
  - modularity and "self-documentation"
  - much better control against accidental name clashes,

There are very few exceptions where importing a whole namespace
makes sense and the above base packages are typically never part
of these exceptions.

Martin Maechler
ETH Zurich and R Core

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-28 Thread Martin Maechler
>>>>> Henrik Bengtsson <henrik.bengts...@gmail.com>
>>>>> on Fri, 27 Jan 2017 09:46:15 -0800 writes:

> On Fri, Jan 27, 2017 at 12:34 AM, Martin Maechler
> <maech...@stat.math.ethz.ch> wrote:
>> 
>> > On Jan 26, 2017 07:50, "William Dunlap via R-devel"
>> > <r-devel@r-project.org> wrote:
>> 
>> > It would be cool if the default for tapply's init.value
>> > could be FUN(X[0]), so it would be 0 for FUN=sum or
>> > FUN=length, TRUE for FUN=all, -Inf for FUN=max, etc.
>> > But that would take time and would break code for which
>> > FUN did not work on length-0 objects.
>> 
>> > Bill Dunlap
>> > TIBCO Software
>> > wdunlap tibco.com
>> 
>> I had the same idea (after my first post), so I agree
>> that would be nice. One could argue it would take time
>> only if the user is too lazy to specify the value, and we
>> could use tryCatch(FUN(X[0]), error = function(e) NA) to
>> safeguard against those functions that fail for 0 length arg.
>> 
>> But I think the main reason for _not_ setting such a
>> default is back-compatibility.  In my proposal, the new
>> argument would not be any change by default and so all
>> current uses of tapply() would remain unchanged.
>> 
>>>>>>> Henrik Bengtsson <henrik.bengts...@gmail.com> on
>>>>>>> Thu, 26 Jan 2017 07:57:08 -0800 writes:
>> 
>> > On a related note, the storage mode should try to match
>> > ans[[1]] (or unlist:ed ans) when allocating 'ansmat' to
>> > avoid coercion and hence a full copy.
>> 
>> Yes, related indeed; and would fall "in line" with Bill's
>> idea.  OTOH, it could be implemented independently, by
>> something like
>> 
>> if(missing(init.value))
>>   init.value <-
>>     if(length(ans)) as.vector(NA, mode=storage.mode(ans[[1]]))
>>     else NA

> I would probably do something like:

>   ans <- unlist(ans, recursive = FALSE, use.names = FALSE)
>   if (length(ans)) storage.mode(init.value) <- storage.mode(ans[[1]])
>   ansmat <- array(init.value, dim = extent, dimnames = namelist)

> instead.  That completely avoids having to use missing() and the value
> of 'init.value' will be coerced later if not done upfront.  use.names
> = FALSE speeds up unlist().

Thank you, Henrik.
That's a good idea to do the unlist() first, and with 'use.names=FALSE'.
I'll copy that.

On the other hand, "brutally" modifying  'init.value' (now called 'default')
even when the user has specified it is not acceptable I think.
You are right that it would be coerced anyway subsequently, but
the coercion will happen in whatever method of  `[<-` will be
appropriate.
Good S3 and S4 programmers will write such methods for their classes.

For that reason, I'm even more conservative now, only fiddle in
case of an atomic 'ans' and make use of the corresponding '['
method rather than as.vector(.) ... because that will fulfill
the following new regression test {not fulfilled in current R}:

identical(tapply(1:3, 1:3, as.raw),
  array(as.raw(1:3), 3L, dimnames=list(1:3)))

Also, I've done a few more things -- treating if(.) . else . as a
function call, etc  and now committed as  rev 72040  to
R-devel... really wanting to get this out.

We can bet on whether there will be ripples in (visible) package space;
I give it a relatively high chance of no ripples (and a much higher
chance of problems with the more aggressive proposal...)

Thank you again, for your "thinking along" and constructive
suggestions.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-27 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Fri, 27 Jan 2017 16:36:59 + writes:

> The "no factor combination" case is distinguishable by 'tapply' with 
simplify=FALSE.
>> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]), N=3)
>> D2 <- D2[-c(1,5), ]
>> DN <- D2; DN[1,"N"] <- NA
>> with(DN, tapply(N, list(n,L), FUN=sum, simplify=FALSE))
>      A    B    C    D    E    F
> 1    NA   6    NULL NULL NULL NULL
> 2    NULL NULL 3    6    NULL NULL
> 3    NULL NULL NULL NULL 6    6

Yes, I know that simplify=FALSE behaves differently: it returns
a list with dim & dimnames, sometimes also called a "list-matrix"
... and it *can* be used instead, but to be useful it needs to be
post-processed, and that is overall somewhat inefficient and ugly.
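
For illustration, the kind of post-processing needed (a sketch,
reusing 'DN' from the quoted example above):

   lm <- with(DN, tapply(N, list(n,L), FUN = sum, simplify = FALSE))
   m  <- array(0, dim = dim(lm), dimnames = dimnames(lm))
   ok <- !vapply(lm, is.null, NA)  # cells with at least one observation
   m[ok] <- unlist(lm[ok])         # keeps the true NA at ["1","A"]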


> There is an old related discussion starting on 
https://stat.ethz.ch/pipermail/r-devel/2007-November/047338.html .

Thank you, indeed, for finding that. There Andrew Robinson did
raise the same issue, but his proposed solution was not much
back compatible and I think was primarily dismissed because of that.

Martin

> --
> Last week, we've talked here about "xtabs(), factors and NAs",
-> https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html

> In the mean time, I've spent several hours on the issue
> and also committed changes to R-devel "in two iterations".

> In the case there is a *Left* hand side part to xtabs() formula,
> see the help page example using 'esoph',
> it uses  tapply(...,  FUN = sum)   and
> I now think there is a missing feature in tapply() there, which
> I am proposing to change. 

> Look at a small example:

>> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]), 
N=3)[-c(1,5), ]; xtabs(~., D2)
> , , N = 3

>    L
> n   A B C D E F
>   1 1 2 0 0 0 0
>   2 0 0 1 2 0 0
>   3 0 0 0 0 2 2

>> DN <- D2; DN[1,"N"] <- NA; DN
>    n L  N
> 2  1 A NA
> 3  1 B  3
> 4  1 B  3
> 6  2 C  3
> 7  2 D  3
> 8  2 D  3
> 9  3 E  3
> 10 3 E  3
> 11 3 F  3
> 12 3 F  3
>> with(DN, tapply(N, list(n,L), FUN=sum))
>    A  B  C  D  E  F
> 1 NA  6 NA NA NA NA
> 2 NA NA  3  6 NA NA
> 3 NA NA NA NA  6  6
>> 

> and as you can see, the resulting matrix has NAs, all the same
> NA_real_, but semantically of two different kinds:

> 1) at ["1", "A"], the  NA  comes from the NA in 'N'
> 2) all other NAs come from the fact that there is no such factor 
combination
> *and* from the fact that tapply() uses

> array(dim = .., dimnames = ...)

> i.e., initializes the array with NAs  (see definition of 'array').

> My proposition is the following patch to  tapply(), adding a new
> option 'init.value':

> 
> ----------------------------------------------------------------
> 
> -tapply <- function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
> +tapply <- function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify 
= TRUE)
> {
> FUN <- if (!is.null(FUN)) match.fun(FUN)
> if (!is.list(INDEX)) INDEX <- list(INDEX)
> @@ -44,7 +44,7 @@
> index <- as.logical(lengths(ans))  # equivalently, lengths(ans) > 0L
> ans <- lapply(X = ans[index], FUN = FUN, ...)
> if (simplify && all(lengths(ans) == 1L)) {
> - ansmat <- array(dim = extent, dimnames = namelist)
> + ansmat <- array(init.value, dim = extent, dimnames = namelist)
> ans <- unlist(ans, recursive = FALSE)
> } else {
> ansmat <- array(vector("list", prod(extent)),

> 
> ----------------------------------------------------------------

> With that, I can set the initial value to '0' instead of array's
> default of NA :

>> with(DN, tapply(N, list(n,L), FUN=sum, init.value=0))
>    A B C D E F
> 1 NA 6 0 0 0 0
> 2  0 0 3 6 0 0
> 3  0 0 0 0 6 6
>> 

> which now has 0 counts and NA  as is desirable to be used inside
> xtabs().

> All fine... and would not be worth a posting to R-devel,
> except for this:

> The change will not be 100% back compatible -- by necessity: any new 
argument for
> tapply() will make that argument name not available to be
> specified (via '...') for 'FUN'.  The new function would be

>> str(tapply)
> function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)  

> where the '...' are passed FUN(),  and with the new signature,
> 'init.value' then won't be passed to FUN  "anymore" (compared to
> R <= 3.3.x).

> For that reason, we could use   'INIT.VALUE' instead (possibly decreasing
> the probability the arg name is used in other functions).


> Opinions?

> Thank you in advance,
> Martin

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Undefined behavior of head() and tail() with n = 0

2017-01-27 Thread Martin Maechler
Dear Florent,

thank you for striving to clearly disentangle and present the
issue below.
That is a nice "role model" way of approaching such topics!

>>>>> Florent Angly <florent.an...@gmail.com>
>>>>> on Fri, 27 Jan 2017 10:24:39 +0100 writes:

> Martin, I agree with you that +0 and -0 should generally be treated as
> equal, and R does a fine job in this respect. The Wikipedia article on
> signed zero (https://en.wikipedia.org/wiki/Signed_zero) echoes this
> view but also highlights that +0 and -0 can be treated differently in
> particular situations, including their interpretation as mathematical
> limits (as in the 1/-0 case). Indeed, the main question here is
> whether head() and tail() represent a special case that would benefit
> from differentiating between +0 and -0.

> We can break down the discussion into two problems:
> A/ the discrepancy between the implementation of R head() and tail()
> and the documentation of these functions (where the use of zero is not
> documented and thus not permissible),

Ehm, no, in R (and many other software systems),

  "not documented" does *NOT* entail "not permissible"


> B/ the discrepancy between the implementation of R head() and tail()
> and their GNU equivalent (which allow zeros and differentiate between
> -0 and +0, i.e. head takes "0" and "-0", tail takes "0" and "+0").

This discrepancy, as you mention later, comes from the fact that,
basically, these arguments are strings in the Unix tools (GNU being a
special case of Unix, here) and integers in R.

Below, I'm giving my personal view of the issue:

> There are several possible solutions to address these discrepancies:

> 1/ Leave the code as-is but document its behavior with respect to zero
> (zeros allowed, with negative zeros treated like positive zeros).
> Advantages: This is the path of least resistance, and discrepancy A is 
fixed.
> Disadvantages: Discrepancy B remains (but is documented).

That would be my "clear" choice.


> 2/ Leave the documentation as-is but reflect this in code by not
> allowing zeros at all.
> Advantages: Discrepancy A is fixed.
> Disadvantages: Discrepancy B remains in some form (but is documented).
> Need to deprecate the usage of +0 (which was not clearly documented
> but may have been assumed by users).

2/ looks "uniformly inferior" to 1/ to me


> 3/ Update the code and documentation to differentiate between +0 and -0.
> Advantages: In my eyes, this is the ideal solution since discrepancy A
> and (most of) B are resolved.
> Disadvantages: It is unclear how to implement this solution and the
> implications it may have on backward compatibility:
> a/ Allow -0 (as double). But is it supported on all platforms used
> by R (see ?Arithmetic)? William has raised the issue that negative
> zero cannot be represented as an integer. Should head() and tail()
> then strictly check double input (while forbidding integers)?
> b/ The input could always be as character. This would allow to
> mirror even more closely GNU tail (where the prefix "+" is used to
> invert the meaning of n). This probably involves a fair amount of work
> and careful handling of deprecation.

3/ involves quite a few complications, and in my view, your
   advantages do not come close to outweighing the drawbacks.


> On 26 January 2017 at 16:51, William Dunlap <wdun...@tibco.com> wrote:
>> In addition, signed zeroes only exist for floating point numbers - the
>> bit patterns for as.integer(0) and as.integer(-0) are identical.

indeed!

>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>> 
>> 
>> On Thu, Jan 26, 2017 at 1:53 AM, Martin Maechler
>> <maech...@stat.math.ethz.ch> wrote:
>>>>>>>> Florent Angly <florent.an...@gmail.com>
>>>>>>>> on Wed, 25 Jan 2017 16:31:45 +0100 writes:
>>> 
>>> > Hi all,
>>> > The documentation for head() and tail() describes the behavior of
>>> > these generic functions when n is strictly positive (n > 0) and
>>> > strictly negative (n < 0). How these functions work when given a zero
>>> > value is not defined.
>>> 
>>> > Both GNU command-line utilities head and tail behave differently with 
+0 and -0:
>>> > http://man7.org/linux/man-pages/man1/head.1.html
>>> > http://man7.org/linux/man-pages/man1/tail.1.html
>>

Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-27 Thread Martin Maechler

> On Jan 26, 2017 07:50, "William Dunlap via R-devel" 
<r-devel@r-project.org>
> wrote:

> It would be cool if the default for tapply's init.value could be
> FUN(X[0]), so it would be 0 for FUN=sum or FUN=length, TRUE for
> FUN=all, -Inf for FUN=max, etc.  But that would take time and would
> break code for which FUN did not work on length-0 objects.

> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com

I had the same idea (after my first post), so I agree that would
be nice. One could argue it would take time only if the user is too lazy
to specify the value,  and we could use 
   tryCatch(FUN(X[0]), error = function(e) NA)
to safeguard against those functions that fail for 0 length arg.

But I think the main reason for _not_ setting such a default is
back-compatibility.  In my proposal, the new argument would not
be any change by default and so all current uses of tapply()
would remain unchanged.

>>>>> Henrik Bengtsson <henrik.bengts...@gmail.com>
>>>>> on Thu, 26 Jan 2017 07:57:08 -0800 writes:

> On a related note, the storage mode should try to match ans[[1]] (or
> unlist:ed ans) when allocating 'ansmat' to avoid coercion and hence a full
> copy.

Yes, related indeed; and would fall "in line" with Bill's idea.
OTOH, it could be implemented independently,
by something like

   if(missing(init.value))
 init.value <-
   if(length(ans)) as.vector(NA, mode=storage.mode(ans[[1]]))
   else NA

.

A colleague proposed to use the shorter argument name 'default'
instead of 'init.value', which indeed may be more natural and
still not too often used as "non-first" argument in  FUN(.).

Thank you for the constructive feedback!
Martin

> On Thu, Jan 26, 2017 at 2:42 AM, Martin Maechler
> <maech...@stat.math.ethz.ch> wrote:
>> Last week, we've talked here about "xtabs(), factors and NAs",
-> https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html
>> 
>> In the mean time, I've spent several hours on the issue
>> and also committed changes to R-devel "in two iterations".
>> 
>> In the case there is a *Left* hand side part to xtabs() formula,
>> see the help page example using 'esoph',
>> it uses  tapply(...,  FUN = sum)   and
>> I now think there is a missing feature in tapply() there, which
>> I am proposing to change.
>> 
>> Look at a small example:
>> 
>>> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]),
> N=3)[-c(1,5), ]; xtabs(~., D2)
>> , , N = 3
>> 
>>    L
>> n   A B C D E F
>>   1 1 2 0 0 0 0
>>   2 0 0 1 2 0 0
>>   3 0 0 0 0 2 2
>> 
>>> DN <- D2; DN[1,"N"] <- NA; DN
>>    n L  N
>> 2  1 A NA
>> 3  1 B  3
>> 4  1 B  3
>> 6  2 C  3
>> 7  2 D  3
>> 8  2 D  3
>> 9  3 E  3
>> 10 3 E  3
>> 11 3 F  3
>> 12 3 F  3
>>> with(DN, tapply(N, list(n,L), FUN=sum))
>>    A  B  C  D  E  F
>> 1 NA  6 NA NA NA NA
>> 2 NA NA  3  6 NA NA
>> 3 NA NA NA NA  6  6
>>> 
>> 
>> and as you can see, the resulting matrix has NAs, all the same
>> NA_real_, but semantically of two different kinds:
>> 
>> 1) at ["1", "A"], the  NA  comes from the NA in 'N'
>> 2) all other NAs come from the fact that there is no such factor
> combination
>> *and* from the fact that tapply() uses
>> 
>> array(dim = .., dimnames = ...)
>> 
>> i.e., initializes the array with NAs  (see definition of 'array').
>> 
>> My proposition is the following patch to  tapply(), adding a new
>> option 'init.value':
>> 
>> 
>> ----------------------------------------------------------------
>> 
>> -tapply <- function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
>> +tapply <- function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify
> = TRUE)
>> {
>> FUN <- if (!is.null(FUN)) match.fun(FUN)
>> if (!is.list(INDEX)) INDEX <- list(INDEX)
>> @@ -44,7 +44,7 @@
>> index <- as.logical(lengths(ans))  # equivalently, lengths(ans) > 0L
>> ans <- lapply(X = ans[index], FUN = FUN, ...)
>> if (simplify && all(lengths(ans) == 1L)) {
>> -   ansmat <- array(dim = extent, dimnames = namelist)
>> +   ansmat <- array(init.value, dim = extent, dimnames = namelist)

[Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-26 Thread Martin Maechler
Last week, we've talked here about "xtabs(), factors and NAs",
 ->  https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html

In the mean time, I've spent several hours on the issue
and also committed changes to R-devel "in two iterations".

In the case there is a *Left* hand side part to xtabs() formula,
see the help page example using 'esoph',
it uses  tapply(...,  FUN = sum)   and
I now think there is a missing feature in tapply() there, which
I am proposing to change. 

Look at a small example:

> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]), N=3)[-c(1,5), 
> ]; xtabs(~., D2)
, , N = 3

   L
n   A B C D E F
  1 1 2 0 0 0 0
  2 0 0 1 2 0 0
  3 0 0 0 0 2 2

> DN <- D2; DN[1,"N"] <- NA; DN
   n L  N
2  1 A NA
3  1 B  3
4  1 B  3
6  2 C  3
7  2 D  3
8  2 D  3
9  3 E  3
10 3 E  3
11 3 F  3
12 3 F  3
> with(DN, tapply(N, list(n,L), FUN=sum))
   A  B  C  D  E  F
1 NA  6 NA NA NA NA
2 NA NA  3  6 NA NA
3 NA NA NA NA  6  6
>  

and as you can see, the resulting matrix has NAs, all the same
NA_real_, but semantically of two different kinds:

1) at ["1", "A"], the  NA  comes from the NA in 'N'
2) all other NAs come from the fact that there is no such factor combination
   *and* from the fact that tapply() uses

   array(dim = .., dimnames = ...)

i.e., initializes the array with NAs  (see definition of 'array').

My proposition is the following patch to  tapply(), adding a new
option 'init.value':

----------------------------------------------------------------
 
-tapply <- function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
+tapply <- function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = 
TRUE)
 {
 FUN <- if (!is.null(FUN)) match.fun(FUN)
 if (!is.list(INDEX)) INDEX <- list(INDEX)
@@ -44,7 +44,7 @@
 index <- as.logical(lengths(ans))  # equivalently, lengths(ans) > 0L
 ans <- lapply(X = ans[index], FUN = FUN, ...)
 if (simplify && all(lengths(ans) == 1L)) {
-   ansmat <- array(dim = extent, dimnames = namelist)
+   ansmat <- array(init.value, dim = extent, dimnames = namelist)
ans <- unlist(ans, recursive = FALSE)
 } else {
ansmat <- array(vector("list", prod(extent)),

----------------------------------------------------------------

With that, I can set the initial value to '0' instead of array's
default of NA :

> with(DN, tapply(N, list(n,L), FUN=sum, init.value=0))
   A B C D E F
1 NA 6 0 0 0 0
2  0 0 3 6 0 0
3  0 0 0 0 6 6
> 

which now has 0 counts and NA  as is desirable to be used inside
xtabs().

All fine... and would not be worth a posting to R-devel,
except for this:

The change will not be 100% back compatible -- by necessity: any new argument 
for
tapply() will make that argument name not available to be
specified (via '...') for 'FUN'.  The new function would be

> str(tapply)
function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)  

where the '...' are passed FUN(),  and with the new signature,
'init.value' then won't be passed to FUN  "anymore" (compared to
R <= 3.3.x).

For that reason, we could use   'INIT.VALUE' instead (possibly decreasing
the probability the arg name is used in other functions).
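
To make the incompatibility concrete, consider a (hypothetical) FUN
that itself has an 'init.value' argument:

   f <- function(x, init.value = 1) sum(x) + init.value
   tapply(1:3, rep(1,3), f, init.value = 10)
   ## R <= 3.3.x   : 'init.value' is passed on to f() via '...' ==> 16
   ## new signature: tapply() consumes it itself                ==>  7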


Opinions?

Thank you in advance,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Undefined behavior of head() and tail() with n = 0

2017-01-26 Thread Martin Maechler
>>>>> Florent Angly <florent.an...@gmail.com>
>>>>> on Wed, 25 Jan 2017 16:31:45 +0100 writes:

> Hi all,
> The documentation for head() and tail() describes the behavior of
> these generic functions when n is strictly positive (n > 0) and
> strictly negative (n < 0). How these functions work when given a zero
> value is not defined.

> Both GNU command-line utilities head and tail behave differently with +0 
and -0:
> http://man7.org/linux/man-pages/man1/head.1.html
> http://man7.org/linux/man-pages/man1/tail.1.html

> Since R supports signed zeros (1/+0 != 1/-0) 

whoa, whoa, .. slow down --  The above is misleading!

Rather read in  ?Arithmetic (*the* reference to consult for such issues),
where the 2nd part of the following section

 || Implementation limits:
 || 
 ||  [..]
 || 
 ||  Another potential issue is signed zeroes: on IEC 60559 platforms
 ||  there are two zeroes with internal representations differing by
 ||  sign.  Where possible R treats them as the same, but for example
 ||  direct output from C code often does not do so and may output
 ||  ‘-0.0’ (and on Windows whether it does so or not depends on the
 ||  version of Windows).  One place in R where the difference might be
 ||  seen is in division by zero: ‘1/x’ is ‘Inf’ or ‘-Inf’ depending on
 ||  the sign of zero ‘x’.  Another place is ‘identical(0, -0, num.eq =
 ||  FALSE)’.

says the *contrary* ( __Where possible R treats them as the same__ ):
We do _not_ want to distinguish -0 and +0,
but there are cases where it is unavoidable.

And there are good reasons (mathematics !!) for this.
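
Both places mentioned in the quote can be seen directly at the prompt:

   1/0                               #  Inf
   1/-0                              # -Inf : here the sign of zero matters
   identical(0, -0)                  # TRUE
   identical(0, -0, num.eq = FALSE)  # FALSE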

I'm pretty sure that it would be quite a mistake to start
differentiating it here...  but of course we can continue
discussing here if you like.

Martin Maechler
ETH Zurich and R Core


> and the R head() and tail() functions are modeled after
> their GNU counterparts, I would expect the R functions to
> distinguish between +0 and -0

>> tail(1:5, n=0)
> integer(0)
>> tail(1:5, n=1)
> [1] 5
>> tail(1:5, n=2)
> [1] 4 5

>> tail(1:5, n=-2)
> [1] 3 4 5
>> tail(1:5, n=-1)
> [1] 2 3 4 5
>> tail(1:5, n=-0)
> integer(0)  # expected 1:5

>> head(1:5, n=0)
> integer(0)
>> head(1:5, n=1)
> [1] 1
>> head(1:5, n=2)
> [1] 1 2

>> head(1:5, n=-2)
> [1] 1 2 3
>> head(1:5, n=-1)
> [1] 1 2 3 4
>> head(1:5, n=-0)
> integer(0)  # expected 1:5

> For both head() and tail(), I expected 1:5 as output but got
> integer(0). I obtained similar results using a data.frame and a
> function as x argument.

> An easy fix would be to explicitly state in the documentation what n =
> 0 does, and that there is no practical difference between -0 and +0.
> However, in my eyes, the better approach would be implement support
> for -0 and document it. What do you think?

> Best,

> Florent


> PS/ My sessionInfo() gives:
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1

> locale:
> [1] LC_COLLATE=German_Switzerland.1252
> LC_CTYPE=German_Switzerland.1252
> LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> LC_TIME=German_Switzerland.1252

> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] xtabs(), factors and NAs

2017-01-20 Thread Martin Maechler
>>>>> Milan Bouchet-Valat <nalimi...@club.fr>
>>>>> on Thu, 19 Jan 2017 13:58:31 +0100 writes:

> Hi all,
> I know this issue has been discussed a few times in the past already,
> but Martin Maechler suggested in a bug report [1] that I raise it here.
> 
> Basically, there is no (easy) way of printing NAs for all variables
> when calling xtabs() on factors. Passing 'exclude=NULL,
> na.action=na.pass' works for character vectors, but not for factors.
> 
[ yes, but your example below is *not* showing that ... so it may be
  a bit confusing !]  {Reason: stringsAsFactors etc}

> > test <- data.frame(x=c("a",NA))
> > xtabs(~ x, exclude=NULL,
> na.action=na.pass, data=test)
> x
> a 
> 1 
> 
> > test <- data.frame(x=factor(c("a",NA)))
> > xtabs(~ x, exclude=NULL,
> na.action=na.pass, data=test)
> x
> a 
> 1 
> 
> 
> Even if it's documented, this inconsistency is annoying. When checking
> data, it is often useful to print all NA values temporarily, without
> calling addNA() individually on all crossed variables.

  {Note this is not (just) about print()ing; the issue is
   about the resulting *object*.}
> 
> Would it make sense to add a new argument similar to table()'s useNA
> which would behave the same for all input vector types?
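
[For reference, the per-variable workaround with addNA() mentioned
 above looks like this (minimal example):

   test <- data.frame(x = factor(c("a", NA)))
   with(test, table(addNA(x)))  # the NA is counted as its own level
]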

You have to be aware that  table()  has been changed since R
3.3.2, i.e., is different in R-devel and hence will be different
in R 3.4.0.
table()'s handling of NAs has become very involved /
sophisticated(*), and currently I'd rather like to keep
xtabs()'s behavior much simpler. 

Interestingly, after starting to play with data containing NA's and
  xtabs(*, na.action=na.pass)
I have already detected bugs (for sparse=TRUE) and cases where
the current xtabs() behavior seems dubious to me.
So, the issue is --- as so often --- more involved than assumed initially.

We (R core) will probably do something, but do need more time
before we can promise anything more...

Thank you for raising the issue,
Martin Maechler, ETH Zurich


*) R-devel sources always current at
   https://svn.r-project.org/R/trunk/src/library/base/R/table.R

> 
> Regards

> [1] https://bugs.r-project.org/bugzilla/show_bug.cgi?id=14630

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] multiple bibentry()s in CITATION

2017-01-16 Thread Martin Maechler
>>>>> Fox, John <j...@mcmaster.ca>
>>>>> on Fri, 2 Sep 2016 15:42:46 + writes:

(which is more than 4 months ago)

> Dear list members,
> I've noticed that citation(package="pkg") generates both a text citation 
and a BiBTeX entry when the CITATION file contains a single call to bibentry() 
or citEntry(), but that only text citations are shown if there are multiple 
calls to bibentry() or citEntry(). 

> Is this behaviour intentional? In my opinion, it's useful always to show 
the BibTeX (although it's available through toBibtex(citation(package="pkg")) ).

> The Writing R Extensions manual says, "A CITATION file will contain 
*calls* [my emphasis] to function bibentry."

> Thanks,
> John

and you did not get a reply.
I had wanted to reply but forgot about it ... two parts:

1)  On November 24, 2012, I had improved R with an option to get this,
so it has been a "hidden gem" ;-) in R for a while:

> options(citation.bibtex.max = Inf)
> citation(package = "Rcmdr")

To cite the 'Rcmdr' package in publications use:

  Fox, J., and Bouchet-Valat, M. (2017). Rcmdr: R Commander. R package version 
2.3-2.

A BibTeX entry for LaTeX users is

  @Manual{,
title = {{Rcmdr: R Commander}},
author = {John Fox and Milan Bouchet-Valat},
year = {2017},
note = {R package version 2.3-2},
url = {http://socserv.socsci.mcmaster.ca/jfox/Misc/Rcmdr/},
  }

  Fox, J. (2017). Using the R Commander: A Point-and-Click Interface for R. Boca 
Raton FL:
  Chapman and Hall/CRC Press.

A BibTeX entry for LaTeX users is

  @Book{,
title = {Using the {R Commander}: A Point-and-Click Interface for {R}},
author = {John Fox},
year = {2017},
publisher = {Chapman and Hall/CRC Press},
address = {Boca Raton {FL}},
url = {http://socserv.mcmaster.ca/jfox/Books/RCommander/},
  }

  Fox, J. (2005). The R Commander: A Basic Statistics Graphical User Interface 
to R.
  Journal of Statistical Software, 14(9): 1--42.

A BibTeX entry for LaTeX users is

  @Article{,
title = {The {R} {C}ommander: A Basic Statistics Graphical User Interface 
to {R}},
author = {John Fox},
year = {2005},
journal = {Journal of Statistical Software},
volume = {14},
number = {9},
pages = {1--42},
url = {http://www.jstatsoft.org/v14/i09},
  }

>


This all works "obviously" (;-) via utils:::format.bibentry(),
and even when I made the number one an argument to that
function with a default you can set via options(), I had
wondered a bit why the cutoff should be at one by default.

E.g., it looks strange that by *adding* a 2nd reference you get
shorter citation output; to me it would seem more coherent
to have the default be 'Inf' instead of '1', i.e., always
showing both text and BibTeX.

There is quite a difference though: For our copula package, e.g.,

> options(citation.bibtex.max = 1); citation(package = "copula")

To cite the R package copula in publications use:

  Marius Hofert, Ivan Kojadinovic, Martin Maechler and Jun Yan (2017). copula:
  Multivariate Dependence with Copulas. R package version 0.999-16 URL
  https://CRAN.R-project.org/package=copula

  Jun Yan (2007). Enjoy the Joy of Copulas: With a Package copula. Journal of 
Statistical
  Software, 21(4), 1-21. URL http://www.jstatsoft.org/v21/i04/.

  Ivan Kojadinovic, Jun Yan (2010). Modeling Multivariate Distributions with 
Continuous
  Margins Using the copula R Package. Journal of Statistical Software, 34(9), 
1-20. URL
  http://www.jstatsoft.org/v34/i09/.

  Marius Hofert, Martin Maechler (2011). Nested Archimedean Copulas Meet R: The 
nacopula
  Package. Journal of Statistical Software, 39(9), 1-20. URL
  http://www.jstatsoft.org/v39/i09/.

>

This is relatively compact (18 lines)
whereas it gives  67 lines of output when the option is set to
something >= 4.

Other opinions?
What do you think, would it be worth the compatibility break to
change the default from '1' to 'Inf' ?

Best regards,
Martin

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] accelerating matrix multiply

2017-01-10 Thread Martin Maechler
>>>>> Cohn, Robert S <robert.s.c...@intel.com>
>>>>> on Sat, 7 Jan 2017 16:41:42 + writes:

> I am using R to multiply some large (30k x 30k double)
> matrices on a 64 core machine (xeon phi).  I added some timers
> to src/main/array.c to see where the time is going. All of the
> time is being spent in the matprod function, most of that time
> is spent in dgemm. 15 seconds is in matprod in some code that
> is checking if there are NaNs.

> > system.time (C <- B %*% A)
> nancheck: wall time 15.240282s
>dgemm: wall time 43.111064s
>  matprod: wall time 58.351572s
> user   system  elapsed 
> 2710.154   20.999   58.398
> 
> The NaN checking code is not being vectorized because of the
> early exit when NaN is detected:
> 
>   /* Don't trust the BLAS to handle NA/NaNs correctly: PR#4582
>* The test is only O(n) here.
>*/
>   for (R_xlen_t i = 0; i < NRX*ncx; i++)
>   if (ISNAN(x[i])) {have_na = TRUE; break;}
>   if (!have_na)
>   for (R_xlen_t i = 0; i < NRY*ncy; i++)
>   if (ISNAN(y[i])) {have_na = TRUE; break;}
> 
> I tried deleting the 'break'. By inspecting the asm code, I
> verified that the loop was not being vectorized before, but
> now is vectorized. Total time goes down:
> 
> system.time (C <- B %*% A)
> nancheck: wall time  1.898667s
>dgemm: wall time 43.913621s
>  matprod: wall time 45.812468s
> user   system  elapsed 
> 2727.877   20.723   45.859
> 
> The break accelerates the case when there is a NaN, at the
> expense of the much more common case when there isn't a
> NaN. If a NaN is detected, it doesn't call dgemm and calls its
> own matrix multiply, which makes the NaN check time
> insignificant so I doubt the early exit provides any benefit.
> 
> I was a little surprised that the O(n) NaN check is costly
> compared to the O(n**2) dgemm that follows. I think the reason
> is that nan check is single thread and not vectorized, and my
> machine can do 2048 floating point ops/cycle when you consider
> the cores/dual issue/8 way SIMD/muladd, and the constant
> factor will be significant for even large matrices.
> 
> Would you consider deleting the breaks? I can submit a patch
> if that will help. Thanks.
> 
> Robert

Thank you Robert for bringing the issue up ("again", possibly).
Within R core, some have seen somewhat similar timing on some
platforms (gcc) .. but much less dramatical differences e.g. on
macOS with clang.

As seen in the source code you cite above, the current
implementation was triggered by a nasty BLAS bug .. actually
also showing up only on some platforms, possibly depending on
runtime libraries in addition to the compilers used.

Do you have R code (including set.seed(.) if relevant) to show
on how to generate the large square matrices you've mentioned in
the beginning?  So we get to some reproducible benchmarks?
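
Something along these lines -- just a sketch, with rnorm() fill
assumed -- would already help:

   set.seed(1)
   n <- 30000L    # as reported; reduce (e.g. to 5000L) for quicker runs
   A <- matrix(rnorm(n * n), n, n)
   B <- matrix(rnorm(n * n), n, n)
   system.time(C <- B %*% A)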

With best regards,
Martin Maechler

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] seq.int/seq.default

2017-01-06 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Thu, 5 Jan 2017 12:39:29 +0100 writes:

>>>>> Mick Jordan <mick.jor...@oracle.com>
>>>>>     on Wed, 4 Jan 2017 08:15:03 -0800 writes:

>> On 1/4/17 1:26 AM, Martin Maechler wrote:
>>>>>>>> Mick Jordan <mick.jor...@oracle.com> on Tue, 3 Jan
>>>>>>>> 2017 07:57:15 -0800 writes:
>>> > This is a message for someone familiar with the
>>> implementation.  > Superficially the R code for
>>> seq.default and the C code for seq.int > appear to be
>>> semantically very similar. My question is whether, in
>>> fact, > it is intended that behave identically for all
>>> inputs.
>>> 
>>> Strictly speaking, "no": As usual, RT?Manual (;-)
>>> 
>>> The help page says in the very first paragraph
>>> ('Description'):
>>> 
>>> ‘seq’ is a standard generic with a default method.
>>> ‘seq.int’ is a primitive which can be much faster but
>>> has a few restrictions.
>>> 
>>> > I have found two cases so far where they differ, first
>>> > that seq.int will coerce a character string to a real
>>> (via > Rf_asReal) whereas seq.default appears to coerce
>>> it to NA > and then throws an error:
>>> 
>>> >> seq.default("2", "5") > Error in seq.default("2",
>>> "5") : 'from' cannot be NA, NaN or infinite >>
>>> seq.int("2", "5") > [1] 2 3 4 5
>>> >>
>>> 
>>> this may be a bit surprising (if one does _not_ look at
>>> the code), indeed, notably because seq.int() is
>>> mentioned to have more restrictions than seq() which
>>> here calls seq.default().  "Surprising" also when
>>> considering
>>> 
>>> > "2":"5" [1] 2 3 4 5
>>> 
>>> and the documentation of ':' claims 'from:to' to be the
>>> same as seq(from,to) apart from the case of factors.
>>> 
>>> --- I am considering a small change in seq.default()
>>> which would make it work for this case, compatibly with
>>> ":" and seq.int().
>>> 
>>> 
>>> > and second, that the error messages for non-numeric
>>> arguments differ:
>>> 
>>> which I find fine... if the functions were meant to be
>>> identical, we (the R developers) would be silly to have
>>> both, notably as the ".int" suffix has emerged as
>>> confusing the majority of useRs (who don't read help
>>> pages).
>>> 
>>> Rather it has been meant as saying "internal" (including
>>> "fast") also for other such R functions, but the suffix
>>> of course is a potential clash with S3 method naming
>>> schemes _and_ the fact that 'int' is used as type name
>>> for integer in other languages, notably C.
>>> 
>>> > seq.default(to=quote(b), by=2) > Error in
>>> is.finite(to) : default method not implemented for type
>>> 'symbol'
>>> 
>>> which I find a very appropriate and helpful message
>>> 
>>> > seq.int(to=quote(b), by=2) > Error in seq.int(to =
>>> quote(b), by = 2) : > 'to' cannot be NA, NaN or infinite
>>> 
>>> which is true, as well, and there's no "default method"
>>> to be mentioned, but you are right that it would be
>>> nicer if the message mentioned 'symbol' as well.

>> Thanks for the clarifications. It was surprising that
>> seq.int supported more types than seq.default. I was
>> expecting the reverse.

> exactly, me too!

>> BTW, There are a couple of, admittedly odd, cases,
>> exposed by brute force testing, where seq.int will
>> actually return "missing", which I presume is not
>> intended, and seq.default behaves differently, vis:

>>> seq.default(to=1,by=2)
>> [1] 1
>>> seq.int(to=1,by=2)

>>> > x <- seq.int(to=1,by=2) x
>> Error: argument "x" is missing, with no default

>> Lines 792 and 799 of seq.c return the incoming argument
>> (as opposed to a value based on its coercion to double via
>> asReal) and this can, as in the above example, be "missing".

Re: [Rd] seq.int/seq.default

2017-01-05 Thread Martin Maechler
> Mick Jordan 
> on Wed, 4 Jan 2017 12:49:41 -0800 writes:

> On 1/4/17 8:15 AM, Mick Jordan wrote:
> Here is another difference that I am guessing is unintended.

>> y <- seq.int(1L, 3L, length.out=2)
>> typeof(y)
> [1] "double"
>> x <- seq.default(1L, 3L, length.out=2)
>> typeof(x)
> [1] "integer"

> The if (by == R_MissingArg) branch at line 842 doesn't contain a check 
> for "all INTSXP" unlike the if (to == R_MissingArg) branch.

> Mick

I'll look at this case, too,
thank you once more!

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] seq.int/seq.default

2017-01-05 Thread Martin Maechler
>>>>> Mick Jordan <mick.jor...@oracle.com>
>>>>> on Wed, 4 Jan 2017 08:15:03 -0800 writes:

> On 1/4/17 1:26 AM, Martin Maechler wrote:
>>>>>>> Mick Jordan <mick.jor...@oracle.com>
>>>>>>> on Tue, 3 Jan 2017 07:57:15 -0800 writes:
>> > This is a message for someone familiar with the implementation.
>> > Superficially the R code for seq.default and the C code for seq.int
>> > appear to be semantically very similar. My question is whether, in 
fact,
>> > it is intended that behave identically for all inputs.
>> 
>> Strictly speaking, "no":  As usual, RT?Manual (;-)
>> 
>> The help page says in the very first paragraph ('Description'):
>> 
>> ‘seq’ is a standard generic with a default method.
>> ‘seq.int’ is a primitive which can be much faster but has a few 
restrictions.
>> 
>> > I have found two cases so far where they differ, first
>> > that seq.int will coerce a character string to a real (via
>> > Rf_asReal) whereas seq.default appears to coerce it to NA
>> > and then throws an error:
>> 
>> >> seq.default("2", "5")
>> > Error in seq.default("2", "5") : 'from' cannot be NA, NaN or infinite
>> >> seq.int("2", "5")
>> > [1] 2 3 4 5
>> >>
>> 
>> this may be a bit surprising (if one does _not_ look at the code),
>> indeed, notably because seq.int() is mentioned to have more
>> restrictions than seq() which here calls seq.default().
>> "Surprising" also when considering
>> 
>> > "2":"5"
>> [1] 2 3 4 5
>> 
>> and the documentation of ':' claims 'from:to' to be the same as
>> seq(from,to)  apart from the case of factors.
>> 
>> --- I am considering a small change in  seq.default()
>> which would make it work for this case, compatibly with ":" and 
seq.int().
>> 
>> 
>> > and second, that the error messages for non-numeric arguments differ:
>> 
>> which I find fine... if the functions were meant to be
>> identical, we (the R developers) would be silly to have both,
>> notably as the ".int" suffix  has emerged as confusing the
>> majority of useRs (who don't read help pages).
>> 
>> Rather it has been meant as saying "internal" (including "fast") also 
for other
>> such R functions, but the suffix of course is a potential clash
>> with S3 method naming schemes _and_ the fact that 'int' is used
>> as type name for integer in other languages, notably C.
>> 
>> > seq.default(to=quote(b), by=2)
>> > Error in is.finite(to) : default method not implemented for type 
'symbol'
>> 
>> which I find a very appropriate and helpful message
>> 
>> > seq.int(to=quote(b), by=2)
>> > Error in seq.int(to = quote(b), by = 2) :
>> > 'to' cannot be NA, NaN or infinite
>> 
>> which is true, as well, and there's no "default method" to be
>> mentioned, but you are right that it would be nicer if the
>> message mentioned 'symbol' as well.

> Thanks for the clarifications. It was surprising that seq.int supported 
> more types than seq.default. I was expecting the reverse.

exactly, me too!

> BTW, There are a couple of, admittedly odd, cases, exposed by brute 
> force testing, where seq.int will actually return "missing", which I 
> presume is not intended, and seq.default behaves differently, vis:

>> seq.default(to=1,by=2)
> [1] 1
>> seq.int(to=1,by=2)

>> > x <- seq.int(to=1,by=2)
>> x
> Error: argument "x" is missing, with no default

> Lines 792 and 799 of seq.c return the incoming argument (as opposed to a 
> value based on its coercion to double via asReal) and this can, as in 
> the above example, be "missing".

> Thanks
> Mick Jordan

Thanks a lot, Mick -- you are right!

I'm fixing these  (the line numbers have greatly changed in the
mean time: Remember we work with "R-devel", i.e., the "trunk" :
always available at
https://svn.r-project.org/R/trunk/src/main/seq.c

Martin Maechler
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] seq.int/seq.default

2017-01-04 Thread Martin Maechler
>>>>> Mick Jordan <mick.jor...@oracle.com>
>>>>> on Tue, 3 Jan 2017 07:57:15 -0800 writes:

> This is a message for someone familiar with the implementation.
> Superficially the R code for seq.default and the C code for seq.int 
> appear to be semantically very similar. My question is whether, in fact, 
> it is intended that behave identically for all inputs. 

Strictly speaking, "no":  As usual, RT?Manual (;-)

The help page says in the very first paragraph ('Description'):

  ‘seq’ is a standard generic with a default method.
  ‘seq.int’ is a primitive which can be much faster but has a few restrictions. 

> I have found two cases so far where they differ, first
> that seq.int will coerce a character string to a real (via
> Rf_asReal) whereas seq.default appears to coerce it to NA
> and then throws an error:

>> seq.default("2", "5")
> Error in seq.default("2", "5") : 'from' cannot be NA, NaN or infinite
>> seq.int("2", "5")
> [1] 2 3 4 5
>> 

this may be a bit surprising (if one does _not_ look at the code),
indeed, notably because seq.int() is mentioned to have more
restrictions than seq() which here calls seq.default().
"Surprising" also when considering

   > "2":"5"
   [1] 2 3 4 5

and the documentation of ':' claims 'from:to' to be the same as
seq(from,to)  apart from the case of factors.

--- I am considering a small change in  seq.default()
which would make it work for this case, compatibly with ":" and seq.int().
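
A sketch of the idea (a hypothetical wrapper, not the actual patch
to seq.default()):

   seqChar <- function(from, to, ...) {
       if (is.character(from)) from <- as.numeric(from)
       if (is.character(to))   to   <- as.numeric(to)
       seq.default(from, to, ...)
   }
   seqChar("2", "5")  # 2 3 4 5 -- compatible with "2":"5" and seq.int("2","5")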


> and second, that the error messages for non-numeric arguments differ:

which I find fine... if the functions were meant to be
identical, we (the R developers) would be silly to have both,
notably as the ".int" suffix  has emerged as confusing the
majority of useRs (who don't read help pages).

Rather it has been meant as saying "internal" (including "fast") also for other
such R functions, but the suffix of course is a potential clash
with S3 method naming schemes _and_ the fact that 'int' is used
as type name for integer in other languages, notably C. 

> seq.default(to=quote(b), by=2)
> Error in is.finite(to) : default method not implemented for type 'symbol'

which I find a very appropriate and helpful message

> seq.int(to=quote(b), by=2)
> Error in seq.int(to = quote(b), by = 2) :
> 'to' cannot be NA, NaN or infinite

which is true, as well, and there's no "default method" to be
mentioned, but you are right that it would be nicer if the
message mentioned 'symbol' as well.

> Please reply off list.

[which I understand as that we should  CC you (which of course is
 netiquette to do)]

Martin Maechler
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] utils::ls.str(): Partial argument name 'digits' to seq() (should be digits.d?)

2017-01-03 Thread Martin Maechler
You are right (though picky).  I have updated it now.

Thank you Henrik!
Martin

> Should utils::ls.str() be updated as:

> svn diff src/library/utils/R/str.R
> Index: src/library/utils/R/str.R
> ===
> --- src/library/utils/R/str.R (revision 71879)
> +++ src/library/utils/R/str.R (working copy)
> @@ -622,7 +622,7 @@
>  args$digits.d <- NULL
>  }
>  strargs <- c(list(max.level = max.level, give.attr = give.attr,
> -  digits = digits), args)
> +  digits.d = digits), args)
>  for(nam in x) {
>   cat(nam, ": ")


[...]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] structure(NULL, *) is deprecated [was: Unexpected I(NULL) output]

2016-12-29 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Thu, 22 Dec 2016 10:24:43 +0100 writes:

>>>>> Florent Angly <florent.an...@gmail.com>
>>>>> on Tue, 20 Dec 2016 13:42:37 +0100 writes:

>> Hi all,
>> I believe there is an issue with passing NULL to the function I().

>> class(NULL)  # "NULL"  (as expected)
>> print(NULL)   # NULL  (as expected)
>> is.null(NULL) # TRUE  (as expected)

>> According to the documentation I() should return a copy of its input
>> with class "AsIs" preprended:

>> class(I(NULL))  # "AsIs"  (as expected)
>> print(I(NULL))   # list()  (not expected! should be NULL)
>> is.null(I(NULL)) # FALSE  (not expected! should be TRUE)

>> So, I() does not behave according to its documentation. 

> yes.

>> In R, it is
>> not possible to give NULL attributes, but I(NULL) attempts to do that
>> nonetheless, using the structure() function. Probably:
>> 1/ structure() should not accept NULL as input since the goal of
>> structure() is to set some attributes, something cannot be done on
>> NULL.

> I tend to agree.  However if we gave an error now, I notice that
> even our own code, e.g., in stats:::formula.default()  would fail.

> Still, I think we should consider *deprecating*  structure(NULL, *),
> so it would give a *warning* (and continue working otherwise)
> (for a while before giving an error a year later).

 [..]

> Martin Maechler
> ETH Zurich

Since svn rev 71841,   structure(NULL, *) now __is__ deprecated
in R-devel, i.e.,

  > structure(NULL, foo = 2)
  list()
  attr(,"foo")
  [1] 2
  Warning message:
  In structure(NULL, foo = 2) :
Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
Consider 'structure(list(), *)' instead.
  > 
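
For affected package code, the fix is typically the one-liner that
the warning message itself suggests (class name here purely
illustrative):

   ## deprecated:
   x <- structure(NULL,   class = "myClass")
   ## instead use:
   x <- structure(list(), class = "myClass")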

A dozen or so CRAN packages now not only give warnings but
partially also  ERRORS in their checks,  which I find strange,
but it may be because of too stringent checks (e.g. checks where
all warnings are turned into errors).

The most prominent packages now giving errors are
data.table and ggplot2,  then also GGally.

Of course, we (the R core team) could make the deprecation even
milder by not giving a warning() but only a message(.) aka
"NOTE";  however, that renders the deprecation process longer and more
complicated (notably for us),  and there is still a few months' time
before this version of R will be released...
and really, as I said,... a new warning should rarely cause
*errors* but rather warnings.

OTOH, some of us have now seen / read on the  R-package-devel  mailing list
that it seems ggplot2 has stopped working correctly (under
R-devel only!) in building packages because of this warning.. 

The current plan is it will eventually, i.e., after the
deprecation period, become an error, so ideally packages are
patched and re-released ASAP.  It's bedtime here now and we will
see tomorrow how to continue.

My current plan is to send an e-mail to the package maintainers of CRAN
packages that are affected, at least for those packages that are "easy to find".

Martin Maechler,
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] colnames for data.frame could be greatly improved

2016-12-29 Thread Martin Maechler
> Hi there,
> Any update on this?
> Should I create bugzilla ticket and submit patch?

> Regards
> Jan Gorecki

Hi Jan,

Why should we care that the  do.NULL = FALSE case is slower?
After all do.NULL = TRUE is the default.

In other words, where are use cases where it is problematic that
do.NULL = FALSE is relatively slow?

Shorter code  *is* nicer than longer code,  so I need a bit more
conviction why we should add more code for that special case ..

Martin Maechler, ETH Zurich

> On 20 December 2016 at 01:27, Jan Gorecki <j.gore...@wit.edu.pl> wrote:
> > Hello,
> >
> > colnames seems to be not optimized well for data.frame. It escapes
> > processing for data.frame in
> >
> >   if (is.data.frame(x) && do.NULL)
> > return(names(x))
> >
> > but only when do.NULL true. This makes huge difference when do.NULL
> > false. Minimal edit to `colnames`:
> >
> > if (is.data.frame(x)) {
> > nm <- names(x)
> > if (do.NULL || !is.null(nm))
> > return(nm)
> > else
> > return(paste0(prefix, seq_along(x)))
> > }
> >
> > Script and timings:
> >
> > N=1e7; K=100
> > set.seed(1)
> > DF <- data.frame(
> > id1 = sample(sprintf("id%03d",1:K), N, TRUE),  # large groups (char)
> > id2 = sample(sprintf("id%03d",1:K), N, TRUE),  # large groups (char)
> > id3 = sample(sprintf("id%010d",1:(N/K)), N, TRUE), # small groups (char)
> > id4 = sample(K, N, TRUE),  # large groups (int)
> > id5 = sample(K, N, TRUE),  # large groups (int)
> > id6 = sample(N/K, N, TRUE),# small groups (int)
> > v1 =  sample(5, N, TRUE),  # int in range [1,5]
> > v2 =  sample(5, N, TRUE),  # int in range [1,5]
> > v3 =  sample(round(runif(100,max=100),4), N, TRUE) # numeric e.g. 
> > 23.5749
> > )
> > cat("GB =", round(sum(gc()[,2])/1024, 3), "\n")
> > #GB = 0.397
> > colnames(DF) = NULL
> > system.time(nm1<-colnames(DF, FALSE))
> > #   user  system elapsed
> > # 22.158   0.299  22.498
> > print(nm1)
> > #[1] "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9"
> >
> > ### restart R
> >
> > colnames <- function (x, do.NULL = TRUE, prefix = "col")
> > {
> > if (is.data.frame(x)) {
> > nm <- names(x)
> > if (do.NULL || !is.null(nm))
> > return(nm)
> > else
> > return(paste0(prefix, seq_along(x)))
> > }
> > dn <- dimnames(x)
> > if (!is.null(dn[[2L]]))
> > dn[[2L]]
> > else {
> > nc <- NCOL(x)
> > if (do.NULL)
> > NULL
> > else if (nc > 0L)
> > paste0(prefix, seq_len(nc))
> > else character()
> > }
> > }
> > N=1e7; K=100
> > set.seed(1)
> > DF <- data.frame(
> > id1 = sample(sprintf("id%03d",1:K), N, TRUE),  # large groups (char)
> > id2 = sample(sprintf("id%03d",1:K), N, TRUE),  # large groups (char)
> > id3 = sample(sprintf("id%010d",1:(N/K)), N, TRUE), # small groups (char)
> > id4 = sample(K, N, TRUE),  # large groups (int)
> > id5 = sample(K, N, TRUE),  # large groups (int)
> > id6 = sample(N/K, N, TRUE),# small groups (int)
> > v1 =  sample(5, N, TRUE),  # int in range [1,5]
> > v2 =  sample(5, N, TRUE),  # int in range [1,5]
> > v3 =  sample(round(runif(100,max=100),4), N, TRUE) # numeric e.g. 
> > 23.5749
> > )
> > cat("GB =", round(sum(gc()[,2])/1024, 3), "\n")
> > #GB = 0.397
> > colnames(DF) = NULL
> > system.time(nm1<-colnames(DF, FALSE))
> > #   user  system elapsed
> > #  0.001   0.000   0.000
> > print(nm1)
> > #[1] "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9"
> >
> > sessionInfo()
> > #R Under development (unstable) (2016-12-19 r71815)
> > #Platform: x86_64-pc-linux-gnu (64-bit)
> > #Running under: Debian GNU/Linux stretch/sid
> > #
> > #locale:
> > # [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> > # [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
> > # [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
> > # [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
> > # [9] LC_ADDRESS=C   LC_TELEPHONE=C
> > #[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> > #
> > #attached base packages:
> > #[1] stats graphics  grDevices utils datasets  methods   base  #
> > #
> > #loaded via a namespace (and not attached):
> > #[1] compiler_3.4.0
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unexpected I(NULL) output

2016-12-22 Thread Martin Maechler
>>>>> Florent Angly <florent.an...@gmail.com>
>>>>> on Tue, 20 Dec 2016 13:42:37 +0100 writes:

> Hi all,
> I believe there is an issue with passing NULL to the function I().

> class(NULL)  # "NULL"  (as expected)
> print(NULL)   # NULL  (as expected)
> is.null(NULL) # TRUE  (as expected)

> According to the documentation I() should return a copy of its input
> with class "AsIs" preprended:

> class(I(NULL))  # "AsIs"  (as expected)
> print(I(NULL))   # list()  (not expected! should be NULL)
> is.null(I(NULL)) # FALSE  (not expected! should be TRUE)

> So, I() does not behave according to its documentation. 

yes.

> In R, it is
> not possible to give NULL attributes, but I(NULL) attempts to do that
> nonetheless, using the structure() function. Probably:
> 1/ structure() should not accept NULL as input since the goal of
> structure() is to set some attributes, something cannot be done on
> NULL.

I tend to agree.  However if we gave an error now, I notice that
even our own code, e.g., in stats:::formula.default()  would fail.

Still, I think we should consider *deprecating*  structure(NULL, *),
so it would give a *warning* (and continue working otherwise)
(for a while before giving an error a year later).

> 2/ I() could accept NULL, but, as an exception, not set an "AsIs"
> class attribute on it. This would be in line with the philosophy of
> the I() function to return an object that is functionally equivalent
> to the input object.

If we'd adopt 2, the I(.) function would become slightly more
complicated and slower...  but possibly not practically
noticeable.
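
For comparison, option 2/ might look like this (a sketch only, not
the actual implementation):

   I2 <- function(x) {
       if (is.null(x)) x  # NULL cannot carry attributes; leave it alone
       else structure(x, class = unique(c("AsIs", oldClass(x))))
   }
   is.null(I2(NULL))  # TRUE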

A last option would be

3/  The help page for I() could note what happens in the NULL case.

That would be the least work for everyone,
but at the moment, I tend to agree that '1/' is worth the pain to
have R's structure() become more consistent.

Martin Maechler
ETH Zurich

> My sessionInfo() returns:
>> sessionInfo()
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1

> locale:
> [1] LC_COLLATE=German_Switzerland.1252
> LC_CTYPE=German_Switzerland.1252
> LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Switzerland.1252

> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base

> Best regards,

> Florent

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Very small numbers in hexadecimal notation parsed as zero

2016-12-21 Thread Martin Maechler
> Florent Angly 
> on Tue, 20 Dec 2016 13:26:36 +0100 writes:

> Hi all,
> I have noticed incorrect parsing of very small hexadecimal numbers
> like "0x1.dp-987". Such a hexadecimal representation can
> can be produced by sprintf() using the %a flag. The return value is
> incorrectly reported as 0 when coercing these numbers to double using
> as.double()/as.numeric(), as illustrated in the three examples below:

> as.double("0x1.dp-987")# should be 7.645296e-298
> as.double("0x1.0p-1022")  # should be 2.225074e-308
> as.double("0x1.f89fc1a6f6613p-974")  # should be 1.23456e-293

> The culprit seems to be the src/main/util.c:R_strtod function and in
> some cases, removing the zeroes directly before the 'p' leads to
> correct parsing:

> as.double("0x1.dp-987") # 7.645296e-298, as expected
> as.double("0x1.p-1022") # 2.225074e-308, as expected

Yes, this looks like a bug, indeed.
Similarly convincing is a simple comparison (of even less extreme)

> as.double("0x1p-987")
[1] 7.645296e-298
> as.double("0x1.00p-987")
[1] 0
> 

The "bug boundary" seems around here:

> as.double("0x1.p-928") # fails
[1] 0
> as.double("0x1p-928")
[1] 4.407213e-280
> 

> as.double("0x1.p-927") # works
[1] 8.814426e-280

but then adding more zeros before "p-927" also underflows.
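
The sprintf()/as.double() round trip makes the bug easy to check
(the exact "%a" output is platform-dependent):

   x <- 2^-987
   as.double(sprintf("%a", x)) == x  # TRUE : "0x1p-987" parses fine
   as.double("0x1.00p-987")    == x  # FALSE: parsed as 0 -- the bug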

--> I have created an R bugzilla account for you; so you now
 can submit bug reports (including patch proposals to the source (hint!) ;-)

Thank you, Florent!
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Request: Increasing MAX_NUM_DLLS in Rdynload.c

2016-12-20 Thread Martin Maechler
>>>>> Steve Bronder <sbron...@stevebronder.com>
>>>>> on Tue, 20 Dec 2016 01:34:31 -0500 writes:

> Thanks Henrik this is very helpful! I will try this out on our tests and
> see if gcDLLs() has a positive effect.

> mlr currently has tests broken down by learner type such as 
classification,
> regression, forecasting, clustering, etc.. There are 83 classifiers alone
> so even when loading and unloading across learner types we can still hit
> the MAX_NUM_DLLS error, meaning we'll have to break them down further (or
> maybe we can be clever with gcDLLs()?). I'm CC'ing Lars Kotthoff and Bernd
> Bischl to make sure I am representing the issue well.

This came up *here* in May 2015
and then May 2016 ... did you not find it when googling?

Hint:  Use  
   site:stat.ethz.ch MAX_NUM_DLLS
as search string in Google, so it will basically only search the
R mailing list archives

Here's the start of that thread :

  https://stat.ethz.ch/pipermail/r-devel/2016-May/072637.html

There was not a clear conclusion back then, notably as
Prof Brian Ripley noted that 100 had already been an increase
and that a large number of loaded DLLs decreases look up speed.

OTOH (I think others have noted that) a large number of DLLs
only penalizes those who *do* load many, and we should probably
increase it.

Your use case of "hyper packages" which load many others
simultaneously is somewhat convincing to me... in so far as the
general feeling is that memory should be cheap and limits should
not be low.

(In spite of Brian Ripley's good reasons against it, I'd still
 aim for a *dynamic*, i.e. automatically increased list here).

Martin Maechler
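
(A quick check of how close a session is to the limit -- a sketch, not part
of the original mails; getLoadedDLLs() is base R, gcDLLs() is from the
R.utils package mentioned below:)

  length(getLoadedDLLs())   # current number of loaded DLLs, vs. MAX_NUM_DLLS
  R.utils::gcDLLs()         # unregister DLLs left behind by unloaded packages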

> Regards,

> Steve Bronder
> Website: stevebronder.com
> Phone: 412-719-1282
> Email: sbron...@stevebronder.com


> On Tue, Dec 20, 2016 at 1:04 AM, Henrik Bengtsson <
> henrik.bengts...@gmail.com> wrote:

>> On reason for hitting the MAX_NUM_DLLS (= 100) limit is because some
>> packages don't unload their DLLs when they are being unloaded themselves.
>> In other words, there may be left-over DLLs just sitting there doing
>> nothing but occupying space.  You can remove these, using:
>> 
>> R.utils::gcDLLs()
>> 
>> Maybe that will help you get through your tests (as long as you're
>> unloading packages).  gcDLLs() will look at base::getLoadedDLLs() and
>> its content and compare to loadedNamespaces() and unregister any
>> "stray" DLLs that remain after corresponding packages have been
>> unloaded.
>> 
>> I think it would be useful if R CMD check would also check that DLLs
>> are unregistered when a package is unloaded
>> (https://github.com/HenrikBengtsson/Wishlist-for-R/issues/29), but of
>> course, someone needs to write the code / a patch for this to happen.
>> 
>> /Henrik
>> 
>> On Mon, Dec 19, 2016 at 6:01 PM, Steve Bronder
>> <sbron...@stevebronder.com> wrote:
>> > This is a request to increase MAX_NUM_DLLS in Rdynload.c from 100 to 500.
>> >
>> > On line 131 of Rdynload.c, changing
>> >
>> > #define MAX_NUM_DLLS 100
>> >
>> >  to
>> >
>> > #define MAX_NUM_DLLS 500
>> >
>> >
>> > In development of the mlr package, there have been several episodes in
>> the
>> > past where we have had to break up unit tests because of the "maximum
>> > number of DLLs reached" error. This error has been an inconvenience that is
>> > going to keep happening as the package continues to grow. Is there more
>> > than meets the eye with this error or would everything be okay if the
>> above
>> > line changes? Would that have a larger effect in other parts of R?
>> >
>> > As R grows, we are likely to see more 'meta-packages' such as the
>> > Hadley-verse, caret, mlr, etc. need an increasing amount of DLLs loaded at
>> > any point in time to conduct effective unit tests. If MAX_NUM_DLLS is set
>> > to 100 for a very particular reason then I apologize, but if it is possible
>> > to increase MAX_NUM_DLLS it would at least make the testing at mlr much
>> > easier.
>> >
>> > I understand you are all very busy and thank you for your time.
>> >
>> >
>> > Regards,
>> >
>> > Steve Bronder
>> > Website: stevebronder.com

Re: [Rd] print.POSIXct doesn't seem to use tz argument, as per its example

2016-12-16 Thread Martin Maechler
> Jennifer Lyon 
> on Thu, 15 Dec 2016 09:33:30 -0700 writes:

> On the documentation page for DateTimeClasses, in the Examples section,
> there are the following two lines:
> 
> format(.leap.seconds) # the leap seconds in your time zone
> print(.leap.seconds, tz = "PST8PDT")  # and in Seattle's
> 
> The second line (using print) seems to ignore the tz argument, and prints
> the dates in my time zone, while:
> 
> format(.leap.seconds, tz = "PST8PDT")
> 
> does print the dates in PST. The code in
> https://github.com/wch/r-source/blob/trunk/src/library/base/R/datetime.R
> around line 234 looks like the ... argument is passed to print, not to
> format.
> 
> print.POSIXct <-
> print.POSIXlt <- function(x, ...)
> {
> max.print <- getOption("max.print", 9999L)
> if(max.print < length(x)) {
> print(format(x[seq_len(max.print)], usetz = TRUE), ...)
> cat(' [ reached getOption("max.print") -- omitted',
> length(x) - max.print, 'entries ]\n')
> } else print(if(length(x)) format(x, usetz = TRUE)
>  else paste(class(x)[1L], "of length 0"), ...)
> invisible(x)
> }
> 
> The documentation for print() on this page seems to be silent on tz as an
> argument, but I do believe the example using print() does not work as
> advertised.

> Thanks.
> 
> Jen

Thank you, Jen!
Indeed,  both your observation and your diagnosis are correct:
This has been a misleading example and needs amending (or the
code is changed, see below).

The most simple fix would be to replace  'print('  by
'format('; then the example would work as advertized.
That change has two drawbacks still:

1) format(.) examples would end up on the help page of print.POSIXct(), where
   format.POSIXct() is *not* documented.

2) It *would* make sense that print.POSIXct() allowed for a 'tz' argument
   (and maybe 'usetz' too).  This/these would be (an) extra
   argument(s) rather than passing '...' not just to print() but
   also to format().

My personal preference would tend to add both
 tz = ""
and  usetz = TRUE
to the formal arguments of print.POSIXct and pass them to the
format(.) calls.
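
A minimal sketch of that proposal (an illustration, not the actual patch;
format.POSIXct() already accepts 'tz' and 'usetz'):

print.POSIXct <- function(x, tz = "", usetz = TRUE, ...)
{
    max.print <- getOption("max.print", 9999L)
    if(max.print < length(x)) {
        print(format(x[seq_len(max.print)], tz = tz, usetz = usetz), ...)
        cat(' [ reached getOption("max.print") -- omitted',
            length(x) - max.print, 'entries ]\n')
    } else print(if(length(x)) format(x, tz = tz, usetz = usetz)
                 else paste(class(x)[1L], "of length 0"), ...)
    invisible(x)
}

With such a method, print(.leap.seconds, tz = "PST8PDT") would work as the
example advertises.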

Martin


> sessionInfo()
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 14.04.5 LTS
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] New leap second end of 2016 / beginning 2017 (depending on TZ)

2016-12-15 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Wed, 14 Dec 2016 17:04:22 +0100 writes:

> As R is sophisticated enough to track leap seconds,
> ?.leap.seconds

> we'd need to update our codes real soon now again:

> https://en.wikipedia.org/wiki/Leap_second

> (and those of you who want second precision in R in 2017 need to start
> working with 'R patched' or 'R devel' ...)

I've been told offline, that the above could be considered as
FUD .. which I hope nobody read from it.

Furthermore, there seems to be wide disagreement about the
usefulness of leap seconds, and how computers (and OSs) should
deal with them.
One recent approach (e.g. by Google) is to "smear the leap
second" into the system (by somehow "throttling" time servers ;-)..

(and no, I even less would want this to become a long thread, so
 please refrain if you can ...)

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] New leap second end of 2016 / beginning 2017 (depending on TZ)

2016-12-14 Thread Martin Maechler
As R is sophisticated enough to track leap seconds,

   ?.leap.seconds

we'd need to update our codes real soon now again:

https://en.wikipedia.org/wiki/Leap_second

(and those of you who want second precision in R in 2017 need to start
working with 'R patched' or 'R devel' ...)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Strange behavior when using progress bar (Fwd: Re: [R] The code itself disappears after starting to execute the for loop)

2016-12-07 Thread Martin Maechler
>>>>> Jon Skoien <jon.sko...@jrc.ec.europa.eu>
>>>>> on Wed, 7 Dec 2016 11:04:04 +0100 writes:

> I would like to ask once more if this is reproducible also for others? 
> If yes, should I submit it as a bug-report?

> Best,
> Jon

Please  Windows users .. this is possibly only for you!

Note that I do *not* see problems on Linux (in ESS; did not try RStudio).

Please also indicate in which form you are running R.
Here it does depend if this is inside RStudio, ESS, the "Windows
GUI", the "Windows terminal", ...

Martin Maechler,
ETH Zurich


> On 11/28/2016 11:26 AM, Jon Skoien wrote:
>> I first answered to the email below in r-help, but as I did not see 
>> any response, and it looks like a bug/unwanted behavior, I am also 
>> posting here. I have observed this in RGui, whereas it seems not to 
>> happen in RStudio.
>> 
>> Similar to OP, I sometimes have a problem with functions using the 
>> progress bar. Frequently, the console is cleared after x iterations 
>> when the progress bar is called in a function which is wrapped in a 
>> loop. In the example below, this happened for me every ~44th 
>> iteration. Interestingly, it seems that reduction of the sleep times 
>> in this function increases the number of iterations before clearing. 
>> In my real application, where the progress bar is used in a much 
>> slower function, the console is cleared every 2-3 iteration, which 
>> means that I cannot scroll back to check the output.

 testit <- function(x = sort(runif(20)), ...) {
   pb <- txtProgressBar(...)
   for(i in c(0, x, 1)) {Sys.sleep(0.2); setTxtProgressBar(pb, i)}
   Sys.sleep(1)
   close(pb)
 }
 
 it <- 0
 while (TRUE) {testit(style = 3); it <- it + 1; print(paste("done", it))}

>> Is this only a problem for a few, or is it reproducible? Any hints to
>> what the problem could be, or if it can be fixed? I have seen this in 
>> some versions of R, and could also reproduce in 3.3.2.

"some versions of R" ... all on Windows ?

>> 
>> Best wishes,
>> Jon
>> 
>> R version 3.3.2 (2016-10-31)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows 8.1 x64 (build 9600)
>> 
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>> 
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods base

[...]

> Jon Olav Skøien
> Joint Research Centre - European Commission
> Institute for Space, Security & Migration
> Disaster Risk Management Unit

> Via E. Fermi 2749, TP 122,  I-21027 Ispra (VA), ITALY

> jon.sko...@jrc.ec.europa.eu
> Tel:  +39 0332 789205

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] segfault with POSIXlt zone=NULL zone=""

2016-12-06 Thread Martin Maechler
> Joshua Ulrich 
> on Tue, 6 Dec 2016 09:51:16 -0600 writes:

> On Tue, Dec 6, 2016 at 6:37 AM,   wrote:
>> Hi all,
>> 
>> I ran into a segfault while playing with dates.
>> 
>> $ R --no-init-file
>> ...
>> > library(lubridate); d=as.POSIXlt(floor_date(Sys.time(),"year")); d$zone=NULL; d$zone=""; d
>> 
> If you're asking about a bug in R, you should provide a *minimal*
> reproducible example (i.e. one without any package dependencies).
> This has nothing to do with lubridate, so you can reproduce the
> behavior with:

> d <- as.POSIXlt(Sys.time())
> d$zone <- NULL
> d$zone <- ""
> d

[..]

>> Hope I'm not doing something illegal...
>> 
> You are.  You're changing the internal structure of a POSIXlt object
> by re-ordering the list elements.  You should not expect a malformed
> POSIXlt object to behave as if it's correctly formed.  You can see
> it's malformed by comparing it's unclass()'d output.

> d <- as.POSIXlt(Sys.time())
> unclass(d)  # valid POSIXlt object
> d$zone <- NULL
> d$zone <- ""
> unclass(d)  # your malformed POSIXlt object

Indeed, really illegal, i.e. "against the law" ... ;-)

Thank you, Joshua!

Still, if R segfaults without the user explicitly
calling .Call(), .Internal()  or similar -- as here --
we usually acknowledge there *is* a bug in R .. even if it is
only triggered by a users "illegal" messing around.

an MRE for the above, where I really only re-order the "internal" list:

d <- as.POSIXlt("2016-12-06"); dz <- d$zone; d$zone <- NULL; d$zone <- dz; f <- format(d)

>  *** caught segfault ***
> address 0x8020, cause 'memory not mapped'

> Traceback:
>  1: format.POSIXlt(d)
>  2: format(d)

The current code is "optimized for speed" (not perfectly), and
a patch should hopefully address the C code.

Note that a smaller MRE -- which does *not* re-order, but just
invalidate the time zone is

  d <- as.POSIXlt("2016-12-06"); d$zone <- 1; f <- format(d)
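
The reordering is easy to see on the component names (standard R; the exact
set of components can vary by platform):

  d <- as.POSIXlt("2016-12-06")
  names(unclass(d))        # "sec" "min" "hour" ... with "zone" in its place
  d$zone <- NULL; d$zone <- ""
  names(unclass(d))        # "zone" has moved to the end -> malformed object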

--

I have now committed a "minimal" patch (to the C code) which for
the above two cases gives a sensible error rather than a
seg.fault :

  > d <- as.POSIXlt("2016-12-06"); d$zone <- 1 ; f <- format(d)
  Error in format.POSIXlt(d) : 
invalid 'zone' component in "POSIXlt" structure

  > d <- as.POSIXlt("2016-12-06"); dz <- d$zone; d$zone <- NULL; d$zone <- dz; f <- format(d)
  Error in format.POSIXlt(d) : 
invalid 'zone' component in "POSIXlt" structure
  > 

I guess that it should still be possible to produce a segfault
with invalid 'POSIXlt' structures though.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] problem with normalizePath()

2016-12-01 Thread Martin Maechler
>>>>> Evan Cortens <ecort...@mtroyal.ca>
>>>>> on Wed, 30 Nov 2016 09:58:59 -0700 writes:

> I found this as well. At our institution, our home directories are on
> network shares that are mapped to local drives. The default, it appears, is
> to set the location for libraries (etc) to the network share name
> (//computer//share/director/a/b/user) rather than the local drive mapping
> (H:/). Given the issue with dir.create(), this means it's impossible to
> install packages (since it tries to "create" the share, not the highest
> directory). This can be fixed in the same way Michael found, namely, set
> the environment variables to use the local mapping rather than the network
> share. But ideally, the fix would be to treat Windows network paths
> correctly.

Yes, and why shouldn't Microsoft be the institution who can best
judge how to do that,  now that they sell a "Microsoft R"  ?? 
!??!?!??!?!??!?
(trying again with BCC;  next time, I'll use CC).

(a slowly increasingly frustrated)
Martin Maechler
ETH Zurich

> Best,
> Evan

> On Wed, Nov 30, 2016 at 7:16 AM, Laviolette, Michael <
> michael.laviole...@dhhs.nh.gov> wrote:

>> In researching another issue, I discovered a workaround: the network drive
>> folder needs to be mapped to the local PC.
>> 
>> setwd("//Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/Michael Laviolette/Stat
>> tools")
>> df1 <- readxl::read_excel("addrlist-4-MikeL.xls", 2)
>> # fails, throws same error
>> df2 <- readxl::read_excel("Z:/Stat tools/addrlist-4-MikeL.xls", 2)  #
>> works
>> 
>> -Original Message-
>> From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
>> Sent: Friday, November 18, 2016 3:37 PM
>> To: Evan Cortens
>> Cc: Laviolette, Michael; r-devel@r-project.org
>> Subject: Re: [Rd] problem with normalizePath()
>> 
>> >>>>> Evan Cortens <ecort...@mtroyal.ca>
>> >>>>> on Thu, 17 Nov 2016 15:51:03 -0700 writes:
>> 
>> > I wonder if this could be related to the issue that I
>> > submitted to bugzilla about two months ago? (
>> > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17159)
>> 
>> > That is to say, could it be that it's treating the first
>> > path after the single backslash as an actual directory,
>> > rather than as the name of the share?
>> 
>> > --
>> > Evan Cortens, PhD Institutional Analyst - Office of
>> > Institutional Analysis Mount Royal University 403-440-6529
>> 
>> Could well be.  Thank you, Evan, also for your bug report including patch
>> proposal.
>> 
>> In such situations we (R core) would be really happy if Microsoft showed
>> another facet of their investment into R:
>> Ideally there should be enough staff who can judge and test such bugs and
>> bug fixes?
>> 
--> I'm BCC'ing this to one place at least.
>> 
>> Best,
>> Martin Maechler  ETH Zurich
>> 
>> > On Thu, Nov 17, 2016 at 2:28 PM, Laviolette, Michael <
>> > michael.laviole...@dhhs.nh.gov> wrote:
>> 
>> >> The packages "readxl" and "haven" (and possibly others)
>> >> no longer access files on shared network drives. The
>> >> problem appears to be in the normalizePath()
>> >> function. The file can be read from a local drive or by
>> >> functions that don't call normalizePath(). The error
>> >> thrown is
>> >>
>> >> Error:
>> >> path[1]="\\Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls":
>> >> The system cannot find the file specified
>> >>
>> >> Here's my session:
>> >>
>> >> library(readxl) library(XLConnect)
>> >>
>> >> # attempting to read file from network drive df1 <-
>> >> read_excel("//Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls")
>> >> # pathname is fully qualified, but error thrown as above
>> >>
>> >> cat(normalizePath("//Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls"))
>> >> # throws same error
>> >>
>> >> # reading same file with different function df2 <-
>> >> readWorksheetFromFile(&

Re: [Rd] Different results for cos,sin,tan and cospi,sinpi,tanpi

2016-12-01 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Thu, 1 Dec 2016 09:36:10 +0100 writes:

>>>>> Ei-ji Nakama <nak...@ki.rim.or.jp>
>>>>> on Thu, 1 Dec 2016 14:39:55 +0900 writes:

>> Hi,
>> i try sin, cos, and tan.

>>> sapply(c(cos,sin,tan),function(x,y)x(y),1.23e45*pi)
>> [1] 0.5444181 0.8388140 1.5407532

>> However, *pi results the following

>>> sapply(c(cospi,sinpi,tanpi),function(x,y)x(y),1.23e45)
>> [1] 1 0 0

>> Please try whether the following becomes all right.

> [..]

> Yes, it does  -- the fix will be in all future versions of R.

oops not so quickly, Martin!

Of course, the results then coincide,  by sheer implementation.

*BUT* it is not at all clear which of the two results is better;
e.g., if you replace '1.23' by '1' in the above examples, the
result of the unchanged *pi() functions is 100% accurate,
whereas

 R> sapply(c(cos,sin,tan), function(Fn) Fn(1e45*pi))
 [1] -0.8847035 -0.4661541  0.5269043

is "garbage".  After all,  1e45 is an even integer and so, the
(2pi)-periodic functions should give the same as for 0  which
*is*  (1, 0, 0).

For such very large arguments, the results of all of sin() ,
cos() and tan()  are in some sense "random garbage" by
necessity:
Such large numbers have zero information about the resolution modulo
[0, 2pi)  or (-pi, pi]  and hence any (non-trivial) periodic
function with such a "small" period can only return "random noise".


> Thank you very much Ei-ji Nakama, for this valuable contribution
> to make R better!

That is still true!  It raises the issue to all of us and will
improve the documentation at least!

At the moment, I'm not sure where we should go.
Of course, I could start experiments using my own 'Rmpfr'
package where I can (with increasing computational effort!) get
correct values (for increasingly larger arguments) but at the
moment, I don't see how this would help.
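
(Such an experiment could be sketched as follows, assuming the 'Rmpfr'
package; with 256 bits, the ~150-bit integer part of 1.23e45*pi still
leaves enough fractional bits for an accurate reduction:)

  library(Rmpfr)
  x <- mpfr(1.23e45, precBits = 256) * Const("pi", prec = 256)
  sin(x)   # accurate, unlike sin(1.23e45*pi) in double precision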

Martin

> Martin Maechler,
> ETH Zurich


>> -- 
>> Best Regards,
>> --
>> Eiji NAKAMA 
>> "\u4e2d\u9593\u6804\u6cbb"  

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Different results for cos,sin,tan and cospi,sinpi,tanpi

2016-12-01 Thread Martin Maechler
>>>>> Ei-ji Nakama <nak...@ki.rim.or.jp>
>>>>> on Thu, 1 Dec 2016 14:39:55 +0900 writes:

> Hi,
> i try sin, cos, and tan.

>> sapply(c(cos,sin,tan),function(x,y)x(y),1.23e45*pi)
> [1] 0.5444181 0.8388140 1.5407532

> However, *pi results the following

>> sapply(c(cospi,sinpi,tanpi),function(x,y)x(y),1.23e45)
> [1] 1 0 0

> Please try whether the following becomes all right.

[..]

Yes, it does  -- the fix will be in all future versions of R.

Thank you very much Ei-ji Nakama, for this valuable contribution
to make R better!

Martin Maechler,
ETH Zurich


> -- 
> Best Regards,
> --
> Eiji NAKAMA 
> "\u4e2d\u9593\u6804\u6cbb"  

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-11-28 Thread Martin Maechler
ded to solve this differently.

I'm looking at these suggestions now, notably also your proposals below;
thank you, Suharto!

(I wanted to put my improved 'ifelse2' out first, quickly).
Martin


> A concrete version of 'ifelse2' that starts the result from 'yes':
> function(test, yes, no, NA. = NA) {
>     if(!is.logical(test))
>         test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
>     n <- length(test)
>     ans <- rep(yes, length.out = n)
>     ans[!test & !is.na(test)] <- rep(no, length.out = n)[!test & !is.na(test)]
>     ans[is.na(test)] <- rep(NA., length.out = n)[is.na(test)]
>     ans
> }
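
(A quick usage check of this version -- an addition, assuming the function
above is bound to the name 'ifelse2':)

  ifelse2(c(TRUE, NA, FALSE), "yes", "no", NA. = "unknown")
  ## [1] "yes"     "unknown" "no"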

> It requires a 'rep' method that is compatible with subsetting. It also
> works with "POSIXlt" in R 2.7.2, when 'length' gives 9, and gives an
> appropriate result if time zones are the same.
> For coercion of 'test', there is no need of keeping attributes. So, it
> doesn't do
> storage.mode(test) <- "logical"
> and goes directly to 'as.logical'.
> It relies on subassignment for silent coercions of
> logical < integer < double < complex .
> Unlike 'ifelse', it never skips any subassignment. So, the phenomenon as in
> "example of different return modes" in ?ifelse doesn't happen.

> Another version, for keeping attributes as pointed out by Duncan Murdoch:
> function(test, yes, no, NA. = NA) {
>     if(!is.logical(test))
>         test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
>     n <- length(test)
>     n.yes <- length(yes); n.no <- length(no)
>     if (n.yes != n) {
>         if (n.no == n) {  # swap yes <-> no
>             test <- !test
>             ans <- yes; yes <- no; no <- ans
>             n.no <- n.yes
>         } else yes <- yes[rep_len(seq_len(n.yes), n)]
>     }
>     ans <- yes
>     if (n.no == 1L)
>         ans[!test] <- no
>     else
>         ans[!test & !is.na(test)] <- no[
>             if (n.no == n) !test & !is.na(test)
>             else rep_len(seq_len(n.no), n)[!test & !is.na(test)]]
>     stopifnot(length(NA.) == 1L)
>     ans[is.na(test)] <- NA.
>     ans
> }

> Note argument evaluation order: 'test', 'yes', 'no', 'NA.'.
> First, it chooses the first of 'yes' and 'no' that has the same length as
> the result. If none of 'yes' and 'no' matches the length of the result, it
> chooses recycled (or truncated) 'yes'.
> It uses 'rep' on the index and subsetting as a substitute for 'rep' on
> the value.
> It requires a 'length' method that is compatible with subsetting.
> Additionally, it uses the same idea as dplyr::if_else, or more precisely
> the helper function 'replace_with'. It doesn't use 'rep' if the length of 'no'
> is 1 or is the same as the length of the result. For subassignment with a value
> of length 1, recycling happens by itself and NA in the index is OK.
> It limits 'NA.' to be of length 1, considering 'NA.' just as a label for
> NA.

> Cases where the last version above or 'ifelse2' or 'ifelseHW' in
> ifelse-def.R gives inappropriate answers:
> - 'yes' and 'no' are "difftime" objects with different "units" attribute
> - 'yes' and 'no' are "POSIXlt" objects with different time zone
> Example: 'yes' in "UTC" and 'no' in "EST5EDT". The reverse, 'yes' in
> "EST5EDT" and 'no' in "UTC", gives an error.

> For these cases, c(yes, no) helps. Function 'ifelseJH' in ifelse-def.R
> gives a right answer for the "POSIXlt" case.
> -
> Martin et al.,




> On Tue, Nov 22, 2016 at 2:12 AM, Martin Maechler > wrote:

>> 
>> Note that my premise was really to get *away* from inheriting
>> too much from 'test'.
>> Hence, I have *not* been talking about replacing ifelse() but
>> rather of providing a new  ifelse2()
>> 
>>         [ or if_else()  if Hadley was willing to ditch the dplyr one
>>                         in favor of a base one]
>> 
>>      > Specifically, based on an unrelated discussion with Henrik Bengtsson on
>>      > Twitter, I wonder if preserving the recycling behavior test is longer than
>>      > yes, no, but making the case where
>> 
>>      > length( test ) < max(length( yes ), length( no ))
>> 
>>      > would simplify usage for userRs in a useful way.
>> 

> That was a copyediting bug on my part, it seems.

Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-11-28 Thread Martin Maechler

> Related to the length of 'ifelse' result, I want to say that "example of 
> different return modes" in ?ifelse led me to perceive a wrong thing in the 
> past.
>  ## example of different return modes:
>  yes <- 1:3
>  no <- pi^(0:3)
>  typeof(ifelse(NA,yes, no)) # logical
>  typeof(ifelse(TRUE,  yes, no)) # integer
>  typeof(ifelse(FALSE, yes, no)) # double
> 
> As the result of each 'ifelse' call is not printed, I thought that the length 
> of the result is 3. In fact, the length of the result is 1.

"of course"... (;-)

But this indeed proves that the example is too sophisticated and
not helpful/clear enough.
Is this better?

## example of different return modes (and 'test' alone determining length):
yes <- 1:3
no  <- pi^(1:4)
utils::str( ifelse(NA,yes, no) ) # logical, length 1
utils::str( ifelse(TRUE,  yes, no) ) # integer, length 1
utils::str( ifelse(FALSE, yes, no) ) # double,  length 1
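
and, for contrast, a 'test' of length 3 with the same 'yes'/'no'
(an addition, not in the posted example):

utils::str( ifelse(c(TRUE, FALSE, NA), yes, no) ) # double,  length 3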



> I realize just now that the length of 'no' is different from 'yes'. The 
> length of 'yes' is 3, the length of 'no' is 4.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] shared libraries: missing soname

2016-11-23 Thread Martin Maechler
>>>>> Joseph Mingrone <j...@ftfl.ca>
>>>>> on Tue, 22 Nov 2016 22:21:49 -0400 writes:

> Dirk Eddelbuettel <e...@debian.org> writes:
>> On 22 November 2016 at 00:02, Joseph Mingrone wrote:
>> | These are also not fatal errors on FreeBSD, where everything, for now, also just
>> | works.  ...until a library's interface changes.  You seem to be arguing that
>> | sonames are pointless.  We disagree.

>> You are putting words in my mouth. In my very first reply to you, I pointed
>> out that (for non-BSD systems at least) the sonames do not matter as R loads
>> the libraries itself, rather than via ldd.  No more, no less.

> Let me restate.  You seem to be arguing that, because R itself doesn't consume
> its shared libraries via ldd(), sonames serve no purpose, in this case.  Please
> correct me if I'm putting words in your mouth.

>> | I can't say for certain (I'm not an rkward user), but looking at the build

>> Why did _you_ then bring up rkward as an example? That was your suggestion.

> Because you asked, "Yes, well, but are there other customers?"  Also, I'm trying
> to put myself in the perspective of package users.

> Is this a more appropriate example?

> # ldd /usr/local/lib/R/library/tseries/libs/tseries.so | grep libR
> libRblas.so => /usr/local/lib/R/lib/libRblas.so (0x80120c000)
> libR.so => /usr/local/lib/R/lib/libR.so (0x801c0)

Well, Dirk has said to have given his last reply on this thread.
I (as a member of R-core) am glad about people like Dirk who
take some of our load and helpfully answer such
questions/reports on R-devel.

To the issue:  I also don't see what your point is.
R works with these '.so' libraries as intended in all cases as
far as we know, and so I don't understand why anything needs to
be changed.
All these libraries "belong to R" and are tied to a specific
version of R and are not to be used outside of R, so again I don't see
your point.

Best regards,

Martin Maechler
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-11-22 Thread Martin Maechler
>>>>> Gabriel Becker <gmbec...@ucdavis.edu>
>>>>> on Tue, 15 Nov 2016 11:56:04 -0800 writes:

> All,
> Martin: Thanks for this and all the other things you are doing to both
> drive R forward and engage more with the community about things like this.

> Apologies for missing this discussion the first time it came around and if
> anything here has already been brought up, but I wonder what exactly you
> mean when you want recycling behavior.

Thank you, Gabe.

Note that my premise was really to get *away* from inheriting
too much from 'test'.
Hence, I have *not* been talking about replacing ifelse() but
rather of providing a new  ifelse2()

   [ or if_else()  if Hadley was willing to ditch the dplyr one
   in favor of a base one]

> Specifically, based on an unrelated discussion with Henrik Bengtsson on
> Twitter, I wonder if preserving the recycling behavior test is longer than
> yes, no, but making the case where

> length( test ) < max(length( yes ), length( no ))

> would simplify usage for userRs in a useful way.

I'm sorry I don't understand the sentence above.

> I suspect it's easy to
> forget that the result is not guaranteed to be the length of  test, even
> for intermediate and advanced users familiar with ifelse and it's
> strengths/weaknesses.

> I certainly agree (for what that's worth...) that

> x = rnorm(100)

> y = ifelse2(x > 0, 1L, 2L)

> should continue to work.

(and give a length 100 result).
Also
ifelse2(x > 0, sqrt(x), 0L)

should work even though  class(sqrt(x)) is "numeric" and the one
of 0L is "integer", and I'd argue

ifelse2(x < 0, sqrt(x + 0i), sqrt(x))

should also continue to work as with ifelse().

> Also, If we combine a stricter contract that the output will always be of
> length with the suggestion of a specified output class 

that was not my intent here but would be another interesting
extension. However, I would like to keep  R-semantic silent coercions
such as
  logical < integer < double < complex

and your pseudo code below would not work so easily I think.

> the pseudo code could be

(I'm changing assignment '=' to  '<-' ...  [please!] )

> ifelse2 <- function(test, yes, no, outclass) {
>   lenout  <- length(test)
>   out <- as( rep(yes, length.out = lenout), outclass)
>   out[!test] <- as(rep(no, length.out = lenout)[!test], outclass)
>   # handle NA stuff
>   out
> }


> NAs could be tricky if outclass were allowed to be completely general, but
> doable, I think? Another approach  if we ARE fast-passing while leaving
> ifelse intact is that maybe NA's in test just aren't allowed in ifelse2.
> I'm not saying we should definitely do that, but it's possible and would
> make things faster.

> Finally, in terms of efficiency, with the stuff that Luke and I are working
> on, the NA detection could be virtually free in certain cases, which could
> give a nice boost for long vectors  that don't have any NAs (and 'know'
> that they don't).

That *is* indeed a very promising prospect!
Thank you in advance! 

> Best,
> ~G

I am still a bit disappointed by the fact that it seems nobody has
taken a good look at my ifelse2() proposal.

I really would like an alternative to ifelse() in *addition* to
the current ifelse(), but hopefully in the future being used in
quite a few places instead of ifelse(), not for
efficiency but for changed semantics, namely working for considerably
more "vector like" classes of  'yes' and 'no'  than the current
ifelse().

As I said, the current proposal works for objects of class
   "Date", "POSIXct", "POSIXlt", "factor",  "mpfr" (pkg 'Rmpfr')
and hopefully for "sparseVector" (in a next version of the 'Matrix' pkg).

Martin

> On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

>> Finally getting back to this :
>> 
>> >>>>> Hadley Wickham <h.wick...@gmail.com>
>> >>>>> on Mon, 15 Aug 2016 07:51:35 -0500 writes:
>> 
>> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
>> > <h.wick...@gmail.com> wrote:
>> >>> >> One possibility would also be to consider a
>> >>> "numbers-only" or >> rather "same type"-only {e.g.,
>> >>> would also work for characters} >> version.
>> >>>
>> >>> > I don't know what you mean by these.
>> >>>
>

Re: [R-pkg-devel] Question about configure file and system requirements

2016-11-21 Thread Martin Maechler
>>>>> Lorenzo Busetto <lbus...@gmail.com>
>>>>> on Fri, 18 Nov 2016 23:04:53 +0100 writes:

> Dear all, a quick question:

> while preparing for a CRAN submission, am I supposed to
> include a "configure" file and list for system
> requirements also if those system requirements "come from"
> the packages that I import ?

If the requirements are *only* in these packages, and not
(directly) in your package's functions,
you should *not* duplicate the checks (nor the entries in
DESCRIPTION),

by the same logic that you should not list 'Depends', 'Imports',
'Suggests', etc of those packages.

Martin Maechler
ETH Zurich

> Asking this because the "configure" and system
> requirements are already present in the imported packages,
> so my configuration check would duplicate what is already
> done while installing the dependencies.

> sorry if this is dumb...

> Lorenzo

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] problem with normalizePath()

2016-11-18 Thread Martin Maechler
>>>>> Evan Cortens <ecort...@mtroyal.ca>
>>>>> on Thu, 17 Nov 2016 15:51:03 -0700 writes:

> I wonder if this could be related to the issue that I
> submitted to bugzilla about two months ago? (
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17159)

> That is to say, could it be that it's treating the first
> path after the single backslash as an actual directory,
> rather than as the name of the share?

> -- 
> Evan Cortens, PhD Institutional Analyst - Office of
> Institutional Analysis Mount Royal University 403-440-6529

Could well be.  Thank you, Evan, also for your bug report
including patch proposal.

In such situations we (R core) would be really happy if
Microsoft showed another facet of their investment into R:
Ideally there should be enough staff who can judge and test such
bugs and bug fixes? 

--> I'm BCC'ing this to one place at least.

Best,
Martin Maechler  ETH Zurich

> On Thu, Nov 17, 2016 at 2:28 PM, Laviolette, Michael <
> michael.laviole...@dhhs.nh.gov> wrote:

>> The packages "readxl" and "haven" (and possibly others)
>> no longer access files on shared network drives. The
>> problem appears to be in the normalizePath()
>> function. The file can be read from a local drive or by
>> functions that don't call normalizePath(). The error
>> thrown is
>> 
>> Error:
>> path[1]="\\Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls":
>> The system cannot find the file specified
>> 
>> Here's my session:
>> 
>> library(readxl) library(XLConnect)
>> 
>> # attempting to read file from network drive df1 <-
>> read_excel("//Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls")
>> # pathname is fully qualified, but error thrown as above
>> 
>> cat(normalizePath("//Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls"))
>> # throws same error
>> 
>> # reading same file with different function df2 <-
>> readWorksheetFromFile("//Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls",
>> 1) # completes successfully
>> 
>> # reading same file from local drive df3 <-
>> read_excel("C:/17.xls") # completes successfully
>> 
>> sessionInfo() R version 3.3.2 (2016-10-31) Platform:
>> x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7
>> x64 (build 7601) Service Pack 1
>> 
>> locale: [1] LC_COLLATE=English_United States.1252
>> LC_CTYPE=English_United States.1252 [3]
>> LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5]
>> LC_TIME=English_United States.1252
>> 
>> attached base packages: [1] stats graphics grDevices
>> utils datasets methods base
>> 
>> other attached packages: [1] readxl_0.1.1 dplyr_0.5.0
>> XLConnect_0.2-12 [4] XLConnectJars_0.2-12 ROracle_1.2-1
>> DBI_0.5-1
>> 
>> loaded via a namespace (and not attached): [1]
>> magrittr_1.5 R6_2.2.0 assertthat_0.1 tools_3.3.2
>> haven_1.0.0 [6] tibble_1.2 Rcpp_0.12.7 rJava_0.9-8
>> 
>> Please advise.  Thanks,
>> 
>> Michael Laviolette PhD MPH Public Health Statistician
>> Bureau of Public Health Statistics and Informatics New
>> Hampshire Division of Public Health Services 29 Hazen
>> Drive Concord, NH 03301-6504 Phone: 603-271-5688 Fax:
>> 603-271-7623 Email: michael.laviole...@dhhs.nh.gov
>> 
>> 
>> 
>> [[alternative HTML version deleted]]
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 

>   [[alternative HTML version deleted]]

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-11-15 Thread Martin Maechler
Finally getting back to this :

>>>>> Hadley Wickham <h.wick...@gmail.com>
>>>>> on Mon, 15 Aug 2016 07:51:35 -0500 writes:

> On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
> <h.wick...@gmail.com> wrote:
>>> >> One possibility would also be to consider a
>>> "numbers-only" or >> rather "same type"-only {e.g.,
>>> would also work for characters} >> version.
>>>
>>> > I don't know what you mean by these.
>>>
>>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
>>> which is very relevant, thank you Bob!
>>>
>>> As I have found, that actually works in such a "same
>>> type"-only way: It does not try to coerce, but gives an
>>> error when the classes differ, even in this somewhat
>>> debatable case :
>>>
>>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error:
>>> `false` has type 'double' not 'integer'
>>> >
>>>
>>> As documented, if_else() is clearly stricter than
>>> ifelse() and e.g., also does no recycling (but of
>>> length() 1).
>>
>> I agree that if_else() is currently too strict - it's
>> particularly annoying if you want to replace some values
>> with a missing:
>>
>> x <- sample(10) if_else(x > 5, NA, x) # Error: `false`
>> has type 'integer' not 'logical'
>>
>> But I would like to make sure that this remains an error:
>>
>> if_else(x > 5, x, "BLAH")
>>
>> Because that seems more likely to be a user error (but
>> reasonable people might certainly believe that it should
>> just work)
>>
>> dplyr is more accommodating in other places (i.e. in
>> bind_rows(), collapse() and the joins) but it's
>> surprisingly hard to get all the details right. For
>> example, what should the result of this call be?
>>
>> if_else(c(TRUE, FALSE), factor(c("a", "b")),
>> factor(c("c", "b"))
>>
>> Strictly speaking I think you could argue it's an error,
>> but that's not very user-friendly. Should it be a factor
>> with the union of the levels? Should it be a character
>> vector + warning? Should the behaviour change if one set
>> of levels is a subset of the other set?
>>
>> There are similar issues for POSIXct (if the time zones
>> are different, which should win?), and difftimes
>> (similarly for units).  Ideally you'd like the behaviour
>> to be extensible for new S3 classes, which suggests it
>> should be a generic (and for the most general case, it
>> would need to dispatch on both arguments).

> One possible principle would be to use c() -
> i.e. construct out as

> out <- c(yes[0], no[0])
> length(out) <- max(length(yes), length(no))

yes; this would require that a  `length<-` method works for the
class of the result.

Duncan Murdoch mentioned a version of this, in his very
first reply:

ans <- c(yes, no)
ans <- ans[seq_along(test)]

which is less efficient for atomic vectors, but requires
less from the class: it "only" needs `c` and `[` to work

and a mixture of your two proposals would be possible too:

ans <- c(yes[0], no[0])
ans <- ans[seq_along(test)]

which does *not* work for my "mpfr" numbers (CRAN package 'Rmpfr'),
but that's a buglet in the  c.mpfr() implementation of my Rmpfr
package... (which has already been fixed in the development version on R-forge,
https://r-forge.r-project.org/R/?group_id=386)

> But of course that wouldn't help with factor responses.

Yes.  However, a version of Duncan's suggestion -- of treating 'yes' first
-- does help in that case.
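
For a class where both 'c' and '[' work, e.g. "Date", the template idea can
be sketched like this (an illustration, not from the original mails):

  test <- c(TRUE, FALSE, TRUE)
  yes  <- Sys.Date() + 1:3
  no   <- Sys.Date() - 1:3
  ans  <- c(yes[0], no[0])[seq_along(test)] # NA template, already of class "Date"
  ans[test] <- yes[test]; ans[!test] <- no[!test]
  class(ans)  # "Date"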

For once, mainly as "feasability experiment",
I have created a github gist to make my current ifelse2() proposal available
for commenting, cloning, pullrequesting, etc:

Consisting of 2 files
- ifelse-def.R :  Function definitions only, basically all the current
proposals, called  ifelse*()
- ifelse-checks.R : A simplistic checking function
and examples calling it, notably demonstrating that my
ifelse2()  does work with
"Date",  (i.e. "POSIXct" and "POSIXlt"), factors,
    and "mpfr" (the arbitrary-precision numbers in my package "Rmpfr")

Also if you are not on github, you can quickly get to the ifelse2()
definition :

htt

Re: [Rd] Missing objects using dump.frames for post-mortem debugging of crashed batch jobs. Bug or gap in documentation?

2016-11-15 Thread Martin Maechler
>>>>> nospam@altfeld-im de <nos...@altfeld-im.de>
>>>>> on Tue, 15 Nov 2016 01:15:46 +0100 writes:

> Martin, thanks for the good news and sorry for wasting your (and others')
> time by not doing my homework and querying bugzilla first (lesson
> learned!).
> 
> I have tested the new implementation from R-devel and observe a semantic
> difference when playing with the parameters:
> 
>   # Test script 1
>   g <- "global"
>   f <- function(p) {
> l <- "local"
> dump.frames()
>   }
>   f("parameter")
> 
> results in
>   # > debugger()
>   # Message:  object 'server' not found
>   # Available environments had calls:
>   # 1: source("~/.active-rstudio-document", echo = TRUE)
>   # 2: withVisible(eval(ei, envir))
>   # 3: eval(ei, envir)
>   # 4: eval(expr, envir, enclos)
>   # 5: .active-rstudio-document#9: f("parameter")
>   # 
>   # Enter an environment number, or 0 to exit  
>   # Selection: 5
>   # Browsing in the environment with call:
>   #   .active-rstudio-document#9: f("parameter")
>   # Called from: debugger.look(ind)
>   # Browse[1]> g
>   # [1] "global"
>   # Browse[1]> 
> 
> while dumping to a file
> 
>   # Test script 2
>   g <- "global"
>   f <- function(p) {
> l <- "local"
> dump.frames(to.file = TRUE, include.GlobalEnv = TRUE)
>   }
>   f("parameter")
> 
> results in
>   # > load("last.dump.rda")
>   # > debugger()

>   # Message:  object 'server' not found
>   # Available environments had calls:
>   # 1: .GlobalEnv
>   # 2: source("~/.active-rstudio-document", echo = TRUE)
>   # 3: withVisible(eval(ei, envir))
>   # 4: eval(ei, envir)
>   # 5: eval(expr, envir, enclos)
>   # 6: .active-rstudio-document#11: f("parameter")
>   # 
>   # Enter an environment number, or 0 to exit  
>   # Selection: 6
>   # Browsing in the environment with call:
>   #   .active-rstudio-document#11: f("parameter")
>   # Called from: debugger.look(ind)
>   # Browse[1]> g
>   # Error: object 'g' not found
>   # Browse[1]> 

Your call to f() and the corresponding dump is heavily
obfuscated by all the wrapping paper that Rstudio seems to wrap around a
simple function call (or just around using debugger() ?).

All this was to get the correct environments when things are run
in a batch job... and there's no Rstudio gift wrapping in that case.

In my simple use of the above, "g" is clearly available in the .GlobalEnv
component of last.dump :

> exists("g", last.dump$.GlobalEnv)
[1] TRUE
> get("g", last.dump$.GlobalEnv)
[1] "global"
> 

and that's all what's promised, right?
In such a post mortem debugging, notably from a batch job (!),
you don't want your .GlobalEnv to be *replaced* by the
.GlobalEnv from 'last.dump', do you?

In the end, I think you are indirectly asking for new features to be
added to  debugger(), namely that it works more seamlessly with
a last.dump object that has been created via 'include.GlobalEnv = TRUE'.

This wish for a new feature may be a very sensible wish.
I think it's fine if you add it as wish (for a new feature to
debugger()) to the R bugzilla site
( https://bugs.r-project.org/ -- after asking one of R core to
  add you to the list of "registered ones" there, see the
  boldface note in https://www.r-project.org/bugs.html )

Personally, I would only look into this issue if we also get a patch
proposal (see also https://www.r-project.org/bugs.html), because
already now you can easily get to "g" in your example.

Martin

> The semantic difference is that the global variable "g" is visible
> within the function "f" in the first version, but not in the second
> version.
> 
> If I dump to a file and load and debug it then the search path through
> the
> frames is not the same during run time vs. debug time.
> 
> An implementation with the same semantics could be achieved
> by applying this workaround currently:
> 
>   dump.frames()
>   save.image(file = "last.dump.rda")
> 
> Does it possibly make sense to unify the semantics?
> 
> THX!
> 
> 
> On Mon, 2016-11-14 at 11:34 +0100, Martin Maechler wrote:
> > >>>>> nospam@altfeld-im de <nos...@altfeld-im.de>
> > >>>>> on Sun, 13 Nov 2016 13:11:38 +0100 writes:
> > 
> > > Dear R friends, to allow post-mortem debugging In my
> > > Rscript based batch jobs I use
> > 
> > >tryCatch( , error = function(e) {
> > > dump.frames(to.file = TRUE) })
> > 
> > >

Re: [Rd] Missing objects using dump.frames for post-mortem debugging of crashed batch jobs. Bug or gap in documentation?

2016-11-14 Thread Martin Maechler
> nospam@altfeld-im de 
> on Sun, 13 Nov 2016 13:11:38 +0100 writes:

> Dear R friends, to allow post-mortem debugging In my
> Rscript based batch jobs I use

>tryCatch( <expr>, error = function(e) {
> dump.frames(to.file = TRUE) })

> to write the called frames into a dump file.

> This is similar to the method recommended in the "Writing
> R extensions" manual in section 4.2 Debugging R code (page
> 96):

> https://cran.r-project.org/doc/manuals/R-exts.pdf

>> options(error = quote({dump.frames(to.file=TRUE); q()}))



> When I load the dump later in a new R session to examine
> the error I use

> load(file = "last.dump.rda") debugger(last.dump)

> My problem is that the global objects in the workspace are
> NOT contained in the dump since "dump.frames" does not
> save the workspace.

> This makes debugging difficult.



> For more details see the stackoverflow question + answer in:
> https://stackoverflow.com/questions/40421552/r-how-make-dump-frames-include-all-variables-for-later-post-mortem-debugging/40431711#40431711



> I think the reason of the problem is:
> 

> If you use dump.files(to.file = FALSE) in an interactive
> session debugging works as expected because it creates a
> global variable called "last.dump" and the workspace is
> still loaded.

> In the batch job scenario however the workspace is NOT
> saved in the dump and therefore lost if you debug the dump
> in a new session.


> Options to solve the issue:
> --

> 1. Improve the documentation of the R help for
> "dump.frames" and the R_exts manual to propose another
> code snippet for batch job scenarios:

>   dump.frames()
>   save.image(file = "last.dump.rda")

> 2. Change the semantics of "dump.frames(to.file = TRUE)"
> to include the workspace in the dump.  This would change
> the semantics implied by the function name but makes the
> semantics consistent for both "to.file" param values.

There is a third option, already in place for three months now:
Andreas Kersting did propose it (nicely, as a wish),
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17116
and I had added it to the development version of R back then :


r71102 | maechler | 2016-08-16 17:36:10 +0200 (Tue, 16 Aug 2016) | 1 line

dump.frames(*, include.GlobalEnv)


So, if you (or others) want to use this before next spring,
you should install a version of R-devel
and you use that, with

  tryCatch( <expr>,
   error = function(e)
   dump.frames(to.file = TRUE, include.GlobalEnv = TRUE))
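
(equivalently, as the non-interactive 'error' option from the manuals, with
the new argument added -- a sketch:)

  options(error = quote({ dump.frames(to.file = TRUE, include.GlobalEnv = TRUE); q() }))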

Using R-devel is nice and helpful for the R community, as you
will help finding bugs/problems in the new features (and
possibly changed features) we've introduced there. 


Best regards,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Memory leak with tons of closed connections

2016-11-14 Thread Martin Maechler
> Gábor Csárdi 
> on Sun, 13 Nov 2016 20:49:57 + writes:

> Using dup() before fdopen() (and calling fclose() on the connection
> when it is closed) indeed fixes the memory leak.
> 

Thank you, Gábor!
Yes I can confirm that this fixes the memory leak.

I'm testing ('make check-all') currently and then (probably) will
commit the patch R-devel only for the time being.

Martin

> FYI,
> Gabor
> 
> Index: src/main/connections.c
> ===
> --- src/main/connections.c (revision 71653)
> +++ src/main/connections.c (working copy)
> @@ -576,7 +576,7 @@
>  fp = R_fopen(name, con->mode);
>  } else {  /* use file("stdin") to refer to the file and not the console */
>  #ifdef HAVE_FDOPEN
> -     fp = fdopen(0, con->mode);
> +     fp = fdopen(dup(0), con->mode);
>  #else
>   warning(_("cannot open file '%s': %s"), name,
>   "fdopen is not supported on this platform");
> @@ -633,8 +633,7 @@
>  static void file_close(Rconnection con)
>  {
>  Rfileconn this = con->private;
> -    if(con->isopen && strcmp(con->description, "stdin"))
> -        con->status = fclose(this->fp);
> +    con->status = fclose(this->fp);
>  con->isopen = FALSE;
>  #ifdef Win32
>  if(this->anon_file) unlink(this->name);
> 
> On Fri, Nov 11, 2016 at 1:12 PM, Gábor Csárdi  wrote:
> > On Fri, Nov 11, 2016 at 12:46 PM, Gergely Daróczi
> >  wrote:
> > [...]
> >>> I've changed the above to *print* the gc() result every 1000th
> >>> iteration, and after 100'000 iterations, there is still no
> >>> memory increase from the point of view of R itself.
> >
> > Yes, R does not know about it, it does not manage this memory (any
> > more), but the R process requested this memory from the OS, and never
> > gave it back, which is basically the definition of a memory leak. No?
> >
> > I think the leak is because 'stdin' is special and R opens it with fdopen():
> > https://github.com/wch/r-source/blob/f8cdadb769561970cc42776f563043ea5e12fe05/src/main/connections.c#L561-L579
> >
> > and then it does not close it:
> > https://github.com/wch/r-source/blob/f8cdadb769561970cc42776f563043ea5e12fe05/src/main/connections.c#L636
> >
> > I understand that R cannot fclose the FILE*, because that would also
> > close the file descriptor, but anyway, this causes a memory leak. I
> > think.
> >
> > It seems that you cannot close the FILE* without closing the
> > descriptor, so maybe a workaround would be to keep one FILE* open,
> > instead of calling fdopen() to create new ones every time. Another
> > possible workaround is to use dup(), but I don't know enough about the
> > details to be sure.
> >
> > Gabor
> >
> > [...]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Memory leak with tons of closed connections

2016-11-11 Thread Martin Maechler
> Gergely Daróczi 
> on Thu, 10 Nov 2016 16:48:12 +0100 writes:

> Dear All,
> I'm developing an R application running inside of a Java daemon on
> multiple threads, and interacting with the parent daemon via stdin and
> stdout.

> Everything works perfectly fine except for having some memory leaks
> somewhere. Simplified version of the R app:

> while (TRUE) {
> con <- file('stdin', open = 'r', blocking = TRUE)
> line <- scan(con, what = character(0), nlines = 1, quiet = TRUE)
> close(con)
> }

> This loop uses more and more RAM as time passes (see more on this
> below), not sure why, and I have no idea currently on how to debug
> this further. Can someone please try to reproduce it and give me some
> hints on what is the problem?

> Sample bash script to trigger an R process with such memory leak:

> Rscript --vanilla -e "while(TRUE)cat(runif(1),'\n')" | Rscript
> --vanilla -e "cat(Sys.getpid(),'\n');while(TRUE){con<-file('stdin',open='r',blocking=TRUE);line<-scan(con,what=character(0),nlines=1,quiet=TRUE);close(con);rm(con);gc()}"

> Maybe you have to escape '\n' depending on your shell.

> Thanks for reading this and any hints would be highly appreciated!

I have no hints, sorry... but I can give some more "data":

I've changed the above to *print* the gc() result every 1000th
iteration, and after 100'000 iterations, there is still no
memory increase from the point of view of R itself.

However, monitoring the process (via 'htop', e.g.) shows about
1 MB per second increase in memory foot print of the process.

One could argue that the error is with the OS / pipe / bash
rather than with R itself... but I'm not expert enough to do
argue  here at all.

Here's my version of your sample bash script and its output:

$  Rscript --vanilla -e "while(TRUE)cat(runif(1),'\n')" | Rscript --vanilla -e 
"cat(Sys.getpid(),'\n');i <- 0; 
while(TRUE){con<-file('stdin',open='r',blocking=TRUE);line<-scan(con,what=character(0),nlines=1,quiet=TRUE);close(con);rm(con);a
 <- gc(); i <- i+1; if(i %% 1000 == 1) {cat('i=',i,'\\n'); print(a)} }"

11059 
i= 1 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83216  4.5   1000 534.1   213529 11.5
Vcells 172923  1.4   16777216 128.0   562476  4.3
i= 1001 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   1000 534.1   213529 11.5
Vcells 172958  1.4   16777216 128.0   562476  4.3
...
...
...
...
i= 80001 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   1000 534.1   213529 11.5
Vcells 172958  1.4   16777216 128.0   562476  4.3
i= 81001 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   1000 534.1   213529 11.5
Vcells 172959  1.4   16777216 128.0   562476  4.3
i= 82001 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   1000 534.1   213529 11.5
Vcells 172959  1.4   16777216 128.0   562476  4.3
i= 83001 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   1000 534.1   213529 11.5
Vcells 172958  1.4   16777216 128.0   562476  4.3
i= 84001 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   1000 534.1   213529 11.5
Vcells 172958  1.4   16777216 128.0   562476  4.3


> Best,
> Gergely

> PS1 see the image posted at
> 
http://stackoverflow.com/questions/40522584/memory-leak-with-closed-connections
> on memory usage over time
> PS2 the issue doesn't seem to be due to writing more data in the first
> R app compared to what the second R app can handle, as I tried the
> same with adding a Sys.sleep(0.01) in the first app and that's not an
> issue at all in the real application
> PS3 I also tried using stdin() instead of file('stdin'), but that did
> not work well for the stream running on multiple threads started by
> the same parent Java daemon
> PS4 I've tried this on Linux using R 3.2.3 and 3.3.2

For me, it's Linux, too (Fedora 24), using  'R 3.3.2 patched'..

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Running package tests and not stop on first fail

2016-11-08 Thread Martin Maechler
>>>>> Hervé Pagès <hpa...@fredhutch.org>
>>>>> on Mon, 7 Nov 2016 14:37:15 -0800 writes:

    > On 11/05/2016 01:53 PM, Martin Maechler wrote:
>>>>>>> Oliver Keyes <ironho...@gmail.com>
>>>>>>> on Fri, 4 Nov 2016 12:42:54 -0400 writes:
>> 
>> > On Friday, 4 November 2016, Martin Maechler
>> > <maech...@stat.math.ethz.ch> wrote:
>> 
>> >> >>>>> Dirk Eddelbuettel <e...@debian.org>
>> >> >>>>> on Fri, 4 Nov 2016 10:36:52 -0500 writes:
>> >>
>> >> > On 4 November 2016 at 16:24, Martin Maechler wrote:
>> >> > | My proposed name '--no-stop-on-error' was a quick shot; if
>> >> > | somebody has a more concise or better "English style" wording
>> >> > | (which is somewhat compatible with all the other options you see
>> >> > | from 'R CMD check --help'), please speak up.
>> >>
>> >> > Why not keep it simple?  The similar feature this most
>> >> > resembles is 'make -k' and its help page has
>> >>
>> >> > -k, --keep-going
>> >>
>> >> > Continue as much as possible after an error.  While
>> >> > the target that failed, and those that depend on it,
>> >> > cannot be remade, the other dependencies of these
>> >> > targets can be processed all the same.
>> >>
>> >> Yes, that would be quite a bit simpler and nice in my
>> >> view.  One may think it to be too vague,
>> 
>> > Mmn, I would agree on vagueness (and it breaks the pattern
>> > set by other flags of human-readability). Deep familiarity
>> > with make is probably not something we should ask of
>> > everyone who needs to test a package, too.
>> 
>> > I quite like stop-on-error=true (exactly the same as the
>> > previous suggestion but shaves off some characters by
>> > inverting the Boolean)
>> 
>> Thank you, Brian, Dirk and Oliver for these (and some offline)
>> thoughts and suggestions!
>> 
>> My current summary:
>> 
>> 1) I really don't want a  --<option>=<value>
>> but rather stay with logical/binary variables that "express
>> themselves"... in the same way I strongly prefer
>> 
>> if (A_is_special)   
>> to
>> if (A_special == TRUE)  
>> 
>> for a logical variable A_* .   Yes, this is mostly a matter
>> of taste,.. but related to how R style itself "works"
>> 
>> 2) Brian mentioned that this is only about ./tests/ tests which
>> are continued, not about the Examples which are treated separately.
>> That's why we had contemplated additionally using 'tests' (because that's
>> the directory name used for unit/regression/.. tests) in the option
>> name.
>> 
>> Even though Brian is correct, ideally we *would* want to also influence the
>> examples' running to *not* stop on a first error..   However that would
>> need more work, reorganizing how the examples are run and that may not be
>> worth the pain.   However it should be considered a goal in the long run.

> My name is Hervé, and I was not suggesting that what happens with the
> examples should be changed. I was just preaching consistency (again
> sorry) between what happens with the examples and what happens with
> the tests. 

Thank you, Hervé and excuse me for not answering more focused on
what you said.
I think I do understand what you say (at least by now :-)) and
agree that consistency is something important and to be strived for,
also with these options.

> Why not simply change the latter?
> Do we really need an option to control this? 

Very good questions.  If the change could be made much better,
I'd agree we'd not need a new option because the change could be
considered uniformly better than the current (R 3.3.2, say) behavior.
However the change as it is currently, is not good enough to be
the only option (see below). 

> The behavior was changed for the examples a couple of
> years ago and nobody felt the need to introduce an option
> to control this at the time.

Yes, that change was made very nicely (not by me) and I'd say
the result *was* uniformly better than the previous behavior, so
there did not seem much of a reason to 
