Re: [Rd] NAs and rle

2020-08-26 Thread William Dunlap via R-devel
Splus's rle() also grouped NA's (separately from NaN's):

% Splus
TIBCO Software Inc. Confidential Information
Copyright (c) 1988-2008 TIBCO Software Inc. ALL RIGHTS RESERVED.
TIBCO Spotfire S+ Version 8.1.1 for Linux 2.6.9-34.EL, 32-bit : 2008
> dput(rle(c(11,11,NA,NA,NA,NaN,14,14,14,14)))
list("lengths" = c(2, 3, 1, 4)
, "values" = c(11., NA, NaN, 14.)
)
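
For comparison, that grouping behavior can be sketched in R itself. The function name rle2 and its group.nas argument are hypothetical (not base R); the sketch is written to reproduce the Splus output above, keeping NA and NaN in separate runs:

```r
# Hypothetical rle() variant with a group.nas argument (a sketch, not base R).
rle2 <- function(x, group.nas = FALSE) {
  n <- length(x)
  if (n == 0L)
    return(structure(list(lengths = integer(), values = x), class = "rle"))
  y <- x[-1L] != x[-n]                 # NA wherever either side is NA/NaN
  if (group.nas) {
    nan1 <- if (is.double(x)) is.nan(x[-1L]) else rep(FALSE, n - 1L)
    nan2 <- if (is.double(x)) is.nan(x[-n])  else rep(FALSE, n - 1L)
    na1 <- is.na(x[-1L]) & !nan1
    na2 <- is.na(x[-n])  & !nan2
    # adjacent missings of the same kind (NA/NA or NaN/NaN) continue a run
    y[(na1 & na2) | (nan1 & nan2)] <- FALSE
  }
  y[is.na(y)] <- TRUE                  # other NA comparisons start a new run
  i <- c(which(y), n)
  structure(list(lengths = diff(c(0L, i)), values = x[i]), class = "rle")
}

r <- rle2(c(11, 11, NA, NA, NA, NaN, 14, 14, 14, 14), group.nas = TRUE)
r  # lengths 2 3 1 4; values 11 NA NaN 14, matching the Splus output above
```

With group.nas = FALSE the sketch reduces to base rle()'s behavior, where every NA starts a new run.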

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Aug 25, 2020 at 10:57 PM Gabriel Becker  wrote:
>
> Hi All,
>
> A twitter user, Mike fc (@coolbutuseless) mentioned today that he was
> surprised that repeated NAs weren't treated as a run by the rle function.
>
> Now I know why they are not. NAs represent values which could be the same
> as or different from each other if they were known, so from a purely
> conceptual standpoint there is no way to tell whether they are the same and
> thus constitute a run or not.
>
> This conceptual strictness isn't universally observed, though, because we
> get the following:
>
> > unique(c(1, 2, 3, NA, NA, NA))
>
> [1]  1  2  3 NA
>
>
> Which means that rle(sort(x))$values is not guaranteed to be the same as
> unique(x), which is a little strange (though likely of little practical
> impact).
>
>
> Personally, to me it also seems that, from a purely data-compression
> standpoint, it would be valid to collapse those missing values into a run
> of missing, as it reduces size in-memory/on disk without losing any
> information.
>
> Now none of this is to say that I suggest the default behavior be changed
> (that would surely disrupt some non-trivial amount of existing code) but
> what do people think of a group.nas argument which defaults to FALSE
> controlling the behavior?
>
> As a final point, there is some precedent here (though obviously not at all
> binding), as Bioconductor's Rle functionality does group NAs.
>
> Best,
> ~G
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] lm() takes weights from formula environment

2020-08-10 Thread William Dunlap via R-devel
I assume you are concerned about this because the formula is defined
in one environment and the model fitting with weights occurs in a
separate function.  If that is the case then the model fitting
function can create a new environment, a child of the formula's
environment, add the weights variable to it, and make that the new
environment of the formula.  (This new environment is only an
attribute of the copy of the formula in the model fitting function: it
will not affect the formula outside of that function.)  E.g.,


d <- data.frame(x = 1:3, y = c(1, 2, 1))

lmWithWeightsBad <- function(formula, data, weights) {
lm(formula, data=data, weights=weights)
}
coef(lmWithWeightsBad(y~x, data=d, weights=c(2,5,1)))
# lm finds the 'weights' function in package:stats
#Error in model.frame.default(formula = formula, data = data, weights = weights,  :
#  invalid type (closure) for variable '(weights)'

lmWithWeightsGood <- function(formula, data, weights) {
envir <- new.env(parent = environment(formula))
envir$weights <- weights
environment(formula) <- envir
lm(formula, data=data, weights=weights)
}
coef(lmWithWeightsGood(y~x, data=d, weights=c(2,5,1)))
#(Intercept)   x
#  1.2173913   0.2173913

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Aug 10, 2020 at 10:43 AM John Mount  wrote:
>
> I wish I had started with "I am disappointed that lm() doesn't continue its 
> search for weights into the calling environment" or "the fact that lm() looks 
> only in the formula environment and data frame for weights doesn't seem 
> consistent with how other values are treated."
>
> But I did not. So I do apologize for both that and for negative tone on my 
> part.
>
>
> Simplified example:
>
> d <- data.frame(x = 1:3, y = c(1, 2, 1))
> w <- c(1, 10, 1)
> f <- as.formula(y ~ x)
> lm(f, data = d, weights = w)  # works
>
> # fails
> environment(f) <- baseenv()
> lm(f, data = d, weights = w)
> # Error in eval(extras, data, env) : object 'w' not found
>
>
> > On Aug 9, 2020, at 11:56 AM, Duncan Murdoch  
> > wrote:
> >
> > This is fairly clearly documented in ?lm:
> >
>


[Rd] CAR0 vs. EXTPTR_PTR

2020-07-22 Thread William Dunlap via R-devel
I know that binary packages are R-version specific, but it was a bit
surprising that Rcpp 1.0.5 built with R-4.0.2 cannot be loaded into
R-4.0.0.

% R-4.0.0 --quiet
> library(Rcpp, lib="lib-4.0.2")
Error: package or namespace load failed for ‘Rcpp’ in dyn.load(file,
DLLpath = DLLpath, ...):
 unable to load shared object '/tmp/bill/lib-4.0.2/Rcpp/libs/Rcpp.so':
  /tmp/bill/lib-4.0.2/Rcpp/libs/Rcpp.so: undefined symbol: EXTPTR_PTR
In addition: Warning message:
package ‘Rcpp’ was built under R version 4.0.2

It looks like R's include/Rinternals.h was rejiggered so the function
EXTPTR_PTR is called when CAR0 used to be.   (I think they do the same
thing.)

Bill Dunlap
TIBCO Software
wdunlap tibco.com



[Rd] Change in lapply's missing argument passing

2020-06-26 Thread William Dunlap via R-devel
Consider the following expression, in which we pass 'i=', with no value
given for the 'i' argument, to lapply.
lapply("x", function(i, j) c(i=missing(i), j=missing(j)), i=)
From R-2.14.0 (2011-10-31) through R-3.4.4 (2018-03-15) this evaluated to
c(i=TRUE, j=FALSE).  From R-3.5.0 (2018-04-23) through R-4.0.0 (2020-04-24)
this evaluated to c(i=FALSE, j=TRUE).

Was this change intentional?

Bill Dunlap
TIBCO Software
wdunlap tibco.com



[Rd] mget(missingArgument)?

2020-06-22 Thread William Dunlap via R-devel
Currently, when mget() is used to get the value of a function's argument
with no default value and no value in the call it returns the empty name
(R_MissingArg).  Is that the right thing to do or should it return
'ifnotfound' or give an error?

E.g.,
> a <- (function(x) { y <- "y from function's environment";
mget(c("x","y","z"), envir=environment(), ifnotfound=666)})()
> str(a)
List of 3
 $ x: symbol
 $ y: chr "y from function's environment"
 $ z: num 666

The similar function get0() gives an error in that case.
> b <- (function(x) get0("x", envir=environment(), ifnotfound=666))()
Error in get0("x", envir = environment(), ifnotfound = 666) :
  argument "x" is missing, with no default
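
If mget() keeps returning the empty name, a caller can detect that sentinel afterwards. A sketch (quote(expr = ) is the empty name, i.e. R_MissingArg as seen from R code):

```r
# Sketch: detect the missing-argument sentinel in mget() results.
a <- (function(x) {
  y <- "y from function's environment"
  mget(c("x", "y", "z"), envir = environment(), ifnotfound = 666)
})()
# Compare a[[i]] directly inside identical(); binding the sentinel to a
# variable and then evaluating that variable would raise a "missing" error.
was_missing <- vapply(seq_along(a),
                      function(i) identical(a[[i]], quote(expr = )),
                      logical(1))
names(was_missing) <- names(a)
was_missing  # x TRUE, y FALSE, z FALSE
```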

Bill Dunlap
TIBCO Software
wdunlap tibco.com



[Rd] R-devel's ...names() questions

2020-05-22 Thread William Dunlap via R-devel
Am I missing something or does the new ...names() in R-devel not work
right?

> a <- function(x, ...) ...names()
> a(a=stop("a"), b=stop("b"))
[1] "a" ""
> a(stop("x"), stop("unnamed"), c=stop("c"), d=stop("d"))
[1] NA "" ""

> version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status         Under development (unstable)
major          4
minor          1.0
year           2020
month          05
day            19
svn rev        78492
language       R
version.string R Under development (unstable) (2020-05-19 r78492)
nickname       Unsuffered Consequences

The following seems to do the right thing
alt...names <- function() evalq(names(substitute(...())), envir=parent.frame())
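
To illustrate the trick alt...names is built on: substitute(...()) returns the unevaluated ... arguments as a pairlist, so the usual length() and names() apply and nothing is ever evaluated (a sketch; f is hypothetical):

```r
# The substitute(...()) trick in isolation: the stop() calls never fire
# because the ... arguments are captured unevaluated.
f <- function(...) {
  args <- substitute(...())   # pairlist of unevaluated ... arguments
  list(n = length(args), nm = names(args))
}
res <- f(a = stop("a"), b = stop("b"))
res  # n = 2, nm = c("a", "b")
```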

However I wonder if it would be better to give the user a function, say
...args_unevaluated(...) to get the unevaluated ... arguments directlly
without having to know about the substitute(...()) trick.   Then the user
could get the length, the n'th, or the names using the usual length(), [[,
and names() functions instead of ...length(), ...elt(), and ...names().

Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-22 Thread William Dunlap via R-devel
; no effect whenever  'collapse = '
> >
> > 2) Gabe proposes that 'collapse = ' and 'recycle0 = TRUE'
> > should be declared incompatible and error. If going in that
> > direction, I could also see them to give a warning (and
> > continue as if recycle0 = FALSE).
> >
> >
> > Herve makes a good point about when sep and collapse are both set. That
> > said, if the user explicitly sets recycle0, Personally, I don't think it
> > should be silently ignored under any configuration of other arguments.
> >
> > If all of the arguments are to go into effect, the question then becomes
> > one of ordering, I think.
> >
> > Consider
> >
> > paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", collapse = ",",
> > recycle0=TRUE)
> >
> > Currently that returns character(0), because the logic is
> > essentially (in pseudo-code)
> >
> > collapse(paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ",
> > recycle0=TRUE), collapse = ", ", recycle0=TRUE)
> >
> >   -> collapse(character(0), collapse = ", ", recycle0=TRUE)
> >
> > -> character(0)
> >
> > Now Bill Dunlap argued, fairly convincingly I think, that paste(...,
> > collapse=) should /always/ return a character vector of length
> > exactly one. With recycle0, though,  it will return "" via the
> progression
> >
> > paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", collapse = ",",
> > recycle0=TRUE)
> >
> >   -> collapse(character(0), collapse = ", ")
> >
> > -> ""
> >
> >
> > because recycle0 is still applied to the sep-based operation which
> > occurs before collapse, thus leaving a vector of length 0 to collapse.
> >
> > That is consistent but seems unlikely to be what the user wanted, imho.
> > I think if it does this there should be at least a warning when paste
> > collapses to "" this way, if it is allowed at all (i.e. if mixing
> > collapse= and recycle0=TRUE is not simply made an error).
> >
> > I would like to hear others' thoughts as well though. @Pages, Herve
> > <hpa...@fredhutch.org> @William Dunlap
> > <wdun...@tibco.com>: is "" what you envision as the desired and
> > useful behavior there?
> >
> > Best,
> > ~G
> >
> >
> >
> > I have not yet made my mind up but would tend to agree with "you guys",
> > but I think that other R Core members should chime in, too.
> >
> > Martin
> >
> >  >> On Fri, May 15, 2020 at 11:05 AM Hervé Pagès <hpa...@fredhutch.org>
> >  >> wrote:
> >  >>
> >  >> Totally agree with that.
> >  >>
> >  >> H.
> >  >>
> >  >> On 5/15/20 10:34, William Dunlap via R-devel wrote:
> >  >> > I agree: paste(collapse="something", ...) should always
> > return a
> >  >> single
> >  >> > character string, regardless of the value of recycle0.
> > This would be
> >  >> > similar to when there are no non-NULL arguments to paste;
> >  >> collapse="."
> >  >> > gives a single empty string and collapse=NULL gives a zero
> > long
> >  >> character
> >  >> > vector.
> >  >> >> paste()
> >  >> > character(0)
> >  >> >> paste(collapse=", ")
> >  >> > [1] ""
> >  >> >
> >  >> > Bill Dunlap
> >  >> > TIBCO Software
> >  >> > wdunlap tibco.com

Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-21 Thread William Dunlap via R-devel
> 1) Bill and Hervé (I think) propose that 'recycle0' should have
>   no effect whenever  'collapse = '

I think that collapse= should make paste() return a single string,
regardless of the value of recycle0.  E.g., I would like to see

> paste0("X",seq_len(3),collapse=", ", recycle0=TRUE)
[1] "X1, X2, X3"
> paste0("X",seq_len(0),collapse=", ", recycle0=TRUE)
[1] ""

Currently the latter gives character(0).

paste's collapse argument has traditionally acted after all the other
arguments were dealt with, as in the following not extensively tested
function.

altPaste <- function (..., collapse = NULL) {
tmp <- paste(...)
if (!is.null(collapse)) {
paste(tmp, collapse=collapse)
} else {
tmp
}
}

E.g., in post-R-4.0.0 R-devel
> altPaste("X", seq_len(3), sep="", collapse=", ")
[1] "X1, X2, X3"
> altPaste("X", seq_len(0), sep="", collapse=", ")
[1] "X"
> altPaste("X", seq_len(0), sep="", collapse=", ", recycle0=TRUE)
[1] ""

I think it would be good if the above function continued to act the same as
paste itself.
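
The proposed behavior can be sketched the same way with recycle0 made explicit. paste_proposed is hypothetical, and the sketch assumes an R version whose paste() already has the recycle0 argument (R >= 4.0.x); recycle0 applies only to the sep step, while a non-NULL collapse always yields one string:

```r
# Sketch of the proposed semantics (paste_proposed is not base R):
# recycle0 affects only the sep step; collapse always gives one string.
paste_proposed <- function(..., sep = " ", collapse = NULL, recycle0 = FALSE) {
  tmp <- paste(..., sep = sep, recycle0 = recycle0)  # may be length 0
  if (is.null(collapse)) tmp else paste(tmp, collapse = collapse)
}
paste_proposed("X", seq_len(3), sep = "", collapse = ", ")
paste_proposed("X", seq_len(0), sep = "", collapse = ", ", recycle0 = TRUE)
# the latter would be "" under the proposal, rather than character(0)
```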

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, May 21, 2020 at 9:42 AM Martin Maechler 
wrote:

> >>>>> Hervé Pagès
> >>>>> on Fri, 15 May 2020 13:44:28 -0700 writes:
>
> > There is still the situation where **both** 'sep' and 'collapse' are
> > specified:
>
> >> paste(integer(0), "nth", sep="", collapse=",")
> > [1] "nth"
>
> > In that case 'recycle0' should **not** be ignored i.e.
>
> > paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE)
>
> > should return the empty string (and not character(0) like it does at
> the
> > moment).
>
> > In other words, 'recycle0' should only control the first operation
> (the
> > operation controlled by 'sep'). Which makes plenty of sense: the 1st
> > operation is binary (or n-ary) while the collapse operation is
> unary.
> > There is no concept of recycling in the context of unary operations.
>
> Interesting, ..., and sounding somewhat convincing.
>
> > On 5/15/20 11:25, Gabriel Becker wrote:
> >> Hi all,
> >>
> >> This makes sense to me, but I would think that recycle0 and
> collapse
> >> should actually be incompatible and paste should throw an error if
> >> recycle0 were TRUE and collapse were declared in the same call. I
> don't
> >> think the value of recycle0 should be silently ignored if it is
> actively
> >> specified.
> >>
> >> ~G
>
> Just to summarize what I think we should know and agree (or be
> be "disproven") and where this comes from ...
>
> 1) recycle0 is a new R 4.0.0 option in paste() / paste0() which by default
>(recycle0 = FALSE) should (and *does* AFAIK) not change anything,
>hence  paste() / paste0() behave completely back-compatible
>if recycle0 is kept to FALSE.
>
> 2) recycle0 = TRUE is meant to give different behavior, notably
>0-length arguments (among '...') should result in 0-length results.
>
>The above does not specify what this means in detail, see 3)
>
> 3) The current R 4.0.0 implementation (for which I'm primarily responsible)
>and help(paste)  are in accordance.
>Notably the help page (Arguments -> 'recycle0' ; Details 1st para ;
> Examples)
>says and shows how the 4.0.0 implementation has been meant to work.
>
> 4) Several provenly smart members of the R community argue that
>both the implementation and the documentation of 'recycle0 =
>TRUE'  should be changed to be more logical / coherent / sensical ..
>
> Is the above all correct in your view?
>
> Assuming yes,  I read basically two proposals, both agreeing
> that  recycle0 = TRUE  should only ever apply to the action of 'sep'
> but not the action of 'collapse'.
>
> 1) Bill and Hervé (I think) propose that 'recycle0' should have
>no effect whenever  'collapse = '
>
> 2) Gabe proposes that 'collapse = ' and 'recycle0 = TRUE'
>should be declared incompatible and error. If going in that
>direction, I could also see them to give a warning (and
>continue as if recycle0 = FALSE).
>
> I have not yet made my mind up but would tend to agree with "you guys",
> but I think that other R Core members should chime in, too.
>
> Martin
>
> >> On Fri, May 15, 2020 at 11:05 AM Hervé Pagès  >> <mailto:hpa...@

Re: [Rd] order function called on a data.frame?

2020-05-18 Thread William Dunlap via R-devel
do.call(order, df) should probably be do.call(order, unname(df)).

While you are looking at order(), it would be nice if 'decreasing' could
be a vector the length of list(...) so you could ask to sort some
columns in increasing order and some decreasing.  I thought I put this on
bugzilla eons ago, but perhaps not.
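
As it happens, order()'s radix method in recent R versions does accept 'decreasing' as a vector with one entry per key, so per-column direction can be sketched today:

```r
# Sketch: order a data.frame with per-column sort direction, using
# order()'s radix method, which accepts a vector 'decreasing' (one per key).
df <- data.frame(g = c("b", "a", "a"), n = c(1, 3, 2))
o <- do.call(order, c(unname(df),
                      list(decreasing = c(FALSE, TRUE), method = "radix")))
df[o, ]   # g ascending, n descending within ties of g
```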

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Mon, May 18, 2020 at 8:52 AM Michael Lawrence via R-devel <
r-devel@r-project.org> wrote:

> I guess we could make it do the equivalent of do.call(order, df).
>
> On Mon, May 18, 2020 at 8:32 AM Rui Barradas  wrote:
> >
> > Hello,
> >
> > There is a result with lists? I am getting
> >
> >
> > order(list(letters, 1:26))
> > #Error in order(list(letters, 1:26)) :
> > #  unimplemented type 'list' in 'orderVector1'
> >
> > order(data.frame(letters, 1:26))
> > # [1] 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
> > #[22] 48 49 50 51 52  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
> > #[43] 17 18 19 20 21 22 23 24 25 26
> >
> >
> > And I agree that order with data.frames should give a warning. The
> > result is indeed useless:
> >
> > data.frame(letters, 1:26)[order(data.frame(letters, 1:26)), ]
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> >
> > Às 00:19 de 18/05/20, Jan Gorecki escreveu:
> > > Hi,
> > > base::order main input arguments are defined as:
> > >
> > > a sequence of numeric, complex, character or logical vectors, all of
> > > the same length, or a classed R object
> > >
> > > When passing a list or a data.frame, the resuts seems to be a bit
> > > useless. Shouldn't that raise an error, or at least warning?
> > >
> > > Best Regards,
> > > Jan Gorecki
> > >
> > >
> >
>
>
>
> --
> Michael Lawrence
> Senior Scientist, Data Science and Statistical Computing
> Genentech, A Member of the Roche Group
> Office +1 (650) 225-7760
> micha...@gene.com
>
> Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
>
>



Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-15 Thread William Dunlap via R-devel
I agree: paste(collapse="something", ...) should always return a single
character string, regardless of the value of recycle0.  This would be
similar to when there are no non-NULL arguments to paste; collapse="."
gives a single empty string and collapse=NULL gives a zero long character
vector.
> paste()
character(0)
> paste(collapse=", ")
[1] ""

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Apr 30, 2020 at 9:56 PM suharto_anggono--- via R-devel <
r-devel@r-project.org> wrote:

> Without 'collapse', 'paste' pastes (concatenates) its arguments
> elementwise (separated by 'sep', " " by default). New in R devel and R
> patched, specifying recycle0 = FALSE makes mixing zero-length and
> nonzero-length arguments result in length zero. The result of paste(n,
> "th", sep = "", recycle0 = FALSE) always has the same length as 'n'.
> Previously, the result was still as long as the longest argument, with the
> zero-length argument treated like "". If all of the arguments have length
> zero, 'recycle0' doesn't matter.
>
> As far as I understand, 'paste' with 'collapse' as a character string is
> supposed to put together elements of a vector into a single character
> string. I think 'recycle0' shouldn't change it.
>
> In current R devel and R patched, paste(character(0), collapse = "",
> recycle0 = FALSE) is character(0). I think it should be "", like
> paste(character(0), collapse="").
>
> paste(c("4", "5"), "th", sep = "", collapse = ", ", recycle0 = FALSE)
> is
> "4th, 5th".
> paste(c("4" ), "th", sep = "", collapse = ", ", recycle0 = FALSE)
> is
> "4th".
> I think
> paste(c(), "th", sep = "", collapse = ", ", recycle0 = FALSE)
> should be
> "",
> not character(0).
>
>



[Rd] edit() doubles backslashes when keep.source=TRUE

2020-05-14 Thread William Dunlap via R-devel
Is it just my installation or does edit() (or fix(), etc.) in R-4.0.0
double all the backslashes when options(keep.source=TRUE)?  E.g.,

> options(keep.source=TRUE)
> f <- function(x) { cat("\t", x, "\n", sep="") }
> edit(f) # exit the editor without making any changes
The editor (vi or notepad) shows doubled backslashes
function(x) { cat("\\t", x, "\\n", sep="") }
as does the return value of edit().

If I set options(keep.source=FALSE) before defining 'f' or remove f's
'srcref' attribute then the backslashes are left alone.
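
As a workaround sketch, utils::removeSource() strips the source references, so edit() would fall back to deparsing the function (which shows single backslashes):

```r
# Sketch of a workaround: drop the srcref so deparse() is used by edit().
f <- function(x) { cat("\t", x, "\n", sep = "") }
f2 <- utils::removeSource(f)
is.null(attr(f2, "srcref"))  # TRUE; f2 still behaves the same as f
```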

Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [Rd] suggestion: "." in [lsv]apply()

2020-04-16 Thread William Dunlap via R-devel
Passing in a function passes not only an argument list but also an
environment from which to get free variables.  Since your function doesn't
pay attention to the environment you get things like the following.

> wsapply(list(1,2:3), paste(., ":", deparse(s)))
[[1]]
[1] "1 : paste(., \":\", deparse(s))"

[[2]]
[1] "2 : paste(., \":\", deparse(s))" "3 : paste(., \":\", deparse(s))"
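
One possible mitigation (a sketch; wsapply2 is hypothetical, building on Serguei's wsapply below) is to keep the caller's frame as the enclosure when evaluating the expression, so free variables resolve where the call was written rather than inside the wrapper:

```r
# Sketch: like wsapply below, but free variables in the expression
# resolve in the caller's frame rather than inside the wrapper.
wsapply2 <- function(l, fun, ...) {
  s <- substitute(fun)
  if (is.name(s) || (is.call(s) && identical(s[[1L]], as.name("function")))) {
    sapply(l, fun, ...)                        # ordinary function: legacy call
  } else {
    pf <- parent.frame()
    sapply(l, function(d) eval(s, list(. = d), enclos = pf), ...)
  }
}
z <- "free variable"
wsapply2(list(1, 2:3), paste(., ":", z))
# the 'z' that is seen is the caller's, not anything inside wsapply2
```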

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Apr 16, 2020 at 7:25 AM Sokol Serguei 
wrote:

> Hi,
>
> I would like to make a suggestion for a small syntactic modification of
> FUN argument in the family of functions [lsv]apply(). The idea is to
> allow one-liner expressions without typing "function(item) {...}" to
> surround them. The argument to the anonymous function is simply referred
> to as ".". Let's take an example. With this new feature, the following call
>
> sapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt,
> d))$r.squared)
> #        4         6         8
> #0.5086326 0.4645102 0.4229655
>
>
> could be rewritten as
>
> sapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)
>
> "Not a big saving in typing" you can say, but multiplied by the number of
> [lsv]apply usages, and with a neater look, I think the idea merits
> consideration.
> To illustrate a possible implementation, I propose a wrapper example for
> sapply():
>
> wsapply=function(l, fun, ...) {
>  s=substitute(fun)
>  if (is.name(s) || is.call(s) && s[[1]]==as.name("function")) {
>  sapply(l, fun, ...) # legacy call
>  } else {
>  sapply(l, function(d) eval(s, list(.=d)), ...)
>  }
> }
>
> Now, we can do:
>
> wsapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)
>
> or, traditional way:
>
> wsapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt,
> d))$r.squared)
>
> the both work.
>
> How do you feel about that?
>
> Best,
> Serguei.
>
>



Re: [Rd] detect ->

2020-04-15 Thread William Dunlap via R-devel
You are right.  >= is not as evocative as =>.  Perhaps > and < would do?
%=>% and %<=% would work.
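
For illustration, a user-defined operator (hypothetical, not part of QCA) parses where a bare => cannot, and substitute() captures both sides unevaluated:

```r
# Sketch: %=>% survives parsing, unlike =>, and captures its operands.
`%=>%` <- function(lhs, rhs) {
  list(lhs = substitute(lhs), rhs = substitute(rhs))
}
r <- A %=>% B   # A and B need not exist; they are never evaluated
r$lhs           # the symbol A
```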
Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Apr 15, 2020 at 12:41 AM Adrian Dușa  wrote:

> Dear Bill,
>
> I already tried this, and it would have been great as (currently) the
> sufficiency relation is precisely "=>"... but:
>
> foo <- function(x) return(substitute(x))
> foo(A => B)
> Error: unexpected '>' in "foo(A =>"
>
> It seems that "=>" is a syntactic error for the R parser, while "<=" is
> not because it denotes less than or equal.
>
> Now, if I could find a way to define "=>" as a standalone operator, and
> convince the R parser to bypass that error, it would solve everything. If
> this is not possible, I am back to detecting "->".
>
> Best,
> Adrian
>
>
> > On 13 Apr 2020, at 19:19, William Dunlap  wrote:
> >
> > Using => and <= instead of -> and <- would make things easier, although
> the precedence would be different.
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> >
> > On Mon, Apr 13, 2020 at 1:43 AM Adrian Dușa 
> wrote:
> > Thank you for your replies, this actually has little to do with the
> regular R code but more to signal what in my package QCA is referred to as
> a necessity relation A <- B (A is necessary for B) and sufficiency A -> B
> (A is sufficient for B).
> >
> > If switched by the parser, A -> B becomes B <- A which makes B necessary
> for A, while the intention is to signal sufficiency for B.
> >
> > Capturing in a quoted string is trivial, but I am now experimenting with
> substitute() to allow unquoted expressions.
> >
> > This is especially useful when selecting A and B from the columns of a
> data frame, using: c(A, B) instead of c("A", "B") with a lot more quotes
> for more complex expressions using more columns.
> >
> > I would be grateful for any pointer to a project that processes the code
> while it is still raw text. I could maybe learn from their code and adapt
> to my use case.
> >
> > Best wishes,
> > Adrian
> >
> > > On 13 Apr 2020, at 11:23, Gabriel Becker 
> wrote:
> > >
> > > Adrian,
> > >
> > > Indeed, this has come up in a few places, but as Gabor says, there is
> no such thing as right hand assignment at any point after parsing is
> complete.
> > >
> > > This means the only feasible way to detect it, which a few projects do
> I believe, is process the code while it is still raw text, before it goes
> into the parser, and have clever enough regular expressions.
> > >
> > > The next question, then, is why are you trying to detect right
> assignment. Doing so can be arguably useful for linting, it's true.
> Otherwise, though, because it's not really a "real thing" when the R code is
> being executed, it's not something that's generally meaningful to detect in
> most cases.
> > >
> > > Best,
> > > ~G
> > >
> > > On Mon, Apr 13, 2020 at 12:52 AM Gábor Csárdi 
> wrote:
> > > That parser already flips -> to <- before creating the parse tree.
> > >
> > > Gabor
> > >
> > > On Mon, Apr 13, 2020 at 8:39 AM Adrian Dușa 
> wrote:
> > > >
> > > > I searched and tried for hours, to no avail although it looks simple.
> > > >
> > > > (function(x) substitute(x))(A <- B)
> > > > #A <- B
> > > >
> > > > (function(x) substitute(x))(A -> B)
> > > > # B <- A
> > > >
> > > > In the first example, A occurs on the LHS, but in the second example
> A is somehow evaluated as if it occurred on the RHS, despite my
> understanding that substitute() returns the unevaluated parse tree.
> > > >
> > > > Is there any way, or is it even possible to detect the right hand
> assignment, to determine whether A occurs on the LHS?
> > > >
> > > > Thanks in advance for any hint,
> > > > Adrian
> > > >
> > > > —
> > > > Adrian Dusa
> > > > University of Bucharest
> > > > Romanian Social Data Archive
> > > > Soseaua Panduri nr. 90-92
> > > > 050663 Bucharest sector 5
> > > > Romania
> > > > https://adriandusa.eu
> > > >
> >
> > —
> > Adrian Dusa
> > University of Bucharest
> > Romanian Social Data Archive
> > Soseaua Panduri nr. 90-92
> > 050663 Bucharest sector 5
> > Romania
> > https://adriandusa.eu
> >
>
> —
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr. 90-92
> 050663 Bucharest sector 5
> Romania
> https://adriandusa.eu
>
>



Re: [Rd] detect ->

2020-04-13 Thread William Dunlap via R-devel
Using => and <= instead of -> and <- would make things easier, although the
precedence would be different.

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Mon, Apr 13, 2020 at 1:43 AM Adrian Dușa  wrote:

> Thank you for your replies, this actually has little to do with the
> regular R code but more to signal what in my package QCA is referred to as
> a necessity relation A <- B (A is necessary for B) and sufficiency A -> B
> (A is sufficient for B).
>
> If switched by the parser, A -> B becomes B <- A which makes B necessary
> for A, while the intention is to signal sufficiency for B.
>
> Capturing in a quoted string is trivial, but I am now experimenting with
> substitute() to allow unquoted expressions.
>
> This is especially useful when selecting A and B from the columns of a
> data frame, using: c(A, B) instead of c("A", "B") with a lot more quotes
> for more complex expressions using more columns.
>
> I would be grateful for any pointer to a project that processes the code
> while it is still raw text. I could maybe learn from their code and adapt
> to my use case.
>
> Best wishes,
> Adrian
>
> > On 13 Apr 2020, at 11:23, Gabriel Becker  wrote:
> >
> > Adrian,
> >
> > Indeed, this has come up in a few places, but as Gabor says, there is no
> such thing as right hand assignment at any point after parsing is complete.
> >
> > This means the only feasible way to detect it, which a few projects do I
> believe, is process the code while it is still raw text, before it goes
> into the parser, and have clever enough regular expressions.
> >
> > The next question, then, is why are you trying to detect right
> assignment. Doing so can be arguably useful for linting, it's true.
> Otherwise, though, because it's not really a "real thing" when the R code is
> being executed, it's not something that's generally meaningful to detect in
> most cases.
> >
> > Best,
> > ~G
> >
> > On Mon, Apr 13, 2020 at 12:52 AM Gábor Csárdi 
> wrote:
> > That parser already flips -> to <- before creating the parse tree.
> >
> > Gabor
> >
> > On Mon, Apr 13, 2020 at 8:39 AM Adrian Dușa 
> wrote:
> > >
> > > I searched and tried for hours, to no avail although it looks simple.
> > >
> > > (function(x) substitute(x))(A <- B)
> > > #A <- B
> > >
> > > (function(x) substitute(x))(A -> B)
> > > # B <- A
> > >
> > > In the first example, A occurs on the LHS, but in the second example A
> is somehow evaluated as if it occurred on the RHS, despite my understanding
> that substitute() returns the unevaluated parse tree.
> > >
> > > Is there any way, or is it even possible to detect the right hand
> assignment, to determine whether A occurs on the LHS?
> > >
> > > Thanks in advance for any hint,
> > > Adrian
> > >
> > > —
> > > Adrian Dusa
> > > University of Bucharest
> > > Romanian Social Data Archive
> > > Soseaua Panduri nr. 90-92
> > > 050663 Bucharest sector 5
> > > Romania
> > > https://adriandusa.eu
> > >
>
> —
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr. 90-92
> 050663 Bucharest sector 5
> Romania
> https://adriandusa.eu
>
>



Re: [Rd] is.vector could handle AsIs class better

2020-03-30 Thread William Dunlap via R-devel
The use of the term 'vector' in R comes from S, where it was used, starting
in the latter part of the 1970s, to refer to the most primitive
(irreducible) parts of an object.  It has little to do with the
mathematical or physical concept of a vector and, in my opinion, should not
be used much by ordinary users.  In hindsight, it may have been better to
use some Joycean neologism instead of the word vector so people would not
have any notions of what it should do.

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Mon, Mar 30, 2020 at 2:26 AM Jan Gorecki  wrote:

> Thank you Gabriel,
> Agree, although I think that could be relaxed in this single case and
> AsIs class could be ignored.
> Best,
> Jan
>
> On Sun, Mar 29, 2020 at 7:09 PM Gabriel Becker 
> wrote:
> >
> > Jan,
> >
> > I believe it's because it has "a non-NULL attribute other than names" as
> per the documentation. In this case, it's the class attribute of "AsIs".
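
A small sketch of the behavior under discussion, including a possible workaround: dropping the class attribute with unclass() makes is.vector() TRUE again.

```r
# Sketch: is.vector() returns FALSE for any object carrying attributes
# other than names, which includes the "class" attribute added by I().
x <- I(1L)
typeof(x)              # "integer" -- typeof() ignores the AsIs class
is.vector(x)           # FALSE, because of the class attribute
is.vector(unclass(x))  # TRUE once the class attribute is dropped
```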
> >
> > Best,
> > ~G
> >
> > On Sun, Mar 29, 2020 at 6:29 AM Jan Gorecki 
> wrote:
> >>
> >> Dear R-devel,
> >>
> >> AsIs class seems to be well handled by `typeof` and `mode` function.
> >> Those two functions are being referred when explaining `is.vector`
> >> behaviour in manual. Yet `is.vector` does not seem to be handling AsIs
> >> class the same way.
> >>
> >> is.vector(1L)
> >> #[1] TRUE
> >> is.vector(I(1L))
> >> #[1] FALSE
> >>
> >> Is there any reason behind this behaviour?
> >> Could we have it supported so AsIs class is ignored when `is.vector`
> >> is doing its job?
> >>
> >> Best Regards,
> >> Jan Gorecki
> >>
> >> __
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Inconsistant result for normalizePath on Windows

2020-03-23 Thread William Dunlap via R-devel
Re the trailing path separator - should file.path() be changed to not
produce doubled path separators when an argument has a trailing path
separator?

Bill Dunlap
TIBCO Software
wdunlap tibco.com
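
A sketch of the doubled-separator behavior Bill refers to: file.path() joins its arguments with fsep = "/" regardless of trailing separators.

```r
# file.path() does not strip a trailing separator from its arguments,
# so a trailing "/" produces a doubled separator in the result.
file.path("C:/temp/", "file.txt")
# [1] "C:/temp//file.txt"
```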


On Mon, Mar 23, 2020 at 9:24 AM Tomas Kalibera 
wrote:

>
> Hi Jiefei,
>
> the change in handling trailing path separators is not on purpose, but
> is a byproduct of a new implementation of normalizePath, which now
> handles symbolic links and normalizes case in long path names. It is not
> documented what happens to trailing separators, and hence portable
> programs should not depend on it. I don't think it is a property that
> should be documented/specified. The behavior of normalizePath is way too
> complicated already and it's result is OS-specific anyway.
>
> In R-devel as well as in 3.6, the trailing separator is preserved when
> the path does not exist - simply, the original path is returned. When
> the path does exist, R-devel removes the trailing separator but R 3.6
> does not, which is because the underlying Windows API call to implement
> it is now different. The new behavior reflects what
> GetFinalPathNameByHandle returns, which is a function now used for
> normalization also in other language runtimes on Windows. I think the
> new behavior is better: paths differing only in the trailing separator
> will be normalized to the same path.
>
> Best
> Tomas
>
> On 3/23/20 4:39 PM, Wang Jiefei wrote:
> > Hi all,
> >
> > I saw a quite surprising result in the devel R when using the function
> > *normalizePath*. If the input is a path to a folder, the function returns
> > an absolute path with or without a slash at the end, depending on the
> > existence of the folder. I know both results are valid on Windows, but
> > this behavior is different from R 3.6, and I do not know whether the
> > change in the devel version was made on purpose. Here is a minimal
> > example; suppose that the folder `C:/windows1/` does not exist.
> >
> >> normalizePath("C:/windows/", mustWork = FALSE)
> > [1] "C:\\Windows"
> >> normalizePath("C:/windows1/", mustWork = FALSE)
> > [1] "C:\\windows1\\"
> >
> >
> > In R 3.6, the return value always ends with a slash if the input ends
> with
> > a slash. From the NEWS file, It seems like there are some changes to
> > *normalizePath* but none of them should be relevant, it might be an
> > unintentional result introduced by the update.
> >
> > Best,
> > Jiefei
> >
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] support of `substitute(...())`

2020-03-12 Thread William Dunlap via R-devel
Note that substitute(...()) and substitute(someFunc(...))[-1] give slightly
different results, the former a pairlist and the latter a call.
  > str((function(...)substitute(...()))(stop(1),stop(2),stop(3)))
  Dotted pair list of 3
   $ : language stop(1)
   $ : language stop(2)
   $ : language stop(3)
  >
str((function(...)substitute(someFunc(...))[-1])(stop(1),stop(2),stop(3)))
   language stop(1)(stop(2), stop(3))

The ...() idiom has been around for a long time, but more recently
(slightly after R-3.4.0?) the ...elt(n) and ...length() functions were
introduced so you don't have to use it much.  I don't see a ...names()
function that would give the names of the ... arguments -
names(substitute(...())).

Bill Dunlap
TIBCO Software
wdunlap tibco.com
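
A sketch of the newer accessors Bill mentions (added, I believe, in R 3.5.0): they read individual `...` arguments without the substitute(...()) idiom.

```r
# ...length() counts the ... arguments; ...elt(n) evaluates just the
# n-th one, without forcing the others.
f <- function(...) c(n = ...length(), second = ...elt(2))
f(10, 20, 30)  # n = 3, second = 20
```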


On Thu, Mar 12, 2020 at 2:09 AM Dénes Tóth  wrote:

> Dear R Core Team,
>
> I learnt approx. two years ago in this mailing list that one can use the
> following "trick" to get a (dotted pair)list of the ellipsis arguments
> inside a function:
>
> `substitute(...())`
>
> Now my problem is that I can not find any occurrence of this call within
> the R source - the most frequent solution there is
> `substitute(list(...))[-1L] `
>
> I would like to know if:
> 1) substitute(...()) is a trick or a feature in the language;
> 2) it will be supported in the future;
> 3) when (in which R version) it was introduced.
>
> A hint on where to look for the machinery in the R source would be also
> appreciated.
>
> Regards,
> Denes
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Change 77844 breaking pkgs [Re: dimnames incoherence?]

2020-02-22 Thread William Dunlap via R-devel
> but then, it seems people want to perpetuate the
> claim of R to be slow

More charitably, I think that the thinking may have been that since x[[i]]
gives you one element of x,
they should use x[[i]]<-value, for scalar i, to stick in one element.

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Sat, Feb 22, 2020 at 12:44 PM Martin Maechler 
wrote:

> > Martin Maechler
> > on Sat, 22 Feb 2020 20:20:49 +0100 writes:
>
> > William Dunlap
> > on Fri, 21 Feb 2020 14:05:49 -0800 writes:
>
> >> If we change the behavior  NULL--[[--assignment from
>
> >> `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a list)
>
> >> to
>
> >> `[[<-`(NULL, 1, "a" ) # gives  list("a")
>
> >> then we have more consistency there *and* your bug is fixed too.
> >> Of course, in other situations back-compatibility would be
> >> broken as well.
>
> >> Would that change the result of
> >> L <- list(One=1) ; L$Two[[1]] <- 2
> >> from the current list(One=1,Two=2) to list(One=1, Two=list(2))
>
> >> and the result of
> >> F <- 1L ; levels(F)[[1]] <- "one"
> >> from structure(1L, levels="one") to structure(1L,
> levels=list("one"))?
>
> > Yes (twice).
>
> > This is indeed what happens in current R-devel, as I had
> > committed the proposition above yesterday.
> > So R-devel (with svn rev >= 77844 )  does this :
>
> >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
> > list(One = 1, Two = list(2))
> >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
> > structure(1L, .Label = list("one"))
> >>
>
> > but I find that still considerably more logical than current
> > (pre R-devel) R's
>
> >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
> > list(One = 1, Two = 2)
> >> L <- list(One=1) ; L$Two[[1]] <- 2:3 ; dput(L)
> > list(One = 1, Two = list(2:3))
> >>
> >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
> > structure(1L, .Label = "one")
> >> F <- 1L ; levels(F)[[1]] <- c("one", "TWO") ; dput(F)
> > structure(1L, .Label = list(c("one", "TWO")))
> >>
>
>
> >> This change would make L$Name[[1]] <- value act like L$Name$one <-
> value
> >> in cases when L did not have a component named "Name" and value
> >> had length 1.
>
> > (I don't entirely get what you mean, but)
> > indeed,
> > the  [[<-  assignments will be closer to corresponding $<-
> assignments...
> > which I thought would be another good thing about the change.
>
> >> I have seen users use [[<- where [<- is more appropriate in cases
> like
> >> this.  Should there be a way to generate warnings about the change
> in
> >> behavior as you've done with other syntax changes?
>
> > Well, good question.
> > I'd guess one would get such warnings "all over the place", and
> > if a warning is given only once per session it may not be
> > effective ... also, the warning would be confusing to the 99.9% of R
> > users who don't even get what we are talking about here ;-)
>
> > Thank you for your comments.. I did not get too many.
>
> Well, there's one situation where semi-experienced package
> authors are bitten by the new R-devel behavior...
>
> I'm seeing a few dozen CRAN packages breaking in R-devel >= r77844.
>
> One case is exactly as you (Bill) mention above: people using
> dd[[.]] <- ..   where they should use single [.].
>
> In one package, I see an inefficient for loop over all rows of a
> data frame 'dd'
>
> for(i in 1:nrow(dd)) {
>
>  ...
>
>  dd$[[i]] <-  
>
> }
>
> This used to work -- as said quite inefficiently:
> for i=1 it created the **full** data frame column  and then,
> once the column exists, it presumably does assign one entry
> after the other...
>
> Now this code breaks (later!) in the package, because the
> new column ends up as a *list* of strings, instead of a vector
> of strings.
>
> I think there are quite a few such cases also in other CRAN
> packages which now break with the latest R-devel.
>
> Coming back to Bill Dunlap's question: Should we not warn here?
> And now when our toplevel list is a data frame, maybe we should
> warn indeed, if we can easily limit ourselves to such "bizarre"
> ways of growing a data frame  ...
>
>
>   dd $ foo [[i]] <- vv
>
> <==>
>
>   `*tmp*` <- dd
>   dd <- `$<-`(`*tmp*`, value = `[[<-`(`*tmp*`$foo, i, vv))
>   rm(`*tmp*`)
>
> but then really we have the same problem as previously: The
>  `[[<-`(NULL, i, vv)  part does not "know" anything about the
> fact that we are in a data frame column creation context.
>
> If the R package author had used  '[i]' instead of '[[i]]'
> he|she would have been safe
>
> (as they would be if they worked more efficiently and created
> the whole variable as a vector and only then added it to the
> data frame ... but then, it seems people want to perpetuate the
> claim of R to be slow ... even if it's them who make R run
> slowly ... ;-))
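
A sketch of the efficient pattern Martin recommends: build the whole column as an atomic vector first, then attach it to the data frame in one step. (The data frame `dd`, the column name `foo`, and the per-row computation are hypothetical stand-ins for the redacted code above.)

```r
dd <- data.frame(x = 1:5)
# Pre-allocate the full column, fill it row by row, then attach it once.
newcol <- character(nrow(dd))
for (i in seq_len(nrow(dd))) {
  newcol[i] <- paste0("row-", dd$x[i])  # any per-row computation
}
dd$foo <- newcol  # one assignment; the column stays an atomic vector
```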
>
>



Re: [Rd] dimnames incoherence?

2020-02-21 Thread William Dunlap via R-devel
   If we change the behavior  NULL--[[--assignment from
`[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a list)
   to
`[[<-`(NULL, 1, "a" ) # gives  list("a")
   then we have more consistency there *and* your bug is fixed too.
   Of course, in other situations back-compatibility would be
   broken as well.

Would that change the result of
   L <- list(One=1) ; L$Two[[1]] <- 2
from the current list(One=1,Two=2) to list(One=1, Two=list(2))
and the result of
   F <- 1L ; levels(F)[[1]] <- "one"
from structure(1L, levels="one") to structure(1L, levels=list("one"))?
This change would make L$Name[[1]] <- value act like L$Name$one <- value
in cases when L did not have a component named "Name" and value
had length 1.

I have seen users use [[<- where [<- is more appropriate in cases like
this.  Should there be a way to generate warnings about the change in
behavior as you've done with other syntax changes?

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Feb 19, 2020 at 12:59 PM Martin Maechler 
wrote:

> > Martin Maechler
> > on Wed, 19 Feb 2020 18:06:57 +0100 writes:
>
> > Serguei Sokol
> > on Wed, 19 Feb 2020 15:21:21 +0100 writes:
>
> >> Hi,
> >> I was bitten by a little incoherence in dimnames assignment, or maybe
> >> I missed some point.
> >> Here is the case. If I assign row names via dimnames(a)[[1]] when
> >> nrow(a) = 1, then an error is thrown. But if I do the same when
> >> nrow(a) > 1, it's OK. Does one of these cases work unexpectedly?
> >> Both? Neither?
>
> >> a=as.matrix(1)
> >> dimnames(a)[[1]]="a" # error: 'dimnames' must be a list
>
> >> aa=as.matrix(1:2)
> >> dimnames(aa)[[1]]=c("a", "b") # OK
>
> >> In the second case, dimnames(aa) is not a list (just as in the first
> >> case), but it works.
> >> I would expect that both work, or neither.
>
> > I agree (even though I'm strongly advising people to use '<-'
> > instead of '=');
> > which in this case helps you get the name of the function really
> > involved:  It is  `dimnames<-`  (which is implemented in C
> > entirely, for matrices and other arrays).
>
> As a matter of fact, I wrote too quickly, the culprit here is
> the  `[[<-`  function (rather than `dimnames<-`),
> which has a special "inconsistency" feature when used to "add to NULL";
> almost surely inherited from S,  but I now think we should
> consider dropping on the occasion of aiming for  R 4.0.0 :
>
> It's documented in ?Extract  that  length 1  `[[.]]`-assignment works
> specially for NULL (and dimnames(.) are NULL here).
>
> Note you need to read and understand one of the tougher sections
> in the official  'R Language Definition'  Manual,
> section -- 3.4.4 Subset assignment ---
> i.e.,
>
> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Subset-assignment
>
> notably this part:
>
> Nesting of complex assignments is evaluated recursively
>
>  names(x)[3] <- "Three"
>
> is equivalent to
>
>  `*tmp*` <- x
>  x <- "names<-"(`*tmp*`, value="[<-"(names(`*tmp*`), 3, value="Three"))
>  rm(`*tmp*`)
>
> and then, apply this to ourdimnames(a)[[1]] <- "a"
> and so  replace
>
>  -  'names<-' by 'dimnames<-'
>  -  '[<-' by '[[<-'
>
> --
>
> Here is the rest of my analysis as valid R code
> {this is not new, Peter Dalgaard had explained this 10 or 20
>  years ago to a mailing list audience IIRC} :
>
> ## MM: The problematic behavior (bug ?) is in `[[<-`, not in `dimnames<-` :
>
> `[[<-`(NULL, 1,   "a" ) # gives  "a"  (*not* a list)
> `[[<-`(NULL, 1, c("a","b")) # gives list(c("a","b"))  !!
>
> ##==> in C code: in  subassign.c  [ ~/R/D/r-devel/R/src/main/subassign.c ]
> ##==> function (~ 340 lines)
> ##do_subassign2_dflt(SEXP call, SEXP op, SEXP args, SEXP rho)
> ## has
> "
> line svn r.  svn auth. c.o.d.e...
>  --  - --
> 1741   4166  ihaka if (isNull(x)) {
> 1742  45446 ripley if (isNull(y)) {
> 1743  76166   luke UNPROTECT(2); /* args, y */
> 1744   4166  ihaka return x;
> 1745  45446 ripley }
> 1746  35680murdoch if (length(y) == 1)
> 1747  68094   luke x = allocVector(TYPEOF(y), 0);
> 1748  24954 ripley else
> 1749  68094   luke x = allocVector(VECSXP, 0);
> 1750   1820  ihaka }
>  --  - --
> "
> ## so clearly, in case the value is of length 1, no list is created .
>
> ## For dimnames<-  Replacing NULL by list()  should be done in both cases
> , and then things work :
> `[[<-`(list(), 1,   "a" ) # gives list( "a" )
> `[[<-`(list(), 1, c("a","b")) # gives list(c("a","b"))  !!
>
> ## but the problem here is that  `[[<-` at this time in the game
> ## does *not* know that it comes from dimnames<- 
>
> ---
>
> If we change the behavior  NULL--[[--assignment from
>
>  

Re: [Rd] dimnames incoherence?

2020-02-19 Thread William Dunlap via R-devel
How far would you like to go with the automatic creation of dimnames in
nested replacement operations on arrays?  It currently works nicely with [<-
   > a <- array(numeric(), dim=c(2,0,1)); dimnames(a)[3] <- list("One")
   > str(a)
num[1:2, 0 , 1]
- attr(*, "dimnames")=List of 3
 ..$ : NULL
 ..$ : NULL
 ..$ : chr "One"

It works most of the time (except for length=1) for [[<-
  > a <- array(numeric(), dim=c(2,0,1)); dimnames(a)[[1]] <- c("X1","X2")
  > a <- array(numeric(), dim=c(2,0,1)); dimnames(a)[[2]] <- character()
  > a <- array(numeric(), dim=c(2,0,1)); dimnames(a)[[3]] <- "Z1"
  Error in dimnames(a)[[3]] <- "Z1" : 'dimnames' must be a list

It does not work at all for names<-.
> a <- array(numeric(), dim=c(2,0,1)); names(dimnames(a)) <- c("X","Y","Z")
Error in names(dimnames(a)) <- c("X", "Y", "Z") :
  attempt to set an attribute on NULL
> a <- array(numeric(), dim=c(2,0,1)); dimnames(a)<-vector("list",3);
names(dimnames(a)) <- c("X","Y","Z")
> str(a)
 num[1:2, 0 , 1]
 - attr(*, "dimnames")=List of 3
  ..$ X: NULL
  ..$ Y: NULL
  ..$ Z: NULL

Bill Dunlap
TIBCO Software
wdunlap tibco.com
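
A sketch of a workaround for the length-1 failure discussed in this thread: assign a complete dimnames list (or use rownames<-) instead of indexing into a NULL dimnames.

```r
# Both forms avoid `[[<-` on a NULL dimnames, so they work even for a
# one-row matrix.
a <- as.matrix(1)
dimnames(a) <- list("a", NULL)  # works
b <- as.matrix(1)
rownames(b) <- "a"              # also works
```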


On Wed, Feb 19, 2020 at 6:24 AM Serguei Sokol 
wrote:

> Hi,
>
> I was bitten by a little incoherence in dimnames assignment, or maybe I
> missed some point.
> Here is the case. If I assign row names via dimnames(a)[[1]] when
> nrow(a) = 1, then an error is thrown. But if I do the same when nrow(a) > 1,
> it's OK. Does one of these cases work unexpectedly? Both? Neither?
>
> a=as.matrix(1)
> dimnames(a)[[1]]="a" # error: 'dimnames' must be a list
>
> aa=as.matrix(1:2)
> dimnames(aa)[[1]]=c("a", "b") # OK
>
> In the second case, dimnames(aa) is not a list (just as in the first case),
> but it works.
> I would expect that both work, or neither.
>
> Your thoughts are welcome.
> Best,
> Serguei.
>
> PS the same apply for dimnames(a)[[2]]<-.
>
>  > sessionInfo()
> R version 3.6.1 (2019-07-05)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Mageia 7
>
> Matrix products: default
> BLAS/LAPACK: /home/opt/OpenBLAS/lib/libopenblas_sandybridge-r0.3.6.so
>
> locale:
>   [1] LC_CTYPE=fr_FR.UTF-8   LC_NUMERIC=C
>   [3] LC_TIME=fr_FR.UTF-8LC_COLLATE=fr_FR.UTF-8
>   [5] LC_MONETARY=fr_FR.UTF-8LC_MESSAGES=fr_FR.UTF-8
>   [7] LC_PAPER=fr_FR.UTF-8   LC_NAME=C
>   [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats graphics  grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] multbxxc_1.0.1rmumps_5.2.1-11
> [3] arrApply_2.1  RcppArmadillo_0.9.800.4.0
> [5] Rcpp_1.0.3slam_0.1-47
> [7] nnls_1.4
>
> loaded via a namespace (and not attached):
> [1] compiler_3.6.1   tools_3.6.1  codetools_0.2-16
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] tempdir() containing spaces breaks installing source packages

2019-12-13 Thread William Dunlap via R-devel
You might expand the scope of this a bit to include Windows usernames with
non-ASCII characters in them.  If I recall correctly, if you are logged
under a Cyrillic UTF-8 name then R will not even start.  We have seen this
in the wild.

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Dec 13, 2019 at 8:47 AM Ivan Krylov  wrote:

> Hello everyone!
>
> Temp paths are used in system2() calls without shQuote() because
> they are assumed not to contain spaces. On Windows,
> GetShortPathName() is used to try to ensure that.
>
> Unfortunately, sometimes GetShortPathName() silently fails to return a
> 8.3 file path and gives a full path instead (8.3 file names might be
> disabled on newer Windows 10 installations, or there may be another
> reason). This has been spotted in the wild [*]. When %USERPROFILE%
> contains spaces, this results in tempdir() also containing spaces and
> prevents the user from being able to install source packages.
>
> As of ,
>
>  - src/library/utils/R/packages2.R line 839 contains an unquoted
>temporary file path (fil) passed to system2(), which results in it
>being split and R CMD INSTALL not being able to find the package
>file. In other invocations of R CMD INSTALL in the same file, the
>path is properly quoted.
>
>  - src/library/tools/R/check.R line 125 contains an unquoted temporary
>file path passed to system2, which results in Rterm.exe -f not being
>able to find the RtmpXX\Rin file, causing the attempt to
>run tools:::makeLazyLoading(...) to fail.
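
A sketch of the fix Ivan proposes: quote temporary paths with shQuote() before interpolating them into a system2() command line. (The file name here is a hypothetical example.)

```r
# With shQuote(), the path survives shell word-splitting even when
# tempdir() contains spaces.
fil  <- file.path(tempdir(), "mypkg_1.0.tar.gz")  # hypothetical package file
cmd  <- file.path(R.home("bin"), "R")
args <- c("CMD", "INSTALL", shQuote(fil))
# system2(cmd, args)  # safe even if the temp path contains spaces
```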
>
> I can report these two problems (thanks to Martin Maechler for the
> Bugzilla account and the advice) and attach the patches required to fix
> them, but there might be more. The bug report [**] is somewhat relevant
> here (though changing the default behaviour of system2() is obviously
> not the right solution as it would break existing code).
>
> Is there anything I should consider before creating the PR as
> described above?
>
> --
> Best regards,
> Ivan
>
> [*] https://stat.ethz.ch/pipermail/r-help/2019-December/465075.html
>
> [**] https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16127
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error in close.connection(p) : ignoring SIGPIPE signal

2019-12-06 Thread William Dunlap via R-devel
You may be running out of file descriptors because the pipe objects are not
getting garbage collected often enough.  Adding the line
   if (cnt %% 100 == 0) { cat(cnt, "\n"); gc() }
to your loop  lets it continue indefinitely.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
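
A sketch combining Andreas's fix below (pipe to cat, which actually reads stdin) with the periodic gc() suggested above, so connection file descriptors are reclaimed before the process runs out. (Assumes a Unix-alike with /dev/null.)

```r
cnt <- 0L
while (cnt < 1000L) {
  cnt <- cnt + 1L
  p <- pipe("cat > /dev/null", open = "w")  # cat consumes stdin
  writeLines("foobar", p)
  close(p)
  if (cnt %% 100L == 0L) gc()  # collect lingering connection objects
}
```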


On Fri, Dec 6, 2019 at 4:29 AM Benjamin Tyner  wrote:

> Andreas,
>
> How right you are! Still, I find it curious that in the context of the
> while(TRUE) loop, I am allowed to do this 653 times, with failure on the
> 654th attempt. Perhaps there is something asynchronous going on? If I
> eliminate the looping, it does indeed fail (as expected) on the first
> attempt to close the pipe.
>
> Regards
>
> Ben
>
> On 12/6/19 2:04 AM, Andreas Kersting wrote:
> > Hi Benjamin,
> >
> > you cannot pipe to echo, since it does not read from stdin.
> >
> > echo just echos is first arg, i.e. echo /dev/stdin > /dev/null will echo
> the string "/dev/stdin"to /dev/stdout, which is redirected to /dev/null.
> >
> > Try
> >
> > p <- pipe("cat > /dev/null", open = "w")
> >
> > instead.
> >
> > Regards,
> > Andreas
> >
> > 2019-12-06 02:46 GMT+01:00 Benjamin Tyner:
> >> Not sure if this is a bug, so posting here first. If I run:
> >> cnt <- 0L
> >> while (TRUE) {
> >> cnt <- cnt + 1L
> >> p <- pipe("echo /dev/stdin > /dev/null", open = "w")
> >> writeLines("foobar", p)
> >> tryCatch(close(p), error = function(e) { print(cnt); stop(e)})
> >> }
> >>
> >> then once cnt gets to around 650, it fails with:
> >>
> >> [1] 654
> >> Error in close.connection(p) : ignoring SIGPIPE signal
> >>
> >> Should I not be using pipe() in this way? Here is my sessionInfo()
> >>
> >> R version 3.6.0 (2019-04-26)
> >> Platform: x86_64-pc-linux-gnu (64-bit)
> >> Running under: Ubuntu 18.04.3 LTS
> >>
> >> Matrix products: default
> >> BLAS:   /home/btyner/R360/lib64/R/lib/libRblas.so
> >> LAPACK: /home/btyner/R360/lib64/R/lib/libRlapack.so
> >>
> >> locale:
> >>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> >>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
> >>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
> >>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
> >>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >>
> >> attached base packages:
> >> [1] stats graphics  grDevices utils datasets  methods base
> >>
> >> loaded via a namespace (and not attached):
> >> [1] compiler_3.6.0
> >>
> >> Regards,
> >> Ben
> >>
> >> __
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] class() |--> c("matrix", "array") [was "head.matrix ..."]

2019-11-15 Thread William Dunlap via R-devel
Arrays and matrices have a numeric "dim" attribute; vectors don't.  If
statements lead to bad code.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
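
For reference, a sketch of the behavior Martin proposes, which was subsequently adopted in R 4.0.0:

```r
# In R >= 4.0.0, the implicit class of a matrix gained "array":
class(matrix(1))          # c("matrix", "array")
class(array(1, c(2, 2)))  # c("matrix", "array") -- a 2D array is a matrix
class(array(1, 3))        # "array" -- 1D arrays are unchanged
class(1)                  # "numeric" -- plain vectors did not become arrays
```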


On Fri, Nov 15, 2019 at 1:19 PM Abby Spurdle  wrote:

> > > And indeed I think you are right on spot and this would mean
> > > that indeed the implicit class
> > > "matrix" should rather become c("matrix", "array").
> >
> > I've made up my mind (and not been contradicted by my fellow R
> > corers) to try go there for  R 4.0.0   next April.
>
> I'm not enthusiastic about matrices extending arrays.
> If a matrix is an array, then shouldn't all vectors in R be arrays too?
>
> > #mockup
> > class (1)
> [1] "numeric" "array"
>
> Which is a bad idea.
> It contradicts the central principle that R uses "Vectors" rather than
> "Arrays".
> And I feel that matrices are and should be, a special case of vectors.
> (With their inheritance from vectors taking precedence over anything else).
>
> If the motivation is to solve the problem of 2D arrays, automatically
> being mapped to matrices:
>
> > class (array (1, c (2, 2) ) )
> [1] "matrix"
>
> Then wouldn't it be better, to treat 2D arrays, as a special case, and
> leave matrices as they are?
>
> > #mockup
> > class (array (1, c (2, 2) ) )
> [1] "array2d" "matrix" "array"
>
> Then 2D arrays would have access to both matrix and array methods...
>
> Note, I don't want to enter into (another) discussion on the
> differences between implicit class and classes defined via a class
> attribute.
> That's another discussion, which has little to do with my points above.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] calls with comment attribute

2019-11-12 Thread William Dunlap via R-devel
I suspect that the parser used it to store comments, including the initial
"#", before R started using the srcref attribute.  (S also stored comments
in the parse tree.)

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Tue, Nov 12, 2019 at 4:16 PM Duncan Murdoch 
wrote:

> On 12/11/2019 5:01 p.m., William Dunlap via R-devel wrote:
> > In general R doesn't print the "comment" attribute of an object
> > > structure(1:3, comment=c("a comment", "another comment"))
> > [1] 1 2 3
> > but if the object is a call it prints it in an unusual format
> > > structure(quote(func(arg)), comment=c("a comment", "another
> comment"))
> > a comment
> > another comment
> > func(arg)
> >
> > What is the rationale for the special treatment of calls?
>
> It was there in revision 2 of src/main/deparse.c in 1997.  (For those
> unfamiliar with R history:  the current revision of R is 77405.  That
> particular file has been revised 248 times since rev 2.)
>
> I suspect either nobody has noticed it before, or nobody had the nerve
> to touch it.
>
> Duncan Murdoch
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] calls with comment attribute

2019-11-12 Thread William Dunlap via R-devel
In general R doesn't print the "comment" attribute of an object
   > structure(1:3, comment=c("a comment", "another comment"))
   [1] 1 2 3
but if the object is a call it prints it in an unusual format
   > structure(quote(func(arg)), comment=c("a comment", "another comment"))
   a comment
   another comment
   func(arg)

What is the rationale for the special treatment of calls?

Bill Dunlap
TIBCO Software
wdunlap tibco.com
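
As a sketch, comment() is the documented accessor for this attribute, and print() deliberately hides it for most objects:

```r
x <- 1:3
comment(x) <- c("a comment", "another comment")
x           # prints 1 2 3 with no comment shown
comment(x)  # c("a comment", "another comment")
```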


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] improving the performance of install.packages

2019-11-08 Thread William Dunlap via R-devel
Suppose update.packages("pkg") installed "pkg" if it were not already
installed, in addition to its current behavior of installing "pkg" if "pkg"
is installed but a newer version is available.  The OP could then use
update.packages() all the time instead of install.packages() the first time
and update.packages() subsequent times.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
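
A sketch of the "install only if missing" idea as a small helper (the function name is hypothetical); nzchar(system.file(...)) tests for presence without loading any namespace.

```r
install_if_missing <- function(pkgs, ...) {
  # system.file(package = p) returns "" when p is not installed
  have <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                 logical(1))
  if (any(!have)) utils::install.packages(pkgs[!have], ...)
}
# install_if_missing(c("testit", "dplyr"))
```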


On Fri, Nov 8, 2019 at 2:51 PM Duncan Murdoch 
wrote:

> On 08/11/2019 2:55 p.m., Joshua Bradley wrote:
> > I could do this...and I have before. This brings up a more fundamental
> > question though. You're asking me to write code that changes the logic of
> > the installation process (i.e. writing my own package installer). Instead
> > of doing that, I would rather integrate that logic into R itself to
> improve
> > the baseline installation process. This api proposal change would be
> > additive and would not break legacy code.
>
> That's not true.  The current behaviour is equivalent to force=TRUE; I
> believe the proposal was to change the default to force=FALSE.
>
> If you didn't change the default, it wouldn't help your example:  the
> badly written script would run with force=TRUE, and wouldn't benefit at
> all.
>
> Duncan Murdoch
>
> >
> > Package managers like pip (python), conda (python), yum (CentOS), apt
> > (Ubuntu), and apk (Alpine) are all "smart" enough to know (by their
> > defaults) when to not download a package again. By proposing this change,
> > I'm essentially asking that R follow some of the same conventions and
> best
> > practices that other package managers have adopted over the decades.
> >
> > I assumed this list is used to discuss proposals like this to the R
> > codebase. If I'm on the wrong list, please let me know.
> >
> > P.S. if this change happened, it would be interesting to study the effect
> > it has on the bandwidth across all CRAN mirrors. A significant drop would
> > turn into actual $$ saved
> >
> > Josh Bradley
> >
> >
> > On Fri, Nov 8, 2019 at 5:00 AM Duncan Murdoch 
> > wrote:
> >
> >> On 08/11/2019 2:06 a.m., Joshua Bradley wrote:
> >>> Hello,
> >>>
> >>> Currently if you install a package twice:
> >>>
> >>> install.packages("testit")
> >>> install.packages("testit")
> >>>
> >>> R will build the package from source (depending on what OS you're
> using)
> >>> twice by default. This becomes especially burdensome when people are
> >> using
> >>> big packages (i.e. lots of depends) and someone has a script with:
> >>>
> >>> install.packages("tidyverse")
> >>> ...
> >>> ... later on down the script
> >>> ...
> >>> install.packages("dplyr")
> >>>
> >>> In this case, "dplyr" is part of the tidyverse and will install twice.
> As
> >>> the primary "package manager" for R, it should not install a package
> >> twice
> >>> (by default) when it can be so easily checked. Indeed, many people
> resort
> >>> to writing a few lines of code to filter out already-installed packages
> >> An
> >>> r-help post from 2010 proposed a solution to improving the default
> >>> behavior, by adding "force=FALSE" as a api addition to
> install.packages.(
> >>> https://stat.ethz.ch/pipermail/r-help/2010-May/239492.html)
> >>>
> >>> Would the R-core devs still consider this proposal?
> >>
> >> Whether or not they'd do it, it's easy for you to do it.
> >>
> >> install.packages <- function(pkgs, ..., force = FALSE) {
> >>   if (!force) {
> >>     pkgs <- Filter(Negate(requireNamespace), pkgs)
> >>   }
> >>   utils::install.packages(pkgs, ...)
> >> }
> >>
> >> You might want to make this more elaborate, e.g. doing update.packages()
> >> on the ones that exist.  But really, isn't the problem with the script
> >> you're using, which could have done a simple test before forcing a slow
> >> install?
> >>
> >> Duncan Murdoch
> >>
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] improving the performance of install.packages

2019-11-08 Thread William Dunlap via R-devel
While developing a package, I often run install.packages() on it many times
in a session without updating its version number.  How would your proposed
change affect this workflow?
Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Nov 8, 2019 at 11:56 AM Joshua Bradley  wrote:

> I could do this...and I have before. This brings up a more fundamental
> question though. You're asking me to write code that changes the logic of
> the installation process (i.e. writing my own package installer). Instead
> of doing that, I would rather integrate that logic into R itself to improve
> the baseline installation process. This proposed API change would be
> additive and would not break legacy code.
>
> Package managers like pip (python), conda (python), yum (CentOS), apt
> (Ubuntu), and apk (Alpine) are all "smart" enough to know (by their
> defaults) when to not download a package again. By proposing this change,
> I'm essentially asking that R follow some of the same conventions and best
> practices that other package managers have adopted over the decades.
>
> I assumed this list is used to discuss proposals like this to the R
> codebase. If I'm on the wrong list, please let me know.
>
> P.S. if this change happened, it would be interesting to study the effect
> it has on the bandwidth across all CRAN mirrors. A significant drop would
> turn into actual $$ saved
>
> Josh Bradley
>
>
> On Fri, Nov 8, 2019 at 5:00 AM Duncan Murdoch 
> wrote:
>
> > On 08/11/2019 2:06 a.m., Joshua Bradley wrote:
> > > Hello,
> > >
> > > Currently if you install a package twice:
> > >
> > > install.packages("testit")
> > > install.packages("testit")
> > >
> > > R will build the package from source (depending on what OS you're
> > > using) twice by default. This becomes especially burdensome when
> > > people are using big packages (i.e. lots of depends) and someone has
> > > a script with:
> > >
> > > install.packages("tidyverse")
> > > ...
> > > ... later on down the script
> > > ...
> > > install.packages("dplyr")
> > >
> > > In this case, "dplyr" is part of the tidyverse and will install
> > > twice. As the primary "package manager" for R, it should not install
> > > a package twice (by default) when it can be so easily checked.
> > > Indeed, many people resort to writing a few lines of code to filter
> > > out already-installed packages. An r-help post from 2010 proposed a
> > > solution for improving the default behavior, by adding "force=FALSE"
> > > as an API addition to install.packages
> > > (https://stat.ethz.ch/pipermail/r-help/2010-May/239492.html).
> > >
> > > Would the R-core devs still consider this proposal?
> >
> > Whether or not they'd do it, it's easy for you to do it.
> >
> > install.packages <- function(pkgs, ..., force = FALSE) {
> >   if (!force) {
> >     pkgs <- Filter(Negate(requireNamespace), pkgs)
> >   }
> >   utils::install.packages(pkgs, ...)
> > }
> >
> > You might want to make this more elaborate, e.g. doing update.packages()
> > on the ones that exist.  But really, isn't the problem with the script
> > you're using, which could have done a simple test before forcing a slow
> > install?
> >
> > Duncan Murdoch
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
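[Editor's note: Duncan's wrapper idea, as quoted above, can be fleshed out into a self-contained sketch. The `quietly = TRUE` flag, the skip-when-empty guard, and the function name `install_if_missing` are editorial assumptions for illustration, not part of the original post.]

```r
# Hedged sketch of the proposed force= behavior (illustrative only):
# skip packages whose namespaces already load, unless force = TRUE.
install_if_missing <- function(pkgs, ..., force = FALSE) {
    if (!force) {
        have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
        pkgs <- pkgs[!have]
    }
    if (length(pkgs) > 0L)
        utils::install.packages(pkgs, ...)
    invisible(pkgs)  # the packages that were passed on for installation
}
```

One could extend this with an update.packages() pass over the already-installed set, as Duncan suggests later in the thread.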

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Puzzled about a new method for "[".

2019-11-04 Thread William Dunlap via R-devel
> the perils certainly are not immediately apparent to me.

Here is a concrete example of a peril
 `[.myclass` <- function(x, i, j,
                         drop = if (missing(i)) TRUE else length(cols) == 1)
 {
     SaveAt <- lapply(x, attributes)
     x <- NextMethod()
     lX <- lapply(names(x), function(nm, x, Sat) {
         attributes(x[[nm]]) <- Sat[[nm]]
         x[[nm]]
     }, x = x, Sat = SaveAt)
     names(lX) <- names(x)
     x <- as.data.frame(lX)
     x
 }

 x <- data.frame(Mat=I(matrix(101:106,ncol=2)), Vec=201:203)
 xmc <- structure(x, class=c("myclass", class(x)))
 xmc[1:2,]
Error in attributes(x[[nm]]) <- Sat[[nm]] :
  dims [product 6] do not match the length of object [4]
 x[1:2,]
  Mat.1 Mat.2 Vec
1   101   104 201
2   102   105 202

I would be surprised if extracting a column from some rows of a data.frame
gave a different result than extracting some rows from a column of a
data.frame.  The row-selecting method used by [.data.frame depends on the
class of the column.
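[Editor's note: a more defensive variant of the method quoted above (an editorial sketch, not code from the thread) would restore a column's attributes only when they still fit the subsetted column, which sidesteps the matrix-column failure Bill demonstrates while retaining attributes such as Rolf's "clyde".]

```r
# Hedged sketch: restore a column's saved attributes only when its dim
# attribute, if any, still matches the subsetted column's length.
`[.myclass` <- function(x, i, j, ...) {
    SaveAt <- lapply(x, attributes)
    x <- NextMethod()
    for (nm in names(x)) {
        at <- SaveAt[[nm]]
        if (is.null(at$dim) || prod(at$dim) == length(x[[nm]]))
            attributes(x[[nm]]) <- at
        # otherwise leave the column as [.data.frame returned it
    }
    x
}
```

With this variant, xmc[1:2,] keeps the matrix column as whatever [.data.frame produced instead of erroring, while dimensionless attributes on other columns survive the subset.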

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Mon, Nov 4, 2019 at 12:28 PM Rolf Turner  wrote:

>
> On 5/11/19 3:41 AM, Hadley Wickham wrote:
>
> > For what it's worth, I don't think this strategy can work in general,
> > because a class might have attributes that depend on its data/contents
> > (e.g. https://vctrs.r-lib.org/articles/s3-vector.html#cached-sum). I
> > don't think these are particularly common in practice, but it's
> > dangerous to assume that you can restore a class simply by restoring
> > its attributes after subsetting.
>
>
> You're probably right that there are lurking perils in general, but I am
> not trying to "restore a class".  I simply want to *retain* attributes
> of columns in a data frame.
>
> * I have a data frame X
> * I attach attributes to certain of its columns;
>   attr(X$melvin,"clyde") <- 42
>(I *don't* change the class of X$melvin.)
> * I form a subset of X:
>  Y <- X[1:100,3:10]
> * given that "melvin" is amongst columns 3 through 10 of X,
>  I want Y$melvin to retain the attribute "clyde", i.e. I
>  want attr(Y$melvin,"clyde") to return 42
>
> There is almost surely a better approach than the one that I've chosen
> (isn't there always?) but it seems to work, and the perils certainly are
> not immediately apparent to me.
>
> cheers,
>
> Rolf
>
> --
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Should slot<-() alter its first argument?

2019-09-19 Thread William Dunlap via R-devel
We noticed that the slot<- function alters its first argument, which goes
against the grain of a functional language.  The similar @<- does not
change its first argument.  Is this intended?  The timeSeries and distr
package depend on this altering.

> setClass("Z", rep=representation(x="character"))
> z <- new("Z", x="orig")
> `@<-`(z, "x", value="newer")
An object of class "Z"
Slot "x":
[1] "newer"

> z
An object of class "Z"
Slot "x":
[1] "orig"

>
> `slot<-`(z, "x", value="newest")
An object of class "Z"
Slot "x":
[1] "newest"

> z
An object of class "Z"
Slot "x":
[1] "newest"

Bill Dunlap
TIBCO Software
wdunlap tibco.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error: package or namespace load failed for ‘utils’

2019-09-08 Thread William Dunlap via R-devel
Also, check the settings of R_HOME and/or R_LIBS.
Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Sun, Sep 8, 2019 at 9:58 AM William Dunlap  wrote:

> Look at section 6.1 of the R Installation and Admin manual.
>
> 6.1 Default packages
>
> The set of packages loaded on startup is by default
>
> > getOption("defaultPackages")
> [1] "datasets"  "utils" "grDevices" "graphics"  "stats" "methods"
>
> (plus, of course, *base*) and this can be changed by setting the option
> in startup code (e.g. in ~/.Rprofile). It is initially set to the value
> of the environment variable R_DEFAULT_PACKAGES if set (as a
> comma-separated list). Setting R_DEFAULT_PACKAGES=NULL ensures that only
> package *base* is loaded.
>
> Changing the set of default packages is normally used to reduce the set
> > for speed when scripting: in particular not using *methods* will reduce
> the start-up time by a factor of up to two. But it can also be used to
> customize R, e.g. for class use. Rscript also checks the environment
> variable R_SCRIPT_DEFAULT_PACKAGES; if set, this takes precedence over
> R_DEFAULT_PACKAGES.
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Sun, Sep 8, 2019 at 8:42 AM Laurent Gautier  wrote:
>
>> Hi,
>>
>> When starting an embedded R I encounter the following issue under certain
>> conditions:
>>
>> ```
>> Error: package or namespace load failed for ‘utils’ in if (.identC(class1,
>> class2) || .identC(class2, "ANY")) TRUE else {:
>>  missing value where TRUE/FALSE needed
>> ```
>> (more such errors for grDevices, graphics, and stats)
>>
>> And in the end:
>>
>> ```
>> Warning messages:
>> 1: package ‘utils’ in options("defaultPackages") was not found
>> 2: package ‘grDevices’ in options("defaultPackages") was not found
>> 3: package ‘graphics’ in options("defaultPackages") was not found
>> 4: package ‘stats’ in options("defaultPackages") was not found
>> ```
>>
>> While the embedded R appears functional, no package can be loaded.
>>
>> The error message from R (`missing value where TRUE/FALSE needed`)
>> suggests that R should be able to catch the underlying issue (I have
>> yet to find what it is) earlier and thereby make troubleshooting easier.
>>
>> Best,
>>
>>
>> Laurent
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error: package or namespace load failed for ‘utils’

2019-09-08 Thread William Dunlap via R-devel
Look at section 6.1 of the R Installation and Admin manual.

6.1 Default packages

The set of packages loaded on startup is by default

> getOption("defaultPackages")
[1] "datasets"  "utils" "grDevices" "graphics"  "stats" "methods"

(plus, of course, *base*) and this can be changed by setting the option in
startup code (e.g. in ~/.Rprofile). It is initially set to the value of the
environment variable R_DEFAULT_PACKAGES if set (as a comma-separated list).
Setting R_DEFAULT_PACKAGES=NULL ensures that only package *base* is loaded.

Changing the set of default packages is normally used to reduce the set for
speed when scripting: in particular not using *methods* will reduce the
start-up time by a factor of up to two. But it can also be used to
customize R, e.g. for class use. Rscript also checks the environment
variable R_SCRIPT_DEFAULT_PACKAGES; if set, this takes precedence over
R_DEFAULT_PACKAGES.
Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Sun, Sep 8, 2019 at 8:42 AM Laurent Gautier  wrote:

> Hi,
>
> When starting an embedded R I encounter the following issue under certain
> conditions:
>
> ```
> Error: package or namespace load failed for ‘utils’ in if (.identC(class1,
> class2) || .identC(class2, "ANY")) TRUE else {:
>  missing value where TRUE/FALSE needed
> ```
> (more such errors for grDevices, graphics, and stats)
>
> And in the end:
>
> ```
> Warning messages:
> 1: package ‘utils’ in options("defaultPackages") was not found
> 2: package ‘grDevices’ in options("defaultPackages") was not found
> 3: package ‘graphics’ in options("defaultPackages") was not found
> 4: package ‘stats’ in options("defaultPackages") was not found
> ```
>
> While the embedded R appears functional, no package can be loaded.
>
> The error message from R (`missing value where TRUE/FALSE needed`)
> suggests that R should be able to catch the underlying issue (I have
> yet to find what it is) earlier and thereby make troubleshooting easier.
>
> Best,
>
>
> Laurent
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] inconsistent handling of factor, character, and logical predictors in lm()

2019-08-31 Thread William Dunlap via R-devel
> Functions like lm() treat logical predictors as factors, *not* as
numerical variables.

Not quite.  A factor with all elements the same causes lm() to give an
error while a logical of all TRUEs or all FALSEs just omits it from the
model (it gets a coefficient of NA).  This is a fairly common situation
when you fit models to subsets of a big data.frame.  This is an argument
for fixing the single-valued-factor problem, which would become more
noticeable if logicals were treated as factors.

 > d <- data.frame(Age=c(2,4,6,8,10), Weight=c(878, 890, 930, 800, 750),
Diseased=c(FALSE,FALSE,FALSE,TRUE,TRUE))
> coef(lm(data=d, Weight ~ Age + Diseased))
 (Intercept)          Age DiseasedTRUE
    877.7333       5.4000        -151.
> coef(lm(data=d, Weight ~ Age + factor(Diseased)))
          (Intercept)                  Age factor(Diseased)TRUE
             877.7333               5.4000                -151.
> coef(lm(data=d, Weight ~ Age + Diseased, subset=Age<7))
 (Intercept)  Age DiseasedTRUE
847.  13.   NA
> coef(lm(data=d, Weight ~ Age + factor(Diseased), subset=Age<7))
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
> coef(lm(data=d, Weight ~ Age + factor(Diseased, levels=c(FALSE,TRUE)),
subset=Age<7))
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Sat, Aug 31, 2019 at 8:54 AM Fox, John  wrote:

> Dear Abby,
>
> > On Aug 30, 2019, at 8:20 PM, Abby Spurdle  wrote:
> >
> >> I think that it would be better to handle factors, character
> predictors, and logical predictors consistently.
> >
> > "logical predictors" can be regarded as categorical or continuous (i.e.
> 0 or 1).
> > And the model matrix should be the same, either way.
>
> I think that you're mistaking a coincidence for a principle. The
> coincidence is that FALSE/TRUE coerces to 0/1 and sorts to FALSE, TRUE.
> Functions like lm() treat logical predictors as factors, *not* as numerical
> variables.
>
> That one would get the same coefficient in either case is a consequence of
> the coincidence and the fact that the default contrasts for unordered
> factors are contr.treatment(). For example, if you changed the contrasts
> option, you'd get a different estimate (though of course a model with the
> same fit to the data and an equivalent interpretation):
>
> ---- snip ----
>
> > options(contrasts=c("contr.sum", "contr.poly"))
> > m3 <- lm(Sepal.Length ~ Sepal.Width + I(Species == "setosa"), data=iris)
> > m3
>
> Call:
> lm(formula = Sepal.Length ~ Sepal.Width + I(Species == "setosa"),
> data = iris)
>
> Coefficients:
> (Intercept)  Sepal.Width  I(Species == "setosa")1
>  2.6672   0.9418   0.8898
>
> > head(model.matrix(m3))
>   (Intercept) Sepal.Width I(Species == "setosa")1
> 1   1 3.5  -1
> 2   1 3.0  -1
> 3   1 3.2  -1
> 4   1 3.1  -1
> 5   1 3.6  -1
> 6   1 3.9  -1
> > tail(model.matrix(m3))
> (Intercept) Sepal.Width I(Species == "setosa")1
> 145   1 3.3   1
> 146   1 3.0   1
> 147   1 2.5   1
> 148   1 3.0   1
> 149   1 3.4   1
> 150   1 3.0   1
>
> > lm(Sepal.Length ~ Sepal.Width + as.numeric(Species == "setosa"),
> data=iris)
>
> Call:
> lm(formula = Sepal.Length ~ Sepal.Width + as.numeric(Species ==
> "setosa"), data = iris)
>
> Coefficients:
>         (Intercept)          Sepal.Width  as.numeric(Species == "setosa")
>              3.5571               0.9418                          -1.7797
>
> > -2*coef(m3)[3]
> I(Species == "setosa")1
>   -1.779657
>
> ---- snip ----
>
>
> >
> > I think the first question to be asked is, which is the best approach,
> > categorical or continuous?
> > The continuous approach seems simpler and more efficient to me, but
> > output from the categorical approach may be more intuitive, for some
> > people.
>
> I think that this misses the point I was trying to make: lm() et al. treat
> logical variables as factors, not as numerical predictors. One could argue
> about what's the better approach but not about what lm() does. BTW, I
> prefer treating a logical predictor as a factor because the predictor is
> essentially categorical.
>
> >
> > I note that the use of factors and characters doesn't necessarily
> > produce consistent output for $xlevels.
> > (Because factors can have 

[Rd] New lazyload rdx key type: list(eagerKey=, lazyKeys=)

2019-08-30 Thread William Dunlap via R-devel
Prior to R-3.6.0 the keys in the lazyload key files, e.g.
pkg/data/Rdata.rdx or pkg/R/pkg.rdx, seemed to all be 2-long integer
vectors.  Now they can be lists.  The ones I have seen have two components,
"eagerKey" is a 2-long integer vector and "lazyKeys" is a named list of
2-long integer vectors.

> rdx <- readRDS(system.file(package="survival", "data", "Rdata.rdx"))
> str(Filter(is.list, rdx$references))
List of 2
 $ env::1:List of 2
  ..$ eagerKey: int [1:2] 273691 183
  ..$ lazyKeys:List of 1
  .. ..$ lines: int [1:2] 273874 284
 $ env::2:List of 2
  ..$ eagerKey: int [1:2] 473142 166
  ..$ lazyKeys:List of 1
  .. ..$ lines: int [1:2] 473308 310

or

>  rdx <- readRDS(system.file(package="lambda.r", "R", "lambda.r.rdx"))
> length(Filter(is.integer, rdx$references))
[1] 4
> str(Filter(Negate(is.integer), rdx$references))
List of 5
 $ env::5:List of 2
  ..$ eagerKey: int [1:2] 28278 328
  ..$ lazyKeys:List of 2
  .. ..$ lines: int [1:2] 28606 80
  .. ..$ parseData: int [1:2] 28686 389
 $ env::6:List of 2
  ..$ eagerKey: int [1:2] 29075 327
  ..$ lazyKeys:List of 2
  .. ..$ lines: int [1:2] 29402 71
  .. ..$ parseData: int [1:2] 29473 321
 $ env::7:List of 2
  ..$ eagerKey: int [1:2] 29794 325
  ..$ lazyKeys:List of 2
  .. ..$ lines: int [1:2] 30119 117
  .. ..$ parseData: int [1:2] 30236 752
... many more ...

All the ones I've seen involve the environment in srcref attributes and
most packages do not keep that.  Will these be used for more sorts of
environments in the future?

What is the meaning of the lazyKeys?  Are these stored as promises until
needed or is there some special option to never or always load them?

Bill Dunlap
TIBCO Software
wdunlap tibco.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ?Syntax wrong about `?`'s precedence ?

2019-08-30 Thread William Dunlap via R-devel
Precedence is a property of the parser and has nothing to do with the
semantics assigned to various symbols.  Using just core R functions you can
see the precedence of '?' is between those of '=' and '<-'.

> # '=' has lower precedence than '?'
> str(as.list(parse(text="a ? b = c")[[1]]))
List of 3
 $ : symbol =
 $ : language `?`(a, b)
 $ : symbol c
> str(as.list(parse(text="a = b ? c")[[1]]))
List of 3
 $ : symbol =
 $ : symbol a
 $ : language `?`(b, c)
> # '<-' has higher precedence than '?'
> str(as.list(parse(text="a ? b <- c")[[1]]))
List of 3
 $ : symbol ?
 $ : symbol a
 $ : language b <- c
> str(as.list(parse(text="a <- b ? c")[[1]]))
List of 3
 $ : symbol ?
 $ : language a <- b
 $ : symbol c
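
[Editor's note: the parse trees above can be read off as the precedence ordering `=` < `?` < `<-` (lowest to highest). The small helper below merely restates Bill's output programmatically; the function name `top` is an editorial invention.]

```r
# Extract the outermost operator of a parsed expression; the results
# restate the str() output shown above.
top <- function(s) as.character(parse(text = s)[[1]][[1]])
top("a ? b = c")   # "=" : '=' is the outermost call, so '?' binds tighter
top("a <- b ? c")  # "?" : '<-' groups first, so '?' ends up outermost
```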

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Aug 30, 2019 at 4:41 AM Stephen Ellison 
wrote:

> > From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Ant F
> > Sent: 29 August 2019 12:06
> > To: r-devel@r-project.org
> > Subject: [Rd] ?Syntax wrong about `?`'s precedence ?
> > ...
> > See the following example :
> >
> > `?` <- `+`
>
> I'm curious; What did you expect to happen if you replace the function '?'
> with the operator '+' ?
> ? is surely now being evaluated as a user-defined function and not as an
> operator.
> Would you expect the results of doing that to be the same as evaluation
> without replacement?
>
> S Ellison
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread William Dunlap via R-devel
Using a non-capturing group, "(?:...)" instead of "(...)", simplifies my
example a bit

> x <- c("Groucho ", "", "Harpo")
> strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
proto=data.frame(Name=character(), Address=character(),
stringsAsFactors=FALSE))
 Name  Address
1 Groucho grou...@marx.com
2   ch...@marx.com
3   Harpo

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Aug 15, 2019 at 1:04 PM William Dunlap  wrote:

> I don't care much for regmatches and haven't tried strextract, but I think
> replacing the character(0) by NA_character_ is almost always inappropriate
> if the match information comes from gregexpr.
>
> I think strcapture() does a pretty good job of what I think you are trying
> to do.  Perhaps adding an argument to map no match to NA instead of ""
> would give you just what you wanted.
>
> > x <- c("Groucho ", "", "Harpo")
> > d <- strcapture("([[:alpha:]]+)?( *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?",
> x, proto=data.frame(Name=character(), Junk=character(),
> Address=character(), stringsAsFactors=FALSE))
> > d[c("Name", "Address")]
>  Name  Address
> 1 Groucho grou...@marx.com
> 2   ch...@marx.com
> 3   Harpo
> > str(.Last.value)
> 'data.frame':   3 obs. of  2 variables:
>  $ Name   : chr  "Groucho" "" "Harpo"
>  $ Address: chr  "grou...@marx.com" "ch...@marx.com" ""
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Aug 15, 2019 at 11:31 AM Cyclic Group Z_1 <
> cyclicgroup...@yahoo.com> wrote:
>
>> I do think keeping the default behavior is desirable for backwards
>> compatibility; my suggestion is not to change default behavior but to add
>> an optional argument that allows a different behavior. Although this can be
>> implemented in a user-defined function, retaining empty matches facilitates
>> programmatic use, and seems to be something that should be available in
>> base R. It is available, for example, in MATLAB, a comparable array
>> language.
>>
>> Alternatively, perhaps a nomatch (or maybe emptymatch) argument in the
>> spirit of `[.data.table`? That is, an argument nomatch where nomatch = NULL
>> (the default) results in drops for vector outputs and character(0) for list
>> outputs and nomatch = NA results in insertion of NA_character_, and nomatch
>> = '' results in insertion of empty string.
>>
>> I can submit proposed patch code if others think this is a good idea.
>>
>> What are your thoughts on the proposed alteration to (currently
>> nonexported) strextract? I assume (maybe wrongly) that the plan is to
>> eventually export that function.
>>
>> Thank you,
>> CG
>>
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread William Dunlap via R-devel
I don't care much for regmatches and haven't tried strextract, but I think
replacing the character(0) by NA_character_ is almost always inappropriate
if the match information comes from gregexpr.

I think strcapture() does a pretty good job of what I think you are trying
to do.  Perhaps adding an argument to map no match to NA instead of ""
would give you just what you wanted.

> x <- c("Groucho ", "", "Harpo")
> d <- strcapture("([[:alpha:]]+)?( *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?",
x, proto=data.frame(Name=character(), Junk=character(),
Address=character(), stringsAsFactors=FALSE))
> d[c("Name", "Address")]
 Name  Address
1 Groucho grou...@marx.com
2   ch...@marx.com
3   Harpo
> str(.Last.value)
'data.frame':   3 obs. of  2 variables:
 $ Name   : chr  "Groucho" "" "Harpo"
 $ Address: chr  "grou...@marx.com" "ch...@marx.com" ""
Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Aug 15, 2019 at 11:31 AM Cyclic Group Z_1 
wrote:

> I do think keeping the default behavior is desirable for backwards
> compatibility; my suggestion is not to change default behavior but to add
> an optional argument that allows a different behavior. Although this can be
> implemented in a user-defined function, retaining empty matches facilitates
> programmatic use, and seems to be something that should be available in
> base R. It is available, for example, in MATLAB, a comparable array
> language.
>
> Alternatively, perhaps a nomatch (or maybe emptymatch) argument in the
> spirit of `[.data.table`? That is, an argument nomatch where nomatch = NULL
> (the default) results in drops for vector outputs and character(0) for list
> outputs and nomatch = NA results in insertion of NA_character_, and nomatch
> = '' results in insertion of empty string.
>
> I can submit proposed patch code if others think this is a good idea.
>
> What are your thoughts on the proposed alteration to (currently
> nonexported) strextract? I assume (maybe wrongly) that the plan is to
> eventually export that function.
>
> Thank you,
> CG
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Rf_defineVar(symbol, R_UnboundValue, environment) questions

2019-08-15 Thread William Dunlap via R-devel
While poking around the C++ code in the dplyr package I ran across the idiom
   Rf_defineVar(symbol, R_UnboundValue, environment)
to [sort of] remove 'symbol' from 'environment'

Using it makes the R-level functions objects(), exists(), and get()
somewhat inconsistent and I was wondering if that was intended.  E.g.,  use
SHLIB to make something from the following C code that dyn.load can load
into R

% cat defineVarAsUnboundValue.c
#include <R.h>
#include <Rinternals.h>

SEXP defineVarAsUnboundValue(SEXP name, SEXP envir)
{
Rf_defineVar(name, R_UnboundValue, envir);
return R_NilValue;
}
erratic:bill:292% R-3.6.1 CMD SHLIB defineVarAsUnboundValue.c
gcc -std=gnu99 -I"/home/R/R-3.6.1/lib64/R/include" -DNDEBUG
-I/usr/local/include  -fpic  -g -O2  -c defineVarAsUnboundValue.c -o
defineVarAsUnboundValue.o
gcc -std=gnu99 -shared -L/home/R/R-3.6.1/lib64/R/lib -L/usr/local/lib64 -o
defineVarAsUnboundValue.so defineVarAsUnboundValue.o
-L/home/R/R-3.6.1/lib64/R/lib -lR
erratic:bill:293% R-3.6.1 --quiet --vanilla
> dyn.load("defineVarAsUnboundValue.so")
> envir <- list2env(list(One=1, Two=2))
> objects(envir)
[1] "One" "Two"
>
> .Call("defineVarAsUnboundValue", quote(Two), envir)
NULL
> objects(envir)
[1] "One"
> objects(envir, all.names=TRUE) # is "Two" a 'hidden' object?
[1] "One" "Two"
> exists("Two", envir=envir, inherits=FALSE)
[1] TRUE
> get("Two", envir=envir, inherits=FALSE) # get fails when exists says ok
Error in get("Two", envir = envir, inherits = FALSE) :
  object 'Two' not found

Should Rf_defineVar(sym, R_UnboundValue, envir) remove sym from envir?

Bill Dunlap
TIBCO Software
wdunlap tibco.com
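
[Editor's note: as a point of comparison (not from the original post), removal performed at the R level with rm() keeps objects(), exists(), and get() consistent, unlike the Rf_defineVar(sym, R_UnboundValue, envir) idiom shown above.]

```r
# R-level removal of a binding: all three query functions agree afterwards.
envir <- list2env(list(One = 1, Two = 2))
rm("Two", envir = envir)
exists("Two", envir = envir, inherits = FALSE)  # FALSE
objects(envir, all.names = TRUE)                # "One"
```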

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread William Dunlap via R-devel
Changing the default behavior of regmatches would break its use with
gregexpr, where the number of matches per input element varies, so a
zero-length character vector makes more sense than NA_character_.

> x <- c("John Doe", "e e cummings", "Juan de la Madrid")
> m <- gregexpr("[A-Z]", x)
> regmatches(x,m)
[[1]]
[1] "J" "D"

[[2]]
character(0)

[[3]]
[1] "J" "M"

> vapply(.Last.value, function(x)paste(paste0(x, "."),collapse=""), "")
[1] "J.D." "."    "J.M."

(We don't want e e cummings initials mapped to "NA.")
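
[Editor's note: for callers who do want placeholder filling, a small wrapper (an editorial sketch, not an existing base R function) leaves regmatches untouched while providing the opt-in behavior discussed in this thread.]

```r
# Hedged sketch: like regmatches(x, m), but elements with no match are
# replaced by `fill` instead of being returned as character(0).
regmatches_fill <- function(x, m, fill = NA_character_) {
    r <- regmatches(x, m)
    r[lengths(r) == 0L] <- list(fill)
    r
}
```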

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Aug 15, 2019 at 12:15 AM Cyclic Group Z_1 via R-devel <
r-devel@r-project.org> wrote:

> A very common use case for regmatches is to extract regex matches into a
> new column in a data.frame (or data.table, etc.) or otherwise use the
> extracted strings alongside the input. However, the default behavior is to
> drop empty matches, which results in mismatches in column length if
> reassignment is done without subsetting.
>
> For consistency with other R functions and compatibility with this use
> case, it would be nice if regmatches did not automatically drop empty
> matches and would instead insert an NA_character_ value (similar to
> stringr::str_extract). This alternative regmatches could be implemented
> through an optional drop argument, a new function, or mentioned in the
> documentation (a la resample in ?sample).
>
> Alternatively, at the moment, there is a non-exported function strextract
> in utils which is very similar to stringr::str_extract. It would be great
> if this function, once exported, were to include a drop argument to prevent
> dropping positions with no matches.
>
> An example solution (last option):
>
> strextract <- function(pattern, x, perl = FALSE, useBytes = FALSE,
>                        drop = TRUE) {
>   m <- regexec(pattern, x, perl = perl, useBytes = useBytes)
>   result <- regmatches(x, m)
>
>   if (isTRUE(drop)) {
>     unlist(result)
>   } else if (isFALSE(drop)) {
>     unlist({result[lengths(result) == 0] <- NA_character_; result})
>   } else {
>     stop("Invalid argument for `drop`")
>   }
> }
>
> Based on Ricardo Saporta's response to "How to prevent regmatches drop
> non matches?"
>
> --CG
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Infrequent but steady NULL-pointer caused segfault in as.POSIXlt.POSIXct (R 3.4.4)

2019-08-02 Thread William Dunlap via R-devel
If you can run things on Linux, try running a few iterations of that loop
under valgrind, setting gctorture(TRUE) before the loop.

% R --debugger=valgrind --silent
> gctorture(TRUE)
> for(i in 1:5) { ... body of your loop ... }

valgrind can show memory misuse that eventually will cause R to crash.

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Aug 2, 2019 at 1:23 AM Sun Yijiang  wrote:

> The R script I run daily for hours looks like this:
>
> while (!finish) {
> Sys.sleep(0.1)
> time = as.integer(format(Sys.time(), "%H%M")) # always crash here
> if (new.data.timestamp() <= time)
> next
> # ... do some jobs for about 2 minutes ...
> gc()
> }
>
> Basically it waits for new data, which comes in every 10 minutes, and
> do some jobs, then gc(), then loop again.  It works great most of the
> time, but crashes strangely once a month or so.  Although infrequent,
> it always crashes at the same place and gives the same error info,
> like this:
>
>  *** caught segfault ***
> address (nil), cause 'memory not mapped'
>
> Traceback:
>  1: as.POSIXlt.POSIXct(x, tz)
>  2: as.POSIXlt(x, tz)
>  3: format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...)
>  4: structure(format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...),
>   names = names(x))
>  5: format.POSIXct(Sys.time(), format = "%H%M")
>  6: format(Sys.time(), format = "%H%M")
>  7: format(Sys.time(), format = "%H%M")
> … …
>
> I looked into the dumped core with gdb, and found something very strange:
>
> gdb /usr/lib64/R/bin/exec/R ~/core.30387
> (gdb) bt 5
> #0  0x7f1dca844ff1 in __strlen_sse2_pminub () from /lib64/libc.so.6
> #1  0x7f1dcb20e8f9 in Rf_mkChar (name=0x0) at envir.c:3725
> #2  0x7f1dcb1dc225 in do_asPOSIXlt (call=,
> op=, args=,
> env=) at datetime.c:705
> #3  0x7f1dcb22197f in bcEval (body=body@entry=0x4064b28,
> rho=rho@entry=0xc449d38, useCache=useCache@entry=TRUE)
> at eval.c:6473
> #4  0x7f1dcb230370 in Rf_eval (e=0x4064b28,
> rho=rho@entry=0xc449d38) at eval.c:624
> (More stack frames follow…)
>
> Tracing into src/main/datetime.c:705, it’s a simple string-making code:
> SET_STRING_ELT(tzone, 1, mkChar(R_tzname[0]));
>
> mkChar function is defined in envir.c:3725:
> 3723  SEXP mkChar(const char *name)
> 3724  {
> 3725  size_t len =  strlen(name);
> … …
>
> gdb shows that the string pointer (name=0x0) mkChar received is NULL,
> and subsequently strlen(NULL) caused the segfault.  But quite
> contradictorily, gdb shows the value passed to mkChar in the caller is
> valid:
>
> (gdb) frame 2
> #2  0x7f1dcb1dc225 in do_asPOSIXlt (call=,
> op=, args=,
> env=) at datetime.c:705
> 705 datetime.c: No such file or directory.
> (gdb) p tzname[0]
> $1 = 0x4cf39c0 “CST”
>
> R_tzname is an alias of tzname. (#define R_tzname tzname in the same file.)
>
> At first, I suspect that some library may have messed up the memory
> and accidentally zeroed tzname (a global variable).  But with this gdb
> trace, it shows that tzname is good, only that the pointer passed to
> mkChar magically changed to zero.  Like this:
>
> mkChar(tzname[0])  // tzname[0] is “CST”, address 0x4cf39c0
> … …
> SEXP mkChar(const char *name)  // name should be 0x4cf39c0, but gdb shows 0x0
> {
> size_t len =  strlen(name);  // segfault, as name is NULL
> … …
>
> The only theory I can think of so far is that, on calling mkChar, the
> parameter passed on stack somehow got wiped out to zero by some buggy
> code in R or library.  At a higher level, what I see is this:  If you
> run format(Sys.time(), "%H%M") a million times a day (together with
> other codes of course), once in a month or so this simple line can
> segfault.
>
> I’m lost in this confusion, could someone please help me find the
> right direction to further look into this problem?
>
> Regards,
> Steve
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Potential bug with data.frame replacement

2019-07-15 Thread William Dunlap via R-devel
This may be related to the size of the deparsed call in the error message
that Brodie and Luke were discussing recently on R-devel (" Mitigating
Stalls Caused by Call Deparse on Error").   I don't get a crash, but the
error message itself doesn't show up after the deparsed call.

> X <- sample(letters, 3000, TRUE)
> D <- data.frame(X, 1:3000, X, X, X, X, X)
> D$X1.3000 <- paste0("GSM", D)
Error in `$<-.data.frame`(`*tmp*`, X1.3000, value = c("GSMc(16, 6, 11, 1,
13, 7,
... many lines elided ...
 24, 24, 9, 7, 10, 17, 17, 6, 26, 26, 19, 11, 15, \n12, 9, 25, 17, 21, 24,
12, 14, 21, 23, 11, 7, 8, 11, 7, 10,
> # Note the message part of the error message was not printed
> # Use tryCatch to get the details
> e <- tryCatch(D$X1.3000 <- paste0("GSM", D), error=function(e)e)
> str(e)
List of 2
 $ message: chr "replacement has 7 rows, data has 3000"
 $ call   : language `$<-.data.frame`(`*tmp*`, X1.3000, value = c("GSMc(23,
10, 2, 9, 4, 3, 16, 12, 21, 26, 3, 17, 6, 25, 8, 1, 17, 10| __truncated__
...
 - attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
> nchar(deparse(e$call))
[1] 11068 11036 11023 11023 11023 11021 2


Bill Dunlap
TIBCO Software
wdunlap tibco.com
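As a sketch of why the accidental call misbehaves (illustrative, not from the original posts): paste0() coerces a data.frame as a list, producing one deparsed string per column, so the replacement length ends up equal to the number of columns rather than the number of rows — hence "replacement has 7 rows, data has 3000".

```r
# A small data.frame stands in for the 3000-row one in the report.
D2 <- data.frame(X = letters[1:5], N = 1:5)

paste0("GSM", D2$N)        # length 5: one element per row, as intended
length(paste0("GSM", D2))  # 2: one huge deparsed string per column
```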


On Mon, Jul 15, 2019 at 3:25 AM Benjamin Jean-Marie Tremblay <
b2tremb...@uwaterloo.ca> wrote:
>
> Dear R-devel,
>
> I have encountered a crash-inducing scenario and would like to enquire as to
> whether this would be considered a bug. To reproduce the crash:
>
> X <- sample(letters, 3000, TRUE)
> D <- data.frame(X, 1:3000, X, X, X, X, X)
> D$X1.3000 <- paste0("GSM", D)
>
> The reason why I'm not sure if this would be considered a bug is because I
> typed this by accident, when what I meant was:
>
> D$X1.3000 <- paste0("GSM", D$X1.3000)
>
> I can never imagine a scenario where I would intentionally perform the
> former.
>
> This issue seems to have something to do with the size of the data.frame, as
> smaller examples will work fine:
>
> D <- data.frame(A = 1:10, B = letters[1:10])
> D$A <- paste0("A", D)
>
> Also just doing the paste0 part without trying to replace a data.frame
> column does not crash R for me.
>
> I can submit this on Bugzilla should this be deemed sufficiently buggy.
>
> I am running 3.6.0 on macOS (x86_64-apple-darwin15.6.0).
>
> Sincerely,
>
> B.T.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z

2019-06-23 Thread William Dunlap via R-devel
include/Rmath.h declares a set of 'logspace' functions for use at the C
level.  I don't think there are core R functions that call them.

/* Compute the log of a sum or difference from logs of terms, i.e.,
 *
 * log (exp (logx) + exp (logy))
 * or  log (exp (logx) - exp (logy))
 *
 * without causing overflows or throwing away too much accuracy:
 */
double  Rf_logspace_add(double logx, double logy);
double  Rf_logspace_sub(double logx, double logy);
double  Rf_logspace_sum(const double *logx, int nx);

Bill Dunlap
TIBCO Software
wdunlap tibco.com
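A pure-R sketch of the same idea (the standard log-sum-exp trick, not the C implementation itself): subtract the largest log-term before exponentiating, so nothing underflows.

```r
# log(exp(logx) + exp(logy)), computed without underflow.
logspace_add <- function(logx, logy) {
  m <- max(logx, logy)
  m + log(exp(logx - m) + exp(logy - m))
}

# log(sum(exp(logx))) for a whole vector of log-terms.
logspace_sum <- function(logx) {
  m <- max(logx)
  m + log(sum(exp(logx - m)))
}

logspace_sum(c(-1000, -1000, -1000))  # -1000 + log(3); naive sum underflows to log(0)
```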


On Sun, Jun 23, 2019 at 1:40 AM Ben Bolker  wrote:

>
>   I agree with many the sentiments about the wisdom of computing very
> small p-values (although the example below may win some kind of a prize:
> I've seen people talking about p-values of the order of 10^(-2000), but
> never 10^(-(10^8)) !).  That said, there are a several tricks for
> getting more reasonable sums of very small probabilities.  The first is
> to scale the p-values by dividing by the *largest* of the probabilities,
> then do the (p/sum(p)) computation, then multiply the result (I'm sure
> this is described/documented somewhere).  More generally, there are
> methods for computing sums on the log scale, e.g.
>
>
> https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.misc.logsumexp.html
>
>  I don't know where this has been implemented in the R ecosystem, but
> this sort of computation is the basis of the "Brobdingnag" package for
> operating on very large ("Brobdingnagian") and very small
> ("Lilliputian") numbers.
>
>
> On 2019-06-21 6:58 p.m., jing hua zhao wrote:
> > Hi Peter, Rui, Chrstophe and Gabriel,
> >
> > Thanks for your inputs --  the use of qnorm(., log=TRUE) is a good point
> in line with pnorm with which we devised log(p)  as
> >
> > log(2) + pnorm(-abs(z), lower.tail = TRUE, log.p = TRUE)
> >
> > that could do really really well for large z compared to Rmpfr. Maybe I
> am asking too much since
> >
> > z <-2
> >> Rmpfr::format(2*pnorm(mpfr(-abs(z),100),lower.tail=TRUE,log.p=FALSE))
> > [1] "1.660579603192917090365313727164e-86858901"
> >
> > already gives a rarely seen small p value. I gather I also need a
> multiple precision exp() and their sum since exp(z^2/2) is also a Bayes
> Factor so I  get log(x_i )/sum_i log(x_i) instead. To this point, I am
> obliged to clarify, see
> https://statgen.github.io/gwas-credible-sets/method/locuszoom-credible-sets.pdf
> .
> >
> > I agree many feel geneticists go too far with small p values, which I
> > would have difficulty arguing against. On the other hand, it is also
> > expected to see these in a non-genetic context. For instance the Framingham
> > study, established in 1948, just got $34m for six years on phenotype-wide
> > association, which would be interesting to see.
> >
> > Best wishes,
> >
> >
> > Jing Hua
> >
> >
> > 
> > From: peter dalgaard 
> > Sent: 21 June 2019 16:24
> > To: jing hua zhao
> > Cc: Rui Barradas; r-devel@r-project.org
> > Subject: Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z
> >
> > You may want to look into using the log option to qnorm
> >
> > e.g., in round figures:
> >
> >> log(1e-300)
> > [1] -690.7755
> >> qnorm(-691, log=TRUE)
> > [1] -37.05315
> >> exp(37^2/2)
> > [1] 1.881797e+297
> >> exp(-37^2/2)
> > [1] 5.314068e-298
> >
> > Notice that floating point representation cuts out at 1e+/-308 or so. If
> you want to go outside that range, you may need explicit manipulation of
> the log values. qnorm() itself seems quite happy with much smaller values:
> >
> >> qnorm(-5000, log=TRUE)
> > [1] -99.94475
> >
> > -pd
> >
> >> On 21 Jun 2019, at 17:11 , jing hua zhao 
> wrote:
> >>
> >> Dear Rui,
> >>
> >> Thanks for your quick reply -- this allows me to see the bottom of
> this. I was hoping we could have a handle of those p in genmoics such as
> 1e-300 or smaller.
> >>
> >> Best wishes,
> >>
> >>
> >> Jing Hua
> >>
> >> 
> >> From: Rui Barradas 
> >> Sent: 21 June 2019 15:03
> >> To: jing hua zhao; r-devel@r-project.org
> >> Subject: Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z
> >>
> >> Hello,
> >>
> >> Well, try it:
> >>
> >> p <- .Machine$double.eps^seq(0.5, 1, by = 0.05)
> >> z <- qnorm(p/2)
> >>
> >> pnorm(z)
> >> # [1] 7.450581e-09 1.22e-09 2.026908e-10 3.343152e-11 5.514145e-12
> >> # [6] 9.094947e-13 1.500107e-13 2.474254e-14 4.080996e-15 6.731134e-16
> >> #[11] 1.110223e-16
> >> p/2
> >> # [1] 7.450581e-09 1.22e-09 2.026908e-10 3.343152e-11 5.514145e-12
> >> # [6] 9.094947e-13 1.500107e-13 2.474254e-14 4.080996e-15 6.731134e-16
> >> #[11] 1.110223e-16
> >>
> >> exp(z*z/2)
> >> # [1] 9.184907e+06 5.301421e+07 3.073154e+08 1.787931e+09 1.043417e+10
> >> # [6] 6.105491e+10 3.580873e+11 2.104460e+12 1.239008e+13 7.306423e+13
> >> #[11] 4.314798e+14
> >>
> >>
> >> p is the smallest possible such that 1 + p != 1 and I couldn't find
> >> anything to worry about.
> >>
> >>
> >> R version 3.6.0 
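The log-scale route discussed throughout this thread can be sketched as follows (illustrative numbers, not from the original posts): keep the two-sided p-value as a log the whole way, never materializing the underflowing probability.

```r
# Two-sided p-value for a large z-score, entirely on the log scale.
z <- 50
log_p <- log(2) + pnorm(-abs(z), log.p = TRUE)  # log of 2 * Phi(-|z|)
log_p / log(10)                                 # log10 p-value, roughly -545
```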

Re: [Rd] [R] Open a file which name contains a tilde

2019-06-11 Thread William Dunlap via R-devel
Note that R treats tildes in file names differently on Windows and Linux.
On Windows, it is only replaced if it is at the beginning of the line and
is followed by a forward or backward slash or end-of-line.  On Linux it is
replaced no matter where it is in the text and ~someUser will be replaced
by someUser's home directory (if 'someUser' is a user with a home
directory).

Hence, if you have a Windows machine that can look at the file system on
your Linux machine you can use file.rename on Windows to change the names.
My inclination would be to use a bash script on Linux to change the names,
but if you are not comfortable with bash try the Windows approach.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
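A sketch of the platform difference described above (results depend on the OS and on the current user's home directory, so no outputs are shown):

```r
# Leading "~" followed by a slash: expanded on both Windows and Linux.
path.expand("~/data.csv")

# Mid-string "~": expanded on Linux when R is built with readline
# (via tilde_expand), but left unchanged on Windows.
path.expand("a ~ b")
```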


On Tue, Jun 11, 2019 at 1:13 PM Frank Schwidom  wrote:

> Hi Gabriel,
>
> I actually want to make renames over thousands of files. But if I am not
> able to express the source filename of the rename operation I will not be
> able to get the work done. Besides the fact that there are issues I think
> that R is qualified for solving my problem by the method how it can handle
> long vectors of strings, booleans and also lists.
>
> Kind regards,
> Frank
>
> On 2019-06-11 09:49:17, Gabriel Becker wrote:
> >Hi Frank,
> >I'm hesitant to be "that guy", but in case no one else has brought
> this up
> >to you, having files with a tilde in their names (generally but
> especially
> >on a linux system, where ~ in file names has a very important special
> >meaning in some cases, as we know) strikes me as an exceptionally bad
> >practice anyway. In light of that, the solution with the smallest
> amount
> >of pain for you is almost surely to just... not do that. Your
> filenames
> >will be better for it anyway.
> >There is a reason no one has complained about this before, and while I
> >haven't run a study or anything, I strongly suspect its that
> "everyone"
> >else is already on the "no tildes in filenames" bandwagon, so this
> >behavior, even if technically a bug, has no ability to cause them
> >problems.
> >Best,
> >~G
> >On Tue, Jun 11, 2019 at 8:25 AM Frank Schwidom <[1]schwi...@gmx.net>
> >wrote:
> >
> >  Hi,
> >
> >  yes, I have seen this package and it has the same tilde expanding
> >  problem.
> >
> >  Please excuse me I will cc this answer to r-help and r-devel to
> keep the
> >  discussion running.
> >
> >  Kind regards,
> >  Frank Schwidom
> >
> >  On 2019-06-11 09:12:36, Gábor Csárdi wrote:
> >  > Just in case, have you seen the fs package?
> >  > [2]https://fs.r-lib.org/
> >  >
> >  > Gabor
> >  >
> >  > On Tue, Jun 11, 2019 at 7:51 AM Frank Schwidom <[3]
> schwi...@gmx.net>
> >  wrote:
> >  > >
> >  > > Hi,
> >  > >
> >  > > to get rid of any possible filename modification I started a
> little
> >  project to cover my usecase:
> >  > >
> >  > > [4]https://github.com/schwidom/simplefs
> >  > >
> >  > > This is my first R package, suggestions and a review are
> welcome.
> >  > >
> >  > > Thanks in advance
> >  > > Frank Schwidom
> >  > >
> >  > > On 2019-06-07 09:04:06, Richard O'Keefe wrote:
> >  > > >How can expanding tildes anywhere but the beginning of a
> file
> >  name NOT be
> >  > > >considered a bug?
> >  > > >On Thu, 6 Jun 2019 at 23:04, Ivan Krylov
> >  <[1][5]krylov.r...@gmail.com> wrote:
> >  > > >
> >  > > >  On Wed, 5 Jun 2019 18:07:15 +0200
> >  > > >  Frank Schwidom <[2][6]schwi...@gmx.net> wrote:
> >  > > >
> >  > > >  > +> path.expand("a ~ b")
> >  > > >  > [1] "a /home/user b"
> >  > > >
> >  > > >  > How can I switch off any file crippling activity?
> >  > > >
> >  > > >  It doesn't seem to be possible if readline is enabled and
> >  works
> >  > > >  correctly.
> >  > > >
> >  > > >  Calls to path.expand [1] end up [2] in R_ExpandFileName
> [3],
> >  which
> >  > > >  calls R_ExpandFileName_readline [4], which uses
> libreadline
> >  function
> >  > > >  tilde_expand [5]. tilde_expand seems to be designed to
> expand
> >  '~'
> >  > > >  anywhere in the string it is handed, i.e. operate on
> whole
> >  command
> >  > > >  lines, not file paths.
> >  > > >
> >  > > >  I am taking the liberty of Cc-ing R-devel in case this
> can be
> >  > > >  considered a bug.
> >  > > >
> >  > > >  --
> >  > > >  Best regards,
> >  > > >  Ivan
> >  > > >
> >  > > >  [1]
> >  > > >
> >  [3][7]
> https://github.com/wch/r-source/blob/12d1d2d232d84aa355e48b81180a0e2c6f2f/src/main/names.c#L807
> >  > > >
> >  > > >  [2]
> >  > > >
> >  [4][8]
> https://github.com/wch/r-source/blob/12d1d2d232d84aa355e48b81180a0e2c6f2f/src/main/platform.c#L1915
> >  > > >
> >  > 

Re: [Rd] print.() not called when autoprinting

2019-05-21 Thread William Dunlap via R-devel
Letting a user supply the autoprint function would be nice also.  In a way
you can already do that, using addTaskCallback(), but that doesn't let you
suppress the standard autoprinting.

Having the default autoprinting do its own style of method dispatch doesn't
seem right.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
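The addTaskCallback() route mentioned above can be sketched like this; note that, as the message says, the callback only observes each top-level result and cannot suppress the standard autoprinting.

```r
# A task callback sees every completed top-level expression.
cb <- function(expr, value, ok, visible) {
  if (visible) cat("callback saw a", class(value)[1], "\n")
  TRUE  # returning TRUE keeps the callback registered
}

id <- addTaskCallback(cb)
# ...every visible top-level result now also reaches cb(),
#    but the normal autoprint output still appears...
removeTaskCallback(id)
```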


On Tue, May 21, 2019 at 10:50 AM Lionel Henry  wrote:

> FWIW it was the intention of the patch to make printing of unclassed
> functions consistent with other base types. This was documented in the
> "patch 3" section:
>
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17398
>
> I think we need a general way to customise auto-printing for base types
> and even classed objects as that'd be useful for both users and IDEs.
>
> However S3 dispatch may not be optimal for this because it essentially
> requires polluting the global environment with print methods. Maybe
> it'd make sense to add getOption("autoprint") which should be set to
> a user- or environment- supplied function. That function would do the
> dispatch. I'd be happy to send a patch for this, if it makes sense.
>
> Best,
> Lionel
>
>
> > On 21 May 2019, at 13:38, William Dunlap via R-devel <
> r-devel@r-project.org> wrote:
> >
> > It also is a problem with storage.modes "integer" and "complex":
> >
> > 3.6.0> print.integer <- function(x,...) "integer vector"
> >3.6.0> 1:10
> > [1]  1  2  3  4  5  6  7  8  9 10
> > 3.6.0> print(1:10)
> > [1] "integer vector"
> > 3.6.0>
> > 3.6.0> print.complex <- function(x, ...) "complex vector"
> > 3.6.0> 1+2i
> > [1] 1+2i
> > 3.6.0> print(1+2i)
> > [1] "complex vector"
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> >
> > On Tue, May 21, 2019 at 9:31 AM Martin Maechler <
> maech...@stat.math.ethz.ch>
> > wrote:
> >
> >>>>>>> William Dunlap via R-devel
> >>>>>>>on Thu, 16 May 2019 11:56:45 -0700 writes:
> >>
> >>> In R-3.6.0 autoprinting was changed so that print methods for the
> >> storage
> >>> modes are not called when there is no explicit class attribute.
> >> E.g.,
> >>
> >>> % R-3.6.0 --vanilla --quiet
> >>>> print.function <- function(x, ...) { cat("Function with argument
> >> list ");
> >>> cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
> >>>> f <- function(x, ...) { sum( x * seq_along(x) ) }
> >>>> f
> >>> function(x, ...) { sum( x * seq_along(x) ) }
> >>>> print(f)
> >>> Function with argument list function (x, ...)
> >>
> >>> Previous to R-3.6.0 autoprinting did call such methods
> >>> % R-3.5.3 --vanilla --quiet
> >>>> print.function <- function(x, ...) { cat("Function with argument
> >> list ");
> >>> cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
> >>>> f <- function(x, ...) { sum( x * seq_along(x) ) }
> >>>> f
> >>> Function with argument list function (x, ...)
> >>>> print(f)
> >>> Function with argument list function (x, ...)
> >>
> >>> Was this intentional?
> >>
> >> No, it was not.  ... and I've been the one committing the wrong change.
> >>
> >> ... Related to the NEWS entries which start
> >>
> >> "Changes in print.*() "
> >>
> >> Thank you Bill, for reporting
> >>
> >> It's amazing this has not been detected earlier by anybody.
> >>
> >> I think it is *only* for functions, not general
> >> print.() as you were suggesting - right?
> >>
> >> Martin
> >>
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] print.() not called when autoprinting

2019-05-21 Thread William Dunlap via R-devel
It also is a problem with storage.modes "integer" and "complex":

3.6.0> print.integer <- function(x,...) "integer vector"
3.6.0> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
3.6.0> print(1:10)
[1] "integer vector"
3.6.0>
3.6.0> print.complex <- function(x, ...) "complex vector"
3.6.0> 1+2i
[1] 1+2i
3.6.0> print(1+2i)
[1] "complex vector"

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Tue, May 21, 2019 at 9:31 AM Martin Maechler 
wrote:

> >>>>> William Dunlap via R-devel
> >>>>> on Thu, 16 May 2019 11:56:45 -0700 writes:
>
> > In R-3.6.0 autoprinting was changed so that print methods for the
> storage
> > modes are not called when there is no explicit class attribute.
>  E.g.,
>
> > % R-3.6.0 --vanilla --quiet
> >> print.function <- function(x, ...) { cat("Function with argument
> list ");
> > cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
> >> f <- function(x, ...) { sum( x * seq_along(x) ) }
> >> f
> > function(x, ...) { sum( x * seq_along(x) ) }
> >> print(f)
> > Function with argument list function (x, ...)
>
> > Previous to R-3.6.0 autoprinting did call such methods
> > % R-3.5.3 --vanilla --quiet
> >> print.function <- function(x, ...) { cat("Function with argument
> list ");
> > cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
> >> f <- function(x, ...) { sum( x * seq_along(x) ) }
> >> f
> > Function with argument list function (x, ...)
> >> print(f)
> > Function with argument list function (x, ...)
>
> > Was this intentional?
>
> No, it was not.  ... and I've been the one committing the wrong change.
>
> ... Related to the NEWS entries which start
>
>  "Changes in print.*() "
>
> Thank you Bill, for reporting
>
> It's amazing this has not been detected earlier by anybody.
>
> I think it is *only* for functions, not general
> print.() as you were suggesting - right?
>
> Martin
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] WISH: Built-in R session-specific universally unique identifier (UUID)

2019-05-20 Thread William Dunlap via R-devel
I think a machine-specific input, like the MAC address, to the UUID is
essential.  S+ used to make a seed for the random number generator based on
the the current time and process ID.  A customer complained that all
machines in his cluster generated the same random number stream.  The
machines were rebooted each night, simultaneously, and S+ was started
during the boot process so times and process ids were identical, hence the
seeds were identical.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
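A toy illustration of the failure mode described above (seed_from() is a made-up stand-in for time-plus-PID seeding, not the actual S+ algorithm): when every machine boots at the same moment and starts the process with the same PID, the seeds collide and the streams are identical.

```r
# Hypothetical seed built only from boot time and process ID.
seed_from <- function(boot_time, pid) {
  as.integer((boot_time * 31 + pid) %% .Machine$integer.max)
}

set.seed(seed_from(1e9, 1234)); a <- runif(3)  # "machine 1"
set.seed(seed_from(1e9, 1234)); b <- runif(3)  # "machine 2", same inputs
identical(a, b)  # TRUE: every machine draws the same stream
```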


On Mon, May 20, 2019 at 4:48 PM Henrik Bengtsson 
wrote:

> # Proposal
>
> Provide a built-in mechanism for obtaining an identifier for the
> current R session, e.g.
>
> > Sys.info()[["session_uuid"]]
> [1] "4258db4d-d4fb-46b3-a214-8c762b99a443"
>
> The identifier should be "unique" in the sense that the probability
> for two R sessions(*) having the same identifier should be extremely
> small.  There's no need for reproducibility, i.e. the algorithm for
> producing the identifier may be changed at any time.
>
> (*) Two R sessions running at different times (seconds, minutes, days,
> years, ...) or on different machines (locally or anywhere in the
> world).
>
>
> # Use cases
>
> In parallel-processing workflows, R objects may be "exported"
> (serialized) to background R processes ("workers") for further
> processing.  In other workflows, objects may be saved to file to be
> reloaded in a future R session.  However, certain types of objects in
> R maybe only be relevant, or valid, in the R session that created
> them.  Attempts to use them in other R processes may give an obscure
> error or in the worst case produce garbage results.
>
> Having an identifier that is unique to each R process will make it
> possible to detect when an object is used in the wrong context.  This
> can be done by attaching the session identifier to the object.  For
> example,
>
> obj <- 42L
> attr(obj, "owner") <- Sys.info()[["session_uuid"]]
>
> With this, it is easy to validate the "ownership" later;
>
> stopifnot(identical(attr(obj, "owner"), Sys.info()[["session_uuid"]]))
>
> I argue that such an identifier should be part of base R for easy
> access and avoid each developer having to roll their own.
>
>
> # Possible implementation
>
> One proposal would be to bring in Simon Urbanek's 'uuid' package
> (https://cran.r-project.org/package=uuid) into base R.  This package
> provides:
>
> > uuid::UUIDgenerate()
> [1] "b7de6182-c9c1-47a8-b5cd-e5c8307a8efb"
>
> based on Theodore Ts'o's libuuid
> (https://mirrors.edge.kernel.org/pub/linux/utils/util-linux/).  From
> 'man uuid_generate':
>
> "The uuid_generate function creates a new universally unique
> identifier (UUID). The uuid will be generated based on high-quality
> randomness from /dev/urandom, if available. If it is not available,
> then uuid_generate will use an alternative algorithm which uses the
> current time, the local ethernet MAC address (if available), and
> random data generated using a pseudo-random generator.
> [...]
> The UUID is 16 bytes (128 bits) long, which gives approximately
> 3.4x10^38 unique values (there are approximately 10^80 elementary
> particles in the universe according to Carl Sagan's Cosmos). The new
> UUID can reasonably be considered unique among all UUIDs created on
> the local system, and among UUIDs created on other systems in the past
> and in the future."
>
> An alternative, that does not require adding a dependency on the
> libuuid library, would be to roll a poor man's version based on a set
> of semi-unique attributes, e.g.
>
> make_id <- function(...) {
>   args <- list(...)
>   saveRDS(args, file = f <- tempfile())
>   on.exit(file.remove(f))
>   unname(tools::md5sum(f))
> }
>
> session_id <- local({
>   id <- NULL
>   function() {
> if (is.null(id)) {
>   id <<- make_id(
> info= Sys.info(),
> pid = Sys.getpid(),
> tempdir = tempdir(),
> time= Sys.time(),
> random  = sample.int(.Machine$integer.max, size = 1L)
>   )
> }
> id
>   }
> })
>
> Example:
>
> > session_id()
> [1] "8d00b17384e69e7c9ecee47e0426b2a5"
>
> > session_id()
> [1] "8d00b17384e69e7c9ecee47e0426b2a5"
>
> /Henrik
>
> PS. Having a built-in make_id() function would be handy too, e.g. when
> creating object-specific identifiers for other purposes.
>
> PPS. It would be neat if there was an object, or connection, interface
> for tools::md5sum(), which currently only operates on files sitting on
> the file system. The digest package provides this functionality.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] print.() not called when autoprinting

2019-05-16 Thread William Dunlap via R-devel
In R-3.6.0 autoprinting was changed so that print methods for the storage
modes are not called when there is no explicit class attribute.   E.g.,

% R-3.6.0 --vanilla --quiet
> print.function <- function(x, ...) { cat("Function with argument list ");
cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
> f <- function(x, ...) { sum( x * seq_along(x) ) }
> f
function(x, ...) { sum( x * seq_along(x) ) }
> print(f)
Function with argument list function (x, ...)

Previous to R-3.6.0 autoprinting did call such methods
% R-3.5.3 --vanilla --quiet
> print.function <- function(x, ...) { cat("Function with argument list ");
cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
> f <- function(x, ...) { sum( x * seq_along(x) ) }
> f
Function with argument list function (x, ...)
> print(f)
Function with argument list function (x, ...)

Was this intentional?

Bill Dunlap
TIBCO Software
wdunlap tibco.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [r-devel] integrate over an infinite region produces wrong results depending on scaling

2019-04-14 Thread William Dunlap via R-devel
integrate(f, xmin, xmax) will have problems when f(x) is 0 over large parts
of (xmin,xmax).  It doesn't have any clues to where the non-zero regions
are.  It computes f(x) at 21 points at each step and if all of those are
zero (or some other constant?) for a few steps, it calls it a day.  If you
can narrow down the integration interval to the interesting part of the
function's domain you will get better results.

By the way, here is a way to see where integrate(f) evaluates f()  (the
keep.xy=TRUE argument doesn't seem to do anything).

> debugIntegrate <- function(f)
{
n_calls <- 0
x_args <- list()
other_args <- list()
value <- list()
function(x, ...) {
n_calls <<- n_calls + 1
x_args[[n_calls]] <<- x
other_args[[n_calls]] <<- list(...)
v <- f(x, ...)
value[[n_calls]] <<- v
v
}
}

> str(integrate(DF <- debugIntegrate(f), -Inf, 0, numstab = sc))
List of 5
 $ value   : num 1.5
 $ abs.error   : num 0.000145
 $ subdivisions: int 2
 $ message : chr "OK"
 $ call: language integrate(f = DF <- debugIntegrate(f), lower =
-Inf, upper = 0, numstab = sc)
 - attr(*, "class")= chr "integrate"
> curve(f(x, sc), min(unlist(environment(DF)$x_args)), 0, n = 501, main =
"Scaled f", bty = "n")
> with(environment(DF), points(unlist(x_args), unlist(value)))

Bill Dunlap
TIBCO Software
wdunlap tibco.com
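The advice above — narrow the interval to the interesting part of the domain — can be sketched with the constants from the original post; the finite window is chosen by eye from the curve() plot, so treat it as an assumption.

```r
# Constants and integrand from the original post.
cons <- -0.020374721416129591
sc   <- 0.00271245601724757383
sh   <- 5.704
f <- function(x) dgamma(cons - x, shape = sh, scale = sc) *
                 dgamma(-x,       shape = sh, scale = sc)

# Integrate only where f is visibly non-zero instead of over (-Inf, 0].
integrate(f, -0.08, 0)$value  # ~3.575, matching the summation check above
```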


On Sun, Apr 14, 2019 at 5:13 AM Andreï V. Kostyrka 
wrote:

> Dear all,
>
> This is the first time I am posting to the r-devel list. On
> StackOverflow, they suggested that the strange behaviour of integrate()
> was more bug-like. I am providing a short version of the question (full
> one with plots: https://stackoverflow.com/q/55639401).
>
> Suppose one wants integrate a function that is just a product of two
> density functions (like gamma). The support of the random variable is
> (-Inf, 0]. The scale parameter of the distribution is quite small
> (around 0.01), so often, the standard integration routine would fail to
> integrate a function that is non-zero on a very small section of the
> negative line (like [-0.02, -0.01], where it takes huge values, and
> almost 0 everywhere else). R’s integrate would often return the machine
> epsilon as a result. So I stretch the function around the zero by an
> inverse of the scale parameter, compute the integral, and then divide it
> by the scale. Sometimes, this re-scaling also failed, so I did both if
> the first result was very small.
>
> Today when integration of the rescaled function suddenly yielded a value
> of 1.5 instead of 3.5 (not even zero). The MWE is below.
>
> cons <- -0.020374721416129591
> sc <- 0.00271245601724757383
> sh <- 5.704
> f <- function(x, numstab = 1) dgamma(cons - x * numstab, shape = sh,
> scale = sc) * dgamma(-x * numstab, shape = sh, scale = sc) * numstab
>
> curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")
> curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")
>
> sum(f(seq(-0.08, 0, 1e-6))) * 1e-6 #  Checking by summation: 3.575294
> sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4 # True value, 3.575294
> str(integrate(f, -Inf, 0)) # Gives 3.575294
> # $ value   : num 3.58
> # $ abs.error   : num 1.71e-06
> # $ subdivisions: int 10
> str(integrate(f, -Inf, 0, numstab = sc))
> # $ value   : num 1.5 # What?!
> # $ abs.error   : num 0.000145 # What?!
> # $ subdivisions: int 2
>
> It stops at just two subdivisions!  The problem is, I cannot try various
> stabilising multipliers for the function because I have to compute this
> integral thousands of times for thousands of parameter values on
> thousands of sample windows for hundreds on models, so even in the
> super-computer cluster, this takes weeks. Besides that, reducing the
> rel.tol just to 1e-5 or 1e-6, helped a bit, but I am not sure whether
> this guarantees success (and reducing it to 1e-7 slowed down the
> computations in some cases). And I have looked at the Fortran code of
> the quadrature just to see the integration rule, and was wondering.
>
> How can I make sure that the integration routine will not produce such
> wrong results for such a function, and the integration will still be fast?
>
> Yours sincerely,
> Andreï V. Kostyrka
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Discrepancy between is.list() and is(x, "list")

2019-03-26 Thread William Dunlap via R-devel
I think this goes back to SV4 (c. late 1990's).  The is.  functions
are much older (c. mid 1970's), from before any class system was in S.
is() and inherits() were introduced with the S4 class system and were meant
to escape from the prison made by ancient design choices.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
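The discrepancy in the subject line can be sketched with a data.frame, whose underlying storage is a list but whose class attribute hides that from inherits():

```r
d <- data.frame(x = 1:3)

is.list(d)           # TRUE:  the basic storage type is "list"
inherits(d, "list")  # FALSE: S3 dispatch sees only the "data.frame" class
is(d, "list")        # TRUE:  the S4 view knows "data.frame" contains "list"
```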


On Tue, Mar 26, 2019 at 2:11 PM Abs Spurdle  wrote:

> If I can merge this thread with the one I started yesterday...
>
> > "If the object does not have a class attribute, it has an implicit
> class..."
> > which I take to mean that if an object does have a class attribute it
> does not also have an implicit class.
> > I think this is reasonable behavior. Consider the "Date" class, which
> stores values as "numeric":
> > > class(Sys.Date())
> > [1] "Date"
> > > inherits(Sys.Date(),"numeric")
> > [1] FALSE
> > > class(unclass(Sys.Date()))
> > [1] "numeric"
> > > Sys.Date()%%2
> > Error in Ops.Date(Sys.Date(), 2) : %% not defined for "Date" objects
> > Letting the modulus operator (as one example) inherit the numeric class
> here could create problems.
>
> I disagree.
> A date object should probably extend integers rather than numerics, in the
> first place.
> However, if it extends numeric, then it extends numeric, otherwise it's a
> contradiction.
> So, inherits(Sys.Date(),"numeric") should return true.
>
> Modulo operators should be defined for both dates and numerics.
> However, the application of modulo operators to dates, is perhaps unclear,
> at least in the general case, anyway.
>
> > so instead of hitting utils:::head.function, it hits utils:::head.default
> > I also see this behavior at least as far back as 3.5.1, so it's not new
> > to 3.5.3.
>
> These seem like significant design flaws.
> Implicit classes or whatever you want to call them, are clearly part of the
> class hierarchy.
>
> They should be included in inherits(), is() and standard method dispatch,
> regardless of whether they are part of the class vector or not.
>
> Also, is this something that was introduced in R 3.5.1?
> The only thing worse than a design flaw is a design flaw that isn't
> backward compatible.
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



[Rd] POSIXlt$zone and $gmtoff questions

2019-03-08 Thread William Dunlap via R-devel
I've been searching for patterns in why some POSIXlt objects have the zone
and gmtoff components and some don't and why gmtoff is sometimes NA when
the zone is known.  Is there a pattern or is it just that the additional
fields and workarounds were added in an ad hoc way?

E.g.,  as.POSIXlt adds the zone and gmtoff components for all strings and
logical NA inputs if the time zone is not GMT or UTC

f <- function (lt)  {
stopifnot(inherits(lt, "POSIXlt"))
cat(format(lt), ", $zone=", deparse(lt$zone), ", $gmtoff=",
deparse(lt$gmtoff), "\n", sep = "")
}
f(as.POSIXlt("2018-03-08 16:31", tz="US/Pacific"))
# 2018-03-08 16:31:00, $zone="PST", $gmtoff=NA_integer_
f(as.POSIXlt(NA, tz="US/Pacific"))
# NA, $zone="", $gmtoff=NA_integer_
f(as.POSIXlt(NA_character_, tz="US/Pacific"))
# NA, $zone="", $gmtoff=NA_integer_


But in GMT or UTC it omits the zone and gmtoff components unless you give
it a single character NA

f(as.POSIXlt("2018-03-08 16:31", tz="GMT"))
# 2018-03-08 16:31:00, $zone=NULL, $gmtoff=NULL
f(as.POSIXlt(NA, tz="GMT"))
# NA, $zone=NULL, $gmtoff=NULL
f(as.POSIXlt(NA_character_, tz="GMT"))
# NA, $zone="", $gmtoff=NA_integer_


Another oddity is that as.POSIXlt(characterData, tz="not-GMT") fills the
gmtoff component with NAs even though the zone and isdst components give
the information required to figure out the gmtoff.  as.POSIXlt(POSIXctData)
does give proper values to gmtoff

f(as.POSIXlt("2019-03-08", tz="US/Pacific"))
# 2019-03-08, $zone="PST", $gmtoff=NA_integer_
f(as.POSIXlt(as.POSIXct("2019-03-08", tz="US/Pacific")))
# 2019-03-08, $zone="PST", $gmtoff=-28800L


Is this last an efficiency issue?
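
A hedged workaround for the NA gmtoff noted above: round-tripping through
POSIXct fills in the offset, per the f() examples earlier in this message.

```r
## Character input leaves gmtoff as NA even though the zone is known:
lt <- as.POSIXlt("2019-03-08", tz = "US/Pacific")
lt$gmtoff                       # NA_integer_, as shown above

## Converting to POSIXct and back populates it:
lt2 <- as.POSIXlt(as.POSIXct(lt))
lt2$gmtoff                      # -28800 (PST), as shown above
```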

Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [Rd] as.Date(Inf) displays as 'NA' but is actually 'Inf'

2019-03-05 Thread William Dunlap via R-devel
format.Date runs into trouble long before Inf:
  > as.Date("2018-03-05") + c(2147466052, 2147466053)
  [1] "5881580-07-11"  "-5877641-06-23"

Bill Dunlap
TIBCO Software
wdunlap tibco.com
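
A hedged defensive pattern for the issue below: explicitly map non-finite
dates to NA, since is.na() does not catch Inf here.

```r
## as.Date(Inf, ...) prints as NA but is not NA:
d <- as.Date(Inf, origin = "2018-01-01")

## Guard: coerce non-finite dates to a genuine NA date.
if (!is.finite(unclass(d))) d <- as.Date(NA)

is.na(d)   # TRUE after the guard
```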


On Tue, Mar 5, 2019 at 2:33 PM Gabriel Becker  wrote:

> Richard,
>
> Well others may chime in here, but from a mathematical point of view, the
> concept of "infinite days from right now" is well-defined, so it may be a
> "valid" date in that sense, but what day and month it will be (year will be
> Inf) are indeterminate/not well defined. Those are rightfully, NA, it
> seems?
>
> I mean you could disallow dates to take Inf at all, ever. I don't feel
> strongly one way or the other about that, personally. That said, if Inf
> dates are allowed, it's not clear to me that displaying the formatted date
> string as NA, even if the value isn't, is wrong, given that it can't be
> determined what that "date" is. It could be displayed differently, I
> suppose, but all the ones I can think of off the top of my head would be
> problematic and probably break lots of formatted-dates parsing code out
> there in the wild (and in R, I would guess). Things like displaying
> "Inf-NA-NA", or just "Inf". Neither of those are going to handle a
> read-write round-trip well, I think.
>
> So my personal don't-really-have-a-hat-in-the-ring opinion would be to
> either leave it as is, or force as.Date(Inf, bla) to actually be NA.
>
> Best,
> ~G
>
> On Tue, Mar 5, 2019 at 12:06 PM Richard White  wrote:
>
> > Hi,
> >
> > I think I've discovered a bug in base R.
> >
> > Basically, when using 'Inf' as a 'Date', it is visually displayed as
> > 'NA', but R still treats it as 'Inf'. So it is very confusing to work
> > with, and can easily lead to errors:
> >
> > # Visually displays as NA
> >  > as.Date(Inf, origin="2018-01-01")
> > [1] NA
> >
> > # Visually displays as NA
> >  > str(as.Date(Inf, origin="2018-01-01"))
> > Date[1:1], format: NA
> >
> > # Is NOT NA
> >  > is.na(as.Date(Inf, origin="2018-01-01"))
> > [1] FALSE
> >
> > # Is still Inf
> >  > is.infinite(as.Date(Inf, origin="2018-01-01"))
> > [1] TRUE
> >
> > This gets really problematic when you are collapsing dates over groups
> > and you want to find the first date of a group. Because min() returns
> > Inf if there is no data:
> >
> > # Visually displays as NA
> >  > as.Date(min(), origin="2018-01-01")
> > [1] NA
> > Warning message: In min() : no non-missing arguments to min; returning
> Inf
> >
> > # Visually displays as NA
> >  > str(as.Date(min(), origin="2018-01-01"))
> > Date[1:1], format: NA
> > Warning message: In min() : no non-missing arguments to min; returning
> Inf
> >
> > # Is not NA
> >  > is.na(as.Date(min(), origin="2018-01-01"))
> > [1] FALSE
> > Warning message: In min() : no non-missing arguments to min; returning
> Inf
> >
> > # This is bad!
> >  > as.Date(min(), origin="2018-01-01") > "2018-01-01"
> > [1] TRUE
> > Warning message: In min() : no non-missing arguments to min; returning
> Inf
> >
> > Here is my sessionInfo():
> >
> >  > sessionInfo()
> > R version 3.5.0 (2018-04-23)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: Debian GNU/Linux 9 (stretch)
> > Matrix products: default
> > BLAS: /usr/lib/openblas-base/libblas.so.3
> > LAPACK: /usr/lib/libopenblasp-r0.2.19.so
> >
> > locale:
> > [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
> > LC_MONETARY=C.UTF-8
> > [6] LC_MESSAGES=C LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > loaded via a namespace (and not attached):
> > [1] compiler_3.5.0 tools_3.5.0 yaml_2.1.19
> >
> >  > Sys.getlocale()
> > [1]
> >
> >
> "LC_CTYPE=C.UTF-8;LC_NUMERIC=C;LC_TIME=C.UTF-8;LC_COLLATE=C.UTF-8;LC_MONETARY=C.UTF-8;LC_MESSAGES=C;LC_PAPER=C.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C.UTF-8;LC_IDENTIFICATION=C"
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] Intermittent crashes with inset `[<-` command

2019-02-27 Thread William Dunlap via R-devel
Valgrind (without gctorture) reports memory misuse:

% R --debugger=valgrind --debugger-args="--leak-check=full --num-callers=18"
...
> x <- 1:20
> y <- rep(letters[1:5], length(x) / 5L)
> for (i in 1:1000) {
+   # x[y == 'a'] <- x[y == 'b']
+   x <- `[<-`(x, y == 'a', x[y == 'b'])
+   cat(i, '')
+ }
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
29 30 31 32 33 34 35 36 37 ==4711== Invalid read of size 1
==4711==at 0x501A40F: Rf_xlength (Rinlinedfuns.h:542)
==4711==by 0x501A40F: VectorAssign (subassign.c:658)
==4711==by 0x501CDFE: do_subassign_dflt (subassign.c:1641)
==4711==by 0x5020100: do_subassign (subassign.c:1571)
==4711==by 0x4F66398: bcEval (eval.c:6795)
==4711==by 0x4F7D86D: R_compileAndExecute (eval.c:1407)
==4711==by 0x4F7DA70: do_for (eval.c:2185)
==4711==by 0x4F7741C: Rf_eval (eval.c:691)
==4711==by 0x4FA7181: Rf_ReplIteration (main.c:258)
==4711==by 0x4FA7570: R_ReplConsole (main.c:308)
==4711==by 0x4FA760E: run_Rmainloop (main.c:1082)
==4711==by 0x40075A: main (Rmain.c:29)
==4711==  Address 0x19b3ab90 is 0 bytes inside a block of size 160,048
free'd
==4711==at 0x4C2ACBD: free (vg_replace_malloc.c:530)
==4711==by 0x4FAFCB2: ReleaseLargeFreeVectors (memory.c:1055)
==4711==by 0x4FAFCB2: RunGenCollect (memory.c:1825)
==4711==by 0x4FAFCB2: R_gc_internal (memory.c:2998)
==4711==by 0x4FB166F: Rf_allocVector3 (memory.c:2682)
==4711==by 0x4FB2310: Rf_allocVector (Rinlinedfuns.h:577)
==4711==by 0x4FB2310: R_alloc (memory.c:2197)
==4711==by 0x5023F7A: logicalSubscript (subscript.c:575)
==4711==by 0x5026DA3: Rf_makeSubscript (subscript.c:994)
==4711==by 0x501A2F3: VectorAssign (subassign.c:656)
==4711==by 0x501CDFE: do_subassign_dflt (subassign.c:1641)
==4711==by 0x5020100: do_subassign (subassign.c:1571)
==4711==by 0x4F66398: bcEval (eval.c:6795)
==4711==by 0x4F7D86D: R_compileAndExecute (eval.c:1407)
==4711==by 0x4F7DA70: do_for (eval.c:2185)
==4711==by 0x4F7741C: Rf_eval (eval.c:691)
==4711==by 0x4FA7181: Rf_ReplIteration (main.c:258)
==4711==by 0x4FA7570: R_ReplConsole (main.c:308)
==4711==by 0x4FA760E: run_Rmainloop (main.c:1082)
==4711==by 0x40075A: main (Rmain.c:29)
==4711==  Block was alloc'd at
==4711==at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==4711==by 0x4FB1B04: Rf_allocVector3 (memory.c:2712)
==4711==by 0x5027574: Rf_allocVector (Rinlinedfuns.h:577)
==4711==by 0x5027574: Rf_ExtractSubset (subset.c:115)
==4711==by 0x502ADCD: VectorSubset (subset.c:198)
==4711==by 0x502ADCD: do_subset_dflt (subset.c:823)
==4711==by 0x502BE90: do_subset (subset.c:661)
==4711==by 0x4F7741C: Rf_eval (eval.c:691)
==4711==by 0x4F7BAC3: Rf_evalListKeepMissing (eval.c:2955)
==4711==by 0x50200CB: R_DispatchOrEvalSP (subassign.c:1535)
==4711==by 0x50200CB: do_subassign (subassign.c:1567)
==4711==by 0x4F66398: bcEval (eval.c:6795)
==4711==by 0x4F7D86D: R_compileAndExecute (eval.c:1407)
==4711==by 0x4F7DA70: do_for (eval.c:2185)
==4711==by 0x4F7741C: Rf_eval (eval.c:691)
==4711==by 0x4FA7181: Rf_ReplIteration (main.c:258)
==4711==by 0x4FA7570: R_ReplConsole (main.c:308)
==4711==by 0x4FA760E: run_Rmainloop (main.c:1082)
==4711==by 0x40075A: main (Rmain.c:29)
==4711==
==4711== Invalid read of size 8
==4711==at 0x501A856: XLENGTH_EX (Rinlinedfuns.h:189)
==4711==by 0x501A856: Rf_xlength (Rinlinedfuns.h:554)
==4711==by 0x501A856: VectorAssign (subassign.c:658)
==4711==by 0x501CDFE: do_subassign_dflt (subassign.c:1641)
==4711==by 0x5020100: do_subassign (subassign.c:1571)
==4711==by 0x4F66398: bcEval (eval.c:6795)
==4711==by 0x4F7D86D: R_compileAndExecute (eval.c:1407)
==4711==by 0x4F7DA70: do_for (eval.c:2185)
==4711==by 0x4F7741C: Rf_eval (eval.c:691)
==4711==by 0x4FA7181: Rf_ReplIteration (main.c:258)
==4711==by 0x4FA7570: R_ReplConsole (main.c:308)
==4711==by 0x4FA760E: run_Rmainloop (main.c:1082)
==4711==by 0x40075A: main (Rmain.c:29)
==4711==  Address 0x19b3abb0 is 32 bytes inside a block of size 160,048
free'd
==4711==at 0x4C2ACBD: free (vg_replace_malloc.c:530)
==4711==by 0x4FAFCB2: ReleaseLargeFreeVectors (memory.c:1055)
==4711==by 0x4FAFCB2: RunGenCollect (memory.c:1825)
==4711==by 0x4FAFCB2: R_gc_internal (memory.c:2998)
==4711==by 0x4FB166F: Rf_allocVector3 (memory.c:2682)
==4711==by 0x4FB2310: Rf_allocVector (Rinlinedfuns.h:577)
==4711==by 0x4FB2310: R_alloc (memory.c:2197)
==4711==by 0x5023F7A: logicalSubscript (subscript.c:575)
==4711==by 0x5026DA3: Rf_makeSubscript (subscript.c:994)
==4711==by 0x501A2F3: VectorAssign (subassign.c:656)
==4711==by 0x501CDFE: do_subassign_dflt (subassign.c:1641)
==4711==by 0x5020100: do_subassign (subassign.c:1571)
==4711==by 0x4F66398: bcEval (eval.c:6795)
==4711==by 0x4F7D86D: R_compileAndExecute (eval.c:1407)
==4711==by 

Re: [Rd] code for sum function

2019-02-20 Thread William Dunlap via R-devel
Someone said it used a possibly platform-dependent
higher-than-double-precision type.

By the way, in my example involving rep(1/3, n) I neglected to include the
most precise
way to calculate the sum: n %/% 3 + (n %% 3)/3.
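
A sketch of the comparison suggested above (for 1/3): the exact expression
versus R's sum() over repeated terms. With sum()'s extended-precision
accumulator the difference may well be zero, but it need not be on every
platform.

```r
n <- 1e6
exact  <- n %/% 3 + (n %% 3) / 3   # exact up to one final rounding
summed <- sum(rep(1/3, n))         # accumulates rounding from each term

exact - summed   # zero or a tiny rounding difference, platform-dependent
```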

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Feb 20, 2019 at 2:45 PM Rampal Etienne 
wrote:

> Dear Will,
>
> This is exactly what I find.
> My point is thus that the sum function in R is not a naive sum nor a
> Kahansum (in all cases), but what algorithm is it using then?
>
> Cheers, Rampal
>
>
> On Tue, Feb 19, 2019, 19:08 William Dunlap 
>> The algorithm does make a differece.  You can use Kahan's summation
>> algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to
>> reduce the error compared to the naive summation algorithm.  E.g., in R
>> code:
>>
>> naiveSum <-
>> function(x) {
>>s <- 0.0
>>for(xi in x) s <- s + xi
>>s
>> }
>> kahanSum <- function (x)
>> {
>>s <- 0.0
>>c <- 0.0 # running compensation for lost low-order bits
>>for(xi in x) {
>>   y <- xi - c
>>   t <- s + y # low-order bits of y may be lost here
>>   c <- (t - s) - y
>>   s <- t
>>}
>>s
>> }
>>
>> > rSum <- vapply(c(1:20,10^(2:7)), function(n) sum(rep(1/7,n)), 0)
>> > rNaiveSum <- vapply(c(1:20,10^(2:7)), function(n) naiveSum(rep(1/7,n)),
>> 0)
>> > rKahanSum <- vapply(c(1:20,10^(2:7)), function(n) kahanSum(rep(1/7,n)),
>> 0)
>> >
>> > table(rSum == rNaiveSum)
>>
>> FALSE  TRUE
>>    21     5
>> > table(rSum == rKahanSum)
>>
>> FALSE  TRUE
>>     3    23
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>>
>> On Tue, Feb 19, 2019 at 10:36 AM Paul Gilbert 
>> wrote:
>>
>>> (I didn't see anyone else answer this, so ...)
>>>
>>> You can probably find the R code in src/main/ but I'm not sure. You are
>>> talking about a very simple calculation, so it seems unlikely that the
>>> algorithm is the cause of the difference. I have done much more
>>> complicated things and usually get machine precision comparisons. There
>>> are four possibilities I can think of that could cause (small)
>>> differences.
>>>
>>> 0/ Your code is wrong, but that seems unlikely on such a simple
>>> calculation.
>>>
>>> 1/ You are summing a very large number of numbers, in which case the sum
>>> can become very large compared to numbers being added, then things can
>>> get a bit funny.
>>>
>>> 2/ You are using single precision in fortran rather than double. Double
>>> is needed for all floating point numbers you use!
>>>
>>> 3/ You have not zeroed the double precision numbers in fortran. (Some
>>> compilers do not do this automatically and you have to specify it.) Then
>>> if you accidentally put singles, like a constant 0.0 rather than a
>>> constant 0.0D+0, into a double you will have small junk in the lower
>>> precision part.
>>>
>>> (I am assuming you are talking about a sum of reals, not integer or
>>> complex.)
>>>
>>> HTH,
>>> Paul Gilbert
>>>
>>> On 2/14/19 2:08 PM, Rampal Etienne wrote:
>>> > Hello,
>>> >
>>> > I am trying to write FORTRAN code to do the same as some R code I
>>> have.
>>> > I get (small) differences when using the sum function in R. I know
>>> there
>>> > are numerical routines to improve precision, but I have not been able
>>> to
>>> > figure out what algorithm R is using. Does anyone know this? Or where
>>> > can I find the code for the sum function?
>>> >
>>> > Regards,
>>> >
>>> > Rampal Etienne
>>> >
>>> > __
>>> > R-devel@r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>



Re: [Rd] code for sum function

2019-02-19 Thread William Dunlap via R-devel
The algorithm does make a differece.  You can use Kahan's summation
algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to
reduce the error compared to the naive summation algorithm.  E.g., in R
code:

naiveSum <-
function(x) {
   s <- 0.0
   for(xi in x) s <- s + xi
   s
}
kahanSum <- function (x)
{
   s <- 0.0
   c <- 0.0 # running compensation for lost low-order bits
   for(xi in x) {
  y <- xi - c
  t <- s + y # low-order bits of y may be lost here
  c <- (t - s) - y
  s <- t
   }
   s
}

> rSum <- vapply(c(1:20,10^(2:7)), function(n) sum(rep(1/7,n)), 0)
> rNaiveSum <- vapply(c(1:20,10^(2:7)), function(n) naiveSum(rep(1/7,n)), 0)
> rKahanSum <- vapply(c(1:20,10^(2:7)), function(n) kahanSum(rep(1/7,n)), 0)
>
> table(rSum == rNaiveSum)

FALSE  TRUE
   21     5
> table(rSum == rKahanSum)

FALSE  TRUE
    3    23


Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Tue, Feb 19, 2019 at 10:36 AM Paul Gilbert  wrote:

> (I didn't see anyone else answer this, so ...)
>
> You can probably find the R code in src/main/ but I'm not sure. You are
> talking about a very simple calculation, so it seems unlikely that the
> algorithm is the cause of the difference. I have done much more
> complicated things and usually get machine precision comparisons. There
> are four possibilities I can think of that could cause (small) differences.
>
> 0/ Your code is wrong, but that seems unlikely on such a simple
> calculation.
>
> 1/ You are summing a very large number of numbers, in which case the sum
> can become very large compared to numbers being added, then things can
> get a bit funny.
>
> 2/ You are using single precision in fortran rather than double. Double
> is needed for all floating point numbers you use!
>
> 3/ You have not zeroed the double precision numbers in fortran. (Some
> compilers do not do this automatically and you have to specify it.) Then
> if you accidentally put singles, like a constant 0.0 rather than a
> constant 0.0D+0, into a double you will have small junk in the lower
> precision part.
>
> (I am assuming you are talking about a sum of reals, not integer or
> complex.)
>
> HTH,
> Paul Gilbert
>
> On 2/14/19 2:08 PM, Rampal Etienne wrote:
> > Hello,
> >
> > I am trying to write FORTRAN code to do the same as some R code I have.
> > I get (small) differences when using the sum function in R. I know there
> > are numerical routines to improve precision, but I have not been able to
> > figure out what algorithm R is using. Does anyone know this? Or where
> > can I find the code for the sum function?
> >
> > Regards,
> >
> > Rampal Etienne
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] nlminb with constraints failing on some platforms

2019-02-02 Thread William Dunlap via R-devel
Microsoft R Open 3.4.2
The enhanced R distribution from Microsoft
Microsoft packages Copyright (C) 2017 Microsoft Corporation

Using the Intel MKL for parallel mathematical computing (using 12 cores).

Default CRAN mirror snapshot taken on 2017-10-15.
See: https://mran.microsoft.com/.

> f <- function(x) sum( log(diff(x)^2+.01) + (x[1]-1)^2 )
> opt <- nlminb(rep(0, 10), f, lower=-1, upper=3)
> xhat <- rep(1, 10)
> abs( opt$objective - f(xhat) ) < 1e-4  ## Must be TRUE
[1] FALSE
> opt$objective - f(xhat)
[1] 3.696533
> str(opt)
List of 6
 $ par: num [1:10] 0.797 0.303 0.285 0.271 0.258 ...
 $ objective  : num -37.7
 $ convergence: int 1
 $ iterations : int 150
 $ evaluations: Named int [1:2] 155 1611
  ..- attr(*, "names")= chr [1:2] "function" "gradient"
 $ message: chr "iteration limit reached without convergence (10)"

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Tue, Jan 29, 2019 at 3:59 AM Kasper Kristensen via R-devel <
r-devel@r-project.org> wrote:

> I've noticed unstable behavior of nlminb on some Linux systems. The
> problem can be reproduced by compiling R-3.5.2 using gcc-8.2 and running
> the following snippet:
>
> f <- function(x) sum( log(diff(x)^2+.01) + (x[1]-1)^2 )
> opt <- nlminb(rep(0, 10), f, lower=-1, upper=3)
> xhat <- rep(1, 10)
> abs( opt$objective - f(xhat) ) < 1e-4  ## Must be TRUE
>
> The example works perfectly when removing the bounds. However, when bounds
> are added the snippet returns 'FALSE'.
>
> An older R version (3.4.4), compiled using the same gcc-8.2, did not have
> the problem. Between the two versions R has changed the flags to compile
> Fortran sources:
>
> < SAFE_FFLAGS = -O2 -fomit-frame-pointer -ffloat-store
> ---
> > SAFE_FFLAGS = -O2 -fomit-frame-pointer -msse2 -mfpmath=sse
>
> Reverting to the old SAFE_FFLAGS 'solves' the problem.
>
> > sessionInfo()
> R version 3.5.2 (2018-12-20)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Scientific Linux release 6.4 (Carbon)
>
> Matrix products: default
> BLAS/LAPACK:
> /zdata/groups/nfsopt/intel/2018update3/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.5.2
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] Runnable R packages

2019-02-01 Thread William Dunlap via R-devel
To download a package with all its dependencies and install it, use the
install.packages() functions instead of 'R CMD INSTALL'.  E.g., in bash:

mkdir /tmp/libJunk
env R_LIBS_SITE=libJunk R --quiet -e 'if
(!requireNamespace("purrr",quietly=TRUE)) install.packages("purrr")'

For corporate "production use" you probably want to set up your own
repository containing fixed versions of packages instead of using CRAN.
Then add repos="..." to the install.packages() call.  Of course you can
put this into a package and somehow deal with the bootstrapping issue.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
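
A hedged in-R sketch of the same bootstrap idea: install whatever
dependencies are missing, then call the package's entry point
('runme::main' is the hypothetical package from later in this thread).

```r
## Install only the dependencies that are not already available:
deps <- c("purrr")
needed <- deps[!vapply(deps, requireNamespace, logical(1), quietly = TRUE)]
if (length(needed)) install.packages(needed)

## Then invoke the entry point:
## runme::main()
```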


On Thu, Jan 31, 2019 at 8:04 AM David Lindelof  wrote:

> Would you care to share how your package installs its own dependencies? I
> assume this is done during the call to `main()`? (Last time I checked, R
> CMD INSTALL would not install a package's dependencies...)
>
>
> On Thu, Jan 31, 2019 at 4:38 PM Barry Rowlingson <
> b.rowling...@lancaster.ac.uk> wrote:
>
> >
> >
> > On Thu, Jan 31, 2019 at 3:14 PM David Lindelof 
> wrote:
> >
> >>
> >> In summary, I'm convinced R would benefit from something similar to
> Java's
> >> `Main-Class` header or Python's `__main__()` function. A new R CMD
> command
> >> would take a package, install its dependencies, and run its "main"
> >> function.
> >
> >
> >
> > I just created and built a very boilerplate R package called "runme". I
> > can install its dependencies and run its "main" function with:
> >
> >  $ R CMD INSTALL runme_0.0.0.9000.tar.gz
> >  $ R -e 'runme::main()'
> >
> > No new R CMDs needed. Now my choice of "main" is arbitrary, whereas with
> > python and java and C the entrypoint is more tightly specified (__name__
> ==
> > "__main__" in python, int main(..) in C and so on). But I don't think
> > that's much of a problem.
> >
> > Does that not satisfy your requirements close enough? If you want it in
> > one line then:
> >
> > R CMD INSTALL runme_0.0.0.9000.tar.gz && R -e 'runme::main()'
> >
> > will do the second if the first succeeds (Unix shells).
> >
> > You could write a script for $RHOME/bin/RUN which would be a two-liner
> and
> > that could mandate the use of "main" as an entry point. But good luck
> > getting anything into base R.
> >
> > Barry
> >
> >
> >
> >
> >> If we have this machinery available, we could even consider
> >> reaching out to Spark (and other tech stacks) developers and make it
> >> easier
> >> to develop R applications for those platforms.
> >>
> >>
> >
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] setClass accepts slot-mismatch between slots and prototype arguments

2019-01-10 Thread William Dunlap via R-devel
I was installing the 'diffobj' package into TERR and got an error from the
call
StyleSummary <- setClass("StyleSummary",
  slots=c(container="ANY", body="ANY", map="ANY"),
  prototype=list(
container=function(x) sprintf("\n%s\n", paste0(x, collapse="")),
body=identity,
detail=function(x) sprintf("\n%s\n", paste0("  ", x, collapse="")),
map=function(x) sprintf("\n%s", paste0("  ", x, collapse="\n"))
  ))
because the prototype contained components not in the slots list.  R does
not complain about the mismatch, but new("StyleSummary") does not make
something with a 'detail' slot.  Should this be an error?

I suspect that the package writer intended to include 'detail' in the slots
argument.
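
A hedged sketch of that suspected intent: declaring 'detail' as a slot makes
the slots and prototype lists agree, so the component is no longer silently
dropped.

```r
## As above, but with 'detail' declared in slots= as well:
StyleSummary <- setClass("StyleSummary",
  slots = c(container = "ANY", body = "ANY", map = "ANY", detail = "ANY"),
  prototype = list(
    container = function(x) sprintf("\n%s\n", paste0(x, collapse = "")),
    body = identity,
    detail = function(x) sprintf("\n%s\n", paste0("  ", x, collapse = "")),
    map = function(x) sprintf("\n%s", paste0("  ", x, collapse = "\n"))
  ))
```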

Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [Rd] history of objects() and ls()

2019-01-03 Thread William Dunlap via R-devel
S-PLUS took it from S, sometime in the early 1990's.  The "White Book"
("Statistical Models in S", Chambers and Hastie, eds., 1992) uses objects()
on p. 88.

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Jan 3, 2019 at 4:47 PM Peter Dalgaard  wrote:

> As far as I remember, this comes from S-PLUS, introduced around v.3 (white
> book?) or maybe v.4, and due to a desire to cut some Unix ties as MS-DOS
> was taking over the world. However, it was long ago, in a different world,
> and besides, S-PLUS is dead (mostly).
>
> - Peter
>
> > On 4 Jan 2019, at 00:45 , Ben Bolker  wrote:
> >
> >
> >  I found out today (maybe I had known sometime before??) that objects()
> > is a synonym for ls().  I'm curious about the history, which seems to go
> > at least back to the beginning of R.  It's been thus since SVN revision
> > 2 (Sep 1997) ...
> >
> > svn cat https://svn.r-project.org/R/trunk/src/library/base/R/attach@2 |
> > grep objects
> >
> >  I had a quick look at the Becker & Chambers brown book (1984) and
> > Becker and Wilks blue book (1988) on Google books and could find ls but
> > not objects() ... ?
> >
> >  Anyone happen to know?
> >
> > cheers
> >   Ben Bolker
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] formula(model.frame(..)) is misleading

2018-12-21 Thread William Dunlap via R-devel
I don't have a copy of SV4 (or SV3, where model.frame was introduced), but
S+ 8.3 (based on SV4) puts the class "model.frame" on model.frame()'s
return value but has no methods (in the default packages) for class
"model.frame".  Perhaps that is why R omitted the class.

However, S+ 8.3's (and problably S's) formula.data.frame did look for a
"terms" attribute of a data.frame before making up an additive formula
based on the column names of a data.frame:

Splus-8.3> formula.data.frame
function(object)
{
if(length(tms <- attr(object, "terms")))
return(formula(tms))
n <- names(object)
f <- paste(n[-1.], collapse = "+")
f <- parse(text = paste(n[1.], f, sep = "~"))[[1.]]
oldClass(f) <- "formula"
f
}



Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Dec 21, 2018 at 8:16 AM Fox, John  wrote:

> Dear Martin,
>
> Since no one else has picked up on this, I’ll take a crack at it:
>
> The proposal is to define the S3 class of model-frame objects as
> c(“model.frame”, “data.frame”) (not the formal class of these objects, even
> though this feature was coincidentally introduced in S4). That’s unlikely
> to do harm, since model frames would still “inherit” data.frame methods.
>
> It's possible that some packages rely on current data.frame methods that
> are eventually superseded by specific model.frame methods or do something
> peculiar with the class of model frames, so as far as I can see, one can’t
> know whether problems will arise before trying it.
>
> I hope that helps,
>  John
>
>   -
>   John Fox, Professor Emeritus
>   McMaster University
>   Hamilton, Ontario, Canada
>   Web: http://socserv.mcmaster.ca/jfox
>
> > On Dec 21, 2018, at 2:51 AM, Martin Maechler 
> wrote:
> >
> >>>>>> William Dunlap via R-devel
> >>>>>>on Thu, 20 Dec 2018 15:09:56 -0800 writes:
> >
> >> When formula() is applied to the output of model.frame()
> >> it ignores the formula in the model.frame's 'terms'
> >> attribute:
> >
> >>> d <- data.frame(A=log(1:6), B=LETTERS[rep(1:2,c(2,4))],
> >>> C=1/(1:6),
> >> D=rep(letters[25:26],c(4,2)), Y=1:6)
> >>> m0 <- model.frame(data=d, Y ~ A:B) formula(m0)
> >>  Y ~ A + B
> >>> `attributes<-`(terms(m0), value=NULL)
> >>  Y ~ A:B
> >
> >> This is in part because model.frame()'s output has class
> >> "data.frame" instead of c("model.frame","data.frame"), as
> >> SV4 did, so there are no methods for model.frames.
> >
> >> Is there a reason that model.frame() returns a data.frame
> >> with extra attributes but no special class or is it just
> >> an oversight?
> >
> > My guess is "oversight" || "well let's keep it simple"
> > Do you (all readers) see a situation where it could harm now (with
> > the 20'000 packages on CRAN+Bioc+...) to do as SV4 (S version 4) has
> > been doing?
> >
> > I'd be sympathetic to class()ing it.
> > Martin
> >
> >> Bill Dunlap TIBCO Software wdunlap tibco.com
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>



[Rd] formula(model.frame(..)) is misleading

2018-12-20 Thread William Dunlap via R-devel
When formula() is applied to the output of model.frame() it ignores the
formula in the model.frame's 'terms' attribute:

  > d <- data.frame(A=log(1:6), B=LETTERS[rep(1:2,c(2,4))], C=1/(1:6),
D=rep(letters[25:26],c(4,2)), Y=1:6)
  > m0 <- model.frame(data=d, Y ~ A:B)
  > formula(m0)
  Y ~ A + B
  > `attributes<-`(terms(m0), value=NULL)
  Y ~ A:B

This is in part because model.frame()'s output has class "data.frame"
instead of c("model.frame","data.frame"), as SV4 did, so there are no
methods for model.frames.

Is there a reason that model.frame() returns a data.frame with extra
attributes but no special class or is it just an oversight?
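
A hedged workaround using this message's own trick: the 'terms' attribute
still carries the original formula, which stripping the terms object's
attributes recovers.

```r
d <- data.frame(A = log(1:6), B = LETTERS[rep(1:2, c(2, 4))], Y = 1:6)
m0 <- model.frame(data = d, Y ~ A:B)

formula(m0)                              # Y ~ A + B  (the misleading result)
`attributes<-`(terms(m0), value = NULL)  # Y ~ A:B    (the original formula)
```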

Bill Dunlap
TIBCO Software
wdunlap tibco.com



[Rd] order(decreasing=c(TRUE,FALSE),...)

2018-12-04 Thread William Dunlap via R-devel
The NEWS file for R-devel (as of  2018-11-28 r75702) says

 • order(, decreasing=c(TRUE,FALSE)) could fail in some cases.
Reported from StackOverflow via Karl Nordström.

However, either I don't understand the meaning of decreasing=c(TRUE,FALSE)
or there are still problems.  I thought order(x,y,decreasing=c(TRUE,FALSE)
meant to return indices, i, such that x[i] was non-increasing and that ties
among the x's would be broken by y in non-decreasing order.  E.g., that
interpretation works for numeric vectors:
  > d <- data.frame(X=c(2,1,2,1,2,2,1), N=c(4:7,1:3))
  > d[order(d$X, d$N, decreasing=c(TRUE, FALSE)), ] # expect decreasing X
and, within group of tied Xes, increasing N
    X N
  5 2 1
  6 2 2
  1 2 4
  3 2 6
  7 1 3
  2 1 5
  4 1 7
But it fails for character vectors:  E.g., add some of those that have the
same sort order as 'N':

  > d$Char <- LETTERS[d$N]
  > identical(order(d$N), order(d$Char)) # expect TRUE
  [1] TRUE

I expected the new column to give the same sort order when it replaces
'd$N' in the first call to order, but it does not:  it acts like it would
with decreasing=c(TRUE,TRUE).

  > order(d$X, d$Char, decreasing=c(TRUE, FALSE))
  [1] 3 1 6 5 4 2 7
  > d[order(d$X, d$Char, decreasing=c(TRUE, FALSE)), ]
    X N Char
  3 2 6    F
  1 2 4    D
  6 2 2    B
  5 2 1    A
  4 1 7    G
  2 1 5    E
  7 1 3    C
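
One workaround in the meantime (my sketch, not from the thread): convert the character column to a numeric sort proxy with xtfrm() and negate the column you want decreasing, so a single plain order() call suffices:

```r
d <- data.frame(X = c(2, 1, 2, 1, 2, 2, 1), N = c(4:7, 1:3))
d$Char <- LETTERS[d$N]

# Decreasing X, ties broken by increasing Char: negate the numeric column
# and map the character column to sort ranks with xtfrm().
idx <- order(-d$X, xtfrm(d$Char))
d[idx, ]    # same row order as ordering by X (decreasing) then N (increasing)
```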

Bill Dunlap
TIBCO Software
wdunlap tibco.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] named arguments discouraged in `[.data.frame` and `[<-.data.frame`

2018-11-28 Thread William Dunlap via R-devel
They can get bitten in the last two lines of this example, where the 'x'
argument is not first:
> d <- data.frame(C1=c(r1=11,r2=21,r3=31), C2=c(12,22,32))
> d[1,1:2]
   C1 C2
r1 11 12
> `[`(d,j=1:2,i=1)
   C1 C2
r1 11 12
Warning message:
In `[.data.frame`(d, j = 1:2, i = 1) :
  named arguments other than 'drop' are discouraged
> `[`(j=1:2,d,i=1)
Error in (1:2)[d, i = 1] : incorrect number of dimensions
> do.call("[", list(j=1:2, i=1, x=d))
Error in 1:2[i = 1, x = list(C1 = c(11, 21, 31), C2 = c(12, 22, 32))] :
  incorrect number of dimensions
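
A self-contained illustration of that bite (my addition): the internal generic appears to dispatch on the first supplied argument, so when 'x' is not first, `[.data.frame` is never reached at all.

```r
d <- data.frame(C1 = 11:13, C2 = 21:23)

# Named arguments work (modulo the warning) as long as x comes first,
# because dispatch has already selected `[.data.frame`:
r1 <- d[1, 1:2]
r2 <- suppressWarnings(`[`(d, j = 1:2, i = 1))
identical(r1, r2)            # TRUE

# With x not first, dispatch happens on 1:2, a plain atomic vector:
r3 <- try(`[`(j = 1:2, d, i = 1), silent = TRUE)
inherits(r3, "try-error")    # TRUE: "incorrect number of dimensions"
```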

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Nov 28, 2018 at 11:30 AM Henrik Pärn  wrote:

> tl;dr:
>
> Why are named arguments discouraged in `[.data.frame`, `[<-.data.frame`
> and `[[.data.frame`?
>
> (because this question is of the kind 'why is R designed like this?', I
> thought R-devel would be more appropriate than R-help)
>
> #
>
> Background:
>
> Now and then students present their fancy functions like this:
>
> myfancyfun(d,12,0.3,0.2,500,1000,FALSE,TRUE,FALSE,TRUE,FALSE)
>
> Incomprehensible. Thus, I encourage them to use spaces and name arguments,
> _at least_ when trying to communicate their code with others. Something
> like:
>
> myfancyfun(data = d, n = 12, gamma = 0.3, prob = 0.2,
>   size = 500, niter = 1000, model = FALSE,
>  scale = TRUE, drop = FALSE, plot = TRUE, save = FALSE)
>
>
> Then some overzealous students started to use named arguments everywhere.
> E-v-e-r-y-w-h-e-r-e. Even in the most basic situation when indexing vectors
> (as a subtle protest?), like:
>
> vec <- 1:9
>
> vec[i = 4]
> `[`(x = vec, i = 4)
>
> vec[[i = 4]]
> `[[`(x = vec, i = 4)
>
> vec[i = 4] <- 10
> `[<-`(x = vec, i = 4, value = 10)
>
> ...or when indexing matrices:
>
> m <- matrix(vec, ncol = 3)
> m[i = 2, j = 2]
> `[`(x = m, i = 2, j = 2)
> # 5
>
> m[i = 2, j = 2] <- 0
> `[<-`(x = m, i = 2, j = 2, value = 0)
>
> ##
>
> This practice indeed feels like overkill, but it didn't seem to hurt
> either. Until they used it on data frames. Then suddenly warnings appeared
> that named arguments are discouraged:
>
> d <- data.frame(m)
>
> d[[i = "X2"]]
> # [1] 4 5 6
> # Warning message:
> # In `[[.data.frame`(d, i = "X2") :
> #  named arguments other than 'exact' are discouraged
>
> d[i = 2, j = 2]
> # [1] 0
> # Warning message:
> # In `[.data.frame`(d, i = 2, j = 2) :
> #  named arguments other than 'drop' are discouraged
>
> d[i = 2, j = 2] <- 5
> # Warning message:
> # In `[<-.data.frame`(`*tmp*`, i = 2, j = 2, value = 5) :
> #  named arguments are discouraged
>
>
> ##
>
> Of course I could tell them "don't do it, it's overkill and not common
> practice" or "it's just a warning, don't worry". However, I assume the
> warnings are there for a good reason.
>
> So how do I explain to the students that named arguments are actively
> discouraged in `[.data.frame` and `[<-.data.frame`, but not in `[` and
> `[<-`? When will they get bitten?
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] unlockEnvironment()?

2018-10-10 Thread William Dunlap via R-devel
R lets one lock an environment with both an R function,
base::lockEnvironment, and a C function, R_LockEnvironment, but, as far as
I can tell, no corresponding function to unlock an environment.  Is this
omission on principle or just something that has not been done yet?
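
A short demonstration of the asymmetry (my addition): base R can lock and query the lock, but offers no inverse.

```r
e <- new.env()
lockEnvironment(e)              # base R can lock an environment...
environmentIsLocked(e)          # ...and query the lock: TRUE

# ...but there is no base::unlockEnvironment(), so once locked,
# adding a binding fails until something flips the C-level flag:
res <- try(assign("x", 1, envir = e), silent = TRUE)
inherits(res, "try-error")      # TRUE: cannot add bindings
```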

I ask because several packages, including the well-used R6 and rlang
packages, fiddle with some bits via SET_ENVFLAGS and ENVFLAGS to unlock
an environment.  (See grep output below.)

About 5000 (1/3 of CRAN) packages depend on R6 or rlang.  Should R supply a
more disciplined way of unlocking an environment?

Bill Dunlap
TIBCO Software
wdunlap tibco.com

$ { find . -type f -print0 | xargs -0 grep -n -C 2 ENVFLAGS ; } 2>/dev/null
./R6/tests/manual/encapsulation.R-5-unlockEnvironment <- cfunction(signature(env = "environment"), body = '
./R6/tests/manual/encapsulation.R-6-  #define FRAME_LOCK_MASK (1<<14)
./R6/tests/manual/encapsulation.R:7:  #define FRAME_IS_LOCKED(e) (ENVFLAGS(e) & FRAME_LOCK_MASK)
./R6/tests/manual/encapsulation.R:8:  #define UNLOCK_FRAME(e) SET_ENVFLAGS(e, ENVFLAGS(e) & (~ FRAME_LOCK_MASK))
./R6/tests/manual/encapsulation.R-9-
./R6/tests/manual/encapsulation.R-10-  if (TYPEOF(env) == NILSXP)
./BMA/R/iBMA.glm.R-21-*/
./BMA/R/iBMA.glm.R-22-#define FRAME_LOCK_MASK (1<<14)
./BMA/R/iBMA.glm.R:23:#define FRAME_IS_LOCKED(e) (ENVFLAGS(e) & FRAME_LOCK_MASK)
./BMA/R/iBMA.glm.R:24:#define UNLOCK_FRAME(e) SET_ENVFLAGS(e, ENVFLAGS(e) & (~ FRAME_LOCK_MASK))
./BMA/R/iBMA.glm.R-25-'
./BMA/R/iBMA.glm.R-26-
--
./BMA/R/iBMA.surv.R-22-*/
./BMA/R/iBMA.surv.R-23-#define FRAME_LOCK_MASK (1<<14)
./BMA/R/iBMA.surv.R:24:#define FRAME_IS_LOCKED(e) (ENVFLAGS(e) & FRAME_LOCK_MASK)
./BMA/R/iBMA.surv.R:25:#define UNLOCK_FRAME(e) SET_ENVFLAGS(e, ENVFLAGS(e) & (~ FRAME_LOCK_MASK))
./BMA/R/iBMA.surv.R-26-'
./BMA/R/iBMA.surv.R-27-
./pkgload/src/unlock.c-20-*/
./pkgload/src/unlock.c-21-#define FRAME_LOCK_MASK (1 << 14)
./pkgload/src/unlock.c:22:#define FRAME_IS_LOCKED(e) (ENVFLAGS(e) & FRAME_LOCK_MASK)
./pkgload/src/unlock.c:23:#define UNLOCK_FRAME(e) SET_ENVFLAGS(e, ENVFLAGS(e) & (~FRAME_LOCK_MASK))
./pkgload/src/unlock.c-24-
./pkgload/src/unlock.c-25-extern SEXP R_TrueValue;
./SOD/src/tmp.cpp-11394-SEXP (ENCLOS)(SEXP x);
./SOD/src/tmp.cpp-11395-SEXP (HASHTAB)(SEXP x);
./SOD/src/tmp.cpp:11396:int (ENVFLAGS)(SEXP x);
./SOD/src/tmp.cpp:11397:void (SET_ENVFLAGS)(SEXP x, int v);
./SOD/src/tmp.cpp-11398-void SET_FRAME(SEXP x, SEXP v);
./SOD/src/tmp.cpp-11399-void SET_ENCLOS(SEXP x, SEXP v);
--
./SOD/src/tmp.h-11393-SEXP (ENCLOS)(SEXP x);
./SOD/src/tmp.h-11394-SEXP (HASHTAB)(SEXP x);
./SOD/src/tmp.h:11395:int (ENVFLAGS)(SEXP x);
./SOD/src/tmp.h:11396:void (SET_ENVFLAGS)(SEXP x, int v);
./SOD/src/tmp.h-11397-void SET_FRAME(SEXP x, SEXP v);
./SOD/src/tmp.h-11398-void SET_ENCLOS(SEXP x, SEXP v);

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] memory footprint of readRDS()

2018-09-18 Thread William Dunlap via R-devel
The ratio of object size to rds file size depends on the object.  Some
variation is due to how header information is stored in memory and in the
file but I suspect most is due to how compression works (e.g., a vector of
repeated values can be compressed into a smaller file than a bunch of
random bytes).

f <- function (data, ...)  {
force(data)
tf <- tempfile()
on.exit(unlink(tf))
save(data, file = tf, ...)
c(`obj/file size` = as.numeric(object.size(data)/file.size(tf)))
}

> f(rep(0,1e6))
obj/file size
 1021.456
> f(rep(0,1e6), compress=FALSE)
obj/file size
0.986
> f(rep(89.7,1e6))
obj/file size
 682.6555
> f(log(1:1e6))
obj/file size
 1.309126
> f(vector("list",1e6))
obj/file size
 2021.744
> f(as.list(log(1:1e6)))
obj/file size
 8.907579
> f(sample(as.raw(0:255),size=8e6,replace=TRUE))
obj/file size
0.9998433
> f(rep(as.raw(0:255),length=8e6))
obj/file size
 254.5595
> f(as.character(1:1e6))
obj/file size
  23.5567
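
The same comparison can be made for the rds path the question asks about; a sketch assuming saveRDS() (which also gzip-compresses by default) behaves like save() here:

```r
g <- function(data, ...) {
    tf <- tempfile(fileext = ".rds")
    on.exit(unlink(tf))
    saveRDS(data, file = tf, ...)
    c(`obj/file size` = as.numeric(object.size(data) / file.size(tf)))
}
g(rep(0, 1e6))     # highly compressible: ratio in the hundreds
g(log(1:1e6))      # nearly incompressible doubles: ratio near 1
```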



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Sep 18, 2018 at 8:28 AM, Joris Meys  wrote:

> Dear all,
>
> I tried to read in a 3.8Gb RDS file on a computer with 16Gb available
> memory. To my astonishment, the memory footprint of R rises quickly to over
> 13Gb and the attempt ends with an error that says "cannot allocate vector
> of size 5.8Gb".
>
> I would expect that 3 times the memory would be enough to read in that
> file, but apparently I was wrong. I checked the memory.limit() and that one
> gave me a value of more than 13Gb. So I wondered if this was to be
> expected, or if there could be an underlying reason why this file doesn't
> want to open.
>
> Thank you in advance
> Joris
>
> --
> Joris Meys
> Statistical consultant
>
> Department of Data Analysis and Mathematical Modelling
> Ghent University
> Coupure Links 653, B-9000 Gent (Belgium)
>
> ---
> Biowiskundedagen 2017-2018
> http://www.biowiskundedagen.ugent.be/
>
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ROBUSTNESS: x || y and x && y to give warning/error if length(x) != 1 or length(y) != 1

2018-08-30 Thread William Dunlap via R-devel
Should the following two functions should always give the same result,
except for possible differences in the 'call' component of the warning
or error message?:

  f0 <- function(x, y) x || y
  f1 <- function(x, y) if (x) { TRUE } else { if (y) {TRUE } else { FALSE }
}

And the same for the 'and' version?

  g0 <- function(x, y) x && y
  g1 <- function(x, y) if (x) { if (y) { TRUE } else { FALSE } } else {
FALSE }

The proposal is to make them act the same when length(x) or length(y) is
not 1.
Should they also act the same when x or y is NA?  'if (x)' currently stops
if is.na(x)
and 'x||y' does not.  Or should we continue with 'if' restricted to
bi-valued
logical and '||' and '&&' handling tri-valued logic?
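
A concrete comparison of the two pairs (my addition; note that since this thread, R 4.2.0 made a length > 1 'if' condition an error and R 4.3.0 did the same for '&&' and '||'):

```r
f0 <- function(x, y) x || y
f1 <- function(x, y) if (x) TRUE else if (y) TRUE else FALSE

# They agree for scalar logical arguments:
identical(f0(TRUE, NA), f1(TRUE, NA))    # TRUE

# But they diverge when NA lands in the tested position:
f0(NA, TRUE)                             # TRUE (tri-valued: NA || TRUE)
r <- try(f1(NA, TRUE), silent = TRUE)    # error: missing value where
inherits(r, "try-error")                 #   TRUE/FALSE needed
```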



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Aug 30, 2018 at 7:16 AM, Hadley Wickham  wrote:

> I think this is an excellent idea as it eliminates a situation which
> is almost certainly user error. Making it an error would break a small
> amount of existing code (even if for the better), so perhaps it should
> start as a warning, but be optionally upgraded to an error. It would
> be nice to have a fixed date (R version) in the future when the
> default will change to error.
>
> In an ideal world, I think the following four cases should all return
> the same error:
>
> if (logical()) 1
> #> Error in if (logical()) 1: argument is of length zero
> if (c(TRUE, TRUE)) 1
> #> Warning in if (c(TRUE, TRUE)) 1: the condition has length > 1 and only
> the
> #> first element will be used
> #> [1] 1
>
> logical() || TRUE
> #> [1] TRUE
> c(TRUE, TRUE) || TRUE
> #> [1] TRUE
>
> i.e. I think that `if`, `&&`, and `||` should all check that their
> input is a logical (or numeric) vector of length 1.
>
> Hadley
>
> On Tue, Aug 28, 2018 at 10:03 PM Henrik Bengtsson
>  wrote:
> >
> > # Issue
> >
> > 'x || y' performs 'x[1] || y' for length(x) > 1.  For instance (here
> > using R 3.5.1),
> >
> > > c(TRUE, TRUE) || FALSE
> > [1] TRUE
> > > c(TRUE, FALSE) || FALSE
> > [1] TRUE
> > > c(TRUE, NA) || FALSE
> > [1] TRUE
> > > c(FALSE, TRUE) || FALSE
> > [1] FALSE
> >
> > This property is symmetric in LHS and RHS (i.e. 'y || x' behaves the
> > same) and it also applies to 'x && y'.
> >
> > Note also how the above truncation of 'x' is completely silent -
> > there's neither an error nor a warning being produced.
> >
> >
> > # Discussion/Suggestion
> >
> > Using 'x || y' and 'x && y' with a non-scalar 'x' or 'y' is likely a
> > mistake.  Either the code is written assuming 'x' and 'y' are scalars,
> > or there is a coding error and vectorized versions 'x | y' and 'x & y'
> > were intended.  Should 'x || y' always be considered a mistake if
> > 'length(x) != 1' or 'length(y) != 1'?  If so, should it be a warning
> > or an error?  For instance,
> > '''r
> > > x <- c(TRUE, TRUE)
> > > y <- FALSE
> > > x || y
> >
> > Error in x || y : applying scalar operator || to non-scalar elements
> > Execution halted
> >
> > What about the case where 'length(x) == 0' or 'length(y) == 0'?  Today
> > 'x || y' returns 'NA' in such cases, e.g.
> >
> > > logical(0) || c(FALSE, NA)
> > [1] NA
> > > logical(0) || logical(0)
> > [1] NA
> > > logical(0) && logical(0)
> > [1] NA
> >
> > I don't know the background for this behavior, but I'm sure there is
> > an argument behind that one.  Maybe it's simply that '||' and '&&'
> > should always return a scalar logical and neither TRUE nor FALSE can
> > be returned.
> >
> > /Henrik
> >
> > PS. This is in the same vein as
> > https://mailman.stat.ethz.ch/pipermail/r-devel/2017-March/073817.html
> > - in R (>=3.4.0) we now get that if (1:2 == 1) ... is an error if
> > _R_CHECK_LENGTH_1_CONDITION_=true
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
> --
> http://hadley.nz
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Where does L come from?

2018-08-27 Thread William Dunlap via R-devel
Rich Calaway pointed out that S4 came out c. 1996-97, not 1991.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sun, Aug 26, 2018 at 8:30 PM, William Dunlap  wrote:

> >  the lack of a decimal place had historically not been significant
>
> Version 4 of S (c. 1991) and versions of S+ based on it treated a sequence
> of digits without a decimal  point as integer.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Sat, Aug 25, 2018 at 4:33 PM, Duncan Murdoch 
> wrote:
>
>> On 25/08/2018 4:49 PM, Hervé Pagès wrote:
>>
>>> The choice of the L suffix in R to mean "R integer type", which
>>> is mapped to the "int" type at the C level, and NOT to the "long int"
>>> type, is really unfortunate as it seems to be misleading and confusing
>>> a lot of people.
>>>
>>
>> Can you provide any evidence of that (e.g. a link to a message from one
>> of these people)?  I think a lot of people don't really know about the L
>> suffix, but that's different from being confused or misled by it.
>>
>> And if you make a criticism like that, it would really be fair to suggest
>> what R should have done instead.  I can't think of anything better, given
>> that "i" was already taken, and that the lack of a decimal place had
>> historically not been significant.  Using "I" *would* have been confusing
>> (3i versus 3I being very different).  Deciding that 3 suddenly became an
>> integer value different from 3. would have led to lots of inefficient
>> conversions (since stats mainly deals with floating point values).
>>
>> Duncan Murdoch
>>
>>
>>
>>> The fact that nowadays "int" and "long int" have the same size on most
>>> platforms is only anecdotal here.
>>>
>>> Just my 2 cents.
>>>
>>> H.
>>>
>>> On 08/25/2018 10:01 AM, Dirk Eddelbuettel wrote:
>>>

 On 25 August 2018 at 09:28, Carl Boettiger wrote:
 | I always thought it meant "Long" (I'm assuming R's integers are long
| integers in C sense (iirc one can declare 'long x', and it being
 common to
 | refer to integers as "longs"  in the same way we use "doubles" to mean
 | double precision floating point).  But pure speculation on my part,
 so I'm
 | curious!

 It does per my copy (dated 1990 !!) of the 2nd ed of Kernighan &
 Ritchie.  It
 explicitly mentions (sec 2.2) that 'int' may be 16 or 32 bits, and
 'long' is
 32 bit; and (in sec 2.3) introduces the I, U, and L labels for
 constants.  So
 "back then when" 32 bit was indeed long.  And as R uses 32 bit integers
 ...

 (It is all murky because the size is an implementation detail and later
 "essentially everybody" moved to 32 bit integers and 64 bit longs as
 the 64
 bit architectures became prevalent.  Which is why when it matters one
 should
 really use more explicit types like int32_t or int64_t.)

 Dirk


>>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Where does L come from?

2018-08-26 Thread William Dunlap via R-devel
>  the lack of a decimal place had historically not been significant

Version 4 of S (c. 1991) and versions of S+ based on it treated a sequence
of digits without a decimal  point as integer.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
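
For reference (my addition), what the L suffix does and does not mean in R:

```r
typeof(1L)               # "integer": C 'int', 32-bit, not C 'long'
typeof(1)                # "double": no suffix means floating point
.Machine$integer.max     # 2147483647 == 2^31 - 1 on every platform
typeof(3i)               # "complex": why 'i' was unavailable as a suffix
```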

On Sat, Aug 25, 2018 at 4:33 PM, Duncan Murdoch 
wrote:

> On 25/08/2018 4:49 PM, Hervé Pagès wrote:
>
>> The choice of the L suffix in R to mean "R integer type", which
>> is mapped to the "int" type at the C level, and NOT to the "long int"
>> type, is really unfortunate as it seems to be misleading and confusing
>> a lot of people.
>>
>
> Can you provide any evidence of that (e.g. a link to a message from one of
> these people)?  I think a lot of people don't really know about the L
> suffix, but that's different from being confused or misled by it.
>
> And if you make a criticism like that, it would really be fair to suggest
> what R should have done instead.  I can't think of anything better, given
> that "i" was already taken, and that the lack of a decimal place had
> historically not been significant.  Using "I" *would* have been confusing
> (3i versus 3I being very different).  Deciding that 3 suddenly became an
> integer value different from 3. would have led to lots of inefficient
> conversions (since stats mainly deals with floating point values).
>
> Duncan Murdoch
>
>
>
>> The fact that nowadays "int" and "long int" have the same size on most
>> platforms is only anecdotal here.
>>
>> Just my 2 cents.
>>
>> H.
>>
>> On 08/25/2018 10:01 AM, Dirk Eddelbuettel wrote:
>>
>>>
>>> On 25 August 2018 at 09:28, Carl Boettiger wrote:
>>> | I always thought it meant "Long" (I'm assuming R's integers are long
>>> | integers in C sense (iirc one can declare 'long x', and it being
>>> common to
>>> | refer to integers as "longs"  in the same way we use "doubles" to mean
>>> | double precision floating point).  But pure speculation on my part, so
>>> I'm
>>> | curious!
>>>
>>> It does per my copy (dated 1990 !!) of the 2nd ed of Kernighan &
>>> Ritchie.  It
>>> explicitly mentions (sec 2.2) that 'int' may be 16 or 32 bits, and
>>> 'long' is
>>> 32 bit; and (in sec 2.3) introduces the I, U, and L labels for
>>> constants.  So
>>> "back then when" 32 bit was indeed long.  And as R uses 32 bit integers
>>> ...
>>>
>>> (It is all murky because the size is an implementation detail and later
>>> "essentially everybody" moved to 32 bit integers and 64 bit longs as the
>>> 64
>>> bit architectures became prevalent.  Which is why when it matters one
>>> should
>>> really use more explicit types like int32_t or int64_t.)
>>>
>>> Dirk
>>>
>>>
>>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] longint

2018-08-15 Thread William Dunlap via R-devel
Note that include/S.h contains
  /*
 This is a legacy header and no longer documented.
 Code using it should be converted to use R.h
  */
  ...
  /* is this a good idea? - conflicts with many versions of f2c.h */
  # define longint int

S.h was meant to be used while converting C code written for S or S+ to R.
S/S+ "integers" are represented as C "long ints", whose size depends on
the architecture, while R "integers" are represented as 32-bit C "ints".
"longint" was invented to hide this difference.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Aug 15, 2018 at 5:32 PM, Benjamin Tyner  wrote:

> Thanks for the replies and for confirming my suspicion.
>
> Interestingly, src/include/S.h uses a trick:
>
>#define longint int
>
> and so does the nlme package (within src/init.c).
>
> On 08/15/2018 02:47 PM, Hervé Pagès wrote:
>
>> No segfault but a BIG warning from the compiler. That's because
>> dereferencing the pointer inside your myfunc() function will
>> produce an int that is not predictable i.e. it is system-dependent.
>> Its value will depend on sizeof(long int) (which is not
>> guaranteed to be 8) and on the endianness of the system.
>>
>> Also if the pointer you pass in the call to the function is
>> an array of long ints, then pointer arithmetic inside your myfunc()
>> won't necessarily take you to the array element that you'd expect.
>>
>> Note that there are very specific situations where you can actually
>> do this kind of things e.g. in the context of writing a callback
>> function to pass to qsort(). See 'man 3 qsort' if you are on a Unix
>> system. In that case pointers to void and explicit casts should
>> be used. If done properly, this is portable code and the compiler won't
>> issue warnings.
>>
>> H.
>>
>>
>> On 08/15/2018 07:05 AM, Brian Ripley wrote:
>>
>>>
>>>
>>> On 15 Aug 2018, at 12:48, Duncan Murdoch 
 wrote:

 On 15/08/2018 7:08 AM, Benjamin Tyner wrote:
> Hi
> In my R package, imagine I have a C function defined:
>  void myfunc(int *x) {
> // some code
>  }
> but when I call it, I pass it a pointer to a longint instead of a
> pointer to an int. Could this practice potentially result in a
> segfault?
>

 I don't think the passing would cause a segfault, but "some code" might
 be expecting a positive number, and due to the type error you could pass in
 a positive longint and have it interpreted as a negative int.

>>>
>>> Are you thinking only of a little-endian system?  A 32-bit lookup of a
>>> pointer to a 64-bit area could read the wrong half and get a completely
>>> different value.
>>>
>>>
 Duncan Murdoch

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>>
>>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] validspamobject?

2018-08-15 Thread William Dunlap via R-devel
That was my first thought (my second was trace(.Deprecated,...)).  However,
the spam authors don't use .Deprecated() or warning() to tell about
deprecated functions.  See spam/R/deprecated.R:

validspamobject <- function( ...) {
#.Deprecated('validate_spam()')
message("`validspamobject()` is deprecated. Use `validate_spam()`
directly")
validate_spam( ...)
}

spam.getOption <- function(...) {
#.Deprecated(msg="`spam.getOption( arg)` is deprecated.\n Use
`getOption( spam.arg)` directly")
message("`spam.getOption( arg)` is deprecated. Use `getOption(
spam.arg)` directly")
getOption(...)

}
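
Since the text comes from message() rather than warning(), options(warn=2) will not trap it, but a calling handler will. A sketch using a stand-in for the deprecated function (hypothetical names, mirroring spam's pattern):

```r
old_fun <- function() {          # stand-in for spam's validspamobject()
    message("`validspamobject()` is deprecated. Use `validate_spam()` directly")
    TRUE
}
caller <- function() old_fun()   # stand-in for the package calling it

found <- FALSE
withCallingHandlers(
    caller(),
    message = function(m) {
        if (grepl("validspamobject", conditionMessage(m))) {
            # sys.calls() here shows the full stack, revealing the caller
            found <<- TRUE
        }
        invokeRestart("muffleMessage")
    }
)
found    # TRUE: the handler saw (and silenced) the deprecation message
```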



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Aug 15, 2018 at 1:26 AM, Emil Bode  wrote:

> Hello,
>
> If you want to determine where the warning is generated, I think it's
> easiest to run R with options(warn=2).
> In that case all warnings are converted to errors, and you have more
> debugging tools, e.g. you can run traceback() to see the calling stack, or
> use options(error=recover).
> Hope you can catch it.
>
>
> Best regards,
> Emil Bode
>
>
> On 15/08/2018, 02:57, "R-devel on behalf of Ronald Barry" <
> r-devel-boun...@r-project.org on behalf of rpba...@alaska.edu> wrote:
>
> Greetings,
>   My R package has been showing warnings of the form:
>
> `validspamobject()` is deprecated. Use `validate_spam()` directly
>
> None of my code uses the function validspamobject, so it must be a
> problem
> in another package I'm calling, possibly spam or spdep.  Has this
> problem
> occurred with other people?  It doesn't have any deleterious effect,
> but
> it's annoying.  In particular, how do I determine which package is
> causing
> this warning?  Thanks.
>
> Ron B.
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] trace in uniroot() ?

2018-08-13 Thread William Dunlap via R-devel
To record the value of the function as well as the arguments, you can use
the following

instrumentObjectiveFunction <- function(FUN) {
newFUN <- local({
INFO <- list()
function(...) {
value <- FUN(...)
INFO[[length(INFO)+1]] <<- list(args=list(...), value=value)
value
}
})
newFUN
}

E.g.,
> untrace(ff)
> ff0 <- uniroot(instrumentedFF <- instrumentObjectiveFunction(ff), c(0,
10))
> str(environment(instrumentedFF)$INFO)
List of 13
 $ :List of 2
  ..$ args :List of 1
  .. ..$ : num 0
  ..$ value: num -1
 $ :List of 2
  ..$ args :List of 1
  .. ..$ : num 10
  ..$ value: num 146
 $ :List of 2
  ..$ args :List of 1
  .. ..$ : num 0.0678
  ..$ value: num -0.965
 $ :List of 2
  ..$ args :List of 1
  .. ..$ : num 5.03
  ..$ value: num 10.4
 $ :List of 2
  ..$ args :List of 1
  .. ..$ : num 0.49
  ..$ value: num -0.722
...


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Aug 13, 2018 at 3:44 PM, J C Nash  wrote:

> Despite my years with R, I didn't know about trace(). Thanks.
>
> However, my decades in the minimization and root finding game make me like
> having
> a trace that gives some info on the operation, the argument and the
> current function value.
> I've usually found glitches are a result of things like >= rather than >
> in tests etc., and
> knowing what was done is the quickest way to get there.
>
> This is, of course, the numerical software developer view. I know "users"
> (a far too vague
> term) don't like such output. I've sometimes been tempted with my svd or
> optimization codes to
> have a return message in bold-caps "YOUR ANSWER IS WRONG AND THERE'S A
> LAWYER WAITING TO
> MAKE YOU PAY", but I usually just satisfy myself with "Not at a
> minimum/root".
>
> Best, JN
>
> On 2018-08-13 06:00 PM, William Dunlap wrote:
> > I tend to avoid the trace/verbose arguments for the various root
> finders and optimizers and instead use the trace
> > function or otherwise modify the function handed to the operator.  You
> can print or plot the arguments or save them.  E.g.,
> >
> >> trace(ff, print=FALSE, quote(cat("x=", deparse(x), "\n", sep="")))
> > [1] "ff"
> >> ff0 <- uniroot(ff, c(0, 10))
> > x=0
> > x=10
> > x=0.0678365490630423
> > x=5.03391827453152
> > x=0.490045026724842
> > x=2.76198165062818
> > x=1.09760394309444
> > x=1.92979279686131
> > x=1.34802524899502
> > x=1.38677998493585
> > x=1.3862897003949
> > x=1.38635073555115
> > x=1.3862897003949
> >
> > or
> >
> >> X <- numeric()
> >> trace(ff, print=FALSE, quote(X[[length(X)+1]] <<- x))
> > [1] "ff"
> >> ff0 <- uniroot(ff, c(0, 10))
> >> X
> >  [1]  0. 10.  0.06783655
> >  [4]  5.03391827  0.49004503  2.76198165
> >  [7]  1.09760394  1.92979280  1.34802525
> > [10]  1.38677998  1.38628970  1.38635074
> > [13]  1.38628970
> >
> > This will not tell you why the objective function is being called (e.g.
> in a line search
> > or in derivative estimation), but some plotting or other postprocessing
> > can usually figure that out.
> >
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com 
> >
> > On Mon, Jul 30, 2018 at 11:35 AM, J C Nash  > wrote:
> >
> > In looking at rootfinding for the histoRicalg project (see
> gitlab.com/nashjc/histoRicalg
> > ),
> > I thought I would check how uniroot() solves some problems. The
> following short example
> >
> > ff <- function(x){ exp(0.5*x) - 2 }
> > ff(2)
> > ff(1)
> > uniroot(ff, 0, 10)
> > uniroot(ff, c(0, 10), trace=1)
> > uniroot(ff, c(0, 10), trace=TRUE)
> >
> >
> > shows that the trace parameter, as described in the Rd file, does
> not seem to
> > be functional except in limited situations (and it suggests an
> > integer, then uses a logical for the example, e.g.,
> >  ## numerically, f(-|M|) becomes zero :
> >  u3 <- uniroot(exp, c(0,2), extendInt="yes", trace=TRUE)
> > )
> >
> > When extendInt is set, then there is some information output, but
> trace alone
> > produces nothing.
> >
> > I looked at the source code -- it is in 
> > R-3.5.1/src/library/stats/R/nlm.R
> and
> > calls zeroin2 code from R-3.5.1/src/library/stats/src/optimize.c as
> far as I
> can determine. My code inspection suggests trace does not show the
> iterations
> > of the rootfinding, and only has effect when the search interval is
> allowed
> > to be extended. It does not appear that there is any mechanism to ask
> > the zeroin2 C code to display intermediate work.
> >
> > This isn't desperately important for me as I wrote an R version of
> the code in
> > package rootoned on R-forge (which Martin Maechler adapted as
> unirootR.R in
> > Rmpfr so multi-precision roots can be found). My zeroin.R has
> 'trace' to get
> > the pattern of different steps. In fact it is a bit excessive. Note
> unirootR.R uses 'verbose' rather than 'trace'.

Re: [Rd] trace in uniroot() ?

2018-08-13 Thread William Dunlap via R-devel
I tend to avoid the trace/verbose arguments for the various root
finders and optimizers and instead use the trace function or otherwise
modify the function handed to the operator.  You can print or plot the
arguments or save them.  E.g.,

> trace(ff, print=FALSE, quote(cat("x=", deparse(x), "\n", sep="")))
[1] "ff"
> ff0 <- uniroot(ff, c(0, 10))
x=0
x=10
x=0.0678365490630423
x=5.03391827453152
x=0.490045026724842
x=2.76198165062818
x=1.09760394309444
x=1.92979279686131
x=1.34802524899502
x=1.38677998493585
x=1.3862897003949
x=1.38635073555115
x=1.3862897003949

or

> X <- numeric()
> trace(ff, print=FALSE, quote(X[[length(X)+1]] <<- x))
[1] "ff"
> ff0 <- uniroot(ff, c(0, 10))
> X
 [1]  0. 10.  0.06783655
 [4]  5.03391827  0.49004503  2.76198165
 [7]  1.09760394  1.92979280  1.34802525
[10]  1.38677998  1.38628970  1.38635074
[13]  1.38628970

This will not tell you why the objective function is being called (e.g. in
a line search
or in derivative estimation), but some plotting or other postprocessing can
usually figure that out.
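
In the same spirit, a small counting wrapper (my addition, not code from the thread) for when only the number of evaluations matters:

```r
countCalls <- function(FUN) {
    local({
        n <- 0L
        structure(function(...) {
            n <<- n + 1L      # count each evaluation, then delegate
            FUN(...)
        }, count = function() n)
    })
}

ff <- function(x) exp(0.5 * x) - 2
cf <- countCalls(ff)
r  <- uniroot(cf, c(0, 10))
attr(cf, "count")()    # how many times uniroot() evaluated ff
```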


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Jul 30, 2018 at 11:35 AM, J C Nash  wrote:

> In looking at rootfinding for the histoRicalg project (see
> gitlab.com/nashjc/histoRicalg),
> I thought I would check how uniroot() solves some problems. The following
> short example
>
> ff <- function(x){ exp(0.5*x) - 2 }
> ff(2)
> ff(1)
> uniroot(ff, 0, 10)
> uniroot(ff, c(0, 10), trace=1)
> uniroot(ff, c(0, 10), trace=TRUE)
>
>
> shows that the trace parameter, as described in the Rd file, does not seem
> to
> be functional except in limited situations (and it suggests an
> integer, then uses a logical for the example, e.g.,
>  ## numerically, f(-|M|) becomes zero :
>  u3 <- uniroot(exp, c(0,2), extendInt="yes", trace=TRUE)
> )
>
> When extendInt is set, then there is some information output, but trace
> alone
> produces nothing.
>
> I looked at the source code -- it is in R-3.5.1/src/library/stats/R/nlm.R
> and
> calls zeroin2 code from R-3.5.1/src/library/stats/src/optimize.c as far
> as I
> can determine. My code inspection suggests trace does not show the
> iterations
> of the rootfinding, and only has effect when the search interval is allowed
> to be extended. It does not appear that there is any mechanism to ask
> the zeroin2 C code to display intermediate work.
>
> This isn't desperately important for me as I wrote an R version of the
> code in
> package rootoned on R-forge (which Martin Maechler adapted as unirootR.R in
> Rmpfr so multi-precision roots can be found). My zeroin.R has 'trace' to
> get
> the pattern of different steps. In fact it is a bit excessive. Note
> unirootR.R uses 'verbose' rather than 'trace'. However, it would be nice
> to be
> able to see what is going on with uniroot() to verify equivalent operation
> at
> the same precision level. It is very easy for codes to be very slightly
> different and give quite widely different output.
>
> Indeed, even without the trace, we see (zeroin from rootoned here)
>
> > zeroin(ff, c(0, 10), trace=FALSE)
> $root
> [1] 1.386294
>
> $froot
> [1] -5.658169e-10
>
> $rtol
> [1] 7.450581e-09
>
> $maxit
> [1] 9
>
> > uniroot(ff, c(0, 10), trace=FALSE)
> $root
> [1] 1.38629
>
> $f.root
> [1] -4.66072e-06
>
> $iter
> [1] 10
>
> $init.it
> [1] NA
>
> $estim.prec
> [1] 6.103516e-05
>
> >
>
> Is the lack of trace a bug, or at least an oversight? Being able to follow
> iterations is a
> classic approach to checking that computations are proceeding as they
> should.
>
> Best, JN
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] apply with zero-row matrix

2018-08-13 Thread William Dunlap via R-devel
vapply has a mandatory FUN.VALUE argument which specifies the type and size
of FUN's return value.  This helps when you want to cover the 0-length case
without 'if' statements.  You can change your apply calls to vapply calls,
but they will be a bit more complicated.  E.g.,  change
   apply(X=myMatrix, MARGIN=2, FUN=quantile)
to
   vapply(seq_len(ncol(myMatrix)), FUN=function(i)quantile(myMatrix[,i]),
FUN.VALUE=numeric(5))

The latter will always return a 5-row by ncol(myMatrix) matrix.
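For instance (a made-up zero-column matrix), the shape guarantee holds even when there is nothing to iterate over:

```r
## Hypothetical example: with zero columns, vapply still yields a
## 5-row, 0-column matrix, because FUN.VALUE fixes the result shape.
myMatrix <- matrix(numeric(0), nrow = 3, ncol = 0)
res <- vapply(seq_len(ncol(myMatrix)),
              FUN = function(i) quantile(myMatrix[, i]),
              FUN.VALUE = numeric(5))
dim(res)  # 5 0
```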

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Jul 30, 2018 at 5:38 AM, Martin Maechler  wrote:

> > David Hugh-Jones
> > on Mon, 30 Jul 2018 10:12:24 +0100 writes:
>
> > Hi Martin, Fair enough for R functions in general. But the
> > behaviour of apply violates the expectation that apply(m,
> > 1, fun) calls fun n times when m has n rows.  That seems
> > pretty basic.
>
> Well, that expectation is obviously wrong ;-)  see below
>
> > Also, I understand from your argument why it makes sense
> > to call apply and return a special result (presumably
> > NULL) for an empty argument; but why should apply call fun?
>
> > Cheers David
>
> The reason is seen e.g. in
>
> > apply(matrix(,0,3), 2, quantile)
>  [,1] [,2] [,3]
> 0% NA   NA   NA
> 25%NA   NA   NA
> 50%NA   NA   NA
> 75%NA   NA   NA
> 100%   NA   NA   NA
> >
>
> and that is documented (+/-) in the first paragraph of the
> 'Value:' section of help(apply) :
>
>  > Value:
>  >
>  >  If each call to ‘FUN’ returns a vector of length ‘n’, then ‘apply’
>  >  returns an array of dimension ‘c(n, dim(X)[MARGIN])’ if ‘n > 1’.
>  >  If ‘n’ equals ‘1’, ‘apply’ returns a vector if ‘MARGIN’ has length
>  >  1 and an array of dimension ‘dim(X)[MARGIN]’ otherwise.  If ‘n’ is
>  >  ‘0’, the result has length 0 but not necessarily the ‘correct’
>  >  dimension.
>
>
> To determine 'n', the function *is* called once even when
> length(X) ==  0
>
> It may indeed be helpful to add this explicitly to the
> help page  ( /src/library/base/man/apply.Rd ).
> Can you propose a wording (in *.Rd if possible) ?
>
> With regards,
> Martin
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
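Martin's point in the thread above, that FUN really is called on each zero-length slice, can be checked with a counting wrapper (the stand-in FUN here just returns five NAs):

```r
## Count how often FUN runs for a 0-row, 3-column matrix: once per
## column, each time on a zero-length vector.
calls <- 0L
f <- function(x) {
  calls <<- calls + 1L
  rep(NA_real_, 5)
}
res <- apply(matrix(numeric(0), nrow = 0, ncol = 3), 2, f)
calls     # 3
dim(res)  # 5 3
```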


Re: [Rd] odd behavior of names

2018-07-29 Thread William Dunlap via R-devel
Bugzilla issue 16101 describes another first-list-name-printed-differently
oddity
with the Windows GUI version of R:

> a <- "One is \u043E\u0434\u0438\u043D\nTwo is \u0434\u0432\u0430\n"
> Encoding(a) # expect "UTF-8"
[1] "UTF-8"
> sapply(strsplit(a, "\n")[[1]], charToRaw)[c(1,1,2)]
$`One is один`
 [1] 4f 6e 65 20 69 73 20 d0 be d0 b4 d0
[13] b8 d0 bd

$`One is `
 [1] 4f 6e 65 20 69 73 20 d0 be d0 b4 d0
[13] b8 d0 bd

$`Two is `
 [1] 54 77 6f 20 69 73 20 d0 b4 d0 b2 d0
[13] b0

> names(.Last.value)
[1] "One is один" "One is один"
[3] "Two is два"





Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sun, Jul 29, 2018 at 8:45 AM, David Winsemius 
wrote:

>
> > On Jul 29, 2018, at 6:31 AM, Gabor Grothendieck 
> wrote:
> >
> > The first component name has backticks around it and the second does
> > not. Though not wrong, it seems inconsistent.
> >
> > list(a = 1, b = 2)
> > ## $`a`
> > ## [1] 1
> > ##
> > ## $b
> > ## [1] 2
> >
> > R.version.string
> > ## [1] "R version 3.5.1 Patched (2018-07-02 r74950)"
>
> Agree it would be unexpected. Unable to reproduce on Mac:
>
> list(a = 1, b = 2)
> #--
> $a
> [1] 1
>
> $b
> [1] 2
>
>  R.version.string
> #[1] "R version 3.5.1 (2018-07-02)"
> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>
>
> >
> >
> >
> > --
> > Statistics & Software Consulting
> > GKX Group, GKX Associates Inc.
> > tel: 1-877-GKX-GROUP
> > email: ggrothendieck at gmail.com
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'
>  -Gehm's Corollary to Clarke's Third Law
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] NEWS vs. inst/NEWS

2018-07-06 Thread William Dunlap via R-devel
'Writing R Extensions', section 1.1.5, in the part about a package's 'inst'
directory, says that if NEWS is in both the top level and in the inst
directory, the one in inst will be installed:

Note that with the exceptions of INDEX, LICENSE/LICENCE and NEWS,
information files at the top level of the package will *not* be installed
and so not be known to users of Windows and macOS compiled packages (and
not seen by those who use R CMD INSTALL or install.packages on the
tarball). So any information files you wish an end user to see should be
included in inst. Note that if the named exceptions also occur in inst, the
version in inst will be that seen in the installed package.

However, if I have a package with a NEWS file in both the top-level and in
the inst directory, the top-level one appears in the installed file, not
the one in inst.

Try the following script that makes and installs a package in tempdir():
pkgName <- "junk"
dir.create(tdir <- tempfile())
dir.create(srcPkg <- file.path(tdir, pkgName))
cat(file=file.path(srcPkg, "DESCRIPTION"), sep="\n",
   paste("Package:", pkgName),
   paste("Title: NEWS vs. inst/NEWS"),
   paste("Description: Which is installed, NEWS or inst/NEWS?"),
   paste("Version: 0.1"))
file.create(file.path(srcPkg, "NAMESPACE"))
dir.create(inst <- file.path(srcPkg, "inst"))

instFiles <- c("NEWS", "CITATION") # these both at top-level and in inst
directory
for(instFile in instFiles) {
   cat(file=file.path(srcPkg, instFile), sep="\n",
   paste0("The original top-level ", instFile, " file"))
   cat(file=file.path(inst, instFile), sep="\n",
   paste0("inst/", instFile, " - the ", instFile, " file from the
source package's inst directory"))
}

dir.create(lib <- file.path(tdir, "lib"))
install.packages(lib=lib, srcPkg, repos=NULL, type="source")
sapply(instFiles, function(instFile) readLines(system.file(package=pkgName,
lib.loc=lib, mustWork=TRUE, instFile)))
# unlink(recursive=TRUE, tdir) # to clean up

The final sapply() gives me
> sapply(instFiles, function(instFile)
readLines(system.file(package=pkgName, lib.loc=lib, mustWork=TRUE,
instFile)))
NEWS
  "The original top-level NEWS file"
CITATION
"inst/CITATION - the CITATION file from the source package's inst directory"

Several CRAN packages have both NEWS and inst/NEWS (gdata, genetics,
gplots, mcgibbsit, modeltools, nimble, RRF, session, SII).  In most, the
two files are identical, but in RRF and nimble they differ.

Is the manual wrong or is the code wrong?


Bill Dunlap
TIBCO Software
wdunlap tibco.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] minor problem in XML package

2018-06-06 Thread William Dunlap via R-devel
[The package XML is labelled ORPHANED and a comment says the CRAN team
maintains it.  I am not sure what address to send this to.]

In package XML version 3.98.1.11, RUtils.c registers the C function
RS_XML_xmlNodeChildrenReferences twice.  The registration information is
identical but this could cause maintenance problems if the routine were
changed.

% grep -n RS_XML_xmlNodeChildrenReferences RUtils.c
205:ENTRY(RS_XML_xmlNodeChildrenReferences, 3),
231:ENTRY(RS_XML_xmlNodeChildrenReferences, 3),

One of them should be deleted.

% diff -u XML/src/RUtils.c~ XML/src/RUtils.c
--- XML/src/RUtils.c~   2018-06-06 11:32:16.549338000 -0700
+++ XML/src/RUtils.c2018-06-06 11:33:07.899782000 -0700
@@ -228,7 +228,6 @@
ENTRY(RS_XML_xmlNodeName, 1),
ENTRY(RS_XML_xmlNodeNamespace, 1),
ENTRY(RS_XML_xmlNodeAttributes, 3),
-   ENTRY(RS_XML_xmlNodeChildrenReferences, 3),
ENTRY(R_xmlNodeValue, 3),
ENTRY(R_setXMLInternalTextNode_value, 2),
ENTRY(RS_XML_xmlNodeParent, 2),

Bill Dunlap
TIBCO Software
wdunlap tibco.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] length of `...`

2018-05-03 Thread William Dunlap via R-devel
In R-3.5.0 you can use ...length():
  > f <- function(..., n) ...length()
  > f(stop("one"), stop("two"), stop("three"), n=7)
  [1] 3

Prior to that substitute() is the way to go
  > g <- function(..., n) length(substitute(...()))
  > g(stop("one"), stop("two"), stop("three"), n=7)
  [1] 3

R-3.5.0 also has the ...elt(n) function, which returns
the evaluated n'th entry in ... , without evaluating the
other ... entries.
  > fn <- function(..., n) ...elt(n)
  > fn(stop("one"), 3*5, stop("three"), n=2)
  [1] 15

Prior to 3.5.0, eval the appropriate component of the output
of substitute() in the appropriate environment:
  > gn <- function(..., n) {
  +   nthExpr <- substitute(...())[[n]]
  +   eval(nthExpr, envir=parent.frame())
  + }
  > gn(stop("one"), environment(), stop("two"), n=2)
  <environment: R_GlobalEnv>




Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, May 3, 2018 at 7:29 AM, Dénes Tóth  wrote:

> Hi,
>
>
> In some cases the number of arguments passed as ... must be determined
> inside a function, without evaluating the arguments themselves. I use the
> following construct:
>
> dotlength <- function(...) length(substitute(expression(...))) - 1L
>
> # Usage (returns 3):
> dotlength(1, 4, something = undefined)
>
> How can I define a method for length() which could be called directly on
> `...`? Or is it an intention to extend the base length() function to accept
> ellipses?
>
>
> Regards,
> Denes
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] issue with model.frame()

2018-05-01 Thread William Dunlap via R-devel
You run into the same problem when using 'non-syntactical' names:

> mfB <- model.frame(y ~ `Temp(C)` + `Pres(mb)`,
data=data.frame(check.names=FALSE, y=1:10, `Temp(C)`=21:30,
`Pres(mb)`=991:1000))
> match(attr(terms(mfB), "term.labels"), names(mfB))   # gives NA's
[1] NA NA
> attr(terms(mfB), "term.labels")
[1] "`Temp(C)`"  "`Pres(mb)`"
> names(mfB)
[1] "y"        "Temp(C)"  "Pres(mb)"

Note that names(mfB) does not give a hint as to whether they represent R
expressions or not (in this case they do not).  When they do represent R
expressions, one could parse() them and compare them to
as.list(attr(terms(mfB), "variables"))[-1].
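A sketch of that comparison, using the same mfB: deparse the 'variables' attribute of the terms object and match against it, so both sides carry the same backtick convention.

```r
## Match term.labels against the deparsed terms() variables rather
## than against names(mfB); the positions index the model frame.
mfB <- model.frame(y ~ `Temp(C)` + `Pres(mb)`,
                   data = data.frame(check.names = FALSE, y = 1:10,
                                     `Temp(C)` = 21:30,
                                     `Pres(mb)` = 991:1000))
vars <- vapply(as.list(attr(terms(mfB), "variables"))[-1], deparse, "")
match(attr(terms(mfB), "term.labels"), vars)  # 2 3, not NA NA
```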


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, May 1, 2018 at 6:11 AM, Therneau, Terry M., Ph.D. via R-devel <
r-devel@r-project.org> wrote:

> A user sent me an example where coxph fails, and the root of the failure
> is a case where names(mf) is not equal to the term.labels attribute of the
> formula -- the latter has an extraneous newline. Here is an example that
> does not use the survival library.
>
> # first create a data set with many long names
> n <- 30  # number of rows for the dummy data set
> vname <- vector("character", 26)
> for (i in 1:26) vname[i] <- paste(rep(letters[1:i],2), collapse='')  #
> long variable names
>
> tdata <- data.frame(y=1:n, matrix(runif(n*26), nrow=n))
> names(tdata) <- c('y', vname)
>
> # Use it in a formula
> myform <- paste("y ~ cbind(", paste(vname, collapse=", "), ")")
> mf <- model.frame(formula(myform), data=tdata)
>
> match(attr(terms(mf), "term.labels"), names(mf))   # gives NA
>
> 
>
> In the user's case the function is ridge(x1, x2, ) rather than cbind,
> but the effect is the same.
> Any ideas for a work around?
>
> Aside: the ridge() function is very simple, it was added as an example to
> show how a user can add their own penalization to coxph.  I never expected
> serious use of it.  For this particular user the best answer is to use
> glmnet instead.   He/she is trying to apply an L2 penalty to a large number
> of SNP * covariate interactions.
>
> Terry T.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] disappearing tempdir()

2018-04-24 Thread William Dunlap via R-devel
Recent versions of Windows will remove empty directories from areas that
Windows considers places for temporary files.  It does not seem to matter
how old they are; empty directories are found and removed c. once a day.
I haven't seen any documentation on this feature but I think you can turn
it off by disabling "Storage Sense" in Settings>System>Storage.

This means that R's tempdir() can easily disappear unless you put a file in
it.  I think an empty file will do the trick.  Perhaps R could do this when
it makes a new tempdir().  (When the file gets old, 30 days?, it will be
removed and then the empty directory holding it will be removed, but that
is better than the current situation.)
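A minimal sketch of the suggested workaround, placing a placeholder file (the name ".keep" is arbitrary) into the session's tempdir() so it is never empty:

```r
## Keep tempdir() non-empty so an over-eager cleaner has no empty
## directory to remove.
sentinel <- file.path(tempdir(), ".keep")
file.create(sentinel)
file.exists(sentinel)  # TRUE
```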

On a related note, R-3.5 has a new argument to tempdir: check=FALSE.  If
'check' is TRUE then tempdir() will make a new directory, with a new name,
in which to hold temporary files.  If it first tried to make a new
directory with the name of the previous tempdir() then things like fix(),
which cache the name of a file in tempdir(), will continue to work.  Is the
plan to make check=TRUE the default in tempdir(), or perhaps have
tempfile() call tempdir(check=TRUE)?  Then we would not have problems like

> file.rename(tempdir(), paste0(tempdir(), "~")) # mimic Windows cleaner
[1] TRUE
> file.create(tempfile())
[1] FALSE
Warning message:
In file.create(tempfile()) :
  cannot create file '/tmp/RtmpHKpWnV/file67f416dcb511', reason 'No such
file or directory'


Bill Dunlap
TIBCO Software
wdunlap tibco.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] odd assignInNamespace / setGeneric interaction

2018-04-19 Thread William Dunlap via R-devel
The problem is not specific to redefining the q function, but to
the interaction of assignInNamespace and setGeneric.  The
latter requires, roughly, that the environment of the function
being replaced by an S4 generic is (or is the descendent of)
the environment in which it lives.

E.g., the following demonstrates the problem

% R --quiet --vanilla
> assignInNamespace("plot", function(x, ...) stop("No plotting allowed!"),
getNamespace("graphics"))
> library(stats4)
Error: package or namespace load failed for ‘stats4’ in loadNamespace(name):
 there is no package called ‘.GlobalEnv’

and defining the bogus plot function in the graphics namespace avoids the
problem

% R --quiet --vanilla
>  assignInNamespace("plot", with(getNamespace("graphics"), function(x,
...) stop("No plotting allowed!")), getNamespace("graphics"))
> library(stats4)
>

I suppose people who use assignInNamespace get what they deserve.
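The with(getNamespace("graphics"), ...) trick above amounts to setting the replacement function's environment; an equivalent, perhaps clearer, sketch (stopping short of the assignInNamespace call itself):

```r
## Equivalent to the with() trick: build the replacement, then point
## its environment at the namespace it will live in, not .GlobalEnv.
myPlot <- function(x, ...) stop("No plotting allowed!")
environment(myPlot) <- getNamespace("graphics")
environmentName(environment(myPlot))  # "graphics"
```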


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Apr 19, 2018 at 2:33 AM, Martin Maechler <maech...@stat.math.ethz.ch
> wrote:

> >>>>> Michael Lawrence <lawrence.mich...@gene.com>
> >>>>> on Wed, 18 Apr 2018 14:16:37 -0700 writes:
>
> > Hi Bill,
> > Ideally, your coworker would just make an alias (or shortcut or
> > whatever) for R that passed --no-save to R. I'll try to look into
> this
> > though.
>
> > Michael
>
> Yes, indeed!
>
> As some of you know, I've been using R (for ca 23 years now)
> almost only from ESS (Emacs Speaks Statistics).
>
> There, I've activated '--no-save' for ca 20 years or so,
> nowadays (since Emacs has adopted "custom") I have had this in
> my ~/.emacs  custom lines
>
>  '(inferior-R-args "--no-restore-history --no-save ")
>
> standalone (to paste into your own ~/.emacs ) :
>
> (custom-set-variables '(inferior-R-args "--no-restore-history --no-save "))
>
> 
>
> The current fashionable IDE to R,
> Rstudio, also allows to set such switches by its GUI:
>
> Menu [Tools]
>   --> (bottom) entry [Global Options]
> --> the first sidebar entry  [R General]:
> Look for two lines mentioning "workspace" or ".RData" and
> change to 'save never' ( == --no-save),
> and nowadays I also recommend my students to not *read*
> these, i.e., '--no-restore'
>
> ---
>
> @Michael: I'm not sure what you're considering.  I feel that in
>  general, there are already too many R startup tweaking
>  possibilities, notably via environment variables.
> [e.g., the current ways to pre-determine the active .libPaths() in R,
>  and the fact the R calls R again during 'R CMD check' etc,
>  sometimes drives me crazy when .libPaths() become incompatible
>  for too many reasons  yes, I'm diverting: that's another story]
>
> If we'd want to allow using (yet  another!) environment variable
> here, I'd at least would  make sure they are not consulted when
> explicit --no-save or --vanilla, etc are used.
>
> Martin
>
>
> > On Wed, Apr 18, 2018 at 1:38 PM, William Dunlap via R-devel
> > <r-devel@r-project.org> wrote:
> >> A coworker got tired of having to type 'yes' or 'no' after quitting
> R: he
> >> never wanted to save the R workspace when quitting.  So he added
> >> assignInNamespace lines to his .Rprofile file to replace base::q
> with
> >> one that, by default, called the original with save="no"..
> >>
> >> utils::assignInNamespace(".qOrig", base::q, "base")
> >> utils::assignInNamespace("q", function(save = "no", ...)
> >> base:::.qOrig(save = save, ...), "base")
> >>
> >> This worked fine until he decide to load the distr package:
> >>
> >> > suppressPackageStartupMessages(library(distr))
> >> Error: package or namespace load failed for ‘distr’ in
> >> loadNamespace(name):
> >> there is no package called ‘.GlobalEnv’
> >>
> >> distr calls setGeneric("q"), which indirectly causes the environment
> >> of base::q, .GlobalEnv, to be loaded as a namespace, causing the
> error.
> >> Giving his replacement q function the environment
> getNamespace("base")
> >> avoids the problem.
> >>
> >> I can reproduce the problem by making a package that just calls
> >> setGeneric("as.hexmode",...) and a NAMEPACE file with
> >> exportMethods("as.hexmode").  If my .Rpro

[Rd] odd assignInNamespace / setGeneric interaction

2018-04-18 Thread William Dunlap via R-devel
A coworker got tired of having to type 'yes' or 'no' after quitting R: he
never wanted to save the R workspace when quitting.  So he added
assignInNamespace lines to his .Rprofile file to replace base::q with
one that, by default, called the original with save="no"..

  utils::assignInNamespace(".qOrig", base::q, "base")
  utils::assignInNamespace("q", function(save = "no", ...)
base:::.qOrig(save = save, ...), "base")

This worked fine until he decide to load the distr package:

  > suppressPackageStartupMessages(library(distr))
  Error: package or namespace load failed for ‘distr’ in
loadNamespace(name):
   there is no package called ‘.GlobalEnv’

distr calls setGeneric("q"), which indirectly causes the environment
of base::q, .GlobalEnv, to be loaded as a namespace, causing the error.
Giving his replacement q function the environment getNamespace("base")
avoids the problem.

I can reproduce the problem by making a package that just calls
setGeneric("as.hexmode",...) and a NAMEPACE file with
exportMethods("as.hexmode").  If my .Rprofile puts a version of as.hexmode
with environment .GlobalEnv into the base namespace, then I get the same
error when trying to load the package.

I suppose this is mostly a curiosity and unlikely to happen to most people
but it did confuse us for a while.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] strange warning: data() error?

2018-04-16 Thread William Dunlap via R-devel
data(package="survival") gives, in part,

cgd Chronic Granulotomous Disease data
cgd0 (cgd)  Chronic Granulotomous Disease data
colon   Chemotherapy for Stage B/C colon cancer
flchain Assay of serum free light chain for 7874
subjects.
genfan  Generator fans
heart   Stanford Heart Transplant data
jasa (heart)Stanford Heart Transplant data
jasa1 (heart)   Stanford Heart Transplant data

The 'name1 (name2)' entries indicate that 'name1' is in the file labelled
'name2'.  If you run data(cgd) you get both cgd and cgd0
in .GlobalEnv;  if you run data(heart) you get heart, jasa, and jasa1.
I don't think this has changed recently, although it might be nice
if the names were handled more symmetrically, like alias entries
in help files.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Apr 16, 2018 at 2:58 PM, Therneau, Terry M., Ph.D. via R-devel <
r-devel@r-project.org> wrote:

> A user asked me about this and I can't figure it out.
>
> tmt% R
> R Under development (unstable) (2018-04-09 r74565) -- "Unsuffered
> Consequences"
> Copyright (C) 2018 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> > library(survival)
> > data(cgd0)
> Warning message:
> In data(cgd0) : data set ‘cgd0’ not found
>
> 
>
> The data set is present and can be manipulated: data() is not required.
> Other data sets in the survival package don't generate this message.
>
> Terry T.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] alpha transparency doesn't work for lines when xpd=TRUE

2018-04-16 Thread William Dunlap via R-devel
The problem occurs in the Windows GUI with the 'windows()' graphics device.
In the following example the red diagonal line appears in 3 plots but not
in the one
with xpd=TRUE and alpha.f=0.9.

> par(mfrow=c(2,2))
> for(xpd in c(FALSE, TRUE)) for(alpha.f in c(.9, 1))
plot(0:1,xpd=xpd,type="l",col=adjustcolor("red",alpha.f=alpha.f),main=paste0("xpd=",xpd,",
alpha.f=",alpha.f))
> dev.cur()
windows
  2
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C

[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.4


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Apr 16, 2018 at 12:14 PM, David Winsemius 
wrote:

>
> > On Apr 16, 2018, at 10:41 AM, Jiaxuan Chen 
> wrote:
> >
> > Dear R-devel,
> >
> > I think I've found a bug - the alpha transparency doesn't work when
> plotting lines with xpd = TRUE.
> >
> > #works
> > plot(1:20, col="#1874CD", xpd=T, type="l")
> >
> > #works
> > plot(1:20, col="#1874CD50", xpd=F, type="l")
> >
> > #doesn't work
> > plot(1:20, col="#1874CD50", xpd=T, type="l")
>
> It's behaving as expected (last two lines light blue) on a Mac (El
> Capitan) and R 3.4.3. (I did check to see if T and F were still TRUE and
> FALSE at the time. It's possible that they were not in your session. Only
> TRUE and FALSE are reserved words.)
> >
> > Thank you.
> >
> > Jim
> >
> >
> >   [[alternative HTML version deleted]]
>
> All the R mailing lists are plain text.
>
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'
>  -Gehm's Corollary to Clarke's Third Law
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] backquotes and term.labels

2018-03-05 Thread William Dunlap via R-devel
I believe this has to do with terms() making "term.labels" (hence the dimnames
of "factors")
with deparse(), so that the backquotes are included for non-syntactic
names.  The backquotes
are not in the column names of the input data.frame (nor model frame) so
you get a mismatch
when subscripting the data.frame or model.frame with elements of
terms()$term.labels.

I think you can avoid the problem by adding right after
ll <- attr(Terms, "term.labels")
the line
ll <- gsub("^`|`$", "", ll)

E.g.,

> d <- data.frame(check.names=FALSE, y=1/(1:5), `b$a$d`=sin(1:5)+2, `x y
z`=cos(1:5)+2)
> Terms <- terms( y ~ log(`b$a$d`) + `x y z` )
> m <- model.frame(Terms, data=d)
> colnames(m)
[1] "y"            "log(`b$a$d`)" "x y z"
> attr(Terms, "term.labels")
[1] "log(`b$a$d`)" "`x y z`"
>   ll <- attr(Terms, "term.labels")
> gsub("^`|`$", "", ll)
[1] "log(`b$a$d`)" "x y z"

It is a bit of a mess.
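Applied to the survdiff()-style subscripting from the original report, with a small made-up data frame standing in for lung2:

```r
## Hypothetical stand-in for the survdiff() case: strip the backticks
## from term.labels before using them to subscript the model frame.
lung2 <- data.frame(time = 1:6, status = rep(0:1, 3),
                    `in st` = gl(2, 3), check.names = FALSE)
Terms <- terms(status ~ `in st`)
m <- model.frame(Terms, data = lung2)
ll <- gsub("^`|`$", "", attr(Terms, "term.labels"))
names(m[ll])  # "in st" -- the subscript now succeeds
```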


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Mar 5, 2018 at 12:55 PM, Therneau, Terry M., Ph.D. via R-devel <
r-devel@r-project.org> wrote:

> A user reported a problem with the survdiff function and the use of
> variables that contain a space.  Here is a simple example.  The same issue
> occurs in survfit for the same reason.
>
> lung2 <- lung
> names(lung2)[1] <- "in st"   # old name is inst
> survdiff(Surv(time, status) ~ `in st`, data=lung2)
> Error in `[.data.frame`(m, ll) : undefined columns selected
>
> In the body of the code the program want to send all of the right-hand
> side variables forward to the strata() function.  The code looks more or
> less like this, where m is the model frame
>
>   Terms <- terms(m)
>   index <- attr(Terms, "term.labels")
>   if (length(index) ==0)  X <- rep(1L, n)  # no coariates
>   else X <- strata(m[index])
>
> For the variable with a space in the name the term.label is "`in st`", and
> the subscript fails.
>
> Is this intended behaviour or a bug?  The issue is that the name of this
> column in the model frame does not have the backtics, while the terms
> structure does have them.
>
> Terry T.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] readLines interaction with gsub different in R-dev

2018-02-17 Thread William Dunlap via R-devel
I think the problem in R-devel happens when there are non-ASCII characters
in any
of the strings passed to gsub.

txt <- vapply(list(as.raw(c(0x41, 0x6d, 0xc3, 0xa9, 0x6c, 0x69, 0x65)),
as.raw(c(0x41, 0x6d, 0x65, 0x6c, 0x69, 0x61))), rawToChar, "")
txt
#[1] "Amélie" "Amelia"
Encoding(txt)
#[1] "unknown" "unknown"
gsub(perl=TRUE, "(\\w)(\\w)", "<\\L\\1\\U\\2>", txt)
gsub(perl=TRUE, "(\\w)(\\w)", "<\\L\\1\\U\\2>", txt[1])
gsub(perl=TRUE, "(\\w)(\\w)", "<\\L\\1\\U\\2>", txt[2])

I can change the Encoding to "latin1" or "UTF-8" and get similar results
from gsub.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, Feb 17, 2018 at 7:35 AM, Hugh Parsonage 
wrote:

> | Confirmed for R-devel (current) on Ubuntu 17.10.  But ... isn't the
> regexp
> | you use wrong, ie isn't R-devel giving the correct answer?
>
> No, I don't think R-devel is correct (or at least consistent with the
> documentation). My interpretation of gsub("(\\w)", "\\U\\1", entry,
> perl = TRUE) is "Take every word character and replace it with itself,
> converted to uppercase."
>
> Perhaps my example was too minimal. Consider the following:
>
> R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
> [1] "A"
>
> R> gsub("(\\w)", "\\1", entry, perl = TRUE)
> [1] "author: Amélie"   # OK, but very different to 'A', despite only
> not specifying uppercase
>
> R> gsub("(\\w)", "\\U\\1", "author: Amelie", perl = TRUE)
> [1] "AUTHOR: AMELIE"  # OK, but very different to 'A',
>
> R> gsub("^(\\w+?): (\\w)", "\\U\\1\\E: \\2", entry, perl = TRUE)
>  "AUTHOR"  # Where did everything after the first group go?
>
> I should note the following example too:
> R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE, useBytes = TRUE)
> [1] "AUTHOR: AMéLIE"  # latin1 encoding
>
>
> A call to `readLines` (possibly `scan()` and `read.table` and friends)
> is essential.
>
>
>
>
> On 18 February 2018 at 02:15, Dirk Eddelbuettel  wrote:
> >
> > On 17 February 2018 at 21:10, Hugh Parsonage wrote:
> > | I was told to re-raise this issue with R-dev:
> > |
> > | In the documentation of R-dev and R-3.4.3, under ?gsub
> > |
> > | > replacement
> > | >... For perl = TRUE only, it can also contain "\U" or "\L" to
> convert the rest of the replacement to upper or lower case and "\E" to end
> case conversion.
> > |
> > | However, the following code runs differently:
> > |
> > | tempf <- tempfile()
> > | writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
> > | entry <- readLines(tempf, encoding = "UTF-8")
> > | gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
> > |
> > |
> > | "AUTHOR: AMÉLIE"  # R-3.4.3
> > |
> > | "A"  # R-dev
> >
> > Confirmed for R-devel (current) on Ubuntu 17.10.  But ... isn't the
> regexp
> > you use wrong, ie isn't R-devel giving the correct answer?
> >
> > R> tempf <- tempfile()
> > R> writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
> > R> entry <- readLines(tempf, encoding = "UTF-8")
> > R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
> > [1] "A"
> > R> gsub("(\\w+)", "\\U\\1", entry, perl = TRUE)
> > [1] "AUTHOR"
> > R> gsub("(.*)", "\\U\\1", entry, perl = TRUE)
> > [1] "AUTHOR: AMÉLIE"
> > R>
> >
> > Dirk
> >
> > --
> > http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Trailing underscores on C function names

2018-02-07 Thread William Dunlap via R-devel
In the fastmatch package, version 1.1-0, there is a C function called
"ctapply_", with a trailing underscore.  However, the NAMESPACE's call to
useDynLib refers to "ctapply", without the trailing underscore.

% grep ctapply NAMESPACE {R,src}/*
NAMESPACE:useDynLib(fastmatch, C_fmatch = fmatch, C_ctapply = ctapply,
C_coalesce = coalesce, C_append = append, mk_hash, get_table, get_values)
NAMESPACE:export(fmatch, fmatch.hash, ctapply, coalesce, "%fin%")
R/ctapply.R:ctapply <- function(X, INDEX, FUN, ..., MERGE=c)
.External(C_ctapply, parent.frame(), X, INDEX, FUN, MERGE, ...)
src/ctapply.c:SEXP ctapply_(SEXP args) {
src/ctapply.c: see ctapply(x, y, identity). It should be uncommon,
though


"Writing R Extensions" mentions, section 5.2, footnote 121, that .C and
.Fortran interpret their first argument as the name of the object file
symbol "possibly after some platform-specific translation, e.g. adding
leading or trailing underscores".

Should useDynLib use the underscored name?  The code doesn't seem
"platform-specific".  What are the rules concerning added underscores (or
capitalization?) that  R uses?

Bill Dunlap
TIBCO Software
wdunlap tibco.com
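One mechanism that makes such a lookup work regardless of platform symbol decoration is explicit routine registration, where the R-visible name is chosen independently of the C symbol. A sketch (whether fastmatch actually registers its routines this way would need checking its init code):

```c
#include <R.h>
#include <Rinternals.h>
#include <R_ext/Rdynload.h>

SEXP ctapply_(SEXP args);  /* the underscored C symbol */

/* .External routines: the first field is the name useDynLib() resolves,
   the second is the C symbol it maps to. */
static const R_ExternalMethodDef externalMethods[] = {
    {"ctapply", (DL_FUNC) &ctapply_, -1},
    {NULL, NULL, 0}
};

void R_init_fastmatch(DllInfo *dll)
{
    R_registerRoutines(dll, NULL, NULL, NULL, externalMethods);
}
```

With registration in place, useDynLib(fastmatch, C_ctapply = ctapply) looks the name up in the registration table rather than in the raw object-file symbols, so no underscore translation is involved.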

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R-gui sessions end when executing C-code

2018-02-02 Thread William Dunlap via R-devel
   SEXP eta = PROTECT(allocVector(REALSXP,H_c)); n_prot++;
   double *eta_c; eta_c = REAL(eta);
   for (i=0;i ...

On ... wrote:

> Hi
>
> I'm trying to develop some C code to find the fixpoint of a contraction
> mapping, the code compiles and gives the right results when executed in R.
> However R-gui session is frequently terminated. I'm suspecting some access
> violation error due to the exception code 0xc0000005
> In the error report windows 10 gives me.
>
> It is the first time I'm writing any C-code so I'm guessing I have done
> something really stupid. I have been trying to debug the code for a couple
> of days now,
> But I simply can't figure out what generates the problem. Could it be
> something particular to my windows set up and security stuff?
>
>
> I'm in the process of reading Writing R Extensions and Hadley Wickham's
> Advanced R but might have missed something.
>
> The windows error report:
>
> Faulting application name: Rgui.exe, version: 3.33.6774.0, time stamp:
> 0x58bd6d26
> Faulting module name: R.dll, version: 3.33.6774.0, time stamp: 0x58bd6d0b
> Exception code: 0xc0000005
> Fault offset: 0x0010b273
> Faulting process id: 0x1d14
> Faulting application start time: 0x01d39aede45c96e9
> Faulting application path: C:\Program Files\R\R-3.3.3\bin\x64\Rgui.exe
> Faulting module path: C:\Program Files\R\R-3.3.3\bin\x64\R.dll
> Report Id: c78d7c52-72c5-40f3-a3cc-927323d2af07
> Faulting package full name:
> Faulting package-relative application ID:
>
>
> ### How I call the C-function in R
>
> dyn.load("C://users//jeshyb//desktop//myC//lce_fixpoint_cc.dll")
>
>
> N = 10
> H = 3
> v <- rnorm(N*H)
> mu <- 0.1
> psi <- matrix(c(1,0,0.5,0.5,0,1),nrow=2)
> K <- dim(psi)[1]
> p <- rep(1/H,N*H)
> error <- 1e-10
>
>
> f<-function(p,v,mu,psi,N,H,K)
>{
>
> .Call("lce_fixpoint_cc",p, v,  mu,  psi,  as.integer(N), as.integer(H),
> as.integer(K),error)
>}
>
>
> for (i in 1:100)
>{
>   v <- rnorm(N*H)
>   p <- rep(1/H,N*H)
>
> a<-f(p,v,mu,psi,N,H,K)
>}
>
>
> a<-f(p,v,mu,psi,N,H,K)
> plot(a)
>
>
>
>  The C- function
>
>
>
> #include <R.h>
> #include <Rinternals.h>
>
>
> SEXP lce_fixpoint_cc(SEXP q, SEXP v, SEXP mu, SEXP psi, SEXP N,SEXP H,
> SEXP K, SEXP err)
> {
>
>int n_prot = 0;
>/* Make ready integer and double constants */
>PROTECT(N); n_prot++;
>PROTECT(H); n_prot++;
>PROTECT(K); n_prot++;
>int N_c = asInteger(N);
>int H_c = asInteger(H);
>int K_c = asInteger(K);
>
>double mu_c = asReal(mu);
>double mu2_c = 1.0 - mu_c;
>double error_c = asReal(err);
>double lowest_double = 1e-15;
>double tmp_c;
>double denom;
>double error_temp;
>double error_i_c;
>
>
>/* Make ready vector froms input */
>PROTECT(q); n_prot++;
>PROTECT(v); n_prot++;
>PROTECT(psi); n_prot++;
>double *v_c; v_c = REAL(v);
>double *psi_c; psi_c = REAL(psi);
>
>/* Initialize new vectors not given as input */
>SEXP q_copy = PROTECT(duplicate(q)); n_prot++;
>double *q_c; q_c = REAL(q_copy);
>
>SEXP q_new = 
> PROTECT(allocVector(REALSXP,length(q)));
> n_prot++;
>double *q_new_c; q_new_c = REAL(q_new);
>
>SEXP eta = PROTECT(allocVector(REALSXP,H_c));
> n_prot++;
>double *eta_c; eta_c = REAL(eta);
>
>SEXP exp_eta = PROTECT(allocVector(REALSXP,H_c));
> n_prot++;
>double *exp_eta_c; exp_eta_c = REAL(exp_eta);
>
>SEXP psi_ln_psiq =
> PROTECT(allocVector(REALSXP,H_c)); n_prot++;
>double *psi_ln_psiq_c; psi_ln_psiq_c =
> REAL(psi_ln_psiq);
>
>int not_converged;
>int maxIter = 1;
>int iter;
>   
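For reference, the canonical allocate-and-fill shape for this kind of code looks like the following (a generic sketch, not the poster's actual function — crashes in such code are most often an index running past the allocated length):

```c
#include <R.h>
#include <Rinternals.h>

/* Generic sketch: the loop index stays strictly below the vector length,
   and UNPROTECT() matches the number of PROTECT() calls before returning. */
SEXP fill_eta(SEXP H)
{
    int H_c = asInteger(H);
    SEXP eta = PROTECT(allocVector(REALSXP, H_c));
    double *eta_c = REAL(eta);
    for (int i = 0; i < H_c; i++)    /* i < H_c, never i <= H_c */
        eta_c[i] = 0.0;
    UNPROTECT(1);
    return eta;
}
```

Running the offending call under gctorture(TRUE) also tends to turn an intermittent Rgui crash into an immediate, reproducible error near the faulty allocation.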

Re: [Rd] as.character(list(NA))

2018-01-22 Thread William Dunlap via R-devel
I tend to avoid using as. functions on lists, since they act oddly in
several ways.
E.g., if the list "L" consists entirely of scalar elements then
as.numeric(L) acts like
as.numeric(unlist(L)) but if any element is not a scalar there is an
error.  as.character()
does not seem to make a distinction between the all-scalar and
not-all-scalar cases
but does various things with NA's of various types.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Jan 22, 2018 at 11:14 AM, Robert McGehee <
rmcge...@walleyetrading.net> wrote:

> Also perhaps a surprise that the behavior depends on the mode of the NA.
>
> > is.na(as.character(list(NA_real_)))
> [1] FALSE
> > is.na(as.character(list(NA_character_)))
> [1] TRUE
>
> Does this mean deparse() preserves NA-ness for NA_character_ but not
> NA_real_?
>
>
> -Original Message-
> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Hervé
> Pagès
> Sent: Monday, January 22, 2018 2:01 PM
> To: William Dunlap <wdun...@tibco.com>; Patrick Perry <
> ppe...@stern.nyu.edu>
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] as.character(list(NA))
>
> On 01/20/2018 08:24 AM, William Dunlap via R-devel wrote:
> > I believe that for a list as.character() applies deparse()  to each
> element
> > of the list.  deparse() does not preserve NA-ness, as it is intended to
> > make text that the parser can read.
> >
> >> str(as.character(list(Na=NA, LglVec=c(TRUE,NA),
> > Function=function(x){x+1})))
> >   chr [1:3] "NA" "c(TRUE, NA)" "function (x) \n{\nx + 1\n}"
> >
>
> This really comes as a surprise though since coercion to all the
> other atomic types (except raw) preserves the NAs.
>
> And also as.character(unlist(list(NA))) preserves them.
>
> H.
>
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> > On Sat, Jan 20, 2018 at 7:43 AM, Patrick Perry <ppe...@stern.nyu.edu>
> wrote:
> >
> >> As of R Under development (unstable) (2018-01-19 r74138):
> >>
> >>> as.character(list(NA))
> >> [1] "NA"
> >>
> >>> is.na(as.character(list(NA)))
> >> [1] FALSE
> >>
> >> __
> >> R-devel@r-project.org mailing list
> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.
> ethz.ch_mailman_listinfo_r-2Ddevel=DwICAg=eRAMFD45gAfqt84VtBcfhQ=
> BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=
> VbamM9XRQOlfBakrmlrmQZ7DLgXZ-hhhFeLD-fKpoCo=
> Luhqwpr2bTltIA9Cy7kA4gwcQh16bla0S6OVe3Z09Xo=
> >>
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.
> ethz.ch_mailman_listinfo_r-2Ddevel=DwICAg=eRAMFD45gAfqt84VtBcfhQ=
> BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=
> VbamM9XRQOlfBakrmlrmQZ7DLgXZ-hhhFeLD-fKpoCo=
> Luhqwpr2bTltIA9Cy7kA4gwcQh16bla0S6OVe3Z09Xo=
> >
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] as.character(list(NA))

2018-01-20 Thread William Dunlap via R-devel
I believe that for a list as.character() applies deparse()  to each element
of the list.  deparse() does not preserve NA-ness, as it is intended to
make text that the parser can read.

> str(as.character(list(Na=NA, LglVec=c(TRUE,NA),
Function=function(x){x+1})))
 chr [1:3] "NA" "c(TRUE, NA)" "function (x) \n{\nx + 1\n}"
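A sketch of how to flatten an all-scalar list to character without going through deparse(), so NA-ness survives:

```r
L <- list(NA, "a", 1.5)
as.character(unlist(L))                # unlist() coerces first, so NA is preserved
vapply(L, as.character, character(1))  # also errors if any element is not scalar
```
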


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, Jan 20, 2018 at 7:43 AM, Patrick Perry  wrote:

> As of R Under development (unstable) (2018-01-19 r74138):
>
> > as.character(list(NA))
> [1] "NA"
>
> > is.na(as.character(list(NA)))
> [1] FALSE
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Fwd: R/MKL Intel 2018 Compatibility

2018-01-08 Thread William Dunlap via R-devel
The x and y passed to dgemm in that code are pointers to the same memory,
thus breaking Fortran's no-aliasing rule.  Is it possible the MKL depends
on the
caller following that rule?

You might try dsyrk() instead of dgemm.
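At the R level one can at least probe whether the linked BLAS misbehaves on this call by comparing the one-argument crossprod() against the alias-free explicit product (a sanity check, not a fix):

```r
set.seed(1)
X <- matrix(rnorm(20), 5, 4)
# Both compute t(X) %*% X; a large discrepancy (or a hang) points at the BLAS.
all.equal(crossprod(X), t(X) %*% X)
```
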

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Jan 8, 2018 at 6:57 AM, Tomas Kalibera 
wrote:

> Hi Guillaume,
>
> In principle, mycrossprod function does not need to PROTECT "ans",
> because it does not call any allocating function after allocating "ans"
> ("dgemm" in particular should not allocate from the R heap). So it is
> surprising that PROTECTion makes a difference in your case. I agree
> there is no harm protecting defensively. R itself calls dgemm with the R
> object for the result protected when calculating matrix products, but
> there it is needed because there is further allocation when setting up
> attributes for the result.
>
> Best
> Tomas
>
>
> On 01/08/2018 02:41 PM, Guillaume Collange wrote:
> > Dear all,
> >
> >
> >
> > I would like to submit an issue that we are facing.
> >
> >
> >
> > Indeed, in our environment, we are optimizing the R code to speed up some
> > mathematical calculations as matrix products using the INTEL libraries (
> > MKL) ( https://software.intel.com/en-us/mkl )
> >
> >
> >
> > With the last version of the MKL libraries Intel 2018, we are facing to
> an
> > issue with *all INTERNAL command* that are executing in R. The R console
> is
> > freezing executing a process at 100% and never stop!!! It’s really an
> issue
> > for us.
> >
> >
> >
> > As example, we can reproduce the error with *crossprod. Crossprod *which
> is
> > a wrapper of BLAS GEMM (optimized with MKL libraries), in this function
> it
> > seems that variables are not protected ( PROTECT(); UNPROTECT() ), see
> the
> > screenshot below, which is a recommendation for external commands:
> >
> >
> >
> > Picture1
> >
> >
> > *RECOMMANDATION*
> >
> > *Picture2*
> >
> > *Code of CROSSPROD*
> >
> >   Picture 3
> >
> >
> >
> > If we are recoding the CROSSPROD function with PROTECTT
> >
> > No more issues…
> >
> >
> >
> >
> >
> > Do you have any idea to solve this bug? Any recommendations?
> >
> >
> >
> >
> >
> > Thank you by advance for your help.
> >
> >
> >
> >
> >
> > Best regards,
> >
> > Guillaume Collange
> >
> >
> >
> >
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] silent recycling in logical indexing

2018-01-04 Thread William Dunlap via R-devel
I have never used this construct.  However, part of my job is seeing how
well CRAN packages work in our reimplementation of the R language
and I am continually surprised by the inventiveness of package writers.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jan 4, 2018 at 1:44 PM, Ben Bolker  wrote:

> Hmm.
>
> Chuck: I don't see how this example represents
> incomplete/incommensurate recycling. Doesn't TRUE replicate from
> length-1 to length-3 in this case (mat[c(TRUE,FALSE),2] would be an
> example of incomplete recycling)?
>
> William: clever, but maybe too clever unless you really need the
> speed? (The clever way is 8 times faster in the following case ...)
>
> x <- rep(1,1e6)
> rbenchmark::benchmark(x[c(FALSE,TRUE,FALSE)],x[seq_along(x) %% 3 == 2])
>
> On the other hand, it takes 0.025 vs 0.003 seconds per iteration ...
> fortunes::fortune("7ms")
>
>
> On Thu, Jan 4, 2018 at 4:09 PM, Berry, Charles  wrote:
> >
> >
> >> On Jan 4, 2018, at 11:56 AM, Ben Bolker  wrote:
> >>
> >>
> >>  Sorry if this has been covered here somewhere in the past, but ...
> >>
> >>  Does anyone know why logical vectors are *silently* recycled, even
> >> when they are incommensurate lengths, when doing logical indexing?
> >
> > It is convenient to use a single `TRUE' in programmatic manipulation of
> subscripts in the same manner as using an empty subscript interactively:
> >
> >> mat<-diag(1:3)
> >> expr1 <- quote(mat[])
> >> expr1[[3]] <- TRUE
> >> expr1[[4]] <- 2
> >> eval(expr1)
> > [1] 0 2 0
> >> mat[,2]
> > [1] 0 2 0
> >
> > HTH,
> >
> > Chuck
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] silent recycling in logical indexing

2018-01-04 Thread William Dunlap via R-devel
One use case is when you want to extract every third item, starting with
the second, of an arbitrary vector with
x[c(FALSE, TRUE, FALSE)]
instead of
x[seq_along(x) %% 3 == 2]
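The two forms can be checked against each other on a length that is not a multiple of three (a quick sketch):

```r
x <- rnorm(10)  # length deliberately not a multiple of 3
# The length-3 logical recycles to cover x, selecting positions 2, 5, 8, ...
identical(x[c(FALSE, TRUE, FALSE)], x[seq_along(x) %% 3 == 2])
```
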

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jan 4, 2018 at 11:56 AM, Ben Bolker  wrote:

>
>   Sorry if this has been covered here somewhere in the past, but ...
>
>   Does anyone know why logical vectors are *silently* recycled, even
> when they are incommensurate lengths, when doing logical indexing?  This
> is as documented:
>
>   For ‘[’-indexing only: ‘i’, ‘j’, ‘...’ can be logical
>   vectors, indicating elements/slices to select.  Such vectors
>   are recycled if necessary to match the corresponding extent.
>
> but IMO weird:
>
> > x <- c(TRUE,TRUE,FALSE)
> > y <- c(TRUE,FALSE)
> > x[y]
> [1]  TRUE FALSE
>
> ## (TRUE, FALSE) gets recycled to (TRUE,FALSE,TRUE) and selects
> ##  the first and third elements
>
> If we do logical operations instead we do get a warning:
>
> > x | y
> [1] TRUE TRUE TRUE
> Warning message:
> In x | y : longer object length is not a multiple of shorter object length
>
>   Is it just too expensive to test for incomplete recycling when doing
> subsetting, or is there a sensible use case for incomplete recycling?
>
>   Ll. 546ff of main/src/subscript.c suggest that there is a place in the
> code where we already know if incomplete recycling has happened ...
>
>  Thoughts?
>
>cheers
>  Ben Bolker
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] winbuilder warning message wrt function pointers

2017-12-29 Thread William Dunlap via R-devel
And remove the cast on the return value of R_GetCCallable.  And check
that your function is found before using it.

#include <R.h>
#include <Rinternals.h>
#include <R_ext/Rdynload.h>

void bdsmatrix_prod4(int nrow,int nblock,   int *bsize,
double *bmat, double *rmat,
int nfrail,   double *y) {
DL_FUNC fun = NULL;
if (fun==NULL) {
fun = R_GetCCallable("bdsmatrix", "bdsmatrix_prod4");
}
if (fun==NULL) {
Rf_error("Cannot find C function 'bdsmatrix_prod4' in library 'bdsmatrix.{so,dll}'");
}
fun(nrow, nblock, bsize, bmat, rmat, nfrail, y);
}




Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Dec 29, 2017 at 8:48 AM, William Dunlap  wrote:

> Try changing
>   static void (*fun)() = NULL;
> to
>   DL_FUNC fun = NULL;
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Dec 29, 2017 at 5:14 AM, Therneau, Terry M., Ph.D. <
> thern...@mayo.edu> wrote:
>
>> I've recently updated the coxme package, which calls internal routines
>> from the bdsmatrix package.  (It is in fact mentioned as an example of this
>> in the Extensions manual.)
>> The call connections are blocks like this, one for each of the 9 called
>> C routines.
>>
>> void bdsmatrix_prod4(int nrow,int nblock,   int *bsize,
>> double *bmat, double *rmat,
>> int nfrail,   double *y) {
>> static void (*fun)() = NULL;
>> if (fun==NULL)
>> fun = (void (*)) R_GetCCallable("bdsmatrix", "bdsmatrix_prod4");
>> fun(nrow, nblock, bsize, bmat, rmat, nfrail, y);
>> }
>>
>> ..
>>
>> The winbuilder run is flagging all of these with
>>
>> bdsmatrix_stub.h:22:6: warning: ISO C forbids assignment between function
>> pointer and 'void *' [-Wpedantic]
>>   fun = (void (*)) R_GetCCallable("bdsmatrix", "bdsmatrix_prod4");
>>
>> Ignore?  Or should these lines have been written in a different way?
>>
>> Terry T.
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] winbuilder warning message wrt function pointers

2017-12-29 Thread William Dunlap via R-devel
Try changing
  static void (*fun)() = NULL;
to
  DL_FUNC fun = NULL;

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Dec 29, 2017 at 5:14 AM, Therneau, Terry M., Ph.D. <
thern...@mayo.edu> wrote:

> I've recently updated the coxme package, which calls internal routines
> from the bdsmatrix package.  (It is in fact mentioned as an example of this
> in the Extensions manual.)
> The call connections are blocks like this, one for each of the 9 called
> C routines.
>
> void bdsmatrix_prod4(int nrow,int nblock,   int *bsize,
> double *bmat, double *rmat,
> int nfrail,   double *y) {
> static void (*fun)() = NULL;
> if (fun==NULL)
> fun = (void (*)) R_GetCCallable("bdsmatrix", "bdsmatrix_prod4");
> fun(nrow, nblock, bsize, bmat, rmat, nfrail, y);
> }
>
> ..
>
> The winbuilder run is flagging all of these with
>
> bdsmatrix_stub.h:22:6: warning: ISO C forbids assignment between function
> pointer and 'void *' [-Wpedantic]
>   fun = (void (*)) R_GetCCallable("bdsmatrix", "bdsmatrix_prod4");
>
> Ignore?  Or should these lines have been written in a different way?
>
> Terry T.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wish List: base::source() + Add Execution Time Argument

2017-12-21 Thread William Dunlap via R-devel
Is source() the right place for this?  It may be, but we've had customers
who would like
this sort of thing done for commands entered by hand.  And there are those
who want
a description of any "non-trivial" objects created in .GlobalEnv by each
expression, ...
Do they need a way to wrap each expression evaluated in envir=.GlobalEnv
with a
function of their choice, one that would print times, datasets created,
etc.?
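A user-level sketch of that kind of wrapping, assuming one is content to parse the file and time each top-level expression (the function name and interface here are invented for illustration):

```r
timedEval <- function(file, envir = globalenv()) {
  # Evaluate each top-level expression in turn, reporting its elapsed time.
  for (expr in parse(file)) {
    t0 <- proc.time()[["elapsed"]]
    eval(expr, envir = envir)
    elapsed <- proc.time()[["elapsed"]] - t0
    cat(sprintf("%8.3fs : %s\n", elapsed, deparse(expr)[[1L]]))
  }
}
```
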

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Dec 21, 2017 at 3:46 AM, Juan Telleria  wrote:

> Dear R Developers,
>
> Adding to source() base function a Timer which indicates the execution time
> of the source code would be a very well welcome feature, and in my opinion
> not difficult to implement as an additional funtion argument.
>
> The source(timing = TRUE) function shall execute internally the following
> code for each statement:
>
> old <- Sys.time() # get start time at the beginning of source()
> # source code
> # print elapsed time
> new <- Sys.time() - old # calculate difference
> print(new) # print in nice format
>
> Thank you.
>
> Kind regards,
>
> Juan Telleria
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] tryCatch in on.exit()

2017-12-01 Thread William Dunlap via R-devel
Things work as I would expect if you give stop() a condition object instead
of a string:

makeError <- function(message, class = "simpleError", call = sys.call(-2)) {
structure(list(message=message, call=call), class=c(class, "error",
"condition"))
}
f0 <- function() {
on.exit(tryCatch(expr = stop("pb. in f0's on.exit"),
 error = function(e)cat("[error] caught",
paste(collapse="/", class(e)), ":", conditionMessage(e), "\n")))
stop("pb. in f0")
}
f1 <- function() {
on.exit(tryCatch(expr = stop(makeError("pb. in f1's on.exit",
class="simpleError")),
 error = function(e)cat("[error] caught",
paste(collapse="/", class(e)), ":", conditionMessage(e), "\n")))
stop(makeError("pb. in f1", class="simpleError"))
}
catch <- function(FUN) {
tryCatch(
expr = FUN(),
error = function(e)paste("[error] caught", paste(collapse="/",
class(e)), ":", conditionMessage(e)))
}
catch(f0) # calls stop("string")
#[error] caught simpleError/error/condition : pb. in f0's on.exit
#[1] "[error] caught simpleError/error/condition : pb. in f0's on.exit"
catch(f1) # calls stop(conditionObject)
#[error] caught simpleError/error/condition : pb. in f1's on.exit
#[1] "[error] caught simpleError/error/condition : pb. in f1"


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Dec 1, 2017 at 12:58 PM, William Dunlap  wrote:

> The following example involves a function whose on.exit()
> expression both generates an error and catches the error.
> The body of the function also generates an error.
>
> When calling the function wrapped in a tryCatch, should
> that tryCatch's error function be given the error from the
> body of the function, since the one from the on.exit has
> already been dealt with?  Currently the outer tryCatch gets
> the error from the on.exit expression.
>
> xx <- function() {
>   on.exit(tryCatch(
> expr = stop("error in xx's on.exit"),
> error=function(e) {
>   cat("xx's on.exit caught error: <<", conditionMessage(e), ">>\n",
> sep="")
> }))
>   stop("error in body of xx")
> }
> zz <- tryCatch(xx(), error=function(e)paste("outer tryCatch caught error
> <<", conditionMessage(e), ">>", sep=""))
> #xx's on.exit caught error: <>
> zz
> #[1] "outer tryCatch caught error <>"
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] tryCatch in on.exit()

2017-12-01 Thread William Dunlap via R-devel
The following example involves a function whose on.exit()
expression both generates an error and catches the error.
The body of the function also generates an error.

When calling the function wrapped in a tryCatch, should
that tryCatch's error function be given the error from the
body of the function, since the one from the on.exit has
already been dealt with?  Currently the outer tryCatch gets
the error from the on.exit expression.

xx <- function() {
  on.exit(tryCatch(
expr = stop("error in xx's on.exit"),
error=function(e) {
  cat("xx's on.exit caught error: <<", conditionMessage(e), ">>\n",
sep="")
}))
  stop("error in body of xx")
}
zz <- tryCatch(xx(), error=function(e)paste("outer tryCatch caught error
<<", conditionMessage(e), ">>", sep=""))
#xx's on.exit caught error: <>
zz
#[1] "outer tryCatch caught error <>"


Bill Dunlap
TIBCO Software
wdunlap tibco.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] check does not check that package examples remove tempdir()

2017-11-08 Thread William Dunlap via R-devel
I think recreating tempdir() is ok in an emergency situation, but package
code
should not be removing tempdir() - it may contain important information.
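A sketch of the pattern that keeps a child process (or any helper) from destroying the session-wide tempdir(): work in a unique subdirectory and remove only that (the directory name here is illustrative):

```r
work_dir <- tempfile("child-work-")   # unique path *inside* tempdir()
dir.create(work_dir)
## ... write scratch files only under work_dir ...
unlink(work_dir, recursive = TRUE)    # cleanup leaves tempdir() itself intact
dir.exists(tempdir())
```
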

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Nov 8, 2017 at 4:55 PM, Henrik Bengtsson <henrik.bengts...@gmail.com
> wrote:

> Related to this problem - from R-devel NEWS
> (https://cran.r-project.org/doc/manuals/r-devel/NEWS.html):
>
> * tempdir(check = TRUE) recreates the tempdir() if it is no longer
> valid (e.g. because some other process has cleaned up the ‘/tmp’
> directory).
>
> Not sure if there's a plan to make check = TRUE the default though.
>
> /Henrik
>
> On Wed, Nov 8, 2017 at 4:43 PM, William Dunlap via R-devel
> <r-devel@r-project.org> wrote:
> > I was looking at the CRAN package 'bfork-0.1.2', which exposes the Unix
> > fork() and waitpid() calls at the R code level, and noticed that the help
> > file example for bfork::fork removes R's temporary directory, the value
> of
> > tempdir().   I think it happens because the forked process shares the
> value
> > of tempdir() with the parent process and removes it when it exits.
> >
> > This seems like a serious problem - should 'check' make sure that running
> > code in a package's examples, vignettes, etc. leaves tempdir() intact?
> >
> >> dir.exists(tempdir())
> > [1] TRUE
> >> library(bfork)
> >> example(fork)
> >
> > fork> ## create a function to be run as a separate process
> > fork> fn <- function() {
> > fork+ Sys.sleep(4)
> > fork+ print("World!")
> > fork+ }
> >
> > fork> ## fork the process
> > fork> pid <- fork(fn)
> >
> > fork> ## do work in the parent process
> > fork> print("Hello")
> > [1] "Hello"
> >
> > fork> ## wait for the child process
> > fork> waitpid(pid)
> > [1] "World!"
> > [1] 7063
> >> dir.exists(tempdir())
> > [1] FALSE
> >
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] check does not check that package examples remove tempdir()

2017-11-08 Thread William Dunlap via R-devel
I was looking at the CRAN package 'bfork-0.1.2', which exposes the Unix
fork() and waitpid() calls at the R code level, and noticed that the help
file example for bfork::fork removes R's temporary directory, the value of
tempdir().   I think it happens because the forked process shares the value
of tempdir() with the parent process and removes it when it exits.

This seems like a serious problem - should 'check' make sure that running
code in a package's examples, vignettes, etc. leaves tempdir() intact?

> dir.exists(tempdir())
[1] TRUE
> library(bfork)
> example(fork)

fork> ## create a function to be run as a separate process
fork> fn <- function() {
fork+ Sys.sleep(4)
fork+ print("World!")
fork+ }

fork> ## fork the process
fork> pid <- fork(fn)

fork> ## do work in the parent process
fork> print("Hello")
[1] "Hello"

fork> ## wait for the child process
fork> waitpid(pid)
[1] "World!"
[1] 7063
> dir.exists(tempdir())
[1] FALSE


Bill Dunlap
TIBCO Software
wdunlap tibco.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-03 Thread William Dunlap via R-devel
Any other generator is subject to the same problem with the same
probability.

> Filter(function(s){set.seed(s,
kind="Knuth-TAOCP-2002");runif(1,17,26)>25.99}, 1:1)
 [1]  280  415  826 1372 2224 2544 3270 3594 3809 4116 4236 5018 5692 7043
7212 7364 7747 9256 9491 9568 9886



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Nov 3, 2017 at 10:31 AM, Tirthankar Chakravarty <
tirthankar.li...@gmail.com> wrote:

>
> Bill,
>
> I have clarified this on SO, and I will copy that clarification in here:
>
> "Sure, we tested them on other 8-digit numbers as well & we could not
> replicate. However, these are honest-to-goodness numbers generated by a
> non-adversarial system that has no conception of these numbers being used
> for anything other than a unique key for an entity -- these are not a
> specially constructed edge case. Would be good to know what seeds will and
> will not work, and why."
>
> These numbers are generated by an application that serves a form, and
> associates form IDs in a sequence. The application calls our API depending
> on the form values entered by users, which in turn calls our R code that
> executes some code that needs an RNG. Since the API has to be stateless, to
> be able to replicate the results for possible debugging, we need to draw
> random numbers in a way that we can replicate the results of the API
> response -- we use the form ID as seeds.
>
> I repeat, there is no design or anything adversarial about the way that
> these numbers were generated -- the system generating these numbers and
> the users entering inputs have no conception of our use of an RNG -- this
> is meant to just be a random sequence of form IDs. This issue was
> discovered completely by chance when the output of the API was observed to
> be highly non-random. It is possible that it is a 1/10^8 chance, but that
> is hard to believe, given that the API hit depends on user input. Note also
> that the issue goes away when we use a different RNG as mentioned below.
>
> T
>
> On Fri, Nov 3, 2017 at 9:58 PM, William Dunlap  wrote:
>
>> The random numbers in a stream initialized with one seed should have
>> about the desired distribution.  You don't win by changing the seed all the
>> time.  Your seeds caused the first numbers of a bunch of streams to be
>> about the same, but the second and subsequent entries in each stream do
>> look uniformly distributed.
>>
>> You didn't say what your 'upstream process' was, but it is easy to come
>> up with seeds that give about the same first value:
>>
>> > Filter(function(s){set.seed(s);runif(1,17,26)>25.99}, 1:1)
>>  [1]  514  532 1951 2631 3974 4068 4229 6092 6432 7264 9090
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Fri, Nov 3, 2017 at 12:49 AM, Tirthankar Chakravarty <
>> tirthankar.li...@gmail.com> wrote:
>>
>>> This is cross-posted from SO (https://stackoverflow.com/q/4
>>> 7079702/1414455),
>>> but I now feel that this needs someone from R-Devel to help understand
>>> why
>>> this is happening.
>>>
>>> We are facing a weird situation in our code when using R's [`runif`][1]
>>> and
>>> setting seed with `set.seed` with the `kind = NULL` option (which
>>> resolves,
>>> unless I am mistaken, to `kind = "default"`; the default being
>>> `"Mersenne-Twister"`).
>>>
>>> We set the seed using (8 digit) unique IDs generated by an upstream
>>> system,
>>> before calling `runif`:
>>>
>>> seeds = c(
>>>   "86548915", "86551615", "86566163", "86577411", "86584144",
>>>   "86584272", "86620568", "86724613", "86756002", "86768593",
>>> "86772411",
>>>   "86781516", "86794389", "86805854", "86814600", "86835092",
>>> "86874179",
>>>   "86876466", "86901193", "86987847", "86988080")
>>>
>>> random_values = sapply(seeds, function(x) {
>>>   set.seed(x)
>>>   y = runif(1, 17, 26)
>>>   return(y)
>>> })
>>>
>>> This gives values that are **extremely** bunched together.
>>>
>>> > summary(random_values)
>>>Min. 1st Qu.  MedianMean 3rd Qu.Max.
>>>   25.13   25.36   25.66   25.58   25.83   25.94
>>>
>>> This behaviour of `runif` goes away when we use `kind =
>>> "Knuth-TAOCP-2002"`, and we get values that appear to be much more evenly
>>> spread out.
>>>
>>> random_values = sapply(seeds, function(x) {
>>>   set.seed(x, kind = "Knuth-TAOCP-2002")
>>>   y = runif(1, 17, 26)
>>>   return(y)
>>> })
>>>
>>> *Output omitted.*
>>>
>>> ---
>>>
>>> **The most interesting thing here is that this does not happen on Windows
>>> -- only happens on Ubuntu** (`sessionInfo` output for Ubuntu & Windows
>>> below).
>>>
>>> # Windows output: #
>>>
>>> > seeds = c(
>>> +   "86548915", "86551615", "86566163", "86577411", "86584144",
>>> +   "86584272", "86620568", "86724613", "86756002", "86768593",
>>> "86772411",
>>> +   "86781516", "86794389", "86805854", "86814600", "86835092",
>>> "86874179",
>>> +   "86876466", "86901193", "86987847", 

Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-03 Thread William Dunlap via R-devel
The random numbers in a stream initialized with one seed should have about
the desired distribution.  You don't win by changing the seed all the
time.  Your seeds caused the first numbers of a bunch of streams to be
about the same, but the second and subsequent entries in each stream do
look uniformly distributed.
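When per-ID reseeding cannot be avoided (e.g. for a stateless API), one workaround consistent with the observation above is to burn the first draw of each stream (a sketch, not a guarantee of stream independence):

```r
seeds <- c(86548915, 86551615, 86566163, 86577411, 86584144)
vals <- sapply(seeds, function(s) {
  set.seed(s)
  runif(2, 17, 26)[2]   # discard the first draw; keep the second
})
```
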

You didn't say what your 'upstream process' was, but it is easy to come up
with seeds that give about the same first value:

> Filter(function(s){set.seed(s);runif(1,17,26)>25.99}, 1:1)
 [1]  514  532 1951 2631 3974 4068 4229 6092 6432 7264 9090
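The same two ideas can be sketched outside R as well. Python's `random` module
also uses Mersenne-Twister (though its seeding scrambles differently, so the
seeds found here are unrelated to the R output above): first a search for seeds
whose *first* draw lands in a narrow band, mirroring the `Filter` one-liner,
then the practice Bill recommends, seeding once and taking every value from
that single stream. The seed range and the seed `86548915` are illustrative
choices, not anything from the original post.

```python
import random

# Search a seed range for seeds whose FIRST uniform draw on [17, 26)
# lands in a narrow band near the top -- the same idea as the R one-liner.
bunched_seeds = [s for s in range(1, 10001)
                 if random.Random(s).uniform(17, 26) > 25.99]

# Recommended practice: seed ONCE, then take every value you need from
# that single stream instead of reseeding before each draw.
rng = random.Random(86548915)          # one seed for the whole stream
stream_values = [rng.uniform(17, 26) for _ in range(20)]
```

By construction every found seed gives a first draw above 25.99, while the 20
values taken from the single stream spread out over the interval instead of
bunching.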



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Nov 3, 2017 at 12:49 AM, Tirthankar Chakravarty <
tirthankar.li...@gmail.com> wrote:

> This is cross-posted from SO (https://stackoverflow.com/q/47079702/1414455
> ),
> but I now feel that this needs someone from R-Devel to help understand why
> this is happening.
>
> We are facing a weird situation in our code when using R's [`runif`][1] and
> setting seed with `set.seed` with the `kind = NULL` option (which resolves,
> unless I am mistaken, to `kind = "default"`; the default being
> `"Mersenne-Twister"`).
>
> We set the seed using (8-digit) unique IDs generated by an upstream system,
> before calling `runif`:
>
> seeds = c(
>   "86548915", "86551615", "86566163", "86577411", "86584144",
>   "86584272", "86620568", "86724613", "86756002", "86768593",
> "86772411",
>   "86781516", "86794389", "86805854", "86814600", "86835092",
> "86874179",
>   "86876466", "86901193", "86987847", "86988080")
>
> random_values = sapply(seeds, function(x) {
>   set.seed(x)
>   y = runif(1, 17, 26)
>   return(y)
> })
>
> This gives values that are **extremely** bunched together.
>
> > summary(random_values)
>Min. 1st Qu.  MedianMean 3rd Qu.Max.
>   25.13   25.36   25.66   25.58   25.83   25.94
>
> This behaviour of `runif` goes away when we use `kind =
> "Knuth-TAOCP-2002"`, and we get values that appear to be much more evenly
> spread out.
>
> random_values = sapply(seeds, function(x) {
>   set.seed(x, kind = "Knuth-TAOCP-2002")
>   y = runif(1, 17, 26)
>   return(y)
> })
>
> *Output omitted.*
>
> ---
>
> **The most interesting thing here is that this does not happen on Windows
> -- only happens on Ubuntu** (`sessionInfo` output for Ubuntu & Windows
> below).
>
> # Windows output: #
>
> > seeds = c(
> +   "86548915", "86551615", "86566163", "86577411", "86584144",
> +   "86584272", "86620568", "86724613", "86756002", "86768593",
> "86772411",
> +   "86781516", "86794389", "86805854", "86814600", "86835092",
> "86874179",
> +   "86876466", "86901193", "86987847", "86988080")
> >
> > random_values = sapply(seeds, function(x) {
> +   set.seed(x)
> +   y = runif(1, 17, 26)
> +   return(y)
> + })
> >
> > summary(random_values)
>Min. 1st Qu.  MedianMean 3rd Qu.Max.
>   17.32   20.14   23.00   22.17   24.07   25.90
>
> Can someone help understand what is going on?
>
> Ubuntu
> --
>
> R version 3.4.0 (2017-04-21)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.2 LTS
>
> Matrix products: default
> BLAS: /usr/lib/libblas/libblas.so.3.6.0
> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8  LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8   LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8   LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8  LC_NAME=en_US.UTF-8
>  [9] LC_ADDRESS=en_US.UTF-8LC_TELEPHONE=en_US.UTF-8
> [11] LC_MEASUREMENT=en_US.UTF-8LC_IDENTIFICATION=en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats graphics  grDevices utils datasets
> methods   base
>
> other attached packages:
> [1] RMySQL_0.10.8   DBI_0.6-1
>  [3] jsonlite_1.4tidyjson_0.2.2
>  [5] optiRum_0.37.3  lubridate_1.6.0
>  [7] httr_1.2.1  gdata_2.18.0
>  [9] XLConnect_0.2-12XLConnectJars_0.2-12
> [11] data.table_1.10.4   stringr_1.2.0
> [13] readxl_1.0.0xlsx_0.5.7
> [15] xlsxjars_0.6.1  rJava_0.9-8
> [17] sqldf_0.4-10RSQLite_1.1-2
> [19] gsubfn_0.6-6proto_1.0.0
> [21] dplyr_0.5.0 purrr_0.2.4
> [23] readr_1.1.1 tidyr_0.6.3
> [25] tibble_1.3.0tidyverse_1.1.1
> [27] rBayesianOptimization_1.1.0 xgboost_0.6-4
> [29] MLmetrics_1.1.1 caret_6.0-76
> [31] ROCR_1.0-7  gplots_3.0.1
> [33] effects_3.1-2   pROC_1.10.0
> [35] pscl_1.4.9  lattice_0.20-35
> [37] MASS_7.3-47 ggplot2_2.2.1
>
> loaded via a namespace (and not attached):
> [1] splines_3.4.0  foreach_1.4.3  AUC_0.3.0
> 

Re: [Rd] Cannot Compute Box's M (Three Days Trying...)

2017-10-27 Thread William Dunlap via R-devel
Does it work if you supply the closing parenthesis on the call to boxM?
The parser says the input is incomplete, and a missing closing parenthesis
would cause that error.

// create a string command with that variable name.
String boxVariable = "boxM(boxMVariable [,-5], boxMVariable[,5]";

// try to execute the command...
// FAILS with org.rosuda.REngine.Rserve.RserveException: eval failed,
// request status: R parser: input incomplete
REXP theBoxMResult = rConnection.eval(boxVariable);   // FAILS <
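The "input incomplete" message means R's parser reached the end of the string
while a delimiter was still open. A quick, language-agnostic way to spot this
before sending a command to Rserve is to count opening and closing delimiters;
the sketch below (a naive check that ignores delimiters inside string literals
and comments, so treat it as a heuristic) applies it to the command string in
question.

```python
# The command string being discussed: it opens boxM( but never closes it.
cmd = 'boxM(boxMVariable [,-5], boxMVariable[,5]'

def unbalanced(s: str) -> dict:
    """Count unmatched opening delimiters of each kind in s (naive count;
    does not account for delimiters inside string literals or comments)."""
    return {pair: s.count(pair[0]) - s.count(pair[1])
            for pair in ("()", "[]")}

depth = unbalanced(cmd)   # one '(' is never closed; brackets balance
```

Appending the missing `)` brings the parenthesis count back to zero, which is
exactly the fix being suggested.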

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 27, 2017 at 12:41 PM, Morkus via R-devel 
wrote:

> It can't be this hard, right? I really need a shove in the right direction
> here. Been spinning wheels for three days. Cannot get past the errors.
>
> I'm doing something wrong, obviously, since I can easily compute Box's M
> right there in RStudio.
>
> But I don't see what is wrong with the coding equivalent below.
>
> The entire code snippet is below. It fails on the call that computes the
> boxM statistic.
>
> PLEASE HELP!!!
>
> Thanks in advance,
>
> -
>
> rConnection.eval("library('biotools')");
>
> String inputIris = "5.1,3.5,1.4,0.2,setosa\n" +
> "4.9,3,1.4,0.2,setosa\n" +
> "4.7,3.2,1.3,0.2,setosa\n" +
> "4.6,3.1,1.5,0.2,setosa\n" +
> "5,3.6,1.4,0.2,setosa\n" +
> "5.4,3.9,1.7,0.4,setosa\n" +
> "4.6,3.4,1.4,0.3,setosa\n" +
> "5,3.4,1.5,0.2,setosa\n" +
> "4.4,2.9,1.4,0.2,setosa\n" +
> "4.9,3.1,1.5,0.1,setosa\n" +
> "5.4,3.7,1.5,0.2,setosa\n" +
> "4.8,3.4,1.6,0.2,setosa\n" +
> "4.8,3,1.4,0.1,setosa\n" +
> "4.3,3,1.1,0.1,setosa\n" +
> "5.8,4,1.2,0.2,setosa\n" +
> "5.7,4.4,1.5,0.4,setosa\n" +
> "5.4,3.9,1.3,0.4,setosa\n" +
> "5.1,3.5,1.4,0.3,setosa\n" +
> "5.7,3.8,1.7,0.3,setosa\n" +
> "5.1,3.8,1.5,0.3,setosa\n" +
> "5.4,3.4,1.7,0.2,setosa\n" +
> "5.1,3.7,1.5,0.4,setosa\n" +
> "4.6,3.6,1,0.2,setosa\n" +
> "5.1,3.3,1.7,0.5,setosa\n" +
> "4.8,3.4,1.9,0.2,setosa\n" +
> "5,3,1.6,0.2,setosa\n" +
> "5,3.4,1.6,0.4,setosa\n" +
> "5.2,3.5,1.5,0.2,setosa\n" +
> "5.2,3.4,1.4,0.2,setosa\n" +
> "4.7,3.2,1.6,0.2,setosa\n" +
> "4.8,3.1,1.6,0.2,setosa\n" +
> "5.4,3.4,1.5,0.4,setosa\n" +
> "5.2,4.1,1.5,0.1,setosa\n" +
> "5.5,4.2,1.4,0.2,setosa\n" +
> "4.9,3.1,1.5,0.2,setosa\n" +
> "5,3.2,1.2,0.2,setosa\n" +
> "5.5,3.5,1.3,0.2,setosa\n" +
> "4.9,3.6,1.4,0.1,setosa\n" +
> "4.4,3,1.3,0.2,setosa\n" +
> "5.1,3.4,1.5,0.2,setosa\n" +
> "5,3.5,1.3,0.3,setosa\n" +
> "4.5,2.3,1.3,0.3,setosa\n" +
> "4.4,3.2,1.3,0.2,setosa\n" +
> "5,3.5,1.6,0.6,setosa\n" +
> "5.1,3.8,1.9,0.4,setosa\n" +
> "4.8,3,1.4,0.3,setosa\n" +
> "5.1,3.8,1.6,0.2,setosa\n" +
> "4.6,3.2,1.4,0.2,setosa\n" +
> "5.3,3.7,1.5,0.2,setosa\n" +
> "5,3.3,1.4,0.2,setosa\n" +
> "7,3.2,4.7,1.4,versicolor\n" +
> "6.4,3.2,4.5,1.5,versicolor\n" +
> "6.9,3.1,4.9,1.5,versicolor\n" +
> "5.5,2.3,4,1.3,versicolor\n" +
> "6.5,2.8,4.6,1.5,versicolor\n" +
> "5.7,2.8,4.5,1.3,versicolor\n" +
> "6.3,3.3,4.7,1.6,versicolor\n" +
> "4.9,2.4,3.3,1,versicolor\n" +
> "6.6,2.9,4.6,1.3,versicolor\n" +
> "5.2,2.7,3.9,1.4,versicolor\n" +
> "5,2,3.5,1,versicolor\n" +
> "5.9,3,4.2,1.5,versicolor\n" +
> "6,2.2,4,1,versicolor\n" +
> "6.1,2.9,4.7,1.4,versicolor\n" +
> "5.6,2.9,3.6,1.3,versicolor\n" +
> "6.7,3.1,4.4,1.4,versicolor\n" +
> "5.6,3,4.5,1.5,versicolor\n" +
> "5.8,2.7,4.1,1,versicolor\n" +
> "6.2,2.2,4.5,1.5,versicolor\n" +
> "5.6,2.5,3.9,1.1,versicolor\n" +
> "5.9,3.2,4.8,1.8,versicolor\n" +
> "6.1,2.8,4,1.3,versicolor\n" +
> "6.3,2.5,4.9,1.5,versicolor\n" +
> "6.1,2.8,4.7,1.2,versicolor\n" +
> "6.4,2.9,4.3,1.3,versicolor\n" +
> "6.6,3,4.4,1.4,versicolor\n" +
> "6.8,2.8,4.8,1.4,versicolor\n" +
> "6.7,3,5,1.7,versicolor\n" +
> "6,2.9,4.5,1.5,versicolor\n" +
> "5.7,2.6,3.5,1,versicolor\n" +
> "5.5,2.4,3.8,1.1,versicolor\n" +
> "5.5,2.4,3.7,1,versicolor\n" +
> "5.8,2.7,3.9,1.2,versicolor\n" +
> "6,2.7,5.1,1.6,versicolor\n" +
> "5.4,3,4.5,1.5,versicolor\n" +
> "6,3.4,4.5,1.6,versicolor\n" +
> "6.7,3.1,4.7,1.5,versicolor\n" +
> "6.3,2.3,4.4,1.3,versicolor\n" +
> "5.6,3,4.1,1.3,versicolor\n" +
> "5.5,2.5,4,1.3,versicolor\n" +
> "5.5,2.6,4.4,1.2,versicolor\n" +
> "6.1,3,4.6,1.4,versicolor\n" +
> "5.8,2.6,4,1.2,versicolor\n" +
> "5,2.3,3.3,1,versicolor\n" +
> "5.6,2.7,4.2,1.3,versicolor\n" +
> "5.7,3,4.2,1.2,versicolor\n" +
> "5.7,2.9,4.2,1.3,versicolor\n" +
> "6.2,2.9,4.3,1.3,versicolor\n" +
> "5.1,2.5,3,1.1,versicolor\n" +
> "5.7,2.8,4.1,1.3,versicolor\n" +
> "6.3,3.3,6,2.5,virginica\n" +
> "5.8,2.7,5.1,1.9,virginica\n" +
> "7.1,3,5.9,2.1,virginica\n" +
> "6.3,2.9,5.6,1.8,virginica\n" +
> "6.5,3,5.8,2.2,virginica\n" +
> "7.6,3,6.6,2.1,virginica\n" +
> "4.9,2.5,4.5,1.7,virginica\n" +
> "7.3,2.9,6.3,1.8,virginica\n" +
> "6.7,2.5,5.8,1.8,virginica\n" +
> "7.2,3.6,6.1,2.5,virginica\n" +
> "6.5,3.2,5.1,2,virginica\n" +
> "6.4,2.7,5.3,1.9,virginica\n" +
> "6.8,3,5.5,2.1,virginica\n" +
> "5.7,2.5,5,2,virginica\n" +
> "5.8,2.8,5.1,2.4,virginica\n" +
> "6.4,3.2,5.3,2.3,virginica\n" +
> "6.5,3,5.5,1.8,virginica\n" +
> 
