[Rd] Support for as(x, "raw")
Hi,

as(x, "TYPE") is supported and does as.TYPE(x) for all vector types except raw. For example, all the following coercions work and do what you'd expect: as(1L, "logical"), as(1L, "double"), as(1L, "complex"), as(1L, "character"), as(1L, "list"). But as(1L, "raw") does not:

> as(1L, "raw")
Error in as(1L, "raw") : no method or default for coercing “integer” to “raw”

Even though as.raw(1L) works:

> as.raw(1L)
[1] 01

Is there any particular reason for that, or would it be reasonable to define a coerce() method from ANY to raw, like it's been done for all the other vector types?

> selectMethod(coerce, c("ANY", "logical"))
Method Definition:

function (from, to, strict = TRUE)
{
    value <- as.logical(from)
    if (strict)
        attributes(value) <- NULL
    value
}

Signatures:
        from  to
target  "ANY" "logical"
defined "ANY" "logical"

...

> selectMethod(coerce, c("ANY", "list"))
Method Definition:

function (from, to, strict = TRUE)
{
    value <- as.list(from)
    if (strict)
        attributes(value) <- NULL
    value
}

Signatures:
        from  to
target  "ANY" "list"
defined "ANY" "list"

> selectMethod(coerce, c("ANY", "raw"))
Error in selectMethod(coerce, c("ANY", "raw")) :
  no method found for signature ANY, raw

Thanks,
H.

-- 
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
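For what it's worth, a method mirroring the ANY,logical and ANY,list definitions above might look like the sketch below. This is only an illustration of the suggestion, not something that exists in base R, and such coercions are more commonly registered via setAs():

```r
## Hypothetical ANY -> raw coerce() method, patterned after the existing
## ANY -> logical and ANY -> list methods shown above:
setMethod("coerce", c("ANY", "raw"), function(from, to, strict = TRUE)
{
    value <- as.raw(from)
    if (strict)
        attributes(value) <- NULL
    value
})

as(1L, "raw")  # would then delegate to as.raw() and return 01
```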
Re: [Rd] Question regarding .make_numeric_version with non-character input
On 4/25/24 07:04, Kurt Hornik wrote:
...
> Sure, I'll look into adding something. (Too late for 4.4.0, of course.)
>
> Best
> -k

Great. Thanks!

H.

-- 
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
Re: [Rd] Question regarding .make_numeric_version with non-character input
On 4/24/24 23:07, Kurt Hornik wrote:
>>>>>> Hervé Pagès writes:
>> Hi Kurt,
>> Is it intended that numeric_version() returns an error by default on
>> non-character input in R 4.4.0?
> Dear Herve, yes, that's the intention.
>
>> It seems that I can turn this into a warning by setting
>> _R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_=false but I don't
>> seem to be able to find any of this mentioned in the NEWS file.
> That's what I added for smoothing the transition: it will be removed
> from the trunk shortly.

Thanks for clarifying. Could this be documented in the NEWS file? This is a breaking change (it breaks a couple of Bioconductor packages) and people are not going to set this environment variable if they are not aware of it.

Thanks again,
H.

> Best
> -k
>
>> Thanks,
>> H.
>> On 4/1/24 05:28, Kurt Hornik wrote:
>>>>>> Andrea Gilardi via R-devel writes:
>>> Thanks: should be fixed now in the trunk.
>>> Best
>>> -k
>>> Thank you very much Dirk for your kind words and for confirming the bug.
>>> Next week I will open a new issue on Bugzilla adding the related patch.
>>> Kind regards
>>> Andrea
>>> On 29/03/2024 20:14, Dirk Eddelbuettel wrote:
>>>> On 29 March 2024 at 17:56, Andrea Gilardi via R-devel wrote:
>>>> | Dear all,
>>>> |
>>>> | I have a question regarding the R-devel version of the
>>>> | .make_numeric_version() function. As far as I can understand, the
>>>> | current code
>>>> | (https://github.com/wch/r-source/blob/66b91578dfc85140968f07dd4e72d8cb8a54f4c6/src/library/base/R/version.R#L50-L56)
>>>> | runs the following steps in case of non-character input:
>>>> |
>>>> | 1. It creates a message named msg using gettextf.
>>>> | 2. Such object is then passed to stop(msg) or warning(msg) according
>>>> | to the following condition
>>>> |
>>>> | tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != "false")
>>>> |
>>>> | However, I don't understand the previous code since the output of
>>>> | Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != "false"
>>>> | is just a boolean value and tolower() will just return "true" or "false".
>>>> | Maybe the intended code is
>>>> | tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_")) != "false" ?
>>>> | Or am I missing something?
>>>>
>>>> Yes, agreed -- good catch. In full, the code is (removing leading
>>>> whitespace, and putting it back onto single lines)
>>>>
>>>> msg <- gettextf("invalid non-character version specification 'x' (type: %s)", typeof(x))
>>>> if(tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != "false"))
>>>>     stop(msg, domain = NA)
>>>> else
>>>>     warning(msg, domain = NA, immediate. = TRUE)
>>>>
>>>> where msg is constant (but reflecting language settings via standard
>>>> i18n) and as you note, the parentheses appear wrong. What was intended
>>>> is likely
>>>>
>>>> msg <- gettextf("invalid non-character version specification 'x' (type: %s)", typeof(x))
>>>> if(tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_")) != "false")
>>>>     stop(msg, domain = NA)
>>>> else
>>>>     warning(msg, domain = NA, immediate. = TRUE)
>>>>
>>>> If you have used bugzilla before and have a handle, maybe file a bug
>>>> report with this as patch at https://bugs.r-project.org/
>>>>
>>>> Dirk
>>
>> --
>> Hervé Pagès
>> Bioconductor Core Team
>> hpages.on.git...@gmail.com

-- 
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
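The precedence bug Dirk describes can be seen with a plain string in place of the Sys.getenv() call (the value "FALSE" below is just a hypothetical setting of the environment variable):

```r
## Suppose the environment variable is set to "FALSE":
x <- "FALSE"

tolower(x != "false")  # buggy form: compares first ("FALSE" != "false"
                       # is TRUE), then tolower(TRUE) gives the string
                       # "true", so the stop() branch is always taken

tolower(x) != "false"  # intended form: tolower("FALSE") is "false",
                       # so this is FALSE and warning() is reached
```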
Re: [Rd] Question regarding .make_numeric_version with non-character input
Hi Kurt,

Is it intended that numeric_version() returns an error by default on non-character input in R 4.4.0? It seems that I can turn this into a warning by setting _R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_=false but I don't seem to be able to find any of this mentioned in the NEWS file.

Thanks,
H.

On 4/1/24 05:28, Kurt Hornik wrote:
>>>>>> Andrea Gilardi via R-devel writes:
> Thanks: should be fixed now in the trunk.
>
> Best
> -k
>
>> Thank you very much Dirk for your kind words and for confirming the bug.
>> Next week I will open a new issue on Bugzilla adding the related patch.
>> Kind regards
>> Andrea
>> On 29/03/2024 20:14, Dirk Eddelbuettel wrote:
>>> On 29 March 2024 at 17:56, Andrea Gilardi via R-devel wrote:
>>> | Dear all,
>>> |
>>> | I have a question regarding the R-devel version of the
>>> | .make_numeric_version() function. As far as I can understand, the
>>> | current code
>>> | (https://github.com/wch/r-source/blob/66b91578dfc85140968f07dd4e72d8cb8a54f4c6/src/library/base/R/version.R#L50-L56)
>>> | runs the following steps in case of non-character input:
>>> |
>>> | 1. It creates a message named msg using gettextf.
>>> | 2. Such object is then passed to stop(msg) or warning(msg) according
>>> | to the following condition
>>> |
>>> | tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != "false")
>>> |
>>> | However, I don't understand the previous code since the output of
>>> | Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != "false"
>>> | is just a boolean value and tolower() will just return "true" or "false".
>>> | Maybe the intended code is
>>> | tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_")) != "false" ?
>>> | Or am I missing something?
>>>
>>> Yes, agreed -- good catch. In full, the code is (removing leading
>>> whitespace, and putting it back onto single lines)
>>>
>>> msg <- gettextf("invalid non-character version specification 'x' (type: %s)", typeof(x))
>>> if(tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != "false"))
>>>     stop(msg, domain = NA)
>>> else
>>>     warning(msg, domain = NA, immediate. = TRUE)
>>>
>>> where msg is constant (but reflecting language settings via standard
>>> i18n) and as you note, the parentheses appear wrong. What was intended
>>> is likely
>>>
>>> msg <- gettextf("invalid non-character version specification 'x' (type: %s)", typeof(x))
>>> if(tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_")) != "false")
>>>     stop(msg, domain = NA)
>>> else
>>>     warning(msg, domain = NA, immediate. = TRUE)
>>>
>>> If you have used bugzilla before and have a handle, maybe file a bug
>>> report with this as patch at https://bugs.r-project.org/
>>>
>>> Dirk

-- 
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
Re: [Rd] Spurious warning in as.data.frame.factor()
Thanks Martin. We'll update the BioC builders to the latest R devel soon.

Cheers,
H.

On 3/15/24 10:26, Martin Maechler wrote:
>>>>>> Martin Maechler on Fri, 15 Mar 2024 11:24:22 +0100 writes:
>>>>>> Ivan Krylov on Thu, 14 Mar 2024 14:17:38 +0300 writes:
>
>> On Thu, 14 Mar 2024 10:41:54 +0100 Martin Maechler wrote:
>
>>> Anybody trying S7 examples and see if they work w/o producing
>>> wrong warnings?
>
>> It looks like this is not applicable to S7. If I overwrite
>> as.data.frame with a newly created S7 generic, it fails to dispatch on
>> existing S3 classes:
>
>> new_generic('as.data.frame', 'x')(factor(1))
>> # Error: Can't find method for `as.data.frame(S3)`.
>
>> But there is no need to overwrite the generic, because S7 classes
>> should work with existing S3 generics:
>
>> foo <- new_class('foo', parent = class_double)
>> method(as.data.frame, foo) <- function(x) structure(
>>     # this is probably not generally correct
>>     list(x),
>>     names = deparse1(substitute(x)),
>>     row.names = seq_len(length(x)),
>>     class = 'data.frame'
>> )
>> str(as.data.frame(foo(pi)))
>> # 'data.frame': 1 obs. of 1 variable:
>> #  $ x: num 3.14
>
>> So I think that is nothing to break because S7 methods for
>> as.data.frame will rely on S3 for dispatch.
>
> Yes, as it should be. Thank you for checking.
>
>>>> The patch passes make check-devel, but I'm not sure how to safely
>>>> put setGeneric('as.data.frame'); as.data.frame(factor(1:10)) in a
>>>> regression test.
>>>
>>> {What's the danger/problem? We do have "similar" tests in both
>>>     src/library/methods/tests/*.R
>>>     tests/reg-S4.R
>>> -- maybe we can discuss bi-laterally (or here, as you prefer)}
>
>> This might be educational for other people wanting to add a regression
>> test to their patch. I see that tests/reg-tests-1e.R is already running
>> under options(warn = 2), so if I add the following near line 750
>> ("Deprecation of *direct* calls to as.data.frame.")...
>
>> # Should not warn for a call from a derivedDefaultMethod to the raw
>> # S3 method -- implementation detail of S4 dispatch
>> setGeneric('as.data.frame')
>> as.data.frame(factor(1))
>
>> ...then as.data.frame will remain an S4 generic. Should the test then
>> rm(as.data.frame) and keep going? (Or even keep the S4 generic?) Is
>> there any hidden state I may be breaking for the rest of the test this
>> way? The test does pass like this, so this may be worrying about nothing.
>
> Indeed, this could be educational; I think just adding
>
>     removeGeneric('as.data.frame')
>
> is appropriate here as it is self-explaining and should not leave much traces.
>
> I'm about to test this in reg-tests-1e.R and with make check-all
> and commit later today, thanking you, Ivan!
>
> This has been committed to R-devel svn rev 86139 now.
>
> So these spurious warnings in situations where as.data.frame()
> is an S4 generic --- notably for the many Bioconductor packages
> depending on {BiocGenerics} --- should disappear within 24
> hours or less.
>
> Martin

-- 
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
[Rd] Spurious warning in as.data.frame.factor()
Hi,

The acrobatics that as.data.frame.factor() is going thru in order to recognize a direct call don't play nice if as.data.frame() is an S4 generic:

df <- as.data.frame(factor(11:12))

suppressPackageStartupMessages(library(BiocGenerics))
isGeneric("as.data.frame")
# [1] TRUE

df <- as.data.frame(factor(11:12))
# Warning message:
# In as.data.frame.factor(factor(11:12)) :
#   Direct call of 'as.data.frame.factor()' is deprecated. Use
#   'as.data.frame.vector()' or 'as.data.frame()' instead

This spurious warning showed up on the recent Bioconductor daily build reports after we've updated the build machines to the latest R devel. It's causing some confusion and breaks at least one unit test.

Thanks,
H.

> sessionInfo()
R Under development (unstable) (2024-03-06 r86056)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB              LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocGenerics_0.49.1

loaded via a namespace (and not attached):
[1] compiler_4.4.0

-- 
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
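The warning can also be reproduced without any Bioconductor package, along the lines of the regression test discussed upthread (this assumes an R devel build from before the r86139 fix):

```r
## Minimal reproduction without BiocGenerics (only warns on R devel builds
## predating the r86139 fix): promoting as.data.frame() to an S4 generic
## is enough to trigger the spurious deprecation warning.
setGeneric("as.data.frame")
df <- as.data.frame(factor(11:12))   # warned before the fix
removeGeneric("as.data.frame")       # clean up, as in the regression test
```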
Re: [Rd] NOTE: multiple local function definitions for ?fun? with different formal arguments
Thanks. Workarounds are interesting but... what's the point of the NOTE in the first place?

H.

On 2/4/24 09:07, Duncan Murdoch wrote:
> On 04/02/2024 10:55 a.m., Izmirlian, Grant (NIH/NCI) [E] via R-devel wrote:
>> Well you can see that yeast is exactly weekday you have. The way out
>> is to just not name the result
>
> I think something happened to your explanation...
>
>> toto <- function(mode)
>> {
>>     ifelse(mode == 1,
>>         function(a,b) a*b,
>>         function(u, v, w) (u + v) / w)
>> }
>
> It's a bad idea to use ifelse() when you really want if() ... else ... .
> In this case it works, but it doesn't always. So the workaround
> should be
>
> toto <- function(mode)
> {
>     if(mode == 1)
>         function(a,b) a*b
>     else
>         function(u, v, w) (u + v) / w
> }
>
>> From: Grant Izmirlian
>> Date: Sun, Feb 4, 2024, 10:44 AM
>> To: "Izmirlian, Grant (NIH/NCI) [E]"
>> Subject: Fwd: [EXTERNAL] R-devel Digest, Vol 252, Issue 2
>>
>> Hi,
>>
>> I just ran into this 'R CMD check' NOTE for the first time:
>>
>> * checking R code for possible problems ... NOTE
>> toto: multiple local function definitions for ‘fun’ with different
>>   formal arguments
>>
>> The "offending" code is something like this (simplified from the real code):
>>
>> toto <- function(mode)
>> {
>>     if (mode == 1)
>>         fun <- function(a, b) a*b
>>     else
>>         fun <- function(u, v, w) (u + v) / w
>>     fun
>> }
>>
>> Is that NOTE really intended? Hard to see why this code would be
>> considered "wrong".
>>
>> I know it's just a NOTE but still...
>
> I agree it's a false positive, but the issue is that you have a
> function object in your function which can't be called
> unconditionally. The workaround doesn't create such an object.
>
> Recognizing that your function never tries to call fun requires global
> inspection of toto(), and most of the checks are based on local
> inspection.
>
> Duncan Murdoch

-- 
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
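Duncan's if()/else form can be checked quickly: because each branch returns the function directly instead of binding it to a name, no local name gets two definitions with different formals, so the NOTE no longer applies:

```r
## The workaround from the reply above: return the function directly
## rather than assigning it to 'fun' in both branches.
toto <- function(mode)
{
    if (mode == 1)
        function(a, b) a * b
    else
        function(u, v, w) (u + v) / w
}

toto(1)(2, 3)     # 6
toto(2)(1, 2, 3)  # 1
```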
[Rd] NOTE: multiple local function definitions for ‘fun’ with different formal arguments
Hi,

I just ran into this 'R CMD check' NOTE for the first time:

* checking R code for possible problems ... NOTE
toto: multiple local function definitions for ‘fun’ with different
  formal arguments

The "offending" code is something like this (simplified from the real code):

toto <- function(mode)
{
    if (mode == 1)
        fun <- function(a, b) a*b
    else
        fun <- function(u, v, w) (u + v) / w
    fun
}

Is that NOTE really intended? Hard to see why this code would be considered "wrong".

I know it's just a NOTE but still...

Thanks,
H.

-- 
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
Re: [Rd] Should subsetting named vector return named vector including named unmatched elements?
Never been a big fan of this behavior either, but maybe the intention was to make it easier to distinguish between 2 types of NAs in the result: those that were present in the original vector vs those that are introduced by an unmatched subscript. Like in this example:

x <- setNames(c(101:108, NA), letters[1:9])
x
#   a   b   c   d   e   f   g   h   i
# 101 102 103 104 105 106 107 108  NA

x[c("g", "k", "a", "i")]
#   g <NA>   a    i
# 107   NA 101   NA

The first NA is the result of an unmatched subscript, while the second one comes from 'x'. This is of limited interest though. In most real-world applications I've worked on, we actually need to "fix" the names of the result.

Best,
H.

On 1/18/24 11:51, Jiří Moravec wrote:
> Subsetting a vector (including lists) returns the same number of
> elements as the subsetting vector, including unmatched elements, which
> are reported as `NA` or `NULL` (in the case of lists).
>
> Consider:
>
> ```
> menu = list(
>     "bacon" = "foo",
>     "eggs" = "bar",
>     "beans" = "baz"
> )
>
> select = c("bacon", "eggs", "spam")
>
> menu[select]
> # $bacon
> # [1] "foo"
> #
> # $eggs
> # [1] "bar"
> #
> # $<NA>
> # NULL
> ```
>
> Wouldn't it be more logical to return a named vector/list including the
> names of unmatched elements when subsetting using names? After all,
> the unmatched elements are already returned. I.e., the output would
> look like this:
>
> ```
> menu[select]
> # $bacon
> # [1] "foo"
> #
> # $eggs
> # [1] "bar"
> #
> # $spam
> # NULL
> ```
>
> The simple fix `menu[select] |> setNames(select)` solves this, but it feels
> to me like something that could be a default behaviour.
>
> On a slightly unrelated note, when I was asking if there is a better
> solution, the `menu[select]` seems to allocate more memory than
> `menu_env = list2env(menu); mget(select, envir = menu, ifnotfound = list(NULL))`.
> Or the sapply solution. Is this a benchmarking artifact?
>
> https://stackoverflow.com/q/77828678/4868692

-- 
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
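The setNames() fix mentioned in the reply above, applied to that named-vector example:

```r
## Restoring the unmatched names with setNames(), as suggested above:
x <- setNames(c(101:108, NA), letters[1:9])
select <- c("g", "k", "a", "i")
setNames(x[select], select)
#   g   k   a   i
# 107  NA 101  NA
```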
Re: [Rd] 'R CMD INSTALL' keeps going on despite serious errors, and returns exit code 0
I see. We'll update soon. Thanks Martin.

On 11/4/23 06:52, Martin Maechler wrote:
>>>>>> Hervé Pagès on Fri, 3 Nov 2023 15:10:40 -0700 writes:
>
> > Hi list,
> > Here is an example:
>
> > hpages@XPS15:~$ R CMD INSTALL CoreGx
> > * installing to library ‘/home/hpages/R/R-4.4.r85388/site-library’
>                                            ^^^
> Yes, this bad behavior was the case for a short time (too
> long, my fault !!) in R-devel.
>
> But that, svn rev 85388, was *long* ago (close to 2 weeks):
> Current R-devel is 85471.
> (The bug was "only" in 382--388, fixed in 389 -- you were really unlucky!)
>
> Still, I'm sorry that you were accidentally affected, too.
> Martin
>
> > * installing *source* package ‘CoreGx’ ...
> > ** using staged installation
> > ** R
> > ** data
> > *** moving datasets to lazyload DB
> > ** inst
> > ** byte-compile and prepare package for lazy loading
> > Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
> >   arguments (‘verbose’) after ‘...’ in the generic must appear in the
> >   method, in the same place at the end of the argument list
> > Error: unable to load R code in package ‘CoreGx’
> > ** help
> > *** installing help indices
> > ** building package indices
> > ** installing vignettes
> > ** testing if installed package can be loaded from temporary location
> > Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
> >   arguments (‘verbose’) after ‘...’ in the generic must appear in the
> >   method, in the same place at the end of the argument list
> > Error: package or namespace load failed for ‘CoreGx’:
> >  unable to load R code in package ‘CoreGx’
> > Error: loading failed
> > ** testing if installed package can be loaded from final location
> > Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
> >   arguments (‘verbose’) after ‘...’ in the generic must appear in the
> >   method, in the same place at the end of the argument list
> > Error: package or namespace load failed for ‘CoreGx’:
> >  unable to load R code in package ‘CoreGx’
> > Error: loading failed
> > Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
> >   arguments (‘verbose’) after ‘...’ in the generic must appear in the
> >   method, in the same place at the end of the argument list
> > Error: unable to load R code in package ‘CoreGx’
> > ** testing if installed package keeps a record of temporary
> >   installation path
> > * DONE (CoreGx)
>
> > Many serious errors were ignored. Plus the command returned exit code 0:
>
> > hpages@XPS15:~$ echo $?
> > 0
>
> > This is with R 4.4, that BioC 3.19 will be based on and that we only
> > started to use recently for our daily builds.
>
> > Strangely, we only see this on Linux. On Windows and Mac, we get the
> > usual hard error, as expected. See:
>
> > - https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/nebbiolo1-install.html
> > - https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/palomino3-install.html
> > - https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/merida1-install.html
>
> > To reproduce:
>
> > library(remotes)
> > install_git("https://git.bioconductor.org/packages/CoreGx")
>
> > Thanks,
> > H.
>
> >> sessionInfo()
> > R Under development (unstable) (2023-10-22 r85388)
> > Platform: x86_64-pc-linux-gnu
> > Running under: Ubuntu 23.10
>
> > Matrix products: default
> > BLAS:   /home/hpages/R/R-4.4.r85388/lib/libRblas.so
> > LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
>
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >  [7] LC_
Re: [Rd] 'R CMD INSTALL' keeps going on despite serious errors, and returns exit code 0
Forgot to mention that the package actually got installed, but is unloadable (not surprisingly):

> "CoreGx" %in% rownames(installed.packages())
[1] TRUE
> suppressWarnings(suppressMessages(library(CoreGx)))
Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
  arguments (‘verbose’) after ‘...’ in the generic must appear in the
  method, in the same place at the end of the argument list
Error: package or namespace load failed for ‘CoreGx’:
 unable to load R code in package ‘CoreGx’

Best,
H.

On 11/3/23 15:10, Hervé Pagès wrote:
>
> Hi list,
>
> Here is an example:
>
> hpages@XPS15:~$ R CMD INSTALL CoreGx
> * installing to library ‘/home/hpages/R/R-4.4.r85388/site-library’
> * installing *source* package ‘CoreGx’ ...
> ** using staged installation
> ** R
> ** data
> *** moving datasets to lazyload DB
> ** inst
> ** byte-compile and prepare package for lazy loading
> Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
>   arguments (‘verbose’) after ‘...’ in the generic must appear in the
>   method, in the same place at the end of the argument list
> Error: unable to load R code in package ‘CoreGx’
> ** help
> *** installing help indices
> ** building package indices
> ** installing vignettes
> ** testing if installed package can be loaded from temporary location
> Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
>   arguments (‘verbose’) after ‘...’ in the generic must appear in the
>   method, in the same place at the end of the argument list
> Error: package or namespace load failed for ‘CoreGx’:
>  unable to load R code in package ‘CoreGx’
> Error: loading failed
> ** testing if installed package can be loaded from final location
> Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
>   arguments (‘verbose’) after ‘...’ in the generic must appear in the
>   method, in the same place at the end of the argument list
> Error: package or namespace load failed for ‘CoreGx’:
>  unable to load R code in package ‘CoreGx’
> Error: loading failed
> Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
>   arguments (‘verbose’) after ‘...’ in the generic must appear in the
>   method, in the same place at the end of the argument list
> Error: unable to load R code in package ‘CoreGx’
> ** testing if installed package keeps a record of temporary
>   installation path
> * DONE (CoreGx)
>
> Many serious errors were ignored. Plus the command returned exit code 0:
>
> hpages@XPS15:~$ echo $?
> 0
>
> This is with R 4.4, that BioC 3.19 will be based on and that we only
> started to use recently for our daily builds.
>
> Strangely, we only see this on Linux. On Windows and Mac, we get the
> usual hard error, as expected. See:
>
> - https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/nebbiolo1-install.html
> - https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/palomino3-install.html
> - https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/merida1-install.html
>
> To reproduce:
>
> library(remotes)
> install_git("https://git.bioconductor.org/packages/CoreGx")
>
> Thanks,
>
> H.
>
> > sessionInfo()
> R Under development (unstable) (2023-10-22 r85388)
> Platform: x86_64-pc-linux-gnu
> Running under: Ubuntu 23.10
>
> Matrix products: default
> BLAS:   /home/hpages/R/R-4.4.r85388/lib/libRblas.so
> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> time zone: America/Los_Angeles
> tzcode source: system (glibc)
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] remotes_2.4.2.1
>
> loaded via a namespace (and not attached):
> [1] processx_3.8.2    compiler_4.4.0    R6_2.5.1          rprojroot_2.0.3
> [5] cli_3.6.1         prettyunits_1.2.0 tools_4.4.0       crayon_1.5.2
> [9] desc_1.4.2        callr_3.7.3       pkgbuild_1.4.2    ps_1.7.5
>
> --
> Hervé Pagès
>
> Bioconductor Core Team
> hpages.on.git...@gmail.com

-- 
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
[Rd] 'R CMD INSTALL' keeps going on despite serious errors, and returns exit code 0
Hi list,

Here is an example:

hpages@XPS15:~$ R CMD INSTALL CoreGx
* installing to library ‘/home/hpages/R/R-4.4.r85388/site-library’
* installing *source* package ‘CoreGx’ ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
** inst
** byte-compile and prepare package for lazy loading
Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
  arguments (‘verbose’) after ‘...’ in the generic must appear in the
  method, in the same place at the end of the argument list
Error: unable to load R code in package ‘CoreGx’
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
  arguments (‘verbose’) after ‘...’ in the generic must appear in the
  method, in the same place at the end of the argument list
Error: package or namespace load failed for ‘CoreGx’:
 unable to load R code in package ‘CoreGx’
Error: loading failed
** testing if installed package can be loaded from final location
Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
  arguments (‘verbose’) after ‘...’ in the generic must appear in the
  method, in the same place at the end of the argument list
Error: package or namespace load failed for ‘CoreGx’:
 unable to load R code in package ‘CoreGx’
Error: loading failed
Error : in method for ‘updateObject’ with signature ‘object="CoreSet"’:
  arguments (‘verbose’) after ‘...’ in the generic must appear in the
  method, in the same place at the end of the argument list
Error: unable to load R code in package ‘CoreGx’
** testing if installed package keeps a record of temporary installation path
* DONE (CoreGx)

Many serious errors were ignored. Plus the command returned exit code 0:

hpages@XPS15:~$ echo $?
0

This is with R 4.4, that BioC 3.19 will be based on and that we only started to use recently for our daily builds.

Strangely, we only see this on Linux. On Windows and Mac, we get the usual hard error, as expected. See:

- https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/nebbiolo1-install.html
- https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/palomino3-install.html
- https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/merida1-install.html

To reproduce:

library(remotes)
install_git("https://git.bioconductor.org/packages/CoreGx")

Thanks,

H.

> sessionInfo()
R Under development (unstable) (2023-10-22 r85388)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 23.10

Matrix products: default
BLAS:   /home/hpages/R/R-4.4.r85388/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] remotes_2.4.2.1

loaded via a namespace (and not attached):
 [1] processx_3.8.2    compiler_4.4.0    R6_2.5.1          rprojroot_2.0.3
 [5] cli_3.6.1         prettyunits_1.2.0 tools_4.4.0       crayon_1.5.2
 [9] desc_1.4.2        callr_3.7.3       pkgbuild_1.4.2    ps_1.7.5

-- 
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
Re: [Rd] dim<-() changed in R-devel; no longer removing "dimnames" when doing dim(x) <- dim(x)
Hi Martin, Henrik, I actually like this change. Makes a lot of sense IMO that dim(x) <- dim(x) be a no-op, or, more generally, that foo(x) <- foo(x) be a no-op for any setter/getter combo. FWIW S4Arrays::set_dim() does that too. It also preserves the dimnames if the right value is only adding or dropping outermost (ineffective) dimensions: > x <- array(1:6, dim=c(2,3,1), dimnames=list(c("A", "B"), c("x","y", "z"), "T")) > S4Arrays:::set_dim(x, 2:3) x y z A 1 3 5 B 2 4 6 Note that this is consistent with drop(). Best, H. On 10/30/23 03:53, Martin Maechler wrote: >>>>>> Henrik Bengtsson >>>>>> on Sun, 29 Oct 2023 10:42:19 -0700 writes: > > Hello, > > > the fix of PR18612 > > (https://bugs.r-project.org/show_bug.cgi?id=18612) in > > r85380 > > > (https://github.com/wch/r-source/commit/2653cc6203fce4c48874111c75bbccac3ac4e803) > > caused a change in `dim<-()`. Specifically, in the past, > > any `dim<-()` assignment would _always_ remove "dimnames" > > and "names" attributes per help("dim"): > > > > The replacement method changes the "dim" attribute > > (provided the new value is compatible) and removes any > > "dimnames" and "names" attributes. > > > In the new version, assigning the same "dim" as before > > will no longer remove "dimnames". I'm reporting here to > > check whether this change was intended, or if it was an > > unintended side effect of the bug fix. 
> > > For example, in R Under development (unstable) (2023-10-21 > > r85379), we would get: > > >> x <- array(1:2, dim=c(1,2), dimnames=list("A", > >> c("a","b"))) str(dimnames(x)) > > List of 2 $ : chr "A" $ : chr [1:2] "a" "b" > > >> dim(x) <- dim(x) ## Removes "dimnames" no matter what > >> str(dimnames(x)) > > NULL > > > > whereas in R Under development (unstable) (2023-10-21 > > r85380) and beyond, we now get: > > >> x <- array(1:2, dim=c(1,2), dimnames=list("A", > >> c("a","b"))) str(dimnames(x)) > > List of 2 $ : chr "A" $ : chr [1:2] "a" "b" > > >> dim(x) <- dim(x) ## No longer removes "dimnames" > >> str(dimnames(x)) > > List of 2 $ : chr "A" $ : chr [1:2] "a" "b" > > >> dim(x) <- rev(dim(x)) ## Still removes "dimnames" > >> str(dimnames(x)) > > NULL > > > /Henrik > > Thank you, Henrik. > > This is "funny" (in an unusal sense): > indeed, the change was *in*advertent, by me (svn rev 85380). > > I had experimentally {i.e., only in my own private version of R-devel!} > modified the behavior of `dim<-` somewhat > such it does *not* unnecessarily drop dimnames, > e.g., in your `dim(x) <- dim(x)` case above, > one could really argue that it's a "true loss" if x loses > dimnames "unnecessarily" ... > > OTOH, I knew in the mean time that `dim<-` has always been > documented to drop dimnames in all cases, and even more > importantly, I got a strong recommendation to *not* go further > with this idea -- not only for back compatibility reasons, but > also for internal logical consistency. > > Most probably, we will just revert this inadvertent change, > but before that ... since it has been out in the wild anyway, > we could quickly consider if it *did* break code. > > I assume it did, or you would not have noticed ? 
> > Martin > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
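The `dim(x) <- dim(x)` behavior discussed in this thread can be captured in a small helper. Below is a minimal sketch (hypothetical, and deliberately simpler than the S4Arrays implementation mentioned above) of a dim setter that is a true no-op when the new dim is identical to the old one, while otherwise keeping the documented base behavior of dropping dimnames:

```r
## set_dim2() is a hypothetical name; this sketch only handles the
## identical-dim case, not the outermost-dimension dropping that
## S4Arrays:::set_dim() also supports.
set_dim2 <- function(x, value) {
  if (identical(dim(x), as.integer(value)))
    return(x)                # no-op: dim unchanged, dimnames kept
  dim(x) <- value            # documented base behavior: drops dimnames
  x
}

x <- array(1:2, dim = c(1, 2), dimnames = list("A", c("a", "b")))
str(dimnames(set_dim2(x, dim(x))))       # dimnames preserved
str(dimnames(set_dim2(x, rev(dim(x)))))  # NULL, as with base `dim<-`
```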
Re: [Rd] as(, "dgTMatrix")' is deprecated.
Hi Martin, On 10/3/23 10:17, Martin Maechler wrote: >>>>>> Duncan Murdoch >>>>>> on Tue, 3 Oct 2023 12:59:10 -0400 writes: > > On 03/10/2023 12:50 p.m., Koenker, Roger W wrote: > >> I’ve been getting this warning for a while now (about > >> five years if memory serves) and I’m finally tired of it, > >> but also too tired to track it down in Matrix. As far as > >> I can grep I have no reference to either deprecated > >> object, only the apparently innocuous Matrix::Matrix(A, > >> sparse = TRUE). Can someone advise, Martin perhaps? I > >> thought it might come from Rmosek, but mosek folks don’t > >> think so. > >>https://groups.google.com/g/mosek/c/yEwXmMfHBbg/m/l_mkeM4vAAAJ > > > A quick scan of that discussion didn't turn up anything > > relevant, e.g. a script to produce the warning. Could you > > be more specific, or just post the script here? > > > In general, a good way to locate the source of a warning > > is to set options(warn=2) to turn it into an error, and > > then trigger it. The traceback from the error will > > include a bunch of junk from the code that catches the > > warning, but it will also include the context where it was > > triggered. > > > Duncan Murdoch > > Indeed. > > But Roger is right that it in the end, (almost surely) it is > from our {Matrix} package. > > Indeed for several years now, we have tried to make the setup > leaner (and hence faster) by not explicitly define coercion > from to because the size of > is here about 200, and we don't want to have to provide > 200^2 = 40'000 coercion methods. 40,000 coercion methods sounds indeed crazy. But have you considered having 200 coercions from ANY to ? For example the coercion from ANY to dgTMatrix would do as(as(as(from, "dMatrix"), "generalMatrix"), "TsparseMatrix"). Maybe the ANY->xyzMatrix methods could even be generated programmatically? Best, H. 
> > Rather, Matrix package users should use to high level abstract Matrix > classes such as "sparseMatrix" or "CsparseMatrix" or > "TsparseMatrix" or "dMatrix", "symmetricMatrix". > > In the case of as(, "dgTMatrix") , if you > replace "dgTMatrix" by "TsparseMatrix" > the result will be the same but also work in the future when the > deprecation may have been turned into a defunctation ... > > Martin > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
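Martin's advice above (coerce to virtual classes, not concrete ones) looks like this in practice. This is an illustrative sketch assuming the Matrix package is installed; the chained coercion is the replacement form given in Matrix's own deprecation messages:

```r
library(Matrix)
A <- Matrix(c(0, 1, 0, 2), nrow = 2, sparse = TRUE)

## Deprecated in recent Matrix versions:
##   as(A, "dgTMatrix")
## Recommended replacement, going through virtual classes only:
B <- as(as(as(A, "dMatrix"), "generalMatrix"), "TsparseMatrix")
class(B)  # a concrete class such as "dgTMatrix", reached without the warning
```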
Re: [Rd] Recent changes to as.complex(NA_real_)
On 9/25/23 07:05, Martin Maechler wrote: >>>>>> Hervé Pagès >>>>>> on Sat, 23 Sep 2023 16:52:21 -0700 writes: > > Hi Martin, > > On 9/23/23 06:43, Martin Maechler wrote: > >>>>>>> Hervé Pagès > >>>>>>> on Fri, 22 Sep 2023 16:55:05 -0700 writes: > >> > The problem is that you have things that are > >> > **semantically** different but look exactly the same: > >> > >> > They look the same: > >> > >> >> x > >> > [1] NA > >> >> y > >> > [1] NA > >> >> z > >> > [1] NA > >> > >> >> is.na(x) > >> > [1] TRUE > >> >> is.na(y) > >> > [1] TRUE > >> >> is.na(z) > >> > [1] TRUE > >> > >> >> str(x) > >> > cplx NA > >> >> str(y) > >> > num NA > >> >> str(z) > >> > cplx NA > >> > >> > but they are semantically different e.g. > >> > >> >> Re(x) > >> > [1] NA > >> >> Re(y) > >> > [1] -0.5 # surprise! > >> > >> >> Im(x) # surprise! > >> > [1] 2 > >> >> Im(z) > >> > [1] NA > >> > >> > so any expression involving Re() or Im() will produce > >> > different results on input that look the same on the > >> > surface. > >> > >> > You can address this either by normalizing the internal > >> > representation of complex NA to always be complex(r=NaN, > >> > i=NA_real_), like for NA_complex_, or by allowing the > >> > infinite variations that are currently allowed and at the > >> > same time making sure that both Re() and Im() always > >> > return NA_real_ on a complex NA. > >> > >> > My point is that the behavior of complex NA should be > >> > predictable. Right now it's not. Once it's predictable > >> > (with Re() and Im() both returning NA_real_ regardless of > >> > internal representation), then it no longer matters what > >> > kind of complex NA is returned by as.complex(NA_real_), > >> > because they are no onger distinguishable. > >> > >> > H. > >> > >> > On 9/22/23 13:43, Duncan Murdoch wrote: > >> >> Since the result of is.na(x) is the same on each of > >> >> those, I don't see a problem. As long as that is > >> >> consistent, I don't see a problem. 
You shouldn't be using > >> >> any other test for NA-ness. You should never be > >> >> expecting identical() to treat different types as the > >> >> same (e.g. identical(NA, NA_real_) is FALSE, as it > >> >> should be). If you are using a different test, that's > >> >> user error. > >> >> > >> >> Duncan Murdoch > >> >> > >> >> On 22/09/2023 2:41 p.m., Hervé Pagès wrote: > >> >>> We could also question the value of having an infinite > >> >>> number of NA representations in the complex space. For > >> >>> example all these complex values are displayed the same > >> >>> way (as NA), are considered NAs by is.na(), but are not > >> >>> identical or semantically equivalent (from an Re() or > >> >>> Im() point of view): > >> >>> > >> >>> NA_real_ + 0i > >> >>> > >> >>> complex(r=NA_real_, i=Inf) > >> >>> > >> >>> complex(r=2, i=NA_real_) > >> >>> > >> >>> complex(r=NaN, i=NA_real_) > >> >>> > >> >>> In other words, using a single representation for > >> >>> complex NA (i.e. complex(r=NA_real_, i=NA_real_)) would > >> >>> avoid a lot of unnecessary complications and surprises. > >> >>> > >> >>> Once you do that, whether as.co
Re: [Rd] Recent changes to as.complex(NA_real_)
Hi Martin, On 9/23/23 06:43, Martin Maechler wrote: >>>>>> Hervé Pagès >>>>>> on Fri, 22 Sep 2023 16:55:05 -0700 writes: > > The problem is that you have things that are > > **semantically** different but look exactly the same: > > > They look the same: > > >> x > > [1] NA > >> y > > [1] NA > >> z > > [1] NA > > >> is.na(x) > > [1] TRUE > >> is.na(y) > > [1] TRUE > >> is.na(z) > > [1] TRUE > > >> str(x) > > cplx NA > >> str(y) > > num NA > >> str(z) > > cplx NA > > > but they are semantically different e.g. > > >> Re(x) > > [1] NA > >> Re(y) > > [1] -0.5 # surprise! > > >> Im(x) # surprise! > > [1] 2 > >> Im(z) > > [1] NA > > > so any expression involving Re() or Im() will produce > > different results on input that look the same on the > > surface. > > > You can address this either by normalizing the internal > > representation of complex NA to always be complex(r=NaN, > > i=NA_real_), like for NA_complex_, or by allowing the > > infinite variations that are currently allowed and at the > > same time making sure that both Re() and Im() always > > return NA_real_ on a complex NA. > > > My point is that the behavior of complex NA should be > > predictable. Right now it's not. Once it's predictable > > (with Re() and Im() both returning NA_real_ regardless of > > internal representation), then it no longer matters what > > kind of complex NA is returned by as.complex(NA_real_), > > because they are no onger distinguishable. > > > H. > > > On 9/22/23 13:43, Duncan Murdoch wrote: > >> Since the result of is.na(x) is the same on each of > >> those, I don't see a problem. As long as that is > >> consistent, I don't see a problem. You shouldn't be using > >> any other test for NA-ness. You should never be > >> expecting identical() to treat different types as the > >> same (e.g. identical(NA, NA_real_) is FALSE, as it > >> should be). If you are using a different test, that's > >> user error. 
> >> > >> Duncan Murdoch > >> > >> On 22/09/2023 2:41 p.m., Hervé Pagès wrote: > >>> We could also question the value of having an infinite > >>> number of NA representations in the complex space. For > >>> example all these complex values are displayed the same > >>> way (as NA), are considered NAs by is.na(), but are not > >>> identical or semantically equivalent (from an Re() or > >>> Im() point of view): > >>> > >>> NA_real_ + 0i > >>> > >>> complex(r=NA_real_, i=Inf) > >>> > >>> complex(r=2, i=NA_real_) > >>> > >>> complex(r=NaN, i=NA_real_) > >>> > >>> In other words, using a single representation for > >>> complex NA (i.e. complex(r=NA_real_, i=NA_real_)) would > >>> avoid a lot of unnecessary complications and surprises. > >>> > >>> Once you do that, whether as.complex(NA_real_) should > >>> return complex(r=NA_real_, i=0) or complex(r=NA_real_, > >>> i=NA_real_) becomes a moot point. > >>> > >>> Best, > >>> > >>> H. > > Thank you, Hervé. > Your proposition is yet another one, > to declare that all complex NA's should be treated as identical > (almost/fully?) everywhere. > > This would be a possibility, but I think a drastic one. > > I think there are too many cases, where I want to keep the > information of the real part independent of the values of the > imaginary part (e.g. think of the Riemann hypothesis), and > typically vice versa. Use NaN for that, not NA. > > With your proposal, for a (potentially large) vector of complex numbers, > after >Re(z) <- 1/2 > > I could no longer rely on Re(z) == 1/2, > because it would be wrong for those z where (the imaginary part/ the number) > was NA/NaN. My proposal is to do this only if the Re and/or Im pa
Re: [Rd] Recent changes to as.complex(NA_real_)
On 9/22/23 16:55, Hervé Pagès wrote: > The problem is that you have things that are **semantically** > different but look exactly the same: > > They look the same: > > > x > [1] NA > > y > [1] NA > > z > [1] NA > > > is.na(x) > [1] TRUE > > is.na(y) > [1] TRUE > > is.na(z) > [1] TRUE > > > str(x) > cplx NA > > str(y) > num NA > oops, that was supposed to be: > str(y) cplx NA but somehow I managed to copy/paste the wrong thing, sorry. H. -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Recent changes to as.complex(NA_real_)
The problem is that you have things that are **semantically** different but look exactly the same: They look the same: > x [1] NA > y [1] NA > z [1] NA > is.na(x) [1] TRUE > is.na(y) [1] TRUE > is.na(z) [1] TRUE > str(x) cplx NA > str(y) num NA > str(z) cplx NA but they are semantically different e.g. > Re(x) [1] NA > Re(y) [1] -0.5 # surprise! > Im(x) # surprise! [1] 2 > Im(z) [1] NA so any expression involving Re() or Im() will produce different results on input that look the same on the surface. You can address this either by normalizing the internal representation of complex NA to always be complex(r=NaN, i=NA_real_), like for NA_complex_, or by allowing the infinite variations that are currently allowed and at the same time making sure that both Re() and Im() always return NA_real_ on a complex NA. My point is that the behavior of complex NA should be predictable. Right now it's not. Once it's predictable (with Re() and Im() both returning NA_real_ regardless of internal representation), then it no longer matters what kind of complex NA is returned by as.complex(NA_real_), because they are no onger distinguishable. H. On 9/22/23 13:43, Duncan Murdoch wrote: > Since the result of is.na(x) is the same on each of those, I don't see > a problem. As long as that is consistent, I don't see a problem. You > shouldn't be using any other test for NA-ness. You should never be > expecting identical() to treat different types as the same (e.g. > identical(NA, NA_real_) is FALSE, as it should be). If you are using > a different test, that's user error. > > Duncan Murdoch > > On 22/09/2023 2:41 p.m., Hervé Pagès wrote: >> We could also question the value of having an infinite number of NA >> representations in the complex space. 
For example all these complex >> values are displayed the same way (as NA), are considered NAs by >> is.na(), but are not identical or semantically equivalent (from an Re() >> or Im() point of view): >> >> NA_real_ + 0i >> >> complex(r=NA_real_, i=Inf) >> >> complex(r=2, i=NA_real_) >> >> complex(r=NaN, i=NA_real_) >> >> In other words, using a single representation for complex NA (i.e. >> complex(r=NA_real_, i=NA_real_)) would avoid a lot of unnecessary >> complications and surprises. >> >> Once you do that, whether as.complex(NA_real_) should return >> complex(r=NA_real_, i=0) or complex(r=NA_real_, i=NA_real_) becomes a >> moot point. >> >> Best, >> >> H. >> >> On 9/22/23 03:38, Martin Maechler wrote: >>>>>>>> Mikael Jagan >>>>>>>> on Thu, 21 Sep 2023 00:47:39 -0400 writes: >>> > Revisiting this thread from April: >>> >>> >https://stat.ethz.ch/pipermail/r-devel/2023-April/082545.html >>> >>> > where the decision (not yet backported) was made for >>> > as.complex(NA_real_) to give NA_complex_ instead of >>> > complex(r=NA_real_, i=0), to be consistent with >>> > help("as.complex") and as.complex(NA) and >>> as.complex(NA_integer_). >>> >>> > Was any consideration given to the alternative? >>> > That is, to changing as.complex(NA) and >>> as.complex(NA_integer_) to >>> > give complex(r=NA_real_, i=0), consistent with >>> > as.complex(NA_real_), then amending help("as.complex") >>> > accordingly? >>> >>> Hmm, as, from R-core, mostly I was involved, I admit to say "no", >>> to my knowledge the (above) alternative wasn't considered. >>> >>> > The principle that >>> > Im(as.complex()) should be zero >>> > is quite fundamental, in my view, hence the "new" behaviour >>> > seems to really violate the principle of least surprise ... >>> >>> of course "least surprise" is somewhat subjective. Still, >>> I clearly agree that the above would be one desirable property. 
>>> >>> I think that any solution will lead to *some* surprise for some >>> cases, I think primarily because there are *many* different >>> values z for which is.na(z) is true, and in any case >>> NA_complex_ is only of the many. >>> >>> I also agree with Mikael that we should reconsider the issue >>> that was raised by Davis Vaughan here ("on R-devel") last April. >>> >>> > Another (but maybe weaker) argument is that >>
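The "look the same but differ semantically" situation debated in this thread can be reconstructed as a self-contained snippet (illustrative; variable names `x`, `y`, `z` follow the messages above):

```r
x <- complex(real = NaN,  imaginary = 2)          # an NA with a usable Im()
y <- complex(real = -0.5, imaginary = NA_real_)   # an NA with a usable Re()
z <- NA_complex_                                  # the canonical complex NA

vapply(list(x, y, z), is.na, logical(1))  # TRUE TRUE TRUE -- all "are" NA
Re(y)  # -0.5  (surprise: a real part recoverable from something printed as NA)
Im(x)  #  2    (surprise: an imaginary part recoverable likewise)
Im(z)  #  NA
```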
Re: [Rd] Recent changes to as.complex(NA_real_)
indeed, but I think > we should try to look at it only *secondary* to your first > proposal. > > > Whatever decision is made about as.complex(NA_real_), > > maybe these points should be weighed before it becomes part of > > R-release ... > > > Mikael > > Indeed. > > Can we please get other opinions / ideas here? > > Thank you in advance for your thoughts! > Martin > > --- > > PS: > > Our *print()*ing of complex NA's ("NA" here meaning NA or NaN) > is also unsatisfactory, e.g. in the case where all entries of a > vector are NA in the sense of is.na(.), but their > Re() and Im() are not all NA: > >showC <- function(z) noquote(sprintf("(R = %g, I = %g)", Re(z), Im(z))) >z <- complex(, c(11, NA, NA), c(NA, 99, NA)) >z >showC(z) > > gives > >> z >[1] NA NA NA >> showC(z) >[1] (R = 11, I = NA) (R = NA, I = 99) (R = NA, I = NA) > > but that (printing of complex) *is* another issue, > in which we have the re-opened bugzilla PR#16752 > ==>https://bugs.r-project.org/show_bug.cgi?id=16752 > > on which we also worked during the R Sprint in Warwick three > weeks ago, and where I want to commit changes in any case {but > think we should change even a bit more than we got to during the > Sprint}. > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
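Martin's `showC()` helper from the quoted postscript, cleaned up into a runnable snippet (same code, unflattened from the email):

```r
showC <- function(z) noquote(sprintf("(R = %g, I = %g)", Re(z), Im(z)))
z <- complex(real = c(11, NA, NA), imaginary = c(NA, 99, NA))
z         # default printing shows: NA NA NA
showC(z)  # (R = 11, I = NA) (R = NA, I = 99) (R = NA, I = NA)
```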
Re: [Rd] FYI: daily R source tarballs from ETH: *.xz instead of *.bz2)
On 9/11/23 22:39, Prof Brian Ripley wrote: > On 09/09/2023 01:56, Hervé Pagès wrote: >> Hi Martin, >> >> Sounds good. Are there any plans to support the xz compression for >> package source tarballs? > > What makes you think it is not supported? I guess because I've never seen source tarballs distributed as .xz files but it's good to know that 'R CMD build' and 'R CMD INSTALL' support that. So let me reformulate my question: do CRAN have any plans to switch from .tar.gz to .xz for the distribution of source tarballs? Is this something that tools like write_PACKAGES(), available.packages(), and install.packages() would be able to handle? Would they be able to handle a mix of .tar.gz and .xz packages? (Which would be important for a smooth transition from .tar.gz to .xz across CRAN/Bioconductor.) I'm just trying to get a sense if the effort to reduce bandwidth will go beyond the distribution of R source snapshots. Thanks, H. -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
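The bandwidth question can be sized up from within R using base `memCompress()`. This is purely illustrative (it says nothing about CRAN's plans); it compares bzip2 and xz on the same text, here the `COPYING` file shipped with every R installation:

```r
txt <- charToRaw(paste(rep(readLines(file.path(R.home("doc"), "COPYING")), 10),
                       collapse = "\n"))
c(raw   = length(txt),
  bzip2 = length(memCompress(txt, "bzip2")),
  xz    = length(memCompress(txt, "xz")))   # xz is typically the smallest
```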
Re: [Rd] FYI: daily R source tarballs from ETH: *.xz instead of *.bz2)
Hi Martin, Sounds good. Are there any plans to support the xz compression for package source tarballs? Thanks, H. On 9/8/23 06:44, Martin Maechler wrote: > A quick notice for anyone who uses cron-like scripts to get > R source tarballs from the ETH R/daily/ s: > > I've finally switched to replace *.bz2 by *.xz which does save > quite a bit of bandwidth. > > Currently, you can see the 2 day old *.bz2 (and their sizes) and > compare with the new *.xz one (sorted newest first): > >https://stat.ethz.ch/R/daily/?C=M;O=D > > > Best, > Martin > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] codetools wrongly complains about lazy evaluation in S4 methods
Oh but I see now that you've already tried this in your R/AllGenerics.R, sorry for missing that, but that you worry about the following message being disruptive on CRAN: The following object is masked from 'package:base': qr.X Why would that be? As long as you only define methods for objects that **you** control everything is fine. In other words you're not allowed to define a method for "qr" objects because that method would override base::qr.X(). But the generic itself and the method that you define for your objects don't override anything so should not disrupt anything. H. On 6/15/23 13:51, Hervé Pagès wrote: > > I'd argue that at the root of the problem is that your qr.X() generic > dispatches on all its arguments, including the 'ncol' argument which I > think the dispatch mechanism needs to evaluate **before** dispatch can > actually happen. > > So yes lazy evaluation is a real feature but it does not play well for > arguments of a generic that are involved in the dispatch. > > If you explicitly defined your generic with: > > setGeneric("qr.X", signature="qr") > > you should be fine. > > More generally speaking, it's a good idea to restrict the signature of > a generic to the arguments "that make sense". For unary operations > this is usually the 1st argument, for binary operations the first two > arguments etc... Additional arguments that control the operation like > modiflers, toggles, flags, rng seed, and other parameters, usually > have not place in the signature of the generic. > > H. > > On 6/14/23 20:57, Mikael Jagan wrote: >> Thanks all - yes, I think that Simon's diagnosis ("user error") is >> correct: >> in this situation one should define a reasonable generic function >> explicitly, >> with a call to setGeneric, and not rely on the call inside of >> setMethod ... >> >> But it is still not clear what the way forward should be (for package >> Matrix, >> where we would like to export a method for 'qr.X'). 
If we do >> nothing, then >> there is the note, already mentioned: >> >> * checking R code for possible problems ... NOTE >> qr.X: no visible binding for global variable ‘R’ >> Undefined global functions or variables: >> R >> >> If we add the following to our R/AllGenerics.R : >> >> setGeneric("qr.X", >> function(qr, complete = FALSE, ncol, ...) >> standardGeneric("qr.X"), >> useAsDefault = function(qr, complete = FALSE, ncol, >> ...) { >> if(missing(ncol)) >> base::qr.X(qr, complete = complete) >> else base::qr.X(qr, complete = complete, ncol = ncol) >> }, >> signature = "qr") >> >> then we get a startup message, which would be quite disruptive on CRAN : >> >> The following object is masked from 'package:base': >> >> qr.X >> >> and if we further add setGenericImplicit("qr.X", restore = (TRUE|FALSE)) >> to our R/zzz.R, then for either value of 'restore' we encounter : >> >> ** testing if installed package can be loaded from temporary >> location >> Error: package or namespace load failed for 'Matrix': >> Function found when exporting methods from the namespace >> 'Matrix' which is not S4 generic: 'qr.X' >> >> Are there possibilities that I have missed? >> >> It seems to me that the best option might be to define an implicit >> generic >> 'qr.X' in methods via '.initImplicitGenerics' in >> methods/R/makeBasicFunsList.R, >> where I see that an implicit generic 'qr.R' is already defined ... ? >> >> The patch pasted below "solves everything", though we'd still have to >> think >> about how to work for versions of R without the patch ... >> >> Mikael >> >> Index: src/library/methods/R/makeBasicFunsList.R >> === >> --- src/library/methods/R/makeBasicFunsList.R (revision 84541) >> +++ src/library/methods/R/makeBasicFunsList.R (working copy) >> @@ -263,6 +263,17 @@ >> signature = "qr", where = where) >> setGenericImplicit("qr.R", where, FALSE) >> >> + setGeneric("qr.X", >> + function(qr, complete = FALSE, ncol
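Hervé's suggestion from earlier in the thread (restrict the dispatch signature so lazily evaluated defaults are never forced at dispatch time) can be sketched as follows. The generic name `qrX` is hypothetical, chosen to avoid the `base::qr.X` masking problem discussed above:

```r
setOldClass("qr")   # register the S3 "qr" class for S4 dispatch, if needed
setGeneric("qrX",
    function(qr, complete = FALSE, ncol, ...) standardGeneric("qrX"),
    signature = "qr")   # dispatch on 'qr' only; 'complete'/'ncol' never forced
setMethod("qrX", "qr", function(qr, complete = FALSE, ncol, ...) {
    R <- qr.R(qr, complete = TRUE)
    if (missing(ncol))
        ncol <- if (complete) nrow(R) else min(dim(R))
    qr.X(qr, complete = complete, ncol = ncol)
})
qrX(qr(cbind(1, 1:4)))   # dispatches on class "qr"
```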
Re: [Rd] codetools wrongly complains about lazy evaluation in S4 methods
all, it should only be part of the method implementation. If one was >> to implement the same default behavior in the generic itself (not >> necessarily a good idea) the default would be ncol = if (complete) >> nrow(qr.R(qr, TRUE)) else min(dim(qr.R(qr, TRUE))) to not rely on the >> internals of the implementation. >> >> Cheers, >> Simon >> >> >>> On 14/06/2023, at 6:03 AM, Kasper Daniel Hansen >>> wrote: >>> >>> On Sat, Jun 3, 2023 at 11:51 AM Mikael Jagan >>> wrote: >>> >>>> The formals of the newly generic 'qr.X' are inherited from the >>>> non-generic >>>> function in the base namespace. Notably, the inherited default >>>> value of >>>> formal argument 'ncol' relies on lazy evaluation: >>>> >>>>> formals(qr.X)[["ncol"]] >>>> if (complete) nrow(R) else min(dim(R)) >>>> >>>> where 'R' must be defined in the body of any method that might >>>> evaluate >>>> 'ncol'. >>>> >>> >>> Perhaps I am misunderstanding something, but I think Mikael's >>> expectations >>> about the scoping rules of R are wrong. The enclosing environment >>> of ncol >>> is where it was _defined_ not where it is _called_ (apologies if I am >>> messing up the computer science terminology here). >>> >>> This suggests to me that codetools is right. But a more extended >>> example >>> would be useful. Perhaps there is something special with setOldClass() >>> which I am no aware of. >>> >>> Also, Bioconductor has 100s of packages with S4 where codetools >>> works well. >>> >>> Kasper >>> >>> [[alternative HTML version deleted]] >>> >>> __ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> >> > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] issue with .local() hack used in S4 methods
Hi, Just ran across this: foo <- function(x, ..., z=22) z setMethod("foo", "character", function(x, y=-5, z=22) y) # Creating a generic function from function ‘foo’ in the global environment Then: foo("a") # [1] 22 Should return -5, not 22. That's because the call to .local() used internally by the foo() method does not name the arguments placed after the ellipsis: selectMethod("foo", "character") Method Definition: function (x, ..., z = 22) { .local <- function (x, y = 5, z = 22) y .local(x, ..., z) <--- should be .local(x, ..., z=z) } Thanks, H. sessionInfo() R version 4.3.0 (2023-04-21) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 23.04 Matrix products: default BLAS: /home/hpages/R/R-4.3.0/lib/libRblas.so LAPACK: /home/hpages/R/R-4.3.0/lib/libRlapack.so; LAPACK version 3.11.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C time zone: America/Los_Angeles tzcode source: system (glibc) attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_4.3.0 codetools_0.2-19 -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
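A hedged workaround for the `.local()` issue above: when the method's formals match the generic's exactly, the methods package uses the method body directly and generates no `.local()` wrapper, so the positional-`z` mismatch cannot occur. The price is handling `y` manually via `...` (the `foo2` name and the `y` handling below are hypothetical, for illustration only):

```r
foo2 <- function(x, ..., z = 22) z
setGeneric("foo2")
setMethod("foo2", "character", function(x, ..., z = 22) {
    dots <- list(...)
    y <- if ("y" %in% names(dots)) dots$y else -5
    y
})
foo2("a")          # -5, as the method author intended
foo2("a", y = 10)  # 10
foo2("a", z = 0)   # still -5: 'z' no longer leaks into 'y'
```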
Re: [Rd] mapply(): Special case of USE.NAMES=TRUE with recent R-devel updates
And also: > mapply(paste, c(a="A"), character(), USE.NAMES = TRUE) Error in names(answer) <- names1 : 'names' attribute [1] must be the same length as the vector [0] When the shortest arguments get recycled to the length of the longest, shouldn't their names also get recycled? > mapply(paste, c(a="A", b="B"), letters[1:6], USE.NAMES=TRUE) a b "A a" "B b" "A c" "B d" "A e" "B f" That's assuming that rep() accurately materializes recycling (I hope it does): > rep(c(a="A", b="B"), length.out=6) a b a b a b "A" "B" "A" "B" "A" "B" > rep(c(a="A", b="B"), length.out=0) named character(0) I always wished that the process of recycling which happens everywhere all the time in R was implemented in its own dedicated function recycle(). But that's another story. Anyways, back to mapply(): Once what happens to the names during recycling is clarified, there should be no need to be explicit about what should happen when the length "of the first ... argument" is zero because it will no longer be a special case. Cheers, H. On 30/11/2021 22:10, Henrik Bengtsson wrote: Hi, in R-devel (4.2.0), we now get: mapply(paste, "A", character(), USE.NAMES = TRUE) named list() Now, in ?mapply we have: USE.NAMES: logical; use the names of the first ... argument, or if that is an unnamed character vector, use that vector as the names. This basically says we should get: answer <- list() first <- "A" names(answer) <- first which obviously is an error. The help is not explicit what should happen when the length "of the first ... argument" is zero, but the above behavior effectively does something like: answer <- list() first <- "A" names(answer) <- first[seq_along(answer)] answer named list() Is there a need for the docs to be updated, or should the result be an unnamed empty list? 
/Henrik -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
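The dedicated `recycle()` helper wished for above could look like this (hypothetical, not part of base R); note that `rep()` already recycles names along with values, which is the behavior the message relies on:

```r
recycle <- function(x, length.out) {
  stopifnot(length(x) > 0L)
  if (length.out %% length(x) != 0L)
    warning("longer object length is not a multiple of shorter object length")
  rep(x, length.out = length.out)
}
recycle(c(a = "A", b = "B"), 6L)  # names recycled along with values
recycle(c(a = "A", b = "B"), 0L)  # named character(0)
```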
Re: [Rd] How can a package be aware of whether it's on CRAN
But why would you need to check for anything in the first place? If you only use 2 cores in your examples, vignettes, and unit tests, 'R CMD check' will run fine everywhere and not eat all the CPU power of the machine where it's running. H. On 23/11/2021 12:05, Gábor Csárdi wrote: On Tue, Nov 23, 2021 at 8:49 PM Henrik Bengtsson wrote: Is there any reliable way to let packages to know if they are on CRAN, so they can set omp cores to 2 by default? Instead of testing for "on CRAN" or not, you can test for 'R CMD check' running or not. 'R CMD check' sets environment variable _R_CHECK_LIMIT_CORES_=TRUE. You can use that to limit your code to run at most two (2) parallel threads or processes. AFAICT this is only set with --as-cran and many CRAN machines don't use that and I am fairly sure that some of them don't set this env var manually, either. Gabor [...] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
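For completeness, the `_R_CHECK_LIMIT_CORES_` test mentioned by Gábor is commonly written like this (a sketch of the widely used idiom, with a hypothetical function name):

```r
numWorkers <- function(default = parallel::detectCores()) {
  chk <- Sys.getenv("_R_CHECK_LIMIT_CORES_", "")
  if (nzchar(chk) && chk == "TRUE") 2L else default
}
numWorkers()  # 2 under 'R CMD check --as-cran', 'default' otherwise
```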
Re: [Rd] Spurious warnings in coercion from double/complex/character to raw
On 10/09/2021 12:53, brodie gaslam wrote: On Friday, September 10, 2021, 03:13:54 PM EDT, Hervé Pagès wrote: Good catch, thanks! Replacing if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) { tmp = 0; warn |= WARN_RAW; } pa[i] = (Rbyte) tmp; with if(ISNAN(vi) || vi <= -1.0 || vi >= 256.0) { tmp = 0; warn |= WARN_RAW; } else { tmp = (int) vi; } pa[i] = (Rbyte) tmp; should address that. FWIW IntegerFromReal() has a similar risk of int overflow (src/main/coerce.c, lines 128-138): int attribute_hidden IntegerFromReal(double x, int *warn) { if (ISNAN(x)) return NA_INTEGER; else if (x >= INT_MAX+1. || x <= INT_MIN ) { *warn |= WARN_INT_NA; return NA_INTEGER; } return (int) x; } The cast to int will also be an int overflow situation if x is > INT_MAX and < INT_MAX+1 so the risk is small! I might be being dense, but it feels this isn't a problem? Quoting C99 6.3.1.4 again (emph added): When a finite value of real floating type is converted to an integer type other than _Bool, **the fractional part is discarded** (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.50) Does the "fractional part is discarded" not save us here? I think it does. Thanks for clarifying and sorry for the false positive! H. Best, B. -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Spurious warnings in coercion from double/complex/character to raw
On 10/09/2021 09:12, Duncan Murdoch wrote:
> On 10/09/2021 11:29 a.m., Hervé Pagès wrote:
>> Hi,
>>
>> The first warning below is unexpected and confusing:
>>
>>     > as.raw(c(3e9, 5.1))
>>     [1] 00 05
>>     Warning messages:
>>     1: NAs introduced by coercion to integer range
>>     2: out-of-range values treated as 0 in coercion to raw
>>
>> The reason we get it is that coercion from numeric to raw is
>> currently implemented on top of coercion from numeric to int (file
>> src/main/coerce.c, lines 700-710):
>>
>>     case REALSXP:
>>         for (i = 0; i < n; i++) {
>>             // if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
>>             tmp = IntegerFromReal(REAL_ELT(v, i), &warn);
>>             if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
>>                 tmp = 0;
>>                 warn |= WARN_RAW;
>>             }
>>             pa[i] = (Rbyte) tmp;
>>         }
>>         break;
>>
>> The first warning comes from the call to IntegerFromReal(). The
>> following code avoids the spurious warning and is also simpler and
>> slightly faster:
>>
>>     case REALSXP:
>>         for (i = 0; i < n; i++) {
>>             // if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
>>             double vi = REAL_ELT(v, i);
>>             if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
>>                 tmp = 0;
>>                 warn |= WARN_RAW;
>>             }
>>             pa[i] = (Rbyte) tmp;
>>         }
>>         break;
>
> Doesn't that give different results in case vi is so large that
> "(int) vi" overflows? (I don't know what the C standard says, but
> some online references say that behaviour is implementation
> dependent.) For example, if vi = 1.0 + INT_MAX; wouldn't "(int) vi"
> be equal to a small integer?

Good catch, thanks! Replacing

    if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
        tmp = 0;
        warn |= WARN_RAW;
    }
    pa[i] = (Rbyte) tmp;

with

    if(ISNAN(vi) || vi <= -1.0 || vi >= 256.0) {
        tmp = 0;
        warn |= WARN_RAW;
    } else {
        tmp = (int) vi;
    }
    pa[i] = (Rbyte) tmp;

should address that. FWIW IntegerFromReal() has a similar risk of int
overflow (src/main/coerce.c, lines 128-138):

    int attribute_hidden IntegerFromReal(double x, int *warn)
    {
        if (ISNAN(x))
            return NA_INTEGER;
        else if (x >= INT_MAX+1. || x <= INT_MIN ) {
            *warn |= WARN_INT_NA;
            return NA_INTEGER;
        }
        return (int) x;
    }

The cast to int will also be an int overflow situation if x is
> INT_MAX and < INT_MAX+1 so the risk is small! There are other
instances of this situation in IntegerFromComplex() and
IntegerFromString(). More below...

> Duncan Murdoch
>
>> Coercion from complex to raw has the same problem:
>>
>>     > as.raw(c(3e9+0i, 5.1))
>>     [1] 00 05
>>     Warning messages:
>>     1: NAs introduced by coercion to integer range
>>     2: out-of-range values treated as 0 in coercion to raw
>>
>> Current implementation (file src/main/coerce.c, lines 711-721):
>>
>>     case CPLXSXP:
>>         for (i = 0; i < n; i++) {
>>             // if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
>>             tmp = IntegerFromComplex(COMPLEX_ELT(v, i), &warn);
>>             if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
>>                 tmp = 0;
>>                 warn |= WARN_RAW;
>>             }
>>             pa[i] = (Rbyte) tmp;
>>         }
>>         break;
>>
>> This implementation has the following additional problem when the
>> supplied complex has a nonzero imaginary part:
>>
>>     > as.raw(300+4i)
>>     [1] 00
>>     Warning messages:
>>     1: imaginary parts discarded in coercion
>>     2: out-of-range values treated as 0 in coercion to raw
>>
>>     > as.raw(3e9+4i)
>>     [1] 00
>>     Warning messages:
>>     1: NAs introduced by coercion to integer range
>>     2: out-of-range values treated as 0 in coercion to raw
>>
>> In one case we get a warning about the discarding of the imaginary
>> part but not the other case, which is unexpected. We should see the
>> exact same warning (or warnings) in both cases.
>>
>> With the following fix we only get the warning about the discarding
>> of the imaginary part if we are not in a "out-of-range values
>> treated as 0 in coercion to raw" situation:
>>
>>     case CPLXSXP:
>>         for (i = 0; i < n; i++) {
>>             // if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
>>             Rcomplex vi = COMPLEX_ELT(v, i);
>>             if(ISNAN(vi.r) || ISNAN(vi.i) ||
>>                (tmp = (int) vi.r) < 0 || tmp > 255) {
>>                 tmp = 0;
>>                 warn |= WARN_RAW;
>>             } else {
>>                 if(vi.i != 0.0) warn |= WARN_IMAG;
>>             }
>>             pa[i] = (Rbyte) tmp;
>>         }
>>         break;

Corrected version:

    if(ISNAN(vi.r) || ISNAN(vi.i) || vi.r <= -1.00 || vi.r >= 256.00) {
        tmp = 0;
        warn |= WARN_RAW;
    } else {
        tmp = (int) vi.r;
        if(vi.i != 0.0)
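For reference, the truncated "Corrected version" above can be assembled into a self-contained function. This is a sketch: the `Cplx` struct, `Rbyte` typedef and `WARN_*` values are simplified stand-ins for R's internal types, not the real definitions.

```c
#include <math.h>

typedef unsigned char Rbyte;
typedef struct { double r, i; } Cplx;   /* stand-in for R's Rcomplex */

#define WARN_RAW  1
#define WARN_IMAG 2

/* Sketch of the corrected complex->raw element conversion: range-check
 * the real part directly (no intermediate int conversion that could
 * warn or overflow), and only flag the discarded imaginary part when
 * the value is otherwise in range. */
static Rbyte raw_from_complex(Cplx v, int *warn)
{
    int tmp = 0;
    if (isnan(v.r) || isnan(v.i) || v.r <= -1.0 || v.r >= 256.0) {
        *warn |= WARN_RAW;
    } else {
        tmp = (int) v.r;
        if (v.i != 0.0)
            *warn |= WARN_IMAG;
    }
    return (Rbyte) tmp;
}
```

With this ordering, 300+4i and 3e9+4i set exactly the same warning flag (WARN_RAW only), which is the consistency the thread asks for.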
Re: [Rd] Unneeded if statements in RealFromComplex C code
Thanks Martin! Best, H. On 10/09/2021 02:24, Martin Maechler wrote: Hervé Pagès on Thu, 9 Sep 2021 17:54:06 -0700 writes: > Hi, > I just stumbled across these 2 lines in RealFromComplex (lines 208 & 209 > in src/main/coerce.c): > double attribute_hidden > RealFromComplex(Rcomplex x, int *warn) > { > if (ISNAN(x.r) || ISNAN(x.i)) > return NA_REAL; > if (ISNAN(x.r)) return x.r;<- line 208 > if (ISNAN(x.i)) return NA_REAL;<- line 209 > if (x.i != 0) > *warn |= WARN_IMAG; > return x.r; > } > They were added in 2015 (revision 69410). by me. "Of course" the intent at the time was to *replace* the previous 2 lines and return NA/NaN of the "exact same kind" but in the mean time, I have learned that trying to preserve exact *kinds* of NaN / NA is typically not platform portable, anyway because compiler/library optimizations and implementations are pretty "free to do what they want" with these. > They don't serve any purpose and might slow things down a little (unless > compiler optimization is able to ignore them). In any case they should > probably be removed. I've cleaned up now, indeed back compatibly, i.e., removing both lines as you suggested. Thank you, Hervé! Martin > Cheers, > H. > -- > Hervé Pagès > Bioconductor Core Team > hpages.on.git...@gmail.com -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Spurious warnings in coercion from double/complex/character to raw
Hi,

The first warning below is unexpected and confusing:

    > as.raw(c(3e9, 5.1))
    [1] 00 05
    Warning messages:
    1: NAs introduced by coercion to integer range
    2: out-of-range values treated as 0 in coercion to raw

The reason we get it is that coercion from numeric to raw is currently
implemented on top of coercion from numeric to int (file
src/main/coerce.c, lines 700-710):

    case REALSXP:
        for (i = 0; i < n; i++) {
            // if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
            tmp = IntegerFromReal(REAL_ELT(v, i), &warn);
            if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
                tmp = 0;
                warn |= WARN_RAW;
            }
            pa[i] = (Rbyte) tmp;
        }
        break;

The first warning comes from the call to IntegerFromReal(). The
following code avoids the spurious warning and is also simpler and
slightly faster:

    case REALSXP:
        for (i = 0; i < n; i++) {
            // if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
            double vi = REAL_ELT(v, i);
            if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
                tmp = 0;
                warn |= WARN_RAW;
            }
            pa[i] = (Rbyte) tmp;
        }
        break;

Coercion from complex to raw has the same problem:

    > as.raw(c(3e9+0i, 5.1))
    [1] 00 05
    Warning messages:
    1: NAs introduced by coercion to integer range
    2: out-of-range values treated as 0 in coercion to raw

Current implementation (file src/main/coerce.c, lines 711-721):

    case CPLXSXP:
        for (i = 0; i < n; i++) {
            // if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
            tmp = IntegerFromComplex(COMPLEX_ELT(v, i), &warn);
            if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
                tmp = 0;
                warn |= WARN_RAW;
            }
            pa[i] = (Rbyte) tmp;
        }
        break;

This implementation has the following additional problem when the
supplied complex has a nonzero imaginary part:

    > as.raw(300+4i)
    [1] 00
    Warning messages:
    1: imaginary parts discarded in coercion
    2: out-of-range values treated as 0 in coercion to raw

    > as.raw(3e9+4i)
    [1] 00
    Warning messages:
    1: NAs introduced by coercion to integer range
    2: out-of-range values treated as 0 in coercion to raw

In one case we get a warning about the discarding of the imaginary part
but not the other case, which is unexpected. We should see the exact
same warning (or warnings) in both cases.

With the following fix we only get the warning about the discarding of
the imaginary part if we are not in a "out-of-range values treated as 0
in coercion to raw" situation:

    case CPLXSXP:
        for (i = 0; i < n; i++) {
            // if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
            Rcomplex vi = COMPLEX_ELT(v, i);
            if(ISNAN(vi.r) || ISNAN(vi.i) ||
               (tmp = (int) vi.r) < 0 || tmp > 255) {
                tmp = 0;
                warn |= WARN_RAW;
            } else {
                if(vi.i != 0.0) warn |= WARN_IMAG;
            }
            pa[i] = (Rbyte) tmp;
        }
        break;

Finally, coercion from character to raw has the same problem and its
code can be fixed in a similar manner:

    > as.raw(c("3e9", 5.1))
    [1] 00 05
    Warning messages:
    1: NAs introduced by coercion to integer range
    2: out-of-range values treated as 0 in coercion to raw

Cheers,
H.

--
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
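The double case of the proposed fix can be exercised in isolation. A sketch (the function name and warning flag value are stand-ins; the range test is the refined one from the follow-ups in this thread, which avoids casting out-of-range doubles to int):

```c
#include <math.h>

typedef unsigned char Rbyte;
#define WARN_RAW 1

/* Sketch of a double->raw element conversion that never goes through
 * an intermediate int conversion: NaN and values outside (-1, 256)
 * become 00 with only the "coercion to raw" warning flag set. */
static Rbyte raw_from_real(double vi, int *warn)
{
    int tmp = 0;
    if (isnan(vi) || vi <= -1.0 || vi >= 256.0)
        *warn |= WARN_RAW;
    else
        tmp = (int) vi;   /* defined: vi is in (-1, 256) */
    return (Rbyte) tmp;
}
```

Under this scheme as.raw(c(3e9, 5.1)) would set exactly one warning flag — the "out-of-range values treated as 0 in coercion to raw" one — instead of two.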
[Rd] Unneeded if statements in RealFromComplex C code
Hi,

I just stumbled across these 2 lines in RealFromComplex (lines 208 &
209 in src/main/coerce.c):

    double attribute_hidden RealFromComplex(Rcomplex x, int *warn)
    {
        if (ISNAN(x.r) || ISNAN(x.i))
            return NA_REAL;
        if (ISNAN(x.r)) return x.r;       <- line 208
        if (ISNAN(x.i)) return NA_REAL;   <- line 209
        if (x.i != 0)
            *warn |= WARN_IMAG;
        return x.r;
    }

They were added in 2015 (revision 69410). They don't serve any purpose
and might slow things down a little (unless compiler optimization is
able to ignore them). In any case they should probably be removed.

Cheers,
H.

--
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
Re: [Rd] surprised matrix (1:256, 8, 8) doesn't cause error/warning
Hi Martin,

It kind of does make sense to issue the warning when **recycling** (and
this is consistent with what happens with recycling in general):

    > matrix(1:4, 6, 6)
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    3    1    3    1    3
    [2,]    2    4    2    4    2    4
    [3,]    3    1    3    1    3    1
    [4,]    4    2    4    2    4    2
    [5,]    1    3    1    3    1    3
    [6,]    2    4    2    4    2    4

    > matrix(1:4, 5, 6)
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    2    3    4    1    2
    [2,]    2    3    4    1    2    3
    [3,]    3    4    1    2    3    4
    [4,]    4    1    2    3    4    1
    [5,]    1    2    3    4    1    2
    Warning message:
    In matrix(1:4, 5, 6) :
      data length [4] is not a sub-multiple or multiple of the number of rows [5]

(Note that the warning is misleading. matrix() is happy to take data
with a length that is not a sub-multiple of the number of rows or cols
as long as it's a sub-multiple of the length of the matrix.)

However I'm not sure that **truncating** the data is desirable behavior:

    > matrix(1:6, 1, 3)
         [,1] [,2] [,3]
    [1,]    1    2    3

    > matrix(1:6, 1, 5)
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    2    3    4    5
    Warning message:
    In matrix(1:6, 1, 5) :
      data length [6] is not a sub-multiple or multiple of the number of columns [5]

Maybe you get a warning sometimes, if you are lucky, but still.

Finally note that you never get any warning with array():

    > array(1:4, c(5, 6))
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    2    3    4    1    2
    [2,]    2    3    4    1    2    3
    [3,]    3    4    1    2    3    4
    [4,]    4    1    2    3    4    1
    [5,]    1    2    3    4    1    2

    > array(1:6, c(1, 5))
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    2    3    4    5

Cheers,
H.

On 2/1/21 1:08 AM, Martin Maechler wrote:
>>>>>> Abby Spurdle (/əˈbi/)
>>>>>>     on Mon, 1 Feb 2021 19:50:32 +1300 writes:
>
>> I'm a little surprised that the following doesn't trigger an error
>> or a warning.
>> matrix (1:256, 8, 8)
>> The help file says that the main argument is recycled, if it's too
>> short. But doesn't say what happens if it's too long.
>
> It's somewhat subtler than one may assume:
>
>     matrix(1:9, 2,3)
>          [,1] [,2] [,3]
>     [1,]    1    3    5
>     [2,]    2    4    6
>     Warning message:
>     In matrix(1:9, 2, 3) :
>       data length [9] is not a sub-multiple or multiple of the number of rows [2]
>
>     matrix(1:8, 2,3)
>          [,1] [,2] [,3]
>     [1,]    1    3    5
>     [2,]    2    4    6
>     Warning message:
>     In matrix(1:8, 2, 3) :
>       data length [8] is not a sub-multiple or multiple of the number of columns [3]
>
>     matrix(1:12, 2,3)
>          [,1] [,2] [,3]
>     [1,]    1    3    5
>     [2,]    2    4    6
>
> So it looks to me the current behavior is quite on purpose. Are you
> sure it's not documented at all when reading the docs carefully?
> (I did *not*, just now).

--
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
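The rule the examples above tease out — matrix() stays silent exactly when the data length is a sub-multiple or a multiple of the full matrix length — can be written down directly. A sketch (the function name is made up; this mirrors matrix()'s column-major recycling, not its actual C implementation):

```c
/* Column-major recycling fill, as matrix(data, m, n) does it.
 * Returns 1 when matrix() would stay silent (data length k is a
 * sub-multiple or a multiple of the full length m*n), 0 when it
 * would warn -- matching the examples in the thread. */
static int fill_recycled(int *out, int m, int n, const int *data, int k)
{
    int len = m * n;
    for (int i = 0; i < len; i++)
        out[i] = data[i % k];
    return (len % k == 0) || (k % len == 0);
}
```

Note that matrix(1:256, 8, 8) falls in the silent branch (256 is a multiple of 64), which is why Abby's example gives no warning at all.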
Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
Excellent! Thanks Martin.

H.

On 5/28/20 00:39, Martin Maechler wrote:
>>>>>> Martin Maechler
>>>>>>     on Wed, 27 May 2020 13:35:44 +0200 writes:
>>>>>> Hervé Pagès
>>>>>>     on Tue, 26 May 2020 12:38:13 -0700 writes:
>
>>> Hi Martin, On 5/26/20 06:24, Martin Maechler wrote: ...
>>>> What about remaining back-compatible, not only to R 3.y.z
>>>> with default recycle0=FALSE, but also to R 4.0.0 with
>>>> recycle0=TRUE
>>> What back-compatibility with R 4.0.0 are we talking about?
>>> The 'recycle0' arg was added **after** the R 4.0.0 release
>>> and has never been part of an official release yet.
>> Yes, of course. It was *planned* for R 4.0.0 and finally was
>> too late (feature freeze etc)... I'm sorry I was wrong and
>> misleading above.
>>> This is the time to fix it.
>> Well, R 4.0.1 is already in 'beta' and does contain it too.
>> So the "fix" should happen really really fast, or we (R core)
>> take it out from there entirely.
>
> Well, in the end your repeated good reasoning has prevailed: I've
> committed a change (to R-devel; most probably in time to be ported
> to 4.0.1 beta). I think this implements the recycle0 = TRUE behavior
> you have been advocating for, in svn r78591 (2020-05-27 19:45:07
> +0200) with message
>
>     paste(), paste0(): collapse= always gives a string
>     (also w/ `recycle0=TRUE`)
>
> Best regards,
> Martin

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
Hi Martin, On 5/26/20 06:24, Martin Maechler wrote: ... What about remaining back-compatible, not only to R 3.y.z with default recycle0=FALSE, but also to R 4.0.0 with recycle0=TRUE What back-compatibility with R 4.0.0 are we talking about? The 'recycle0' arg was added **after** the R 4.0.0 release and has never been part of an official release yet. This is the time to fix it. *and* add a new option for the Suharto-Bill-Hervé-Gabe behavior, e.g., recycle0="sep.only" or just recycle0="sep" ? OMG! As (for back-compatibility reasons) you have to specify 'recycle0 = ..' anyway, you would get what makes most sense to you by using such a third option. ? (WDYT ?) Don't bother. I'd rather use paste(paste(x, y, z, sep="#", recycle0=TRUE), collapse=",") i.e. explicitly break down the 2 operations (sep and collapse). Might be slightly less efficient but I find it way more readable than paste(x, y, z, sep="#", collapse=",", recycle0="sep.only") BTW I appreciate you trying to accomodate everybody's taste. That doesn't sound like an easy task ;-) I'll just reiterate my earlier comment that controlling the collapse operation via an argument named 'recycle0' doesn't make sense (collapse involves NO recycling). So I don't know if the current 'recyle0=TRUE' behavior is "the correct one" but at the very least the name of the argument is a misnomer and misleading. More generally speaking using the same argument to control 2 distinct operations is not good API design. A better design is to use 2 arguments. Then the 2 arguments can generally be made orthogonal (like in this case) i.e. all possible combinations are valid (4 combinations in this case). Thanks, H. Martin > Switching to scheme (3) or to a new custom scheme > would be a completely different proposal. >> >> At least I'm consistent right? > Yes :-) > Anyway discussing recycling schemes is interesting but not directly > related with what the OP brought up (behavior of the 'collapse' operation). > Cheers, > H. 
Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
On 5/24/20 00:26, Gabriel Becker wrote: On Sat, May 23, 2020 at 9:59 PM Hervé Pagès <mailto:hpa...@fredhutch.org>> wrote: On 5/23/20 17:45, Gabriel Becker wrote: > Maybe my intuition is just > different but when I collapse multiple character vectors together, I > expect all the characters from each of those vectors to be in the > resulting collapsed one. Yes I'd expect that too. But the **collapse** operation in paste() has never been about collapsing **multiple** character vectors together. What it does is collapse the **single** character vector that comes out of the 'sep' operation. I understand what it does, I broke ti down the same way in my post earlier in the thread. the fact remains is that it is a single function which significantly muddies the waters. so you can say paste0(x,y, collapse=",", recycle0=TRUE) is not a collapse operation on multiple vectors, and of course there's a sense in which you're not wrong (again I understand what these functions do), but it sure looks like one in the invocation, doesn't it? Honestly the thing that this whole discussion has shown me most clearly is that, imho, collapse (accepting ONLY one data vector) and paste(accepting multiple) should never have been a single function to begin with. But that ship sailed long long ago. Yes :-( So paste(x, y, z, sep="", collapse=",") is analogous to sum(x + y + z) Honestly, I'd be significantly more comfortable if 1:10 + integer(0) + 5 were an error too. This is actually the recycling scheme used by mapply(): > mapply(function(x, y, z) c(x, y, z), 1:10, integer(0), 5) Error in mapply(FUN = FUN, ...) : zero-length inputs cannot be mixed with those of non-zero length AFAIK base R uses 3 different recycling schemes for n-ary operations: (1) The recycling scheme used by arithmetic and comparison operations (Arith, Compare, Logic group generics). (2) The recycling scheme used by classic paste(). (3) The recycling scheme used by mapply(). 
Having such a core mechanism like recycling being inconsistent across base R is sad. It makes it really hard to predict how a given n-ary function will recycle its arguments unless you spend some time trying it yourself with several combinations of vector lengths. It is of course the source of numerous latent bugs. I wish there was only one but that's just a dream. None of these 3 recycling schemes is perfect. IMO (2) is by far the worst. (3) is too restrictive and would need to be refined if we wanted to make it a good universal recycling scheme. Anyway I don't think it makes sense to introduce a 4th recycling scheme at this point even though it would be a nice item to put on the wish list for R 7.0.0 with the ultimate goal that it will universally adopted in R 11.0.0 ;-) So if we have to do with what we have IMO (1) is the scheme that makes most sense although I agree that it can do some surprising things for some unusual combinations of vector lengths. It's the scheme I adhere to in my own binary operations e.g. in S4Vector::pcompare(). The modest proposal of the 'recycle0' argument is only to let the user switch from recycling scheme (2) to (1) if they're not happy with scheme (2) (I'm one of them). Switching to scheme (3) or to a new custom scheme would be a completely different proposal. At least I'm consistent right? Yes :-) Anyway discussing recycling schemes is interesting but not directly related with what the OP brought up (behavior of the 'collapse' operation). Cheers, H. ~G -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
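The three schemes can be compared as result-length rules for two inputs. A rough sketch (binary case only; -1 stands for mapply()'s error on mixing zero-length with nonzero-length inputs; other length mixes that mapply() rejects are not modeled here):

```c
/* Result-length rules for the three base R recycling schemes described
 * above, restricted to two inputs of lengths n1 and n2. */

static int max2(int a, int b) { return a > b ? a : b; }

/* (1) Arith/Compare/Logic: a zero-length operand zeroes the result. */
static int len_arith(int n1, int n2)
{
    return (n1 == 0 || n2 == 0) ? 0 : max2(n1, n2);
}

/* (2) classic paste(): zero-length arguments behave like "". */
static int len_paste(int n1, int n2)
{
    return max2(n1, n2);
}

/* (3) mapply(): mixing zero-length and nonzero-length is an error,
 * returned here as -1. */
static int len_mapply(int n1, int n2)
{
    if ((n1 == 0) != (n2 == 0))
        return -1;
    return max2(n1, n2);
}
```

The 'recycle0=TRUE' proposal discussed in the thread amounts to letting paste() users opt out of rule (2) and into rule (1).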
Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
On 5/23/20 17:45, Gabriel Becker wrote:
> Maybe my intuition is just different but when I collapse multiple
> character vectors together, I expect all the characters from each of
> those vectors to be in the resulting collapsed one.

Yes I'd expect that too. But the **collapse** operation in paste() has
never been about collapsing **multiple** character vectors together.
What it does is collapse the **single** character vector that comes out
of the 'sep' operation. So

    paste(x, y, z, sep="", collapse=",")

is analogous to

    sum(x + y + z)

The element-wise addition is analogous to the 'sep' operation. The
sum() operation is analogous to the 'collapse' operation.

H.
Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
On 5/22/20 18:12, brodie gaslam wrote:
> FWIW what convinces me is consistency with other aggregating
> functions applied to zero length inputs:
>
>     sum(numeric(0))
>     ## [1] 0

Right. And 1 is the identity element of multiplication:

    > prod(numeric(0))
    [1] 1

And the empty string is the identity element of string aggregation by
concatenation.

H.
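The identity-element argument in one place: a fold over zero inputs returns the operation's identity. A sketch (the function names are made up):

```c
#include <string.h>

/* Each fold returns its operation's identity when n == 0: 0 for sum,
 * 1 for product, "" for concatenation -- the behavior argued for
 * paste(character(0), collapse=""). */
static double fold_sum(const double *x, int n)
{
    double acc = 0.0;                     /* identity of + */
    for (int i = 0; i < n; i++) acc += x[i];
    return acc;
}

static double fold_prod(const double *x, int n)
{
    double acc = 1.0;                     /* identity of * */
    for (int i = 0; i < n; i++) acc *= x[i];
    return acc;
}

static void fold_concat(char *out, size_t outsz, const char **x, int n)
{
    out[0] = '\0';                        /* identity of concatenation */
    for (int i = 0; i < n; i++)
        strncat(out, x[i], outsz - strlen(out) - 1);
}
```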
Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
Gabe, It's the current behavior of paste() that is a major source of bugs: ## Add "rs" prefix to SNP ids and collapse them in a ## comma-separated string. collapse_snp_ids <- function(snp_ids) paste("rs", snp_ids, sep="", collapse=",") snp_groups <- list( group1=c(55, 22, 200), group2=integer(0), group3=c(99, 550) ) vapply(snp_groups, collapse_snp_ids, character(1)) #group1group2group3 # "rs55,rs22,rs200" "rs" "rs99,rs550" This has hit me so many times! Now with 'collapse0=TRUE', we finally have the opportunity to make it do the right thing. Let's not miss that opportunity. Cheers, H. On 5/22/20 11:26, Gabriel Becker wrote: I understand that this is consistent but it also strikes me as an enormous 'gotcha' of a magnitude that 'we' are trying to avoid/smooth over at this point in user-facing R space. For the record I'm not suggesting it should return something other than "", and in particular I'm not arguing that any call to paste /that does not return an error/ with non-NULL collapse should return a character vector of length one. Rather I'm pointing out that it could (perhaps should, imo) simply be an error, which is also consistent, in the strict sense, with previous behavior in that it is the developer simply declining to extend the recycle0 argument to the full parameter space (there is no rule that says we must do so, arguments whose use is incompatible with other arguments can be reasonable and called for). I don't feel feel super strongly that reeturning "" in this and similar cases horrible and should never happen, but i'd bet dollars to donuts that to the extent that behavior occurs it will be a disproportionately major source of bugs, and i think thats at least worth considering in addition to pure consistency. ~G On Fri, May 22, 2020 at 9:50 AM William Dunlap <mailto:wdun...@tibco.com>> wrote: I agree with Herve, processing collapse happens last so collapse=non-NULL always leads to a single character string being returned, the same as paste(collapse=""). 
> See the altPaste function I posted yesterday.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, May 22, 2020 at 9:12 AM Hervé Pagès <hpa...@fredhutch.org> wrote:
>> I think that
>>
>>     paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",",
>>           recycle0=TRUE)
>>
>> should just return an empty string and don't see why it needs to
>> emit a warning or raise an error. To me it does exactly what the
>> user is asking for, which is to change how the 3 arguments are
>> recycled **before** the 'sep' operation. The 'recycle0' argument has
>> no business in the 'collapse' operation (which comes after the 'sep'
>> operation): this operation still behaves like it always had. That's
>> all there is to it.
>>
>> H.
>>
>> On 5/22/20 03:00, Gabriel Becker wrote:
>>> Hi Martin et al,
>>>
>>> On Thu, May 21, 2020 at 9:42 AM Martin Maechler
>>> <maech...@stat.math.ethz.ch> wrote:
>>>>>>>>> Hervé Pagès
>>>>>>>>>     on Fri, 15 May 2020 13:44:28 -0700 writes:
>>>>> There is still the situation where **both** 'sep' and 'collapse'
>>>>> are specified:
>>>>>     > paste(integer(0), "nth", sep="", collapse=",")
>>>>>     [1] "nth"
>>>>> In that case 'recycle0' should **not** be ignored i.e.
>>>>>     paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE)
>>>>> should return the empty string (and not character(0) like it
>>>>> does at the moment). In other words, 'recycle0' should only
>>>>> control the first operati
Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
I think that paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",", recycle0=TRUE) should just return an empty string and don't see why it needs to emit a warning or raise an error. To me it does exactly what the user is asking for, which is to change how the 3 arguments are recycled **before** the 'sep' operation. The 'recycle0' argument has no business in the 'collapse' operation (which comes after the 'sep' operation): this operation still behaves like it always had. That's all there is to it. H. On 5/22/20 03:00, Gabriel Becker wrote: Hi Martin et al, On Thu, May 21, 2020 at 9:42 AM Martin Maechler mailto:maech...@stat.math.ethz.ch>> wrote: >>>>> Hervé Pagès >>>>> on Fri, 15 May 2020 13:44:28 -0700 writes: > There is still the situation where **both** 'sep' and 'collapse' are > specified: >> paste(integer(0), "nth", sep="", collapse=",") > [1] "nth" > In that case 'recycle0' should **not** be ignored i.e. > paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE) > should return the empty string (and not character(0) like it does at the > moment). > In other words, 'recycle0' should only control the first operation (the > operation controlled by 'sep'). Which makes plenty of sense: the 1st > operation is binary (or n-ary) while the collapse operation is unary. > There is no concept of recycling in the context of unary operations. Interesting, ..., and sounding somewhat convincing. > On 5/15/20 11:25, Gabriel Becker wrote: >> Hi all, >> >> This makes sense to me, but I would think that recycle0 and collapse >> should actually be incompatible and paste should throw an error if >> recycle0 were TRUE and collapse were declared in the same call. I don't >> think the value of recycle0 should be silently ignored if it is actively >> specified. >> >> ~G Just to summarize what I think we should know and agree (or be be "disproven") and where this comes from ... 
1) recycle0 is a new R 4.0.0 option in paste() / paste0() which by default (recycle0 = FALSE) should (and *does* AFAIK) not change anything, hence paste() / paste0() behave completely back-compatible if recycle0 is kept to FALSE. 2) recycle0 = TRUE is meant to give different behavior, notably 0-length arguments (among '...') should result in 0-length results. The above does not specify what this means in detail, see 3) 3) The current R 4.0.0 implementation (for which I'm primarily responsible) and help(paste) are in accordance. Notably the help page (Arguments -> 'recycle0' ; Details 1st para ; Examples) says and shows how the 4.0.0 implementation has been meant to work. 4) Several provenly smart members of the R community argue that both the implementation and the documentation of 'recycle0 = TRUE' should be changed to be more logical / coherent / sensical .. Is the above all correct in your view? Assuming yes, I read basically two proposals, both agreeing that recycle0 = TRUE should only ever apply to the action of 'sep' but not the action of 'collapse'. 1) Bill and Hervé (I think) propose that 'recycle0' should have no effect whenever 'collapse = ' 2) Gabe proposes that 'collapse = ' and 'recycle0 = TRUE' should be declared incompatible and error. If going in that direction, I could also see them to give a warning (and continue as if recycle = FALSE). Herve makes a good point about when sep and collapse are both set. That said, if the user explicitly sets recycle0, Personally, I don't think it should be silently ignored under any configuration of other arguments. If all of the arguments are to go into effect, the question then becomes one of ordering, I think. Consider paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",", recycle0=TRUE) Currently that returns character(0), becuase the logic is essenttially (in pseudo-code) collapse(paste(c("a", "b"), NULL, c("c", "d"), sep = " ", recycle0=TRUE), collapse = ", ", recycle0=TRUE
Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
There is still the situation where **both** 'sep' and 'collapse' are specified: > paste(integer(0), "nth", sep="", collapse=",") [1] "nth" In that case 'recycle0' should **not** be ignored i.e. paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE) should return the empty string (and not character(0) like it does at the moment). In other words, 'recycle0' should only control the first operation (the operation controlled by 'sep'). Which makes plenty of sense: the 1st operation is binary (or n-ary) while the collapse operation is unary. There is no concept of recycling in the context of unary operations. H. On 5/15/20 11:25, Gabriel Becker wrote: Hi all, This makes sense to me, but I would think that recycle0 and collapse should actually be incompatible and paste should throw an error if recycle0 were TRUE and collapse were declared in the same call. I don't think the value of recycle0 should be silently ignored if it is actively specified. ~G On Fri, May 15, 2020 at 11:05 AM Hervé Pagès <mailto:hpa...@fredhutch.org>> wrote: Totally agree with that. H. On 5/15/20 10:34, William Dunlap via R-devel wrote: > I agree: paste(collapse="something", ...) should always return a single > character string, regardless of the value of recycle0. This would be > similar to when there are no non-NULL arguments to paste; collapse="." > gives a single empty string and collapse=NULL gives a zero long character > vector. 
>> paste()
>> character(0)
>> paste(collapse=", ")
>> [1] ""
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Apr 30, 2020 at 9:56 PM suharto_anggono--- via R-devel
> <r-devel@r-project.org> wrote:
>> Without 'collapse', 'paste' pastes (concatenates) its arguments
>> elementwise (separated by 'sep', " " by default). New in R devel and
>> R patched, specifying recycle0 = FALSE makes mixing zero-length and
>> nonzero-length arguments result in length zero. The result of
>> paste(n, "th", sep = "", recycle0 = FALSE) always has the same
>> length as 'n'. Previously, the result is still as long as the
>> longest argument, with the zero-length argument like "". If all of
>> the arguments have length zero, 'recycle0' doesn't matter.
>>
>> As far as I understand, 'paste' with 'collapse' as a character
>> string is supposed to put together elements of a vector into a
>> single character string. I think 'recycle0' shouldn't change it.
>>
>> In current R devel and R patched, paste(character(0), collapse = "",
>> recycle0 = FALSE) is character(0). I think it should be "", like
>> paste(character(0), collapse="").
>>
>> paste(c("4", "5"), "th", sep = "", collapse = ", ", recycle0 = FALSE)
>> is "4th, 5th".
>> paste(c("4"), "th", sep = "", collapse = ", ", recycle0 = FALSE)
>> is "4th".
>> I think
>> paste(c(), "th", sep = "", collapse = ", ", recycle0 = FALSE)
>> should be "", not character(0).
>> >> __ >> R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list >> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=776IovW06eUHr1EDrabHLY7F47rU9CCUEItSDI96zc0&s=xN84DhkZeoxzn6SG0QTMpOGg2w_ThmjZmZymGUuD0Uw&e= >> > > [[alternative HTML version deleted]] > > __ > R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list > https://urldefense.proofpoint.com/v2/url
Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
Totally agree with that. H. On 5/15/20 10:34, William Dunlap via R-devel wrote: I agree: paste(collapse="something", ...) should always return a single character string, regardless of the value of recycle0. This would be similar to when there are no non-NULL arguments to paste; collapse="." gives a single empty string and collapse=NULL gives a zero-length character vector. paste() character(0) paste(collapse=", ") [1] "" Bill Dunlap TIBCO Software wdunlap tibco.com On Thu, Apr 30, 2020 at 9:56 PM suharto_anggono--- via R-devel < r-devel@r-project.org> wrote: Without 'collapse', 'paste' pastes (concatenates) its arguments elementwise (separated by 'sep', " " by default). New in R devel and R patched, specifying recycle0 = FALSE makes mixing zero-length and nonzero-length arguments result in length zero. The result of paste(n, "th", sep = "", recycle0 = FALSE) always has the same length as 'n'. Previously, the result is still as long as the longest argument, with the zero-length argument like "". If all of the arguments have length zero, 'recycle0' doesn't matter. As far as I understand, 'paste' with 'collapse' as a character string is supposed to put together elements of a vector into a single character string. I think 'recycle0' shouldn't change it. In current R devel and R patched, paste(character(0), collapse = "", recycle0 = FALSE) is character(0). I think it should be "", like paste(character(0), collapse=""). paste(c("4", "5"), "th", sep = "", collapse = ", ", recycle0 = FALSE) is "4th, 5th". paste(c("4"), "th", sep = "", collapse = ", ", recycle0 = FALSE) is "4th". I think paste(c(), "th", sep = "", collapse = ", ", recycle0 = FALSE) should be "", not character(0).
-- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] "cd" floating in the air in the man page for paste/paste0
Thanks for the fix. H. On 5/12/20 23:29, Tomas Kalibera wrote: Thanks, fixed. Tomas On 5/13/20 5:14 AM, Dirk Eddelbuettel wrote: On 12 May 2020 at 19:59, Hervé Pagès wrote: | While reading about the new 'recycle0' argument of paste/paste0, I | spotted a mysterious "cd" floating in the air in the man page: | | recycle0: ‘logical’ indicating if zero-length character arguments (and | all zero-length or no arguments when ‘collapse’ is not | ‘NULL’) should lead to the zero-length ‘character(0)’. | cd | ^^ | | This is in R 4.0.0 Patched and R devel. Also still in r-devel as of svn r78432: \item{recycle0}{\code{\link{logical}} indicating if zero-length character arguments (and all zero-length or no arguments when \code{collapse} is not \code{NULL}) should lead to the zero-length \code{\link{character}(0)}.}cd ^^ Dirk -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] "cd" floating in the air in the man page for paste/paste0
Hi, While reading about the new 'recycle0' argument of paste/paste0, I spotted a mysterious "cd" floating in the air in the man page: recycle0: ‘logical’ indicating if zero-length character arguments (and all zero-length or no arguments when ‘collapse’ is not ‘NULL’) should lead to the zero-length ‘character(0)’. cd ^^ This is in R 4.0.0 Patched and R devel. Cheers, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Rtools and R 4.0.0?
Thanks Jeroen! On Tue, Apr 7, 2020 at 6:07 PM Kevin Ushey wrote: Regardless, I would like to thank R core, CRAN, and Jeroen for all of the time that has gone into creating and validating this new toolchain. This is arduous work at an especially arduous time, so I'd like to voice my appreciation for all the time and energy they have spent on making this possible. Absolutely. Thanks to R core, CRAN, Jeroen, and all the other people involved in creating the new Windows toolchain. Cheers, H. Best, Kevin On Tue, Apr 7, 2020 at 7:47 AM Dirk Eddelbuettel wrote: There appears to have been some progress on this matter: -Note that @command{g++} 4.9.x (as used for @R{} on Windows up to 3.6.x) +Note that @command{g++} 4.9.x (as used on Windows prior to @R{} 4.0.0) See SVN commit r78169 titled 'anticipate change in Windows toolchain', or the mirrored git commit at https://github.com/wch/r-source/commit/bd674e2b76b2384169424e3d899fbfb5ac174978 Dirk -- http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org __ R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Hard memory limit of 16GB under Windows?
Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. > ls() character(0) > memory.limit() [1] 32627 > sessionInfo() R version 3.6.3 (2020-02-29) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 18363) Matrix products: default locale: [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 [4] LC_NUMERIC=C LC_TIME=French_France.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_3.6.3 > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] object.size vs lobstr::obj_size
On 3/27/20 15:19, Hadley Wickham wrote: On Fri, Mar 27, 2020 at 4:01 PM Hervé Pagès <mailto:hpa...@fredhutch.org>> wrote: On 3/27/20 12:00, Hadley Wickham wrote: > > > On Fri, Mar 27, 2020 at 10:39 AM Hervé Pagès mailto:hpa...@fredhutch.org> > <mailto:hpa...@fredhutch.org <mailto:hpa...@fredhutch.org>>> wrote: > > Hi Tomas, > > On 3/27/20 07:01, Tomas Kalibera wrote: > > they provide an over-approximation > > They can also provide an "under-approximation" (to say the least) e.g. > on reference objects where the entire substance of the object is > ignored > which makes object.size() completely meaningless in that case: > > setRefClass("A", fields=c(stuff="ANY")) > object.size(new("A", stuff=raw(0))) # 680 bytes > object.size(new("A", stuff=runif(1e8))) # 680 bytes > > Why wouldn't object.size() look at the content of environments? > > > As the author, I'm obviously biased, but I do like lobstr::obj_sizes() > which allows you to see the additional size occupied by one object given > any number of other objects. This is particularly important for > reference classes since individual objects appear quite large: > > A <- setRefClass("A", fields=c(stuff="ANY")) > lobstr::obj_size(new("A", stuff=raw(0))) > #> 567,056 B > > But the vast majority is shared across all instances of that class: > > lobstr::obj_size(A) > #> 719,232 B > lobstr::obj_sizes(A, new("A", stuff=raw(0))) > #> * 719,232 B > #> * 720 B > lobstr::obj_sizes(A, new("A", stuff=runif(1e8))) > #> * 719,232 B > #> * 800,000,720 B Nice. Can you clarify the situation with lobstr::obj_size vs pryr::object_size? I've heard of the latter before and use it sometimes but never heard of the former before seeing Stefan's post. Then I checked the authors of both and thought maybe they should talk to each other ;-) pryr is basically retired :) TBH I don't know why I gave up on it, except lobstr is a cooler name 🤣 That's where all active development is happening. 
(The underlying code is substantially similar although lobstr includes bug fixes not present in pryr) Good to know, thanks! Couldn't find any mention of pryr being abandoned and superseded by lobster (which definitely sounds more yummy) in pryr's README.md or DESCRIPTION file. Would be good to put this somewhere. H. Hadley -- http://hadley.nz -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] object.size vs lobstr::obj_size
On 3/27/20 12:00, Hadley Wickham wrote: On Fri, Mar 27, 2020 at 10:39 AM Hervé Pagès <mailto:hpa...@fredhutch.org>> wrote: Hi Tomas, On 3/27/20 07:01, Tomas Kalibera wrote: > they provide an over-approximation They can also provide an "under-approximation" (to say the least) e.g. on reference objects where the entire substance of the object is ignored which makes object.size() completely meaningless in that case: setRefClass("A", fields=c(stuff="ANY")) object.size(new("A", stuff=raw(0))) # 680 bytes object.size(new("A", stuff=runif(1e8))) # 680 bytes Why wouldn't object.size() look at the content of environments? As the author, I'm obviously biased, but I do like lobstr::obj_sizes() which allows you to see the additional size occupied by one object given any number of other objects. This is particularly important for reference classes since individual objects appear quite large: A <- setRefClass("A", fields=c(stuff="ANY")) lobstr::obj_size(new("A", stuff=raw(0))) #> 567,056 B But the vast majority is shared across all instances of that class: lobstr::obj_size(A) #> 719,232 B lobstr::obj_sizes(A, new("A", stuff=raw(0))) #> * 719,232 B #> * 720 B lobstr::obj_sizes(A, new("A", stuff=runif(1e8))) #> * 719,232 B #> * 800,000,720 B Nice. Can you clarify the situation with lobstr::obj_size vs pryr::object_size? I've heard of the latter before and use it sometimes but never heard of the former before seeing Stefan's post. Then I checked the authors of both and thought maybe they should talk to each other ;-) Thanks, H. Hadley -- http://hadley.nz <https://urldefense.proofpoint.com/v2/url?u=http-3A__hadley.nz&d=DwMFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=MX7Olw-dGRDfJNWEqIDTTTkaagVswOEqcRnxuRBAdjw&s=haVkOV6bEj7VnjT4Gn4iXzRqO7IOqDZUZuEeFPSHQuM&e=> -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. 
Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] object.size vs lobstr::obj_size
Hi Tomas, On 3/27/20 07:01, Tomas Kalibera wrote: they provide an over-approximation They can also provide an "under-approximation" (to say the least) e.g. on reference objects where the entire substance of the object is ignored which makes object.size() completely meaningless in that case: setRefClass("A", fields=c(stuff="ANY")) object.size(new("A", stuff=raw(0))) # 680 bytes object.size(new("A", stuff=runif(1e8))) # 680 bytes Why wouldn't object.size() look at the content of environments? Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
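The blind spot described in this post can be seen without defining a reference class at all: object.size() does not follow environments, so anything reachable only through an environment is invisible to it. A small sketch (byte counts vary by platform and R version, so only the ordering matters):

```r
e <- new.env()
e$stuff <- runif(1e6)   # ~8 MB payload reachable only through the environment
object.size(e)          # tiny: the environment's contents are not counted
object.size(e$stuff)    # the payload itself, roughly 8e6 bytes
```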
Re: [Rd] configure --with-pcre1 fails with latest R 4.0 on Ubuntu 14.04
Excellent. Thank you! H. On 3/20/20 23:55, Tomas Kalibera wrote: On 3/18/20 6:11 PM, Hervé Pagès wrote: Thanks Tomas. Any chance the old version of the error message could be restored? It would definitely be more helpful than the current one. It's confusing to get an error and be told to use --with-pcre1 when you're already using it. The message now gives the required version and UTF-8 support requirement, so one does not have to look that one line up. Thanks to Brian Ripley, Tomas H. On 3/18/20 01:08, Tomas Kalibera wrote: On 3/17/20 8:18 PM, Hervé Pagès wrote: Using --with-pcre1 to configure the latest R 4.0 (revision 77988) on an Ubuntu 14.04.5 LTS system gives me the following error: ... checking if lzma version >= 5.0.3... yes checking for pcre2-config... no checking for pcre_fullinfo in -lpcre... yes checking pcre.h usability... yes checking pcre.h presence... yes checking for pcre.h... yes checking pcre/pcre.h usability... no checking pcre/pcre.h presence... no checking for pcre/pcre.h... no checking if PCRE1 version >= 8.32 and has UTF-8 support... no checking whether PCRE support suffices... configure: error: pcre2 library and headers are required, or use --with-pcre1 Maybe the real problem is that the PCRE version on this OS is 8.31? Yes, R requires PCRE version at least 8.32 as documented in R-Admin, and this is since September 2019. The error message is not particularly helpful. An earlier version of the message gave the requirement explicitly, when people would have been more likely to have that old versions of PCRE1. The few who still have it now need to see also the output line above to get the requirement and/or look into the manual. R 4.0 is still keeping support for PCRE1 (>=8.32), but PCRE2 should be used whenever possible. Best, Tomas Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. 
Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] configure --with-pcre1 fails with latest R 4.0 on Ubuntu 14.04
Thanks Tomas. Any chance the old version of the error message could be restored? It would definitely be more helpful than the current one. It's confusing to get an error and be told to use --with-pcre1 when you're already using it. H. On 3/18/20 01:08, Tomas Kalibera wrote: On 3/17/20 8:18 PM, Hervé Pagès wrote: Using --with-pcre1 to configure the latest R 4.0 (revision 77988) on an Ubuntu 14.04.5 LTS system gives me the following error: ... checking if lzma version >= 5.0.3... yes checking for pcre2-config... no checking for pcre_fullinfo in -lpcre... yes checking pcre.h usability... yes checking pcre.h presence... yes checking for pcre.h... yes checking pcre/pcre.h usability... no checking pcre/pcre.h presence... no checking for pcre/pcre.h... no checking if PCRE1 version >= 8.32 and has UTF-8 support... no checking whether PCRE support suffices... configure: error: pcre2 library and headers are required, or use --with-pcre1 Maybe the real problem is that the PCRE version on this OS is 8.31? Yes, R requires PCRE version at least 8.32 as documented in R-Admin, and this is since September 2019. The error message is not particularly helpful. An earlier version of the message gave the requirement explicitly, when people would have been more likely to have that old versions of PCRE1. The few who still have it now need to see also the output line above to get the requirement and/or look into the manual. R 4.0 is still keeping support for PCRE1 (>=8.32), but PCRE2 should be used whenever possible. Best, Tomas Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] configure --with-pcre1 fails with latest R 4.0 on Ubuntu 14.04
Using --with-pcre1 to configure the latest R 4.0 (revision 77988) on an Ubuntu 14.04.5 LTS system gives me the following error: ... checking if lzma version >= 5.0.3... yes checking for pcre2-config... no checking for pcre_fullinfo in -lpcre... yes checking pcre.h usability... yes checking pcre.h presence... yes checking for pcre.h... yes checking pcre/pcre.h usability... no checking pcre/pcre.h presence... no checking for pcre/pcre.h... no checking if PCRE1 version >= 8.32 and has UTF-8 support... no checking whether PCRE support suffices... configure: error: pcre2 library and headers are required, or use --with-pcre1 Maybe the real problem is that the PCRE version on this OS is 8.31? The error message is not particularly helpful. Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] rounding change
Thanks for the heads up. The new result for round(51/80, digits=3) is also consistent with sprintf("%.3f", 51/80), format(51/80, digits=3), print(51/80, digits=3), and with the sprintf() function in C. Which is somehow satisfying. H. On 3/5/20 05:54, Therneau, Terry M., Ph.D. via R-devel wrote: This is a small heads up for package maintainers. Under the more recent R-devel, R CMD check turned up some changes in the *.out files. The simple demonstration is to type "round(51/80, 3)", which gives .638 under the old and .637 under the new. (One of my coxph test cases has a concordance of exactly 51/80). In this particular case 51/80 is exactly .6375, but that value does not have an exact representation in base 2. The line below would argue that the new version is correct, at least with respect to the internal representation. > print(51/80, digits = 20) [1] 0.63749999999999995559 This is not a bug or problem, it just means that whichever version I put into my survival/tests/book6.Rout.save file, one of R-devel or R-current will flag an issue. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
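The change is easier to see with the representation spelled out; a quick sanity check (the round() result shown assumes the new behavior discussed here, i.e. R >= 4.0.0):

```r
sprintf("%.20f", 51/80)  # "0.63749999999999995559": the stored double
                         # is just below the decimal value 0.6375
round(51/80, 3)          # 0.637 in R >= 4.0.0; 0.638 in earlier versions
```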
Re: [Rd] unlink() on "~" removes the home directory
On 2/26/20 14:47, Gábor Csárdi wrote: !!! DON'T TRY THE CODE IN THIS EMAIL AT HOME !!! Ok I'll try it at work on my boss's computer, sounds a lot safer. H. Well, unlink() does what it is supposed to do, so you could argue that there is nothing wrong with it. Also, nobody would call unlink() on "~", right? The situation is not so simple, however. E.g. if you happen to have a directory called "~", and you iterate over all files and directories to selectively remove some of them, then your code might end up calling unlink on the local "~" directory, and then your home is gone. But you would not create a directory named "~", that is just asking for trouble. Well, surely, _intentionally_ you would not do that. Unintentionally, you might. E.g. something like this is enough: # Create a subpath within a base directory badfun <- function(base = ".", path) { dir.create(file.path(base, path), recursive = TRUE, showWarnings = FALSE) } badfun(path = "~/foo") (If you did run this, be very careful how you remove the directory called "~"!) A real example is `R CMD build` which deletes the home directory of the current user if the root of the package contains a non-empty "~" directory. Luckily this is now fixed in R-devel, so R 4.0.0 will do better. (R 3.6.3 will not.) See https://github.com/wch/r-source/commit/1d4f7aa1dac427ea2213d1f7cd7b5c16e896af22 I have seen several bug reports about various packages (that call R CMD build) removing the home directory, so this indeed happens in practice to a number of people. The commit above will fix `R CMD build`, but it would be great to "fix" this in general. It seems pretty hard to prevent users from creating a "~" directory. But preventing unlink() from deleting "~" does not actually seem too hard.
If unlink() could just refuse to remove "~" (when expand = TRUE), that would be great. It seems to me that the current behavior is very, very rarely intended, and its consequences are potentially disastrous. If unlink("~", recursive = TRUE) errors, you can still remove a local "~" file/dir with unlink("./~", ...). And you can still remove your home directory if you really want to do that, with unlink(path.expand("~"), ...). So no functionality is lost. Also, if anyone is aware of packages/functions that tend to create "~" directories or files, please let me know. I would be happy to submit a patch for the new unlink("~") behavior. Thanks, Gabor __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
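The guard proposed in this thread can be sketched as a user-level wrapper. safe_unlink() below is a hypothetical helper for illustration (not base R, and not the patch that was eventually committed); it leans on the fact that path.expand() expands a bare "~" but leaves a literal "./~" alone:

```r
safe_unlink <- function(x, recursive = FALSE, force = FALSE) {
  home <- path.expand("~")
  # A bare "~" (or "~/...") expands to the real home directory;
  # a local directory spelled "./~" does not.
  if (any(path.expand(x) == home))
    stop("refusing to unlink the home directory; ",
         "call unlink(path.expand('~'), ...) if you really mean it")
  unlink(x, recursive = recursive, force = force)
}

safe_unlink("./~", recursive = TRUE)  # only ever touches a local "~" dir
```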
[Rd] Bug in printing array of type "list"
Hi, This array is of type "list" but print() reports otherwise: a1 <- array(list(1), 2:0) typeof(a1) # [1] "list" a1 # <2 x 1 x 0 array of character> # [,1] # [1,] # [2,] No such problem with an array of type "logical": a2 <- array(NA, 2:0) typeof(a2) # [1] "logical" a2 # <2 x 1 x 0 array of logical> # [,1] # [1,] # [2,] Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] as.vector() broken on a matrix or array of type "list"
Hi Martin, On 09/26/2018 12:41 AM, Martin Maechler wrote: Hervé Pagès on Tue, 25 Sep 2018 23:27:19 -0700 writes: > Hi, Unlike on an atomic matrix, as.vector() doesn't drop > the "dim" attribute of matrix or array of type "list": m <- matrix(list(), nrow=2, ncol=3) m # [,1] [,2] [,3] # [1,] NULL NULL NULL # [2,] NULL NULL NULL as.vector(m) # [,1] [,2] [,3] # [1,] NULL NULL NULL # [2,] NULL NULL NULL as documented and as always, including (probably all) versions of S and S-plus. is.vector(as.vector(m)) # [1] FALSE as bad is that looks, that's also "known" and has been the case forever as well... I agree that the semantics of as.vector(.) are not what you would expect, and probably neither what we would do when creating R today. *) The help page {the same for as.vector() and is.vector()} mentions that as.vector() behavior more than once, notably at the end of 'Details' and its 'Note's ... with one exception where you have a strong point, and the documenation is incomplete at least -- under the heading Methods for 'as.vector()': ... follow the conventions of the default method. In particular ... ... ... • ‘is.vector(as.vector(x, m), m)’ should be true for any mode ‘m’, including the default ‘"any"’. and you are right that this is not fulfilled in the case the list has a 'dim' attribute. But I don't think we "can" change as.vector(.) for that case (where it is a no-op). Rather possibly is.vector(.) should not return FALSE but TRUE -- with the reasoning (I think most experienced R programmers would agree) that the foremost property of 'm' is to be - a list() {with a dim attribute and matrix-like indexing possibility} rather than - a 'matrix' {where every matrix entry is a list()}. Note that this change would break all the code around that uses is.vector() to distinguish between an array (of mode "atomic" or "list") and a non-array. Arguably is.array() should preferably be used for that but I'm sure there is a lot of code around that uses is.vector(). 
The bottom of the problem is that as.vector() doesn't drop attributes that is.vector() sees as "vector breakers" i.e. as breaking the vector nature of an object. So for example is.vector() considers the "dim" attribute to be a vector breaker but as.vector() doesn't drop it. So yes in order to bring is.vector() and as.vector() in agreement you can either change one or the other, or both. My gut feeling though is that it would be less disruptive to not change what is.vector() thinks about the "dim" attribute and to make sure that as.vector() **always** drops it (together with "dimnames" if present). How much code around could there be that calls as.vector() on an array and expects the "dim" attribute to be dropped **except** when the mode() of the array is "list"? It is more likely that the code around that calls as.vector() on an array doesn't expect such exception and so is broken. This was actually the case for my code ;-) Thanks, H. At the moment my gut feeling would propose to only update the documentation, adding that one case as "an exception for historic reasons". Martin - *) {Possibly such an R we would create today would be much closer to julia, where every function is generic / a multi-dispach method "a la S4" and still be blazingly fast, thanks to JIT compilation, method caching and more smart things.} But as you know one of the strength of (base) R is its stability and reliability. You can only use something as a "the language of applied statistics and data science" and rely that published code still works 10 years later if the language is not changed/redesigned from scratch every few years ((as some ... are)). -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] as.vector() broken on a matrix or array of type "list"
Hi, Unlike on an atomic matrix, as.vector() doesn't drop the "dim" attribute of matrix or array of type "list": m <- matrix(list(), nrow=2, ncol=3) m # [,1] [,2] [,3] # [1,] NULL NULL NULL # [2,] NULL NULL NULL as.vector(m) # [,1] [,2] [,3] # [1,] NULL NULL NULL # [2,] NULL NULL NULL is.vector(as.vector(m)) # [1] FALSE Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
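Until base R changes here (later releases may well have), a practical workaround is to drop the offending attributes by hand; a small sketch:

```r
m <- matrix(list(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
# In the R versions discussed here, as.vector(m) keeps the "dim"
# attribute, so strip the attributes explicitly instead:
v <- m
attributes(v) <- NULL   # drops dim (and dimnames, if any)
is.vector(v)            # TRUE: a plain list of length 6
length(v)               # 6
```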
Re: [Rd] Bias in R's random integers?
Hi, Note that it wouldn't be the first time that sample() changed behavior in a non-backward-compatible way: https://stat.ethz.ch/pipermail/r-devel/2012-October/065049.html Cheers, H. On 09/20/2018 08:15 AM, Duncan Murdoch wrote: On 20/09/2018 6:59 AM, Ralf Stubner wrote: On 9/20/18 1:43 AM, Carl Boettiger wrote: For a well-tested C algorithm, based on my reading of Lemire, the unbiased "algorithm 3" in https://arxiv.org/abs/1805.10941 is already part of the C standard library in OpenBSD and macOS (as arc4random_uniform), and in the GNU standard library. Lemire also provides C++ code in the appendix of his piece for both this and the faster "nearly divisionless" algorithm. It would be excellent if any R core members were interested in considering bindings to these algorithms as a patch, or might express expectations for how that patch would have to operate (e.g. re Duncan's comment about non-integer arguments to sample size). Otherwise, an R package binding seems like a good starting point, but I'm not the right volunteer. It is difficult to do this in a package, since R does not provide access to the random bits generated by the RNG. Only a float in (0,1) is available via unif_rand(). I believe it is safe to multiply the unif_rand() value by 2^32, and take the whole number part as an unsigned 32 bit integer. Depending on the RNG in use, that will give at least 25 random bits. (The low order bits are the questionable ones. 25 is just a guess, not a guarantee.) However, if one is willing to use an external RNG, it is of course possible. After reading about Lemire's work [1], I had planned to integrate such an unbiased sampling scheme into the dqrng package, which I have now started.
[2] Using Duncan's example, the results look much better: library(dqrng) m <- (2/5)*2^32 y <- dqsample(m, 100, replace = TRUE) table(y %% 2) 0 1 500252 499748 Another useful diagnostic is plot(density(y[y %% 2 == 0])) Obviously that should give a more or less uniform density, but for values near m, the default sample() gives some nice pretty pictures of quite non-uniform densities. By the way, there are actually quite a few examples of very large m besides m = (2/5)*2^32 where performance of sample() is noticeably bad. You'll see problems in y %% 2 for any integer a > 1 with m = 2/(1 + 2a) * 2^32, problems in y %% 3 for m = 3/(1 + 3a)*2^32 or m = 3/(2 + 3a)*2^32, etc. So perhaps I'm starting to be convinced that the default sample() should be fixed. Duncan Murdoch Currently I am taking the other interpretation of "truncated": table(dqsample(2.5, 100, replace = TRUE)) 0 1 499894 500106 I will adjust this to whatever is decided for base R. However, there is currently neither long vector nor weighted sampling support. And the performance without replacement is quite bad compared to R's algorithm with hashing. 
cheerio ralf [1] via http://www.pcg-random.org/posts/bounded-rands.html [2] https://github.com/daqana/dqrng/tree/feature/sample __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Argument 'dim' misspelled in error message
Thanks! On 09/01/2018 05:42 AM, Kurt Hornik wrote: Hervé Pagès writes: Thanks: fixed in the trunk with c75223. Best -k Hi, The following error message misspells the name of the 'dim' argument: array(integer(0), dim=integer(0)) Error in array(integer(0), dim = integer(0)) : 'dims' cannot be of length 0 The name of the argument is 'dim' not 'dims': args(array) function (data = NA, dim = length(data), dimnames = NULL) NULL Cheers, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Ddevel&d=DwIFAw&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=SzMRc3M_TJEtaAqp-2nqiquGAjCH605Ocf2-jkPG_1E&s=1PeobGV2Ld7gOtIS5coLotgg3VLknDQyCXVjO08DbX4&e= -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Argument 'dim' misspelled in error message
Hi, The following error message misspells the name of the 'dim' argument: > array(integer(0), dim=integer(0)) Error in array(integer(0), dim = integer(0)) : 'dims' cannot be of length 0 The name of the argument is 'dim' not 'dims': > args(array) function (data = NA, dim = length(data), dimnames = NULL) NULL Cheers, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Where does L come from?
On 08/25/2018 04:33 PM, Duncan Murdoch wrote: On 25/08/2018 4:49 PM, Hervé Pagès wrote: The choice of the L suffix in R to mean "R integer type", which is mapped to the "int" type at the C level, and NOT to the "long int" type, is really unfortunate as it seems to be misleading and confusing a lot of people. I don't have stats about this so I take back the "lot". Can you provide any evidence of that (e.g. a link to a message from one of these people)? I think a lot of people don't really know about the L suffix, but that's different from being confused or misled by it. And if you make a criticism like that, it would really be fair to suggest what R should have done instead. I can't think of anything better, given that "i" was already taken, and that the lack of a decimal place had historically not been significant. Using "I" *would* have been confusing (3i versus 3I being very different). Deciding that 3 suddenly became an integer value different from 3. would have led to lots of inefficient conversions (since stats mainly deals with floating point values). Maybe 10N, or 10n? I'm not convinced that 10I would have been confusing but the I can easily be mistaken for a 1. H. Duncan Murdoch The fact that nowadays "int" and "long int" have the same size on most platforms is only anecdotal here. Just my 2 cents. H. On 08/25/2018 10:01 AM, Dirk Eddelbuettel wrote: On 25 August 2018 at 09:28, Carl Boettiger wrote: | I always thought it meant "Long" (I'm assuming R's integers are long | integers in C sense (iirc one can declare 'long x', and it being common to | refer to integers as "longs" in the same way we use "doubles" to mean | double precision floating point). But pure speculation on my part, so I'm | curious! It does per my copy (dated 1990 !!) of the 2nd ed of Kernighan & Ritchie. It explicitly mentions (sec 2.2) that 'int' may be 16 or 32 bits, and 'long' is 32 bit; and (in sec 2.3) introduces the I, U, and L labels for constants.
So "back then when" 32 bit was indeed long. And as R uses 32 bit integers ... (It is all murky because the size is an implementation detail and later "essentially everybody" moved to 32 bit integers and 64 bit longs as the 64 bit architectures became prevalent. Which is why when it matters one should really use more explicit types like int32_t or int64_t.) Dirk -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Where does L come from?
On 08/25/2018 02:23 PM, Dirk Eddelbuettel wrote: On 25 August 2018 at 13:49, Hervé Pagès wrote: | The choice of the L suffix in R to mean "R integer type", which | is mapped to the "int" type at the C level, and NOT to the "long int" | type, is really unfortunate as it seems to be misleading and confusing | a lot of people. The point I was trying to make in what you quote below is that the L may come from a time when int and long int were in fact the same on most relevant architectures. And it is hardly R's fault that C was allowed to change. Also, it hardly matters given that R has precisely one integer type so I am unsure where you see the confusion between long int and int. | The fact that nowadays "int" and "long int" have the same size on most | platforms is only anecdotal here. | | Just my 2 cents. Are you sure? R> Rcpp::evalCpp("sizeof(long int)") [1] 8 R> Rcpp::evalCpp("sizeof(int)") [1] 4 R> My bad, it's only the same on Windows. My point is that the discussion about the size of int vs long int is only a distraction here. The important bit is that 10L in R is represented by 10 in C, which is an int, not by 10L, which is a long int. Could hardly be more confusing. H. Dirk | H. | | On 08/25/2018 10:01 AM, Dirk Eddelbuettel wrote: | > | > On 25 August 2018 at 09:28, Carl Boettiger wrote: | > | I always thought it meant "Long" (I'm assuming R's integers are long | > | integers in C sense (iirrc one can declare 'long x', and it being common to | > | refer to integers as "longs" in the same way we use "doubles" to mean | > | double precision floating point). But pure speculation on my part, so I'm | > | curious! | > | > It does per my copy (dated 1990 !!) of the 2nd ed of Kernighan & Ritchie. It | > explicitly mentions (sec 2.2) that 'int' may be 16 or 32 bits, and 'long' is | > 32 bit; and (in sec 2.3) introduces the I, U, and L labels for constants. So | > "back then when" 32 bit was indeed long. And as R uses 32 bit integers ... 
| > | > (It is all murky because the size is an implementation detail and later | > "essentially everybody" moved to 32 bit integers and 64 bit longs as the 64 | > bit architectures became prevalent. Which is why when it matters one should | > really use more explicit types like int32_t or int64_t.) | > | > Dirk | > | | -- | Hervé Pagès | | Program in Computational Biology | Division of Public Health Sciences | Fred Hutchinson Cancer Research Center | 1100 Fairview Ave. N, M1-B514 | P.O. Box 19024 | Seattle, WA 98109-1024 | | E-mail: hpa...@fredhutch.org | Phone: (206) 667-5791 | Fax:(206) 667-1319 -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Where does L come from?
The choice of the L suffix in R to mean "R integer type", which is mapped to the "int" type at the C level, and NOT to the "long int" type, is really unfortunate as it seems to be misleading and confusing a lot of people. The fact that nowadays "int" and "long int" have the same size on most platforms is only anecdotal here. Just my 2 cents. H. On 08/25/2018 10:01 AM, Dirk Eddelbuettel wrote: On 25 August 2018 at 09:28, Carl Boettiger wrote: | I always thought it meant "Long" (I'm assuming R's integers are long | integers in C sense (iirrc one can declare 'long x', and it being common to | refer to integers as "longs" in the same way we use "doubles" to mean | double precision floating point). But pure speculation on my part, so I'm | curious! It does per my copy (dated 1990 !!) of the 2nd ed of Kernighan & Ritchie. It explicitly mentions (sec 2.2) that 'int' may be 16 or 32 bits, and 'long' is 32 bit; and (in sec 2.3) introduces the I, U, and L labels for constants. So "back then when" 32 bit was indeed long. And as R uses 32 bit integers ... (It is all murky because the size is an implementation detail and later "essentially everybody" moved to 32 bit integers and 64 bit longs as the 64 bit architectures became prevalent. Which is why when it matters one should really use more explicit types like int32_t or int64_t.) Dirk -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
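The point of the thread in two lines of R: the L suffix yields R's integer type (a C "int" at the C level), while the unsuffixed literal is a double:

```r
typeof(1L)            # [1] "integer"  -- backed by a C 'int'
typeof(1)             # [1] "double"
.Machine$integer.max  # [1] 2147483647, i.e. INT_MAX for a 32-bit int
```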
Re: [Rd] longint
On 08/16/2018 11:30 AM, Prof Brian Ripley wrote: On 16/08/2018 18:33, Hervé Pagès wrote: ... Only on Intel platforms int is 32 bits. Strictly speaking int is only required to be >= 16 bits. Who knows what the size of an int is on the Sunway TaihuLight for example ;-) R's configure checks that int is 32 bit and will not compile without it (src/main/arithmetic.c) ... so int and int32_t are the same on all platforms where the latter is defined. Good to know. Thanks for the clarification! -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] longint
On 08/16/2018 05:12 AM, Dirk Eddelbuettel wrote: On 15 August 2018 at 20:32, Benjamin Tyner wrote: | Thanks for the replies and for confirming my suspicion. | | Interestingly, src/include/S.h uses a trick: | | #define longint int | | and so does the nlme package (within src/init.c). As Bill Dunlap already told you, this is a) ancient and b) was concerned with the int as 16 bit to 32 bit transition period. Ie a long time ago. Old C programmers remember. You should preferably not even use 'long int' on the other side but rely on the fact that all compiler nowadays allow you to specify exactly what size is used via int64_t (long), int32_t (int), ... and the unsigned cousins (which R does not have). So please receive the value as a int64_t and then cast it to an int32_t -- which corresponds to R's notion of an integer on every platform. Only on Intel platforms int is 32 bits. Strictly speaking int is only required to be >= 16 bits. Who knows what the size of an int is on the Sunway TaihuLight for example ;-) H. And please note that that conversion is lossy. If you must keep 64 bits then the bit64 package by Jens Oehlschlaegel is good and eg fully supported inside data.table. We use it for 64-bit integers as nanosecond timestamps in our nanotime package (which has some converters). Dirk -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] longint
No segfault but a BIG warning from the compiler. That's because dereferencing the pointer inside your myfunc() function will produce an int that is not predictable i.e. it is system-dependent. Its value will depend on sizeof(long int) (which is not guaranteed to be 8) and on the endianness of the system. Also if the pointer you pass in the call to the function is an array of long ints, then pointer arithmetic inside your myfunc() won't necessarily take you to the array element that you'd expect. Note that there are very specific situations where you can actually do this kind of things e.g. in the context of writing a callback function to pass to qsort(). See 'man 3 qsort' if you are on a Unix system. In that case pointers to void and explicit casts should be used. If done properly, this is portable code and the compiler won't issue warnings. H. On 08/15/2018 07:05 AM, Brian Ripley wrote: On 15 Aug 2018, at 12:48, Duncan Murdoch wrote: On 15/08/2018 7:08 AM, Benjamin Tyner wrote: Hi In my R package, imagine I have a C function defined: void myfunc(int *x) { // some code } but when I call it, I pass it a pointer to a longint instead of a pointer to an int. Could this practice potentially result in a segfault? I don't think the passing would cause a segfault, but "some code" might be expecting a positive number, and due to the type error you could pass in a positive longint and have it interpreted as a negative int. Are you thinking only of a little-endian system? A 32-bit lookup of a pointer to a 64-bit area could read the wrong half and get a completely different value. 
Duncan Murdoch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] MARGIN in base::unique.matrix() and base::unique.array()
Hi, The man page for base::unique.matrix() and base::unique.array() says that MARGIN is expected to be a single integer. OTOH the code in charge of checking the user supplied MARGIN is: if (length(MARGIN) > ndim || any(MARGIN > ndim)) stop(gettextf("MARGIN = %d is invalid for dim = %d", MARGIN, dx), domain = NA) which doesn't really make sense. As a consequence the user gets an obscure error message when specifying a MARGIN that satisfies the above check but is in fact invalid: > unique(matrix(1:10, ncol=2), MARGIN=1:2) Error in args[[MARGIN]] <- !duplicated.default(temp, fromLast = fromLast, : object of type 'symbol' is not subsettable Also the code used by the above check to generate the error message is broken: > unique(matrix(1:10, ncol=2), MARGIN=1:3) Error in sprintf(gettext(fmt, domain = domain), ...) : arguments cannot be recycled to the same length > unique(matrix(1:10, ncol=2), MARGIN=3) Error in unique.matrix(matrix(1:10, ncol = 2), MARGIN = 3) : c("MARGIN = 3 is invalid for dim = 5", "MARGIN = 3 is invalid for dim = 2") Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
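A stricter check along these lines would reject anything but a single in-range value up front, before any subsetting is attempted. This is a sketch only, not the wording of any actual base R fix:

```r
## sketch: validate MARGIN as the single integer the man page promises
check_MARGIN <- function(MARGIN, ndim) {
    if (length(MARGIN) != 1L || !is.numeric(MARGIN) || is.na(MARGIN) ||
        MARGIN < 1L || MARGIN > ndim)
        stop(gettextf("MARGIN must be a single integer between 1 and %d",
                      ndim),
             domain = NA)
}
```

With such a check, both `MARGIN=1:2` and `MARGIN=3` above would produce the same clear error message instead of the two obscure ones shown.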
Re: [Rd] Subsetting the "ROW"s of an object
On 06/08/2018 02:15 PM, Hadley Wickham wrote: On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles wrote: On Jun 8, 2018, at 1:49 PM, Hadley Wickham wrote: Hmmm, yes, there must be some special case in the C code to avoid recycling a length-1 logical vector: Here is a version that (I think) handles Herve's issue of arrays having one or more 0 dimensions. subset_ROW <- function(x,i) { dims <- dim(x) index_list <- which(dims[-1] != 0L) + 3 mc <- quote(x[i]) nd <- max(1L, length(dims)) mc[ index_list ] <- list(TRUE) mc[[ nd + 3L ]] <- FALSE names( mc )[ nd+3L ] <- "drop" eval(mc) } Curiously enough the timing is *much* better for this implementation than for the first version I sent. Constructing a version of `mc' that looks like `x[i, drop=FALSE]' can be done with `alist(a=)' in place of `list(TRUE)' in the earlier version but seems to slow things down noticeably. It requires almost twice (!!) as much time as the version above. I think that's probably because alist() is a slow way to generate a missing symbol: bench::mark( alist(x = ), list(x = quote(expr = )), check = FALSE )[1:5] #> # A tibble: 2 x 5 #> expression min mean median max #> #> 1 alist(x = ) 2.8µs 3.54µs 3.29µs 34.9µs #> 2 list(x = quote(expr = )) 169ns 219.38ns 181ns 24.2µs (note the units) That's a good one. Need to change this in S4Vectors::default_extractROWS() and other places. Thanks! H. Hadley -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
The C code for subsetting doesn't need to recycle a logical subscript. It only needs to walk on it and start again at the beginning of the vector when it reaches the end. Not exactly the same as detecting the "take everything along that dimension" situation though. x[TRUE, TRUE, TRUE] triggers the full subsetting machinery when x[] and x[ , , ] could (and should) easily avoid it. H. On 06/08/2018 01:49 PM, Hadley Wickham wrote: Hmmm, yes, there must be some special case in the C code to avoid recycling a length-1 logical vector: dims <- c(4, 4, 4, 1e5) arr <- array(rnorm(prod(dims)), dims) dim(arr) #> [1] 4 4 4 10 i <- c(1, 3) bench::mark( arr[i, TRUE, TRUE, TRUE], arr[i, , , ] )[c("expression", "min", "mean", "max")] #> # A tibble: 2 x 4 #> expressionmin mean max #> #> 1 arr[i, TRUE, TRUE, TRUE] 41.8ms 43.6ms 46.5ms #> 2 arr[i, , , ] 41.7ms 43.1ms 46.3ms On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles wrote: On Jun 8, 2018, at 11:52 AM, Hadley Wickham wrote: On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles wrote: On Jun 8, 2018, at 10:37 AM, Hervé Pagès wrote: Also the TRUEs cause problems if some dimensions are 0: matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE] Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] : (subscript) logical subscript too long OK. But this is easy enough to handle. H. On 06/08/2018 10:29 AM, Hadley Wickham wrote: I suspect this will have suboptimal performance since the TRUEs will get recycled. (Maybe there is, or could be, ALTREP, support for recycling) Hadley AFAICS, it is not an issue. 
Taking arr <- array(rnorm(2^22),c(2^10,4,4,4)) as a test case and using a function that will either use the literal code `x[i, drop=FALSE]' or `eval(mc)': subset_ROW4 <- function(x, i, useLiteral=FALSE) { literal <- quote(x[i, drop=FALSE]) mc <- quote(x[i]) nd <- max(1L, length(dim(x))) mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L) mc[["drop"]] <- FALSE if (useLiteral) eval(literal) else eval(mc) } I get identical times with system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE)) and with system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE)) I think that's because you used a relatively low precision timing mechanism, and included the index generation in the timing. I see: arr <- array(rnorm(2^22),c(2^10,4,4,4)) i <- seq(1,length = 10, by = 100) bench::mark( arr[i, TRUE, TRUE, TRUE], arr[i, , , ] ) #> # A tibble: 2 x 1 #> expression min mean median max n_gc #> #> 1 arr[i, TRUE,… 7.4µs 10.9µs 10.66µs 1.22ms 2 #> 2 arr[i, , , ] 7.06µs 8.8µs 7.85µs 538.09µs 2 So not a huge difference, but it's there. Funny. I get similar results to yours above albeit with smaller differences. Usually < 5 percent. But with subset_ROW4 I see no consistent difference. In this example, it runs faster on average using `eval(mc)' to return the result: arr <- array(rnorm(2^22),c(2^10,4,4,4)) i <- seq(1,length=10,by=100) bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8] # A tibble: 2 x 8 expression min mean median max `itr/sec` mem_alloc n_gc 1 subset_ROW4(arr, i, FALSE) 28.9µs 34.9µs 32.1µs 1.36ms 28686. 5.05KB 5 2 subset_ROW4(arr, i, TRUE) 28.9µs 35µs 32.4µs 875.11µs 28572. 5.05KB 5 And on subsequent reps the lead switches back and forth. Chuck -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O.
Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
A missing subscript is still preferable to a TRUE though because it carries the meaning "take it all". A TRUE also achieves this but via implicit recycling. For example x[ , , ] and x[TRUE, TRUE, TRUE] achieve the same thing (if length(x) != 0) and are both no-ops but the subsetting code gets a chance to immediately and easily detect the former as a no-op whereas it will probably not be able to do it so easily for the latter. So in this case it will most likely generate a copy of 'x' and fill the new array by taking a full walk on it. H. On 06/08/2018 11:52 AM, Hadley Wickham wrote: On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles wrote: On Jun 8, 2018, at 10:37 AM, Hervé Pagès wrote: Also the TRUEs cause problems if some dimensions are 0: > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE] Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] : (subscript) logical subscript too long OK. But this is easy enough to handle. H. On 06/08/2018 10:29 AM, Hadley Wickham wrote: I suspect this will have suboptimal performance since the TRUEs will get recycled. (Maybe there is, or could be, ALTREP, support for recycling) Hadley AFAICS, it is not an issue. Taking arr <- array(rnorm(2^22),c(2^10,4,4,4)) as a test case and using a function that will either use the literal code `x[i, drop=FALSE]' or `eval(mc)': subset_ROW4 <- function(x, i, useLiteral=FALSE) { literal <- quote(x[i, drop=FALSE]) mc <- quote(x[i]) nd <- max(1L, length(dim(x))) mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L) mc[["drop"]] <- FALSE if (useLiteral) eval(literal) else eval(mc) } I get identical times with system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE)) and with system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE)) I think that's because you used a relatively low precision timing mechanism, and included the index generation in the timing.
I see: arr <- array(rnorm(2^22),c(2^10,4,4,4)) i <- seq(1,length = 10, by = 100) bench::mark( arr[i, TRUE, TRUE, TRUE], arr[i, , , ] ) #> # A tibble: 2 x 1 #> expressionminmean median max n_gc #> #> 1 arr[i, TRUE,… 7.4µs 10.9µs 10.66µs 1.22ms 2 #> 2 arr[i, , , ] 7.06µs 8.8µs 7.85µs 538.09µs 2 So not a huge difference, but it's there. Hadley -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
Also the TRUEs cause problems if some dimensions are 0: > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE] Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] : (subscript) logical subscript too long H. On 06/08/2018 10:29 AM, Hadley Wickham wrote: I suspect this will have suboptimal performance since the TRUEs will get recycled. (Maybe there is, or could be, ALTREP, support for recycling) Hadley On Fri, Jun 8, 2018 at 10:16 AM, Berry, Charles wrote: On Jun 8, 2018, at 8:45 AM, Hadley Wickham wrote: Hi all, Is there a better to way to subset the ROWs (in the sense of NROW) of an vector, matrix, data frame or array than this? You can use TRUE to fill the subscripts for dimensions 2:nd subset_ROW <- function(x, i) { nd <- length(dim(x)) if (nd <= 1L) { x[i] } else { dims <- rep(list(quote(expr = )), nd - 1L) do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE))) } } subset_ROW <- function(x,i) { mc <- quote(x[i]) nd <- max(1L, length(dim(x))) mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L) mc[["drop"]] <- FALSE eval(mc) } subset_ROW(1:10, 4:6) #> [1] 4 5 6 str(subset_ROW(array(1:10, c(10)), 2:4)) #> int [1:3(1d)] 2 3 4 str(subset_ROW(array(1:10, c(10, 1)), 2:4)) #> int [1:3, 1] 2 3 4 str(subset_ROW(array(1:10, c(5, 2)), 2:4)) #> int [1:3, 1:2] 2 3 4 7 8 9 str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4)) #> int [1:3, 1, 1] 2 3 4 subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4) #> x y #> 2 2 9 #> 3 3 8 #> 4 4 7 HTH, Chuck -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
On 06/08/2018 10:32 AM, Hervé Pagès wrote: On 06/08/2018 10:15 AM, Michael Lawrence wrote: There probably should be an abstraction for this. In S4Vectors, we have extractROWS(). FWIW the code in S4Vectors that does what your subset_ROW() does is: https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L466-L476 Wrong link sorry. Here is the correct one: https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L453-L464 H. (This is the default "extractROWS" method.) Except for the normalization of 'i', it does the same as your subset_ROW(). I don't know how to do this without generating a call with missing arguments. H. Michael On Fri, Jun 8, 2018 at 8:45 AM, Hadley Wickham wrote: Hi all, Is there a better way to subset the ROWs (in the sense of NROW) of a vector, matrix, data frame or array than this? subset_ROW <- function(x, i) { nd <- length(dim(x)) if (nd <= 1L) { x[i] } else { dims <- rep(list(quote(expr = )), nd - 1L) do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE))) } } subset_ROW(1:10, 4:6) #> [1] 4 5 6 str(subset_ROW(array(1:10, c(10)), 2:4)) #> int [1:3(1d)] 2 3 4 str(subset_ROW(array(1:10, c(10, 1)), 2:4)) #> int [1:3, 1] 2 3 4 str(subset_ROW(array(1:10, c(5, 2)), 2:4)) #> int [1:3, 1:2] 2 3 4 7 8 9 str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4)) #> int [1:3, 1, 1] 2 3 4 subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4) #> x y #> 2 2 9 #> 3 3 8 #> 4 4 7 It seems like there should be a way to do this that doesn't require generating a call with missing arguments, but I can't think of it. Thanks!
Hadley -- http://hadley.nz __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
On 06/08/2018 10:15 AM, Michael Lawrence wrote: There probably should be an abstraction for this. In S4Vectors, we have extractROWS(). FWIW the code in S4Vectors that does what your subset_ROW() does is: https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L466-L476 (This is the default "extractROWS" method.) Except for the normalization of 'i', it does the same as your subset_ROW(). I don't know how to do this without generating a call with missing arguments. H. Michael On Fri, Jun 8, 2018 at 8:45 AM, Hadley Wickham wrote: Hi all, Is there a better to way to subset the ROWs (in the sense of NROW) of an vector, matrix, data frame or array than this? subset_ROW <- function(x, i) { nd <- length(dim(x)) if (nd <= 1L) { x[i] } else { dims <- rep(list(quote(expr = )), nd - 1L) do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE))) } } subset_ROW(1:10, 4:6) #> [1] 4 5 6 str(subset_ROW(array(1:10, c(10)), 2:4)) #> int [1:3(1d)] 2 3 4 str(subset_ROW(array(1:10, c(10, 1)), 2:4)) #> int [1:3, 1] 2 3 4 str(subset_ROW(array(1:10, c(5, 2)), 2:4)) #> int [1:3, 1:2] 2 3 4 7 8 9 str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4)) #> int [1:3, 1, 1] 2 3 4 subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4) #> x y #> 2 2 9 #> 3 3 8 #> 4 4 7 It seems like there should be a way to do this that doesn't require generating a call with missing arguments, but I can't think of it. Thanks! 
Hadley -- http://hadley.nz __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
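As a footnote to the thread above, the call with missing index slots can also be built explicitly as a language object with as.call(), which makes the generated x[i, , drop = FALSE] call easy to inspect before it is evaluated. This is an illustrative sketch equivalent to Hadley's subset_ROW(), not the S4Vectors implementation:

```r
subset_ROW2 <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) return(x[i])
  # Build the call `x[i, , ..., drop = FALSE]` with nd - 1 empty index slots.
  # quote(expr = ) is the "missing argument" placeholder.
  args <- c(list(quote(x), quote(i)),
            rep(list(quote(expr = )), nd - 1L),
            list(drop = FALSE))
  # eval()'s default envir is the caller of eval(), i.e. this function's
  # frame, so the symbols `x` and `i` resolve to the arguments above.
  eval(as.call(c(list(quote(`[`)), args)))
}

m <- matrix(1:10, nrow = 5)
stopifnot(identical(subset_ROW2(m, 2:4), m[2:4, , drop = FALSE]))
stopifnot(identical(subset_ROW2(1:10, 4:6), 4:6))
df <- data.frame(x = 1:10, y = 10:1)
stopifnot(identical(subset_ROW2(df, 2:4), df[2:4, , drop = FALSE]))
```

It still relies on missing arguments, just like the do.call() version; as Michael notes above, nobody in the thread found a way around that.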
Re: [Rd] Dispatch mechanism seems to alter object before calling method on it
On 05/16/2018 01:24 PM, Michael Lawrence wrote: On Wed, May 16, 2018 at 12:23 PM, Hervé Pagès wrote: On 05/16/2018 10:22 AM, Michael Lawrence wrote: Factors and data.frames are not structures, because they must have a class attribute. Just call them "objects". They are higher level than structures, which in practice just shape data without adding a lot of semantics. Compare getClass("matrix") and getClass("factor"). I agree that inheritance through explicit coercion is confusing. As far as I know, there are only 2 places where it is used: 1) Objects with attributes but no class, basically "structure" and its subclasses "array" <- "matrix" 2) Classes that extend a reference type ("environment", "name" and "externalptr") via hidden delegation (@.xData) I'm not sure if anyone should be doing #2. For #1, a simple "fix" would be just to drop inheritance of "structure" from "vector". I think the intent was to mimic base R behavior, where it will happily strip (or at least ignore) attributes when passing an array or matrix to an internal function that expects a vector. A related problem, which explains why factor and data.frame inherit from "vector" even though they are objects, is that any S4 object derived from those needs to be (for pragmatic compatibility reasons) an integer vector or list, respectively, internally (the virtual @.Data slot). Separating that from inheritance would probably be difficult. Yes, we can consider these to be problems, to some extent stemming from the behavior and design of R itself, but I'm not sure it's worth doing anything about them at this point. Thanks for the informative discussion. 
It still doesn't explain why 'm' gets its attributes stripped and 'x' does not though: m <- matrix(1:12, ncol=3) x <- structure(1:3, titi="A") setGeneric("foo", function(x) standardGeneric("foo")) setMethod("foo", "vector", identity) foo(m) # [1] 1 2 3 4 5 6 7 8 9 10 11 12 foo(x) # [1] 1 2 3 # attr(,"titi") # [1] "A" If I understand correctly, both are "structures", not "objects". The structure 'x' has no class, so nothing special is going to happen. As you know, S4 has a well-defined class hierarchy. Just look at getClass("structure") to see its subclasses. There was at some point an attempt to create a sort of dynamic inheritance, where a 'test' function would be called and could figure this out. However, that was never implemented. For one thing, it would be even more confusing. Why aren't these problems worth fixing? More generally speaking the erratic behavior of the S4 system with respect to S3 objects has been a plague since the beginning of the methods package. And many people have complained about this in many occasions in one way or another. For the record, here are some of the most notorious problems: class(as.numeric(1:4)) # [1] "numeric" class(as(1:4, "numeric")) # [1] "integer" This is not really a problem with the methods package. is.numeric(1L) is TRUE, thus integer extends numeric, so coercing an integer to numeric is a no-op. Only as(1:4, "numeric", strict=FALSE) should be a no-op. as(1:4, "numeric") should still coerce because as() is supposed to perform strict coercion by default. as.numeric() should really be called as.double() or something. But that's not going to change, of course. as.numeric() is doing the right thing (i.e. strict coercion) so there is no need to touch it. is.vector(matrix()) # [1] FALSE is(matrix(), "vector") # [1] TRUE We already discussed this in the context of "structure" inheriting from "vector" and explicit coercion. 
is.list(data.frame()) # [1] TRUE is(data.frame(), "list") # [1] FALSE extends("data.frame", "list") # [1] TRUE This is a compromise for compatibility with inherits(), since the result of data.frame() is an S3 object. So we should add to the list that inherits(data.frame(), "list") is broken too. Once it gets fixed, is(data.frame(), "list") won't need to compromise anymore and will be free to return the correct answer. is(data.frame(), "vector") # [1] FALSE is(data.frame(), "factor") # [1] FALSE is(data.frame(), "vector_OR_factor") # [1] TRUE The question is: which inheritance to follow, S3 or S4? Since "vector" is a basic class, inheritance follows S3 rules. But the class union is an S4 class, so it follows S4 rules. etc... Many people stay away from S4 because of these incomprehensible behaviors. Finally
Re: [Rd] Dispatch mechanism seems to alter object before calling method on it
On 05/16/2018 10:22 AM, Michael Lawrence wrote: Factors and data.frames are not structures, because they must have a class attribute. Just call them "objects". They are higher level than structures, which in practice just shape data without adding a lot of semantics. Compare getClass("matrix") and getClass("factor"). I agree that inheritance through explicit coercion is confusing. As far as I know, there are only 2 places where it is used: 1) Objects with attributes but no class, basically "structure" and its subclasses "array" <- "matrix" 2) Classes that extend a reference type ("environment", "name" and "externalptr") via hidden delegation (@.xData) I'm not sure if anyone should be doing #2. For #1, a simple "fix" would be just to drop inheritance of "structure" from "vector". I think the intent was to mimic base R behavior, where it will happily strip (or at least ignore) attributes when passing an array or matrix to an internal function that expects a vector. A related problem, which explains why factor and data.frame inherit from "vector" even though they are objects, is that any S4 object derived from those needs to be (for pragmatic compatibility reasons) an integer vector or list, respectively, internally (the virtual @.Data slot). Separating that from inheritance would probably be difficult. Yes, we can consider these to be problems, to some extent stemming from the behavior and design of R itself, but I'm not sure it's worth doing anything about them at this point. Thanks for the informative discussion. It still doesn't explain why 'm' gets its attributes stripped and 'x' does not though: m <- matrix(1:12, ncol=3) x <- structure(1:3, titi="A") setGeneric("foo", function(x) standardGeneric("foo")) setMethod("foo", "vector", identity) foo(m) # [1] 1 2 3 4 5 6 7 8 9 10 11 12 foo(x) # [1] 1 2 3 # attr(,"titi") # [1] "A" If I understand correctly, both are "structures", not "objects". Why aren't these problems worth fixing? 
More generally speaking the erratic behavior of the S4 system with respect to S3 objects has been a plague since the beginning of the methods package. And many people have complained about this in many occasions in one way or another. For the record, here are some of the most notorious problems: class(as.numeric(1:4)) # [1] "numeric" class(as(1:4, "numeric")) # [1] "integer" is.vector(matrix()) # [1] FALSE is(matrix(), "vector") # [1] TRUE is.list(data.frame()) # [1] TRUE is(data.frame(), "list") # [1] FALSE extends("data.frame", "list") # [1] TRUE setClassUnion("vector_OR_factor", c("vector", "factor")) is(data.frame(), "vector") # [1] FALSE is(data.frame(), "factor") # [1] FALSE is(data.frame(), "vector_OR_factor") # [1] TRUE etc... Many people stay away from S4 because of these incomprehensible behaviors. Finally note that even pure S3 operations can produce output that doesn't make sense: is.list(data.frame()) # [1] TRUE is.vector(list()) # [1] TRUE is.vector(data.frame()) # [1] FALSE (that is: a data frame is a list and a list is a vector but a data frame is not a vector!) Why aren't these problems taken more seriously? Thanks, H. Michael On Wed, May 16, 2018 at 8:33 AM, Hervé Pagès wrote: On 05/15/2018 09:13 PM, Michael Lawrence wrote: My understanding is that array (or any other structure) does not "simply" inherit from vector, because structures are not vectors in the strictest sense. Basically, once a vector gains attributes, it is a structure, not a vector. The methods package accommodates this by defining an "is" relationship between "structure" and "vector" via an "explicit coerce", such that any "structure" passed to a "vector" method is first passed to as.vector(), which strips attributes. This is very much by design. 
It seems that the problem is really with matrices and arrays, not with "structures" in general: f <- factor(c("z", "x", "z"), levels=letters) m <- matrix(1:12, ncol=3) df <- data.frame(f=f) x <- structure(1:3, titi="A") Only the matrix loses its attributes when passed to a "vector" method: setGeneric("foo", function(x) standardGeneric("foo")) setMethod("foo", "vector", identity) foo(f) # attributes are preserved # [1] z x z # Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z foo(m) # attributes are stripped # [1] 1 2 3 4 5 6 7 8 9 10 11 12
Re: [Rd] Dispatch mechanism seems to alter object before calling method on it
On 05/15/2018 09:13 PM, Michael Lawrence wrote: My understanding is that array (or any other structure) does not "simply" inherit from vector, because structures are not vectors in the strictest sense. Basically, once a vector gains attributes, it is a structure, not a vector. The methods package accommodates this by defining an "is" relationship between "structure" and "vector" via an "explicit coerce", such that any "structure" passed to a "vector" method is first passed to as.vector(), which strips attributes. This is very much by design. It seems that the problem is really with matrices and arrays, not with "structures" in general: f <- factor(c("z", "x", "z"), levels=letters) m <- matrix(1:12, ncol=3) df <- data.frame(f=f) x <- structure(1:3, titi="A") Only the matrix loses its attributes when passed to a "vector" method: setGeneric("foo", function(x) standardGeneric("foo")) setMethod("foo", "vector", identity) foo(f) # attributes are preserved # [1] z x z # Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z foo(m) # attributes are stripped # [1] 1 2 3 4 5 6 7 8 9 10 11 12 foo(df) # attributes are preserved # f # 1 z # 2 x # 3 z foo(x) # attributes are preserved # [1] 1 2 3 # attr(,"titi") # [1] "A" Also if structures are passed to as.vector() before being passed to a "vector" method, shouldn't as.vector() and foo() be equivalent on them? For 'f' and 'x' they're not: as.vector(f) # [1] "z" "x" "z" as.vector(x) # [1] 1 2 3 Finally note that for factors and data frames the "vector" method gets selected despite the fact that is( , "vector") is FALSE: is(f, "vector") # [1] FALSE is(m, "vector") # [1] TRUE is(df, "vector") # [1] FALSE is(x, "vector") # [1] TRUE Couldn't we recognize these problems as real, even if they are by design? Hopefully we can all agree that: - the dispatch mechanism should only dispatch, not alter objects; - is() and selectMethod() should not contradict each other. Thanks, H. 
Michael On Tue, May 15, 2018 at 5:25 PM, Hervé Pagès wrote: Hi, This was quite unexpected: setGeneric("foo", function(x) standardGeneric("foo")) setMethod("foo", "vector", identity) foo(matrix(1:12, ncol=3)) # [1] 1 2 3 4 5 6 7 8 9 10 11 12 foo(array(1:24, 4:2)) # [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 If I define a method for array objects, things work as expected though: setMethod("foo", "array", identity) foo(matrix(1:12, ncol=3)) # [,1] [,2] [,3] # [1,] 1 5 9 # [2,] 2 6 10 # [3,] 3 7 11 # [4,] 4 8 12 So, luckily, I have a workaround. But shouldn't the dispatch mechanism stay away from the business of altering objects before they are passed to it? Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Dispatch mechanism seems to alter object before calling method on it
Hi, This was quite unexpected: setGeneric("foo", function(x) standardGeneric("foo")) setMethod("foo", "vector", identity) foo(matrix(1:12, ncol=3)) # [1] 1 2 3 4 5 6 7 8 9 10 11 12 foo(array(1:24, 4:2)) # [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 If I define a method for array objects, things work as expected though: setMethod("foo", "array", identity) foo(matrix(1:12, ncol=3)) # [,1] [,2] [,3] # [1,] 1 5 9 # [2,] 2 6 10 # [3,] 3 7 11 # [4,] 4 8 12 So, luckily, I have a workaround. But shouldn't the dispatch mechanism stay away from the business of altering objects before they are passed to it? Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
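Besides defining a method for "array" as above, another way to sidestep the implicit as.vector() coercion (an illustrative sketch, not an official fix) is to dispatch on "ANY" instead of "vector": the attribute-stripping coercion only kicks in when a "vector" method is selected for a classless structure, whereas an "ANY" method receives the object untouched:

```r
library(methods)

setGeneric("foo2", function(x) standardGeneric("foo2"))
setMethod("foo2", "ANY", identity)

m <- matrix(1:12, ncol = 3)
# No implicit as.vector() on dispatch: dim and dimnames survive
stopifnot(identical(foo2(m), m))
```

The trade-off, of course, is that an "ANY" method no longer restricts dispatch to vector-like inputs, so any type-checking has to happen inside the method body.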
Re: [Rd] length of `...`
Thanks Martin for the clarifications. H. On 05/04/2018 06:02 AM, Martin Maechler wrote: Hervé Pagès on Thu, 3 May 2018 08:55:20 -0700 writes: > Hi, > It would be great if one of the experts could comment on the > difference between Hadley's dotlength and ...length? The fact > that someone bothered to implement a new primitive for that > when there seems to be a very simple and straightforward R-only > solution suggests that there might be some gotchas/pitfalls with > the R-only solution. Namely dotlength <- function(...) nargs() (This is subtly different from calling nargs() directly as it will only count the elements in ...) Hadley Well, I was the "someone". In the past I had seen (and used myself) length(list(...)) and of course that was not usable. I knew of some substitute() / match.call() tricks [but I think did not know Bill's cute substitute(...()) !] at the time, but found them too esoteric. Additionally and importantly, ...length() and ..elt(n) were developed "synchronously", and the R-substitutes for ..elt() definitely are less trivial (I did not find one at the time), as Duncan's example to Bill's proposal has shown, so I had looked at .Primitive() solutions of both. In hindsight I should have asked here for advice, but maybe at the time I had been a bit frustrated by the results of some of my RFCs ((nothing specific in mind !)) But __if__ there's really no example where current (3.5.0 and newer) ...length() differs from Hadley's dotlength() I'd be very happy to replace ...length()'s C-based definition by Hadley's beautiful minimal solution. 
Martin > On 05/03/2018 08:34 AM, Hadley Wickham wrote: >> On Thu, May 3, 2018 at 8:18 AM, Duncan Murdoch wrote: >>> On 03/05/2018 11:01 AM, William Dunlap via R-devel wrote: >>>> >>>> In R-3.5.0 you can use ...length(): >>>> > f <- function(..., n) ...length() >>>> > f(stop("one"), stop("two"), stop("three"), n=7) >>>> [1] 3 >>>> >>>> Prior to that substitute() is the way to go >>>> > g <- function(..., n) length(substitute(...())) >>>> > g(stop("one"), stop("two"), stop("three"), n=7) >>>> [1] 3 >>>> >>>> R-3.5.0 also has the ...elt(n) function, which returns >>>> the evaluated n'th entry in ... , without evaluating the >>>> other ... entries. >>>> > fn <- function(..., n) ...elt(n) >>>> > fn(stop("one"), 3*5, stop("three"), n=2) >>>> [1] 15 >>>> >>>> Prior to 3.5.0, eval the appropriate component of the output >>>> of substitute() in the appropriate environment: >>>> > gn <- function(..., n) { >>>> + nthExpr <- substitute(...())[[n]] >>>> + eval(nthExpr, envir=parent.frame()) >>>> + } >>>> > gn(stop("one"), environment(), stop("two"), n=2) >>>> >>>> >>> >>> Bill, the last of these doesn't quite work, because ... can be passed down >>> through a string of callers. You don't necessarily want to evaluate it in >>> the parent.frame(). For example: >>> >>> x <- "global" >>> f <- function(...) { >>> x <- "f" >>> g(...) >>> } >>> g <- function(...) { >>> firstExpr <- substitute(...())[[1]] >>> c(list(...)[[1]], eval(firstExpr, envir = parent.frame())) >>> } >>> >>> Calling g(x) correctly prints "global" twice, but calling f(x) incorrectly >>> prints >>> >>> [1] "global" "f" >>> >>> You can get the first element of ... without evaluating the rest using ..1, >>> but I don't know a way to do this for general n in pre-3.5.0 base R. >> >> If you don't mind using a package: >> >> # works with R 3.1 and up >> library(rlang) >> >> x <- "global" >> f <- function(...) { >> x <- "f" >> g(...) >> } >> g <- function(...) { >> dots <- enquos(...) >> eval_tidy(dots[[1]]) >> } >>
Re: [Rd] length of `...`
Hi, It would be great if one of the experts could comment on the difference between Hadley's dotlength and ...length? The fact that someone bothered to implement a new primitive for that when there seems to be a very simple and straightforward R-only solution suggests that there might be some gotchas/pitfalls with the R-only solution. Thanks, H. On 05/03/2018 08:34 AM, Hadley Wickham wrote: On Thu, May 3, 2018 at 8:18 AM, Duncan Murdoch wrote: On 03/05/2018 11:01 AM, William Dunlap via R-devel wrote: In R-3.5.0 you can use ...length(): > f <- function(..., n) ...length() > f(stop("one"), stop("two"), stop("three"), n=7) [1] 3 Prior to that substitute() is the way to go > g <- function(..., n) length(substitute(...())) > g(stop("one"), stop("two"), stop("three"), n=7) [1] 3 R-3.5.0 also has the ...elt(n) function, which returns the evaluated n'th entry in ... , without evaluating the other ... entries. > fn <- function(..., n) ...elt(n) > fn(stop("one"), 3*5, stop("three"), n=2) [1] 15 Prior to 3.5.0, eval the appropriate component of the output of substitute() in the appropriate environment: > gn <- function(..., n) { + nthExpr <- substitute(...())[[n]] + eval(nthExpr, envir=parent.frame()) + } > gn(stop("one"), environment(), stop("two"), n=2) Bill, the last of these doesn't quite work, because ... can be passed down through a string of callers. You don't necessarily want to evaluate it in the parent.frame(). For example: x <- "global" f <- function(...) { x <- "f" g(...) } g <- function(...) { firstExpr <- substitute(...())[[1]] c(list(...)[[1]], eval(firstExpr, envir = parent.frame())) } Calling g(x) correctly prints "global" twice, but calling f(x) incorrectly prints [1] "global" "f" You can get the first element of ... without evaluating the rest using ..1, but I don't know a way to do this for general n in pre-3.5.0 base R. If you don't mind using a package: # works with R 3.1 and up library(rlang) x <- "global" f <- function(...) { x <- "f" g(...) 
} g <- function(...) { dots <- enquos(...) eval_tidy(dots[[1]]) } f(x, stop("!")) #> [1] "global" g(x, stop("!")) #> [1] "global" Hadley -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
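To summarize the pre-3.5.0 options discussed in this thread, here are Hadley's nargs()-based counter and Bill's substitute(...()) trick side by side; both count the elements of ... without forcing any of them (a sketch assembled from the code in the messages above):

```r
# Counts only the arguments that land in `...` of the *caller* of dotlength()
dotlength <- function(...) nargs()

f <- function(..., n) dotlength(...)
# substitute(...()) captures the unevaluated dot expressions as a call object
g <- function(..., n) length(substitute(...()))

# None of the stop() calls are evaluated: the dots stay lazy promises
stopifnot(f(stop("one"), stop("two"), stop("three"), n = 7) == 3)
stopifnot(g(stop("one"), stop("two"), stop("three"), n = 7) == 3)
```

From R 3.5.0 on, ...length() does the same thing as a primitive, and ...elt(n) evaluates just the n-th dot.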
Re: [Rd] as.list method for by Objects
On 01/30/2018 02:50 PM, Michael Lawrence wrote: by() does not always return a list. In Gabe's example, it returns an integer, thus it is coerced to a list. as.list() means that it should be a VECSXP, not necessarily with "list" in the class attribute. The documentation is not particularly clear about what as.list() means for list derivatives. IMO clarifications should stick to simple concepts and formulations like "is.list(x) is TRUE" or "x is a list or a list derivative" rather than "x is a VECSXP". Coercion is useful beyond the use case of implementing a .C entry point and calling as.numeric/as.list/etc... on its arguments. This is why I was hoping that we could maybe discuss the possibility of making the as.list() contract less vague than just "as.list() must return a list or a list derivative". Again, I think that 2 things weigh quite a lot in that discussion: 1) as.list() returns an object of class "list" on a data.frame (strict coercion). If all that as.list() needed to do was to return a VECSXP, then as.list.default() already does this on a data.frame so why did someone bother adding an as.list.data.frame method that does strict coercion? 2) The S4 coercion system based on as() does strict coercion by default. H. Michael On Tue, Jan 30, 2018 at 2:41 PM, Hervé Pagès <hpa...@fredhutch.org> wrote: Hi Gabe, Interestingly the behavior of as.list() on by objects seems to depend on the object itself: > b1 <- by(1:2, 1:2, identity) > class(as.list(b1)) [1] "list" > b2 <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary) > class(as.list(b2)) [1] "by" This is with R 3.4.3 and R devel (2017-12-11 r73889). H. On 01/30/2018 02:33 PM, Gabriel Becker wrote: Dario, What version of R are you using? In my mildly old 3.4.0 installation and in the version of R-devel I have lying around (also mildly old...) 
I don't see the behavior I think you are describing > b = by(1:2, 1:2, identity) > class(as.list(b)) [1] "list" > sessionInfo() R Under development (unstable) (2017-12-19 r73926) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: OS X El Capitan 10.11.6 Matrix products: default BLAS: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRblas.dylib LAPACK: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_3.5.0 > As for by not having a class definition, no S3 class has an explicit definition, so this is somewhat par for the course here... did I misunderstand something? ~G On Tue, Jan 30, 2018 at 2:24 PM, Hervé Pagès <hpa...@fredhutch.org> wrote: I agree that it makes sense to expect as.list() to perform a "strict coercion" i.e. to return an object of class "list", *even* on a list derivative. That's what as( , "list") does by default: # on a data.frame object as(data.frame(), "list") # object of class "list" # (but strangely it drops the names) # on a by object x <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary) as(x, "list") # object of class "list" More generally speaking as() is expected to perform "strict coercion" by default, unless called with 'strict=FALSE'. That's also what as.list() does on a data.frame: as.list(data.frame()) # object of class "list" FWIW as.numeric() also performs "strict coercion" on an integer vector: as.numeric(1:3) # object of class "numeric" So an as.list.env method that does the same as as(x, "list") would bring a small touch of consistency in an otherwise quite inconsistent world of coercion methods(*). H. (*) as(data.frame(), "list",
Re: [Rd] as.list method for by Objects
Hi Gabe, Interestingly the behavior of as.list() on by objects seems to depend on the object itself: > b1 <- by(1:2, 1:2, identity) > class(as.list(b1)) [1] "list" > b2 <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary) > class(as.list(b2)) [1] "by" This is with R 3.4.3 and R devel (2017-12-11 r73889). H. On 01/30/2018 02:33 PM, Gabriel Becker wrote: Dario, What version of R are you using? In my mildly old 3.4.0 installation and in the version of R-devel I have lying around (also mildly old...) I don't see the behavior I think you are describing > b = by(1:2, 1:2, identity) > class(as.list(b)) [1] "list" > sessionInfo() R Under development (unstable) (2017-12-19 r73926) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: OS X El Capitan 10.11.6 Matrix products: default BLAS: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRblas.dylib LAPACK: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_3.5.0 > As for by not having a class definition, no S3 class has an explicit definition, so this is somewhat par for the course here... did I misunderstand something? ~G On Tue, Jan 30, 2018 at 2:24 PM, Hervé Pagès <hpa...@fredhutch.org> wrote: I agree that it makes sense to expect as.list() to perform a "strict coercion" i.e. to return an object of class "list", *even* on a list derivative. That's what as( , "list") does by default: # on a data.frame object as(data.frame(), "list") # object of class "list" # (but strangely it drops the names) # on a by object x <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary) as(x, "list") # object of class "list" More generally speaking as() is expected to perform "strict coercion" by default, unless called with 'strict=FALSE'. 
That's also what as.list() does on a data.frame: as.list(data.frame()) # object of class "list" FWIW as.numeric() also performs "strict coercion" on an integer vector: as.numeric(1:3) # object of class "numeric" So an as.list.env method that does the same as as(x, "list") would bring a small touch of consistency in an otherwise quite inconsistent world of coercion methods(*). H. (*) as(data.frame(), "list", strict=FALSE) doesn't do what you'd expect (just one of many examples) On 01/29/2018 05:00 PM, Dario Strbenac wrote: Good day, I'd like to suggest the addition of an as.list method for a by object that actually returns a list of class "list". This would make it safer to do type-checking, because is.list also returns TRUE for a data.frame variable and using class(result) == "list" is an alternative that only returns TRUE for lists. It's also confusing initially that class(x) [1] "by" is.list(x) [1] TRUE since there's no explicit class definition for "by" and no mention if it has any superclasses. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] as.list method for by Objects
On 01/30/2018 02:24 PM, Hervé Pagès wrote: I agree that it makes sense to expect as.list() to perform a "strict coercion" i.e. to return an object of class "list", *even* on a list derivative. That's what as( , "list") does by default: # on a data.frame object as(data.frame(), "list") # object of class "list" # (but strangely it drops the names) # on a by object x <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary) as(x, "list") # object of class "list" More generally speaking as() is expected to perform "strict coercion" by default, unless called with 'strict=FALSE'. That's also what as.list() does on a data.frame: as.list(data.frame()) # object of class "list" FWIW as.numeric() also performs "strict coercion" on an integer vector: as.numeric(1:3) # object of class "numeric" So an as.list.env method that does the same as as(x, "list") ^^^ oops, I meant as.list.by, sorry... H. would bring a small touch of consistency in an otherwise quite inconsistent world of coercion methods(*). H. (*) as(data.frame(), "list", strict=FALSE) doesn't do what you'd expect (just one of many examples) On 01/29/2018 05:00 PM, Dario Strbenac wrote: Good day, I'd like to suggest the addition of an as.list method for a by object that actually returns a list of class "list". This would make it safer to do type-checking, because is.list also returns TRUE for a data.frame variable and using class(result) == "list" is an alternative that only returns TRUE for lists. It's also confusing initially that class(x) [1] "by" is.list(x) [1] TRUE since there's no explicit class definition for "by" and no mention if it has any superclasses. 
-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] as.list method for by Objects
I agree that it makes sense to expect as.list() to perform a "strict coercion" i.e. to return an object of class "list", *even* on a list derivative. That's what as( , "list") does by default: # on a data.frame object as(data.frame(), "list") # object of class "list" # (but strangely it drops the names) # on a by object x <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary) as(x, "list") # object of class "list" More generally speaking as() is expected to perform "strict coercion" by default, unless called with 'strict=FALSE'. That's also what as.list() does on a data.frame: as.list(data.frame()) # object of class "list" FWIW as.numeric() also performs "strict coercion" on an integer vector: as.numeric(1:3) # object of class "numeric" So an as.list.env method that does the same as as(x, "list") would bring a small touch of consistency in an otherwise quite inconsistent world of coercion methods(*). H. (*) as(data.frame(), "list", strict=FALSE) doesn't do what you'd expect (just one of many examples) On 01/29/2018 05:00 PM, Dario Strbenac wrote: Good day, I'd like to suggest the addition of an as.list method for a by object that actually returns a list of class "list". This would make it safer to do type-checking, because is.list also returns TRUE for a data.frame variable and using class(result) == "list" is an alternative that only returns TRUE for lists. It's also confusing initially that class(x) [1] "by" is.list(x) [1] TRUE since there's no explicit class definition for "by" and no mention if it has any superclasses. 
-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
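A minimal sketch of the method proposed above (hypothetical -- base R does not define it): strict coercion of a "by" object to a plain list, keeping only the names.

```r
## Hypothetical as.list() method for "by" objects: strict coercion,
## i.e. the result has class "list", and only the names are kept.
as.list.by <- function(x, ...) {
    nms <- names(x)        # for a 1-d "by" object this is dimnames(x)[[1]]
    attributes(x) <- NULL  # drop dim, dimnames, and the "by" class
    names(x) <- nms
    x
}

x <- by(warpbreaks[, 1:2], warpbreaks[, "tension"], summary)
class(as.list.by(x))  # "list"
```

Unlike as(x, "list"), this sketch keeps the names, which sidesteps the "strangely it drops the names" remark above.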
Re: [Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31
Hi Martin, Henrik, Thanks for the follow up. @Martin: I vote for 2) without *any* hesitation :-) (and uniformity could be restored at some point in the future by having prod(), rowSums(), colSums(), and others align with the behavior of length() and sum()) Cheers, H. On 01/27/2018 03:06 AM, Martin Maechler wrote: Henrik Bengtsson on Thu, 25 Jan 2018 09:30:42 -0800 writes: > Just following up on this old thread since matrixStats 0.53.0 is now > out, which supports this use case: >> x <- rep(TRUE, times = 2^31) >> y <- sum(x) >> y > [1] NA > Warning message: > In sum(x) : integer overflow - use sum(as.numeric(.)) >> y <- matrixStats::sum2(x, mode = "double") >> y > [1] 2147483648 >> str(y) > num 2.15e+09 > No coercion is taking place, so the memory overhead is zero: >> profmem::profmem(y <- matrixStats::sum2(x, mode = "double")) > Rprofmem memory profiling of: > y <- matrixStats::sum2(x, mode = "double") > Memory allocations: > bytes calls > total 0 > /Henrik Thank you, Henrik, for the reminder. Back in June, I had mentioned to Hervé and R-devel that 'logical' should remain to be treated as 'integer' as in all arithmetic in (S and) R. Hervé did mention the isum() function in the C code which is relevant here .. which does have a LONG INT counter already -- *but* if we consider that sum() has '...' i.e. a conceptually arbitrary number of long vector integer arguments that counter won't suffice even there. Before talking about implementation / patch, I think we should consider 2 possible goals of a change --- I agree the status quo is not a real option 1) sum(x) for logical and integer x would return a double in any case and overflow should not happen (unless for the case where the result would be larger the .Machine$double.max which I think will not be possible even with "arbitrary" nargs() of sum. 2) sum(x) for logical and integer x should return an integer in all cases there is no overflow, including returning NA_integer_ in case of NAs. 
If there would be an overflow it must be detected "in time" and the result should be double. The big advantage of 2) is that it is back compatible in 99.x % of use cases, and another advantage that it may be a very small bit more efficient. Also, in the case of "counting" (logical), it is nice to get an integer instead of double when we can -- entirely analogously to the behavior of length() which returns integer whenever possible. The advantage of 1) is uniformity. We should (at least provisionally) decide between 1) and 2) and then go for that. It could be that going for 1) may have bad compatibility-consequences in package space, because indeed we had documented sum() would be integer for logical and integer arguments. I currently don't really have time to {work on implementing + dealing with the consequences} for either .. Martin > On Fri, Jun 2, 2017 at 1:58 PM, Henrik Bengtsson > wrote: >> I second this feature request (it's understandable that this and >> possibly other parts of the code was left behind / forgotten after the >> introduction of long vector). >> >> I think mean() avoids full copies, so in the meanwhile, you can work >> around this limitation using: >> >> countTRUE <- function(x, na.rm = FALSE) { >> nx <- length(x) >> if (nx < .Machine$integer.max) return(sum(x, na.rm = na.rm)) >> nx * mean(x, na.rm = na.rm) >> } >> >> (not sure if one needs to worry about rounding errors, i.e. where n %% 0 != 0) >> >> x <- rep(TRUE, times = .Machine$integer.max+1) >> object.size(x) >> ## 8589934632 bytes >> >> p <- profmem::profmem( n <- countTRUE(x) ) >> str(n) >> ## num 2.15e+09 >> print(n == .Machine$integer.max + 1) >> ## [1] TRUE >> >> print(p) >> ## Rprofmem memory profiling of: >> ## n <- countTRUE(x) >> ## >> ## Memory allocations: >> ## bytes calls >> ## total 0 >> >> >> FYI / related: I've just updated matrixStats::sum2() to support >> logicals (develop branch) and I'll also try to update >> matrixStats::count() to count beyond .Machine$integer.max. 
>> >> /Henrik >> >> On Fri, Jun 2, 2017 at 4:05 AM, Hervé Pagès wrote: >>> Hi, >>
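Complementing the mean()-based workaround quoted above, the count can also be obtained without sum(as.numeric(.))'s full-vector copy by summing in chunks; a hedged sketch (countTRUE2 is a made-up name, not an existing function):

```r
## Count TRUE values in a (possibly long) logical vector without
## coercing the whole vector to double: each chunk is short enough
## for sum() to stay within integer range, and the partial counts
## are accumulated as doubles.  NAs, if any, propagate to the result.
countTRUE2 <- function(x, chunk.size = 1e8) {
    if (length(x) == 0) return(0)
    starts <- seq(1, length(x), by = chunk.size)
    total <- 0
    for (s in starts) {
        e <- min(s + chunk.size - 1, length(x))
        total <- total + sum(x[s:e])
    }
    total
}
```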
Re: [Rd] as.character(list(NA))
On 01/22/2018 01:02 PM, William Dunlap wrote: I tend to avoid using as. functions on lists, since they act oddly in several ways. E.g., if the list "L" consists entirely of scalar elements then as.numeric(L) acts like as.numeric(unlist(L)), but if any element is not a scalar there is an error. FWIW personally I see this as a nice feature and use as.numeric(L) instead of as.numeric(unlist(L)) in places where I'd rather fail than get something that is not parallel to the input. H. as.character() does not seem to make a distinction between the all-scalar and not-all-scalar cases but does various things with NA's of various types. Bill Dunlap TIBCO Software wdunlap tibco.com On Mon, Jan 22, 2018 at 11:14 AM, Robert McGehee <rmcge...@walleyetrading.net> wrote: Also perhaps a surprise that the behavior depends on the mode of the NA. > is.na(as.character(list(NA_real_))) [1] FALSE > is.na(as.character(list(NA_character_))) [1] TRUE Does this mean deparse() preserves NA-ness for NA_character_ but not NA_real_?
-Original Message----- From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Hervé Pagès Sent: Monday, January 22, 2018 2:01 PM To: William Dunlap <wdun...@tibco.com>; Patrick Perry <ppe...@stern.nyu.edu> Cc: r-devel@r-project.org Subject: Re: [Rd] as.character(list(NA)) On 01/20/2018 08:24 AM, William Dunlap via R-devel wrote: > I believe that for a list as.character() applies deparse() to each element > of the list. deparse() does not preserve NA-ness, as it is intended to > make text that the parser can read. > >> str(as.character(list(Na=NA, LglVec=c(TRUE,NA), > Function=function(x){x+1}))) > chr [1:3] "NA" "c(TRUE, NA)" "function (x) \n{\n x + 1\n}" > This really comes as a surprise though since coercion to all the other atomic types (except raw) preserves the NAs. And also as.character(unlist(list(NA))) preserves them. H. > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Sat, Jan 20, 2018 at 7:43 AM, Patrick Perry <ppe...@stern.nyu.edu> wrote: > >> As of R Under development (unstable) (2018-01-19 r74138): >> >>> as.character(list(NA)) >> [1] "NA" >> >>> is.na(as.character(list(NA))) >> [1] FALSE >> >> __ >> R-devel@r-project.org mailing list >> 
>> https://stat.ethz.ch/mailman/listinfo/r-devel > > [[alternative HTML version deleted]] > > __ > R-
Re: [Rd] as.character(list(NA))
On 01/20/2018 08:24 AM, William Dunlap via R-devel wrote: I believe that for a list as.character() applies deparse() to each element of the list. deparse() does not preserve NA-ness, as it is intended to make text that the parser can read. str(as.character(list(Na=NA, LglVec=c(TRUE,NA), Function=function(x){x+1}))) chr [1:3] "NA" "c(TRUE, NA)" "function (x) \n{\nx + 1\n}" This really comes as a surprise though since coercion to all the other atomic types (except raw) preserves the NAs. And also as.character(unlist(list(NA))) preserves them. H. Bill Dunlap TIBCO Software wdunlap tibco.com On Sat, Jan 20, 2018 at 7:43 AM, Patrick Perry wrote: As of R Under development (unstable) (2018-01-19 r74138): as.character(list(NA)) [1] "NA" is.na(as.character(list(NA))) [1] FALSE __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
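The asymmetry discussed in this thread, side by side (outputs as reported above):

```r
as.character(list(NA))                    # "NA" -- deparsed: a 2-character string
is.na(as.character(list(NA)))             # FALSE
as.character(unlist(list(NA)))            # NA   -- unlist() first preserves NA-ness
is.na(as.character(list(NA_character_)))  # TRUE -- character NA survives
```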
[Rd] Unexpected dimnames attribute returned by cbind/rbind
Hi,

> m5 <- cbind(integer(5), integer(5))
> m5
     [,1] [,2]
[1,]    0    0
[2,]    0    0
[3,]    0    0
[4,]    0    0
[5,]    0    0
> dimnames(m5)
NULL

No dimnames, as expected.

> m0 <- cbind(integer(0), integer(0))
> m0
     [,1] [,2]
> dimnames(m0)
[[1]]
NULL

[[2]]
NULL

Unexpected dimnames attribute! rbind'ing empty vectors also returns a matrix with unexpected dimnames:

> dimnames(rbind(character(0), character(0)))
[[1]]
NULL

[[2]]
NULL

Cheers, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
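Until/unless this changes in base R, a hedged workaround is to drop a dimnames attribute made entirely of NULLs (drop_null_dimnames is a made-up helper):

```r
## Remove the spurious list(NULL, NULL) dimnames shown above, while
## leaving any real dimnames untouched.
drop_null_dimnames <- function(m) {
    dn <- dimnames(m)
    if (!is.null(dn) && all(vapply(dn, is.null, logical(1))))
        dimnames(m) <- NULL
    m
}

m0 <- cbind(integer(0), integer(0))
dimnames(drop_null_dimnames(m0))  # NULL
```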
[Rd] format() doesn't propagate the dim and dimnames when underlying type of array is "list"
Hi, Here is how to reproduce:

With a matrix of atomic type:

  m1 <- matrix(1:12, ncol=3, dimnames=list(letters[1:4], LETTERS[1:3]))
  typeof(m1)
  # [1] "integer"
  m1
  #   A B  C
  # a 1 5  9
  # b 2 6 10
  # c 3 7 11
  # d 4 8 12
  format(m1)
  #   A    B    C
  # a " 1" " 5" " 9"
  # b " 2" " 6" "10"
  # c " 3" " 7" "11"
  # d " 4" " 8" "12"

==> dim and dimnames are propagated.

With a matrix of type "list":

  m2 <- matrix(rep(list(1:5, NULL, "AA"), 4), ncol=3,
               dimnames=list(letters[1:4], LETTERS[1:3]))
  typeof(m2)
  # [1] "list"
  m2
  #   A         B         C
  # a Integer,5 NULL      "AA"
  # b NULL      "AA"      Integer,5
  # c "AA"      Integer,5 NULL
  # d Integer,5 NULL      "AA"
  format(m2)
  # [1] "1, 2, 3, 4, 5" "NULL"          "AA"            "1, 2, 3, 4, 5"
  # [5] "NULL"          "AA"            "1, 2, 3, 4, 5" "NULL"
  # [9] "AA"            "1, 2, 3, 4, 5" "NULL"          "AA"

==> dim and dimnames are dropped! The same thing seems to happen with arrays of arbitrary dimensions. Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
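A hedged workaround in the meantime (format_keep_dim is a made-up helper): reapply the dim and dimnames that format() drops when the underlying type is "list".

```r
format_keep_dim <- function(x, ...) {
    ans <- format(x, ...)
    if (is.null(dim(ans)) && !is.null(dim(x))) {
        dim(ans) <- dim(x)            # put back what format() dropped
        dimnames(ans) <- dimnames(x)
    }
    ans
}

m2 <- matrix(rep(list(1:5, NULL, "AA"), 4), ncol=3,
             dimnames=list(letters[1:4], LETTERS[1:3]))
dim(format_keep_dim(m2))  # 4 3
```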
Re: [Rd] binary form of is() contradicts its unary form
Yes, data.frame is not an S4 class but is(data.frame()) finds its super-classes anyway and without the need to wrap it in asS4(). And "list' is one of the super-classes. Then is(data.frame(), "list") contradicts this. I'm not asking for a workaround. I already have one with 'class2 %in% is(object)' as reported in my original post. 'is(asS4(object), class2)' is maybe another one but, unlike the former, it's not obvious that it will behave consistently with unary is(). There could be some other surprise on the way. You're missing the point of my original post. Which is that there is a serious inconsistency between the unary and binary forms of is(). Maybe the binary form is right in case of is(data.frame(), "list"). But then the unary form should not return "list'. This inconsistency will potentially hurt anybody who tries to do computations on a class hierarchy, especially if the hierarchy is complex and mixes S4 and S3 classes. So I'm hoping this can be addressed. Hope you understand. Cheers, H. On 11/29/2017 12:21 PM, Suzen, Mehmet wrote: Hi Herve, Interesting observation with `setClass` but it is for S4. It looks like `data.frame()` is not an S4 class. isS4(data.frame()) [1] FALSE And in your case this might help: is(asS4(data.frame()), "list") [1] TRUE Looks like `is` is designed for S4 classes, I am not entirely sure. Best, -Mehmet On 29 November 2017 at 20:46, Hervé Pagès wrote: Hi Mehmet, On 11/29/2017 11:22 AM, Suzen, Mehmet wrote: Hi Herve, I think you are confusing subclasses and classes. There is no contradiction. `is` documentation is very clear: `With one argument, returns all the super-classes of this object's class.` Yes that's indeed very clear. So if "list" is a super-class of "data.frame" (as reported by is(data.frame())), then is(data.frame(), "list") should be TRUE. With S4 classes: setClass("A") setClass("B", contains="A") ## Get all the super-classes of B. is(new("B")) # [1] "B" "A" ## Does a B object inherit from A? 
is(new("B"), "A") # [1] TRUE Cheers, H. Note that object class is always `data.frame` here, check: > class(data.frame()) [1] "data.frame" > is(data.frame(), "data.frame") [1] TRUE Best, Mehmet On 29 Nov 2017 19:13, "Hervé Pagès" mailto:hpa...@fredhutch.org>> wrote: Hi, The unary forms of is() and extends() report that data.frame extends list, oldClass, and vector: > is(data.frame()) [1] "data.frame" "list" "oldClass" "vector" > extends("data.frame") [1] "data.frame" "list" "oldClass" "vector" However, the binary form of is() disagrees: > is(data.frame(), "list") [1] FALSE > is(data.frame(), "oldClass") [1] FALSE > is(data.frame(), "vector") [1] FALSE while the binary form of extends() agrees: > extends("data.frame", "list") [1] TRUE > extends("data.frame", "oldClass") [1] TRUE > extends("data.frame", "vector") [1] TRUE Who is right? Shouldn't 'is(object, class2)' be equivalent to 'class2 %in% is(object)'? Furthermore, is there any reason why 'is(object, class2)' is not implemented as 'class2 %in% is(object)'? Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Canc <https://urldefense.proofpoint.com/v2/url?u=https-3A__maps.google.com_-3Fq-3DFred-2BHutchinson-2BCanc-26entry-3Dgmail-26source-3Dg&d=DwMFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=AptypGUf1qnpkFcOc1eU_vdGSHsush3RGVUyjk7yDu8&s=sTr3VPPxYCZLOtlBS3DToP4-Wi44EOLs99gJcV932b0&e=>er Research Center 1100 Fairview Ave. N, M1-B514 P.O. 
Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org <mailto:hpa...@fredhutch.org> Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Ddevel&d=DwIFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=Edo4xQQyNSdlhiJjtVDnOcunTA8a6KT5EN7_jowitP8&s=ES11eQ8qMdiYMc5X-SbEfQyy2VoX6MUfX0skN-QWunc&e= <https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.e
Re: [Rd] binary form of is() contradicts its unary form
Hi Mehmet, On 11/29/2017 11:22 AM, Suzen, Mehmet wrote: Hi Herve, I think you are confusing subclasses and classes. There is no contradiction. `is` documentation is very clear: `With one argument, returns all the super-classes of this object's class.` Yes that's indeed very clear. So if "list" is a super-class of "data.frame" (as reported by is(data.frame())), then is(data.frame(), "list") should be TRUE. With S4 classes: setClass("A") setClass("B", contains="A") ## Get all the super-classes of B. is(new("B")) # [1] "B" "A" ## Does a B object inherit from A? is(new("B"), "A") # [1] TRUE Cheers, H. Note that object class is always `data.frame` here, check: > class(data.frame()) [1] "data.frame" > is(data.frame(), "data.frame") [1] TRUE Best, Mehmet On 29 Nov 2017 19:13, "Hervé Pagès" mailto:hpa...@fredhutch.org>> wrote: Hi, The unary forms of is() and extends() report that data.frame extends list, oldClass, and vector: > is(data.frame()) [1] "data.frame" "list" "oldClass" "vector" > extends("data.frame") [1] "data.frame" "list" "oldClass" "vector" However, the binary form of is() disagrees: > is(data.frame(), "list") [1] FALSE > is(data.frame(), "oldClass") [1] FALSE > is(data.frame(), "vector") [1] FALSE while the binary form of extends() agrees: > extends("data.frame", "list") [1] TRUE > extends("data.frame", "oldClass") [1] TRUE > extends("data.frame", "vector") [1] TRUE Who is right? Shouldn't 'is(object, class2)' be equivalent to 'class2 %in% is(object)'? Furthermore, is there any reason why 'is(object, class2)' is not implemented as 'class2 %in% is(object)'? Thanks, H. 
-- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] binary form of is() contradicts its unary form
Hi, The unary forms of is() and extends() report that data.frame extends list, oldClass, and vector: > is(data.frame()) [1] "data.frame" "list" "oldClass" "vector" > extends("data.frame") [1] "data.frame" "list" "oldClass" "vector" However, the binary form of is() disagrees: > is(data.frame(), "list") [1] FALSE > is(data.frame(), "oldClass") [1] FALSE > is(data.frame(), "vector") [1] FALSE while the binary form of extends() agrees: > extends("data.frame", "list") [1] TRUE > extends("data.frame", "oldClass") [1] TRUE > extends("data.frame", "vector") [1] TRUE Who is right? Shouldn't 'is(object, class2)' be equivalent to 'class2 %in% is(object)'? Furthermore, is there any reason why 'is(object, class2)' is not implemented as 'class2 %in% is(object)'? Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
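The 'class2 %in% is(object)' workaround from the post above, wrapped as a small helper (is2 is a made-up name):

```r
library(methods)

## Binary is() that is guaranteed to agree with the unary form:
is2 <- function(object, class2) class2 %in% is(object)

is2(data.frame(), "list")     # TRUE, consistent with is(data.frame())
is2(data.frame(), "integer")  # FALSE
```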
Re: [Rd] `[<-.data.frame` sets rownames incorrectly
On 11/21/2017 06:19 PM, Hervé Pagès wrote: Hi, Here is another problem with data frame subsetting:

> df <- data.frame(aa=1:3)
> value <- data.frame(aa=11:12, row.names=c("A", "B"))
> `[<-`(df, 4:5, , value=value)
  aa
1  1
2  2
3  3
A 11
B 12
> `[<-`(df, 5:4, , value=value)
  aa
1  1
2  2
3  3
B 12
A 11

This actually produces:

> `[<-`(df, 5:4, , value=value)
  aa
1  1
2  2
3  3
A 12
B 11

but should instead produce:

  aa
1  1
2  2
3  3
B 12
A 11

sorry for the confusion. H. For this last result, the rownames of the 2 last rows should be swapped. H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] `[<-.data.frame` sets rownames incorrectly
Hi, Here is another problem with data frame subsetting:

> df <- data.frame(aa=1:3)
> value <- data.frame(aa=11:12, row.names=c("A", "B"))
> `[<-`(df, 4:5, , value=value)
  aa
1  1
2  2
3  3
A 11
B 12
> `[<-`(df, 5:4, , value=value)
  aa
1  1
2  2
3  3
B 12
A 11

For this last result, the rownames of the 2 last rows should be swapped. H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
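For comparison, rbind() does keep rownames paired with their rows, so it produces the result argued for above when the extra rows are appended in reversed order:

```r
df <- data.frame(aa = 1:3)
value <- data.frame(aa = 11:12, row.names = c("A", "B"))

rbind(df, value[2:1, , drop = FALSE])
#   aa
# 1  1
# 2  2
# 3  3
# B 12
# A 11
```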
[Rd] `[[<-.data.frame` leaves holes after existing columns and returns a corrupt data frame
Hi, `[<-.data.frame` is cautious about not leaving holes after existing columns:

> `[<-`(data.frame(id=1:6), 3, value=data.frame(V3=11:16))
Error in `[<-.data.frame`(data.frame(id = 1:6), 3, value = data.frame(V3 = 11:16)) :
  new columns would leave holes after existing columns

but `[[<-.data.frame` not so much:

> `[[<-`(data.frame(id=1:6), 3, value=11:16)
  id      V3
1  1 NULL 11
2  2      12
3  3      13
4  4      14
5  5      15
6  6      16
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

The latter should probably behave like the former in that case. Maybe by sharing more code with it? Thanks, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
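A hedged sketch of the check the post suggests, as a small wrapper around [[<- (set_df_col is a made-up name):

```r
## Refuse to create a column that would leave a hole, mimicking the
## check that `[<-.data.frame` already performs.
set_df_col <- function(df, j, value) {
    if (is.numeric(j) && j > ncol(df) + 1L)
        stop("new columns would leave holes after existing columns")
    df[[j]] <- value
    df
}

df2 <- set_df_col(data.frame(id = 1:6), 2, 11:16)  # fine: appends column 2
## set_df_col(data.frame(id = 1:6), 3, 11:16)      # error: would leave a hole
```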
Re: [Rd] split() - unexpected sorting of results
Hi, On 10/20/2017 12:53 PM, Peter Meissner wrote: Thanks for the explanation. Still, I think this is surprising behaviour which might be handled better. Maybe a little surprising, but no more than:

> x <- sample(11L)
> sort(x)
 [1]  1  2  3  4  5  6  7  8  9 10 11
> sort(as.character(x))
 [1] "1"  "10" "11" "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"

The fact that sort(), as.factor(), split() and many other things behave consistently with respect to the underlying order of character vectors avoids other even bigger surprises. Also note that the underlying order of character vectors actually depends on your locale. One way to guarantee consistent results across platforms/locales is by explicitly specifying the levels when making a factor, e.g.

  f <- factor(x, levels=unique(x))
  split(1:11, f)

This is particularly sensible when writing unit tests. Cheers, H. Best, Peter On 20.10.2017 at 9:49 PM, "Iñaki Úcar" wrote: Hi Peter, 2017-10-20 21:33 GMT+02:00 Peter Meissner : Hey, I found this - for me - quite surprising and puzzling behaviour of split().

  split(1:11, as.character(1:11))
  split(1:11, 1:11)

When splitting by numerics everything works as expected - sorting of input == sorting of output -- but when using a character vector everything gets re-sorted alphabetically. Although there are some references in the help files to what happens when using split, I did not find any note on this - for me - rather unexpected behaviour. As the documentation states, f: a 'factor' in the sense that 'as.factor(f)' defines the grouping, or a list of such factors in which case their interaction is used for the grouping.
And, in fact:

> as.factor(1:11)
 [1] 1  2  3  4  5  6  7  8  9  10 11
Levels: 1 2 3 4 5 6 7 8 9 10 11
> as.factor(as.character(1:11))
 [1] 1  2  3  4  5  6  7  8  9  10 11
Levels: 1 10 11 2 3 4 5 6 7 8 9

Regards, Iñaki I would like it best when the sorting of split results stays the same no matter the input (sorting of input == sorting of output). If that is not possible, a note of caution in the help pages and maybe an example might be valuable. Best, Peter [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
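The factor(levels=unique(x)) suggestion above in one self-contained example:

```r
x <- as.character(1:11)

## Default: as.factor() sorts the levels, so "10" and "11" come
## right after "1".
names(split(1:11, x))
# "1" "10" "11" "2" "3" "4" "5" "6" "7" "8" "9"

## Pinning the levels to the order of first appearance preserves
## the input order.
f <- factor(x, levels = unique(x))
names(split(1:11, f))
# "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"
```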
Re: [Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31
Hi Martin, On 06/07/2017 03:54 AM, Martin Maechler wrote: Martin Maechler on Tue, 6 Jun 2017 09:45:44 +0200 writes: Hervé Pagès on Fri, 2 Jun 2017 04:05:15 -0700 writes: >> Hi, I have a long numeric vector 'xx' and I want to use >> sum() to count the number of elements that satisfy some >> criteria like non-zero values or values lower than a >> certain threshold etc... >> The problem is: sum() returns an NA (with a warning) if >> the count is greater than 2^31. For example: >>> xx <- runif(3e9) sum(xx < 0.9) >> [1] NA Warning message: In sum(xx < 0.9) : integer >> overflow - use sum(as.numeric(.)) >> This already takes a long time and doing >> sum(as.numeric(.)) would take even longer and require >> allocation of 24Gb of memory just to store an >> intermediate numeric vector made of 0s and 1s. Plus, >> having to do sum(as.numeric(.)) every time I need to >> count things is not convenient and is easy to forget. >> It seems that sum() on a logical vector could be modified >> to return the count as a double when it cannot be >> represented as an integer. Note that length() already >> does this so that wouldn't create a precedent. Also and >> FWIW prod() avoids the problem by always returning a >> double, whatever the type of the input is (except on a >> complex vector). >> I can provide a patch if this change sounds reasonable. > This sounds very reasonable, thank you Hervé, for the > report, and even more for a (small) patch. I was made aware of the fact, that R treats logical and integer very often identically in the C code, and in general we even mention that logicals are treated as 0/1/NA integers in arithmetic. For the present case that would mean that we should also safe-guard against *integer* overflow in sum(.) and that is not something we have done / wanted to do in the past... Speed being one reason. 
So this ends up being more delicate than I had thought at first, because changing sum() only would mean that sum(LOGI) and sum(as.integer(LOGI)) would start to differ for a logical vector LOGI. So, for now this is something that must be approached carefully, and the R Core team may want to discuss "in private" first. I'm sorry for having raised possibly unrealistic expectations. No worries. Thanks for taking my proposal into consideration. Note that the isum() function in src/main/summary.c is already using a 64-bit accumulator to accommodate intermediate sums > INT_MAX. So it should be easy to modify the function to make it overflow for much bigger final sums without altering performance. Seems like R_XLEN_T_MAX would be the natural threshold. Cheers, H. Martin > Martin >> Cheers, H. >> -- >> Hervé Pagès >> Program in Computational Biology Division of Public >> Health Sciences Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA >> 98109-1024 >> E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: >> (206) 667-1319 >> __ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O.
Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] surprisingly, S4 classes with a "dim" or "dimnames" slot are final (in the Java sense)
Thanks Michael for taking care of this.

H.

On 06/06/2017 11:48 AM, Michael Lawrence wrote:
> I've fixed this and will commit soon. Disregard my dim<-() example; that
> behaves as expected (the class needs a dim<-() method).
>
> Michael
>
> On Tue, Jun 6, 2017 at 5:16 AM, Michael Lawrence <micha...@gene.com> wrote:
>> Thanks for the report. The issue is that one cannot set special
>> attributes like names, dim, dimnames, etc. on S4 objects. I was already
>> working on this and will have a fix soon.
>>
>>   > a2 <- new("A2")
>>   > dim(a2) <- c(2, 3)
>>   Error in dim(a2) <- c(2, 3) : invalid first argument
>>
>> On Mon, Jun 5, 2017 at 6:08 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:
>>> [original report quoted in full; see the original post below]

-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024, Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
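Michael's remark above that the class "needs a dim<-() method" can be made concrete: slots named after special attributes are readable via the usual functions, but replacement only works once a method is defined. A minimal sketch (the method body below is mine, for illustration only, and is not the actual fix that was committed):

```r
library(methods)

setClass("A2", slots = c(dim = "integer"))

## Without this method, `dim(a2) <- c(2, 3)` fails with
## "invalid first argument", as shown in the thread above.
setMethod("dim<-", "A2", function(x, value) {
    x@dim <- as.integer(value)
    x
})

a2 <- new("A2", dim = c(4L, 3L))
dim(a2)            # 4 3 (slot doubles as the "dim" attribute)
dim(a2) <- c(2, 6) # now dispatches to the method instead of failing
a2@dim             # 2 6
```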
[Rd] surprisingly, S4 classes with a "dim" or "dimnames" slot are final (in the Java sense)
Hi,

It's nice to be able to define S4 classes with slots that correspond to
standard attributes:

  setClass("A1", slots=c(names="character"))
  setClass("A2", slots=c(dim="integer"))
  setClass("A3", slots=c(dimnames="list"))

By doing this, one gets a few methods for free:

  a1 <- new("A1", names=letters[1:3])
  names(a1)  # "a" "b" "c"

  a2 <- new("A2", dim=4:3)
  nrow(a2)   # 4

  a3 <- new("A3", dimnames=list(NULL, letters[1:3]))
  colnames(a3)  # "a" "b" "c"

However, when it comes to subclassing, some of these slots cause
problems. I can extend A1:

  setClass("B1", contains="A1")

but trying to extend A2 or A3 produces an error (with a non-informative
message in the 1st case and a somewhat obscure one in the 2nd):

  setClass("B2", contains="A2")
  # Error in attr(prototype, slotName) <- attr(pri, slotName) :
  #   invalid first argument

  setClass("B3", contains="A3")
  # Error in attr(prototype, slotName) <- attr(pri, slotName) :
  #   'dimnames' applied to non-array

So it seems that the presence of a "dim" or "dimnames" slot prevents a
class from being extended. Is this expected? I couldn't find anything in
TFM about this. Sorry if I missed it.

Thanks,
H.
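Until the fix described in the reply above, one way to keep such a class extensible is to store the dimensions in a regularly-named slot rather than the special "dim" attribute, and write the accessor yourself (this mirrors the approach taken by e.g. the Matrix package with its "Dim" slot; the class names below are illustrative):

```r
library(methods)

## A "Dim" slot is an ordinary slot, so the prototype machinery that
## chokes on a slot literally named "dim" never gets involved.
setClass("A2b", slots = c(Dim = "integer"))
setMethod("dim", "A2b", function(x) x@Dim)

a2b <- new("A2b", Dim = 4:3)
nrow(a2b)                          # 4, via the dim() method

setClass("B2b", contains = "A2b")  # extends without error
```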
[Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31
Hi,

I have a long numeric vector 'xx' and I want to use sum() to count the
number of elements that satisfy some criterion, like non-zero values or
values lower than a certain threshold, etc. The problem is: sum()
returns NA (with a warning) if the count is greater than 2^31. For
example:

  > xx <- runif(3e9)
  > sum(xx < 0.9)
  [1] NA
  Warning message:
  In sum(xx < 0.9) : integer overflow - use sum(as.numeric(.))

This already takes a long time, and doing sum(as.numeric(.)) would take
even longer and require allocating 24Gb of memory just to store an
intermediate numeric vector made of 0s and 1s. Plus, having to do
sum(as.numeric(.)) every time I need to count things is not convenient
and is easy to forget.

It seems that sum() on a logical vector could be modified to return the
count as a double when it cannot be represented as an integer. Note that
length() already does this, so that wouldn't create a precedent. Also,
and FWIW, prod() avoids the problem by always returning a double,
whatever the type of the input is (except on a complex vector).

I can provide a patch if this change sounds reasonable.

Cheers,
H.
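In the meantime, a workaround that avoids both the integer overflow and the full-length as.numeric() copy is to accumulate per-chunk counts in a double. A sketch (the function name and chunk size are mine, not from the thread):

```r
## Count the elements of 'x' for which 'fun' returns TRUE, working in
## chunks so that each partial sum stays far below 2^31 and no
## full-length numeric copy of 'x' is ever made.
count_if <- function(x, fun, chunk_size = 1e7) {
    total <- 0  # a double, so the running count can exceed 2^31 - 1
    n <- length(x)
    start <- 1
    while (start <= n) {
        end <- min(start + chunk_size - 1, n)
        total <- total + sum(fun(x[start:end]))
        start <- end + 1
    }
    total
}

xx <- runif(3e6)  # small stand-in for the 3e9-element case
count_if(xx, function(v) v < 0.9)
```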
Re: [Rd] stopifnot() does not stop at first non-TRUE argument
On 05/16/2017 09:59 AM, peter dalgaard wrote:
>> On 16 May 2017, at 18:37, Suharto Anggono via R-devel wrote:
>>
>> switch(i, ...) extracts the 'i'-th argument in '...'. It is like
>>
>>   eval(as.name(paste0("..", i)))
>
> Hey, that's pretty neat!

Indeed! Seems like this topic is even more connected to switch() than I
anticipated...

H.

> -pd
>
>> Just mentioning other things:
>> - For 'n', n <- nargs() can be used.
>> - sys.call() can be used in place of match.call().
>>
>>>> peter dalgaard on Mon, 15 May 2017 16:28:42 +0200 writes:
>>>> I think Hervé's idea was just that if switch can evaluate arguments
>>>> selectively, so can stopifnot(). But switch() is .Primitive, so does
>>>> it from C.
>>>
>>> if he just meant that, then "yes, of course" (but not so interesting).
>>>
>>>> I think it is almost a no-brainer to implement a sequential stopifnot
>>>> if dropping to C code is allowed. In R it gets trickier, but how
>>>> about this:
>>>
>>> Something like this, yes, that's close to what Serguei Sokol had
>>> proposed (and of course I *do* want to keep the current sophistication
>>> of stopifnot(), so this is really too simple)
>>>
>>>>   Stopifnot <- function(...) {
>>>>       n <- length(match.call()) - 1
>>>>       for (i in 1:n) {
>>>>           nm <- as.name(paste0("..", i))
>>>>           if (!eval(nm)) stop("not all true")
>>>>       }
>>>>   }
>>>>
>>>>   Stopifnot(2+2==4)
>>>>   Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!")
>>>>   Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!")
>>>>   Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T)
>>>>
>>>> On 15 May 2017, at 15:37, Martin Maechler wrote:
>>>>> I'm still curious about Hervé's idea on using switch() for the
>>>>> issue.
>
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501  Office: A 4.23
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
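Putting the thread's two observations together — switch(i, ...) evaluates only the i-th argument in '...', and nargs() gives the argument count — a switch()-based sequential variant can be sketched as follows (the function name is mine, and this deliberately lacks stopifnot()'s real diagnostics):

```r
## Like stopifnot(), but stops at the first non-TRUE argument: arguments
## after the failing one are never evaluated, because switch(i, ...)
## forces only the i-th element of '...'.
Stopifnot2 <- function(...) {
    for (i in seq_len(nargs())) {
        if (!isTRUE(all(switch(i, ...))))
            stop(sprintf("argument %d is not all TRUE", i), call. = FALSE)
    }
    invisible(TRUE)
}

Stopifnot2(2 + 2 == 4, TRUE)  # passes silently
try(Stopifnot2(2 + 2 == 5, print("never evaluated")))
```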
Re: [Rd] stopifnot() does not stop at first non-TRUE argument
On 05/15/2017 07:28 AM, peter dalgaard wrote:
> I think Hervé's idea was just that if switch can evaluate arguments
> selectively, so can stopifnot().

Yep.

Thanks,
H.

> But switch() is .Primitive, so does it from C.
>
> I think it is almost a no-brainer to implement a sequential stopifnot
> if dropping to C code is allowed. In R it gets trickier, but how about
> this:
>
>   Stopifnot <- function(...) {
>       n <- length(match.call()) - 1
>       for (i in 1:n) {
>           nm <- as.name(paste0("..", i))
>           if (!eval(nm)) stop("not all true")
>       }
>   }
>
>   Stopifnot(2+2==4)
>   Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!")
>   Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!")
>   Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T)
>
> On 15 May 2017, at 15:37, Martin Maechler wrote:
>> I'm still curious about Hervé's idea on using switch() for the issue.