[Rd] Support for as(x, "raw")

2024-05-14 Thread Hervé Pagès
Hi,

as(x, "<type>") is supported and does as.<type>(x) for all 
vector types except for raw. For example all the following coercions 
work and do what you'd expect: as(1L, "logical"), as(1L, "double"), 
as(1L, "complex"), as(1L, "character"), as(1L, "list"). But as(1L, 
"raw") does not:

     > as(1L, "raw")
     Error in as(1L, "raw") :
       no method or default for coercing “integer” to “raw”

Even though as.raw(1L) works:

     > as.raw(1L)
     [1] 01

Is there any particular reason for that or would it be reasonable to 
define a coerce() method from ANY to raw like it's been done for all the 
other vector types:

     > selectMethod(coerce, c("ANY", "logical"))
     Method Definition:
     function (from, to, strict = TRUE)
     {
         value <- as.logical(from)
         if (strict)
             attributes(value) <- NULL
         value
     }
     
     Signatures:
             from  to
     target  "ANY" "logical"
     defined "ANY" "logical"

     ...

     ...

     > selectMethod(coerce, c("ANY", "list"))
     Method Definition:
     function (from, to, strict = TRUE)
     {
         value <- as.list(from)
         if (strict)
             attributes(value) <- NULL
         value
     }
     
     Signatures:
         from  to
     target  "ANY" "list"
     defined "ANY" "list"


     > selectMethod(coerce, c("ANY", "raw"))
     Error in selectMethod(coerce, c("ANY", "raw")) :
       no method found for signature ANY, raw
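
For illustration, a method of this kind can be registered at user level with setAs(). This is a minimal sketch that mirrors the ANY-to-logical method shown above (it always strips attributes, as the strict = TRUE branch does); it is an assumption about what such a method could look like, not code from the methods package.

```r
library(methods)

# Hypothetical user-level coercion from ANY to raw, modelled on the
# built-in ANY -> logical method; not part of base R or 'methods'.
setAs("ANY", "raw", function(from) {
    value <- as.raw(from)
    attributes(value) <- NULL   # same effect as the strict = TRUE branch
    value
})

x <- as(1L, "raw")
identical(x, as.raw(1L))
```

With this definition in place, as(1L, "raw") is expected to give the same result as as.raw(1L).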

Thanks,

H.

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Question regarding .make_numeric_version with non-character input

2024-04-25 Thread Hervé Pagès
On 4/25/24 07:04, Kurt Hornik wrote:

...
> Sure, I'll look into adding something.  (Too late for 4.4.0, of course.)
>
> Best
> -k

Great. Thanks!

H.

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com


Re: [Rd] Question regarding .make_numeric_version with non-character input

2024-04-25 Thread Hervé Pagès
On 4/24/24 23:07, Kurt Hornik wrote:

>>>>>> Hervé Pagès writes:
>> Hi Kurt,
>> Is it intended that numeric_version() returns an error by default on
>> non-character input in R 4.4.0?
> Dear Herve, yes, that's the intention.
>
>> It seems that I can turn this into a warning by setting
>> _R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_=false but I don't
>> seem to be able to find any of this mentioned in the NEWS file.
> That's what I added for smoothing the transition: it will be removed
> from the trunk shortly.

Thanks for clarifying.  Could this be documented in the NEWS file? This 
is a breaking change (it breaks a couple of Bioconductor packages) and 
people are not going to set this environment variable if they are not 
aware of it.
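
For package authors affected by this, one portable fix that does not rely on the environment variable is to convert the input to character explicitly before calling numeric_version(). A minimal sketch, where `v` is a made-up example value:

```r
v <- 1.2                                  # non-character input (hypothetical)

# Explicit conversion to character before constructing the version object.
ver <- numeric_version(as.character(v))

ver >= "1.0"
```

This form should behave the same way before and after the change, since the input handed to numeric_version() is already a character string.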

Thanks again,

H.

>
> Best
> -k
>
>> Thanks,
>> H.

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com


Re: [Rd] Question regarding .make_numeric_version with non-character input

2024-04-24 Thread Hervé Pagès
Hi Kurt,

Is it intended that numeric_version() returns an error by default on 
non-character input in R 4.4.0? It seems that I can turn this into a 
warning by setting 
_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_=false but I don't seem 
to be able to find any of this mentioned in the NEWS file.

Thanks,

H.

On 4/1/24 05:28, Kurt Hornik wrote:
>>>>>> Andrea Gilardi via R-devel writes:
> Thanks: should be fixed now in the trunk.
>
> Best
> -k
>
>> Thank you very much Dirk for your kind words and for confirming the bug.
>> Next week I will open a new issue on Bugzilla adding the related patch.
>> Kind regards
>> Andrea
>> On 29/03/2024 20:14, Dirk Eddelbuettel wrote:
>>> On 29 March 2024 at 17:56, Andrea Gilardi via R-devel wrote:
>>> | Dear all,
>>> |
>>> | I have a question regarding the R-devel version of 
>>> .make_numeric_version() function. As far as I can understand, the current 
>>> code 
>>> (https://github.com/wch/r-source/blob/66b91578dfc85140968f07dd4e72d8cb8a54f4c6/src/library/base/R/version.R#L50-L56)
>>>  runs the following steps in case of non-character input:
>>> |
>>> | 1. It creates a message named msg using gettextf.
>>> | 2. Such object is then passed to stop(msg) or warning(msg) according to 
>>> the following condition
>>> |
>>> | tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != 
>>> "false")
>>> |
>>> | However, I don't understand the previous code since the output of 
>>> Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != "false" 
>>> is just a boolean value and tolower() will just return "true" or "false". 
>>> Maybe the intended code is 
>>> tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_")) != 
>>> "false" ? Or am I missing something?
>>>
>>> Yes, agreed -- good catch.  In full, the code is (removing leading
>>> whitespace, and putting it back onto single lines)
>>>
>>> msg <- gettextf("invalid non-character version specification 'x' (type: 
>>> %s)", typeof(x))
>>> if(tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") 
>>> != "false"))
>>> stop(msg, domain = NA)
>>> else
>>> warning(msg, domain = NA, immediate. = TRUE)
>>>
>>> where msg is constant (but reflecting language settings via standard i18n)
>>> and as you note the parentheses appear wrong.  What was intended is likely
>>>
>>> msg <- gettextf("invalid non-character version specification 'x' (type: 
>>> %s)", typeof(x))
>>> if(tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_")) 
>>> != "false")
>>> stop(msg, domain = NA)
>>> else
>>> warning(msg, domain = NA, immediate. = TRUE)
>>>
>>> If you have used Bugzilla before and have a handle, maybe file a bug report with
>>> this as a patch at https://bugs.r-project.org/
>>>
>>> Dirk
>>>
>> __
>> R-devel@r-project.org  mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> __
> R-devel@r-project.org  mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
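
The effect of the misplaced parenthesis discussed in the quoted messages can be seen in isolation. A minimal sketch, using a plain string `val` (an assumption for illustration) in place of the Sys.getenv() call:

```r
val <- "FALSE"   # stand-in for the environment variable set in upper case

# Buggy form: the comparison runs first, so tolower() only ever sees
# TRUE or FALSE and the upper-case setting is not recognized.
buggy <- tolower(val != "false")

# Intended form: lower-case the value first, then compare.
fixed <- tolower(val) != "false"
```

Here `buggy` is the string "true", so the original code would call stop(), while `fixed` is FALSE, so the corrected code would only warn.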

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com


Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-18 Thread Hervé Pagès
Thanks Martin. We'll update the BioC builders to the latest R devel soon.

Cheers,

H.

On 3/15/24 10:26, Martin Maechler wrote:
>>>>>> Martin Maechler
>>>>>>  on Fri, 15 Mar 2024 11:24:22 +0100 writes:
>>>>>> Ivan Krylov
>>>>>>  on Thu, 14 Mar 2024 14:17:38 +0300 writes:
>  >> On Thu, 14 Mar 2024 10:41:54 +0100
>  >> Martin Maechler  wrote:
>
>  >>> Anybody trying S7 examples and see if they work w/o producing
>  >>> wrong warnings?
>
>  >> It looks like this is not applicable to S7. If I overwrite
>  >> as.data.frame with a newly created S7 generic, it fails to dispatch on
>  >> existing S3 classes:
>
>  >> new_generic('as.data.frame', 'x')(factor(1))
>  >> # Error: Can't find method for `as.data.frame(S3)`.
>
>  >> But there is no need to overwrite the generic, because S7 classes
>  >> should work with existing S3 generics:
>
>  >> foo <- new_class('foo', parent = class_double)
>  >> method(as.data.frame, foo) <- function(x) structure(
>  >> # this is probably not generally correct
>  >> list(x),
>  >> names = deparse1(substitute(x)),
>  >> row.names = seq_len(length(x)),
>  >> class = 'data.frame'
>  >> )
>  >> str(as.data.frame(foo(pi)))
>  >> # 'data.frame':   1 obs. of  1 variable:
>  >> #  $ x:  num 3.14
>
>  >> So I think that is nothing to break because S7 methods for
>  >> as.data.frame will rely on S3 for dispatch.
>
>  > Yes, as it should be.  Thank you for checking..
>
>
>  >>> > The patch passes make check-devel, but I'm not sure how to safely
>  >>> > put setGeneric('as.data.frame'); as.data.frame(factor(1:10)) in a
>  >>> > regression test.
>  >>>
>  >>> {What's the danger/problem?  we do have "similar" tests in both
>  >>> src/library/methods/tests/*.R
>  >>> tests/reg-S4.R
>  >>>
>  >>> -- maybe we can discuss bi-laterally  (or here, as you prefer)
>  >>> }
>
>  >> This might be educational for other people wanting to add a regression
>  >> test to their patch. I see that tests/reg-tests-1e.R is already running
>  >> under options(warn = 2), so if I add the following near line 750
>  >> ("Deprecation of *direct* calls to as.data.frame.")...
>
>  >> # Should not warn for a call from a derivedDefaultMethod to the raw
>  >> # S3 method -- implementation detail of S4 dispatch
>  >> setGeneric('as.data.frame')
>  >> as.data.frame(factor(1))
>
>  >> ...then as.data.frame will remain an S4 generic. Should the test then
>  >> rm(as.data.frame) and keep going? (Or even keep the S4 generic?) Is
>  >> there any hidden state I may be breaking for the rest of the test this
>  >> way?
>  >> The test does pass like this, so this may be worrying about nothing.
>
>  > Indeed, this could be educational;  I think just adding
>
>  > removeGeneric('as.data.frame')
>
>  > is appropriate here as it is self-explaining and should not leave
>  > much traces.
>
>  > I'm about to test this in reg-tests-1e.R and with make check-all
>  > and commit later today,
>  > thanking you, Ivan!
>
> This has been committed to R-devel svn rev 86139 now.
>
> So these spurious warnings in situations where  as.data.frame()
> is an S4 generic --- notably for the many Bioconductor packages
> depending on {BiocGenerics} ---  should disappear within 24
> hours or less.
>
> Martin

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com


[Rd] Spurious warning in as.data.frame.factor()

2024-03-12 Thread Hervé Pagès
Hi,

The acrobatics that as.data.frame.factor() is going thru in order to 
recognize a direct call don't play nice if as.data.frame() is an S4 
generic:

     df <- as.data.frame(factor(11:12))

     suppressPackageStartupMessages(library(BiocGenerics))
     isGeneric("as.data.frame")
     # [1] TRUE

     df <- as.data.frame(factor(11:12))
     # Warning message:
     # In as.data.frame.factor(factor(11:12)) :
     #   Direct call of 'as.data.frame.factor()' is deprecated. Use
     #   'as.data.frame.vector()' or 'as.data.frame()' instead

This spurious warning showed up on the recent Bioconductor daily build 
reports after we've updated the build machines to the latest R devel. 
It's causing some confusion and breaks at least one unit test.
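
The situation can also be set up without BiocGenerics by temporarily promoting as.data.frame() to an S4 generic. A minimal sketch, assuming a scratch session where redefining the generic is harmless; whether a warning is actually caught depends on the R build in use:

```r
library(methods)

setGeneric("as.data.frame")     # promote the S3 generic to an S4 generic

# Capture a warning, if any, instead of letting it propagate.
res <- tryCatch(as.data.frame(factor(11:12)), warning = function(w) w)

removeGeneric("as.data.frame")  # undo the promotion

# Expected to be TRUE on builds showing the spurious warning.
inherits(res, "warning")
```

On builds without the problem, `res` should simply be the resulting data frame.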

Thanks,

H.

 > sessionInfo()
R Under development (unstable) (2024-03-06 r86056)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_GB  LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

other attached packages:
[1] BiocGenerics_0.49.1

loaded via a namespace (and not attached):
[1] compiler_4.4.0

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com


Re: [Rd] NOTE: multiple local function definitions for ‘fun’ with different formal arguments

2024-02-06 Thread Hervé Pagès
Thanks. Workarounds are interesting but... what's the point of the NOTE 
in the first place?

H.

On 2/4/24 09:07, Duncan Murdoch wrote:
> On 04/02/2024 10:55 a.m., Izmirlian, Grant (NIH/NCI) [E] via R-devel 
> wrote:
>> Well you can see that yeast is exactly weekday you have.  The way out 
>> is to just not name the result
>
> I think something happened to your explanation...
>
>>
>> toto <- function(mode)
>> {
>>  ifelse(mode == 1,
>>  function(a,b) a*b,
>>  function(u, v, w) (u + v) / w)
>> }
>
> It's a bad idea to use ifelse() when you really want if() ... else ... 
> .  In this case it works, but it doesn't always.  So the workaround 
> should be
>
>
> toto <- function(mode)
> {
>     if(mode == 1)
>     function(a,b) a*b
>     else
>     function(u, v, w) (u + v) / w
> }
>
>
>>
>>
>> 
>> From: Grant Izmirlian 
>> Date: Sun, Feb 4, 2024, 10:44 AM
>> To: "Izmirlian, Grant (NIH/NCI) [E]" 
>> Subject: Fwd: [EXTERNAL] R-devel Digest, Vol 252, Issue 2
>>
>> Hi,
>>
>> I just ran into this 'R CMD check' NOTE for the first time:
>>
>> * checking R code for possible problems ... NOTE
>> toto: multiple local function definitions for ‘fun’ with different
>>    formal arguments
>>
>> The "offending" code is something like this (simplified from the real 
>> code):
>>
>> toto <- function(mode)
>> {
>>  if (mode == 1)
>>  fun <- function(a, b) a*b
>>  else
>>  fun <- function(u, v, w) (u + v) / w
>>  fun
>> }
>>
>> Is that NOTE really intended? Hard to see why this code would be
>> considered "wrong".
>>
>> I know it's just a NOTE but still...
>
> I agree it's a false positive, but the issue is that you have a 
> function object in your function which can't be called 
> unconditionally.  The workaround doesn't create such an object.
>
> Recognizing that your function never tries to call fun requires global 
> inspection of toto(), and most of the checks are based on local 
> inspection.
>
> Duncan Murdoch
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com


[Rd] NOTE: multiple local function definitions for ‘fun’ with different formal arguments

2024-02-03 Thread Hervé Pagès
Hi,

I just ran into this 'R CMD check' NOTE for the first time:

* checking R code for possible problems ... NOTE
toto: multiple local function definitions for ‘fun’ with different
   formal arguments

The "offending" code is something like this (simplified from the real code):

toto <- function(mode)
{
     if (mode == 1)
     fun <- function(a, b) a*b
     else
     fun <- function(u, v, w) (u + v) / w
     fun
}

Is that NOTE really intended? Hard to see why this code would be 
considered "wrong".

I know it's just a NOTE but still...
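
For reference, the function behaves as intended under either branch, and a variant that returns the closures directly, without binding them to a local name, gives the same results. A small sketch; `toto2` is a name made up here for the comparison:

```r
# Original form: both branches assign to the local name 'fun'.
toto <- function(mode)
{
    if (mode == 1)
        fun <- function(a, b) a*b
    else
        fun <- function(u, v, w) (u + v) / w
    fun
}

# Variant without a local name (hypothetical name 'toto2').
toto2 <- function(mode)
{
    if (mode == 1)
        function(a, b) a*b
    else
        function(u, v, w) (u + v) / w
}

toto(1)(2, 3)       # 2 * 3
toto2(2)(1, 2, 3)   # (1 + 2) / 3
```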

Thanks,

H.

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com


Re: [Rd] Should subsetting named vector return named vector including named unmatched elements?

2024-01-18 Thread Hervé Pagès
Never been a big fan of this behavior either but maybe the intention was 
to make it easier to distinguish between 2 types of NAs in the result: 
those that were present in the original vector vs those that are 
introduced by an unmatched subscript. Like in this example:

     x <- setNames(c(101:108, NA), letters[1:9])
     x
     #   a   b   c   d   e   f   g   h   i
     # 101 102 103 104 105 106 107 108  NA

     x[c("g", "k", "a", "i")]
     #    g <NA>    a    i
     #  107   NA  101   NA

The first NA is the result of an unmatched subscript, while the second 
one comes from 'x'.

This is of limited interest though. In most real world applications I've 
worked on, we actually need to "fix" the names of the result.
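
Fixing the names usually amounts to reusing the subscript as the names of the result. A minimal sketch continuing the example above, where `idx`, `res` and `unmatched` are names introduced here for illustration:

```r
x <- setNames(c(101:108, NA), letters[1:9])
idx <- c("g", "k", "a", "i")

# Reuse the subscript as the names of the result.
res <- setNames(x[idx], idx)

# The two kinds of NA can still be told apart by checking the subscript
# against the original names.
unmatched <- !(idx %in% names(x))
```

After this, names(res) is identical to `idx`, and `unmatched` flags only "k".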

Best,

H.

On 1/18/24 11:51, Jiří Moravec wrote:
> Subsetting vector (including lists) returns the same number of 
> elements as the subsetting vector, including unmatched elements which 
> are reported as `NA` or `NULL` (in case of lists).
>
> Consider:
>
> ```
> menu = list(
>   "bacon" = "foo",
>   "eggs" = "bar",
>   "beans" = "baz"
>   )
>
> select = c("bacon", "eggs", "spam")
>
> menu[select]
> # $bacon
> # [1] "foo"
> #
> # $eggs
> # [1] "bar"
> #
> # $<NA>
> # NULL
>
> ```
>
> Wouldn't it be more logical to return named vector/list including 
> names of unmatched elements when subsetting using names? After all, 
> the unmatched elements are already returned. I.e., the output would 
> look like this:
>
> ```
>
> menu[select]
> # $bacon
> # [1] "foo"
> #
> # $eggs
> # [1] "bar"
> #
> # $spam
> # NULL
>
> ```
>
> The simple fix `menu[select] |> setNames(select)` solves this, but it feels 
> to me like something that could be a default behaviour.
>
> On a slightly unrelated note, when I was asking if there is a better 
> solution, the `menu[select]` seems to allocate more memory than 
> `menu_env = list2env(menu); mget(select, envir = menu_env, ifnotfound = 
> list(NULL))`. Or the sapply solution. Is this a benchmarking artifact?
>
> https://stackoverflow.com/q/77828678/4868692
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com


Re: [Rd] 'R CMD INSTALL' keeps going on despite serious errors, and returns exit code 0

2023-11-04 Thread Hervé Pagès
I see. We'll update soon. Thanks Martin.

On 11/4/23 06:52, Martin Maechler wrote:
>>>>>> Hervé Pagès
>>>>>>  on Fri, 3 Nov 2023 15:10:40 -0700 writes:
>  > Hi list,
>
>  > Here is an example:
>
>  >     hpages@XPS15:~$ R CMD INSTALL CoreGx
>  >     * installing to library ‘/home/hpages/R/R-4.4.r85388/site-library’
>  ^^^
>
> Yes, this bad behavior was the case for a short time (too
> long, my fault !!) in R-devel.
>
> But that,  svn rev 85388 , was *long* ago (close to 2 weeks):
> Current R-devel is 85471
> (The bug was "only" in 382--388, fixed in 389 -- you were really unlucky!)
>
> Still, I'm sorry that you were accidentally affected, too.
> Martin
>
>
>  >     * installing *source* package ‘CoreGx’ ...
>  >     ** using staged installation
>  >     ** R
>  >     ** data
>  >     *** moving datasets to lazyload DB
>  >     ** inst
>  >     ** byte-compile and prepare package for lazy loading
>  >     Error : in method for ‘updateObject’ with signature
>  > ‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic
>  > must appear in the method, in the same place at the end of the argument list
>  >     Error: unable to load R code in package ‘CoreGx’
>  >     ** help
>  >     *** installing help indices
>  >     ** building package indices
>  >     ** installing vignettes
>  >     ** testing if installed package can be loaded from temporary location
>  >     Error : in method for ‘updateObject’ with signature
>  > ‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic
>  > must appear in the method, in the same place at the end of the argument list
>  >     Error: package or namespace load failed for ‘CoreGx’:
>  >  unable to load R code in package ‘CoreGx’
>  >     Error: loading failed
>  >     ** testing if installed package can be loaded from final location
>  >     Error : in method for ‘updateObject’ with signature
>  > ‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic
>  > must appear in the method, in the same place at the end of the argument list
>  >     Error: package or namespace load failed for ‘CoreGx’:
>  >  unable to load R code in package ‘CoreGx’
>  >     Error: loading failed
>  >     Error : in method for ‘updateObject’ with signature
>  > ‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic
>  > must appear in the method, in the same place at the end of the argument list
>  >     Error: unable to load R code in package ‘CoreGx’
>  >     ** testing if installed package keeps a record of temporary
>  > installation path
>  >     * DONE (CoreGx)
>
>  > Many serious errors were ignored. Plus the command returned exit code 0:
>
>  >     hpages@XPS15:~$ echo $?
>  >     0
>
>  > This is with R 4.4, that BioC 3.19 will be based on and that we only
>  > started to use recently for our daily builds.
>
>  > Strangely, we only see this on Linux. On Windows and Mac, we get the
>  > usual hard error, as expected. See:
>
>  > - https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/nebbiolo1-install.html
>
>  > - https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/palomino3-install.html
>
>  > - https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/merida1-install.html
>
>  > To reproduce:
>
>  >     library(remotes)
>  >     install_git("https://git.bioconductor.org/packages/CoreGx")
>
>  > Thanks,
>
>  > H.
>
>  >> sessionInfo()
>  > R Under development (unstable) (2023-10-22 r85388)
>  > Platform: x86_64-pc-linux-gnu
>  > Running under: Ubuntu 23.10
>
>  > Matrix products: default
>  > BLAS:   /home/hpages/R/R-4.4.r85388/lib/libRblas.so
>  > LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
>
>  > locale:
>  >  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  >  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=en_US.UTF-8
>  >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  >  [7] LC_

Re: [Rd] 'R CMD INSTALL' keeps going on despite serious errors, and returns exit code 0

2023-11-03 Thread Hervé Pagès
Forgot to mention that the package actually got installed, but is 
unloadable (not surprisingly):

     > "CoreGx" %in% rownames(installed.packages())
     [1] TRUE

     > suppressWarnings(suppressMessages(library(CoreGx)))
     Error : in method for ‘updateObject’ with signature 
‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic 
must appear in the method, in the same place at the end of the argument list
     Error: package or namespace load failed for ‘CoreGx’:
      unable to load R code in package ‘CoreGx’
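
Since the exit code cannot be trusted in this situation, a build script can check that the package actually loads rather than only that it is listed as installed. A rough sketch; `pkg_is_usable` is a hypothetical helper, not part of any existing tool:

```r
# Hypothetical helper: TRUE only if the package is installed AND its
# namespace can be loaded.
pkg_is_usable <- function(pkg)
{
    installed <- pkg %in% rownames(installed.packages())
    # requireNamespace() returns FALSE instead of raising an error when
    # the namespace cannot be loaded.
    installed && suppressWarnings(requireNamespace(pkg, quietly = TRUE))
}

pkg_is_usable("stats")
```

For a package in the state described above, the first condition would hold but the second would not, so the helper is expected to return FALSE.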

Best,

H.

On 11/3/23 15:10, Hervé Pagès wrote:
>
> Hi list,
>
> Here is an example:
>
>     hpages@XPS15:~$ R CMD INSTALL CoreGx
>     * installing to library ‘/home/hpages/R/R-4.4.r85388/site-library’
>     * installing *source* package ‘CoreGx’ ...
>     ** using staged installation
>     ** R
>     ** data
>     *** moving datasets to lazyload DB
>     ** inst
>     ** byte-compile and prepare package for lazy loading
>     Error : in method for ‘updateObject’ with signature 
> ‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic 
> must appear in the method, in the same place at the end of the 
> argument list
>     Error: unable to load R code in package ‘CoreGx’
>     ** help
>     *** installing help indices
>     ** building package indices
>     ** installing vignettes
>     ** testing if installed package can be loaded from temporary location
>     Error : in method for ‘updateObject’ with signature 
> ‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic 
> must appear in the method, in the same place at the end of the 
> argument list
>     Error: package or namespace load failed for ‘CoreGx’:
>  unable to load R code in package ‘CoreGx’
>     Error: loading failed
>     ** testing if installed package can be loaded from final location
>     Error : in method for ‘updateObject’ with signature 
> ‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic 
> must appear in the method, in the same place at the end of the 
> argument list
>     Error: package or namespace load failed for ‘CoreGx’:
>  unable to load R code in package ‘CoreGx’
>     Error: loading failed
>     Error : in method for ‘updateObject’ with signature 
> ‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic 
> must appear in the method, in the same place at the end of the 
> argument list
>     Error: unable to load R code in package ‘CoreGx’
>     ** testing if installed package keeps a record of temporary 
> installation path
>     * DONE (CoreGx)
>
> Many serious errors were ignored. Plus the command returned exit code 0:
>
>     hpages@XPS15:~$ echo $?
>     0
>
> This is with R 4.4, that BioC 3.19 will be based on and that we only 
> started to use recently for our daily builds.
>
> Strangely, we only see this on Linux. On Windows and Mac, we get the 
> usual hard error, as expected. See:
>
> - 
> https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/nebbiolo1-install.html
>
> - 
> https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/palomino3-install.html
>
> - 
> https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/merida1-install.html
>
> To reproduce:
>
>     library(remotes)
>     install_git("https://git.bioconductor.org/packages/CoreGx")
>
> Thanks,
>
> H.
>
> > sessionInfo()
> R Under development (unstable) (2023-10-22 r85388)
> Platform: x86_64-pc-linux-gnu
> Running under: Ubuntu 23.10
>
> Matrix products: default
> BLAS:   /home/hpages/R/R-4.4.r85388/lib/libRblas.so
> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> time zone: America/Los_Angeles
> tzcode source: system (glibc)
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods base
>
> other attached packages:
> [1] remotes_2.4.2.1
>
> loaded via a namespace (and not attached):
>  [1] processx_3.8.2    compiler_4.4.0    R6_2.5.1 rprojroot_2.0.3
>  [5] cli_3.6.1 prettyunits_1.2.0 tools_4.4.0 crayon_1.5.2
>  [9] desc_1.4.2    callr_3.7.3   pkgbuild_1.4.2 ps_1.7.5
>
> -- 
> Hervé Pagès
>
> Bioconductor Core Team
> hpages.on.git...@gmail.com

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com


[Rd] 'R CMD INSTALL' keeps going on despite serious errors, and returns exit code 0

2023-11-03 Thread Hervé Pagès
Hi list,

Here is an example:

     hpages@XPS15:~$ R CMD INSTALL CoreGx
     * installing to library ‘/home/hpages/R/R-4.4.r85388/site-library’
     * installing *source* package ‘CoreGx’ ...
     ** using staged installation
     ** R
     ** data
     *** moving datasets to lazyload DB
     ** inst
     ** byte-compile and prepare package for lazy loading
     Error : in method for ‘updateObject’ with signature 
‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic 
must appear in the method, in the same place at the end of the argument list
     Error: unable to load R code in package ‘CoreGx’
     ** help
     *** installing help indices
     ** building package indices
     ** installing vignettes
     ** testing if installed package can be loaded from temporary location
     Error : in method for ‘updateObject’ with signature 
‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic 
must appear in the method, in the same place at the end of the argument list
     Error: package or namespace load failed for ‘CoreGx’:
  unable to load R code in package ‘CoreGx’
     Error: loading failed
     ** testing if installed package can be loaded from final location
     Error : in method for ‘updateObject’ with signature 
‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic 
must appear in the method, in the same place at the end of the argument list
     Error: package or namespace load failed for ‘CoreGx’:
  unable to load R code in package ‘CoreGx’
     Error: loading failed
     Error : in method for ‘updateObject’ with signature 
‘object="CoreSet"’:  arguments (‘verbose’) after ‘...’ in the generic 
must appear in the method, in the same place at the end of the argument list
     Error: unable to load R code in package ‘CoreGx’
     ** testing if installed package keeps a record of temporary 
installation path
     * DONE (CoreGx)

Many serious errors were ignored. Plus the command returned exit code 0:

     hpages@XPS15:~$ echo $?
     0

This is with R 4.4, that BioC 3.19 will be based on and that we only 
started to use recently for our daily builds.

Strangely, we only see this on Linux. On Windows and Mac, we get the 
usual hard error, as expected. See:

- 
https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/nebbiolo1-install.html

- 
https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/palomino3-install.html

- 
https://bioconductor.org/checkResults/3.19/bioc-LATEST/CoreGx/merida1-install.html

To reproduce:

     library(remotes)
     install_git("https://git.bioconductor.org/packages/CoreGx")

Thanks,

H.

 > sessionInfo()
R Under development (unstable) (2023-10-22 r85388)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 23.10

Matrix products: default
BLAS:   /home/hpages/R/R-4.4.r85388/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

other attached packages:
[1] remotes_2.4.2.1

loaded via a namespace (and not attached):
  [1] processx_3.8.2    compiler_4.4.0    R6_2.5.1 rprojroot_2.0.3
  [5] cli_3.6.1 prettyunits_1.2.0 tools_4.4.0 crayon_1.5.2
  [9] desc_1.4.2    callr_3.7.3   pkgbuild_1.4.2 ps_1.7.5

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] dim<-() changed in R-devel; no longer removing "dimnames" when doing dim(x) <- dim(x)

2023-10-30 Thread Hervé Pagès
Hi Martin, Henrik,

I actually like this change.

Makes a lot of sense IMO that dim(x) <- dim(x) be a no-op, or, more 
generally, that foo(x) <- foo(x) be a no-op for any setter/getter combo.

FWIW S4Arrays::set_dim() does that too. It also preserves the dimnames 
if the right value is only adding or dropping outermost (ineffective) 
dimensions:

     > x <- array(1:6, dim=c(2,3,1),
     +            dimnames=list(c("A", "B"), c("x","y", "z"), "T"))
     > S4Arrays:::set_dim(x, 2:3)
       x y z
     A 1 3 5
     B 2 4 6

Note that this is consistent with drop().
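
For anyone who wants to check the drop() comparison without S4Arrays installed, here is a minimal base-R sketch. It only reuses the array x from the example above and shows that drop() removes the ineffective third dimension while keeping the dimnames of the remaining two:

```r
x <- array(1:6, dim = c(2, 3, 1),
           dimnames = list(c("A", "B"), c("x", "y", "z"), "T"))

# drop() removes dimensions of extent 1 and adjusts the dimnames accordingly
y <- drop(x)
dim(y)
dimnames(y)
```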

Best,

H.

On 10/30/23 03:53, Martin Maechler wrote:
>>>>>> Henrik Bengtsson
>>>>>>  on Sun, 29 Oct 2023 10:42:19 -0700 writes:
>  > Hello,
>
>  > the fix of PR18612
>  > (https://bugs.r-project.org/show_bug.cgi?id=18612) in
>  > r85380
>  > 
> (https://github.com/wch/r-source/commit/2653cc6203fce4c48874111c75bbccac3ac4e803)
>  > caused a change in `dim<-()`.  Specifically, in the past,
>  > any `dim<-()` assignment would _always_ remove "dimnames"
>  > and "names" attributes per help("dim"):
>
>
>  > The replacement method changes the "dim" attribute
>  > (provided the new value is compatible) and removes any
>  > "dimnames" and "names" attributes.
>
>  > In the new version, assigning the same "dim" as before
>  > will no longer remove "dimnames".  I'm reporting here to
>  > check whether this change was intended, or if it was an
>  > unintended side effect of the bug fix.
>
>  > For example, in R Under development (unstable) (2023-10-21
>  > r85379), we would get:
>
>  >> x <- array(1:2, dim=c(1,2), dimnames=list("A",
>  >> c("a","b"))) str(dimnames(x))
>  > List of 2 $ : chr "A" $ : chr [1:2] "a" "b"
>
>  >> dim(x) <- dim(x) ## Removes "dimnames" no matter what
>  >> str(dimnames(x))
>  >  NULL
>
>
>  > whereas in R Under development (unstable) (2023-10-21
>  > r85380) and beyond, we now get:
>
>  >> x <- array(1:2, dim=c(1,2), dimnames=list("A",
>  >> c("a","b"))) str(dimnames(x))
>  > List of 2 $ : chr "A" $ : chr [1:2] "a" "b"
>
>  >> dim(x) <- dim(x) ## No longer removes "dimnames"
>  >> str(dimnames(x))
>  > List of 2 $ : chr "A" $ : chr [1:2] "a" "b"
>
>  >> dim(x) <- rev(dim(x)) ## Still removes "dimnames"
>  >> str(dimnames(x))
>  >  NULL
>
>  > /Henrik
>
> Thank you, Henrik.
>
> This is "funny" (in an unusual sense):
> indeed, the change was *in*advertent, by me (svn rev 85380).
>
> I had experimentally {i.e., only in my own private version of R-devel!}
> modified the behavior of `dim<-` somewhat
> such that it does *not* unnecessarily drop dimnames,
> e.g., in your   `dim(x) <- dim(x)` case above,
> one could really argue that it's a "true loss" if x loses
> dimnames "unnecessarily" ...
>
> OTOH, I knew in the mean time that  `dim<-` has always been
> documented to drop dimnames in all cases,  and even more
> importantly, I got a strong recommendation to *not* go further
> with this idea -- not only for back compatibility reasons, but
> also for internal logical consistency.
>
> Most probably, we will just revert this inadvertent change,
> but before that ... since it has been out in the wild anyway,
> we could quickly consider if it *did* break code.
>
> I assume it did, or you would not have noticed ?
>
> Martin
>

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] as(, "dgTMatrix")' is deprecated.

2023-10-04 Thread Hervé Pagès
Hi Martin,

On 10/3/23 10:17, Martin Maechler wrote:
>>>>>> Duncan Murdoch
>>>>>>  on Tue, 3 Oct 2023 12:59:10 -0400 writes:
>  > On 03/10/2023 12:50 p.m., Koenker, Roger W wrote:
>  >> I’ve been getting this warning for a while now (about
>  >> five years if memory serves) and I’m finally tired of it,
>  >> but also too tired to track it down in Matrix.  As far as
>  >> I can grep I have no reference to either deprecated
>  >> object, only the apparently innocuous Matrix::Matrix(A,
>  >> sparse = TRUE).  Can someone advise, Martin perhaps?  I
>  >> thought it might come from Rmosek, but mosek folks don’t
>  >> think so.
>  >>https://groups.google.com/g/mosek/c/yEwXmMfHBbg/m/l_mkeM4vAAAJ
>
>  > A quick scan of that discussion didn't turn up anything
>  > relevant, e.g. a script to produce the warning.  Could you
>  > be more specific, or just post the script here?
>
>  > In general, a good way to locate the source of a warning
>  > is to set options(warn=2) to turn it into an error, and
>  > then trigger it.  The traceback from the error will
>  > include a bunch of junk from the code that catches the
>  > warning, but it will also include the context where it was
>  > triggered.
>
>  > Duncan Murdoch
>
> Indeed.
>
> But Roger is right that in the end it is (almost surely)
> from our {Matrix} package.
>
> Indeed for several years now, we have tried to make the setup
> leaner (and hence faster) by not explicitly defining coercion
> from  to   because  the size of
>  is here about 200, and we don't want to have to provide
> 200^2 = 40'000  coercion methods.

40,000 coercion methods sounds indeed crazy. But have you considered 
having 200 coercions from ANY to ?

For example the coercion from ANY to dgTMatrix would do as(as(as(from, 
"dMatrix"), "generalMatrix"), "TsparseMatrix").

Maybe the ANY->xyzMatrix methods could even be generated programmatically?
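
To illustrate what "generated programmatically" could look like, here is a rough sketch using toy classes rather than the real Matrix classes. The class names ToyA/ToyB and the helper make_coerce() are made up for this example; nothing here reflects how Matrix is actually organized. The idea is simply to loop over the target classes and register one coerce() method from ANY per target:

```r
library(methods)

# Hypothetical toy classes standing in for the concrete target classes.
targets <- c("ToyA", "ToyB")
for (cl in targets) setClass(cl, slots = c(x = "numeric"))

# Factory returning a coerce() method whose target class is fixed by closure.
make_coerce <- function(target) {
  force(target)
  function(from, to, strict = TRUE) new(target, x = as.numeric(from))
}

# One ANY -> target method per class, generated in a loop.
for (cl in targets)
  setMethod("coerce", signature(from = "ANY", to = cl), make_coerce(cl))

a <- as(1:3, "ToyA")
b <- as(c(2.5, 4), "ToyB")
```

In a real setting the body of each generated method would presumably be the chain of as() calls through the virtual classes, as in the dgTMatrix example above.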

Best,

H.

>
> Rather, Matrix package users should use high-level abstract Matrix
> classes such as "sparseMatrix" or "CsparseMatrix" or
> "TsparseMatrix" or "dMatrix", "symmetricMatrix".
>
> In the case of  as(, "dgTMatrix") , if you
> replace "dgTMatrix" by "TsparseMatrix"
> the result will be the same but also work in the future when the
> deprecation may have been turned into a defunctation ...
>
> Martin
>

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] Recent changes to as.complex(NA_real_)

2023-09-25 Thread Hervé Pagès


On 9/25/23 07:05, Martin Maechler wrote:
>>>>>> Hervé Pagès
>>>>>>  on Sat, 23 Sep 2023 16:52:21 -0700 writes:
>  > Hi Martin,
>  > On 9/23/23 06:43, Martin Maechler wrote:
>  >>>>>>> Hervé Pagès
>  >>>>>>> on Fri, 22 Sep 2023 16:55:05 -0700 writes:
>  >> > The problem is that you have things that are
>  >> > **semantically** different but look exactly the same:
>  >>
>  >> > They look the same:
>  >>
>  >> >> x
>  >> > [1] NA
>  >> >> y
>  >> > [1] NA
>  >> >> z
>  >> > [1] NA
>  >>
>  >> >> is.na(x)
>  >> > [1] TRUE
>  >> >> is.na(y)
>  >> > [1] TRUE
>  >> >> is.na(z)
>  >> > [1] TRUE
>  >>
>  >> >> str(x)
>  >> >   cplx NA
>  >> >> str(y)
>  >> >   num NA
>  >> >> str(z)
>  >> >   cplx NA
>  >>
>  >> > but they are semantically different e.g.
>  >>
>  >> >> Re(x)
>  >> > [1] NA
>  >> >> Re(y)
>  >> > [1] -0.5  # surprise!
>  >>
>  >> >> Im(x)  # surprise!
>  >> > [1] 2
>  >> >> Im(z)
>  >> > [1] NA
>  >>
>  >> > so any expression involving Re() or Im() will produce
>  >> > different results on input that look the same on the
>  >> > surface.
>  >>
>  >> > You can address this either by normalizing the internal
>  >> > representation of complex NA to always be complex(r=NaN,
>  >> > i=NA_real_), like for NA_complex_, or by allowing the
>  >> > infinite variations that are currently allowed and at the
>  >> > same time making sure that both Re() and Im()  always
>  >> > return NA_real_ on a complex NA.
>  >>
>  >> > My point is that the behavior of complex NA should be
>  >> > predictable. Right now it's not. Once it's predictable
>  >> > (with Re() and Im() both returning NA_real_ regardless of
>  >> > internal representation), then it no longer matters what
>  >> > kind of complex NA is returned by as.complex(NA_real_),
>  >> > because they are no longer distinguishable.
>  >>
>  >> > H.
>  >>
>  >> > On 9/22/23 13:43, Duncan Murdoch wrote:
>  >> >> Since the result of is.na(x) is the same on each of
>  >> >> those, I don't see a problem.  As long as that is
>  >> >> consistent, I don't see a problem. You shouldn't be using
>  >> >> any other test for NA-ness.  You should never be
>  >> >> expecting identical() to treat different types as the
>  >> >> same (e.g.  identical(NA, NA_real_) is FALSE, as it
>  >> >> should be).  If you are using a different test, that's
>  >> >> user error.
>  >> >>
>  >> >> Duncan Murdoch
>  >> >>
>  >> >> On 22/09/2023 2:41 p.m., Hervé Pagès wrote:
>  >> >>> We could also question the value of having an infinite
>  >> >>> number of NA representations in the complex space. For
>  >> >>> example all these complex values are displayed the same
>  >> >>> way (as NA), are considered NAs by is.na(), but are not
>  >> >>> identical or semantically equivalent (from an Re() or
>  >> >>> Im() point of view):
>  >> >>>
>  >> >>>       NA_real_ + 0i
>  >> >>>
>  >> >>>       complex(r=NA_real_, i=Inf)
>  >> >>>
>  >> >>>       complex(r=2, i=NA_real_)
>  >> >>>
>  >> >>>       complex(r=NaN, i=NA_real_)
>  >> >>>
>  >> >>> In other words, using a single representation for
>  >> >>> complex NA (i.e.  complex(r=NA_real_, i=NA_real_)) would
>  >> >>> avoid a lot of unnecessary complications and surprises.
>  >> >>>
>  >> >>> Once you do that, whether as.co

Re: [Rd] Recent changes to as.complex(NA_real_)

2023-09-23 Thread Hervé Pagès
Hi Martin,

On 9/23/23 06:43, Martin Maechler wrote:
>>>>>> Hervé Pagès
>>>>>>  on Fri, 22 Sep 2023 16:55:05 -0700 writes:
>  > The problem is that you have things that are
>  > **semantically** different but look exactly the same:
>
>  > They look the same:
>
>  >> x
>  > [1] NA
>  >> y
>  > [1] NA
>  >> z
>  > [1] NA
>
>  >> is.na(x)
>  > [1] TRUE
>  >> is.na(y)
>  > [1] TRUE
>  >> is.na(z)
>  > [1] TRUE
>
>  >> str(x)
>  >   cplx NA
>  >> str(y)
>  >   num NA
>  >> str(z)
>  >   cplx NA
>
>  > but they are semantically different e.g.
>
>  >> Re(x)
>  > [1] NA
>  >> Re(y)
>  > [1] -0.5  # surprise!
>
>  >> Im(x)  # surprise!
>  > [1] 2
>  >> Im(z)
>  > [1] NA
>
>  > so any expression involving Re() or Im() will produce
>  > different results on input that look the same on the
>  > surface.
>
>  > You can address this either by normalizing the internal
>  > representation of complex NA to always be complex(r=NaN,
>  > i=NA_real_), like for NA_complex_, or by allowing the
>  > infinite variations that are currently allowed and at the
>  > same time making sure that both Re() and Im()  always
>  > return NA_real_ on a complex NA.
>
>  > My point is that the behavior of complex NA should be
>  > predictable. Right now it's not. Once it's predictable
>  > (with Re() and Im() both returning NA_real_ regardless of
>  > internal representation), then it no longer matters what
>  > kind of complex NA is returned by as.complex(NA_real_),
>  > because they are no longer distinguishable.
>
>  > H.
>
>  > On 9/22/23 13:43, Duncan Murdoch wrote:
>  >> Since the result of is.na(x) is the same on each of
>  >> those, I don't see a problem.  As long as that is
>  >> consistent, I don't see a problem. You shouldn't be using
>  >> any other test for NA-ness.  You should never be
>  >> expecting identical() to treat different types as the
>  >> same (e.g.  identical(NA, NA_real_) is FALSE, as it
>  >> should be).  If you are using a different test, that's
>  >> user error.
>  >>
>  >> Duncan Murdoch
>  >>
>  >> On 22/09/2023 2:41 p.m., Hervé Pagès wrote:
>  >>> We could also question the value of having an infinite
>  >>> number of NA representations in the complex space. For
>  >>> example all these complex values are displayed the same
>  >>> way (as NA), are considered NAs by is.na(), but are not
>  >>> identical or semantically equivalent (from an Re() or
>  >>> Im() point of view):
>  >>>
>  >>>       NA_real_ + 0i
>  >>>
>  >>>       complex(r=NA_real_, i=Inf)
>  >>>
>  >>>       complex(r=2, i=NA_real_)
>  >>>
>  >>>       complex(r=NaN, i=NA_real_)
>  >>>
>  >>> In other words, using a single representation for
>  >>> complex NA (i.e.  complex(r=NA_real_, i=NA_real_)) would
>  >>> avoid a lot of unnecessary complications and surprises.
>  >>>
>  >>> Once you do that, whether as.complex(NA_real_) should
>  >>> return complex(r=NA_real_, i=0) or complex(r=NA_real_,
>  >>> i=NA_real_) becomes a moot point.
>  >>>
>  >>> Best,
>  >>>
>  >>> H.
>
> Thank you, Hervé.
> Your proposition is yet another one,
> to declare that all complex NA's should be treated as identical
> (almost/fully?) everywhere.
>
> This would be a possibility, but I think a drastic one.
>
> I think there are too many cases, where I want to keep the
> information of the real part independent of the values of the
> imaginary part (e.g. think of the Riemann hypothesis), and
> typically vice versa.
Use NaN for that, not NA.
>
> With your proposal, for a (potentially large) vector of complex numbers,
> after
>Re(z)  <-  1/2
>
> I could no longer rely on   Re(z) == 1/2,
> because it would be wrong for those z where (the imaginary part/ the number)
> was NA/NaN.

My proposal is to do this only if the Re and/or Im pa

Re: [Rd] Recent changes to as.complex(NA_real_)

2023-09-22 Thread Hervé Pagès
On 9/22/23 16:55, Hervé Pagès wrote:

> The problem is that you have things that are **semantically** 
> different but look exactly the same:
>
> They look the same:
>
> > x
> [1] NA
> > y
> [1] NA
> > z
> [1] NA
>
> > is.na(x)
> [1] TRUE
> > is.na(y)
> [1] TRUE
> > is.na(z)
> [1] TRUE
>
> > str(x)
>  cplx NA
> > str(y)
>  num NA
>
oops, that was supposed to be:

 > str(y)
  cplx NA

but somehow I managed to copy/paste the wrong thing, sorry.

H.

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] Recent changes to as.complex(NA_real_)

2023-09-22 Thread Hervé Pagès
The problem is that you have things that are **semantically** different 
but look exactly the same:

They look the same:

 > x
[1] NA
 > y
[1] NA
 > z
[1] NA

 > is.na(x)
[1] TRUE
 > is.na(y)
[1] TRUE
 > is.na(z)
[1] TRUE

 > str(x)
  cplx NA
 > str(y)
  num NA
 > str(z)
  cplx NA

but they are semantically different e.g.

 > Re(x)
[1] NA
 > Re(y)
[1] -0.5  # surprise!

 > Im(x)  # surprise!
[1] 2
 > Im(z)
[1] NA

so any expression involving Re() or Im() will produce different results 
on input that look the same on the surface.

You can address this either by normalizing the internal representation 
of complex NA to always be complex(r=NaN, i=NA_real_), like for 
NA_complex_, or by allowing the infinite variations that are currently 
allowed and at the same time making sure that both Re() and Im()  always 
return NA_real_ on a complex NA.

My point is that the behavior of complex NA should be predictable. Right 
now it's not. Once it's predictable (with Re() and Im() both returning 
NA_real_ regardless of internal representation), then it no longer 
matters what kind of complex NA is returned by as.complex(NA_real_), 
because they are no longer distinguishable.
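
A small sketch of the point, for anyone who wants to reproduce it. The definitions of x, y and z are not shown in this message, so the values below are assumptions chosen to match the output above, and normalize_complex_na() is a hypothetical helper (not part of base R) that illustrates the "single representation" option:

```r
# Assumed definitions, chosen to be consistent with the output shown above
x <- complex(real = NA_real_, imaginary = 2)
y <- complex(real = -0.5, imaginary = NA_real_)
z <- NA_complex_

# All three are NA as far as is.na() is concerned ...
sapply(list(x, y, z), is.na)

# ... yet their components differ
Re(y)
Im(x)

# Hypothetical helper: collapse every complex NA to NA_complex_
normalize_complex_na <- function(v) {
  v[is.na(v)] <- NA_complex_
  v
}
v <- normalize_complex_na(c(x, y, z, 1+2i))
```

After normalization, Re() and Im() of the first three elements are all NA, while the non-NA element is left untouched.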

H.

On 9/22/23 13:43, Duncan Murdoch wrote:
> Since the result of is.na(x) is the same on each of those, I don't see 
> a problem.  As long as that is consistent, I don't see a problem. You 
> shouldn't be using any other test for NA-ness.  You should never be 
> expecting identical() to treat different types as the same (e.g. 
> identical(NA, NA_real_) is FALSE, as it should be).  If you are using 
> a different test, that's user error.
>
> Duncan Murdoch
>
> On 22/09/2023 2:41 p.m., Hervé Pagès wrote:
>> We could also question the value of having an infinite number of NA
>> representations in the complex space. For example all these complex
>> values are displayed the same way (as NA), are considered NAs by
>> is.na(), but are not identical or semantically equivalent (from an Re()
>> or Im() point of view):
>>
>>       NA_real_ + 0i
>>
>>       complex(r=NA_real_, i=Inf)
>>
>>       complex(r=2, i=NA_real_)
>>
>>       complex(r=NaN, i=NA_real_)
>>
>> In other words, using a single representation for complex NA (i.e.
>> complex(r=NA_real_, i=NA_real_)) would avoid a lot of unnecessary
>> complications and surprises.
>>
>> Once you do that, whether as.complex(NA_real_) should return
>> complex(r=NA_real_, i=0) or complex(r=NA_real_, i=NA_real_) becomes a
>> moot point.
>>
>> Best,
>>
>> H.
>>
>> On 9/22/23 03:38, Martin Maechler wrote:
>>>>>>>> Mikael Jagan
>>>>>>>>   on Thu, 21 Sep 2023 00:47:39 -0400 writes:
>>>   > Revisiting this thread from April:
>>>
>>> >https://stat.ethz.ch/pipermail/r-devel/2023-April/082545.html
>>>
>>>   > where the decision (not yet backported) was made for
>>>   > as.complex(NA_real_) to give NA_complex_ instead of
>>>   > complex(r=NA_real_, i=0), to be consistent with
>>>   > help("as.complex") and as.complex(NA) and 
>>> as.complex(NA_integer_).
>>>
>>>   > Was any consideration given to the alternative?
>>>   > That is, to changing as.complex(NA) and 
>>> as.complex(NA_integer_) to
>>>   > give complex(r=NA_real_, i=0), consistent with
>>>   > as.complex(NA_real_), then amending help("as.complex")
>>>   > accordingly?
>>>
>>> Hmm, as, from R-core, mostly I was involved, I admit to say "no",
>>> to my knowledge the (above) alternative wasn't considered.
>>>
>>>     > The principle that
>>>     > Im(as.complex()) should be zero
>>>     > is quite fundamental, in my view, hence the "new" behaviour
>>>     > seems to really violate the principle of least surprise ...
>>>
>>> of course "least surprise"  is somewhat subjective.  Still,
>>> I clearly agree that the above would be one desirable property.
>>>
>>> I think that any solution will lead to *some* surprise for some
>>> cases, I think primarily because there are *many* different
>>> values z  for which  is.na(z)  is true,  and in any case
>>> NA_complex_  is only of the many.
>>>
>>> I also agree with Mikael that we should reconsider the issue
>>> that was raised by Davis Vaughan here ("on R-devel") last April.
>>>
>>>   > Another (but maybe weaker) argument is that
>>

Re: [Rd] Recent changes to as.complex(NA_real_)

2023-09-22 Thread Hervé Pagès
 indeed, but I think
> we should try to look at it only *secondary* to your first
> proposal.
>
>  > Whatever decision is made about as.complex(NA_real_),
>  > maybe these points should be weighed before it becomes part of
>  > R-release ...
>
>  > Mikael
>
> Indeed.
>
> Can we please get other opinions / ideas here?
>
> Thank you in advance for your thoughts!
> Martin
>
> ---
>
> PS:
>
>   Our *print()*ing  of complex NA's ("NA" here meaning NA or NaN)
>   is also unsatisfactory, e.g. in the case where all entries of a
>   vector are NA in the sense of is.na(.), but their
>   Re() and Im() are not all NA:
>   
>showC <- function(z) noquote(sprintf("(R = %g, I = %g)", Re(z), Im(z)))
>z <- complex(, c(11, NA, NA), c(NA, 99, NA))
>z
>showC(z)
>
> gives
>
>> z
>[1] NA NA NA
>> showC(z)
>[1] (R = 11, I = NA) (R = NA, I = 99) (R = NA, I = NA)
>
> but that (printing of complex) *is* another issue,
> in which we have the re-opened bugzilla PR#16752
>  ==>https://bugs.r-project.org/show_bug.cgi?id=16752
>
> on which we also worked during the R Sprint in Warwick three
> weeks ago, and where I want to commit changes in any case {but
> think we should change even a bit more than we got to during the
> Sprint}.
>

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] FYI: daily R source tarballs from ETH: *.xz instead of *.bz2)

2023-09-12 Thread Hervé Pagès
On 9/11/23 22:39, Prof Brian Ripley wrote:

> On 09/09/2023 01:56, Hervé Pagès wrote:
>> Hi Martin,
>>
>> Sounds good. Are there any plans to support the xz compression for
>> package source tarballs?
>
> What makes you think it is not supported?

I guess because I've never seen source tarballs distributed as .xz files 
but it's good to know that 'R CMD build' and 'R CMD INSTALL' support that.

So let me reformulate my question: does CRAN have any plans to switch from 
.tar.gz to .xz for the distribution of source tarballs? Is this 
something that tools like write_PACKAGES(), available.packages(), and 
install.packages() would be able to handle? Would they be able to handle 
a mix of .tar.gz and .xz packages? (Which would be important for a 
smooth transition from .tar.gz to .xz across CRAN/Bioconductor.)

I'm just trying to get a sense of whether the effort to reduce bandwidth 
will go beyond the distribution of R source snapshots.

Thanks,

H.

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] FYI: daily R source tarballs from ETH: *.xz instead of *.bz2)

2023-09-08 Thread Hervé Pagès
Hi Martin,

Sounds good. Are there any plans to support the xz compression for 
package source tarballs?

Thanks,

H.

On 9/8/23 06:44, Martin Maechler wrote:
> A quick notice for anyone who uses cron-like scripts to get
> R source tarballs from the ETH  R/daily/ s:
>
> I've finally switched to replace *.bz2 by *.xz which does save
> quite a bit of bandwidth.
>
> Currently, you can see the 2 day old *.bz2 (and their sizes) and
> compare with the new  *.xz one  (sorted newest first):
>
>https://stat.ethz.ch/R/daily/?C=M;O=D
>
>
> Best,
> Martin
>

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] codetools wrongly complains about lazy evaluation in S4 methods

2023-06-15 Thread Hervé Pagès
Oh, but I see now that you've already tried this in your R/AllGenerics.R 
(sorry for missing that), and that you worry about the following message 
being disruptive on CRAN:

     The following object is masked from 'package:base':

     qr.X

Why would that be? As long as you only define methods for objects that 
**you** control everything is fine. In other words you're not allowed to 
define a method for "qr" objects because that method would override 
base::qr.X(). But the generic itself and the method that you define for 
your objects don't override anything so should not disrupt anything.
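
As a rough illustration of the signature restriction discussed in this thread, here is a toy sketch. The generic describe() and its arguments are made up for the example (this is not the qr.X() code from Matrix). Because only x is in the signature, the dispatch mechanism never needs to evaluate the other argument:

```r
library(methods)

# Toy generic (hypothetical name); only 'x' takes part in dispatch.
setGeneric("describe",
           function(x, label = "none", ...) standardGeneric("describe"),
           signature = "x")

# The method ignores 'label', so nothing ever forces that argument.
setMethod("describe", "numeric",
          function(x, label = "none", ...) length(x))

# 'label' is not in the signature, so it stays an unevaluated promise
# and the stop() call is never run.
n <- describe(c(1.5, 2, 3), label = stop("never evaluated"))
```

Here lazy evaluation of label is preserved because dispatch only has to look at x.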

H.

On 6/15/23 13:51, Hervé Pagès wrote:
>
> I'd argue that at the root of the problem is that your qr.X() generic 
> dispatches on all its arguments, including the 'ncol' argument which I 
> think the dispatch mechanism needs to evaluate **before** dispatch can 
> actually happen.
>
> So yes lazy evaluation is a real feature but it does not play well for 
> arguments of a generic that are involved in the dispatch.
>
> If you explicitly defined your generic with:
>
>    setGeneric("qr.X", signature="qr")
>
> you should be fine.
>
> More generally speaking, it's a good idea to restrict the signature of 
> a generic to the arguments "that make sense". For unary operations 
> this is usually the 1st argument, for binary operations the first two 
> arguments etc... Additional arguments that control the operation like 
> modifiers, toggles, flags, rng seed, and other parameters, usually 
> have no place in the signature of the generic.
>
> H.
>
> On 6/14/23 20:57, Mikael Jagan wrote:
>> Thanks all - yes, I think that Simon's diagnosis ("user error") is 
>> correct:
>> in this situation one should define a reasonable generic function 
>> explicitly,
>> with a call to setGeneric, and not rely on the call inside of 
>> setMethod ...
>>
>> But it is still not clear what the way forward should be (for package 
>> Matrix,
>> where we would like to export a method for 'qr.X').  If we do 
>> nothing, then
>> there is the note, already mentioned:
>>
>>     * checking R code for possible problems ... NOTE
>>     qr.X: no visible binding for global variable ‘R’
>>     Undefined global functions or variables:
>>   R
>>
>> If we add the following to our R/AllGenerics.R :
>>
>>     setGeneric("qr.X",
>>    function(qr, complete = FALSE, ncol, ...)
>>    standardGeneric("qr.X"),
>>    useAsDefault = function(qr, complete = FALSE, ncol, 
>> ...) {
>>    if(missing(ncol))
>>    base::qr.X(qr, complete = complete)
>>    else base::qr.X(qr, complete = complete, ncol = ncol)
>>    },
>>    signature = "qr")
>>
>> then we get a startup message, which would be quite disruptive on CRAN :
>>
>>     The following object is masked from 'package:base':
>>
>>     qr.X
>>
>> and if we further add setGenericImplicit("qr.X", restore = (TRUE|FALSE))
>> to our R/zzz.R, then for either value of 'restore' we encounter :
>>
>>     ** testing if installed package can be loaded from temporary 
>> location
>>     Error: package or namespace load failed for 'Matrix':
>>  Function found when exporting methods from the namespace 
>> 'Matrix' which is not S4 generic: 'qr.X'
>>
>> Are there possibilities that I have missed?
>>
>> It seems to me that the best option might be to define an implicit 
>> generic
>> 'qr.X' in methods via '.initImplicitGenerics' in 
>> methods/R/makeBasicFunsList.R,
>> where I see that an implicit generic 'qr.R' is already defined ... ?
>>
>> The patch pasted below "solves everything", though we'd still have to 
>> think
>> about how to work for versions of R without the patch ...
>>
>> Mikael
>>
>> Index: src/library/methods/R/makeBasicFunsList.R
>> ===
>> --- src/library/methods/R/makeBasicFunsList.R    (revision 84541)
>> +++ src/library/methods/R/makeBasicFunsList.R    (working copy)
>> @@ -263,6 +263,17 @@
>>     signature = "qr", where = where)
>>  setGenericImplicit("qr.R", where, FALSE)
>>
>> +    setGeneric("qr.X",
>> +   function(qr, complete = FALSE, ncol

Re: [Rd] codetools wrongly complains about lazy evaluation in S4 methods

2023-06-15 Thread Hervé Pagès
all, it should only be part of the method implementation. If one was 
>> to implement the same default behavior in the generic itself (not 
>> necessarily a good idea) the default would be ncol = if (complete) 
>> nrow(qr.R(qr, TRUE)) else min(dim(qr.R(qr, TRUE))) to not rely on the 
>> internals of the implementation.
>>
>> Cheers,
>> Simon
>>
>>
>>> On 14/06/2023, at 6:03 AM, Kasper Daniel Hansen 
>>>  wrote:
>>>
>>> On Sat, Jun 3, 2023 at 11:51 AM Mikael Jagan  
>>> wrote:
>>>
>>>> The formals of the newly generic 'qr.X' are inherited from the 
>>>> non-generic
>>>> function in the base namespace.  Notably, the inherited default 
>>>> value of
>>>> formal argument 'ncol' relies on lazy evaluation:
>>>>
>>>>> formals(qr.X)[["ncol"]]
>>>>  if (complete) nrow(R) else min(dim(R))
>>>>
>>>> where 'R' must be defined in the body of any method that might 
>>>> evaluate
>>>> 'ncol'.
>>>>
>>>
>>> Perhaps I am misunderstanding something, but I think Mikael's 
>>> expectations
>>> about the scoping rules of R are wrong.  The enclosing environment 
>>> of ncol
>>> is where it was _defined_ not where it is _called_ (apologies if I am
>>> messing up the computer science terminology here).
>>>
>>> This suggests to me that codetools is right.  But a more extended 
>>> example
>>> would be useful. Perhaps there is something special with setOldClass()
>>> which I am not aware of.
>>>
>>> Also, Bioconductor has 100s of packages with S4 where codetools 
>>> works well.
>>>
>>> Kasper
>>>
>>>
>>
>

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



[Rd] issue with .local() hack used in S4 methods

2023-05-20 Thread Hervé Pagès

Hi,

Just ran across this:

    foo <- function(x, ..., z=22) z

    setMethod("foo", "character", function(x, y=-5, z=22) y)
    # Creating a generic function from function ‘foo’ in the global environment


Then:

    foo("a")
    # [1] 22

Should return -5, not 22.

That's because the call to .local() used internally by the foo() method 
does not name the arguments placed after the ellipsis:



selectMethod("foo", "character")

Method Definition:

function (x, ..., z = 22)
{
    .local <- function (x, y = -5, z = 22)
    y
    .local(x, ..., z)  <--- should be .local(x, ..., z=z)
}
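
The same pitfall can be reproduced with plain functions, without any S4 machinery. The names local_fun, wrapper_bad and wrapper_good below are made up for this sketch; they only mimic the shape of the generated method shown above:

```r
# Stand-in for the .local() function: has an extra argument 'y' before 'z'
local_fun <- function(x, y = -5, z = 22) y

# Mimics the generated wrapper: 'z' is passed positionally, so it lands in 'y'
wrapper_bad <- function(x, ..., z = 22) local_fun(x, ..., z)

# Naming the argument keeps 'z' where it belongs
wrapper_good <- function(x, ..., z = 22) local_fun(x, ..., z = z)

bad  <- wrapper_bad("a")
good <- wrapper_good("a")
```

With an empty `...`, wrapper_bad() ends up calling local_fun("a", 22), which matches 22 to y by position; wrapper_good() leaves y at its default.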

Thanks,

H.


sessionInfo()

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 23.04

Matrix products: default
BLAS:   /home/hpages/R/R-4.3.0/lib/libRblas.so
LAPACK: /home/hpages/R/R-4.3.0/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8    LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

loaded via a namespace (and not attached):
[1] compiler_4.3.0   codetools_0.2-19

--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] mapply(): Special case of USE.NAMES=TRUE with recent R-devel updates

2021-11-30 Thread Hervé Pagès

And also:

  > mapply(paste, c(a="A"), character(), USE.NAMES = TRUE)

  Error in names(answer) <- names1 :

'names' attribute [1] must be the same length as the vector [0]


When the shortest arguments get recycled to the length of the longest, 
shouldn't their names also get recycled?


  > mapply(paste, c(a="A", b="B"), letters[1:6], USE.NAMES=TRUE)

  a b

  "A a" "B b" "A c" "B d" "A e" "B f"

That's assuming that rep() accurately materializes recycling (I hope it 
does):


  > rep(c(a="A", b="B"), length.out=6)

a   b   a   b   a   b

  "A" "B" "A" "B" "A" "B"


  > rep(c(a="A", b="B"), length.out=0)

  named character(0)


I always wished that the process of recycling which happens everywhere 
all the time in R was implemented in its own dedicated function 
recycle(). But that's another story.


Anyways, back to mapply(): Once what happens to the names during 
recycling is clarified, there should be no need to be explicit about 
what should happen when the length "of the first ... argument" is zero 
because it will no longer be a special case.


Cheers,
H.


On 30/11/2021 22:10, Henrik Bengtsson wrote:

Hi,

in R-devel (4.2.0), we now get:


mapply(paste, "A", character(), USE.NAMES = TRUE)

named list()

Now, in ?mapply we have:

USE.NAMES: logical; use the names of the first ... argument, or if
that is an unnamed character vector, use that vector as the names.

This basically says we should get:


answer <- list()
first <- "A"
names(answer) <- first


which obviously is an error. The help is not explicit what should
happen when the length "of the first ... argument" is zero, but the
above behavior effectively does something like:


answer <- list()
first <- "A"
names(answer) <- first[seq_along(answer)]
answer

named list()

Is there a need for the docs to be updated, or should the result be an
unnamed empty list?

/Henrik




--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] How can a package be aware of whether it's on CRAN

2021-11-23 Thread Hervé Pagès

But why would you need to check for anything in the first place?

If you only use 2 cores in your examples, vignettes, and unit tests, 'R 
CMD check' will run fine everywhere and not eat all the CPU power of the 
machine where it's running.
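
A minimal sketch of that idea, assuming the package funnels every worker-count decision in its examples, vignettes, and tests through one small helper. The helper name capped_workers() and its 'cap' argument are hypothetical, not an existing API.

```r
# Hypothetical helper: never hand out more than `cap` workers, regardless of
# where the code is running.
capped_workers <- function(requested, cap = 2L) {
    requested <- suppressWarnings(as.integer(requested))
    # Fall back to a single worker on missing or nonsensical input.
    if (length(requested) != 1L || is.na(requested) || requested < 1L)
        return(1L)
    min(requested, as.integer(cap))
}

capped_workers(8L)   # capped at 2
capped_workers(1L)   # left alone
```

No detection of CRAN or of 'R CMD check' is involved, which is the point: the examples behave the same everywhere.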


H.

On 23/11/2021 12:05, Gábor Csárdi wrote:

On Tue, Nov 23, 2021 at 8:49 PM Henrik Bengtsson
 wrote:



Is there any reliable way to let packages know whether they are on CRAN, so
they can set omp cores to 2 by default?


Instead of testing for "on CRAN" or not, you can test for 'R CMD
check' running or not. 'R CMD check' sets environment variable
_R_CHECK_LIMIT_CORES_=TRUE. You can use that to limit your code to run
at most two (2) parallel threads or processes.


AFAICT this is only set with --as-cran and many CRAN machines don't
use that and I am fairly sure that some of them don't set this env var
manually, either.

Gabor

[...]




--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] Spurious warnings in coercion from double/complex/character to raw

2021-09-10 Thread Hervé Pagès




On 10/09/2021 12:53, brodie gaslam wrote:



On Friday, September 10, 2021, 03:13:54 PM EDT, Hervé Pagès 
 wrote:

Good catch, thanks!

Replacing

  if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
      tmp = 0;
      warn |= WARN_RAW;
  }
  pa[i] = (Rbyte) tmp;

with

  if(ISNAN(vi) || vi <= -1.0 || vi >= 256.0) {
      tmp = 0;
      warn |= WARN_RAW;
  } else {
      tmp = (int) vi;
  }
  pa[i] = (Rbyte) tmp;

should address that.

FWIW IntegerFromReal() has a similar risk of int overflow
(src/main/coerce.c, lines 128-138):


    int attribute_hidden
    IntegerFromReal(double x, int *warn)
    {
        if (ISNAN(x))
            return NA_INTEGER;
        else if (x >= INT_MAX+1. || x <= INT_MIN ) {
            *warn |= WARN_INT_NA;
            return NA_INTEGER;
        }
        return (int) x;
    }



The cast to int will also be an int overflow situation if x is > INT_MAX
and < INT_MAX+1 so the risk is small!


I might be being dense, but it feels this isn't a problem?  Quoting C99
6.3.1.4 again (emph added):


When a finite value of real floating type is converted to an integer
type other than _Bool, **the fractional part is discarded** (i.e., the
value is truncated toward zero). If the value of the integral part
cannot be represented by the integer type, the behavior is undefined.


Does the "fractional part is discarded" not save us here?


I think it does. Thanks for clarifying and sorry for the false positive!
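
This can be checked from the R prompt. The sketch below assumes as.integer() on a double goes through the IntegerFromReal() code quoted above, so a value strictly between INT_MAX and INT_MAX+1 is truncated to INT_MAX rather than overflowing:

```r
imax <- .Machine$integer.max       # INT_MAX on the R side
just_above <- imax + 0.5           # a double strictly between INT_MAX and INT_MAX+1

# The fractional part is discarded, so the result is representable.
as.integer(just_above) == imax

# INT_MAX+1 itself is rejected by the range test and becomes NA (with a warning).
too_big <- suppressWarnings(as.integer(imax + 1))
is.na(too_big)
```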

H.



Best,

B.




--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] Spurious warnings in coercion from double/complex/character to raw

2021-09-10 Thread Hervé Pagès




On 10/09/2021 09:12, Duncan Murdoch wrote:

On 10/09/2021 11:29 a.m., Hervé Pagès wrote:

Hi,

The first warning below is unexpected and confusing:

    > as.raw(c(3e9, 5.1))
    [1] 00 05
    Warning messages:
    1: NAs introduced by coercion to integer range
    2: out-of-range values treated as 0 in coercion to raw

The reason we get it is that coercion from numeric to raw is currently
implemented on top of coercion from numeric to int (file
src/main/coerce.c, lines 700-710):

    case REALSXP:
        for (i = 0; i < n; i++) {
    //      if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
            tmp = IntegerFromReal(REAL_ELT(v, i), &warn);
            if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
                tmp = 0;
                warn |= WARN_RAW;
            }
            pa[i] = (Rbyte) tmp;
        }
        break;

The first warning comes from the call to IntegerFromReal().

The following code avoids the spurious warning and is also simpler and
slightly faster:

    case REALSXP:
        for (i = 0; i < n; i++) {
    //      if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
            double vi = REAL_ELT(v, i);
            if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
                tmp = 0;
                warn |= WARN_RAW;
            }
            pa[i] = (Rbyte) tmp;
        }
        break;


Doesn't that give different results in case vi is so large that "(int) 
vi" overflows?  (I don't know what the C standard says, but some online 
references say that behaviour is implementation dependent.)


For example, if

   vi = 1.0 +  INT_MAX;

wouldn't "(int) vi" be equal to a small integer?


Good catch, thanks!

Replacing

    if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
        tmp = 0;
        warn |= WARN_RAW;
    }
    pa[i] = (Rbyte) tmp;

with

    if(ISNAN(vi) || vi <= -1.0 || vi >= 256.0) {
        tmp = 0;
        warn |= WARN_RAW;
    } else {
        tmp = (int) vi;
    }
    pa[i] = (Rbyte) tmp;

should address that.
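
For what it's worth, the same range test can be sketched at the R level to see which inputs it classifies as out of range. This is only an illustration of the logic, not the C code itself; the function name to_byte() is hypothetical.

```r
# R-level sketch of the range check: test the double first, truncate after.
to_byte <- function(x) {
    # Same condition as the C code: NaN/NA, <= -1 or >= 256 is out of range.
    out_of_range <- is.na(x) | x <= -1 | x >= 256
    # In-range values are truncated toward zero, like the (int) cast.
    vals <- ifelse(out_of_range, 0, trunc(x))
    list(bytes = as.raw(vals), out_of_range = out_of_range)
}

res <- to_byte(c(3e9, 5.1, -0.5, 255.9))
res$bytes          # the resulting raw vector
res$out_of_range   # only the first element is flagged
```

Note that -0.5 and 255.9 are in range because they truncate to 0 and 255, which is why the bounds are -1.0 and 256.0 rather than 0 and 255.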

FWIW IntegerFromReal() has a similar risk of int overflow 
(src/main/coerce.c, lines 128-138):


  int attribute_hidden
  IntegerFromReal(double x, int *warn)
  {
      if (ISNAN(x))
          return NA_INTEGER;
      else if (x >= INT_MAX+1. || x <= INT_MIN ) {
          *warn |= WARN_INT_NA;
          return NA_INTEGER;
      }
      return (int) x;
  }



The cast to int will also be an int overflow situation if x is > INT_MAX 
and < INT_MAX+1 so the risk is small! There are other instances of this 
situation in IntegerFromComplex() and IntegerFromString().


More below...



Duncan Murdoch




Coercion from complex to raw has the same problem:

    > as.raw(c(3e9+0i, 5.1))
    [1] 00 05
    Warning messages:
    1: NAs introduced by coercion to integer range
    2: out-of-range values treated as 0 in coercion to raw

Current implementation (file src/main/coerce.c, lines 711-721):

    case CPLXSXP:
        for (i = 0; i < n; i++) {
    //      if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
            tmp = IntegerFromComplex(COMPLEX_ELT(v, i), &warn);
            if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
                tmp = 0;
                warn |= WARN_RAW;
            }
            pa[i] = (Rbyte) tmp;
        }
        break;

This implementation has the following additional problem when the
supplied complex has a nonzero imaginary part:

    > as.raw(300+4i)
    [1] 00
    Warning messages:
    1: imaginary parts discarded in coercion
    2: out-of-range values treated as 0 in coercion to raw

    > as.raw(3e9+4i)
    [1] 00
    Warning messages:
    1: NAs introduced by coercion to integer range
    2: out-of-range values treated as 0 in coercion to raw

In one case we get a warning about the discarding of the imaginary part
but not the other case, which is unexpected. We should see the exact
same warning (or warnings) in both cases.

With the following fix we only get the warning about the discarding of
the imaginary part if we are not in a "out-of-range values treated as 0
in coercion to raw" situation:

    case CPLXSXP:
        for (i = 0; i < n; i++) {
    //      if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
            Rcomplex vi = COMPLEX_ELT(v, i);
            if(ISNAN(vi.r) || ISNAN(vi.i) || (tmp = (int) vi.r) < 0 ||
               tmp > 255) {
                tmp = 0;
                warn |= WARN_RAW;
            } else {
                if(vi.i != 0.0)
                    warn |= WARN_IMAG;
            }
            pa[i] = (Rbyte) tmp;
        }
        break;


Corrected version:

    if(ISNAN(vi.r) || ISNAN(vi.i) || vi.r <= -1.00 ||
       vi.r >= 256.00) {
        tmp = 0;
        warn |= WARN_RAW;
    } else {
        tmp = (int) vi.r;
        if(vi.i != 0.0)

 

Re: [Rd] Unneeded if statements in RealFromComplex C code

2021-09-10 Thread Hervé Pagès

Thanks Martin!

Best,
H.

On 10/09/2021 02:24, Martin Maechler wrote:

Hervé Pagès
 on Thu, 9 Sep 2021 17:54:06 -0700 writes:


 > Hi,

 > I just stumbled across these 2 lines in RealFromComplex (lines 208 & 209
 > in src/main/coerce.c):

 > double attribute_hidden
 > RealFromComplex(Rcomplex x, int *warn)
 > {
 >   if (ISNAN(x.r) || ISNAN(x.i))
 >   return NA_REAL;
 >   if (ISNAN(x.r)) return x.r;        <- line 208
 >   if (ISNAN(x.i)) return NA_REAL;    <- line 209
 >   if (x.i != 0)
 >  *warn |= WARN_IMAG;
 >   return x.r;
 > }

 > They were added in 2015 (revision 69410).

by me.  "Of course" the intent at the time was to  *replace* the
previous 2 lines and return NA/NaN of the "exact same kind"

but in the meantime, I have learned that trying to preserve
exact *kinds* of NaN / NA is typically not platform portable
anyway, because compiler/library optimizations and
implementations are pretty "free to do what they want" with these.

 > They don't serve any purpose and might slow things down a little (unless
 > compiler optimization is able to ignore them). In any case they should
 > probably be removed.

I've cleaned up now, indeed back compatibly, i.e., removing both
lines as you suggested.

Thank you, Hervé!

Martin


 > Cheers,
 > H.

 > --
 > Hervé Pagès

 > Bioconductor Core Team
 > hpages.on.git...@gmail.com



--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



[Rd] Spurious warnings in coercion from double/complex/character to raw

2021-09-10 Thread Hervé Pagès

Hi,

The first warning below is unexpected and confusing:

  > as.raw(c(3e9, 5.1))
  [1] 00 05
  Warning messages:
  1: NAs introduced by coercion to integer range
  2: out-of-range values treated as 0 in coercion to raw

The reason we get it is that coercion from numeric to raw is currently 
implemented on top of coercion from numeric to int (file 
src/main/coerce.c, lines 700-710):


    case REALSXP:
        for (i = 0; i < n; i++) {
    //      if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
            tmp = IntegerFromReal(REAL_ELT(v, i), &warn);
            if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
                tmp = 0;
                warn |= WARN_RAW;
            }
            pa[i] = (Rbyte) tmp;
        }
        break;

The first warning comes from the call to IntegerFromReal().
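
To see exactly which warnings a given coercion emits on a particular R build, the messages can be collected with a calling handler. A minimal sketch follows; the helper name collect_warnings() is hypothetical, and the set of messages returned depends on the R version in use.

```r
# Hypothetical helper: evaluate an expression, returning its value together
# with the text of every warning it raised.
collect_warnings <- function(expr) {
    msgs <- character(0)
    value <- withCallingHandlers(expr, warning = function(w) {
        msgs <<- c(msgs, conditionMessage(w))
        invokeRestart("muffleWarning")   # keep going after recording
    })
    list(value = value, warnings = msgs)
}

res <- collect_warnings(as.raw(c(3e9, 5.1)))
res$value      # the raw vector itself
res$warnings   # one entry per warning raised during coercion
```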

The following code avoids the spurious warning and is also simpler and 
slightly faster:


    case REALSXP:
        for (i = 0; i < n; i++) {
    //      if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
            double vi = REAL_ELT(v, i);
            if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
                tmp = 0;
                warn |= WARN_RAW;
            }
            pa[i] = (Rbyte) tmp;
        }
        break;

Coercion from complex to raw has the same problem:

  > as.raw(c(3e9+0i, 5.1))
  [1] 00 05
  Warning messages:
  1: NAs introduced by coercion to integer range
  2: out-of-range values treated as 0 in coercion to raw

Current implementation (file src/main/coerce.c, lines 711-721):

    case CPLXSXP:
        for (i = 0; i < n; i++) {
    //      if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
            tmp = IntegerFromComplex(COMPLEX_ELT(v, i), &warn);
            if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
                tmp = 0;
                warn |= WARN_RAW;
            }
            pa[i] = (Rbyte) tmp;
        }
        break;

This implementation has the following additional problem when the 
supplied complex has a nonzero imaginary part:


  > as.raw(300+4i)
  [1] 00
  Warning messages:
  1: imaginary parts discarded in coercion
  2: out-of-range values treated as 0 in coercion to raw

  > as.raw(3e9+4i)
  [1] 00
  Warning messages:
  1: NAs introduced by coercion to integer range
  2: out-of-range values treated as 0 in coercion to raw

In one case we get a warning about the discarding of the imaginary part 
but not the other case, which is unexpected. We should see the exact 
same warning (or warnings) in both cases.


With the following fix we only get the warning about the discarding of 
the imaginary part if we are not in a "out-of-range values treated as 0 
in coercion to raw" situation:


    case CPLXSXP:
        for (i = 0; i < n; i++) {
    //      if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
            Rcomplex vi = COMPLEX_ELT(v, i);
            if(ISNAN(vi.r) || ISNAN(vi.i) || (tmp = (int) vi.r) < 0 ||
               tmp > 255) {
                tmp = 0;
                warn |= WARN_RAW;
            } else {
                if(vi.i != 0.0)
                    warn |= WARN_IMAG;
            }
            pa[i] = (Rbyte) tmp;
        }
        break;
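
The intended warning logic can be sketched at the R level to check that 300+4i and 3e9+4i end up with the same set of warnings. This is an illustration only: the function name complex_to_byte() is hypothetical, and it tests the real part as a double before truncating, which is an assumption on my side rather than what the cast-based C code above does.

```r
# R-level sketch of the warning logic for complex -> raw.
complex_to_byte <- function(z) {
    re <- Re(z)
    im <- Im(z)
    # Out of range: NA/NaN in either part, or real part outside (-1, 256).
    out_of_range <- is.na(re) | is.na(im) | re <= -1 | re >= 256
    list(value     = ifelse(out_of_range, 0, trunc(re)),
         warn_raw  = any(out_of_range),
         # Imaginary-part warning only for values that were in range.
         warn_imag = any(!out_of_range & im != 0))
}

a <- complex_to_byte(300+4i)
b <- complex_to_byte(3e9+4i)
ok <- complex_to_byte(5.1+4i)
```

Here 'a' and 'b' carry the same flags (out of range, no imaginary-part warning), while 'ok' is in range and only flags the discarded imaginary part.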

Finally, coercion from character to raw has the same problem and its 
code can be fixed in a similar manner:


  > as.raw(c("3e9", 5.1))
  [1] 00 05
  Warning messages:
  1: NAs introduced by coercion to integer range
  2: out-of-range values treated as 0 in coercion to raw

Cheers,
H.


--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



[Rd] Unneeded if statements in RealFromComplex C code

2021-09-09 Thread Hervé Pagès

Hi,

I just stumbled across these 2 lines in RealFromComplex (lines 208 & 209 
in src/main/coerce.c):


  double attribute_hidden
  RealFromComplex(Rcomplex x, int *warn)
  {
      if (ISNAN(x.r) || ISNAN(x.i))
          return NA_REAL;
      if (ISNAN(x.r)) return x.r;        <- line 208
      if (ISNAN(x.i)) return NA_REAL;    <- line 209
      if (x.i != 0)
          *warn |= WARN_IMAG;
      return x.r;
  }


They were added in 2015 (revision 69410).

They don't serve any purpose and might slow things down a little (unless 
compiler optimization is able to ignore them). In any case they should 
probably be removed.
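
The redundancy can be spelled out as a small truth table. The sketch below only models the control flow of the function above (it does not call any C code), and the variable names are made up for the illustration:

```r
# All four combinations of "real part is NaN" / "imaginary part is NaN".
grid <- expand.grid(re_nan = c(FALSE, TRUE), im_nan = c(FALSE, TRUE))

# Execution gets past the first 'if' only when neither part is NaN.
past_first_if <- !(grid$re_nan | grid$im_nan)

# Lines 208 and 209 can only fire when their own condition holds as well.
fires_208 <- past_first_if & grid$re_nan
fires_209 <- past_first_if & grid$im_nan

any(fires_208 | fires_209)   # no combination reaches either return
```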


Cheers,
H.

--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] surprised matrix (1:256, 8, 8) doesn't cause error/warning

2021-02-22 Thread Hervé Pagès

Hi Martin,

It kind of does make sense to issue the warning when **recycling** (and 
this is consistent with what happens with recycling in general):


  > matrix(1:4, 6, 6)
       [,1] [,2] [,3] [,4] [,5] [,6]
  [1,]    1    3    1    3    1    3
  [2,]    2    4    2    4    2    4
  [3,]    3    1    3    1    3    1
  [4,]    4    2    4    2    4    2
  [5,]    1    3    1    3    1    3
  [6,]    2    4    2    4    2    4

  > matrix(1:4, 5, 6)
       [,1] [,2] [,3] [,4] [,5] [,6]
  [1,]    1    2    3    4    1    2
  [2,]    2    3    4    1    2    3
  [3,]    3    4    1    2    3    4
  [4,]    4    1    2    3    4    1
  [5,]    1    2    3    4    1    2
  Warning message:
  In matrix(1:4, 5, 6) :
    data length [4] is not a sub-multiple or multiple of the number of rows [5]


(Note that the warning is misleading. matrix() is happy to take data 
with a length that is not a sub-multiple of the number of rows or cols 
as long as it's a sub-multiple of the length of the matrix.)
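
In other words, what seems to matter in the two examples above is whether the data length divides the total number of cells, not the number of rows or columns. A small sketch of that arithmetic (the function name fills_evenly() is hypothetical, and this is my reading of the behavior shown above, not the actual test in the C code):

```r
# Does data of length 'data_len' recycle evenly into an nr x nc matrix?
fills_evenly <- function(data_len, nr, nc) (nr * nc) %% data_len == 0

fills_evenly(4, 6, 6)   # 36 cells: a multiple of 4
fills_evenly(4, 5, 6)   # 30 cells: not a multiple of 4

# Yet 4 is neither a sub-multiple nor a multiple of 6 rows or 6 columns.
6 %% 4 == 0
```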


However I'm not sure that **truncating** the data is desirable behavior:

  > matrix(1:6, 1, 3)
       [,1] [,2] [,3]
  [1,]    1    2    3

  > matrix(1:6, 1, 5)
       [,1] [,2] [,3] [,4] [,5]
  [1,]    1    2    3    4    5
  Warning message:
  In matrix(1:6, 1, 5) :
    data length [6] is not a sub-multiple or multiple of the number of columns [5]


Maybe you get a warning sometimes, if you are lucky, but still.

Finally note that you never get any warning with array():

  > array(1:4, c(5, 6))
       [,1] [,2] [,3] [,4] [,5] [,6]
  [1,]    1    2    3    4    1    2
  [2,]    2    3    4    1    2    3
  [3,]    3    4    1    2    3    4
  [4,]    4    1    2    3    4    1
  [5,]    1    2    3    4    1    2

  > array(1:6, c(1, 5))
       [,1] [,2] [,3] [,4] [,5]
  [1,]    1    2    3    4    5
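
For code that must never lose data silently, one option is a thin wrapper that refuses to truncate. A minimal sketch, where the name strict_matrix() is hypothetical and the choice to raise an error (rather than a warning) is just one possible policy:

```r
# Hypothetical wrapper: like matrix(), but truncation is an error.
strict_matrix <- function(data, nrow, ncol) {
    size <- nrow * ncol
    if (length(data) > size)
        stop("data length [", length(data), "] exceeds matrix size [",
             size, "]")
    matrix(data, nrow, ncol)
}

m <- strict_matrix(1:3, 1, 3)                          # fits exactly
bad <- try(strict_matrix(1:6, 1, 3), silent = TRUE)    # would drop 4:6
inherits(bad, "try-error")
```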

Cheers,
H.


On 2/1/21 1:08 AM, Martin Maechler wrote:

Abby Spurdle (/əˈbi/)
 on Mon, 1 Feb 2021 19:50:32 +1300 writes:


 > I'm a little surprised that the following doesn't trigger an error or a
 > warning.
 > matrix (1:256, 8, 8)

 > The help file says that the main argument is recycled, if it's too short.
 > But doesn't say what happens if it's too long.

It's somewhat subtler than one may assume :


matrix(1:9, 2,3)

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Warning message:
In matrix(1:9, 2, 3) :
  data length [9] is not a sub-multiple or multiple of the number of rows [2]


matrix(1:8, 2,3)

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Warning message:
In matrix(1:8, 2, 3) :
  data length [8] is not a sub-multiple or multiple of the number of columns [3]


matrix(1:12, 2,3)

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6




So it looks to me the current behavior is quite on purpose.
Are you sure it's not documented at all when reading the docs
carefully?  (I did *not*, just now).




--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-28 Thread Hervé Pagès

Excellent! Thanks Martin.

H.

On 5/28/20 00:39, Martin Maechler wrote:

Martin Maechler
 on Wed, 27 May 2020 13:35:44 +0200 writes:



Hervé Pagès
 on Tue, 26 May 2020 12:38:13 -0700 writes:


 >> Hi Martin, On 5/26/20 06:24, Martin Maechler wrote: ...
 >>>
 >>> What about remaining back-compatible, not only to R 3.y.z
 >>> with default recycle0=FALSE, but also to R 4.0.0 with
 >>> recycle0=TRUE

 >> What back-compatibility with R 4.0.0 are we talking about?
 >> The 'recycle0' arg was added **after** the R 4.0.0 release
 >> and has never been part of an official release yet.

 > Yes, of course.  It was *planned* for R 4.0.0 and finally was
 > too late (feature freeze etc)... I'm sorry I was wrong and
 > misleading above.

 >> This is the time to fix it.

 > Well, R 4.0.1 is already in 'beta' and does contain it too.
 > So the "fix" should happen really really fast, or we (R core)
 > take it out from there entirely.

Well, in the end your repeated good reasoning has prevailed:
I've committed a change (to R-devel; most probably in
time to be ported to 4.0.1 beta).
I think this implements the   recycle0 = TRUE   behavior you
have been advocating for,
in svn r78591  (2020-05-27 19:45:07 +0200)   with message

  paste(), paste0(): collapse= always gives a string
  (also w/ `recycle0=TRUE`)

Best regards,
Martin



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-26 Thread Hervé Pagès

Hi Martin,

On 5/26/20 06:24, Martin Maechler wrote:
...


What about remaining back-compatible, not only to R 3.y.z with
default recycle0=FALSE, but also to R 4.0.0 with recycle0=TRUE


What back-compatibility with R 4.0.0 are we talking about? The 
'recycle0' arg was added **after** the R 4.0.0 release and has never 
been part of an official release yet. This is the time to fix it.



*and* add a new option for the Suharto-Bill-Hervé-Gabe behavior,
e.g., recycle0="sep.only" or just  recycle0="sep" ?


OMG!



As (for back-compatibility reasons) you have to specify
'recycle0 = ..'  anyway, you would get what makes most sense to
you by using such a third option.

? (WDYT ?)


Don't bother. I'd rather use

  paste(paste(x, y, z, sep="#", recycle0=TRUE), collapse=",")

i.e. explicitly break down the 2 operations (sep and collapse). Might be 
slightly less efficient but I find it way more readable than


  paste(x, y, z, sep="#", collapse=",", recycle0="sep.only")

BTW I appreciate you trying to accommodate everybody's taste. That 
doesn't sound like an easy task ;-)


I'll just reiterate my earlier comment that controlling the collapse 
operation via an argument named 'recycle0' doesn't make sense (collapse 
involves NO recycling). So I don't know if the current 'recycle0=TRUE' 
behavior is "the correct one" but at the very least the name of the 
argument is a misnomer and misleading.


More generally speaking using the same argument to control 2 distinct 
operations is not good API design. A better design is to use 2 
arguments. Then the 2 arguments can generally be made orthogonal (like 
in this case) i.e. all possible combinations are valid (4 combinations 
in this case).
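
A minimal sketch of what orthogonal arguments could mean here, assuming an R version whose paste() has the 'recycle0' argument. The wrapper name paste_sep_collapse() is hypothetical; it simply chains the two operations so that 'recycle0' only ever touches the 'sep' step and 'collapse' only the second step.

```r
# Hypothetical wrapper: 'recycle0' controls the sep step only,
# 'collapse' controls the collapse step only.
paste_sep_collapse <- function(..., sep = " ", collapse = NULL,
                               recycle0 = FALSE) {
    out <- paste(..., sep = sep, recycle0 = recycle0)
    if (is.null(collapse)) out else paste(out, collapse = collapse)
}

# The 4 combinations on the same zero-length input:
r1 <- paste_sep_collapse(integer(0), "nth", sep = "")
r2 <- paste_sep_collapse(integer(0), "nth", sep = "", collapse = ",")
r3 <- paste_sep_collapse(integer(0), "nth", sep = "", recycle0 = TRUE)
r4 <- paste_sep_collapse(integer(0), "nth", sep = "", collapse = ",",
                         recycle0 = TRUE)
```

All four combinations are valid and each has an unsurprising result: "nth", "nth", character(0), and the empty string.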


Thanks,
H.




Martin

 > Switching to scheme (3) or to a new custom scheme
 > would be a completely different proposal.

 >>
 >> At least I'm consistent right?

 > Yes :-)

 > Anyway discussing recycling schemes is interesting but not directly
 > related with what the OP brought up (behavior of the 'collapse'
 > operation).

 > Cheers,
 > H.

 >>
 >> ~G



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-24 Thread Hervé Pagès

On 5/24/20 00:26, Gabriel Becker wrote:



On Sat, May 23, 2020 at 9:59 PM Hervé Pagès <hpa...@fredhutch.org> wrote:


On 5/23/20 17:45, Gabriel Becker wrote:
 > Maybe my intuition is just
 > different but when I collapse multiple character vectors together, I
 > expect all the characters from each of those vectors to be in the
 > resulting collapsed one.

Yes I'd expect that too. But the **collapse** operation in paste() has
never been about collapsing **multiple** character vectors together.
What it does is collapse the **single** character vector that comes out
of the 'sep' operation.


I understand what it does, I broke it down the same way in my post 
earlier in the thread. The fact remains that it is a single function 
which significantly muddies the waters. So you can say


paste0(x,y, collapse=",", recycle0=TRUE)

is not a collapse operation on multiple vectors, and of course there's a 
sense in which you're not wrong (again I understand what these functions 
do), but it sure looks like one in the invocation, doesn't it?


Honestly the thing that this whole discussion has shown me most clearly 
is that, imho, collapse (accepting ONLY one data vector) and 
paste (accepting multiple) should never have been a single function to 
begin with.  But that ship sailed long long ago.


Yes :-(



So

    paste(x, y, z, sep="", collapse=",")

is analogous to

    sum(x + y + z)


Honestly, I'd be significantly more comfortable if

1:10 + integer(0) + 5

were an error too.


This is actually the recycling scheme used by mapply():

  > mapply(function(x, y, z) c(x, y, z), 1:10, integer(0), 5)
  Error in mapply(FUN = FUN, ...) :
zero-length inputs cannot be mixed with those of non-zero length

AFAIK base R uses 3 different recycling schemes for n-ary operations:

(1) The recycling scheme used by arithmetic and comparison operations
(Arith, Compare, Logic group generics).

(2) The recycling scheme used by classic paste().

(3) The recycling scheme used by mapply().
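
The difference between the three schemes shows up as soon as a zero-length argument is mixed with a longer one. A short illustration using only base R:

```r
# (1) Arithmetic: the zero-length operand wins.
s1 <- 1:10 + integer(0)
length(s1)

# (2) Classic paste(): the zero-length argument is treated as "".
s2 <- paste(1:3, character(0))
s2

# (3) mapply(): mixing zero-length and non-zero-length inputs is an error.
s3 <- try(mapply(function(x, y) c(x, y), 1:3, integer(0)), silent = TRUE)
inherits(s3, "try-error")
```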

Having such a core mechanism like recycling being inconsistent across 
base R is sad. It makes it really hard to predict how a given n-ary 
function will recycle its arguments unless you spend some time trying it 
yourself with several combinations of vector lengths. It is of course 
the source of numerous latent bugs. I wish there was only one but that's 
just a dream.


None of these 3 recycling schemes is perfect. IMO (2) is by far the 
worst. (3) is too restrictive and would need to be refined if we wanted 
to make it a good universal recycling scheme.


Anyway I don't think it makes sense to introduce a 4th recycling scheme 
at this point even though it would be a nice item to put on the wish 
list for R 7.0.0 with the ultimate goal that it will be universally 
adopted in R 11.0.0 ;-)


So if we have to make do with what we have IMO (1) is the scheme that 
makes most sense although I agree that it can do some surprising things 
for some unusual combinations of vector lengths. It's the scheme I adhere 
to in my own binary operations e.g. in S4Vectors::pcompare().


The modest proposal of the 'recycle0' argument is only to let the user 
switch from recycling scheme (2) to (1) if they're not happy with scheme 
(2) (I'm one of them). Switching to scheme (3) or to a new custom scheme 
would be a completely different proposal.




At least I'm consistent right?


Yes :-)

Anyway discussing recycling schemes is interesting but not directly 
related with what the OP brought up (behavior of the 'collapse' operation).


Cheers,
H.



~G


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-23 Thread Hervé Pagès

On 5/23/20 17:45, Gabriel Becker wrote:
Maybe my intuition is just 
different but when I collapse multiple character vectors together, I 
expect all the characters from each of those vectors to be in the 
resulting collapsed one.


Yes I'd expect that too. But the **collapse** operation in paste() has 
never been about collapsing **multiple** character vectors together. 
What it does is collapse the **single** character vector that comes out 
of the 'sep' operation.


So

  paste(x, y, z, sep="", collapse=",")

is analogous to

  sum(x + y + z)

The element-wise addition is analogous to the 'sep' operation.
The sum() operation is analogous to the 'collapse' operation.
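
A small illustration of the analogy, with made-up input vectors: doing both steps in one paste() call gives the same string as doing the 'sep' step first and collapsing its result, just like sum() is applied to the result of the element-wise addition.

```r
x <- c("a", "b"); y <- c("1", "2"); z <- c("x", "y")

one_call  <- paste(x, y, z, sep = "", collapse = ",")
two_steps <- paste(paste(x, y, z, sep = ""), collapse = ",")
identical(one_call, two_steps)

# Numeric counterpart: element-wise step first, then the aggregation.
elementwise <- 1:2 + 3:4 + 5:6
sum(elementwise)
```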

H.



Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-23 Thread Hervé Pagès

On 5/22/20 18:12, brodie gaslam wrote:


FWIW what convinces me is consistency with other aggregating functions applied
to zero length inputs:

sum(numeric(0))
## [1] 0


Right.

And 1 is the identity element of multiplication:

> prod(numeric(0))
[1] 1

And the empty string is the identity element of string aggregation by 
concatenation.
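
The three cases side by side, plus the property that makes "" the natural answer: aggregating a vector in two chunks and combining the partial results must give the same answer as aggregating it in one go, even when one of the chunks is empty.

```r
sum(numeric(0))                      # identity of +
prod(numeric(0))                     # identity of *
paste(character(0), collapse = "")   # identity of concatenation

# Chunked aggregation stays consistent only because of these identities.
s <- c("ab", "cd")
whole  <- paste(s, collapse = "")
chunks <- paste0(paste(s, collapse = ""),
                 paste(character(0), collapse = ""))
identical(whole, chunks)
```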


H.



Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-22 Thread Hervé Pagès

Gabe,

It's the current behavior of paste() that is a major source of bugs:

  ## Add "rs" prefix to SNP ids and collapse them in a
  ## comma-separated string.
  collapse_snp_ids <- function(snp_ids)
      paste("rs", snp_ids, sep="", collapse=",")

  snp_groups <- list(
      group1=c(55, 22, 200),
      group2=integer(0),
      group3=c(99, 550)
  )

  vapply(snp_groups, collapse_snp_ids, character(1))
  #            group1            group2            group3
  # "rs55,rs22,rs200"              "rs"      "rs99,rs550"

This has hit me so many times!

Now with 'recycle0=TRUE', we finally have the opportunity to make it do 
the right thing. Let's not miss that opportunity.
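
For reference, a sketch of what "the right thing" looks like for the example above, written with the two operations done explicitly one after the other so that it does not depend on how a single paste() call combines 'collapse' with 'recycle0'. It assumes an R version whose paste() has the 'recycle0' argument; the function name collapse_snp_ids2() is made up for the illustration.

```r
## Same task as above, but an empty group yields "" instead of "rs".
collapse_snp_ids2 <- function(snp_ids)
    paste(paste("rs", snp_ids, sep = "", recycle0 = TRUE), collapse = ",")

snp_groups <- list(
    group1 = c(55, 22, 200),
    group2 = integer(0),
    group3 = c(99, 550)
)

fixed <- vapply(snp_groups, collapse_snp_ids2, character(1))
fixed[["group2"]]   # the empty group
```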


Cheers,
H.


On 5/22/20 11:26, Gabriel Becker wrote:
I understand that this is consistent but it also strikes me as an 
enormous 'gotcha' of a magnitude that 'we' are trying to avoid/smooth 
over at this point in user-facing R space.


For the record I'm not suggesting it should return something other than 
"", and in particular I'm not arguing that any call to paste /that does 
not return an error/ with non-NULL collapse should return a character 
vector of length one.


Rather I'm pointing out that it could (perhaps should, imo) simply be an 
error, which is also consistent, in the strict sense, with 
previous behavior in that it is the developer simply declining to extend 
the recycle0 argument to the full parameter space (there is no rule that 
says we must do so, arguments whose use is incompatible with other 
arguments can be reasonable and called for).


I don't feel super strongly that returning "" in this and similar 
cases is horrible and should never happen, but I'd bet dollars to donuts 
that to the extent that behavior occurs it will be a disproportionately 
major source of bugs, and I think that's at least worth considering in 
addition to pure consistency.


~G

On Fri, May 22, 2020 at 9:50 AM William Dunlap <wdun...@tibco.com> wrote:


I agree with Herve, processing collapse happens last so
collapse=non-NULL always leads to a single character string being
returned, the same as paste(collapse="").  See the altPaste function
I posted yesterday.

Bill Dunlap
TIBCO Software
wdunlap tibco.com



On Fri, May 22, 2020 at 9:12 AM Hervé Pagès <hpa...@fredhutch.org> wrote:

I think that

     paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", collapse = ",",
           recycle0=TRUE)

should just return an empty string and don't see why it needs to emit a
warning or raise an error. To me it does exactly what the user is asking
for, which is to change how the 3 arguments are recycled **before** the
'sep' operation.

The 'recycle0' argument has no business in the 'collapse' operation
(which comes after the 'sep' operation): this operation still behaves
like it always has.

That's all there is to it.

H.


On 5/22/20 03:00, Gabriel Becker wrote:
 > Hi Martin et al,
 >
 >
 >
 > On Thu, May 21, 2020 at 9:42 AM Martin Maechler
 > <maech...@stat.math.ethz.ch> wrote:
 >
 >      >>>>> Hervé Pagès
 >      >>>>>     on Fri, 15 May 2020 13:44:28 -0700 writes:
 >
 >          > There is still the situation where **both** 'sep' and 'collapse' are
 >          > specified:
 >
 >          >> paste(integer(0), "nth", sep="", collapse=",")
 >          > [1] "nth"
 >
 >          > In that case 'recycle0' should **not** be ignored i.e.
 >
 >          > paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE)
 >
 >          > should return the empty string (and not character(0) like it does at the
 >          > moment).
 >
 >          > In other words, 'recycle0' should only control the first operati

Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-22 Thread Hervé Pagès

I think that

   paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", collapse = ",", 
recycle0=TRUE)


should just return an empty string and don't see why it needs to emit a 
warning or raise an error. To me it does exactly what the user is asking 
for, which is to change how the 3 arguments are recycled **before** the 
'sep' operation.


The 'recycle0' argument has no business in the 'collapse' operation 
(which comes after the 'sep' operation): this operation still behaves 
like it always has.


That's all there is to it.

H.


On 5/22/20 03:00, Gabriel Becker wrote:

Hi Martin et al,



On Thu, May 21, 2020 at 9:42 AM Martin Maechler 
<maech...@stat.math.ethz.ch> wrote:


 >>>>> Hervé Pagès
 >>>>>     on Fri, 15 May 2020 13:44:28 -0700 writes:

     > There is still the situation where **both** 'sep' and 'collapse' are
     > specified:

     >> paste(integer(0), "nth", sep="", collapse=",")
     > [1] "nth"

     > In that case 'recycle0' should **not** be ignored i.e.

     > paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE)

     > should return the empty string (and not character(0) like it does at the
     > moment).

     > In other words, 'recycle0' should only control the first operation (the
     > operation controlled by 'sep'). Which makes plenty of sense: the 1st
     > operation is binary (or n-ary) while the collapse operation is unary.
     > There is no concept of recycling in the context of unary operations.

Interesting, ..., and sounding somewhat convincing.

     > On 5/15/20 11:25, Gabriel Becker wrote:
     >> Hi all,
     >>
     >> This makes sense to me, but I would think that recycle0 and collapse
     >> should actually be incompatible and paste should throw an error if
     >> recycle0 were TRUE and collapse were declared in the same call. I don't
     >> think the value of recycle0 should be silently ignored if it is actively
     >> specified.
     >>
     >> ~G

Just to summarize what I think we should know and agree (or be
"disproven") and where this comes from ...

1) recycle0 is a new R 4.0.0 option in paste() / paste0() which by default
    (recycle0 = FALSE) should (and *does* AFAIK) not change anything,
    hence  paste() / paste0() behave completely back-compatible
    if recycle0 is kept to FALSE.

2) recycle0 = TRUE is meant to give different behavior, notably
    0-length arguments (among '...') should result in 0-length results.

    The above does not specify what this means in detail, see 3)

3) The current R 4.0.0 implementation (for which I'm primarily responsible)
    and help(paste)  are in accordance.
    Notably the help page (Arguments -> 'recycle0' ; Details 1st para ; Examples)
    says and shows how the 4.0.0 implementation has been meant to work.

4) Several provenly smart members of the R community argue that
    both the implementation and the documentation of 'recycle0 =
    TRUE'  should be changed to be more logical / coherent / sensical ..

Is the above all correct in your view?

Assuming yes,  I read basically two proposals, both agreeing
that  recycle0 = TRUE  should only ever apply to the action of 'sep'
but not the action of 'collapse'.

1) Bill and Hervé (I think) propose that 'recycle0' should have
    no effect whenever  'collapse = '

2) Gabe proposes that 'collapse = ' and 'recycle0 = TRUE'
    should be declared incompatible and error. If going in that
    direction, I could also see them to give a warning (and
    continue as if recycle0 = FALSE).


Hervé makes a good point about when sep and collapse are both set. That 
said, if the user explicitly sets recycle0, I personally don't think it 
should be silently ignored under any configuration of other arguments.


If all of the arguments are to go into effect, the question then becomes 
one of ordering, I think.


Consider

paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", collapse = ",", 
recycle0=TRUE)


Currently that returns character(0), because the logic is 
essentially (in pseudo-code)


collapse(paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", 
recycle0=TRUE), collapse = ", ", recycle0=TRUE

Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-15 Thread Hervé Pagès
There is still the situation where **both** 'sep' and 'collapse' are 
specified:


  > paste(integer(0), "nth", sep="", collapse=",")
  [1] "nth"

In that case 'recycle0' should **not** be ignored i.e.

  paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE)

should return the empty string (and not character(0) like it does at the 
moment).


In other words, 'recycle0' should only control the first operation (the 
operation controlled by 'sep'). Which makes plenty of sense: the 1st 
operation is binary (or n-ary) while the collapse operation is unary. 
There is no concept of recycling in the context of unary operations.


H.

On 5/15/20 11:25, Gabriel Becker wrote:

Hi all,

This makes sense to me, but I would think that recycle0 and collapse 
should actually be incompatible and paste should throw an error if 
recycle0 were TRUE and collapse were declared in the same call. I don't 
think the value of recycle0 should be silently ignored if it is actively 
specified.


~G

On Fri, May 15, 2020 at 11:05 AM Hervé Pagès <hpa...@fredhutch.org> wrote:


Totally agree with that.

H.

On 5/15/20 10:34, William Dunlap via R-devel wrote:
 > I agree: paste(collapse="something", ...) should always return a single
 > character string, regardless of the value of recycle0.  This would be
 > similar to when there are no non-NULL arguments to paste; collapse="."
 > gives a single empty string and collapse=NULL gives a zero long character
 > vector.
 >> paste()
 > character(0)
 >> paste(collapse=", ")
 > [1] ""
 >
 > Bill Dunlap
 > TIBCO Software
 > wdunlap tibco.com

 >
 >
 > On Thu, Apr 30, 2020 at 9:56 PM suharto_anggono--- via R-devel <
 > r-devel@r-project.org> wrote:
 >
 >> Without 'collapse', 'paste' pastes (concatenates) its arguments
 >> elementwise (separated by 'sep', " " by default). New in R devel and R
 >> patched, specifying recycle0 = FALSE makes mixing zero-length and
 >> nonzero-length arguments result in length zero. The result of paste(n,
 >> "th", sep = "", recycle0 = FALSE) always has the same length as 'n'.
 >> Previously, the result was still as long as the longest argument, with the
 >> zero-length argument like "". If all of the arguments have length zero,
 >> 'recycle0' doesn't matter.
 >>
 >> As far as I understand, 'paste' with 'collapse' as a character string is
 >> supposed to put together elements of a vector into a single character
 >> string. I think 'recycle0' shouldn't change it.
 >>
 >> In current R devel and R patched, paste(character(0), collapse = "",
 >> recycle0 = FALSE) is character(0). I think it should be "", like
 >> paste(character(0), collapse="").
 >>
 >> paste(c("4", "5"), "th", sep = "", collapse = ", ", recycle0 = FALSE)
 >> is
 >> "4th, 5th".
 >> paste(c("4"     ), "th", sep = "", collapse = ", ", recycle0 = FALSE)
 >> is
 >> "4th".
 >> I think
 >> paste(c(        ), "th", sep = "", collapse = ", ", recycle0 = FALSE)
 >> should be
 >> "",
 >> not character(0).
 >>

Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-15 Thread Hervé Pagès

Totally agree with that.

H.

On 5/15/20 10:34, William Dunlap via R-devel wrote:

I agree: paste(collapse="something", ...) should always return a single
character string, regardless of the value of recycle0.  This would be
similar to when there are no non-NULL arguments to paste; collapse="."
gives a single empty string and collapse=NULL gives a zero long character
vector.

paste()

character(0)

paste(collapse=", ")

[1] ""
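
The invariant described here can be checked with calls that do not involve 'recycle0' at all. A small sketch: vapply() with a character(1) template stops with an error if any collapsed result is not a single string, so the call doubles as an assertion.

```r
# Inputs of length 0, 1 and 2; each must collapse to exactly one string.
inputs <- list(character(0), "4", c("4", "5"))
collapsed <- vapply(inputs, function(x) paste(x, collapse = ", "), character(1))
collapsed
# "" "4" "4, 5"

# The two no-argument cases quoted in the message above.
no_collapse   <- paste()                 # character(0)
with_collapse <- paste(collapse = ", ")  # ""
```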

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Apr 30, 2020 at 9:56 PM suharto_anggono--- via R-devel <
r-devel@r-project.org> wrote:


Without 'collapse', 'paste' pastes (concatenates) its arguments
elementwise (separated by 'sep', " " by default). New in R devel and R
patched, specifying recycle0 = FALSE makes mixing zero-length and
nonzero-length arguments result in length zero. The result of paste(n,
"th", sep = "", recycle0 = FALSE) always has the same length as 'n'.
Previously, the result was still as long as the longest argument, with the
zero-length argument like "". If all of the arguments have length zero,
'recycle0' doesn't matter.

As far as I understand, 'paste' with 'collapse' as a character string is
supposed to put together elements of a vector into a single character
string. I think 'recycle0' shouldn't change it.

In current R devel and R patched, paste(character(0), collapse = "",
recycle0 = FALSE) is character(0). I think it should be "", like
paste(character(0), collapse="").

paste(c("4", "5"), "th", sep = "", collapse = ", ", recycle0 = FALSE)
is
"4th, 5th".
paste(c("4" ), "th", sep = "", collapse = ", ", recycle0 = FALSE)
is
"4th".
I think
paste(c(), "th", sep = "", collapse = ", ", recycle0 = FALSE)
should be
"",
not character(0).
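
For code that also has to run on R versions without 'recycle0', the element-wise length guarantee can be obtained with an explicit guard. A minimal sketch, in which ordinal_labels() is a made-up name used only for illustration:

```r
# Hypothetical helper: result always has the same length as 'n'.
ordinal_labels <- function(n) {
  if (length(n) == 0L) return(character(0))
  paste(n, "th", sep = "")
}

two  <- ordinal_labels(c(4, 5))     # "4th" "5th"
none <- ordinal_labels(integer(0))  # character(0)

# Without the guard, the zero-length argument is treated like "":
unguarded <- paste(integer(0), "th", sep = "")  # "th"
```

The guard only covers the 'sep' step; whether a later 'collapse' should then give "" or character(0) is the question this thread is about.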




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] "cd" floating in the air in the man page for paste/paste0

2020-05-14 Thread Hervé Pagès

Thanks for the fix.

H.


On 5/12/20 23:29, Tomas Kalibera wrote:

Thanks, fixed.
Tomas

On 5/13/20 5:14 AM, Dirk Eddelbuettel wrote:

On 12 May 2020 at 19:59, Hervé Pagès wrote:
| While reading about the new 'recycle0' argument of paste/paste0, I
| spotted a mysterious "cd" floating in the air in the man page:
|
|    recycle0: ‘logical’ indicating if zero-length character arguments (and
|  all zero-length or no arguments when ‘collapse’ is not
|  ‘NULL’) should lead to the zero-length ‘character(0)’.
| cd
| ^^
|
| This is in R 4.0.0 Patched and R devel.

Also still in r-devel as of svn r78432:

   \item{recycle0}{\code{\link{logical}} indicating if zero-length
 character arguments (and all zero-length or no arguments when
 \code{collapse} is not \code{NULL}) should lead to the zero-length
 \code{\link{character}(0)}.}cd
     ^^

Dirk





--
Hervé Pagès



[Rd] "cd" floating in the air in the man page for paste/paste0

2020-05-12 Thread Hervé Pagès

Hi,

While reading about the new 'recycle0' argument of paste/paste0, I 
spotted a mysterious "cd" floating in the air in the man page:


  recycle0: ‘logical’ indicating if zero-length character arguments (and
all zero-length or no arguments when ‘collapse’ is not
‘NULL’) should lead to the zero-length ‘character(0)’.
   cd
   ^^

This is in R 4.0.0 Patched and R devel.

Cheers,
H.


--
Hervé Pagès



Re: [Rd] Rtools and R 4.0.0?

2020-04-28 Thread Hervé Pagès

Thanks Jeroen!


On Tue, Apr 7, 2020 at 6:07 PM Kevin Ushey  wrote:


Regardless, I would like to thank R core, CRAN, and Jeroen for all of
the time that has gone into creating and validating this new
toolchain. This is arduous work at an especially arduous time, so I'd
like to voice my appreciation for all the time and energy they have
spent on making this possible.


Absolutely. Thanks to R core, CRAN, Jeroen, and all the other people 
involved in creating the new Windows toolchain.


Cheers,
H.



Best,
Kevin

On Tue, Apr 7, 2020 at 7:47 AM Dirk Eddelbuettel  wrote:



There appears to have been some progress on this matter:

-Note that @command{g++} 4.9.x (as used for @R{} on Windows up to 3.6.x)
+Note that @command{g++} 4.9.x (as used on Windows prior to @R{} 4.0.0)

See SVN commit r78169 titled 'anticipate change in Windows toolchain', or the
mirrored git commit at
https://github.com/wch/r-source/commit/bd674e2b76b2384169424e3d899fbfb5ac174978

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org




--
Hervé Pagès



Re: [Rd] Hard memory limit of 16GB under Windows?

2020-04-07 Thread Hervé Pagès

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

  > ls()
character(0)
  > memory.limit()
[1] 32627
  > sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C   LC_TIME=French_France.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.6.3
  >




--
Hervé Pagès



Re: [Rd] object.size vs lobstr::obj_size

2020-03-27 Thread Hervé Pagès

On 3/27/20 15:19, Hadley Wickham wrote:



On Fri, Mar 27, 2020 at 4:01 PM Hervé Pagès <hpa...@fredhutch.org> wrote:




On 3/27/20 12:00, Hadley Wickham wrote:
 >
 >
 > On Fri, Mar 27, 2020 at 10:39 AM Hervé Pagès <hpa...@fredhutch.org> wrote:
 >
 >     Hi Tomas,
 >
 >     On 3/27/20 07:01, Tomas Kalibera wrote:
 >      > they provide an over-approximation
 >
 >     They can also provide an "under-approximation" (to say the least) e.g.
 >     on reference objects where the entire substance of the object is ignored
 >     which makes object.size() completely meaningless in that case:
 >
 >         setRefClass("A", fields=c(stuff="ANY"))
 >         object.size(new("A", stuff=raw(0)))      # 680 bytes
 >         object.size(new("A", stuff=runif(1e8)))  # 680 bytes
 >
 >     Why wouldn't object.size() look at the content of environments?
 >
 >
 > As the author, I'm obviously biased, but I do like lobstr::obj_sizes()
 > which allows you to see the additional size occupied by one object given
 > any number of other objects. This is particularly important for
 > reference classes since individual objects appear quite large:
 >
 > A <- setRefClass("A", fields=c(stuff="ANY"))
 > lobstr::obj_size(new("A", stuff=raw(0)))
 > #> 567,056 B
 >
 > But the vast majority is shared across all instances of that class:
 >
 > lobstr::obj_size(A)
 > #> 719,232 B
 > lobstr::obj_sizes(A, new("A", stuff=raw(0)))
 > #> * 719,232 B
 > #> *     720 B
 > lobstr::obj_sizes(A, new("A", stuff=runif(1e8)))
 > #> *     719,232 B
 > #> * 800,000,720 B

Nice. Can you clarify the situation with lobstr::obj_size vs
pryr::object_size? I've heard of the latter before and use it sometimes
but never heard of the former before seeing Stefan's post. Then I
checked the authors of both and thought maybe they should talk to each
other ;-)


pryr is basically retired :) TBH I don't know why I gave up on it, 
except lobstr is a cooler name 🤣 That's where all active development is 
happening. (The underlying code is substantially similar although 
lobstr includes bug fixes not present in pryr)


Good to know, thanks! Couldn't find any mention of pryr being abandoned 
and superseded by lobster (which definitely sounds more yummy) in pryr's 
README.md or DESCRIPTION file. Would be good to put this somewhere.


H.




Hadley
--
http://hadley.nz 


--
Hervé Pagès



Re: [Rd] object.size vs lobstr::obj_size

2020-03-27 Thread Hervé Pagès




On 3/27/20 12:00, Hadley Wickham wrote:



On Fri, Mar 27, 2020 at 10:39 AM Hervé Pagès <hpa...@fredhutch.org> wrote:


Hi Tomas,

On 3/27/20 07:01, Tomas Kalibera wrote:
 > they provide an over-approximation

They can also provide an "under-approximation" (to say the least) e.g.
on reference objects where the entire substance of the object is ignored
which makes object.size() completely meaningless in that case:

    setRefClass("A", fields=c(stuff="ANY"))
    object.size(new("A", stuff=raw(0)))      # 680 bytes
    object.size(new("A", stuff=runif(1e8)))  # 680 bytes

Why wouldn't object.size() look at the content of environments?


As the author, I'm obviously biased, but I do like lobstr::obj_sizes() 
which allows you to see the additional size occupied by one object given 
any number of other objects. This is particularly important for 
reference classes since individual objects appear quite large:


A <- setRefClass("A", fields=c(stuff="ANY"))
lobstr::obj_size(new("A", stuff=raw(0)))
#> 567,056 B

But the vast majority is shared across all instances of that class:

lobstr::obj_size(A)
#> 719,232 B
lobstr::obj_sizes(A, new("A", stuff=raw(0)))
#> * 719,232 B
#> *     720 B
lobstr::obj_sizes(A, new("A", stuff=runif(1e8)))
#> *     719,232 B
#> * 800,000,720 B


Nice. Can you clarify the situation with lobstr::obj_size vs 
pryr::object_size? I've heard of the latter before and use it sometimes 
but never heard of the former before seeing Stefan's post. Then I 
checked the authors of both and thought maybe they should talk to each 
other ;-)


Thanks,
H.



Hadley
--
http://hadley.nz 


--
Hervé Pagès



Re: [Rd] object.size vs lobstr::obj_size

2020-03-27 Thread Hervé Pagès

Hi Tomas,

On 3/27/20 07:01, Tomas Kalibera wrote:

they provide an over-approximation


They can also provide an "under-approximation" (to say the least) e.g. 
on reference objects where the entire substance of the object is ignored 
which makes object.size() completely meaningless in that case:


  setRefClass("A", fields=c(stuff="ANY"))
  object.size(new("A", stuff=raw(0)))  # 680 bytes
  object.size(new("A", stuff=runif(1e8)))  # 680 bytes

Why wouldn't object.size() look at the content of environments?
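
The same effect can be seen with plain environments, which is what reference class objects are built on. The sketch below uses only base R; env_content_size() is a hypothetical helper that adds up the sizes of the objects bound in an environment, which object.size() itself does not do.

```r
e_small <- new.env()
e_big   <- new.env()
assign("stuff", raw(0),       envir = e_small)
assign("stuff", numeric(1e6), envir = e_big)

# object.size() does not visit the contents, so both environments look alike.
same_size <- identical(as.numeric(object.size(e_small)),
                       as.numeric(object.size(e_big)))

# Hypothetical helper: sum the sizes of the objects bound in 'env'.
# It does not recurse into nested environments or account for sharing.
env_content_size <- function(env) {
  nms <- ls(env, all.names = TRUE)
  sizes <- vapply(nms,
                  function(nm) as.numeric(object.size(get(nm, envir = env))),
                  numeric(1))
  sum(sizes)
}

small_bytes <- env_content_size(e_small)
big_bytes   <- env_content_size(e_big)   # roughly 8 MB for 1e6 doubles
```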

Thanks,
H.

--
Hervé Pagès



Re: [Rd] configure --with-pcre1 fails with latest R 4.0 on Ubuntu 14.04

2020-03-22 Thread Hervé Pagès

Excellent. Thank you!   H.

On 3/20/20 23:55, Tomas Kalibera wrote:

On 3/18/20 6:11 PM, Hervé Pagès wrote:
Thanks Tomas. Any chance the old version of the error message could be 
restored? It would definitely be more helpful than the current one. 
It's confusing to get an error and be told to use --with-pcre1 when 
you're already using it.


The message now gives the required version and UTF-8 support 
requirement, so one does not have to look that one line up.

Thanks to Brian Ripley,

Tomas




H.

On 3/18/20 01:08, Tomas Kalibera wrote:

On 3/17/20 8:18 PM, Hervé Pagès wrote:
Using --with-pcre1 to configure the latest R 4.0 (revision 77988) on 
an Ubuntu 14.04.5 LTS system gives me the following error:


...
checking if lzma version >= 5.0.3... yes
checking for pcre2-config... no
checking for pcre_fullinfo in -lpcre... yes
checking pcre.h usability... yes
checking pcre.h presence... yes
checking for pcre.h... yes
checking pcre/pcre.h usability... no
checking pcre/pcre.h presence... no
checking for pcre/pcre.h... no
checking if PCRE1 version >= 8.32 and has UTF-8 support... no
checking whether PCRE support suffices... configure: error: pcre2 
library and headers are required, or use --with-pcre1


Maybe the real problem is that the PCRE version on this OS is 8.31?


Yes, R requires PCRE version at least 8.32 as documented in R-Admin, 
and this is since September 2019.



The error message is not particularly helpful.


An earlier version of the message gave the requirement explicitly, 
when people would have been more likely to have such old versions of 
PCRE1.
The few who still have them now also need to see the output line above 
to get the requirement and/or look into the manual.


R 4.0 is still keeping support for PCRE1 (>=8.32), but PCRE2 should 
be used whenever possible.


Best,
Tomas



Thanks,
H.









--
Hervé Pagès



Re: [Rd] configure --with-pcre1 fails with latest R 4.0 on Ubuntu 14.04

2020-03-18 Thread Hervé Pagès
Thanks Tomas. Any chance the old version of the error message could be 
restored? It would definitely be more helpful than the current one. It's 
confusing to get an error and be told to use --with-pcre1 when you're 
already using it.


H.

On 3/18/20 01:08, Tomas Kalibera wrote:

On 3/17/20 8:18 PM, Hervé Pagès wrote:
Using --with-pcre1 to configure the latest R 4.0 (revision 77988) on 
an Ubuntu 14.04.5 LTS system gives me the following error:


...
checking if lzma version >= 5.0.3... yes
checking for pcre2-config... no
checking for pcre_fullinfo in -lpcre... yes
checking pcre.h usability... yes
checking pcre.h presence... yes
checking for pcre.h... yes
checking pcre/pcre.h usability... no
checking pcre/pcre.h presence... no
checking for pcre/pcre.h... no
checking if PCRE1 version >= 8.32 and has UTF-8 support... no
checking whether PCRE support suffices... configure: error: pcre2 
library and headers are required, or use --with-pcre1


Maybe the real problem is that the PCRE version on this OS is 8.31?


Yes, R requires PCRE version at least 8.32 as documented in R-Admin, and 
this is since September 2019.



The error message is not particularly helpful.


An earlier version of the message gave the requirement explicitly, when 
people would have been more likely to have such old versions of PCRE1.
The few who still have them now also need to see the output line above to 
get the requirement and/or look into the manual.


R 4.0 is still keeping support for PCRE1 (>=8.32), but PCRE2 should be 
used whenever possible.
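
On a machine where some build of R already runs, the PCRE library it was linked against can be inspected from within R, which helps when comparing against the 8.32 requirement mentioned above. Both functions below are in base R; what they return depends entirely on the local build, so no output is shown.

```r
# Version string of the PCRE library this R was built against.
pcre_version <- extSoftVersion()[["PCRE"]]

# Named logical vector of PCRE features, including UTF-8 support.
pcre_features <- pcre_config()

print(pcre_version)
print(pcre_features)
```

This only describes the R that is currently running, not the headers and libraries that configure will find for a new build.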


Best,
Tomas



Thanks,
H.





--
Hervé Pagès



[Rd] configure --with-pcre1 fails with latest R 4.0 on Ubuntu 14.04

2020-03-17 Thread Hervé Pagès
Using --with-pcre1 to configure the latest R 4.0 (revision 77988) on an 
Ubuntu 14.04.5 LTS system gives me the following error:


...
checking if lzma version >= 5.0.3... yes
checking for pcre2-config... no
checking for pcre_fullinfo in -lpcre... yes
checking pcre.h usability... yes
checking pcre.h presence... yes
checking for pcre.h... yes
checking pcre/pcre.h usability... no
checking pcre/pcre.h presence... no
checking for pcre/pcre.h... no
checking if PCRE1 version >= 8.32 and has UTF-8 support... no
checking whether PCRE support suffices... configure: error: pcre2 
library and headers are required, or use --with-pcre1


Maybe the real problem is that the PCRE version on this OS is 8.31?

The error message is not particularly helpful.

Thanks,
H.

--
Hervé Pagès



Re: [Rd] rounding change

2020-03-11 Thread Hervé Pagès
Thanks for the heads up. The new result for round(51/80, digits=3) is 
also consistent with sprintf("%.3f", 51/80), format(51/80, digits=3), 
print(51/80, digits=3), and with the sprintf() function in C. Which is 
somehow satisfying.
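
A quick way to see why .637 is defensible is to look at the stored value rather than the decimal literal. The sketch below only relies on sprintf(); the round() call is included without an expected result because its output is exactly what changed between R versions.

```r
x <- 51 / 80

# Digits of the double actually stored; it sits slightly below 0.6375.
stored <- sprintf("%.20f", x)

# Formatting to 3 decimals rounds the stored value, not the literal 0.6375.
three <- sprintf("%.3f", x)

# Version dependent, see the message quoted below.
rounded <- round(x, digits = 3)
```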


H.

On 3/5/20 05:54, Therneau, Terry M., Ph.D. via R-devel wrote:

This is a small heads up for package maintainers.  Under the more recent
R-devel, R CMD check turned up some changes in the *.out files.  The simple
demonstration is to type "round(51/80, 3)", which gives .638 under the old
and .637 under the new.  (One of my coxph test cases has a concordance of
exactly 51/80).

In this particular case 51/80 is exactly .6375, but that value does not have
an exact representation in base 2.  The line below would argue that the new
version is correct, at least with respect to the internal representation.

  > print(51/80, digits = 20)
[1] 0.63749999999999995559

This is not a bug or problem, it just means that whichever version I put
into my survival/tests/book6.Rout.save file, one of R-devel or R-current
will flag an issue.






--
Hervé Pagès



Re: [Rd] unlink() on "~" removes the home directory

2020-02-26 Thread Hervé Pagès

On 2/26/20 14:47, Gábor Csárdi wrote:

!!! DON'T TRY THE CODE IN THIS EMAIL AT HOME !!!


Ok I'll try it at work on my boss's computer, sounds a lot safer.

H.



Well, unlink() does what it is supposed to do, so you could argue that
there is nothing wrong with it. Also, nobody would call unlink() on
"~", right?

The situation is not so simple, however. E.g. if you happen to have a
directory called "~", and you iterate over all files and directories
to selectively remove some of them, then your code might end up
calling unlink on the local "~" directory, and then your home is gone.

But you would not create a directory named "~", that is just asking
for trouble. Well, surely, _intentionally_ you would not do that.
Unintentionally, you might. E.g. something like this is enough:

# Create a subpath within a base directory
badfun <- function(base = ".", path) {
   dir.create(file.path(base, path), recursive = TRUE, showWarnings = FALSE)
}
badfun(path = "~/foo")

(If you did run this, be very careful how you remove the directory called "~"!)

A real example is `R CMD build` which deletes the home directory of
the current user if the root of the package contains a non-empty "~"
directory. Luckily this is now fixed in R-devel, so R 4.0.0 will do
better. (R 3.6.3 will not.) See
https://github.com/wch/r-source/commit/1d4f7aa1dac427ea2213d1f7cd7b5c16e896af22

I have seen several bug reports about various packages (that call R
CMD build) removing the home directory, so this indeed happens in
practice to a number of people. The commit above will fix `R CMD
build`, but it would be great to "fix" this in general.

It seems pretty hard to prevent users from creating a "~"
directory. But preventing unlink() from deleting "~" does not actually
seem too hard. If unlink() could just refuse removing "~" (when expand
= TRUE), that would be great. It seems to me that the current behavior
is very-very rarely intended, and its consequences are potentially
disastrous.

If unlink("~", recursive = TRUE) errors, you can still remove a local
"~" file/dir with unlink("./~", ...). And you can still remove your
home directory if you really want to do that, with
unlink(path.expand("~"), ...). So no functionality is lost.
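
A minimal sketch of the proposed guard, written as a wrapper rather than as a change to unlink() itself. The name safe_unlink() is hypothetical and not part of base R; the sketch assumes an R version where unlink() has the `expand` argument mentioned above, and it only treats a bare "~" (optionally with trailing slashes) as the dangerous case.

```r
# Hypothetical wrapper (not part of base R): refuse a bare "~" when
# tilde expansion is on, and otherwise defer to unlink().
safe_unlink <- function(x, recursive = FALSE, force = FALSE, expand = TRUE) {
  x <- as.character(x)
  # Treat "~", "~/" and "~\" alike by stripping trailing separators.
  bare_tilde <- sub("[/\\\\]+$", "", x) == "~"
  if (expand && any(bare_tilde)) {
    stop('refusing to unlink "~"; use "./~" for a local file or ',
         'path.expand("~") for the home directory', call. = FALSE)
  }
  unlink(x, recursive = recursive, force = force, expand = expand)
}

# Demonstration inside a scratch directory, never in the working directory.
scratch <- tempfile("unlink-demo-")
dir.create(file.path(scratch, "~", "foo"), recursive = TRUE)

# A bare "~" is rejected before unlink() is ever called.
res <- tryCatch(safe_unlink("~", recursive = TRUE),
                error = function(e) conditionMessage(e))

# The directory literally named "~" can still be removed through a path
# that does not start with a tilde.
safe_unlink(file.path(scratch, "~"), recursive = TRUE)
```

The guard runs before any file system call, so a caller that iterates over directory entries and hits an entry named "~" gets an error instead of losing the home directory.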

Also, if anyone is aware of packages/functions that tend to create "~"
directories or files, please let me know.

I would be happy to submit a patch for the new unlink("~") behavior.

Thanks,
Gabor




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Bug in printing array of type "list"

2018-09-26 Thread Hervé Pagès

Hi,

This array is of type "list" but print() reports otherwise:

  a1 <- array(list(1), 2:0)

  typeof(a1)
  # [1] "list"

  a1
  # <2 x 1 x 0 array of character>
  #  [,1]
  # [1,]
  # [2,]

No such problem with an array of type "logical":

  a2 <- array(NA, 2:0)

  typeof(a2)
  # [1] "logical"

  a2
  # <2 x 1 x 0 array of logical>
  #  [,1]
  # [1,]
  # [2,]

Thanks,
H.

--
Hervé Pagès



Re: [Rd] as.vector() broken on a matrix or array of type "list"

2018-09-26 Thread Hervé Pagès

Hi Martin,

On 09/26/2018 12:41 AM, Martin Maechler wrote:

Hervé Pagès
 on Tue, 25 Sep 2018 23:27:19 -0700 writes:


 > Hi, Unlike on an atomic matrix, as.vector() doesn't drop
 > the "dim" attribute of matrix or array of type "list":



m <- matrix(list(), nrow=2, ncol=3)
m
#  [,1] [,2] [,3]
# [1,] NULL NULL NULL
# [2,] NULL NULL NULL




as.vector(m)
#  [,1] [,2] [,3]
# [1,] NULL NULL NULL
# [2,] NULL NULL NULL


as documented and as always, including (probably all) versions of S and S-plus.


is.vector(as.vector(m))
# [1] FALSE


as bad as that looks, that's also "known" and has been the case
forever as well...

I agree that the semantics of as.vector(.)  are not what you
would expect, and probably neither what we would do when
creating R today. *)
The help page {the same for as.vector() and is.vector()}
mentions that as.vector() behavior more than once, notably at
the end of 'Details' and its 'Note's
... with one exception where you have a strong point, and the documentation
is incomplete at least -- under the heading

  Methods for 'as.vector()':

... follow the conventions of the default method.  In particular

...
...
...

• ‘is.vector(as.vector(x, m), m)’ should be true for any mode ‘m’,
   including the default ‘"any"’.

and you are right that this is not fulfilled in the case the
list has a 'dim' attribute.

But I don't think we "can" change as.vector(.) for that case
(where it is a no-op).
Rather  possibly is.vector(.) should not return FALSE but TRUE -- with
the reasoning (I think most experienced R programmers would
agree) that the foremost property of 'm' is to be
  - a list() {with a dim attribute and matrix-like indexing possibility}
rather than
  - a 'matrix' {where every matrix entry is a list()}.


Note that this change would break all the code around that uses
is.vector() to distinguish between an array (of mode "atomic" or
"list") and a non-array. Arguably is.array() should preferably be
used for that but I'm sure there is a lot of code around that uses
is.vector().

The root of the problem is that as.vector() doesn't drop attributes
that is.vector() sees as "vector breakers" i.e. as breaking the vector
nature of an object. So for example is.vector() considers the "dim"
attribute to be a vector breaker but as.vector() doesn't drop it.

So yes in order to bring is.vector() and as.vector() in agreement you
can either change one or the other, or both. My gut feeling though is
that it would be less disruptive to not change what is.vector() thinks
about the "dim" attribute and to make sure that as.vector() **always**
drops it (together with "dimnames" if present). How much code around
could there be that calls as.vector() on an array and expects the "dim"
attribute to be dropped **except** when the mode() of the array is
"list"? It is more likely that the code around that calls as.vector()
on an array doesn't expect such exception and so is broken. This was
actually the case for my code ;-)
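
Until the two functions agree, code that needs the "dim" attribute gone regardless of mode can drop it explicitly. A minimal sketch; drop_dims() is a hypothetical helper name, not an existing base R function, and it relies only on the documented behavior of `dim<-`.

```r
# Hypothetical helper: return x without its "dim"/"dimnames" attributes,
# whatever its mode. Objects without a "dim" attribute are returned as-is.
drop_dims <- function(x) {
  if (!is.null(dim(x))) {
    dim(x) <- NULL   # also removes "dimnames"
  }
  x
}

# A 2 x 3 matrix of type "list" with row names.
m <- matrix(as.list(1:6), nrow = 2, ncol = 3,
            dimnames = list(c("a", "b"), NULL))
v <- drop_dims(m)

is.list(v)        # still a list
is.null(dim(v))   # the "dim" attribute is gone
is.vector(v)      # no attribute left that is.vector() objects to
```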

Thanks,
H.



At the moment my gut feeling would propose to only update the
documentation, adding that one case as "an exception for historic reasons".

Martin

-
*) {Possibly such an R we would create today would be much closer to
 julia, where every function is generic / a multi-dispatch method
 "a la S4"  and still be blazingly fast, thanks to JIT
 compilation, method caching and more smart things.}
But as you know one of the strengths of (base) R is its stability
and reliability.  You can only use something as "the language
of applied statistics and data science" and rely that published
code still works 10 years later if the language is not
changed/redesigned from scratch every few years ((as some ... are)).





--
Hervé Pagès



[Rd] as.vector() broken on a matrix or array of type "list"

2018-09-25 Thread Hervé Pagès

Hi,

Unlike on an atomic matrix, as.vector() doesn't drop the "dim"
attribute of matrix or array of type "list":

  m <- matrix(list(), nrow=2, ncol=3)
  m
  #  [,1] [,2] [,3]
  # [1,] NULL NULL NULL
  # [2,] NULL NULL NULL

  as.vector(m)
  #  [,1] [,2] [,3]
  # [1,] NULL NULL NULL
  # [2,] NULL NULL NULL

  is.vector(as.vector(m))
  # [1] FALSE

Thanks,
H.

--
Hervé Pagès



Re: [Rd] Bias in R's random integers?

2018-09-20 Thread Hervé Pagès

Hi,

Note that it wouldn't be the first time that sample() changes behavior
in a non-backward compatible way:

  https://stat.ethz.ch/pipermail/r-devel/2012-October/065049.html

Cheers,
H.


On 09/20/2018 08:15 AM, Duncan Murdoch wrote:

On 20/09/2018 6:59 AM, Ralf Stubner wrote:

On 9/20/18 1:43 AM, Carl Boettiger wrote:
For a well-tested C algorithm, based on my reading of Lemire, the unbiased
"algorithm 3" in https://arxiv.org/abs/1805.10941 is already part of the C
standard library in OpenBSD and macOS (as arc4random_uniform), and in the
GNU standard library.  Lemire also provides C++ code in the appendix of his
piece for both this and the faster "nearly divisionless" algorithm.

It would be excellent if any R core members were interested in considering
bindings to these algorithms as a patch, or might express expectations for
how that patch would have to operate (e.g. re Duncan's comment about
non-integer arguments to sample size).  Otherwise, an R package binding
seems like a good starting point, but I'm not the right volunteer.

It is difficult to do this in a package, since R does not provide access
to the random bits generated by the RNG. Only a float in (0,1) is
available via unif_rand(). 


I believe it is safe to multiply the unif_rand() value by 2^32, and take 
the whole number part as an unsigned 32 bit integer.  Depending on the 
RNG in use, that will give at least 25 random bits.  (The low order bits 
are the questionable ones.  25 is just a guess, not a guarantee.)


However, if one is willing to use an external RNG, it is of course
possible. After reading about Lemire's work [1], I had planned to
integrate such an unbiased sampling scheme into the dqrng package,
which I have now started. [2]

Using Duncan's example, the results look much better:


library(dqrng)
m <- (2/5)*2^32
y <- dqsample(m, 1000000, replace = TRUE)
table(y %% 2)

     0      1
500252 499748


Another useful diagnostic is

   plot(density(y[y %% 2 == 0]))

Obviously that should give a more or less uniform density, but for 
values near m, the default sample() gives some nice pretty pictures of 
quite non-uniform densities.


By the way, there are actually quite a few examples of very large m 
besides m = (2/5)*2^32 where performance of sample() is noticeably bad. 
You'll see problems in y %% 2 for any integer a > 1 with m = 2/(1 + 2a) 
* 2^32, problems in y %% 3 for m = 3/(1 + 3a)*2^32 or m = 3/(2 + 
3a)*2^32, etc.


So perhaps I'm starting to be convinced that the default sample() should 
be fixed.
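
The source of the bias can be seen on a scaled-down model. The sketch below is only an illustration, not the code used by sample() or dqsample(): it pretends the generator yields 3 random bits instead of 32 and enumerates every possible draw, so no random numbers are involved.

```r
# Toy model: a generator that yields only 3 random bits, i.e. the 8
# equally likely values 0/8, 1/8, ..., 7/8.
u <- (0:7) / 8

# "Multiply and truncate" onto m = 3 outcomes: 8 inputs cannot be split
# evenly over 3 outputs, so one outcome is under-represented.
m <- 3
biased <- table(floor(u * m))

# Rejection sampling: use just enough bits to cover m (here 2 bits,
# values 0..3) and discard draws that fall outside 0..m-1.
draws <- 0:3
unbiased <- table(draws[draws < m])

biased     # outcomes 0 and 1 are hit 3 times each, outcome 2 only twice
unbiased   # each of 0, 1, 2 is hit exactly once
```

With 32 bits the imbalance per outcome is tiny for small m, but for m close to 2^32 each outcome receives only 2 or 3 of the possible inputs, which is where patterns such as the one in y %% 2 become visible.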


Duncan Murdoch




Currently I am taking the other interpretation of "truncated":


table(dqsample(2.5, 1000000, replace = TRUE))

     0      1
499894 500106

I will adjust this to whatever is decided for base R.


However, there is currently neither long vector nor weighted sampling
support. And the performance without replacement is quite bad compared
to R's algorithm with hashing.

cheerio
ralf

[1] via http://www.pcg-random.org/posts/bounded-rands.html

[2] https://github.com/daqana/dqrng/tree/feature/sample








--
Hervé Pagès



Re: [Rd] Argument 'dim' misspelled in error message

2018-09-04 Thread Hervé Pagès

Thanks!

On 09/01/2018 05:42 AM, Kurt Hornik wrote:

Hervé Pagès writes:


Thanks: fixed in the trunk with c75223.

Best
-k


Hi,
The following error message misspells the name of
the 'dim' argument:



array(integer(0), dim=integer(0))

Error in array(integer(0), dim = integer(0)) :
  'dims' cannot be of length 0



The name of the argument is 'dim' not 'dims':



args(array)

function (data = NA, dim = length(data), dimnames = NULL)
NULL



Cheers,
H.





--
Hervé Pagès



[Rd] Argument 'dim' misspelled in error message

2018-08-31 Thread Hervé Pagès

Hi,

The following error message misspells the name of
the 'dim' argument:

  > array(integer(0), dim=integer(0))
  Error in array(integer(0), dim = integer(0)) :
'dims' cannot be of length 0

The name of the argument is 'dim' not 'dims':

  > args(array)
  function (data = NA, dim = length(data), dimnames = NULL)
  NULL

Cheers,
H.

--
Hervé Pagès



Re: [Rd] Where does L come from?

2018-08-25 Thread Hervé Pagès

On 08/25/2018 04:33 PM, Duncan Murdoch wrote:

On 25/08/2018 4:49 PM, Hervé Pagès wrote:

The choice of the L suffix in R to mean "R integer type", which
is mapped to the "int" type at the C level, and NOT to the "long int"
type, is really unfortunate as it seems to be misleading and confusing
a lot of people.


I don't have stats about this so I take back the "lot".

Can you provide any evidence of that (e.g. a link to a message from one 
of these people)?  I think a lot of people don't really know about the L 
suffix, but that's different from being confused or misled by it.


And if you make a criticism like that, it would really be fair to 
suggest what R should have done instead.  I can't think of anything 
better, given that "i" was already taken, and that the lack of a decimal 
place had historically not been significant.  Using "I" *would* have 
been confusing (3i versus 3I being very different).  Deciding that 3 
suddenly became an integer value different from 3. would have led to 
lots of inefficient conversions (since stats mainly deals with floating 
point values).


Maybe 10N, or 10n? I'm not convinced that 10I would have been
confusing but the I can easily be mistaken for a 1.

H.



Duncan Murdoch




The fact that nowadays "int" and "long int" have the same size on most
platforms is only anecdotal here.

Just my 2 cents.

H.

On 08/25/2018 10:01 AM, Dirk Eddelbuettel wrote:


On 25 August 2018 at 09:28, Carl Boettiger wrote:
| I always thought it meant "Long" (I'm assuming R's integers are long
| integers in C sense (iirrc one can declare 'long x', and it being common to
| refer to integers as "longs"  in the same way we use "doubles" to mean
| double precision floating point).  But pure speculation on my part, so I'm
| curious!

It does per my copy (dated 1990 !!) of the 2nd ed of Kernighan & Ritchie.  It
explicitly mentions (sec 2.2) that 'int' may be 16 or 32 bits, and 'long' is
32 bit; and (in sec 2.3) introduces the I, U, and L labels for constants.  So
"back then when" 32 bit was indeed long.  And as R uses 32 bit integers ...

(It is all murky because the size is an implementation detail and later
"essentially everybody" moved to 32 bit integers and 64 bit longs as the 64
bit architectures became prevalent.  Which is why when it matters one should
really use more explicit types like int32_t or int64_t.)

Dirk







--
Hervé Pagès



Re: [Rd] Where does L come from?

2018-08-25 Thread Hervé Pagès




On 08/25/2018 02:23 PM, Dirk Eddelbuettel wrote:


On 25 August 2018 at 13:49, Hervé Pagès wrote:
| The choice of the L suffix in R to mean "R integer type", which
| is mapped to the "int" type at the C level, and NOT to the "long int"
| type, is really unfortunate as it seems to be misleading and confusing
| a lot of people.

The point I was trying to make in what you quote below is that the L may come
from a time when int and long int were in fact the same on most relevant
architectures. And it is hardly R's fault that C was allowed to change.

Also, it hardly matters given that R has precisely one integer type so I am
unsure where you see the confusion between long int and int.
  
| The fact that nowadays "int" and "long int" have the same size on most
| platforms is only anecdotal here.
|
| Just my 2 cents.

Are you sure?

   R> Rcpp::evalCpp("sizeof(long int)")
   [1] 8
   R> Rcpp::evalCpp("sizeof(int)")
   [1] 4
   R>


My bad, it's only the same on Windows. My point is that the discussion
about the size of int vs long int is only a distraction here. The
important bit is that 10L in R is represented by 10 in C, which is an
int, not by 10L, which is a long int. Could hardly be more confusing.
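
The R side of this can be checked without any C code. A small sketch using only base R: writeBin() with its default size writes a value at its natural width, so the number of bytes it produces shows how wide an R integer is.

```r
x <- 10L

# The L suffix yields R's (only) integer type.
typeof(x)

# Number of bytes used when the value is written at its natural size.
n_bytes <- length(writeBin(x, raw()))
n_bytes

# The largest representable R integer is 2^31 - 1, as for a 32-bit C int.
.Machine$integer.max == 2^31 - 1
```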

H.




Dirk

| H.
|
| On 08/25/2018 10:01 AM, Dirk Eddelbuettel wrote:
| >
| > On 25 August 2018 at 09:28, Carl Boettiger wrote:
| > | I always thought it meant "Long" (I'm assuming R's integers are long
| > | integers in C sense (iirrc one can declare 'long x', and it being common to
| > | refer to integers as "longs"  in the same way we use "doubles" to mean
| > | double precision floating point).  But pure speculation on my part, so I'm
| > | curious!
| >
| > It does per my copy (dated 1990 !!) of the 2nd ed of Kernighan & Ritchie.  It
| > explicitly mentions (sec 2.2) that 'int' may be 16 or 32 bits, and 'long' is
| > 32 bit; and (in sec 2.3) introduces the I, U, and L labels for constants.  So
| > "back then when" 32 bit was indeed long.  And as R uses 32 bit integers ...
| >
| > (It is all murky because the size is an implementation detail and later
| > "essentially everybody" moved to 32 bit integers and 64 bit longs as the 64
| > bit architectures became prevalent.  Which is why when it matters one should
| > really use more explicit types like int32_t or int64_t.)
| >
| > Dirk
| >
|



--
Hervé Pagès



Re: [Rd] Where does L come from?

2018-08-25 Thread Hervé Pagès

The choice of the L suffix in R to mean "R integer type", which
is mapped to the "int" type at the C level, and NOT to the "long int"
type, is really unfortunate as it seems to be misleading and confusing
a lot of people.

The fact that nowadays "int" and "long int" have the same size on most
platforms is only anecdotal here.

Just my 2 cents.

H.

On 08/25/2018 10:01 AM, Dirk Eddelbuettel wrote:


On 25 August 2018 at 09:28, Carl Boettiger wrote:
| I always thought it meant "Long" (I'm assuming R's integers are long
| integers in C sense (iirrc one can declare 'long x', and it being common to
| refer to integers as "longs"  in the same way we use "doubles" to mean
| double precision floating point).  But pure speculation on my part, so I'm
| curious!

It does per my copy (dated 1990 !!) of the 2nd ed of Kernighan & Ritchie.  It
explicitly mentions (sec 2.2) that 'int' may be 16 or 32 bits, and 'long' is
32 bit; and (in sec 2.3) introduces the I, U, and L labels for constants.  So
"back then when" 32 bit was indeed long.  And as R uses 32 bit integers ...

(It is all murky because the size is an implementation detail and later
"essentially everybody" moved to 32 bit integers and 64 bit longs as the 64
bit architectures became prevalent.  Which is why when it matters one should
really use more explicit types like int32_t or int64_t.)

Dirk



--
Hervé Pagès



Re: [Rd] longint

2018-08-16 Thread Hervé Pagès

On 08/16/2018 11:30 AM, Prof Brian Ripley wrote:

On 16/08/2018 18:33, Hervé Pagès wrote:

...


Only on Intel platforms int is 32 bits. Strictly speaking int is only
required to be >= 16 bits. Who knows what the size of an int is on
the Sunway TaihuLight for example ;-)


R's configure checks that int is 32 bit and will not compile without it 
(src/main/arithmetic.c) ... so int and int32_t are the same on all 
platforms where the latter is defined.


Good to know. Thanks for the clarification!

--
Hervé Pagès



Re: [Rd] longint

2018-08-16 Thread Hervé Pagès

On 08/16/2018 05:12 AM, Dirk Eddelbuettel wrote:


On 15 August 2018 at 20:32, Benjamin Tyner wrote:
| Thanks for the replies and for confirming my suspicion.
|
| Interestingly, src/include/S.h uses a trick:
|
|     #define longint int
|
| and so does the nlme package (within src/init.c).

As Bill Dunlap already told you, this is a) ancient and b) was concerned with
the int as 16 bit to 32 bit transition period. Ie a long time ago. Old C
programmers remember.

You should preferably not even use 'long int' on the other side but rely on
the fact that all compilers nowadays allow you to specify exactly what size is
used via int64_t (long), int32_t (int), ... and the unsigned cousins (which R
does not have).  So please receive the value as a int64_t and then cast it to
an int32_t -- which corresponds to R's notion of an integer on every platform.


Only on Intel platforms int is 32 bits. Strictly speaking int is only
required to be >= 16 bits. Who knows what the size of an int is on
the Sunway TaihuLight for example ;-)

H.



And please note that that conversion is lossy.  If you must keep 64 bits then
the bit64 package by Jens Oehlschlaegel is good and eg fully supported inside
data.table. We use it for 64-bit integers as nanosecond timestamps in our
nanotime package (which has some converters).
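
What "lossy" means on the R side can be shown in base R alone. A short sketch with no package calls: a value outside the 32-bit range cannot become an R integer at all, and a double only keeps whole numbers exact up to 2^53.

```r
# One past the largest R integer; stored as a double.
big <- 2^31

# Out of range for a 32-bit int: the result is NA (with a warning).
as_int <- suppressWarnings(as.integer(big))
is.na(as_int)

# Doubles represent every whole number exactly up to 2^53 ...
exact_below <- (2^53 - 1) + 1 == 2^53
# ... but not beyond: adding 1 is lost to rounding.
lost_above <- 2^53 + 1 == 2^53

exact_below
lost_above
```

So a 64-bit value received from C survives a round trip through a double only if its magnitude stays within 2^53, which is one reason a dedicated 64-bit class is suggested above.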

Dirk



--
Hervé Pagès



Re: [Rd] longint

2018-08-15 Thread Hervé Pagès
No segfault but a BIG warning from the compiler. That's because
dereferencing the pointer inside your myfunc() function will
produce an int that is not predictable i.e. it is system-dependent.
Its value will depend on sizeof(long int) (which is not
guaranteed to be 8) and on the endianness of the system.

Also if the pointer you pass in the call to the function is
an array of long ints, then pointer arithmetic inside your myfunc()
won't necessarily take you to the array element that you'd expect.

Note that there are very specific situations where you can actually
do this kind of thing e.g. in the context of writing a callback
function to pass to qsort(). See 'man 3 qsort' if you are on a Unix
system. In that case pointers to void and explicit casts should
be used. If done properly, this is portable code and the compiler won't
issue warnings.

H.


On 08/15/2018 07:05 AM, Brian Ripley wrote:




On 15 Aug 2018, at 12:48, Duncan Murdoch  wrote:


On 15/08/2018 7:08 AM, Benjamin Tyner wrote:
Hi
In my R package, imagine I have a C function defined:

    void myfunc(int *x) {
        // some code
    }

but when I call it, I pass it a pointer to a longint instead of a
pointer to an int. Could this practice potentially result in a segfault?


I don't think the passing would cause a segfault, but "some code" might be 
expecting a positive number, and due to the type error you could pass in a positive 
longint and have it interpreted as a negative int.


Are you thinking only of a little-endian system?  A 32-bit lookup of a pointer 
to a 64-bit area could read the wrong half and get a completely different value.
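
Both effects can be simulated from R by laying out the 8 bytes of a 64-bit value by hand and reading only the first 4, which is what a 32-bit read through the int pointer would do. This is only a simulation with readBin() on raw vectors, assuming an 8-byte long int; read32() is a helper defined here for illustration.

```r
# The 64-bit value 1, laid out byte by byte.
one_le <- as.raw(c(1, 0, 0, 0, 0, 0, 0, 0))  # little-endian layout
one_be <- as.raw(c(0, 0, 0, 0, 0, 0, 0, 1))  # big-endian layout

# What a 32-bit read at the same address would see: the first 4 bytes.
read32 <- function(bytes, endian) {
  readBin(bytes[1:4], what = "integer", n = 1L, size = 4L, endian = endian)
}
le_value <- read32(one_le, "little")  # low half: happens to give 1
be_value <- read32(one_be, "big")     # high half: gives 0

# The 64-bit value 4294967295 (2^32 - 1) in little-endian layout:
# positive as a 64-bit integer, negative once read as a 32-bit int.
big_le <- as.raw(c(0xff, 0xff, 0xff, 0xff, 0, 0, 0, 0))
neg_value <- read32(big_le, "little")

c(le_value, be_value, neg_value)
```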



Duncan Murdoch




--
Hervé Pagès



[Rd] MARGIN in base::unique.matrix() and base::unique.array()

2018-07-02 Thread Hervé Pagès

Hi,

The man page for base::unique.matrix() and base::unique.array() says
that MARGIN is expected to be a single integer. OTOH the code in charge
of checking the user supplied MARGIN is:

    if (length(MARGIN) > ndim || any(MARGIN > ndim))
        stop(gettextf("MARGIN = %d is invalid for dim = %d",
                      MARGIN, dx), domain = NA)

which doesn't really make sense.

As a consequence the user gets an obscure error message when specifying
a MARGIN that satisfies the above check but is in fact invalid:

  > unique(matrix(1:10, ncol=2), MARGIN=1:2)
  Error in args[[MARGIN]] <- !duplicated.default(temp, fromLast = fromLast,  :
    object of type 'symbol' is not subsettable

Also the code used by the above check to generate the error message
is broken:

  > unique(matrix(1:10, ncol=2), MARGIN=1:3)
  Error in sprintf(gettext(fmt, domain = domain), ...) :
    arguments cannot be recycled to the same length

  > unique(matrix(1:10, ncol=2), MARGIN=3)
  Error in unique.matrix(matrix(1:10, ncol = 2), MARGIN = 3) :
    c("MARGIN = 3 is invalid for dim = 5", "MARGIN = 3 is invalid for dim = 2")
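
For illustration, a check along the following lines would reject all three calls above with a single, readable message. This is only a sketch under the assumption, taken from the man page, that MARGIN must be a single integer; check_margin() is a hypothetical name and not the code used in base R.

```r
# Hypothetical validator: MARGIN must be a single whole number in 1..ndim.
check_margin <- function(MARGIN, ndim) {
  ok <- is.numeric(MARGIN) && length(MARGIN) == 1L && !is.na(MARGIN) &&
        MARGIN == trunc(MARGIN) && MARGIN >= 1 && MARGIN <= ndim
  if (!ok) {
    # deparse() keeps the message to one string even for vector input.
    stop(sprintf("MARGIN = %s is invalid for an array with %d dimension(s)",
                 paste(deparse(MARGIN), collapse = ""), as.integer(ndim)),
         call. = FALSE)
  }
  as.integer(MARGIN)
}

x <- matrix(1:10, ncol = 2)
ndim <- length(dim(x))

check_margin(2, ndim)   # valid: returned as an integer

# Each invalid value yields exactly one error message.
msgs <- vapply(list(1:2, 1:3, 3), function(m) {
  tryCatch({ check_margin(m, ndim); "no error" },
           error = function(e) conditionMessage(e))
}, character(1))
msgs
```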


Thanks,
H.

--
Hervé Pagès



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hervé Pagès




On 06/08/2018 02:15 PM, Hadley Wickham wrote:

On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles  wrote:




On Jun 8, 2018, at 1:49 PM, Hadley Wickham  wrote:

Hmmm, yes, there must be some special case in the C code to avoid
recycling a length-1 logical vector:



Here is a version that (I think) handles Herve's issue of arrays having one or 
more 0 dimensions.

subset_ROW <-
 function(x,i)
{
 dims <- dim(x)
 index_list <- which(dims[-1] != 0L) + 3
 mc <- quote(x[i])
 nd <- max(1L, length(dims))
 mc[ index_list ] <- list(TRUE)
 mc[[ nd + 3L ]] <- FALSE
 names( mc )[ nd+3L ] <- "drop"
 eval(mc)
}

Curiously enough the timing is *much* better for this implementation than for 
the first version I sent.

Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]' can be done
with `alist(a=)' in place of `list(TRUE)' in the earlier version but seems to
slow things down noticeably. It requires almost twice (!!) as much time as the
version above.


I think that's probably because alist() is a slow way to generate a
missing symbol:

bench::mark(
   alist(x = ),
   list(x = quote(expr = )),
   check = FALSE
)[1:5]
#> # A tibble: 2 x 5
#>   expression                    min     mean   median      max
#> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
#> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs

(note the units)
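
To make the connection with the row-subsetting problem concrete, here is a sketch that builds the call x[i, , ..., drop = FALSE] with quote(expr = ) as the missing subscript. The function name extract_rows() is made up for this example; it is not the S4Vectors implementation.

```r
# Hypothetical helper: subset along the first dimension only, leaving all
# other subscripts missing, for an array with any number of dimensions.
extract_rows <- function(x, i) {
  nd <- max(1L, length(dim(x)))
  args <- c(list(quote(x), quote(i)),
            rep(list(quote(expr = )), nd - 1L),  # one missing arg per extra dim
            list(drop = FALSE))
  # Assemble and evaluate x[i, , ..., drop = FALSE] in this function's frame.
  eval(as.call(c(list(as.name("[")), args)))
}

arr <- array(1:24, c(2, 3, 4))
res <- extract_rows(arr, 2L)
dim(res)

# Missing subscripts also work when another dimension has extent 0.
m0 <- matrix(raw(0), nrow = 5, ncol = 0)
dim(extract_rows(m0, 1:3))
```

Because the trailing subscripts are genuinely missing rather than TRUE, the zero-extent case needs no special handling.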


That's a good one. Need to change this in S4Vectors::default_extractROWS()
and other places. Thanks!

H.



Hadley




--
Hervé Pagès



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hervé Pagès

The C code for subsetting doesn't need to recycle a logical subscript.
It only needs to walk on it and start again at the beginning of the
vector when it reaches the end. Not exactly the same as detecting the
"take everything along that dimension" situation though.
x[TRUE, TRUE, TRUE] triggers the full subsetting machinery when x[]
and x[ , , ] could (and should) easily avoid it.

H.

On 06/08/2018 01:49 PM, Hadley Wickham wrote:

Hmmm, yes, there must be some special case in the C code to avoid
recycling a length-1 logical vector:

dims <- c(4, 4, 4, 1e5)

arr <- array(rnorm(prod(dims)), dims)
dim(arr)
#> [1]      4      4      4 100000
i <- c(1, 3)

bench::mark(
   arr[i, TRUE, TRUE, TRUE],
   arr[i, , , ]
)[c("expression", "min", "mean", "max")]
#> # A tibble: 2 x 4
#>   expression                    min     mean      max
#> 1 arr[i, TRUE, TRUE, TRUE]   41.8ms   43.6ms   46.5ms
#> 2 arr[i, , , ]               41.7ms   43.1ms   46.3ms


On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles  wrote:




On Jun 8, 2018, at 11:52 AM, Hadley Wickham  wrote:

On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles  wrote:




On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:

Also the TRUEs cause problems if some dimensions are 0:


matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]

Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
   (subscript) logical subscript too long


OK. But this is easy enough to handle.



H.

On 06/08/2018 10:29 AM, Hadley Wickham wrote:

I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP, support for
recycling)
Hadley



AFAICS, it is not an issue. Taking

arr <- array(rnorm(2^22),c(2^10,4,4,4))

as a test case

and using a function that will either use the literal code `x[i,,,,drop=FALSE]'
or `eval(mc)':

subset_ROW4 <-
 function(x, i, useLiteral=FALSE)
{
    literal <- quote(x[i,,,,drop=FALSE])
    mc <- quote(x[i])
    nd <- max(1L, length(dim(x)))
    mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
    mc[["drop"]] <- FALSE
    if (useLiteral)
        eval(literal)
    else
        eval(mc)
}

I get identical times with

system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))

and with

system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))


I think that's because you used a relatively low precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)
#> # A tibble: 2 x 1
#>   expression         min    mean   median      max  n_gc
#> 1 arr[i, TRUE,…    7.4µs  10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]    7.06µs   8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.



Funny. I get similar results to yours above albeit with smaller differences. 
Usually < 5 percent.

But with subset_ROW4 I see no consistent difference.

In this example, it runs faster on average using `eval(mc)' to return the 
result:


arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length=10,by=100)
bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]

# A tibble: 2 x 8
  expression                    min     mean   median      max `itr/sec` mem_alloc  n_gc
1 subset_ROW4(arr, i, FALSE)   28.9µs   34.9µs   32.1µs   1.36ms    28686.    5.05KB     5
2 subset_ROW4(arr, i, TRUE)    28.9µs     35µs   32.4µs 875.11µs    28572.    5.05KB     5




And on subsequent reps the lead switches back and forth.


Chuck







--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hervé Pagès

A missing subscript is still preferable to a TRUE though because it
carries the meaning "take it all". A TRUE also achieves this but via
implicit recycling. For example x[ , , ] and x[TRUE, TRUE, TRUE]
achieve the same thing (if length(x) != 0) and are both no-ops but
the subsetting code gets a chance to immediately and easily detect
the former as a no-op whereas it will probably not be able to do it
so easily for the latter. So in this case it will most likely generate
a copy of 'x' and fill the new array by taking a full walk on it.
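
To make the distinction concrete, here is a minimal sketch (the array and its dimensions are made up for illustration) checking that the two forms return the same values, so that any difference lies only in how much work the subsetting code has to do:

```r
# Illustrative 3-d array; the name and dimensions are assumptions.
x <- array(seq_len(24), dim = c(2, 3, 4))

missing_form <- x[ , , ]             # every subscript missing: "take it all"
true_form    <- x[TRUE, TRUE, TRUE]  # each TRUE is recycled along its dimension

# Both forms give back the same array as the original.
identical(missing_form, true_form)
identical(missing_form, x)
```

Whether the second form is actually slower depends on the C-level implementation, which this sketch does not measure.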

H.

On 06/08/2018 11:52 AM, Hadley Wickham wrote:

On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles  wrote:




On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:

Also the TRUEs cause problems if some dimensions are 0:

  > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
  Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
(subscript) logical subscript too long


OK. But this is easy enough to handle.



H.

On 06/08/2018 10:29 AM, Hadley Wickham wrote:

I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP, support for
recycling)
Hadley



AFAICS, it is not an issue. Taking

arr <- array(rnorm(2^22),c(2^10,4,4,4))

as a test case

and using a function that will either use the literal code `x[i,,,,drop=FALSE]'
or `eval(mc)':

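subset_ROW4 <-
  function(x, i, useLiteral=FALSE)
{
    ## literal call for a 4-d array, with missing subscripts
    literal <- quote(x[i,,,,drop=FALSE])
    ## same call built programmatically, with TRUE for dimensions 2:nd
    mc <- quote(x[i])
    nd <- max(1L, length(dim(x)))
    mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
    mc[["drop"]] <- FALSE
    if (useLiteral)
        eval(literal)
    else
        eval(mc)
}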

I get identical times with

system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))

and with

system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))


I think that's because you used a relatively low precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
   arr[i, TRUE, TRUE, TRUE],
   arr[i, , , ]
)
#> # A tibble: 2 x 1
#>   expression         min    mean   median      max  n_gc
#> 1 arr[i, TRUE,…    7.4µs  10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]    7.06µs   8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.
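
For readers without the bench package, a rough base-R sketch of the same comparison follows. The array size and the repetition count `n_rep` are assumptions chosen to keep the run short, and system.time() is coarse, so only the relative order of magnitude is meaningful:

```r
arr <- array(rnorm(2^16), c(2^10, 4, 4, 4))
i <- seq(1, length.out = 10, by = 100)   # index built outside the timed code

# The two forms must agree before their timings are worth comparing.
stopifnot(identical(arr[i, TRUE, TRUE, TRUE], arr[i, , , ]))

n_rep <- 2000L  # assumed repetition count
t_true    <- system.time(for (k in seq_len(n_rep)) arr[i, TRUE, TRUE, TRUE])
t_missing <- system.time(for (k in seq_len(n_rep)) arr[i, , , ])

# Elapsed seconds for each form.
rbind(t_true, t_missing)[, "elapsed"]
```

Timings will vary between machines and R versions, so no expected output is shown.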

Hadley






Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hervé Pagès

Also the TRUEs cause problems if some dimensions are 0:

  > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
  Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
(subscript) logical subscript too long
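
A small sketch of the contrast, assuming nothing beyond base R: the missing subscript works on a zero-extent dimension, while the outcome of the TRUE form is captured with tryCatch() rather than assumed, since it is the case reported above as failing.

```r
m <- matrix(raw(0), nrow = 5, ncol = 0)

# A missing column subscript is fine even though there are no columns.
ok <- m[1:3, , drop = FALSE]
dim(ok)

# Capture whatever happens with TRUE: either the subset or the error message.
res <- tryCatch(m[1:3, TRUE, drop = FALSE],
                error = function(e) conditionMessage(e))
res
```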

H.

On 06/08/2018 10:29 AM, Hadley Wickham wrote:

I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP, support for
recycling)
Hadley

On Fri, Jun 8, 2018 at 10:16 AM, Berry, Charles  wrote:




On Jun 8, 2018, at 8:45 AM, Hadley Wickham  wrote:

Hi all,

Is there a better way to subset the ROWs (in the sense of NROW) of
a vector, matrix, data frame or array than this?



You can use TRUE to fill the subscripts for dimensions 2:nd



subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else {
    dims <- rep(list(quote(expr = )), nd - 1L)
    do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
  }
}



subset_ROW <-
 function(x,i)
{
 mc <- quote(x[i])
 nd <- max(1L, length(dim(x)))
 mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L)
 mc[["drop"]] <- FALSE
 eval(mc)

}



subset_ROW(1:10, 4:6)
#> [1] 4 5 6

str(subset_ROW(array(1:10, c(10)), 2:4))
#>  int [1:3(1d)] 2 3 4
str(subset_ROW(array(1:10, c(10, 1)), 2:4))
#>  int [1:3, 1] 2 3 4
str(subset_ROW(array(1:10, c(5, 2)), 2:4))
#>  int [1:3, 1:2] 2 3 4 7 8 9
str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
#>  int [1:3, 1, 1] 2 3 4

subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
#>   x y
#> 2 2 9
#> 3 3 8
#> 4 4 7



HTH,

Chuck









Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hervé Pagès

On 06/08/2018 10:32 AM, Hervé Pagès wrote:

On 06/08/2018 10:15 AM, Michael Lawrence wrote:

There probably should be an abstraction for this. In S4Vectors, we
have extractROWS().


FWIW the code in S4Vectors that does what your subset_ROW() does is:


https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L466-L476


Wrong link sorry. Here is the correct one:


https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L453-L464

H.




(This is the default "extractROWS" method.)

Except for the normalization of 'i', it does the same as your
subset_ROW(). I don't know how to do this without generating a call
with missing arguments.

H.



Michael

On Fri, Jun 8, 2018 at 8:45 AM, Hadley Wickham wrote:

Hi all,

Is there a better way to subset the ROWs (in the sense of NROW) of
a vector, matrix, data frame or array than this?

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else {
    dims <- rep(list(quote(expr = )), nd - 1L)
    do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
  }
}

subset_ROW(1:10, 4:6)
#> [1] 4 5 6

str(subset_ROW(array(1:10, c(10)), 2:4))
#>  int [1:3(1d)] 2 3 4
str(subset_ROW(array(1:10, c(10, 1)), 2:4))
#>  int [1:3, 1] 2 3 4
str(subset_ROW(array(1:10, c(5, 2)), 2:4))
#>  int [1:3, 1:2] 2 3 4 7 8 9
str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
#>  int [1:3, 1, 1] 2 3 4

subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
#>   x y
#> 2 2 9
#> 3 3 8
#> 4 4 7

It seems like there should be a way to do this that doesn't require
generating a call with missing arguments, but I can't think of it.

Thanks!

Hadley

--
http://hadley.nz



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel














Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hervé Pagès

On 06/08/2018 10:15 AM, Michael Lawrence wrote:

There probably should be an abstraction for this. In S4Vectors, we
have extractROWS().


FWIW the code in S4Vectors that does what your subset_ROW() does is:


https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L466-L476

(This is the default "extractROWS" method.)

Except for the normalization of 'i', it does the same as your
subset_ROW(). I don't know how to do this without generating a call
with missing arguments.
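
One possible alternative, offered only as a sketch: replace each missing subscript with an explicit seq_len() over that dimension. The name subset_ROW_seq is hypothetical and this is not what S4Vectors does; it avoids missing arguments at the cost of materializing a full index for every trailing dimension.

```r
# Hypothetical helper: explicit indices instead of missing arguments.
subset_ROW_seq <- function(x, i) {
  d <- dim(x)
  if (length(d) <= 1L)
    return(x[i])
  # one full index vector per trailing dimension
  idx <- c(list(i), lapply(d[-1L], seq_len))
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

a <- array(1:10, c(5, 2))
str(subset_ROW_seq(a, 2:4))

df <- data.frame(x = 1:10, y = 10:1)
subset_ROW_seq(df, 2:4)
```

Because the indices are spelled out, the subsetting code cannot recognize the "take everything along that dimension" case, so the missing-argument version is likely to remain preferable where performance matters.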

H.



Michael

On Fri, Jun 8, 2018 at 8:45 AM, Hadley Wickham  wrote:

Hi all,

Is there a better way to subset the ROWs (in the sense of NROW) of
a vector, matrix, data frame or array than this?

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else {
    dims <- rep(list(quote(expr = )), nd - 1L)
    do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
  }
}

subset_ROW(1:10, 4:6)
#> [1] 4 5 6

str(subset_ROW(array(1:10, c(10)), 2:4))
#>  int [1:3(1d)] 2 3 4
str(subset_ROW(array(1:10, c(10, 1)), 2:4))
#>  int [1:3, 1] 2 3 4
str(subset_ROW(array(1:10, c(5, 2)), 2:4))
#>  int [1:3, 1:2] 2 3 4 7 8 9
str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
#>  int [1:3, 1, 1] 2 3 4

subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
#>   x y
#> 2 2 9
#> 3 3 8
#> 4 4 7

It seems like there should be a way to do this that doesn't require
generating a call with missing arguments, but I can't think of it.

Thanks!

Hadley

--
http://hadley.nz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel








Re: [Rd] Dispatch mechanism seems to alter object before calling method on it

2018-05-16 Thread Hervé Pagès

On 05/16/2018 01:24 PM, Michael Lawrence wrote:

On Wed, May 16, 2018 at 12:23 PM, Hervé Pagès  wrote:

On 05/16/2018 10:22 AM, Michael Lawrence wrote:


Factors and data.frames are not structures, because they must have a
class attribute. Just call them "objects". They are higher level than
structures, which in practice just shape data without adding a lot of
semantics. Compare getClass("matrix") and getClass("factor").

I agree that inheritance through explicit coercion is confusing. As
far as I know, there are only 2 places where it is used:
1) Objects with attributes but no class, basically "structure" and its
subclasses "array" <- "matrix"
2) Classes that extend a reference type ("environment", "name" and
"externalptr") via hidden delegation (@.xData)

I'm not sure if anyone should be doing #2. For #1, a simple "fix"
would be just to drop inheritance of "structure" from "vector". I
think the intent was to mimic base R behavior, where it will happily
strip (or at least ignore) attributes when passing an array or matrix
to an internal function that expects a vector.

A related problem, which explains why factor and data.frame inherit
from "vector" even though they are objects, is that any S4 object
derived from those needs to be (for pragmatic compatibility reasons)
an integer vector or list, respectively, internally (the virtual
@.Data slot). Separating that from inheritance would probably be
difficult.

Yes, we can consider these to be problems, to some extent stemming
from the behavior and design of R itself, but I'm not sure it's worth
doing anything about them at this point.



Thanks for the informative discussion. It still doesn't explain
why 'm' gets its attributes stripped and 'x' does not though:

   m <- matrix(1:12, ncol=3)
   x <- structure(1:3, titi="A")

   setGeneric("foo", function(x) standardGeneric("foo"))
   setMethod("foo", "vector", identity)

   foo(m)
   # [1]  1  2  3  4  5  6  7  8  9 10 11 12

   foo(x)
   # [1] 1 2 3
   # attr(,"titi")
   # [1] "A"

If I understand correctly, both are "structures", not "objects".



The structure 'x' has no class, so nothing special is going to happen.
As you know, S4 has a well-defined class hierarchy. Just look at
getClass("structure") to see its subclasses. There was at some point
an attempt to create a sort of dynamic inheritance, where a 'test'
function would be called and could figure this out. However, that was
never implemented. For one thing, it would be even more confusing.


Why aren't these problems worth fixing? More generally speaking
the erratic behavior of the S4 system with respect to S3 objects
has been a plague since the beginning of the methods package.
And many people have complained about this on many occasions in
one way or another. For the record, here are some of the most
notorious problems:

   class(as.numeric(1:4))
   # [1] "numeric"
   class(as(1:4, "numeric"))
   # [1] "integer"



This is not really a problem with the methods package. is.numeric(1L)
is TRUE, thus integer extends numeric, so coercing an integer to
numeric is a no-op.


Only as(1:4, "numeric", strict=FALSE) should be a no-op.
as(1:4, "numeric") should still coerce because as() is supposed
to perform strict coercion by default.


as.numeric() should really be called as.double()
or something. But that's not going to change, of course.


as.numeric() is doing the right thing (i.e. strict coercion) so there
is no need to touch it.
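
The disagreement can be checked directly with a short sketch. Only the as.numeric() result is certain here; the result of as(x, "numeric") is the point under discussion (the thread reports "integer") and may differ between R versions, so it is printed rather than assumed.

```r
library(methods)

x <- 1:4

# Base coercion always produces a double vector.
class(as.numeric(x))
typeof(as.numeric(x))

# S4 coercion: inspect the result instead of assuming it.
class(as(x, "numeric"))
typeof(as(x, "numeric"))
```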




   is.vector(matrix())
   # [1] FALSE
   is(matrix(), "vector")
   # [1] TRUE



We already discussed this in the context of "structure" inheriting
from "vector" and explicit coercion.


   is.list(data.frame())
   # [1] TRUE
   is(data.frame(), "list")
   # [1] FALSE
   extends("data.frame", "list")
   # [1] TRUE



This is a compromise for compatibility with inherits(), since the
result of data.frame() is an S3 object.


So we should add to the list that inherits(data.frame(), "list") is
broken too. Once it gets fixed, is(data.frame(), "list") won't need
to compromise anymore and will be free to return the correct answer.





   is(data.frame(), "vector")
   # [1] FALSE
   is(data.frame(), "factor")
   # [1] FALSE
   is(data.frame(), "vector_OR_factor")
   # [1] TRUE



The question is: which inheritance to follow, S3 or S4? Since "vector"
is a basic class, inheritance follows S3 rules. But the class union is
an S4 class, so it follows S4 rules.


   etc...

Many people stay away from S4 because of these incomprehensible
behaviors.

Finally

Re: [Rd] Dispatch mechanism seems to alter object before calling method on it

2018-05-16 Thread Hervé Pagès

On 05/16/2018 10:22 AM, Michael Lawrence wrote:

Factors and data.frames are not structures, because they must have a
class attribute. Just call them "objects". They are higher level than
structures, which in practice just shape data without adding a lot of
semantics. Compare getClass("matrix") and getClass("factor").

I agree that inheritance through explicit coercion is confusing. As
far as I know, there are only 2 places where it is used:
1) Objects with attributes but no class, basically "structure" and its
subclasses "array" <- "matrix"
2) Classes that extend a reference type ("environment", "name" and
"externalptr") via hidden delegation (@.xData)

I'm not sure if anyone should be doing #2. For #1, a simple "fix"
would be just to drop inheritance of "structure" from "vector". I
think the intent was to mimic base R behavior, where it will happily
strip (or at least ignore) attributes when passing an array or matrix
to an internal function that expects a vector.

A related problem, which explains why factor and data.frame inherit
from "vector" even though they are objects, is that any S4 object
derived from those needs to be (for pragmatic compatibility reasons)
an integer vector or list, respectively, internally (the virtual
@.Data slot). Separating that from inheritance would probably be
difficult.

Yes, we can consider these to be problems, to some extent stemming
from the behavior and design of R itself, but I'm not sure it's worth
doing anything about them at this point.


Thanks for the informative discussion. It still doesn't explain
why 'm' gets its attributes stripped and 'x' does not though:

  m <- matrix(1:12, ncol=3)
  x <- structure(1:3, titi="A")

  setGeneric("foo", function(x) standardGeneric("foo"))
  setMethod("foo", "vector", identity)

  foo(m)
  # [1]  1  2  3  4  5  6  7  8  9 10 11 12

  foo(x)
  # [1] 1 2 3
  # attr(,"titi")
  # [1] "A"

If I understand correctly, both are "structures", not "objects".

Why aren't these problems worth fixing? More generally speaking
the erratic behavior of the S4 system with respect to S3 objects
has been a plague since the beginning of the methods package.
And many people have complained about this on many occasions in
one way or another. For the record, here are some of the most
notorious problems:

  class(as.numeric(1:4))
  # [1] "numeric"
  class(as(1:4, "numeric"))
  # [1] "integer"

  is.vector(matrix())
  # [1] FALSE
  is(matrix(), "vector")
  # [1] TRUE

  is.list(data.frame())
  # [1] TRUE
  is(data.frame(), "list")
  # [1] FALSE
  extends("data.frame", "list")
  # [1] TRUE

  setClassUnion("vector_OR_factor", c("vector", "factor"))
  is(data.frame(), "vector")
  # [1] FALSE
  is(data.frame(), "factor")
  # [1] FALSE
  is(data.frame(), "vector_OR_factor")
  # [1] TRUE

  etc...

Many people stay away from S4 because of these incomprehensible
behaviors.

Finally note that even pure S3 operations can produce output that
doesn't make sense:

  is.list(data.frame())
  # [1] TRUE
  is.vector(list())
  # [1] TRUE
  is.vector(data.frame())
  # [1] FALSE

  (that is: a data frame is a list and a list is a vector but
  a data frame is not a vector!)
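
The apparent contradiction follows from the documented behavior of is.vector(), which returns TRUE only for objects with no attributes other than names. A brief sketch (the object names are illustrative):

```r
df <- data.frame()

is.list(df)               # looks at the underlying type only
is.vector(list())         # a bare list carries no attributes
names(attributes(df))     # attributes carried by the data frame
is.vector(df)             # extra attributes disqualify it

# The same rule applies to atomic vectors.
x <- structure(1:3, titi = "A")
is.vector(x)              # the "titi" attribute disqualifies it
is.vector(c(a = 1))       # names alone are allowed
```

So is.list() and is.vector() answer different questions: the first is about storage type, the second about type plus the absence of non-names attributes.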

Why aren't these problems taken more seriously?

Thanks,
H.



Michael

On Wed, May 16, 2018 at 8:33 AM, Hervé Pagès  wrote:

On 05/15/2018 09:13 PM, Michael Lawrence wrote:


My understanding is that array (or any other structure) does not
"simply" inherit from vector, because structures are not vectors in
the strictest sense. Basically, once a vector gains attributes, it is
a structure, not a vector. The methods package accommodates this by
defining an "is" relationship between "structure" and "vector" via an
"explicit coerce", such that any "structure" passed to a "vector"
method is first passed to as.vector(), which strips attributes. This
is very much by design.



It seems that the problem is really with matrices and arrays, not
with "structures" in general:

   f <- factor(c("z", "x", "z"), levels=letters)
   m <- matrix(1:12, ncol=3)
   df <- data.frame(f=f)
   x <- structure(1:3, titi="A")

Only the matrix loses its attributes when passed to a "vector"
method:

   setGeneric("foo", function(x) standardGeneric("foo"))
   setMethod("foo", "vector", identity)

   foo(f) # attributes are preserved
   # [1] z x z
   # Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

   foo(m) # attribute

Re: [Rd] Dispatch mechanism seems to alter object before calling method on it

2018-05-16 Thread Hervé Pagès

On 05/15/2018 09:13 PM, Michael Lawrence wrote:

My understanding is that array (or any other structure) does not
"simply" inherit from vector, because structures are not vectors in
the strictest sense. Basically, once a vector gains attributes, it is
a structure, not a vector. The methods package accommodates this by
defining an "is" relationship between "structure" and "vector" via an
"explicit coerce", such that any "structure" passed to a "vector"
method is first passed to as.vector(), which strips attributes. This
is very much by design.
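
A minimal sketch of the behavior described above and of the workaround from the original post. The generic name foo_demo is hypothetical; the result of the first call is printed rather than asserted, since it is the behavior under discussion.

```r
library(methods)

setGeneric("foo_demo", function(x) standardGeneric("foo_demo"))
setMethod("foo_demo", "vector", identity)

m <- matrix(1:12, ncol = 3)

# With only a "vector" method, the thread reports that 'm' arrives
# with its dim attribute stripped.
dim(foo_demo(m))

# Adding a method for "array" lets the matrix reach a method through
# ordinary inheritance, without going through the explicit coerce.
setMethod("foo_demo", "array", identity)
dim(foo_demo(m))
```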


It seems that the problem is really with matrices and arrays, not
with "structures" in general:

  f <- factor(c("z", "x", "z"), levels=letters)
  m <- matrix(1:12, ncol=3)
  df <- data.frame(f=f)
  x <- structure(1:3, titi="A")

Only the matrix loses its attributes when passed to a "vector"
method:

  setGeneric("foo", function(x) standardGeneric("foo"))
  setMethod("foo", "vector", identity)

  foo(f) # attributes are preserved
  # [1] z x z
  # Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

  foo(m) # attributes are stripped
  # [1]  1  2  3  4  5  6  7  8  9 10 11 12

  foo(df)# attributes are preserved
  #   f
  # 1 z
  # 2 x
  # 3 z

  foo(x) # attributes are preserved
  # [1] 1 2 3
  # attr(,"titi")
  # [1] "A"

Also if structures are passed to as.vector() before being passed to
a "vector" method, shouldn't as.vector() and foo() be equivalent on
them? For 'f' and 'x' they're not:

  as.vector(f)
  # [1] "z" "x" "z"

  as.vector(x)
  # [1] 1 2 3

Finally note that for factors and data frames the "vector" method gets
selected despite the fact that is( , "vector") is FALSE:

  is(f, "vector")
  # [1] FALSE

  is(m, "vector")
  # [1] TRUE

  is(df, "vector")
  # [1] FALSE

  is(x, "vector")
  # [1] TRUE

Couldn't we recognize these problems as real, even if they are by
design? Hopefully we can all agree that:
- the dispatch mechanism should only dispatch, not alter objects;
- is() and selectMethod() should not contradict each other.

Thanks,
H.



Michael


On Tue, May 15, 2018 at 5:25 PM, Hervé Pagès  wrote:

Hi,

This was quite unexpected:

   setGeneric("foo", function(x) standardGeneric("foo"))

   setMethod("foo", "vector", identity)

   foo(matrix(1:12, ncol=3))
   # [1]  1  2  3  4  5  6  7  8  9 10 11 12

   foo(array(1:24, 4:2))
   # [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

If I define a method for array objects, things work as expected though:

   setMethod("foo", "array", identity)

   foo(matrix(1:12, ncol=3))
   #      [,1] [,2] [,3]
   # [1,]    1    5    9
   # [2,]    2    6   10
   # [3,]    3    7   11
   # [4,]    4    8   12

So, luckily, I have a workaround.

But shouldn't the dispatch mechanism stay away from the business of
altering objects before passed to it?

Thanks,
H.






[Rd] Dispatch mechanism seems to alter object before calling method on it

2018-05-15 Thread Hervé Pagès

Hi,

This was quite unexpected:

  setGeneric("foo", function(x) standardGeneric("foo"))

  setMethod("foo", "vector", identity)

  foo(matrix(1:12, ncol=3))
  # [1]  1  2  3  4  5  6  7  8  9 10 11 12

  foo(array(1:24, 4:2))
  # [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24


If I define a method for array objects, things work as expected though:

  setMethod("foo", "array", identity)

  foo(matrix(1:12, ncol=3))
  #      [,1] [,2] [,3]
  # [1,]    1    5    9
  # [2,]    2    6   10
  # [3,]    3    7   11
  # [4,]    4    8   12

So, luckily, I have a workaround.

But shouldn't the dispatch mechanism stay away from the business of
altering objects before passed to it?

Thanks,
H.



Re: [Rd] length of `...`

2018-05-08 Thread Hervé Pagès

Thanks Martin for the clarifications.  H.

On 05/04/2018 06:02 AM, Martin Maechler wrote:

Hervé Pagès 
 on Thu, 3 May 2018 08:55:20 -0700 writes:


 > Hi,
 > It would be great if one of the experts could comment on the
 > difference between Hadley's dotlength and ...length? The fact
 > that someone bothered to implement a new primitive for that
 > when there seems to be a very simple and straightforward R-only
 > solution suggests that there might be some gotchas/pitfalls with
 > the R-only solution.

Namely


dotlength <- function(...) nargs()



(This is subtly different from calling nargs() directly as it will
only count the elements in ...)



Hadley



Well,  I was the "someone".  In the past I had seen (and used myself)

length(list(...))

and of course that was not usable.
I knew of some substitute() / match.call() tricks [but I think
did not know Bill's cute substitute(...()) !] at the time, but
found them too esoteric.

Additionally and importantly,  ...length()  and  ..elt(n)  were
developed  "synchronously",  and the R-substitutes for ..elt()
definitely are less trivial (I did not find one at the time), as
Duncan's example to Bill's proposal has shown, so I had looked
at .Primitive() solutions of both.

In hindsight I should have asked here for advice,  but maybe at
the time I had been a bit frustrated by the results of some of
my RFCs ((nothing specific in mind !))

But __if__ there's really no example where current (3.5.0 and newer)

   ...length()

differs from Hadley's  dotlength()
I'd be very happy to replace ...length 's C based definition by
Hadley's beautiful minimal solution.
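
A quick way to look for such a counterexample is to run both versions side by side. The sketch below assumes R 3.5.0 or later for ...length(); the wrapper name dots_primitive is hypothetical. It covers only two simple cases and is no proof that the two always agree.

```r
dotlength      <- function(...) nargs()       # R-only version from the thread
dots_primitive <- function(...) ...length()   # wrapper around the primitive

# Neither version forces its arguments, so stop() is never evaluated.
dotlength(stop("one"), stop("two"), stop("three"))
dots_primitive(stop("one"), stop("two"), stop("three"))

# Empty dots.
dotlength()
dots_primitive()
```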

Martin


 > On 05/03/2018 08:34 AM, Hadley Wickham wrote:
 >> On Thu, May 3, 2018 at 8:18 AM, Duncan Murdoch wrote:
 >>> On 03/05/2018 11:01 AM, William Dunlap via R-devel wrote:
 >>>>
 >>>> In R-3.5.0 you can use ...length():
 >>>> > f <- function(..., n) ...length()
 >>>> > f(stop("one"), stop("two"), stop("three"), n=7)
 >>>> [1] 3
 >>>>
 >>>> Prior to that substitute() is the way to go
 >>>> > g <- function(..., n) length(substitute(...()))
 >>>> > g(stop("one"), stop("two"), stop("three"), n=7)
 >>>> [1] 3
 >>>>
 >>>> R-3.5.0 also has the ...elt(n) function, which returns
 >>>> the evaluated n'th entry in ... , without evaluating the
 >>>> other ... entries.
 >>>> > fn <- function(..., n) ...elt(n)
 >>>> > fn(stop("one"), 3*5, stop("three"), n=2)
 >>>> [1] 15
 >>>>
 >>>> Prior to 3.5.0, eval the appropriate component of the output
 >>>> of substitute() in the appropriate environment:
 >>>> > gn <- function(..., n) {
 >>>> +   nthExpr <- substitute(...())[[n]]
 >>>> +   eval(nthExpr, envir=parent.frame())
 >>>> + }
 >>>> > gn(stop("one"), environment(), stop("two"), n=2)
 >>>> 
 >>>>
 >>>
 >>> Bill, the last of these doesn't quite work, because ... can be passed
 >>> down through a string of callers.  You don't necessarily want to
 >>> evaluate it in the parent.frame().  For example:
 >>>
 >>> x <- "global"
 >>> f <- function(...) {
 >>> x <- "f"
 >>> g(...)
 >>> }
 >>> g <- function(...) {
 >>> firstExpr <- substitute(...())[[1]]
 >>> c(list(...)[[1]], eval(firstExpr, envir = parent.frame()))
 >>> }
 >>>
 >>> Calling g(x) correctly prints "global" twice, but calling f(x)
 >>> incorrectly prints
 >>>
 >>> [1] "global" "f"
 >>>
 >>> You can get the first element of ... without evaluating the rest
 >>> using ..1, but I don't know a way to do this for general n in
 >>> pre-3.5.0 base R.
 >>
 >> If you don't mind using a package:
 >>
 >> # works with R 3.1 and up
 >> library(rlang)
 >>
 >> x <- "global"
 >> f <- function(...) {
 >> x <- "f"
 >> g(...)
 >> }
 >> g <- function(...) {
 >> dots <- enquos(...)
 >> eval_tidy(dots[[1]])
 >> }
 >>
  

Re: [Rd] length of `...`

2018-05-03 Thread Hervé Pagès

Hi,

It would be great if one of the experts could comment on the
difference between Hadley's dotlength and ...length? The fact
that someone bothered to implement a new primitive for that
when there seems to be a very simple and straightforward R-only
solution suggests that there might be some gotchas/pitfalls with
the R-only solution.

Thanks,
H.


On 05/03/2018 08:34 AM, Hadley Wickham wrote:

On Thu, May 3, 2018 at 8:18 AM, Duncan Murdoch  wrote:

On 03/05/2018 11:01 AM, William Dunlap via R-devel wrote:


In R-3.5.0 you can use ...length():
> f <- function(..., n) ...length()
> f(stop("one"), stop("two"), stop("three"), n=7)
[1] 3

Prior to that substitute() is the way to go
> g <- function(..., n) length(substitute(...()))
> g(stop("one"), stop("two"), stop("three"), n=7)
[1] 3

R-3.5.0 also has the ...elt(n) function, which returns
the evaluated n'th entry in ... , without evaluating the
other ... entries.
> fn <- function(..., n) ...elt(n)
> fn(stop("one"), 3*5, stop("three"), n=2)
[1] 15

Prior to 3.5.0, eval the appropriate component of the output
of substitute() in the appropriate environment:
> gn <- function(..., n) {
+   nthExpr <- substitute(...())[[n]]
+   eval(nthExpr, envir=parent.frame())
+ }
> gn(stop("one"), environment(), stop("two"), n=2)




Bill, the last of these doesn't quite work, because ... can be passed down
through a string of callers.  You don't necessarily want to evaluate it in
the parent.frame().  For example:

x <- "global"
f <- function(...) {
   x <- "f"
   g(...)
}
g <- function(...) {
   firstExpr <- substitute(...())[[1]]
   c(list(...)[[1]], eval(firstExpr, envir = parent.frame()))
}

Calling g(x) correctly prints "global" twice, but calling f(x) incorrectly
prints

[1] "global" "f"

You can get the first element of ... without evaluating the rest using ..1,
but I don't know a way to do this for general n in pre-3.5.0 base R.


If you don't mind using a package:

# works with R 3.1 and up
library(rlang)

x <- "global"
f <- function(...) {
   x <- "f"
   g(...)
}
g <- function(...) {
   dots <- enquos(...)
   eval_tidy(dots[[1]])
}

f(x, stop("!"))
#> [1] "global"
g(x, stop("!"))
#> [1] "global"

Hadley



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as.list method for by Objects

2018-01-30 Thread Hervé Pagès

On 01/30/2018 02:50 PM, Michael Lawrence wrote:
by() does not always return a list. In Gabe's example, it returns an 
integer, thus it is coerced to a list. as.list() means that it should be 
a VECSXP, not necessarily with "list" in the class attribute.


The documentation is not particularly clear about what as.list()
means for list derivatives. IMO clarifications should stick to
simple concepts and formulations like "is.list(x) is TRUE" or
"x is a list or a list derivative" rather than "x is a VECSXP".
Coercion is useful beyond the use case of implementing a .C entry
point and calling as.numeric/as.list/etc... on its arguments.

This is why I was hoping that we could maybe discuss the possibility
of making the as.list() contract less vague than just "as.list()
must return a list or a list derivative".

Again, I think that 2 things weigh quite a lot in that discussion:
  1) as.list() returns an object of class "list" on a
     data.frame (strict coercion). If all that as.list() needed to
     do was to return a VECSXP, then as.list.default() already does
     this on a data.frame so why did someone bother adding an
     as.list.data.frame method that does strict coercion?
  2) The S4 coercion system based on as() does strict coercion by
     default.
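
The two points above can be checked with a short sketch. The data frame and variable names are made up for illustration.

```r
library(methods)

df <- data.frame(a = 1:2, b = c("x", "y"))

# as.list() dispatches to as.list.data.frame: strict coercion.
strict <- as.list(df)
class(strict)

# as.list.default() leaves a list-typed object alone, so the
# data.frame comes back unchanged (still a VECSXP, still a data.frame).
loose <- as.list.default(df)
class(loose)

# The S4 route is strict by default as well.
class(as(df, "list"))
```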

H.



Michael

On Tue, Jan 30, 2018 at 2:41 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:


Hi Gabe,

Interestingly the behavior of as.list() on by objects seems to
depend on the object itself:

 > b1 <- by(1:2, 1:2, identity)
 > class(as.list(b1))
[1] "list"

 > b2 <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
 > class(as.list(b2))
[1] "by"

This is with R 3.4.3 and R devel (2017-12-11 r73889).

H.

On 01/30/2018 02:33 PM, Gabriel Becker wrote:

Dario,

What version of R are you using? In my mildly old 3.4.0
installation and in the version of R-devel I have lying around
(also mildly old...)  I don't see the behavior I think you are
describing

     > b = by(1:2, 1:2, identity)
     > class(as.list(b))
     [1] "list"
     > sessionInfo()
     R Under development (unstable) (2017-12-19 r73926)
     Platform: x86_64-apple-darwin15.6.0 (64-bit)
     Running under: OS X El Capitan 10.11.6

     Matrix products: default
     BLAS: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRblas.dylib
     LAPACK: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

     locale:
     [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

     attached base packages:
     [1] stats     graphics  grDevices utils     datasets  methods   base

     loaded via a namespace (and not attached):
     [1] compiler_3.5.0
     >


As for by not having a class definition, no S3 class has an
explicit definition, so this is somewhat par for the course here...

Did I misunderstand something?


~G

On Tue, Jan 30, 2018 at 2:24 PM, Hervé Pagès
<hpa...@fredhutch.org> wrote:

     I agree that it makes sense to expect as.list() to perform
     a "strict coercion" i.e. to return an object of class "list",
     *even* on a list derivative. That's what as( , "list") does
     by default:

        # on a data.frame object
        as(data.frame(), "list")  # object of class "list"
                                  # (but strangely it drops the names)

        # on a by object
        x <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
        as(x, "list")  # object of class "list"

     More generally speaking as() is expected to perform "strict
     coercion" by default, unless called with 'strict=FALSE'.

     That's also what as.list() does on a data.frame:

        as.list(data.frame())  # object of class "list"

     FWIW as.numeric() also performs "strict coercion" on an integer
     vector:

        as.numeric(1:3)  # object of class "numeric"

     So an as.list.env method that does the same as as(x, "list")
     would bring a small touch of consistency in an otherwise
     quite inconsistent world of coercion methods(*).

     H.

     (*) as(data.frame(), "list", 

Re: [Rd] as.list method for by Objects

2018-01-30 Thread Hervé Pagès

Hi Gabe,

Interestingly the behavior of as.list() on by objects seems to
depend on the object itself:

> b1 <- by(1:2, 1:2, identity)
> class(as.list(b1))
[1] "list"

> b2 <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
> class(as.list(b2))
[1] "by"

This is with R 3.4.3 and R devel (2017-12-11 r73889).
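
For illustration, a strict coercion that behaves the same way on both objects could be sketched as follows. The helper as_plain_list() is hypothetical (it is not part of base R and not a proposed patch); it simply unclasses the object, coerces it, and drops every attribute except the names.

```r
# Hypothetical helper: always return an object of class "list".
as_plain_list <- function(x) {
    nms <- names(x)              # for a 1-d 'by' object: dimnames(x)[[1]]
    ans <- as.list(unclass(x))   # falls through to as.list.default
    attributes(ans) <- NULL      # drop dim, dimnames, call, ...
    names(ans) <- nms
    ans
}

b1 <- by(1:2, 1:2, identity)
b2 <- by(warpbreaks[, 1:2], warpbreaks[, "tension"], summary)

class(as_plain_list(b1))
class(as_plain_list(b2))
```

Whether the names should be kept is a design choice; as(x, "list") drops them, this sketch does not.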

H.

On 01/30/2018 02:33 PM, Gabriel Becker wrote:

Dario,

What version of R are you using? In my mildly old 3.4.0 installation and
in the version of R-devel I have lying around (also mildly old...)  I
don't see the behavior I think you are describing


> b = by(1:2, 1:2, identity)
> class(as.list(b))
[1] "list"
> sessionInfo()
R Under development (unstable) (2017-12-19 r73926)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRblas.dylib
LAPACK: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.0
>




As for by not having a class definition, no S3 class has an explicit
definition, so this is somewhat par for the course here...

Did I misunderstand something?


~G

On Tue, Jan 30, 2018 at 2:24 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:


I agree that it makes sense to expect as.list() to perform
a "strict coercion" i.e. to return an object of class "list",
*even* on a list derivative. That's what as( , "list") does
by default:

   # on a data.frame object
   as(data.frame(), "list")  # object of class "list"
                             # (but strangely it drops the names)

   # on a by object
   x <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
   as(x, "list")  # object of class "list"

More generally speaking as() is expected to perform "strict
coercion" by default, unless called with 'strict=FALSE'.

That's also what as.list() does on a data.frame:

   as.list(data.frame())  # object of class "list"

FWIW as.numeric() also performs "strict coercion" on an integer
vector:

   as.numeric(1:3)  # object of class "numeric"

So an as.list.env method that does the same as as(x, "list")
would bring a small touch of consistency in an otherwise
quite inconsistent world of coercion methods(*).

H.

(*) as(data.frame(), "list", strict=FALSE) doesn't do what you'd
     expect (just one of many examples)


On 01/29/2018 05:00 PM, Dario Strbenac wrote:

Good day,

I'd like to suggest the addition of an as.list method for a by
object that actually returns a list of class "list". This would
make it safer to do type-checking, because is.list also returns
TRUE for a data.frame variable and using class(result) == "list"
is an alternative that only returns TRUE for lists. It's also
confusing initially that

class(x)

[1] "by"

is.list(x)

[1] TRUE

since there's no explicit class definition for "by" and no
mention if it has any superclasses.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Hervé Pagès


Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791 
Fax: (206) 667-1319 


___

Re: [Rd] as.list method for by Objects

2018-01-30 Thread Hervé Pagès

On 01/30/2018 02:24 PM, Hervé Pagès wrote:

I agree that it makes sense to expect as.list() to perform
a "strict coercion" i.e. to return an object of class "list",
*even* on a list derivative. That's what as( , "list") does
by default:

   # on a data.frame object
   as(data.frame(), "list")  # object of class "list"
                             # (but strangely it drops the names)

   # on a by object
   x <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
   as(x, "list")  # object of class "list"

More generally speaking as() is expected to perform "strict
coercion" by default, unless called with 'strict=FALSE'.

That's also what as.list() does on a data.frame:

   as.list(data.frame())  # object of class "list"

FWIW as.numeric() also performs "strict coercion" on an integer
vector:

   as.numeric(1:3)  # object of class "numeric"

So an as.list.env method that does the same as as(x, "list")

^^^
oops, I meant as.list.by, sorry...

H.


would bring a small touch of consistency in an otherwise
quite inconsistent world of coercion methods(*).

H.

(*) as(data.frame(), "list", strict=FALSE) doesn't do what you'd
     expect (just one of many examples)


On 01/29/2018 05:00 PM, Dario Strbenac wrote:

Good day,

I'd like to suggest the addition of an as.list method for a by object 
that actually returns a list of class "list". This would make it safer 
to do type-checking, because is.list also returns TRUE for a 
data.frame variable and using class(result) == "list" is an 
alternative that only returns TRUE for lists. It's also confusing 
initially that



class(x)

[1] "by"

is.list(x)

[1] TRUE

since there's no explicit class definition for "by" and no mention if 
it has any superclasses.


--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel







--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] as.list method for by Objects

2018-01-30 Thread Hervé Pagès

I agree that it makes sense to expect as.list() to perform
a "strict coercion" i.e. to return an object of class "list",
*even* on a list derivative. That's what as( , "list") does
by default:

  # on a data.frame object
  as(data.frame(), "list")  # object of class "list"
                            # (but strangely it drops the names)

  # on a by object
  x <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
  as(x, "list")  # object of class "list"

More generally speaking as() is expected to perform "strict
coercion" by default, unless called with 'strict=FALSE'.

That's also what as.list() does on a data.frame:

  as.list(data.frame())  # object of class "list"

FWIW as.numeric() also performs "strict coercion" on an integer
vector:

  as.numeric(1:3)  # object of class "numeric"

So an as.list.env method that does the same as as(x, "list")
would bring a small touch of consistency in an otherwise
quite inconsistent world of coercion methods(*).

H.

(*) as(data.frame(), "list", strict=FALSE) doesn't do what you'd
expect (just one of many examples)


On 01/29/2018 05:00 PM, Dario Strbenac wrote:

Good day,

I'd like to suggest the addition of an as.list method for a by object that actually returns a list 
of class "list". This would make it safer to do type-checking, because is.list also 
returns TRUE for a data.frame variable and using class(result) == "list" is an 
alternative that only returns TRUE for lists. It's also confusing initially that


class(x)

[1] "by"

is.list(x)

[1] TRUE

since there's no explicit class definition for "by" and no mention if it has 
any superclasses.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31

2018-01-30 Thread Hervé Pagès

Hi Martin, Henrik,

Thanks for the follow up.

@Martin: I vote for 2) without *any* hesitation :-)

(and uniformity could be restored at some point in the
future by having prod(), rowSums(), colSums(), and others
align with the behavior of length() and sum())
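
For readers skimming the thread, the semantics of option 2) quoted further down can be sketched in plain R. This is only an illustration of the intended behavior under the assumption that sum() returns NA_integer_ with a warning on overflow; the function name is hypothetical, and the as.numeric() fallback makes a full copy, which a real C-level fix would avoid.

```r
# Hypothetical sketch: integer result when it fits, double otherwise.
sum_int_or_double <- function(x) {
    stopifnot(is.logical(x) || is.integer(x))
    s <- suppressWarnings(sum(x))
    # An NA result with no NA in the input means the integer sum overflowed.
    if (is.na(s) && !anyNA(x))
        s <- sum(as.numeric(x))
    s
}

typeof(sum_int_or_double(c(TRUE, FALSE, TRUE)))          # stays integer
typeof(sum_int_or_double(c(.Machine$integer.max, 1L)))   # promoted to double
sum_int_or_double(c(1L, NA))                             # genuine NA is kept
```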

Cheers,
H.


On 01/27/2018 03:06 AM, Martin Maechler wrote:

Henrik Bengtsson 
 on Thu, 25 Jan 2018 09:30:42 -0800 writes:


 > Just following up on this old thread since matrixStats 0.53.0 is now
 > out, which supports this use case:

 >> x <- rep(TRUE, times = 2^31)

 >> y <- sum(x)
 >> y
 > [1] NA
 > Warning message:
 > In sum(x) : integer overflow - use sum(as.numeric(.))

 >> y <- matrixStats::sum2(x, mode = "double")
 >> y
 > [1] 2147483648
 >> str(y)
 > num 2.15e+09

 > No coercion is taking place, so the memory overhead is zero:

 >> profmem::profmem(y <- matrixStats::sum2(x, mode = "double"))
 > Rprofmem memory profiling of:
 > y <- matrixStats::sum2(x, mode = "double")

 > Memory allocations:
 > bytes calls
 > total 0

 > /Henrik

Thank you, Henrik, for the reminder.

Back in June, I had mentioned to Hervé and R-devel that
'logical' should remain to be treated as 'integer' as in all
arithmetic in (S and) R. Hervé did mention the isum()
function in the C code which is relevant here .. which does have
a LONG INT counter already -- *but* if we consider that sum()
has '...' i.e. a conceptually arbitrary number of long vector
integer arguments that counter won't suffice even there.

Before talking about implementation / patch, I think we should
consider 2 possible goals of a change --- I agree the status quo
is not a real option

1) sum(x) for logical and integer x  would return a double
   in any case and overflow should not happen (unless for
   the case where the result would be larger than
   .Machine$double.xmax, which I think will not be possible
   even with "arbitrary" nargs() of sum).

2) sum(x) for logical and integer x  should return an integer in
all cases there is no overflow, including returning
NA_integer_ in case of NAs.
If there would be an overflow it must be detected "in time"
and the result should be double.

The big advantage of 2) is that it is back compatible in 99.x %
of use cases, and another advantage that it may be a very small
bit more efficient.  Also, in the case of "counting" (logical),
it is nice to get an integer instead of double when we can --
entirely analogously to the behavior of length() which returns
integer whenever possible.

The advantage of 1) is uniformity.

We should (at least provisionally) decide between 1) and 2) and
then go for that.
It could be that going for 1) may have bad
compatibility-consequences in package space, because indeed we
had documented sum() would be integer for logical and integer arguments.

I currently don't really have time to
{work on implementing + dealing with the consequences}
for either ..

Martin

 > On Fri, Jun 2, 2017 at 1:58 PM, Henrik Bengtsson
 >  wrote:
 >> I second this feature request (it's understandable that this and
 >> possibly other parts of the code was left behind / forgotten after the
 >> introduction of long vector).
 >>
 >> I think mean() avoids full copies, so in the meanwhile, you can work
 >> around this limitation using:
 >>
 >> countTRUE <- function(x, na.rm = FALSE) {
 >> nx <- length(x)
 >> if (nx < .Machine$integer.max) return(sum(x, na.rm = na.rm))
 >> nx * mean(x, na.rm = na.rm)
 >> }
 >>
 >> (not sure if one needs to worry about rounding errors, i.e. where
 >> n %% 0 != 0)
 >>
 >> x <- rep(TRUE, times = .Machine$integer.max+1)
 >> object.size(x)
 >> ## 8589934632 bytes
 >>
 >> p <- profmem::profmem( n <- countTRUE(x) )
 >> str(n)
 >> ## num 2.15e+09
 >> print(n == .Machine$integer.max + 1)
 >> ## [1] TRUE
 >>
 >> print(p)
 >> ## Rprofmem memory profiling of:
 >> ## n <- countTRUE(x)
 >> ##
 >> ## Memory allocations:
 >> ##  bytes calls
 >> ## total 0
 >>
 >>
 >> FYI / related: I've just updated matrixStats::sum2() to support
 >> logicals (develop branch) and I'll also try to update
 >> matrixStats::count() to count beyond .Machine$integer.max.
 >>
 >> /Henrik
 >>
 >> On Fri, Jun 2, 2017 at 4:05 AM, Hervé Pagès wrote:
 >>> Hi,
 >>

Re: [Rd] as.character(list(NA))

2018-01-22 Thread Hervé Pagès

On 01/22/2018 01:02 PM, William Dunlap wrote:
I tend to avoid using as. functions on lists, since they act oddly
in several ways.
E.g., if the list "L" consists entirely of scalar elements then
as.numeric(L) acts like as.numeric(unlist(L)) but if any element is
not a scalar there is an error.


FWIW personally I see this as a nice feature and use as.numeric(L)
instead of as.numeric(unlist(L)) in places where I'd rather fail than
get something that is not parallel to the input.
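
A small sketch of the difference, with made-up lists:

```r
scalars <- list(1, 2L, 3.5)
ragged  <- list(1, 2:3)

# All-scalar list: the result is parallel to the input.
as.numeric(scalars)

# Non-scalar element: as.numeric() fails, which is the desired behavior here.
strict <- tryCatch(as.numeric(ragged), error = function(e) NULL)
is.null(strict)

# unlist() first: no error, but the result is no longer parallel to 'ragged'.
loose <- as.numeric(unlist(ragged))
length(loose) == length(ragged)
```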

H.


as.character() does not seem to make a distinction between the
all-scalar and not-all-scalar cases but does various things with
NA's of various types.

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Mon, Jan 22, 2018 at 11:14 AM, Robert McGehee
<rmcge...@walleyetrading.net> wrote:


Also perhaps a surprise that the behavior depends on the mode of the NA.

 > is.na(as.character(list(NA_real_)))
[1] FALSE
 > is.na(as.character(list(NA_character_)))
[1] TRUE

Does this mean deparse() preserves NA-ness for NA_character_ but not
NA_real_?


-----Original Message-----
From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Hervé Pagès
Sent: Monday, January 22, 2018 2:01 PM
To: William Dunlap <wdun...@tibco.com>; Patrick Perry <ppe...@stern.nyu.edu>
Cc: r-devel@r-project.org
Subject: Re: [Rd] as.character(list(NA))

On 01/20/2018 08:24 AM, William Dunlap via R-devel wrote:
 > I believe that for a list as.character() applies deparse() to each
 > element of the list.  deparse() does not preserve NA-ness, as it is
 > intended to make text that the parser can read.
 >
 >> str(as.character(list(Na=NA, LglVec=c(TRUE,NA),
 > Function=function(x){x+1})))
 >   chr [1:3] "NA" "c(TRUE, NA)" "function (x) \n{\n    x + 1\n}"
 >

This really comes as a surprise though since coercion to all the
other atomic types (except raw) preserves the NAs.

And also as.character(unlist(list(NA))) preserves them.

H.

 >
 > Bill Dunlap
 > TIBCO Software
 > wdunlap tibco.com
 >
 > On Sat, Jan 20, 2018 at 7:43 AM, Patrick Perry
 > <ppe...@stern.nyu.edu> wrote:
 >
 >> As of R Under development (unstable) (2018-01-19 r74138):
 >>
 >>> as.character(list(NA))
 >> [1] "NA"
 >>
 >>> is.na(as.character(list(NA)))
 >> [1] FALSE
 >>
 >> __
 >> R-devel@r-project.org mailing list
 >> https://stat.ethz.ch/mailman/listinfo/r-devel
 >>
 >
 >       [[alternative HTML version deleted]]
 >
 > __
 > R-

Re: [Rd] as.character(list(NA))

2018-01-22 Thread Hervé Pagès

On 01/20/2018 08:24 AM, William Dunlap via R-devel wrote:

I believe that for a list as.character() applies deparse()  to each element
of the list.  deparse() does not preserve NA-ness, as it is intended to
make text that the parser can read.


str(as.character(list(Na=NA, LglVec=c(TRUE,NA),
                      Function=function(x){x+1})))
  chr [1:3] "NA" "c(TRUE, NA)" "function (x) \n{\n    x + 1\n}"



This really comes as a surprise though since coercion to all the
other atomic types (except raw) preserves the NAs.

And also as.character(unlist(list(NA))) preserves them.
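
As an illustration of what NA-preserving coercion of a list could look like, here is a minimal sketch. The helper as_character_keep_na() is hypothetical and only approximates the deparse-based behavior described above: a length-one atomic NA becomes NA_character_, a single string is kept as is, and everything else is deparsed.

```r
# Hypothetical helper, not base R behavior.
as_character_keep_na <- function(L) {
    vapply(L, function(el) {
        if (is.atomic(el) && length(el) == 1L && is.na(el)) {
            NA_character_                       # keep NA-ness
        } else if (is.character(el) && length(el) == 1L) {
            el                                  # single string, no quoting
        } else {
            paste(deparse(el), collapse = "\n") # fall back to deparse()
        }
    }, character(1), USE.NAMES = FALSE)
}

x <- list(NA, NA_real_, "a", c(TRUE, NA))
res <- as_character_keep_na(x)
is.na(res)

# The unlist() route mentioned above also keeps the NA.
is.na(as.character(unlist(list(NA))))
```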

H.



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, Jan 20, 2018 at 7:43 AM, Patrick Perry  wrote:


As of R Under development (unstable) (2018-01-19 r74138):


as.character(list(NA))

[1] "NA"


is.na(as.character(list(NA)))

[1] FALSE

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Unexpected dimnames attribute returned by cbind/rbind

2017-12-22 Thread Hervé Pagès

Hi,

  > m5 <- cbind(integer(5), integer(5))
  > m5
       [,1] [,2]
  [1,]    0    0
  [2,]    0    0
  [3,]    0    0
  [4,]    0    0
  [5,]    0    0
  > dimnames(m5)
  NULL

No dimnames, as expected.

  > m0 <- cbind(integer(0), integer(0))
  > m0
       [,1] [,2]
  > dimnames(m0)
  [[1]]
  NULL

  [[2]]
  NULL

Unexpected dimnames attribute!

rbind'ing empty vectors also returns a matrix with unexpected
dimnames:

  > dimnames(rbind(character(0), character(0)))
  [[1]]
  NULL

  [[2]]
  NULL

Cheers,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] format() doesn't propagate the dim and dimnames when underlying type of array is "list"

2017-12-14 Thread Hervé Pagès

Hi,

Here is how to reproduce:

With a matrix of atomic type


  m1 <- matrix(1:12, ncol=3,
               dimnames=list(letters[1:4], LETTERS[1:3]))

  typeof(m1)
  # [1] "integer"

  m1
  #   A B  C
  # a 1 5  9
  # b 2 6 10
  # c 3 7 11
  # d 4 8 12

  format(m1)
  #   A    B    C
  # a " 1" " 5" " 9"
  # b " 2" " 6" "10"
  # c " 3" " 7" "11"
  # d " 4" " 8" "12"

==> dim and dimnames are propagated.

With a matrix of type "list"


  m2 <- matrix(rep(list(1:5, NULL, "AA"), 4), ncol=3,
               dimnames=list(letters[1:4], LETTERS[1:3]))

  typeof(m2)
  # [1] "list"

  m2
  #   A         B         C
  # a Integer,5 NULL      "AA"
  # b NULL      "AA"      Integer,5
  # c "AA"      Integer,5 NULL
  # d Integer,5 NULL      "AA"

  format(m2)
  # [1] "1, 2, 3, 4, 5" "NULL"          "AA"            "1, 2, 3, 4, 5"
  # [5] "NULL"          "AA"            "1, 2, 3, 4, 5" "NULL"
  # [9] "AA"            "1, 2, 3, 4, 5" "NULL"          "AA"

==> dim and dimnames are dropped!

The same thing seems to happen with arrays of arbitrary dimensions.
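
In the meantime, the dim and dimnames can be put back by hand. A minimal sketch of such a wrapper; the name format_keep_dim() is hypothetical, and it assumes format() returns one string per element of the input:

```r
# Hypothetical wrapper: restore dim/dimnames if format() dropped them.
format_keep_dim <- function(x, ...) {
    ans <- format(x, ...)
    if (is.null(dim(ans)) && !is.null(dim(x)) && length(ans) == length(x)) {
        dim(ans) <- dim(x)
        dimnames(ans) <- dimnames(x)
    }
    ans
}

m2 <- matrix(rep(list(1:5, NULL, "AA"), 4), ncol = 3,
             dimnames = list(letters[1:4], LETTERS[1:3]))
f2 <- format_keep_dim(m2)
dim(f2)
dimnames(f2)
```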

Thanks,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] binary form of is() contradicts its unary form

2017-11-29 Thread Hervé Pagès

Yes, data.frame is not an S4 class but is(data.frame())
finds its super-classes anyway and without the need to wrap
it in asS4(). And "list" is one of the super-classes. Then
is(data.frame(), "list") contradicts this.

I'm not asking for a workaround. I already have one with
'class2 %in% is(object)' as reported in my original post.
'is(asS4(object), class2)' is maybe another one but, unlike
the former, it's not obvious that it will behave consistently
with unary is(). There could be some other surprise on the
way.

You're missing the point of my original post. Which is that
there is a serious inconsistency between the unary and binary
forms of is(). Maybe the binary form is right in case of
is(data.frame(), "list"). But then the unary form should not
return "list". This inconsistency will potentially hurt anybody
who tries to do computations on a class hierarchy, especially
if the hierarchy is complex and mixes S4 and S3 classes. So I'm
hoping this can be addressed. Hope you understand.
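
For reference, the 'class2 %in% is(object)' workaround mentioned above can be wrapped as a small function. The name is_consistent() is made up for this sketch:

```r
library(methods)

# Hypothetical wrapper: binary test that agrees with unary is() by construction.
is_consistent <- function(object, class2) class2 %in% is(object)

is_consistent(data.frame(), "list")
is_consistent(data.frame(), "data.frame")
is_consistent(1L, "list")
```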

Cheers,
H.


On 11/29/2017 12:21 PM, Suzen, Mehmet wrote:

Hi Herve,

Interesting observation with `setClass` but it is for S4.  It looks
like `data.frame()` is not an S4 class.


isS4(data.frame())

[1] FALSE

And in your case this might help:


is(asS4(data.frame()), "list")

[1] TRUE

Looks like `is` is designed for S4 classes, I am not entirely sure.

Best,
-Mehmet

On 29 November 2017 at 20:46, Hervé Pagès  wrote:

Hi Mehmet,

On 11/29/2017 11:22 AM, Suzen, Mehmet wrote:


Hi Herve,

I think you are confusing subclasses and classes. There is no
contradiction. `is` documentation
is very clear:

`With one argument, returns all the super-classes of this object's class.`



Yes that's indeed very clear. So if "list" is a super-class
of "data.frame" (as reported by is(data.frame())), then
is(data.frame(), "list") should be TRUE.

With S4 classes:

   setClass("A")
   setClass("B", contains="A")

   ## Get all the super-classes of B.
   is(new("B"))
   # [1] "B" "A"

   ## Does a B object inherit from A?
   is(new("B"), "A")
   # [1] TRUE

Cheers,
H.



Note that object class is always `data.frame` here, check:

  > class(data.frame())
[1] "data.frame"
  > is(data.frame(), "data.frame")
[1] TRUE

Best,
Mehmet





On 29 Nov 2017 19:13, "Hervé Pagès" <hpa...@fredhutch.org> wrote:

 Hi,

 The unary forms of is() and extends() report that data.frame
 extends list, oldClass, and vector:

> is(data.frame())
[1] "data.frame" "list"       "oldClass"   "vector"

> extends("data.frame")
[1] "data.frame" "list"       "oldClass"   "vector"

 However, the binary form of is() disagrees:

> is(data.frame(), "list")
[1] FALSE
> is(data.frame(), "oldClass")
[1] FALSE
> is(data.frame(), "vector")
[1] FALSE

 while the binary form of extends() agrees:

> extends("data.frame", "list")
[1] TRUE
> extends("data.frame", "oldClass")
[1] TRUE
> extends("data.frame", "vector")
[1] TRUE

 Who is right?

 Shouldn't 'is(object, class2)' be equivalent
 to 'class2 %in% is(object)'? Furthermore, is there
 any reason why 'is(object, class2)' is not implemented
 as 'class2 %in% is(object)'?

 Thanks,
 H.

 --
 Hervé Pagès

 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M1-B514
 P.O. Box 19024
 Seattle, WA 98109-1024

 E-mail: hpa...@fredhutch.org
 Phone:  (206) 667-5791
 Fax:(206) 667-1319

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

<https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.e

Re: [Rd] binary form of is() contradicts its unary form

2017-11-29 Thread Hervé Pagès

Hi Mehmet,

On 11/29/2017 11:22 AM, Suzen, Mehmet wrote:

Hi Herve,

I think you are confusing subclasses and classes. There is no
contradiction. `is` documentation
is very clear:

`With one argument, returns all the super-classes of this object's class.`


Yes that's indeed very clear. So if "list" is a super-class
of "data.frame" (as reported by is(data.frame())), then
is(data.frame(), "list") should be TRUE.

With S4 classes:

  setClass("A")
  setClass("B", contains="A")

  ## Get all the super-classes of B.
  is(new("B"))
  # [1] "B" "A"

  ## Does a B object inherit from A?
  is(new("B"), "A")
  # [1] TRUE

Cheers,
H.



Note that object class is always `data.frame` here, check:

 > class(data.frame())
[1] "data.frame"
 > is(data.frame(), "data.frame")
[1] TRUE

Best,
Mehmet





On 29 Nov 2017 19:13, "Hervé Pagès" <hpa...@fredhutch.org> wrote:

Hi,

The unary forms of is() and extends() report that data.frame
extends list, oldClass, and vector:

   > is(data.frame())
   [1] "data.frame" "list"   "oldClass"   "vector"

   > extends("data.frame")
   [1] "data.frame" "list"   "oldClass"   "vector"

However, the binary form of is() disagrees:

   > is(data.frame(), "list")
   [1] FALSE
   > is(data.frame(), "oldClass")
   [1] FALSE
   > is(data.frame(), "vector")
   [1] FALSE

while the binary form of extends() agrees:

   > extends("data.frame", "list")
   [1] TRUE
   > extends("data.frame", "oldClass")
   [1] TRUE
   > extends("data.frame", "vector")
   [1] TRUE

Who is right?

Shouldn't 'is(object, class2)' be equivalent
to 'class2 %in% is(object)'? Furthermore, is there
any reason why 'is(object, class2)' is not implemented
as 'class2 %in% is(object)'?

Thanks,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org <mailto:hpa...@fredhutch.org>
Phone:  (206) 667-5791
Fax:(206) 667-1319

______
R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] binary form of is() contradicts its unary form

2017-11-29 Thread Hervé Pagès

Hi,

The unary forms of is() and extends() report that data.frame
extends list, oldClass, and vector:

  > is(data.frame())
  [1] "data.frame" "list"   "oldClass"   "vector"

  > extends("data.frame")
  [1] "data.frame" "list"   "oldClass"   "vector"

However, the binary form of is() disagrees:

  > is(data.frame(), "list")
  [1] FALSE
  > is(data.frame(), "oldClass")
  [1] FALSE
  > is(data.frame(), "vector")
  [1] FALSE

while the binary form of extends() agrees:

  > extends("data.frame", "list")
  [1] TRUE
  > extends("data.frame", "oldClass")
  [1] TRUE
  > extends("data.frame", "vector")
  [1] TRUE

Who is right?

Shouldn't 'is(object, class2)' be equivalent
to 'class2 %in% is(object)'? Furthermore, is there
any reason why 'is(object, class2)' is not implemented
as 'class2 %in% is(object)'?
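
To make the question concrete, the check below compares the binary form of is() with membership in its unary form. It is only a sketch: the helper name forms_agree and the classes "Base" and "Derived" are made up for illustration, and the data.frame result is left unannotated because it may depend on the R version.

```r
library(methods)

# Hypothetical S4 classes used only for this illustration.
setClass("Base", slots = c(x = "numeric"))
setClass("Derived", contains = "Base")

# TRUE when is(object, class2) gives the same answer as class2 %in% is(object).
forms_agree <- function(object, class2) {
  identical(is(object, class2), class2 %in% is(object))
}

forms_agree(new("Derived"), "Base")   # plain S4 inheritance
forms_agree(data.frame(), "list")     # the case reported above
```

For ordinary S4 inheritance the two forms agree; the second call shows whether the data.frame case still disagrees in the R session at hand.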

Thanks,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] `[<-.data.frame` sets rownames incorrectly

2017-11-21 Thread Hervé Pagès

On 11/21/2017 06:19 PM, Hervé Pagès wrote:

Hi,

Here is another problem with data frame subsetting:

   > df <- data.frame(aa=1:3)
   > value <- data.frame(aa=11:12, row.names=c("A", "B"))

   > `[<-`(df, 4:5, , value=value)
 aa
   1  1
   2  2
   3  3
   A 11
   B 12

   > `[<-`(df, 5:4, , value=value)
 aa
   1  1
   2  2
   3  3
   B 12
   A 11


This actually produces:

  > `[<-`(df, 5:4, , value=value)
aa
  1  1
  2  2
  3  3
  A 12
  B 11

but should instead produce:

aa
  1  1
  2  2
  3  3
  B 12
  A 11

sorry for the confusion.
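
Until the assignment method behaves as described, one way to get rows appended in a chosen order with their names attached is to reorder the value first and use rbind(). This is a sketch that sidesteps `[<-` rather than fixing it; the variable name appended is an assumption made for the example.

```r
df <- data.frame(aa = 1:3)
value <- data.frame(aa = 11:12, row.names = c("A", "B"))

# Put the rows of 'value' in the order they should appear, then append.
# Indexing by row name keeps each name attached to its own row.
appended <- rbind(df, value[c("B", "A"), , drop = FALSE])
rownames(appended)
```

Because the rows are selected by name before being appended, row "B" keeps the value 12 and row "A" keeps the value 11.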

H.



For this last result, the rownames of the 2 last rows should
be swapped.

H.



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] `[<-.data.frame` sets rownames incorrectly

2017-11-21 Thread Hervé Pagès

Hi,

Here is another problem with data frame subsetting:

  > df <- data.frame(aa=1:3)
  > value <- data.frame(aa=11:12, row.names=c("A", "B"))

  > `[<-`(df, 4:5, , value=value)
aa
  1  1
  2  2
  3  3
  A 11
  B 12

  > `[<-`(df, 5:4, , value=value)
aa
  1  1
  2  2
  3  3
  B 12
  A 11

For this last result, the rownames of the 2 last rows should
be swapped.

H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] `[[<-.data.frame` leaves holes after existing columns and returns a corrupt data frame

2017-11-21 Thread Hervé Pagès

Hi,

`[<-.data.frame` is cautious about not leaving holes after existing
columns:

  > `[<-`(data.frame(id=1:6), 3, value=data.frame(V3=11:16))
  Error in `[<-.data.frame`(data.frame(id = 1:6), 3, value = data.frame(V3 = 11:16)) :
    new columns would leave holes after existing columns

but `[[<-.data.frame` not so much:

  > `[[<-`(data.frame(id=1:6), 3, value=11:16)
id  V3
  1  1 NULL 11
  2  2  12
  3  3  13
  4  4  14
  5  5  15
  6  6  16
  Warning message:
  In format.data.frame(x, digits = digits, na.encode = FALSE) :
corrupt data frame: columns will be truncated or padded with NAs

The latter should probably behave like the former in that case. Maybe
by sharing more code with it?
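
A user-level guard in the same spirit can be sketched as follows. The wrapper name set_column is hypothetical and this is not the code used by either method; it only applies the same "no holes" rule before delegating to `[[<-`.

```r
# Hypothetical wrapper: refuse positions that would skip over missing columns.
set_column <- function(df, j, value) {
  stopifnot(is.data.frame(df), is.numeric(j), length(j) == 1L)
  if (j > ncol(df) + 1L)
    stop("new columns would leave holes after existing columns")
  df[[j]] <- value
  df
}

ok <- set_column(data.frame(id = 1:6), 2, 11:16)   # appends a second column
ncol(ok)
```

Calling set_column(data.frame(id = 1:6), 3, 11:16) signals an error instead of returning a corrupt data frame.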

Thanks,
H.


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] split() - unexpected sorting of results

2017-10-20 Thread Hervé Pagès

Hi,

On 10/20/2017 12:53 PM, Peter Meissner wrote:

Thanks, for the explanation.

Still, I think this is surprising behaviour which might be handled better.


Maybe a little surprising, but no more than:

> x <- sample(11L)

> sort(x)
 [1]  1  2  3  4  5  6  7  8  9 10 11

> sort(as.character(x))
 [1] "1"  "10" "11" "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"

The fact that sort(), as.factor(), split() and many other things behave
consistently with respect to the underlying order of character vectors
avoids other even bigger surprises.

Also note that the underlying order of character vectors actually
depends on your locale. One way to guarantee consistent results across
platforms/locales is by explicitly specifying the levels when making
a factor e.g.

  f <- factor(x, levels=unique(x))
  split(1:11, f)

This is particularly sensible when writing unit tests.
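
Applied to the example from this thread, the idea looks like this. It is a small sketch; the variable names are chosen for illustration only.

```r
x <- as.character(1:11)

# Levels in order of first appearance, independent of locale collation.
f <- factor(x, levels = unique(x))
groups <- split(1:11, f)
names(groups)

# For comparison: implicit levels are sorted as character strings.
names(split(1:11, x))
```

With explicit levels the groups come back in input order; the second call shows the order produced by the implicit, collation-dependent levels.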

Cheers,
H.



Best, Peter

On 20.10.2017 at 9:49 PM, "Iñaki Úcar" wrote:


Hi Peter,

2017-10-20 21:33 GMT+02:00 Peter Meissner :

Hey,

I found this - for me - quite surprising and puzzling behaviour of
split().



split(1:11, as.character(1:11))
split(1:11, 1:11)


When splitting by numerics everything works as expected - sorting of input
== sorting of output -- but when using a character vector everything gets
re-sorted alphabetically.


Although, there are some references in the help files to what happens when
using split, I did not find any note on this - for me - rather unexpected
behaviour.


As the documentation states,

f: a ‘factor’ in the sense that ‘as.factor(f)’ defines the
   grouping, or a list of such factors in which case their
   interaction is used for the grouping.

And, in fact,


> as.factor(1:11)
 [1] 1  2  3  4  5  6  7  8  9  10 11
Levels: 1 2 3 4 5 6 7 8 9 10 11

> as.factor(as.character(1:11))
 [1] 1  2  3  4  5  6  7  8  9  10 11
Levels: 1 10 11 2 3 4 5 6 7 8 9

Regards,
Iñaki


I would like it best when the sorting of split results stays the same no
matter the input (sorting of input == sorting of output)

If that is not possible, a note of caution in the help pages and maybe an
example might be valuable.


Best, Peter

 [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31

2017-06-07 Thread Hervé Pagès

Hi Martin,

On 06/07/2017 03:54 AM, Martin Maechler wrote:

Martin Maechler 
 on Tue, 6 Jun 2017 09:45:44 +0200 writes:



Hervé Pagès 
 on Fri, 2 Jun 2017 04:05:15 -0700 writes:


 >> Hi, I have a long numeric vector 'xx' and I want to use
 >> sum() to count the number of elements that satisfy some
 >> criteria like non-zero values or values lower than a
 >> certain threshold etc...

 >> The problem is: sum() returns an NA (with a warning) if
 >> the count is greater than 2^31. For example:

 >>   > xx <- runif(3e9)
 >>   > sum(xx < 0.9)
 >>   [1] NA
 >>   Warning message:
 >>   In sum(xx < 0.9) : integer overflow - use sum(as.numeric(.))

 >> This already takes a long time and doing
 >> sum(as.numeric(.)) would take even longer and require
 >> allocation of 24Gb of memory just to store an
 >> intermediate numeric vector made of 0s and 1s. Plus,
 >> having to do sum(as.numeric(.)) every time I need to
 >> count things is not convenient and is easy to forget.

 >> It seems that sum() on a logical vector could be modified
 >> to return the count as a double when it cannot be
 >> represented as an integer.  Note that length() already
 >> does this so that wouldn't create a precedent. Also and
 >> FWIW prod() avoids the problem by always returning a
 >> double, whatever the type of the input is (except on a
 >> complex vector).

 >> I can provide a patch if this change sounds reasonable.

 > This sounds very reasonable, thank you Hervé, for the
 > report, and even more for a (small) patch.

I was made aware of the fact, that R treats logical and
integer very often identically in the C code, and in general we
even mention that logicals are treated as 0/1/NA integers in
arithmetic.

For the present case that would mean that we should also
safe-guard against *integer* overflow in sum(.)  and that is
not something we have done / wanted to do in the past...  Speed
being one reason.

So this ends up being more delicate than I had thought at first,
because changing  sum()  only would mean that

   sum(LOGI)  and
   sum(as.integer(LOGI))

would start to differ for a logical vector LOGI.

So, for now this is something that must be approached carefully,
and the R Core team may want to discuss "in private" first.

I'm sorry for having raised possibly unrealistic expectations.


No worries. Thanks for taking my proposal into consideration.
Note that the isum() function in src/main/summary.c is already using
a 64-bit accumulator to accommodate intermediate sums > INT_MAX.
So it should be easy to modify the function to make it overflow for
much bigger final sums without altering performance. Seems like
R_XLEN_T_MAX would be the natural threshold.

Cheers,
H.



Martin

 > Martin

 >> Cheers, H.

 >> --
 >> Hervé Pagès

 >> Program in Computational Biology Division of Public
 >> Health Sciences Fred Hutchinson Cancer Research Center
 >> 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA
 >> 98109-1024

 >> E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:
 >> (206) 667-1319

 >> __
 >> R-devel@r-project.org mailing list
 >> https://stat.ethz.ch/mailman/listinfo/r-devel

 > __
 > R-devel@r-project.org mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] surprisingly, S4 classes with a "dim" or "dimnames" slot are final (in the Java sense)

2017-06-06 Thread Hervé Pagès

Thanks Michael for taking care of this.  H.

On 06/06/2017 11:48 AM, Michael Lawrence wrote:

I've fixed this and will commit soon.

Disregard my dim<-() example; that behaves as expected (the class needs
a dim<-() method).

Michael

On Tue, Jun 6, 2017 at 5:16 AM, Michael Lawrence <micha...@gene.com> wrote:

Thanks for the report. The issue is that one cannot set special
attributes like names, dim, dimnames, etc on S4 objects. I was
already working on this and will have a fix soon.

 > a2 <- new("A2")
 > dim(a2) <- c(2, 3)
Error in dim(a2) <- c(2, 3) : invalid first argument


On Mon, Jun 5, 2017 at 6:08 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:

Hi,

It's nice to be able to define S4 classes with slots that correspond
to standard attributes:

   setClass("A1", slots=c(names="character"))
   setClass("A2", slots=c(dim="integer"))
   setClass("A3", slots=c(dimnames="list"))

By doing this, one gets a few methods for free:

   a1 <- new("A1", names=letters[1:3])
   names(a1) # "a" "b" "c"
   a2 <- new("A2", dim=4:3)
   nrow(a2)  # 4
   a3 <- new("A3", dimnames=list(NULL, letters[1:3]))
   colnames(a3)  # "a" "b" "c"

However, when it comes to subclassing, some of these slots cause
problems. I can extend A1:

   setClass("B1", contains="A1")

but trying to extend A2 or A3 produces an error (with a non-informative
message in the 1st case and a somewhat obscure one in the 2nd):

   setClass("B2", contains="A2")
   # Error in attr(prototype, slotName) <- attr(pri, slotName) :
   #   invalid first argument

   setClass("B3", contains="A3")
   # Error in attr(prototype, slotName) <- attr(pri, slotName) :
   #   'dimnames' applied to non-array

So it seems that the presence of a "dim" or "dimnames" slot prevents a
class from being extended. Is this expected? I couldn't find anything
in TFM about this. Sorry if I missed it.

Thanks,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org <mailto:hpa...@fredhutch.org>
Phone: (206) 667-5791 
Fax: (206) 667-1319 

__
R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] surprisingly, S4 classes with a "dim" or "dimnames" slot are final (in the Java sense)

2017-06-05 Thread Hervé Pagès

Hi,

It's nice to be able to define S4 classes with slots that correspond
to standard attributes:

  setClass("A1", slots=c(names="character"))
  setClass("A2", slots=c(dim="integer"))
  setClass("A3", slots=c(dimnames="list"))

By doing this, one gets a few methods for free:

  a1 <- new("A1", names=letters[1:3])
  names(a1) # "a" "b" "c"
  a2 <- new("A2", dim=4:3)
  nrow(a2)  # 4
  a3 <- new("A3", dimnames=list(NULL, letters[1:3]))
  colnames(a3)  # "a" "b" "c"

However, when it comes to subclassing, some of these slots cause
problems. I can extend A1:

  setClass("B1", contains="A1")

but trying to extend A2 or A3 produces an error (with a non-informative
message in the 1st case and a somewhat obscure one in the 2nd):

  setClass("B2", contains="A2")
  # Error in attr(prototype, slotName) <- attr(pri, slotName) :
  #   invalid first argument

  setClass("B3", contains="A3")
  # Error in attr(prototype, slotName) <- attr(pri, slotName) :
  #   'dimnames' applied to non-array

So it seems that the presence of a "dim" or "dimnames" slot prevents a
class from being extended. Is this expected? I couldn't find anything
in TFM about this. Sorry if I missed it.
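
Whether a given R installation is affected can be probed without aborting a script. The sketch below is an assumption-laden illustration: can_extend and "ProbeSubclass" are hypothetical names, and the result is deliberately not stated because it depends on the R version.

```r
library(methods)

setClass("A2", slots = c(dim = "integer"))

# Returns TRUE if a subclass of 'parent' can be defined, FALSE otherwise.
# "ProbeSubclass" is a hypothetical name used only for this check.
can_extend <- function(parent) {
  tryCatch({
    setClass("ProbeSubclass", contains = parent)
    removeClass("ProbeSubclass")
    TRUE
  }, error = function(e) FALSE)
}

can_extend("A2")
```

A FALSE result means setClass() signalled an error of the kind shown above when extending the class.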

Thanks,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31

2017-06-02 Thread Hervé Pagès

Hi,

I have a long numeric vector 'xx' and I want to use sum() to count
the number of elements that satisfy some criteria like non-zero
values or values lower than a certain threshold etc...

The problem is: sum() returns an NA (with a warning) if the count
is greater than 2^31. For example:

  > xx <- runif(3e9)
  > sum(xx < 0.9)
  [1] NA
  Warning message:
  In sum(xx < 0.9) : integer overflow - use sum(as.numeric(.))

This already takes a long time and doing sum(as.numeric(.)) would
take even longer and require allocation of 24Gb of memory just to
store an intermediate numeric vector made of 0s and 1s. Plus, having
to do sum(as.numeric(.)) every time I need to count things is not
convenient and is easy to forget.
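
In the meantime, the large intermediate vector can be avoided by counting in chunks and accumulating in a double. This is a minimal sketch, not a proposed change to sum(); count_if and its chunk_size argument are hypothetical, and chunk_size is assumed to be well below 2^31 so that each per-chunk sum fits in an integer.

```r
# Hypothetical helper: count elements of x satisfying 'pred', chunk by chunk.
count_if <- function(x, pred, chunk_size = 1e7) {
  total <- 0                      # double accumulator, exact up to 2^53
  n <- length(x)
  start <- 1
  while (start <= n) {
    end <- min(start + chunk_size - 1, n)
    total <- total + sum(pred(x[start:end]))
    start <- end + 1
  }
  total
}

# Small demonstration with a tiny chunk size.
count_if(c(0.1, 0.95, 0.5, 0.99, 0.2), function(v) v < 0.9, chunk_size = 2)
```

Only one chunk of the logical vector exists at a time, so peak memory is governed by chunk_size rather than by length(x).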

It seems that sum() on a logical vector could be modified to return
the count as a double when it cannot be represented as an integer.
Note that length() already does this so that wouldn't create a
precedent. Also and FWIW prod() avoids the problem by always returning
a double, whatever the type of the input is (except on a complex
vector).

I can provide a patch if this change sounds reasonable.

Cheers,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-16 Thread Hervé Pagès

On 05/16/2017 09:59 AM, peter dalgaard wrote:



On 16 May 2017, at 18:37 , Suharto Anggono Suharto Anggono via R-devel 
 wrote:

switch(i, ...)
extracts 'i'-th argument in '...'. It is like
eval(as.name(paste0("..", i))) .


Hey, that's pretty neat!


Indeed! Seems like this topic is even more connected to switch()
than I anticipated...

H.



-pd



Just mentioning other things:
- For 'n',
n <- nargs()
can be used.
- sys.call() can be used in place of match.call() .
---
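
Putting the two suggestions above together (switch(i, ...) to force a single argument, nargs() for the count) gives a sequential check along these lines. It is a sketch only: stop_at_first is a hypothetical name and it omits the message construction that the real stopifnot() performs.

```r
# Hypothetical sequential variant: arguments after the first failure
# are never evaluated.
stop_at_first <- function(...) {
  n <- nargs()
  for (i in seq_len(n)) {
    ok <- switch(i, ...)          # forces only the i-th argument
    if (!(is.logical(ok) && !anyNA(ok) && all(ok)))
      stop(sprintf("argument %d is not all TRUE", i), call. = FALSE)
  }
  invisible(NULL)
}

stop_at_first(2 + 2 == 4, 1 < 2)
try(stop_at_first(2 + 2 == 5, print("Hey!!!") == "Hey!!!"))
```

In the second call the print() expression is never evaluated, because the loop stops at the first argument.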

peter dalgaard 
   on Mon, 15 May 2017 16:28:42 +0200 writes:



I think Hervé's idea was just that if switch can evaluate arguments 
selectively, so can stopifnot(). But switch() is .Primitive, so does it from C.


if he just meant that, then "yes, of course" (but not so interesting).


I think it is almost a no-brainer to implement a sequential stopifnot if 
dropping to C code is allowed. In R it gets trickier, but how about this:


Something like this, yes, that's close to what Serguei Sokol had proposed
(and of course I *do*  want to keep the current sophistication
of stopifnot(), so this is really too simple)


Stopifnot <- function(...)
{
    n <- length(match.call()) - 1
    for (i in 1:n)
    {
        nm <- as.name(paste0("..",i))
        if (!eval(nm)) stop("not all true")
    }
}
Stopifnot(2+2==4)
Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!")
Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!")
Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T)




On 15 May 2017, at 15:37 , Martin Maechler  
wrote:

I'm still curious about Hervé's idea on using  switch()  for the
issue.



--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-15 Thread Hervé Pagès

On 05/15/2017 07:28 AM, peter dalgaard wrote:

I think Hervé's idea was just that if switch can evaluate arguments 
selectively, so can stopifnot().


Yep.

Thanks,
H.


But switch() is .Primitive, so does it from C.

I think it is almost a no-brainer to implement a sequential stopifnot if 
dropping to C code is allowed. In R it gets trickier, but how about this:

Stopifnot <- function(...)
{
  n <- length(match.call()) - 1
  for (i in 1:n)
  {
    nm <- as.name(paste0("..",i))
    if (!eval(nm)) stop("not all true")
  }
}
Stopifnot(2+2==4)
Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!")
Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!")
Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T)



On 15 May 2017, at 15:37 , Martin Maechler  wrote:

I'm still curious about Hervé's idea on using  switch()  for the
issue.




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
