Re: [Rd] Change to I() in R 4.1

2020-10-30 Thread Pages, Herve

On 10/29/20 23:08, Pages, Herve wrote:
...
> 
> I can think of 2 ways to move forward:
> 
> 1. Keep I()'s current implementation but suppress the warning. We'll
> make the necessary adjustments to DataFrame() to repair columns supplied
> as I() objects. Note that we would still be in the situation where
> I() objects break validObject() but we've been in that situation for
> years and so far we've managed to work around it. However this doesn't
> mean that validObject() shouldn't be fixed. Note that print(I())
> would also need to be fixed (it says "" which is
> misleading). Anyways, these 2 issues are separated from the main issue
> and can be dealt with later.

1b. A variant of the above could be to use the old implementation for S4 
objects only:

   I <- function(x)
   {
   if (isS4(x)) {
   structure(x, class = unique.default(c("AsIs", oldClass(x
   } else {
   `class<-`(x, unique.default(c("AsIs", oldClass(x
   }
   }

That is probably a good compromise for now.

I would also suggest that the "package" attribute of the S4 class be 
kept around so the code that we use to restore the original object has a 
way to restore it exactly, including its full class specification. Right 
now, and also with the previous implementation, we cannot do that 
because attr(class(x), "package") is lost. So something like this:

   I <- function(x)
   {
   if (isS4(x)) {
   x_class <- class(x)
   new_classes <- c("AsIs", x_class)
   attr(new_classes, "package") <- attr(x_class, "package")
   structure(x, class=new_classes)
   } else {
   `class<-`(x, unique.default(c("AsIs", oldClass(x
   }
   }

Thanks,
H.


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Change to I() in R 4.1

2020-10-29 Thread Pages, Herve
Hi Martin,

On 10/26/20 04:52, Martin Maechler wrote:
>>
>> Hi there,
>> Is that change in R-devel intentional?
>>
>> library(Matrix)
>> m <- as(matrix(c(0, 1)), "sparseMatrix")
>>
>> isS4(m)
>> # [1] TRUE
>>
>> x <- I(m)
>> # Warning message:
>> # In `class<-`(x, unique.default(c("AsIs", oldClass(x :
>> #   Setting class(x) to multiple strings ("AsIs", "dgCMatrix", ...);
>> result will no longer be an S4 object
>>
>> isS4(x)
>> # [1] FALSE
>>
>> This works fine in R 4.0.3 i.e. no warning and I() doesn't turn off the
>> S4 bit of the object.
>>
>> This change breaks 17 Bioconductor packages.
>>
>> Seems that the culprit is this change in how I() is implemented:
>>
>> In R 4.0.3:
>>
>> > I
>> function (x)
>> {
>>   structure(x, class = unique(c("AsIs", oldClass(x
>> }
>>
>> In R devel:
>>
>> > I
>> function (x)
>> `class<-`(x, unique.default(c("AsIs", oldClass(x
> 
> Yes, (by me),  as  I()  was sticking out in the slowness bug PR#17794
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.r-2Dproject.org_bugzilla_show-5Fbug.cgi-3Fid-3D17794&d=DwIDAw&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=ej3wnc10LiheZsqjRuonTr2WwHWU4ecaDSrTBFby8wU&s=isOqEV-_6Yk1PlRZBoBchZHZYnpQxPGEZdZPgTHMkKg&e=
> 
> and the direct dangerous  `call<-`  will be replaced happily in
> I()'s definition.
> 
> *But* as Luke Tierney had remarked to R-core, direct changing of
> the class of an S4 object has given the above warning for a
> quite while, (svn r47934 | jmc | 2009-02-17 )
> *and* it has rather been an inconsistency in R, that you could
> still use "low-level" means to change the class of an S4 object
> to something invalid.
> 
> I really don't think people should be allowed to use I() to
> change a valid S4 object into an invalid one, but this is what
> happens (R 4.0.3 patched) :
> 
>> require(Matrix); M <- Matrix(0, 2,3); IM <- I(M)
>> validObject(IM)
> Error in .classEnv(classDef) :
>trying to get slot "package" from an object of a basic class ("NULL") with 
> no slots
>> IM
>   --- FAILURE REPORT --
>   --- failure: length > 1 in coercion to logical ---
>   --- srcref ---
> :
>   --- package (from environment) ---
> methods
>   --- call from context ---
> showDefault(object)
>   --- call from argument ---
> !is.null(clDef) && isS4(object) && is.na(match(clDef@className,
>  .BasicClasses))
>   --- R stacktrace ---
> where 1: showDefault(object)
> where 2: Error in showDefault(object) :
>cannot get a slot ("slots") from an object of type "NULL"

The way I interpret this is that 'IM' breaks validObject(). This is not 
exactly the same as saying that 'IM' is invalid.

> 
>> Unfortunately there is a bunch of code around that calls I() on S4
>> objects, admittedly not necessarily for very good reasons, but it
>> happens. Would it be possible that I() has a less destructive effect on
>> S4 objects?
> 
> I'm not sure if this is really desirable... but I may fail to
> see the point of allowing invalid   I()  objects as they
> appear in R 4.0.x  ..
> 
> So what do you really propose that  I(.) should be doing, e.g.,
> for  'M'  above ?

To provide some context, people typically use I() when they construct a 
data frame as a way to tell the data frame constructor to treat the 
supplied columns as-is. In Bioconductor we have DataFrame, a 
data-frame-like structure where the columns can be anything, including 
S4 objects, has long they have a vector-like semantic. Like 
data.frame(), the DataFrame() constructor will also treat columns that 
carry the AsIs tag as-is.

A typical use case where this feature is used on an S4 column is to nest 
DataFrame objects. For example, with R 4.0.3:

   library(S4Vectors)
   df <- DataFrame(X=1:3, Y=letters[1:3])

   DataFrame(a=df, Z=11:13)
   # DataFrame with 3 rows and 3 columns
   # a.X a.Y Z
   # 
   # 1 1   a11
   # 2 2   b12
   # 3 3   c13

But if we wrap 'df' in I():

   DataFrame(a=I(df), Z=11:13)
   # DataFrame with 3 rows and 2 columns
   # a Z
   #
   # 1 1:a11
   # 2 2:b12
   # 3 3:c13

'df' ends up in the 1st column of the returned DataFrame.

AFAICT the fact that wrapping 'df' in I() was producing some kind of 
Frankenstein object that breaks validObject() has never been a problem 
in practice because the Dataframe() constructor immediately removes the 
AsIs tag internally. This restores the original object. Note that this 
is an important difference with data.frame() where the AsIs tag sticks 
around:

   class(DataFrame(a=I(letters))$a)
   # [1] "character"

   class(data.frame(a=I(letters))$a)
   # [1] "AsIs"

Now the new behavior of I() makes a little bit more damage to an S4 
object. In addition to breaking validObject(), like the old behavior 
did, it removes it

Re: [Rd] Change to I() in R 4.1

2020-10-26 Thread Martin Maechler
> 
> Hi there,
> Is that change in R-devel intentional?
> 
>library(Matrix)
>m <- as(matrix(c(0, 1)), "sparseMatrix")
> 
>isS4(m)
># [1] TRUE
> 
>x <- I(m)
># Warning message:
># In `class<-`(x, unique.default(c("AsIs", oldClass(x :
>#   Setting class(x) to multiple strings ("AsIs", "dgCMatrix", ...); 
> result will no longer be an S4 object
> 
>isS4(x)
># [1] FALSE
> 
> This works fine in R 4.0.3 i.e. no warning and I() doesn't turn off the 
> S4 bit of the object.
> 
> This change breaks 17 Bioconductor packages.
> 
> Seems that the culprit is this change in how I() is implemented:
> 
> In R 4.0.3:
> 
>> I
>function (x)
>{
>  structure(x, class = unique(c("AsIs", oldClass(x
>}
> 
> In R devel:
> 
>> I
>function (x)
>`class<-`(x, unique.default(c("AsIs", oldClass(x

Yes, (by me),  as  I()  was sticking out in the slowness bug PR#17794
  https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17794

and the direct dangerous  `call<-`  will be replaced happily in
I()'s definition.

*But* as Luke Tierney had remarked to R-core, direct changing of
the class of an S4 object has given the above warning for a
quite while, (svn r47934 | jmc | 2009-02-17 )
*and* it has rather been an inconsistency in R, that you could
still use "low-level" means to change the class of an S4 object
to something invalid.

I really don't think people should be allowed to use I() to
change a valid S4 object into an invalid one, but this is what
happens (R 4.0.3 patched) :

> require(Matrix); M <- Matrix(0, 2,3); IM <- I(M)
> validObject(IM)
Error in .classEnv(classDef) : 
  trying to get slot "package" from an object of a basic class ("NULL") with no 
slots
> IM
 --- FAILURE REPORT -- 
 --- failure: length > 1 in coercion to logical ---
 --- srcref --- 
: 
 --- package (from environment) --- 
methods
 --- call from context --- 
showDefault(object)
 --- call from argument --- 
!is.null(clDef) && isS4(object) && is.na(match(clDef@className, 
.BasicClasses))
 --- R stacktrace ---
where 1: showDefault(object)
where 2: Error in showDefault(object) : 
  cannot get a slot ("slots") from an object of type "NULL"
> 


> Unfortunately there is a bunch of code around that calls I() on S4 
> objects, admittedly not necessarily for very good reasons, but it 
> happens. Would it be possible that I() has a less destructive effect on 
> S4 objects?

I'm not sure if this is really desirable... but I may fail to
see the point of allowing invalid   I()  objects as they
appear in R 4.0.x  ..

So what do you really propose that  I(.) should be doing, e.g.,
for  'M'  above ?

Martin


> 
> Thanks,
> H.
> 
>  > sessionInfo()
> R Under development (unstable) (2020-10-17 r79346)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04.1 LTS
> 
> Matrix products: default
> BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
> LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
> 
> locale:
>   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>   [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
> 
> other attached packages:
> [1] Matrix_1.2-18
> 
> loaded via a namespace (and not attached):
> [1] compiler_4.1.0  grid_4.1.0  lattice_0.20-41
> 
> -- 
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Change to I() in R 4.1

2020-10-23 Thread Pages, Herve
Hi there,

Is that change in R-devel intentional?

   library(Matrix)
   m <- as(matrix(c(0, 1)), "sparseMatrix")

   isS4(m)
   # [1] TRUE

   x <- I(m)
   # Warning message:
   # In `class<-`(x, unique.default(c("AsIs", oldClass(x :
   #   Setting class(x) to multiple strings ("AsIs", "dgCMatrix", ...); 
result will no longer be an S4 object

   isS4(x)
   # [1] FALSE

This works fine in R 4.0.3 i.e. no warning and I() doesn't turn off the 
S4 bit of the object.

This change breaks 17 Bioconductor packages.

Seems that the culprit is this change in how I() is implemented:

In R 4.0.3:

   > I
   function (x)
   {
 structure(x, class = unique(c("AsIs", oldClass(x
   }

In R devel:

   > I
   function (x)
   `class<-`(x, unique.default(c("AsIs", oldClass(x

Unfortunately there is a bunch of code around that calls I() on S4 
objects, admittedly not necessarily for very good reasons, but it 
happens. Would it be possible that I() has a less destructive effect on 
S4 objects?

Thanks,
H.

 > sessionInfo()
R Under development (unstable) (2020-10-17 r79346)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] Matrix_1.2-18

loaded via a namespace (and not attached):
[1] compiler_4.1.0  grid_4.1.0  lattice_0.20-41

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel