Re: [Bioc-devel] as.data.frame for GRanges when one meta column is a data frame

2018-07-05 Thread Jialin Ma
Dear Hervé,

It seems that the printing method is broken not because the data frame
contains a nested data frame, but because the nested data frame has the
"AsIs" class, for example:

> df <- data.frame(x = c(1,2))
> df$d <- data.frame(z = c(3,4))
> df
  x z
1 1 3
2 2 4
> str(df)
'data.frame':   2 obs. of  2 variables:
 $ x: num  1 2
 $ d:'data.frame':  2 obs. of  1 variable:
  ..$ z: num  3 4
> df$d <- I(data.frame(z = c(3,4)))
> df
Error in dim(rvec) <- dim(x) : 
  dims [product 2] do not match the length of object [1]
> str(df)
'data.frame':   2 obs. of  2 variables:
 $ x: num  1 2
 $ d:Classes ‘AsIs’ and 'data.frame':   2 obs. of  1 variable:
  ..$ z: num  3 4
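
A minimal sketch of a workaround, assuming the only obstacle to printing
is the "AsIs" class on the nested column, as the two str() outputs above
suggest. The helper name unprotect_df_columns() is made up for this
illustration and is not part of base R or any package:

```r
# Hypothetical helper: remove the "AsIs" class from data.frame-valued columns
# so that each such column is an ordinary nested data.frame again.
unprotect_df_columns <- function(df) {
  stopifnot(is.data.frame(df))
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.data.frame(col) && inherits(col, "AsIs")) {
      class(col) <- setdiff(class(col), "AsIs")
      df[[nm]] <- col
    }
  }
  df
}

df <- data.frame(x = c(1, 2))
df$d <- I(data.frame(z = c(3, 4)))   # nested column protected with I()
class(df$d)                          # "AsIs" "data.frame"

fixed <- unprotect_df_columns(df)
class(fixed$d)                       # "data.frame"
print(fixed)                         # same situation as the first example above
```

The helper only touches columns that are data frames, so other I()
columns (for example protected character vectors) are left alone.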


Also, as far as I know, nested data frames are used in some packages
such as jsonlite:

> df <- data.frame(x = c(1,2))
> df$d <- data.frame(z = c(3,4))
> jsonlite::toJSON(df)
[{"x":1,"d":{"z":3}},{"x":2,"d":{"z":4}}] 

> str(jsonlite::fromJSON(txt = '[{"x":1,"d":{"z":3}},{"x":2,"d":{"z":4}}]'))
'data.frame':   2 obs. of  2 variables:
 $ x: int  1 2
 $ d:'data.frame':  2 obs. of  1 variable:
  ..$ z: int  3 4
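
As a side note, jsonlite also has a flatten() function for nested data
frames like this one. A short sketch, assuming jsonlite is installed
(the code does nothing otherwise). As far as I know the flattened
columns are named by joining the outer and inner names with a dot, but
the str() call shows what the installed version actually does:

```r
json <- '[{"x":1,"d":{"z":3}},{"x":2,"d":{"z":4}}]'

if (requireNamespace("jsonlite", quietly = TRUE)) {
  nested <- jsonlite::fromJSON(json)   # column d is itself a data.frame
  flat <- jsonlite::flatten(nested)    # nested columns become top-level columns
  str(nested)
  str(flat)
}
```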

But I agree with you that it may be more consistent to flatten the
nested data frame. I will make changes to my package in order to fix
the errors.
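
One way such a change could look, as a rough sketch only: a small
accessor that works whether the converted data frame still has the
nested column (old behavior) or has it flattened (new behavior). The
function nested_or_flat() and the candidate column names it tries are
assumptions made for illustration, not code from TnT or S4Vectors:

```r
# Hypothetical accessor: fetch the values of `inner` whether they live in a
# nested data.frame column `outer` or in a flattened top-level column.
nested_or_flat <- function(df, outer, inner) {
  col <- df[[outer]]
  if (is.data.frame(col)) {
    return(col[[inner]])                       # nested layout
  }
  # Flattened layout: try the bare inner name, then an "outer.inner" name.
  candidates <- c(inner, paste(outer, inner, sep = "."))
  hit <- intersect(candidates, names(df))
  if (length(hit) == 0L) {
    stop("no column found for '", outer, "' / '", inner, "'")
  }
  df[[hit[1L]]]
}

# Plain data frames standing in for the two as.data.frame(gr) results.
old_style <- data.frame(start = 1:2)
old_style$df <- data.frame(x = c(0.1, 0.2))
new_style <- data.frame(start = 1:2, x = c(0.1, 0.2))

nested_or_flat(old_style, "df", "x")
nested_or_flat(new_style, "df", "x")
```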

Many thanks,
Jialin




Re: [Bioc-devel] as.data.frame for GRanges when one meta column is a data frame

2018-07-05 Thread Hervé Pagès

Hi Jialin,

Note that up to BioC 3.7, as.data.frame(gr) in your example
was returning a broken data.frame:

  > as.data.frame(gr)
  Error in dim(rvec) <- dim(x) :
    dims [product 6] do not match the length of object [1]

More precisely, the call to as.data.frame(gr) is successful and
returns a data.frame but that data.frame cannot be displayed:

  > df2 <- as.data.frame(gr)
  > df2
  Error in dim(rvec) <- dim(x) :
    dims [product 6] do not match the length of object [1]

The problem is with the print.data.frame() method:

  > print.data.frame(df2)
  Error in dim(rvec) <- dim(x) :
    dims [product 6] do not match the length of object [1]
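
The distinction between "conversion works" and "display fails" can be
checked without GenomicRanges. The sketch below uses a small base R
data frame with an I() column as a stand-in for the result of
as.data.frame(gr) under BioC <= 3.7. That stand-in is an assumption,
and whether print() actually fails may depend on the R version, so the
call is wrapped in tryCatch():

```r
# Stand-in for the pre-3.8 result: a data.frame with an AsIs data.frame column.
df2 <- data.frame(x = c(1, 2))
df2$d <- I(data.frame(z = c(3, 4)))

# The object itself is intact and can be queried.
dim(df2)
df2$d$z
str(df2)

# Only the display step is at risk; record whether it succeeds.
printed_ok <- tryCatch({
  capture.output(print(df2))
  TRUE
}, error = function(e) {
  message("print() failed: ", conditionMessage(e))
  FALSE
})
printed_ok
```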

Feel free to bring this up with the R devel folks.

Anyway, since it's not clear whether data.frame objects are actually
expected to support nesting, it's safer to have as.data.frame()
get rid of the nesting.

Furthermore: as.data.frame() **has** to "un-nest" nested objects
in the general case e.g. when the nested objects are S4
vector-like objects like Hits, GRanges, DataFrame, etc... That's
because an ordinary data.frame cannot contain these objects. So it
seems preferable to un-nest everything rather than making an exception
when the metadata column is a data.frame. In particular, this exception
would lead to inconsistent behavior if the data.frame column is replaced
with a DataFrame.

For the record, here is the commit that refactored as.data.frame()
to un-nest everything:


https://github.com/Bioconductor/S4Vectors/commit/d84bc18dea7a232061946fbfe30d2072b88705a7

With this new approach, as.data.frame() can work on "complicated"
objects i.e. on objects with an arbitrary number of nesting levels.
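
To illustrate the idea of un-nesting an arbitrary number of levels,
here is a minimal base R sketch. It is not the S4Vectors
implementation from the commit above: flatten_nested() is a made-up
name, it only handles data.frame-valued columns (no S4 objects,
matrices or list columns), it drops row names, and the "outer.inner"
naming scheme is my own choice for the example:

```r
# Hypothetical helper: recursively expand data.frame-valued columns into
# ordinary top-level columns, prefixing inner names with the outer name.
flatten_nested <- function(df, sep = ".") {
  stopifnot(is.data.frame(df))
  cols <- list()
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.data.frame(col)) {
      inner <- flatten_nested(col, sep = sep)    # handle deeper levels first
      names(inner) <- paste(nm, names(inner), sep = sep)
      cols <- c(cols, as.list(inner))
    } else {
      cols[[nm]] <- col
    }
  }
  do.call(data.frame,
          c(cols, list(check.names = FALSE, stringsAsFactors = FALSE)))
}

# Two levels of nesting: df$d is a data.frame that contains another one.
inner <- data.frame(z = c(3, 4))
inner$deep <- data.frame(w = c("a", "b"), stringsAsFactors = FALSE)
df <- data.frame(x = c(1, 2))
df$d <- inner

flat <- flatten_nested(df)
names(flat)
str(flat)
```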

Hope this makes sense.

Cheers,
H.



[Bioc-devel] as.data.frame for GRanges when one meta column is a data frame

2018-07-04 Thread Jialin Ma
Dear all,

It seems that the devel branch of Bioconductor has made
changes/improvements to the behavior of as.data.frame. When the input
is a GRanges with a metadata column that is itself a data frame,
as.data.frame in devel flattens the nested data frame. I made an
example below:

 library(GenomicRanges)
 gr <- GRanges("chr2", IRanges(1:6, width = 2))
 gr$df <- data.frame(x = runif(6))
 str(as.data.frame(gr))

which shows:

  'data.frame': 6 obs. of  6 variables:
  $ seqnames: Factor w/ 1 level "chr2": 1 1 1 1 1 1
  $ start   : int  1 2 3 4 5 6
  $ end     : int  2 3 4 5 6 7
  $ width   : int  2 2 2 2 2 2
  $ strand  : Factor w/ 3 levels "+","-","*": 3 3 3 3 3 3
  $ x       : num  0.55 0.058 0.966 0.75 0.764 ...

with session info:

  R version 3.5.0 (2018-04-23)
  Platform: x86_64-suse-linux-gnu (64-bit)
  Running under: openSUSE Tumbleweed

  attached base packages:
  [1] parallel  stats4    stats     graphics  grDevices utils     datasets
  [8] methods   base

  other attached packages:
  [1] GenomicRanges_1.33.6 GenomeInfoDb_1.17.1  IRanges_2.15.14
  [4] S4Vectors_0.19.17    BiocGenerics_0.27.1  magrittr_1.5

  loaded via a namespace (and not attached):
  [1] zlibbioc_1.27.0        compiler_3.5.0         XVector_0.21.3
  [4] tools_3.5.0            GenomeInfoDbData_1.1.0 RCurl_1.95-4.10
  [7] yaml_2.1.19            bitops_1.0-6
  

In the old version, the same code gives the following result:

  'data.frame': 6 obs. of  6 variables:
  $ seqnames: Factor w/ 1 level "chr2": 1 1 1 1 1 1
  $ start   : int  1 2 3 4 5 6
  $ end     : int  2 3 4 5 6 7
  $ width   : int  2 2 2 2 2 2
  $ strand  : Factor w/ 3 levels "+","-","*": 3 3 3 3 3 3
  $ df      :Classes ‘AsIs’ and 'data.frame':   6 obs. of  1 variable:
   ..$ x: num  0.935 0.577 0.245 0.687 0.194 ...

with session info:

  R version 3.5.0 (2018-04-23)
  Platform: x86_64-suse-linux-gnu (64-bit)
  Running under: openSUSE Tumbleweed
  
  attached base packages:
  [1] parallel  stats4    stats     graphics  grDevices utils     datasets
  [8] methods   base

  other attached packages:
  [1] GenomicRanges_1.32.3 GenomeInfoDb_1.17.1  IRanges_2.14.10
  [4] S4Vectors_0.18.3     BiocGenerics_0.27.1  magrittr_1.5

  loaded via a namespace (and not attached):
  [1] zlibbioc_1.27.0        compiler_3.5.0         BiocInstaller_1.30.0
  [4] XVector_0.21.3         tools_3.5.0            GenomeInfoDbData_1.1.0
  [7] RCurl_1.95-4.10        yaml_2.1.19            bitops_1.0-6
  

I personally feel that automatically flattening the nested data frame
may not be the right behavior. I am not sure about it, but I would
suggest keeping a data frame column as is when using as.data.frame
(and not adding the "AsIs" class, since it causes an error when
printing the converted data frame).
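
For reference, a small base R sketch of three ways a data frame can end
up inside another one. It uses only base R, not GenomicRanges, so it
illustrates the three layouts rather than what as.data.frame() on a
GRanges does internally. What print() does with the I() variant may
depend on the R version, so that call is wrapped in try():

```r
inner <- data.frame(z = c(3, 4))

# 1. Passing a data frame to data.frame() expands it into top-level columns.
flat <- data.frame(x = c(1, 2), d = inner)

# 2. Assigning with $<- keeps it as a single nested column.
nested <- data.frame(x = c(1, 2))
nested$d <- inner

# 3. Wrapping in I() also keeps it nested, but adds the "AsIs" class.
asis <- data.frame(x = c(1, 2), d = I(inner))

str(flat)
str(nested)
str(asis)

# Printing the I() variant is the step that fails in the report above.
res <- try(print(asis), silent = TRUE)
```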

Any thoughts?

Best regards,
Jialin



-------- Forwarded Message --------
From: "Shepherd, Lori" 
To: marl...@gmx.cn 
Subject: failing Bioconductor package TnT
Date: Tue, 3 Jul 2018 12:25:20 +

> Dear TnT maintainer,
> 
> I'd like to bring to your attention that the TnT package is failing
> to pass 'R CMD build' on all platforms in the devel version of
> Bioconductor (i.e. BioC 3.8):
> 
> http://bioconductor.org/checkResults/devel/bioc-LATEST/TnT
> 
> Would you mind taking a look at this? Don't hesitate to ask on the
> bioc-de...@r-project.org mailing list if you have any questions or
> need help.
> 
> 
> While devel is a place to experiment with new features, we expect
> packages to build and check cleanly in a reasonable time period and
> not stay broken for any extended period of time. The package has been
> failing since 06/11/18.
> 
> If no action is taken over the next few weeks we will begin the
> deprecation process for your package.  
> 
> 
> Thank you for your time and effort, and your continued contribution
> to Bioconductor.
> 
> Please be advised that Bioconductor has switched from svn to Git. Some
> helpful links can be found here:
> https://bioconductor.org/developers/how-to/git/
> http://bioconductor.org/developers/how-to/git/bug-fix-in-release-and-devel/
> 
> 
> 
> Lori Shepherd
> Bioconductor Core Team
> Roswell Park Cancer Institute
> Department of Biostatistics & Bioinformatics
> Elm & Carlton Streets
> Buffalo, New York 14263

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel