date:20150401

Re: [Bioc-devel] SummarizedExperiment subset of 4 dimensions

2015-04-01 Thread Michael Lawrence

It would be nice if someone from Seattle would weigh in on this.

Also, we might want to consider an assayMatrix() accessor that always
returns an assay in 2D, except, as you suggest, it might be a matrix of
multiples (vectors, matrices, etc) by putting dimensions on a list. That
way, generic code can at least assume consistent dimensionality, even if
the values are complex. I don't really have any use cases though; just
seems possibly beneficial in the abstract.
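
The "matrix of multiples" idea can be sketched in base R by putting a dim attribute on a list. This is only an illustration: assayMatrix() here is a hypothetical helper written for this note, not an existing SummarizedExperiment accessor, and it assumes the first two dimensions are features and samples.

```r
# Hypothetical sketch: collapse an N-dimensional assay (N > 2) into a
# 2D list-matrix whose cells hold the remaining dimensions.
assayMatrix <- function(a) {
  d <- dim(a)
  if (length(d) <= 2L) return(a)          # already 2D, nothing to do
  cells <- vector("list", d[1] * d[2])
  dim(cells) <- d[1:2]                    # a list with dimensions
  dimnames(cells) <- dimnames(a)[1:2]
  extra <- rep(list(TRUE), length(d) - 2L)  # select all of dims 3, 4, ...
  for (i in seq_len(d[1])) {
    for (j in seq_len(d[2])) {
      # equivalent to a[i, j, , ] for however many trailing dims exist
      cells[[i, j]] <- do.call(`[`, c(list(a, i, j), extra))
    }
  }
  cells
}

a4 <- array(1:24, dim = c(2, 3, 2, 2))
m <- assayMatrix(a4)
dim(m)      # always two dimensions
m[[2, 3]]   # the 2 x 2 block measured for feature 2, sample 3
```

Generic code can then rely on dim(m) having length 2, while each cell still carries the vector or matrix of values for that feature and sample.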

On Wed, Apr 1, 2015 at 1:19 AM, Jesper Gådin jesper.ga...@gmail.com wrote:

 Hi Wolfgang and Michael,

 As Michael says, there is no redundant information in the 4D array I have,
 and all the values are integers.

 Of course I can simulate 4D by e.g. creating extra 3D arrays as assays
 equal to the length of the fourth dimension, but that makes the assay list
 a mess. It would also require me to write accessor functions that
 transforms the data into 4D before subsequent calculations (or to use a for
 loop..).

 Another option would be to include the 4D as a multiple in the 3D, which
 would not require a later transformation into 4D. If I understood correct,
 the array is just a long vector, which is indexed into different
 dimensions, and so everything in an SE object could as well be written as
 2D. But (my belief is that) it is actually convenient to use the properties
 of dimensions for arrays.
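
 A small base R sketch of the "array is just a long vector" point. The dimension sizes are made up for illustration; linearIndex() is a hypothetical helper that spells out R's column-major layout.

```r
# A 4D array is a plain vector plus a dim attribute.
a4 <- array(1:120, dim = c(2, 3, 4, 5))
d <- dim(a4)

# Position of a4[i, j, k, l] in the underlying vector (column-major order)
linearIndex <- function(i, j, k, l, d) {
  i + (j - 1) * d[1] + (k - 1) * d[1] * d[2] + (l - 1) * d[1] * d[2] * d[3]
}
as.vector(a4)[linearIndex(2, 3, 1, 4, d)] == a4[2, 3, 1, 4]

# Folding the 4th dimension into the 3rd only changes the dim attribute;
# the data are untouched.
a3 <- a4
dim(a3) <- c(d[1], d[2], d[3] * d[4])
a3[2, 3, (4 - 1) * d[3] + 1] == a4[2, 3, 1, 4]
```

 The same values are reachable either way; the 4D form just saves the index arithmetic.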

 So if there is not a problem extending to 4D, I would be extremely
 grateful if you could take a look at it. :)

 Regards,
 Jesper

 On Tue, Mar 31, 2015 at 2:16 PM, Michael Lawrence 
 lawrence.mich...@gene.com wrote:

 One would need a long-form colData that aligns with the array.

 But now I realize that's not what Jesper wants to do here, and is not how
 SE is currently designed. Jesper is using the third (and now fourth)
 dimension to store an additional dimension of information about the same
 sample. We already support 3D arrays for this, presumably motivated by VCF,
 where, for example, each sample can have a probability for WT, het, or hom
 at each position. In that case, all of the values are genotype likelihoods,
 i.e., they all measure the same thing, so they seem to belong in the same
 assay. But they're also the same biological sample. Essentially, we have
 complex measurements that might be a vector, or for Jesper even a matrix.

 The important question for interoperability is whether we want there to
 be a contract that assays are always two dimensions. I guess we've already
 violated that with VCF. Extending to a fourth is not really hurting
 anything.


 On Tue, Mar 31, 2015 at 4:52 AM, Wolfgang Huber whu...@embl.de wrote:


 Hi Michael

 where would you put the “colData”-style metadata for the 3rd, 4th, …
 dimensions?

 As an (ex-)physicist of course I like arrays, and the more dimensions
 the better, but in practical work I’ve consistently been bitten by the
 rigidity of such a design choice too early in a process.

 Wolfgang

 On 31 Mar 2015, at 13:32, Michael Lawrence lawrence.mich...@gene.com
 wrote:

 Taken in the abstract, the tidy data argument is one for consistent data
 structures that enable interoperability, which is what we have with
 SummarizedExperiment. The long form or tidy data frame is an effective
 general representation, but if there is additional structure in your data,
 why not represent it formally? Given the way R lays out the data in arrays,
 it should be possible to add that fourth dimension, in an assay array,
 while still using the colData to annotate that structure. It does not make
 the data any less tidy, but it does make it more structured.

 On Tue, Mar 31, 2015 at 4:14 AM, Wolfgang Huber whu...@embl.de wrote:

 Dear Jesper

 this is maybe not the answer you want to hear, but stuffing in 4, 5, …
 dimensions may not be all that useful, as you can always roll out these
 higher dimensions into the existing third (or even into the second, the
 SummarizedExperiment columns). There is Hadley’s concept of “tidy data”
 (see e.g. http://www.jstatsoft.org/v59/i10 ) — a paper that is really
 worthwhile to read — which implies that the tidy way forward is to stay
 with 2 (or maybe 3) dimensions in SummarizedExperiment, and to record the
 information that you’d otherwise stuff into the higher dimensions in the
 colData covariates.
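
 The "roll out" alternative might look like the following base R sketch. The dimension names (sample, allele, strand) are invented for illustration, and colData here is a plain data.frame standing in for the SummarizedExperiment colData.

```r
# 4D assay: features x samples x alleles x strands (meanings are hypothetical)
a4 <- array(seq_len(2 * 3 * 2 * 2), dim = c(2, 3, 2, 2),
            dimnames = list(paste0("f", 1:2), paste0("s", 1:3),
                            c("ref", "alt"), c("+", "-")))

# Roll dimensions 3 and 4 into the columns: one column per
# (sample, allele, strand) combination.
m <- matrix(a4, nrow = dim(a4)[1])
rownames(m) <- dimnames(a4)[[1]]

# Long-form column annotation aligned with the columns of m;
# expand.grid varies its first factor fastest, matching R's array layout.
colData <- expand.grid(sample = dimnames(a4)[[2]],
                       allele = dimnames(a4)[[3]],
                       strand = dimnames(a4)[[4]],
                       stringsAsFactors = FALSE)

colData[5, ]   # describes column 5 of m
m[, 5]         # same values as a4[, "s2", "alt", "+"]
```

 The information is identical in both forms; the difference is whether the extra structure lives in the array dimensions or in the column annotation.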

 Wolfgang

 Wolfgang Huber
 Principal Investigator, EMBL Senior Scientist
 Genome Biology Unit
 European Molecular Biology Laboratory (EMBL)
 Heidelberg, Germany

 T +49-6221-3878823
 wolfgang.hu...@embl.de
 http://www.huber.embl.de





  On 30 Mar 2015, at 12:38, Jesper Gådin jesper.ga...@gmail.com
 wrote:
 
  Hi!
 
  The SummarizedExperiment class is an extremely powerful container for
  biological data (thank you!), and all my thinking nowadays is just circling
  around how to stuff it as effectively as possible.
 
  Have been using 3 dimensions for a long time, which has been very
  successful. Now I also have a case for using 4 dimensions. Everything
  seemed to work as expected

Re: [Bioc-devel] SummarizedExperiment subset of 4 dimensions

2015-04-01 Thread Martin Morgan


On 04/01/2015 05:08 AM, Michael Lawrence wrote:

It would be nice if someone from Seattle would weigh in on this.


I was hoping to weigh in with 'it's done' but will instead with 'it will be 
done'.

A second aspect of Jesper's data that took me a little by surprise and is 
related to Michael's comment below was that assays() can simultaneously hold 
arrays of 2, 3, (and 4) dimensions.
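
To illustrate with base R only: a plain list of arrays stands in for assays() below, and sameFirstTwo() and subsetRows() are hypothetical helpers, not functions from any package. The sketch assumes the only requirement is that all assays agree on their first two dimensions.

```r
# Assays of different dimensionality that share features (4) and samples (3)
assays <- list(
  counts   = matrix(0L, 4, 3),
  genotype = array(0, dim = c(4, 3, 3)),
  allele   = array(0L, dim = c(4, 3, 2, 2))
)

# Number of dimensions of each assay
vapply(assays, function(a) length(dim(a)), integer(1))

# Check that every assay has the same first two dimensions
sameFirstTwo <- function(assays) {
  d2 <- lapply(assays, function(a) dim(a)[1:2])
  all(vapply(d2, identical, logical(1), d2[[1]]))
}
sameFirstTwo(assays)

# Row subsetting that works for any number of dimensions:
# builds the call a[i, , ..., drop = FALSE]
subsetRows <- function(a, i) {
  args <- c(list(a, i), rep(list(TRUE), length(dim(a)) - 1L),
            list(drop = FALSE))
  do.call(`[`, args)
}
lapply(assays, function(a) dim(subsetRows(a, 1:2)))
```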


Martin



Also, we might want to consider an assayMatrix() accessor that always
returns an assay in 2D, except, as you suggest, it might be a matrix of
multiples (vectors, matrices, etc) by putting dimensions on a list. That
way, generic code can at least assume consistent dimensionality, even if
the values are complex. I don't really have any use cases though; just
seems possibly beneficial in the abstract.

On Wed, Apr 1, 2015 at 1:19 AM, Jesper Gådin jesper.ga...@gmail.com wrote:


Hi Wolfgang and Michael,

As Michael says, there is no redundant information in the 4D array I have,
and all the values are integers.

Of course I can simulate 4D by e.g. creating extra 3D arrays as assays
equal to the length of the fourth dimension, but that makes the assay list
a mess. It would also require me to write accessor functions that
transforms the data into 4D before subsequent calculations (or to use a for
loop..).

Another option would be to include the 4D as a multiple in the 3D, which
would not require a later transformation into 4D. If I understood correct,
the array is just a long vector, which is indexed into different
dimensions, and so everything in an SE object could as well be written as
2D. But (my belief is that) it is actually convenient to use the properties
of dimensions for arrays.

So if there is not a problem extending to 4D, I would be extremely
grateful if you could take a look at it. :)

Regards,
Jesper

On Tue, Mar 31, 2015 at 2:16 PM, Michael Lawrence 
lawrence.mich...@gene.com wrote:


One would need a long-form colData that aligns with the array.

But now I realize that's not what Jesper wants to do here, and is not how
SE is currently designed. Jesper is using the third (and now fourth)
dimension to store an additional dimension of information about the same
sample. We already support 3D arrays for this, presumably motivated by VCF,
where, for example, each sample can have a probability for WT, het, or hom
at each position. In that case, all of the values are genotype likelihoods,
i.e., they all measure the same thing, so they seem to belong in the same
assay. But they're also the same biological sample. Essentially, we have
complex measurements that might be a vector, or for Jesper even a matrix.

The important question for interoperability is whether we want there to
be a contract that assays are always two dimensions. I guess we've already
violated that with VCF. Extending to a fourth is not really hurting
anything.


On Tue, Mar 31, 2015 at 4:52 AM, Wolfgang Huber whu...@embl.de wrote:



Hi Michael

where would you put the “colData”-style metadata for the 3rd, 4th, …
dimensions?

As an (ex-)physicist of course I like arrays, and the more dimensions
the better, but in practical work I’ve consistently been bitten by the
rigidity of such a design choice too early in a process.

Wolfgang

On 31 Mar 2015, at 13:32, Michael Lawrence lawrence.mich...@gene.com
wrote:

Taken in the abstract, the tidy data argument is one for consistent data
structures that enable interoperability, which is what we have with
SummarizedExperiment. The long form or tidy data frame is an effective
general representation, but if there is additional structure in your data,
why not represent it formally? Given the way R lays out the data in arrays,
it should be possible to add that fourth dimension, in an assay array,
while still using the colData to annotate that structure. It does not make
the data any less tidy, but it does make it more structured.

On Tue, Mar 31, 2015 at 4:14 AM, Wolfgang Huber whu...@embl.de wrote:


Dear Jesper

this is maybe not the answer you want to hear, but stuffing in 4, 5, …
dimensions may not be all that useful, as you can always roll out these
higher dimensions into the existing third (or even into the second, the
SummarizedExperiment columns). There is Hadley’s concept of “tidy data”
(see e.g. http://www.jstatsoft.org/v59/i10 ) — a paper that is really
worthwhile to read — which implies that the tidy way forward is to stay
with 2 (or maybe 3) dimensions in SummarizedExperiment, and to record the
information that you’d otherwise stuff into the higher dimensions in the
colData covariates.

Wolfgang

Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
Genome Biology Unit
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

T +49-6221-3878823
wolfgang.hu...@embl.de
http://www.huber.embl.de






On 30 Mar 2015, at 12:38, Jesper Gådin jesper.ga...@gmail.com

wrote:


Hi!

The SummarizedExperiment class is an extremely powerful container for
biological data(thank you!), and all my

Re: [Bioc-devel] issue about S4 slot has a dist object.

2015-04-01 Thread Hervé Pagès


On 04/01/2015 05:05 PM, Michael Lawrence wrote:

So this explains why I wasn't able to figure out how that package was
importing graph, and, yes, I also thought it was strange that graph did
not export it. The methods package explicitly conditions on the warn
level, so it is apparently intentional. It just looks in the global
class table for duplicates, so it does not pay attention to the
namespace. It's not clear to what extent the methods package assumes
that there are no duplicates in the class table; probably too much work
to fix.

As for sharing class definitions, perhaps the methods package should
define it. It already defines classes from the stats package, like
aov. We could start moving more stuff from BiocGenerics to methods.


Sounds good. The more upstream these class definitions are the better.

Just moved setOldClass("dist") from graph to BiocGenerics 0.13.11 and
exported the dist class.
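
For readers following along, here is a minimal standalone sketch of what setOldClass() buys you when an S4 class needs a dist slot. The class name DistHolder is hypothetical. The sketch calls setOldClass() itself so that it runs in a plain R session; in a package that imports the shared definition, that call would presumably be dropped.

```r
library(methods)

# Register the S3 class "dist" (from stats) so it can be used as an
# S4 slot type. A package relying on a shared upstream definition
# would not repeat this call.
setOldClass("dist")

# Hypothetical S4 class with a slot that holds a dist object
setClass("DistHolder", representation(distance = "dist"))

# Two points, (0, 0) and (3, 4), so the Euclidean distance is 5
d <- dist(matrix(c(0, 3, 0, 4), ncol = 2))
obj <- new("DistHolder", distance = d)

is(obj@distance, "dist")
as.numeric(obj@distance)
```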

H.





On Wed, Apr 1, 2015 at 4:29 PM, Martin Morgan mtmor...@fredhutch.org
wrote:

On 04/01/2015 03:52 PM, Hervé Pagès wrote:

Hi,

In the same way that we avoid having 2 packages define the same
S4 generic function by moving the shared generic definitions to
BiocGenerics, it seems that we should also avoid having 2 packages
call setOldClass on the same S3 class. Like with S4 generic
functions,
we've already started to do this by putting some setOldClass
statements in BiocGenerics (e.g. we've done it for the 'connection'
classes 'file', 'url', 'gzfile', 'bzfile', etc..., see
class?gzfile).
So if nobody objects we'll do this for the 'dist' class too.

Then you won't need to use setOldClass in your cogena package
Zhilong.
You'll just need to make sure that you import BiocGenerics.


This sounds like an ok work-around to me.

For the hard-core...

One thing is that this is not seen when the package is loaded by
itself, e.g.,

  biocLite("zhilongjia/cogena")
  library(cogena)
 

only when loaded by BiocCheck (on the source directory)

  BiocCheck::BiocCheck("cogena")
* This is BiocCheck, version 1.3.13.
* BiocCheck is a work in progress. Output and severity of issues may
   change.
* Installing package...
Note: the specification for S3 class dist in package 'cogena'
seems equivalent to one from package 'graph': not turning on
duplicate class definitions for this class.
^C

This is because BiocCheck (indirectly?) Imports: graph. But the old
class definition seems to 'leak' (even though graph is not on the
search path, and the dist old class is not exported from graph, and
BiocCheck doesn't import the non-exported dist class, and cogena
doesn't Depend or Import graph!)

Also of interest perhaps is that the Note is only printed when
warn=1 (which BiocCheck also uses)

(new R session)

  requireNamespace("graph")
Loading required namespace: graph
  requireNamespace("cogena")
Loading required namespace: cogena
  q()

(new R session:)

  options(warn=1)
  requireNamespace("graph")
Loading required namespace: graph
  requireNamespace("cogena")
Loading required namespace: cogena

Note: the specification for S3 class dist in package 'cogena'
seems equivalent to one from package 'graph': not turning on
duplicate class definitions for this class.
 



Cheers,
H.


On 04/01/2015 03:28 PM, Michael Lawrence wrote:

Using setOldClass is generally fine. In this case, the graph
package is
already defining the dist class, so you could just import
that. The graph
package might have to export it though.

On Wed, Apr 1, 2015 at 3:15 PM, Zhilong Jia
zhilong...@gmail.com wrote:

Hi,

Here is the package (https://tracker.bioconductor.org/issue1204 or
https://github.com/zhilongjia/cogena). When I biocCheck it, there is a
note.

Note: the specification for S3 class “dist” in package
‘cogena’ seems
equivalent to one from package ‘graph’: not turning on
duplicate class
definitions for this class.


In the source code, there are two R files related to this issue,
cogena_class.R
(https://github.com/zhilongjia/cogena/blob/master/R/cogena_class.R) and
dist_class.R

Re: [Bioc-devel] issue about S4 slot has a dist object.

2015-04-01 Thread Michael Lawrence

I used my unreleased rgtags package to search for references to setOldClass
in Bioc:

library(rgtags)
classes <- sub(".*\\.", "", methods(print))
defs <- do.call(c, lapply(classes, findDefinitions))
oldClass <- lapply(expr(defs), `[[`, 1L) == quote(setOldClass)
defs[oldClass]

   tagname         path                                              line
1  AsIs            BiocGenerics/R/S3-classes-as-S4-classes.R           20
2  POSIXlt         gmapR/R/GsnapOutput-class.R                         10
3  dist            MLInterfaces/R/AllClasses.R                        105
4  dist            MLInterfaces/inst/oldFiles/INIT.R                   15
5  dist            graph/R/AllClasses.R                                45
6  dist            phyloseq/R/allClasses.R                            151
7  formula         biovizBase/R/facets-method.R                        21
8  formula         geeni/R/gdManager-class.R                            8
9  function        gQTLstats/R/allS4.R                                  1
10 hclust          MLInterfaces/inst/oldFiles/classInterfaces.R        16
11 hclust          chroGPS/R/clusGPS.R                                  2
12 kmeans          MLInterfaces/inst/oldFiles/classInterfaces.R        17
13 numeric_version AnnotationHub/R/AnnotationHubMetadata-class.R        3
14 numeric_version AnnotationHubData/R/AnnotationHubMetadata-class.R   12
15 prcomp          MLInterfaces/R/clDesign.R                           23
16 prcomp          MLInterfaces/inst/oldFiles/classInterfaces.R        18
17 sessionInfo     GGtools/R/AllClasses.R                               1

Note that POSIXlt, formula and function are already defined by the methods
package. I have removed the calls from gmapR and biovizBase, but the
maintainers of the other packages should be notified. It looks like dist
was a good choice, because it is in three packages. The rest of the classes
are somewhat specialized. What's the deal with AnnotationHub and
AnnotationHubData? I will run a similar analysis of CRAN.
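
Since rgtags is unreleased, here is a rough base R approximation of the same idea, for anyone who wants to try it on their own sources. It is a sketch under assumptions: findOldClasses() is a hypothetical helper, it only sees top-level setOldClass() calls whose first argument is a literal, and the "package sources" are inline strings rather than real files.

```r
# Return the class names passed to top-level setOldClass() calls
# in a character vector of R source lines.
findOldClasses <- function(lines) {
  exprs <- parse(text = lines, keep.source = FALSE)
  isOld <- vapply(exprs, function(e) {
    is.call(e) && identical(e[[1L]], quote(setOldClass))
  }, logical(1))
  # first argument of each call, evaluated to a character vector
  unlist(lapply(exprs[isOld], function(e) eval(e[[2L]])))
}

# Toy stand-ins for the R sources of three packages
sources <- list(
  pkgA = c('setOldClass("dist")', 'x <- 1'),
  pkgB = c('setOldClass("dist")', 'setOldClass("hclust")'),
  pkgC = 'setOldClass("prcomp")'
)

found <- lapply(sources, findOldClasses)
counts <- table(unlist(found))

# Classes registered by more than one package are candidates
# for a shared upstream definition.
shared <- names(counts)[as.vector(counts) > 1]
shared
```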


On Wed, Apr 1, 2015 at 5:18 PM, Hervé Pagès hpa...@fredhutch.org wrote:

 On 04/01/2015 05:05 PM, Michael Lawrence wrote:

 So this explains why I wasn't able to figure out how that package was
 importing graph, and, yes, I also thought it was strange that graph did
 not export it. The methods package explicitly conditions on the warn
 level, so it is apparently intentional. It just looks in the global
 class table for duplicates, so it does not pay attention to the
 namespace. It's not clear to what extent the methods package assumes
 that there are no duplicates in the class table; probably too much work
 to fix.

 As for sharing class definitions, perhaps the methods package should
 define it. It already defines classes from the stats package, like
 aov. We could start moving more stuff from BiocGenerics to methods.


 Sounds good. The more upstream these class definitions are the better.

 Just moved setOldClass("dist") from graph to BiocGenerics 0.13.11 and
 exported the dist class.

 H.




 On Wed, Apr 1, 2015 at 4:29 PM, Martin Morgan mtmor...@fredhutch.org
 wrote:

 On 04/01/2015 03:52 PM, Hervé Pagès wrote:

 Hi,

 In the same way that we avoid having 2 packages define the same
 S4 generic function by moving the shared generic definitions to
 BiocGenerics, it seems that we should also avoid having 2 packages
 call setOldClass on the same S3 class. Like with S4 generic
 functions,
 we've already started to do this by putting some setOldClass
 statements in BiocGenerics (e.g. we've done it for the
 'connection'
 classes 'file', 'url', 'gzfile', 'bzfile', etc..., see
 class?gzfile).
 So if nobody objects we'll do this for the 'dist' class too.

 Then you won't need to use setOldClass in your cogena package
 Zhilong.
 You'll just need to make sure that you import BiocGenerics.


 This sounds like an ok work-around to me.

 For the hard-core...

 One thing is that this is not seen when the package is loaded by
 itself, e.g.,

   biocLite("zhilongjia/cogena")
   library(cogena)
  

 only when loaded by BiocCheck (on the source directory)

   BiocCheck::BiocCheck("cogena")
 * This is BiocCheck, version 1.3.13.
 * BiocCheck is a work in progress. Output and severity of issues may
change.
 * Installing package...
 Note: the specification for S3 class dist in package 'cogena'
 seems equivalent to one from package 'graph': not turning on
 duplicate class definitions for this class.
 ^C

 This is because BiocCheck (indirectly?) Imports: graph. But the old
 class definition seems to 'leak' (even though graph is not on the
 search path, and the dist old class is not exported from graph, and
 BiocCheck doesn't import the non-exported dist class, and cogena
 doesn't Depend or Import graph!)

 Also

Re: [Bioc-devel] issue about S4 slot has a dist object.

2015-04-01 Thread Michael Lawrence

So this explains why I wasn't able to figure out how that package was
importing graph, and, yes, I also thought it was strange that graph did not
export it. The methods package explicitly conditions on the warn level, so
it is apparently intentional. It just looks in the global class table for
duplicates, so it does not pay attention to the namespace. It's not clear
to what extent the methods package assumes that there are no duplicates in
the class table; probably too much work to fix.

As for sharing class definitions, perhaps the methods package should define
it. It already defines classes from the stats package, like aov. We could
start moving more stuff from BiocGenerics to methods.



On Wed, Apr 1, 2015 at 4:29 PM, Martin Morgan mtmor...@fredhutch.org
wrote:

 On 04/01/2015 03:52 PM, Hervé Pagès wrote:

 Hi,

 In the same way that we avoid having 2 packages define the same
 S4 generic function by moving the shared generic definitions to
 BiocGenerics, it seems that we should also avoid having 2 packages
 call setOldClass on the same S3 class. Like with S4 generic functions,
 we've already started to do this by putting some setOldClass
 statements in BiocGenerics (e.g. we've done it for the 'connection'
 classes 'file', 'url', 'gzfile', 'bzfile', etc..., see class?gzfile).
 So if nobody objects we'll do this for the 'dist' class too.

 Then you won't need to use setOldClass in your cogena package Zhilong.
 You'll just need to make sure that you import BiocGenerics.


 This sounds like an ok work-around to me.

 For the hard-core...

 One thing is that this is not seen when the package is loaded by itself,
 e.g.,

  biocLite("zhilongjia/cogena")
  library(cogena)
 

 only when loaded by BiocCheck (on the source directory)

  BiocCheck::BiocCheck("cogena")
 * This is BiocCheck, version 1.3.13.
 * BiocCheck is a work in progress. Output and severity of issues may
   change.
 * Installing package...
 Note: the specification for S3 class dist in package 'cogena' seems
 equivalent to one from package 'graph': not turning on duplicate class
 definitions for this class.
 ^C

 This is because BiocCheck (indirectly?) Imports: graph. But the old class
 definition seems to 'leak' (even though graph is not on the search path,
 and the dist old class is not exported from graph, and BiocCheck doesn't
 import the non-exported dist class, and cogena doesn't Depend or Import
 graph!)

 Also of interest perhaps is that the Note is only printed when warn=1
 (which BiocCheck also uses)

 (new R session)

  requireNamespace("graph")
 Loading required namespace: graph
  requireNamespace("cogena")
 Loading required namespace: cogena
  q()

 (new R session:)

  options(warn=1)
  requireNamespace("graph")
 Loading required namespace: graph
  requireNamespace("cogena")
 Loading required namespace: cogena

 Note: the specification for S3 class dist in package 'cogena' seems
 equivalent to one from package 'graph': not turning on duplicate class
 definitions for this class.
 



 Cheers,
 H.


 On 04/01/2015 03:28 PM, Michael Lawrence wrote:

 Using setOldClass is generally fine. In this case, the graph package is
 already defining the dist class, so you could just import that. The graph
 package might have to export it though.

 On Wed, Apr 1, 2015 at 3:15 PM, Zhilong Jia zhilong...@gmail.com
 wrote:

  Hi,

 Here is the package. (https://tracker.bioconductor.org/issue1204 or
 https://github.com/zhilongjia/cogena; ). When I biocCheck it, there is
 a
 note.

 Note: the specification for S3 class “dist” in package ‘cogena’ seems
 equivalent to one from package ‘graph’: not turning on duplicate class
 definitions for this class.


 In the source code, two R files are related to this issue:
 cogena_class.R
 (https://github.com/zhilongjia/cogena/blob/master/R/cogena_class.R) and
 dist_class.R
 (https://github.com/zhilongjia/cogena/blob/master/R/dist_class.R), both in
 the R dir. The cogena class has a dist slot. In dist_class.R I use
 setOldClass, but it seems it is not recommended by Bioconductor.

 How to repair this issue? Thank you.

 Regards,
 Zhilong

  [[alternative HTML version deleted]]

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel






 --
 Computational Biology / Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N.
 PO Box 19024 Seattle, WA 98109

 Location: Arnold Building M1 B861
 Phone: (206) 667-2793



[Bioc-devel] issue about S4 slot has a dist object.

2015-04-01 Thread Zhilong Jia

Hi,

Here is the package (https://tracker.bioconductor.org/issue1204 or
https://github.com/zhilongjia/cogena). When I biocCheck it, there is a
note.

Note: the specification for S3 class “dist” in package ‘cogena’ seems
equivalent to one from package ‘graph’: not turning on duplicate class
definitions for this class.


In the source code, two R files are related to this issue:
cogena_class.R
(https://github.com/zhilongjia/cogena/blob/master/R/cogena_class.R) and
dist_class.R
(https://github.com/zhilongjia/cogena/blob/master/R/dist_class.R), both in
the R dir. The cogena class has a dist slot. In dist_class.R I use
setOldClass, but it seems it is not recommended by Bioconductor.

How to repair this issue? Thank you.

Regards,
Zhilong


Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-04-01 Thread Michael Love

Yes, you're right! Sorry for the noise. I forgot this was how it
always behaved. All I had to do was change the argument name.

On Wed, Apr 1, 2015 at 3:51 PM, Hervé Pagès hpa...@fredhutch.org wrote:
 Hi Michael,

 On 04/01/2015 07:17 AM, Michael Love wrote:

 I'll retract those last two emails about empty GRanges. That's simply:

 se <- SummarizedExperiment(assays, colData=colData)
 mcols(se) <- myDataFrame


 Glad you found a simple way to do what you wanted.

 More below...


 On Tue, Mar 31, 2015 at 4:40 PM, Michael Love
 michaelisaiahl...@gmail.com wrote:

 Would this code inspired by the release version of GenomicRanges work?
 e.g. if I want to add a DataFrame with 10 rows:

 names <- letters[1:10]
 x <- relist(GRanges(), PartitioningByEnd(integer(10), names=names))
 mcols(x) <- DataFrame(foo=1:10)

 Then give x to the rowRanges argument of SummarizedExperiment?

 On Tue, Mar 31, 2015 at 3:49 PM, Michael Love
 michaelisaiahl...@gmail.com wrote:

 I forgot to ask my other question. I had gone in early March and fixed
 my code to eliminate rowData<-, but the argument to SummarizedExperiment
 was still called rowData, and a DataFrame could be provided. Then I
 didn't check for a few weeks, but the argument for the rowData slot is
 now called rowRanges. What's the trick to putting a DataFrame on an
 empty GRanges, so I can get the old behavior but now using the rowRanges
 argument?


 I'm not sure what you meant by "so I can get the old behavior but
 now using the rowRanges argument".

 Just to clarify: the renaming of rowData to rowRanges is a change
 of name only, not a change of behavior. More precisely the new
 rowRanges() accessor should behave exactly as the old rowData()
 accessor. The same applies to the 'rowRanges' argument of the
 SummarizedExperiment() constructor. So whatever you were passing
 before to the 'rowData' argument, you should still be able to pass
 it to the new 'rowRanges' argument. Please let us know if it's not
 the case as this is certainly not intended.

 Thanks,
 H.



 On Tue, Mar 31, 2015 at 3:40 PM, Michael Love
 michaelisaiahl...@gmail.com wrote:

 With GenomicRanges 1.19.48, I'm still having issues with re-naming the
 first assay and duplication of memory from my March 9 email. I tried
 assayNames<- as well. My use case is if I am given a
 SummarizedExperiment where the first element is not named "counts"
 (albeit the SE is most likely coming from summarizeOverlaps() and
 already named "counts"...).

 sessionInfo()

 R Under development (unstable) (2015-03-31 r68129)
 Platform: x86_64-apple-darwin12.5.0 (64-bit)
 Running under: OS X 10.8.5 (Mountain Lion)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] stats4    parallel  stats     graphics  grDevices datasets  utils
 methods   base

 other attached packages:
 [1] GenomicRanges_1.19.48 GenomeInfoDb_1.3.16   IRanges_2.1.43
 S4Vectors_0.5.22
 [5] BiocGenerics_0.13.10  testthat_0.9.1        devtools_1.7.0
 knitr_1.9
 [9] BiocInstaller_1.17.6

 loaded via a namespace (and not attached):
 [1] formatR_1.1    XVector_0.7.4  tools_3.3.0    stringr_0.6.2
 evaluate_0.5.5

 On Mon, Mar 9, 2015 at 1:21 PM, Michael Love
 michaelisaiahl...@gmail.com wrote:



 On Mar 9, 2015 12:36 PM, Martin Morgan mtmor...@fredhutch.org
 wrote:


 On 03/09/2015 08:07 AM, Michael Love wrote:


 Some guidance on how to avoid duplication of the matrix for
 developers
 would be greatly appreciated.



 It's unsatisfactory, but using withDimnames=FALSE avoids duplication
 on extraction of assays (but obviously you don't have dimnames on the
 matrix). Row or column subsetting necessarily causes the subsetted assay
 data to be duplicated. There should not be any duplication when 
 rowRanges()
 or colData() are changed without changing their dimension / ordering.


 Thanks Martin for checking into the regression.

 Sorry, I should have been more specific earlier, I meant more
 guidance/documentation in the man page for SE. I scanned the 'Extension'
 section but didn't find a note on withDimnames for extracting the matrix 
 or
 this example of renaming the assays (it seems like this could easily be
 relevant for other package authors).

 A prominent note there might help devs write more memory efficient
 packages.

 The argument section mentions speed but I'd explicitly mention memory
 given that we're often storing big matrices:

 Setting withDimnames=FALSE  increases the speed with which assays are
 extracted.

 (it's entirely possible the info is there but I missed it)

 Best,

 Mike


 Another example of a trouble point, is that if I am given an SE with
 an unnamed assay and I need to give the assay a name, this also can
 expand the memory used. I had found a solution (which works with
 GenomicRanges 1.18 / current release) with:

 names(assays(se, withDimnames=FALSE))[1] <- "foo"

 But now I'm looking in devel and this appears to no longer work. The
 memory used expands, equivalent to:
