Re: [Bioc-devel] rownames in SummarizedExperiments

Michael Lawrence Sun, 06 Apr 2014 20:28:30 -0700

eSet constrains rownames to be unique, so that's a precedent. Seems like
there should at least be row/column consistency here.
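A minimal base-R sketch of the row/column asymmetry discussed in this thread. It uses only base R objects as stand-ins (no Bioconductor classes): a matrix, like the assay side of an SE, accepts duplicate row names, while a data.frame, like the colData DataFrame, rejects them.

```r
# Base R only: stand-ins for the SummarizedExperiment row/column asymmetry.
# A matrix accepts duplicate row names ...
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("gene_D", "gene_D"), c("s1", "s2")))
print(rownames(m))

# ... but data.frame() stops with an error on duplicate row names.
df_ok <- tryCatch(
  data.frame(yellowness = c(0.1, 0.9), row.names = c("gene_D", "gene_D")),
  error = function(e) NULL
)
print(is.null(df_ok))  # TRUE: construction failed
```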



On Sun, Apr 6, 2014 at 6:22 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:

> On 04/06/2014 04:21 PM, Michael Lawrence wrote:
>
>>
>>
>>
>> On Sun, Apr 6, 2014 at 2:48 PM, Simon Anders <and...@embl.de> wrote:
>>
>>     Hi Michael
>>
>>     On 06/04/14 23:32, Michael Lawrence wrote:
>>      > On an arbitrary vector, the names do not need to be unique, but they DO
>>      > need to be unique on a DataFrame (according to the data.frame
>>      > conventions). Conditioning on whether there are duplicate names would be
>>      > too complicated, so it is left to the user to declare whether the names
>>      > are expected on the result. Since in general the vector names are not
>>      > valid rownames, the default is FALSE. I guess if we really wanted to be
>>      > consistent with R, we would mangle the names to make them unique, but
>>      > that check is expensive.
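A sketch of the kind of name mangling mentioned above, using base R's `make.unique()`. It only illustrates the mangling itself, not how `mcols()` or `DataFrame` would implement it. Note the side effect: the second `gene_D` becomes `gene_D.1`, so a lookup by the original ID returns only the first match.

```r
# make.unique() appends ".1", ".2", ... to repeated values so that the
# result can serve as data.frame row names.
ids <- c("gene_A", "gene_D", "gene_D", "gene_F")
unique_ids <- make.unique(ids)
print(unique_ids)  # "gene_A" "gene_D" "gene_D.1" "gene_F"

meta <- data.frame(yellowness = c(0.2, 0.9, 0.4, 0.7))
rownames(meta) <- unique_ids          # succeeds now that names are unique
print(meta["gene_D", "yellowness"])   # 0.9 (the first gene_D only)
```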
>>
>>     Thanks for the response, but I'm not sure I understand it. I thought
>>     "use.names=TRUE" instructs "mcols" to use the rownames of the
>>     SummarizedExperiment object as rownames for the returned DataFrame. Now,
>>     as the rownames of the SummarizedExperiment have to be unique anyway (at
>>     least, I suppose they have to -- they are names, too, after all, and not
>>     just an arbitrary vector), how can it happen that duplicate names might
>>     appear?
>>
>>
>> I don't think the SE rownames are constrained to be unique. I haven't tested it,
>>
>
> Empirically, the row names can be duplicated, but the column names cannot.
>
> The lack of constraint on row names is enabled by the rowData
> GenomicRanges, while the constraint on column names is introduced by the
> (rownames of the) colData DataFrame. So the lack of symmetry in the class
> leads to lack of symmetry for dimnames. The use of GenomicRanges for rows
> has been the subject of previous discussion.
>
> It wouldn't be inconceivable to impose constraints on duplicate row names
> in SummarizedExperiment and set use.names=TRUE by default, or to redefine
> mcols(se) to use.names=!any(duplicated(rownames(se))). There would be performance
> consequences (how much?) and an mcols inconsistency. I think this is part
> of the same discussion as
>
>   https://stat.ethz.ch/pipermail/bioc-devel/2014-March/005409.html
>
> which I have not yet followed through on.
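A minimal base-R sketch of the proposed default `use.names=!any(duplicated(rownames(se)))`. The helper `with_names_if_unique()` is hypothetical and not part of any Bioconductor API; a plain data.frame and a character vector stand in for `mcols(se)` and `rownames(se)`. One visible cost: whether the result carries row names now depends on the data.

```r
# Hypothetical helper (not a Bioconductor function): attach row names to a
# metadata table only when they are unique.
with_names_if_unique <- function(meta, ids) {
  stopifnot(nrow(meta) == length(ids))
  # anyDuplicated() returns 0 when all ids are unique, so !0 is TRUE
  if (!anyDuplicated(ids)) {
    rownames(meta) <- ids
  }
  meta
}

meta <- data.frame(yellowness = c(0.2, 0.9, 0.4))
a <- with_names_if_unique(meta, c("gene_A", "gene_D", "gene_F"))
b <- with_names_if_unique(meta, c("gene_A", "gene_D", "gene_D"))
print(rownames(a))  # "gene_A" "gene_D" "gene_F"
print(rownames(b))  # "1" "2" "3" (names dropped)
```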
>
> Syntax wise, there is also
>
>   mcols(se)[rownames(se) == "gene_D", "yellowness"]
>
> This is more efficient (and more error prone) than either use.names or
> Michael's suggestion.
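A base-R sketch of what the two lookup styles return when an ID is repeated. The vector `ids` stands in for `rownames(se)` and the unnamed data.frame `meta` for `mcols(se)`; both are illustrative assumptions, not SE objects.

```r
ids  <- c("gene_A", "gene_D", "gene_D")
meta <- data.frame(yellowness = c(0.2, 0.9, 0.4))

# Logical indexing, as in mcols(se)[rownames(se) == "gene_D", "yellowness"],
# returns every row whose ID matches:
hits <- meta[ids == "gene_D", "yellowness"]
print(hits)   # 0.9 0.4

# Character (name) indexing on a matrix with duplicated row names
# silently returns only the first match:
m <- matrix(c(0.2, 0.9, 0.4), ncol = 1,
            dimnames = list(ids, "yellowness"))
first <- m["gene_D", "yellowness"]
print(first)  # 0.9
```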
>
> Martin
>
>
>> but I don't see the assertion in the code. This is because an SE is
>> modeled as a matrix, which does not have the same constraint as a data.frame.
>>
>>     The use case: I have a SummarizedExperiment object with gene IDs in the
>>     rownames. Let's say I want to get the value in the meta-data column
>>     "yellowness" for "gene_D".
>>
>>     With an ExpressionSet, I could write:
>>         fData(es)["gene_D","yellowness"]
>>
>>     With SummarizedExperiment, it has to be:
>>         mcols(se,use.names=TRUE)["gene_D","yellowness"]
>>
>>     Of course, it's no big deal, but I find it quite clumsy, and I wonder
>>     why it has to be this way.
>>
>>
>> Well, there's this syntax:
>> mcols(se["gene_D",])$yellowness
>>
>>
>>        Simon
>>
>>
>>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
