That is a good start. But for convenience, I would favor something that
just returns the vector corresponding to "column" rather than a data.frame.
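
A rough sketch of such a vector-returning wrapper, in base R only. The name
map_keys() and its arguments are hypothetical, not an existing AnnotationDbi
function, and 'tbl' stands in for the data.frame that select() returns for the
unique, non-NA keys. The toy IDs and symbols are made up for illustration.

```r
# Hypothetical helper (not part of AnnotationDbi): return a vector that is
# parallel to 'keys', with NA for NA keys and for keys without a mapping.
map_keys <- function(keys, tbl, keytype, column) {
  keys <- as.character(keys)
  # Keep only the two relevant columns, drop NA keys and exact duplicate rows.
  tbl <- unique(tbl[, c(keytype, column)])
  tbl <- tbl[!is.na(tbl[[keytype]]), , drop = FALSE]
  # Assert many-to-zero-or-one: no key may map to more than one value.
  dup <- unique(tbl[[keytype]][duplicated(tbl[[keytype]])])
  if (length(dup) > 0L)
    stop("keys with more than one mapping: ", paste(dup, collapse = ", "))
  # match() yields NA for NA keys and for keys absent from 'tbl'.
  as.character(tbl[[column]])[match(keys, as.character(tbl[[keytype]]))]
}

# Toy lookup table standing in for the result of select().
tbl <- data.frame(GENEID = c("g1", "g2", "g3"),
                  SYMBOL = c("SYM1", "SYM2", NA),
                  stringsAsFactors = FALSE)
keys <- c("g2", NA, "g1", "g2", "g9")
map_keys(keys, tbl, "GENEID", "SYMBOL")
# "SYM2" NA "SYM1" "SYM2" NA
```

The result has one element per key, in the same order, so NAs and duplicates
in the input are preserved rather than collapsed.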

Thanks,
Michael



On Wed, Jun 18, 2014 at 10:11 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:

> Hi Michael,
>
>
> On 06/18/2014 06:03 AM, Michael Lawrence wrote:
>
>> Let's say I have a vector of gene IDs where some are NA, and some are
>> repeated, and I want to map them to gene symbols, where I get NAs for the
>> NA IDs or IDs without a symbol. What is the best way to do this?
>>
>> I tried select(), but it returned a table with unique entries, which is not
>> very convenient. It also does not handle NAs, and it breaks completely with
>> duplicates when using the GENEID key type (it kind of works with ENTREZID):
>>
>> select(Homo.sapiens, GENEID, "SYMBOL", "GENEID")
>> Error in `[[<-`(`*tmp*`, name, value = list(GENEID = c("245938", "245939", :
>>    269 elements in value to replace 1312 elements
>>
>> Also tried the venerable mget(GENEID, org.Hs.egSYMBOL, ifnotfound=NA), but
>> this returns a list and fails with NAs.
>>
>> What would be nice is something like:
>>
>> map(Homo.sapiens, GENEID, "SYMBOL", "GENEID", OneToOneOrNone)
>>
>> where OneToOneOrNone is an assertion that I expect the mappings to be
>> one-to-one, so it will unlist() or whatever and throw an error if the
>> assertion fails. It should return NA for anything not found, and for any
>> NA GENEID. Does something like this already exist?
>>
>
> Couldn't this be handled via an extra argument to select()?
>
> I would suggest this argument be called something like 'ManyToOneOrNone'
> or 'ManyToZeroOrOne' rather than 'OneToOneOrNone' (different keys
> can be mapped to the same symbol and I guess that's fine).
>
> In other words you want an option to force select() to return a
> data.frame that is "parallel" to the vector of keys (i.e. 1 row
> per key and in the same order, even when this vector contains NAs
> and/or duplicates), or fail.
>
> Kind of related to that discussion we had on the bioconductor list
> about 1 year ago:
>
>   https://stat.ethz.ch/pipermail/bioconductor/2013-July/054056.html
>
> Cheers,
> H.
>
>
>> Michael
>>
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
