Hi Michael,

On 06/18/2014 06:03 AM, Michael Lawrence wrote:
Let's say I have a vector of gene IDs where some are NA, and some are
repeated, and I want to map them to gene symbols, where I get NAs for the
NA IDs or IDs without a symbol. What is the best way to do this?

I tried select() but it gave me a table with unique entries; not very
convenient. It also does not handle NAs. And totally breaks with duplicates
using the GENEID key type (kind of works with ENTREZID):

select(Homo.sapiens, GENEID, "SYMBOL", "GENEID")
Error in `[[<-`(`*tmp*`, name, value = list(GENEID = c("245938", "245939",  :
  269 elements in value to replace 1312 elements

Also tried the venerable mget(GENEID, org.Hs.egSYMBOL, ifnotfound=NA), but
this returns a list and fails with NAs.

What would be nice is something like:

map(Homo.sapiens, GENEID, "SYMBOL", "GENEID", OneToOneOrNone)

where OneToOneOrNone is an assertion that I expect the mappings to be
one-to-one, so it will unlist() or whatever and throw an error if the
assertion fails. It should return NA for anything not found, and for any NA
GENEID. Does something like this already exist?

Couldn't this be handled via an extra argument to select()?

I would suggest this argument be called something like 'ManyToOneOrNone'
or 'ManyToZeroOrOne' rather than 'OneToOneOrNone' (different keys
can be mapped to the same symbol and I guess that's fine).

In other words you want an option to force select() to return a
data.frame that is "parallel" to the vector of keys (i.e. 1 row
per key and in the same order, even when this vector contains NAs
and/or duplicates), or fail.
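
To make the "parallel or fail" behavior concrete, here is a minimal sketch in
base R. It is not part of AnnotationDbi: mapParallel() and its arguments are
hypothetical names, and the lookup table is toy data standing in for whatever
a mapping call would return (for example select() called on the unique,
non-NA keys, which is an assumption and not something tested here).

```r
# Hypothetical helper (not an existing AnnotationDbi function): map 'keys' to
# one column of a lookup table and return a vector parallel to 'keys'.
mapParallel <- function(keys, table, keycol, valcol) {
    # Keep distinct (key, value) pairs and drop rows with a missing key or value
    tab <- unique(table[ , c(keycol, valcol), drop=FALSE])
    tab <- tab[!is.na(tab[[keycol]]) & !is.na(tab[[valcol]]), , drop=FALSE]
    # Fail if any requested key maps to more than one distinct value
    dup <- unique(tab[[keycol]][duplicated(tab[[keycol]])])
    dup <- intersect(dup, keys)
    if (length(dup) > 0L)
        stop("keys mapped to more than one value: ",
             paste(dup, collapse=", "))
    # match() gives NA for NA keys and for keys absent from the table, and
    # repeats the same index for duplicated keys
    tab[[valcol]][match(keys, tab[[keycol]])]
}

# Toy lookup table with made-up IDs and symbols (illustration only)
tab <- data.frame(GENEID=c("1", "2", "3"),
                  SYMBOL=c("symA", "symB", NA),
                  stringsAsFactors=FALSE)
keys <- c("2", NA, "1", "2", "3", "999")
res <- mapParallel(keys, tab, "GENEID", "SYMBOL")
```

Here 'res' has one element per key, in the same order as 'keys'. Several keys
sharing one symbol is allowed (many-to-one); only a key with two or more
distinct symbols triggers the error.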

Kind of related to that discussion we had on the bioconductor list
about 1 year ago:

  https://stat.ethz.ch/pipermail/bioconductor/2013-July/054056.html

Cheers,
H.


Michael

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
