That is a good start. But for convenience, I would favor something that just returns the vector corresponding to "column" rather than a data.frame.
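To make the request concrete, here is a minimal base-R sketch of the behavior I have in mind; it is an illustration under stated assumptions, not an existing API. The helper name `map_parallel` and its arguments are hypothetical, and the lookup table is toy data standing in for whatever select() would return for the unique, non-NA keys.

```r
# Hypothetical helper (not part of any package): return the values in
# `valcol` as a vector parallel to `keys`, i.e. one element per key, in the
# same order, with NA for NA keys and for keys that have no match.
map_parallel <- function(keys, lookup, keycol, valcol) {
  # Collapse exact duplicate rows so a repeated (key, value) pair is not
  # mistaken for a one-to-many mapping.
  tab <- unique(lookup[, c(keycol, valcol)])
  # Drop NA keys from the lookup so that NA input keys map to NA.
  tab <- tab[!is.na(tab[[keycol]]), , drop = FALSE]
  # Assert many-to-zero-or-one: each key may map to at most one value.
  dup <- unique(tab[[keycol]][duplicated(tab[[keycol]])])
  if (length(dup) > 0L) {
    stop("keys map to more than one value: ", paste(dup, collapse = ", "))
  }
  # match() keeps the order and duplicates of `keys`; unmatched keys give
  # NA indices, which in turn index to NA values.
  tab[[valcol]][match(keys, tab[[keycol]])]
}

# Toy lookup table (made-up IDs and symbols, for illustration only).
lookup <- data.frame(
  GENEID = c("g1", "g2", "g2", "g3"),
  SYMBOL = c("SYM_A", "SYM_B", "SYM_B", NA),
  stringsAsFactors = FALSE
)
ids <- c("g2", NA, "g1", "g2", "g999", "g3")
result <- map_parallel(ids, lookup, "GENEID", "SYMBOL")
print(result)
# [1] "SYM_B" NA      "SYM_A" "SYM_B" NA      NA
```

The uniqueness check plays the role of the assertion: if a key maps to two different values the call stops with an error instead of silently picking one.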
Thanks,
Michael

On Wed, Jun 18, 2014 at 10:11 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:

> Hi Michael,
>
> On 06/18/2014 06:03 AM, Michael Lawrence wrote:
>
>> Let's say I have a vector of gene IDs where some are NA, and some are
>> repeated, and I want to map them to gene symbols, where I get NAs for the
>> NA IDs or IDs without a symbol. What is the best way to do this?
>>
>> I tried select() but it gave me a table with unique entries; not very
>> convenient. It also does not handle NAs. And totally breaks with duplicates
>> using the GENEID key type (kind of works with ENTREZID):
>>
>> select(Homo.sapiens, GENEID, "SYMBOL", "GENEID")
>> Error in `[[<-`(`*tmp*`, name, value = list(GENEID = c("245938", "245939", :
>>   269 elements in value to replace 1312 elements
>>
>> Also tried the venerable mget(GENEID, org.Hs.egSYMBOL, ifnotfound=NA), but
>> this returns a list and fails with NAs.
>>
>> What would be nice is something like:
>>
>> map(Homo.sapiens, GENEID, "SYMBOL", "GENEID", OneToOneOrNone)
>>
>> where OneToOneOrNone is an assertion that I expect the mappings to be
>> one-to-one, so it will unlist() or whatever and throw an error if the
>> assertion fails. It should return NA for anything not found, and for any
>> NA GENEID. Does something like this already exist?
>>
>
> Couldn't this be handled via an extra argument to select()?
>
> I would suggest this argument be called something like 'ManyToOneOrNone'
> or 'ManyToZeroOrOne' rather than 'OneToOneOrNone' (different keys
> can be mapped to the same symbol and I guess that's fine).
>
> In other words you want an option to force select() to return a
> data.frame that is "parallel" to the vector of keys (i.e. 1 row
> per key and in the same order, even when this vector contains NAs
> and/or duplicates), or fail.
>
> Kind of related to that discussion we had on the bioconductor list
> about 1 year ago:
>
> https://stat.ethz.ch/pipermail/bioconductor/2013-July/054056.html
>
> Cheers,
> H.
>
>> Michael
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel