Seconded, this would be so useful. I still use mget(), for heaven's sake. Meanwhile, I'm going to try VariantFiltering. Thanks for starting this conversation, Michael.
--t

> On Jun 18, 2014, at 11:18 AM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>
> That is a good start. But for convenience, I would favor something that
> just returns the vector corresponding to "column" rather than a data.frame.
>
> Thanks,
> Michael
>
>> On Wed, Jun 18, 2014 at 10:11 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>>
>> Hi Michael,
>>
>>> On 06/18/2014 06:03 AM, Michael Lawrence wrote:
>>>
>>> Let's say I have a vector of gene IDs where some are NA, and some are
>>> repeated, and I want to map them to gene symbols, where I get NAs for the
>>> NA IDs or IDs without a symbol. What is the best way to do this?
>>>
>>> I tried select() but it gave me a table with unique entries; not very
>>> convenient. It also does not handle NAs. And totally breaks with duplicates
>>> using the GENEID key type (kind of works with ENTREZID):
>>>
>>> select(Homo.sapiens, GENEID, "SYMBOL", "GENEID")
>>> Error in `[[<-`(`*tmp*`, name, value = list(GENEID = c("245938", "245939", :
>>>   269 elements in value to replace 1312 elements
>>>
>>> Also tried the venerable mget(GENEID, org.Hs.egSYMBOL, ifnotfound=NA), but
>>> this returns a list and fails with NAs.
>>>
>>> What would be nice is something like:
>>>
>>> map(Homo.sapiens, GENEID, "SYMBOL", "GENEID", OneToOneOrNone)
>>>
>>> where OneToOneOrNone is an assertion that I expect the mappings to be
>>> one-to-one, so it will unlist() or whatever and throw an error if the
>>> assertion fails. It should return NA for anything not found, and for any NA
>>> GENEID. Does something like this already exist?
>>
>> Couldn't this be handled via an extra argument to select()?
>>
>> I would suggest this argument be called something like 'ManyToOneOrNone'
>> or 'ManyToZeroOrOne' rather than 'OneToOneOrNone' (different keys
>> can be mapped to the same symbol and I guess that's fine).
>>
>> In other words you want an option to force select() to return a
>> data.frame that is "parallel" to the vector of keys (i.e. 1 row
>> per key and in the same order, even when this vector contains NAs
>> and/or duplicates), or fail.
>>
>> Kind of related to that discussion we had on the bioconductor list
>> about 1 year ago:
>>
>> https://stat.ethz.ch/pipermail/bioconductor/2013-July/054056.html
>>
>> Cheers,
>> H.
>>
>>> Michael
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpa...@fhcrc.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
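
For readers following the thread, here is a rough base-R sketch of the "parallel to the keys, or fail" behaviour discussed above. It is not an existing AnnotationDbi or Homo.sapiens function: map_parallel() and its arguments are hypothetical names made up for illustration, and the small lookup vectors are toy data standing in for whatever key/value table a select() call would return.

```r
# Hypothetical helper, NOT part of AnnotationDbi: returns one value per key,
# in the same order as `keys`, with NA for NA keys and for keys not found.
map_parallel <- function(keys, table_keys, table_values) {
  stopifnot(length(table_keys) == length(table_values))

  # Collapse the lookup table to distinct (key, value) pairs and drop NA keys.
  pairs <- unique(data.frame(k = table_keys, v = table_values,
                             stringsAsFactors = FALSE))
  pairs <- pairs[!is.na(pairs$k), , drop = FALSE]

  # "ManyToZeroOrOne" assertion: a key may map to at most one value,
  # but several keys may share the same value.
  if (anyDuplicated(pairs$k) > 0L) {
    stop("some keys map to more than one value")
  }

  # match() yields NA for keys absent from the table (and for NA keys,
  # since NA table keys were dropped), so the result is parallel to `keys`.
  pairs$v[match(keys, pairs$k)]
}

# Toy data for illustration only.
ids            <- c("1", NA, "2", "1", "999")
lookup_ids     <- c("1", "2", "3")
lookup_symbols <- c("A1BG", "A2M", "A2MP1")

map_parallel(ids, lookup_ids, lookup_symbols)
```

The result is a plain character vector with one element per input ID (duplicates preserved, NA for the NA and unmatched IDs), which is the vector-returning variant Michael asked for; a key with two different values in the lookup table triggers the error instead.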