Tim, you always crack me up! :) I totally agree, and it would probably be good to also have the tools enabled to download directly from Ensembl, NCBI, cloud-annotation sources, etc. and build/update the AnnDbBimap objects. This way the annotation sources can maintain the data and we can maintain the scripts, including the pre-built AnnDbBimap objects just in case.

~p

-----Original Message-----
From: Bioc-devel [mailto:bioc-devel-boun...@r-project.org] On Behalf Of Tim Triche, Jr.
Sent: Monday, January 11, 2016 2:02 PM
To: Vincent Carey
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Known Genes replaced by GENCODE genes at UCSC

ENSEMBL

knownGene was always a disaster. For extra amusement/horror, be sure to check out the sad saga of the TCGA GAF and its disconnection from knownGenes as well as reality. Three cheers for rendering transcript-level estimates useless (and no this was not Katie's fault)

Rainer and many others have made a herculean effort to bring all the BioC annotation infrastructure into the 21st century... having worked with Kallisto extensively of late, I see no reason to use a non-ENSEMBL "conservative" reference transcriptome (I see plenty of reasons to use miTranscriptome, etc. but that is another discussion).

sorry if slighting anyone/everyone, but ENSEMBL is the clear choice IMHO.

$0.02 - transmission costs

--t

On Mon, Jan 11, 2016 at 10:57 AM, Vincent Carey <st...@channing.harvard.edu> wrote:

> I think these are all good observations and we may benefit from a
> wider discussion on the support site?
>
> the abandonment of knownGene seems to have clear implications for
> changing our most visible txdb examples. what should we change to?
> can we make a more future-proof design for these annotation
> selections?
>
> On Mon, Jan 11, 2016 at 1:40 PM, Robert Castelo <robert.cast...@upf.edu>
> wrote:
>
> > hi,
> >
> > On 01/11/2016 04:07 PM, Vincent Carey wrote:
> > [...]
> >
> >> Is it true that there is an asymmetry between Entrez gene ID and
> >> Ensembl gene ID for querying org.Hs.eg.db (I tend to prefer
> >> Homo.sapiens as a symbol mapping resource)? Both ENTREZID and
> >> ENSEMBL are listed as keytypes. My question is whether this
> >> "anchor" concept holds in the current infrastructure.
> >>
> >
> > you're right that the infrastructure is probably symmetric at least
> > between Entrez and Ensembl, so maybe i'm not using the term "anchor"
> > correctly here, i'm just referring to the fact that many package
> > functions and use cases of BioC are based in, or illustrated, using
> > Entrez IDs. examples are:
> >
> > head(org.Hs.eg.db::keys(org.Hs.eg.db))
> > [1] "1" "2" "3" "9" "10" "11"
> >
> > i.e., by default the 'keytype' is 'ENTREZID'
> >
> > genefilter::nsFilter() argument 'require.entrez' filters out
> > features without an Entrez Gene ID annotation.
> >
> > Category::categoryToEntrezBuilder() returns a list mapping category
> > ids to the Entrez Gene ids annotated at the category id.
> >
> > SummarizedExperiment::geneRangeMapper() takes a 'TxDb' object and a
> > keytype to map ranges to genes. By default the keytype is 'ENTREZID'
> >
> > some of the workflows are also based on Entrez IDs, such as:
> >
> > http://www.bioconductor.org/help/workflows/annotation/Annotation_Resources
> >
> > http://www.bioconductor.org/help/workflows/variants
> >
> > so if the user just replaces the txdb object in one of those
> > examples or argument functions by a txdb object that does not have
> > Entrez identifiers as primary gene key, those functions, examples or
> > workflows will require modification. this is not necessarily bad,
> > but may put more burden on the user who is learning with a "default"
> > TxDb human gene annotation package.
> >
> > this has been so far the *.UCSC.knownGene using Entrez as gene
> > identifiers.
> >
> > given the apparent discontinuity of UCSC with the known gene track,
> > i would suggest to put available at the BioC site another default
> > gene annotation package, but then one based on Entrez identifiers
> > given the amount of legacy code and documentation using Entrez in
> > one way or another.
> >
> > an alternative to translating the default Ensembl Gencode
> > identifiers into Entrez would be to just take the NCBI RefSeq
> > annotations as human gene annotation package available by default,
> > i.e., replacing current *.UCSC.knownGene by *.UCSC.refGene
> >
> > robert.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
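
The idea raised in the thread of translating Ensembl gene identifiers into Entrez ones can be sketched roughly as below. This is only an illustration under stated assumptions, not code from any of the posters: the `id_map` table and the `ensembl_to_entrez()` helper are hypothetical stand-ins for a real annotation source, and the guarded `AnnotationDbi::mapIds()` call only runs if the org.Hs.eg.db package happens to be installed.

```r
# Hypothetical three-gene lookup table standing in for a real annotation
# source (TP53, BRCA1, BRCA2); not taken from the thread.
id_map <- data.frame(
  ENSEMBL  = c("ENSG00000141510", "ENSG00000012048", "ENSG00000139618"),
  ENTREZID = c("7157", "672", "675"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate Ensembl gene IDs to Entrez gene IDs.
# Keys without a match come back as NA, so the caller can see what was lost.
ensembl_to_entrez <- function(keys, map) {
  hits <- match(keys, map$ENSEMBL)
  setNames(map$ENTREZID[hits], keys)
}

query <- c("ENSG00000141510", "ENSG00000000000")  # second ID is made up
print(ensembl_to_entrez(query, id_map))

# With the real annotation package (assumed installed), the same lookup
# would go through AnnotationDbi::mapIds(); multiVals = "first" keeps one
# Entrez ID per key when an Ensembl gene maps to several.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  print(AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db,
                              keys = query[1],
                              column = "ENTREZID",
                              keytype = "ENSEMBL",
                              multiVals = "first"))
}
```

Returning NA for unmapped keys, rather than dropping them, is a deliberate choice in this sketch: it makes visible which genes would fall out of Entrez-based functions and workflows if an Ensembl-keyed TxDb were swapped in.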