Dear Johannes, one follow-up question/comment on the EnsDb packages:
The reason they escaped my notice (and thus will potentially escape others'
as well) is that I expected such packages to be named "^TxDb...". What
actually argues against sticking to the existing Bioc vocabulary and naming,
e.g., EnsDb.Hsapiens.v79 as TxDb.Hsapiens.Ensembl.hg38.ensGene (or
alternatively, if packages like BSgenome.Hsapiens.NCBI.GRCh38 do indeed make
it in the long run, TxDb.Hsapiens.Ensembl.GRCh38.ensGene)?

This would also have the advantage that the genome build and ID type could
be inferred directly from the package name.

Best,
Ludwig

> Dear Robert and Ludwig,
>
> the EnsDb packages provide all the gene/transcript etc. annotations for all
> genes defined in the Ensembl database (for a given species and Ensembl
> release). Except for the column/attribute "entrezid", which is stored in the
> internal database, there is, however, no link to NCBI or UCSC annotations.
> So, basically, if you want to use "pure" Ensembl-based annotations: use
> EnsDb; if you want to have the UCSC annotations: use the TxDb packages.
>
> In case you need EnsDbs for other species or Ensembl versions, the
> ensembldb package provides functionality to generate such packages either
> using the Ensembl Perl API or using GTF files provided by Ensembl. If you
> have problems building the packages, just drop me a line and I'll do
> that.
>
> cheers, jo
>
>> On 03 Jun 2015, at 15:56, Robert M. Flight <rfligh...@gmail.com> wrote:
>>
>> Ludwig,
>>
>> If you do this search on the UCSC genome browser (which this annotation
>> package is built from), you will see that the longest variant is what is
>> shown:
>>
>> http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg38&position=brca1&hgt.positionInput=brca1&hgt.suggestTrack=knownGene&Submit=submit&hgsid=429339723_8sd4QD2jSAnAsa6cVCevtoOy4GAz&pix=1885
>>
>> If instead of "genes" you do "transcripts", you will see 20 different
>> transcripts for this gene, including the one listed by NCBI.
>>
>> I haven't tried it yet (haven't upgraded R or Bioconductor to the latest
>> version), but there is now an Ensembl-based annotation package as well,
>> which may work better?
>> http://bioconductor.org/packages/release/data/annotation/html/EnsDb.Hsapiens.v79.html
>>
>> -Robert
>>
>>
>>
>> On Wed, Jun 3, 2015 at 7:04 AM Ludwig Geistlinger <
>> ludwig.geistlin...@bio.ifi.lmu.de> wrote:
>>
>>> Dear Bioc annotation team,
>>>
>>> Querying TxDb.Hsapiens.UCSC.hg38.knownGene for gene coordinates, e.g.
>>> for
>>>
>>> BRCA1; ENSG00000012048; entrez:672
>>>
>>> via
>>>
>>> > genes(TxDb.Hsapiens.UCSC.hg38.knownGene, vals=list(gene_id="672"))
>>>
>>> gives me:
>>>
>>> GRanges object with 1 range and 1 metadata column:
>>>       seqnames               ranges strand |     gene_id
>>>          <Rle>            <IRanges>  <Rle> | <character>
>>>   672    chr17 [43044295, 43170403]      - |         672
>>>   -------
>>>   seqinfo: 455 sequences (1 circular) from hg38 genome
>>>
>>>
>>> However, according to Ensembl and NCBI Gene
>>> http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000012048
>>> http://www.ncbi.nlm.nih.gov/gene/672
>>>
>>> the gene is located at (note the difference in the end position)
>>>
>>> Chromosome 17: 43,044,295-43,125,483 reverse strand
>>>
>>>
>>> How is the inconsistency explained, and how can I extract an
>>> Ensembl/NCBI-conformant annotation from the TxDb object?
>>> (I am aware of biomaRt, but I want to explicitly use the Bioc
>>> annotation functionality.)
>>>
>>> Thanks!
>>> Ludwig
>>>
>>>
>>> --
>>> Dipl.-Bioinf. Ludwig Geistlinger
>>>
>>> Lehr- und Forschungseinheit für Bioinformatik
>>> Institut für Informatik
>>> Ludwig-Maximilians-Universität München
>>> Amalienstrasse 17, 2. Stock, Büro A201
>>> 80333 München
>>>
>>> Tel.: 089-2180-4067
>>> eMail: ludwig.geistlin...@bio.ifi.lmu.de
>>>
>>> _______________________________________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel