Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency
dear Ludwig,

On 09 Jun 2015, at 10:46, Ludwig Geistlinger <ludwig.geistlin...@bio.ifi.lmu.de> wrote:

> Dear Johannes,
>
> Thx for providing the great EnsDb packages!
>
> One question: As of now, I am able to choose between TxDb and EnsDb for genomic coordinates of genomic features such as genes, transcripts, and exons. For the sequences themselves I need the corresponding BSgenome package.
>
> While it is easy to automatically map from a specific TxDb package (e.g. TxDb.Hsapiens.UCSC.hg38.knownGene) to the corresponding BSgenome package (here: BSgenome.Hsapiens.UCSC.hg38), I wonder how to do that for an EnsDb package, as the package name (e.g. EnsDb.Hsapiens.v79) contains no information about the genome build.
>
> A cumbersome option would be to extract the genome_build from the metadata of the EnsDb package (which would give me 'GRCh38' for EnsDb.Hsapiens.v79) and then ask all existing BSgenome.Hsapiens packages for their metadata release name (e.g. 'GRCh38' for BSgenome.Hsapiens.UCSC.hg38). This, however, needs all BSgenome.Hsapiens packages installed and thus takes too much time and space for programmatic access.
>
> Can you suggest a better way to map from coordinates to sequence (within the BioC annotation functionality)?

agree, there's no easy mapping (yet). I'll implement a method suggestGenomePackage in the ensembldb package. In the long run I hope that NCBI BSgenome packages (like BSgenome.Hsapiens.NCBI.GRCh38) will also become available for all species... that would make the mapping much easier...

cheers, jo

> Thanks
>
> Best,
> Ludwig
>
>> dear Robert and Ludwig,
>>
>> the EnsDb packages provide all the gene/transcript etc. annotations for all genes defined in the Ensembl database (for a given species and Ensembl release). Except for the column/attribute entrezid that is stored in the internal database, there is however no link to NCBI or UCSC annotations.
>>
>> So, basically, if you want to use pure Ensembl based annotations: use EnsDb; if you want to have the UCSC annotations: use the TxDb packages.
>>
>> In case you need EnsDbs of other species or Ensembl versions, the ensembldb package provides functionality to generate such packages either using the Ensembl Perl API or using GTF files provided by Ensembl. If you have problems building the packages, just drop me a line and I'll do that.
>>
>> cheers, jo
>>
>> On 03 Jun 2015, at 15:56, Robert M. Flight <rfligh...@gmail.com> wrote:
>>
>>> Ludwig,
>>>
>>> If you do this search on the UCSC genome browser (which this annotation package is built from), you will see that the longest variant is what is shown:
>>>
>>> http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg38&position=brca1&hgt.positionInput=brca1&hgt.suggestTrack=knownGene&Submit=submit&hgsid=429339723_8sd4QD2jSAnAsa6cVCevtoOy4GAz&pix=1885
>>>
>>> If instead of genes you do transcripts, you will see 20 different transcripts for this gene, including the one listed by NCBI.
>>>
>>> I haven't tried it yet (haven't upgraded R or Bioconductor to the latest version), but there is now an Ensembl based annotation package as well, that may work better?
>>>
>>> http://bioconductor.org/packages/release/data/annotation/html/EnsDb.Hsapiens.v79.html
>>>
>>> -Robert
>>>
>>> On Wed, Jun 3, 2015 at 7:04 AM Ludwig Geistlinger <ludwig.geistlin...@bio.ifi.lmu.de> wrote:
>>>
>>>> Dear Bioc annotation team,
>>>>
>>>> Querying TxDb.Hsapiens.UCSC.hg38.knownGene for gene coordinates, e.g. for BRCA1; ENSG00000012048; entrez:672 via
>>>>
>>>>     genes(TxDb.Hsapiens.UCSC.hg38.knownGene, vals=list(gene_id=672))
>>>>
>>>> gives me:
>>>>
>>>>     GRanges object with 1 range and 1 metadata column:
>>>>           seqnames               ranges strand |     gene_id
>>>>              <Rle>            <IRanges>  <Rle> | <character>
>>>>       672    chr17 [43044295, 43170403]      - |         672
>>>>       -------
>>>>       seqinfo: 455 sequences (1 circular) from hg38 genome
>>>>
>>>> However, querying Ensembl and NCBI Gene
>>>>
>>>> http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000012048
>>>> http://www.ncbi.nlm.nih.gov/gene/672
>>>>
>>>> the gene is located at (note the difference in the end position)
>>>>
>>>>     Chromosome 17: 43,044,295-43,125,483 reverse strand
>>>>
>>>> How is the inconsistency explained, and how can I extract an ENSEMBL/NCBI conforming annotation from the TxDb object? (I am aware of biomaRt, but I want to explicitly use the Bioc annotation functionality.)
>>>>
>>>> Thanks!
>>>> Ludwig
>>>>
>>>> --
>>>> Dipl.-Bioinf. Ludwig Geistlinger
>>>> Lehr- und Forschungseinheit für Bioinformatik
>>>> Institut für Informatik
>>>> Ludwig-Maximilians-Universität München
>>>> Amalienstrasse 17, 2. Stock, Büro A201
>>>> 80333 München
>>>> Tel.: 089-2180-4067
>>>> eMail: ludwig.geistlin...@bio.ifi.lmu.de

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
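Until something like the suggestGenomePackage method mentioned above exists, one stopgap is to translate the genome build into a package name by hand instead of loading every BSgenome package. The following is only a sketch in base R: `build_to_ucsc` and `guess_bsgenome_name()` are hypothetical names invented here (they are not part of ensembldb or BSgenome), and the lookup table is a small hand-maintained assumption that would need extending for other builds.

```r
# Hypothetical helper, NOT part of ensembldb or BSgenome.
# Translates a genome build name (as found in the EnsDb metadata) into the
# name of the matching UCSC-style BSgenome package, without loading any
# BSgenome package.
build_to_ucsc <- c(GRCh38 = "hg38", GRCh37 = "hg19",
                   GRCm38 = "mm10")   # hand-maintained lookup, extend as needed

guess_bsgenome_name <- function(organism, genome_build) {
    ucsc <- build_to_ucsc[genome_build]
    if (is.na(ucsc))
        stop("no UCSC name known for genome build '", genome_build, "'")
    # paste() drops the names of 'ucsc', so a plain string is returned
    paste("BSgenome", organism, "UCSC", ucsc, sep = ".")
}

guess_bsgenome_name("Hsapiens", "GRCh38")
```

The returned string is only a guess at a package name; whether that package is actually installed still has to be checked separately.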
Re: [Bioc-devel] Changes in AnnotationDbi
Hi

My two cents:

On 04/06/15 19:50, James W. MacDonald wrote:

> In other words, for me it is a common practice to do something like this:
>
>     fit <- lmFit(eset, design)
>     fit2 <- eBayes(fit)
>     gns <- select(chippackage, featureNames(eset), c("ENTREZID","SYMBOL"))
>     gns <- gns[!duplicated(gns[,1]),]
>     fit2$genes <- gns
>
> I add in the step where dups are removed because I already know they are there. But a naive user might instead do
>
>     fit2$genes <- select(chippackage, featureNames(eset), c("ENTREZID","SYMBOL"))

I'm not even that happy with James' first solution, as it relies on the order being correct after removing the duplicates. I'd feel safer using 'match' to ensure that. (What if an EntrezId is not found in the Annotation DB? Will we have a line with NA, or is the line simply missing? The latter would break James' code.)

What users really want here is a way to get the preferred symbol for an entrezId, and for lack of this, they simply accept a random one or the first one (in some unspecified collation).

So, we should have a function, maybe 'select1', to select one and only one hit for each query value.

    select1(x, keys, columns, keytype, requireUnique=FALSE, ... )

This would query the AnnotationDbi object 'x' as 'select' does, but return a data frame with the columns specified in 'columns', and the vector that was passed as 'keys' as row names, thus guaranteeing that each line in the data frame corresponds to one query key. If there were multiple records for a key, the first one is used, unless 'requireUnique' is set, in which case an error is issued. And if no record is present for a key, the data frame contains a row of NAs for this key.

This would be quite convenient for any kind of ID conversion issue.

Simon
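To make the proposed semantics concrete, here is a rough base R sketch of what 'select1' could do. It is an illustration only: it works on a plain data frame standing in for the output of select() rather than on an AnnotationDbi object, the argument names `records` and `keycol` are invented for this sketch, the probe IDs and symbols are made up, and it assumes the query keys themselves are unique (they become row names).

```r
# Sketch of the proposed 'select1' behaviour on a plain data.frame.
# 'records' stands in for the result of select(); this is NOT an
# AnnotationDbi function. Assumes 'keys' contains no duplicates.
select1 <- function(records, keys, keycol, columns, requireUnique = FALSE) {
    if (requireUnique) {
        dup <- unique(records[[keycol]][duplicated(records[[keycol]])])
        dup <- intersect(dup, keys)
        if (length(dup))
            stop("multiple records for key(s): ", paste(dup, collapse = ", "))
    }
    idx <- match(keys, records[[keycol]])   # first hit per key, NA if absent
    out <- records[idx, columns, drop = FALSE]
    rownames(out) <- keys                   # one row per query key, in query order
    out
}

# Toy lookup table; probe IDs, Entrez IDs and symbols are made up.
records <- data.frame(PROBEID  = c("p1", "p1", "p2"),
                      ENTREZID = c("10", "11", "20"),
                      SYMBOL   = c("AAA", "AAB", "BBB"),
                      stringsAsFactors = FALSE)

res <- select1(records, keys = c("p2", "p3", "p1"), keycol = "PROBEID",
               columns = c("ENTREZID", "SYMBOL"))
res
```

Because the rows are pulled out with 'match', the result follows the order of the query keys, and a key with no record ("p3" here) shows up as a row of NAs instead of silently disappearing.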
Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency
On 06/08/2015 11:43 PM, Rainer Johannes wrote:

> dear Robert and Ludwig,
>
> the EnsDb packages provide all the gene/transcript etc. annotations for all genes defined in the Ensembl database (for a given species and Ensembl release). Except for the column/attribute entrezid that is stored in the internal database, there is however no link to NCBI or UCSC annotations.
>
> So, basically, if you want to use pure Ensembl based annotations: use EnsDb; if you want to have the UCSC annotations: use the TxDb packages.
>
> In case you need EnsDbs of other species or Ensembl versions, the ensembldb package provides functionality to generate such packages either using the Ensembl Perl API or using GTF files provided by Ensembl. If you have problems building the packages, just drop me a line and I'll do that.

Two other sources of Ensembl TxDb's are GenomicFeatures::makeTxDbFromBiomart() and AnnotationHub. For the latter, I'll add a variant of the following to the AnnotationHub HOWTO vignette

http://bioconductor.org/packages/devel/bioc/html/AnnotationHub.html

later today.

## Gene models

_Bioconductor_ represents gene models using 'transcript' databases. These are available via packages such as [TxDb.Hsapiens.UCSC.hg38.knownGene](http://bioconductor.org/packages/TxDb.Hsapiens.UCSC.knownGene.html), or can be constructed using functions from [GenomicFeatures](http://bioconductor.org/packages/GenomicFeatures.html) such as `makeTxDbFromBiomart()` or `makeTxDbFromGRanges()`.

_AnnotationHub_ provides an easy way to work with gene models published by Ensembl. Here we discover the Ensembl release 80 resources for pufferfish, _Takifugu rubripes_

```{r takifugu-gene-models}
query(ah, c("Takifugu", "release-80"))
```

We see that there is a GTF file, as well as various DNA sequences. Let's retrieve the GTF and top-level sequence files. The GTF file is imported as a _GRanges_ instance, the DNA sequence as a compressed, indexed Fasta file

```{r takifugi-data}
gtf <- ah[["AH47101"]]
dna <- ah[["AH47477"]]
head(gtf, 3)
dna
head(seqlevels(dna))
```

It is trivial to make a TxDb instance

```{r takifugi-txdb}
library(GenomicFeatures)
txdb <- makeTxDbFromGRanges(gtf)
```

and to use that in conjunction with the DNA sequence, e.g., to find exon sequences of all annotated genes.

```{r takifugi-exons}
library(Rsamtools)     # for getSeq,FaFile-method
exons <- exons(txdb)
getSeq(dna, exons)
```

Some difficulties arise when working with this partly assembled genome that require more advanced GenomicRanges skills, see the [GenomicRanges](http://bioconductor.org/packages/GenomicRanges.html) vignettes, especially "GenomicRanges HOWTOs" and "An Introduction to GenomicRanges".

> cheers, jo
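Coming back to the BRCA1 coordinates that started this thread: one way to read Robert's explanation is that the gene-level range covers every transcript assigned to the gene, so a single transcript reaching further out widens the whole gene. The base R toy below only illustrates that arithmetic. The outer bounds (43044295, 43125483, 43170403) are the numbers quoted in the thread; the transcript names and all other coordinates are made up for illustration and do not come from any annotation database.

```r
# Toy transcript table for one gene. Only the outer bounds are taken from
# the thread; transcript names and the remaining coordinates are made up.
tx <- data.frame(gene_id = "672",
                 tx_name = c("tx1", "tx2", "tx3"),
                 start   = c(43044295, 43045000, 43050000),
                 end     = c(43125483, 43100000, 43170403))

# Gene range taken as the span of all transcripts assigned to the gene.
gene_span <- c(start = min(tx$start), end = max(tx$end))
gene_span
```

Under that reading, the Ensembl/NCBI end position would correspond to a transcript like "tx1", while the TxDb gene end would be set by the transcript that extends furthest, here the hypothetical "tx3".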
Re: [Bioc-devel] Changes in AnnotationDbi
Hi Martin

On 09/06/15 15:35, Martin Morgan wrote:

> In case you missed it in Marc's reply, and acknowledging that this is different from your suggestion, there is mapIds() for doing this on a single column basis, which is the common use case where one doesn't care too much about multiple mapping ids

I have indeed missed this point in Marc's reply -- and you are right, the single column case is the only one where it is common that one does not care about multiple mappings. So, sorry for the noise.

How come I never knew 'mapIds' even though it is clearly mentioned in the AnnotationDb help page? Maybe the page is too long, or --more likely-- I'm too impatient when browsing through help pages.

Simon
Re: [Bioc-devel] Changes in AnnotationDbi
On 06/09/2015 02:52 AM, Simon Anders wrote:

> Hi
>
> My two cents:
>
> On 04/06/15 19:50, James W. MacDonald wrote:
>
>> In other words, for me it is a common practice to do something like this:
>>
>>     fit <- lmFit(eset, design)
>>     fit2 <- eBayes(fit)
>>     gns <- select(chippackage, featureNames(eset), c("ENTREZID","SYMBOL"))
>>     gns <- gns[!duplicated(gns[,1]),]
>>     fit2$genes <- gns
>>
>> I add in the step where dups are removed because I already know they are there. But a naive user might instead do
>>
>>     fit2$genes <- select(chippackage, featureNames(eset), c("ENTREZID","SYMBOL"))
>
> I'm not even that happy with James' first solution, as it relies on the order being correct after removing the duplicates. I'd feel safer to use 'match' to ensure that. (What if an EntrezId is not found in the Annotation DB? Will we have a line with NA, or is the line simply missing? The latter would break James' code.)
>
> What users really want here is a way to get the preferred symbol for an entrezId, and for lack of this, they accept simply a random one or the first one (in some unspecified collation).
>
> So, we should have a function, maybe 'select1', to select one and only one hit for each query value.
>
>     select1(x, keys, columns, keytype, requireUnique=FALSE, ... )
>
> This would query the AnnotationDbi object 'x' as does 'select', but return a data frame with the columns specified in 'columns', and the vector that was passed as 'keys' as row names, thus guaranteeing that each line in the data frame corresponds to one query key. If there were multiple records for a key, the first one is used, unless 'requireUnique' is set, in which case an error is issued. And if no record is present for a key, the data frame contains a row of NAs for this key.
>
> This would be quite convenient for any kind of ID conversion issues.

In case you missed it in Marc's reply, and acknowledging that this is different from your suggestion, there is mapIds() for doing this on a single column basis, which is the common use case where one doesn't care too much about multiple mapping ids

    > org = org.Hs.eg.db
    > head(select(org, keys(org), "ALIAS"))
      ENTREZID    ALIAS
    1        1      A1B
    2        1      ABG
    3        1      GAB
    4        1 HYST2477
    5        1     A1BG
    6        2     A2MD
    > head(mapIds(org, keys(org), "ALIAS", "ENTREZID"))
         1      2      3      9     10     11
     "A1B" "A2MD" "A2MP" "AAC1" "AAC2" "AACP"
    > head(mapIds(org, keys(org), "ALIAS", "ENTREZID", multiVals="CharacterList"))
    CharacterList of length 6
    [["1"]] A1B ABG GAB HYST2477 A1BG
    [["2"]] A2MD CPAMD5 FWP007 S863-7 A2M
    [["3"]] A2MP A2MP1
    [["9"]] AAC1 MNAT NAT-1 NATI NAT1
    [["10"]] AAC2 NAT-2 PNAT NAT2
    [["11"]] AACP NATP1 NATP
    > str(head(mapIds(org, keys(org), "ALIAS", "ENTREZID", multiVals="list")))
    List of 6
     $ 1 : chr [1:5] "A1B" "ABG" "GAB" "HYST2477" ...
     $ 2 : chr [1:5] "A2MD" "CPAMD5" "FWP007" "S863-7" ...
     $ 3 : chr [1:2] "A2MP" "A2MP1"
     $ 9 : chr [1:5] "AAC1" "MNAT" "NAT-1" "NATI" ...
     $ 10: chr [1:4] "AAC2" "NAT-2" "PNAT" "NAT2"
     $ 11: chr [1:3] "AACP" "NATP1" "NATP"

Also since this is the devel list, there is

    > library(dplyr)
    > d = src_sqlite(org.Hs.eg_dbfile())
    > d
    src:  sqlite 3.8.6 [/home/mtmorgan/R/x86_64-unknown-linux-gnu-library/3.2-BiocDevel/org.Hs.eg.db/extdata/org.Hs.eg.sqlite]
    tbls: accessions, alias, chrlengths, chromosome_locations, chromosomes,
      cytogenetic_locations, ec, ensembl, ensembl_prot, ensembl_trans,
      ensembl2ncbi, gene_info, genes, go, go_all, go_bp, go_bp_all, go_cc,
      go_cc_all, go_mf, go_mf_all, kegg, map_counts, map_metadata, metadata,
      ncbi2ensembl, omim, pfam, prosite, pubmed, refseq, sqlite_stat1, ucsc,
      unigene, uniprot
    > d %>% tbl("alias") %>% group_by(`_id`) %>% summarize(alias_symbol)
    Source: sqlite 3.8.6 [/home/mtmorgan/R/x86_64-unknown-linux-gnu-library/3.2-BiocDevel/org.Hs.eg.db/extdata/org.Hs.eg.sqlite]
    From: <derived table> [?? x 2]

       _id alias_symbol
    1    1         A1BG
    2    2          A2M
    3    3        A2MP1
    4    4         NAT1
    5    5         NAT2
    6    6         NATP
    7    7     SERPINA3
    8    8        AADAC
    9    9         AAMP
    10  10        AANAT
    ..  ...         ...

(with lots of nice confusion there, including extensive masking of symbols between dplyr / AnnotationDbi, the need for knowledge of the schema (basically a central id, ENTREZID for org packages, and tables of mappings from the central id to other ids), and the more-or-less arbitrary choice of alias_symbol).

Martin

> Simon

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
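For readers who want to see what the "first hit" versus "all hits" choice amounts to, the behaviour can be mimicked in base R on the alias records printed above. This is only an analogy and not how mapIds() is implemented; the data frame simply copies the first three Entrez IDs and their aliases from the output in this message.

```r
# Alias records copied from the select() / mapIds() output quoted above
# (Entrez IDs 1, 2 and 3 only).
alias <- data.frame(
    ENTREZID = c(rep("1", 5), rep("2", 5), rep("3", 2)),
    ALIAS    = c("A1B", "ABG", "GAB", "HYST2477", "A1BG",
                 "A2MD", "CPAMD5", "FWP007", "S863-7", "A2M",
                 "A2MP", "A2MP1"),
    stringsAsFactors = FALSE)

# All aliases per key, similar in spirit to multiVals = "list".
all_hits <- split(alias$ALIAS, alias$ENTREZID)

# First alias per key, similar in spirit to the default multiVals = "first".
first_hit <- vapply(all_hits, function(x) x[1], character(1))
first_hit
```

The named vector `first_hit` has one element per key, which is what makes the single-column case convenient; `all_hits` keeps everything at the cost of a list structure.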