Has this situation been rectified?

On Tue, Apr 23, 2019 at 11:40 AM Van Twisk, Daniel <daniel.vantw...@roswellpark.org> wrote:
> We've made some changes to our annotation generation scripts this release
> and it seems these may have introduced some errors. Thank you for
> identifying this issue and I will try to have some fixes out asap.
>
> ________________________________
> From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of James W. MacDonald <jmac...@uw.edu>
> Sent: Tuesday, April 23, 2019 11:03:02 AM
> To: Aaron Lun
> Cc: Bioc-devel
> Subject: Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db
>
> Looks like the ensembl table of the human.db0 package got polluted with
> *Pan troglodytes* genes:
>
> > con <- dbConnect(SQLite(), "/R-devel/lib64/R/library/human.db0/extdata/chipsrc_human.sqlite")
> > dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSPTR%';")
>   count(*)
> 1    16207
> > dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSG%';")
>   count(*)
> 1    28973
>
> On Mon, Apr 22, 2019 at 11:54 PM Aaron Lun <infinite.monkeys.with.keyboa...@gmail.com> wrote:
>
> > Playing around with org.Hs.eg.db 3.8.0. What on earth is ENSPTRG0000...?
> >
> > > library(org.Hs.eg.db)
> > > mapIds(org.Hs.eg.db, key="GCG", keytype="SYMBOL", column="ENSEMBL")
> > 'select()' returned 1:many mapping between keys and columns
> >                  GCG
> > "ENSPTRG00000000777"
> >
> > Well, at least it still recovers the right identifier... eventually.
> >
> > > select(org.Hs.eg.db, key="GCG", keytype="SYMBOL", columns="ENSEMBL")
> > 'select()' returned 1:many mapping between keys and columns
> >   SYMBOL            ENSEMBL
> > 1    GCG ENSPTRG00000000777
> > 2    GCG    ENSG00000115263
> >
> > The SYMBOL->Entrez ID relational table seems to be okay:
> >
> > > Y <- toTable(org.Hs.egSYMBOL)
> > > Y[which(Y[,2]=="GCG"),]
> >      gene_id symbol
> > 2152    2641    GCG
> >
> > So the cause is the Ensembl->Entrez mappings:
> >
> > > Z <- toTable(org.Hs.egENSEMBL2EG)
> > > Z[Z[,1]==2641,]
> >      gene_id         ensembl_id
> > 3028    2641 ENSPTRG00000000777
> > 3029    2641    ENSG00000115263
> >
> > Googling suggests that ENSPTRG00000000777 is an identifier for some
> > other gene in one of the other monkeys. Hardly "Hs" stuff.
> >
> > Session info (not technically R 3.6, but I didn't think that would have
> > been the cause):
> >
> > > R Under development (unstable) (2019-04-11 r76379)
> > > Platform: x86_64-pc-linux-gnu (64-bit)
> > > Running under: Ubuntu 18.04.2 LTS
> > >
> > > Matrix products: default
> > > BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so
> > > LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so
> > >
> > > locale:
> > >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> > >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> > >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> > >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> > >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> > >
> > > attached base packages:
> > > [1] parallel  stats4    stats     graphics  grDevices utils     datasets
> > > [8] methods   base
> > >
> > > other attached packages:
> > > [1] org.Hs.eg.db_3.8.0   AnnotationDbi_1.45.1 IRanges_2.17.5
> > > [4] S4Vectors_0.21.23    Biobase_2.43.1       BiocGenerics_0.29.2
> > >
> > > loaded via a namespace (and not attached):
> > >  [1] Rcpp_1.0.1      digest_0.6.18   DBI_1.0.0       RSQLite_2.1.1
> > >  [5] blob_1.1.1      bit64_0.9-7     bit_1.1-14      compiler_3.7.0
> > >  [9] pkgconfig_2.0.2 memoise_1.1.0
> >
> > _______________________________________________
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
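
For anyone wanting to check their own installation, the diagnosis quoted above could be turned into a quick test along the following lines. This is only a rough sketch: `find_nonhuman_ensembl` is a hypothetical helper name, not part of any package, and it assumes that human Ensembl gene IDs have the form "ENSG" followed by 11 digits, so anything carrying a species code (such as "ENSPTRG...") is flagged. The last part only runs if org.Hs.eg.db and AnnotationDbi are installed.

```r
# Hypothetical helper: return the Ensembl gene IDs that do not look human.
# Assumption: human gene IDs match "ENSG" + 11 digits; other species insert
# a species code between "ENS" and "G" (e.g. "ENSPTRG" in this thread).
find_nonhuman_ensembl <- function(ids) {
  ids <- ids[!is.na(ids)]
  ids[!grepl("^ENSG[0-9]{11}$", ids)]
}

# The two IDs returned for GCG in the select() output quoted above.
example_ids <- c("ENSPTRG00000000777", "ENSG00000115263")
suspect <- find_nonhuman_ensembl(example_ids)
print(suspect)

# Optional check against an installed annotation package.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  all_ids <- AnnotationDbi::keys(org.Hs.eg.db::org.Hs.eg.db,
                                 keytype = "ENSEMBL")
  bad <- find_nonhuman_ensembl(all_ids)
  cat(length(bad), "of", length(all_ids), "Ensembl IDs look non-human\n")
}
```

A non-zero count from the last step would not by itself prove contamination, since the pattern is an assumption about ID format, but a large number of "ENSPTR" hits would match what was reported here.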