[Bioc-devel] Question about which new organism resources to create

Marc Carlson Tue, 06 May 2014 10:16:47 -0700

Hi everyone,

As many of you already know, we have long provided organism annotation packages that give gene-based annotations for selected organisms. And we intend to keep doing that. But these days there is also a lot of other data at NCBI that could be used to make gene-based databases for other organisms. And at the same time, there is also greater and greater demand for annotations from other organisms too. So I aim to make organism-based gene databases for a wider range of organisms. However, instead of just making more packages, I intend to put these DBs into the AnnotationHub. You can get an idea of what access will be like by looking at the inparanoid8 objects that were put in for the last release.


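library(AnnotationHub)
ah = AnnotationHub()
## retrieve the inparanoid8 object for Homo sapiens from the hub
hs8 = ah$inparanoid8.Orthologs.hom.Homo_sapiens.inp8.sqlite
hs8
## list the columns that can be retrieved
columns(hs8)
## take the first few keys of keytype 'TOXOPLASMA_GONDII'
k = head(keys(hs8, 'TOXOPLASMA_GONDII'))
## look up the 'HOMO_SAPIENS' column for those keys
select(hs8, k, 'HOMO_SAPIENS', 'TOXOPLASMA_GONDII')
## etc.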

Anyhow, my reason for posting is that I am now looking at all the NCBI data that could be used for annotation packages and trying to decide what to include. About half of the 14 thousand potential critters in the NCBI dataset only have about one gene annotated. I am guessing that it is not worth anyone's time to pre-process those organisms that have only one gene. Or is it? If you think it might be, now would probably be a good time to speak up.

How many annotations do you guys want/expect in an organism package before it becomes annoying that you even downloaded it?


Thanks in advance for your opinions,


  Marc

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
