Re: [Bioc-devel] NCBI taxonomy annotation

2021-08-09 Thread Brian Schilder
Hi Levi, 

I recently put together a new package called orthogene (currently under
review by Bioconductor) that has a convenience function, map_species(), for
flexibly mapping species identifiers to any ID type (including NCBI taxa IDs).

It may not be as comprehensive as GenomeInfoDbData, but might still be useful. 

Best, 
Brian
___
Brian Schilder
PhD Candidate
UK Dementia Research Institute at Imperial College London
Faculty of Medicine, Department of Brain Sciences, Neurogenomics Lab
Profile | bit.ly/imperial_profile 
LinkedIn | linkedin.com/in/brian-schilder 
Twitter | twitter.com/BMSchilder 
Lab | neurogenomics.co.uk 
UK DRI | www.ukdri.ac.uk 


> On 8 Aug 2021, at 19:10, Levi Waldron wrote:
> 
> Does anyone else do mapping between NCBI taxids, names, and ranks? We do
> this in curatedMetagenomicData and soon other packages, currently using
> external files that lack provenance and versioning, so Ludwig Geistlinger
> was looking for Bioconductor annotation resources. The closest he found was
> in GenomeInfoDbData, but
> this has only genus and species, and some quirks like Bacteria being listed
> as a genus:
> 
>> library(GenomeInfoDbData)
>> data(specData)
>> head(specData)
>   tax_id        genus     species
> 1      1          all
> 2      1         root
> 3      2     Bacteria
> 4      6 Azorhizobium
> 5      7 Azorhizobium caulinodans
> 6      9     Buchnera  aphidicola
>> dim(specData)
> [1] 2521271   3
>> subset(specData, c(genus == "Escherichia" & species == "coli"))$tax_id
> [1] 562
> 
> Any thoughts from the GenomeInfoDbData maintainer ("Bioconductor Maintainer")
> about a pull request to either a) update specData to add columns for all
> taxonomic levels, or b) create a new object? Or another approach altogether? See
> https://github.com/waldronlab/curatedMetagenomicData/issues/245.
> 
> --
> Levi Waldron
> Associate Professor
> Department of Epidemiology and Biostatistics
> CUNY Graduate School of Public Health and Health Policy
> Institute for Implementation Science in Population Health
> 55 W 125th St, New York NY 10035
> https://waldronlab.io
> Join the microbiome Virtual International Forum: https://microbiome-vif.org
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel




[Bioc-devel] NCBI taxonomy annotation

2021-08-08 Thread Levi Waldron
Does anyone else do mapping between NCBI taxids, names, and ranks? We do
this in curatedMetagenomicData and soon other packages, currently using
external files that lack provenance and versioning, so Ludwig Geistlinger
was looking for Bioconductor annotation resources. The closest he found was
in GenomeInfoDbData, but
this has only genus and species, and some quirks like Bacteria being listed
as a genus:

> library(GenomeInfoDbData)
> data(specData)
> head(specData)
  tax_id        genus     species
1      1          all
2      1         root
3      2     Bacteria
4      6 Azorhizobium
5      7 Azorhizobium caulinodans
6      9     Buchnera  aphidicola
> dim(specData)
[1] 2521271   3
> subset(specData, c(genus == "Escherichia" & species == "coli"))$tax_id
[1] 562

Any thoughts from the GenomeInfoDbData maintainer ("Bioconductor Maintainer")
about a pull request to either a) update specData to add columns for all
taxonomic levels, or b) create a new object? Or another approach altogether? See
https://github.com/waldronlab/curatedMetagenomicData/issues/245.
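
To make option (a) concrete, here is a minimal sketch of a table that carries an explicit rank for each taxid. It uses a toy data frame in base R only. The column names `name` and `rank`, the helper `taxid_of()`, and the rank labels are assumptions made for illustration, not the actual layout of specData or of any existing Bioconductor object.

```r
# Toy table only: the columns `name` and `rank` and the rank labels are
# illustrative assumptions, not the actual layout of specData.
taxa <- data.frame(
  tax_id = c(2L, 6L, 7L, 561L, 562L),
  name   = c("Bacteria", "Azorhizobium", "Azorhizobium caulinodans",
             "Escherichia", "Escherichia coli"),
  rank   = c("superkingdom", "genus", "species", "genus", "species"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the taxid(s) matching a name at a given rank.
taxid_of <- function(tbl, name, rank) {
  tbl$tax_id[tbl$name == name & tbl$rank == rank]
}

# Look up a species-level taxid.
taxid_of(taxa, "Escherichia coli", "species")

# With an explicit rank column, Bacteria no longer has to appear as a genus.
taxa$rank[taxa$tax_id == 2L]
```

With a layout like this, a name can be queried at any rank instead of being forced into the genus and species columns.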

--
Levi Waldron
Associate Professor
Department of Epidemiology and Biostatistics
CUNY Graduate School of Public Health and Health Policy
Institute for Implementation Science in Population Health
55 W 125th St, New York NY 10035
https://waldronlab.io
Join the microbiome Virtual International Forum: https://microbiome-vif.org
