Re: [BioRDF] Taxonomic Databases Working Group and LSIDs

William Bug Tue, 29 Aug 2006 05:36:24 -0700

Thanks for putting this out there for consideration, Eric. I certainly agree the amount of effort they have invested on the issue of using LSIDs as GUIDs for organism taxonomic information makes them a very worthy example, and, as their work continues to progress, a possible existence proof of the value LSIDs have to offer.

Being able to deal with species in a more systematic and semantically granular manner is very important - and will be critical to using formal semantically-driven information federation techniques to better support translational research - e.g., enabling the creation of software capable of placing findings from animal models of disease in their proper, fine-grained, semantic context to make them useful to clinical treatment of human disease. It's also critical to phylogenetic analyses. Both of these issues can be handled now with sufficient manual effort in a relatively narrow domain, but this is not scalable and not the recommended plan for the future.

In general, it is helpful to be as specific as possible when specifying the organism taxon, since that brings with it some constrained definition of genotype. So, for instance, for the available digital mouse brain atlases, I believe the most specific one can be regarding taxon would be "Mus musculus" (ID: 10090 - http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10090&lvl=3&lin=f&keep=1&srchmode=1&unlock), though it's possible the more specific subspecies Mus musculus domesticus would fit (http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10092&lvl=3&lin=f&keep=1&srchmode=1&unlock), as many classical inbred strains are derived from this sub-species.

As many on this list are aware, NCBI Taxonomy is in ubiquitous use in the biomolecular informatics community and is included in UMLS. Having said that, NCBI Taxonomy is NOT the last - or even the best - current effort to formally specify the extent of our current knowledge of organism taxonomy and phylogeny. In fact, every page on the NCBI Taxonomy site, such as the ones given above, includes the following disclaimer at the bottom of the page:

"Disclaimer: The NCBI taxonomy database is not an authoritative source for nomenclature or classification - please consult the relevant scientific literature for the most reliable information."

The Zoological Record had previously been, since the mid-1880s, in cooperation with The British Museum, THE authority on this topic. That situation began to change in the 1990s. The email clips below are from an email I'd sent a few weeks ago to a colleague in response to a request for info on the status of defining an agreed upon, global, comprehensive formal specification of organism taxonomy.

Cheers,

Bill

EMAIL 1:

To my knowledge, there are three basic projects working on the issue of organism taxonomy with a view toward being globally and phylogenetically comprehensive:

* Life science library & info scientists associated with university science libraries, scientific field stations (especially for agriculture & ecology), life science databases (such as ZR), botanical gardens, and natural history museums around the world

==> Species 2000 (http://www.sp2000.org/)

* Researchers whose work involves some aspect of studying global biodiversity

==> Global Biodiversity Information Facility (http://www.gbif.org/)

* Researchers who study the phylogenetics of organism comparative anatomy (macro, micro, and biochemical) and behavioral ecology.

==> Tree of Life (http://tolweb.org/tree/)

They are all authorities in their own right - Species2000, GBIF, and ToL - but each from their own vantage.

In some ways, GBIF and its constituent participants have been around the longest, though possibly GBIF the institution hasn't been around as long as the shared biodiversity information aggregation/integration effort started by several of the participant organizations.

GBIF

Homepage

http://www.gbif.org/

Wiki

http://wiki.gbif.org/gbif/wikka.php?wakka=HomePage

Portal

http://www.asia.gbif.net/portal/index.jsp

Darwin Core data element definitions

http://darwincore.calacademy.org

When you go to the Portal and browse the taxonomy, you see attribution to sources for taxonomic names. This appears to be in keeping with the following stated goal:

"Taxonomic names. GBIF developing an 'Electronic Catalogue of Taxonomic Names'. This will provide access to authoritative information about both scientific and common names for all organisms, and will integrate data from a wide range of different organisations. The portal already includes data for over 983,000 scientific names and 253,000 common names from the Catalogue of Life Partnership Annual Checklist. Some names are listed with the words 'Tentative position in taxonomy'. This indicates that the name is only known to the portal from specimen/observation records and should not be treated as authoritative simply on the basis of being listed here."

Right now they have 176 data providers for taxonomic information (http://www.gbif.org/DataProviders/providerslist?sortby=records), many of which are linked to the Species2000 Project.

I also know GBIF has been looking to use semantic web technologies in a big way and the LSID as a global identification system (resolvable URIs for RDF triple resources).
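
For readers unfamiliar with the identifier format: an LSID is a URN of the general form urn:lsid:authority:namespace:object, with an optional trailing revision. The following is only a minimal Python sketch of splitting such an identifier into its parts. The `LSID` tuple, the `parse_lsid` function, and the example identifier are hypothetical names made up for illustration; they are not taken from any of the projects mentioned here, and the sketch does not attempt resolution or metadata retrieval.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of an LSID (illustrative container, not an official type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    # The 'urn' and 'lsid' prefixes are matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID (missing 'urn:lsid:' prefix): {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty authority, namespace, or object: {text!r}")
    # An absent or empty sixth part means no revision was given.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifier; 'example.org' and 'taxon' are placeholders.
example = parse_lsid("urn:lsid:example.org:taxon:10090")
print(example.authority, example.namespace, example.object_id)
```

Keeping the authority separate from the namespace and object matters for the taxonomy use case, since the same organism may be identified by several authorities and the identifier itself says which one is speaking.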

The Tree of Life has always appealed to me as a bottom-up effort of current investigators whose research aims include a phylogenetics component. It was the "brain child" of the Maddison brothers (http://tolweb.org/tree/home.pages/homepeople.html) back in the mid-90s. I've known of ToL since its relatively humble beginnings about a decade ago as a collection of phylogeny web pages organized according to a phylogenetic tree graph. Back then, there were mostly empty nodes in the graph. Now they have an absolutely immense collection of domain expert contributors and an ever-decreasing collection of blank nodes (http://tolweb.org/tree/home.pages/participants.html). Given the participants involved and their stated objectives (http://tolweb.org/tree/home.pages/goals.html), their efforts on this task really need to be somehow incorporated into any comprehensive, semantically formal expression of organism taxonomy.

EMAIL 2:

I do think this is a critical issue for the medium- to long-term. My sense has been that about 10 years ago NCBI bit off the tractable part of this problem, immediately addressing the needs of molecular biologists in a manner that has proven exceedingly useful, along the lines of the way GO has become a ubiquitous tool for many informatics tasks stretching well beyond its original design goals - though in the area of microbes, and particularly viruses, there are significant problems with NCBI. Whenever such a thing happens - a tool gets pressed into service for tasks not part of its original cornucopia of Use Cases - there is a need to step back. Either you need to start recommending the community not use the resource for that "new" purpose - as is often the case for UMLS utilization - or considerable re-tooling needs to be done.

The biodiversity group includes folks like ZR and the various nat. history/bot. gardens organizations throughout the world, etc. who've been working on this issue of organism taxonomy for a very long time - some for over a century. Few have resources you'd want to use "as is" if the goal were to construct a well founded ontology. I'm particularly concerned with the high-level structure of the "ontology" the TDWG is proposing (the DARWIN Core - http://darwincore.calacademy.org/). However, it is really ill advised to go it alone and ignore this body of work.

NCBI taxonomy - like GO - is in such ubiquitous use in the realm of molecular & cellular biology, one can't throw it out either. Really what should be done is those at NCBI who curate NCBI Tax., the GBIF folks, AND the Tree of Life folks need to be brought together to work on this problem. Otherwise, splintering of the efforts will cause problems for us all in the future.

On Aug 28, 2006, at 1:02 PM, Eric Neumann wrote:

I would like to point out the Taxonomic Databases Working Group (TDWG) and their work with trying to establish a system of Global Unique Identifiers (GUIDs).

http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID2Report

At this point in time they are recommending (within their community) the use of LSIDs WITH metadata in the form of RDF.

I would like to propose that we include this on the list of examples for the LSID/URI discussion in BioRDF (just added to http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/LSID_Pros_%26_Cons). I think they have some great global examples of how to use such identifiers.

Eric

Eric Neumann, PhD
co-chair, W3C Healthcare and Life Sciences,
and Senior Director Product Strategy
Teranode Corporation
83 South King Street, Suite 800
Seattle, WA 98104
+1 (781)856-9132
www.teranode.com

Bill Bug

Senior Research Analyst/Ontological Engineer

Laboratory for Bioimaging & Anatomical Informatics

www.neuroterrain.org

Department of Neurobiology & Anatomy

Drexel University College of Medicine

2900 Queen Lane

Philadelphia, PA 19129

215 991 8430 (ph)

610 457 0443 (mobile)

215 843 9367 (fax)

Please Note: I now have a new email - [EMAIL PROTECTED]


