Re: bioGUID

Roderic Page Thu, 29 Mar 2007 21:37:23 -0800


Dear Matt,

Do you have any publications that outline the motivation here (except
the LSIDs don't work for the semantic web argument you have outlined
in your online material)?

No publication as yet on bioGUID, but I'm working on some rough notes. In essence the motivation is to get biodiversity informatics using RDF, and I don't think LSIDs are the best way to get us there. For something related you could look at my article "Taxonomic names, metadata, and the Semantic Web" (http://jbi.nhm.ku.edu/index.php/jbi/article/view/25).

What are the rules for generating a URI for a particular database
record? For example:
http://bioguid.info/rdf/GO:0003674 does not work, and neither does
http://bioguid.info/rdf/go:0003674 ... is the gene ontology not
referenced yet? or do I have the rule wrong?

bioGUID needs to know how to resolve the identifier, which in turn means that there is some way to get metadata about the identifier (although I will resort to having local copies of databases if I have to).

To resolve a GO identifier there needs to be a service somewhere that takes a GO identifier and returns metadata either in RDF, or in a format that can be converted to RDF. Is there any such service? If not, I would have to host a copy of GO here, and write something to output a GO term in RDF.

Are you trying to achieve a mechanism for uniquely identifying links
to important bioinformatic records by URI?

I guess I'm trying to show the value of having such URIs, because my sense is that -- within the biodiversity informatics community at least -- people haven't bought the argument yet. It's hard to make the case without real demos. Plus my own work depends on having such an infrastructure in place.

Would you imagine this might become a primary conduit for people to
locate a database record when they have database record IDs, rdf
tools, but no idea how to use these tools to access the record data
(I'm not suggesting the record data is itself RDF).

Ideally no, because I would hope data providers would have their own URIs that are stable and return RDF. For example, the database record that a user has should itself be a URI. However, as an interim step, yes, bioGUID can be a way to locate a record in the absence of knowing how to do that, and in some cases it may be the only currently existing way to do that, unless you want to write your own code. For example, accessing a museum specimen record requires writing an XML document and embedding that in a URL (gack).


Are you trying to achieve a cross referencing system for database
records? And if so, on what basis is a cross-reference made?

The cross reference uses bioGUIDs. For example, if a PubMed record contains a DOI, the RDF will have a triple linking the PubMed and the DOI using rdfs:sameAs. If a PubMed record lists sequence ids, they are converted to bioGUIDs. I use bioGUIDs so that the link can be navigated by a Semantic Web browser.

(for
example the record referenced itself back references the record making
this reference)

Um, huh? Do you mean, if a PubMed record references a sequence, does the sequence reference the PubMed record? The answer is it depends. In the case of a PubMed record and a sequence, in most cases yes, hence for http://bioguid.info/pmid:17079492 there is a triple


<dcterms:references rdf:resource="http://bioguid.info/gi:117652796" />

and for http://bioguid.info/gi:117652796 there is a triple

<dcterms:isReferencedBy rdf:resource="http://bioguid.info/pmid:17079492"/>

These are easy because the PubMed and the GenBank records refer to each other. In other cases both links don't exist -- for example, a specimen has no idea whether a sequence links to it. I could add the reverse link in these cases, but I'd sort of assumed that people could use a SPARQL query to get these.
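
To illustrate recovering a reverse link on the client side, here is a minimal Python sketch that does by hand what a SPARQL pattern such as `?s ?p <target>` would do against a triple store. The single triple is the PubMed example from above, stored in one direction only; the function name `referrers` is hypothetical and is not part of bioGUID.

```python
# Triples as (subject, predicate, object). Only the forward link from the
# PubMed record to the sequence is stored here.
DCTERMS = "http://purl.org/dc/terms/"

triples = [
    ("http://bioguid.info/pmid:17079492",
     DCTERMS + "references",
     "http://bioguid.info/gi:117652796"),
]


def referrers(triples, target):
    """Return every subject that points at `target`, whatever the predicate."""
    return sorted({s for (s, p, o) in triples if o == target})


# Ask "what links to this sequence?" without a stored reverse triple.
print(referrers(triples, "http://bioguid.info/gi:117652796"))
```

With a real triple store the same question would be asked in SPARQL rather than by scanning a list, but the idea is the same: the reverse link is derived at query time instead of being stored.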


How would people apply to have their databases added?

Basically just ask me. So far I'm adding data sources that are directly relevant to my own work, but since that includes sequences, that pretty much opens up most things in bioinformatics. I'm also looking at adding lists of triples (such as citation links) to the underlying triple store, so the bioGUID records become richer than just a remote database lookup.

The immediate use-case that springs to mind is being able to drop the
crop of libraries one needs to interpret records in one database to
find accession numbers for another database and so on until you find
sort of what you are looking for in the actual database you want.

Not totally sure I understand this. My own immediate use case is to have a script that will fetch a record with a bioGUID, and have that script fetch every linked record referred to by that record (i.e., RDF spidering). For example, if I start with a PubMed identifier, the script would pull out all the sequences in that paper, any specimens linked to those sequences, the taxonomy of the organisms, and the papers cited by the PubMed paper. I would then have a local triple store for this information, and be able to do stuff like plot a geographical map of the sequences based on the georeferenced specimen records.
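
A rough sketch of what such a spidering script could look like in Python, assuming each record comes back as RDF/XML and that links are expressed as rdf:resource attributes, as in the triples quoted earlier. The names `linked_uris`, `spider`, and `fetch` are hypothetical, and the two records are stubbed in memory rather than fetched from bioGUID, so this shows only the crawl logic, not the actual service.

```python
import xml.etree.ElementTree as ET
from collections import deque

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_RESOURCE = "{%s}resource" % RDF_NS


def linked_uris(rdf_xml):
    """Return the rdf:resource targets found in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    return [el.attrib[RDF_RESOURCE] for el in root.iter()
            if RDF_RESOURCE in el.attrib]


def spider(start, fetch, max_records=50):
    """Breadth-first crawl from `start`.

    `fetch(uri)` is supplied by the caller and returns RDF/XML text,
    or None when the URI cannot be resolved.
    """
    seen, queue, records = {start}, deque([start]), {}
    while queue and len(records) < max_records:
        uri = queue.popleft()
        text = fetch(uri)
        if text is None:
            continue  # unresolvable identifier, skip it
        try:
            targets = linked_uris(text)
        except ET.ParseError:
            continue  # not well-formed XML, skip it
        records[uri] = text
        for target in targets:
            if target not in seen:  # avoid loops between mutual references
                seen.add(target)
                queue.append(target)
    return records


def stub_record(about, prop, target):
    """Build a one-triple RDF/XML document for the in-memory demo."""
    return ('<rdf:RDF xmlns:rdf="%s" xmlns:dcterms="http://purl.org/dc/terms/">'
            '<rdf:Description rdf:about="%s">'
            '<dcterms:%s rdf:resource="%s"/>'
            '</rdf:Description></rdf:RDF>' % (RDF_NS, about, prop, target))


PMID = "http://bioguid.info/pmid:17079492"
GI = "http://bioguid.info/gi:117652796"
stub = {
    PMID: stub_record(PMID, "references", GI),
    GI: stub_record(GI, "isReferencedBy", PMID),
}

records = spider(PMID, stub.get)
print(list(records))
```

The `seen` set matters because records such as the PubMed and GenBank pair refer to each other; without it the crawl would bounce between them. To run against live data, `fetch` would be replaced by a function that retrieves the URI over HTTP, and the collected documents would be loaded into a local triple store.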


Regards

Rod

----------------------------------------------------------------------------------------------------------------

Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    [EMAIL PROTECTED]
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
iChat:    aim://rodpage1962
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org
Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species: http://ispecies.org
Rod's rants on phyloinformatics: http://iphylo.blogspot.com
Rod's rants on ants: http://semant.blogspot.com
