Re: proposal for standard NCBI database URI

Matt Halstead Tue, 09 May 2006 19:16:57 -0700


On 9/05/2006, at 8:46 PM, Matthias Samwald wrote:

Hi Alan,
 As far as I know there is no standard URI for a resource at NCBI. I
 would like to propose that there be one, since we will all need
 them to use when we refer to these resources  in our RDF. (and I
 need one *now*)
I think we should be aware that this could be a VERY important decision for the further development of RDF in the life sciences. The URI scheme we come up with during this project would probably become THE standard for referencing resources at the NCBI. I guess we should try to contact someone from the NCBI to make sure the solution we come up with is acceptable to them. Maybe they will soon realize the need for URIs themselves and start creating their own, conflicting URI scheme. The last thing the Semantic Web needs is two different URIs for each of the many resources in the Entrez databases.
 Following other styles I've seen, I propose the following:


 1. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>/<IDENTIFIER_GOES_HERE>

 or


 2. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>#<IDENTIFIER_GOES_HERE>
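
To make the difference between the two proposed forms concrete, here is a minimal Python sketch that builds both from a database name and an identifier. The function names and the example values ("gene", "1234") are assumptions for illustration only; nothing here is an NCBI-endorsed scheme.

```python
# Base taken from the proposal above; it is a proposal, not an official NCBI URI.
BASE = "http://www.ncbi.nlm.nih.gov/2006/entrez/"


def slash_uri(database: str, identifier: str) -> str:
    """Option 1: the identifier is the last path segment."""
    return f"{BASE}{database}/{identifier}"


def hash_uri(database: str, identifier: str) -> str:
    """Option 2: the identifier is a fragment of the database URI."""
    return f"{BASE}{database}#{identifier}"


if __name__ == "__main__":
    # "gene" and "1234" are hypothetical example values.
    print(slash_uri("gene", "1234"))
    print(hash_uri("gene", "1234"))
```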

In my experience, leaving #identifier free for the most fine-grained data resources gives the most readable URIs. I use a composition rule:

If dataresource_a is composed of dataresource_b and dataresource_c, and dataresource_b and dataresource_c cease to exist if dataresource_a is destroyed, then the URI would be something like <domainname>/<database>/resource_a#<identifier>, where <identifier> would be dataresource_b or dataresource_c. The #identifier typically appears in document-specific contexts, i.e. ids that are unique within a particular document. But extending this to a database means that these documents are likely to be quite dynamic, and the document specificity of ids becomes blurred. That's why I'm trying composition (not aggregation) as a rule for when something is a #id. Not too sure on the results yet.
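
A rough sketch of the composition rule in Python, for illustration only. The "http://" scheme, the domain "example.org", and the function names are assumptions; the rule itself only fixes the <domainname>/<database>/resource_a#<identifier> shape. One consequence of the shape is that dropping the fragment always yields the URI of the containing resource.

```python
from urllib.parse import urldefrag


def part_uri(domain: str, database: str, whole: str, part: str) -> str:
    """URI for a part that cannot outlive the resource it belongs to."""
    # The "http://" scheme is an assumption; the rule does not name one.
    return f"http://{domain}/{database}/{whole}#{part}"


def containing_resource(uri: str) -> str:
    """Strip the fragment to recover the URI of the composing resource."""
    return urldefrag(uri).url


if __name__ == "__main__":
    uri = part_uri("example.org", "db", "resource_a", "dataresource_b")
    print(uri)
    print(containing_resource(uri))
```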

We should have a look at how applications (especially triplestores) handle this. Do they know how to split namespace from identifier in the first case? I remember that the current version of the triplestore Sesame has some performance problems when handling URNs, because it splits namespace and identifier in the wrong way (creating a new namespace for almost every resource). I know that, according to the RDF specification, the RDF ID is just an opaque string, but applications do handle that differently.
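
The splitting concern can be illustrated with a small Python sketch. This is one common heuristic (split after the last "#", otherwise after the last "/"), not a description of what Sesame or any other store actually does; the function names and the "urn:example:..." identifiers are made up for the example. With this heuristic both proposed HTTP forms share one namespace per database, while a URN with neither separator ends up as its own namespace.

```python
from typing import Iterable, Tuple


def split_uri(uri: str) -> Tuple[str, str]:
    """Split a URI into (namespace, local name) with a simple heuristic."""
    for sep in ("#", "/"):
        idx = uri.rfind(sep)
        if idx != -1:
            return uri[: idx + 1], uri[idx + 1:]
    # No separator found: the whole string is treated as the namespace.
    return uri, ""


def count_namespaces(uris: Iterable[str]) -> int:
    """Number of distinct namespaces the heuristic produces."""
    return len({split_uri(u)[0] for u in uris})


if __name__ == "__main__":
    base = "http://www.ncbi.nlm.nih.gov/2006/entrez/"
    http_uris = [base + "gene/1234", base + "gene/5678"]
    urns = ["urn:example:gene:1234", "urn:example:gene:5678"]  # hypothetical URNs
    print(count_namespaces(http_uris))
    print(count_namespaces(urns))
```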
 Rationale: can use owl:sameAs to make them the same if we need to.
 We can suggest a best practice if we want to preferentially use one
 numbering system versus another. (I like the alphanumeric ones,
 myself)
We would not be happy to have huge amounts of redundant resources linked with owl:sameAs. owl:sameAs is nice when it only needs to be used sparingly, but having two different naming schemes of a large protein database linked through owl:sameAs would 'pollute' the Semantic Web right from the beginning. We should seek to avoid this when we are still in the position to do so.

I cannot see how this can be avoided. The bigger picture is that different databases and the groups associated with them will use different URI schemes for describing the same thing. Also, things that were once deemed not the same may later come to be thought of as the same. It is also impossible to predict what URI naming schemes will make sense further down the track, or what factors various engines might play on (Swoogle, for instance). What I think there needs to be is a combination of careful thought and tools for URI normalisation, where yes, there may come a time when suddenly a sameAs property is defined for every database record, but a tool can be used by anyone to normalise to a preferred URI. Sort of like an agent's own cache victim, but for semantic web services, where you may query a service with one URI and, if that is not a currently active version, the web service would say "but that URI is also sameAs this preferred one", and so the agent can agree to update its URI and re-perform the query.
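
A minimal sketch of what such a normalisation tool might look like, in Python. It treats owl:sameAs statements as undirected links and walks them until it reaches a URI from a caller-supplied preferred set. The function name, the data shapes, and the example URIs are assumptions for illustration; no existing tool or service is being described.

```python
from collections import defaultdict
from typing import Callable, Iterable, Set, Tuple


def build_normaliser(
    same_as_pairs: Iterable[Tuple[str, str]], preferred: Set[str]
) -> Callable[[str], str]:
    """Return a function mapping any URI to a preferred equivalent."""
    links = defaultdict(set)
    for a, b in same_as_pairs:
        # owl:sameAs is symmetric, so record the link in both directions.
        links[a].add(b)
        links[b].add(a)

    def normalise(uri: str) -> str:
        seen = {uri}
        stack = [uri]
        while stack:
            current = stack.pop()
            if current in preferred:
                return current
            for neighbour in links.get(current, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # No preferred equivalent is known: keep the URI as given.
        return uri

    return normalise


if __name__ == "__main__":
    base = "http://www.ncbi.nlm.nih.gov/2006/entrez/"
    pairs = [(base + "gene#1234", base + "gene/1234")]  # hypothetical sameAs link
    normalise = build_normaliser(pairs, preferred={base + "gene/1234"})
    print(normalise(base + "gene#1234"))
```

An agent could apply the returned function to each URI before querying, which corresponds to the "update and re-perform the query" step, done locally rather than by the web service.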



kind regards,
Matthias Samwald



http://neuroscientific.net

Section on Medical Expert and Knowledge-Based Systems
Core Unit for Medical Statistics and Informatics
Medical University of Vienna/Austria
http://www.meduniwien.ac.at/mes/home_en.html
