On 9/05/2006, at 8:46 PM, Matthias Samwald wrote:


Hi Alan,

 As far as I know there is no standard URI for a resource at NCBI. I
 would like to propose that there be one, since we will all need
 them to use when we refer to these resources  in our RDF. (and I
 need one *now*)

I think we should be aware that this could be a VERY important decision for the further development of RDF in the life sciences. The URI scheme we come up with during this project would probably become THE standard for referencing resources at the NCBI. I guess we should try to contact someone from the NCBI to make sure the solution we come up with is acceptable to them. Otherwise they may soon realize the need for URIs themselves and start creating their own, conflicting URI scheme. The last thing the Semantic Web needs is two different URIs for each of the many resources in the Entrez databases.


 Following other styles I've seen, I propose the following:


 1. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>/<IDENTIFIER_GOES_HERE>

 or


 2. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>#<IDENTIFIER_GOES_HERE>
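To make the difference between the two proposals concrete, here is a minimal Python sketch that builds both forms and shows which part an HTTP client would actually request. The base URI is the one proposed above (not an NCBI-endorsed scheme), and the database name "gene" and identifier "7157" are placeholder values chosen only for illustration.

```python
from urllib.parse import quote, urldefrag

# Base URI as proposed in this thread; not an official NCBI scheme.
BASE = "http://www.ncbi.nlm.nih.gov/2006/entrez/"

def slash_uri(database, identifier):
    """Proposal 1: the identifier is a path segment."""
    return f"{BASE}{quote(database, safe='')}/{quote(identifier, safe='')}"

def hash_uri(database, identifier):
    """Proposal 2: the identifier is a fragment of the database URI."""
    return f"{BASE}{quote(database, safe='')}#{quote(identifier, safe='')}"

# Placeholder database name and identifier, for illustration only.
gene_slash = slash_uri("gene", "7157")
gene_hash = hash_uri("gene", "7157")

# The fragment is never sent to the server, so with proposal 2 every
# record in a database dereferences to the same document.
doc_slash, frag_slash = urldefrag(gene_slash)
doc_hash, frag_hash = urldefrag(gene_hash)

print(doc_slash, repr(frag_slash))
print(doc_hash, repr(frag_hash))
```

With the slash form each record gets its own retrievable document; with the hash form a client fetching any record retrieves the document for the whole database, which matters for databases with millions of entries.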

In my experience, reserving #identifier for the most fine-grained data resources gives the most readable URIs. I use a composition rule:

If dataresource_a is composed of dataresource_b and dataresource_c, and dataresource_b and dataresource_c cease to exist when dataresource_a is destroyed, then the URI would be something like <domainname>/<database>/dataresource_a#<identifier>, where <identifier> would be dataresource_b or dataresource_c. The #identifier typically appears in document-specific contexts, i.e. ids that are unique within a particular document. But extending this to a database means that these documents are likely to be quite dynamic, and the document specificity of ids becomes blurred. That's why I'm trying composition (not aggregation) as the rule for when something gets a #id. I'm not too sure of the results yet.
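As a rough sketch of that rule in Python: a part whose lifetime is bound to its parent (composition) is named with a fragment on the parent's URI, while a part that can exist independently (aggregation) gets its own path-based URI. The function name, the domain "example.org", the database "proteins" and the resource names are all hypothetical.

```python
def part_uri(domain, database, parent_id, part_id, composed):
    """Name a part according to the composition rule.

    composed=True  -> the part dies with its parent, so it becomes a
                      fragment of the parent's URI.
    composed=False -> the part is merely aggregated and can outlive the
                      parent, so it gets a URI of its own.
    """
    if composed:
        return f"http://{domain}/{database}/{parent_id}#{part_id}"
    return f"http://{domain}/{database}/{part_id}"

# Hypothetical example: dataresource_b only exists as part of dataresource_a,
# dataresource_d is referenced by it but lives on independently.
composed_part = part_uri("example.org", "proteins", "dataresource_a", "dataresource_b", True)
aggregated_part = part_uri("example.org", "proteins", "dataresource_a", "dataresource_d", False)

print(composed_part)
print(aggregated_part)
```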



We should have a look at how applications (especially triplestores) handle this. Do they know how to split the namespace from the identifier in the first case? I remember that the current version of the triplestore Sesame has some performance problems when handling URNs, because it splits namespace and identifier in the wrong way (creating a new namespace for almost every resource). I know that, according to the RDF specification, the RDF ID is just an opaque string, but applications do handle them differently.
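To illustrate the kind of problem meant here, the following Python sketch uses one plausible splitting heuristic: cut after the last '#' or '/', and if neither is present treat the whole string as the namespace. This is an assumption made for illustration, not Sesame's actual algorithm, and the identifiers and the "urn:example:" URNs are made up.

```python
def split_uri(uri):
    """Split a URI into (namespace, local name) after the last '#' or '/'.

    Assumed heuristic for illustration only: if neither character
    occurs, the whole URI is treated as the namespace.
    """
    cut = max(uri.rfind("#"), uri.rfind("/"))
    if cut == -1:
        return uri, ""
    return uri[:cut + 1], uri[cut + 1:]

def count_namespaces(uris):
    """Number of distinct namespaces the heuristic produces."""
    return len({split_uri(u)[0] for u in uris})

ids = ["7157", "672", "1017"]  # placeholder identifiers
base = "http://www.ncbi.nlm.nih.gov/2006/entrez/gene"

slash_uris = [f"{base}/{i}" for i in ids]
hash_uris = [f"{base}#{i}" for i in ids]
urns = [f"urn:example:gene:{i}" for i in ids]  # hypothetical URN form

print(count_namespaces(slash_uris), count_namespaces(hash_uris), count_namespaces(urns))
```

Under this heuristic both HTTP forms share a single namespace per database, while every URN ends up in a namespace of its own, which is the behaviour described above.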

 Rationale: can use owl:sameAs to make them the same if we need to.
 We can suggest a best practice if we want to preferentially use one
 numbering system versus another. (I like the alphanumeric ones,
 myself)

We would not be happy to have huge amounts of redundant resources linked with owl:sameAs. owl:sameAs is nice when it only needs to be used sparingly, but having two different naming schemes of a large protein database linked through owl:sameAs would 'pollute' the Semantic Web right from the beginning. We should seek to avoid this when we are still in the position to do so.

I cannot see how this can be avoided. The bigger picture is that different databases, and the groups associated with them, will use different URI schemes for describing the same thing. Also, things that were once deemed different may later come to be thought of as the same. It is also impossible to predict which URI naming schemes will make sense further down the track, or what factors various engines might play on (Swoogle, for instance). What I think is needed is a combination of careful thought and tools for URI normalisation: yes, there may come a time when a sameAs property is suddenly defined for every database record, but anyone can then use a tool to normalise to a preferred URI. Sort of like an agent's own cache victim, but for Semantic Web services: you query a service with one URI, and if that is not a currently active version, the web service says "but that URI is also sameAs this preferred one", so the agent can update its URI and re-perform the query.
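A minimal Python sketch of such a normalisation tool, under stated assumptions: the sameAs statements are available as plain pairs of URI strings, and "preferred" simply means "starts with a chosen prefix". The function name, the selection rule and the example URIs are all hypothetical.

```python
from collections import defaultdict

def build_normaliser(same_as_pairs, preferred_prefix):
    """Return a function mapping any known URI to its preferred equivalent.

    Equivalence classes are computed with a small union-find structure,
    since owl:sameAs is symmetric and transitive.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for uri in list(parent):
        groups[find(uri)].append(uri)

    preferred = {}
    for members in groups.values():
        # Prefer URIs under the chosen prefix; break ties alphabetically.
        best = min(members, key=lambda u: (not u.startswith(preferred_prefix), u))
        for u in members:
            preferred[u] = best

    # Unknown URIs are returned unchanged.
    return lambda uri: preferred.get(uri, uri)

# Hypothetical equivalent names for one record.
hash_form = "http://www.ncbi.nlm.nih.gov/2006/entrez/gene#7157"
slash_form = "http://www.ncbi.nlm.nih.gov/2006/entrez/gene/7157"
urn_form = "urn:example:gene:7157"

normalise = build_normaliser(
    [(hash_form, slash_form), (slash_form, urn_form)],
    preferred_prefix="http://www.ncbi.nlm.nih.gov/2006/entrez/gene/",
)

print(normalise(urn_form))
```

A service could apply the same lookup on incoming queries and answer with the preferred URI, so that the agent updates its own copy before repeating the query.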





kind regards,
Matthias Samwald



http://neuroscientific.net

Section on Medical Expert and Knowledge-Based Systems
Core Unit for Medical Statistics and Informatics
Medical University of Vienna/Austria
http://www.meduniwien.ac.at/mes/home_en.html



