URIs

Alan Ruttenberg Thu, 15 Jun 2006 23:54:09 -0700

There was a discussion a few weeks ago about URIs that touched on various issues. This message is an attempt to untangle them, something I said I would write up as an action item in one of the HCLS conference calls. We'll be discussing URIs at the Monday BioRDF conference call.


As I read the discussion, I partitioned it into three distinct issues:

1) The relationship between the use of a URI in a representation and what it dereferences to, if anything. The possibilities seem to be:

a) The identifier is not intended to be dereferenceable. In that case the info: scheme was suggested for the form of the URI, as that is explicitly not dereferenceable.

b) The URI is used primarily as a name. Insofar as we want to use names, it is important that there be some stable URIs. Of course it doesn't hurt if the URI becomes dereferenceable at some point, and it would even be nice, so let's leave open that possibility (but see the caveats in the discussion below).

c) Any URL we use needs to be able to be dereferenced to something.

d) Any URL we use needs to be able to be dereferenced to the thing it is (and not dereferenced if you can't do that). Its only meaning is what it dereferences to.

2) What a URI refers to. Some of this conversation took the form of a discussion about what the reasonable arguments to owl:sameAs are - for example, should one say that http://www.expasy.org/uniprot/P04637 is the sameAs http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537?

Another part of the conversation talked in terms of whether the URI http://www.expasy.org/uniprot/P04637 should, for our purposes, refer to a database record or to a thing in the world - Human P53 proteins.

Of course these are two sides of the same coin - you would only say that the two URIs above are the same if they referred to things in the world. As database entries, they are obviously different. There are different fields, they are maintained by different people, etc.

3) Something I will call the social aspect of URIs, for lack of a better term. By this I mean those aspects of the process we go through to come to a shared use of a URI. Under this category there is the ontology building, and the strategies for connecting pieces of information generated by different groups. There was a bit in the conversations where people were arguing about whether using sameAs for mapping was pollution or a necessity, for instance. An important part of this in our context is how to define the use of URLs for things where rigorous ontological engineering was not applied to create careful definitions, things like terminologies and entries in gene databases.
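To make the distinction in point 1 concrete, here is a minimal Python sketch that separates URIs by whether their scheme even suggests a retrieval protocol. The set of "retrievable" schemes, the function name, and the info: URI used in the example are assumptions made for illustration; none of them come from the discussion itself. A True result says nothing about whether a server actually returns anything.

```python
from urllib.parse import urlsplit

# Schemes that conventionally have a retrieval protocol behind them.
# This set is an illustrative assumption, not an agreed list.
RETRIEVABLE_SCHEMES = {"http", "https", "ftp"}


def may_be_dereferenceable(uri: str) -> bool:
    """Return True if the URI's scheme suggests it could be dereferenced.

    This looks only at the scheme; it does not contact any server.
    """
    return urlsplit(uri).scheme.lower() in RETRIEVABLE_SCHEMES


# The UniProt URL comes from the message; the info: URI is hypothetical.
http_name = "http://www.expasy.org/uniprot/P04637"
info_name = "info:example-namespace/P04637"

print(may_be_dereferenceable(http_name))
print(may_be_dereferenceable(info_name))
```

Under options (a) and (b) such a check is only a hint about intent; under (c) and (d) it would be the first step before actually attempting retrieval.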


---

I'll offer some of my own opinions on these issues now.

On the matter of what a URI dereferences to, I think it is more important to get the names in place quickly. I don't agree with the point of view that we should explicitly make them not dereferenceable, even though I'm not yet sure what should come back when we ask for what they point to. And I don't see support for there being a necessity that anything that looks like a URL have a server that returns something specific back. Here's a quote from RFC 3986:

Although many URI schemes are named after protocols, this does not imply that use of these URIs will result in access to the resource via the named protocol. URIs are often used simply for the sake of identification.

It will be part of our social process to come to some understanding and agreement about what would be useful for us to have come back, if anything. Is it an RDF graph? A bunch of OWL definitions of things related to the gene? A representation of the asn record? A page of HTML? All of the above?

On the question of what kind of concept an entrez gene URI refers to, I think that concept needs to be "databaseRecord". There are too many different concepts that it could mean if we want it to refer to something in the world - does it refer to the sequence of the gene? The typical gene? All mutations of it that are found in populations? The possible gene products?

Rather, we can use the URI to the database entry to start to build concepts by defining properties and using them in OWL class definitions in a variety of ways. In foaf and SKOS, for instance, there is a property isPrimarySubjectOf. The kind of equivalence we can have between http://www.expasy.org/uniprot/P04637 and http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537 is something like: The same something isPrimarySubjectOf http://www.expasy.org/uniprot/P04637 and http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537.

where "something" is a blank node in RDF. Or, in OWL:

Class(P53Gene complete
    restriction(isPrimarySubjectOf
        value(<http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537>)))

Class(P53Transcript partial
    intersectionOf(mRNA
        restriction(derivesFrom someValuesFrom(P53Gene))))

Which says that it is necessary and sufficient for x to be a P53Gene, for example, if someone has stated or it has been inferred that

Individual(x value(isPrimarySubjectOf <http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537>))

and that a P53 transcript, among other things, is an mRNA that derivesFrom some P53Gene.


(there will be more complicated definitions too :)
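As a rough illustration of the two ideas above (a shared blank node that isPrimarySubjectOf both records, and a class defined by a value restriction), here is a small Python sketch over plain tuples. It is not an OWL reasoner. The blank node labels `_:b0` and `_:b1`, the helper function names, and the sample triples are all hypothetical; P53Gene is keyed on the Entrez URL because that is the one used in the class definition.

```python
UNIPROT = "http://www.expasy.org/uniprot/P04637"
ENTREZ = ("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
          "?db=protein&id=NP_000537")

# Triples as (subject, property, object). "_:b0" and "_:b1" stand in for
# blank nodes; the data is made up for the example.
triples = {
    ("_:b0", "isPrimarySubjectOf", UNIPROT),
    ("_:b0", "isPrimarySubjectOf", ENTREZ),
    ("_:b1", "isPrimarySubjectOf", UNIPROT),
}


def primary_subjects_of(record, graph):
    """Subjects stated to be the primary subject of the given record."""
    return {s for (s, p, o) in graph
            if p == "isPrimarySubjectOf" and o == record}


def shared_subjects(record_a, record_b, graph):
    """Subjects that are the primary subject of both records."""
    return (primary_subjects_of(record_a, graph)
            & primary_subjects_of(record_b, graph))


def known_p53_genes(graph):
    """Known members of the 'complete' class P53Gene: anything stated to
    be the primary subject of the Entrez record."""
    return primary_subjects_of(ENTREZ, graph)


print(shared_subjects(UNIPROT, ENTREZ, triples))
print(known_p53_genes(triples))
```

Note that nothing here says the two records are sameAs each other; the link between them is only that some one thing is the primary subject of both.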

[sameAs, equivalentClass, equivalentProperty will be a necessity, I think, BTW]


As for the social process, I look forward to the discussion on Monday :)

Regards,
Alan


http://www.w3.org/TR/uri-clarification/

Uniform Resource Identifier (URI): Generic Syntax - http://tools.ietf.org/html/3986

Relations in biomedical ontologies - http://genomebiology.com/2005/6/5/R46

http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
http://en.wikipedia.org/wiki/URL
