There was a discussion a few weeks ago about URIs that touched on various issues. This message is an attempt to untangle them, something I said I would write up as an action item in one of the HCLS conference calls. We'll be discussing URIs at the Monday BioRDF conference call.

As I read the discussion I partitioned it into three distinct issues:

1) The relationship between the use of a URI in a representation and what it dereferences to, if anything. The possibilities seem to be:

a) The identifier is not intended to be dereferenceable. In that case the info: scheme was suggested for the form of the URI, as that is explicitly not dereferenceable.

b) The URI is used primarily as a name. Insofar as we want to use names, it is important that there be some stable URIs. Of course it doesn't hurt if the URI becomes dereferenceable at some point, and it would even be nice, so let's leave open that possibility (but see the caveats in the discussion below).

c) Any URL we use needs to be able to be dereferenced to something.

d) Any URL we use needs to be able to be dereferenced to the thing it is (and not dereferenced if you can't do that). Its only meaning is what it dereferences to.

2) What a URI refers to. Some of this conversation took the form of a discussion about what reasonable arguments to owl:sameAs are, for example whether one should say that http://www.expasy.org/uniprot/P04637 is the sameAs http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537.

Another part of the conversation talked in terms of whether the URI http://www.expasy.org/uniprot/P04637 should, for our purposes, refer to a database record or to a thing in the world - Human P53 proteins.

Of course these are two sides of the same coin - you would only say the two URIs above are the same if they referred to things in the world. As database entries, they are obviously different: they have different fields, they are maintained by different people, etc.

3) Something I will call the social aspect of URIs, for lack of a better term. By this I mean those aspects of the process we go through to come to a shared use of a URI. Under this category there is ontology building and the strategies for connecting pieces of information generated by different groups. There was a bit in the conversations where people were arguing about whether using sameAs for mapping was pollution or a necessity, for instance. An important part of this in our context is how to define the use of URLs for things where rigorous ontological engineering was not applied to create careful definitions, things like terminologies and entries in gene databases.

---

I'll offer some of my own opinions on these issues now.

On the matter of what a URI dereferences to, I think it is more important to get the names in place quickly. I don't agree with the point of view that we should explicitly make them not dereferenceable, even though I'm not yet sure what should come back when we ask for what they point to. And I don't see support for there being a necessity that anything that looks like a URL have a server that returns something specific back. Here's a quote from RFC 3986:

Although many URI schemes are named after protocols, this does not imply that use of these URIs will result in access to the resource via the named protocol. URIs are often used simply for the sake of identification.
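
To make the identification-versus-access distinction concrete, here is a minimal Python sketch. It only parses URIs and never touches the network. The helper name describe and the info: URI (with its "example" namespace) are made up for illustration.

```python
from urllib.parse import urlsplit

def describe(uri):
    """Split a URI into its parts without dereferencing it."""
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "authority": parts.netloc, "path": parts.path}

# An http URI used purely as a name: parsing it needs no server to exist.
name = describe("http://www.expasy.org/uniprot/P04637")

# A hypothetical info: URI; the "example" namespace is invented here.
opaque = describe("info:example/P04637")

print(name)
print(opaque)
```

Both strings work equally well as identifiers. Only the first one also names a protocol that could be used to fetch something, and nothing obliges anyone to do so.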

It will be part of our social process to come to some understanding and agreement about what would be useful for us to have come back, if anything. Is it an RDF graph? A bunch of OWL definitions of things related to the gene? A representation of the asn record? A page of HTML? All of the above?
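
One way "all of the above" could work is HTTP content negotiation, where the client says which representations it prefers. This is only a sketch under that assumption, not something agreed in the discussion. The media types, the preference weight, and the helper name build_request are illustrative, and the request is prepared but never sent.

```python
from urllib.request import Request

# Hypothetical preference order: RDF/XML first, HTML as a fallback.
ACCEPT = "application/rdf+xml, text/html;q=0.5"

def build_request(uri, accept=ACCEPT):
    """Prepare (but do not send) a GET request stating preferred formats."""
    return Request(uri, headers={"Accept": accept})

req = build_request("http://www.expasy.org/uniprot/P04637")

# Passing req to urllib.request.urlopen would actually dereference the URI;
# here we only inspect what would be asked for.
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Whether any given server honors such a header is a separate question, which is part of what would need agreeing on.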

On the question of what kind of concept an entrez gene URI refers to, I think that concept needs to be "databaseRecord". There are too many different concepts that it could mean if we want it to refer to something in the world - does it refer to the sequence of the gene? The typical gene? All mutations of it that are found in populations? The possible gene products?

Rather, we can use the URI to the database entry to start to build concepts by defining properties and using them in OWL class definitions in a variety of ways. In SKOS, for instance, there is a property isPrimarySubjectOf (FOAF has the analogous isPrimaryTopicOf). The kind of equivalence we can have between http://www.expasy.org/uniprot/P04637 and http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537 is something like: the same something isPrimarySubjectOf http://www.expasy.org/uniprot/P04637 and http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537,
where "something" is a blank node in RDF. Or in OWL:

Class(P53Gene complete
    restriction(isPrimarySubjectOf
        value(<http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537>)))

Class(P53Transcript partial
    intersectionOf(mRNA restriction(derivesFrom someValuesFrom(P53Gene))))

This says, for example, that it is necessary and sufficient for x to be a P53Gene that someone has stated, or it has been inferred, that

Individual(x value(isPrimarySubjectOf <http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537>))

and that a P53 transcript, among other things, is an mRNA that derivesFrom some P53Gene.
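
The difference between the "complete" and "partial" definitions can be sketched in a few lines of Python. This is a toy illustration, not an OWL reasoner: it only looks at stated triples (a closed-world simplification), the blank node labels _:b0 and _:t1 and the helper names are invented, and the P53Gene test uses the same record URI as the Class definition above.

```python
UNIPROT = "http://www.expasy.org/uniprot/P04637"
NCBI = ("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
        "efetch.fcgi?db=protein&id=NP_000537")

# Triples as (subject, property, object); "_:b0" and "_:t1" are blank nodes.
triples = {
    ("_:b0", "isPrimarySubjectOf", UNIPROT),
    ("_:b0", "isPrimarySubjectOf", NCBI),
    ("_:t1", "rdf:type", "mRNA"),
    ("_:t1", "derivesFrom", "_:b0"),
}

def p53_genes(triples):
    """Complete definition: members are exactly the things that are
    the primary subject of the NCBI record."""
    return {s for (s, p, o) in triples
            if p == "isPrimarySubjectOf" and o == NCBI}

def meets_transcript_conditions(x, triples):
    """Partial definition: these conditions are necessary, not sufficient.
    True here does NOT by itself make x a P53Transcript."""
    is_mrna = (x, "rdf:type", "mRNA") in triples
    genes = p53_genes(triples)
    derives = any(s == x and p == "derivesFrom" and o in genes
                  for (s, p, o) in triples)
    return is_mrna and derives

print(p53_genes(triples))
print(meets_transcript_conditions("_:t1", triples))
```

Note that the single blank node _:b0 is the primary subject of both records, which is the weaker kind of equivalence described earlier: the records stay distinct, and only their subject is shared.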

(there will be more complicated definitions too :)

[sameAs, equivalentClass, equivalentProperty will be a necessity, I think, BTW]

As for the social process, I look forward to the discussion on Monday :)

Regards,
Alan


http://www.w3.org/TR/uri-clarification/
Uniform Resource Identifier (URI): Generic Syntax - http://tools.ietf.org/html/3986
Relations in biomedical ontologies - http://genomebiology.com/2005/6/5/R46
http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
http://en.wikipedia.org/wiki/URL
