There was a discussion a few weeks ago about URIs that touched on various
issues. This message is an attempt to untangle them, something I said
I would write up as an action item in one of the HCLS conference
calls. We'll be discussing URIs at the Monday BioRDF conference call.
As I read the discussion I partitioned it into three distinct issues:
1) The relationship between the use of a URI in a representation and
what it dereferences to, if anything. The possibilities seem to be:
a) The identifier is not intended to be dereferenceable. In that
case the info: scheme was suggested for the form of the URI, as that
is explicitly not dereferenceable.
b) The URI is used primarily as a name. Insofar as we want to use
names, it is important that there be some stable URIs. Of course it
doesn't hurt if the URI becomes dereferenceable at some point, and it
would even be nice, so let's leave open that possibility (but see the
caveats in the discussion below).
c) Any URL we use needs to be able to be dereferenced to something.
d) Any URL we use needs to be able to be dereferenced to the thing
it is (and not dereferenced if you can't do that). Its only meaning
is what it dereferences to.
2) What a URI refers to. Some of this conversation took the
form of a discussion about what reasonable arguments to owl:sameAs
are - for example, should one say that
http://www.expasy.org/uniprot/P04637
is sameAs
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537 ?
Another part of the conversation talked in terms of whether the URI
http://www.expasy.org/uniprot/P04637 should, for our purposes, refer
to a database record or to a thing in the world - Human P53 proteins.
Of course these are two sides of the same coin - you would only say
the two URIs above are sameAs if they referred to things in the world.
As database entries, they are obviously different. They have different
fields, they are maintained by different people, etc.
3) Something I will call the social aspect of URIs, for lack of a
better term. By this I mean those aspects of the process we go through to
come to a shared use of a URI. Under this category there is the
ontology building, the strategies for connecting pieces of
information generated by different groups. There was a bit in the
conversations where people were arguing about whether using sameAs
for mapping was pollution or a necessity, for instance. An important
part of this in our context is how to define the use of URIs for
things where no rigorous ontological engineering was applied
to create careful definitions, things like terminologies and entries
in gene databases.
---
I'll offer some of my own opinions on these issues now.
On the matter of what a URI dereferences to, I think it is more
important to get the names in place quickly than to settle what they
dereference to. I don't agree with the point of view that we should
explicitly make them not dereferenceable, even though I'm not yet sure
what should come back when we ask for what they point to. Nor do I see
support for a requirement that anything that looks like a URL must have
a server that returns something specific. Here's a quote from RFC 3986:
"Although many URI schemes are named after protocols, this does not
imply that use of these URIs will result in access to the resource
via the named protocol. URIs are often used simply for the sake of
identification."
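
To make the distinction between identifying and accessing a bit more concrete, here is a minimal Python sketch that only parses URIs and never touches the network. The set of "fetchable" schemes and the info: URI are assumptions made up for this example; RFC 3986 does not define either.

```python
from urllib.parse import urlsplit

# Assumption: for illustration we treat only http/https as schemes a generic
# client would try to fetch. This set is ours, not something RFC 3986 defines.
FETCHABLE_SCHEMES = {"http", "https"}


def describe(uri: str) -> dict:
    """Split a URI into its parts without making any network request."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        # Having a fetchable scheme only means a client *could* try;
        # it says nothing about whether a server answers, or with what.
        "could_try_fetch": parts.scheme in FETCHABLE_SCHEMES,
    }


uniprot = describe("http://www.expasy.org/uniprot/P04637")
# Hypothetical info: URI, invented for this example only.
opaque = describe("info:example/P04637")
```

Both URIs work equally well as names; only the first one even suggests a place to look, and using it as a name does not depend on anything being there.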
It will be part of our social process to come to some understanding and
agreement about what would be useful for us to have come back, if
anything. Is it an RDF graph? A bunch of OWL definitions of things
related to the gene? A representation of the asn record? A page of
HTML? All of the above?
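
If the answer turns out to be "all of the above", one possible mechanism is for the client to say which representation it wants via the HTTP Accept header. The sketch below only builds the requests and does not send them. The media types chosen and the idea that the server would honor them are assumptions for illustration, not a description of what any existing server does.

```python
from urllib.request import Request

# Assumption: these media types stand in for two of the options listed above.
CANDIDATE_TYPES = {
    "rdf_graph": "application/rdf+xml",
    "html_page": "text/html",
}


def build_request(uri: str, kind: str) -> Request:
    """Build (but do not send) a GET request asking for one representation."""
    return Request(uri, headers={"Accept": CANDIDATE_TYPES[kind]})


# Same URI, two different requested representations.
rdf_req = build_request("http://www.expasy.org/uniprot/P04637", "rdf_graph")
html_req = build_request("http://www.expasy.org/uniprot/P04637", "html_page")
```

The point is only that the name stays the same while what comes back can be negotiated later, once we agree on what is useful.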
On the question of what kind of concept an entrez gene URI refers to,
I think that concept needs to be "databaseRecord". There are too many
different concepts that it could mean if we want it to refer to
something in the world - does it refer to the sequence of the gene?
The typical gene? All mutations of it that are found in populations?
The possible gene products?
Rather, we can use the URI to the database entry to start to build
concepts by defining properties and using them in OWL class
definitions in a variety of ways. In SKOS, for instance,
there is a property isPrimarySubjectOf (FOAF has the similar
isPrimaryTopicOf). The kind of equivalence we can have between
http://www.expasy.org/uniprot/P04637 and
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537
is something like: the same something isPrimarySubjectOf both
http://www.expasy.org/uniprot/P04637 and
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537,
where "something" is a blank node in RDF. Or in OWL:
    Class(P53Gene complete
      restriction(isPrimarySubjectOf
        value(<http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537>)))
    Class(P53Transcript partial
      intersectionOf(mRNA
        restriction(derivesFrom someValuesFrom(P53Gene))))
This says, for example, that x is a P53Gene if and only if someone
has stated, or it has been inferred, that
    Individual(x value(isPrimarySubjectOf <http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537>))
and that a P53 transcript, among other things, is an mRNA that
derivesFrom some P53Gene.
(there will be more complicated definitions too :)
[sameAs, equivalentClass, equivalentProperty will be a necessity, I
think, BTW]
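
For readers who prefer code to OWL syntax, here is a rough Python sketch of the blank-node statement and the P53Gene membership test above. It is not an OWL reasoner: it only looks up triples that have been asserted, and the "_:b0" label and the helper function name are made up for this example.

```python
UNIPROT = "http://www.expasy.org/uniprot/P04637"
ENTREZ = (
    "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    "?db=protein&id=NP_000537"
)

# Triples are (subject, property, object). "_:b0" is a blank node label:
# a thing we do not name, only describe by the records that are about it.
triples = {
    ("_:b0", "isPrimarySubjectOf", UNIPROT),
    ("_:b0", "isPrimarySubjectOf", ENTREZ),
}


def members_of_has_value(triples, prop, value):
    """Subjects that have `value` for `prop`, using asserted triples only."""
    return {s for (s, p, o) in triples if p == prop and o == value}


# Mirrors Class(P53Gene complete restriction(isPrimarySubjectOf value(...))).
p53_gene = members_of_has_value(triples, "isPrimarySubjectOf", ENTREZ)

# The two record URIs stay distinct; nothing here states they are sameAs.
records_are_same = UNIPROT == ENTREZ
```

The two database records are tied together through the unnamed thing they are both about, without ever claiming that the records themselves are the same.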
As for the social process, I look forward to the discussion on Monday :)
Regards,
Alan
http://www.w3.org/TR/uri-clarification/
Uniform Resource Identifier (URI): Generic Syntax - http://tools.ietf.org/html/3986
Relations in biomedical ontologies - http://genomebiology.com/2005/6/5/R46
http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
http://en.wikipedia.org/wiki/URL