Summary: This is a technical discussion in which I respond to various
points that Eric makes in his message regarding the utility of using
PURLs, which I place in the context of making statements on the
semantic web. Comments are in line with the original conversation
because they refer to specific passages of his original message.
[I'm going to experiment with including short summaries at the
top of long messages, at the suggestion of some colleagues who need a
better way of prioritizing emails.]
Hi Eric,
On Jul 14, 2007, at 10:26 AM, Eric Jain wrote:
Alan Ruttenberg wrote:
I'm not at all saying that you wouldn't want to attach any
statements to a specific representation, but that if you did, you'd
better use the actual URL of the representation, not some PURL.
The point of having the PURLs is to ensure that there is a mechanism
for handling three cases that LSIDs were intended to address (but
which can be addressed without the trouble of introducing a separate
resolving mechanism)
1) To be immune from the "actual URL of the representation" changing.
(e.g. beta.uniprot.org goes out of beta)
2) To enable switching to a backup if the server is turned off, or
certain pages go 404
3) To facilitate local caching of content from servers such as
uniprot in such a way as to not adjust what URLs clients need to use
to access this content.
Many of us who have worked in the field have seen (and been burned
by) variants of these cases over the years.
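To make the three cases concrete, here is a minimal sketch in Python of the kind of redirect table a PURL server could keep. The class and method names, the mirror URL, and the availability check are all hypothetical and made up for illustration; this is not a description of how purl.org is implemented.

```python
from typing import Callable, Dict, List, Optional


class PurlTable:
    """Toy redirect table: one stable name, an ordered list of targets."""

    def __init__(self) -> None:
        self._targets: Dict[str, List[str]] = {}
        self._local: Dict[str, str] = {}

    def set_targets(self, purl: str, targets: List[str]) -> None:
        # Case 1: when the provider renames a page, only this entry changes.
        self._targets[purl] = list(targets)

    def set_local_copy(self, purl: str, location: str) -> None:
        # Case 3: a site can point the same name at its own cached copy.
        self._local[purl] = location

    def resolve(self, purl: str, is_up: Callable[[str], bool]) -> Optional[str]:
        if purl in self._local:
            return self._local[purl]
        # Case 2: fall through to a backup when the primary is unavailable.
        for target in self._targets.get(purl, []):
            if is_up(target):
                return target
        return None


PURL = "http://purl.org/commons/html/uniprotkb/P12345"
table = PurlTable()
table.set_targets(PURL, [
    "http://beta.uniprot.org/uniprot/P12345",
    "http://mirror.example.org/uniprot/P12345",  # hypothetical backup
])

before = table.resolve(PURL, is_up=lambda url: True)
during_outage = table.resolve(PURL, is_up=lambda url: "mirror" in url)

# "beta" is dropped: the table is edited, the PURL that clients use is not.
table.set_targets(PURL, ["http://uniprot.org/uniprot/P12345"])
after = table.resolve(PURL, is_up=lambda url: True)
```

In all three situations the client keeps asking for the same PURL; only the table behind it changes.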
For example, if you wanted to state that
http://beta.uniprot.org/uniprot/P12345 validates as XHTML 1.0:
<http://beta.uniprot.org/uniprot/P12345>
validatesAs <http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd>
...seems better than e.g.
<http://purl.org/commons/html/uniprotkb/P12345>
...of which you can't be quite as certain that it will always point
to the specific page you wanted to describe!
It is better, because when http://beta.uniprot.org/uniprot/P12345
switches to http://uniprot.org/uniprot/P12345, as you suggest will
happen, we (by this I mean the imagined community that administers
the purls in the interest of serving HCLS informatics) will update
the redirect to point to the new URL. When the "beta" is dropped,
either it will be the same page as before, in which case all our
statements will still be valid, or the page will change, in which
case using uniprot's direct address doesn't help either. The intention
of setting up a PURL system in our community would be not merely to
manage redirects blindly, but to set (and represent) expectations
about how a fetch of any of these URLs will behave. It will always be
up to uniprot to decide what they
put on html pages. But within our community it would be considered
good manners for uniprot to explain what its page update policy is
(or isn't), and then to live up to what it has said.
You could argue that this URI is meant to represent the more
general concept of "an HTML representation of P12345", but at that
point I really start to wonder...
This has not been argued. The closest thing along these lines that has
been argued for is the definition of
http://purl.org/commons/record/uniprotkb/P12345 which is intended to
represent the underlying information in the database record without
commitment to a particular format (xml, rdf, html). This URI would be
used for making statements about aspects of this information that are
common to any format (e.g. this record includes a representation of
an amino acid sequence). That /record/ URI is intended not to be an
information resource but to 303 to an RDF document describing how to
access the specific formats, etc., as described in other emails.
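As a rough illustration of the 303 behavior, the sketch below computes the response for a GET on a /record/ URI. The location of the describing RDF document (DESCRIPTION_PREFIX) and the function name are assumptions invented for the example; the actual layout has not been settled here.

```python
from typing import Tuple

RECORD_PREFIX = "http://purl.org/commons/record/"
# Hypothetical location of the RDF documents that describe each record.
DESCRIPTION_PREFIX = "http://purl.example.org/commons/about/"


def respond_to_record(uri: str) -> Tuple[int, str]:
    """Return (status, Location) for a GET on a /record/ URI."""
    if not uri.startswith(RECORD_PREFIX):
        raise ValueError("not a record URI: " + uri)
    tail = uri[len(RECORD_PREFIX):]  # e.g. "uniprotkb/P12345"
    # 303 See Other: the name is not itself a document, so the server
    # points at a separate document that describes the named thing.
    return 303, DESCRIPTION_PREFIX + tail + ".rdf"


status, location = respond_to_record(
    "http://purl.org/commons/record/uniprotkb/P12345"
)
```

The point of the 303 is that the /record/ name never returns a representation of its own, so statements about it cannot be confused with statements about a particular xml, rdf, or html file.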
I suppose it might be possible to represent which header should be
used in the content negotiation as part of the RDF, but a) It's
got to be easier to just put that information in the name and b)
In the case that you want to, e.g. mirror some contents of Uniprot
on a file system, you will have to make up distinct names anyways?
Maybe I'm dense, but I fail to see how content negotiation is of
any use on the semantic web.
Note that the content negotiation is done *at the level of the
resolver*, all the different representations have their own URLs:
Yes, but what sorts of statements can be made using
http://purl.uniprot.org/uniprot/P12345 as the subject? Because it can mean
any of the below, even the protein class itself, how can a *semantic
web* statement be made using it?
http://beta.uniprot.org/uniprot/P12345
http://beta.uniprot.org/uniprot/P12345.xml
http://beta.uniprot.org/uniprot/P12345.rdf
http://beta.uniprot.org/uniprot/P12345.fasta
Content negotiation could be a useful mechanism for bypassing the
HTML representation (which is what the PURL resolves to by default,
greatest common denominator etc), important if a lot of requests
need to be made.
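Content negotiation at the level of the resolver, as described above, might look roughly like the following sketch. The four suffixes come from the list of URLs above; the media types paired with them (text/x-fasta in particular), the function names, and the simplified Accept parsing are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

BASE = "http://beta.uniprot.org/uniprot/"

# Media type -> URL suffix. The pairing is assumed, not taken from uniprot.
SUFFIX_BY_TYPE: Dict[str, str] = {
    "text/html": "",
    "application/xml": ".xml",
    "application/rdf+xml": ".rdf",
    "text/x-fasta": ".fasta",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0]
        if not media_type:
            continue
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so ties keep the order the client sent.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def redirect_target(accession: str, accept: str) -> str:
    """Pick the representation URL the resolver would redirect to."""
    for media_type, q in parse_accept(accept):
        if q > 0 and media_type in SUFFIX_BY_TYPE:
            return BASE + accession + SUFFIX_BY_TYPE[media_type]
    return BASE + accession  # HTML is the default


rdf_url = redirect_target("P12345", "application/rdf+xml")
```

Each URL the function returns names exactly one representation; the negotiated name in front of it does not.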
Efficiency of requests is a separate issue, but you aren't the only
one who has mixed up efficiency with clarity - I had a conversation
with TBL a couple of weeks ago where I argued that the whole hash URI
thing was another such case - a premature optimization.
IMO, the first goal of our design ought to be to ensure that
automated semantic web agents (idiots as they will be) will have a
fighting chance to avoid having to do the difficult (even impossible)
sorts of disambiguations that people are faced with all the time.
That bar hasn't yet been met. Once we've ensured that we can meet
that goal, then we can talk about optimization. (Incidentally, we do
discuss various optimization techniques, from predictability of the
form of the name to purl servers sending back the rewrite rules they
use so that they can be implemented on the client side.)
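The client-side rewrite idea could be sketched as below. The rule format (a regular expression paired with a replacement template) is an assumption, as is the single rule shown; no actual format for published rules has been agreed on.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule set of the kind a PURL server might publish:
# (pattern, replacement template).
RULES: List[Tuple[str, str]] = [
    (r"^http://purl\.org/commons/html/uniprotkb/(\w+)$",
     r"http://beta.uniprot.org/uniprot/\1"),
]


def rewrite_locally(purl: str, rules: List[Tuple[str, str]]) -> Optional[str]:
    """Apply the first matching rule; None means fall back to the server."""
    for pattern, template in rules:
        match = re.match(pattern, purl)
        if match:
            # expand() substitutes the captured groups into the template.
            return match.expand(template)
    return None


resolved = rewrite_locally(
    "http://purl.org/commons/html/uniprotkb/P12345", RULES
)
```

A client that has the rules can skip the round trip to the PURL server, while still using the PURL as the name in any statement it makes.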
You'll notice that in the RDF representation, this HSSP resource is
represented with the URL http://purl.uniprot.org/hssp/7aat. The
main reason for pre-resolving the PURLs in the web pages is that
many people (been there, done that) like to see where they are
going before they click.
OK, I missed that. But I'd still use the same purls in the HTML.
There are other mechanisms for indicating the real destination, and
showing resolved URLs may lead to confusion when people need to
choose a name to use as the subject or object of a statement. If you
ever end up using RDFa, you will need to use the same URIs as in the
RDF.
btw this is an example of a resource that won't work with the HCLS
PURL resolver at the moment, as this resolver can also only append
to a path!
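The append-only limitation can be illustrated with a short sketch. Both target URLs below are invented (the real HSSP destination is not given in this thread), and the function names are hypothetical; the only point is the difference between appending to a prefix and filling in a template.

```python
def partial_redirect(purl: str, purl_prefix: str, target_prefix: str) -> str:
    """Append-only redirect: swap the prefix, keep the remainder unchanged."""
    if not purl.startswith(purl_prefix):
        raise ValueError("PURL is outside this prefix")
    return target_prefix + purl[len(purl_prefix):]


def template_redirect(purl: str, purl_prefix: str, template: str) -> str:
    """Template redirect: the identifier may land anywhere in the target."""
    if not purl.startswith(purl_prefix):
        raise ValueError("PURL is outside this prefix")
    identifier = purl[len(purl_prefix):]
    return template.replace("{id}", identifier)


PURL = "http://purl.uniprot.org/hssp/7aat"
PREFIX = "http://purl.uniprot.org/hssp/"

# Works when the identifier is the last thing in the target URL.
appended = partial_redirect(PURL, PREFIX, "http://hssp.example.org/entries/")

# Needed when something must follow the identifier (hypothetical target).
templated = template_redirect(
    PURL, PREFIX, "http://hssp.example.org/lookup?id={id}&format=html"
)
```

A resolver that can only append has no way to produce the second form, which is the kind of extension a template mechanism would cover.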
So that you know, the purl developers are interested in extending
their redirect service to better accommodate semantic web usage, and
they have offered to do this if we can get together and tell them
what we need.
Regards,
Alan