Summary: This is a technical discussion in which I respond to various points that Eric makes in his message regarding the utility of using PURLs, which I place in the context of making statements on the semantic web. Comments are inline with the original conversation because they refer to specific passages of his original message.

[ I'm going to experiment with including short summaries at the top of long messages, at the suggestion of some colleagues who need a better way of prioritizing emails. ]

Hi Eric,

On Jul 14, 2007, at 10:26 AM, Eric Jain wrote:

Alan Ruttenberg wrote:

I'm not at all saying that you wouldn't want to attach any statements to a specific representation, but that if you did, you'd better use the actual URL of the representation, not some PURL.

The point of having the PURLs is to ensure that there is a mechanism for handling three cases that LSIDs were intended to address (but which can be addressed without the trouble of introducing a separate resolving mechanism):

1) To be immune from the "actual URL of the representation" changing (e.g. beta.uniprot.org goes out of beta).
2) To enable switching to a backup if the server is turned off, or certain pages go 404.
3) To facilitate local caching of content from servers such as uniprot in such a way that clients do not have to change the URLs they use to access this content.

Many of us who have worked in the field have seen (and been burned by) variants of these cases over the years.
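
To make the three cases concrete, here is a minimal sketch of the lookup a redirect service could perform. It is an illustration under assumptions, not a description of how purl.org works: the redirect table, the mirror host, the `is_alive` check and the `local_cache` mapping are all hypothetical names introduced for this example.

```python
# Hypothetical redirect table: PURL path -> candidate targets, in order of preference.
# Case 1 (the target URL changes) is handled by editing this table; the PURL stays the same.
REDIRECTS = {
    "/commons/html/uniprotkb/P12345": [
        "http://beta.uniprot.org/uniprot/P12345",    # current primary target
        "http://mirror.example.org/uniprot/P12345",  # hypothetical backup
    ],
}


def resolve(purl_path, is_alive, local_cache=None):
    """Return the URL a client should be sent to, or None if nothing is available.

    is_alive is a caller-supplied function (an assumption of this sketch) that
    reports whether a target URL currently responds.
    """
    # Case 3: a local copy wins, and the client still asks for the same PURL.
    if local_cache and purl_path in local_cache:
        return local_cache[purl_path]
    # Case 2: fall through to a backup when the primary is down or returns 404.
    for target in REDIRECTS.get(purl_path, []):
        if is_alive(target):
            return target
    return None
```

In all three cases the name used in statements stays fixed; only the table, the liveness check, or the cache changes.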

For example, if you wanted to state that http://beta.uniprot.org/uniprot/P12345 validates as XHTML 1.0:
<http://beta.uniprot.org/uniprot/P12345>
  validatesAs <http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd>
...seems better than e.g.
<http://purl.org/commons/html/uniprotkb/P12345>
...of which you can't be quite as certain that it will always point to the specific page you wanted to describe!

The PURL is better, because when http://beta.uniprot.org/uniprot/P12345 switches to http://uniprot.org/uniprot/P12345, as you suggest will happen, we (by this I mean the imagined community that administers the PURLs in the interest of serving HCLS informatics) will update the redirect to point to the new URL. When the "beta" is dropped, either it will be the same page as before, in which case all our statements will still be valid, or the page will change, in which case using uniprot's direct address doesn't help. The intention of setting up a PURL system by our community would be not merely to manage redirects blindly, but also to set (and represent) expectations of what a client can reasonably expect the behavior of a fetch of any of these URLs to be. It will always be up to uniprot to decide what they put on HTML pages. But within our community it would be considered good manners to explain what uniprot's page update policy is (or isn't), and then live up to what they have said.

You could argue that this URI is meant to represent the more general concept of "an HTML representation of P12345", but at that point I really start to wonder...

This has not been argued. The closest thing along these lines that has been argued for is the definition of http://purl.org/commons/record/uniprotkb/P12345, which is intended to represent the underlying information in the database record without commitment to a particular format (XML, RDF, HTML). This URI would be used for making statements about aspects of this information that are common to any format (e.g. this record includes a representation of an amino acid sequence). That /record/ URI is intended not to be an information resource, but to 303 to an RDF document describing how to access the specific formats, etc., as described in other emails.
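
A rough sketch of that 303 behavior follows. The `/commons/record/` prefix comes from the URI above; the location of the describing RDF document (`http://example.org/describe/...`) is made up for the example, since the actual layout is discussed in other emails and not fixed here.

```python
RECORD_PREFIX = "/commons/record/"
# Hypothetical location of the RDF documents that describe each record.
DESCRIPTION_BASE = "http://example.org/describe/"


def respond(path):
    """Return (status, headers) for a GET on a PURL path.

    A /record/ URI names the record itself, not a document, so the server
    does not answer 200. It answers 303 See Other and points at an RDF
    document that says how to reach the XML, RDF and HTML forms.
    """
    if path.startswith(RECORD_PREFIX):
        record_id = path[len(RECORD_PREFIX):]  # e.g. "uniprotkb/P12345"
        return 303, {"Location": DESCRIPTION_BASE + record_id + ".rdf"}
    return 404, {}
```

A client that receives the 303 knows that the thing it asked about is not the document it is about to fetch, which is what lets statements about the record and statements about a particular page be kept apart.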

I suppose it might be possible to represent which header should be used in the content negotiation as part of the RDF, but a) It's got to be easier to just put that information in the name and b) In the case that you want to, e.g. mirror some contents of Uniprot on a file system, you will have to make up distinct names anyways? Maybe I'm dense, but I fail to see how content negotiation is of any use on the semantic web.

Note that the content negotiation is done *at the level of the resolver*, all the different representations have their own URLs:

Yes, but what sorts of statements can be made using http://purl.uniprot.org/uniprot/P12345 as the subject? Because it can mean any of the below, even the protein class itself, how can a *semantic web* statement be made using it?

http://beta.uniprot.org/uniprot/P12345
http://beta.uniprot.org/uniprot/P12345.xml
http://beta.uniprot.org/uniprot/P12345.rdf
http://beta.uniprot.org/uniprot/P12345.fasta
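
The ambiguity can be seen in a small sketch of resolver-level content negotiation. This is a simplification under stated assumptions: the mapping from media types to suffixes is my guess at what such a resolver might do, it ignores q-values, and it leaves out FASTA because I am not assuming any particular media type for it.

```python
BASE = "http://beta.uniprot.org/uniprot/"

# Assumed mapping from requested media type to the suffix of the target URL.
SUFFIX_BY_MEDIA_TYPE = {
    "text/html": "",
    "application/xml": ".xml",
    "application/rdf+xml": ".rdf",
}


def negotiate(accession, accept="text/html"):
    """Pick a redirect target for one PURL based on the Accept header.

    Simplified: the first listed media type that the table knows wins;
    anything unrecognized falls back to the HTML page.
    """
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in SUFFIX_BY_MEDIA_TYPE:
            return BASE + accession + SUFFIX_BY_MEDIA_TYPE[media_type]
    return BASE + accession


# The same name yields different documents depending on who asks:
targets = {negotiate("P12345", accept) for accept in SUFFIX_BY_MEDIA_TYPE}
```

Here `targets` holds three distinct URLs for a single PURL, so a statement whose subject is that PURL does not say which of them it is about.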

Content negotiation could be a useful mechanism for bypassing the HTML representation (which is what the PURL resolves to by default, greatest common denominator etc), important if a lot of requests need to be made.

Efficiency of requests is a separate issue, but you aren't the only one who has mixed up efficiency with clarity. I had a conversation with TBL a couple of weeks ago in which I argued that the whole hash URI thing was another such case: a premature optimization.

IMO, the first goal of our design ought to be to ensure that automated semantic web agents (idiots as they will be) will have a fighting chance to avoid having to do the difficult (even impossible) sorts of disambiguations that people are faced with all the time. That bar hasn't yet been met. Once we've ensured that we can meet that goal, then we can talk about optimization. (Incidentally, we do discuss various optimization techniques, from predictability of the form of the name, to PURL servers sending back the rewrite rules they use so that they can be implemented on the client side.)
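
As an illustration of the last of those techniques, a client that has been handed the server's rewrite rules could resolve names locally and skip the redirect round trip. The rule format below (a regular expression plus a template) is purely an assumption for the sketch; no such format has been agreed.

```python
import re

# Hypothetical rules, in the form a PURL server might publish them:
# (pattern matching the PURL, template for the target URL).
RULES = [
    (r"^http://purl\.org/commons/html/uniprotkb/(\w+)$",
     r"http://beta.uniprot.org/uniprot/\1"),
]


def rewrite(purl, rules=RULES):
    """Apply the first matching rule; return None if the PURL is not covered."""
    for pattern, template in rules:
        match = re.match(pattern, purl)
        if match:
            # expand() substitutes the captured groups into the template.
            return match.expand(template)
    return None
```

When `rewrite` returns None, the client would simply fall back to asking the PURL server, so the optimization never changes which name is used in statements.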

You'll notice that in the RDF representation, this HSSP resource is represented with the URL http://purl.uniprot.org/hssp/7aat. The main reason for pre-resolving the PURLs in the web pages is that many people (been there, done that) like to see where they are going before they click.

OK, I missed that. But I'd still use the same PURLs in the HTML. There are other mechanisms for indicating the real destination, and pre-resolving may lead to confusion when people need to choose a name for the subject or object of a statement. If you ever end up using RDFa, you will need to use the same URIs as in the RDF.

btw this is an example of a resource that won't work with the HCLS PURL resolver at the moment, as this resolver can also only append to a path!

So that you know, the PURL developers are interested in extending their redirect service to better accommodate semantic web usage, and they have offered to do this if we can get together and tell them what we need.

Regards,
Alan
