Re: URL +1, LSID -1

Alan Ruttenberg Sat, 14 Jul 2007 19:20:00 -0700

Summary: This is a technical discussion in which I respond to various points that Eric makes in his message regarding the utility of using PURLs, which I place in the context of making statements on the semantic web. Comments are inline with the original conversation because they refer to specific passages of his original message.

[I'm going to experiment with including short summaries at the top of long messages, at the suggestion of some colleagues who need a better way of prioritizing emails.]


Hi Eric,

On Jul 14, 2007, at 10:26 AM, Eric Jain wrote:

Alan Ruttenberg wrote:
I'm not at all saying that you wouldn't want to attach any statements to a specific representation, but that if you did, you'd better use the actual URL of the representation, not some PURL.

The point of having the PURLs is to ensure that there is a mechanism for handling three cases that LSIDs were intended to address (but which can be addressed without the trouble of introducing a separate resolving mechanism):

1) To be immune from the "actual URL of the representation" changing (e.g. beta.uniprot.org goes out of beta).
2) To enable switching to a backup if the server is turned off, or certain pages go 404.
3) To facilitate local caching of content from servers such as uniprot without changing the URLs clients need to use to access this content.
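The three cases above can be sketched as a redirect table that keeps an ordered list of targets per PURL prefix. This is a minimal Python illustration, not how purl.org or any other resolver is actually implemented; the table contents, the mirror host mirror.example.org, and the `is_up` availability check are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical redirect table: PURL prefix -> ordered target prefixes.
# The first target is the primary server; later ones are backups or a
# local cache. The mirror host below is invented for illustration.
REDIRECTS: Dict[str, List[str]] = {
    "http://purl.org/commons/html/uniprotkb/": [
        "http://beta.uniprot.org/uniprot/",
        "http://mirror.example.org/uniprot/",
    ],
}


def resolve(purl: str, is_up: Callable[[str], bool]) -> str:
    """Return the first reachable target URL for a PURL.

    `is_up` stands in for a real availability check (for example an HTTP
    HEAD request); it is passed in so the sketch runs without a network.
    """
    for prefix, targets in REDIRECTS.items():
        if purl.startswith(prefix):
            suffix = purl[len(prefix):]
            for target in targets:
                candidate = target + suffix
                if is_up(candidate):
                    return candidate
            raise LookupError(f"no reachable target for {purl}")
    raise KeyError(f"unknown PURL prefix: {purl}")


# Case 2: the primary server is down, so the backup is used instead.
print(resolve("http://purl.org/commons/html/uniprotkb/P12345",
              is_up=lambda url: "beta.uniprot.org" not in url))
# prints http://mirror.example.org/uniprot/P12345
```

In this sketch, case 1 amounts to editing one table entry and case 3 to listing a local cache first; in each case the URL the client uses stays the same.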

Many of us who have worked in the field have seen (and been burned by) variants of these cases over the years.

For example, if you wanted to state that http://beta.uniprot.org/uniprot/P12345 validates as XHTML 1.0:
<http://beta.uniprot.org/uniprot/P12345>
  validatesAs <http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd>
...seems better than e.g.
<http://purl.org/commons/html/uniprotkb/P12345>
...of which you can't be quite as certain that it will always point to the specific page you wanted to describe!

It is better, because when http://beta.uniprot.org/uniprot/P12345 switches to http://uniprot.org/uniprot/P12345, as you suggest will happen, we (by this I mean the imagined community that administers the PURLs in the interest of serving HCLS informatics) will update the redirect to point to the new URL. When the "beta" is dropped, either it will be the same page as before, in which case all our statements will still be valid, or the page will change, in which case using uniprot's direct address doesn't help. The intention of setting up a PURL system by our community would be not only to manage redirects blindly, but to set (and represent) expectations of what a client can reasonably expect the behavior of a fetch of any of these URLs to be. It will always be up to uniprot to decide what they put on HTML pages. But within our community it would be considered good manners to explain what uniprot's page update policy is (or isn't), and then to live up to what they have said.
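A small sketch of the point about updating the redirect, assuming statements are stored as simple (subject, predicate, object) tuples and the redirect table is a plain prefix mapping (both hypothetical simplifications): the administrators edit one entry, and nothing keyed on the PURL needs rewriting.

```python
from typing import Dict, List, Tuple

PREFIX = "http://purl.org/commons/html/uniprotkb/"
PURL = PREFIX + "P12345"

# Statements use the stable PURL as their subject, never the current target.
statements: List[Tuple[str, str, str]] = [
    (PURL, "validatesAs", "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
]

# Hypothetical one-entry redirect table kept by the PURL administrators.
redirects: Dict[str, str] = {PREFIX: "http://beta.uniprot.org/uniprot/"}


def resolve(purl: str, table: Dict[str, str]) -> str:
    """Map a PURL to its current target by swapping the registered prefix."""
    for prefix, target in table.items():
        if purl.startswith(prefix):
            return target + purl[len(prefix):]
    raise KeyError(purl)


before = resolve(PURL, redirects)
# When "beta" is dropped, the administrators edit one redirect entry...
redirects[PREFIX] = "http://uniprot.org/uniprot/"
after = resolve(PURL, redirects)
# ...while every stored statement about the PURL stays exactly as it was.
```

This only keeps the statements valid if the page behind the new URL is the same page; as noted above, a redirect cannot help if the content itself changes.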

You could argue that this URI is meant to represent the more general concept of "an HTML representation of P12345", but at that point I really start to wonder...

This has not been argued. The closest thing along these lines that has been argued for is the definition of http://purl.org/commons/record/uniprotkb/P12345, which is intended to represent the underlying information in the database record without commitment to a particular format (XML, RDF, HTML). This URI would be used for making statements about aspects of this information that are common to any format (e.g. this record includes a representation of an amino acid sequence). That /record/ URI is intended not to be an information resource, and to respond with a 303 redirect to an RDF document describing how to access the specific formats, etc., as described in other emails.
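The intended /record/ behaviour can be sketched as a request handler that never serves a body for the record URI itself and instead answers 303 See Other. The location of the RDF description (`DESCRIPTION_TEMPLATE`) is a hypothetical layout invented for this example; the actual URL scheme was not spelled out here.

```python
from http import HTTPStatus
from typing import Dict, Tuple

RECORD_PREFIX = "/commons/record/uniprotkb/"
# Hypothetical location of the RDF document that describes a record.
DESCRIPTION_TEMPLATE = "http://purl.org/commons/rdf/uniprotkb/{id}"


def respond(path: str) -> Tuple[int, Dict[str, str]]:
    """Answer a GET on a /record/ URI.

    The record URI names a non-information resource, so the server returns
    no representation of its own; it answers 303 See Other and points at an
    RDF document that describes the record and lists its concrete formats.
    """
    if path.startswith(RECORD_PREFIX):
        record_id = path[len(RECORD_PREFIX):]
        location = DESCRIPTION_TEMPLATE.format(id=record_id)
        return HTTPStatus.SEE_OTHER, {"Location": location}
    return HTTPStatus.NOT_FOUND, {}
```

A client that receives the 303 knows it has been handed a description of the record rather than the record itself, which is what keeps statements about the /record/ URI distinct from statements about any one file.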

I suppose it might be possible to represent which header should be used in the content negotiation as part of the RDF, but a) it's got to be easier to just put that information in the name, and b) in the case that you want to, e.g., mirror some contents of Uniprot on a file system, you will have to make up distinct names anyway? Maybe I'm dense, but I fail to see how content negotiation is of any use on the semantic web.
Note that the content negotiation is done *at the level of the resolver*; all the different representations have their own URLs:

Yes, but what sorts of statements can be made using http://purl.uniprot.org/uniprot/P12345 as the subject? Because it can mean any of the below, even the protein class itself, how can a *semantic web* statement be made using it?

http://beta.uniprot.org/uniprot/P12345
http://beta.uniprot.org/uniprot/P12345.xml
http://beta.uniprot.org/uniprot/P12345.rdf
http://beta.uniprot.org/uniprot/P12345.fasta
Content negotiation could be a useful mechanism for bypassing the HTML representation (which is what the PURL resolves to by default, greatest common denominator etc.), important if a lot of requests need to be made.
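Resolver-level content negotiation, as described here, can be sketched as choosing among the four representation URLs based on the request's Accept header. The media-type table is an assumption (in particular `text/x-fasta`, since FASTA has no single registered media type), and the Accept parsing is deliberately simplified: no wildcards and no parameters other than q.

```python
from typing import List, Tuple

BASE = "http://beta.uniprot.org/uniprot/"
# Assumed media type -> URL suffix mapping for the representations above.
SUFFIXES = {
    "text/html": "",
    "application/xml": ".xml",
    "application/rdf+xml": ".rdf",
    "text/x-fasta": ".fasta",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if fields[0]:
            prefs.append((fields[0], q))
    # sorted() is stable, so ties keep the order the client listed them in.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(accession: str, accept: str) -> str:
    """Pick the representation URL the resolver would redirect to."""
    for media_type, q in parse_accept(accept):
        if q > 0 and media_type in SUFFIXES:
            return BASE + accession + SUFFIXES[media_type]
    # Fall back to HTML, the default "greatest common denominator".
    return BASE + accession
```

Note that every branch ends at one of the distinct representation URLs listed above.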

Efficiency of requests is a separate issue, but you aren't the only one who has mixed up efficiency with clarity. I had a conversation with TBL a couple of weeks ago where I argued that the whole hash URI thing was another such case: a premature optimization.

IMO, the first goal of our design ought to be to ensure that automated semantic web agents (idiots as they will be) will have a fighting chance to avoid having to do the difficult (even impossible) sorts of disambiguation that people are faced with all the time. That bar hasn't yet been met. Once we've ensured that we can meet that goal, then we can talk about optimization. (Incidentally, we do discuss various optimization techniques, from predictability of the form of the name, to PURL servers sending back the rewrite rules they use so that they can be implemented on the client side.)
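The idea of PURL servers sending back their rewrite rules could look roughly like the following, where a client caches (regex, template) pairs and resolves PURLs locally. The rule format and the `RULES` list are invented for illustration; no such PURL server interface is described in this thread.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule list that a PURL server might publish to clients as
# (regex, replacement template) pairs. The format is invented for this sketch.
RULES: List[Tuple[str, str]] = [
    (r"http://purl\.org/commons/html/uniprotkb/(\w+)",
     r"http://beta.uniprot.org/uniprot/\1"),
]


def rewrite_locally(purl: str, rules: List[Tuple[str, str]]) -> Optional[str]:
    """Apply cached rewrite rules on the client, skipping the HTTP round trip.

    Returns None when no rule matches, meaning the client should fall back
    to asking the PURL server itself.
    """
    for pattern, template in rules:
        match = re.fullmatch(pattern, purl)
        if match:
            return match.expand(template)
    return None
```

Because the client still names things by the PURL and only uses the rules to skip a network hop, this kind of optimization does not change which URI appears in statements.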

You'll notice that in the RDF representation, this HSSP resource is represented with the URL http://purl.uniprot.org/hssp/7aat. The main reason for pre-resolving the PURLs in the web pages is that many people (been there, done that) like to see where they are going before they click.

OK, I missed that. But I'd still use the same PURLs in the HTML. There are other mechanisms for indicating the real destination, and using different URLs may lead to confusion when people need to choose a name for the subject or object of a statement. If you ever end up using RDF/a, you will need to use the same URIs as in the RDF.

btw this is an example of a resource that won't work with the HCLS PURL resolver at the moment, as this resolver can also only append to a path!
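The append-only limitation can be shown with a hypothetical target in which the identifier is followed by more text. `TARGET_BASE` and `TEMPLATE` are invented for this sketch (the real HSSP target URL is not given here); the point is only that a pure prefix-append rule cannot express such a mapping, while a template rule can.

```python
PREFIX = "http://purl.uniprot.org/hssp/"
# Hypothetical target layout in which the identifier is followed by more text.
TARGET_BASE = "http://hssp.example.org/data/"
TEMPLATE = "http://hssp.example.org/data/{id}.hssp"


def local_part(purl: str) -> str:
    """Strip the registered prefix, leaving the identifier."""
    if not purl.startswith(PREFIX):
        raise ValueError(f"not under {PREFIX}: {purl}")
    return purl[len(PREFIX):]


def append_redirect(purl: str) -> str:
    """Append-only rule: the identifier can only be tacked onto the end."""
    return TARGET_BASE + local_part(purl)


def template_redirect(purl: str) -> str:
    """Template rule: the identifier can be placed anywhere in the target."""
    return TEMPLATE.format(id=local_part(purl))


purl = "http://purl.uniprot.org/hssp/7aat"
print(append_redirect(purl))    # http://hssp.example.org/data/7aat (missing ".hssp")
print(template_redirect(purl))  # http://hssp.example.org/data/7aat.hssp
```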

So that you know, there is interest from the PURL developers in extending their redirect service to better accommodate semantic web usage, and they have offered to do this if we can get together and tell them what we need.


Regards,
Alan
