Re: [BioRDF] All about the LSID URI/URN

Sean Martin Wed, 12 Jul 2006 06:18:11 -0700

Hello Alan,
The short answer is that only some of what the LSID scheme does could be achieved by the means you suggest, because what you outline is, under the covers, more or less part of the LSID resolution process already. In the end, however, it would not meet a number of the original requirements, and it would require new infrastructure that somewhat defeats the purpose of sticking with http://.

Let me respond with comments embedded below:

[EMAIL PROTECTED] wrote on 07/07/2006 04:52:01 PM:

>
> Sean, couldn't what LSID achieves be done, for instance, by having a
> convention that if someone dereferences, for example,
>
> http://bla.com/path/to/document/foo.lsid
>

Because you start with a URL, you inherit the location and protocol dependency issues that were raised, but not addressed, in earlier posts. In short, my experience is that when one names existing objects that must persist for a long time and are intended for wide-area distribution, it is both prudent and practical to separate the name from the mechanism used to resolve it.
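As a rough illustration of that separation, the sketch below splits an LSID into its components, assuming the usual urn:lsid:<authority>:<namespace>:<object>[:<revision>] layout and no unescaped colons inside components. Nothing in the string names a protocol or a server path, and a program can recognize it as an LSID, and therefore know which contract applies, without dereferencing it. The `Lsid` class, `parse_lsid` function and example identifier are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lsid:
    authority: str                  # the issuing authority, typically a domain name
    namespace: str                  # the authority's collection or database
    object_id: str                  # the object within that namespace
    revision: Optional[str] = None  # optional version component


def parse_lsid(name: str) -> Optional[Lsid]:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>] into parts.

    Returns None when the string is not an LSID. "urn" and "lsid" are matched
    case-insensitively, as RFC 2141 does for the URN prefix and namespace ID.
    """
    parts = name.split(":")
    if len(parts) not in (5, 6):
        return None
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        return None
    if not all(parts[2:]):
        return None  # reject empty components
    return Lsid(*parts[2:])


# Hypothetical names, for illustration only.
print(parse_lsid("urn:lsid:example.org:proteins:P12345:2"))
print(parse_lsid("http://bla.com/path/to/document/foo.lsid"))  # None: just a URL
```

The ".lsid" suffix on the URL is only a naming habit, so the parser rightly treats that string as an ordinary URL.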

Also, because you use a URL, you are forced to dereference it every time you want to understand its current contract. One cannot programmatically tell the difference between the contracts of http://bla.com/path/to/document/foo.lsid and http://www.cnn.com/index.html without dereferencing both, storing the details of their particular contracts locally, and then comparing them. This means one cannot safely assume that the name string http://www.cnn.com/index.html names the same thing a day later, nor that the object someone sent me named http://bla.com/path/to/document/foo.lsid is the same as the object I would retrieve if I dereferenced http://bla.com/path/to/document/foo.lsid right now.
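A small Python sketch makes the point concrete. The two dictionaries are a hypothetical stand-in for fetching the same URL on two different days (no network access is involved): the name strings compare equal, yet the only way to learn whether the bytes are the same is to dereference both times and keep a fingerprint of what came back.

```python
import hashlib

# Hypothetical snapshots standing in for two fetches of one URL a day apart.
web_monday = {"http://www.cnn.com/index.html": b"<html>Monday's headlines</html>"}
web_tuesday = {"http://www.cnn.com/index.html": b"<html>Tuesday's headlines</html>"}


def fingerprint(web: dict, url: str) -> str:
    """'Dereference' url in a simulated web and return a SHA-256 digest of the bytes."""
    return hashlib.sha256(web[url]).hexdigest()


name_in_old_email = "http://www.cnn.com/index.html"
name_typed_today = "http://www.cnn.com/index.html"

names_match = name_in_old_email == name_typed_today
bytes_match = fingerprint(web_monday, name_in_old_email) == fingerprint(web_tuesday, name_typed_today)

print(names_match, bytes_match)  # True False
```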

Should the person who sent me an object perhaps also send me a copy of the persistence policy? How often does one go back through old email and click on URL links that are now broken? How many binary attachments do you have in your email whose data source you cannot determine without opening them and applying some human-level heuristics, or perhaps doing a Google string match? These are the sorts of problems the LSID addresses, and of course not just for email.

>
> it is understood to obey a protocol, namely to return a snippet of
> rdf that says, here's a handle to my metadata, here's a handle to my
> data, here's my machine readable persistence policy. Or instead of
> returning rdf, the link response mentioned in
> http://www.w3.org/2001/tag/doc/URNsAndRegistries-50.html could be used to point to the
> auxillary information.

This is similar to the LSID scheme, except that LSID resolution uses a WSDL document to communicate the possible data and metadata service end-points. Since the LSID scheme has only one contract regarding persistence (the hard rule that an LSID may never be reused to name any other bytes), there is no need to pass persistence information. This means the LSID string alone can be used to compare objects for equality. For caching metadata (which can change over time), the LSID scheme defers to the transport mechanism over which that metadata was obtained for an indication of how long the metadata should be considered valid. This is one area where I believe the LSID standard should be improved to formally address both persistent and non-persistent metadata, and it is something the caBIG folks wanted.
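The split described above, data that never changes under a given LSID and metadata whose lifetime comes from the transport, might look like the following on the client side. This is a minimal sketch, assuming the transport supplies a lifetime in seconds (for example an HTTP max-age value) and passing the current time explicitly so the behavior is deterministic; the `LsidCache` class, its methods and the identifier are hypothetical and not part of any LSID API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class LsidCache:
    """Hypothetical client-side cache keyed by the LSID string alone."""
    data: Dict[str, bytes] = field(default_factory=dict)
    # lsid -> (metadata, absolute expiry time in seconds)
    metadata: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def put_data(self, lsid: str, payload: bytes) -> None:
        # An LSID may never name other bytes, so data entries never expire;
        # a conflicting payload signals an error somewhere upstream.
        if lsid in self.data and self.data[lsid] != payload:
            raise ValueError(f"{lsid} already names different bytes")
        self.data[lsid] = payload

    def get_data(self, lsid: str) -> Optional[bytes]:
        return self.data.get(lsid)

    def put_metadata(self, lsid: str, value: str, max_age: float, now: float) -> None:
        # Metadata can change, so keep it only as long as the transport said.
        self.metadata[lsid] = (value, now + max_age)

    def get_metadata(self, lsid: str, now: float) -> Optional[str]:
        entry = self.metadata.get(lsid)
        if entry is None or now >= entry[1]:
            return None  # missing or stale: resolve the metadata again
        return entry[0]


cache = LsidCache()
lsid = "urn:lsid:example.org:proteins:P12345"  # hypothetical LSID
cache.put_data(lsid, b"MKTAYIAKQR")
cache.put_metadata(lsid, "<rdf:RDF/>", max_age=60, now=0)
print(cache.get_metadata(lsid, now=30))  # <rdf:RDF/>
print(cache.get_metadata(lsid, now=90))  # None
```

Note that no persistence policy is stored anywhere: the data entries can be kept indefinitely purely because of the one rule the scheme imposes.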

>
> And if that persistence policy says that the data is immutable, then
> you can comfortably store it, and use this URI for as a handle for
> resolving, in the same way an LSID can be resolved by an http service
>
> http://lsid.company.net/resolver/http://bla.com/path/to/document/foo.lsid
>
> The resolved could pull back whatever information you have locally,
> return source information, or redirect to the id, like a click through.

Note that this would require new proxy and browser client infrastructure that understands how to interpret and act on these policies and on this higher-level protocol. Behavior on existing infrastructure would likely be broken, since one could not simply put one of these URIs into a browser or proxy server and have it do the right thing. This weakens the argument that we should use only http:// URLs as URIs because the existing deployed infrastructure makes adoption easier. The part of the web infrastructure that would just work today is exactly the same part that the LSID resolution scheme already uses.

>
> This seems to satisfy the requirement that you can tell what sort of
> thing it is from looking at it, as well as the desired ability to

I am not sure what you mean by 'looking' at it here. Do you mean without dereferencing it, or after inspecting it by dereferencing? For LSIDs the contract is clear without dereferencing, but for URLs I cannot see how that could be true.

> cache and indirect.
>
> More generally any social convention that we use can accomplish the
> same thing - a provider could say (in a robots.txt-like file, or as a
> published policy) that certain paths in its tree have this sort of
> metadata available and should be treated like an lsid would.

Again, this requires standards and infrastructure to interpret and apply the different contracts, particularly if they must be machine-readable. The more sophisticated and/or 'woolly' the convention is, the less likely we are to see adoption. Each time one retrieves an object, one would also need to check (and perhaps store) the contract or policy. Simple comparisons of URI name strings cannot be used to determine whether the objects named are equal.
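To see why, here is a minimal sketch of the robots.txt-like convention from the quoted text. The file format and its `Immutable:` directive are invented for illustration and follow no existing standard. Before a client can trust a URL string it must have fetched, parsed and stored the host's policy, and the answer is only as current as that stored copy; an LSID needs no such lookup.

```python
from typing import Dict, List
from urllib.parse import urlsplit


def parse_policy(text: str) -> List[str]:
    """Collect path prefixes from hypothetical 'Immutable: <prefix>' lines."""
    prefixes = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.lower().startswith("immutable:"):
            prefixes.append(line.split(":", 1)[1].strip())
    return prefixes


def is_stable(url: str, policies: Dict[str, List[str]]) -> bool:
    """True only if a previously fetched policy marks this URL's path immutable."""
    parts = urlsplit(url)
    prefixes = policies.get(parts.netloc, [])  # no stored policy -> no guarantee
    return any(parts.path.startswith(prefix) for prefix in prefixes)


# Hypothetical policy file that bla.com might publish.
policy_text = """
# policy for bla.com
Immutable: /path/to/document/
"""
policies = {"bla.com": parse_policy(policy_text)}

print(is_stable("http://bla.com/path/to/document/foo.lsid", policies))  # True
print(is_stable("http://bla.com/news/today.html", policies))            # False
print(is_stable("http://www.cnn.com/index.html", policies))             # False
```

Every comparison depends on the `policies` table, which has to be fetched per host, kept in sync, and agreed on by every client that wants the same answer.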

Finally, I would like to add an unrelated comment about one practical aspect of LSIDs that we find useful here: local, distributed, or offline naming and access versus online, centralized naming and access. Because an LSID-named object is not tied to any particular place or protocol, objects can be created and accessed locally on one's own machine (perhaps offline) using exactly the same name they will be accessed with once they are uploaded and made public to a wider group or to the internet as a whole. The software we write for creating or accessing LSID data locally can be the same as that for accessing LSIDs across the network, and it makes no difference whether the LSIDs have been uploaded or not. One has none of the worries of maintaining relative link structures, of hard-coding absolute URL references and then having to recode them, or of finding that one now has to use a new (longer) name once the object is uploaded to a distribution service end-point.
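A minimal sketch of that point, assuming two hypothetical stores: one on the local machine and one standing in for a network resolution service (here just a dictionary, so nothing touches the network). The lookup code and the name are the same before and after upload; only the list of places to look changes. `LocalStore`, `RemoteStore`, `resolve` and the identifier are illustrative names, not an existing LSID library.

```python
from typing import Dict, Iterable, Optional, Protocol


class Store(Protocol):
    def get(self, lsid: str) -> Optional[bytes]: ...


class LocalStore:
    """Objects created on one's own machine, possibly while offline."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def get(self, lsid: str) -> Optional[bytes]:
        return self.objects.get(lsid)


class RemoteStore:
    """Hypothetical stand-in for a network resolver; a real one would fetch remotely."""

    def __init__(self, published: Dict[str, bytes]) -> None:
        self.published = published

    def get(self, lsid: str) -> Optional[bytes]:
        return self.published.get(lsid)


def resolve(lsid: str, stores: Iterable[Store]) -> Optional[bytes]:
    """Return the bytes from the first store that has them.

    Because an LSID always names the same bytes, it does not matter
    which store answers.
    """
    for store in stores:
        payload = store.get(lsid)
        if payload is not None:
            return payload
    return None


lsid = "urn:lsid:example.org:experiments:run42"  # hypothetical LSID
local = LocalStore()
local.objects[lsid] = b"raw instrument output"  # created offline
before_upload = resolve(lsid, [local])

remote = RemoteStore({lsid: b"raw instrument output"})  # later published
after_upload = resolve(lsid, [remote])

print(before_upload == after_upload)  # True: same name, same bytes
```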

Kindest regards, Sean

PS: I was amused to realize recently the irony of you playing the (extremely useful) part of devil's advocate on this topic. I don't know if you are aware of it, but my understanding is that the original LSID was based on work at Millennium. ;-)

--

Sean Martin
IBM Corp.
