Re: URI note snapshot available

Booth, David (HP Software - Boston) Mon, 28 Apr 2008 20:56:18 -0700

Hi Jonathan,

Comments on
http://www.w3.org/2001/sw/hcls/notes/uris/
version of 28 April 2008 11:57 -0400.


General thoughts:

 - It's been a while since I read a draft, but this looks like great progress.

 - Overall it feels heavy on the rationale and light on getting to the point of 
what to do.  The rationale is helpful, but it does bog things down a bit. Maybe a 
summary up front along the lines of EricP's Quick Tips would help:
http://lists.w3.org/Archives/Public/public-semweb-lifesci/2008Apr/att-0075/QuickTips.html

 - The discussion hardly mentions URIs.  It's good that the problem of naming 
is pointed out as being broader than just URIs, but I think a bit more emphasis 
specifically on URIs would help, because that's the purpose of the advice.

 - I think it would be good to give more prominence to the idea that a URI 
definition should clearly indicate its change policy.  This is really more 
fundamental than just saying "don't change the URI definition", since it would 
be okay to change a definition if it clearly indicates that it is unstable, 
released only for testing, and subject to change.  Similarly, the FOAF 
documentation specifically states:
http://xmlns.com/foaf/spec/#sec-evolution
"we do not update the namespace URI as the vocabulary matures", so users can 
set expectations accordingly.

Specific comments, in document sequence:

1. I like the real naming examples in the intro, but I think it could be 
shortened.  The explanation of the Linnaean system introduces more detail than 
needed to get the point across.

2. I think these sentences could be dropped without loss:
[[
Several innovations of technique come with URI-based naming: Systems of schemes 
and registries, network protocols such as the domain name system (DNS) and 
their associated oversight organizations, techniques for assigning globally 
unique names, and the behaviour of network-based protocols for communication 
keyed by URIs. However, successful use of these techniques for naming in 
science depends, as in the case of Linnaean system, on additional factors such 
as clear documentation and how well naming and documenting fit in to the 
practice of doing science.
]]

3. I don't see how the scanner example illustrates this point: "The second 
lesson is that how we name things matters."

4. I suggest rewording "the consequences of mistakes such as these will be more 
severe" as "it may be more difficult to diagnose such problems"

5. I think the section on "Capturing context using global names" can be 
substantially shortened.

6. When mentioning "URI owner" it would be good to reference:
http://www.w3.org/TR/webarch/#uri-ownership

7. This statement is problematic:
[[
Even if the URI owner has not made any clear statement about the URI's meaning, 
a community may still establish a meaning for a URI through use. As participant 
in such communities, it would be wise for a URI owner to respect that meaning, 
as contradictory statements would probably be ignored.
]]
This seems to imply that it is okay to squat on other people's URIs, and 
emphatically it is *not* okay to do so.  I think it would be best to just 
delete this statement.  The previous paragraph already admonishes against a URI 
owner changing the URI definition from its accepted usage.

8. These paragraphs are problematic too:
[[
A naming system that has an associated protocol relates to the protocols only 
in that the protocol provides what can be construed as a standard catalog or 
dictionary that aids in the understanding of the names. Regardless of whether 
or how the naming system exploits a technical apparatus such as the Web, 
meanings of names are not hostage to mistakes or technical or administrative 
failures, because the meaning of a name is infused in all communication that 
uses the name, and the name's documentation is only one such communication. 
This is easy to see in the case of the Linnaean system, which is universally 
understood to be based on primary literature, not catalogs. Only recently has 
it had comprehensive catalogs at all, and even these are considered secondary 
sources subject to verification. However, even a naming system such as GenBank 
[citation] that is very closely associated with a web-accessible source of 
primary documentation is ultimately based on what its names (accession numbers) 
are believed to mean, not on what the database says. If GenBank were to become 
corrupt or drop off the face of the earth, the community would scramble to 
create an alternative source for the retrieval of sequence information 
associated with the accession numbers, because so many scientific 
communications depend on the accession numbers to name the information that the 
records carry. As with any naming system, GenBank's technical infrastructure is 
a community trust, not an authority.

A naming system that has an associated protocol relates to the protocols only 
in that the protocol documentation and/or specific documentation received using 
the protocol help us understand what names mean. Regardless of whether or how 
the naming system exploits a technical apparatus such as the Web, meanings of 
names are not hostage to mistakes or technical or administrative failures, 
because meaning takes root in a different arena: from [flushed: a recognized 
initial communication and followed by] meaningful use in communication.
]]

First of all, the Linnaean system seems to illustrate the opposite of what you 
said it illustrates: the meaning of a Linnaean term is universally understood to 
be based on the authority of its initial publication -- *not* on how the 
community uses the term in catalogs.

It seems like the main point you are trying to make is: a name definition, once 
published and adopted by the community, should be unchangeable.  I think that 
point is good.  The problem comes up when there is any suggestion that the 
community's usage of the term defines the term's meaning.  That indeed is the 
way natural language works, and it is okay for humans, but it is not okay for 
the Semantic Web:

 - Who defines "the community"?  Different "communities" often come up with 
different definitions for the same term.  With URIs this becomes URI collision
http://www.w3.org/TR/webarch/#URI-collision
and it is harmful, as you explain in your discussion of polysemy.

 - Usage-based definitions can drift over time, again leading to URI collision 
(or polysemy).  This is exactly what happens in natural language.

9. These look very good:
[[
1. [581 JAR:]  Is available documentation about the use of the URI sufficiently 
clear and unambiguous to guide effective use?
2. Will [was: Does] the documentation remain faithful to the meaning of the 
URI? [was: over time]
3. Is documentation available when needed? [maybe: will it be available?]
4. Is documentation available to computational agents via a well-known protocol 
and in a form that is useful to them?
]]

However, #2 is phrased in a way that suggests that the meaning of the 
URI can be independent of its published definition.  Also, I think the word 
"definition" is more to the point than "documentation", though documentation 
beyond the definition (such as usage examples) would be good to suggest also.  
You might consider rewording these to:
[[
1. Is the URI definition sufficiently clear and unambiguous to guide effective 
use?  Usage examples can also help.
2. Will the URI definition remain unchanged?
3. Will the URI definition be available when needed?
4. Is the URI definition available to computational agents via a well-known 
protocol and in a form that is useful to them?
]]

10. In the section on "Polysemy (one name, many meanings)", you might say 
straight out: "Never use the same URI to denote both a Web page or Web site and 
something else."

11.  Regarding these sentences:
[[
It is tempting to assume that a successful request for a document (one that 
elicits an HTTP "200 OK" response) tells us what the URI denotes - that it 
denotes the response to the request. However, making a similar assumption about 
the URI following a change to the document would result in a polysemy because 
then the URI would seem to denote two different responses.[jar check] If the 
HTTP responses vary over time, the URI, asumming no server error, denotes not a 
single unchanging document, but rather a draft series, the changing output of 
an instrument, a blog, the changing bylaws of an organization, or an otherwise 
evolving entity. Here, the publisher who wishes to avoid the risk of polysemy 
would make clear via documentation that the URI denotes a changing thing.
]]
I think you should be careful not to imply that the URI might legitimately 
denote the response itself, since that would be an extremely rare case, even 
for a Web page that never changes.  Also, I think the best advice to publishers 
is that they should always be clear about their change policy, rather than 
stating it only when the document may change, or only when it will not change.

How about this wording instead:
[[
When a request for a document yields an HTTP "200 OK" response, it might be 
tempting to assume that the URI denotes a document only in its current state.  
However, making a similar assumption about the URI following a change to the 
document would result in a polysemy because then the URI would seem to denote a 
document with two different contents.  In a situation like this the URI should 
instead be taken to denote not a single unchanging document, but a draft 
series, the changing output of an instrument, a blog, the changing bylaws of an 
organization, or an otherwise evolving entity.  To avoid such 
misunderstandings, a document should clearly state its change policy.
]]

12. The section on Polysemy should mention that in the AWWW this is called "URI 
collision":
http://www.w3.org/TR/webarch/#URI-collision

13. The section on synonymy should mention that in the AWWW this is called "URI 
aliasing":
http://www.w3.org/TR/webarch/#uri-aliases

14.  Word missing in this sentence?  "URI schemes that lack protocol 
association, or that are explicit in making protocol association advisory 
instead of central, might be seen as preferable to those that do for the 
purpose of naming."

15. Garbled sentence or missing word: "The conclusion is that the meaning of 
the URI has changed, making the correct interpretation prior uses dependent on 
which meaning is intended."

16. Nice job on the "protocol association tradeoff" section.

17. Nice job on the "Insurance against technical failure" section also.  
Personally, I do think there is room for additional technical tools in this 
area.  For example, as I mentioned privately:
[[
Much of the consternation around persistence seems to be concern that a 
document might change at all though the community expects it to remain totally 
unchanged.  In this case, I would think that trusted timestamping
http://en.wikipedia.org/wiki/Trusted_timestamping
in conjunction with distributed archiving mechanisms could be used to obtain a 
sufficient level of confidence that users could at least detect a change.
]]
This is a community service that purl.org or a similar service might offer, but 
specific guidance on doing this in the context of URI minting still needs to be 
worked out.
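To make the change-detection half of this idea concrete, here is a minimal 
sketch in Python.  It assumes the published definition can be fetched as raw 
bytes and simply records a SHA-256 digest with a local UTC time.  The names 
(DigestRecord, record_digest, has_changed) and the example URI are 
hypothetical, and the local timestamp is only a stand-in: real trusted 
timestamping would have a timestamping authority sign the digest (e.g. via the 
RFC 3161 Time-Stamp Protocol), and the distributed archiving part is omitted.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DigestRecord:
    """Hypothetical record of what a URI definition looked like when archived."""
    uri: str
    sha256: str
    # Local UTC time only; a trusted timestamp would be signed by a TSA instead.
    recorded_at: str


def record_digest(uri: str, content: bytes) -> DigestRecord:
    """Hash the definition's bytes and note when the hash was taken."""
    digest = hashlib.sha256(content).hexdigest()
    return DigestRecord(uri, digest, datetime.now(timezone.utc).isoformat())


def has_changed(record: DigestRecord, content: bytes) -> bool:
    """True if the currently served bytes differ from the archived digest."""
    return hashlib.sha256(content).hexdigest() != record.sha256


# Usage with a made-up URI and definition text.
rec = record_digest("http://example.org/terms/foo", b"original definition")
print(has_changed(rec, b"original definition"))  # False
print(has_changed(rec, b"revised definition"))   # True
```

A digest only shows that the bytes differ, not what changed or whether the 
change matters semantically; its value for the community comes from the digest 
and timestamp being held by parties other than the URI owner.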

18. The section on "What the standards community needs to do" is excellent.  
You might add something like:

 - Techniques and guidance for achieving persistence, perhaps involving trusted 
timestamping.



David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  [EMAIL PROTECTED]
http://www.hp.com/go/software

Opinions expressed herein are those of the author and do not represent the 
official views of HP unless explicitly stated otherwise.
