Re: URI canonicalization

Roy T. Fielding Mon, 31 Jan 2005 21:43:45 -0800

On Jan 31, 2005, at 7:10 PM, Martin Duerst wrote:

> 5) Add a note saying something like "Comparison functions
>    provided by many URI classes/implementations make additional
>    assumptions about equality that are not true for Identity
>    Constructs. Atom processors therefore should use simple
>    string functions for comparing Identity Constructs."
>    I think such a note could be a good balance to the normalization
>    advice.


That would be a falsehood.  Identifiers are not subject to
"simplification" -- they are either equivalent or not.  We can
add all of the implementation requirements we like to prevent
software from detecting false negatives, but that doesn't change
the fact that equivalent identifiers always identify the same
resource.  It is the author's responsibility to use URIs
(or IRIs) that are actually different, not the responsibility
of the protocol or implementation.

I am disappointed that a MUST requirement was added to IRI in the
last draft without working group review.  This part

  Applications using IRIs as identity tokens with no relationship to a
  protocol MUST use the Simple String Comparison (see section 5.3.1).
  All other applications MUST select one of the comparison practices
  from the Comparison Ladder (see section 5.3 or, after IRI-to-URI
  conversion, select one of the comparison practices from the URI
  comparison ladder in [RFC3986], section 6.2)

is completely missing the point of the ladder.  The identifiers may
or may not be equivalent and there is absolutely no reason for
protocols to require inaccurate comparisons.  The reason for
simplification of comparison is ONLY that false negatives are
an acceptable fact of life and their elimination is an
implementation-specific decision that has no impact on
interoperable use of identifiers.  That is why there is no such
requirement for URIs.

....Roy
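
Editorial illustration, not part of the original message: the sketch
below contrasts the lowest rung of the RFC 3986 comparison ladder
(simple string comparison, section 6.2.1) with one higher rung (case
normalization, section 6.2.2.1). The function names and the example.com
URIs are hypothetical, and the normalizer is deliberately partial: it
lowercases only the scheme and host and uppercases percent-encoding hex
digits. It is meant only to show how a simple comparison can report a
false negative for identifiers that the higher rung treats as
equivalent.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches one percent-encoded octet, e.g. "%7e" or "%7E".
_PCT_ENCODED = re.compile(r"%[0-9a-fA-F]{2}")


def simple_string_equal(a: str, b: str) -> bool:
    """Lowest rung: character-for-character comparison."""
    return a == b


def case_normalize(uri: str) -> str:
    """Partial sketch of case normalization (an assumption-laden subset).

    Lowercases the scheme and host, and uppercases the hex digits of
    percent-encoded octets. Userinfo, path, query, and fragment are
    otherwise left untouched because they may be case-sensitive.
    """
    parts = urlsplit(uri)  # urlsplit already lowercases the scheme
    # Keep userinfo as-is; lowercase only the host[:port] portion.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    rebuilt = urlunsplit(parts._replace(netloc=netloc))
    return _PCT_ENCODED.sub(lambda m: m.group(0).upper(), rebuilt)


def case_normalized_equal(a: str, b: str) -> bool:
    """Higher rung: compare after the partial normalization above."""
    return case_normalize(a) == case_normalize(b)


if __name__ == "__main__":
    a = "HTTP://Example.COM/a%7eb"
    b = "http://example.com/a%7Eb"
    # Simple comparison reports a difference (a false negative here).
    print(simple_string_equal(a, b))
    # The higher rung recognizes the two identifiers as equivalent.
    print(case_normalized_equal(a, b))
```

Neither function can produce a false positive for these inputs; the
only difference between the rungs is how many false negatives remain,
which is the trade-off the message above describes.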
