On 1/23/2012 9:55 AM, Philip Martin wrote:
So lots of high range Unicode points are allowed.
Yes.
How will we validate that?
The same way the code (as given to me by others) currently validates the
names---it iterates the characters provided, and if one of them doesn't
meet the definition, it returns false. Basically what goes inside
if(...){} would be changed.
Do we have any suitable code in Subversion?
In my original email I provided the name of the method that is currently
providing the arbitrary restriction. The if(...){} block would change
to relax its current restrictions. I don't see what is difficult about
it, although perhaps I'm being naive. However, noting that SVN+DAV works
just fine with this relaxed restriction, and that JavaHL works just fine
/reading/ values with relaxed restriction, my best guess is that all you
have to do is change a few lines in that method and things will all work
nicely.
Do we write an XML validator?
Nowhere was there ever a hint of XML validation. In fact, I wasn't even
proposing verification of XML well-formedness. There is no XML markup
involved. I'm simply proposing we use the same definition that XML does
of a name.
The definition of a name is conceptually a set of characters. Think of
it as a regular expression. Currently Subversion uses something like
/[a-zA-Z:_][a-zA-Z0-9\.:_]*/. I'm simply proposing we relax this using
XML's "regular expression" instead of the one we use now. There is no
XML involved. We are simply re-using a definition from their specification.
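
To make the "iterate and reject" idea concrete, here is a minimal sketch of a check that mirrors the approximate expression above. The function name ascii_prop_name_is_valid is hypothetical, and this is not the actual Subversion code, whose exact rules may differ from the rough expression quoted here.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only; not Subversion code. */
static bool is_ascii_alpha(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool is_ascii_digit(char c)
{
  return c >= '0' && c <= '9';
}

/* Returns true if NAME matches roughly [a-zA-Z:_][a-zA-Z0-9.:_]*,
   the approximate expression given in the message. */
bool ascii_prop_name_is_valid(const char *name)
{
  const char *p = name;

  /* First character: a letter, ':' or '_'.  An empty name fails here. */
  if (!(is_ascii_alpha(*p) || *p == ':' || *p == '_'))
    return false;

  /* Remaining characters: additionally allow digits and '.'. */
  for (p++; *p != '\0'; p++)
    {
      if (!(is_ascii_alpha(*p) || is_ascii_digit(*p)
            || *p == '.' || *p == ':' || *p == '_'))
        return false;
    }

  return true;
}
```

Relaxing the restriction would then amount to changing the two conditions inside the if(...) tests, which is the point being made above.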
Currently there are at least two "official" Subversion clients. One is
using XML's definition of a property name. Another is ("for now" the
code says) using another definition. Whatever we do, I would propose
they both use the same definition. I would vote for XML's definition.
Use some other existing validator? Do we have to extract UTF8
multibyte characters first?
We would have to interpret the incoming bytes as UTF-8 and parse
them accordingly before validating the characters, yes. In fact, this
should be happening anyway. Remember that clients such as Subclipse and
TortoiseSVN are already /reading/ these property name values as UTF-8,
so the code that validates them should be interpreting them as UTF-8 as
well.
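
As a rough sketch of what "decode UTF-8, then validate each character" could look like, the following assumes the Name production from XML 1.0 (Fifth Edition): NameStartChar for the first character, NameChar for the rest. All function names here (decode_utf8, is_name_start_char, is_name_char, xml_prop_name_is_valid) are hypothetical and not taken from the Subversion codebase, which may well have its own UTF-8 routines that should be used instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one UTF-8 sequence from S (at most LEN bytes) into *CP.
   Returns the number of bytes consumed, or 0 on malformed input. */
static size_t decode_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
  size_t n, i;
  uint32_t c, min;

  if (len == 0) return 0;
  if (s[0] < 0x80) { *cp = s[0]; return 1; }
  else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; min = 0x80; }
  else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
  else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
  else return 0;

  if (len < n) return 0;
  for (i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
      c = (c << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
  if (c < min || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF) return 0;
  *cp = c;
  return n;
}

/* NameStartChar, per XML 1.0 (Fifth Edition). */
static bool is_name_start_char(uint32_t c)
{
  return c == ':' || c == '_'
      || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
      || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
      || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
      || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
      || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
      || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
      || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
}

/* NameChar: NameStartChar plus '-', '.', digits and a few extras. */
static bool is_name_char(uint32_t c)
{
  return is_name_start_char(c) || c == '-' || c == '.'
      || (c >= '0' && c <= '9') || c == 0xB7
      || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
}

/* Returns true if NAME is well-formed UTF-8 and matches the XML Name
   production.  Empty names are rejected. */
bool xml_prop_name_is_valid(const char *name)
{
  const unsigned char *s = (const unsigned char *)name;
  size_t len = strlen(name), used;
  uint32_t cp;
  bool first = true;

  if (len == 0) return false;
  while (len > 0)
    {
      used = decode_utf8(s, len, &cp);
      if (used == 0) return false;
      if (first ? !is_name_start_char(cp) : !is_name_char(cp)) return false;
      first = false;
      s += used;
      len -= used;
    }
  return true;
}
```

Under these assumptions a name such as "svn:ignore" still passes, a UTF-8 encoded non-ASCII name is accepted when its characters fall in the listed ranges, and malformed byte sequences are rejected before any character test is applied.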
I thought you were proposing to write the code?
I'm fine with that as well. Looks like I would have to add a few lines
to decode UTF-8 (surely such code already exists in the Subversion
codebase somewhere) and change a few if(...){} statements. I should be
able to handle that. I would imagine it will take more effort on my part
to get permission to change the code than to write the code itself.
Basically I'm proposing that we set
publicly what constitutes a valid Subversion name, and then make
whatever code changes are needed to conform. A test suite comes to
mind as a tool to assist in this, but that's another subject
altogether.
Subversion has a testsuite.
Either 1) the test suite does not cover property name validity, or 2)
the DAV+SVN client isn't run through the test suite, because the DAV+SVN
client doesn't comply with the property name validation present behind JavaHL.
Garret