Thomas Lord wrote:
> My question is whether any principled reason for these arbitrary
> constants is given that might be supported without appeal
> to analogies to other programming languages.

Consider what happens if Unicode surrogate values are considered
valid characters. That implies they can be stored in a string,
which is basically a character array.
Then the question arises as to what it means to index into a
string: Is it the N'th code point or the N'th scalar value?
The draft specifies that it's the N'th scalar value - which
means any use of surrogates must be hidden.
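
To make the distinction concrete, here is a rough sketch in Python (used only for illustration; `utf16_units` and `scalar_at` are hypothetical helpers, not anything from the draft). It assumes a string stored as UTF-16 code units and shows that the N'th code unit and the N'th scalar value can differ:

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def scalar_at(units, n):
    """Return the n'th scalar value, treating a surrogate pair as one character."""
    i = 0
    count = 0
    while i < len(units):
        u = units[i]
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            # Combine the high and low surrogate into one scalar value.
            value = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            value = u
            step = 1
        if count == n:
            return value
        count += 1
        i += step
    raise IndexError(n)


# "a", U+1D11E (outside the BMP), "b"
units = utf16_units("a\U0001D11Eb")

code_unit_1 = units[1]          # 0xD834, a high surrogate
scalar_1 = scalar_at(units, 1)  # 0x1D11E, the whole character
scalar_2 = scalar_at(units, 2)  # 0x62, i.e. "b"
```

Indexing by scalar value has to skip over pairs, which is what "hiding" the surrogates amounts to in a UTF-16 implementation.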
If you allow Unicode surrogate values as actual character
values then you effectively prohibit an implementation
from storing characters internally using UTF-16, since you
can't tell whether a surrogate pair is one Scheme character
or two. UTF-16 is the natural representation in Java, at least.
(I think that might be the encoding used by the Windows APIs as well.)
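
A small sketch of the ambiguity, again in Python purely for illustration. Python strings happen to allow lone surrogates as characters, so they can stand in for a Scheme that permits them; the variable names are made up for the example. A one-character string and a two-character string produce identical UTF-16 code units:

```python
# One character: U+1D11E, which UTF-16 encodes as a surrogate pair.
one_char = "\U0001D11E"

# Two characters: the surrogate values themselves treated as characters.
two_chars = "\ud834\udd1e"

# "surrogatepass" lets the codec write lone surrogates as plain code units.
as_one = one_char.encode("utf-16-be")
as_two = two_chars.encode("utf-16-be", "surrogatepass")

same_bytes = as_one == as_two  # True: both are D8 34 DD 1E

# Reading the code units back cannot recover which string was meant;
# the decoder sees a valid pair and returns a single character.
decoded = as_two.decode("utf-16-be")
```

Given only the stored code units, an implementation has no way to answer whether the string length is 1 or 2.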
--
--Per Bothner
[EMAIL PROTECTED] http://per.bothner.com/