On Sat, 2009-09-19 at 16:48 -0700, Thomas Lord wrote:

> Yes, but when you are building a string-like mutable
> type, appending and taking substrings, suddenly you 
> are renormalizing on every operation.

If you are appending and taking substrings, the codepoint level 
is one of several wrong choices to make about where to allow
string divisions, for exactly this reason.

What human beings think of as characters are represented in Unicode
by a base codepoint plus a nondefective sequence of combining 
modifiers and variation selectors, each of which is also a codepoint.
The sequence usually has length zero, but since you're talking about
renormalizing after divisions, you're already talking about cases 
where the sequence is nonempty. 

If you allow division of strings on codepoint boundaries which 
are not also character boundaries, you can "renormalize", but in 
this case the renormalization operation makes no semantic sense. 
You have created characters that were not there, you have 
made characters vanish that were there, you have changed characters 
into different characters, and so on.  These are not sensible 
operations; these are bugs.
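
As a small illustration of the failure described above, here is a
sketch in Python, whose str type happens to be indexed by codepoint.
The sample strings are my own choice for illustration and do not come
from any real program; the only library call used is the standard
unicodedata.normalize.

```python
import unicodedata

# "café" in decomposed form: the last character is
# e (U+0065) followed by COMBINING ACUTE ACCENT (U+0301).
original = "cafe\u0301"

# Python indexes str by codepoint, so this split lands mid-character.
head, tail = original[:4], original[4:]

# The substring has changed the character é into a plain e.
head_lost_accent = (head == "cafe")

# Appending the orphaned accent to unrelated text and renormalizing
# composes ó (U+00F3), a character present in neither operand.
joined = "o" + tail
renormalized = unicodedata.normalize("NFC", joined)
created_new_character = (renormalized == "\u00f3")
```

Both flags come out true: the division lost one character and the
join-plus-renormalize manufactured another.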

If you restrict string division to character boundaries, then 
you have no need to "renormalize" because by not dividing strings 
in mid-character or joining strings that start or end with partial
characters, you never create a denormalized string. Further, 
the characters on each side of the division are the same 
characters that were there in the undivided string, so the 
user does not experience this class of inconsistencies and 
bugs.
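
A minimal sketch of division restricted to character boundaries,
again in Python. It rests on a simplifying assumption: a character is
a base codepoint followed by any codepoints whose general category is
a mark (Mn, Mc, Me), which covers combining modifiers and variation
selectors. This is not the full Unicode grapheme cluster algorithm
(it ignores Hangul jamo, joiners, and so on), and the names
characters, char_length, and char_substring are hypothetical helpers,
not part of any library.

```python
import unicodedata

def characters(s):
    """Split s into base-plus-modifiers units (simplified rule)."""
    units = []
    for cp in s:
        # A mark attaches to the unit before it, if there is one.
        if units and unicodedata.category(cp).startswith("M"):
            units[-1] += cp
        else:
            units.append(cp)
    return units

def char_length(s):
    """Count characters rather than codepoints."""
    return len(characters(s))

def char_substring(s, start, end):
    """Slice s by character index, never splitting a unit."""
    return "".join(characters(s)[start:end])

s = "cafe\u0301s"                 # decomposed "cafés"
left = char_substring(s, 0, 4)    # keeps e and its accent together
right = char_substring(s, 4, 5)

# For this example, normalizing the pieces separately and joining them
# gives the same result as normalizing the undivided string.
nfc = lambda t: unicodedata.normalize("NFC", t)
pieces_agree = (nfc(left) + nfc(right) == nfc(s))
```

Here len(s) is 6 while char_length(s) is 5, and the é on the left of
the division is the same é that was in the undivided string.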

This is why I believe that the best semantics for string-length, 
indexes into strings, etc., is that they should count characters 
rather than codepoints.  And this is one of the things that I 
believed then, and still believe now, that R6RS got wrong.

                                Bear




_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
