Re: [r6rs-discuss] Proposed NON-features for small Scheme, part 8: string-set! must die

John Cowan Wed, 23 Sep 2009 09:16:25 -0700

Ray Dillinger scripsit:

> 40 of which don't count because they're not part of the repertoire of 
> normalized characters,


That is, normalizations which remove compatibility characters (NFKC
and NFKD).  There exist good reasons to keep compatibility characters,
though, in which case this characterization is inaccurate.
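
As a rough illustration of what "removing compatibility characters"
means in practice, here is a minimal sketch in Python (not Scheme; it
is used only because its standard unicodedata module exposes the
normalization forms directly).  The helper name compat_folded is made
up for this example.

```python
import unicodedata

def compat_folded(s: str) -> str:
    """Apply NFKC, which replaces compatibility characters by their
    compatibility decompositions and then recomposes canonically."""
    return unicodedata.normalize("NFKC", s)

ligature = "\ufb01"   # LATIN SMALL LIGATURE FI, a compatibility character
circled = "\u2460"    # CIRCLED DIGIT ONE, also a compatibility character

# NFKC folds both away, so the distinction is lost after normalization.
folded_ligature = compat_folded(ligature)   # "fi"
folded_circled = compat_folded(circled)     # "1"

# NFC, a canonical form, leaves compatibility characters alone.
kept_ligature = unicodedata.normalize("NFC", ligature)
```

A program that wants to keep such characters would normalize to NFC or
NFD instead, and then they remain part of its repertoire.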

> and 88 of which are single characters that 
> change under casing operations to single characters, confusing only
> those who have already confused character lengths with codepoint
> lengths. 

"Character" is a vague term; it has five definitions in the Unicode
glossary.  You are identifying characters with DCGs, which are sensible
for some languages and purposes but misfire for others.  Tamil users
think of their abugida as a syllabary, and DCGs work well for them; Hindi
users think of their closely related abugida as either an alphabet or a
set of consonant clusters with vowel marks, depending on the ligaturing
behavior they are most familiar with.  Likewise, in Swedish ä and
ö are as distinct from a and o as i from j or G from C; in German,
the umlauted letters are mere variants of their normal counterparts.
Furthermore, Spanish é is just an e that bears word stress, whereas in
French é, è, and e are three separate entities.

The one true answer is that there is no one true answer.  Codepoints are
the irreducible minimum level: when you go down to code units or octets,
you lose too much semantic import and are in the realm of encodings of
Unicode rather than Unicode itself.  Above that there are many ways to
segment strings, some language-specific, some not.  I don't see much
point in privileging one over another.
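
To make the three levels concrete, here is a small Python sketch that
counts codepoints, UTF-16 code units, and UTF-8 octets for the same
text.  The function name sizes and the sample strings are mine, chosen
only for illustration.

```python
import unicodedata

def sizes(s: str) -> dict:
    """Length of s at three levels: codepoints, UTF-16 code units,
    and UTF-8 octets."""
    return {
        "codepoints": len(s),  # Python strings are codepoint sequences
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        "utf8_octets": len(s.encode("utf-8")),
    }

composed = "\u00e9"      # e-acute as one precomposed codepoint
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
g_clef = "\U0001d11e"    # MUSICAL SYMBOL G CLEF, outside the BMP

composed_sizes = sizes(composed)      # 1 codepoint, 1 unit, 2 octets
decomposed_sizes = sizes(decomposed)  # 2 codepoints, 2 units, 3 octets
g_clef_sizes = sizes(g_clef)          # 1 codepoint, 2 units, 4 octets

# The two spellings of e-acute are canonically equivalent.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

The code-unit and octet counts depend on the encoding chosen; only the
codepoint count is a property of the Unicode text itself.  Whether the
decomposed form counts as one "character" or two is exactly the
segmentation question, and nothing at this level settles it.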

What's more, using DCGs means that strings are a denumerably infinite
domain of finite sequences over another denumerably infinite domain, DCGs.
Some might think that one denumerably infinite domain was sufficient.

--
My corporate data's a mess!                     John Cowan
It's all semi-structured, no less.              http://www.ccil.org/~cowan
    But I'll be carefree                        [email protected]
    Using XSLT
On an XML DBMS.

