On 23 Sep 2009, at 3:28 pm, Arthur A. Gleckler wrote:

>> You know, it's a shame that we - *PLT nerds*, not *typographers* - are
>> having to sit and debate how computers should model text. I feel it's
>> a failing of the Unicode community that they've just defined a list of
>> codepoints and some algorithms, and not even suggested any programming
>> models, leaving each programming language (that has strings as base
>> types, some do not) to have to work out their own idea of what a
>> "character" is...
>
> This probably isn't exactly what you're looking for, but it's at least
> in the right direction:
>
>  <http://www.unicode.org/faq/specifications.html>
>
>  Q: The Unicode Standard and related standards contain a number of
>  specifications or guidelines for dealing with different programming
>  tasks. Sometimes it's hard to find these. Is there a central place to
>  look?

Interesting...

http://www.unicode.org/reports/tr29/ looks pertinent. It explicitly
gives algorithms for finding the boundaries between user-perceived
characters.

*rummage* It looks like it suggests that grapheme clusters be the
default notion of string length for users (and graphemes the default
notion of what a string is composed of). The codepoint level is a
little more advanced, and is most likely useful in the sense that each
diacritic codepoint can be considered a modifier applied to the
previous character. It might therefore be exposed to the user as a
character modification that can be "undone" by removing it.
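
To make that concrete, here is a rough sketch in Python (not Scheme,
purely because its standard library ships the Unicode character
database). It is only my illustration, not the TR29 algorithm: it
assumes a cluster is a base codepoint followed by any combining marks,
which ignores Hangul jamo, CR LF and the other TR29 rules. The names
clusters and undo_last_modifier are made up for the example.

```python
import unicodedata

def is_mark(ch):
    # General categories Mn, Mc and Me are the combining marks.
    return unicodedata.category(ch).startswith("M")

def clusters(text):
    """Approximate grapheme clusters: a base codepoint plus trailing marks.

    Simplifying assumption: this is NOT the full TR29 boundary algorithm.
    """
    out = []
    for ch in text:
        if out and is_mark(ch):
            out[-1] += ch      # attach the modifier to the previous character
        else:
            out.append(ch)     # start a new cluster
    return out

def undo_last_modifier(text):
    """Remove a trailing combining mark, if the text ends with one."""
    if text and is_mark(text[-1]):
        return text[:-1]
    return text

s = "e\u0301a"   # 'e', COMBINING ACUTE ACCENT, 'a'
print(len(s))                             # length in codepoints
print(len(clusters(s)))                   # length in (approximate) clusters
print(undo_last_modifier("e\u0301") == "e")
```

The point is just that the user-level length and the codepoint-level
length differ for the same string, and that "undoing" a diacritic is a
codepoint-level operation on the last cluster.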

Food for thought!

ABS

--
Alaric Snell-Pym
Work: http://www.snell-systems.co.uk/
Play: http://www.snell-pym.org.uk/alaric/
Blog: http://www.snell-pym.org.uk/archives/author/alaric/




_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
