On 23 Sep 2009, at 3:28 pm, Arthur A. Gleckler wrote:

>> You know, it's a shame that we - *PLT nerds*, not *typographers* - are
>> having to sit and debate how computers should model text. I feel it's
>> a failing of the Unicode community that they've just defined a list of
>> codepoints and some algorithms, and not even suggested any programming
>> models, leaving each programming language (that has strings as base
>> types, some do not) to have to work out their own idea of what a
>> "character" is...
>
> This probably isn't exactly what you're looking for, but it's at least
> in the right direction:
>
> <http://www.unicode.org/faq/specifications.html>
>
> Q: The Unicode Standard and related standards contain a number of
> specifications or guidelines for dealing with different programming
> tasks. Sometimes it's hard to find these. Is there a central place to
> look?

Interesting... http://www.unicode.org/reports/tr29/ looks pertinent. It
refers explicitly to algorithms for finding character boundaries.

*rummage*

It looks like it suggests that grapheme clusters be the default notion
of string length for users (and graphemes the default notion of what a
string is composed of), with the codepoint level being a little more
advanced. The codepoint level is most likely useful in the sense that
each diacritic codepoint can be considered a modifier applied to the
previous character, so it might well be exposed to the user as a
character modification that can be "undone" by removing it.

Food for thought!

ABS

--
Alaric Snell-Pym
Work: http://www.snell-systems.co.uk/
Play: http://www.snell-pym.org.uk/alaric/
Blog: http://www.snell-pym.org.uk/archives/author/alaric/

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
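
To make the grapheme-versus-codepoint distinction above concrete, here is a rough Python sketch. It is only an approximation, not the TR29 algorithm: it assumes a "cluster" is a base codepoint plus any combining marks that follow it, and it ignores Hangul syllables, emoji sequences, CR LF and the other TR29 rules. The names `simple_clusters` and `undo_last_modifier` are made up for illustration.

```python
import unicodedata


def simple_clusters(text):
    """Group each base codepoint with the combining marks that follow it.

    Simplified approximation only, NOT the full TR29 boundary rules.
    """
    clusters = []
    for ch in text:
        # A non-zero canonical combining class marks a combining codepoint,
        # which is attached to the cluster currently being built.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def undo_last_modifier(cluster):
    """Drop the final combining mark from a cluster, if it has one."""
    if len(cluster) > 1 and unicodedata.combining(cluster[-1]) != 0:
        return cluster[:-1]
    return cluster


# "cafe" followed by U+0301 COMBINING ACUTE ACCENT
word = "cafe\u0301"
codepoint_length = len(word)
cluster_length = len(simple_clusters(word))
```

Here `codepoint_length` counts the accent as its own element, while `cluster_length` counts what a user would likely call characters; `undo_last_modifier` is the "remove the modifier" operation applied to a single cluster.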
