On characters and string case mappings

On Wed, Aug 3, 2011 at 7:19 AM, Denis Washington <[email protected]> wrote:
> The remarks about Turkic casing pairs not being used for char-upcase / char-downcase are somewhat disturbing.

Or, how about just saying that "these procedures use language-insensitive case mappings as defined in Unicode"?

The Turkish special mappings are defined as language-sensitive mappings in SpecialCasing.txt of the Unicode Character Database. There is another language-sensitive mapping, for Lithuanian, which I assume won't be used either. The Lithuanian mapping isn't a simple mapping, so it can only affect string procedures such as string-upcase; yet the description of the string case conversion routines mentions only Turkish. By saying that language-sensitive mappings are excluded, we can cover both. (R6RS says "locale-independent mapping".)

The string case conversion text also mentions the context sensitivity of Greek sigma: a small final sigma needs to be used when it is at the end of a word. However, there is no definition of "word", which can lead to inconsistent behavior among implementations. We can refer to UAX #29, as R6RS does.

(To illustrate the latter issue: R6RS shows two examples for sigma:

  (string-downcase "ΧΑΟΣΣ") => "χαοσς"
  (string-downcase "ΧΑΟΣ Σ") => "χαος σ"

Here are some other cases which are somewhat non-obvious:

  (string-downcase "ΧΑΟΣ.Σ") => "χαοσ.ς"
  (string-downcase "ΧΑΟΣ. Σ") => "χαος. σ"

Naive word segmentation may miss the "χαοσ.ς" case.)
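To make the sigma issue concrete, here is a rough Python sketch (Python rather than Scheme only so that it does not depend on any particular Scheme implementation's Unicode support). It is not a UAX #29 word segmenter. Instead it uses a condition modelled on the Final_Sigma context from the Unicode Standard's casing rules: a capital sigma becomes a final sigma when a cased letter precedes it and no cased letter follows it, skipping case-ignorable characters on both sides. It is compared with a naive "not followed by a letter" test. All function names are mine, and the small CASE_IGNORABLE_PUNCT set and the category-based is_cased test are simplified stand-ins for the real Unicode properties, so treat this as an illustration under those assumptions rather than a reference implementation.

```python
import unicodedata

CAPITAL_SIGMA = "\u03a3"  # Σ
SMALL_SIGMA = "\u03c3"    # σ
FINAL_SIGMA = "\u03c2"    # ς

# Assumption: a tiny stand-in for the Unicode Case_Ignorable property.
# The real property is derived from the Unicode Character Database and
# covers many more characters than this.
CASE_IGNORABLE_PUNCT = {".", "'", ":", "\u00b7", "\u2019"}
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable(ch):
    return (ch in CASE_IGNORABLE_PUNCT
            or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES)


def is_cased(ch):
    # Approximation of the Cased property: upper, lower, or titlecase letter.
    return unicodedata.category(ch) in {"Lu", "Ll", "Lt"}


def is_final_sigma(s, i):
    """Final_Sigma-style test for the capital sigma at index i."""
    # Look backwards past case-ignorable characters for a cased letter.
    j = i - 1
    while j >= 0 and is_case_ignorable(s[j]):
        j -= 1
    if j < 0 or not is_cased(s[j]):
        return False
    # Look forwards past case-ignorable characters; no cased letter may follow.
    k = i + 1
    while k < len(s) and is_case_ignorable(s[k]):
        k += 1
    return k == len(s) or not is_cased(s[k])


def naive_is_final_sigma(s, i):
    """Naive test: preceded by a letter and not directly followed by one."""
    prev_is_letter = i > 0 and s[i - 1].isalpha()
    next_is_letter = i + 1 < len(s) and s[i + 1].isalpha()
    return prev_is_letter and not next_is_letter


def downcase(s, final_test):
    """Lowercase s, choosing between σ and ς with the given test."""
    out = []
    for i, ch in enumerate(s):
        if ch == CAPITAL_SIGMA:
            out.append(FINAL_SIGMA if final_test(s, i) else SMALL_SIGMA)
        else:
            out.append(ch.lower())
    return "".join(out)


# The four examples from this message, with the results given above.
EXPECTED = {
    "ΧΑΟΣΣ": "χαοσς",
    "ΧΑΟΣ Σ": "χαος σ",
    "ΧΑΟΣ.Σ": "χαοσ.ς",
    "ΧΑΟΣ. Σ": "χαος. σ",
}

for source, expected in EXPECTED.items():
    careful = downcase(source, is_final_sigma)
    naive = downcase(source, naive_is_final_sigma)
    print(source, "=>", careful, "| naive:", naive, "| expected:", expected)
```

With these definitions the context-based test reproduces all four results, while the naive test gets the "ΧΑΟΣ.Σ" case wrong because it treats the period as a word boundary. That is the kind of divergence between implementations that an explicit reference in the report would rule out.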
