On Sat, Feb 21, 2009 at 3:24 PM, Eli Barzilay <[email protected]> wrote:
>> Sure, if you want. Or you can use various forms of normalization,
>> some of which are standardized by Unicode and some not, to throw
>> away any unwanted distinctions. For example, if you are analyzing
>> Chinese text, you may want to throw away the difference between
>> Simplified and Traditional characters -- not that it's trivial to do
>> so.
>
> Exactly my point -- these various forms of normalizations are more
> fragile, and the selection of the normalizations you'd want to have is
> also less obvious, and they're all things that are inherently
> cultural. So, as a hacker, I find it much easier to just ignore it
> all and look at the bits instead. (It's convenient to have Unicode as
> a very difficult piece of work that I didn't have to deal with...)
And, for some people, I suspect that normalizing Traditional into
Simplified would be a political no-no (and the other way around would
probably make your text hard to read for lots of people).

Robby
_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
