Egmont Koblinger <[EMAIL PROTECTED]> writes:

> I guess tr should support multibyte character sets, even if not by default,
> then by providing a command line option.
That'd be nice. It's a bit tricky, though. Doing it right would require tr to handle encoding errors (stray byte sequences that cannot be parsed as parts of multibyte characters). For example, one should be able to easily remove the encoding errors without making any other changes, or to transliterate to upper case while preserving the encoding errors. Help in this area would be appreciated.

The POSIX spec for tr <http://www.opengroup.org/onlinepubs/009695399/utilities/tr.html> talks about this issue somewhat, but it's incoherent: I can't make heads or tails of what the -C option is really supposed to do.

> If I'm wrong and the current behavior is the desired one, then please replace
> all occurrences of "character" with "byte" in its manual.

The CVS version of the coreutils manual talks about this, saying "Currently @command{tr} fully supports only single-byte characters. Eventually it will support multibyte characters; ..." and goes into more detail about the problem.

_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
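To make the two requirements in the reply concrete (deleting encoding errors without any other change, and upper-casing while leaving encoding errors intact), here is a minimal Python sketch. It assumes UTF-8 is the multibyte encoding and relies on Python's `surrogateescape` error handler, which maps each undecodable byte to a lone surrogate so the byte survives a decode/encode round trip. The function names are hypothetical; this only illustrates the desired behavior and is not how coreutils tr is, or would be, implemented.

```python
def strip_encoding_errors(data: bytes) -> bytes:
    """Drop byte sequences that are not valid UTF-8; keep everything else unchanged."""
    # errors="ignore" discards undecodable bytes; re-encoding valid text is lossless.
    return data.decode("utf-8", errors="ignore").encode("utf-8")


def tr_upper_preserving_errors(data: bytes) -> bytes:
    """Upper-case valid UTF-8 characters while passing stray bytes through untouched."""
    # Assumption: UTF-8 input. Each invalid byte 0x80-0xFF becomes a lone
    # surrogate U+DC80..U+DCFF, which has no case mapping.
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        up = ch.upper()
        # tr maps one character to one character, so keep ch when
        # upper-casing would expand it (e.g. "ß" -> "SS").
        out.append(up if len(up) == 1 else ch)
    # surrogateescape turns the lone surrogates back into the original bytes.
    return "".join(out).encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    sample = b"caf\xc3\xa9 \xff ok"  # valid "café", a stray 0xFF byte, then " ok"
    print(tr_upper_preserving_errors(sample))
    print(strip_encoding_errors(sample))
```

The one-character-to-one-character rule is there because tr's model maps each input character to a single output character; a plain `str.upper()` over the whole string would turn "ß" into "SS".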
