Egmont Koblinger <[EMAIL PROTECTED]> writes:

> I guess tr should support multibyte character sets, even if not by default,
> then by providing a command line option.

That'd be nice.  It's a bit tricky, though.  Doing it right would
require that tr support encoding errors (stray byte sequences that
cannot be parsed as parts of multibyte characters).  For example, one
should easily be able to remove the encoding errors without making any
other changes, or to transliterate to upper-case while preserving
encoding errors.  Help in this area would be appreciated.
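
To make those two requirements concrete, here is a rough sketch in Python
rather than in tr's own C.  It is not a proposal for how tr should be
implemented, and it assumes the input is meant to be UTF-8.  The function
names are made up for illustration.  It relies on Python's "surrogateescape"
error handler, which maps each undecodable byte to a lone surrogate in
U+DC80..U+DCFF so that the byte can be written back unchanged.

```python
def _is_escaped_byte(ch: str) -> bool:
    # surrogateescape represents an undecodable byte 0xNN as U+DC00 + 0xNN,
    # so every encoding error lands in the range U+DC80..U+DCFF.
    return "\udc80" <= ch <= "\udcff"


def strip_encoding_errors(data: bytes) -> bytes:
    """Remove stray bytes; leave every valid character untouched."""
    text = data.decode("utf-8", errors="surrogateescape")
    kept = "".join(ch for ch in text if not _is_escaped_byte(ch))
    return kept.encode("utf-8")


def upper_preserving_errors(data: bytes) -> bytes:
    """Upper-case valid characters; pass stray bytes through unchanged."""
    text = data.decode("utf-8", errors="surrogateescape")
    # Lone surrogates have no case mapping, so upper() leaves them alone.
    return text.upper().encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    # "café", a stray 0xFF byte, " ok ", then a truncated two-byte sequence.
    sample = b"caf\xc3\xa9 \xff ok \xc3"
    print(strip_encoding_errors(sample))
    print(upper_preserving_errors(sample))
```

Note that str.upper() can change the number of characters (for example,
"ß" becomes "SS"), which a real tr could not do since it maps one character
to one character.  The sketch ignores that.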

The POSIX spec for tr
<http://www.opengroup.org/onlinepubs/009695399/utilities/tr.html>
talks about this issue somewhat, but it's incoherent -- I can't make
heads or tails of what the -C option is really supposed to do.

> If I'm wrong and the current behavior is the desired one then please replace
> all occurrences of "character" with "byte" in its manual.

The CVS version of the coreutils manual talks about this, saying
"Currently @command{tr} fully supports only single-byte characters.
Eventually it will support multibyte characters; ..." with some more
details about the problem.


_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
