Bruno Haible wrote on 2000-10-30 18:27 UTC:
> Markus Kuhn asked:
> > what is wrong with having fallback transliteration done by mbrtowc?
> You certainly mean wcrtomb.
Yes of course, sorry.
> What is wrong with it:
> a) it is confusing for programmers,
I disagree. Why would it be more confusing to programmers than a
miraculously failing printf()? In my eyes, "ü" -> "ue" is just as much a
valid and useful multibyte encoding as UTF-8. The C standard definitely
does not require multibyte encodings to be reversible, therefore you are
free to add additional tricks (such as transliteration) to the multibyte
encoding stage that play no role in multibyte decoding. I think that is
the conceptually cleanest and easiest-to-understand way to handle
transliteration in the existing framework.
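
A minimal sketch of the idea, assuming a toy ASCII-only target encoding.
The names toy_wcrtomb and translit_table are hypothetical and not part of
any C library; the point is only that the fallback lives inside the
encoding step, so callers see an ordinary (non-reversible) multibyte
encoding instead of a failing conversion.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical fallback table for a toy ASCII-only target encoding. */
struct translit { wchar_t wc; const char *bytes; };

static const struct translit translit_table[] = {
    { 0x00E4, "ae" },   /* a with diaeresis */
    { 0x00F6, "oe" },   /* o with diaeresis */
    { 0x00FC, "ue" },   /* u with diaeresis */
    { 0x00DF, "ss" },   /* sharp s */
};

/*
 * Toy stand-in for wcrtomb() (illustrative, not a libc function).
 * Writes the encoding of wc to s, which must have room for 2 bytes,
 * and returns the number of bytes written. Returns (size_t)-1 and sets
 * errno to EILSEQ if wc has neither an ASCII encoding nor a fallback.
 */
size_t toy_wcrtomb(char *s, wchar_t wc)
{
    size_t i;

    if ((unsigned long)wc < 0x80) {     /* plain ASCII encodes as itself */
        s[0] = (char)wc;
        return 1;
    }
    for (i = 0; i < sizeof translit_table / sizeof translit_table[0]; i++) {
        if (translit_table[i].wc == wc) {
            size_t n = strlen(translit_table[i].bytes);
            memcpy(s, translit_table[i].bytes, n);  /* emit the fallback */
            return n;
        }
    }
    errno = EILSEQ;                     /* no encoding and no fallback */
    return (size_t)-1;
}
```

Decoding the two bytes "ue" yields 'u' and 'e', not U+00FC, which is
exactly the non-reversibility described above.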
> b) it may produce more than MB_CUR_MAX bytes, thus crash applications,
No! A correct MB_CUR_MAX implementation must of course indicate the
maximum number of bytes that multibyte encoding (including
transliteration if applied) can create. Then there is no problem and
also no programmer confusion.
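
As a rough illustration, the value such a locale would have to report
can be derived from its fallback data. Everything here is an assumption
made for the example: BASE_ENCODING_MAX, fallback_strings and
toy_mb_cur_max are hypothetical names, and the target charset is taken
to be plain ASCII.

```c
#include <stddef.h>
#include <string.h>

/* Assumed: the base encoding is ASCII, so one byte per character. */
#define BASE_ENCODING_MAX 1

/* Hypothetical replacement strings the locale may substitute. */
static const char *const fallback_strings[] = {
    "ae", "oe", "ue", "ss", "EUR"
};

/*
 * Value a transliterating locale would have to report as MB_CUR_MAX:
 * the longest byte sequence the encoder can emit for a single wide
 * character, fallbacks included.
 */
size_t toy_mb_cur_max(void)
{
    size_t max = BASE_ENCODING_MAX;
    size_t i;

    for (i = 0; i < sizeof fallback_strings / sizeof fallback_strings[0]; i++) {
        size_t n = strlen(fallback_strings[i]);
        if (n > max)
            max = n;
    }
    return max;
}
```

A caller that sizes its buffer from this value never overflows, whether
or not a fallback was applied.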
> c) it causes problems with wcwidth, e.g. wcwidth(<a">) = 1 but
> should be 2 if transliterated.
No! A correct wcwidth() implementation must of course indicate the
number of character cells that the transliteration result consumes. Then
there is no problem and no programmer confusion.
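
A sketch of what that could look like, again assuming a toy ASCII-only
target where every replacement character occupies one cell. The names
toy_wcwidth and width_table are hypothetical, not a real libc interface.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical fallback table; replacements are printable ASCII only. */
struct width_fallback { wchar_t wc; const char *bytes; };

static const struct width_fallback width_table[] = {
    { 0x00E4, "ae" },   /* a with diaeresis */
    { 0x00F6, "oe" },   /* o with diaeresis */
    { 0x00FC, "ue" },   /* u with diaeresis */
    { 0x00DF, "ss" },   /* sharp s */
};

/*
 * Toy stand-in for wcwidth(): reports the number of character cells
 * that the encoded form of wc will occupy, so a transliterated
 * character counts as the width of its replacement. Returns -1 for
 * characters this toy locale cannot print.
 */
int toy_wcwidth(wchar_t wc)
{
    size_t i;

    if (wc >= 0x20 && wc <= 0x7E)       /* printable ASCII: one cell */
        return 1;
    for (i = 0; i < sizeof width_table / sizeof width_table[0]; i++) {
        if (width_table[i].wc == wc)
            return (int)strlen(width_table[i].bytes);
    }
    return -1;
}
```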
It should be obvious that b) and c) follow quite naturally once you
start to treat transliteration just as an extension of multi-byte
encoding, and *NOT* as some dark magic invisible non-standard
post-processing that is performed on some of the output streams and
secretly done outside the scope of the C standard. I personally find the
latter transliteration paradigm far less useful, and far more dangerous
and confusing for the programmer, than (as I propose) doing it within
C's multibyte framework. My proposal gives the programmer far more
control and at the same time requires far less special code to be added
to applications.
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/