Bruno Haible wrote on 2000-08-15 15:22 UTC:
> Werner Lemberg wrote:
> > It's not clear to me why groff should provide the correct -T
> > option.
>
> Because the iconv character set conversion might involve
> transliteration, e.g. U+2264 to "<=", which would disturb groff's nice
> justification of the right margin if it were done after groff.

My suggestion is that groff should offer a new -Twlocale option,
in which it formats a paragraph as wchar_t text and then writes
it out via wprintf() and friends. The C library will take care
of converting this to UTF-8, Latin-1, or ASCII, including any
transliteration. For each non-ASCII character in a paragraph,
groff should ask wcwidth() how many character cells wide the
character will be according to the locale. This should also take
care of transliteration, i.e. wcwidth(0x2264) == 2 if the locale
includes ASCII transliteration and putwchar(0x2264) therefore
outputs "<=".
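
A minimal sketch of the width query described above. The helper name
cell_width() is hypothetical and not part of groff. Note that wcwidth()
comes from X/Open (POSIX), not from ISO C itself, and that a width of 2
for U+2264 in a transliterating locale is what this proposal asks of
the C library, not something the sketch assumes already happens.

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth(), an X/Open function */
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: sum the cell widths of a wide string as reported
   by wcwidth() for the current locale. Returns -1 if any character has
   no defined width (for example a control character). */
int cell_width(const wchar_t *s)
{
    int total = 0;

    for (; *s != L'\0'; s++) {
        int w = wcwidth(*s);
        if (w < 0)
            return -1;      /* non-printable in this locale */
        total += w;
    }
    return total;
}
```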

All this should only be compiled in on systems where
__STDC_ISO_10646__ is defined; otherwise you have no guarantee
that wchar_t really always contains ISO 10646 code points. On
systems where __STDC_ISO_10646__ is not defined, -Twlocale is
simply not available.

No need to use iconv explicitly everywhere; it is possible to
do this in a more formally portable way with just ISO C99
facilities.
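
The compile-time guard could look roughly like this. It is only a
sketch: HAVE_TWLOCALE, twlocale_available() and emit_wlocale() are
made-up names for illustration, and the caller is assumed to have
called setlocale(LC_ALL, "") beforehand so that the C library knows
which encoding to convert to.

```c
#include <stdio.h>
#include <wchar.h>

/* __STDC_ISO_10646__ is defined by the implementation when wchar_t
   values are ISO 10646 code points. */
#ifdef __STDC_ISO_10646__
#define HAVE_TWLOCALE 1
#else
#define HAVE_TWLOCALE 0
#endif

/* Hypothetical query: is the -Twlocale path compiled in? */
int twlocale_available(void)
{
    return HAVE_TWLOCALE;
}

/* Hypothetical output routine: write a formatted paragraph and let the
   C library convert it to the locale's encoding. Returns 0 on success,
   -1 on error or when the feature is not compiled in. */
int emit_wlocale(FILE *out, const wchar_t *text)
{
#if HAVE_TWLOCALE
    return fputws(text, out) < 0 ? -1 : 0;
#else
    (void)out;
    (void)text;
    return -1;
#endif
}
```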

Markus
(writing from Denver, Colorado)
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>