Re: wcwidth

Markus Kuhn Wed, 27 Sep 2000 09:30:43 -0700
Marcin 'Qrczak' Kowalczyk wrote on 2000-09-27 15:33 UTC:
> Wed, 27 Sep 2000 15:09:55 +0100, Markus Kuhn <[EMAIL PROTECTED]> wrote:
> 
> > The last version of glibc that I tested (2.1.93+) had iswprint()
> > = 0 for every combining character and, as a directly hardwired
> > consequence, also wcwidth() = -1 for every combining character.
> 
> This makes me think that I should always use a private implementation
> of wcwidth for Haskell, instead of relying on the one in the C library,
> as I do with character class predicates and toupper/tolower.
> 
> Is the width of Unicode characters a property defined more by Unicode
> itself, or by the OS?

The purpose of wcwidth() is to predict how many character cells on a
monospaced output device (video terminal, line printer, etc.) a
character will consume. On POSIX systems, I think wcwidth() should be
defined by the current locale, because the locale is already used to
define the character coding of these simple output devices.
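As a rough illustration of relying on the locale-based function, here is
a minimal sketch that sums wcwidth() over a wide string. The helper name
string_columns() is hypothetical and not part of any library; POSIX
already offers wcswidth() for much the same job. The sketch assumes a
POSIX system where defining _XOPEN_SOURCE makes wcwidth() visible.

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() in <wchar.h> on POSIX systems */
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of character cells that the wide string s
 * is predicted to occupy under the current locale.
 * Returns -1 as soon as wcwidth() reports a character as non-printable. */
static int string_columns(const wchar_t *s)
{
    int total = 0;

    assert(s != NULL);
    for (; *s != L'\0'; s++) {
        int w = wcwidth(*s);    /* -1: not printable in this locale */
        if (w < 0)
            return -1;
        total += w;
    }
    return total;
}
```

An application would normally call setlocale(LC_CTYPE, "") once at
startup, so that wcwidth() consults the user's locale rather than the
default "C" locale.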

Unicode assigns a width property to every character, and I used this
EastAsianWidth table and the combining character properties to derive

  http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

with a very simple algorithm (documented in the comments). I recommend
using my hardwired wcwidth() only on systems where the C library
does not provide an adequate locale-based one. Otherwise use the one
from your C library, which gives your end user the ability to configure
it (via localedef) herself, and which will hopefully in the end make it
more likely that all your applications (including the terminal
emulator!) use the same single wcwidth() definition.
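To give an idea of what a table-driven fallback of this kind can look
like, here is a minimal sketch. It is not the contents of wcwidth.c: the
name sketch_wcwidth() is hypothetical, and the two tables are tiny
illustrative subsets (one combining block, three East Asian wide
ranges), far from complete. It only assumes the general approach of
looking a code point up in sorted range tables.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive range of code points. */
struct interval {
    unsigned long first;
    unsigned long last;
};

/* ILLUSTRATIVE SUBSET ONLY: a real table lists every combining character. */
static const struct interval combining_sample[] = {
    { 0x0300, 0x036F },     /* Combining Diacritical Marks block */
};

/* ILLUSTRATIVE SUBSET ONLY: a few East Asian wide ranges. */
static const struct interval wide_sample[] = {
    { 0x1100, 0x115F },     /* Hangul Jamo initial consonants */
    { 0x4E00, 0x9FFF },     /* CJK Unified Ideographs block */
    { 0xAC00, 0xD7A3 },     /* Hangul Syllables */
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Binary search in a sorted table of non-overlapping intervals. */
static int in_table(unsigned long ucs, const struct interval *table, size_t n)
{
    size_t lo = 0, hi = n;

    assert(table != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ucs > table[mid].last)
            lo = mid + 1;
        else if (ucs < table[mid].first)
            hi = mid;
        else
            return 1;
    }
    return 0;
}

/* Hypothetical fallback: 0 for NUL and combining characters, -1 for
 * C0/C1 control characters, 2 for wide characters, 1 for the rest. */
static int sketch_wcwidth(unsigned long ucs)
{
    if (ucs == 0)
        return 0;
    if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0))
        return -1;
    if (in_table(ucs, combining_sample, COUNT(combining_sample)))
        return 0;
    if (in_table(ucs, wide_sample, COUNT(wide_sample)))
        return 2;
    return 1;
}
```

With these sample tables, U+0301 (a combining accent) gets width 0,
U+4E2D (a CJK ideograph) gets width 2, and EM DASH (U+2014) gets
width 1.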

I can think of many ways to make my wcwidth() definition neater (for
example, making EM DASH a wide character would have been nice), but I
resisted such temptations, because giving in would have started a
never-ending round of fiddling with the wcwidth table, and we really do
not want a large variety of different wcwidth() definitions in circulation
(it would just negatively affect remote terminal emulation
interoperability, where people are less likely to care about the exact
locale configuration).

> OTOH many slots will probably never be assigned at all. It would be
> strange to say that a nonexistent character is printable.

Why not? Xterm does print a default character for it after all, so
*something* will be printed.

Why not add a separate function isassigned() that tests whether the
implementation knows about the definition of the character? Don't try to
overload functions with too many different application purposes at once.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/