2007/2/11, Chris Burke <[EMAIL PROTECTED]>:
June Kim wrote:
[snip]
> Second, the box is broken with different width characters(that is,
> when the length of bytes of the encoding, and the width of the
> characters on display don't match). What is the usual way of solving
> it in other programming languages? There is a unicode standard for
> character widths. http://unicode.org/reports/tr11/
>
> Python implements that standard(along with others) in unicodedata module.
>
> >>> unicodedata.east_asian_width(u'한')
> 'W'
> >>> unicodedata.east_asian_width(u'a')
> 'Na'
>
> (u specifies the following string is unicode. east_asian_width returns
> the width of the character, not only for east asian characters but all
> unicode characters; it's got a narrow name due to its history)
>
[snip]

If you are having problems with display, it is because of the font, not
because we are not using unicode.
[snip]

When a boxed string contains characters whose display width differs
from their byte length, the box is broken in J. It is not because of
the font. It is because J assumes that every character's display
width equals its byte length, which is obviously false in many
writing and encoding systems, including East Asian ones. We can
definitely say J's box display isn't internationalized yet.

For example, 54620 (as a Unicode code point) is a Korean character,
pronounced "han". Its width is "Wide" (twice as wide as a Latin
letter).

  han=.4 u: 54620
  <han
+---+
|한|
+---+
  <8 u: han
+---+
|한|
+---+

Since J uses the byte length to determine a character's width, and
the byte length of han is 3 in UTF-8 ( 3-: #8 u: han ), the box's
horizontal character '-' (whose width is "Narrow") is printed three
times, and the box is broken on the display.
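
For comparison, here is a rough sketch in Python of sizing the box by
display width instead of byte length. It uses the
unicodedata.east_asian_width function mentioned above; the helper names
display_width and box are made up for this illustration, and it is
written for Python 3 (so no u prefix). It assumes a monospaced display
where "W" and "F" characters take two columns and everything else takes
one, so combining characters and "Ambiguous" widths are not handled.

```python
import unicodedata


def display_width(text):
    """Count display columns: 2 for Wide/Fullwidth characters, 1 otherwise.

    Simplifying assumption: combining marks and 'Ambiguous' characters
    are counted as one column each.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def box(text):
    """Draw a one-line box whose edges match the display width of text."""
    edge = "+" + "-" * display_width(text) + "+"
    return "\n".join([edge, "|" + text + "|", edge])


han = chr(54620)                  # the Korean character from the example
print(len(han.encode("utf-8")))   # byte length in UTF-8
print(display_width(han))         # columns on a monospaced display
print(box(han))
```

With this approach the edge for han gets two '-' characters rather than
three, so the corners line up with the vertical bars.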
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
