2007/2/11, Chris Burke <[EMAIL PROTECTED]>:
June Kim wrote:
[snip]
> Second, the box is broken with different width characters (that is,
> when the length of bytes of the encoding, and the width of the
> characters on display don't match). What is the usual way of solving
> it in other programming languages? There is a unicode standard for
> character widths. http://unicode.org/reports/tr11/
>
> Python implements that standard (along with others) in the unicodedata module.
>
>>>> unicodedata.east_asian_width(u'한')
> 'W'
>>>> unicodedata.east_asian_width(u'a')
> 'Na'
>
> (u specifies the following string is unicode. east_asian_width returns
> the width of the character, not only for east asian characters but all
> unicode characters; it's got a narrow name due to its history)
>
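A minimal sketch (Python 3, so the u'' prefix is not needed) of turning the east_asian_width classes into a column count and comparing it with the UTF-8 byte length. The helper name display_width is hypothetical, and counting 'W' and 'F' as two columns and every other class, including Ambiguous 'A', as one column is an assumption; combining marks and control characters are not treated specially.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate on-screen columns for text using UAX #11 East Asian Width.

    Assumption: Wide ('W') and Fullwidth ('F') characters take two columns;
    all other classes ('Na', 'H', 'N', 'A') take one. Treating Ambiguous
    ('A') as narrow is a locale-dependent choice.
    """
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


for s in ["a", "한", "han 한"]:
    # Byte length and display width agree for ASCII but diverge for Hangul.
    print(repr(s), "bytes:", len(s.encode("utf-8")), "width:", display_width(s))
```

For '한' this reports 3 bytes but 2 columns, which is exactly the mismatch described above.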
[snip]
If you are having problems with display, it is because of the font, not because we are not using unicode.
[snip]

When a string is boxed and it contains characters whose display width
differs from their byte length, the box is broken in J. It is not because
of the font. It is because J assumes that every character's display width
is the same as its byte length, which is plainly false in many
writing/encoding systems, including the East Asian ones. We can safely
say J's box display isn't internationalized yet.

For example, 54620 (as a Unicode code point) is a Korean character,
pronounced "han". Its width is "Wide" (twice as wide as a Latin letter).

   han=.4 u: 54620
   <han
+---+
|한|
+---+
   <8 u: han
+---+
|한|
+---+

Since J uses the byte length to determine a character's width, and the
byte length of han is 3 in UTF-8 ( 3-: #8 u: han ), the box's horizontal
character '-' (whose width is "Narrow") is printed three times, and on
screen the box is broken.
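To illustrate outside J, here is a rough Python sketch of drawing a +---+ box whose border length comes from a pluggable measure. Sizing the border by UTF-8 byte length reproduces the three-dash border in the J session above, while sizing it by East Asian Width gives two dashes, which lines up with one wide glyph. The helpers box and display_width are hypothetical, and the two-columns-for-W/F rule is an assumption that in practice also depends on the terminal and font.

```python
import unicodedata


def display_width(text: str) -> int:
    # Assumption: Wide ('W') and Fullwidth ('F') take two columns, all else one.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def box(text: str, measure) -> str:
    """Draw a J-style box around a single-line string.

    `measure` returns how many '-' characters the top and bottom borders get.
    """
    border = "+" + "-" * measure(text) + "+"
    return "\n".join([border, "|" + text + "|", border])


han = chr(54620)  # U+D55C, the Korean syllable "han"

# Border sized by UTF-8 byte length, as the message says J does: 3 dashes.
print(box(han, lambda s: len(s.encode("utf-8"))))

# Border sized by East Asian Width: 2 dashes, matching a wide glyph.
print(box(han, display_width))
```

The only difference between the two boxes is the measure, which is the point of the argument: the fix belongs in how the border length is computed, not in the font.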
