June Kim wrote:
> Questions about the box characters in J come up frequently. In a
> non-English (or non-Latin-1) environment the box characters cause a few
> problems, which can be avoided or solved with Unicode.
>
> First of all, what is the rationale for assigning a.{~16+i.11 to the box
> characters? That is neither ASCII nor Unicode. Where do they come from?
> (A friend of mine, who has extensive knowledge of Unicode and character
> encodings, told me the choice is weird.)
>
> Second, the box breaks when it contains characters of different widths
> (that is, when the byte length of the encoded text and the display width
> of the characters don't match). What is the usual way of solving this in
> other programming languages? There is a Unicode standard for character
> widths: http://unicode.org/reports/tr11/
>
> Python implements that standard (along with others) in the unicodedata
> module.
>
> >>> unicodedata.east_asian_width(u'한')
> 'W'
> >>> unicodedata.east_asian_width(u'a')
> 'Na'
>
> (The u prefix marks the string as Unicode. east_asian_width returns the
> width class of any Unicode character, not only East Asian ones; the
> name is narrow for historical reasons.)
>
> Many programming language implementations and platforms rely on existing
> Unicode libraries, which are of quite good quality. One of them is ICU
> from IBM: http://www.ibm.com/software/globalization/icu
> Its license is very permissive. Some projects use part of ICU and some
> include it wholesale.
>
> Using existing solutions, we could improve the box display in J.
The boxdraw help explains this; see
http://www.jsoftware.com/help/user/boxdraw.htm
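The JFE (J front end) does indeed use Unicode box drawing characters.
You can see this by copying boxed output to the clipboard and checking
the indices of the result in the alphabet a., for example:
   a.i.wdclipread''
226 148 140 226 148 128...
(226 148 140 is the UTF-8 encoding of U+250C, and 226 148 128 is that of
U+2500.)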
If you are having display problems, the cause is the font, not a lack of
Unicode support.
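
As a rough cross-check outside J, the byte values shown above can be
decoded in Python. This is only a sketch: the describe helper is a
hypothetical name introduced for illustration, and the East Asian Width
class it reports comes from Python's unicodedata module, not from
anything the JFE does.

```python
import unicodedata

# Byte values as reported by a.i.wdclipread'' in the message above.
clipboard_bytes = bytes([226, 148, 140, 226, 148, 128])

# Decoding them as UTF-8 yields two box-drawing characters.
decoded = clipboard_bytes.decode("utf-8")


def describe(ch):
    """Return (code point, Unicode name, East Asian Width class) for ch.

    Hypothetical helper, for illustration only.
    """
    return (
        "U+%04X" % ord(ch),
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
    )


for ch in decoded:
    print(describe(ch))
```

Both characters fall in the "A" (ambiguous) width class of the standard
mentioned in the question, which is consistent with the display depending
on the font: some East Asian fonts draw such characters wide, others
narrow.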
The one unusual thing is that the JE (J engine), which is separate from
the JFE, outputs boxdraw characters as (16+i.11) { a. and the JFE maps
these to the proper Unicode characters. The main reason for doing this
was to keep things simple for the JE and to minimize the changes needed
to support Unicode. In theory, it should not affect the user. We have had
a couple of problems with it in J601, for example with JE output sent to
the printer, but these are fixed.
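
The kind of mapping described above can be sketched in Python. This is
not the actual JFE code. The order of the eleven characters is an
assumption (corners and junctions row by row, then vertical, then
horizontal), and BOX_CHARS, BOX_MAP and map_boxdraw are hypothetical
names. Treating every other byte as Latin-1 is a simplification for the
example.

```python
# Assumed order of the 11 box characters at byte values 16..26.
BOX_CHARS = "┌┬┐├┼┤└┴┘│─"

# Byte value -> Unicode box-drawing character.
BOX_MAP = {16 + i: ch for i, ch in enumerate(BOX_CHARS)}


def map_boxdraw(raw):
    """Translate JE-style box bytes in raw (a bytes object) to Unicode.

    Bytes outside 16..26 are passed through as Latin-1 characters,
    which is a simplification made for this sketch.
    """
    return "".join(BOX_MAP.get(b, chr(b)) for b in raw)


# Top edge of a small box: corner, two horizontals, corner.
print(map_boxdraw(bytes([16, 26, 26, 18])))
```

A front end that works this way leaves the engine's output format
unchanged and only needs the small table at the display boundary.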