Re: [Jgeneral] wd 'set ...' with box draw characters

June Kim Mon, 12 Feb 2007 21:50:34 -0800

On 2007-02-13, Oleg Kobchenko <[EMAIL PROTECTED]> wrote:

Looking at these East Asian characters in my
email client, IE browser, they are not rendered
as double width, but as a fractional width between 1 and 2
using Courier New font.


Well, in a fixed-pitch font, each character is either 1 em (wide) or
1/2 em (narrow). Nothing is fractional.


3 dashes: too narrow
+---+
|한글─|
+---+

5 dashes: too wide
+-----+
|한글─|
+-----+


Here you are using Unicode character 9472 (U+2500), whose name is
'BOX DRAWINGS LIGHT HORIZONTAL'. Its width class, according to the
Unicode standard, is A, which stands for "Ambiguous".

That means the width of the horizontal line is either 1/2 em or 1 em,
depending on the context.

I am using DejaVu Sans Mono (http://dejavu.sourceforge.net/). With this
setting, its width is 1/2 em. If I choose "굴림체" instead, then it is 1
em wide.

So 5 dashes (5 narrow characters) match '한글─' exactly in width, at
least with this setting.
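
As an illustration of the "Ambiguous" class, here is a minimal Python
sketch using the standard unicodedata module. The helper name
display_width and its ambiguous_wide flag are made up for this example,
and combining characters are ignored for simplicity.

```python
import unicodedata

ch = chr(9472)  # U+2500
print(unicodedata.name(ch))              # BOX DRAWINGS LIGHT HORIZONTAL
print(unicodedata.east_asian_width(ch))  # A

def display_width(text, ambiguous_wide=False):
    """Sum per-character column widths (hypothetical helper).

    W and F count as 2 columns; A counts as 2 only when
    ambiguous_wide is set; everything else counts as 1.
    """
    total = 0
    for c in text:
        cls = unicodedata.east_asian_width(c)
        if cls in ("W", "F") or (cls == "A" and ambiguous_wide):
            total += 2
        else:
            total += 1
    return total

text = "한글" + ch
print(display_width(text))                       # 5
print(display_width(text, ambiguous_wide=True))  # 6
```

The two results correspond to the two font settings described above:
5 columns when the line character is narrow, 6 when it is wide.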

Also currently J stubbornly wants to draw the box
as if for a UTF-8 sequence, not for Unicode, even after
explicit conversion:

  <7 u:'한글─'
+---------+
|한글─|
+---------+

  datatype 7 u:'한글─'
unicode
  #7 u:'한글─'
3


Yes, and I think that should be rectified. However, even after the
rectification, the boxes will still be broken unless the character
width is taken into account. I am now working on Roger's Box Display
code to make it handle Unicode characters.
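
To show what "considering the character width" means for the border, here
is a rough Python sketch. It is not the J Box Display code; cell_width
and box are hypothetical names, and Ambiguous characters are assumed to
be narrow.

```python
import unicodedata

def cell_width(c):
    # Assumption: only W (Wide) and F (Fullwidth) take two columns.
    return 2 if unicodedata.east_asian_width(c) in ("W", "F") else 1

def box(text):
    # Size the border by display columns, not by bytes or code points.
    width = sum(cell_width(c) for c in text)
    border = "+" + "-" * width + "+"
    return "\n".join([border, "|" + text + "|", border])

print(box("한글ab"))
```

With a fixed-pitch font that renders Hangul at two columns, the border
is 6 dashes wide and lines up with the content row.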


--- June Kim <[EMAIL PROTECTED]> wrote:

> I'm working on the code.
>
> In the mean time, here is the code for calculating display width:
>
> First you need to save the text file at
> http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt
>
> ===============================================
> require 'regex jfiles'
> t=: 1!:1 <'EastAsianWidth.txt'
> point=:'^([0-9A-F]{4});(Na|N|H|A|W|F)' rxmatches t
> range=:'([0-9A-F]{1,4})\.\.([0-9A-F]{1,4});(Na|N|H|A|W|F)' rxmatches t
> jcreate 'unidatapoint'
> (< }."1 point rxfrom t) jappend 'unidatapoint'
> jcreate 'unidatarange'
> (< }."1 range rxfrom t) jappend 'unidatarange'
> ===============================================
>
> Now you have unidatapoint.ijf and unidatarange.ijf and are able to use them.
>
> ===============================================
> require 'jfiles'
>
> NB. N  : half
> NB. Na : half
> NB. H  : half
> NB. A  : half
> NB. F  : full
> NB. W  : full
>
> widthcode=:;: 'N Na H A F W'
> pod=:>jread 'unidatapoint';0
> rad=:>jread 'unidatarange';0
>
> towc=: widthcode&i. NB. towidthcode
>
> dfh=. 16&#. @ ('0123456789ABCDEF'&i.)
> po=:(dfh each {."1 pod),. <"0 towc"0 {:"1 pod
> ra=:(,&.>/"1 dfh each 2&{."1 rad),. <"0 towc"0 {:"1 rad
> poa=:>{."1 po
>
> fill=: 4 : 0
>       'r c'=.x
>       r=. ({.r)+ i. >: -~/ r
>       ({.c) r}y
> )
>
> tab=:65536$0 NB. missing is N
> tab=:(> {:"1 po) poa} tab
> tab=:>./ ra fill"1 tab
>
> diswid=: [: >: [: 4&<: [: {&tab 3&u:@ucp  NB.for rank 1
> ================================================
> To improve performance, you could save tab with jfiles and reload it
> instead of rebuilding it. You could also use a more compact
> representation (3 bits per character, then compress the data).
>
> Usage Example:
>    diswid '한글ab!─'
> 2 2 1 1 1 1
>    (,:~ ((ucp'-') $~ +/@diswid)) ucp '한글ab!-'  NB. properly shows the top line in a fixed-pitch font
> --------
> 한글ab!-
>
>
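
For readers who do not run J, the table-building steps above can be
sketched in Python. This is an approximation, not a port: SAMPLE is a
hand-written excerpt in the shape the regexes above expect (the real
EastAsianWidth.txt is much longer), the function names are invented, and
a sorted range list with binary search replaces the 65536-entry table.

```python
import re
from bisect import bisect_right

# Hand-written sample lines, not the full data file.
SAMPLE = """\
0041;Na # LATIN CAPITAL LETTER A
2500;A # BOX DRAWINGS LIGHT HORIZONTAL
AC00..D7A3;W # HANGUL SYLLABLE
FF01..FF60;F # FULLWIDTH forms
"""

# A single code point or a lo..hi range, then the width class.
LINE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(Na|N|H|A|W|F)\b"
)

def parse(text):
    """Return a sorted list of (lo, hi, width_class) tuples."""
    ranges = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            lo = int(m.group(1), 16)
            hi = int(m.group(2), 16) if m.group(2) else lo
            ranges.append((lo, hi, m.group(3)))
    ranges.sort()
    return ranges

def width_class(ranges, cp):
    """Look up the class of code point cp; missing entries are N."""
    starts = [lo for lo, _, _ in ranges]
    i = bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][1] >= cp:
        return ranges[i][2]
    return "N"

def diswid(ranges, text):
    """Display width per character: 2 for F and W, 1 otherwise."""
    return [2 if width_class(ranges, ord(c)) in ("W", "F") else 1
            for c in text]

ranges = parse(SAMPLE)
print(diswid(ranges, "한글ab!─"))  # [2, 2, 1, 1, 1, 1]
```

As in the J version, the Ambiguous class is treated as half width.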
>
> 2007/2/13, Eric Iverson <[EMAIL PROTECTED]>:
> > The problem of proper display of boxed unicode data is an interesting
> > one. The first step to getting this fixed is for someone to provide a
> > working J model that takes an arbitrary boxed argument and produces the
> > character stream that properly displays it. If we had such a model we
> > might consider incorporating it into the JE.
> >
> > ----- Original Message -----
> > From: "June Kim" <[EMAIL PROTECTED]>
> > To: "General forum" <[email protected]>
> > Sent: Sunday, February 11, 2007 5:11 AM
> > Subject: Re: [Jgeneral] wd 'set ...' with box draw characters
> >
> >
> > > 2007/2/11, Chris Burke <[EMAIL PROTECTED]>:
> > >> June Kim wrote:
> > > [snip]
> > >> > Second, the box is broken with characters of different widths (that
> > >> > is, when the byte length of the encoding and the width of the
> > >> > characters on display don't match). What is the usual way of solving
> > >> > it in other programming languages? There is a Unicode standard for
> > >> > character widths: http://unicode.org/reports/tr11/
> > >> >
> > >> > Python implements that standard (along with others) in the
> > >> > unicodedata module.
> > >> >
> > >> >>>> unicodedata.east_asian_width(u'한')
> > >> > 'W'
> > >> >>>> unicodedata.east_asian_width(u'a')
> > >> > 'Na'
> > >> >
> > >> > (u specifies that the following string is Unicode. east_asian_width
> > >> > returns the width class of the character, not only for East Asian
> > >> > characters but for all Unicode characters; its name is narrower
> > >> > than its scope for historical reasons.)
> > >> >
> > > [snip]
> > >>
> > >> If you are having problems with display, it is because of the font,
> > >> not because we are not using unicode.
> > > [snip]
> > >
> > > When a string is boxed and it includes characters whose display
> > > width differs from their byte length, the box is broken in J. It is
> > > not because of the font. It is because J assumes that every
> > > character's width equals its byte length, which is obviously false
> > > in many writing and encoding systems, including East Asian ones. We
> > > can definitely say J's box display isn't internationalized yet.
> > >
> > > For example, 54620 (as a Unicode code point) is a Korean character,
> > > which is pronounced "han". Its width is "Wide" (twice as wide as
> > > Latin letters).
> > >
> > >   han=.4 u: 54620
> > >   <han
> > > +---+
> > > |한|
> > > +---+
> > >   <8 u: han
> > > +---+
> > > |한|
> > > +---+
> > >
> > > Since J uses the byte length to determine a character's width, and
> > > the byte length of han is 3 in UTF-8 ( 3-: #8 u: han ), the box's
> > > horizontal character '-' (whose width is "Narrow") is printed three
> > > times, and on the display the box is broken.
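
The mismatch described in this message can be checked outside J. The
following Python sketch compares the three lengths involved for the same
character; the variable names are only illustrative, and the column
count assumes a fixed-pitch font that follows the East Asian Width
classes.

```python
import unicodedata

han = chr(54620)  # U+D55C, the Hangul syllable "han"

utf8_len = len(han.encode("utf-8"))  # bytes in UTF-8
code_points = len(han)               # Unicode code points
columns = 2 if unicodedata.east_asian_width(han) in ("W", "F") else 1

print(utf8_len, code_points, columns)  # 3 1 2
```

A border sized by the first number has 3 dashes, one sized by the second
has 1, and only the third matches what is drawn on screen.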




----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
