----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, November 13, 2001 12:40 PM
Subject: Re: [idn] reordering strawpoll

> In a message dated 2001-11-12 14:41:29 Pacific Standard Time,
> [EMAIL PROTECTED] writes:
>
> > If you encode each Hangul syllable as 3 jamos in UTF-8,
> > it needs 3 octets * 3 = 9 octets, while 3 basic Latin letters need
> > 3 octets in UTF-8.
> > 3 times more space! If there were any real "compaction" on Hangul
> > syllable code points, that may be just the bare minimum.
>
> But one paragraph earlier, Soobok stated that each Hangul character is
> roughly equivalent to (i.e. carries roughly as much information as) 2.2 to
> 2.7 Latin letters. So the 9 octets of UTF-8 actually encode the equivalent
> of 6.6 to 8.1 Latin letters, which means Hangul encoding is 10% to 27% less
> efficient than Latin encoding. Representing it as two-thirds (67%) less
> efficient is obviously misleading. Such claims only draw attention away
> from any merit the reordering plan may have.

My analogy cited above was for *UTF-8*. The other paragraph, enclosed below,
is for *ACE* with and without REORDERING. You may have mixed up my two
separate arguments.

Sorry if my wording caused confusion. I may not be skilled enough in
English, so please be careful in reading my postings. :-)

Soobok Lee

> > From what I get from the reordering experiments, it became clear that
> > a long Han/Hangul code point sequence of length N can be represented
> > by 2.0~2.2 * N Latin letters. Without reordering, it would be 3.0~3.1.
> > 33% improvement is possible! Why should we go without reordering, when
> > it merely requires simple mapping tables and has so many benefits?
>
> James Seng has stated repeatedly that there is no need to reiterate, yet
> again, the supposed benefits of reordering. Every proposal, including this
> one, has both advantages and disadvantages which must be weighed against each
> other.
>
> -Doug Ewell
>  Fullerton, California
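
For anyone who wants to check the figures traded in this thread, here is a minimal sketch that reproduces the arithmetic. It assumes Python's standard `unicodedata` module; the sample syllable U+D55C, the helper names `shortfall` and `ace_saving`, and the way the low and high ends of the ACE ranges are paired are illustrative choices, not something taken from the postings.

```python
import unicodedata

# U+D55C is a precomposed Hangul syllable, picked only as an example.
syllable = "\ud55c"

# NFD splits the syllable into its conjoining jamos (three for this one).
jamos = unicodedata.normalize("NFD", syllable)

precomposed_octets = len(syllable.encode("utf-8"))  # one code point
decomposed_octets = len(jamos.encode("utf-8"))      # three jamo code points
latin_octets = len("abc".encode("utf-8"))           # three basic Latin letters


def shortfall(latin_equivalent, octets):
    """Fraction of the octets not accounted for by Latin-letter equivalents."""
    return 1 - latin_equivalent / octets


# UTF-8 comparison as quoted: 9 octets carrying 6.6 to 8.1 Latin letters.
utf8_low = shortfall(8.1, 9)
utf8_high = shortfall(6.6, 9)


def ace_saving(with_reordering, without_reordering):
    """Relative reduction in ACE letters per code point."""
    return 1 - with_reordering / without_reordering


# ACE comparison as quoted: 2.0~2.2 with reordering, 3.0~3.1 without.
# Pairing low with low and high with high is an assumption of this sketch.
ace_low_ends = ace_saving(2.0, 3.0)
ace_high_ends = ace_saving(2.2, 3.1)

print(f"jamos: {len(jamos)}, octets: {decomposed_octets} vs {latin_octets}")
print(f"UTF-8 shortfall: {utf8_low:.0%} to {utf8_high:.0%}")
print(f"ACE saving: {ace_high_ends:.0%} to {ace_low_ends:.0%}")
```

The two comparisons measure different things: the first is octets in UTF-8, the second is Latin letters per code point in ACE, which is why the percentages should not be set against each other directly.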
