----- Original Message -----
From: "Kent Karlsson" <[EMAIL PROTECTED]>

>
> It's a bit strange that this comes from quarters where there is already quite
> a lot of "compaction" in the representations of text. A single Han ideograph
> expresses "more" than a single letter in other scripts. And a single Hangul
> syllable character expresses from 2 to 6 letters in one single character.

In natural Hangul sentences, the average may reach 2.6~2.7 letters per
syllable. In Hangul business names, the average is 2.2 or so, I guess.

> Hangul is fundamentally an alphabetic script, with 17 consonant letters
> and 11 vowel letters, plus some variant (and historic) letters.

If you encode each Hangul syllable as 3 jamos in UTF-8, it needs
3 octets * 3 = 9 octets, while 3 basic Latin letters need 3 octets in UTF-8.
3 times more space! If there is any real "compaction" in the Hangul syllable
code points, it may be just the bare minimum.

From what I get from the reordering experiments, it became clear that a long
Han/Hangul code point sequence of length N can be represented by
2.0~2.2 * N Latin letters. Without reordering, it would be 3.0~3.1 * N.
A 33% improvement is possible! Why should we go without reordering, which
merely requires simple mapping tables and brings so many benefits?

Soobok Lee
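
The octet arithmetic in the message can be checked with a short Python sketch. The sample syllable (U+D55C) and the helper name utf8_len are illustrative assumptions, not taken from the message; the sketch only counts UTF-8 octets and does not model the reordering experiments.

```python
import unicodedata

def utf8_len(text: str) -> int:
    """Number of octets needed to encode text in UTF-8 (hypothetical helper)."""
    return len(text.encode("utf-8"))

# Assumed sample: one precomposed Hangul syllable, U+D55C.
syllable = "\uD55C"

# NFD splits a precomposed syllable into its conjoining jamos.
jamos = unicodedata.normalize("NFD", syllable)

jamo_count = len(jamos)                  # jamos in this syllable
jamo_octets = utf8_len(jamos)            # each jamo in U+1100..U+11FF takes 3 octets
precomposed_octets = utf8_len(syllable)  # the single syllable code point
latin_octets = utf8_len("han")           # three basic Latin letters

print(jamo_count, jamo_octets, precomposed_octets, latin_octets)
```

For this sample the decomposed form takes 9 octets, against 3 octets for either the precomposed syllable or three Latin letters, which is the factor of 3 mentioned above.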
