Soobok Lee <[EMAIL PROTECTED]> wrote:

> I suggest you visit the next link which contains The Revelation of the
> Holy Bible in Chinese (GB) and English (Modern BBE and Old KJV).
>
> http://www.ccim.org/cgi-user/bible/ob?version=hgb&version=kjv&version=bbe&book=gen
>
> And then, come back to this WG with new estimations on information
> capacity of han letters, please.

In Genesis chapter 1, I counted the Han ideographs in the Chinese Union version and the Latin letters in the King James version. In both cases I excluded all other characters (punctuation, spaces, verse numbers, etc.).

    Chinese ideographs:  778
    English letters:    3168

This suggests that each Chinese ideograph carries the information content of slightly over four English letters (3168 / 778 is about 4.07). Therefore a maximal Chinese domain label in AMC-ACE-Z (19 ideographs, using about 3 octets each plus 4 octets for the prefix) holds about as much information as a 76-letter English string, which is 21% more information than a maximal English domain label (63 letters using 1 octet each).

The situation is much worse for Korean. I think each Hangul character carries the information of only about 1.5 English letters, but it still takes about 2.9 octets in AMC-ACE-Z, which means a maximal Korean domain label (20 hangul) holds about as much information as a 30-letter English string. Of all the languages I've looked at, Korean is by far the least dense when encoded using AMC-ACE-Z.

AMC
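
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the counting and the capacity estimate. It is only an illustration under stated assumptions: the function names are hypothetical, the Han test covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and the octets-per-character and letters-per-character figures are the rough averages quoted above, not values computed by an actual AMC-ACE-Z encoder.

```python
MAX_LABEL_OCTETS = 63   # maximum length of a domain label
PREFIX_OCTETS = 4       # ACE prefix size assumed in the estimate above


def count_han(text):
    """Count Han ideographs (assumption: basic CJK block U+4E00..U+9FFF only)."""
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


def count_latin_letters(text):
    """Count ASCII letters, ignoring punctuation, spaces, and digits."""
    return sum(1 for ch in text if ch.isascii() and ch.isalpha())


def max_chars(octets_per_char):
    """Largest number of characters that fits in a label after the prefix."""
    return int((MAX_LABEL_OCTETS - PREFIX_OCTETS) // octets_per_char)


def equivalent_letters(octets_per_char, letters_per_char):
    """English-letter equivalent of a maximal label in the given script."""
    return max_chars(octets_per_char) * letters_per_char


if __name__ == "__main__":
    # Ratio from the Genesis 1 counts reported above.
    print(round(3168 / 778, 2))
    # Chinese: about 3 octets and about 4 letters of information per ideograph.
    print(max_chars(3), equivalent_letters(3, 4))
    # Korean: about 2.9 octets and about 1.5 letters of information per hangul.
    print(max_chars(2.9), equivalent_letters(2.9, 1.5))
```

To repeat the count itself, one would pass the full chapter text to count_han() and count_latin_letters(); the results depend on the exact edition used and on whether any ideographs fall outside the basic block.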
