----- Original Message -----
From: "David Hopwood" <[EMAIL PROTECTED]>
To: "Soobok Lee" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Saturday, October 20, 2001 10:41 AM
Subject: Re: [idn] call for comments for REORDERING
> -----BEGIN PGP SIGNED MESSAGE-----
>
> Soobok Lee wrote:
> > From: "Martin Duerst" <[EMAIL PROTECTED]>
> > > >
> > > >1) saturations in TLD namespaces would require longer names for which
> > > > REORDERING is designed to give greater benefits/compression ratio.
> > >
> > > No. What James referred to is that saturation tends to fill up the
> > > short name slots, and thus flatten the probability distribution.
> > > I.e. if somebody doesn't get the name they wanted, the chance is
> > > that they go for something like xq.com, because it's easy to
> > > remember because it's short. Neither x nor q are very frequent
> > > letters.
> >
> > Han/hangeul characters carries meanings while latin alphabets
> > denote phonemes.
>
> ?? Unless I'm very confused about Hangul, it is at least as much
> phonetically-based as Latin. Hangul Jamo are letters of an alphabet,
> which happen to be arranged in square cells corresponding to syllables,
> instead of linearly.

You are only partly correct in saying that Hangul is phonetic. If you
ever read a Hangul-to-Hangul (monolingual Korean) dictionary, you will
easily find that 70-80% or more of modern Hangul vocabulary derives
from 1:1 mappings of Chinese words, much as most English and French
words came from Latin ones. Therefore, one Hangul character carries
about the same amount of information as its Chinese character
counterpart: Hangul and Han characters each carry roughly as much
information as two Latin characters.

>
> Moreover, each Hangul syllable (encoded as a single character when
> NFC/NFKC-normalised), normally represents 3 Jamo. That should be taken
> into account when assessing whether Hangul is encoded compactly enough.
>
> > Therefore your analogy between latin and han domains
> > may be false. Chinese people would rather choose to register
> > digit-added variants of already taken desired domains in saturated
> > ML.com, instead of choosing non-sense irrelevant rare han characters.
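The "3 Jamo per syllable" point can be checked mechanically. What follows is a minimal sketch in Python, assuming only the algorithmic Hangul syllable decomposition defined in the Unicode Standard (the constants are the ones it specifies); the function names `decompose_syllable` and `jamo_count` are hypothetical. Note that a syllable with no final consonant yields 2 Jamo, so 3 is the usual case rather than a fixed rule.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_syllable(ch: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining Jamo.

    Characters outside U+AC00..U+D7A3 are returned unchanged.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    # t_index == 0 means no trailing consonant, i.e. only 2 Jamo.
    trail = chr(T_BASE + t_index) if t_index else ""
    return lead + vowel + trail


def jamo_count(word: str) -> int:
    """Total number of Jamo in a string of precomposed syllables."""
    return sum(len(decompose_syllable(ch)) for ch in word)


if __name__ == "__main__":
    word = "한국어"
    # Cross-check against the standard library's NFD normalisation.
    manual = "".join(decompose_syllable(c) for c in word)
    assert manual == unicodedata.normalize("NFD", word)
    print(f"{word}: {len(word)} syllables, {jamo_count(word)} Jamo")
```

The cross-check against NFD is there because canonical decomposition of precomposed Hangul is exactly this arithmetic, so the two must agree.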
> >
> > Later time, I will provide some proofs that SC and TC only have
> > small partial set of frequent characters.
>
> That's not in dispute. The argument is about whether the complexity of
> reordering is worth the additional compression. IMHO it isn't -
> AMC-Z (or UTF-8) encodings are sufficiently compact that the 63-octet
> and 255-octet limits are not a serious problem for any language or script,
> and the savings for average names are marginal.
>
> - --
> David Hopwood <[EMAIL PROTECTED]>
>
> Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
> RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
> Nothing in this message is intended to be legally binding. If I revoke a
> public key but refuse to specify why, it is because the private key has been
> seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: 2.6.3i
> Charset: noconv
>
> iQEVAwUBO9DO2zkCAxeYt5gVAQF/7AgAzp3KB/kPA2XAxb43hCSbrLBOxavd4WSq
> DYfvw2UuwloLkEZB+tkkoOPucW/ElLmaYjuYMKt6nea2LZthLpTWDc8a8ENXqM34
> Z+aP8nqN9XzeMTPisebpCcTE7PZYWdi87a0grmL0KFBzYG0PsxAB905Yvf12oU4U
> u3da6Ku37YJeYK0jNi4/qhoAUZ8gyz+gW4MWWxCmuAIrvmIkaf/d4lX4Tu+75mg2
> VcS3ezCGbOt3Wf0GIfUl869BBRbPB7bScBX0EjP/C+sQpCVR6gVs6SKDS9zY/W6k
> XImrf7IuLg57za70dy5YiCgNBYOvlNa4Xgi3d+DFoW7jntmj4MEUYw==
> =4Lmr
> -----END PGP SIGNATURE-----
>
>
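To put the 63-octet label limit in concrete terms, here is a rough sketch in Python of the per-label budget, assuming the label is carried as plain UTF-8. AMC-Z is not modelled here and would give different numbers, and the helper names `utf8_octets` and `max_chars_per_label` are hypothetical. Precomposed Hangul syllables and BMP Han ideographs each take 3 octets in UTF-8, so one label holds at most 21 of them, against 63 ASCII letters.

```python
# Sketch: how many characters of a given script fit in one DNS label
# if the label is carried as raw UTF-8. Helper names are hypothetical;
# an ACE such as AMC-Z would give different figures.

MAX_LABEL_OCTETS = 63  # per-label limit from RFC 1035


def utf8_octets(text: str) -> int:
    """Number of octets `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def max_chars_per_label(sample_char: str) -> int:
    """Upper bound on how many characters like `sample_char` fit in one label."""
    return MAX_LABEL_OCTETS // utf8_octets(sample_char)


if __name__ == "__main__":
    for name, ch in [("Latin", "a"), ("Hangul", "한"), ("Han", "中")]:
        print(f"{name}: {utf8_octets(ch)} octet(s)/char, "
              f"at most {max_chars_per_label(ch)} chars per label")
```

Whether 21 syllables is "sufficiently compact" is exactly the judgement under debate; the sketch only makes the arithmetic explicit.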
