On 11/19/10 2:43 AM, Masataka Ohta wrote:
Eric Brunner-Williams wrote:

Anyway, Arabic strings are examples of exponential explosions
with large coefficients a lot easier to understand for most
of you than Chinese ones.

the density of variant (context dependent) characters in arabic script,
whether sampled as text, or sampled as domain names, is sparse,

While meaningful variations would be sparse, meaningful
capitalization of Latin is also sparse. That is, meaningful
capitalization of "mydomain" should be

        myDomain
        Mydomain
        MyDomain
        MYDOMAIN

However, meaningless capitalization such as:

        mYdOmAiN

should also be protected, and the same should hold for Arabic.
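
The capitalization argument above can be made concrete with a minimal sketch. It assumes a plain ASCII label and treats case folding as simple lowercasing; the helper name `case_variants` is hypothetical and is not taken from any DNS software.

```python
from itertools import product

def case_variants(label):
    """Return every upper/lower-case spelling of an ASCII label."""
    # Each letter contributes two choices; any other character contributes one.
    choices = [(ch.lower(), ch.upper()) if ch.isalpha() else (ch,) for ch in label]
    return ["".join(combo) for combo in product(*choices)]

variants = case_variants("mydomain")

# Eight letters give 2**8 spellings, meaningful or not.
print(len(variants))

# Case-insensitive matching collapses all of them to one name.
folded = {v.lower() for v in variants}
print(folded)
```

Only a handful of the spellings are meaningful, yet "mYdOmAiN" is in the list as well, and all of them fold to the same label.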

A protection could be rejection of domain registration. But that,
in any case, requires a document giving a complete and unambiguous
definition of extended case insensitivities.

as a sometime registry operator, i'm only mildly curious as to the registration policies of other registry operators. the range of values preferred by a data base operator is distinct from the range of values possible in a data type, and the latter is amenable to agreement among implementers.

And the definition used for TLDs must be an international one.

while this sounds nice, in practice it means that the operators of the two constellations of name servers that support two mostly identical views of "." must agree, and from november 2001 to the very recent present, there was a substantial area of disagreement between these two operator communities over a charset issue. this is probably not a feature to be accidentally aggravated by an insistence on a single, and possibly incomplete, policy model, or indifference to the perception of genuine necessity by one, or the other, of the operators.

So, the requirement for a complete and unambiguous specification
of case insensitivities or canonicalization is the same.

case is a property of a particular character repertoire, and it preexists digital encoding methods.

canonicalization, in the sense used in i18n contexts where variable length encodings allow two or more possible encodings for one or more characters in some character repertoire (and meaningless in fixed length encoded contexts), results in fewer, and possibly a single, choice of encodings for those characters, and therefore for sequences of multiply encoded characters. it is a property of variable length encodings, not of encodings generally, or of character repertoires, or scripts.

it is unlikely that a property that preexists encodings, and a property of specific encodings, can be correctly specified without distinguishing between them.

relative
to the density of "variant" characters in the (unified) han script(s),
which is not quite 2^^n,

You seem to be thinking that "variant" for Chinese means simplified
and complex. But there are other types of "variant" Chinese
characters.

the density of true variant characters in (unified) han script,
whether sampled as text, or sampled as domain names, is also sparse, relative to the density of "variant" characters in the (unified) han script(s), which is not quite 2^^n, where n is the number of characters in a label, but is sufficiently close to allow the "exponential" term to be reasonably used.
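
As a rough illustration of why the count is "not quite" 2^n, here is a minimal sketch. It assumes a hypothetical toy table, `VARIANTS`, in which each listed character has exactly two interchangeable forms (simplified and traditional); a real registry table would be far larger and is not reproduced here.

```python
from math import prod

# Hypothetical toy table: each character maps to its interchangeable forms.
VARIANTS = {
    "国": ("国", "國"),
    "学": ("学", "學"),
}

def variant_label_count(label, table=VARIANTS):
    """Count the labels reachable by swapping each character for a variant."""
    # Characters missing from the table have one form, so they add no factor.
    return prod(len(table.get(ch, (ch,))) for ch in label)

print(variant_label_count("国学"))   # both characters have two forms
print(variant_label_count("国学a"))  # the extra character adds no factor
```

The count reaches 2^n only when every one of the n characters has a variant, and it falls below that whenever some characters have none.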

quoting my previous note's concluding para with only the following change: s/arabic/(unified) han/

"variants in arabic script present problems to the idn(a) specification(s) that assume "unicode" as the character repertoire, but they are unlike in scale the problems presented by sc/tc equivalence classes under similar conditions and assumptions."

It's somewhat like plain 'C' without cedilla and
'C' with cedilla, which, sometimes (here is dependency on
locale information), must be treated as identical characters.

their answers were as i expected, and fail to support a
"Arabic strings are examples of exponential explosions with
large coefficients" claim.

They should, naturally, reject meaningless combinations.

i think the fundamental issue here is not to accept meaningless, or meaning-losing, reasoning by imagined similarity. case is not a property of han script(s). when non-specialists imagine that han script(s) have a case-like property and then try to reason about han script(s), errors arise.

there are interesting issues in computerized typography, of arabic script, and other scripts, with, and without, specific encoding properties, and with, and without, the issue domain being specific to an application domain, such as dns labels.

-e
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
