Re: [DNSOP] clue w.r.t. arabic

Eric Brunner-Williams Fri, 19 Nov 2010 04:34:37 -0800

On 11/19/10 2:43 AM, Masataka Ohta wrote:

Eric Brunner-Williams wrote:

Anyway, Arabic strings are examples of exponential explosions
with large coefficients a lot easier to understand for most
of you than Chinese ones.


the density of variant (context dependent) characters in arabic script,
whether sampled as text, or sampled as domain names, is sparse,


While meaningful variations would be sparse, meaningful
capitalization of Latin is also sparse. That is, meaningful
capitalization of "mydomain" should be

        myDomain
        Mydomain
        MyDomain
        MYDOMAIN

However, meaningless capitalization such as:

        mYdOmAiN

should also be protected, which should be the same for Arabic.
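
As a rough illustration of the capitalization count being discussed (a sketch only; the helper name `case_variants` is hypothetical and comes from neither poster): an ASCII label with n letters has 2^n upper/lower-case spellings, and folding to one case maps all of them back to a single form.

```python
from itertools import product

def case_variants(label: str) -> list[str]:
    """Enumerate every upper/lower-case spelling of an ASCII label."""
    # Each letter contributes two choices; digits and hyphens contribute one.
    choices = [(c.lower(), c.upper()) if c.isalpha() else (c,) for c in label]
    return ["".join(combo) for combo in product(*choices)]

variants = case_variants("mydomain")
print(len(variants))             # 2**8 spellings for the 8 letters
print("mYdOmAiN" in variants)    # the "meaningless" spelling is among them
# Case folding collapses the whole set to one comparable form.
print({v.lower() for v in variants})
```

The enumeration is only there to show the size of the set; a case-insensitive comparison never needs to materialize it, since folding both sides is enough.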

A protection could be rejection of domain registration. But it,
anyway, requires a document of complete and unambiguous
definition of extended case insensitivities.

as a sometime registry operator, i'm only mildly curious as to the registration policies of other registry operators. the range of values preferred by a data base operator is distinct from the range of values possible in a data type, and the latter is amenable to agreement among implementers.

And, the definition used for TLDs must be an international one.

while this sounds nice, in practice it means that the operators of the two constellations of name servers that support two mostly identical views of "." must agree, and from november 2001 to the very recent present, there was a substantial area of disagreement between these two operator communities over a charset issue. this is probably not a feature to be accidentally aggravated by an insistence on a single, and possibly incomplete, policy model, or indifference to the perception of genuine necessity by one, or the other, of the operators.

So, the requirement for complete and unambiguous specification
for case insensitivities or canonicalization is the same.

case is a property of a particular character repertoire, preexisting digital encoding methods.

canonicalization in the sense used in i18n contexts where variable length encodings allow for two or more possible encodings for one or more characters in some character repertoire, and meaningless in fixed length encoded contexts, results in fewer, and possibly a single choice of possible encodings of characters, and therefore of sequences of multiply encoded characters. it is a property of variable length encodings, not of encodings generally, or of character repertoires, or scripts.

it is unlikely that a property preexisting encodings, and a property of specific encodings, are correctly specified without distinction.
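
A small sketch of "two possible encodings for one character", using Python's standard `unicodedata` module (my choice of illustration, not something either poster referenced). It shows one Unicode instance only: the same visible letter stored either as a single code point or as a base letter plus a combining mark, with normalization selecting one form. It takes no position on which layer the property belongs to.

```python
import unicodedata

precomposed = "\u00e7"    # LATIN SMALL LETTER C WITH CEDILLA, one code point
decomposed = "c\u0327"    # 'c' followed by COMBINING CEDILLA, two code points

# Without canonicalization the two spellings compare as different strings.
print(precomposed == decomposed)

# NFC composes the pair into the single code point, so they now match.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)

# NFD goes the other way and splits the precomposed letter.
nfd = unicodedata.normalize("NFD", precomposed)
print(nfd == decomposed)
```

Note that neither normalization form makes a plain 'c' equal to 'c' with cedilla; that kind of equivalence is a policy decision and would have to be specified separately.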

relative
to the density of "variant" characters in the (unified) han script(s),
which is not quite 2^^n,


You seem to be thinking that "variant" for Chinese means simplified
and complex. But there are other types of "variant" Chinese
characters.


the density of true variant characters in (unified) han script, whether sampled as text, or sampled as domain names, is also sparse, relative to the density of "variant" characters in the (unified) han script(s), which is not quite 2^^n, where n is the number of characters in a label, but is sufficiently close to allow the "exponential" term to be reasonably used.
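
To make the scale comparison concrete, a back-of-envelope sketch: if each character in a label has some number of interchangeable forms, the number of variant labels is the product of those per-character counts. The counts below are invented purely for illustration and are not measurements of any script; `variant_label_count` is a hypothetical helper name.

```python
from math import prod

def variant_label_count(forms_per_char: list[int]) -> int:
    """Number of distinct labels when each position has the given number of forms."""
    return prod(forms_per_char)

# Dense case: every one of 8 characters has two forms, giving 2**8 labels.
dense = [2] * 8
# Sparse case: only two of the 8 characters have a second form.
sparse = [1, 1, 2, 1, 1, 1, 2, 1]

print(variant_label_count(dense))    # 256
print(variant_label_count(sparse))   # 4
```

The growth is exponential in the number of characters that actually have variants, not in the label length, which is why density matters to the argument.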

quoting my previous note's concluding para with only the following change: s/arabic/(unified) han/

"variants in arabic script present problems to the idn(a) specification(s) that assume "unicode" as the character repertoire, but they are unlike in scale the problems presented sc/tc equivalence classes presented with similar conditions and assumptions."


It's somewhat like plain 'C' without cedilla and
'C' with cedilla, which, sometimes (here is a dependency on
locale information), must be treated as identical characters.

their answers were as i expected, and fail to support a
"Arabic strings are examples of exponential explosions with
large coefficients" claim.


They should, naturally, reject meaningless combinations.

i think the fundamental issue here is not to accept meaningless, or meaning-losing reasonings by imagined similarity. case is not a property of han script(s). when non-specialists imagine that han script(s) have a case-like property and then try to reason about han script(s), error arises.

there are interesting issues in computerized typography, of arabic script, and other scripts, with, and without, specific encoding properties, and with, and without, the issue domain being specific to an application domain, such as dns labels.


-e
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
