> We have [STD13] defines that LDH are the DNS identifiers,
> then what are the IDN identifiers? UCS is too big and contains
> many semantically equivalent characters for IDN. Should we
> ask for a table of semantically equivalent character sets
> definition table from Unicode Consortium?

To me, what you are saying is no different from Normalization Form.

> 1) label separators, ie punctuation and formatting marks
> 2) structured data indicators, ie. $/%/& ...
> 3) unstructured data identifiers, ie. alphabet, CJKs,
> sound marks...

Take a look at the Unicode character categories. This classification is
already there. We just have to use it properly.

> 1)case insensitive,

Case folding. Done.

> 2)size or width insensitive,

Normalization Form. Done.

> 3)font insensitive (include majority of TC/SC)

The Unicode Consortium does not deal with fonts. It provides characters
for reference, but not font standardisation. TC-SC is not a "font
sensitivity" issue.

> 4)language insensitive (include CJK),

Normalization Form *is* language insensitive. It only deals with scripts.

> 5)combination insensitive(regardless NFC or KNFC).
> Language insensitive: ie. circled numbers, circled
> Han numerals, Dingbats, subset of CJKs. But other
> subset of CJK will be different semantically for each
> languages, then we have to have separated tables to
> work with for each of them.

You are venturing into a very dangerous area: script vs. language.
ISO 10646/UCS is a script-based CCS, not a language-based one. The moment
we want to deal with "language", we are on our own.

AFAICS, we do have agreement that we can do I18N => script.
We do not have agreement to do multilingual => language.

Please don't confuse the two. I have no intention of starting a
conversation about "multilingual domain names". We tried, and the
conclusion was that it is not possible.

-James Seng
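
As a rough illustration of the points above, here is a minimal sketch using
Python's standard unicodedata module. It is not part of any IDN proposal.
The helper name fold_label and the choice of NFKC followed by case folding
are assumptions made for this example; the sample characters are also my
own picks.

```python
import unicodedata


def fold_label(label: str) -> str:
    """Hypothetical helper: compatibility-normalize, then case-fold."""
    # NFKC maps compatibility variants (fullwidth letters, circled
    # digits) to their plain equivalents.
    normalized = unicodedata.normalize("NFKC", label)
    # Case folding can produce unnormalized text, so normalize again.
    return unicodedata.normalize("NFKC", normalized.casefold())


# General categories already separate punctuation, symbols and letters.
for ch in ["-", ".", "$", "%", "a", "\u4e2d"]:  # \u4e2d is a CJK ideograph
    print(repr(ch), unicodedata.category(ch))

# Width and case differences disappear: fullwidth "ABC" vs ASCII "abc".
print(fold_label("\uff21\uff22\uff23") == fold_label("abc"))

# Circled digit one (U+2460) folds to the plain digit.
print(fold_label("\u2460") == "1")

# A traditional/simplified pair (U+570B, U+56FD) stays distinct:
# normalization does not treat them as equivalent.
print(fold_label("\u570b") == fold_label("\u56fd"))
```

Normalization and case folding cover the width and case items, while the
traditional/simplified pair is left untouched, since that mapping is not a
normalization matter.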
