>Date: Wed, 28 Jul 1999 18:29:19 -0700
>From: James Seng <[EMAIL PROTECTED]>
>To: Robert Elz <[EMAIL PROTECTED]>
>Cc: "Martin J. Duerst" <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
> [EMAIL PROTECTED], [EMAIL PROTECTED]
>Subject: Re: draft-duerst-dns-i18n-02.txt
>
>Hi all, and Robert,
>
>I think there is a general misconception here.
>
>The DNS internationalization effort is not a question of Martin's
>draft vs. Kwan's draft. There are merits in both drafts, and neither
>draft in its current state addresses all the issues I have
>encountered while running my iDNS testbed with the NICs over the
>last year.
>
>What is apparent is that iDNS does not really care whether the final
>domain name is in UTF-5 or in UTF-8. Whichever encoding we choose, you
>still have to take care of the iDNS issues, which I shall mention
>later. But let me reply to your mail first.
>
>Robert Elz wrote:
>> I think that you're saying here, that if your system knows UTF-5, you
>> can have a UTF-5 domain name, and send me e-mail from that name, and that
>> my system which knows nothing of UTF-5 will be able to interpret the name
>> and reply to you - your address will look ugly, but it will still meet
>> the syntax requirements of everything that matters, and be functional.
>
>UTF-8 is currently the standard UTF for the IETF. Most OSes that support
>Unicode have already said that UTF-8 is the encoding they will use. The
>beauty of UTF-8 is obviously its compatibility with US-ASCII, which
>makes incorporating the existing domain name space easier.
>
>UTF-5 is a new proposal, which of course means no vendor is going to
>support it at the moment.
>However, it has an elegance in that the final
>Unicode strings are within a limited A-V, 0-9 character set, which means
>it should not break any existing DNS client/server. In fact, if you go
>beyond DNS, I doubt it will break any existing client/server for
>other protocols either. This makes it an extremely compatible UTF.
>
>I agree with Martin that UTF-8 should be the long-term solution.
>However, it should be noted that whichever encoding we use, we
>have to look beyond DNS and think about other related protocols. In
>particular, the main concern currently is email. While it is possible
>to have a domain name in UTF-8 working perfectly, it is not possible
>to have an email address entirely in UTF-8 without an overhaul of the
>SMTP protocol and all the clients and servers. Of course, it can be
>done, but it will take time.
>
>On the other hand, UTF-5 works pretty well and requires minimal changes
>to protocols and software. This makes it an ideal encoding in the
>short run.
>
>Ultimately, it is my belief that a DNS system should be designed
>properly based on UTF-8. But in the meantime, our effort to push for
>UTF-5 is a case of theoretical vs. practical issues. (Theoretically,
>UTF-8 is better. In practice, UTF-5 works and can be implemented
>immediately.)
>
>At various meetings, at APRICOT and at INET, I have tried to get
>agreement on these issues, but I couldn't, since iDNS is so new and
>I had to spend the session explaining the background all over again.
>In fact, people have asked why we are using Unicode in the first place!
>(Unicode v1.0 did a pretty bad job for Asian languages. It is a
>historical problem of how Asian scripts were handled in general when
>Unicode first started.)
>
>Anyway, I am planning to hold an iDNS BoF at the next IETF meeting
>in November under the Applications Area so we can thrash out this issue.
>
>> However, and given that I haven't seen James' proposed new draft yet, and
>> am relying upon your old one, I believe this relies on the ".i" TLD
>> working, doesn't it?
>> I'm afraid I have never been able to give much
>
>Having a new ".i" TLD has its advantages. While France in French may end
>up using .fr and Germany in German may still be .de, Taiwan in Chinese is
>not likely to use .tw, nor is Japan in Japanese going to use .jp. Unless,
>of course, you define your i18n to mean only European languages.
>
>> credence to the possibility of that, and the parallel tree that it implies,
>> is practical in any sense at all. It would mean that to allow anyone to
>> have a UTF-5 name in .AU I would need to create an AU.i domain, and under
>> that COM.AU.i - and unless I were to want to immediately add a large number
>> of lame delegations to the DNS, I'd have to administer it separately, only
>> adding delegations when requested, while somehow attempting to ensure that
>> control of sub-domains rested with the same people who run the equivalent
>> COM.AU domains. Thanks all the same, but no thanks.
>
>There is no conflict such as you describe here. You are imagining things :)
>If a domain name is strictly US-ASCII, then I do not see why you should
>put it under the .i hierarchy. You only use the alternative hierarchy if it
>is a multilingual domain name. In certain ways, I see the separation
>as an advantage for maintenance, and it also helps a little in balancing
>the current hierarchy.
>
>> That is, if it were considered appropriate, the e-mail specs could be
>> left as they are, e-mail addresses could continue to demand ascii domain
>
>So does that mean that while I can have a Chinese domain name, I couldn't
>use my Chinese name as my email name? I think you have missed the point
>of our current effort.
>
>27% of the Japanese population is using the Internet. It is estimated that
>only 10% of Japanese can read and write English, and maybe half of these
>can speak it properly. So what do the rest of the 13% do? They stick to
>entirely Japanese websites and send each other email in Japanese.
>So what are you going to tell the remaining 73% of the Japanese who are
>waiting to join the Net? Too bad, learn English first?
>
>Consider the case of China, with 4 million Internet users, up from half a
>million in 1998. This represents less than 1% of the population, and by
>estimate only about 2% of Chinese users can actually read/write English.
>China may not be too bad, since it is possible to use transliteration via
>Hanyu Pinyin and get a domain name like zhaodaole.com, meaning "findit.com".
>Still, looking at the word "zhaodaole" doesn't really give me any clue that
>it means "findit".
>
>Consider the case of Taiwan, where Hanyu Pinyin isn't even used. So
>"zhaodaole" probably doesn't even make sense to them.
>
>I could go on, but my point is that a half-way solution is not acceptable.
>Either we have i18n of DNS or we don't. And if we do, we should at least
>make sure the other related protocols can be internationalized along with
>the domain names.
>
>> names, and utf-5 encoding, with a .i pseudo-tld could be used to mark
>> them. The DNS specs could say that any domain name ending in ".i" is
>> one which is presented to the resolver in UTF5, and needs to be converted
>> to UTF-8, the ".i" removed, and then looked up in the DNS. This allows
>> the DNS to have the "natural" names, avoids any kind of parallel DNS
>> tree from being required, and also allows a natural progression towards
>> extending the e-mail interface in the future if desired (doesn't tend to
>> lock the world into what currently seems like the only implementable
>> solution, forever).
>
>These are implementation issues.
>
>> In any case, I look forward to seeing James' promised draft, where I'd
>> hope that the rationale for all of this will be set out.
>
>I have already submitted the draft to [EMAIL PROTECTED]. But I
>think you will be disappointed to know that the purpose of the draft is
>to define UTF-5, not to address the iDNS issues.
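Two mechanisms come up in this exchange: UTF-5's restricted A-V, 0-9 alphabet, and the suggestion that a resolver decode any name ending in ".i" from UTF-5, drop the ".i", and look up the result as UTF-8. The minimal Python sketch below makes both concrete. It assumes the UTF-5 scheme works as follows, which should be checked against the draft itself: each code point is written as its hexadecimal digits without leading zeros, the first digit taken from the letters G-V (G = 0 through V = 15) and the remaining digits from 0-9 and A-F. It also assumes that, under the ".i" rule, every label before ".i" (including ASCII ones such as "com") is UTF-5 encoded. The helper names utf5_encode, utf5_decode and name_for_lookup are hypothetical.

```python
# Sketch only: the UTF-5 digit layout below is an assumption, not the draft text.
LEAD = "GHIJKLMNOPQRSTUV"  # first hex digit of each code point (values 0-15)
CONT = "0123456789ABCDEF"  # remaining hex digits


def utf5_encode(text: str) -> str:
    """Write each code point in hex; its first digit comes from LEAD."""
    out = []
    for ch in text:
        digits = format(ord(ch), "X")  # hex without leading zeros, e.g. "53F0"
        out.append(LEAD[int(digits[0], 16)] + digits[1:])
    return "".join(out)


def utf5_decode(label: str) -> str:
    """Inverse of utf5_encode; upper-cases first, since DNS ignores case."""
    code_points = []
    for ch in label.upper():
        if ch in LEAD:  # a lead letter starts a new code point
            code_points.append(LEAD.index(ch))
        elif ch in CONT and code_points:  # a hex digit extends the current one
            code_points[-1] = code_points[-1] * 16 + CONT.index(ch)
        else:
            raise ValueError(f"not a UTF-5 label: {label!r}")
    return "".join(chr(cp) for cp in code_points)


def name_for_lookup(name: str) -> bytes:
    """Hypothetical resolver step for the ".i" suggestion: decode every
    UTF-5 label, drop ".i", and return the natural name as UTF-8."""
    labels = name.rstrip(".").split(".")
    if labels[-1].lower() == "i":
        labels = [utf5_decode(label) for label in labels[:-1]]
    return ".".join(labels).encode("utf-8")


if __name__ == "__main__":
    taiwan = "\u53f0\u7063"  # U+53F0 U+7063
    print(utf5_encode(taiwan))  # L3F0N063
    name = ".".join(utf5_encode(x) for x in [taiwan, "com"]) + ".i"
    print(name)  # L3F0N063.M3MFMD.i
    print(name_for_lookup(name).decode("utf-8"))
```

Because the letters G-V only ever start a character, a decoder can find character boundaries without length fields, and plain ASCII names (no ".i") pass through unchanged.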
>
>As I mentioned above, the iDNS issues need to be addressed regardless of
>whether we use UTF-5 or UTF-8. That is the draft I am still writing, and
>I hope to get it done before the next IETF meeting.
>
>So what are these iDNS issues? A hell of a lot, and most of them weren't
>even addressed in Kwan's draft or even Martin's draft.
>
>Let's start with something you know: the case problem. DÜRST or Dürst?
>I think this was mentioned in Kwan's draft, and the solution is to
>convert it to lower case. :)
>
>Now, let's consider languages outside Europe. For example, Chinese has
>Traditional and Simplified characters, which have different code points
>in Unicode and of course different glyphs, but they mean the same thing.
>For example, Taiwan in Chinese can be written (in Unicode) either as
>U+53F0 U+7063 or U+81FA U+7063. Both mean Taiwan and both are valid.
>Converting between them is unfortunately not possible due to CJK
>unification. This is because what is the Simplified form of a
>Traditional character in Chinese may not be so in Japanese. In fact,
>Japanese users would prefer no such conversion. Korean is simpler, since
>Koreans have already stopped using Chinese ideographs and replaced them
>with Hangul.
>
>Speaking of Japanese, let's take "JAPAN", or NIPPON. It is normally
>written in Kanji using two glyphs. On the other hand, Japanese children
>do not learn Kanji before grade 6. They learn basic Hiragana instead,
>and to them the Hiragana form of NIPPON is "correct".
>
>You can move on to other languages and you will find other problems,
>and they are very different. E.g., Tamil has its "upper" and "lower"
>case glyphs, and unfortunately in Unicode these "upper" and "lower" case
>glyphs can only be obtained via kerneling (as Tamil was not given enough
>code space). This means conversion from "upper" to "lower" case is not
>as simple as A->a.
>
>These are only some of the issues we need to address. If you are
>interested, I strongly suggest you subscribe to our APNG iDNS Working
>Group.
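Python's standard unicodedata module makes the contrast between the case problem and the variant problems easy to see. The sketch below assumes a matching key of NFKC normalization followed by str.casefold(); that choice is purely illustrative and is not taken from either draft, and the function name match_key is hypothetical. The key treats DÜRST and Dürst as equal even when the ü is typed as "u" plus a combining diaeresis, but it leaves both spellings of Taiwan, and the Kanji and Hiragana spellings of NIPPON, distinct.

```python
import unicodedata


def match_key(label: str) -> str:
    """Hypothetical comparison key: NFKC-normalize, then case-fold."""
    return unicodedata.normalize("NFKC", label).casefold()


# Latin case: precomposed U+00DC versus "u" followed by combining U+0308
print(match_key("D\u00dcRST") == match_key("Du\u0308rst"))  # True

# Taiwan: U+53F0 U+7063 versus U+81FA U+7063; neither NFKC nor case
# folding relates U+53F0 to U+81FA
print(match_key("\u53f0\u7063") == match_key("\u81fa\u7063"))  # False

# NIPPON: Kanji U+65E5 U+672C versus Hiragana U+306B U+3063 U+307D U+3093
print(match_key("\u65e5\u672c") == match_key("\u306b\u3063\u307d\u3093"))  # False
```

Folding the two Taiwan spellings together would need an explicit variant table, and since the right table differs between Chinese and Japanese usage, it could not be a language-neutral step like this one.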
>
>Send an email to [EMAIL PROTECTED] with "subscribe" to join the mailing list.
>
>-James Seng
>

--
Richard Sexton | [EMAIL PROTECTED] | http://dns.vrx.net/tech/rootzone
http://killifish.vrx.net
http://www.mbz.org
http://lists.aquaria.net
Bannockburn, Ontario, Canada, 70 & 72 280SE, 83 300SD
+1 (613) 473-1719