Responding to several messages together, rather than sending a series of fragmented messages...
Folks,

I'm pleased to see so much interest on this list, but note that the issue has already been extensively discussed on the IETF list and elsewhere. I'm copying Tina Dam, who has the IDN lead for ICANN, on this message; if there is further traffic, you might want to include her as well (or, for the reasons discussed below, she might want to move part of the conversation to an appropriate ICANN list).

--On Wednesday, 16 February, 2005 15:09 +0100 "JFC (Jefsey) Morfin" <[EMAIL PROTECTED]> wrote:

>...
> Thank you for your response. Let assume this list is the
> missing International coordination list.

While I think that an "International coordination list" would be helpful, my assumption is that ICANN's idn-discuss list (see http://www.icann.org/topics/idn.html for subscription information and some other information that might be useful) is intended to serve that purpose. It is pretty clearly not this, or any other, IETF list: the IETF made a series of fairly explicit decisions to not get into the business of deciding what things could or could not be registered within the very broad coding mechanisms specified as part of IDNA.

> As you know we are
> in a gray situation regarding the IANA and language tags.
> There is the RFC 3066 which accepts ISO 639 as a reference and
> which permits the language "specialists" (ISO and W3C) of the
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
> mailing list to discuss missing languages. This is clean.
>
> But then we have a confusion.
>
> 1. When you register a IANA tag, you are to register it by
> language, script and ccTLD. This is clear.

Actually, that is not the ICANN/IANA requirement, nor is it clear where their requirement applies (e.g., such registrations are optional). Their requirement is unclear and, IMO, needs updating. I hope that they will get to that updating process RSN. On the other hand, I have had that hope for well over a year.
Progress is unlikely to be made on those subjects by further discussion here.

>...
> 1. internationlisation: this is the IDN and the ccTLD (quoted
> in the language/script/ccTLD sequence is the authority). So
>...

Jefsey, as you know, we do not agree on your analysis of language and cultural issues and how they relate to the Internet's protocol suites or the DNS, and we agree even less on your model of how the DNS works and its role in the world. We also disagree on what appears to be your desire --exhibited by citing OPES as an Internet norm-- to move all sorts of issues from the edges of the network to the center. For me, one of the major strengths of the Internet, and a key reason it has been deployed worldwide and achieved as much penetration as it has, is that it strongly resembles an "edge"-based network and has avoided the many traps associated with a design in which significant functionality, especially significant applications functionality, is located in between peers or clients and servers. I won't reprise the rest of my side of that disagreement here.

In a separate note, on Tuesday, February 15, 2005 7:44 PM, you wrote, in part...

>> Should it not be supported on the IANA server and common to
>> all the gTLDs?

I agree with Pat that commonality --among gTLDs and even more generally-- would be a good thing, especially for the would-be registrant who wishes to register the same name in multiple domains. But ICANN has so far chosen not to try to impose it, and I hope they don't. We have so far discovered at least two things that may argue for caution in this area:

(i) As I trust everyone on this list knows, phishing is only part of a far more general set of issues that can cause end-user confusion (whether through accident or malice).
For any given domain, there is a tradeoff between maximum safety (which might require permitting only a small number of characters and imposing significant restrictions on how they are used) and maximum registration flexibility (which might argue for much more flexible rules). In my personal opinion, at least, it is far better for the Internet to let domains compete on how protective or flexible they want to be, assuming the advantages and risks of whatever solution point they pick, than to try to impose some Procrustean solution.

(ii) An interesting distinction has been identified between the needs of a domain that must serve the requirements of a particular country and a domain that supports the language commonly associated with that country. For the first case, of which .DE is the best-worked-out example, there is a legitimate requirement for registration of common names, company names, street names, etc., in Germany. Given history, that list will include strings and characters that don't exist in the German language. It may include strings that contain combinations of characters that do not appear together in any contemporary language that uses Roman-based characters. By contrast, if a gTLD creates a language table defined around the German language, many of the characters needed by .DE are simply invalid. That contrast (which Martin has identified in the form of the difference between the "German" tables used in the TLDs for Austria and Switzerland relative to those used in Germany) may require taking a different look at the rules and guidelines (and table registration models) than we have heretofore taken: either rather different guidelines for ccTLDs than for gTLDs, or a rethinking of the registration model, or both.
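To make the contrast in (ii) concrete, here is a minimal sketch of how the same label can pass a country-oriented table and fail a language-oriented one. Both tables and the helper names (`LANGUAGE_TABLE`, `COUNTRY_TABLE`, `permitted`, `rejected_chars`) are invented for illustration; they are assumptions, not any registry's actual table or policy.

```python
import string
import unicodedata

# Hypothetical tables, for illustration only.
BASE = set(string.ascii_lowercase + string.digits + "-")
# A table built around a single language: base characters plus its umlauts.
LANGUAGE_TABLE = BASE | set(unicodedata.normalize("NFC", "äöü"))
# A table for a country's needs: also admits characters found in names
# used there, even though they are not part of the language itself.
COUNTRY_TABLE = LANGUAGE_TABLE | set(unicodedata.normalize("NFC", "éèàçñøå"))


def _prepare(label: str) -> str:
    """Normalize to NFC and lower-case before comparing against a table."""
    return unicodedata.normalize("NFC", label).lower()


def permitted(label: str, table: set) -> bool:
    """True if every character of the prepared label is in the table."""
    return all(ch in table for ch in _prepare(label))


def rejected_chars(label: str, table: set) -> list:
    """Characters of the label that the table does not allow, sorted."""
    return sorted(set(_prepare(label)) - table)


if __name__ == "__main__":
    for label in ("müller", "café"):
        print(label,
              "language table:", permitted(label, LANGUAGE_TABLE),
              "country table:", permitted(label, COUNTRY_TABLE))
```

The point of the sketch is only that the policy lives in the table: a narrower table is safer, a wider one is more flexible, and each zone picks its own.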
The issue that Pat identifies with Tajik is another piece of the same puzzle: many of us may believe that there is no possible reason to mix the three scripts in which that language can be written in a single label, and I certainly trust Roozbeh's knowledge and experience in that area. Certainly, it would make things safer to prohibit any mixing (note that IDNA's BIDI restrictions essentially prohibit mixing an Arabic-derived script with anything other than itself, another Arabic-derived script, or Hebrew). However, we have a long history of DNS labels that could not possibly be words in any language. Whether or not to permit mixed-script labels is presumably an issue that the registry for .TJ will need to sort out (I have been told, for example, that mixed Cyrillic and Latin-character labels are likely to be a requirement in Serbia and Montenegro, although this illustration might give them pause). And the best answer for them might or might not be the best answer for a gTLD.

In addition, as Hotta-san's very helpful note points out, one could considerably reduce the scope of the identified confusion/phishing problems by aggressively applying a variant model across scripts, restricting the registration of homographs to the same registrant. I personally suspect that will not prove practical, from a policy standpoint, in the collection of alphabetic scripts that share Old Semitic origins, but that is, IMO, just another argument for giving different registries the flexibility to develop their own policies and take responsibility for the consequences of those policies.

Again, these issues need to be worked out in an ICANN forum; the IETF has thrown the problem over the wall and shows no signs of wanting it back.

--On Wednesday, 16 February, 2005 08:07 +0100 "\"Martin v. Löwis\"" <[EMAIL PROTECTED]> wrote, responding to Soobok Lee:

>> All Cyrillic label "HP" (.com) can be registered even in
>> Russian language pack.
>>
>> Cyrillic "HP".COM in its uppercase form looks the same as
>> all ASCII "HP.COM".
>>
>> Any Registration Process should filter out these "HP" like
>> combinations..

But the only way to do that would require that a domain that permits Cyrillic characters must ban ASCII characters and vice versa. I would predict that will just never happen, if only because every domain that exists today has a long list of all-ASCII labels in it. It is not a very good example (see below), but I note that this particular example is only one of the reasons why "identify a mixed-script label in the application" may be a useful tool, but is not a solution -- this is not a mixed-script label.

> I think this is unreasonable. The lower-case forms ("нр" vs
> "hp")
> look quite differently, and browsers typically display domain
> names in all-lower in the address bar.

Regardless of what browsers do (there is a case to be made that, for traditional labels, if they get an all-upper-case label back from a DNS query and display it in lower case, they are violating the intent of the spec), note that, for IDNs, mapping through IDNA and back (ToUnicode followed by ToASCII) will always result in lower case and application of several other mappings.

>...
> just because they are homograph with a latin combination. Then,
> the same would apply to Greek vs. Cyrillic.

Yes, banning a mixture of Latin characters with those of any other script won't work either, because one can get homographs among other pairs of scripts. However, it might be realistic to ban the combination of Greek and Cyrillic in the same zone, while it would not be practical to ban the presence of Roman-based characters with either. Mostly, again, this points out the importance of zone-specific policies that are well-tailored to the needs of that particular zone.

To repeat what I have said on the IETF list and elsewhere, nothing is going to make these issues 100% foolproof or easy. A number of tools may help.
Certainly, carefully designed restrictions on what can be registered in a particular zone and what characters can be used together in the same label will help. Intelligent and well-thought-out use of variant models may help a good deal. If labels that can be identified as having no use other than to confuse or defraud can be rejected at registration time, that would eliminate a lot of problems down the line -- that may be practical in some cases but not in others. I'm a bit skeptical about identification of mixed-script labels in applications, not because I think it wouldn't be useful, but because carrying those tables around and keeping them up to date could be a bar to implementation and performance. There may, however, be ways to make it practical.

I think users --both those whose preferred scripts are written in Roman-based characters and those whose preferred scripts are not-- are going to need to become educated about some of these issues, not just to protect themselves but to understand when IDNs or IRIs can be exchanged with others with high odds that they will be usable and unambiguous. I think there is a lot of potential in distinctions like the Firefox one between "copy link location" / "paste link" and "copy" / "paste". We may discover that the former pair should convert to punycode and URIs, and back to UTF-8 (or whatever) and IRIs, to prevent inter-application and inter-system cut-and-paste problems.

And I hope we can all figure out a way to work together to make this work. It is important: "don't use IDNs" isn't an answer now and never has been, and the alternatives are just a choice among ways to fragment the Internet.

    john
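As a postscript on the mixed-script point above, the following rough sketch shows why a mixed-script check does not catch the all-Cyrillic "HP" example, and what the IDNA mapping does to its case. It rests on two assumptions: the first word of a character's Unicode name is used as a stand-in for its script (Python's standard library does not expose the real Script property, so this is a heuristic only), and Python's built-in "idna" codec, which implements the IDNA 2003 operations, is used for the round trip. The function names are invented for the example.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the set of scripts used by the letters in a label.

    Heuristic (an assumption, not a conformant script lookup): take the
    first word of each letter's Unicode name, e.g. "LATIN" or "CYRILLIC".
    Digits and hyphens are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def is_mixed_script(label: str) -> bool:
    """True if the label's letters come from more than one script."""
    return len(scripts_in(label)) > 1


ascii_hp = "HP"
cyrillic_hp = "\u041d\u0420"  # CYRILLIC CAPITAL LETTER EN, CAPITAL LETTER ER
mixed_hp = "H\u0420"          # Latin H followed by Cyrillic ER

# Round trip through the IDNA 2003 codec: the encoded form is an ACE
# ("xn--") label, and decoding it yields the lower-case Cyrillic letters.
ace = cyrillic_hp.encode("idna")
roundtrip = ace.decode("idna")

if __name__ == "__main__":
    for label in (ascii_hp, cyrillic_hp, mixed_hp):
        print(repr(label), sorted(scripts_in(label)), is_mixed_script(label))
    print(ace, repr(roundtrip))
```

The all-Cyrillic label reports a single script, so only a zone-level policy (or a variant table) can deal with it; a per-label mixed-script test flags the third label and nothing else.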
