Re: IDN and language
> Date: 2005-01-04 12:06
> From: John C Klensin <[EMAIL PROTECTED]>

> the
> IDN situation is not an issue except in a very narrow sense and
> a similar situation would apply to local-parts if we ever do
> something there. In the IDN case, the protocols are written in
> terms of arbitrary Unicode strings and just about have to be --
> there has never been a DNS restriction requiring that the labels
> be names or words in a language. The protocols apply some
> mapping rules that reject a few characters (and hence the labels
> that contain them) and change some characters into others, but
> the net effect is still a set of standards written in terms of
> strings, not languages.

My concern is the distinction between "names" (in the sense used in RFC 1958, i.e. protocol elements) vs. "text" (RFC 2277), and internationalizing domain names seems to make sense only if the domain names are being treated in some way as text (i.e. human-readable content, and therefore (possibly, at least) in some language (see 2277 section 2, third paragraph and section 4.1, first sentence)).

RFC 2277, a.k.a. BCP 18, requires (sect. 4.2) that protocols which transfer text must have provision for carrying language information. The considerations for making provision for language-tagging (N.B. not requiring that every IDN be tagged) are clear when one considers presentation issues for the visually impaired; a screen reader needs to be able to identify language to correctly present at least some subset of labels which might appear in either an IDN or internationalized local-part.

> The situation with local-parts will, most of us are convinced,
> work out in much the same way. There is a long history of
> strings used in local-parts that are not "names", "words", or
> otherwise bound to a particular language. [...]

Agreed, and I'd be happy if local parts and domain names were to be treated purely as protocol elements. But given the decision to internationalize and hence the treatment as text at least for presentation, there ought to be provision for indicating language where necessary for correct presentation.

___
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf
Re: IDN and language
At 23:37 04/01/2005, John Cowan wrote:
> John C Klensin scripsit:
>
> I know that -- I did read 3743 first. But in that case, whatever
> did you mean by "ICANN has created a recommendation [...] that
> languages not be mixed within a label"?

The first question (see my mail of yesterday) is to define what we are talking about. What is a language? You are not talking about the same thing as ICANN.

> How could it? There is no requirement that there be a table for
> every possible language tag, after all; all existing language tags
> remain valid. These tables are just tagged content like any other,
> though the application of the tag is different from the usual
> application.

I do not understand. What is the "usual application"? We are talking about a standard?

jfc
Re: IDN and language
John C Klensin scripsit:

> I suppose there are always exceptions. In particular, the
> recommendations of RFC 3743 are about tables of characters, not
> dictionary lookup.

I know that -- I did read 3743 first. But in that case, whatever did you mean by "ICANN has created a recommendation [...] that languages not be mixed within a label"?

> If, however, a domain decided to adopt a
> canonical dictionary and lookup in it as a registration
> criterion, that rule would be perfectly enforceable.

Certainly. But that is not the same as saying "languages [SHOULD] not be mixed in a label." That is a stricture about linguistic entities, not about entries in a dictionary.

> Other issues occur if the writing order of
> characters in a language obeys specific rules and one chooses to
> enforce them (a potential issue with, e.g., Hangul, although,
> again, the choice of whether or not to try to enforce is up to
> the registry).

This is even more confusing. What languages do *not* impose a specific writing order on their characters?

> It is not clear that the current proposal is much better than 3066
> for handling those cases, but I wonder if anyone has carefully
> evaluated whether it would make things worse.

How could it? There is no requirement that there be a table for every possible language tag, after all; all existing language tags remain valid. These tables are just tagged content like any other, though the application of the tag is different from the usual application.

--
XQuery Blueberry DOM                    John Cowan
Entity parser dot-com                   [EMAIL PROTECTED]
Abstract schemata                       http://www.reutershealth.com
XPointer errata                         http://www.ccil.org/~cowan
Infoset Unicode BOM    --Richard Tobin
Re: IDN and language
--On Tuesday, 04 January, 2005 12:52 -0500 John Cowan <[EMAIL PROTECTED]> wrote:

> John C Klensin scripsit:
>
>> Returning to the DNS/IDN situation, ICANN has created a
>> recommendation for all TLDs, and a requirement on at least
>> some gTLDs, that languages not be mixed within a label and for
>> registration and use of tables similar to those recommended by
>> RFC 3743.
>
> This regulation is going to be completely unenforceable, since
> with a few exceptions (hexagonal French), languages do not
> have bright-line rules saying what words they do and do not
> contain. Are we to be in the position of saying that
> eigenvector.com may be registered (and is) because the word
> appears in dictionaries, whereas eigenevent.com is ruled out
> because it "mixes" English and German?

John,

I am sure that ICANN would welcome your participation as the various rules/guidelines evolve -- those rules are not an IETF problem, even though changes to the standard that is used to label them might be. One of the things their processes have in common with the IETF is that they prefer that people actually try to read and understand documents before attacking them, but I suppose there are always exceptions. In particular, the recommendations of RFC 3743 are about tables of characters, not dictionary lookup. If, however, a domain decided to adopt a canonical dictionary and lookup in it as a registration criterion, that rule would be perfectly enforceable. I'd recommend against it for many reasons, but this would be more or less up to them.

> Forbidding the mixing of scripts is another matter, although
> in fact some languages are written using more than one
> (Unicode) script.

Whether those languages are a problem or not in the DNS context depends on whether one wishes to permit a single label to use both (or all three in at least a few cases I know of) scripts. Again a per-registry decision and again perfectly enforceable either way.

Other issues occur if the writing order of characters in a language obeys specific rules and one chooses to enforce them (a potential issue with, e.g., Hangul, although, again, the choice of whether or not to try to enforce is up to the registry). But one of the notational problems with using 3066 would be a rule that one can have a label that contains the characters of a given language written in, e.g., either a modified Arabic script or a modified Cyrillic one but not in a modified Roman ("Latin") one. Another issue arises when one wants to permit a character collection that includes the characters from a given script that are used by two separate languages -- not all of the characters of that script, but exactly those characters that fall into the union of the characters from the script used by the relevant languages.

It is not clear that the current proposal is much better than 3066 for handling those cases, but I wonder if anyone has carefully evaluated whether it would make things worse.

john
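
A minimal sketch of the "union of characters used by two languages" case mentioned above, assuming a registry keeps one character table per language and builds a zone's permitted collection from them. The table contents, the `TABLES` mapping, and the helper names `zone_table` and `label_permitted` are hypothetical and deliberately incomplete; real RFC 3743 tables also carry variant information that is not modelled here.

```python
import string

# Characters every table in this sketch allows: a-z, 0-9 and hyphen.
BASE = set(string.ascii_lowercase + string.digits + "-")

# Hypothetical, incomplete per-language tables keyed by a 3066-style tag.
TABLES = {
    # a-umlaut, o-umlaut, u-umlaut
    "de": BASE | set("\u00e4\u00f6\u00fc"),
    # a-grave, a-circumflex, c-cedilla, e-acute, e-grave, e-circumflex
    "fr": BASE | set("\u00e0\u00e2\u00e7\u00e9\u00e8\u00ea"),
}


def zone_table(*lang_tags):
    """Return the union of the character tables for the given tags."""
    table = set()
    for tag in lang_tags:
        table |= TABLES[tag]
    return table


def label_permitted(label, table):
    """True if every character of the label appears in the table."""
    return all(ch in table for ch in label)


if __name__ == "__main__":
    label = "k\u00f6ln-caf\u00e9"  # uses o-umlaut and e-acute
    print(label_permitted(label, TABLES["de"]))
    print(label_permitted(label, TABLES["fr"]))
    print(label_permitted(label, zone_table("de", "fr")))
```

The check itself is purely a test on characters, which is why it is enforceable; the difficulty raised in the message is how to name such a union with a single 3066 code, not how to evaluate it.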
Re: IDN and language
At 18:06 04/01/2005, John C Klensin wrote:
> Returning to the DNS/IDN situation, ICANN has created a
> recommendation for all TLDs, and a requirement on at least some
> gTLDs, that languages not be mixed within a label and for
> registration and use of tables similar to those recommended by
> RFC 3743. Those tables are identified by a combination of the
> Domain name associated with the registering TLD registry and a
> 3066 code. That system is not, IMO, working especially well and
> the 3066 code model will, I think, have to be extended to deal
> with some unusual situations. But, interestingly,
> draft-phillips... doesn't appear to solve that particular problem:
> what is needed is a way to specify odd mixtures of languages
> and/or scripts that may be appropriate to a particular zone, and
> that means less specificity and more linguistically-strange
> constructions, not more specificity and structure.

The real problem is the confusion all this introduces, because this is not a consensus draft from an IETF WG working under an IAB-approved charter. That is odd when the RFC under discussion was authored by the IESG Chair and the private mailing list is hosted under his name with the name "ietf-language", which is confusing to many.

At this stage, we can only say that there is no consensus on what is being discussed, on the problems to solve, or on the proposed solutions. But there is no reason why such a consensus could not be reached once the charter I outlined yesterday has been carried.

jfc
Re: IDN and language
> ruled out because it "mixes" English and German?

Sorry, I can't resist: like in EdelWeb.fr
Re: IDN and language
John C Klensin scripsit:

> Returning to the DNS/IDN situation, ICANN has created a
> recommendation for all TLDs, and a requirement on at least some
> gTLDs, that languages not be mixed within a label and for
> registration and use of tables similar to those recommended by
> RFC 3743.

This regulation is going to be completely unenforceable, since with a few exceptions (hexagonal French), languages do not have bright-line rules saying what words they do and do not contain. Are we to be in the position of saying that eigenvector.com may be registered (and is) because the word appears in dictionaries, whereas eigenevent.com is ruled out because it "mixes" English and German?

Forbidding the mixing of scripts is another matter, although in fact some languages are written using more than one (Unicode) script.

--
"And it was said that ever after, if any        John Cowan
man looked in that Stone, unless he had a       [EMAIL PROTECTED]
great strength of will to turn it to other      www.ccil.org/~cowan
purpose, he saw only two aged hands withering   www.reutershealth.com
in flame."   --"The Pyre of Denethor"
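
To make the contrast between a language rule and a script rule concrete, here is a rough sketch of a script-mixing check. Python's standard library does not expose the Unicode Script property, so the sketch approximates it with the first word of each character's Unicode name ("LATIN", "CYRILLIC", and so on); that heuristic, and the function names `scripts_in_label` and `is_single_script`, are assumptions made for illustration and are not part of any registry's actual rules.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts used by the letters of a label.

    Heuristic: take the first word of the Unicode character name.
    Digits and hyphens are ignored because they are not letters.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_single_script(label: str) -> bool:
    """True if the letters of the label come from at most one script."""
    return len(scripts_in_label(label)) <= 1


if __name__ == "__main__":
    # One script, whatever one thinks about its "language".
    print(scripts_in_label("eigenevent"))
    # Second letter is CYRILLIC SMALL LETTER A (U+0430), not Latin "a".
    print(scripts_in_label("p\u0430ypal"))
```

A check like this never needs to decide whether "eigenevent" is English or German, which is the sense in which a script rule is mechanically enforceable while a language rule is not.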
Re: IDN and language
--On Tuesday, 04 January, 2005 09:38 -0500 Bruce Lilly <[EMAIL PROTECTED]> wrote:

>> One is not. Domain names are strings of characters; only
>> incidentally do they spell out one or more words in one or
>> more languages. I doubt whether the names "Google," "Yahoo,"
>> and "AltaVista" can be pinned down as belonging to one
>> specific language.
>
> I was referring specifically to internationalized domain names
> (IDN, RFCs 3490, 3491, 3492, 3743) where the on-the-wire
> domain name continues to be of traditional form (ANSI X3.4
> letters, digits, and hyphen (with restrictions on combinations
> and placement)), but where a certain class of names (those
> beginning with "xn--") are "internationalized" and might be
> presented to users in a different form (which can include
> non-ASCII characters). That came about because of the
> tendency to associate a domain name (tag) with a natural
> language "name" or legally-registered name (trademark, etc.).
> Whether one considers such associations logical or
> irrational, that is what has happened. So one could have
> a domain name (beginning with xn--) that is presented by
> an application as "Nestlé.com". Now certainly some names,
> such as your examples, Kodak, Häagen-Dazs, etc. have no
> language (because they are made-up strings of characters),
> but others do have a specific language. In skimming through
> the RFCs mentioned above, it appears that there is now some
> provision for language tagging (which was not present in
> earlier versions of IDN). However, I have not thoroughly
> reviewed those recent additions; therefore it should be
> clear that I have not reviewed the impact of the proposed
> draft changes on IDN or vice versa. Such a review should
> take place (ideally before the deadline for the New Last
> Call on draft-phillips-langtags-08 (tomorrow!)), but I'm
> not the person to do so as I have only slight interest in
> IDN (I'm one of those who considers associating a tag
> with natural language and/or legally registered names to
> be irrational). One potential issue is that domain names
> are case-insensitive, and whether lower-case accented
> characters map to/compare with unaccented upper-case
> letters may be a function of language (or culture, or
> political fiat).
>...
> I would add that there is apparently some discussion of
> wreaking similar havoc on local-parts, which appear in
> message-identifiers and email mailbox identifiers (STD 11).
> That too should be evaluated w.r.t. specification of
> language and the proposed changes.

Bruce,

While I'm sympathetic to many of the points you have raised, the IDN situation is not an issue except in a very narrow sense and a similar situation would apply to local-parts if we ever do something there. In the IDN case, the protocols are written in terms of arbitrary Unicode strings and just about have to be -- there has never been a DNS restriction requiring that the labels be names or words in a language. The protocols apply some mapping rules that reject a few characters (and hence the labels that contain them) and change some characters into others, but the net effect is still a set of standards written in terms of strings, not languages.

There has been a good deal of concern in the DNS community about the potential for deliberately or accidentally misleading users about domain names and the consequent opportunities for confusion or outright fraud. Those concerns have led to a good deal of work on restrictions about what strings can be registered, imposing, e.g., rules that the holder of one string may be the only permitted holder of a related one and rules that prohibit mixing scripts within a single label. These types of rules, especially the latter, are the "very narrow sense" mentioned above, but they have no impact on the protocols themselves. The registration rules actually differ from zone to zone and can safely do so because, to the user of the DNS, an unregistered name is an unregistered name and the distinction as to whether a name is unregistered because no one wanted it or because some subtle rule prohibited its registration is not of importance.

The situation with local-parts will, most of us are convinced, work out in much the same way. There is a long history of strings used in local-parts that are not "names", "words", or otherwise bound to a particular language. Worse, different destination systems apply different internal syntax rules and interpretations to local-part strings. Protocols will need to be designed to reflect that history and avoid unreasonable restrictions. At the same time, I would expect the administrators of a given local system to impose restrictions on what local-parts can be used for mailboxes there (just as is often done today). Those restrictions may, in many cases, reflect assumptions about languages and/or scripts but, since they are purely local conventions, there is no need for external registration.

Returning to the DNS/
Re: IDN and language
> Re: draft-phillips-langtags-08, process, specifications, "stability", and
> extensions
> Date: 2005-01-01 19:56
> From: "Doug Ewell" <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
>
> Bruce Lilly wrote:
> > Domain names and
> > language tags are different types of names, used for
> > different purposes, and with different scope (largely
> > non-overlapping, though one might legitimately ask how
> > one is supposed to determine the language of an
> > "internationalized" domain name...)
>
> One is not. Domain names are strings of characters; only incidentally
> do they spell out one or more words in one or more languages. I doubt
> whether the names "Google," "Yahoo," and "AltaVista" can be pinned down
> as belonging to one specific language.

I was referring specifically to internationalized domain names (IDN, RFCs 3490, 3491, 3492, 3743) where the on-the-wire domain name continues to be of traditional form (ANSI X3.4 letters, digits, and hyphen (with restrictions on combinations and placement)), but where a certain class of names (those beginning with "xn--") are "internationalized" and might be presented to users in a different form (which can include non-ASCII characters). That came about because of the tendency to associate a domain name (tag) with a natural language "name" or legally-registered name (trademark, etc.). Whether one considers such associations logical or irrational, that is what has happened. So one could have a domain name (beginning with xn--) that is presented by an application as "Nestlé.com". Now certainly some names, such as your examples, Kodak, Häagen-Dazs, etc. have no language (because they are made-up strings of characters), but others do have a specific language.

In skimming through the RFCs mentioned above, it appears that there is now some provision for language tagging (which was not present in earlier versions of IDN). However, I have not thoroughly reviewed those recent additions; therefore it should be clear that I have not reviewed the impact of the proposed draft changes on IDN or vice versa. Such a review should take place (ideally before the deadline for the New Last Call on draft-phillips-langtags-08 (tomorrow!)), but I'm not the person to do so as I have only slight interest in IDN (I'm one of those who considers associating a tag with natural language and/or legally registered names to be irrational). One potential issue is that domain names are case-insensitive, and whether lower-case accented characters map to/compare with unaccented upper-case letters may be a function of language (or culture, or political fiat).

I would add that there is apparently some discussion of wreaking similar havoc on local-parts, which appear in message-identifiers and email mailbox identifiers (STD 11). That too should be evaluated w.r.t. specification of language and the proposed changes.
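
As a rough illustration of the "xn--" mechanism described above, the following sketch uses the `idna` codec from Python's standard library, which implements the RFC 3490 (IDNA 2003) conversions. The helper names `to_wire` and `to_display` are made up for this example and do not appear in the RFCs.

```python
def to_wire(name: str) -> bytes:
    """Convert a presentation-form name to its ASCII on-the-wire form."""
    return name.encode("idna")


def to_display(wire: bytes) -> str:
    """Convert an on-the-wire name back to a presentation form."""
    return wire.decode("idna")


if __name__ == "__main__":
    # "Nestlé.com", written with an escape for the e-acute.
    wire = to_wire("Nestl\u00e9.com")
    print(wire)               # ASCII only; the first label starts with xn--
    print(to_display(wire))   # comes back lower-cased
```

The round trip returns the name in lower case because the mapping step folds case by fixed tables, without reference to any language, and the encoded label has no room for a language tag. Both points bear on the case-comparison and language-tagging questions raised in the message.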