Re: Will Language Wars Balkanize the Web?
At 03:49 AM 12/8/00 +0859, Masataka Ohta wrote:
>However, they can't justify calling them internationalization.

Precisely.
end to end (Re: Will Language Wars Balkanize the Web?)
At 06:21 PM 12/6/00 +, Graham Klyne wrote:
>BTW, the basic tenet of end-to-end connectivity of data and services is, I
>think, satisfied by the IP layer. Part of my question was about the
>extent to which this end-to-end-ness needs to be duplicated at higher layers.

Not sure whether this is a distraction -- hence the modified Subject -- but I do NOT consider an end-to-end mechanism at one level to be sufficient when talking about end-to-end at another level. Lower layers must support the e2e requirements of the layer under discussion, but those lower layers do not satisfy the requirements by themselves. If the layer under discussion, in this case the DNS application, does not support e2e, then the fact that IP does, does not buy much.

d/
=-=-=-=-=
Dave Crocker <[EMAIL PROTECTED]>
Brandenburg Consulting
Tel: +1.408.246.8253, Fax: +1.408.273.6464
Re: Internationalization and the IETF (Re: Will Language Wars Balkanize the Web?)
> If the world had asked you or me to design an international
> language, I think either of us would have done better.

Don't be too sure. Even today, there are no more speakers of Esperanto than of Mayan.
Re: Internationalization and the IETF (Re: Will Language Wars Balkanize the Web?)
> From: Henk Langeveld <[EMAIL PROTECTED]>
>
> You know, it isn't that long ago that I realised that for many Americans,
> "International" is synonymous with "Non-American".

That is as true as the observation that many who learn English as a second language think that "international" is synonymous with using the language of their few dozen million countrymen. It is a fact that the single international language of the late 20th and early 21st centuries is far more closely related to a subset of American English than to any other local language. It is also a fact that only during my lifetime has that odd situation developed. If the world had asked you or me to design an international language, I think either of us would have done better. But the first fact is all that matters. If it makes you feel better, note that just as Latin was not exactly what Italians spoke, the current international language is not exactly what is spoken by citizens of the largest nation that calls itself The United States of America (there are >1) and whose mother tongue is English. Thanks to satellite TV and other forms of what the P.C. call cultural imperialism, the modern differences are small, but they exist.

> From: Dave Crocker <[EMAIL PROTECTED]>
> >Diacritical marks are no different from Cyrillic, Arabic, Greek, Hebrew,
> >Sanskrit, and other non-Latin character sets in not being part of
> >the international language. The goal of communicating is to communicate,
> >not wave flags in support of national languages.
>
> In a sense, Harald's observation points out a case in which all those other
> sets very much ARE part of the "international" language.

If those are part of your "international language," then what characters are not part of it? It is Politically Correct to pretend we all speak, read, and write a single language, but also hopelessly silly.

> It does not matter whether readers understood the semantics of the strings;
> they needed to be able to see them.
> That is not national flag waving.
> That is global utility.

"Global unity" is a matter of everyone being able to communicate with everyone else. It not only has nothing to do with each of us using our favorite set of glyphs, but goes against it. Each of us using our favorite language *internationally* is a real Tower of Babel. Being able to use strings is not only a matter of being able to type their characters. Those of us who have studied languages with alphabets other than those we learned while young have discovered that just as the human ear has difficulty hearing sounds outside our mother tongues, the human eye has trouble seeing foreign glyphs. If they're not yours, all of those diacritical marks look the same or are invisible. There are good reasons why the international lingua francas of previous millennia have forced people to transliterate their native writings instead of importing them wholesale. MIME and 8-bit domain names are mechanisms for importing wholesale instead of transliterating. They're good *locally*, but not *internationally*.

> ...
> Technical standards work often gets distracted by trying to deal with
> issues that are outside the scope of reasonable technical standards
> work. It should not be the task of such work to dictate or constrain users
> to only socially acceptable behavior. That is a social task, not a
> technical one.

Yes. So why do otherwise rational IETF participants claim that social and political notions such as "global unity" are somehow related to MIME and IDN? MIME and localized domain names are good and necessary, but only locally or provincially, even when "locally" involves vast land areas (e.g. Russian or Spanish) or billions of people.

> Choosing to send various types of data requires making decisions about the
> context. No technical standard can be designed to "automatically"
> determine when it is, or is not, appropriate to send that data, whether it
> is diacritical marks, kanji, or an Excel spreadsheet.
> Even when the
> sender has information about recipient capabilities, social factors affect
> the choices.

Yes, so why do some MIME and *localized* domain name advocates claim otherwise? What is the pathology behind insisting that sending MIME to international mailing lists makes sense? Why do apparently rational people claim that 8-bit binary domain names are "international"? Because they've been infected with Political Correctness, or because they don't want to dilute political support among the unthinking for whatever they're advocating?

> ...
> At least the recipient has the unintelligible data well isolated and
> labeled. MIME did its job.

Yes, but the justification of the sender for using MIME to send unintelligible data is crazy, since communication is averted while resources, including the human recipient's time, are wasted.

> ...
> The question is whether a coherent extension to DNS will be done in a
> fashion which will keep the DNS integrated, or whether
Re: Will Language Wars Balkanize the Web?
Keith;

> > > you missed it. Suppose you could not exchange in commerce with a person of
> > > a given nationality, not because you did not have a language in common with
> > > him or her, but because your system could not interpret his or her name.
> > > That would mean that you could not spend money in that person's direction,
> > > because you could not communicate with him or her.
> >
> > And it means that person is at a disadvantage in your marketspace, and
> > that it's not your problem.
>
> why in the world do people think they can justify or not justify actions
> based on whether something is an advantage/disadvantage in some
> "marketspace"?

They can justify them locally within local marketspaces, of course. However, they can't justify calling them internationalization.

Masataka Ohta
Re: Will Language Wars Balkanize the Web?
> > you missed it. Suppose you could not exchange in commerce with a person of
> > a given nationality, not because you did not have a language in common with
> > him or her, but because your system could not interpret his or her name.
> > That would mean that you could not spend money in that person's direction,
> > because you could not communicate with him or her.
>
> And it means that person is at a disadvantage in your marketspace, and
> that it's not your problem.

Why in the world do people think they can justify or not justify actions based on whether something is an advantage/disadvantage in some "marketspace"?

Keith
Re: Internationalization and the IETF (Re: Will Language Wars Balkanize the Web?)
Date: Thu, 07 Dec 2000 07:23:11 -0500
From: Dave Crocker <[EMAIL PROTECTED]>

> At least the recipient has the unintelligible data well isolated and
> labeled. MIME did its job.

Indeed. If I get a mail message which is in HTML only, 99.97% of the time it's SPAM-mail. And I've lost count of how many times I've received Chinese (or other Asian language) SPAM-mail. In fact, I'm seriously thinking about coding up a rule which automatically junks HTML mail unread. I guess MIME is useful for something. :-)

- Ted
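[The rule Ted describes is easy to prototype. A minimal sketch -- hypothetical, not any filter he actually wrote -- using Python's standard email package; it marks a message as junk when every body part is text/html, which is exactly what MIME's labeling makes possible:]

```python
# Sketch of a "junk HTML-only mail unread" rule, relying on MIME's
# Content-Type labels. Hypothetical example, not a production filter.
from email import message_from_string

def is_html_only(raw: str) -> bool:
    """Return True when every leaf body part is labeled text/html."""
    msg = message_from_string(raw)
    types = [part.get_content_type() for part in msg.walk()
             if not part.is_multipart()]
    return types != [] and all(t == "text/html" for t in types)

raw_spam = (
    "From: someone@example.com\n"
    "Content-Type: text/html\n"
    "\n"
    "<html><body>Buy now!</body></html>\n"
)
# is_html_only(raw_spam) is True: file it in the junk folder unread.
```

[A message carrying a text/plain alternative would pass the filter, since at least one leaf part is not text/html.]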
Re: Internationalization and the IETF (Re: Will Language Wars Balkanize the Web?)
Keith Moore wrote:
> Furthermore, a
> great many people use multiple languages (not necessarily including
> English), so that a given person, host, or subnetwork will often
> need to exist in multiple (potentially competing) locales at once.

Sometimes even in the same sentence. My mother grew up partly in Quebec; when she's talking to her siblings, they'll often use French words when the English ones don't come to mind immediately.

--
John Stracke | Chief Scientist, eCal Corp. | http://www.ecal.com | [EMAIL PROTECTED]
My opinions are my own. | How many roads must a man walk down before he admits he is LOST?
Re: Will Language Wars Balkanize the Web?
At 08:15 AM 12/4/00 -0500, Dave Crocker wrote:
>On the other hand, this thread was triggered by Graham's question about
>the negative impact of partitioning. The postal example would seem to
>show that the effect is not so bad.
>
>Except I would claim that it is not partitioning. Note that an address
>always has a global representation, in addition to a possibly different
>local one.

You're right, it's not strictly partitioning...

>Perhaps that can be reconciled as easily as claiming that any 'local' domain
>name must also have a global form? (But, somehow, the word "scaling" gets
>in the way of believing that.)

... when I asked that question, I had in mind something like what Tim Berners-Lee presented at the WWW5 conference in 1996, in which connectivity between communities might be seen as having a fractal structure, with groupings and lines of communication between groups visible at a range of scales. I think this is, in part, how people achieve flexible scalability in their communications. (Similar patterns also arise in natural phenomena.)

#g

--
BTW, the basic tenet of end-to-end connectivity of data and services is, I think, satisfied by the IP layer. Part of my question was about the extent to which this end-to-end-ness needs to be duplicated at higher layers.

Graham Klyne ([EMAIL PROTECTED])
Re: Internationalization and the IETF (Re: Will Language Wars Balkanize the Web?)
The notion that use of languages other than English can or should be 'localized' strikes me as both shockingly arrogant and hopelessly naive. People can and will use their own languages on the Internet - in email, on the web, and in domain names - without regard to their location in either the physical world, the current topology of the network, or the TLD of the host they are using at the moment. Furthermore, a great many people use multiple languages (not necessarily including English), so that a given person, host, or subnetwork will often need to exist in multiple (potentially competing) locales at once. And while a great many people - who speak only a single language, and whose travels are confined to a small geographic area where others speak only that language - might indeed be happy with a localized solution, adoption of purely localized solutions would impair the vast number of people who do not fall into that category.

The question is not whether people will use non-ASCII characters in domain names, but whether the various uses of non-ASCII characters will coexist peacefully with each other and with existing applications, and whether applications will continue to interoperate with one another. So while it's quite important that IDNs be able to be represented in ASCII for compatibility with existing applications and the large number of protocols that use DNS names as protocol elements, and even though we all understand that pure-ASCII, Romanized versions of non-English names will continue to enjoy wide use -- we still need to produce an IDN standard as soon as possible. Fortunately the IDN group is making very good progress, and I'm confident that consensus around a concrete proposal will soon emerge.

Keith
Re: Will Language Wars Balkanize the Web?
At 06:06 PM 12/3/00 -0500, Betsy Brennan wrote:
>But the Internet is not the postal system nor the phone system. We already
>have the postal system and the phone system. They may be slower, but does
>that mean they should be replaced or that the Internet must duplicate what
>these systems do? BLB

You missed it. Suppose you could not exchange in commerce with a person of a given nationality, not because you did not have a language in common with him or her, but because your system could not interpret his or her name. That would mean that you could not spend money in that person's direction, because you could not communicate with him or her. Although IP datagrams could get from you to him/her, there would not be a good way to determine what address to send them to. That would be pretty tough.
Re: Internationalization and the IETF (Re: Will Language Wars Balkanize the Web?)
At 01:58 AM 12/7/00 -0700, Vernon Schryver wrote:
> > From: Harald Alvestrand <[EMAIL PROTECTED]>
> > it may have escaped the notice of some that a fair bit of the discussion on
> > diacritics was carried out using live examples,
>
>Diacritical marks are no different from Cyrillic, Arabic, Greek, Hebrew,
>Sanskrit, and other non-Latin character sets in not being part of
>the international language. The goal of communicating is to communicate,
>not wave flags in support of national languages.

In a sense, Harald's observation points out a case in which all those other sets very much ARE part of the "international" language. The live examples were a) intended, b) appropriate, and c) successful. It does not matter whether readers understood the semantics of the strings; they needed to be able to see them. That is not national flag waving. That is global utility.

> > MIME character sets is an example of a battle fought and won.
>
>When MIME is used to pass special forms among people whose common
>understandings include more or other than ASCII, MIME is a battle
>fought and won.
>When MIME is used to send unintelligible garbage, it is a battle fought
>and lost.

Technical standards work often gets distracted by trying to deal with issues that are outside the scope of reasonable technical standards work. It should not be the task of such work to dictate or constrain users to only socially acceptable behavior. That is a social task, not a technical one. Choosing to send various types of data requires making decisions about the context. No technical standard can be designed to "automatically" determine when it is, or is not, appropriate to send that data, whether it is diacritical marks, kanji, or an Excel spreadsheet. Even when the sender has information about recipient capabilities, social factors affect the choices. So sending such data in MIME inappropriately is STILL an example of a battle fought and won.
At least the recipient has the unintelligible data well isolated and labeled. MIME did its job.

At 08:19 AM 12/6/00 -0500, vint cerf wrote:
>Even if we introduce extended character sets, it seems vital
>that there be some form of domain name that can be rendered
>(and entered) as simple IA4 characters to assure continued
>interworking at the most basic levels. This suggests that
>there is need for some correspondence between an IA4 Domain
>Name and any extended character set counterpart.

The same task is at issue for the DNS as it was for MIME. We need a mechanism for labeling and encoding DNS strings and, I believe, we need it to be added to the existing DNS. Users of those strings will be all over the world, not just in a particular locale. The need for this capability is massive and immediate. There WILL be a solution deployed. In fact there already is. The question is whether a coherent extension to DNS will be done in a fashion which will keep the DNS integrated, or whether this requirement produces an independent DNS. That's not flag-waving. That's multiple DNS namespaces.

We need to be careful to distinguish two different requirements. One is for a mechanism to encode domain names in non-ascii character sets. The second is for an equivalence mapping from non-ascii domain names into ascii domain names. The former is so that the technical and operational aspects of the DNS remain coherent. The latter is so that everyone has a way to reach a particular domain, even if they cannot generate the non-ascii form of the name. The extreme form of the latter task involves ascii encodings that are "comfortable" for human users; that requirement is not solved even in human, non-technical situations. I believe the example of the alternate choices of "jin" and "gin" as representations for some Chinese character(s) was used. Hence this extreme form of the task is not going to be solved by lowly IETF protocol designers.
At best, use of ACE-like encodings permits an ascii representation, albeit one that is "uncomfortable". That is as far as the IETF should go in trying to permit a "universally accessible" form for all domain names. Interestingly, we do not need to have all domain names exchanged and stored in an ACE form, forever. Just as MIME is able to support pure binary encodings, so can the DNS. The ACE form can be mapped to when needed.

d/
=-=-=-=-=
Dave Crocker <[EMAIL PROTECTED]>
Brandenburg Consulting
Tel: +1.408.246.8253, Fax: +1.408.273.6464
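[The on-demand mapping Dave describes can be made concrete with the ASCII-compatible encoding that was eventually standardized for IDN as Punycode/IDNA (RFC 3490/3492, which postdate this thread). Python's built-in "idna" codec performs exactly this round trip between a non-ASCII label and its "uncomfortable" ASCII form:]

```python
# Sketch: round-tripping a non-ASCII label through an ASCII-compatible
# encoding (ACE). Punycode postdates this discussion, but it is a
# concrete instance of the mechanism described: the ACE form is derived
# algorithmically and can be mapped to only when needed.
ace = "bücher".encode("idna")   # ASCII wire form: b'xn--bcher-kva'
back = ace.decode("idna")       # recovered original: 'bücher'
assert back == "bücher"
```

[The ASCII form is unambiguous and typeable on any 7-bit keyboard, but it is "uncomfortable" in precisely the sense above: nothing about "xn--bcher-kva" suggests the original name to a human reader.]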
Will Language Wars Balkanize the Web? ICANN timing
James Salsman [EMAIL PROTECTED] said:
>I don't know why ICANN would want to bring such a heavy burden
>upon themselves in an area of such flux so soon, when they have
>so much else that they have already committed to do.

Dan K says: Well, the actual announcement at ICANN.Org doesn't really even hint at a timetable. I think it's got to happen (IDN DNS), but perhaps not too quickly. I think the registrars that are pushing the envelope to apply this prematurely are doing their customers somewhat of a disservice. Then again, testing has to happen sooner or later. I don't think there is anyone who wants to trade off doing it quickly for doing it right. Getting more non-technocrats involved would help with the reality-check parts of all of this. But that seems hard to arrange on a meaningful scale without committing to the whole thing.

Regs,
Dan Kolis
Re: Internationalization and the IETF (Re: Will Language Wars Balkanize the Web?)
> From: Harald Alvestrand <[EMAIL PROTECTED]>
> >The same thinking that says that MIME Version headers make sense in
> >general IETF list mail also says that localized alphabets and glyphs must
> >be used in absolutely all contexts, including those that everyone must
> >use and so would expect to be limited to the lowest common denominator.
>
> it may have escaped the notice of some that a fair bit of the discussion on
> diacritics was carried out using live examples, and while I am sure there
> were some who did not see the diacritics on screen, at least there was a
> single definition of how to get from what was sent on the wire to what
> might have been displayed on the screen, and MANY of the participants
> actually saw them correctly displayed.

Diacritical marks are no different from Cyrillic, Arabic, Greek, Hebrew, Sanskrit, and other non-Latin character sets in not being part of the international language. The goal of communicating is to communicate, not wave flags in support of national languages. When you are trying to talk to strangers and have no clue about their languages, you are a fool not to use the common, international language, no matter how poor and ugly it is.

> MIME character sets is an example of a battle fought and won.

When MIME is used to pass special forms among people whose common understandings include more or other than ASCII, MIME is a battle fought and won. When MIME is used to send unintelligible garbage, it is a battle fought and lost. Whether the garbage is HTML, the latest word processing format from Redmond, or a good representation of the mother tongue of 1,000,000,000 people is irrelevant to whether the use of MIME is wise or foolish. If the encoding is not known beforehand to be intelligible to its recipients, then the use of MIME is foolish. MIME is a good *localization* mechanism, whether the locale is one of geography or culture or of computer applications (e.g. pictures or sound).
The continuing IETF efforts to extend MIME to include yet more extra or special forms, in the vague hope that the recipient will surely be able to interpret at least one, are probably the best of what we can expect from "internationalized" domain names in 2 or 3 years. Unless something like Vint Cerf's principle of encoding *localized* domain names in ASCII is followed, the IDN efforts will at best repeat the history of MIME email exemplified by the many Microsoft MIME formats. In MIME, except in special cases, either the "universal" form of the body is sufficient and the fancy versions are useless wastes of cycles, storage, and bandwidth, or the "universal" form can only say "sorry, better upgrade your system." Just as in the vast majority of HTML+ASCII email, where there can be no useful difference and there is rarely a visible difference between the ASCII plaintext and the HTML encrypted version, *localized* domain names will either be unusable outside their native provinces or they will be usable with a 7-bit ASCII keyboard.

Vernon Schryver [EMAIL PROTECTED]
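[The HTML+ASCII pattern Vernon criticizes is MIME's multipart/alternative: the same text carried twice, once as plain ASCII and once wrapped in HTML. A minimal sketch with Python's standard email.mime classes shows the duplication (subject and bodies are illustrative only):]

```python
# Sketch of a multipart/alternative message: two renderings of one body.
# The text/plain part alone would usually carry all the information.
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart("alternative")
msg["Subject"] = "Re: Will Language Wars Balkanize the Web?"
msg.attach(MIMEText("plain ASCII body", "plain"))
msg.attach(MIMEText("<p>plain ASCII body</p>", "html"))

# Both leaf parts say the same thing; the HTML version roughly doubles
# the message size without adding a visible difference for most readers.
parts = [p.get_content_type() for p in msg.walk() if not p.is_multipart()]
```

[A conforming reader picks the richest alternative it can display, which is why senders can emit both forms "safely" -- at the cost of the redundancy described above.]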
Internationalization and the IETF (Re: Will Language Wars Balkanize the Web?)
At 15:35 06/12/2000 -0700, Vernon Schryver wrote:
>The same thinking that says that MIME Version headers make sense in
>general IETF list mail also says that localized alphabets and glyphs must
>be used in absolutely all contexts, including those that everyone must
>use and so would expect to be limited to the lowest common denominator.

it may have escaped the notice of some that a fair bit of the discussion on diacritics was carried out using live examples, and while I am sure there were some who did not see the diacritics on screen, at least there was a single definition of how to get from what was sent on the wire to what might have been displayed on the screen, and MANY of the participants actually saw them correctly displayed. MIME character sets is an example of a battle fought and won.

--
Harald Tveit Alvestrand, [EMAIL PROTECTED] +47 41 44 29 94
Personal email: [EMAIL PROTECTED]
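[The "single definition of how to get from what was sent on the wire to what might have been displayed" that Harald refers to is MIME's charset labeling -- for headers, the RFC 2047 encoded-word mechanism. A small sketch using Python's standard library, with a hypothetical header value:]

```python
# Sketch: decoding an RFC 2047 encoded-word. Because the wire form
# labels its own charset, every conforming reader recovers the same
# glyphs. The header value below is a made-up example.
from email.header import decode_header

wire = "=?iso-8859-1?q?m=F8te?="       # hypothetical wire form
(raw, charset), = decode_header(wire)  # -> (b'm\xf8te', 'iso-8859-1')
text = raw.decode(charset)             # -> 'møte'
```

[Whether the recipient's display can render 'ø' is a separate question -- which is exactly the distinction Harald draws between a single wire definition and what each participant actually saw.]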
Re: Will Language Wars Balkanize the Web? & P.S. Eudora/PalmOS
Masataka Ohta and Vernon Schryver make excellent points in favor of the domain name status quo. I agree that IDN should be frozen for at least a few years to see what local domain admins and application vendors tend to do, especially since the pieces of the likely solutions (such as the competing UTF-8 encodings) are still so new and somewhat under development. I don't know why ICANN would want to bring such a heavy burden upon themselves in an area of such flux so soon, when they have so much else that they have already committed to do.

This thread reminded me of these news items, only two days apart:
http://abcnews.go.com/sections/travel/DailyNews/FrenchintheSkies000404.html
http://abcnews.go.com/sections/travel/DailyNews/BacktoFrenchinSkies000406.html

Cheers,
James

P.S. By the way, on my usual topic of wireless asynchronous voice messaging, here is a news article in which Qualcomm founder and chief Irwin Jacobs asserts that "voice-enabled capabilities" "could prove popular" on third-generation mobile phones:
http://biz.yahoo.com/rf/001206/hkg15073_2.html
I suppose Irwin Jacobs is the person to ask for MIME audio attachment record and play in Eudora email on the PalmOS. Please ask in person if you see him in San Diego!
Re: Will Language Wars Balkanize the Web?
> From: Masataka Ohta <[EMAIL PROTECTED]>
> ...
> > (Can we please move this discussion to the IDN list, where it
> > belongs?)
>
> The point is that IDN WG is purposeless and is wrong to exist. Of
> course, it is a waste of time to discuss it in the IDN list

Masataka Ohta is raising a point of order, and from what I've seen of other "internationalization" efforts, it is probably more valid than not. That the IETF's effort nominally involves "internationalization" instead of "localization" is a bad sign.

Since I first encountered "internationalization" hassles in the late 1970's in making an ASCII+EBCDIC system behave tolerably for people typing and reading Arabic and Hebrew text, I've found that "internationalization" is both technically hard and incredibly Politically Correct. Some people like to hoist standardized flags that today bear "Respect for Diversity" and start marching over cliffs--no, that's wrong. In Politically Correct issues, the standard bearers tell everyone else to march over the cliff while they stand to attention nearby. Once an "internationalization" organization gets started, it *never* stops, no matter how many of the original participants get wise and quit, what obviously false premise is required to justify the latest conclusion, nor what damage has already been done (not to mention contemplated) in the product, standard, protocol, or whatever justifies the existence of the internationalization organization. "Is the new version equally and completely useless for both domestic and overseas users?--Great, let's fix the next one." It took me about 10 years and more than one "internationalization" organization to reach that politically incorrect conclusion.

> ...
> If people want local names let them have them under local domains,
> with all the local conventions on encoding and everything.
>
> The administrator of the local domains may or may not force people
> to have additional internationalized domain names.
> Note that local, here, means culturally (not necessarily geographically)
> local, so that ccTLDs may or may not be the local domains.
>
> But, it can be said that gTLDs are not a proper place to put local
> names.

The same thinking that says that MIME Version headers make sense in general IETF list mail also says that localized alphabets and glyphs must be used in absolutely all contexts, including those that everyone must use and so would expect to be limited to the lowest common denominator. When confronted with the fact that ANSI X3.4 (ASCII) is a provincial U.S. variant of an international standard, otherwise rational people flinch and claim that sending anything but 7-bit ASCII to major IETF lists is not merely an unthinking waste of bandwidth but must be supported and encouraged. They justify such nonsense with talk like:

]diversity of list
] contributors' networking interests and experience (culture), which include
] people who happen to find it cost-effective to use such things as
] formatting and unusual character sets in their email. MIME is as much a
] part of the Internet culture as any standard

(apologies to the author of that private message)

It is a mystery to me why otherwise reasonable people who would never dream of imposing their own idiosyncrasies on everyone else demand that others not only be allowed but encouraged to do so. In other words, people have trouble understanding that "internationalization" necessarily means restricting to the lowest common international denominator, instead of the impossible goal of simultaneously supporting absolutely all possible languages and glyphs.

Vernon Schryver [EMAIL PROTECTED]
Re: Will Language Wars Balkanize the Web?
John;

> (Can we please move this discussion to the IDN list, where it
> belongs?)

The point is that the IDN WG is purposeless and is wrong to exist. Of course, it is a waste of time to discuss it in the IDN list. So, the only reasonable reaction is to ignore it (I dropped the improper CC:). The only necessary discussion on domain names, IF ANY, is localization issues, for which there is no specific WG of the IETF.

> (iii) Regardless of how the names in the DNS are coded, it is
> important to have analogies to "two sided business cards".

A typical Japanese business card has Chinese characters. When we internationalize it, we use the other side to put a Latin character version. As we already have a fully internationalized DNS with Latin characters, putting Chinese characters in the DNS is localization against internationalization.

> And, because of the
> registration issue, there is no plausible way to impose a
> requirement that every host (or other DNS entry) have a name in
> ASCII if it has a name in some other script: people and hosts
> not visible outside their own countries may not care enough to
> go to the trouble.

Those are local issues. If people want local names, let them have them under local domains, with all the local conventions on encoding and everything. The administrator of the local domains may or may not force people to have additional internationalized domain names. Note that local, here, means culturally (not necessarily geographically) local, so that ccTLDs may or may not be the local domains. But, it can be said that gTLDs are not a proper place to put local names.

Masataka Ohta
RE: Will Language Wars Balkanize the Web?
I can't agree more.

-----Original Message-----
From: John C Klensin [mailto:[EMAIL PROTECTED]]
Sent: 06 December 2000 16:46
To: vint cerf
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Will Language Wars Balkanize the Web?

(Can we please move this discussion to the IDN list, where it belongs?)

--On Wednesday, 06 December, 2000 08:19 -0500 vint cerf <[EMAIL PROTECTED]> wrote:

> Mr. Ohta has put his finger on a key point: ability of all
> parties to generate email addresses, web page URLs and so on.
> Even if we introduce extended character sets, it seems vital
> that there be some form of domain name that can be rendered
> (and entered) as simple IA4 characters to assure continued
> interworking at the most basic levels. This suggests that
> there is need for some correspondence between an IA4 Domain
> Name and any extended character set counterpart.

Vint, I think I agree with the principle. However, there are several different models with which the "correspondence" can be implemented. The difference among them is quite important technically --implementations would need to occur in different places and with different implications, deployment times, and side effects-- and perhaps as important philosophically. E.g., let me try to identify some of them in extreme form to help identify the differences:

(i) The names in the DNS are "protocol elements". They should be expressed in a minimal subset of ASCII so that they can be rendered and typed on almost all of the world's equipment (the assumption that, e.g., all Chinese or Arabic keyboards and display devices in the medium to long term will contain Roman characters seems a little dubious). There is no requirement that they be mnemonic in any language: in principle, a string containing characters selected at random would do as well as the name of a company, person, or product. This model gives rise to directory and keyword systems (most of them outside the DNS) that contain the names that people use.
While the registration and name-conflict problems are non-trivial, names in multiple languages and character codings can easily map onto a single DNS identifier. On the other hand, binding a national-language name to an ASCII name would need to be done either by parallel registrations or by matching on keywords (and the latter might not yield unambiguous and accurate results).

(ii) Entries in the DNS are always coded. After all, "ASCII" is just a code mapping between a human-visible character set and a machine (or wire) representation. It is the job of an application to get from "characters" to "codes" and back, and to recognize coding systems and apply the correct decodings. And software that is old or broken will simply display a different rendering of the coded form (whether that is a "hexification" such as Base64 or some other system). This model gives rise to the "ACE all the way up" models, in which non-ASCII names are placed in the DNS using some tagging system, but the "ASCII representation" of a name that, in the original, uses non-Roman characters may be quite ugly and bear no connection with the name as it would be rendered using the original characters other than an algorithmic one. It also gives rise to some of the UTF-8 models, on the assumption that applications that can't handle the full IS 10646 character set can do something intelligent.

(iii) Regardless of how the names in the DNS are coded, it is important to have analogies to "two sided business cards". Such systems assume that any name rendered in a non-Roman character set should have an analogue in Roman characters. And those analogues are expected to be bound to the original form by transliteration or translation -- they aren't just random, algorithmically matching, strings.
While there need to be facilities for the non-Roman (even non-ASCII) characters in either the DNS or a directory, establishing the "ASCII names" is, of necessity, a registration issue rather than an algorithmic issue. We don't know how to do the "translation" (or, in the general case, even transliteration) algorithmically. To give one example, despite the "Han unification" of IS 10646, the characters on a Japanese business card for you would almost certainly be different from those on a Chinese business card for you.And, because of the registration issue, there is no plausible way to impose a requirement that every host (or other DNS entry) have a name in ASCII if it has a name in some other script: people and hosts not visible outside their own countries may not care enough to go to the trouble. These models are not mutually exclusive. But they are definitely different perspectives. It is also worth noting that, as a matter of perspective, the dominance of subsets of ASCII in these debates has some
Re: Will Language Wars Balkanize the Web?
(Can we please move this discussion to the IDN list, where it belongs?) --On Wednesday, 06 December, 2000 08:19 -0500 vint cerf <[EMAIL PROTECTED]> wrote: > Mr. Ohta has put his finger on a key point: ability of all > parties to generate email addresses, web page URLs and so on. > Even if we introduce extended character sets, it seems vital > that there be some form of domain name that can be rendered > (and entered) as simple IA4 characters to assure continued > interworking at the most basic levels. This suggests that > there is need for some correspondence between an IA4 Domain > Name and any extended characterset counterpart.

Vint, I think I agree with the principle. However, there are several different models with which the "correspondence" can be implemented. The difference among them is quite important technically --implementations would need to occur in different places and with different implications, deployment times, and side effects-- and perhaps as important philosophically. E.g., let me try to identify some of them in extreme form to help identify the differences:

(i) The names in the DNS are "protocol elements". They should be expressed in a minimal subset of ASCII so that they can be rendered and typed on almost all of the world's equipment (the assumption that, e.g., all Chinese or Arabic keyboards and display devices in the medium to long term will contain Roman characters seems a little dubious). There is no requirement that they be mnemonic in any language: in principle, a string containing characters selected at random would do as well as the name of a company, person, or product. This model gives rise to directory and keyword systems (most of them outside the DNS) that contain the names that people use. While the registration and name-conflict problems are non-trivial, names in multiple languages and character codings can easily map onto a single DNS identifier.
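Klensin's model (i) can be sketched in a few lines: human-facing names in several scripts live in a directory outside the DNS, and all of them map onto one opaque ASCII "protocol element". Every name below, and the identifier "xk7q2.example", are invented for illustration.

```python
# Sketch of model (i): a keyword/directory layer outside the DNS.
# All names and the ASCII identifier are invented examples.

directory = {
    "snömos": "xk7q2.example",      # Latin form with a diacritic
    "スノーモス": "xk7q2.example",    # Japanese form
    "snomos": "xk7q2.example",      # plain-ASCII fallback
}

def lookup(human_name: str) -> str:
    """Resolve a human-facing name; the DNS only ever sees the result."""
    return directory[human_name]

# Multiple languages and character codings map onto a single DNS identifier.
assert lookup("snömos") == lookup("スノーモス") == "xk7q2.example"
```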
On the other hand, binding a national-language name to an ASCII name would need to be done either by parallel registrations or by matching on keywords (and the latter might not yield unambiguous and accurate results). (ii) Entries in the DNS are always coded. After all, "ASCII" is just a code mapping between a human-visible character set and a machine (or wire) representation. It is the job of an application to get from "characters" to "codes" and back, and to recognize coding systems and applying the correct decodings. And software that is old or broken will simply display a different rendering of the coded form (whether that is a "hexification" such as Base64 or some other system). This model gives rise to the "ACE all the way up" models, in which non-ASCII names are placed in the DNS using some tagging system, but the "ASCII representation" of a name that, in the original, uses non-Roman characters, may be quite ugly and bear no connection with the name as it would be rendered using the original characters other than an algorithmic one. It also gives rise to some of the UTF-8 models, on the assumption that applications that can't handle the full IS 10646 character set can do something intelligent. (iii) Regardless of how the names in the DNS are coded, it is important to have analogies to "two sided business cards". Such systems assume that any name rendered in a non-Roman character set should have an analogue in Roman characters. And those analogues are expected to be bound to the original form by transliteration or translation -- they aren't just random, algorithmically matching, strings. While there need to be facilities for the non-Roman (even non-ASCII) characters in either the DNS or a directory, establishing the "ASCII names" is, of necessity, a registration issue rather than an algorithmic issue. We don't know how to do the "translation" (or, in the general case, even transliteration) algorithmically. 
To give one example, despite the "Han unification" of IS 10646, the characters on a Japanese business card for you would almost certainly be different from those on a Chinese business card for you. And, because of the registration issue, there is no plausible way to impose a requirement that every host (or other DNS entry) have a name in ASCII if it has a name in some other script: people and hosts not visible outside their own countries may not care enough to go to the trouble.

These models are not mutually exclusive. But they are definitely different perspectives. It is also worth noting that, as a matter of perspective, the dominance of subsets of ASCII in these debates has some important technical advantages (e.g., the code set can be made very small and the canonicalization/matching rules are algorithmic, universally-agreed, and trivial), but it is also significantly an historical accident. Because of that historical accident, we tend to couch these discussions (as your note does and as I have done above) in terms of ASCII <-> something-else mappings. But it isn't hard to imagi
Re: Will Language Wars Balkanize the Web?
Mr. Ohta has put his finger on a key point: ability of all parties to generate email addresses, web page URLs and so on. Even if we introduce extended character sets, it seems vital that there be some form of domain name that can be rendered (and entered) as simple IA4 characters to assure continued interworking at the most basic levels. This suggests that there is need for some correspondence between an IA4 Domain Name and any extended characterset counterpart. Vint At 07:32 PM 12/6/2000 +0859, you wrote: >And, if a mailto URL is within a webpage with a chinese character >anchor, it does not matter whether a mail address in the URL >consists of pure ASCII characters or not. > >> It's worth nothing that my computer could handle the address if I can't. > >You properly understand that the current ASCII DNS is already >fully internationalized.
Re: Will Language Wars Balkanize the Web?
Claus; > vint cerf <[EMAIL PROTECTED]> schrieb/wrote: > > Incorporating other character sets without deep technical > > consideration will risk the inestimable value of interworking across > > the Internet. It CAN be done but there is a great deal of work to make > > it function properly. > > How do I type chinese characters? I can't. So I can't write mail to > someone whose email address contains non-ASCII characters if I don't > already have the address in electronic form (e.g. within a webpage). Right. And, if a mailto URL is within a webpage with a chinese character anchor, it does not matter whether a mail address in the URL consists of pure ASCII characters or not. > It's worth nothing that my computer could handle the address if I can't. You properly understand that the current ASCII DNS is already fully internationalized. Masataka Ohta
Re: Will Language Wars Balkanize the Web?
Ran; > At 02:53 05/12/00, Martin J. Duerst wrote: > >At 00/12/04 10:42 -0800, Christian Huitema wrote: > >>So, at a minimum, we need an IETF > >>specification on how to detect that a domain name part is using a non ascii > >>encoding, so that DNS servers don't get lost. > > > >Why not just use UTF-8? It is an encoding of the UCS (aka > >Unicode/ISO 10646), the encoding is fully compatible with > >ASCII (all 7-bit bytes are ASCII and only ASCII), and it > >is IETF policy (RFC 2277). > > All, > > Please MOVE this conversation to the IDN WG list, > where it would be in scope. Btw, this specific question > has been raised and answered several times now on the IDN list. > I encourage folks to read the sundry IDN proposals before > diving in any deeper here. IDN is the perfect place for repeated silly conversations like the one above. But it is not the place to discuss localized domain names, which have nothing to do with internationalization. Masataka Ohta
Re: Will Language Wars Balkanize the Web?
At 18.05 +0900 00-12-05, Martin J. Duerst wrote: >ACE is (maybe) for machines. It's not primarily intended for humans. >We may have ACE all the way (including TLD). It might be usable as a >poor man's ASCII equivalent, but I strongly doubt that anybody will >want to have it on the Latin side of their name card. I would, because I know that people in many parts of the world don't know how to enter "snömos" on their keyboard, and if I register the domain "snömos.se", I really want people to be able to get to http://www.snömos.se. So I think it is perfectly alright to have http://www.bq--abzw55tnn5zq.se on my business card (as well). paf --
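The bq-- string on paf's card is an ACE in the RACE scheme then under discussion. As a hedged illustration of the same idea, here is how the Punycode-based scheme standardized later (RFC 3492, the xn-- prefix) derives an ASCII form, using Python's built-in codec and the stock example label "bücher" (no standard codec for the older RACE form ships with Python):

```python
# Deriving a modern ACE form. NOTE: this is the later Punycode/IDNA
# scheme (xn-- prefix), not the bq-- RACE proposal mentioned above;
# it is shown only to illustrate the idea of an ASCII-compatible form.

label = "bücher"             # stock example label
ace = label.encode("idna")   # ASCII-only form, safe for old software
assert ace == b"xn--bcher-kva"

# The mapping is algorithmic and reversible, so capable software can
# always recover and display the original characters:
assert ace.decode("idna") == label
```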
Re: Will Language Wars Balkanize the Web?
Martin, I'll send you a copy of the "@sign vs !path" debate from my USENIX papers archive. See "Pathalias: or The Care and Feeding of Relative Addresses" by Honeyman and Bellovin, undated, at http://www.uucp.org/papers/pathalias.pdf. Speculations on the general utility and availability of "single" encoding schemes or some approximation of limited-ambiguity code-set mapping(s) should not displace actual data. The claim that iso10646 is "good" is not improved by non-reference to the costs and benefits of ASCII-colliding encodings (EBCDIC, SJIS, etc.), just as the "interoperability" claim is not improved by non-reference to the operational deployment of serviceable encodings.

Ignoring the daft peculiarities of particular encodings (and ANSI C) such as NULLs in strings (or file names), what I learned from owning the i18n problem at Sun was that a program of code-set independence had time-to-market, sustaining-engineering, and ease-of-implementation arguments over a program of opportunistic code-set dependence (the industry standard practice), and, as a matter of convenience, that the XPG/3 locale model made a utf8 locale a minor cost item and an internal convenience mechanism. It was a compelling case whose hardest technical issue was dynamic character width determination in the bottom half of the tty subsystem. I mention this to contrast it with substitution of UTF8 (or any fixed-width multi-octet encoding scheme) dependence for ASCII dependence, or the common form of an addition of an "alternate code path" which affords run-time selection of one of two code-set dependent processing mechanisms.

From my perspective, the IETF has preferred the second form of solution to the problem since the appearance of rfc2130. See also the following rfcs: 0373, 1345, 1468, 1489, 1502, 1555, 1557, 1815, 1842, 1922, 1947, 2237, and 2319.
As I pointed out to you over lunch Thursday at the W3C AC meeting, the i18n problem is not simplified by the constraint which requires reference to iso639, or iso3166. While few ARPAnauts have an evident interest in the problem of Euro-American Americanist hobbyists getting the fundamentals of Cherokee wrong (or care that there are three Cherokee polities), in an ISO normative reference (iso10646), on other lists (ICANN cluttered) Americans of sundry "liberties" persuasions are quite worked up that Euro-American Sinology hobbyists do not, or may not, have precedence over Chinese governmental and cultural institutions on the operational deployment of Chinese language elements in the DNS (CNNIC vs Verisign).

A related question is whether the i18n problem is simplified by a constraint which requires reference to the IAB Technical Comment on the Unique DNS Root, a constraint which adds, without reflection, the constraints of iso3166 to the dns-i18n problem set. Again, from my perspective, several sets of critics of the IANA transition(s), and its reluctant proponents, have overloaded the dns-i18n problem set as either an escape mechanism from uniqueness of the DNS root, or as a problem which cannot be solved except by preservation of the same property (uniqueness). Neither party appears to be motivated by the interests of users of ASCII-colliding or pre-iso10646 (et alia) encodings, or users without practicable means to use their preferred writing (or signing) systems.

Assuming a heterogeneity of end-systems, each possibly with a heterogeneous set of character-encoded applications with some cut-buffer mediation mechanism, e.g., a (encoding-neutral or encoding-preferential) windowing system for transparent or converted read and write operations between end-system resident applications, and a DNS resolver library with access to DNS service, and no additional constraints (these are enough, thanks!), is UTF-8 _the_ compelling answer?
The attractions of Universalism still appear to be compelling only if some non-technical, or ancillary, service model is controlling. Unfortunately, the utility of Particularism is temporarily hijacked anywhere near the DNS by partisans of one convention or its converse. If next-hop has a case for forwarding, then it is surprising that the case can't be applied to forwarding, except for opaque datagrams. Cheers, Eric P.S. I forgot to work in NATs and VPNs. Sigh.
Re: Will Language Wars Balkanize the Web?
"Martin J. Duerst" wrote: > The mixed case is not too > important for us, as discussed above. I think it can be, actually. Suppose you've got someone living in Spain, whose father is Spanish and whose mother is Japanese. His full surname, then, is something like Ohta y Montoya (or maybe the other way around; I don't remember). Now he wants to get a vanity domain, with "Ohta" in Japanese characters and "y Montoya" in Roman letters. He needs to be able to mix character sets. -- /===\ |John Stracke| http://www.ecal.com |My opinions are my own. | |Chief Scientist |==| |eCal Corp. |"Fate just isn't what it used to be." --Hobbes| |[EMAIL PROTECTED]| | \===/
Re: Will Language Wars Balkanize the Web?
> Really big post offices have special places to handle things such > as incomplete addresses. Nothing guaranteed, but if you are lucky, > you may even successfully send a letter from an arbitrary place to > anywhere in the world using local addressing, at least if you don't > forget the country name in the local script. tagging, eh?
RE: Will Language Wars Balkanize the Web?
At 02:53 05/12/00, Martin J. Duerst wrote: >At 00/12/04 10:42 -0800, Christian Huitema wrote: >>So, at a minimum, we need an IETF >>specification on how to detect that a domain name part is using a non ascii >>encoding, so that DNS servers don't get lost. > >Why not just use UTF-8? It is an encoding of the UCS (aka >Unicode/ISO 10646), the encoding is fully compatible with >ASCII (all 7-bit bytes are ASCII and only ASCII), and it >is IETF policy (RFC 2277). All, Please MOVE this conversation to the IDN WG list, where it would be in scope. Btw, this specific question has been raised and answered several times now on the IDN list. I encourage folks to read the sundry IDN proposals before diving in any deeper here. Thanks, Ran
Re: Will Language Wars Balkanize the Web?
however the value of the public Internet is surely in its widespread accessibility and interoperability. vint At 05:10 PM 12/5/2000 +0900, Martin J. Duerst wrote: >I think there is a difference between making it technically possible >for everybody to participate in whatever community they want, and >forcing anybody to do so. Internet technology has shown that it's >quite usable in local circumstances (the best example in Intranets).
Re: Will Language Wars Balkanize the Web?
At 00/12/04 08:15 -0500, Dave Crocker wrote: >Thank you. I was hoping someone would point out the support for parallel >operation so we could go further down that path. As you note, it seems to >be the closest to providing local/global support already. > >That means postal gives us: > >1. Global support for a common "character set" > >2. Global support for a carefully mixed character set -- though really it >is just a partitioning between the global field and the local field > >3. Local support for a local character set. > >(the support goes beyond character set, but let's leave it at that if >that's ok.) > >An immediate problem with comparing to postal is that it somewhat >correlates with the path a letter will take, so that the incremental >interpretation can be done by groups with different language skill-sets. Really big post offices have special places to handle things such as incomplete addresses. Nothing guaranteed, but if you are lucky, you may even successfully send a letter from an arbitrary place to anywhere in the world using local addressing, at least if you don't forget the country name in the local script. >The DNS does not have that flexibility and the domain name interpretation >is not part of the transfer sequence of the data. Yes, there are quite some differences. The advantage we have is that as soon as the characters are somehow in the computer, everything else is mechanical. This means there is no need for a global field; if somebody is able to type in the address, that's it, the machine does the rest. >Schemes that put an ACE-like field into a .com might be considered to be >like #2, above, by really they are not. The whole string is still global. ACE is (maybe) for machines. It's not primarily intended for humans. We may have ACE all the way (including TLD). It might be usable as a poor man's ASCII equivalent, but I strongly doubt that anybody will want to have it on the Latin side of their name card. 
>Frankly this leaves me viewing the postal example as pretty unhelpful for >finding a solution to the DNS requirement. Well, the postal example shows how Latin and other scripts can both be used to address something. The mixed case is not too important for us, as discussed above. In the postal example, conversion from one notation to the other is a complex process (in particular for Japanese, lookup in context is absolutely necessary). So I don't expect that something purely mechanical (e.g. ACE) will do for DNS. >On the other hand, this thread was triggered by Graham's question about >the negative impact of partitioning. The postal example would seem to >show that the effect is not so bad. >Except I would claim that it is not partitioning. Note that an address >always has a global representation, in addition to a possibly different >local one. It's a kind of partitioning, in that it is not always easy, for everybody, to use the 'local' address or to convert from a local to a global one. >Perhaps that can be reconciled as easily as claiming that any 'local' domain >name must also have a global form? (But, somehow, the word "scaling" gets >in the way of believing that.) Scaling would be only by a factor of 2. Regards, Martin.
Re: Will Language Wars Balkanize the Web?
At 00/12/04 19:58 -0500, Eric Brunner wrote: > > I guess one of the first questions should be; "Is some partitioning of > the > > Internet community such a bad thing?"... > >If the "partition" intended for discussion is "@sign vs !path" addressing >conventions, Eric Allman and Peter Honeyman have left a discussion archive >on the subject. Any pointers? >Arguably the universalist thesis understated the drawbacks >of anyone having the capability of addressing everyone anywhere. Clueless >users is only one possible policy model -- a point made by Peter then, and >equally valid now. > >Personally I'm underwhelmed by the universalism advocated by the members >of the UNICODE Consortium, a single encoding scheme of necessity comes to >peripheral markets late in their adoption of computerized writing systems, >and their integration into a rationalized global system is not obviously a >boon to their pre-integration service models. Unicode came late to everybody's adoption of computerization of writing. Most probably the delay is much longer for central markets than for peripheral markets, but that would have to be checked. Also, one main factor in the delay in many cases is the amount of time it takes for the specific 'market' to agree on a single encoding scheme, or encoding table, locally. In some cases (e.g. Korean), this is due to the wide range of choices that the script offers for encoding. In other cases, this is due to the fact that it takes some time (up to one generation) for all the people who have proposed and implemented different encodings not only to realize that everybody would benefit from a single encoding, but also to accept that, to a large extent, which single encoding is chosen is far less important than that a single one is chosen. >On the up-side, large user bases need not adapt to extraneous requirements >for participating in the "Internet community", and Universalist Credos may >fail in the markets (plural intended).
I think there is a difference between making it technically possible for everybody to participate in whatever community they want, and forcing anybody to do so. Internet technology has shown that it's quite usable in local circumstances (the best example in Intranets). Regards, Martin.
RE: Will Language Wars Balkanize the Web?
At 00/12/04 10:42 -0800, Christian Huitema wrote: >So, at a minimum, we need an IETF >specification on how to detect that a domain name part is using a non ascii >encoding, so that DNS servers don't get lost. Why not just use UTF-8? It is an encoding of the UCS (aka Unicode/ISO 10646), the encoding is fully compatible with ASCII (all 7-bit bytes are ASCII and only ASCII), and it is IETF policy (RFC 2277). Regards, Martin.
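The compatibility property Martin relies on is easy to check mechanically. A minimal sketch (the sample names are arbitrary):

```python
# The property cited above: in UTF-8, ASCII text encodes to exactly its
# ASCII bytes, and every byte of a non-ASCII character has the high bit
# set, so the two ranges never collide.

ascii_name = "example.com"
assert ascii_name.encode("utf-8") == ascii_name.encode("ascii")

mixed = "snömos.se"
encoded = mixed.encode("utf-8")
# Only the ö produces bytes >= 0x80 (two of them: 0xC3 0xB6); every
# other byte is the plain ASCII byte for that character.
assert [b for b in encoded if b >= 0x80] == [0xC3, 0xB6]
assert all(b < 0x80 for b in "snmos.se".encode("utf-8"))
```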
Re: Will Language Wars Balkanize the Web?
> I guess one of the first questions should be; "Is some partitioning of the > Internet community such a bad thing?"... If the "partition" intended for discussion is "@sign vs !path" addressing conventions, Eric Allman and Peter Honeyman have left a discussion archive on the subject. Arguably the universalist thesis understated the drawbacks of anyone having the capability of addressing everyone anywhere. Clueless users is only one possible policy model -- a point made by Peter then, and equally valid now. Personally I'm underwhelmed by the universalism advocated by the members of the UNICODE Consortium; a single encoding scheme of necessity comes to peripheral markets late in their adoption of computerized writing systems, and their integration into a rationalized global system is not obviously a boon to their pre-integration service models. > PS: I think it is without doubt that it is a Good Thing that we make > efforts to internationalize protocols ... Even less satisfactory is the practice of generalizing ASCII (nee BCD) to encodings with more than 256 code points, via this universalist scheme and no other. To advance from ASCII to ASCII-plus-UTF8 could just as well be characterized as SJIS/GB/Big5/... (and their uses) deprecated. > ... my comments/questions are an > attempt to explore how far this process can reasonably go. The i18n problem isn't trivial, and isn't advanced by problematic essays, good intentions, or American (actual and honorary) indulgences. On the up-side, large user bases need not adapt to extraneous requirements for participating in the "Internet community", and Universalist Credos may fail in the markets (plural intended). As for poking the ICANN mess in the eye with a sharpened brush on the IETF list prior to a meeting, it is clumsy sleight-of-hand and a poor substitute for work on writing system support. See also the W3C WAI for information encoding and presentation systems which are not "writing". Kitakitamatsinopowaw, Eric
Re: Will Language Wars Balkanize the Web?
> So, at a minimum, we need an IETF > specification on how to detect that a domain name part is using a non ascii > encoding, so that DNS servers don't get lost. We need a great deal more than that. The real impact of internationalizing DNS names isn't with the DNS protocol or software itself (you can probably do it without any changes to these); it is the applications that make assumptions about character encodings used in DNS names and/or place their own limitations on the allowable characters in DNS names. Keith
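A sketch of the application-side assumption Keith describes: a hypothetical validator enforcing the traditional letter-digit-hyphen rule, which rejects names the DNS protocol itself would happily carry. (The validator is invented for illustration, not taken from any RFC.)

```python
# Hypothetical application-side check enforcing the traditional
# letter-digit-hyphen (LDH) rule on each label. The DNS protocol
# itself carries names this check rejects -- the restrictions live
# in applications, not in the DNS.
import re

LDH_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$")

def app_accepts(name: str) -> bool:
    return all(LDH_LABEL.match(label) for label in name.split("."))

assert app_accepts("example.com")
assert not app_accepts("_sip.example.com")  # underscore: fine in DNS (cf. RFC 2782)
assert not app_accepts("snömos.se")         # rejected before any DNS query is made
```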
RE: Will Language Wars Balkanize the Web?
> On Sun, 03 Dec 2000 13:17:45 EST, vint cerf <[EMAIL PROTECTED]> said: > > to incorporate and refer to domain names. The IA4 alphabet > includes essentially > > just the letters A-Z, numbers 0-9 and the "-" (dash). This > is the limit of what > > is allowed in domain names today. > > The sad part is, of course, that RFC1035, section 3.1 > specifically says > that any octet value is legal. The restrictions that Vint mentions are actually restrictions on the domain name part of email addresses, as specified in RFC-821. The DNS system itself does not have such restrictions; this allows, for example, RFC 2782 to specify the use of the "illegal" character _ (underline) in some domain name parts. The main restriction in the DNS itself is the comparison rule embedded in the system, which says that domain names are case independent. Case comparison is indeed specific to the alphabet code, and in fact is often language dependent. The matter is already muddy for European languages. In a case-independent comparison in French, e-acute matches the accentless e; in German, u-umlaut could match the digraph "ue"; DNS servers don't do such matches, but at least they do the binary comparison right when an 8-bit alphabet is a superset of ASCII. But the matter indeed gets more complex when the characters are encoded on 16 bits, when either the top or the bottom byte could be misinterpreted as a lower- or upper-case ASCII letter, resulting in incorrect matches. So, at a minimum, we need an IETF specification on how to detect that a domain name part is using a non-ASCII encoding, so that DNS servers don't get lost. -- Christian Huitema
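Huitema's 16-bit hazard can be made concrete. A sketch of hypothetical naive server logic (not any actual DNS implementation):

```python
# Hypothetical naive server logic: fold ASCII A-Z to a-z on the raw
# bytes of a name -- the case-independent comparison rule described
# above. Safe for 8-bit supersets of ASCII, destructive for 16-bit
# encodings.

def ascii_fold(data: bytes) -> bytes:
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in data)

# U+4E41 is a CJK ideograph whose UTF-16BE bytes, 0x4E 0x41, look like
# ASCII "NA". Folding maps them to 0x6E 0x61 ("na"), which decodes as
# the unrelated ideograph U+6E61 -- an incorrect match waiting to happen.
cjk = "\u4e41"
folded = ascii_fold(cjk.encode("utf-16-be"))
assert folded.decode("utf-16-be") == "\u6e61"

# The same character in UTF-8 uses only bytes >= 0x80, out of the
# folder's reach, so the name survives intact:
assert ascii_fold(cjk.encode("utf-8")) == cjk.encode("utf-8")
```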
Re: Will Language Wars Balkanize the Web?
>Wasn't there a Dilbert cartoon regarding sending a page to a pager number >containing a caret? ;) It was a tilde. ;-) RGF Robert G. Ferrell, CISSP Information Systems Security Officer National Business Center U. S. Dept. of the Interior [EMAIL PROTECTED] Who goeth without humor goeth unarmed.
Re: Will Language Wars Balkanize the Web?
On Sun, 03 Dec 2000 13:17:45 EST, vint cerf <[EMAIL PROTECTED]> said: > to incorporate and refer to domain names. The IA4 alphabet includes essentially > just the letters A-Z, numbers 0-9 and the "-" (dash). This is the limit of what > is allowed in domain names today. The sad part is, of course, that RFC1035, section 3.1 specifically says that any octet value is legal. But I guess we're stuck with the IA4 charset ;( -- Valdis Kletnieks Operating Systems Analyst Virginia Tech PGP signature
Re: Will Language Wars Balkanize the Web?
On Sun, 03 Dec 2000 16:00:53 PST, lists <[EMAIL PROTECTED]> said: > "I'm sorry, I'm not going to be able to figure out how to type that email > address on my keyboard, could you please send me a message, and I'll just hit > reply". Wasn't there a Dilbert cartoon regarding sending a page to a pager number containing a caret? ;) -- Valdis Kletnieks Operating Systems Analyst Virginia Tech PGP signature
Re: Will Language Wars Balkanize the Web?
At 10:59 PM 12/4/00 +0859, Masataka Ohta wrote: > > Thank you. I was hoping someone would point out the support for parallel > > operation so we could go further down that path. As you note, it seems to > > be the closest to providing local/global support already. > >Silly comparison. Thank you. We always seek to entertain. >Efficient postal system works with numbers so called zip code. Zip/postal code is not required. The example given was of country string in 'global' form and remainder in local. >Postal address with various characters needs human intervention for >complex matching and is similar not to DNS but to search engines. Machines frequently process the strings, but that is not relevant to the nature and use of the strings. They are addresses, pertaining to location. Search engines are for free-form keywords. Not the same at all. d/ =-=-=-=-= Dave Crocker <[EMAIL PROTECTED]> Brandenburg Consulting Tel: +1.408.246.8253, Fax: +1.408.273.6464
Re: Will Language Wars Balkanize the Web?
Dave; > Thank you. I was hoping someone would point out the support for parallel > operation so we could go further down that path. As you note, it seems to > be the closest to providing local/global support already. Silly comparison. An efficient postal system works with numbers, the so-called zip code. A postal address with various characters needs human intervention for complex matching and is similar not to DNS but to search engines. Masataka Ohta
Re: Will Language Wars Balkanize the Web?
Thank you. I was hoping someone would point out the support for parallel operation so we could go further down that path. As you note, it seems to be the closest to providing local/global support already. That means postal gives us: 1. Global support for a common "character set" 2. Global support for a carefully mixed character set -- though really it is just a partitioning between the global field and the local field 3. Local support for a local character set. (the support goes beyond character set, but let's leave it at that if that's ok.) An immediate problem with comparing to postal is that it somewhat correlates with the path a letter will take, so that the incremental interpretation can be done by groups with different language skill-sets. The DNS does not have that flexibility and the domain name interpretation is not part of the transfer sequence of the data. Schemes that put an ACE-like field into a .com might be considered to be like #2, above, but really they are not. The whole string is still global. Frankly this leaves me viewing the postal example as pretty unhelpful for finding a solution to the DNS requirement. On the other hand, this thread was triggered by Graham's question about the negative impact of partitioning. The postal example would seem to show that the effect is not so bad. Except I would claim that it is not partitioning. Note that an address always has a global representation, in addition to a possibly different local one. Perhaps that can be reconciled as easily as claiming that any 'local' domain name must also have a global form? (But, somehow, the word "scaling" gets in the way of believing that.) d/ At 05:20 PM 12/4/00 +0900, Martin J. Duerst wrote: >At 00/12/03 13:57 -0500, Dave Crocker wrote: >>Would it be such a bad thing to be unable to postal mail a letter or >>package to anywhere in the world? > >Of course it would be very bad. But it is usual now to send mail >e.g. from Japan to Japan with an address without any Latin letters.
>It is also possible to send mail e.g. from the US or Europe to e.g. >Japan, with all but the country name in ideographs. > >So the postal system is already now much closer to multilingual >domain names than to ASCII-only domain names. > >It is also possible, as far as I understand, to send mail >with an address only written in Latin letters, to any country >in the world. The multilingual domain name solution should of >course provide a way (at least one way) to do this. =-=-=-=-= Dave Crocker <[EMAIL PROTECTED]> Brandenburg Consulting Tel: +1.408.246.8253, Fax: +1.408.273.6464
Re: Will Language Wars Balkanize the Web?
At 00/12/03 13:57 -0500, Dave Crocker wrote: >Would it be such a bad thing to be unable to postal mail a letter or >package to anywhere in the world? Of course it would be very bad. But it is usual now to send mail e.g. from Japan to Japan with an address without any Latin letters. It is also possible to send mail e.g. from the US or Europe to e.g. Japan, with all but the country name in ideographs. So the postal system is already now much closer to multilingual domain names than to ASCII-only domain names. It is also possible, as far as I understand, to send mail with an address only written in Latin letters, to any country in the world. The multilingual domain name solution should of course provide a way (at least one way) to do this. Please also note that Japanese name cards usually have two sides, one in Japanese and one in Latin. Now, the email addresses on both sides are the same, but in the future, you would just use the one on the Latin side if you cannot type Japanese. Regards, Martin.
Re: Will Language Wars Balkanize the Web?
At 00/12/03 08:03 +, Graham Klyne wrote: >There's a news story at: > > http://www.acm.org/technews/articles/2000-2/1201f.html#item10 > >under the heading "Will Language Wars Balkanize the Web?" > >Leaving aside the issues of competing registries, Sorry, but I think that's the main topic of the article (as far as I can deduce from the abstract), and it is also the main threat to create balkanization. The problem currently is not that Chinese domain names may create a disconnect between the "Chinese Internet" and some other part of the Internet, but that there are various proposals and actors that are working on Chinese domain names, and that all of them act prematurely (i.e. before there is an IETF spec) and with side interests that affect things negatively. >touched upon in that article, I had been wondering with the formation of >IDN WG how I18N would affect cross-character-type-boundary Internet activities. > >I guess one of the first questions should be; "Is some partitioning of >the Internet community such a bad thing?". Why should it matter if, say, >Chinese-based domains aimed at Chinese audiences are not meaningfully >accessible to non-Chinese Internet users? Reasonable question indeed. If the content is Chinese, does it hurt if the address is also Chinese? There are cases where it indeed hurts (such as when you have fonts to display Chinese on your system, but nothing to input Chinese, as may be the case if you work off an English OS of some kind). However, in general and for the majority of actual users (i.e. for the Chinese users reading Chinese web pages,...), having Chinese domain names is actually a big advantage. They are easier to memorize, easier to guess, easier to identify with, and so on. >At a purely technological level, the priority ascribed to the end-to-end >architecture of the Internet has underpinned and presumed >non-discriminatory any-to-any communication. I wonder if this is a >reasonable expectation at the social level of Internet use. 
At the *linguistic* level, there are certain rather hard boundaries based on the difficulty of learning foreign languages and on the slow advances of machine translation. At the social level, boundaries should be kept as low as possible. Regards, Martin.
Re: Will Language Wars Balkanize the Web?
- Original Message - From: "lists" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Sunday, December 03, 2000 19:00 Subject: Re: Will Language Wars Balkanize the Web? > > > "I'm sorry, I'm not going to be able to figure out how to type that email > address on my keyboard, could you please send me a message, and I'll just hit > reply". > > Adi > Good point. I didn't think about e-mail addresses. Kimon _NetZero Free Internet Access and Email__ http://www.netzero.net/download/index.html
Re: Will Language Wars Balkanize the Web?
> "I'm sorry, I'm not going to be able to figure out how to type that email > address on my keyboard, could you please send me a message, and I'll just hit > reply". if the app-presentation -> internal coding -> dns request mapping is not one:one and reversable on the other end, even this is not sure to work. randy
Re: Will Language Wars Balkanize the Web?
On Sun, Dec 03, 2000 at 04:56:38PM -0500, Kimon A. Andreou wrote: > > > You can't address a letter to someone in Berkeley, USA in nagari or > amharic > > characters and expect it to reach. However you can address a letter to > someone > > in Addis Ababa, Ethiopia in ASCII characters with a poor-phonetic > > approximation and expect it to reach (choice of locales based on > experience). > > > > > > > Adi > > But don't packets get routed using IP addresses (i.e. numbers) ? er, wrong layer. Although I'm as good at remembering IP addresses as phone numbers, you'll have a hard time convincing others to give up DNS. "I'm sorry, I'm not going to be able to figure out how to type that email address on my keyboard, could you please send me a message, and I'll just hit reply". Adi
Re: Will Language Wars Balkanize the Web?
Kimon gets an A. Betsy gets an F. d/ At 03:30 PM 12/3/00 -0500, Kimon A. Andreou wrote: >But isn't the Internet a medium of communication as is the Post and the >telephone? >Therefore, shouldn't it support communication between any two points, >wherever they may be or however they're called? > >Kimon >- Original Message - >From: "Betsy Brennan" <[EMAIL PROTECTED]> > > But the Internet is not the postal system nor the phone system. We already > > have the postal system and the phone system. =-=-=-=-= Dave Crocker <[EMAIL PROTECTED]> Brandenburg Consulting Tel: +1.408.246.8253, Fax: +1.408.273.6464
Re: Will Language Wars Balkanize the Web?
- Original Message - From: "R . P . Aditya" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Sunday, December 03, 2000 16:20 Subject: Re: Will Language Wars Balkanize the Web? > You can't address a letter to someone in Berkeley, USA in nagari or amharic > characters and expect it to reach. However you can address a letter to someone > in Addis Ababa, Ethiopia in ASCII characters with a poor-phonetic > approximation and expect it to reach (choice of locales based on experience). > > > Adi But don't packets get routed using IP addresses (i.e. numbers) ? Kimon
Re: Will Language Wars Balkanize the Web?
As has been noted, the _hard part_ is making the protocol that is used between countries' communications systems "language independent". > > Would it be such a bad thing to be unable to make a phone call to anywhere > > in the world? I have yet to see a telephone dialpad that even has non-arabic base-10 numbers on it (has it slowed the spread and use of the phone system?). > > Would it be such a bad thing to be unable to postal mail a letter or > > package to anywhere in the world? You can't address a letter to someone in Berkeley, USA in nagari or amharic characters and expect it to reach. However you can address a letter to someone in Addis Ababa, Ethiopia in ASCII characters with a poor-phonetic approximation and expect it to reach (choice of locales based on experience). At some point it's not worth the effort to "internationalize" all the layers...will the lucrative returns on additional domains pay for such an effort? and will that make an already "complex" Internet more accessible? Does Babelization without language isomorphism lead to Balkanization? Or, "why is machine translation so hard?". Adi On Sun, Dec 03, 2000 at 03:06:10PM -0500, Betsy Brennan wrote: > But the Internet is not the postal system nor the phone system. We already > have the postal system and the phone system. They may be slower, but does > that mean they should be replaced or that the Internet must duplicate what > these systems do? BLB > > Dave Crocker wrote: > > > At 08:03 AM 12/3/00 +, Graham Klyne wrote: > > >I guess one of the first questions should be; "Is some partitioning of > > >the Internet community such a bad thing?". > > > > Would it be such a bad thing to be unable to make a phone call to anywhere > > in the world? > > > > Would it be such a bad thing to be unable to postal mail a letter or > > package to anywhere in the world? > > > > d/ > > > > ps. strictly rhetorical questions, as I hope is obvious. 
> > > > =-=-=-=-= > > Dave Crocker <[EMAIL PROTECTED]> > > Brandenburg Consulting > > Tel: +1.408.246.8253, Fax: +1.408.273.6464 >
Re: Will Language Wars Balkanize the Web?
But isn't the Internet a medium of communication as is the Post and the telephone? Therefore, shouldn't it support communication between any two points, wherever they may be or however they're called? Kimon - Original Message - From: "Betsy Brennan" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Sunday, December 03, 2000 15:06 Subject: Re: Will Language Wars Balkanize the Web? > But the Internet is not the postal system nor the phone system. We already > have the postal system and the phone system. They may be slower, but does > that mean they should be replaced or that the Internet must duplicate what > these systems do? BLB
Re: Will Language Wars Balkanize the Web?
> But the Internet is not the postal system nor the phone system. We already > have the postal system and the phone system. They may be slower, but does > that mean they should be replaced or that the Internet must duplicate what > these systems do? i am sorry, but i can not understand the above. perhaps you were writing in californian. qed. randy
Re: Will Language Wars Balkanize the Web?
But the Internet is not the postal system nor the phone system. We already have the postal system and the phone system. They may be slower, but does that mean they should be replaced or that the Internet must duplicate what these systems do? BLB Dave Crocker wrote: > At 08:03 AM 12/3/00 +, Graham Klyne wrote: > >I guess one of the first questions should be; "Is some partitioning of > >the Internet community such a bad thing?". > > Would it be such a bad thing to be unable to make a phone call to anywhere > in the world? > > Would it be such a bad thing to be unable to postal mail a letter or > package to anywhere in the world? > > d/ > > ps. strictly rhetorical questions, as I hope is obvious. > > =-=-=-=-= > Dave Crocker <[EMAIL PROTECTED]> > Brandenburg Consulting > Tel: +1.408.246.8253, Fax: +1.408.273.6464
Re: Will Language Wars Balkanize the Web?
> I guess one of the first questions should be; "Is some partitioning of the > Internet community such a bad thing?". Why should it matter if, say, > Chinese-based domains aimed at Chinese audiences are not meaningfully > accessible to non-Chinese Internet users? There's a distinct issue that exists apart from the inter-human aspects - the packets containing these new character forms will flow, at least occasionally, into pretty much everyone's machines, routers, NATs, firewalls, web caches, etc - all of which need to be able to handle these new packets without ill effects. (The definition of "ill effect" will vary depending on what the box is supposed to be doing.) For instance, it would be "a bad thing" if some "transparent" web cache in some ISP went south when it re-resolved a URL that contained a domain name that either had itself a label in some non-hostname character set or was resolved via a CNAME containing non-hostname characters. In other words, although the humans (and their user interfaces) may Balkanize, the infrastructure on which the net operates should not. --karl--
Re: Will Language Wars Balkanize the Web?
In my opinion, it is vital to craft the Internet's evolution so as to maintain full connectivity and interworking among all its parts. I do not see "balkanization" as a good thing at all. I believe there are sound technical means to achieve the objective of incorporating character sets associated with non-Roman languages but that critics need to understand more fully just how important the limitations of the current character set for domain names have been in maintaining interworking and also the ability of so many applications to incorporate and refer to domain names. The traditional hostname alphabet includes essentially just the letters A-Z, the numbers 0-9 and the "-" (dash). This is the limit of what is allowed in domain names today. Incorporating other character sets without deep technical consideration will risk the inestimable value of interworking across the Internet. It CAN be done but there is a great deal of work to make it function properly. Vint At 08:03 AM 12/3/2000 +, Graham Klyne wrote: >There's a news story at: > > http://www.acm.org/technews/articles/2000-2/1201f.html#item10 > >under the heading "Will Language Wars Balkanize the Web?" > >Leaving aside the issues of competing registries, touched upon in that article, I had >been wondering with the formation of IDN WG how I18N would affect >cross-character-type-boundary Internet activities. > >I guess one of the first questions should be; "Is some partitioning of the Internet >community such a bad thing?". Why should it matter if, say, Chinese-based domains >aimed at Chinese audiences are not meaningfully accessible to non-Chinese Internet >users? At a purely technological level, the priority ascribed to the end-to-end >architecture of the Internet has underpinned and presumed non-discriminatory >any-to-any communication. I wonder if this is a reasonable expectation at the social >level of Internet use. 
> >#g > >PS: I think it is without doubt that it is a Good Thing that we make efforts to >internationalize protocols; my comments/questions are an attempt to explore how far >this process can reasonable go. > > >Graham Klyne >([EMAIL PROTECTED])
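[Editor's note: the restricted alphabet Vint describes is the "LDH" (letters, digits, hyphen) rule of RFC 952/1123 -- also the property Karl's infrastructure boxes implicitly rely on. A minimal validation sketch; `is_ldh_hostname` is an illustrative helper, not something from this thread:]

```python
import re

# LDH label: letters A-Z, digits 0-9, hyphen; no leading/trailing
# hyphen; 1..63 octets per label (RFC 952 as amended by RFC 1123).
LDH_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

def is_ldh_hostname(name: str) -> bool:
    """Illustrative check of the traditional hostname character limits."""
    labels = name.rstrip(".").split(".")
    return all(LDH_LABEL.match(label) for label in labels)

print(is_ldh_hostname("www.example.com"))  # True
print(is_ldh_hostname("bücher.example"))   # False: outside the LDH set
```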
Re: Will Language Wars Balkanize the Web?
At 08:03 AM 12/3/00 +, Graham Klyne wrote: >I guess one of the first questions should be; "Is some partitioning of >the Internet community such a bad thing?". Would it be such a bad thing to be unable to make a phone call to anywhere in the world? Would it be such a bad thing to be unable to postal mail a letter or package to anywhere in the world? d/ ps. strictly rhetorical questions, as I hope is obvious. =-=-=-=-= Dave Crocker <[EMAIL PROTECTED]> Brandenburg Consulting Tel: +1.408.246.8253, Fax: +1.408.273.6464
Re: Will Language Wars Balkanize the Web?
At 03:03 03/12/00, Graham Klyne wrote: >I guess one of the first questions should be; "Is some partitioning of the Internet >community such a bad thing?" A partitioning based on nationality, which is of course different than language group, would be harmful. Lack of interoperability of standard protocols would be bad, for whatever reason, including incompatible localisations. Lack of standards support for internationalisation/multi-lingual computing, as different from localisation, would also be bad. > Why should it matter if, say, Chinese-based domains aimed >at Chinese audiences are not meaningfully accessible to >non-Chinese Internet users? What about people who can read and perhaps also write in Chinese characters but who are not Chinese (either ROC on Taiwan or PRC on the mainland) nationals ? Consider not only folks in Singapore or SE Asia generally, but also Chinese-capable folks in other places (e.g. North America, Europe). [NB: I'm deliberately ignoring the issues with Traditional vs Simplified characters just now, though that is also part of the internationalisation equation]. I regularly read my news from British or Hong Kong or other countries' web sites. Living in North America, I'm certainly not the target audience for the HK Standard or South China Morning Post. However, I do read those newspapers online. Less regularly, but occasionally, I do read Chinese web sites (in Chinese) or Japanese web sites (reading the Kanji portion only). I am most assuredly NOT the target audience for any of these web sites. On a daily basis, I receive mail with Chinese language contents, though a surprising amount of that turns out to be unsolicited bulk email in my own case. I receive a modest amount of German or Vietnamese email. So multi-lingual protocol capabilities are quite important to me. So for all those reasons, it does in fact matter a great deal. 
>At a purely technological level, the priority ascribed to the end-to-end architecture >of the Internet has underpinned and presumed non-discriminatory any-to-any >communication. I wonder if this is a reasonable expectation at the social level of >Internet use. I do think so. >PS: I think it is without doubt that it is a Good Thing that we make efforts to >internationalize protocols; my comments/questions are an attempt to explore how far >this process can reasonable go. I don't want to try to predict the future, so I won't. I can say that today, we are NOT anywhere close to a reasonable end point or stopping point for internationalisation of IETF standards-track protocols. In particular, we haven't resolved the basic internationalisation issues for a number of core infrastructure protocols (e.g. DNS). Regards, Ran [EMAIL PROTECTED]
Re: Will Language Wars Balkanize the Web?
Graham; > Leaving aside the issues of competing registries, touched upon in that > article, I had been wondering with the formation of IDN WG how I18N would > affect cross-character-type-boundary Internet activities. Nothing. Cross-character-type-boundary is a pure localization issue and has nothing to do with people wrongly working on I18N. > PS: I think it is without doubt that it is a Good Thing that we make > efforts to internationalize protocols; If only you understood what "internationalize protocols" means. ASCII (Latin, numeric and hyphen) characters are the only characters internationally recognizable by so many people. Masataka Ohta
Re: Will Language Wars Balkanize the Web?
you may want to look at the work going on in the idn wg. randy
Will Language Wars Balkanize the Web?
There's a news story at: http://www.acm.org/technews/articles/2000-2/1201f.html#item10 under the heading "Will Language Wars Balkanize the Web?" Leaving aside the issues of competing registries, touched upon in that article, I had been wondering with the formation of IDN WG how I18N would affect cross-character-type-boundary Internet activities. I guess one of the first questions should be; "Is some partitioning of the Internet community such a bad thing?". Why should it matter if, say, Chinese-based domains aimed at Chinese audiences are not meaningfully accessible to non-Chinese Internet users? At a purely technological level, the priority ascribed to the end-to-end architecture of the Internet has underpinned and presumed non-discriminatory any-to-any communication. I wonder if this is a reasonable expectation at the social level of Internet use. #g PS: I think it is without doubt that it is a Good Thing that we make efforts to internationalize protocols; my comments/questions are an attempt to explore how far this process can reasonably go. Graham Klyne ([EMAIL PROTECTED])