Re: Reply-To mess opinion [was Re: Unicode on a non-Unicode
On Sun, 10 Sep 2000 04:39:22 -0800 (GMT-0800), Harald Alvestrand wrote:

I have the opposite experience from Simon Hill: Most reply-to munging lists get regular complaints, those that don't do it don't want it.

My experience is that the complaints arise when the list *systematically* adds a Reply-To, overwriting any Reply-To that may have been set by the sender. Lists which add a Reply-To if and only if the original message *does not* contain this header are, in my experience, trouble-free.

John.
--
Over 1200 webcams from ski resorts around the world - http://www.tradoc.fr/john/webcams/
Translate your technical documents and web pages - http://www.tradoc.fr/en/
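The "if and only if" policy John describes could be sketched roughly as follows. This is only an illustration using Python's standard email package; the list address and the function name are made up for the example and do not come from any particular list manager.

```python
from email.message import EmailMessage

# Hypothetical list address, used only for this illustration.
LIST_ADDRESS = "list@example.org"


def add_reply_to_if_missing(msg, list_address=LIST_ADDRESS):
    """Point Reply-To at the list only when the sender did not set one."""
    if msg["Reply-To"] is None:
        msg["Reply-To"] = list_address
    return msg


# A message without Reply-To gets the list address.
plain = EmailMessage()
plain["From"] = "alice@example.org"
add_reply_to_if_missing(plain)

# A message that already carries Reply-To is left untouched.
custom = EmailMessage()
custom["From"] = "bob@example.org"
custom["Reply-To"] = "bob-private@example.org"
add_reply_to_if_missing(custom)
```

The systematic variant that draws complaints would delete any existing Reply-To before setting the list address; the sketch deliberately never does that.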
Fwd: I-D ACTION:draft-duerst-i18n-norm-04.txt
To: IETF-Announce: ;
From: [EMAIL PROTECTED]
Reply-to: [EMAIL PROTECTED]
Subject: I-D ACTION:draft-duerst-i18n-norm-04.txt
Date: Thu, 14 Sep 2000 06:57:36 -0400
Sender: [EMAIL PROTECTED]

A New Internet-Draft is available from the on-line Internet-Drafts directories.

Title : Character Normalization in IETF Protocols
Author(s) : M. Duerst, M. Davis
Filename : draft-duerst-i18n-norm-04.txt
Pages : 12
Date : 13-Sep-00

The Universal Character Set (UCS) [ISO10646, Unicode] covers a very wide repertoire of characters. The IETF, in [RFC 2277], requires that future IETF protocols support UTF-8 [RFC 2279], an ASCII-compatible encoding of UCS. The wide range of characters included in the UCS has led to some cases of duplicate encodings. This document proposes that in IETF protocols, the class of duplicates called canonical equivalents be dealt with by using Early Uniform Normalization according to Unicode Normalization Form C, Canonical Composition (NFC) [UTR15]. This document describes both Early Uniform Normalization and Normalization Form C.

A URL for this Internet-Draft is: http://www.ietf.org/internet-drafts/draft-duerst-i18n-norm-04.txt

Internet-Drafts are also available by anonymous FTP. Login with the username "anonymous" and a password of your e-mail address. After logging in, type "cd internet-drafts" and then "get draft-duerst-i18n-norm-04.txt".

A list of Internet-Drafts directories can be found in http://www.ietf.org/shadow.html or ftp://ftp.ietf.org/ietf/1shadow-sites.txt

Internet-Drafts can also be obtained by e-mail. Send a message to: [EMAIL PROTECTED] In the body type: "FILE /internet-drafts/draft-duerst-i18n-norm-04.txt".

NOTE: The mail server at ietf.org can return the document in MIME-encoded form by using the "mpack" utility. To use this feature, insert the command "ENCODING mime" before the "FILE" command. To decode the response(s), you will need "munpack" or a MIME-compliant mail reader. Different MIME-compliant mail readers exhibit different behavior, especially when dealing with "multipart" MIME messages (i.e. documents which have been split up into multiple messages), so check your local documentation on how to manipulate these messages.

Below is the data which will enable a MIME-compliant mail reader implementation to automatically retrieve the ASCII version of the Internet-Draft.

Content-Type: text/plain
Content-ID: [EMAIL PROTECTED]

ENCODING mime
FILE /internet-drafts/draft-duerst-i18n-norm-04.txt

ftp://ftp.ietf.org/internet-drafts/draft-duerst-i18n-norm-04.txt
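To make the abstract's "canonical equivalents" concrete, here is a small sketch using Python's standard unicodedata module. It only illustrates NFC itself; the function name early_normalize is invented for this example and is not taken from the draft.

```python
import unicodedata


def early_normalize(text):
    """Normalize text to NFC, as a producer would before sending it on."""
    return unicodedata.normalize("NFC", text)


# Two canonically equivalent spellings of the same character:
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# Compared code point by code point, the two strings differ...
differ_before = decomposed != precomposed

# ...but after NFC both are the same single code point.
same_after = early_normalize(decomposed) == early_normalize(precomposed)
```

If every producer normalizes early, receivers can compare strings directly without normalizing again, which seems to be the point of doing it "early" and "uniformly".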
Re: [idn] nameprep forbidden characters
I think it's very useful to know about the problems of Hebrew software with points, and about the problems that Hebrew users have with using points. And Jonathan is definitely in the best position to know about that.

However, that doesn't mean that the best solution is to ignore points on the client side. For example, Yiddish uses pointed letters in quite a different way; they cannot be ignored. The same may apply to other languages written with the Hebrew script. There may also be cases where a point can indeed make a difference.

One solution is obviously to ignore the problem, that is, to treat points like any other character. If Jonathan is right, registering names with points won't be attractive, and so there will automatically be very few registrations with points. If it's difficult for users to input the points, then they will be very much at ease with just inputting the base letters. Everything will work together. For those cases where it's necessary to make a difference (e.g. Yiddish), there won't be any problems.

Regards, Martin.

At 00/09/17 10:04 -0700, Mark Davis wrote:

I am curious why you feel so strongly that the Hebrew points should be ignored in domain names. Prima facie, it seems that there is little harm in treating them no differently from other characters.

What problem would arise if the domain was ABC.COM and I could not get it by typing AB*C.COM? (Here uppercase stands for Hebrew, and * for a point.) Conversely, if someone really did register AB*C.COM, would it be a problem that I couldn't get to that location by typing ABC.COM?

It is my understanding that the vowels are rarely used, and that people really wouldn't use them in registered domain names anyway. It seems that if someone did take the trouble to type in the points, there would be a reason for their making such a distinction.

I'd appreciate it if you could help me to understand the issue more clearly.

Mark

Jonathan Rosenne wrote:

We should distinguish "punctuation", like 060C Arabic Comma, and "diacritics", such as 064E Arabic Fatha. Diacritics is probably the wrong word. I have the impression that you were referring to the latter.

For Hebrew, my opinion is that from the point of view of the user, punctuation should be forbidden, while diacritics such as the vowels and other combining characters should be allowed and be ignored.

I believe it is important that the rules for Arabic and Hebrew should be the same as far as possible.

Jony

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Wael Nasr
Sent: Saturday, September 16, 2000 1:16 AM
To: Edmon; idn working group; Adam M. Costello
Subject: RE: [idn] nameprep forbidden characters

Wanted to share with you that in the Arabic working group of MINC we have discussed this point at length. In Arabic the meaning of a word changes depending on punctuation; for example, the words "knowledge" and "flag" in Arabic are exactly the same except for punctuation.

It is my opinion that, at least regarding Arabic, no punctuation should be allowed for now. I am sure that 5 years from now, domain name systems will be much more dynamic than what we have now and will not be a simple mapping of Unicode or ASCII to an IP number. At that time, punctuation can be allowed to be part of the game.

wael
---
Wael Nasr
Director, Middle East Business Development
I-DNS.net
[EMAIL PROTECTED]
Cell Phone(Egypt):+(201) 222 55 380

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Edmon
Sent: Saturday, September 16, 2000 12:59 AM
To: idn working group; Adam M. Costello
Subject: Re: [idn] nameprep forbidden characters

Perhaps host names should avoid all punctuation in all languages so people don't have to worry about it.

I think we have to remember that it is the registrant's choice to choose a name that best reflects their identity online. Punctuation marks may serve as great symbols that identify an entity, for example a person called O'Brian would want to have the apostrophe for his domain name and a company AB would want the "" in their name. Our move to multilingual is the best opportunity for us to bring these worthwhile and long-awaited symbols back into the domain name space.

Edmon

AMC
http://www.unicode.org/unicode/standard/standard.html
Hello,

I had written:

I have re-read section "Controlling Ligatures", in TUS 3.0, p. 318.

On 2000-09-15 at 14:40 UTC, Mark Davis wrote:

I'd like to remind everyone to look at the latest version of the Unicode Standard, especially when looking at fine points. To cite Unicode 3.0.1 (http://www.unicode.org/unicode/standard/versions/Unicode3.0.1.html)

Thank you for this pointer. Actually, I had looked for this information before I wrote my previous note, but could not find it; hence, I resorted to TUS 3.0.

The reason for my failure to locate 3.0.1 on the Unicode WWW site is the poor organisation of the latter. I followed this trail:
- http://www.unicode.org/: About the Unicode/Standard
- http://www.unicode.org/unicode/standard/standard.html: Version 3.0 The most current major version of the Unicode standard...

Here ended my search. It did not occur to me that I would find an even more current version of the Unicode standard under the "Versions of the Unicode Standard" link.

So please make sure that the wording on your "Unicode Standard" page is not misleading, and that there is a conspicuous link to the current standard, even if it is not a major version.

Thank you, and best wishes,
Otto Stolz
Re: New Locale Proposal
I do not know if this proposal is good or evil. But in any case there are some points that need to be enhanced IMHO.

Carl W. Brown wrote:

The locale will consist of three parts:
1) A modified lower case RFC 1766bis language
2) An ISO 3166 country code

Can you allow for areas that are a little bigger? The first obvious case is the EU (but I believe it may soon become an ISO 3166 code). Problematic cases also include the Arabic countries and Spanish America, where the unity of language combined with the differences between countries creates a long list of almost completely virtual locales (that is, outside the need to tag monetary amounts, these locales are non-informative). Same problem for French in Africa and, to a lesser extent, English over wide areas of the Earth.

3) A variant

The modifications to RFC 1766bis to make it better suited for locales are as follows:
1) Normalize to single form when possible. Use ISO 639-1 code instead of 639-2 if one exists.

Are you forced to re-tag every bit of data when ISO 639/RA issues a new code?

3) Variants that are not related to language are locale variants. fr_FR_EURO

Can people *please* avoid this abuse of the variant idea? We are less than 16 months from the end of the use of FRF. So 16 months from now, the "fr_FR" locale will become completely indistinguishable from your example. Unless you want to force us to abandon "fr_FR" and reserve it for tagging obsolete data, but I can tell you this is an already lost battle. This is a big problem for a draft RFC that will take around, say, 15 months (;-)) to be completed.

Now, if we try to be a bit more clever, the locale that speaks French and labels monetary amounts in euros should be named "fr_EU", for anything except very peculiar and very rare uses. There are as many differences between France's French and Belgian French as between Scottish English and London English (the most notable being the use of "octante" instead of "quatre-vingt" for eighty); and I believe the same holds for the few other similar cases like "de_EU" for "de_DE"/"de_AT", "nl_EU" for "nl_NL"/"nl_BE", and the perhaps more distant "en_EU"/"en_IE"/"en_GB" or "sv_EU"/"sv_FI"/"sv_SE". Furthermore, the small countries and the like, such as "LU", "AD", "SM", "MC" or "VC", for which independent locales would be something of a joke (I except "lb_LU"), will then be covered easily.

5) Convert all non-human locales "C" "POSIX" to human locales e.g. en_US.

There are BIG differences between "C"/"POSIX" and "en_US". If you do not see that, then I believe there are big holes in the intended uses of these new locales. A major one is that "POSIX" collates in the same order as ASCII, and I do not believe you are willing to impose this burden on every user of "en_US"!

The whole point of "C" and "POSIX" (or its big brother "i18n"), as locales, is to provide certainty in execution in an area where fuzziness is the rule. And yes, there are cases where this is much more important than displaying user-friendly dates...

Furthermore, I am not at all sure that mapping "C" to "en_US" will be welcome everywhere (even if C99 now insists that the names used in full text dates are the English ones). I am not even sure this is conforming, even assuming the _classical_ "en_US" where accented characters are considered punctuation. In any case, the modern, Unicode-conformant definition of "en_US" will certainly not qualify.

Antoine
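Antoine's point about collation can be illustrated with a small sketch. The first sort below uses plain code point order, which for ASCII data is the byte order he attributes to "POSIX". The second sort is only a crude stand-in for what a user of a "human" locale might expect; it is not the real "en_US" collation, which depends on the platform's locale data.

```python
words = ["zebra", "Apple", "apple", "Zulu"]

# Code point order: every uppercase ASCII letter sorts before
# every lowercase one, so "Zulu" lands ahead of "apple".
codepoint_order = sorted(words)

# Rough dictionary-like order (an assumption for illustration only):
# compare case-insensitively first, then break ties by code point.
dictionary_like = sorted(words, key=lambda w: (w.casefold(), w))

print(codepoint_order)   # ['Apple', 'Zulu', 'apple', 'zebra']
print(dictionary_like)   # ['Apple', 'apple', 'zebra', 'Zulu']
```

A program that relies on the first ordering (for example, to binary-search a sorted file) would break if "C" were silently mapped to a locale with the second kind of ordering.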
Re: New Locale Proposal
Antoine Leca [EMAIL PROTECTED] wrote:

1) Normalize to single form when possible. Use ISO 639-1 code instead of 639-2 if one exists.

Are you forced to re-tag every bit of data when ISO 639/RA issues a new code?

From what I have heard, ISO 639/MA will not be issuing any new 639-1 (two-letter) codes for languages that already have a 639-2 (three-letter) code. So this re-tagging scenario should not occur and Carl's solution, which is the same as that proposed in RFC 1766 bis, should work fine.

-Doug Ewell
Fullerton, California
Re: http://www.unicode.org/unicode/standard/standard.html
The paragraph reads:

The most _current major version of the Unicode standard_ contains 49,194 distinct coded characters. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and the Pacific Basin. Certain technical reports are approved and part of 3.0. For a list, see the _Technical Reports_ page. Note that Version 3.0 is amended by an update version, _Unicode 3.0.1_.

so the note is at the bottom ("_" indicates a link). I am not criticizing your message -- if you missed it, certainly a lot of other people will! We will have to rewrite the paragraph for clarity.
Re: [idn] nameprep forbidden characters
From: "Martin J. Duerst" [EMAIL PROTECTED]

However, that doesn't mean that the best solution is to ignore points on the client side. For example, Yiddish uses pointed letters in quite a different way; they cannot be ignored. The same may apply to other languages written with the Hebrew script. There may also be cases where a point can indeed make a difference.

The points can and should be ignored for Internet domain name resolution. They cannot be ignored in all cases for all Hebrew-script applications anyone can come up with now or in the future, whether for Hebrew or Yiddish or Ladino or whatever. Similarly, letter case cannot be ignored in every possible application for Latin-script languages either.

But as a Yiddish expert I feel that they can and should be ignored for the application in question, Internet domain name resolution. I claim this applies very much to Ladino as well, though I can't claim expertise in that language, only superficial knowledge. Other Jewish languages (using Hebrew script) are archaic and are not in use, or hardly in use, by living speakers.
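For readers who want to see what "ignoring the points" might mean mechanically, here is a minimal sketch. It is not the nameprep algorithm and not anything proposed in this thread; the function names are invented, and the rule used (drop nonspacing marks in the range U+0591..U+05C7) is just one plausible assumption about which characters count as points.

```python
import unicodedata


def is_hebrew_point(ch):
    """True for nonspacing marks in the Hebrew block (accents and points)."""
    return "\u0591" <= ch <= "\u05c7" and unicodedata.category(ch) == "Mn"


def strip_hebrew_points(label):
    """Return the label with Hebrew points removed, base letters kept."""
    return "".join(ch for ch in label if not is_hebrew_point(ch))


# "shalom" written with points: shin + qamats + shin dot, lamed,
# vav + holam, final mem.
pointed = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
unpointed = "\u05e9\u05dc\u05d5\u05dd"

# Under this rule the pointed and unpointed spellings resolve alike.
same_name = strip_hebrew_points(pointed) == unpointed
```

Under such a rule AB*C.COM and ABC.COM in Mark's example would be the same name, which is exactly the behaviour Martin questions for Yiddish and the poster above defends.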
Re: New Locale Proposal
Doug Ewell wrote:

Antoine Leca [EMAIL PROTECTED] wrote:

1) Normalize to single form when possible. Use ISO 639-1 code instead of 639-2 if one exists.

Are you forced to re-tag every bit of data when ISO 639/RA issues a new code?

From what I have heard, ISO 639/MA will not be issuing any new 639-1 (two-letter) codes for languages that already have a 639-2 (three-letter) code. So this re-tagging scenario should not occur and Carl's solution, which is the same as that proposed in RFC 1766 bis, should work fine.

I *must* have missed something.

In the last publication of new codes, there was "bs" for "Bosnian". My understanding of the situation of the former Yugoslavia is that the language which is intended to be tagged is a form of Serbo-Croatian that is spoken in the country named Bosnia-Herzegovina (not sure about Herzegovina), and outside this country by the natives or relatives of natives of this very country.

Now this language is not a sudden invention: it was known before. And as I understand things, this language was tagged "hr-XX-Bosnian", or something like that (XX being the relevant country of the speaker). So now the (probably fictitious) document is supposed to be re-tagged as "bs-XX". Or have I missed something?

Another example: a text in Avestan was, before the last change, tagged as "x-Avestan" or "x-Avesta" or "x-zend" or a number of others, according to the tagger. Now, should they be re-tagged? (And don't get me wrong, that would certainly be a Good Thing; remember, the question is about the requirement.)

Antoine
Re: New Locale Proposal
The opposite is true, Doug. ISO 639 will ONLY issue new 639-1 (two-letter) codes for languages that already have a 639-2 (three-letter) code.

That means, in effect, that the ISO 639-1/MA (AT InfoTerm) has its hands tied: it can no longer register any new language tag identifiers for languages not already approved by the ISO 639-2/MA (US Library of Congress).

mg

Doug Ewell wrote:

Antoine Leca [EMAIL PROTECTED] wrote:

1) Normalize to single form when possible. Use ISO 639-1 code instead of 639-2 if one exists.

Are you forced to re-tag every bit of data when ISO 639/RA issues a new code?

From what I have heard, ISO 639/MA will not be issuing any new 639-1 (two-letter) codes for languages that already have a 639-2 (three-letter) code. So this re-tagging scenario should not occur and Carl's solution, which is the same as that proposed in RFC 1766 bis, should work fine.

-Doug Ewell
Fullerton, California

--
Marion Gunn
Everson Gunn Teoranta
http://www.egt.ie
Re: New Locale Proposal
Carl W. Brown wrote:

I am sorry that my previous reply was so short, I was rushed. A bit of background:

AH... I am sorry, I entirely missed your point on the first shot. I believed you intended a new design for the locale issue. You can safely drop my comments; they mostly do not apply to the problem you are addressing. I apologize for the confusion I caused.

Since I have to send this message, here are a few more comments on your notes. Mostly for fun...

From: Antoine Leca [mailto:[EMAIL PROTECTED]]

Carl W. Brown wrote:

The locale will consist of three parts:
1) A modified lower case RFC 1766bis language
2) An ISO 3166 country code

Can you allow for areas that are a little bigger? The first obvious case is the EU (but I believe it may soon become an ISO 3166 code). Problematic cases also include the Arabic countries and Spanish America, where the unity of language combined with the differences between countries creates a long list of almost completely virtual locales (that is, outside the need to tag monetary amounts, these locales are non-informative). Same problem for French in Africa and, to a lesser extent, English over wide areas of the Earth.

Good point. A combined South American Spanish is also a good starting point for a neutral Spanish dialect. I guess you can always use a 5-8 character language variant.

I guess so too, but I believe(d) standardization in this area may help. Alas, this does not appear to be the way we are going.

As a European, I assume you meant "a neutral Hispanoamerican" above, i.e. you want to dissociate European Spanish from Hispanoamerican (note to non-Spanish speakers: this makes a lot of sense). "Neutral Spanish" already has a locale code, "es", so there is no need here.

On the other hand for a language like Portuguese you might want to use Brazilian Portuguese from Minas Gerais as the neutral form of the language. This might be a case for your ISO 3166-2 codes. Brazil is the major producer of TV and movies and influences the Portuguese language.

Sounds OK as far as I know, but I do not know Brazil's linguistic situation!

I guess it is like taking California English as a standard, maybe resented but generally understood.

But this one is getting funny. Here in France, "Californian English" (which we usually call West Coast American English) is taken as the prototypical example of the hard-to-understand American way of talking. Of course, people in contact with "real" Americans know about Arizona or Texas or Ebonics (no offence intended; insert your case here:---) or X...; but the symbol is represented by the "West Coast"...

Antoine
Re: New Locale Proposal
Marion Gunn wrote:

The opposite is true, Doug. ISO 639 will ONLY issue new 639-1 (two-letter) codes for languages that already have a 639-2 (three-letter) code.

Almost, but not quite. If that were true, 639-2 tags could become effectively obsolete. The true rules AFAIU are:

1) A language with a 639-1 tag has and will always have a 639-2 tag as well. E.g. English has tags "eng" and "en".

2) A language which currently has a 639-2 tag but not a 639-1 tag will not get a new 639-1 tag in future. E.g. Arapaho has tag "arp" but will never have a 639-1 tag.

3) Therefore, the only future 639-1 tags are those assigned to new (i.e. not in 639-2) languages, simultaneously with a 639-2 tag. E.g. Lojban, a currently untagged language, might get the tags "loj" and "lj". (When Hell freezes over.)

--
John Cowan [EMAIL PROTECTED]
http://www.reutershealth.com
http://www.ccil.org/~cowan
There is / one art / no more / no less / to do / all things / with art- / lessness -- Piet Hein
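Taken together with the "use ISO 639-1 instead of 639-2 if one exists" rule quoted earlier in the thread, John's rules suggest a normalization step along these lines. This is a sketch only: the table is a tiny hypothetical excerpt containing just the two languages he names, and the function name is made up.

```python
# Hypothetical excerpt of a 639-2 to 639-1 mapping. Only the examples
# named in the thread are listed; a real table would come from the
# ISO 639 registration data.
ALPHA3_TO_ALPHA2 = {
    "eng": "en",   # English has both tags (rule 1)
    "arp": None,   # Arapaho has a 639-2 tag only (rule 2)
}


def normalize_language(code):
    """Prefer the two-letter code when one exists, else keep the input."""
    code = code.lower()
    if len(code) == 3:
        return ALPHA3_TO_ALPHA2.get(code) or code
    return code


print(normalize_language("ENG"))  # en
print(normalize_language("arp"))  # arp
print(normalize_language("en"))   # en
```

If rule 2 holds, the mapping for an existing three-letter code never changes from "no two-letter code" to some new one, so data normalized this way would not need re-tagging later. The "bs" case raised elsewhere in the thread is a different situation, where the old tag was not a plain 639-2 code.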
Re: New Locale Proposal
Absolutely true, John, and said far more succinctly than I did. The most significant aspect of this is that the work of registering codes should be processed much faster in future, because, although for now there may still exist two separate Maintenance Agencies to process requests, applicants applying to AT InfoTerm for 639-1 codes have, in future, as you say below, simultaneously to satisfy 639-2 US LOC requirements.

mg

John Cowan wrote:

Marion Gunn wrote:

The opposite is true, Doug. ISO 639 will ONLY issue new 639-1 (two-letter) codes for languages that already have a 639-2 (three-letter) code.

Almost, but not quite. If that were true, 639-2 tags could become effectively obsolete. The true rules AFAIU are:

1) A language with a 639-1 tag has and will always have a 639-2 tag as well. E.g. English has tags "eng" and "en".

2) A language which currently has a 639-2 tag but not a 639-1 tag will not get a new 639-1 tag in future. E.g. Arapaho has tag "arp" but will never have a 639-1 tag.

3) Therefore, the only future 639-1 tags are those assigned to new (i.e. not in 639-2) languages, simultaneously with a 639-2 tag. E.g. Lojban, a currently untagged language, might get the tags "loj" and "lj". (When Hell freezes over.)

--
John Cowan [EMAIL PROTECTED]
http://www.reutershealth.com
http://www.ccil.org/~cowan
There is / one art / no more / no less / to do / all things / with art- / lessness -- Piet Hein

--
Marion Gunn
Everson Gunn Teoranta
http://www.egt.ie
This is not UniLocale!
Isn't there a more appropriate forum for the localization issues? I might even subscribe. However, let's please move the topic to a more appropriate place and let character encoding issues comprise at least half the traffic around here.

Thanks,

/"\                          /|/|ike /+yers
\ /  ASCII Ribbon Campaign
 X   Against HTML Mail       Test Engineer
/ \                          BMC Software, Inc.
Re: This is not UniLocale!
A noble thought, Mike. But how exactly would you suggest legislating the feeling of what is important in the minds of others?

My overall impression is that people ask here because they are looking for the slant that they would get from this group. And let's face it... if there were no other locales, there probably would not be other languages, or other scripts. And then there would be no need for Unicode. :-)

Willie Sutton was once misquoted when asked why he robbed banks (the claim was that he said "that's where the money is"). This is where the languages are.

michka

a new book on internationalization in VB at http://www.i18nWithVB.com/

----- Original Message -----
From: "Ayers, Mike" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Monday, September 18, 2000 4:13 PM
Subject: This is not UniLocale!

Isn't there a more appropriate forum for the localization issues? I might even subscribe. However, let's please move the topic to a more appropriate place and let character encoding issues comprise at least half the traffic around here.

Thanks,

/"\                          /|/|ike /+yers
\ /  ASCII Ribbon Campaign
 X   Against HTML Mail       Test Engineer
/ \                          BMC Software, Inc.
RE: New Locale Proposal
Antoine wrote:

As a European, I assume you meant "a neutral Hispanoamerican" above, i.e. you want to dissociate European Spanish from Hispanoamerican (note to non-Spanish speakers: this makes a lot of sense). "Neutral Spanish" already has a locale code, "es", so there is no need here.

You are right. I don't consider Mexican Spanish very neutral either.

But this one is getting funny. Here in France, "Californian English" (which we usually call West Coast American English) is taken as the prototypical example of the hard-to-understand American way of talking.

However, because Europeans are better trained in languages, they can tolerate accent differences better. Try to send a Midwestern American to New Zealand, for example. If you want funny, my college roommate was a French major from Georgia who spoke French with a heavy Southern American accent.

As to your other message about "hr-XX-Bosnian" to "bs-XX": this would first have been converted to hr-bosnian_XX, to keep the language variant together with the language and before the country. Then it would be converted to bs_XX when the new standard was implemented.

Carl
This is not UniLocale!
Mike Ayers [EMAIL PROTECTED] wrote:

Isn't there a more appropriate forum for the localization issues? I might even subscribe. However, let's please move the topic to a more appropriate place and let character encoding issues comprise at least half the traffic around here.

For my part, I started and have continued the language tag discussion because of interest in Unicode's (discouraged) Plane 14 language tags. The spillover discussion on RFC 1766 bis and Ethnologue tags is still interesting to me (although the POSIX locale discussion is not).

I apologize to those who are getting sick of all this. I agree that some more genuine Unicode topics would be welcome, though I have none to contribute at present.

/"\
\ /  ASCII Ribbon Campaign
 X   Against HTML Mail
/ \

A noble cause, and one I support -- however, no more noble than the Great Crusade to Stamp Out UTF-7 Mail Headers.

-Doug Ewell
Fullerton, California
Re: This is not UniLocale!
On Mon, Sep 18, 2000 at 07:24:30PM -0800, Doug Ewell wrote: A noble cause, and one I support -- however, no more noble than the Great Crusade to Stamp Out UTF-7 Mail Headers. (And this is on list) why? What's so evil about UTF-7 mail headers? What about them would make them non-trivial for any Unicode-compliant mailer to handle? -- David Starner - [EMAIL PROTECTED] http/ftp: dvdeug.dhis.org And crawling, on the planet's face, some insects called the human race. Lost in space, lost in time, and meaning. -- RHPS
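
As background for David's question, the following sketch shows that a UTF-7 round trip is mechanically simple with an off-the-shelf codec, and puts it next to the MIME encoded-word form commonly used for non-ASCII header text. It uses only Python's standard library and takes no position on whether UTF-7 is appropriate in headers; the sample subject is invented.

```python
from email.header import Header, decode_header, make_header

subject = "Caf\u00e9 menu"   # sample text with one non-ASCII character

# UTF-7: the result is pure 7-bit ASCII and decodes back losslessly.
utf7_bytes = subject.encode("utf-7")
utf7_roundtrip = utf7_bytes.decode("utf-7")

# MIME encoded-word form of the same text, carried as UTF-8.
encoded_word = Header(subject, "utf-8").encode()
mime_roundtrip = str(make_header(decode_header(encoded_word)))

print(utf7_bytes)
print(encoded_word)
```

Both forms survive a 7-bit transport; the difference is that an encoded word names its charset explicitly, whereas raw UTF-7 in a header has to be recognized by the reader some other way.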