Re: Common Locale Data Repository Project
On 23/04/2004 17:15, Philippe Verdy wrote: ... Think more recently about the new codification for Serbo-Croatian, and the split of sh, with no definition except that it is country based (Serbian, Croatian, Bosnian, Montenegrin), assuming that one country uses only one language when in fact there are several in the same one, that are shared by multiple countries, and differ mostly by their script...

These are languages which were probably originally somewhat artificially unified, to be the main language of the old Yugoslavia, and which have rapidly diverged since the old Yugoslavia fell apart. When it comes down to it, whether the speech varieties used in two different areas are counted as one language or as separate ones comes down to the choice and self-perception of the speakers.

For now, many Belgians prefer to say that they speak French, although their spoken dialect is no doubt quite different from Parisian French and their written form is not identical. A time may come when they decide they want their own language, Walloon. At that time they will no doubt ask for appropriate ISO etc. codes. That would be the choice of the people of Belgium, and it would not be the business of standards committees (or the French) to tell them what to call their language. A language has been defined as a dialect with an army.

-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
RE: Common Locale Data Repository Project
From: Mark Davis [mailto:[EMAIL PROTECTED]

You can reiterate it all you want; in practice, 3066 tags are used as locale identifiers.

And for a narrow sense of locales, that is perfectly reasonable. For a broad sense of locale, including timezone, user's currency, religious preference, etc., it clearly would not be reasonable, and I would agree with you for that. But there are a lot of people that don't know enough to recognize that difference. So, even though a language identifier may be sufficient in many cases to name a locale, it is IMO very unhelpful to refer to RFC 3066 tags as locale identifiers as it perpetuates and leads people into wrong assumptions. Please help improve common understanding by not referring to them as locale IDs.

ISO 639 is not unstable. It is an open code set that is being added to over time, but I don't think that should be referred to as unstable -- that term suggests other things. ISO 3066 has *demonstrated* instability,

I take it you mean ISO 3166? I did not make any claim in that regard.

However, there is no policy documented *anywhere* that says they won't.

I'm working on it. The ISO 639/RA-JAC has acknowledged the need for stability. Getting into the normative text of the standards takes a little time.

Peter Constable
RE: Common Locale Data Repository Project
A time may come when they decide they want their own language, Walloon. At that time they will no doubt ask for appropriate ISO etc codes. There's nothing futuristic about that: wln (http://www.loc.gov/standards/iso639-2/englangn.html#uvwxyz) Peter Constable
RE: Common Locale Data Repository Project
From: Philippe Verdy [mailto:[EMAIL PROTECTED]

What is already unstable in ISO 639 is the deprecation of iw and the addition of he, same thing for in and id or for yi and ji. Don't you call that instability?

I think there is a misunderstanding here. As I understand it, ISO 639-1 actually never included iw, in or ji. But somehow, something got published listing those (I don't know the exact details). So there was mixed info out there indicating both iw and he, etc. To resolve the apparent ambiguity, the ISO 639/RA-JAC had to state that the IDs iw, in and ji were deprecated.

Think more recently about the new codification for Serbo-Croatian, and the split of sh, with no definition except that it is country based (Serbian, Croatian, Bosnian, Montenegrin), assuming that one country uses only one language when in fact there are several in the same one, that are shared by multiple countries, and differ mostly by their script...

I don't disagree that there are some difficult areas, such as this. The differences intended by sr, bs and hr do *not* have to do with script -- i.e. one cannot assume that any of these imply any particular script. They also don't imply a particular region (Serbian could be spoken outside Serbia), though clearly one country is most likely. They *do* imply linguistic differences. Here's the difficulty: in those countries, claims are made that there are linguistic differences, so much so that it is problematic to sell products there that claim support for Serbo-Croatian. On the other hand, given a document in one of these, it's difficult to say that it's specifically one of them and not the other two. ISO 639-3 will provide a macro-language identifier for Serbo-Croatian, so it will be possible to tag a document without making that distinction.

Also if ISO3166 is unstable (CS: is that the former Czechoslovakia or the newer Serbia-Montenegro?), then it introduces instability too within ISO 3066 or its proposed replacement

I made no claim regarding stability of ISO 3166. As for "ISO 3066":

1. It is an IETF specification, not an ISO standard; the designation is **RFC** 3066.
2. The draft successor to RFC 3066 addresses this very issue.
3. (a bit on the nit-picking side, IMO, but there have been three comments on this) RFC 3066 will be *superseded*, not replaced.

For now, the only workable solution to solve these issues is found in supplementary libraries in ICU which support locale aliases. (Yes I use the term Locale because this is the term that Java gives to this identification,

NO. That is the term Java (and other things) give to a *different* identification. There are languages, there are cultures/locales. The two are not the same.

Peter Constable
Re: Common Locale Data Repository Project
From: Peter Constable [EMAIL PROTECTED]

For now, the only workable solution to solve these issues is found in supplementary libraries in ICU which support locale aliases. (Yes I use the term Locale because this is the term that Java gives to this identification,

NO. That is the term Java (and other things) give to a *different* identification. There are languages, there are cultures/locales. The two are not the same.

Then there will remain a problem in Java locales, unless the Java community accepts that the language part of a locale will contain the language subtags of RFC 3066 or its successor, so that the API can implement a language resolver for that part only, ignoring the second and third parameters, which will be used only to specify other (non-language) elements of a Locale.

For now it's well known that if you create a Java application with resource bundles for Hebrew, you have to use the iw language parameter to name your bundle; if you use he, then the same properties file or class part of a bundle will not be found on an OS that the Java runtime determines as supporting the iw locale, and the application will then display only the default locale (most often English). Note that Hebrew is part of the set of fully supported languages in Java. I doubt that the JRE will be changed to use the he code by default as long as the locale resolver in Java is not updated to use a cleverer algorithm than just equality of language codes.

Same problem for Traditional Chinese: Java supports it natively only via the TW country code, separately from the zh language code. If things must change later, the Java runtime should learn to work with a zh-Hant language identifier to be used in every country where the language is used. Using zh_TW (i.e. a separate zh language code and the separate TW country code) has the bad effect of also applying other locale standards appropriate only for Taiwan, but not for Macau, Hong Kong, Singapore, Réunion and other Indian Ocean, South Asian and South African countries or territories where this language is used with other national locale conventions (currency, time and numeric formats, phone numbers...).

In fact I would like to see Traditional and Simplified Chinese treated as distinct languages in the same family. An application would do better to use zht and zhs language codes to make the distinction, so that zh would become an identifier for a family of Han-written languages, rather than a language identifier, and thus a legacy code. This also means changes in the Locale resolver, so that an OS and user locale which indicates zhs or zht will first look for resources marked with its respective language code, and only later fall back to a zh resource if none is found.

A Locale resolver should be able to determine, from each properties file or class of a bundle, which codes it may support, and a degree/priority of matching relative to other localized resources. But I have not seen anything that suggests that an application may be able to provide such a Locale resolver; for now each application has to write its own resolver to map a user locale to a matching application-defined supported locale. The automatic resolver in Java (though other systems like POSIX have the same caveats) seems quite weak, as does the (somewhat more general) resolution order currently suggested in RFC 3066, which is exactly what was implemented in Java...
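The iw/he bundle-naming pitfall described above can be worked around at the application level by normalizing aliases before any bundle name is built or compared. A minimal sketch, assuming a hypothetical helper class and alias table (this is not Java's actual ResourceBundle mechanism, just an illustration of the idea):

```java
import java.util.Locale;
import java.util.Map;

public class AliasingBundleLookup {
    // Hypothetical alias table: each deprecated ISO 639 code maps to the
    // current one, so "iw" and "he" select the same resource bundle.
    private static final Map<String, String> CANONICAL = Map.of(
        "iw", "he",
        "in", "id",
        "ji", "yi");

    // Normalize the language code reported by the JRE before using it
    // to name or look up a bundle.
    static String canonicalLanguage(Locale locale) {
        String lang = locale.getLanguage().toLowerCase();
        return CANONICAL.getOrDefault(lang, lang);
    }

    public static void main(String[] args) {
        // Whichever spelling the runtime reports, both resolve the same way.
        System.out.println(canonicalLanguage(new Locale("iw"))); // he
        System.out.println(canonicalLanguage(new Locale("he"))); // he
    }
}
```

Whether the runtime reports the deprecated or the current spelling, both collapse to one canonical key, so a single bundle serves every JRE.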
RE: Common Locale Data Repository Project
From: Philippe Verdy [mailto:[EMAIL PROTECTED] In fact I would like to see that Traditional and Simplified Chinese are distinct languages in the same family. And an application would better use zht and zhs language codes to make the distinction, so that zh would become an identifier for a family of Han-written languages, rather than a language identifier, and so a legacy code. In ISO 639-3, zh will be considered a macro-language identifier. But zhs and zht would not be good ideas, and will not be considered for ISO 639 or for RFC 3066. Peter Constable
Re: Common Locale Data Repository Project
comments below. Mark __ http://www.macchiato.com

- Original Message - From: Peter Constable [EMAIL PROTECTED] To: Unicode List [EMAIL PROTECTED] Sent: Sat, 2004 Apr 24 06:12 Subject: RE: Common Locale Data Repository Project

From: Mark Davis [mailto:[EMAIL PROTECTED] You can reiterate it all you want; in practice, 3066 tags are used as locale identifiers.

And for a narrow sense of locales, that is perfectly reasonable. For a broad sense of locale, including timezone, user's currency, religious preference, etc., it clearly would not be reasonable, and I would agree with you for that. But there are a lot of people that don't know enough to recognize that difference. So, even though a language identifier may be sufficient in many cases to name a locale, it is IMO very unhelpful to refer to RFC 3066 tags as locale identifiers as it perpetuates and leads people into wrong assumptions. Please help improve common understanding by not referring to them as locale IDs.

I disagree. There is, as I have said, a perfectly reasonable, narrow sense of locale which is essentially identical to what is captured by RFC 3066. And in practice, RFC 3066 is often used with that meaning. I don't see any need to deny reality (at least not in this area ;-) As I said before, for a broader sense of locale, RFC 3066 is not sufficient to capture everything that anyone has meant by that term.

ISO 639 is not unstable. It is an open code set that is being added to over time, but I don't think that should be referred to as unstable -- that term suggests other things. ISO 3066 has *demonstrated* instability,

I take it you mean ISO 3166? I did not make any claim in that regard.

My typo: I meant ISO 3166.

However, there is no policy documented *anywhere* that says they won't.

I'm working on it. The ISO 639/RA-JAC has acknowledged the need for stability. Getting into the normative text of the standards takes a little time.

That's great -- any way we can help with that? Peter Constable
Re: Common Locale Data Repository Project
On Friday, April 23, 2004 7:02 AM Peter Constable [EMAIL PROTECTED] wrote:

due to the strong perception of OpenI18N.org as opensource/Linux advocates, even though CLDR project is not specifically bound to Linux. It is hard to look at OpenI18N.org's spec and not get the impression that all of that group's projects are bound to some flavour of Unix.

While CLDR certainly originates _from_ the Linux community, it is not _bound_ to it. That is, as far as I understand, it is the same data as ICU uses, and to my knowledge, ICU also runs on Windows, which is in no way bound to [that] flavour of Unix. Or are you saying that, inasmuch as some are advocating that everything from Microsoft is so evil that one should not even touch it, everything that originates from Linux is not pure enough to be run on other systems? :-)

The Scope clause for several sections is specifically expressed in terms of Unix-related implementations (e.g. having the scope for rendering requirements expressed as what is needed for X Window).

Where are these clauses? By the way, X Window, while Unix-related, is not bound to it. For example, I ran an X client for years on a Windows desktop OS, with the server running on another non-Unix machine. In fact, we did that because the equivalent technology from Microsoft was at the time, ahem, not very mature...

And even if a section isn't scoped specifically in terms of a Unix-derived platform, it may specify requirements that are explicitly related to Unix implementations (e.g. that base libraries must support POSIX i18n environment variables).

Again, where is it said that CLDR requires any form of base libraries, much less ones that support POSIX variables?

Antoine
Re: Common Locale Data Repository Project
From: Antoine Leca [EMAIL PROTECTED]

And even if a section isn't scoped specifically in terms of a Unix-derived platform, it may specify requirements that are explicitly related to Unix implementations (e.g. that base libraries must support POSIX i18n environment variables). Again, where is it said that CLDR requires any form of base libraries, much less ones that support POSIX variables?

POSIX variables are normally part of most implementations of languages supported on Windows and MacOS too. It's true that Windows and MacOS have deprecated the use of environment variables for system-wide configuration or user settings, but this does not mean that this environment cannot be emulated within a program by a support library. This is already happening in Java when it is started on Windows. What is needed in fact is support for an API as in POSIX, not a particular system feature. The Java Locale class, for example, is a minimal API implementation supporting POSIX locales. But it could become richer later. In fact if ISO 3066 is later standardized, the designation and use of locales could get its own API supporting standard identifiers.

In fact the exact syntax of compound locale identifiers appears to me to be just a parsable serialization of a more complete LocaleID object. On Windows and MacOS these identifiers can be translated to/from native system identifiers. With the CLDR data, this mapping of locale IDs could become better documented and more stable.

I think that the CLDR database is extremely important for software implementations, because it avoids some caveats that come from other unstable standards such as ISO 3166 and ISO 639. But as this CLDR data will still need to adapt itself to new changes in ISO 3166 (countries and territories will probably continue to change their status, and may merge or split...) and ISO 639 (some new languages may become standardized), what is needed is another level of abstraction to allow access to locale data using older identifiers through some standardized locale resolution algorithm. Java has such a basic algorithm, which is a bit richer in ICU; if this algorithm is to be tunable by user settings or by a program, the tunings that control locale resolution should be documented as well (notably when mapping a locale identifier supported on one system onto another locale identifier on another system, when the localization resources are not completely identical between those systems).

What can ease the interchange of locale-sensitive data and methods is the standardization of a common data encoding (Unicode) and common values (CLDR locale identifiers). So I approve the migration from OpenI18n.org to Unicode.org, which will ease the interoperability of systems and the interchange of internationalized data.
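The resolution algorithm described above (match the most specific supported identifier, then fall back by dropping subtags from the right) can be sketched in a few lines. This is an illustration of RFC 3066-style truncation matching under assumed names, not Java's or ICU's actual resolver:

```java
import java.util.Set;

public class TagFallback {
    // Drop subtags from the right until a supported tag is found;
    // if no prefix matches, return the given default.
    static String resolve(String requested, Set<String> supported, String fallback) {
        String tag = requested.toLowerCase();
        while (true) {
            if (supported.contains(tag)) return tag;
            int cut = tag.lastIndexOf('-');
            if (cut < 0) return fallback;
            tag = tag.substring(0, cut);
        }
    }

    public static void main(String[] args) {
        Set<String> supported = Set.of("zh-hant", "zh", "en");
        System.out.println(resolve("zh-Hant-TW", supported, "en")); // zh-hant
        System.out.println(resolve("zh-Hans-SG", supported, "en")); // zh
        System.out.println(resolve("da-DK", supported, "en"));      // en
    }
}
```

Note that this pure truncation scheme is exactly the one criticized elsewhere in the thread: it cannot see that iw and he, or nn and no-nynorsk, name the same thing, which is why an alias table has to run before it.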
Re: Common Locale Data Repository Project
From: Peter Constable [EMAIL PROTECTED]

due to the strong perception of OpenI18N.org as opensource/Linux advocates, even though CLDR project is not specifically bound to Linux. It is hard to look at OpenI18N.org's spec and not get the impression that all of that group's projects are bound to some flavour of Unix.

We understand what you mean. Sometimes perception is very important, and that's why we thought it was a good idea to transfer CLDR. As we started as the Linux Internationalization Initiative (li18nux.org) and later changed name and charter to OpenI18N.org to accommodate wider platforms and platform-neutral I18N technology developments, the projects at OpenI18N.org are not limited to Linux/Unix.

CLDR doesn't have to be tied to any particular platform -- after all, it's just a collection of data.

Yup! So hopefully this move will help more parties to join the projects. That would definitely help global interoperability for all platforms and help everybody.

But I don't think you can honestly say that OpenI18N isn't tied to a particular family of platforms

Most of our current projects are mainly for some flavour of Unix, since most of the participants' expertise and interests are for those platforms, but we are not limited nor have to be bound to them. The only requirement for the projects in OpenI18N.org is to be open to everyone, to be developed in an open process and to be open-sourced. For example, one of the projects I run, the platform-neutral multilingual distributed Unicode input method framework, IIIMF, runs on Windows as well, and I honestly hope Microsoft will adopt IIIMF in a future release of Windows, so that we can unify the Unicode input method framework regardless of platform.

Best Regards, -- [EMAIL PROTECTED],OpenI18N.org,li18nux.org,unicode.org,sun.com} Chair, OpenI18N.org/The Free Standards Group http://www.OpenI18N.org Architect/Sr. Staff Engineer, Sun Microsystems, Inc, USA eFAX: 509-693-8356
Re: Common Locale Data Repository Project
You are talking about Locale IDs. There is currently work underway on an RFC to replace 3066 (this is referenced by UTS #35), and one of the features is stability -- even where the ISO standards are not. See: ... http://www.ietf.org/internet-drafts/draft-phillips-langtags-02.txt http://www.ietf.org/internet-drafts/draft-phillips-langtags-02.pdf

It is also available in HTML format on my private website here: http://www.inter-locale.com/ID/draft-phillips-langtags-02.html

I will also be posting our issues list with resolutions and a link to the recent presentation by Mark and myself at the Unicode conference on that site. This version contains a few changes based on discussion on this list; notably it more closely defines the rules for using UN M49 identifiers to resolve ambiguity. It also contains semi-substantial wordsmithing in section 2 which is not substantive, but which does make the rules (we think) clearer and easier to understand.

Best Regards, Addison Mark __ http://www.macchiato.com

- Original Message - From: Philippe Verdy [EMAIL PROTECTED] To: Unicode List [EMAIL PROTECTED] Sent: Fri, 2004 Apr 23 02:58 Subject: Re: Common Locale Data Repository Project
RE: Common Locale Data Repository Project
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Davis You are talking about Locale IDs. There is currently work underway on an RFC to replace 3066 But let me reiterate from my correction to Philippe: even the replacement of RFC 3066 is a specification for *language* identification, not *locale* identification. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
RE: Common Locale Data Repository Project
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy In fact if ISO 3066 is later standardized, the designation and use of locales could become its own API supporting standard identifiers. I really don't want to get into this discussion but can't let this point go by: RFC 3066 (not ISO) is not a specification for *locale* identification. It is a specification for *language* identification. There are many possible cases in which this distinction is very important. I think that the CLDR database is extremely important for software implementations, because it avoids some caveats that come from other unstable standards such as ISO 3166 and ISO 639. ISO 639 is not unstable. It is an open code set that is being added to over time, but I don't think that should be referred to as unstable -- that term suggests other things. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
RE: Common Locale Data Repository Project
At 16:18 -0700 2004-04-23, Peter Constable wrote: But let me reiterate from my correction to Philippe: even the replacement of RFC 3066 is a specification for *language* identification, not *locale* identification.

And it is to supersede RFC 3066, with a new edition. That's different from replacing. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Common Locale Data Repository Project
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Michael Everson Sent: Friday, April 23, 2004 4:31 PM

At 16:18 -0700 2004-04-23, Peter Constable wrote: But let me reiterate from my correction to Philippe: even the replacement of RFC 3066 is a specification for *language* identification, not *locale* identification. And it is to supersede RFC 3066, with a new edition. That's different from replacing.

Furthermore, in the IETF document architecture, the only way to amend an RFC is by a superseding RFC. RFCs are superseded all the time. /|/|ike
Re: Common Locale Data Repository Project
Mike Ayers scripsit: Furthermore, in the IETF document architecture, the only way to amend an RFC is by a superseding RFC. RFCs are superseded all the time.

Almost. It's also possible for an RFC to update older RFCs without superseding them completely. For example, RFC 2396 (URI syntax) updates RFC 1738 (URL syntax), leaving the parts specific to certain URL schemes still in effect.

-- John Cowan www.reutershealth.com www.ccil.org/~cowan [EMAIL PROTECTED] Lope de Vega: It wonders me I can speak at all. Some caitiff rogue did rudely yerk me on the knob, wherefrom my wits still wander. An Englishman: Ay, a filchman to the nab betimes 'll leave a man crank for a spell. --Harry Turtledove, Ruled Britannia
Re: Common Locale Data Repository Project
From: Peter Constable [EMAIL PROTECTED]

I think that the CLDR database is extremely important for software implementations, because it avoids some caveats that come from other unstable standards such as ISO 3166 and ISO 639.

ISO 639 is not unstable. It is an open code set that is being added to over time, but I don't think that should be referred to as unstable -- that term suggests other things.

By unstable I mean in fact ambiguous, even for the correct designation of languages with a code that can be recognized. Even the proposal to supersede ISO 3066 with new tags has its caveats: which code must an application use when it already defines multiple ones (is this number bounded?) to refer to the same language? The problem arises in software when a user specifies a preferred language in his locale with a code that will not be understood by an application that just understands another one. This becomes worse when one piece of software requires one code in the user's locale to support a language and another requires a different code in the user's locale to support the same language.

Look, for example, at the case of Norwegian: is it no, nn or nb, or no-nynorsk or no-bokmal? Even with the algorithm based on common prefixes, you won't be able to match them all. So there's a need to specify an algorithm that allows aliases to be resolved. With multi-subtag language identifiers the resolution order becomes unpredictable if one supports aliases for one subtag and not the other.

What is already unstable in ISO 639 is the deprecation of iw and the addition of he, same thing for in and id or for yi and ji. Don't you call that instability? OK, these codes are deprecated, not reassigned. But they still cause problems.

Think more recently about the new codification for Serbo-Croatian, and the split of sh, with no definition except that it is country based (Serbian, Croatian, Bosnian, Montenegrin), assuming that one country uses only one language when in fact there are several in the same one, that are shared by multiple countries, and differ mostly by their script... Also if ISO 3166 is unstable (CS: is that the former Czechoslovakia or the newer Serbia-Montenegro?), then it introduces instability too within ISO 3066 or its proposed replacement... for the identification of languages.

For now, the only workable solution to solve these issues is found in supplementary libraries in ICU which support locale aliases. (Yes, I use the term Locale because this is the term that Java gives to this identification, based on a language code consisting of a single subtag, a country/territory code and a variant code with possibly multiple subtags, and no reference to the needed script code; I wonder how the newer RFC 3066 model will fit here.)
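The alias resolution asked for here amounts to a lookup table consulted before any matching is attempted. A minimal sketch, assuming a hypothetical class name and a table limited to the deprecated pairs discussed in this thread (this is not ICU's actual alias mechanism):

```java
import java.util.Map;

public class TagAliases {
    // Deprecated tag -> preferred tag, covering the pairs discussed above.
    private static final Map<String, String> ALIASES = Map.of(
        "iw", "he",
        "in", "id",
        "ji", "yi",
        "no-bokmal", "nb",
        "no-nynorsk", "nn");

    // Resolve a possibly-deprecated tag to its preferred form;
    // unknown tags pass through unchanged (lowercased).
    static String resolve(String tag) {
        String t = tag.toLowerCase();
        return ALIASES.getOrDefault(t, t);
    }

    public static void main(String[] args) {
        System.out.println(resolve("iw"));         // he
        System.out.println(resolve("no-bokmal"));  // nb
        System.out.println(resolve("fr"));         // fr
    }
}
```

Running the table first makes the subsequent prefix matching deterministic, since every spelling of a language has already collapsed to one canonical tag.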
Re: Common Locale Data Repository Project
Philippe Verdy scripsit: By unstable I mean in fact ambiguous, even for the correct designation of languages with a code that can be recognized. Even the proposal to supersede ISO 3066 with new tags has its caveats: which code must an application use when it already defines multiple ones (is this number bounded?) to refer to the same language.

RFC 3066 always requires that the 2-letter code be used in place of either 3-letter code if it exists. In all other cases, there is only one 3-letter code, and it is used. Some codes are vague, in the sense that they do not fully specify which language is in use. For that reason, ISO 639-3 is being defined as an upward compatible extension of ISO 639-2.

Look, for example, at the case of Norwegian: is it no, nn or nb, or no-nynorsk or no-bokmal?

There are two issues here. The first is that no-nynorsk and no-bokmal are now deprecated codes: that is, no application should require them, every application that accepts nn or nb should accept them, and no application should produce them. Older versions will be less forgiving and should be upgraded. The second is that no is unique, or nearly so: it designates nn and nb jointly. Now everyone who can read one can read the other, so Norwegian applications should accept any of no, nb, nn in data. But no is meaningless to a spell-checker, which should require either nb or nn.

What is already unstable in ISO 639 is the deprecation of iw and the addition of he, same thing for in and id or for yi and ji. Don't you call that instability? OK, these codes are deprecated, not reassigned. But they still cause problems.

Not really. Again, all applications should generate he and accept both iw and he.

Also if ISO 3166 is unstable (CS: is that the former Czechoslovakia or the newer Serbia-Montenegro?), then it introduces instability too within ISO 3066 or its proposed replacement... for the identification of languages.

RFC 3066bis specifies that CS will always mean Czechoslovakia, and the highly stable 3-digit code will be used for Serbia-Montenegro.

For now, the only workable solution to solve these issues is found in supplementary libraries in ICU which support locale aliases. (Yes, I use the term Locale because this is the term that Java gives to this identification, based on a language code consisting of a single subtag, a country/territory code and a variant code with possibly multiple subtags, and no reference to the needed script code; I wonder how the newer RFC 3066 model will fit here.)

Language specifiers are conceptually different from locale specifiers. One might specify a locale of da_us to mean Danish language, U.S. measurement systems, but the language da-us would be the U.S. dialect of Danish, a very different thing.

-- John Cowan www.ccil.org/~cowan www.reutershealth.com [EMAIL PROTECTED] In might the Feanorians / that swore the unforgotten oath brought war into Arvernien / with burning and with broken troth. and Elwing from her fastness dim / then cast her in the waters wide, but like a mew was swiftly borne, / uplifted o'er the roaring tide. --the Earendillinwe
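The 2-letter-preference rule stated above is mechanical enough to sketch. The mapping table below is a tiny illustrative excerpt (the full ISO 639-2 to 639-1 correspondence has far more entries); the class and method names are hypothetical:

```java
import java.util.Map;

public class TwoLetterPreference {
    // Excerpt of the 3-letter -> 2-letter correspondence; both the
    // terminological (T) and bibliographic (B) 3-letter codes collapse
    // to the single 2-letter code, as RFC 3066 requires.
    private static final Map<String, String> TO_ALPHA2 = Map.of(
        "fra", "fr", "fre", "fr",
        "deu", "de", "ger", "de",
        "heb", "he", "eng", "en");

    // Return the code an RFC 3066 tag should use for this language.
    static String preferred(String code) {
        String c = code.toLowerCase();
        return TO_ALPHA2.getOrDefault(c, c);
    }

    public static void main(String[] args) {
        System.out.println(preferred("fre")); // fr
        System.out.println(preferred("deu")); // de
        System.out.println(preferred("nds")); // nds (no 2-letter code exists)
    }
}
```

Codes with no 2-letter equivalent, such as nds, pass through unchanged, matching the "in all other cases, there is only one 3-letter code" rule.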
Re: Common Locale Data Repository Project
You can reiterate it all you want; in practice, 3066 tags are used as locale identifiers. And for a narrow sense of locales, that is perfectly reasonable. For a broad sense of locale, including timezone, user's currency, religious preference, etc., it clearly would not be reasonable, and I would agree with you for that. ISO 639 is not unstable. It is an open code set that is being added to over time, but I don't think that should be referred to as unstable -- that term suggests other things. ISO 3066 has *demonstrated* instability, because they remove codes, then reuse those codes for different entities. It'd be like our removing a character, then later putting a different character in that spot*. ISO 639 has not yet *demonstrated* instability. They have removed codes, but since they haven't reused them, one can handle that with an alias table, keeping all the old codes usable. However, there is no policy documented *anywhere* that says they won't. As long as they don't have that, and given the demonstrated instability in ISO 3066, the standard simply cannot be trusted to be stable in the future. * Yes, I know we did that for Korean, when we were first getting started. But we learned from that, and put into place firm policies against that ever happening in the future. We have no such assurances from ISO, for some pretty key components: language codes, country codes, currency codes, or script codes. Mark __ http://www.macchiato.com - Original Message - From: Peter Constable [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED]; Philippe Verdy [EMAIL PROTECTED]; Unicode List [EMAIL PROTECTED] Sent: Fri, 2004 Apr 23 16:18 Subject: RE: Common Locale Data Repository Project From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Davis You are talking about Locale IDs. 
There is currently work underway on an RFC to replace 3066. But let me reiterate from my correction to Philippe: even the replacement of RFC 3066 is a specification for *language* identification, not *locale* identification.

Peter

Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
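The alias-table approach Davis describes above, for codes that were withdrawn but never reused, can be sketched as follows. This is a minimal illustration; the class and method names are invented for this example, though the three deprecated codes shown are real historical ISO 639 deprecations.

```java
import java.util.Map;

public class LanguageCodeAliases {
    // Deprecated ISO 639-1 codes mapped to their replacements. Because
    // ISO 639 has never reassigned a withdrawn code to a different
    // language, a simple lookup table keeps old data usable.
    static final Map<String, String> ALIASES = Map.of(
        "iw", "he",  // Hebrew
        "in", "id",  // Indonesian
        "ji", "yi"   // Yiddish
    );

    // Return the canonical code; codes without an alias pass through.
    static String canonicalize(String code) {
        return ALIASES.getOrDefault(code, code);
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("iw")); // he
        System.out.println(canonicalize("fr")); // fr
    }
}
```

The scheme breaks down exactly where Davis says it does: if a registry ever reassigns a withdrawn code to a new entity, the table can no longer tell old data from new, and no amount of aliasing recovers the original meaning.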
Re: Common Locale Data Repository Project
Mark Davis scripsit:

ISO 3066 has *demonstrated* instability, because they remove codes, then reuse those codes for different entities. It'd be like our removing a character, then later putting a different character in that spot*.

That's ISO 3166, of course, not RFC 3066.

--
John Cowan [EMAIL PROTECTED]
http://www.ccil.org/~cowan
http://www.reutershealth.com
"Eric Raymond is the Margaret Mead of the Open Source movement."
--Bruce Perens, some years ago
Re: Common Locale Data Repository Project
From: Rick McGowan [EMAIL PROTECTED]

The Unicode Consortium announced today that it will be hosting the Common Locale Data Repository project, providing key building blocks for software to support the world's languages. For more information and links to the project pages, please see: http://www.unicode.org/press/press_release-cldr.html

Is that a contribution of the Unicode Consortium to the OpenI18N.org project (formerly li18nux.org, maintained with most help from the FSF), or a decision to make the OpenI18N.org project more open by pushing it to a more visible standard? In that case, I'm surprised to see that the preliminary pages on Unicode.org's CLDR project define it as a UTS (Standard) when it is a revision of the previously published release 1.0 of LDML, plus the repository, which is still hosted in IBM's ICU project repository...

Some confusion will occur for now if the CLDR pages reference a UTS (standard) rather than a UTR, which it should still be until there's a final approval as a standard. (Don't forget the Microsoft vote here, as Microsoft is campaigning a lot against Linux, which was the base platform from which the OpenI18N.org project was born. Also, the only certified platform for OpenI18N.org is RedHat, a Linux platform...) Will Microsoft endorse this addition into the domain of Unicode.org? I hope so, if this can help improve interoperability of platforms in this domain. I also hope that IBM will continue its wonderful support for the CLDR collection of data for the repository, and that Microsoft and others will contribute too, to make this important repository a key element for the convergence of platforms. Maybe this collaborative and richer standard will lead to the final approval of the unfinished successor to RFC 3066 that developers and users have wanted for so long...

What will happen to the discussion lists on openi18n.org? Will it be easy to contribute locale data or to submit bug reports, as it was in the past?
I'm sure that the Unicode subcommittee that will take charge of the CLDR will need a new policy to accept new members who also bring their own technical solutions. At least I see a good point here if OpenI18N.org merges with Unicode's goals: Unicode now has a concrete application of its standard (for example, the CLDR will contain what has always been missing in Unicode: a clear definition of its usage with concrete languages and locales; so Unicode will not ignore the specific issues that come with some languages).
Re: Common Locale Data Repository Project
From: Philippe Verdy [EMAIL PROTECTED]

From: Rick McGowan [EMAIL PROTECTED]
The Unicode® Consortium announced today that it will be hosting the Common Locale Data Repository project, providing key building blocks for software to support the world's languages.

Is that a contribution of the Unicode Consortium to the OpenI18N.org project (formerly li18nux.org, maintained with most help from the FSF), or a decision to make the OpenI18N.org project more open by pushing it to a more visible standard? In that case, I'm surprised to see that the preliminary pages on Unicode.org's CLDR project define it as a UTS (Standard) when it is a revision of the previously published release 1.0 of LDML, plus the repository, which is still hosted in IBM's ICU project repository...

Given its pre-Unicode history, I'd say that it clearly fits within the realm of a UTS. As such, Microsoft or any other vendor is free to ignore or support it as much as they wish, as its impact upon Unicode per se is nil. For me, the interesting thing to see will be how it affects ECMAScript. For a long time, several of its functions have reserved, but not made use of, a locale argument. If this standard takes off, ECMAScript may finally have something to use in its next version, whatever that ends up being.

However, a bigger question emerges with the release of the draft version of UTS 35. What happened to TR 33 and TR 34? Indeed, what are they? Something must be at least tentatively planned for those numbers, but there isn't anything publicly available.
Re: Common Locale Data Repository Project
From: Philippe Verdy [EMAIL PROTECTED]

Is that a contribution of the Unicode Consortium to the OpenI18N.org project (formerly li18nux.org, maintained with most help from the FSF), or a decision to make the OpenI18N.org project more open by pushing it to a more visible standard?

More the latter, but slightly different. We believe it would be good for both the opensource community and the commercial IT industry that we transfer (at least a part of) the project to the Unicode Consortium, after hearing concerns about the difficulty some commercial companies have in joining the project due to the strong perception of OpenI18N.org as opensource/Linux advocates, even though the CLDR project is not specifically bound to Linux. We hope this transfer will attract further participation from a wider audience.

Regarding confusion, I have to say it is anticipated, since the project is still in transition (for example, OpenI18N.org has not finished the necessary procedure to finalize this, so OpenI18N.org does not have a press release statement ready yet; this announcement is a little too early). I guess it will all be sorted out as time goes by.

--
[EMAIL PROTECTED],OpenI18N.org,li18nux.org,unicode.org,sun.com}
Chair, OpenI18N.org/The Free Standards Group http://www.OpenI18N.org
Architect/Sr. Staff Engineer, Sun Microsystems, Inc, USA eFAX: 509-693-8356
Re: Common Locale Data Repository Project
However, a bigger question emerges with the release of the draft version of UTS 35. What happened to TR 33 and TR 34? Indeed, what are they? Something must be at least tentatively planned for those numbers, but there isn't anything available publicly at least. Working drafts of some material that may (and should) end up as UTR's eventually. UTR numbers are assigned sequentially, and not all documents progress with equal speed. When UTR's 32, 33, and 34 progress to the point where there is consensus that they are in good enough states to open them for general public comment as public drafts, they will, in due time, get posted along with the other drafts. As the CLDR documentation mentions, UTS #35 is being moved along particularly quickly, since it is effectively an inherited specification from another project. It is already quite mature. --Ken
RE: Common Locale Data Repository Project
due to the strong perception of OpenI18N.org as opensource/Linux advocates, even though CLDR project is not specifically bound to Linux.

It is hard to look at OpenI18N.org's spec and not get the impression that all of that group's projects are bound to some flavour of Unix. The Scope clauses for several sections are specifically expressed in terms of Unix-related implementations (e.g. the scope for rendering requirements is expressed as what is needed for X Window). And even if a section isn't scoped specifically in terms of a Unix-derived platform, it may specify requirements that are explicitly related to Unix implementations (e.g. that base libraries must support POSIX i18n environment variables). CLDR doesn't have to be tied to any particular platform -- after all, it's just a collection of data. But I don't think you can honestly say that OpenI18N isn't tied to a particular family of platforms. Or, at least, I can say that when I last looked at the OpenI18N site, it sure looked like it was tied to a particular family of platforms.

Peter Constable
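For readers unfamiliar with the POSIX i18n environment variables Constable mentions: they are resolved with a fixed precedence, where LC_ALL overrides the per-category variables (LC_MESSAGES, LC_TIME, and so on), which in turn override LANG. A minimal sketch of that lookup order follows; the class and method names are invented for this illustration, though the precedence rule itself is the one POSIX specifies.

```java
public class PosixLocaleEnv {
    // POSIX precedence for the messages category:
    // LC_ALL beats LC_MESSAGES, which beats LANG; the "C" (POSIX)
    // locale is the documented fallback when none is set.
    static String messagesLocale() {
        for (String var : new String[] {"LC_ALL", "LC_MESSAGES", "LANG"}) {
            String value = System.getenv(var);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "C";
    }

    public static void main(String[] args) {
        System.out.println(messagesLocale());
    }
}
```

This platform dependence is precisely Constable's point: the CLDR data itself carries no such assumptions, while the surrounding OpenI18N requirements are phrased in terms of one platform family's mechanisms.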