Re: TR35
On Friday, May 14, 2004 10:22 PM, Peter Constable wrote:

> It is simply inadequate analysis of usage scenarios to say "an
> order form contains formatted dates / numbers / currency that need to
> be interpreted, therefore this document has a locale".

Sorry, you lost me. I do not know what "usage scenarios" are. But if a "usage scenario" describes a workflow, if the workflow involves orders, and if the amounts can be written in ambiguous form, I would have thought that, _at some level of the modelling_, some notion of locale might be present; and then that a realisation (I hope my use of specification vocabulary is right) might have a "locale id" property attached to the "order form" document. This was the scheme I had in mind. Of course, it follows that "this document has a locale" is a shorthand.

Nevertheless, I did not deny your analysis. Rather, I pointed out that, in my view, it would be wrong to think that "no document has a locale," which is quite a different thing. In case it was not clear before, I agree that in most cases they do not.

> But if the record is *not* in a
> neutral representation, then there are several other questions that
> need to be considered regarding how the string was generated, and how
> the receiver knows what was assumed by the authoring process.

Regarding your example: I can very well envision an application that will tag the , and also the XML document, with some externally defined locale id (and I do not mean language here). And I have already seen a couple of applications doing similar things... Whether this is sensible or not is another debate entirely: I just point out that it could be done.

>> And these files do
>> include or refer to locale ids and language ids, sometimes named one
>> for the other BTW.
>
> Just because someone called the two the same doesn't mean that the
> notions are not distinct, and that it wouldn't be helpful for us to
> understand that distinction.

Again, I am lost: I did not say they are merged, just that some use the name of the former to designate the latter. Now, I can accept that they may in fact be the same thing, since I am not an expert in this field; it is just that, to me, they appear different for the moment (and the more I read in this thread, the more I stick to my initial idea that they are different).

>> And what you see as "internal to
>> your process" is, to me, actually usable, external data.
>
> If you consider it external, then it is because you expect others to
> use what you put there, or you are using what others put there -- and
> so it is indeed external.

Yes, exactly.

>> See my example,
>> imagining it is a text processing file: deeply inside, I have found
>> the locale id of the sender. Which was a hint, not the real data I
>> would have liked.
>
> If the document includes an ID that indicates the locale mode that was
> set in the author's software when the author created that file, and
> you wish to use that as a hint to set a processing mode on your end,
> I have no problem with that; I have never said anything against that.

This is what I missed. I claimed that this ID was considered (by me) as a locale tag on the document (see my full reasoning above). I never claimed it was intended that way at the beginning, or in other processes, including the ones that follow the step of recognizing the intended meaning. But in that particular process, it looked very much like a locale id tagging a document to me.

> Rather, I'm saying that the conceptual model we have inherited from
> the past is inadequate, and that we need to adopt a more
> carefully-conceived model around which to design i18n platforms for
> the future.

This is starting to be interesting: we obviously will have quite a bit of "backward compatibility" (in the minds of the people) to deal with, won't we?

> And it starts by understanding that while they may be
> related, "locale" and "language" are conceptually two different
> things.

I never thought otherwise, did I?

OTOH, I acknowledged your terse description of the question as being a very good thing (« ce qui se conçoit bien s'énonce clairement, et les mots pour le dire viennent aisément » --what is well understood is stated clearly, and the words to say it come easily-- sorry M. de Boileau for the bad English translation)

Antoine
RE: TR35
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Antoine Leca

> I wrote about an electronic document, sorry, file, I might receive
> containing an order form, and you said documents did not encompass order
> forms, as I read it.

An order form is not a case we can evaluate without actually analyzing in more detail exactly how information is being exchanged, whether public protocols are in use, and how the processes on each end are to work. It is simply inadequate analysis of usage scenarios to say "an order form contains formatted dates / numbers / currency that need to be interpreted, therefore this document has a locale".

For instance, if the order information is exchanged using some XML schema involving, say Buckwheat flour (bulk) 123,456 there's a very good chance that the order application was designed so that the number inside the element was in a locale-independent representation. In that case, there is no reason whatsoever to say anything more about this record than that English is used. (Actually, it would be most appropriate to simply say that the name element is in English: .) But if the record is *not* in a neutral representation, then there are several other questions that need to be considered regarding how the string was generated, and how the receiver knows what was assumed by the authoring process.

The point is, we need to do analysis at that kind of level, not in sweeping terms like "order forms are documents that require locales".

> And these files do
> include or refer to locale ids and language ids, sometimes named one for the
> other BTW.

Just because someone called the two the same doesn't mean that the notions are not distinct, and that it wouldn't be helpful for us to understand that distinction.

> And what you see as "internal to
> your process" is, to me, actually usable, external data.

If you consider it external, then it is because you expect others to use what you put there, or you are using what others put there -- and so it is indeed external.

> See my example,
> imagining it is a text processing file: deeply inside, I have found the
> locale id of the sender. Which was a hint, not the real data I would have
> liked.

If the document includes an ID that indicates the locale mode that was set in the author's software when the author created that file, and you wish to use that as a hint to set a processing mode on your end, I have no problem with that; I have never said anything against that.

> To be able to get my job done, I sometimes (often, in fact) have to use
> different software... Now, one can
> just dismiss me by saying that I am not supposed to look at that, that the users
> should restrict themselves to the next release of XML. This is equivalent to
> saying that users are not invited to the discussions about the tools they will use...

I have no qualms with what you may need to do now to get your job done. When all we have is a hammer, everything starts to look like a nail, and we need to wring as much benefit within that constraint as we can. All I'm saying is that we should not be content to stay there. I have no intent of telling anyone they cannot do what they are doing. Rather, I'm saying that the conceptual model we have inherited from the past is inadequate, and that we need to adopt a more carefully-conceived model around which to design i18n platforms for the future. And it starts by understanding that while they may be related, "locale" and "language" are conceptually two different things.

As for participating in the discussion, I am not trying to keep anyone out.

> a very common behaviour of the computer people here in Europe, and a
> behaviour I am very angry about (hence the sarcasm, for which I
> apologize).

I was not aware of that background. Apology most kindly accepted.

Peter

Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
Re: TR35
On Friday, May 14, 2004 3:30 PM, Peter Constable wrote:

>> To me, documents encompassed any style of writing (and the notion
>> was broader still). For example, I believed that writing was invented 6
>> millennia ago precisely for accounting and trading, *not* with the
>> Hammurabi codex or the Egyptian hymns. But it appears I was wrong.
>
> If you get a clay tablet with some type of inventory on it and encode
> it digitally, presumably there are names of things, and numbers,
> perhaps also dates. Let's suppose you encode the text into a digital
> document. You assign a metadata tag indicating that the "language"
> (linguistic variety and writing system) is such-and-such. How would
> it be useful to also assign metadata to indicate what the number
> format is?

I do not know, I was not thinking about that. I wrote about an electronic document, sorry, file, I might receive containing an order form, and you said documents did not encompass order forms, as I read it. So my example is void.

My error was that I was considering "accounting spreadsheet or an order-entry record" as documents, while you do not. And my mistake was based, I think, on a faulty interpretation of the history of writing, as I wrote. Now, the actual content of the clay tablets is irrelevant (I think).

>>> If something is going on internal to proprietary software, then
>>> there are no rules.
>>
>> I also missed that the difference between language ids and locale
>> ids only mattered when used in public documents in published
>> standardized formats, and that private formats or any out-of-band
>> tags, persistent or not, are irrelevant here.
>
> If something is internal to your process, who cares but you what is
> happening?

I am basically a user. My "processes" are procedures; the objects they deal with are, among others, electronic documents, sorry, files, a number of them in proprietary formats that I can (partially) decode. And these files do include or refer to locale ids and language ids, sometimes named one for the other BTW.

My process is very different from yours. And what you see as "internal to your process" is, to me, actually usable, external data. See my example, imagining it is a text processing file: deeply inside, I have found the locale id of the sender. Which was a hint, not the real data I would have liked.

To be able to get my job done, I sometimes (often, in fact) have to use different software. I understood CLDR as being a way to establish a common ground for these programs to interoperate, the same way the ONLY purpose of Unicode is to allow various programs to interoperate. And it happens that these data (locale and language ids), hidden inside the proprietary formats of the files, are the ones that will select the data to be used. Since I understand that, I feel committed to participating in the debate.

Now, one can just dismiss me by saying that I am not supposed to look at that, that the users should restrict themselves to the next release of XML. This is equivalent to saying that users are not invited to the discussions about the tools they will use, a very common behaviour of the computer people here in Europe, and a behaviour I am very angry about (hence the sarcasm, for which I apologize).

Have a nice weekend, folks (I write that because I noticed Saturday is a raging day for this list ;-) while I am disconnected from the Internet, and much quieter this way. There is no sarcasm, it's sincere.)

Antoine
RE: TR35
> I am sorry I had misunderstood the whole discussion then.

Your sarcasm isn't productive.

> To me, documents encompassed any style of writing (and the notion was
> broader still). For example, I believed that writing was invented 6 millennia
> ago precisely for accounting and trading, *not* with the Hammurabi codex or
> the Egyptian hymns. But it appears I was wrong.

If you get a clay tablet with some type of inventory on it and encode it digitally, presumably there are names of things, and numbers, perhaps also dates. Let's suppose you encode the text into a digital document. You assign a metadata tag indicating that the "language" (linguistic variety and writing system) is such-and-such. How would it be useful to also assign metadata to indicate what the number format is?

> > If something is going on internal to proprietary software, then there
> > are no rules.
>
> I also missed that the difference between language ids and locale ids only
> mattered when used in public documents in published standardized formats,
> and that private formats or any out-of-band tags, persistent or not, are
> irrelevant here.

If something is internal to your process, who cares but you what is happening? You could use 0x0041 to mean "B" and 0x0042 to mean "A"; that's your business. You can still claim conformance to Unicode as long as you do not emit that publicly, or apply those interpretations to characters you receive from another source.

Same here. The example was a software process, and inside that process you could be using "en" to mean "mm/dd/yy" date formatting, and if it's only going on internally, then that's your business.

> So please ignore my points.
> Of course when we consider only the legal texts where all months shall be
> in full letters, all quantities spelled twice, one with numbers and the
> other with letters...

I can only say this quite misconstrues anything I have said.

Peter Constable
Re: TR35
On Thursday, May 13th, 2004 16:40, Peter Constable wrote:

> Only that I don't think it's appropriate in general to tag
> documents (by which I don't mean an accounting spreadsheet or an
> order-entry record) for things like number formatting, and so such
> info should not be included in attributes like xml:lang.

I am sorry I had misunderstood the whole discussion then.

To me, documents encompassed any style of writing (and the notion was broader still). For example, I believed that writing was invented 6 millennia ago precisely for accounting and trading, *not* with the Hammurabi codex or the Egyptian hymns. But it appears I was wrong.

> If something is going on internal to proprietary software, then there
> are no rules.

I also missed that the difference between language ids and locale ids only mattered when used in public documents in published standardized formats, and that private formats or any out-of-band tags, persistent or not, are irrelevant here.

So please ignore my points.

Of course when we consider only the legal texts where all months shall be in full letters, all quantities spelled twice, one with numbers and the other with letters, and the timezone rules explicitly deferred to some authority, you are very right. And then the example from Mark is just garbage, as many people would see it (replace "garbage" with "unreadable" if you are not happy with that word); so it is not a "document" any more, and this would be discarded as well.

So I beg your pardon for having abused your time.

Antoine
Re: TR35
On Thu, May 13, 2004 at 05:16:49PM -0700, Mike Ayers wrote:

> The only correct English way I know to write dates is "March 20, 2003",

No. Try "20 March 2003", if you want English (spoken as "the 20th of March 2003"). If you want to add superscript "th" after the "20", or a comma after the month, feel free. The language you're speaking of is "American", which is a distinct, non-normative, dialect. :-)

> which I very rarely see. People from lots of different countries would
> recognize "3/20/03". Therefore we have multiple ways to write dates for

This is malformed, even if recognized. And of course "01/02/03" is totally ambiguous, having at least three different "normal" readings of the six available.

As expressed on forms, and other official documents, dates in my country always have day before month before year. This is true whether the month is expressed as a number or as a name (possibly abbreviated).

> most languages, and multiple languages for most ways to write dates. I
> think Peter Constable is on the right track here.

--
Christopher Vance
RE: TR35
> -----Original Message-----
> From: Addison Phillips [wM] [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 13, 2004 10:16 AM

[snip]

> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] Behalf Of Peter Constable
> > Sent: 2004年5月13日 7:40

Just noticed this. So, I think we all know that Addison's "language" is US English, and it seems from what Mark says that that was enough for his system to determine how to format the date and time, and enough for my system to determine how to interpret the date/time string his system generated.

(Obviously not!)

Peter

Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
RE: TR35
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of Peter Constable
> Sent: Thursday, May 13, 2004 4:01 PM

> > You speak as if date or number formats had nothing to do with
> > language. I very much disagree. If I have a message that says: "The
> > date of the last version of this document was 2003年3月20日", nobody in
> > their right mind would say that that is correct English.
>
> I never said they would. The correct analysis of that content
> is that it has two runs that are in different languages. (So,
> AFAICT your example does not prove anything.)

Actually, it can be considered as a single language, Japanese, if you accept romaji, which seems to be increasingly difficult to deny. However, I think this is irrelevant, as I fail to see that "20Mar03" (as I write 'em) or "3/20/03" (more common) qualify as "correct English", either. The only correct English way I know to write dates is "March 20, 2003", which I very rarely see. People from lots of different countries would recognize "3/20/03". Therefore we have multiple ways to write dates for most languages, and multiple languages for most ways to write dates. I think Peter Constable is on the right track here.

/|/|ike
RE: TR35
At 11:21 AM 5/13/2004, Francois Yergeau wrote:

> Peter Constable wrote:
> > A "language" is an attribute of content, and a "language" ID
> > is used for
> > declaration of that attribute.
> >
> > A "locale" is an operational mode of software processes, and
> > a "locale"
> > ID is used in APIs to set or determine that mode.
>
> Oversimplified, I'm afraid. Consider machine translation software or
> computer-aided translation tools (e.g. translation memories). In these:
>
> A "language" is an operational mode of software processes, and
> a "language" ID is used in APIs to set or determine that mode.

I tend to support Peter's interpretation (see his rejoinder).

Your examples both have obvious aspects of content. The translation memory may not be in any particular 'mode', beyond retrieving the data whose attribute is defined by the language tag of interest.

This is very different from 'locale', which really does work like a mode, affecting many types of operations of an application.

I think what you are after is the case where a set of rules (e.g. spelling rules) is identified by language. However, there seems to me still a difference, since applying a spell checker etc. requires data that are in the designated language, whereas for locale-based formatting, the raw data is usually language independent.

A./
RE: TR35
> You speak as if date or number formats had nothing to do with language. I
> very much disagree. If I have a message that says: "The date of the last
> version of this document was 2003年3月20日", nobody in their right mind would
> say that that is correct English.

I never said they would. The correct analysis of that content is that it has two runs that are in different languages. (So, AFAICT your example does not prove anything.)

> The core of what anyone means by locale is the language -- and that means,
> in our context, written language, thus including script (Cryl vs Latn) and
> variants (such as US vs UK spelling).

I have been putting "language" in quotation marks because the category types involved include writing system and orthography -- you've heard my presentation on that, so you know that I agree with you on that particular point.

As for "language" being the core of what anyone means by locale, I have most certainly said that "language" is one of the defining components of a locale. There may even be situations (translation software being an example) in which the processing mode does not care about anything else. But in general, locales -- software processing modes tailored for cultural user preferences -- *do* involve other non-linguistic components. Even in an example like translation software where such non-linguistic components are not needed, the infrastructure for managing the processing mode is working in terms of parameter bundles that *do* include non-linguistic components. And distinctions for such non-linguistic components are not in any situation I can think of useful things to declare regarding linguistic documents.

> The choice of language affects most of what people
> traditionally associate with software globalization, including date, time,
> number, currency, formatting & parsing; segmentation (words, lines);
> collation and searching; resource bundle choice for translated text &
> appropriate icons, etc.

C'mon, Mark. Certainly a choice of language affects how something like a date is displayed, but it is not the only factor. If I tell you that my language is English, even English with US spelling, that does *not* tell you how I want my numbers, dates, times, etc. formatted. It may give you a hint, and that hint may even lead you to do what I want; but it also might not. (IIRC, you yourself prefer to use a date format that is *not* what most systems would guess at from being told that your language preference is US English.) Therefore it is plainly *not* the case that "language" is all that anybody means by locale. Thus, the premise of your statement

> So if that is all of what someone means by locale, then there is little
> point in distinguishing between "locale IDs" and "language IDs".

is not established, and thus the implication is not established.

You are making broad, general comments without considering carefully enough how things are really used. To repeat something I said earlier, it would not be a good idea to design a transaction-processing system that makes assumptions about how to interpret formatted number or currency strings from a language preference, or even from being told what locale was set on the originating system; I need to know exactly what determined the formatting of the string I received. *That* is an example of the level of discussion of scenarios that needs to happen before any meaningful statements about what a "language" or "locale" ID is and how it should be used. It simply is not good enough to say "people traditionally associate [language] with ... date [etc.]". You are trying to justify wrong (IMO) conclusions using inadequate analysis.

Locales in general *do* involve things beyond "language", and it is wrong to put declarations specifically for such non-linguistic things into an attribute like xml:lang, and therefore (for instance) entirely unhelpful to refer to RFC3066 tags as locale tags, as though there were no difference.

I think 20 years of practice in software design have gotten many people stuck in a rut, but the fact that people have thought in a given way for twenty years doesn't make it right or desirable.

Peter Constable
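Editorial illustration of the point above that a language preference alone does not pin down a date format. This is a minimal sketch, assuming a modern Java runtime with the `java.time` API; the class name `SameLanguageDifferentFormat` and the helper `shortDate` are hypothetical names chosen for this note and appear nowhere in the thread. It formats one date under several locales that all share the language code "en". The exact strings depend on the runtime's locale data, so none are shown here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

class SameLanguageDifferentFormat {
    /** Formats a date in the runtime's "short" style for the given locale. */
    static String shortDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)
                .withLocale(locale)
                .format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2003, 3, 20);
        // All three locales report the same language, "en", yet the
        // conventions attached to each region need not agree.
        for (Locale locale : new Locale[] {Locale.US, Locale.UK, Locale.CANADA}) {
            System.out.println(locale.getLanguage() + " / " + locale.toLanguageTag()
                    + " -> " + shortDate(date, locale));
        }
    }
}
```

Even this only shows regional defaults. An individual user's preference can differ from the default for their region, which is the stronger claim made in the message.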
Re: TR35
From: "Peter Constable" <[EMAIL PROTECTED]>

> All I have said is that the notions of "locale" and "language" are
> distinct, that in general non-linguistic locale parameters such as
> number format are not appropriate things to declare about documents, and
> so we should not design systems or protocols that assume that locale
> tags can be inserted in document metadata attributes where a language
> tag is specified. And that it's not helpful in getting people to
> understand what is or isn't good to do for someone providing some degree
> of leadership in the area to use the terms "language" and "locale"
> interchangeably.

A locale for me goes MUCH farther than the simple selection of a few text-related settings. In fact, any parameter that a user may wish to customize to fit his needs or expectations about what a piece of software will do can be part of the general concept of "Locale". MacOS has long had a good standard term for it: "Preferences" (rather than the ambiguous term "Options" found too often in Windows).

Well, Windows has a very broad concept of Locales: see everything that can be set in the HKEY_CURRENT_USER registry hive (and also, within some limits, a few settings in HKEY_LOCAL_MACHINE, although that is personalized only for all users of the same local system)... This goes much further than what one would define in a few POSIX environment variables.

Windows has long shown that this information is interchangeable, and is so valuable that there are hackers and merchants promoting adware that wants to steal that precious information: a complete Locale contains many things that are part of the user's privacy.

Defining standard "Locale IDs" will be too difficult (in fact impossible given the unbounded range of orthogonal settings). If standardization must occur, it is for some important settings that are part of a "Locale". So I think that what needs to be registered is those settings:

- Language-IDs (as set in POSIX's "LANG" or "LC_ALL" environment variables).
- timezones (as set in POSIX's "TZ" environment variable)

and a few others if they can be thought of general interest, interchangeable, and mostly orthogonal. Let's not try making all fit in one standard ID, as I think it will never work.

However, the impossibility of defining standard "Locale IDs" does not forbid defining a standard syntax to serialize lists of settings that are part of a Locale, and defining standard mechanisms to match and resolve them.
Re: TR35
From: "Mark Davis" <[EMAIL PROTECTED]>

> So if one's locale definition includes something like: language=sh-Cryl-YU plus
> currency=EUR plus timezone=GMT, then that is clearly something far different
> than just language.

Maybe you meant language=sh-Cyrl-YU, which however was never used and will never be used like this, since "sh" was deprecated long before script codes were defined for Cyrillic. So it would probably be:

    LANG=sh-YU

or simply

    LANG=sh

for the legacy language written first in Cyrillic. Today you would set

    LANG=sr-SR

or just

    LANG=sr

mostly for Cyrillic, even though Latin is also used today (if you need the precision, then LANG=sr-Cyrl-SR or simply LANG=sr-Cyrl).

There's a way to create such compound locale ids orthogonal to language settings by using an attributed syntax:

    LANG="sr-Cyrl-sr;TZ=GMT;LC_CURRENCY=EUR"

It could be a good idea to keep POSIX names for these extra orthogonal attributes...

The above line would set a complete locale-ID, starting with a required language ID followed by optional attributes for other settings.

The only problem is that many programs and libraries currently have no support for the attributed syntax when specifying a resource search path (for example when locating the appropriate resource to use with the correct currency or timezone). However, it can be emulated on top of Locale resource class loaders (by considering that attributes are handled as overrides for named resources that would be searched within the default language-ID assigned to the locale-ID).

In Java, one would create such a locale like this:

    Locale loc = new Locale("sr-Cyrl-sr");
    loc.put("TZ", "GMT");
    loc.put("LC_CURRENCY", "EUR");

but the following would not work for now, although it would be the correct way to build a locale instance with a complete locale-ID:

    Locale loc = new Locale("sr-Cyrl-sr;TZ=GMT;LC_CURRENCY=EUR");

So all we can create is this object:

    Locale loc = new Locale("sr-Cyrl-sr");

which will work but will not correctly reference the other settings. This would require some rework to make either the Locale class implement the Properties interface, or to supplement the ResourceBundle class to allow setting such overrides.

So it seems that the "Locale" class in Java does not correctly cover everything that can be defined and selected in a Locale. A more meaningful name for this class would have been "LanguageID".
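Editorial illustration of the "emulated on top of Locale" idea in the message above. This is a minimal sketch: the `AttributedLocale` class and its `parse` and `setting` methods are hypothetical names invented for this note, not part of any Java library, and the `lang;KEY=VALUE;...` syntax is the one proposed in the message, not a standard. It relies on `Locale.forLanguageTag`, which exists in current Java releases. The example input uses the region code RS rather than the SR written in the message; that substitution is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class AttributedLocale {
    final Locale language;               // the required language-ID part
    final Map<String, String> settings;  // optional orthogonal settings, e.g. TZ

    private AttributedLocale(Locale language, Map<String, String> settings) {
        this.language = language;
        this.settings = Collections.unmodifiableMap(settings);
    }

    /** Parses "lang-tag;KEY=VALUE;KEY=VALUE" (hypothetical syntax from the thread). */
    static AttributedLocale parse(String id) {
        String[] parts = id.split(";");
        Locale language = Locale.forLanguageTag(parts[0].trim());
        Map<String, String> settings = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected KEY=VALUE, got: " + parts[i]);
            }
            settings.put(parts[i].substring(0, eq).trim(),
                         parts[i].substring(eq + 1).trim());
        }
        return new AttributedLocale(language, settings);
    }

    /** Returns the override for a setting, or the fallback if none was given. */
    String setting(String key, String fallback) {
        return settings.getOrDefault(key, fallback);
    }

    public static void main(String[] args) {
        AttributedLocale loc = parse("sr-Cyrl-RS;TZ=GMT;LC_CURRENCY=EUR");
        System.out.println(loc.language.toLanguageTag() + " " + loc.settings);
    }
}
```

Keeping the language part as an ordinary `Locale` means existing resource lookup keeps working unchanged, while the extra settings sit beside it as overrides, which is roughly the layering the message describes.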
Re: TR35
You speak as if date or number formats had nothing to do with language. I very much disagree. If I have a message that says: "The date of the last version of this document was 2003年3月20日", nobody in their right mind would say that that is correct English. (More on that at the end of http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/language_code_issues.html, as I pointed to).

The core of what anyone means by locale is the language -- and that means, in our context, written language, thus including script (Cryl vs Latn) and variants (such as US vs UK spelling). The choice of language affects most of what people traditionally associate with software globalization, including date, time, number, currency, formatting & parsing; segmentation (words, lines); collation and searching; resource bundle choice for translated text & appropriate icons, etc.

So if that is all of what someone means by locale, then there is little point in distinguishing between "locale IDs" and "language IDs".

There are attributes that are clearly orthogonal to language, like choice of timezone or choice of currency (not the *formatting* of them, but the *choice*). So if one's locale definition includes something like: language=sh-Cryl-YU plus currency=EUR plus timezone=GMT, then that is clearly something far different than just language.

If that is what someone means by locale, then one must clearly distinguish between "locale IDs" and "language IDs". Syntactically, locale IDs may be an extension of language IDs, since they do form the core. Or one could use some completely different structure. In CLDR, for example, we use RFC 3066 for the language part (actually an extension, anticipating RFC 3066bis), but then use an extension mechanism for additional features that are not captured by language.

Mark
__
http://www.macchiato.com

----- Original Message -----
From: "Peter Constable" <[EMAIL PROTECTED]>
To: "Unicode Mailing List" <[EMAIL PROTECTED]>
Sent: Thu, 2004 May 13 11:58
Subject: RE: TR35

> > > Moreover, you would never label a document for a
> > > number format in order to determine how automated-formatting
> > > of numbers should be done on the receiving system.
> >
> > You would not label it to determine formatting on the receiving system, but
> > to determine interpretation (parsing) of formatted values in the received
> > data. You need to know what the convention is to interpret the number
> > 123.456 or the date 02/03/04.
>
> But as I pointed out earlier, you cannot know for certain how to
> interpret it unless you know how it was generated; and if it was entered
> manually by a human, you need to know what they were thinking. A locale
> ID cannot tell you that. A locale ID is useful only if the string that's
> received was generated automatically on the originating system (and you
> know that to be the case), but I'm guessing that most of the time when
> that actually happens, that string is going to be an isolated element
> within a data structure.
>
> It is the case that in a significant number of situations the language
> tag of content will include a region ID, and if I encounter a formatted
> number or date string in the content, I can use that to guess what the
> correct interpretation should be. But I'm not sure I'd want to build a
> system for processing business transactions on such assumptions.
>
> Peter
>
> Peter Constable
> Globalization Infrastructure and Font Technologies
> Microsoft Windows Division
RE: TR35
> > Moreover, you would never label a document for a > > number format in order to determine how automated-formatting > > of numbers should be done on the receiving system. > > You would not label it to determine formatting on the receiving system, but > to determine interpretation (parsing) of formatted values in the received > data. You need to know what the convention is to interpret the number > 123.456 or the date 02/03/04. But as I pointed out earlier, you cannot know for certain how to interpret it unless you know how it was generated; and if it was entered manually by a human, you need to know what they were thinking. A locale ID cannot tell you that. A locale ID is useful only if the string that's received was generated automatically on the originating system (and you know that to be the case), but I'm guessing that most of the time when that actually happens, that string is going to be an isolated element within a data structure. It is the case that in a significant number of situations the language tag of content will include a region ID, and if I encounter a formatted number or date string in the content, I can use that to guess what the correct interpretation should be. But I'm not sure I'd want to build a system for processing business transactions on such assumptions. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
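The ambiguity of "123.456" and "02/03/04" discussed in this exchange can be made concrete with a small sketch. The convention table below is hypothetical: its keys are illustrative labels rather than real locale IDs, and the rule that a two-digit year means 20xx is an assumption made only to keep the example short.

```python
from datetime import date

# Hypothetical conventions; a real system would take these from locale data.
CONVENTIONS = {
    "style_a": {"decimal": ".", "group": ",", "date_order": "MDY"},
    "style_b": {"decimal": ",", "group": ".", "date_order": "DMY"},
    "style_c": {"decimal": ".", "group": ",", "date_order": "YMD"},
}


def parse_number(text, conv):
    """Drop grouping separators, then normalize the decimal separator."""
    plain = text.replace(conv["group"], "").replace(conv["decimal"], ".")
    return float(plain)


def parse_date(text, conv):
    """Assign the three numeric fields according to the convention's order."""
    fields = dict(zip(conv["date_order"], (int(p) for p in text.split("/"))))
    return date(2000 + fields["Y"], fields["M"], fields["D"])  # assumes 20xx


for name, conv in CONVENTIONS.items():
    print(name, parse_number("123.456", conv), parse_date("02/03/04", conv))
```

The same two strings yield different values under each convention. Note that the sketch only shows what each convention would produce; as Peter argues, it cannot tell you which convention a human author actually had in mind.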
RE: TR35
> > A "language" is an attribute of content, and a "language" ID > > is used for > > declaration of that attribute. > > > > A "locale" is an operational mode of software processes, and > > a "locale" > > ID is used in APIs to set or determine that mode. > > Oversimplified, I'm afraid. Consider machine translation software or > computer-aided translation tools (e.g. translation memories). In these: > > A "language" is an operational mode of software processes, and > a "language" ID is used in APIs to set or determine that mode. The translation memory content has a "language" attribute, and it's appropriate to declare it using a "language" tag. Assuming the software is not dealing with things like number formats, the processing mode could be called a "language" mode or a "locale" mode. The software infrastructures provided in platforms and programming frameworks manage these modes using "locales", however, so I would say that these applications are using locales. Of course, a "language" tag in the translation memory can be used to set the processing mode ("locale") of the software. More often than not, though, I expect that what would be happening is that the "language" element of the locale is being determined, and then corresponding content is being retrieved from the translation memory. So, I disagree: I do not think it is oversimplified. What is too simple is the way that many people think and speak about it all. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
RE: TR35
Addison: > Interestingly, the W3C I18N WG published a new working draft... Great! I'll certainly be interested in reading it. (When I get a chance -- I still need to look at the 2nd draft of RFC3066bis; I know, you'd like that to be done yesterday.) > I think what's interesting is that our document illustrates some of the situations in > which you might wish to exchange locale information. And I think these illustrations > go more to prove Peter's point than not. I can feel a little bit vindicated, then :-) > Locale interchange is very important to > internationalized software > So, there are very valid reasons why applications need to transfer locale preferences. That, I have never questioned. > Certainly language tags carry or imply locale information in > certain situations. Although the concepts are related, it needs to be very clear just > how much information one can infer from a language tag... > Check out our group's document (and the forthcoming requirements document) and > see if you don't agree... but we should be wary of very broad global statements (both > "all language tags are also locale tags" and "language tags are never locale tags"). I've agreed that the two are related, and I don't contest that a language tag can be useful in making decisions about setting the locale mode of a software process. All I have said is that the notions of "locale" and "language" are distinct, that in general non-linguistic locale parameters such as number format are not appropriate things to declare about documents, and so we should not design systems or protocols that assume that locale tags can be inserted in document metadata attributes where a language tag is specified. And that it's not helpful in getting people to understand what is or isn't good to do for someone providing some degree of leadership in the area to use the terms "language" and "locale" interchangeably. 
Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
RE: TR35
Peter Constable wrote:
> Moreover, you would never label a document for a number format in order to determine how automated-formatting of numbers should be done on the receiving system.

You would not label it to determine formatting on the receiving system, but to determine interpretation (parsing) of formatted values in the received data. You need to know what the convention is to interpret the number 123.456 or the date 02/03/04.

-- François
RE: TR35
Peter Constable wrote:
> A "language" is an attribute of content, and a "language" ID is used for declaration of that attribute.
>
> A "locale" is an operational mode of software processes, and a "locale" ID is used in APIs to set or determine that mode.

Oversimplified, I'm afraid. Consider machine translation software or computer-aided translation tools (e.g. translation memories). In these:

A "language" is an operational mode of software processes, and a "language" ID is used in APIs to set or determine that mode.

-- François
RE: TR35
Interestingly, the W3C I18N WG published a new working draft of our Web services scenarios document just yesterday and some of that document grapples with this issue--when and how to exchange locale information and other "international preferences", as well as when and how to exchange language information. The document is here:

http://www.w3.org/TR/2004/WD-ws-i18n-scenarios-20040512/

I think what's interesting is that our document illustrates some of the situations in which you might wish to exchange locale information. And I think these illustrations go more to prove Peter's point than not.

Locale interchange is very important to internationalized software. Certainly language tags carry or imply locale information in certain situations. Although the concepts are related, it needs to be very clear just how much information one can infer from a language tag. For example, if you read XSLT (see: http://www.w3.org/TR/xslt#convert) and think that the "lang" attribute for converting numbers to strings is a locale, then you probably haven't read the text closely enough. It really means something more like language. (I think this particular example illustrates pretty nicely just how fuzzy the edges are.)

Antoine Leca's example is a good one (there is a similar one in the document above, donated by Mark Davis), and I think it shows how distributed software needs to have locale information in order to produce results that one could deem "correct" (if that text were generated by a message formatter, for example). But we shouldn't confuse language tagging of the result ("english") with software processing used to produce it (that sentence might have been rendered in the locale "de-DE").

So, there are very valid reasons why applications need to transfer locale preferences. Check out our group's document (and the forthcoming requirements document) and see if you don't agree... but we should be wary of very broad global statements (both "all language tags are also locale tags" and "language tags are never locale tags").

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture. It is not a feature.

> ----- Original Message -----
> From: [EMAIL PROTECTED] On Behalf Of Peter Constable
> Sent: 2004年5月13日 7:40
> To: Unicode Mailing List
> Subject: RE: TR35
RE: TR35
> Well, it is true that what I really search for is not *exactly* the > formatting locale, but rather another wider information, which would be the > mind setting of the writer. Precisely. The locale info only tells you how a number would have been formatted by the author's system, not what the author in fact did. When you receive a document, being told what the system would have done doesn't tell you anything useful. Not unless the document you receive was generated by the system -- and I'm guessing that in many such situations what's exchanged isn't a document per se but data structures in which numbers are in some pre-defined representation not formatted for the user. I'm not saying that there is never a need to exchange locale-setting info. Only that I don't think it's appropriate in general to tag documents (by which I don't mean an accounting spreadsheet or an order-entry record) for things like number formatting, and so such info should not be included in attributes like xml:lang. > I have another example, but I cannot expose it here publicly, it is related > to some proprietary software. If something is going on internal to proprietary software, then there are no rules. This is only about public interchange. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Re: TR35
On Wednesday, May 12, 2004 8:00 PM, Peter Constable va escriure: > It's not particularly useful to communicate that a document was > created when a locale with such-and-such number format was in effect, Sure? : Please send to us 100.000 units of your item 12010, available to our : warehouse by 6/7/04. We agree with the current tariff. Now it happens that I do NOT have such item 12010, only 12001 or 21001. And with the former, 10 may take sense, and 100 definitively does not. But with the latter, 100 takes sense, 10 is probably too much (and anyway I do not have that much merchandise available.) Units may be kg or t, in fact, so 3 decimals is adequate. What should I send? When? Of course, the guy is away from office, cellphone is down, etc. Well, it is true that what I really search for is not *exactly* the formatting locale, but rather another wider information, which would be the mind setting of the writer. But if the document happens to carry the locale it was formatted with, then I have an hint about its correct meaning. I agree beforehand that the locale id would not be a certain answer, just an hint. This might not be what you had in mind. I have another example, but I cannot expose it here publicly, it is related to some proprietary software. Let just say that the knowledge of the locale under which the document was created/formatted, was a preceptive knowledge to be able to render it correctly. > because that only meant how automated processes would format numbers, > the author can choose to do something else, and the document can even > use multiple formats: 1,234.56 as well as 1.234,56 (and it's not hard > to imagine how the two formats might have been automatically added to > the document at different times). Moreover, you would never label a > document for a number format in order to determine how > automated-formatting of numbers should be done on the receiving > system. I do not know about Mark, but at least I did. 
Now with EDIFACT there are agreements to avoid possible misunderstandings (so the tagging results useless, in fact it is already done at a superior level), but it was not always the case. And I did see, and even make, processes that deals with similarly tagged datas. For a nowadays example, think about an i15d standalone program that emits checks. I would expect such a program to be subsumed with a given locale (according to the nationality of the check to emit), then fed with the correct datas. Now, if the subsuming process is itself a generic one, it will itself be fed with datas labeled with the format to be used. Of course, we are very far away from Unicode here, even further from plain text such as Ken asks us to stick with. Clearly, the locale ids here are attributes, and even have almost nothing to do languages, so it might be inappropriate for CLDR as well (this is obscure to me at the moment.) That is just to say that while I agree with the fundamental of your distinction, I also believe that the fact that locales have been "reduced" (historically for the need of APIs) to locale ids, did then allow to use these to tag documents. And while one may argue this is "bad", there is also no way to stop people doing so... Antoine
Re: TR35
Well, I too don't have a lot of time ;-)

I see both language IDs and locale IDs as having usage beyond what you say. Both can be tagging content (e.g. this content was generated in accordance with locale x, or this content represents the collation sequence for locale/language y). Both can be used in queries (give me content, but restrict to what is appropriate for languages x and y; give me content, but restrict to what is appropriate for locales z, w).

I think we would both agree that timezones and currencies (but *not* their names) are orthogonal to language. Where we might differ on -- and where everyone seems to differ on -- is the meaning of the term "locale". Some interpret it very narrowly, essentially coextensive with language; some interpret it very broadly, essentially a bundle of user preferences / information. I fully agree that under the latter interpretation, it is very important to distinguish between a language ID and a locale ID.

Mark
http://www.macchiato.com

----- Original Message -----
From: "Peter Constable" <[EMAIL PROTECTED]>
To: "Unicode Mailing List" <[EMAIL PROTECTED]>
Sent: Wed, 2004 May 12 08:45
Subject: RE: TR35
RE: TR35
> I see both language IDs and locale IDs as having usage beyond what you say. Both > can be tagging content (e.g. this content was generated in accordance with > locale x, It's not particularly useful to communicate that a document was created when a locale with such-and-such number format was in effect, because that only meant how automated processes would format numbers, the author can choose to do something else, and the document can even use multiple formats: 1,234.56 as well as 1.234,56 (and it's not hard to imagine how the two formats might have been automatically added to the document at different times). Moreover, you would never label a document for a number format in order to determine how automated-formatting of numbers should be done on the receiving system. or this content represents the collation sequence for locale/language > y). Both can be used in queries (give me content, but restrict to what is > appropriate for languages x and y; give me content, but restrict to what is > appropriate for locales z, w). I don't contest that both can be used in queries. I do not think that it makes sense to declare locale attributes of content. > I think we would both agree that timezones and currencies (but *not* their > names) are orthogonal to language. Yes. > Where we might differ on -- and where > everyone seems to differ on -- is the meaning of the term "locale". Some > interpret it very narrowly, essentially coextensive with language; I don't know that I've seen such narrow interpretation, except from you. I've already communicated my concerns at you introducing this usage, since it perpetuates confusion between two things that really are distinct: one's an attribute of content, the other is a processing mode. > some > interpret it very broadly, essentially a bundle of user preferences / > information). I'd take it slightly further: locale is a processing mode, tailored in relation to a set of (mostly or entirely culture-related) user preferences. 
The tailoring is done using bundles of locale data. (I'd use three terms in discussing locales: "locale" is the processing mode, "locale data" is the collection of parameter values used to configure that mode, and "locale ID" is something passed in an API to set or determine that mode.) > I fully agree that under the latter interpretation, it is very > important to distinguish between a language ID and a locale ID. I am glad we at least agree on that :-) Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
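Peter's three terms can be lined up against a toy formatter. This is a sketch under stated assumptions: the IDs "x-one" and "x-two", the `LOCALE_DATA` table and the `Formatter` class are all hypothetical and stand in for whatever a real platform provides.

```python
# "Locale data": parameter values, looked up by "locale ID".
LOCALE_DATA = {
    "x-one": {"decimal": ".", "group": ","},
    "x-two": {"decimal": ",", "group": "."},
}


class Formatter:
    """The "locale" is the processing mode this object is currently in."""

    def __init__(self, locale_id):
        self.set_locale(locale_id)

    def set_locale(self, locale_id):
        # The "locale ID" is only the handle passed through the API.
        self.locale_id = locale_id
        self._data = LOCALE_DATA[locale_id]

    def format_number(self, value):
        whole, frac = f"{value:,.2f}".split(".")
        return whole.replace(",", self._data["group"]) + self._data["decimal"] + frac


fmt = Formatter("x-one")
document = [fmt.format_number(1234.56)]      # added while mode "x-one" was set
fmt.set_locale("x-two")
document.append(fmt.format_number(1234.56))  # added later under mode "x-two"
print(document)
```

The resulting document holds both 1,234.56 and 1.234,56, so no single locale ID attached to it afterwards would describe how every number in it was formatted.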
RE: TR35
>Here I disagree; this area is very fuzzy. See >http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/language_ code_issues.html, >especially the end. During which you observe that "both [language IDs and locale IDs] are somewhat nebulous concepts." (Of course, it's not the *IDs* that are nebulous, but the types of category that they represent: "language" and "locale".) I don't have time at the moment for a detailed discussion, (or to finish reading what's here and in TR35) but have been meaning to comment on this topic in relation to TR35, so will briefly comment here: these concepts will remain nebulous until people understand a fundamental distinction: A "language" is an attribute of content, and a "language" ID is used for declaration of that attribute. A "locale" is an operational mode of software processes, and a "locale" ID is used in APIs to set or determine that mode. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Re: TR35
> The issue of "French as spoken in Switzerland" versus "French as spoken in Canada" is totally unrelated to the issue of Swiss conventions versus Canadian conventions for sorting, date and time format, decimal separator, and so forth.

Here I disagree; this area is very fuzzy. See http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/language_code_issues.html, especially the end.

Mark
http://www.macchiato.com

----- Original Message -----
From: "Doug Ewell" <[EMAIL PROTECTED]>
To: "Unicode Mailing List" <[EMAIL PROTECTED]>
Cc: "Philippe Verdy" <[EMAIL PROTECTED]>
Sent: Tue, 2004 May 11 20:33
Subject: Re: TR35
Re: TR35
From: "Antoine Leca" <[EMAIL PROTECTED]>
> On Tuesday, May 11, 2004 6:59 PM, Philippe Verdy wrote:
> > This code is meant to designate the language variant as spoken in that area, but not for identifying a location.
>
> I am very sorry, but if in
> LANG=fr; LC_MONETARY=es_ES
> you consider that _ES above is a language variant of Spanish Castilian as different from Hispanoamerican, you are deeply wrong.

Don't infer things I did not say. I did not mean that. My sentence is valid within the context of the [LANG=] setting, not in the context of [LC_MONETARY=]. Within [LANG=], the country/territory specification is a language variant specifier, which may or may not work well to designate other localizable elements.

In fact, even if you used [LANG=es_ES], this may not mean only Castilian: there are other variants of Spanish in Spain (even if you exclude regional languages like Basque, Catalan, Occitan, and Galician, which also have their own variants independent of [LANG=es] Spanish).

[LANG=] in your example unambiguously specifies the French language (with no implied country/territory, and thus not Spanish) and is then used as the default for the other locale settings. [LC_MONETARY=] will never have the semantics of language or language-variant selection: it is really meant to designate the currency used in Spain, formatted according to the currency format used in Spain. (The [es] prefix has no real function here except that it selects the best script to use for digits, decimal separator and grouping.) For spelling out currency amounts, French terms would still be used according to [LANG], in reference to the Spanish currency (now the Euro, same as in France).

Should the [LC_MONETARY=] setting be left unspecified, the currency settings would inherit from the language setting in [LANG=], which does not specify the territory (so the currency would be left to some defaults, using the digits, dot and comma as used in French, most probably as in France here).

The POSIX settings are very language-centric, with [LANG=] as the root setting that serves as the default for the other specialized settings (the only exception being [TZ=] for the timezone, which can't be inferred correctly and easily from a language or even a territory).
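The fallback described here, together with the LC_ALL override that comes up elsewhere in the thread, follows the POSIX lookup order: LC_ALL first, then the specific LC_xxx variable, then LANG. A minimal sketch of that order follows; `effective_locale` is a hypothetical helper, not a libc function, and returning "POSIX" when nothing is set is an assumption, since the standard leaves that default to the implementation.

```python
def effective_locale(env, category):
    """Return the locale name in effect for one category, e.g. 'LC_MONETARY'."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:          # unset or empty variables are skipped
            return value
    return "POSIX"         # assumed default; implementation-defined in POSIX


env = {"LANG": "fr", "LC_MONETARY": "es_ES"}
print(effective_locale(env, "LC_MONETARY"))  # es_ES
print(effective_locale(env, "LC_TIME"))      # fr

# A temporary override for one command, as in: LC_ALL=POSIX cc file.c
override = dict(env, LC_ALL="POSIX")
print(effective_locale(override, "LC_MONETARY"))  # POSIX
```

With LANG=fr and LC_MONETARY=es_ES, only the monetary category leaves French, which matches the reading that LANG is the root default and the LC_xxx variables are targeted exceptions.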
Re: TR35
On Tuesday, May 11, 2004 6:59 PM, Philippe Verdy va escriure: > From: "Carl W. Brown" <[EMAIL PROTECTED]> >> Expats break the locale model anyway. The problem is that we use >> country as both a language modifier and a location. > > From past comments I read here, it is understood now that locale > identifiers used to select languages contain a country/territory code > only as a legacy way to select language variants. I disagree. You are seeing the locale identifiers just in the context of language tagging. It is not its primary use, nor is it the historical one, neither the most proeminent. Main usage for locale ids nowadays is to resume all the i18n settings in an environnement. And certainly i18n settings depends on the language, but also on the territory you are in. When you cross the border between Italy and Slovenia, or between Ontario and New York, the most striking difference is not the orthography or the pitch, but rather the coins. Then, main variations within a language have been historically identified with countries. This might be related to the common practice from States to affirm its independance by drawings laws on this respect. It might also be related to the current state of orthographies between both sides of Atlantic Ocean for some important languages (and even more when we consider the situation 20 years ago.) Whether this perception is correct as "first tie", or if it should be replaced by another (which one?), I cannot say. What is certain is that it is not universal. Now, the two points (locale identifiers characterizes language and territory, and languages are usually partitioned with territory information) did interfere during the last decade (certainly RFC 1766 and 3066 might be related to this process.) Carl's point, and I believe he is correct, is just that these two meanings should NOT be mixed. And that when we spoke about locales, the relevant one is the first one (the part you snipped.) 
> This code is meant > to designate the language variant as spoken in that area, but not for > identifying a location. I am very sorry, but if in LANG=fr; LC_MONETARY=es_ES you consider that _ES above is a language variant of Spanish Castilian as different from Hispanoamerican, you are deeply wrong. > However the set of variables in POSIX is not rich enough or tweaked, > because a single LC_ALL variable can override all these variables. You are completely distording the model here. The normal setting is as above: LANG, then LC_xxx where LANG is inadequate. LC_ALL is an alternative way, that allows a _supplementary_ level. This is very useful when you have to temporarily override the setting (please remember that POSIX is initially console-oriented), because this way you can with not too much keystrokes specify a desired behaviour for a given action, like it LC_ALL=POSIX cc myStrangeProgram.c > This means that all settings what can be defined in a locale must be > definable with the same identifier. No, it does not _mean_ that. No obligation here. Anyway, the general way to implement the standard C setlocale() is just that, an identifier (not even human-readable, that is not its point) that groups all settings. If a Taiwanese sets in .profile LC_ALL=zh_TW; export LC_ALL and then complains the locale model is wrong, everybody, you included, will tell him that what is primarly wrong is her setting. > Now a good question is: can all settings in locales be selective > enough to allow specifying correctly the possible values. Define "possible": are you writing about the set of already described locales? (the only useful, as Carl wrote, en_GU is essentially non-existent; same for 0x180c) Or about all the potential possible values, including pro_QQ for Occitan as used within the Chancellery of Toulouse? > Is the POSIX syntax enough for them? Since it exists an extension to it in ISO/IEC TR 14652, answer here is probably no. Antoine
RE: TR35
Doug, > The issue of "French as spoken in Switzerland" versus "French as spoken > in Canada" is totally unrelated to the issue of Swiss conventions versus > Canadian conventions for sorting, date and time format, decimal > separator, and so forth. > > As for time zones, I agree completely with Mark that they should be > handled separately from all other locale settings, and not dependent on > them in any way. Not only do people travel, and need to change their > time zone setting while leaving everything else alone, but states and > countries do sometimes change from one time zone to another. The Olson > data shows how common that is. My understanding of the value of locales is that they provide a standard mapping for a set of parameters be it language, country conventions or time handling. It is unfortunate that often locale information that is country based is not separated from sub language and country conventions such as currency and numeric formatting. The value of a locale is that it provides us with a way to map the locale into a common set of parameters. But to do that properly we need more flexibility. For example if I am going to send a letter it is helpful to know how the country of the recipient formats the address. But it is not that simple. The recipient's country should be in the language of the sender so that the letter can be sent to the proper country to get to the recipient. This is where Unicode comes in. With Unicode this becomes possible. I consider time zone a locale specification however is should be independent of language, script, and country. However country is useful if you want to set a default time zone selection list since in most cases you will use a time zone in the country you specify in the locale. In most cases the sub language will also be the same. However, a French speaking Canadian in Switzerland will probably want to use a French Canadian spell checker even while in Switzerland but use the Swiss currency. Carl
Re: TR35
Mark Davis wrote: > BTW, what is curious is that the way the US timezones work, even > though Pacific Time is listed as being -08:00, a *majority* of the > year it is actually -07:00, and same for the others with daylight > savings time. Interesting way of thinking about it. It was 50/50 until the rules were changed in 1987. In Europe the discrepancy is even greater than in the USA, by a week; seven months for summer time, only five (including short February) for standard time. -Doug Ewell Fullerton, California http://users.adelphia.net/~dewell/
Re: TR35
Philippe Verdy wrote: > From past comments I read here, it is understood now that locale > identifiers used to select languages contain a country/territory code > only as a legacy way to select language variants. This code is meant > to designate the language variant as spoken in that area, but not for > identifying a location. IMHO this is at, or at least near, the heart of much of the confusion surrounding locales and the use of language/country pairs to denote them. The issue of "French as spoken in Switzerland" versus "French as spoken in Canada" is totally unrelated to the issue of Swiss conventions versus Canadian conventions for sorting, date and time format, decimal separator, and so forth. As for time zones, I agree completely with Mark that they should be handled separately from all other locale settings, and not dependent on them in any way. Not only do people travel, and need to change their time zone setting while leaving everything else alone, but states and countries do sometimes change from one time zone to another. The Olson data shows how common that is. -Doug Ewell Fullerton, California http://users.adelphia.net/~dewell/
Re: TR35
As far as I'm concerned, timezone choice is completely orthogonal to locale choice. And trying to guess from the region in the locale is very chancy. The UIs I've seen just list the choices; they don't try to narrow it by country. After all, I might be traveling, or living in a different country than my language setting is for in my browser.

You can do a full order of timezones, very easily, using a lexicographic ordering. To see whether timezone X is greater than timezone Y, walk back in time over each of them. At the first point where the offsets differ, the one with the greater offset is first. If they are the same throughout the database period, they are equal. That ordering relationship can be used to sort the timezones.

This method will also group together all of the zones that are equal back through time to a given point. For example, all the zones that are the same back to 5 years ago will be clumped together.

That being said, the one piece of data that I wish the Olson database had was: given two timezones X and Y that are identical in behavior over the last N years, which is the 'preferable' choice to show in a UI? Of course, that is a choice that might vary by locale. With that information, and the ordering, for some time period (say 5 years), one can present an ordered list of only distinct timezones over that period, and use the 'preferable' one to represent any others; either that or have a 2nd level menu or 'advanced' option to get all of them.

Mark

BTW, what is curious is that the way the US timezones work, even though Pacific Time is listed as being -08:00, a *majority* of the year it is actually -07:00, and same for the others with daylight savings time.

__
http://www.macchiato.com
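The ordering described in this message can be sketched in a few lines. This is only a minimal sketch under stated assumptions, not the Olson or ICU implementation: the zone names and offset histories are made-up sample data, and `ordered_zones` and `group_equivalent` are hypothetical helper names. Each history lists UTC offsets in minutes, sampled at the same instants for every zone, most recent sample first.

```python
from itertools import groupby

# Hypothetical sample data: UTC offsets in minutes, most recent sample first.
OFFSET_HISTORY = {
    "Zone/A": (-420, -480, -420, -480),
    "Zone/B": (-420, -480, -420, -420),
    "Zone/C": (60, 60, 60, 60),
    "Zone/D": (-420, -480, -420, -480),  # behaves exactly like Zone/A
}


def _sort_key(history, zone, depth=None):
    # Negate the offsets so an ascending sort puts the greater offset first
    # at the first sample where two zones differ (walking back in time).
    return tuple(-offset for offset in history[zone][:depth])


def ordered_zones(history, depth=None):
    """Return zone IDs in lexicographic order of their offset histories."""
    # Sorting the IDs first makes the order of equal zones deterministic.
    return sorted(sorted(history), key=lambda z: _sort_key(history, z, depth))


def group_equivalent(history, depth=None):
    """Clump together zones that are identical over the first `depth` samples."""
    key = lambda z: _sort_key(history, z, depth)
    return [list(group) for _, group in groupby(ordered_zones(history, depth), key=key)]


print(ordered_zones(OFFSET_HISTORY))
print(group_equivalent(OFFSET_HISTORY, depth=3))
```

Limiting `depth` plays the role of "the same back to 5 years ago": the shorter the window, the more zones fall into one group. Choosing which member of a group to show in a UI is the 'preferable' data that the message says is missing, so the sketch does not attempt it.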
Re: TR35
From: "Carl W. Brown" <[EMAIL PROTECTED]>
> Expats break the locale model anyway. The problem is that we use country as
> both a language modifier and a location. Thus a Brazilian community in the
> US can not pick pt_BR as a language and US as a territory.

From past comments I read here, it is now understood that locale identifiers used to select languages contain a country/territory code only as a legacy way to select language variants. This code is meant to designate the language variant as spoken in that area, not to identify a location.

So a user who prefers Traditional Chinese will set the locale to zh_TW even if that user is not in Taiwan. For timezones and currencies, the locale needs another specialized setting.

In POSIX, the main locale specifier is not enough: LANG selects the language, but for all other areas (currency and legal commercial constraints, time and number formats, time zone and so on) there are separate settings (TZ, LC_TIME, LC_MONETARY, LC_NUMERIC...). This seems good, and it allows various combinations to match what is needed in the user's environment.

However, the set of variables in POSIX is not rich enough or fine-tuned enough, because a single LC_ALL variable can override all the LC_* variables. This means that every setting that can be defined in a locale must be definable with the same identifier.

Java defines one unique main locale that plays the role of the POSIX LANG setting. Any other specialized locale settings, however, may be set as needed by creating other instances of the Locale object.

Now a good question is: can all settings in locales be selective enough to specify the possible values correctly? Is the POSIX syntax enough for them? Apparently not for the timezone setting (TZ), which has almost always used its own, distinct identifiers.
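The way LC_ALL overrides the per-category variables can be illustrated with a small resolver. This is a minimal sketch, assuming the usual POSIX precedence (LC_ALL first, then the category variable, then LANG); `effective_locale` is a hypothetical helper, the final fallback to "C" is an assumption (POSIX leaves the default implementation-defined), and the environment values are made-up examples.

```python
def effective_locale(env, category):
    """Resolve one locale category from a dict of environment variables.

    Assumed precedence: LC_ALL, then the category variable, then LANG.
    Unset or empty variables are skipped.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # assumed fallback; the real default is implementation-defined


# Hypothetical environment: Canadian French text, Swiss money formats.
env = {"LANG": "fr_CA.UTF-8", "LC_MONETARY": "fr_CH.UTF-8", "TZ": "Europe/Zurich"}

print(effective_locale(env, "LC_MONETARY"))  # category variable wins over LANG
print(effective_locale(env, "LC_TIME"))      # falls back to LANG

# Setting LC_ALL flattens every category to one identifier.
env_all = dict(env, LC_ALL="en_US.UTF-8")
print(effective_locale(env_all, "LC_MONETARY"))
```

TZ does not take part in this lookup at all: it is read separately, which is why it can be changed without touching any of the locale categories.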
RE: TR35
> To stay out of politics... The Falklands would be listed
> und both Great Britain and Argentina.

Falkland Islanders would not consider that to be 'staying out of politics' :)

--
Benjamin Peterson
[EMAIL PROTECTED]
Re: TR35
Carl W. Brown wrote: > To stay out of politics I would list Mainland China, Hong Kong, > Singapore and Taiwan under each other. Pick one get 4. I don't think Singapore belongs in that list. Nobody seriously questions its independence (and if anyone did it would be Malaysia, not China). Macao might belong there. > The Falklands would be listed und both Great Britain and Argentina. That would be staying *in* politics, IMHO. -Doug Ewell Fullerton, California http://users.adelphia.net/~dewell/
RE: TR35
Peter,

> > If I live in Guam I will probably be using an en_US locale.
> > However the "US" territory does not contain my time zone.
> > Probably the best solution for this problem is to add a category
> > of possessions to the territory information. This allows
> > applications to enumerate available time zones for not only the
> > country itself but also its possessions that might be using the locale.
>
> This issue is not limited to a country's possessions. Many expatriates
> and traveling business people etc want to keep their (laptop)
> computer's general locale settings as that of their home country (not
> least because changing this often destabilises data) but need to set it
> to the time zone in which they are temporarily resident. So time zones
> should be kept independent of other locale information, especially
> independent of such things as date and decimal point formats, and
> preferred languages.

The problem is that if you give users the option to pick a time zone and use the Olson zones, then you want to be able to limit the zones offered to the ones most people are likely to pick. The time zones for the country of the locale I am using are the most likely ones. In the case of Guam I suspect that more people use en_US as a locale than en_GU. I don't think that many people actually implement an en_GU locale.

To me, setting a time zone should probably start by selecting the time zone list:

1) Locale country (in most cases there is only one zone, so there is no need for a second selection)
2) Country and related territories or possessions.
3) Time zones matching current system time.
4) Time zones within one hour of current system time.
5) All time zones in time order starting with current system time.

To stay out of politics I would list Mainland China, Hong Kong, Singapore and Taiwan under each other. Pick one, get 4. The Falklands would be listed under both Great Britain and Argentina.

One good point about using Unicode is that we can now use script, rather than code page or specifying Taiwan, to select Traditional script even if the person is not in Taiwan or Hong Kong.

Expats break the locale model anyway. The problem is that we use country as both a language modifier and a location. Thus a Brazilian community in the US can not pick pt_BR as a language and US as a territory.

TR35 explicitly designates the country portion as a territory, not a language variant. Should there be two different specifications, both using the same ISO 3166-1 codes, which in most cases will have the same value?

Carl
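The five-step selection list above could be sketched roughly as follows. Everything here is an assumption made for illustration: the tables are tiny made-up samples rather than real Olson or TR35 data, `zone_tiers` is a hypothetical helper, "matching current system time" is modelled as an equal current UTC offset, and step 5 is read as "sorted by distance from the system offset".

```python
# Hypothetical sample data, for illustration only.
ZONES_BY_TERRITORY = {
    "US": ["America/New_York", "America/Los_Angeles"],
    "GU": ["Pacific/Guam"],
    "PR": ["America/Puerto_Rico"],
    "FR": ["Europe/Paris"],
}
RELATED_TERRITORIES = {"US": ["GU", "PR"]}  # assumed "possessions" table
CURRENT_OFFSET_MIN = {                      # assumed current UTC offsets, in minutes
    "America/New_York": -240,
    "America/Los_Angeles": -420,
    "Pacific/Guam": 600,
    "America/Puerto_Rico": -240,
    "Europe/Paris": 120,
}


def zone_tiers(territory, system_offset_min):
    """Return the five candidate lists, from narrowest to widest."""
    def distance(zone):
        return abs(CURRENT_OFFSET_MIN[zone] - system_offset_min)

    all_zones = sorted(CURRENT_OFFSET_MIN)
    own = list(ZONES_BY_TERRITORY.get(territory, []))            # step 1
    related = own + [zone                                        # step 2
                     for other in RELATED_TERRITORIES.get(territory, [])
                     for zone in ZONES_BY_TERRITORY.get(other, [])]
    matching = [z for z in all_zones if distance(z) == 0]        # step 3
    within_hour = [z for z in all_zones if distance(z) <= 60]    # step 4
    by_distance = sorted(all_zones, key=lambda z: (distance(z), z))  # step 5
    return [own, related, matching, within_hour, by_distance]


for tier in zone_tiers("US", -240):
    print(tier)
```

With the sample tables, an en_US user gets Guam and Puerto Rico in the second tier without those zones being copied into the "US" entry, which is the normalization point made in the thread.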
Re: TR35
On 07/05/2004 09:44, Carl W. Brown wrote:

> ...
> If I live in Guam I will probably be using an en_US locale. However the
> "US" territory does not contain my time zone. Probably the best solution
> for this problem is to add a category of possessions to the territory
> information. This allows applications to enumerate available time zones
> for not only the country itself but also its possessions that might be
> using the locale.

This issue is not limited to a country's possessions. Many expatriates and travelling business people etc want to keep their (laptop) computer's general locale settings as that of their home country (not least because changing this often destabilises data) but need to set it to the time zone in which they are temporarily resident. So time zones should be kept independent of other locale information, especially independent of such things as date and decimal point formats, and preferred languages.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
Re: TR35 (was: Standardize TimeZone ID)
On 07/05/2004 14:53, [EMAIL PROTECTED] wrote:

> ...
>> So the database aliases one to the other. Aliases are used for timezones
>> that are completely equivalent on the whole timeframe considered
>> (apparently only starting in the early years of last century).
>
> The cutoff date is 1970-01-01; if two timezones have been the same ever
> since then, they are not separately encoded *unless* they are in separate
> national jurisdictions (because after all it is the nation-state which
> sets up the rules). This date is the Posix zero point.

It is not always the nation-state which sets the rules. For example, in Australia each state sets its own rules, and so there are six different schemes, with half-hour differences, some with daylight saving and some without. It is not only possible but quite likely that new distinctions will be introduced in time zones which have been the same since 1970; e.g. very likely New South Wales and Victoria have been in the same time zone ever since then, but there is a real chance that NSW will abolish daylight saving but Victoria will not. So don't assume too quickly that time zones will not be split.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
Re: "Country possessions" (was: Re: TR35)
E. Keown wrote:

>> For an authoritative list of "countries," the UN
>> list is probably your best bet.
>
> Is this list online? -- Elaine

http://unstats.un.org/unsd/methods/m49/m49alpha.htm

The ISO 3166-1 FAQ points to this page as the determining factor in whether a "country" gets its own ISO 3166-1 code.

There are certainly some entities here (e.g. Puerto Rico, U.S. Virgin Islands, Svalbard and Jan Mayen) that are not independent in the same sense as the world's major countries. Finding the dividing line is not easy, and one good question to ask would be, "What do I intend to do with this information?"

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
Re: "Country possessions" (was: Re: TR35)
Elaine Keown
Tucson

Hi,

> For an authoritative list of "countries," the UN
> list is probably your best bet.

Is this list online? -- Elaine
"Country possessions" (was: Re: TR35)
Philippe Verdy wrote: > The status of some "possessions" in the Antarctica (AQ) is not clear. > They are administered by existing countries for the scientific bases > that run there, but have now a limited right for their expansion (the > old maps that divided it into sectors to the pole are no longer > valid), and the territory itself is placed under an international > treaty protected by the United Nations. At least in the past, there were some countries -- including some who operate scientific bases in Antarctica and some who do not -- who made national territorial claims to portions of the Antarctican continent. The official U.S. policy, someone correct me if I'm wrong, was that the U.S. didn't recognize any country's territorial claims to Antarctica, but reserved the right to make such claims itself in the future. (As arrogant as that sounds.) Philippe's point is basically sound, that once you get beyond "countries" with their own fully autonomous government, the lines get fuzzy. Additionally, any "list of country possessions" is certain to be the subject of dispute between countries with conflicting claims. The Falkland Islands, Jammu and Kashmir, Taiwan, etc. For an authoritative list of "countries," the UN list is probably your best bet. -Doug Ewell Fullerton, California http://users.adelphia.net/~dewell/
Re: TR35 (was: Standardize TimeZone ID
It depends on what you mean by "possessions". National Parks? Furniture? Occupied countries? ...

More seriously, this is a messy area. Probably the most fruitful approach would be to look at the international standards for postal addressing, which point off to the individual countries for their own internal subdivisions. You would then find out at least what countries *think* they own (or administer -- I'll refrain from more politically-tinged statements on this list).

Mark
__
http://www.macchiato.com

----- Original Message -----
From: "Carl W. Brown" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Sat, 2004 May 08 07:54
Subject: RE: TR35 (was: Standardize TimeZone ID

> Mark,
>
> Do you know if there is an official list of country possessions?
>
> Carl
Re: TR35 (was: Standardize TimeZone ID
From: "Carl W. Brown" <[EMAIL PROTECTED]>
> Do you know if there is an official list of country possessions?

Not very complicated to build, starting from the ISO 3166-1 and UN (numeric) lists of country/territory codes. I have such a list if you want.

But it all depends on the level of granularity you need: some "territories" in the UN and ISO lists have a single code for one administrative region that sometimes covers very distant "possessions" (I'd rather use the term "dependencies"). Some of them have no formal assignment in ISO 3166-1, only some reserved codes or simply no code at all. Examples: Jersey (JE), Guernsey (GG), Chausey Islands (grouped with Jersey?), Paracel Islands (claimed by China).

The status of some "possessions" in Antarctica (AQ) is not clear. They are administered by existing countries for the scientific bases that run there, but now have a limited right to expand (the old maps that divided it into sectors up to the pole are no longer valid), and the territory itself is placed under an international treaty protected by the United Nations.

The same can be said of the old French "Terre Adélie", which consists of only one Antarctic scientific base (Dumont d'Urville), now administered within the "French Austral and Antarctic Territories" (TF), an administrative term that also covers non-Antarctic islands such as the Kerguelen Islands and Amsterdam Island (this territory, outside the European Union, is administered from Paris by two ministries, and is used mostly as a flag of registry for commercial navigation).
RE: TR35 (was: Standardize TimeZone ID
At 07:54 -0700 2004-05-08, Carl W. Brown wrote:

> Do you know if there is an official list of country possessions?

The CIA factbook probably gets it right. I guess the UN publishes something.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
RE: TR35 (was: Standardize TimeZone ID
Mark,

Do you know if there is an official list of country possessions?

Carl
Re: TR35 (was: Standardize TimeZone ID
If you look at LDML, you will see that it uses a narrow view of locale; essentially those elements that are language-specific + variations (like choice of phonebook vs dictionary collation for German). In particular, a locale does not include a time zone, nor does it include a currency; those are considered orthogonal attributes. What an LDML locale does include is the capacity to have *translated names* for time zones, and *translated names* for currencies.

If someone wants to build a broader notion of locale on top of this they could do so, incorporating whatever other information is important for the given transactional processing, e.g., customer timezone, nearest branch office timezone, customer's preferred currencies, vendor's allowed currencies, seat assignment, dietary restrictions (kosher, atkins, no vegetables beginning with the letter C, ...), security status (low-, medium-, high-risk), religious preference (atheist vs theist), etc.

Mark
__
http://www.macchiato.com
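One way to picture the "orthogonal attributes" idea is a settings record in which the locale, the time zone, and the currency are separate fields. This is only an illustrative sketch, not anything LDML defines: `UserSettings` and its field names are hypothetical, and the values are example strings.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UserSettings:
    """A broader 'locale' built on top of a narrow, language-oriented one."""
    locale: str     # language-specific conventions, e.g. "fr_CA"
    time_zone: str  # Olson ID, chosen independently of the locale
    currency: str   # ISO 4217 code, chosen independently of the locale


# A French-speaking Canadian at home ...
home = UserSettings(locale="fr_CA", time_zone="America/Montreal", currency="CAD")

# ... and the same person in Switzerland: only the orthogonal fields change.
abroad = replace(home, time_zone="Europe/Zurich", currency="CHF")

print(home)
print(abroad)
```

Because the three fields vary independently, the traveller keeps the fr_CA spell checker and formats while picking up the local time zone and currency.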
Re: TR35 (was: Standardize TimeZone ID
From: "Carl W. Brown" <[EMAIL PROTECTED]>
> > That is not a problem. The Olson IDs are not guaranteed
> > to be unique, just unambiguous. And there are aliases.
> > Typically these are de-unified for political
> > purposes. Thus you may find that two different IDs produce
> > the same results over
> > the entire period of time in the database.
>
> So which timezone will the tr_TR locale in a TR35 database have?
> "Asia/Istanbul" or "Europe/Istanbul" or both?

Both: one is an alias of the other, which exists only as a convenience for users. However, should the eastern part of Turkey use a different timezone, tr-TR would not indicate the applicable timezone (this is what happens with the "en-US" locale, which spans many timezones).

This is a good justification for a separate TZ setting alongside POSIX locales, so a US user could set LANG=en_US for the default locale and TZ=America/New_York to adjust the timezone; some newer syntaxes allow setting the timezone in a combined locale ID with attributes: "en_US;tz=America/New_York".

However, POSIX uses legacy syntaxes for timezone IDs like "PST8PDT", which specify the GMT offset and the abbreviations for standard and daylight time. Many of them are referenced in the Olson database as aliases.

For today's developments, the default timezone in software with no timezone set should be UTC (alias Zulu or "Z"), even in an "en_US" locale, but many legacy applications use the US Pacific Time used in California as the default timezone in that locale or in the default "C" locale. ;-) I wonder why...
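The legacy POSIX-style TZ syntax is easy to misread because its offset is the number of hours to add to local time to reach UTC, so zones west of Greenwich carry a positive number. The following is a minimal sketch of a parser for the leading "std offset [dst]" part only; `parse_posix_tz` is a hypothetical helper, and it deliberately ignores the optional DST offset and transition rules that a full POSIX TZ value may carry.

```python
import re

# std name, optional sign, hours, optional minutes, optional dst name.
_TZ_RE = re.compile(r"^([A-Za-z]{3,})([+-]?)(\d{1,2})(?::(\d{2}))?([A-Za-z]{3,})?")


def parse_posix_tz(tz):
    """Parse the 'std offset [dst]' prefix of a POSIX-style TZ value."""
    match = _TZ_RE.match(tz)
    if match is None:
        raise ValueError(f"not a POSIX-style TZ value: {tz!r}")
    std, sign, hours, minutes, dst = match.groups()
    west_minutes = int(hours) * 60 + int(minutes or 0)
    if sign == "-":
        west_minutes = -west_minutes
    # POSIX counts offsets westward, so the UTC offset has the opposite sign.
    return {"std": std, "dst": dst, "utc_offset_minutes": -west_minutes}


print(parse_posix_tz("PST8PDT"))    # US Pacific: 8 hours behind UTC
print(parse_posix_tz("CET-1CEST"))  # Central Europe: 1 hour ahead of UTC
```

An Olson ID such as "America/New_York" does not match this pattern at all, which is one simple way for software to tell the two syntaxes apart.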
Re: TR35 (was: Standardize TimeZone ID
Carl W. Brown scripsit: > So which timezone will the tr_TR locale in a TR35 database have? > "Asia/Istanbul" or "Europe/Istanbul" or both? Both. > I guess that the territory possessions list should be an another > database that is merged. I think they should be in the same database. Guam is a territory, but Hawaii is integral: all the French overseas departments are integral. Simplest to treat everything as integral. -- John Cowan [EMAIL PROTECTED] www.reutershealth.com www.ccil.org/~cowan If a soldier is asked why he kills people who have done him no harm, or a terrorist why he kills innocent people with his bombs, they can always reply that war has been declared, and there are no innocent people in an enemy country in wartime. The answer is psychotic, but it is the answer that humanity has given to every act of aggression in history. --Northrop Frye
Re: TR35 (was: Standardize TimeZone ID)
Philippe Verdy scripsit: > I do agree. The fact that both "Europe/Istanbul" and "Asia/Istanbul" > are referenced is probably not really political, but it reflects > the fact that this city is on both continents, and that it's timezone > covers more than just this city. Someone leaving on the Asian area near > the city, but not in Istanbul must just wonder why its timezone is not > defined in the "Asia" subcategory, and why he must select it in Europe > (the reverse is possible). Correct. > So the database aliases one to the other. Aliases are used for timezones > that are compeltely equivalent on the whole timeframe considered > (apparently only starting in the early years of last century). The cutoff date is 1970-01-01; if two timezones have been the same ever since then, they are not separately encoded *unless* they are in separate national jurisdictions (because after all it is the nation-state which sets up the rules). This date is the Posix zero point. > when in fact solar time was most frequently used (with lots of > approximations) rather than official times. Standard time dates to the 1890s in Europe and North America; basically, its existence reflected the need for railroads to use a single time zone (or as few as possible). > What I don't know is if the Riyadh Solar Time is still in use today in > Sauda Arabia (the Olson's database only contains rules for 1987-1989). > in I believe that it is not. The intention was to set sunset (the beginning of the Islamic day) to 00:00 local time, but the difficulties in doing so were simply too great. > As well the "yearistype.sh" script is quite bogous if used to determine > leap years (is it useful or correct for US election years?). It is (the U.S. elects presidents in years that are divisible by 4 and greater than 1787, when the present constitution came into effect). No actual time zone depends on whether the year is a presidential election year, though the idea was proposed at one time. 
--
John Cowan
[EMAIL PROTECTED]
http://www.ccil.org/~cowan
http://reutershealth.com
"But the next day there came no dawn, and the Grey Company passed on into the darkness of the Storm of Mordor and were lost to mortal sight; but the Dead followed them." --"The Passing of the Grey Company"
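The point about the year-type test can be checked with two one-line predicates. This is a small sketch, not the logic of the "yearistype.sh" script itself (which is not reproduced in this thread): both function names are hypothetical, the election rule is the one stated in the message above (divisible by 4 and later than 1787), and the leap-year rule is the standard Gregorian one.

```python
def is_leap_year(year):
    """Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_us_presidential_election_year(year):
    """Rule as stated in the thread: divisible by 4 and later than 1787."""
    return year > 1787 and year % 4 == 0


# The two rules disagree on century years that are not divisible by 400.
for year in (1900, 2000, 2004):
    print(year, is_leap_year(year), is_us_presidential_election_year(year))
```

So a test that is correct for election years is not a correct leap-year test: 1900 was an election year but not a leap year.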
RE: TR35 (was: Standardize TimeZone ID
Mark,

> That is not a problem. The Olson IDs are not guaranteed
> to be unique, just unambiguous. And there are aliases.
> Typically these are de-unified for political
> purposes. Thus you may find that two different IDs produce
> the same results over
> the entire period of time in the database.

So which timezone will the tr_TR locale in a TR35 database have? "Asia/Istanbul" or "Europe/Istanbul" or both?

I guess that the territory possessions list should be another database that is merged.

Carl
Re: TR35 (was: Standardize TimeZone ID)
From: "Mark Davis" <[EMAIL PROTECTED]>
> That is not a problem. The Olson IDs are not guaranteed to be unique, just
> unambiguous. And there are aliases. Typically these are de-unified for political
> purposes. Thus you may find that two different IDs produce the same results over
> the entire period of time in the database.
>
> Moreover, whether or not someone wants to consider two IDs as 'equivalent'
> depends on their timeframe. If I only care about the last 5 years, then many
> more IDs fall into the same equivalence class than if I look over the entire
> period of time covered by Olson.
>
> While I do not believe that the database is perfect, there is no need to invent
> yet another mechanism.

I do agree. The fact that both "Europe/Istanbul" and "Asia/Istanbul" are referenced is probably not really political; it reflects the fact that this city is on both continents, and that its timezone covers more than just this city. Someone living in the Asian area near the city, but not in Istanbul, may well wonder why his timezone is not defined in the "Asia" subcategory, and why he must select it under Europe (the reverse is possible).

So the database aliases one to the other. Aliases are used for timezones that are completely equivalent over the whole timeframe considered (apparently only starting in the early years of last century). I doubt that, before then, daylight time was ever applied with consistent rules, when in fact solar time was most frequently used (with lots of approximations) rather than official times. With solar time, there is no standard timezone, as each place defines its own time, depending on the seasons and the observed position of the sun in the sky.

What I don't know is whether Riyadh Solar Time is still in use today in Saudi Arabia (the Olson database only contains rules for 1987-1989). It may be in use today for determining the time of religious events, but official time is probably based on a fixed offset from UTC for practical reasons.

If I use the "Asia/Riyadh89" timezone, it defines the GMTOFF field as 03:07:04, with daily changes of daylight offsets up to December 31 (where the daylight offset is minus 3 minutes). After that, starting January 1st, 1990, there is no daylight offset, so I suppose that it is now permanently set to this GMTOFF value.

But if I consider the comments at the top, there is an astronomical formula to compute the apparent noon time, rounded to the nearest 5 seconds (due to a limit in the initial Olson implementation). So a good question remains: should the astronomical formula be used to compute official time, or should we just keep the average noon-time offset of 0, and ignore the Riyadh87 to Riyadh89 timezone IDs?

The comment at the top is strange, as it uses a number of days counted from "January 0" (what's this? Maybe Olson knows, or there is a comment about this in the discussions saved in the HUGE "tzarchive" file).

Also, its internal "iso3166.tab" file is obsolete, as is "zone.tab", which contains a mapping from countries/territories (with the longitude/latitude of a relevant city) to lists of timezones. Likewise, the "yearistype.sh" script is quite bogus if used to determine leap years (is it useful or correct for US election years?).

Maybe TR35 should specify which parts of the database are referenced.
Re: TR35 (was: Standardize TimeZone ID
That is not a problem. The Olson IDs are not guaranteed to be unique, just unambiguous. And there are aliases. Typically these are de-unified for political purposes. Thus you may find that two different IDs produce the same results over the entire period of time in the database.

Moreover, whether or not someone wants to consider two IDs as 'equivalent' depends on their timeframe. If I only care about the last 5 years, then many more IDs fall into the same equivalence class than if I look over the entire period of time covered by Olson.

While I do not believe that the database is perfect, there is no need to invent yet another mechanism.

Mark
__
http://www.macchiato.com

----- Original Message -----
From: "Carl W. Brown" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Fri, 2004 May 07 09:44
Subject: TR35 (was: Standardize TimeZone ID

> Mark,
>
> > LDML does require the Olson IDs to identify time zones
> > (as does Unix, Java, ICU,...). See the discussion in
> > http://www.unicode.org/reports/tr35/.
>
> I found a normalization problem with the IDs. For example you have both
> "Asia/Istanbul" and "Europe/Istanbul" which are different names for the
> same time zone. I believe that the best solution is to drop the region
> designation because the time zones that we need are specific to a unique
> country. Thus "Istanbul" under "TR" works just fine. I do not believe that
> we need the "Etc/..." or miscellaneous aliases.
>
> This changes TR35 to:
>
> Pacific Time
> Pacific Standard Time
> Pacific Daylight Time
>
> PT
> PST
> PDT
>
> San Francisco
>
> It will then be part of the locale territory properties.
>
> Problem number 2:
>
> If I live in Guam I will probably be using an en_US locale. However the
> "US" territory does not contain my time zone. Probably the best solution
> for this problem is to add a category of possessions to the territory
> information. This allows applications to enumerate available time zones
> for not only the country itself but also its possessions that might be
> using the locale.
>
> Thus es_PR, en_PR, en_US, and es_US will all have access to the
> "Puerto_Rico" time zone without replicating data and denormalizing the
> database. The application can choose to include territories or not
> depending on its specific requirements.
>
> I believe that the strength of the Unicode standard is in the fact that in
> addition to unifying code pages it also is a mechanism to support
> normalizing of data and specifications.
>
> Carl