Re: TR35

2004-05-18 Thread Antoine Leca

On Friday, May 14, 2004 10:22 PM, Peter Constable wrote:
> It is simply inadequate analysis of usage scenarios to say "an
> order form contains formatted dates / numbers / currency that need to
> be interpreted, therefore this document has a locale".

Sorry, you lost me. I do not know what "usage scenarios" are. But if a "usage
scenario" describes a workflow, if the workflow involves orders, and if the
amounts can be written in an ambiguous form, I would have thought that, _at
some level of the modelling_, some notion of locale might be present; and
then that a realisation (I hope I have the specification vocabulary
right) might have a property "locale id" attached to the "order form"
document. This was the scheme I had in mind. Of course, it follows that
"this document has a locale" is a shorthand.

Nevertheless, I did not deny your analysis. Rather, I pointed out that, in my
view, it would be wrong to think that "no document has a locale," which is
quite a different thing.

In case it was not clear before, I agree that in most cases, they do
not.

> But if the  record is *not* in a
> neutral representation, then there are several other questions that
> need to be considered regarding how the string was generated, and how
> the receiver knows what was assumed by the authoring process.

Regarding your example: I can very well envision an application that will tag
the , and also the XML document, with some externally defined locale id
(and I do not mean language here). And I have also already seen a pair of
applications doing similar things... Whether this is sensible or not is
another debate entirely: I just point out it could be done.

>> And these files do
>> include or refer to locale ids and language ids, sometimes named one
>> for the other BTW.
>
> Just because someone called the two the same doesn't mean that the
> notions are not distinct, and that it wouldn't be helpful for us to
> understand that distinction.

Again, I am lost: I did not say they are merged, just that some use the name
of the former to designate the latter. Now, I can accept they may in fact be
the same thing, since I am not an expert in this field: just that for me,
they appear to be different for the moment (and the more I read in this thread,
the more I stay with my initial idea that they are different.)

>> And what you see as "internal to
>> your process" is, to me, actually usable, external data.
>
> If you consider it external, then it is because you expect others to
> use what you put there, or you are using what others put there -- and
> so it is indeed external.

Yes, exactly.

>> See my example,
>> imagining it is a text processing file: deep inside, I have found
>> the locale id of the sender. Which was a hint, not the real data I
>> would have liked.
>
> If the document includes an ID that indicates the locale mode that was
> set in the author's software when the author created that file, and
> you wish to use that as a hint to set a processing mode on your end,
> I have no problem with that; I have never said anything against that.

This is what I missed.
I claimed that this ID was considered (by me) as a locale tagging of the
document (see my full reasoning above). I never claimed it was intended
that way at the beginning, or in other processes, including the ones that
will follow the one of recognition of the intended meaning.

But in that particular process, it looked very much like a locale id tagging
a document to me.

> Rather, I'm saying that the conceptual model we have inherited from
> the past is inadequate, and that we need to adopt a more
> carefully-conceived model around which to design i18n platforms for
> the future.

This is starting to be interesting: we obviously will have quite a bit of
"backward compatibility" (in the minds of the people) to deal with, won't
we?

> And it starts by understanding that while they may be
> related, "locale" and "language" are conceptually two different
> things.

I never thought such a thing, did I?

OTOH, I acknowledged your terse description of the question as being a very
good thing (« ce qui se conçoit bien s'énonce clairement, et les mots pour
le dire viennent aisément » --the well understood would be explained
clearly, and the words to say it will flow easily-- sorry M. de Boileau for
the bad English translation)

Antoine

RE: TR35

2004-05-14 Thread Peter Constable

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Antoine Leca


> I wrote about an electronic document, sorry, file, I might receive
> containing an order form, and you said documents did not encompass order
> forms, as I read it.

An order form is not a case we can evaluate without actually analyzing
in more detail exactly how information is being exchanged, whether
public protocols are in use, and how the processes on each end are to
work. It is simply inadequate analysis of usage scenarios to say "an
order form contains formatted dates / numbers / currency that need to be
interpreted, therefore this document has a locale". For instance, if the
order information is exchanged using some XML schema involving, say


Buckwheat flour (bulk)
123,456


there's a very good chance that the order application was designed so
that the number inside the  element was in a locale-independent
representation. In that case, there is no reason whatsoever to say
anything more about this record than that English is used. (Actually, it
would be most appropriate to simply say that the name element is in
English: .) But if the  record is *not* in a
neutral representation, then there are several other questions that need
to be considered regarding how the string was generated, and how the
receiver knows what was assumed by the authoring process.

The point is, we need to do analysis at that kind of level, not in
sweeping terms like "order forms are documents that require locales".
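
The "locale-independent representation" mentioned above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the thread does not say which representation the order application actually uses. The idea is that the exchanged value carries no grouping or locale-specific separators, and locale-dependent formatting happens only when the value is shown to a reader.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper; not part of any schema or product named in the thread.
final class NeutralQuantity {
    // Serialize for interchange: plain digits, '.' as decimal point, no grouping.
    static String toWire(BigDecimal value) {
        return value.toPlainString();
    }

    // Parse the interchange form; the BigDecimal constructor ignores the default locale.
    static BigDecimal fromWire(String text) {
        return new BigDecimal(text);
    }

    // Locale-dependent formatting is applied only for display.
    static String forDisplay(BigDecimal value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        BigDecimal quantity = fromWire("123456");
        System.out.println(toWire(quantity));                     // 123456
        System.out.println(forDisplay(quantity, Locale.US));      // 123,456
        System.out.println(forDisplay(quantity, Locale.GERMANY)); // 123.456
    }
}
```

With this split, the receiver never has to guess what "123,456" meant, because that string exists only on a screen and never in the exchanged record.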


> And these files do
> include or refer to locale ids and language ids, sometimes named one for
> the other BTW.

Just because someone called the two the same doesn't mean that the
notions are not distinct, and that it wouldn't be helpful for us to
understand that distinction.


> And what you see as "internal to
> your process" is, to me, actually usable, external data.

If you consider it external, then it is because you expect others to use
what you put there, or you are using what others put there -- and so it
is indeed external.


> See my example,
> imagining it is a text processing file: deep inside, I have found the
> locale id of the sender. Which was a hint, not the real data I would
> have liked.

If the document includes an ID that indicates the locale mode that was
set in the author's software when the author created that file, and you
wish to use that as a hint to set a processing mode on your end, I have
no problem with that; I have never said anything against that.



> To be able to get my job done, I sometimes (often, in fact) have to use
> different pieces of software... Now, one can
> just dismiss me by saying that I am not supposed to look at that, that the
> users should restrict themselves to the next release of XML. This is
> equivalent to saying that users are not invited to the discussions about
> the tools they will use...

I have no qualms with what you may need to do now to get your job done.
When all we have is a hammer, everything starts to look like a nail, and
we need to wring as much benefit within that constraint as we can. All
I'm saying is that we should not be content to stay there. I have no intent
of telling anyone they cannot do what they are doing. Rather, I'm saying
that the conceptual model we have inherited from the past is inadequate,
and that we need to adopt a more carefully-conceived model around which
to design i18n platforms for the future. And it starts by understanding
that while they may be related, "locale" and "language" are conceptually
two different things. As for participating in the discussion, I am not
trying to keep anyone out.


> a very common behaviour of the computer people here in Europe, and a
> behaviour I am very angry about (hence the sarcasm, for which I
> apologize).

I was not aware of that background. Apology most kindly accepted.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Re: TR35

2004-05-14 Thread Antoine Leca

On Friday, May 14, 2004 3:30 PM, Peter Constable wrote:

>> To me, documents encompassed any style of writing (and was
>> broader). For example, I believed that writing was invented 6
>> millennia ago precisely for accounting and trading, *not* with the
>> Code of Hammurabi or the Egyptian hymns. But it appears I was wrong.
>
> If you get a clay tablet with some type of inventory on it and encode
> it digitally, presumably there are names of things, and numbers,
> perhaps also dates. Let's suppose you encode the text into a digital
> document. You assign a metadata tag indicating that the "language"
> (linguistic variety and writing system) is such-and-such. How would
> it be useful to also assign metadata to indicate what the number
> format is?

I do not know, I was not thinking about that.
I wrote about an electronic document, sorry, file, I might receive
containing an order form, and you said documents did not encompass order
forms, as I read it. So my example is void. My error was that I was
considering "accounting spreadsheet or an order-entry record" as documents,
while you do not. And my mistake was based, I think, on a faulty
interpretation of the history of writing, as I wrote.

Now, the actual content of the clay tablets is irrelevant (I think).



>>> If something is going on internal to proprietary software, then
>>> there are no rules.
>>
>> I also missed that the difference between language ids and locale
>> ids only mattered when used in public documents in published
>> standardized formats, and that private formats or any out-of-band
>> tags, persistent or not, are irrelevant here.
>
> If something is internal to your process, who cares but you what is
> happening?

I am basically a user. My "processes" are procedures; the objects they deal
with are, among others, electronic documents, sorry, files, a number of them
in proprietary formats that I can (partially) decode. And these files do
include or refer to locale ids and language ids, sometimes named one for the
other BTW.
My process is very different from yours. And what you see as "internal to
your process" is, to me, actually usable, external data. See my example,
imagining it is a text processing file: deep inside, I have found the
locale id of the sender. Which was a hint, not the real data I would have
liked.

To be able to get my job done, I sometimes (often, in fact) have to use
different pieces of software. I understood CLDR as being a way to establish a
common ground for this software to interoperate, the same way the ONLY purpose
of Unicode is to allow various pieces of software to interoperate. And it
happens that these data (locale and language ids), hidden inside the
proprietary formats of the files, are the ones that will select the data to be
used. Since I understand that, I feel committed to participating in the debate.
Now, one can just dismiss me by saying that I am not supposed to look at that,
that the users should restrict themselves to the next release of XML. This is
equivalent to saying that users are not invited to the discussions about the
tools they will use, a very common behaviour of the computer people here in
Europe, and a behaviour I am very angry about (hence the sarcasm, for which I
apologize).


Have a nice weekend, folks (I wrote that because I noticed Saturday is a
raging day for this list ;-) while I am disconnected from the Internet, and
much more quiet this way. There is no sarcasm, it's sincere.)

Antoine

RE: TR35

2004-05-14 Thread Peter Constable

> I am sorry I had misunderstood the whole discussion then.

Your sarcasm isn't productive.

 
> To me, documents encompassed any style of writing (and was broader). For
> example, I believed that writing was invented 6 millennia ago precisely
> for accounting and trading, *not* with the Code of Hammurabi or the
> Egyptian hymns. But it appears I was wrong.

If you get a clay tablet with some type of inventory on it and encode it
digitally, presumably there are names of things, and numbers, perhaps
also dates. Let's suppose you encode the text into a digital document.
You assign a metadata tag indicating that the "language" (linguistic
variety and writing system) is such-and-such. How would it be useful to
also assign metadata to indicate what the number format is?

 
> > If something is going on internal to proprietary software, then there
> > are no rules.
> 
> I also missed that the difference between language ids and locale ids
> only mattered when used in public documents in published standardized
> formats, and that private formats or any out-of-band tags, persistent or
> not, are irrelevant here.

If something is internal to your process, who cares but you what is
happening? You could use 0x0041 to mean "B" and 0x0042 to mean "A";
that's your business. You can still claim conformance to Unicode as long
as you do not emit that publicly, or apply those interpretations to
characters you receive from another source. Same here. The example was a
software process, and inside that process you could be using "en" to
mean "mm/dd/yy" date formatting, and if it's only going on internally,
then that's your business.

 
> So please ignore my points.
> Of course when we consider only the legal texts where all months shall be
> in full letters, all quantities spelled twice, once with numbers and once
> with letters...

I can only say this quite misconstrues anything I have said.



Peter Constable

Re: TR35

2004-05-14 Thread Antoine Leca

On Thursday, May 13th, 2004 16:40, Peter Constable wrote:
> Only that I don't think it's appropriate in general to tag
> documents (by which I don't mean an accounting spreadsheet or an
> order-entry record) for things like number formatting, and so such
> info should not be included in attributes like xml:lang.

I am sorry I had misunderstood the whole discussion then.

To me, documents encompassed any style of writing (and was broader). For
example, I believed that writing was invented 6 millennia ago precisely
for accounting and trading, *not* with the Code of Hammurabi or the Egyptian
hymns. But it appears I was wrong.

> If something is going on internal to proprietary software, then there
> are no rules.

I also missed that the difference between language ids and locale ids only
mattered when used in public documents in published standardized formats,
and that private formats or any out-of-band tags, persistent or not, are
irrelevant here.

So please ignore my points.
Of course when we consider only the legal texts where all months shall be in
full letters, all quantities spelled twice, once with numbers and once
with letters, and the timezone rules explicitly deferred to some authority,
you are very right. And then the example from Mark is just garbage, as many
people would see it (replace "garbage" with "unreadable" if you are not
happy with that word); so it is not a "document" any more, and this would be
discarded as well.

So I beg your pardon for having abused your time.

Antoine

Re: TR35

2004-05-13 Thread Christopher Vance

On Thu, May 13, 2004 at 05:16:49PM -0700, Mike Ayers wrote:
> The only correct English way I know to write dates is "March 20, 2003",

No.  Try "20 March 2003", if you want English (spoken as "the 20th of
March 2003").  If you want to add superscript "th" after the "20", or
a comma after the month, feel free.  The language you're speaking of
is "American", which is a distinct, non-normative, dialect.  :-)

> which I very rarely see.  People from lots of different countries would
> recognize "3/20/03".  Therefore we have multiple ways to write dates for

This is malformed, even if recognized.  And of course "01/02/03" is
totally ambiguous, having at least three different "normal" readings
of the six available.  As expressed on forms, and other official
documents, dates in my country always have day before month before
year.  This is true whether the month is expressed as a number or as a
name (possibly abbreviated).

> most languages, and multiple languages for most ways to write dates.  I
> think Peter Constable is on the right track here.

--
Christopher Vance
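
The ambiguity of "01/02/03" described above can be made concrete with a small Java sketch. It assumes the `java.time` API (which postdates this thread) and a hypothetical class name; the three patterns are just the day-first, month-first and year-first readings, and a reading is dropped when the string is not a valid date under it.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class: lists every reading of a numeric date string.
final class AmbiguousDate {
    private static final String[] PATTERNS = {"d/M/yy", "M/d/yy", "yy/M/d"};

    static Map<String, LocalDate> readings(String text) {
        Map<String, LocalDate> result = new LinkedHashMap<>();
        for (String pattern : PATTERNS) {
            try {
                DateTimeFormatter f = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
                result.put(pattern, LocalDate.parse(text, f));
            } catch (DateTimeParseException e) {
                // Not a valid date under this reading; skip it.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three valid readings: 2003-02-01, 2003-01-02 and 2001-02-03.
        System.out.println(readings("01/02/03"));
        // Only the month-first reading survives here.
        System.out.println(readings("3/20/03"));
    }
}
```

Nothing in the string itself selects one of the surviving readings; that choice has to come from knowledge of how the string was produced.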

RE: TR35

2004-05-13 Thread Peter Constable

> -Original Message-
> From: Addison Phillips [wM] [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 13, 2004 10:16 AM

[snip]

> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] Behalf Of Peter Constable
> > Sent: 2004年5月13日 7:40

Just noticed this. So, I think we all know that Addison's "language" is US English,
and it seems from what Mark says that that was enough for his system to determine how
to format the date and time, and enough for my system to determine how to interpret
the date/time string his system generated.

(Obviously not!)



Peter

Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: TR35

2004-05-13 Thread Mike Ayers

> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of Peter Constable
> Sent: Thursday, May 13, 2004 4:01 PM

> > You speak as if date or number formats had nothing to do with language. I
> > very much disagree. If I have message that says: "The date of the last
> > version of this document was 2003年3月20日", nobody in their right mind
> > would say that that is correct English.
>
> I never said they would. The correct analysis of that content
> is that it has two runs that are in different languages. (So,
> AFAICT your example does not prove anything.)


    Actually, it can be considered as a single language, Japanese, if you
accept romaji, which seems to be increasingly difficult to deny.  However, I
think this is irrelevant, as I fail to see that "20Mar03" (as I write 'em) or
"3/20/03" (more common) qualify as "correct English", either.  The only
correct English way I know to write dates is "March 20, 2003", which I very
rarely see.  People from lots of different countries would recognize
"3/20/03".  Therefore we have multiple ways to write dates for most
languages, and multiple languages for most ways to write dates.  I think
Peter Constable is on the right track here.


/|/|ike

RE: TR35

2004-05-13 Thread Asmus Freytag

At 11:21 AM 5/13/2004, Francois Yergeau wrote:
> Peter Constable wrote:
>> A "language" is an attribute of content, and a "language" ID is used
>> for declaration of that attribute.
>>
>> A "locale" is an operational mode of software processes, and a "locale"
>> ID is used in APIs to set or determine that mode.
>
> Oversimplified, I'm afraid.  Consider machine translation software or
> computer-aided translation tools (e.g. translation memories).  In these:
>
>   A "language" is an operational mode of software processes, and
>   a "language" ID is used in APIs to set or determine that mode.

I tend to support Peter's interpretation (see his rejoinder).

Your examples both have obvious aspects of content. The translation memory
may not be in any particular 'mode', beyond retrieving the data whose attribute
is defined by the language tag of interest.

This is very different from 'locale' which really does work like a mode,
affecting many types of operations of an application.

I think what you are after is the case where a set of rules (e.g. spelling
rules) are identified by language. However, there seems to me still a
difference, since applying a spell checker etc. requires data that are in the
designated language, whereas for locale-based formatting, the raw data is
usually language independent.

A./

RE: TR35

2004-05-13 Thread Peter Constable

> You speak as if date or number formats had nothing to do with language. I
> very much disagree. If I have message that says: "The date of the last
> version of this document was 2003年3月20日", nobody in their right mind
> would say that that is correct English.

I never said they would. The correct analysis of that content is that it has two runs
that are in different languages. (So, AFAICT your example does not prove anything.)



> The core of what anyone means by locale is the language -- and that means,
> in our context, written language, thus including script (Cryl vs Latn) and
> variants (such as US vs UK spelling).

I have been putting "language" in quotation marks because the category types involved
include writing system and orthography -- you've heard my presentation on that, so you
know that I agree with you on that particular point.

As for "language" being the core of what anyone means by locale, I have most certainly
said that "language" is one of the defining components of a locale. There may even be
situations (translation software being an example) in which the processing mode does
not care about anything else. But in general, locales -- software processing modes
tailored for cultural user preferences -- *do* involve other non-linguistic
components. Even in an example like translation software where such non-linguistic
components are not needed, the infrastructure for managing the processing mode is
working in terms of parameter bundles that *do* include non-linguistic components. And
distinctions for such non-linguistic components are not, in any situation I can think
of, useful things to declare regarding linguistic documents.


> The choice of language affects most of what people traditionally associate
> with software globalization, including date, time, number, currency,
> formatting & parsing; segmentation (words, lines); collation and searching;
> resource bundle choice for translated text & appropriate icons, etc.

C'mon, Mark. Certainly a choice of language affects how something like a date is
displayed, but it is not the only factor. If I tell you that my language is English,
even English with US spelling, that does *not* tell you how I want my numbers, dates,
times, etc. formatted. It may give you a hint, and that hint may even lead you to do
what I want; but it also might not. (IIRC, you yourself prefer to use a date format
that is *not* what most systems would guess at from being told that your language
preference is US English.) Therefore it is plainly *not* the case that "language" is
all that anybody means by locale. Thus, the premise of your statement

> So if that is all of what someone means by locale, then there is little
> point in distinguishing between "locale IDs" and "language IDs".

is not established, and thus the implication is not established.

You are making broad, general comments without considering carefully enough how things
are really used. To repeat something I said earlier, it would not be a good idea to
design a transaction-processing system that makes assumptions about how to interpret
formatted number or currency strings from a language preference, or even from being
told what locale was set on the originating system; I need to know exactly what
determined the formatting of the string I received. *That* is an example of the level
of discussion of scenarios that needs to happen before any meaningful statements can
be made about what a "language" or "locale" ID is and how it should be used. It simply
is not good enough to say "people traditionally associate [language] with ... date
[etc.]". You are trying to justify wrong (IMO) conclusions using inadequate analysis.


Locales in general *do* involve things beyond "language", and it is wrong to put
declarations specifically for such non-linguistic things into an attribute like
xml:lang, and therefore (for instance) entirely unhelpful to refer to RFC3066 tags as
locale tags, as though there were no difference.

I think 20 years of practice in software design have gotten many people stuck in a
rut, but the fact that people have thought in a given way for twenty years doesn't
make it right or desirable.



Peter Constable
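
The claim that a language preference is only a hint for date formatting can be sketched in Java. This assumes the `java.time` API, which postdates the thread, and hypothetical class and method names. Two locales that share the language "en" produce different short dates, and an explicitly stated pattern overrides either guess. The exact guessed strings depend on the locale data shipped with the runtime, so they are not shown.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

// Hypothetical demo class contrasting a guessed format with a stated one.
final class DatePreference {
    // Guess: derive a short date format from the locale alone.
    static String guessed(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)
                .withLocale(locale)
                .format(date);
    }

    // Explicit: the user states the pattern, independently of language.
    static String explicit(LocalDate date, String pattern, Locale locale) {
        return DateTimeFormatter.ofPattern(pattern, locale).format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2003, 3, 20);
        // Same language ("en"), different regional conventions.
        System.out.println(guessed(date, Locale.US));
        System.out.println(guessed(date, Locale.UK));
        // A US English user who prefers ISO-style dates.
        System.out.println(explicit(date, "yyyy-MM-dd", Locale.US)); // 2003-03-20
    }
}
```

The third line of output is the case Peter describes: nothing about "US English" would lead a system to pick that format unless the preference is recorded separately.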

Re: TR35

2004-05-13 Thread Philippe Verdy

From: "Peter Constable" <[EMAIL PROTECTED]>
> All I have said is that the notions of "locale" and "language" are
> distinct, that in general non-linguistic locale parameters such as
> number format are not appropriate things to declare about documents, and
> so we should not design systems or protocols that assume that locale
> tags can be inserted in document metadata attributes where a language
> tag is specified. And that it's not helpful in getting people to
> understand what is or isn't good to do for someone providing some degree
> of leadership in the area to use the terms "language" and "locale"
> interchangeably.

A locale for me goes MUCH farther than the simple selection of a few
text-related settings. In fact, any parameter that a user may wish to
customize to fit his needs or expectations about what a piece of software will
do can be part of the general concept of "Locale". MacOS has long standardized
a good term for it: "Preferences" (rather than the ambiguous term "Options"
found too often in Windows).

Well, Windows has a very broad concept of Locales: see everything that can be
set in the HKEY_CURRENT_USER registry hive (and also, within some limits, a few
settings in HKEY_LOCAL_MACHINE, although there it applies to all users of the
same local system)...

This goes much further than what one would define in a few POSIX environment
variables.

Windows has long shown that this information is interchangeable, and is so
valuable that there are hackers and merchants promoting adware that want to
steal that precious information: a complete Locale contains many things that are
part of the user's privacy.

Defining standard "Locale IDs" will be too difficult (in fact impossible given
the unbounded range of orthogonal settings). If standardization must occur, it's
for some important settings that are part of a "Locale". So I think that what
needs to be registered is those settings:
- Language-IDs (as set in POSIX's "LANG" or "LC_ALL" environment variables).
- timezones (as set in POSIX's "TZ" environment variable)
and a few others if they can be thought of general interest, interchangeable,
and mostly orthogonal.
Let's not try making all fit in one standard ID, as I think it will never work.

However, the impossibility of defining standard "Locale IDs" does not forbid
defining a standard syntax to serialize lists of settings that are part of a
Locale, and defining standard mechanisms to match and resolve them.

Re: TR35

2004-05-13 Thread Philippe Verdy

From: "Mark Davis" <[EMAIL PROTECTED]>
> So if one's locale definition includes something like: language=sh-Cryl-YU
> plus currency=EUR plus timezone=GMT, then that is clearly something far
> different than just language.

Maybe you meant language=sh-Cyrl-YU, which however was never used and will
never be used like this, since "sh" was deprecated long before script codes were
defined for Cyrillic. So it would probably be: LANG=sh-YU or simply LANG=sh for
the legacy language written first in Cyrillic.

Today you would set LANG=sr-SR or just LANG=sr mostly for Cyrillic, even if
Latin is also used today (if you need the precision then LANG=sr-Cyrl-SR or
simply LANG=sr-Cyrl).

There's a way to create such compound locale ids orthogonal to language settings
by using an attributed syntax:

LANG="sr-Cyrl-sr;TZ=GMT;LC_CURRENCY=EUR"

It could be a good idea to keep POSIX names for these extra orthogonal
attributes...
The above line would set a complete locale-ID, starting with a required language
ID followed by optional attributes for other settings.
The only problem is that many programs and libraries currently do not
support the attributed syntax for specifying a resource search path
(for example when locating the appropriate resource to use with the correct
currency or timezone).
However, it can be emulated on top of Locale resource class loaders (by
considering that attributes are handled as overrides for named resources that
would be searched within the default language-ID assigned to the locale-ID.)

In Java, one would create such a locale like this:
Locale loc = new Locale("sr-Cyrl-sr");
loc.put("TZ", "GMT");           // hypothetical: java.util.Locale has no put()
loc.put("LC_CURRENCY", "EUR");  // hypothetical, as above
but the following would not work for now, although it would be the correct way
to build a locale instance with a complete locale-ID:
Locale loc = new Locale("sr-Cyrl-sr;TZ=GMT;LC_CURRENCY=EUR");
So all we can create is this object:
Locale loc = new Locale("sr-Cyrl-sr");
which will work but will not reference correctly the other settings. This would
require some rework to make either the Locale class implement the Properties
interface, or to supplement the ResourceBundle class to allow setting such
overrides. So it seems that the "Locale" class in Java does not cover correctly
all that can be defined and selected in a Locale. A more meaningful name for
this class would have been "LanguageID".
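
The attributed syntax proposed above can be emulated outside the Locale class. The following is a minimal sketch, not an existing API: the class `CompoundLocaleId` and its methods are hypothetical, the attribute names TZ and LC_CURRENCY are taken from the example string in this message, and `Locale.forLanguageTag` is a later addition to Java (Java 7) that did not exist when this thread was written.

```java
import java.util.Collections;
import java.util.Currency;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.TimeZone;

// Hypothetical holder for "language-ID;NAME=value;..." strings.
final class CompoundLocaleId {
    final Locale language;                 // the required language ID
    final Map<String, String> overrides;   // the optional orthogonal settings

    private CompoundLocaleId(Locale language, Map<String, String> overrides) {
        this.language = language;
        this.overrides = overrides;
    }

    static CompoundLocaleId parse(String id) {
        String[] parts = id.split(";");
        Locale language = Locale.forLanguageTag(parts[0].trim());
        Map<String, String> overrides = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String[] pair = parts[i].split("=", 2);
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected NAME=value, got: " + parts[i]);
            }
            overrides.put(pair[0].trim(), pair[1].trim());
        }
        return new CompoundLocaleId(language, Collections.unmodifiableMap(overrides));
    }

    // Empty when the ID carries no TZ attribute.
    Optional<TimeZone> timeZone() {
        return Optional.ofNullable(overrides.get("TZ")).map(TimeZone::getTimeZone);
    }

    // Empty when the ID carries no LC_CURRENCY attribute.
    Optional<Currency> currency() {
        return Optional.ofNullable(overrides.get("LC_CURRENCY")).map(Currency::getInstance);
    }

    public static void main(String[] args) {
        CompoundLocaleId id = parse("sr-Cyrl-sr;TZ=GMT;LC_CURRENCY=EUR");
        System.out.println(id.language.toLanguageTag());
        System.out.println(id.timeZone().map(TimeZone::getID).orElse("(unset)"));
        System.out.println(id.currency().map(Currency::getCurrencyCode).orElse("(unset)"));
    }
}
```

Keeping the overrides in a separate map, rather than inside the Locale, leaves the language ID usable on its own for resource lookup while the other settings are resolved independently.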

Re: TR35

2004-05-13 Thread Mark Davis

You speak as if date or number formats had nothing to do with language. I very
much disagree. If I have a message that says: "The date of the last version of
this document was 2003年3月20日", nobody in their right mind would say that that is
correct English. (More on that at the end of
http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/language_code_issues.html,
which I pointed to).

The core of what anyone means by locale is the language -- and that means, in
our context, written language, thus including script (Cryl vs Latn) and variants
(such as US vs UK spelling). The choice of language affects most of what people
traditionally associate with software globalization, including date, time,
number, currency, formatting & parsing; segmentation (words, lines); collation
and searching; resource bundle choice for translated text & appropriate icons,
etc.

So if that is all of what someone means by locale, then there is little point in
distinguishing between "locale IDs" and "language IDs".

There are attributes that are clearly orthogonal to language, like choice of
timezone or choice of currency (not the *formatting* of them, but the *choice*).
So if one's locale definition includes something like: language=sh-Cryl-YU plus
currency=EUR plus timezone=GMT, then that is clearly something far different
than just language.

If that is what someone means by locale, then one must clearly distinguish
between "locale IDs" and "language IDs". Syntactically, locale IDs may be an
extension of language IDs, since they do form the core. Or one could use some
completely different structure. In CLDR, for example, we use RFC 3066 for the
language part (actually an extension, anticipating RFC 3066bis), but then use an
extension mechanism for additional features that are not captured by language.

Mark
__
http://www.macchiato.com

- Original Message - 
From: "Peter Constable" <[EMAIL PROTECTED]>
To: "Unicode Mailing List" <[EMAIL PROTECTED]>
Sent: Thu, 2004 May 13 11:58
Subject: RE: TR35


> > > Moreover, you would never label a document for a
> > > number format in order to determine how automated-formatting
> > > of numbers should be done on the receiving system.
> >
> > You would not label it to determine formatting on the receiving system,
> > but to determine interpretation (parsing) of formatted values in the
> > received data.  You need to know what the convention is to interpret the
> > number 123.456 or the date 02/03/04.
>
> But as I pointed out earlier, you cannot know for certain how to
> interpret it unless you know how it was generated; and if it was entered
> manually by a human, you need to know what they were thinking. A locale
> ID cannot tell you that. A locale ID is useful only if the string that's
> received was generated automatically on the originating system (and you
> know that to be the case), but I'm guessing that most of the time when
> that actually happens, that string is going to be an isolated element
> within a data structure.
>
> It is the case that in a significant number of situations the language
> tag of content will include a region ID, and if I encounter a formatted
> number or date string in the content, I can use that to guess what the
> correct interpretation should be. But I'm not sure I'd want to build a
> system for processing business transactions on such assumptions.
>
>
>
> Peter
>
> Peter Constable
> Globalization Infrastructure and Font Technologies
> Microsoft Windows Division
>
>
>

RE: TR35

2004-05-13 Thread Peter Constable

> > Moreover, you would never label a document for a
> > number format in order to determine how automated-formatting
> > of numbers should be done on the receiving system.
> 
> You would not label it to determine formatting on the receiving system,
> but to determine interpretation (parsing) of formatted values in the
> received data.  You need to know what the convention is to interpret the
> number 123.456 or the date 02/03/04.

But as I pointed out earlier, you cannot know for certain how to
interpret it unless you know how it was generated; and if it was entered
manually by a human, you need to know what they were thinking. A locale
ID cannot tell you that. A locale ID is useful only if the string that's
received was generated automatically on the originating system (and you
know that to be the case), but I'm guessing that most of the time when
that actually happens, that string is going to be an isolated element
within a data structure.

It is the case that in a significant number of situations the language
tag of content will include a region ID, and if I encounter a formatted
number or date string in the content, I can use that to guess what the
correct interpretation should be. But I'm not sure I'd want to build a
system for processing business transactions on such assumptions.
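
The kind of guess described above might look like the following Python sketch.
The table of conventions and the function name are hypothetical and purely
illustrative; the point is that the result is only a guess derived from the
region subtag, not knowledge of what the author actually wrote.

```python
# Hypothetical table: region subtag -> (decimal separator, date field order).
# The entries are illustrative assumptions, not authoritative locale data.
REGION_CONVENTIONS = {
    "US": (".", "MDY"),
    "GB": (".", "DMY"),
    "DE": (",", "DMY"),
}


def guess_conventions(language_tag):
    """Return a guessed (decimal_sep, date_order), or None if nothing can be guessed."""
    subtags = language_tag.replace("_", "-").split("-")
    # Treat a two-letter subtag after the first one as a region, e.g. "US" in "en-US".
    for subtag in subtags[1:]:
        if len(subtag) == 2 and subtag.isalpha():
            return REGION_CONVENTIONS.get(subtag.upper())
    return None   # no region subtag: the tag gives no hint at all


print(guess_conventions("en-US"))
print(guess_conventions("fr"))
```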



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: TR35

2004-05-13 Thread Peter Constable

> > A "language" is an attribute of content, and a "language" ID is used
> > for declaration of that attribute.
> >
> > A "locale" is an operational mode of software processes, and a "locale"
> > ID is used in APIs to set or determine that mode.
> 
> Oversimplified, I'm afraid.  Consider machine translation software or
> computer-aided translation tools (e.g. translation memories).  In these:
> 
>   A "language" is an operational mode of software processes, and
>   a "language" ID is used in APIs to set or determine that mode.

The translation memory content has a "language" attribute, and it's
appropriate to declare it using a "language" tag. 

Assuming the software is not dealing with things like number formats,
the processing mode could be called a "language" mode or a "locale"
mode. The software infrastructures provided in platforms and programming
frameworks manage these modes using "locales", however, so I would say
that these applications are using locales.

Of course, a "language" tag in the translation memory can be used to set
the processing mode ("locale") of the software. More often than not,
though, I expect that what would be happening is that the "language"
element of the locale is being determined, and then corresponding
content is being retrieved from the translation memory.


So, I disagree: I do not think it is oversimplified. What is too simple
is the way that many people think and speak about it all.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: TR35

2004-05-13 Thread Peter Constable

Addison:

> Interestingly, the W3C I18N WG published a new working draft...

Great! I'll certainly be interested in reading it. (When I get a chance
-- I still need to look at the 2nd draft of RFC3066bis; I know, you'd
like that to be done yesterday.)


> I think what's interesting is that our document illustrates some of the
> situations in which you might wish to exchange locale information. And I
> think these illustrations go more to prove Peter's point than not.

I can feel a little bit vindicated, then :-)



> Locale interchange is very important to
> internationalized software

> So, there are very valid reasons why applications need to transfer
> locale preferences.

That, I have never questioned.


> Certainly language tags carry or imply locale information in
> certain situations. Although the concepts are related, it needs to be
> very clear just how much information one can infer from a language tag...

> Check out our group's document (and the forthcoming requirements
> document) and see if you don't agree... but we should be wary of very
> broad global statements (both "all language tags are also locale tags"
> and "language tags are never locale tags").

I've agreed that the two are related, and I don't contest that a
language tag can be useful in making decisions about setting the locale
mode of a software process. 

All I have said is that the notions of "locale" and "language" are
distinct, that in general non-linguistic locale parameters such as
number format are not appropriate things to declare about documents, and
so we should not design systems or protocols that assume that locale
tags can be inserted in document metadata attributes where a language
tag is specified. And that it does not help people understand what is or
isn't good to do when someone providing some degree of leadership in the
area uses the terms "language" and "locale" interchangeably.


Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: TR35

2004-05-13 Thread Francois Yergeau

Peter Constable wrote:
> Moreover, you would never label a document for a
> number format in order to determine how automated-formatting 
> of numbers should be done on the receiving system.

You would not label it to determine formatting on the receiving system, but
to determine interpretation (parsing) of formatted values in the received
data.  You need to know what the convention is to interpret the number
123.456 or the date 02/03/04.
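
To make the ambiguity concrete, here is a small Python sketch that reads the
same two strings under different conventions. The function names, the set of
field orders and the two-digit-year pivot (century=2000) are assumptions made
for this illustration.

```python
from datetime import date
from decimal import Decimal


def parse_number(text, decimal_sep):
    """Parse text assuming decimal_sep is the decimal mark and the other mark groups digits."""
    group_sep = "," if decimal_sep == "." else "."
    return Decimal(text.replace(group_sep, "").replace(decimal_sep, "."))


def parse_short_date(text, order, century=2000):
    """Parse 'xx/xx/xx' given a field order such as 'DMY', 'MDY' or 'YMD'."""
    fields = dict(zip(order, (int(part) for part in text.split("/"))))
    return date(century + fields["Y"], fields["M"], fields["D"])


# The same strings, read under different assumed conventions:
for sep in (".", ","):
    print(repr(sep), parse_number("123.456", sep))
for order in ("DMY", "MDY", "YMD"):
    print(order, parse_short_date("02/03/04", order))
```

The number comes out a thousand times larger under one reading than under the
other, and the date has three plausible readings.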

-- 
François

RE: TR35

2004-05-13 Thread Francois Yergeau

Peter Constable wrote:
> A "language" is an attribute of content, and a "language" ID is used
> for declaration of that attribute.
>
> A "locale" is an operational mode of software processes, and a "locale"
> ID is used in APIs to set or determine that mode.

Oversimplified, I'm afraid.  Consider machine translation software or
computer-aided translation tools (e.g. translation memories).  In these:

  A "language" is an operational mode of software processes, and 
  a "language" ID is used in APIs to set or determine that mode.

-- 
François

RE: TR35

2004-05-13 Thread Addison Phillips [wM]

Interestingly, the W3C I18N WG published a new working draft of our Web services 
scenarios document just yesterday and some of that document grapples with this 
issue--when and how to exchange locale information and other "international 
preferences", as well as when and how to exchange language information. The document 
is here:

http://www.w3.org/TR/2004/WD-ws-i18n-scenarios-20040512/

I think what's interesting is that our document illustrates some of the situations in 
which you might wish to exchange locale information. And I think these illustrations 
go more to prove Peter's point than not. Locale interchange is very important to 
internationalized software. Certainly language tags carry or imply locale information 
in certain situations. Although the concepts are related, it needs to be very clear 
just how much information one can infer from a language tag.

For example, if you read XSLT (see: http://www.w3.org/TR/xslt#convert) and think that 
the "lang" attribute for converting numbers to strings is a locale, then you probably 
haven't read the text closely enough. It really means something more like language. (I 
think this particular example illustrates pretty nicely just how fuzzy the edges are.)

Antoine Leca's example is a good one (there is a similar one in the document above, 
donated by Mark Davis), and I think it shows how distributed software needs to have 
locale information in order to produce results that one could deem "correct" (if that 
text were generated by a message formatter, for example). But we shouldn't confuse 
language tagging of the result ("English") with software processing used to produce it 
(that sentence might have been rendered in the locale "de-DE").

So, there are very valid reasons why applications need to transfer locale preferences. 
Check out our group's document (and the forthcoming requirements document) and see if 
you don't agree... but we should be wary of very broad global statements (both "all 
language tags are also locale tags" and "language tags are never locale tags").

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

RE: TR35

2004-05-13 Thread Peter Constable

> Well, it is true that what I am really looking for is not *exactly* the
> formatting locale, but rather some broader information, namely the
> mindset of the writer.

Precisely. The locale info only tells you how a number would have been
formatted by the author's system, not what the author in fact did. When
you receive a document, being told what the system would have done
doesn't tell you anything useful. Not unless the document you receive
was generated by the system -- and I'm guessing that in many such
situations what's exchanged isn't a document per se but data structures
in which numbers are in some pre-defined representation not formatted
for the user.

I'm not saying that there is never a need to exchange locale-setting
info. Only that I don't think it's appropriate in general to tag
documents (by which I don't mean an accounting spreadsheet or an
order-entry record) for things like number formatting, and so such info
should not be included in attributes like xml:lang.
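
For the case where what is exchanged is a data structure rather than a
document, a pre-defined neutral representation could be sketched in Python as
follows. The record layout and the function names are hypothetical; the only
point is that the quantity and the date travel in one fixed form and are
formatted for the user only on the receiving side.

```python
import json
from datetime import date
from decimal import Decimal


def encode_order(item, quantity, due):
    """Serialize an order using locale-neutral field representations."""
    return json.dumps({
        "item": item,
        "quantity": str(quantity),   # plain decimal string, '.' as the decimal mark
        "due": due.isoformat(),      # ISO 8601 calendar date, YYYY-MM-DD
    })


def decode_order(payload):
    """Recover the typed values; no locale is needed to interpret them."""
    record = json.loads(payload)
    return (record["item"],
            Decimal(record["quantity"]),
            date.fromisoformat(record["due"]))


payload = encode_order("12001", Decimal("100.000"), date(2004, 7, 6))
print(payload)
print(decode_order(payload))
```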


> I have another example, but I cannot expose it here publicly, it is
> related to some proprietary software.

If something is going on internal to proprietary software, then there
are no rules. This is only about public interchange.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Re: TR35

2004-05-13 Thread Antoine Leca

On Wednesday, May 12, 2004 8:00 PM, Peter Constable wrote:
> It's not particularly useful to communicate that a document was
> created when a locale with such-and-such number format was in effect,

Sure?

: Please send to us 100.000 units of your item 12010, available to our
: warehouse by 6/7/04. We agree with the current tariff.

Now it happens that I do NOT have any such item 12010, only 12001 or 21001. And
with the former, 10 may make sense, and 100 definitely does not. But
with the latter, 100 makes sense, 10 is probably too much (and anyway I
do not have that much merchandise available). Units may be kg or t, in fact,
so 3 decimals is adequate. What should I send? When?

Of course, the guy is away from office, cellphone is down, etc.


Well, it is true that what I am really looking for is not *exactly* the
formatting locale, but rather some broader information, namely the
mindset of the writer. But if the document happens to carry the locale
it was formatted with, then I have a hint about its correct meaning.

I agree beforehand that the locale id would not be a certain answer, just a
hint. This might not be what you had in mind.


I have another example, but I cannot expose it here publicly, it is related
to some proprietary software. Let us just say that knowing the locale
under which the document was created/formatted was a prerequisite
to being able to render it correctly.


> because that only meant how automated processes would format numbers,
> the author can choose to do something else, and the document can even
> use multiple formats: 1,234.56 as well as 1.234,56 (and it's not hard
> to imagine how the two formats might have been automatically added to
> the document at different times). Moreover, you would never label a
> document for a number format in order to determine how
> automated-formatting of numbers should be done on the receiving
> system.

I do not know about Mark, but at least I did. Now with EDIFACT there are
agreements to avoid possible misunderstandings (so the tagging turns out to be
useless; in fact it is already done at a higher level), but it was not
always the case. And I did see, and even build, processes that deal with
similarly tagged data.

For a present-day example, think about an i15d standalone program that issues
checks. I would expect such a program to be invoked with a given locale
(according to the country of the check to be issued), then fed with the
correct data. Now, if the invoking process is itself a generic one, it
will itself be fed with data labeled with the format to be used.


Of course, we are very far away from Unicode here, even further from plain
text such as Ken asks us to stick with. Clearly, the locale ids here are
attributes, and even have almost nothing to do with languages, so it might be
inappropriate for CLDR as well (this is obscure to me at the moment.)
That is just to say that while I agree with the fundamentals of your
distinction, I also believe that the fact that locales have been "reduced"
(historically for the needs of APIs) to locale ids then allowed these to be
used to tag documents. And while one may argue this is "bad", there is also
no way to stop people from doing so...


Antoine

Re: TR35

2004-05-12 Thread Mark Davis

Well, I too don't have a lot of time ;-)

I see both language IDs and locale IDs as having usage beyond what you say. Both
can be tagging content (e.g. this content was generated in accordance with
locale x, or this content represents the collation sequence for locale/language
y). Both can be used in queries (give me content, but restrict to what is
appropriate for languages x and y; give me content, but restrict to what is
appropriate for locales z, w).

I think we would both agree that timezones and currencies (but *not* their
names) are orthogonal to language. Where we might differ on -- and where
everyone seems to differ on -- is the meaning of the term "locale". Some
interpret it very narrowly, essentially coextensive with language; some
interpret it very broadly, essentially a bundle of user preferences /
information. I fully agree that under the latter interpretation, it is very
important to distinguish between a language ID and a locale ID.

Mark
__
http://www.macchiato.com

RE: TR35

2004-05-12 Thread Peter Constable

> I see both language IDs and locale IDs as having usage beyond what you
> say. Both can be tagging content (e.g. this content was generated in
> accordance with locale x,

It's not particularly useful to communicate that a document was created
when a locale with such-and-such number format was in effect, because
that only meant how automated processes would format numbers, the author
can choose to do something else, and the document can even use multiple
formats: 1,234.56 as well as 1.234,56 (and it's not hard to imagine how
the two formats might have been automatically added to the document at
different times). Moreover, you would never label a document for a
number format in order to determine how automated-formatting of numbers
should be done on the receiving system.


> or this content represents the collation sequence for locale/language
> y). Both can be used in queries (give me content, but restrict to what
> is appropriate for languages x and y; give me content, but restrict to
> what is appropriate for locales z, w).

I don't contest that both can be used in queries. I do not think that it
makes sense to declare locale attributes of content.


 
> I think we would both agree that timezones and currencies (but *not*
> their names) are orthogonal to language.

Yes.


> Where we might differ on -- and where
> everyone seems to differ on -- is the meaning of the term "locale".
> Some interpret it very narrowly, essentially coextensive with language;

I don't know that I've seen such narrow interpretation, except from you.
I've already communicated my concerns about you introducing this usage,
since it perpetuates confusion between two things that really are
distinct: one's an attribute of content, the other is a processing mode.

> some
> interpret it very broadly, essentially a bundle of user preferences /
> information.

I'd take it slightly further: locale is a processing mode, tailored in
relation to a set of (mostly or entirely culture-related) user
preferences. The tailoring is done using bundles of locale data.

(I'd use three terms in discussing locales: "locale" is the processing
mode, "locale data" is the collection of parameter values used to
configure that mode, and "locale ID" is something passed in an API to
set or determine that mode.)
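
The three terms can be kept apart in a short Python sketch. Everything here
(the table contents, the class and method names) is a hypothetical
illustration of the distinction, not a description of any real platform API.

```python
# "Locale data": the parameter values used to configure the mode.
# The two entries are illustrative assumptions, not real CLDR data.
LOCALE_DATA = {
    "en-US": {"decimal": ".", "group": ","},
    "de-DE": {"decimal": ",", "group": "."},
}


class Formatter:
    """A process whose "locale" (processing mode) is set through a "locale ID"."""

    def __init__(self, locale_id):
        self.set_locale(locale_id)

    def set_locale(self, locale_id):
        # The locale ID is only the handle passed through the API ...
        self.locale_id = locale_id
        # ... and the locale data is what actually tailors the processing.
        self._data = LOCALE_DATA[locale_id]

    def format_number(self, value):
        plain = f"{value:,.2f}"   # neutral form with ',' grouping and '.' decimal
        table = str.maketrans({",": self._data["group"], ".": self._data["decimal"]})
        return plain.translate(table)


print(Formatter("en-US").format_number(1234.56))
print(Formatter("de-DE").format_number(1234.56))
```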



> I fully agree that under the latter interpretation, it is very
> important to distinguish between a language ID and a locale ID.

I am glad we at least agree on that :-)


Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: TR35

2004-05-12 Thread Peter Constable

>Here I disagree; this area is very fuzzy. See
>http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/language_code_issues.html,
>especially the end.

During which you observe that "both [language IDs and locale IDs] are
somewhat nebulous concepts." (Of course, it's not the *IDs* that are
nebulous, but the types of category that they represent: "language" and
"locale".) 

I don't have time at the moment for a detailed discussion, (or to finish
reading what's here and in TR35) but have been meaning to comment on
this topic in relation to TR35, so will briefly comment here: these
concepts will remain nebulous until people understand a fundamental
distinction:

A "language" is an attribute of content, and a "language" ID is used for
declaration of that attribute.

A "locale" is an operational mode of software processes, and a "locale"
ID is used in APIs to set or determine that mode.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Re: TR35

2004-05-12 Thread Mark Davis

> The issue of "French as spoken in Switzerland" versus "French as spoken
> in Canada" is totally unrelated to the issue of Swiss conventions versus
> Canadian conventions for sorting, date and time format, decimal
> separator, and so forth.

Here I disagree; this area is very fuzzy. See
http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/language_code_issues.html,
especially the end.

Mark
__
http://www.macchiato.com

Re: TR35

2004-05-12 Thread Philippe Verdy

From: "Antoine Leca" <[EMAIL PROTECTED]>
> On Tuesday, May 11, 2004 6:59 PM, Philippe Verdy wrote:
> > This code is meant
> > to designate the language variant as spoken in that area, but not for
> > identifying a location.
>
> I am very sorry, but if in
> LANG=fr; LC_MONETARY=es_ES
> you consider that _ES above is a language variant of Spanish Castilian as
> different from Hispanoamerican, you are deeply wrong.

Don't infer things I did not say. I did not mean that. My sentence is valid
within the context of the [LANG=] setting, not in the context of [LC_MONETARY=].
Within [LANG=], the country/territory specification is a language variant
specifier, which may or may not work well to designate other localizable
elements.

In fact even if you used [LANG=es_ES], this may not mean only Castilian: there
are other variants of Spanish in Spain (even if you exclude regional languages
like Basque, Catalan, Occitan, Galician, which also have their own variants
independent of [LANG=es] Spanish).

[LANG=] in your example is unambiguously specifying the French language (but no
implied country/territory, and thus not Spanish) and is then used as the default
for other locale settings. [LC_MONETARY=] will never have a semantic for
language or language-variant selection; it is really meant to designate the
currency used in Spain, formatted according to the currency format in Spain (the
[es] prefix has no real function here except that it just selects the best
script to use for digits and decimal separator and grouping). For spelling out
currency amounts, French terms would still be used according to [LANG], in
reference to the Spanish currency (now the Euro, same as in France).

Should the [LC_MONETARY=] setting be left unspecified, the currency settings
would inherit from the language setting in [LANG=], that does not specify the
territory (so the currency will be left unspecified to some defaults, using the
digits, dot and comma as used in French, most probably in France here).

The POSIX settings are very language-centric, with [LANG=] as the root
setting, used as the default for other specialized settings (the only
exception being [TZ=] for the timezone, which can't be inferred correctly and
easily from a language or even a territory).

Re: TR35

2004-05-12 Thread Antoine Leca

On Tuesday, May 11, 2004 6:59 PM, Philippe Verdy wrote:

> From: "Carl W. Brown" <[EMAIL PROTECTED]>
>> Expats break the locale model anyway.  The problem is that we use
>> country as both a language modifier and a location.
>
> From past comments I read here, it is understood now that locale
> identifiers used to select languages contain a country/territory code
> only as a legacy way to select language variants.

I disagree. You are seeing the locale identifiers just in the context of
language tagging. That is not their primary use, nor the historical one,
nor the most prominent.

The main use of locale ids nowadays is to summarize all the i18n settings in an
environment. And certainly i18n settings depend on the language, but also
on the territory you are in. When you cross the border between Italy and
Slovenia, or between Ontario and New York, the most striking difference is
not the orthography or the pitch, but rather the coins.

Then, the main variations within a language have historically been identified
with countries. This might be related to the common practice of States
affirming their independence by passing laws in this respect. It might also be
related to the current state of orthographies on the two sides of the Atlantic
Ocean for some important languages (and even more so when we consider the
situation 20 years ago.)

Whether this perception is correct as "first tie", or if it should be
replaced by another (which one?), I cannot say. What is certain is that it
is not universal.

Now, the two points (locale identifiers characterize language and
territory, and languages are usually partitioned with territory information)
did interfere during the last decade (certainly RFC 1766 and 3066 might be
related to this process.) Carl's point, and I believe he is correct, is just
that these two meanings should NOT be mixed. And that when we speak about
locales, the relevant one is the first one (the part you snipped.)

> This code is meant
> to designate the language variant as spoken in that area, but not for
> identifying a location.

I am very sorry, but if in

LANG=fr; LC_MONETARY=es_ES

you consider that _ES above is a language variant of Spanish Castilian as
different from Hispanoamerican, you are deeply wrong.


> However the set of variables in POSIX is not rich enough or tweaked,
> because a single LC_ALL variable can override all these variables.

You are completely distorting the model here.
The normal setting is as above: LANG, then LC_xxx where LANG is inadequate.
LC_ALL is an alternative way, which allows a _supplementary_ level. This is
very useful when you have to temporarily override the setting (please
remember that POSIX is initially console-oriented), because this way you can
specify, without too many keystrokes, a desired behaviour for a given action,
like this:

LC_ALL=POSIX cc myStrangeProgram.c
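
The lookup order just described (LC_ALL as the override, then the specific
LC_xxx variable, then LANG) can be sketched in Python as below. The function
name is made up for this illustration, and falling back to "POSIX" when
nothing is set is an assumption; the real default is left to the
implementation.

```python
def effective_locale(category, env):
    """Resolve one locale category (e.g. "LC_MONETARY") from an environment mapping."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:            # unset or empty variables are skipped
            return value
    return "POSIX"           # assumed fallback when nothing is set


env = {"LANG": "fr", "LC_MONETARY": "es_ES"}
print(effective_locale("LC_MONETARY", env))   # the specific variable wins over LANG
print(effective_locale("LC_TIME", env))       # no LC_TIME set, so LANG applies

# A temporary LC_ALL overrides every category, as in the cc example:
print(effective_locale("LC_MONETARY", {**env, "LC_ALL": "POSIX"}))
```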


> This means that all settings what can be defined in a locale must be
> definable with the same identifier.

No, it does not _mean_ that. No obligation here.
Anyway, the general way to implement the standard C setlocale() is just
that, an identifier (not even human-readable, that is not its point) that
groups all settings.

If a Taiwanese sets in .profile

LC_ALL=zh_TW; export LC_ALL

and then complains the locale model is wrong, everybody, you included, will
tell her that what is primarily wrong is her setting.


> Now a good question is: can all settings in locales be selective
> enough to allow specifying correctly the possible values.

Define "possible": are you writing about the set of already described
locales? (the only useful, as Carl wrote, en_GU is essentially non-existent;
same for 0x180c)
Or about all the potential possible values, including pro_QQ for Occitan as
used within the Chancellery of Toulouse?


> Is the POSIX syntax enough for them?

Since there exists an extension to it in ISO/IEC TR 14652, the answer here is
probably no.


Antoine

RE: TR35

2004-05-11 Thread Carl W. Brown

Doug,

> The issue of "French as spoken in Switzerland" versus "French as spoken
> in Canada" is totally unrelated to the issue of Swiss conventions versus
> Canadian conventions for sorting, date and time format, decimal
> separator, and so forth.
> 
> As for time zones, I agree completely with Mark that they should be
> handled separately from all other locale settings, and not dependent on
> them in any way.  Not only do people travel, and need to change their
> time zone setting while leaving everything else alone, but states and
> countries do sometimes change from one time zone to another.  The Olson
> data shows how common that is.

My understanding of the value of locales is that they provide a standard mapping for a 
set of parameters be it language, country conventions or time handling.  

It is unfortunate that often locale information that is country based is not separated 
from sub language and country conventions such as currency and numeric formatting.

The value of a locale is that it provides us with a way to map the locale into a 
common set of parameters.

But to do that properly we need more flexibility.  For example if I am going to send a 
letter it is helpful to know how the country of the recipient formats the address.  
But it is not that simple.  The recipient's country should be in the language of the 
sender so that the letter can be sent to the proper country to get to the recipient.  
This is where Unicode comes in.  With Unicode this becomes possible.  

I consider time zone a locale specification; however, it should be independent of 
language, script, and country.  However country is useful if you want to set a default 
time zone selection list since in most cases you will use a time zone in the country 
you specify in the locale.  In most cases the sub language will also be the same.  
However, a French speaking Canadian in Switzerland will probably want to use a French 
Canadian spell checker even while in Switzerland but use the Swiss currency.

Carl

Re: TR35

2004-05-11 Thread Doug Ewell

Mark Davis  wrote:

> BTW, what is curious is that the way the US timezones work, even
> though Pacific Time is listed as being -08:00, a *majority* of the
> year it is actually -07:00, and same for the others with daylight
> savings time.

Interesting way of thinking about it.  It was 50/50 until the rules were
changed in 1987.

In Europe the discrepancy is even greater than in the USA, by a week;
seven months for summer time, only five (including short February) for
standard time.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: TR35

2004-05-11 Thread Doug Ewell

Philippe Verdy  wrote:

> From past comments I read here, it is understood now that locale
> identifiers used to select languages contain a country/territory code
> only as a legacy way to select language variants. This code is meant
> to designate the language variant as spoken in that area, but not for
> identifying a location.

IMHO this is at, or at least near, the heart of much of the confusion
surrounding locales and the use of language/country pairs to denote
them.

The issue of "French as spoken in Switzerland" versus "French as spoken
in Canada" is totally unrelated to the issue of Swiss conventions versus
Canadian conventions for sorting, date and time format, decimal
separator, and so forth.

As for time zones, I agree completely with Mark that they should be
handled separately from all other locale settings, and not dependent on
them in any way.  Not only do people travel, and need to change their
time zone setting while leaving everything else alone, but states and
countries do sometimes change from one time zone to another.  The Olson
data shows how common that is.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: TR35

2004-05-11 Thread Mark Davis

As far as I'm concerned, timezone choice is completely orthogonal to locale
choice. And trying to guess from the region in the locale is very chancy. The
UIs I've seen just list the choices; they don't try to narrow it by country.
After all, I might be traveling, or living in a different country than my
language setting is for in my browser.

You can do a full order of timezones, very easily, using a lexicographic
ordering. To see whether timezone X is greater than timezone Y, walk back in
time over each of them. At the first point where the offsets differ, the one
with the greater offset is first. If they are the same throughout the database
period, they are equal. That ordering relationship can be used to sort the
timezones. This method will also group together all of the zones that are equal
back through time to a given point. For example, all the zones that are the same
back to 5 years ago will be clumped together.
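
A rough sketch of the ordering described above, for illustration only. It assumes each zone's UTC offsets have already been sampled at fixed points walking back in time (most recent first); the zone names and offsets below are made-up sample data, not Olson data, and the function names are hypothetical.

```python
from itertools import groupby

# Hypothetical sample data: UTC offsets in minutes, one sample per year,
# most recent year first. Real histories would come from the Olson database.
OFFSET_HISTORY = {
    "Zone/A": [-480, -480, -480, -480],
    "Zone/B": [-480, -480, -420, -420],
    "Zone/C": [60, 60, 60, 60],
    "Zone/D": [-480, -480, -480, -480],
}

def sort_key(zone):
    # Negate the offsets so that, at the first point where two zones differ,
    # the zone with the greater offset sorts first.
    return tuple(-offset for offset in OFFSET_HISTORY[zone])

def ordered_zones():
    # Tuples compare lexicographically, which is exactly the "walk back in
    # time" comparison. The name is only a tie-breaker for equal zones.
    return sorted(OFFSET_HISTORY, key=lambda z: (sort_key(z), z))

def clumps(samples):
    # Group zones that are identical over the most recent `samples` points.
    return [list(group)
            for _, group in groupby(ordered_zones(),
                                    key=lambda z: sort_key(z)[:samples])]
```

With this sample data, `clumps(2)` puts Zone/A, Zone/B and Zone/D in one group, while `clumps(4)` separates Zone/B from the other two.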

That being said, the one piece of data that I wish the Olson database had was:
given two timezones X and Y that are identical in behavior over the last N
years, which is the 'preferable' choice to show in a UI? Of course, that is a
choice that might vary by locale.

With that information, and the ordering, for some time period (say 5 years), one
can present an ordered list of only distinct timezones over that period, and use
the 'preferable' one to represent any others; either that or have a 2nd level
menu or 'advanced' option to get all of them.

Mark

BTW, what is curious is that the way the US timezones work, even though Pacific
Time is listed as being -08:00, a *majority* of the year it is actually -07:00,
and same for the others with daylight savings time.

__
http://www.macchiato.com

- Original Message - 
From: "Carl W. Brown" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Tue, 2004 May 11 07:31
Subject: RE: TR35


> Peter,
>
> > >If I live in Guam I will probably be using an en_US locale.
> > However the "US" territory does not contain my time zone.
> > Probably the best solution for this problem is to add a category
> > of possessions to the territory information.  This allows
> > applications to enumerate available time zones for not only the
> > country itself but also its possessions that might be using the locale.
> > >
> > >
> >
> > This issue is not limited to a country's possessions. Many expatriates
> > and traveling business people etc want to keep their (laptop)
> > computer's general locale settings as that of their home country (not
> > least because changing this often destabilises data) but need to set it
> > to the time zone in which they are temporarily resident. So time zones
> > should be kept independent of other locale information, especially
> > independent of such things as date and decimal point formats, and
> > preferred languages.
>
> The problem is that if you give users the option to pick time zone and use
> the Olson zones then you want to be able to limit the number of zones that
> most people pick to the most likely ones.  The time zones for the country of
> the locale I am using are the most likely ones.  In the case of Guam I
> suspect that more people use en_US as a locale than en_GU.  I don't think
> that many people actually implement an en_GU locale.
>
> To me setting a time zone should probably start by selecting the time zone
> list:
>
> 1) Locale country (In most cases there is only one so there is no need for
> a second selection)
> 2) Country and related territories or possessions.
> 3) Time zones matching current system time.
> 4) Time zones within one hour of current system time.
> 5) All time zones in time order starting with current system time.
>
> To stay out of politics I would list Mainland China, Hong Kong, Singapore
> and Taiwan under each other.  Pick one get 4. The Falklands would be listed
> under both Great Britain and Argentina.
>
> One good point about using Unicode is that we can now use script rather than
> code page, or specify Taiwan for Traditional script even if the person is not
> in Taiwan or Hong Kong.
>
> Expats break the locale model anyway.  The problem is that we use country as
> both a language modifier and a location.  Thus a Brazilian community in the
> US can not pick pt_BR as a language and US as a territory.
>
> TR35 explicitly designates the country portion as a territory not a language
> variant.  Should there be two different specifications both using the same
> ISO 3166-1 codes and in most cases they will be the same?
>
> Carl

Re: TR35

2004-05-11 Thread Philippe Verdy

From: "Carl W. Brown" <[EMAIL PROTECTED]>
> Expats break the locale model anyway.  The problem is that we use country as
> both a language modifier and a location.  Thus a Brazilian community in the
> US can not pick pt_BR as a language and US as a territory.

From past comments I read here, it is understood now that locale identifiers
used to select languages contain a country/territory code only as a legacy way
to select language variants. This code is meant to designate the language
variant as spoken in that area, but not for identifying a location.

So a user who prefers Traditional Chinese will set their locale to zh_TW even
if that user is not in Taiwan. For timezones and currencies, the locale needs
another specialized setting. In POSIX, the main locale specifier is not enough:
LANG selects the language, but for all other areas (currency and legal
commercial constraints, time and number formats, time zone and so on) there are
separate settings (TZ, LC_TIME, LC_MONETARY, LC_NUMERIC...). This seems
good and allows various combinations to match what is needed in the user's
environment.

However, the set of variables in POSIX is not rich or fine-grained enough,
because a single LC_ALL variable can override all the LC_* variables. This means
that all settings that can be defined in a locale must be definable with the
same identifier.
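
To make the override order concrete, here is a minimal sketch of how the locale for one category is usually resolved: LC_ALL first, then the category's own variable, then LANG. The function name is hypothetical, and the final fallback to "C" is an assumption (POSIX leaves the default implementation-defined). TZ is not part of this chain.

```python
def effective_locale(category, env):
    """Return the locale name in effect for one category, e.g. 'LC_TIME'."""
    # LC_ALL overrides the per-category variable, which overrides LANG.
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:  # unset or empty variables are skipped
            return value
    return "C"  # assumed default when nothing is set

# A French-speaking Canadian using Swiss currency conventions:
env = {"LANG": "fr_CA", "LC_MONETARY": "fr_CH"}
print(effective_locale("LC_MONETARY", env))
print(effective_locale("LC_TIME", env))
```

Setting LC_ALL in the same environment would hide both values, which is the limitation noted above.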

Java defines one unique main locale that plays the role of the POSIX LANG
setting. Any other specialized locale settings however may be set as needed by
creating other instances of the Locale object.

Now a good question is: can all settings in locales be selective enough to
specify the possible values correctly? Is the POSIX syntax enough for them?
Apparently not for the timezone setting (TZ), which has almost always used
its own distinct identifiers.

RE: TR35

2004-05-11 Thread Benjamin Peterson


> To stay out of politics... The Falklands would be
> listed
> under both Great Britain and Argentina.

Falkland Islanders would not consider that to be 'staying out of
politics' :)




 
-- 
  Benjamin Peterson
  [EMAIL PROTECTED]

Re: TR35

2004-05-11 Thread Doug Ewell

Carl W. Brown  wrote:

> To stay out of politics I would list Mainland China, Hong Kong,
> Singapore and Taiwan under each other.  Pick one get 4.

I don't think Singapore belongs in that list.  Nobody seriously
questions its independence (and if anyone did it would be Malaysia, not
China).  Macao might belong there.

> The Falklands would be listed under both Great Britain and Argentina.

That would be staying *in* politics, IMHO.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

RE: TR35

2004-05-11 Thread Carl W. Brown

Peter,

> >If I live in Guam I will probably be using an en_US locale.
> However the "US" territory does not contain my time zone.
> Probably the best solution for this problem is to add a category
> of possessions to the territory information.  This allows
> applications to enumerate available time zones for not only the
> country itself but also its possessions that might be using the locale.
> >
> >
>
> This issue is not limited to a country's possessions. Many expatriates
> and traveling business people etc want to keep their (laptop)
> computer's general locale settings as that of their home country (not
> least because changing this often destabilises data) but need to set it
> to the time zone in which they are temporarily resident. So time zones
> should be kept independent of other locale information, especially
> independent of such things as date and decimal point formats, and
> preferred languages.

The problem is that if you give users the option to pick time zone and use
the Olson zones then you want to be able to limit the number of zones that
most people pick to the most likely ones.  The time zones for the country of
the locale I am using are the most likely ones.  In the case of Guam I
suspect that more people use en_US as a locale than en_GU.  I don't think
that many people actually implement an en_GU locale.

To me setting a time zone should probably start by selecting the time zone
list:

1) Locale country (In most cases there is only one so there is no need for
a second selection)
2) Country and related territories or possessions.
3) Time zones matching current system time.
4) Time zones within one hour of current system time.
5) All time zones in time order starting with current system time.
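
The five lists above could be sketched roughly as follows. This is only one reading of the proposal: the zone table is a tiny hand-written sample using fixed standard-time offsets (daylight time is ignored), the RELATED mapping and the function name are hypothetical, and "time order" in step 5 is interpreted as distance from the current system offset.

```python
# Sample data: zone -> (ISO 3166-1 country code, standard UTC offset in minutes)
ZONES = {
    "America/New_York": ("US", -300),
    "America/Los_Angeles": ("US", -480),
    "Pacific/Guam": ("GU", 600),
    "Europe/Paris": ("FR", 60),
    "Europe/London": ("GB", 0),
}

# Hypothetical "related territories or possessions" table
RELATED = {"US": ["GU"]}

def zone_tiers(country, system_offset):
    """Return the five candidate lists, from narrowest to widest."""
    family = {country, *RELATED.get(country, [])}
    in_country = [z for z, (c, _) in ZONES.items() if c == country]
    in_family = [z for z, (c, _) in ZONES.items() if c in family]
    same_time = [z for z, (_, off) in ZONES.items() if off == system_offset]
    within_hour = [z for z, (_, off) in ZONES.items()
                   if abs(off - system_offset) <= 60]
    everything = sorted(ZONES,
                        key=lambda z: (abs(ZONES[z][1] - system_offset), z))
    return [in_country, in_family, same_time, within_hour, everything]

# An en_US user whose system clock is running at UTC+10 (Guam):
for tier in zone_tiers("US", 600):
    print(tier)
```

In this sample, Pacific/Guam is missing from the first list but appears in the second, which is the Guam case raised earlier in the thread.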

To stay out of politics I would list Mainland China, Hong Kong, Singapore
and Taiwan under each other.  Pick one get 4. The Falklands would be listed
under both Great Britain and Argentina.

One good point about using Unicode is that we can now use script rather than
code page, or specify Taiwan for Traditional script even if the person is not
in Taiwan or Hong Kong.

Expats break the locale model anyway.  The problem is that we use country as
both a language modifier and a location.  Thus a Brazilian community in the
US can not pick pt_BR as a language and US as a territory.

TR35 explicitly designates the country portion as a territory not a language
variant.  Should there be two different specifications both using the same
ISO 3166-1 codes and in most cases they will be the same?

Carl

Re: TR35

2004-05-10 Thread Peter Kirk

On 07/05/2004 09:44, Carl W. Brown wrote:

...

> If I live in Guam I will probably be using an en_US locale.  However the
> "US" territory does not contain my time zone.  Probably the best solution
> for this problem is to add a category of possessions to the territory
> information.  This allows applications to enumerate available time zones
> for not only the country itself but also its possessions that might be
> using the locale.
 

This issue is not limited to a country's possessions. Many expatriates 
and travelling business people etc want to keep their (laptop) 
computer's general locale settings as that of their home country (not 
least because changing this often destabilises data) but need to set it 
to the time zone in which they are temporarily resident. So time zones 
should be kept independent of other locale information, especially 
independent of such things as date and decimal point formats, and 
preferred languages.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

Re: TR35 (was: Standardize TimeZone ID)

2004-05-10 Thread Peter Kirk

On 07/05/2004 14:53, [EMAIL PROTECTED] wrote:

...

> > So the database aliases one to the other. Aliases are used for timezones
> > that are completely equivalent on the whole timeframe considered
> > (apparently only starting in the early years of last century).
>
> The cutoff date is 1970-01-01; if two timezones have been the same ever since
> then, they are not separately encoded *unless* they are in separate national
> jurisdictions (because after all it is the nation-state which sets up the
> rules).  This date is the Posix zero point.
 

It is not always the nation-state which sets the rules. For example, in 
Australia each state sets its own rules; and so there are six different 
schemes with half hour differences, some daylight saving and some 
without. It is not only possible but quite likely that new distinctions 
will be introduced in time zones which have been the same since 1970; 
e.g. very likely New South Wales and Victoria have been in the same time 
zone ever since then, but there is a real chance that NSW will abolish 
daylight saving but Victoria will not. So don't assume too quickly that 
time zones will not be split.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

Re: "Country possessions" (was: Re: TR35)

2004-05-08 Thread Doug Ewell

E. Keown  wrote:

>> For an authoritative list of "countries," the UN
>> list is probably your best bet.
>
> Is this list online? --  Elaine

http://unstats.un.org/unsd/methods/m49/m49alpha.htm

The ISO 3166-1 FAQ points to this page as the determining factor in
whether a "country" gets its own ISO 3166-1 code.

There are certainly some entities here (e.g. Puerto Rico, U.S. Virgin
Islands, Svalbard and Jan Mayen) that are not independent in the same
sense as the world's major countries.  Finding the dividing line is not
easy, and one good question to ask would be, "What do I intend to do
with this information?"

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: "Country possessions" (was: Re: TR35)

2004-05-08 Thread E. Keown

 Elaine Keown
 Tucson

Hi,


> For an authoritative list of "countries," the UN 
> list is probably your best bet.

Is this list online? --  Elaine





"Country possessions" (was: Re: TR35)

2004-05-08 Thread Doug Ewell

Philippe Verdy  wrote:

> The status of some "possessions" in Antarctica (AQ) is not clear.
> They are administered by existing countries for the scientific bases
> that run there, but have now a limited right for their expansion (the
> old maps that divided it into sectors to the pole are no longer
> valid), and the territory itself is placed under an international
> treaty protected by the United Nations.

At least in the past, there were some countries -- including some who
operate scientific bases in Antarctica and some who do not -- who made
national territorial claims to portions of the Antarctic continent.

The official U.S. policy, someone correct me if I'm wrong, was that the
U.S. didn't recognize any country's territorial claims to Antarctica,
but reserved the right to make such claims itself in the future.  (As
arrogant as that sounds.)

Philippe's point is basically sound, that once you get beyond
"countries" with their own fully autonomous government, the lines get
fuzzy.  Additionally, any "list of country possessions" is certain to be
the subject of dispute between countries with conflicting claims.  The
Falkland Islands, Jammu and Kashmir, Taiwan, etc.  For an authoritative
list of "countries," the UN list is probably your best bet.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: TR35 (was: Standardize TimeZone ID

2004-05-08 Thread Mark Davis

It depends on what you mean by "possessions". National Parks? Furniture?
Occupied countries? ...

More seriously, this is a messy area. Probably the most fruitful approach would
be to look at the international standards for postal addressing, which
point off to the individual countries for their own internal subdivisions. You
would then find out at least what countries *think* they own (or administer -- 
I'll refrain from more politically-tinged statements on this list).

Mark
__
http://www.macchiato.com

- Original Message - 
From: "Carl W. Brown" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Sat, 2004 May 08 07:54
Subject: RE: TR35 (was: Standardize TimeZone ID


> Mark,
>
> Do you know if there is an official list of country possessions?
>
> Carl
>
> > -Original Message-
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > Behalf Of Mark Davis
> > Sent: Friday, May 07, 2004 5:28 PM
> > To: Carl W. Brown; Unicode List
> > Subject: Re: TR35 (was: Standardize TimeZone ID
> >
> >
> > If you look at LDML, you will see that it uses a narrow view of locale;
> > essentially those elements that are language-specific +
> > variations (like choice
> > of phonebook vs dictionary collation for German). In particular,
> > a locale does
> > not include a time zone, nor does it include a currency; those
> > are considered
> > orthogonal attributes. What an LDML locale does include is the
> > capacity to have
> > *translated names* for time zones, and *translated names* for currencies.
> >
> > If someone wants to build a broader notion of locale on top of
> > this they could
> > do so, incorporating whatever other information is important for the given
> > transactional processing, e.g., customer timezone, nearest branch office
> > timezone, customer's preferred currencies, vendor's allowed
> > currencies, seat
> > assignment, dietary restrictions (kosher, atkins, no vegetables
> > beginning with
> > the letter C, ...), security status (low-, medium-, high-risk), religious
> > preference (atheist vs theist), etc.
> >
> > Mark
> > ______
> > http://www.macchiato.com
> >
> > - Original Message - 
> > From: "Carl W. Brown" <[EMAIL PROTECTED]>
> > To: "Unicode List" <[EMAIL PROTECTED]>
> > Sent: Fri, 2004 May 07 14:46
> > Subject: RE: TR35 (was: Standardize TimeZone ID
> >
> >
> > > Mark,
> > >
> > > > That is not a problem. The Olson IDs are not guaranteed
> > > > to be unique, just unambiguous. And there are aliases.
> > > > Typically these are de-unified for political
> > > > purposes. Thus you may find that two different IDs produce
> > > > the same results over
> > > > the entire period of time in the database.
> > >
> > > So which timezone will the tr_TR locale in a TR35 database have?
> > > "Asia/Istanbul" or "Europe/Istanbul" or both?
> > >
> > > I guess that the territory possessions list should be another
> > > database that is merged.
> > >
> > > Carl

Re: TR35 (was: Standardize TimeZone ID

2004-05-08 Thread Philippe Verdy

From: "Carl W. Brown" <[EMAIL PROTECTED]>
> Do you know if there is an official list of country possessions?

Not very complicated to build, starting from the ISO 3166-1 and UN (numeric)
lists of country/territory codes. I have such a list if you want.

But it all depends on the level of granularity you need: some "territories" in
the UN and ISO lists have a single code for the same administrative region,
which sometimes covers very distant "possessions" (I'd rather use the term
"dependencies").

Some of them have no formal assignment in ISO 3166-1, only some reserved codes
or simply no code at all. Examples: Jersey (JE), Guernsey (GG), Chausey Islands
(grouped with Jersey?), Paracel Islands (claimed by China).

The status of some "possessions" in Antarctica (AQ) is not clear. They are
administered by existing countries for the scientific bases that run there, but
have now a limited right for their expansion (the old maps that divided it into
sectors to the pole are no longer valid), and the territory itself is placed
under an international treaty protected by the United Nations.

I can say that of the old French "Terre Adélie", which consists of only one
antarctic scientific base (Dumont d'Urville), now administered within the
"French Austral and Antarctic Territories" (TF), an administrative term that
also covers non Antarctic islands such as Kerguelen Islands and Amsterdam Island
(this territory, out of the European Union, is administered from Paris by two
ministries, and is used mostly as a flagship for commercial navigation).

RE: TR35 (was: Standardize TimeZone ID

2004-05-08 Thread Michael Everson

At 07:54 -0700 2004-05-08, Carl W. Brown wrote:
> Do you know if there is an official list of country possessions?

The CIA factbook probably gets it right. I guess the UN publishes something.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: TR35 (was: Standardize TimeZone ID

2004-05-08 Thread Carl W. Brown

Mark,

Do you know if there is an official list of country possessions?

Carl

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Behalf Of Mark Davis
> Sent: Friday, May 07, 2004 5:28 PM
> To: Carl W. Brown; Unicode List
> Subject: Re: TR35 (was: Standardize TimeZone ID
> 
> 
> If you look at LDML, you will see that it uses a narrow view of locale;
> essentially those elements that are language-specific + 
> variations (like choice
> of phonebook vs dictionary collation for German). In particular, 
> a locale does
> not include a time zone, nor does it include a currency; those 
> are considered
> orthogonal attributes. What an LDML locale does include is the 
> capacity to have
> *translated names* for time zones, and *translated names* for currencies.
> 
> If someone wants to build a broader notion of locale on top of 
> this they could
> do so, incorporating whatever other information is important for the given
> transactional processing, e.g., customer timezone, nearest branch office
> timezone, customer's preferred currencies, vendor's allowed 
> currencies, seat
> assignment, dietary restrictions (kosher, atkins, no vegetables 
> beginning with
> the letter C, ...), security status (low-, medium-, high-risk), religious
> preference (atheist vs theist), etc.
> 
> Mark
> __
> http://www.macchiato.com
> 
> - Original Message - 
> From: "Carl W. Brown" <[EMAIL PROTECTED]>
> To: "Unicode List" <[EMAIL PROTECTED]>
> Sent: Fri, 2004 May 07 14:46
> Subject: RE: TR35 (was: Standardize TimeZone ID
> 
> 
> > Mark,
> >
> > > That is not a problem. The Olson IDs are not guaranteed
> > > to be unique, just unambiguous. And there are aliases.
> > > Typically these are de-unified for political
> > > purposes. Thus you may find that two different IDs produce
> > > the same results over
> > > the entire period of time in the database.
> >
> > So which timezone will the tr_TR locale in a TR35 database have?
> > "Asia/Istanbul" or "Europe/Istanbul" or both?
> >
> > I guess that the territory possessions list should be another
> > database that is merged.
> >
> > Carl

Re: TR35 (was: Standardize TimeZone ID

2004-05-07 Thread Mark Davis

If you look at LDML, you will see that it uses a narrow view of locale;
essentially those elements that are language-specific + variations (like choice
of phonebook vs dictionary collation for German). In particular, a locale does
not include a time zone, nor does it include a currency; those are considered
orthogonal attributes. What an LDML locale does include is the capacity to have
*translated names* for time zones, and *translated names* for currencies.

If someone wants to build a broader notion of locale on top of this they could
do so, incorporating whatever other information is important for the given
transactional processing, e.g., customer timezone, nearest branch office
timezone, customer's preferred currencies, vendor's allowed currencies, seat
assignment, dietary restrictions (kosher, atkins, no vegetables beginning with
the letter C, ...), security status (low-, medium-, high-risk), religious
preference (atheist vs theist), etc.

Mark
__
http://www.macchiato.com

- Original Message - 
From: "Carl W. Brown" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Fri, 2004 May 07 14:46
Subject: RE: TR35 (was: Standardize TimeZone ID


> Mark,
>
> > That is not a problem. The Olson IDs are not guaranteed
> > to be unique, just unambiguous. And there are aliases.
> > Typically these are de-unified for political
> > purposes. Thus you may find that two different IDs produce
> > the same results over
> > the entire period of time in the database.
>
> So which timezone will the tr_TR locale in a TR35 database have?
> "Asia/Istanbul" or "Europe/Istanbul" or both?
>
> I guess that the territory possessions list should be another database
> that is merged.
>
> Carl

Re: TR35 (was: Standardize TimeZone ID

2004-05-07 Thread Philippe Verdy

From: "Carl W. Brown" <[EMAIL PROTECTED]>
> > That is not a problem. The Olson IDs are not guaranteed
> > to be unique, just unambiguous. And there are aliases.
> > Typically these are de-unified for political
> > purposes. Thus you may find that two different IDs produce
> > the same results over
> > the entire period of time in the database.
>
> So which timezone will the tr_TR locale in a TR35 database have?
> "Asia/Istanbul" or "Europe/Istanbul" or both?

Both: one is an alias of the other, which exists only as a convenience for
users. However, should the eastern part of Turkey use a different timezone,
tr-TR would not indicate the applicable timezone (this is what happens to the
"en-US" locale, that spans many timezones).

This is a good justification for a separate locale setting for TZ in POSIX
locales, so a US user could set:
LANG=en_US for the default locale, and TZ=America/New_York to adjust the
timezone; some newer syntaxes allow setting the timezone in a combined locale ID
with attributes: "en_US;tz=America/New_York".
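
For illustration, a combined identifier of the form quoted above can be split into its base locale and its attributes with a few lines of code. This is only a sketch: the function name is hypothetical, and the "base;key=value" layout is assumed from the single example given here, not taken from any specification.

```python
def parse_locale_id(locale_id):
    """Split 'en_US;tz=America/New_York' into ('en_US', {'tz': ...})."""
    base, *attrs = locale_id.split(";")
    attributes = {}
    for attr in attrs:
        key, _, value = attr.partition("=")
        attributes[key.strip()] = value.strip()
    return base, attributes

base, attributes = parse_locale_id("en_US;tz=America/New_York")
print(base, attributes.get("tz"))
```

Keeping the timezone as a separate attribute means it can be changed without touching the language and region part.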

However, POSIX locales use legacy syntaxes for timezone IDs like "PST8PDT",
which specify the GMT offset and abbreviations in the standard and daylight
time. Many of them are referenced in the Olson database as aliases.
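
As a hedged sketch of how such a legacy string breaks down, the snippet below parses only the simplest form: standard-time abbreviation, offset, optional daylight-time abbreviation. The function name is hypothetical, and daylight rules or explicit daylight offsets are not handled. One detail worth showing is the sign convention: in a POSIX TZ value the offset counts hours west of UTC, so "PST8PDT" means UTC-8.

```python
import re

# Standard-time name, optional sign, hours, optional minutes, optional DST name.
_TZ_RE = re.compile(
    r"^([A-Za-z]{3,})([+-]?)(\d{1,2})(?::(\d{2}))?([A-Za-z]{3,})?$"
)

def parse_posix_tz(tz):
    match = _TZ_RE.match(tz)
    if match is None:
        raise ValueError(f"unsupported TZ string: {tz!r}")
    std, sign, hours, minutes, dst = match.groups()
    west = int(hours) * 60 + int(minutes or 0)
    if sign == "-":
        west = -west
    # POSIX counts the offset westward from UTC, so flip the sign.
    return {"std": std, "utc_offset_minutes": -west, "dst": dst}

print(parse_posix_tz("PST8PDT"))
```

With a minus sign, as in "PST-8PDT", the same parser yields an offset eight hours east of UTC.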

For new development, the default timezone in software that has no timezone set
should be UTC (alias Zulu or "Z"), even in an "en_US" locale, but many legacy
applications use the US Pacific Time used in California as a default timezone in
that locale or in the default "C" locale.  ;-) I wonder why...

Re: TR35 (was: Standardize TimeZone ID

2004-05-07 Thread jcowan

Carl W. Brown scripsit:

> So which timezone will the tr_TR locale in a TR35 database have?
> "Asia/Istanbul" or "Europe/Istanbul" or both?

Both.

> I guess that the territory possessions list should be another
> database that is merged.

I think they should be in the same database.  Guam is a territory, but Hawaii
is integral: all the French overseas departments are integral.  Simplest to
treat everything as integral.

-- 
John Cowan  [EMAIL PROTECTED]  www.reutershealth.com  www.ccil.org/~cowan
If a soldier is asked why he kills people who have done him no harm, or a
terrorist why he kills innocent people with his bombs, they can always
reply that war has been declared, and there are no innocent people in an
enemy country in wartime.  The answer is psychotic, but it is the answer
that humanity has given to every act of aggression in history.  --Northrop Frye

Re: TR35 (was: Standardize TimeZone ID)

2004-05-07 Thread jcowan

Philippe Verdy scripsit:

> I do agree. The fact that both "Europe/Istanbul" and "Asia/Istanbul"
> are referenced is probably not really political, but it reflects
> the fact that this city is on both continents, and that its timezone
> covers more than just this city. Someone living on the Asian side near
> the city, but not in Istanbul, must wonder why their timezone is not
> defined in the "Asia" subcategory, and why they must select it in Europe
> (the reverse is possible).

Correct.

> So the database aliases one to the other. Aliases are used for timezones
> that are completely equivalent on the whole timeframe considered
> (apparently only starting in the early years of last century).

The cutoff date is 1970-01-01; if two timezones have been the same ever since
then, they are not separately encoded *unless* they are in separate national
jurisdictions (because after all it is the nation-state which sets up the
rules).  This date is the Posix zero point.

> when in fact solar time was most frequently used (with lots of
> approximations) rather than official times.

Standard time dates to the 1890s in Europe and North America; basically, its
existence reflected the need for railroads to use a single time zone (or as few
as possible).

> What I don't know is if the Riyadh Solar Time is still in use today in
> Saudi Arabia (the Olson database only contains rules for 1987-1989).

I believe that it is not.  The intention was to set sunset (the beginning of
the Islamic day) to 00:00 local time, but the difficulties in doing so
were simply too great.

> As well the "yearistype.sh" script is quite bogus if used to determine
> leap years (is it useful or correct for US election years?).

It is (the U.S. elects presidents in years that are divisible by 4
and greater than 1787, when the present constitution came into effect).
No actual time zone depends on whether the year is a presidential election
year, though the idea was proposed at one time.
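
The rule just stated is easy to check in code. The sketch below is not the yearistype.sh script; it only encodes the rule as worded above, with a hypothetical function name, and compares it with the standard library's leap-year test to show where the two differ.

```python
import calendar

def is_us_presidential_election_year(year):
    # Rule as stated above: divisible by 4 and later than 1787.
    return year > 1787 and year % 4 == 0

for year in (1784, 1900, 2003, 2004):
    print(year,
          is_us_presidential_election_year(year),
          calendar.isleap(year))
```

The year 1900 shows the difference: it satisfies the election-year rule but was not a leap year in the Gregorian calendar.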

-- 
"But the next day there came no dawn,   John Cowan
and the Grey Company passed on into the [EMAIL PROTECTED]
darkness of the Storm of Mordor and were http://www.ccil.org/~cowan
lost to mortal sight; but the Dead  http://reutershealth.com
followed them.  --"The Passing of the Grey Company"

RE: TR35 (was: Standardize TimeZone ID

2004-05-07 Thread Carl W. Brown

Mark,

> That is not a problem. The Olson IDs are not guaranteed 
> to be unique, just unambiguous. And there are aliases. 
> Typically these are de-unified for political
> purposes. Thus you may find that two different IDs produce 
> the same results over
> the entire period of time in the database.

So which timezone will the tr_TR locale in a TR35 database have?  "Asia/Istanbul" or 
"Europe/Istanbul" or both?

I guess that the territory possessions list should be another database that is 
merged.

Carl
>

Re: TR35 (was: Standardize TimeZone ID)

2004-05-07 Thread Philippe Verdy

From: "Mark Davis" <[EMAIL PROTECTED]>
> That is not a problem. The Olson IDs are not guaranteed to be unique, just
> unambiguous. And there are aliases. Typically these are de-unified for
> political purposes. Thus you may find that two different IDs produce the
> same results over the entire period of time in the database.
>
> Moreover, whether or not someone wants to consider two IDs as 'equivalent'
> depends on their timeframe. If I only care about the last 5 years, then many
> more IDs fall into the same equivalence class than if I look over the entire
> period of time covered by Olson.
>
> While I do not believe that the database is perfect, there is no need to
> invent yet another mechanism.

I do agree. The fact that both "Europe/Istanbul" and "Asia/Istanbul" are
referenced is probably not really political; it reflects the fact that this
city lies on both continents, and that its timezone covers more than just this
city. Someone living in the Asian area near the city, but not in Istanbul, might
well wonder why his timezone is not defined in the "Asia" subcategory, and why
he must select it in Europe (the reverse is possible too).

So the database aliases one to the other. Aliases are used for timezones that
are completely equivalent over the whole timeframe considered (apparently only
starting in the early years of the last century). I doubt that daylight saving
time was ever applied with consistent rules before then, when solar time was in
fact most frequently used (with lots of approximations) rather than official
time. With solar time, there's no standard timezone, as each place defines its
own time, depending on the season and the observed position of the sun in the
sky.

What I don't know is whether Riyadh Solar Time is still in use today in Saudi
Arabia (the Olson database only contains rules for 1987-1989). It may be in
use today for determining the time of religious events, but official time is
probably based on a fixed offset from UTC for practical reasons. If I use the
"Asia/Riyadh89" timezone, it defines the GMTOFF field as 03:07:04, with daily
changes of the daylight offset up to December 31 (where the daylight offset is
minus 3 minutes). After that, starting January 1st, 1990, there's no daylight
offset, so I suppose that it is now permanently set to this GMTOFF value.
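
For orientation, a GMTOFF such as 03:07:04 can be read as a local mean time: the Earth turns 360 degrees in 24 hours, so each degree of longitude is worth 240 seconds. The sketch below does only that arithmetic; the function names are invented for illustration and nothing here is taken from the database itself.

```python
SECONDS_PER_DEGREE = 240  # 86400 seconds / 360 degrees


def parse_hms(text: str) -> int:
    """Convert an 'HH:MM:SS' offset into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def meridian_for_offset(offset_seconds: int) -> float:
    """Longitude (degrees east) whose mean solar time has this UTC offset."""
    return offset_seconds / SECONDS_PER_DEGREE


offset = parse_hms("03:07:04")
print(offset, round(meridian_for_offset(offset), 4))
```

An offset of 03:07:04 is 11224 seconds, which corresponds to a meridian at roughly 46.77 degrees east.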

But if I consider the comments at the top, there's an astronomical formula to
compute the apparent noon time, rounded to the nearest 5 seconds (due to a limit
in the initial Olson implementation).

So a good question remains: should the astronomical formula be used to compute
official time, or should we just keep the average noon time (offset 0) and
ignore the Riyadh87 to Riyadh89 timezone IDs? The comment at the top is strange,
as it uses a number of days from January 0 (what's this??? Maybe Olson knows, or
there's a comment about this in the discussions saved in the HUGE "tzarchive"
file).
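
One plausible reading of "January 0" is December 31 of the previous year, which makes the count the ordinary day-of-year number (January 1 is day 1). The sketch below takes that reading as an assumption and pairs it with a commonly quoted approximation of the equation of time; whether that is the exact formula in the database comments is not confirmed here. All function names are hypothetical.

```python
import math
from datetime import date


def days_since_january_0(d: date) -> int:
    """Day count with 'January 0' read as December 31 of the previous year."""
    return (d - date(d.year - 1, 12, 31)).days


def equation_of_time_seconds(day_of_year: int) -> float:
    """Commonly quoted approximation (an assumption, not the database's text)."""
    b = 2.0 * math.pi * (day_of_year - 81) / 364.0
    minutes = 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)
    return minutes * 60.0


def round_to_5_seconds(seconds: float) -> int:
    """Round to the nearest multiple of 5 seconds, as the message describes."""
    return int(round(seconds / 5.0)) * 5


n = days_since_january_0(date(1987, 1, 1))
print(n, round_to_5_seconds(equation_of_time_seconds(n)))
```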

Also, its internal "iso3166.tab" file is obsolete, as is "zone.tab", which
contains a mapping from countries/territories (with the longitude/latitude of a
relevant city) to lists of timezones. As well, the "yearistype.sh" script is
quite bogus if used to determine leap years (is it useful or correct for US
election years?).
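
To make the "zone.tab" mapping concrete, here is a minimal sketch that reads tab-separated rows (country code, coordinates, zone ID, optional comment) into a country-to-zones table. The sample rows are illustrative only and the function name is made up; a real file should be read from the database distribution.

```python
from collections import defaultdict

# Illustrative rows in the zone.tab layout: code, coordinates, zone ID, comment.
SAMPLE_ZONE_TAB = (
    "# country-code\tcoordinates\tTZ\tcomments\n"
    "TR\t+4101+02858\tEurope/Istanbul\n"
    "US\t+404251-0740023\tAmerica/New_York\tEastern Time\n"
    "US\t+340308-1181434\tAmerica/Los_Angeles\tPacific Time\n"
    "GU\t+1328+14445\tPacific/Guam\n"
)


def zones_by_country(text: str) -> dict:
    """Map each country code to the list of zone IDs listed for it."""
    result = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, _coordinates, zone = line.split("\t")[:3]
        result[code].append(zone)
    return dict(result)


print(zones_by_country(SAMPLE_ZONE_TAB)["US"])
```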

Maybe TR35 should specify which parts of the database are referenced.

Re: TR35 (was: Standardize TimeZone ID

2004-05-07 Thread Mark Davis

That is not a problem. The Olson IDs are not guaranteed to be unique, just
unambiguous. And there are aliases. Typically these are de-unified for political
purposes. Thus you may find that two different IDs produce the same results over
the entire period of time in the database.

Moreover, whether or not someone wants to consider two IDs as 'equivalent'
depends on their timeframe. If I only care about the last 5 years, then many
more IDs fall into the same equivalence class than if I look over the entire
period of time covered by Olson.

While I do not believe that the database is perfect,  there is no need to invent
yet another mechanism.
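
The point about timeframes can be illustrated with a small sketch. The zone names and offset histories below are invented for the example (they are not Olson data), and the functions are hypothetical; the idea is only that two IDs are grouped together when their offsets agree for every year in the window of interest.

```python
from collections import defaultdict

# Hypothetical histories: (first year in effect, UTC offset in minutes).
HISTORY = {
    "Zone/A": [(1900, 120), (1980, 180)],
    "Zone/B": [(1900, 120), (1980, 180)],
    "Zone/C": [(1900, 60), (1995, 180)],
}


def offset_in(zone: str, year: int):
    """Offset in effect for the zone in the given year (None if before data)."""
    current = None
    for start, offset in HISTORY[zone]:
        if start <= year:
            current = offset
    return current


def equivalence_classes(first_year: int, last_year: int) -> list:
    """Group zones whose offsets match in every year of the window."""
    groups = defaultdict(list)
    for zone in sorted(HISTORY):
        signature = tuple(
            offset_in(zone, year) for year in range(first_year, last_year + 1)
        )
        groups[signature].append(zone)
    return sorted(groups.values())


print(equivalence_classes(2000, 2004))
print(equivalence_classes(1900, 2004))
```

With this toy data, a five-year window puts all three IDs in one class, while the full period splits "Zone/C" off from the other two.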

Mark
__
http://www.macchiato.com

- Original Message - 
From: "Carl W. Brown" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Fri, 2004 May 07 09:44
Subject: TR35 (was: Standardize TimeZone ID


> Mark,
>
> > LDML does require the Olson IDs to identify time zones
> > (as does Unix, Java, ICU,...). See the discussion in
> > http://www.unicode.org/reports/tr35/.
>
> I found a normalization problem with the IDs.  For example you have both
> "Asia/Istanbul" and "Europe/Istanbul" which are different names for the same
> time zone.  I believe that the best solution is to drop the region
> designation because the time zones that we need are specific to a unique
> country.  Thus "Istanbul" under "TR" works just fine.  I do not believe that
> we need the "Etc/..." or miscellaneous aliases.
>
> This changes TR35 to:
>
> 
> 
> Pacific Time
> Pacific Standard Time
> Pacific Daylight Time
> 
> 
> PT
> PST
> PDT
> 
> San Francisco
> 
>
> It will then be part of the locale territory properties.
>
> Problem number 2:
>
> If I live in Guam I will probably be using an en_US locale.  However the "US"
> territory does not contain my time zone.  Probably the best solution for this
> problem is to add a category of possessions to the territory information.
> This allows applications to enumerate available time zones for not only the
> country itself but also its possessions that might be using the locale.
>
> Thus es_PR, en_PR, en_US, and es_US will all have access to the "Puerto_Rico"
> time zone without replicating data and denormalizing the database.  The
> application can choose to include territories or not depending on its
> specific requirements.
>
> I believe that the strength of the Unicode standard is in the fact that in
> addition to unifying code pages it also is a mechanism to support
> normalizing of data and specifications.
>
> Carl
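
The possessions proposal in the quoted message can be sketched as a lookup over two small tables. The tables, the region-less zone names, and the function below are all hypothetical illustrations of the idea; they are not TR35 or LDML data structures.

```python
# Hypothetical data: zones per territory, using the proposed region-less names.
TERRITORY_ZONES = {
    "US": ["Los_Angeles", "New_York"],
    "PR": ["Puerto_Rico"],
    "GU": ["Guam"],
}

# Hypothetical "possessions" category attached to a territory.
POSSESSIONS = {"US": ["PR", "GU"]}


def available_zones(locale_id: str, include_possessions: bool = True) -> list:
    """List zones for a locale's territory, optionally adding its possessions."""
    territory = locale_id.split("_")[1]
    zones = list(TERRITORY_ZONES.get(territory, []))
    if include_possessions:
        for possession in POSSESSIONS.get(territory, []):
            zones.extend(TERRITORY_ZONES.get(possession, []))
    return zones


print(available_zones("en_US"))
print(available_zones("en_US", include_possessions=False))
print(available_zones("es_PR"))
```

Each zone is stored once under its own territory, and the application decides at lookup time whether possessions are included.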
