RE: Possible RFC 3683 PR-action

2008-03-25 Thread Peter Constable
> From: Simon Josefsson [mailto:[EMAIL PROTECTED]

> > Frankly, it strikes me as somewhat odd that a body acting as a
> > standards-setting organization with public impact might allow any
> > technical decision on its specifications to be driven by people
> > operating under a cloak of anonymity. Expressing an anonymous voice?
> > No problem. Influencing determination of a consensus with public
> > impact? That should not be allowed, IMO.
>
> What if the pseudonymous voice raise a valid technical concern, provide
> useful text for a specification, or even co-author a specification?

That's having voice. We can be open to any voice. If a concern has valid 
technical merits, then that should be evident to others, and drive a consensus 
on its own. But the consensus can still be determined by identifiable people.


> I think decisions should be based on technically sound arguments.

Just so.

> Whether someone wants to reveal their real identity is not necessarily
> correlated to the same person providing useful contributions.

True. But neither is ability to provide useful contributions necessarily 
correlated with being counted as part of a consensus.


Peter
___
IETF mailing list
IETF@ietf.org
https://www.ietf.org/mailman/listinfo/ietf


Re: Possible RFC 3683 PR-action

2008-03-25 Thread Peter Constable
From: Russ Housley...

> > Since IETF does not vote, it is certainly not an issue here?
>
> This is not totally true.  A WG Chair or Area Director cannot
> judge rough consensus if they are unsure if the portion of the
> population that is representing a dissenting view is one person
> or many different people.  This is especially true when there
> are a large number of silent observers.


Frankly, it strikes me as somewhat odd that a body acting as a 
standards-setting organization with public impact might allow any technical 
decision on its specifications to be driven by people operating under a cloak 
of anonymity. Expressing an anonymous voice? No problem. Influencing 
determination of a consensus with public impact? That should not be allowed, 
IMO.


Peter Constable


Re: [Ltru] Possible RFC 3683 PR-action

2008-03-24 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
> [EMAIL PROTECTED]

Randy Presuhn wrote:

> However, the vocabulary, style, content, and peculiar world-view of
> this latest missive leave me more convinced than ever that "LB"
> is indeed JFC Morphin, and that under the terms of RFC 3683
> we are well justified in suspending the posting privileges for that
> address.

I'll mention a few things in relation to this: First, I believe the record in 
the archives of the LTRU list will show that at times in the past JFC attempted 
to circumvent actions taken to limit his posting privileges to that list by 
using alternate email addresses, signed under his same name. Secondly, at 
various times since his posting privileges to that list were first limited, the 
list received mail from posters presenting themselves by a different name but 
whose vocabulary, style, content and world view were decidedly similar to that 
of JFC. In short, "LB" is not the first instance of a poster to LTRU that I 
suspected of being a sock-puppet for JFC. Nor is it just of late that "LB" was 
suspected of being a sock-puppet for JFC.

Also, a quick search for "lbleriot" did not turn up anybody engaging in public
discussions except on the LTRU or IETF lists. I find that somewhat curious
given LB's claim to be a member of a "multilinguistic working list". I
suppose there could be a group of multiple individuals working on protocol
specifications intended for the Internet or other such public systems, all
of whose discussions are conducted on a private list, but the absence of any
public trace would be readily accounted for if, in fact, LB were no more than
a sock-puppet.

Granted, LB's posts to LTRU have been neither as frequent nor as long as JFC's
were before his privileges were suspended. And granted, there is one other
online presence for a "lbleriot", who, over some period of time ending early
last year, sold a number of items on eBay and apparently had very positive
feedback. Neither of these points leads me to consider Randy's actions
unreasonable, however.


Peter


Re: Last Call: draft-klensin-unicode-escapes (ASCII Escaping of Unicode Characters) to BCP

2007-10-26 Thread Peter Constable
I have a terminological objection to this draft, mainly in section 2. I also
have other comments on section 2 that I'll mention.

First, terminology: the heading for section 2 has "...Table Position...", and 
the body refers to "code point position in the table". While the term "code 
table" could have been used in the Unicode Standard to refer to the encoded 
entities and their encoding, it is not.

The Unicode Standard uses these terms:

- It uses "character set" and "character repertoire" for the collection of 
elements being encoded, and "coded character set" for the set of pairs of such 
elements and their encoded representations.

- It uses "codespace" to refer to a range of numeric values used as encoded 
representations, and specifically "Unicode codespace" for the range 0 to 10FFFF 
(hex).

- It uses "code point" or "code position" (synonyms) for values in the Unicode 
codespace.

Thus, the appropriate term here is simply "code point" or "code position". 
"Table position" and "position in the table" are not appropriate since the 
Standard never uses "table" in this regard. And "code point position" is 
redundant. Perhaps the wording was attempting to differentiate between code 
points and various encoded representations of code points. But the latter are 
not code points, so there isn't really any ambiguity.

A possible refinement might be to use "Unicode Scalar Value": this refers to 
code points other than surrogate code points. By definition in the Standard, 
encoded characters can only be assigned to a Unicode Scalar Value. I don't see 
this as a necessary change in the draft, however.
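To make these terms concrete, here is a minimal Python sketch, assuming only the ranges stated in the Unicode Standard: the codespace is 0 to 10FFFF (hex), surrogate code points are D800 to DFFF, and Unicode Scalar Values are the code points that are not surrogates. The function names are illustrative only, not from any library.

```python
# Illustrative helpers (hypothetical names) for the terms discussed above.
CODESPACE_MAX = 0x10FFFF   # top of the Unicode codespace
SURROGATE_FIRST = 0xD800   # first surrogate code point
SURROGATE_LAST = 0xDFFF    # last surrogate code point


def is_code_point(value: int) -> bool:
    """True if value lies in the Unicode codespace (0..10FFFF hex)."""
    return 0 <= value <= CODESPACE_MAX


def is_surrogate(value: int) -> bool:
    """True if value is a surrogate code point (D800..DFFF hex)."""
    return SURROGATE_FIRST <= value <= SURROGATE_LAST


def is_scalar_value(value: int) -> bool:
    """True if value is a Unicode Scalar Value: a code point that is not a surrogate."""
    return is_code_point(value) and not is_surrogate(value)


if __name__ == "__main__":
    for cp in (0x41, 0xD800, 0x10FFFF, 0x110000):
        print(f"U+{cp:04X}: code point={is_code_point(cp)}, scalar value={is_scalar_value(cp)}")
```

Note that D800 is a code point but not a scalar value, which is why encoded characters can only be assigned to scalar values.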


Now for other comments on section 2.

The draft has:

  "However, when
   information about characters is to be processed by people,
   information about the Unicode code point is preferable to a further
   encoding of the encoded form of the character."

Information about the code point? (It is numeric. It is an integer. It is 
non-negative. It is in the range 0 to 10FFFF. It is even. It is divisible by 
17. It is the same value as is found on the license plate of John Doe's car.) I 
think it is the code point itself that is to be preferred. Also, "a further 
encoding of the encoding form" isn't going to be clear to readers. (I'm not 
sure myself what these words mean; I can guess at what the author meant, 
though am not positive.)

Thus, I'd change this text to:

  "However, when
   information about characters is to be processed by people,
   reference to the Unicode code point is preferable to encoded
   representations of the code point."
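A short sketch of the distinction in the suggested wording, assuming Python 3 and the conventional U+XXXX notation. The first helper refers to the code point itself; the second refers to the octets of one encoded representation (UTF-8). Both helper names are hypothetical.

```python
def code_point_reference(ch: str) -> str:
    """Refer to a character by its code point, in U+XXXX notation (at least four hex digits)."""
    return f"U+{ord(ch):04X}"


def utf8_octets_reference(ch: str) -> str:
    """Refer to a character via the octets of its UTF-8 encoding, written in hex."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


if __name__ == "__main__":
    e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
    print(code_point_reference(e_acute))   # U+00E9
    print(utf8_octets_reference(e_acute))  # C3 A9
```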


Now, section 2 is talking about alternate representations of an encoded 
character, but the flow is a bit mixed up, IMO. The first paragraph says that 
there are different equivalent representations but that the Unicode code point 
is preferred. Then the next paragraph revisits the same thing in more detail. 
The sentence from the first paragraph discussed above, once revised so that it 
makes a clear statement, already says what paragraph two says in greater 
detail. Whether a more succinct or more detailed statement is preferred, just 
say it once.

Of course, if the more detailed paragraph two is kept, "code point position in 
the table" should be changed to "code point".

Also from paragraph two:

   "the UTF-8
   encoding or some other short-form encoding"

The term "short-form encoding" isn't explained here and may not be understood. 
I can only guess what is meant. If the intended meaning is what I think (a 
reference to shortest-form versus non-shortest-form UTF-8), then I don't think 
it's really relevant. Either way, I'd change the wording to:

   "the UTF-8 encoding or some other encoding form"

(Encoding form is a term defined in the Unicode Standard.)
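For reference, the three Unicode encoding forms express the same code point as different sequences of 8-, 16- and 32-bit code units. A minimal sketch using Python's standard codecs; the helper name code_units is hypothetical.

```python
import struct


def code_units(ch: str) -> dict:
    """Return the code units of one character in each Unicode encoding form."""
    utf8 = list(ch.encode("utf-8"))                              # 8-bit code units
    raw16 = ch.encode("utf-16-be")                               # big-endian, no BOM
    utf16 = list(struct.unpack(f">{len(raw16) // 2}H", raw16))   # 16-bit code units
    raw32 = ch.encode("utf-32-be")                               # big-endian, no BOM
    utf32 = list(struct.unpack(f">{len(raw32) // 4}I", raw32))   # 32-bit code units
    return {"UTF-8": utf8, "UTF-16": utf16, "UTF-32": utf32}


if __name__ == "__main__":
    for form, units in code_units("\U0001F600").items():
        print(form, " ".join(f"{u:X}" for u in units))
```

For U+1F600 the UTF-16 form is the surrogate pair D83D DE00: code units of an encoding form are not themselves the code point.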

Also:

   "the other encodes the octets of"

I don't think octets are encoded; they are simply referenced using some 
notational system. Thus, change to:

   "the other uses the octets of ... in some representation."

(This gives parallel wording for the two kinds of reference.)

Finally:

   "the Unicode code point forms"

Drop "forms":

   "the Unicode code points"




Peter Constable

___
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf


RE: Last Call: draft-klensin-unicode-escapes (ASCII Escaping of Unicode Characters) to BCP

2007-10-22 Thread Peter Constable
From: Stephane Bortzmeyer [mailto:[EMAIL PROTECTED]
Sent: Monday, October 22, 2007 4:03 AM

>> Also, "a further encoding of the encoding form" isn't going to be
>> clear to readers.
>
> It is a reference to a bad practice (used in URLs, for instance) to
> encode twice (for instance in UTF-8, then in %xx escapes of the
> bytes).

The discussion in that section is about references to characters in general 
human-readable content, not in URLs. If that is what the wording is referring 
to, it's extremely opaque. If that's really what the authors intend to talk 
about, it should be explained -- and the section should be organized better so 
that it makes sense why that particular thing is being discussed.
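If the wording does refer to the practice Stephane describes, the two layers can be shown in a few lines of Python: UTF-8 first, then %XX escaping of each resulting octet. This is an illustrative sketch. The helper name is hypothetical, and unlike urllib.parse.quote it escapes every octet, including unreserved ASCII.

```python
from urllib.parse import quote


def double_encoded_reference(ch: str) -> str:
    """Two layers: encode the character as UTF-8, then write each octet as a %XX escape."""
    octets = ch.encode("utf-8")                   # layer 1: encoding form
    return "".join(f"%{b:02X}" for b in octets)  # layer 2: %XX escaping of the octets


if __name__ == "__main__":
    e_acute = "\u00e9"
    print(double_encoded_reference(e_acute))  # %C3%A9
    print(quote(e_acute, safe=""))            # %C3%A9 (standard-library equivalent here)
```

Either way, the result refers to octets of an encoded representation, not to the code point U+00E9.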


>>   "However, when information about characters is to be processed by
>>   people, reference to the Unicode code point is preferable to
>>   encoded representations of the code point."
>
> That's not more clear to me.

How can it not be clear? Human-readable content is discussing a Unicode 
character and needs to refer to the character in some way. The whole point of 
this document is about how to refer. Since Unicode character identity is 
established by the name, the code point and the reference glyph, reference can 
be made using one of those three things. It appears to me that this document 
focuses on references based in some way on the code point: is not the key 
distinction between the code point itself and some encoded representation of 
the code point?



Peter Constable



Re: Last Call: draft-klensin-unicode-escapes (ASCII Escaping of Unicode Characters) to BCP

2007-10-21 Thread Peter Constable
I have a terminological objection to this draft, mainly in section 2. I also
have other comments on section 2 that I'll mention.

First, terminology: the heading for section 2 has "...Table Position...", and 
the body refers to "code point position in the table". While the term "code 
table" could have been used in the Unicode Standard to refer to the encoded 
entities and their encoding, it is not.

The Unicode Standard uses these terms:

- It uses "character set" and "character repertoire" for the collection of 
elements being encoded, and "coded character set" for the set of pairs of such 
elements and their encoded representations.

- It uses "codespace" to refer to a range of numeric values used as encoded 
representations, and specifically "Unicode codespace" for the range 0 to 10FFFF 
(hex).

- It uses "code point" or "code position" (synonyms) for values in the Unicode 
codespace.

Thus, the appropriate term here is simply "code point" or "code position". 
"Table position" and "position in the table" are not appropriate since the 
Standard never uses "table" in this regard. And "code point position" is 
redundant. Perhaps the wording was attempting to differentiate between code 
points and various encoded representations of code points. But the latter are 
not code points per se, so there isn't really any ambiguity.

A possible refinement might be to use "Unicode Scalar Value": this refers to 
code points other than surrogate code points. By definition in the Standard, 
encoded characters can only be assigned to a Unicode Scalar Value. I don't see 
this as a necessary change in the draft, however.


Now for other comments on section 2.

The draft has:

  "However, when
   information about characters is to be processed by people,
   information about the Unicode code point is preferable to a further
   encoding of the encoded form of the character."

Information about the code point? (The code point of that character is numeric 
/ is an integer / is non-negative / is in the range 0 to 10FFFF / is even / is 
divisible by 17 / is the same value as the number of days the song "Hey Jude" 
was on the Top 40 list.) I think it is the code point itself that is to be 
preferred, not information about it.

Also, "a further encoding of the encoding form" isn't going to be clear to 
readers. (I'm not sure myself what these words mean; I can guess at what the 
author meant, though am not positive.)

Thus, I'd change this text to:

  "However, when
   information about characters is to be processed by people,
   reference to the Unicode code point is preferable to encoded
   representations of the code point."


Now, section 2 is talking about alternate representations of an encoded 
character, but the flow is a bit mixed up, IMO. The first paragraph says that 
there are different equivalent representations but that the Unicode code point 
is preferred. Then the next paragraph revisits the same thing in more detail. 
The sentence from the first paragraph discussed above, once revised so that it 
makes a clear statement, already says what paragraph two says in greater 
detail. Whether a more succinct or more detailed statement is preferred, just 
say it once.

Of course, if the more detailed paragraph two is kept, "code point position in 
the table" should be changed to "code point".

Also from paragraph two:

   "the UTF-8
   encoding or some other short-form encoding"

The term "short-form encoding" isn't explained here and may not be understood. 
I can only guess what is meant. If the intended meaning is what I think (a 
reference to shortest-form versus non-shortest-form UTF-8), then I don't think 
it's really relevant. Either way, I'd change the wording to:

   "the UTF-8 encoding or some other encoding form"

(Encoding form is a term defined in the Unicode Standard.)

Also:

   "the other encodes the octets of"

I don't think octets are encoded; they are simply referenced using some 
notational system. Thus, change to:

   "the other uses the octets of ... in some representation."

(This gives parallel wording for the two kinds of reference.)

Finally:

   "the Unicode code point forms"

Drop "forms":

   "the Unicode code points"




Peter Constable



RE: Petition to the IESG for a PR-action against Jefsey Morfin posted

2005-10-03 Thread Peter Constable
> From: Dean Anderson [mailto:[EMAIL PROTECTED]
 
> > If it were representative, then one would expect that several others
> > monitoring the WG discussions would be providing that confirmation. I
> > have not seen any indication of that happening.
> 
> This is a false premise.  First, silence does not indicate agreement.

On a list with a couple of dozen members, if someone is generally making useful 
contributions and a few cranks start harassing that person, it would be 
completely reasonable to expect that the rest will not simply jump on the 
bandwagon of the few. If you think otherwise, then I think you have an 
unreasonably pessimistic prejudice against the majority of participants.

As you say, silence does not indicate agreement. Quite so. If your sample was 
representative, and the majority on the list agreed but were silent, then it 
would be reasonable to expect they would be prepared to give that confirmation. 
IOW, if your sample is really representative, then it should be easy to get 
testimony from witnesses to that effect.


> But even if everyone on the WG did want him removed, their reasons must
> be due to actual and unreasonable misbehavior that prevents the working
> group from functioning. Personal dislike, even if unanimous, is
> insufficient.  No serious engineering can be done as a personal
> popularity contest.

Just so. And that is precisely my point: Harald and Doug indicated that they 
want him removed due to unreasonable misbehaviour, not simply personal dislike; 
the accuracy or sincerity of Doug's comment was questioned, and so I have 
offered my testimony supporting what he said.



Peter Constable


Re: Petition to the IESG for a PR-action against Jefsey Morfin posted

2005-10-03 Thread Peter Constable
> From: Dean Anderson <[EMAIL PROTECTED]>

> In the message Randy concludes that
> 
> "If anyone wishes to raise an issue, (s)he should do on on the working
> group mailing list by posting a message detailing the concern and, if
> possible, supplying proposed replacement text."
> 
> But it would seem that Morfin did just exactly that, with a lot of
> supporting documentation.  It seems to me that Randy Presuhn just
> doesn't want to address the concerns raised, nor does he want anyone
> _else_ to address the concerns.

Not the case at all. Everyone else in the WG that was voicing pertinent
concerns was doing so (i) in a reasonably clear manner that all could
understand (ii) on the list and (iii) whenever appropriate supplying
specific suggested revisions to the text. There were occasions on which
Mr. Morfin made clear and pertinent comments on the list, and when he
did they were welcomed. On some occasions, he suggested specific text,
and when he did those suggestions were considered openly. On several
occasions, however, he posted messages that tended toward being opaque
or overly long or both, and far more often than not he didn't give
concrete suggestions for specific textual changes. Within some of those
often-lengthy posts he pointed to documents he had placed on other
sites, and there were many things that led others in the WG to believe
that the material on those other sites was supporting his entirely
different agenda rather than the work of the WG. Perhaps some of that
content was useful to the work of the WG, but by that point there was
already a high level of frustration among many WG members, such that
there really was an onus on him to demonstrate that it would be
worthwhile to spend the time going off to review them. This he did not
do.


> In fact, Randy actually admits in the same message to having advised
> others _not_ to review Morfin's objections.  That seems to be contrary
> to Last Call.

I'm not aware of any occasion on which Randy advised members of the WG
not to review Last Call comments that had been submitted in the expected
manner on the WG or IETF lists.



> The sample, limited as it is, seems to confirm an unjustifiable personal
> attack on Morfin based, it seems, on personal dislike and intolerance
> for his English language skills

IMO your limited sample is not sufficient to support your point. If it
were representative, then one would expect that several others
monitoring the WG discussions would be providing that confirmation. I
have not seen any indication of that happening.


Peter Constable



Re: Petition to the IESG for a PR-action against Jefsey Morfin posted

2005-10-03 Thread Peter Constable
> From: "Doug Ewell" <[EMAIL PROTECTED]>

> I wish that everyone who trivializes Harald's proposal as a matter of
> "personal dislike" or "silencing anyone with a different opinion" could
> have experienced life in the LTRU Working Group for the past 6 months,
> where list members were constantly insulted for being Americans or for
> being employed by large companies, where "resolved" and out-of-scope
> issues were raised over and over again, and where list members became
> wary of posting anything at all, for fear their words would be twisted
> to mean something completely different.  It would not have been
> tolerated in any face-to-face working environment.

I concur. This is not IMO a matter of personal dislike. Mr. Morfin has
made some positive contributions to the LTRU WG which have been
appreciated. But those were more than offset by regular and repeated
occurrences of the kinds of behaviours Harald and Doug have described.
It *has* hindered the WG, and repeated requests for change in behaviour
have been of no avail.



Peter Constable



RE: RFC 3066 bis Libraries list

2005-09-12 Thread Peter Constable
> From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED]


> Dear Peter,
> whatever the way you want to say it, these libraries have now to meet
> new specs they had not to meet before.

I cannot say whether every existing software library written to conform
to RFC 3066 *has to* meet new specs. Certainly there will be some, perhaps
many, that would benefit from revision to the new specs. 

If that was your intent, it would have been clearer to me had you asked
people to identify libraries they feel should be revised if the new spec
is adopted. 




Peter Constable



RE: RFC 3066 bis Libraries list

2005-09-11 Thread Peter Constable
> From: "JFC (Jefsey) Morfin" <[EMAIL PROTECTED]>

> RFC 3066 Bis imposes new constraints on the existing language tags
> software libraries.

We need to be more careful in describing the proposed revision to RFC
3066 (aka RFC 3066 bis) wrt existing libraries that conform to RFC 3066:
every tag valid under the terms of RFC 3066 bis will be recognized by an
existing library written to conform to RFC 3066. Not every tag that
*could* be recognized by such a library would be valid under RFC 3066
bis, but every tag actually valid today under RFC 3066 is also valid
under RFC 3066 bis.


The draft was written with careful attention to ensuring compatibility
with existing libraries written to support RFC 3066. The draft can be
said to impose new constraints that existing libraries would not impose;
I don't see how it could be said that the draft imposes new constraints
on those libraries.
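To illustrate the compatibility claim at the level of syntax, here is a minimal Python sketch. The RFC 3066 pattern follows that RFC's ABNF. The 3066bis pattern is a simplified approximation (an assumption of this sketch) of the syntax later published as RFC 4646, with grandfathered tags omitted. It checks well-formedness only, not validity against the registry.

```python
import re

# RFC 3066 generic syntax:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066 = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# Simplified approximation of the 3066bis syntax (assumption: no grandfathered
# tags, no registry lookup).
_LANGUAGE = r"(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})"
_SCRIPT = r"(?:-[a-z]{4})"
_REGION = r"(?:-(?:[a-z]{2}|[0-9]{3}))"
_VARIANT = r"(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))"
_EXTENSION = r"(?:-[0-9a-wyz](?:-[a-z0-9]{2,8})+)"
_PRIVATEUSE = r"(?:x(?:-[a-z0-9]{1,8})+)"
RFC3066BIS = re.compile(
    _LANGUAGE + _SCRIPT + "?" + _REGION + "?" + _VARIANT + "*"
    + _EXTENSION + "*" + "(?:-" + _PRIVATEUSE + ")?"
    + "|" + _PRIVATEUSE,
    re.IGNORECASE | re.ASCII,
)


def well_formed_3066(tag: str) -> bool:
    """True if tag matches the RFC 3066 generic syntax."""
    return RFC3066.fullmatch(tag) is not None


def well_formed_3066bis(tag: str) -> bool:
    """True if tag matches the simplified 3066bis syntax above."""
    return RFC3066BIS.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ("en", "zh-Hant-TW", "de-CH-1901", "en-a", "fr.latn"):
        print(tag, well_formed_3066(tag), well_formed_3066bis(tag))
```

A tag such as "en-a" passes the RFC 3066 check but not the 3066bis one, matching the observation that not every tag a 3066 library could recognize is well-formed under 3066bis. A tag such as "fr.latn" fails both.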



Peter Constable



RE: [Ltru] Last call comments on LTRU registry and initialization documents

2005-09-09 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John C
> Klensin

>   Aside on the example above (LTRU participants can skip
>   unless they want to check my logic): "en-Hang" and
>   "en-Hant" would imply writing English in Korean Hangul
>   or Traditional Chinese characters respectively.  In
>   addition to those not exactly being common cases, it is
>   not clear that they are feasible...

>   Hangul is problematic in a different way.
>   Unlike Chinese characters, it is definitely phonetic.
>   But because it is rather carefully designed and
>   structured around the needs of Korean, it is not clear
>   to me, in my ignorance, that it could be used to
>   represent the full range of English phonemes and
>   syllables with reasonable accuracy.

Actually, that's not quite true. There are Korean linguists who have promoted 
the idea that Hangul script can be adapted for use for general phonemic 
transcription of languages. Thus, it is plausible that a user might have 
English data written with Hangul script.

This really is no different from the da-CO kinds of examples: they are not
particularly useful, but only coincidentally, because there is at present no
actual variant of the given language associated with the given country. In
principle, the state of affairs in human demographics could well change such
that such a variant does exist. Just as there is no limit to the regions in
which a given language can be spoken by a significant population (given
sufficient mobility), in principle there is no limit to what script can be
used to write a given language (given sufficient ingenuity). 

The *only* difference for these cases in RFC 3066 is that generative use 
without requiring registration is sanctioned for country IDs but not for 
script IDs. And that is not so by explicit design; it resulted simply because 
we weren't yet sure how script IDs should be integrated into the tags, and 
because ISO 15924 wasn't yet published.
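A hypothetical Python sketch of why such combinations are generative under the 3066bis scheme: script and region subtags are recognized by shape alone (four letters for a script, two letters or three digits for a region), so "en-Hang" and "da-CO" parse the same way as more familiar tags. The helper is illustrative only and does not check any code against ISO 15924, ISO 3166 or UN M.49.

```python
def classify_subtags(tag: str) -> dict:
    """Hypothetical helper: label the subtags of a simple language[-script][-region] tag by shape.

    Four ASCII letters -> script; two ASCII letters or three digits -> region.
    Codes are not checked against any registry or standard list.
    """
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if not part.isascii():
            continue  # ignore anything outside the ASCII subtag alphabet
        if len(part) == 4 and part.isalpha():
            result["script"] = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            result["region"] = part.upper()
    return result


if __name__ == "__main__":
    for tag in ("en-Hang", "en-Hant", "da-CO", "es-419"):
        print(tag, classify_subtags(tag))
```

Nothing in the shape rules rules out en-Hang or da-CO; whether such a combination is useful is a question of demographics and orthography, not of syntax.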


Peter Constable



Re: Enough is enough: Intent to file an RFC 3683 against Jefsey Morfin (Harald Tveit Alvestrand)

2005-08-30 Thread Peter Constable
> From: Harald Tveit Alvestrand <[EMAIL PROTECTED]>


> At the moment, learning that Jefsey's opinions can be ignored is a
> part of the initiation process for new IETF participants in the fora
> he frequents.
> I think that's a steep learning curve.

The difficulty isn't in learning when they should be ignored, but rather
in knowing that they *will* be ignored by others. People can form their
own opinions of his comments very quickly, but it takes a while to
discover what others' opinions might be.


Peter Constable



RE: Last Call: 'Tags for Identifying Languages' to BCP [Re: Ietf Digest, Vol 16, Issue 95]

2005-08-29 Thread Peter Constable
> From: Brian E Carpenter [mailto:[EMAIL PROTECTED]

> If we are reading here about a point of view that was expressed within
> the WG, and that the WG did not accept, that seems to be clear enough
> for the purposes of a Last Call. I think discussion of details and
> of the history could be conducted on the WG list.

For my part, I have been responding to what appeared to me might be
construed as JFC having found cause to oppose the draft. Since he has
since clarified that he supports the draft, to wit, 

> You do not need to sell your solution. I explained again and again I 
> support it.

then I have no reason to comment further on this list.



Peter Constable



Re: Last Call: 'Tags for Identifying Languages' to BCP

2005-08-29 Thread Peter Constable
> From: "JFC (Jefsey) Morfin" <[EMAIL PROTECTED]>
> Subject: Re: Last Call: 'Tags for Identifying Languages' to BCP

> >Mr. Morfin appears to me to have no more than a very vague sense of
> >the scope of ISO 639-4.
> 
> This is somewhat fun as I am a contributor.

To my knowledge, you have never been a member of TC37/SC2/WG1. I cannot
rule out the possibility that you have submitted suggestions that found
their way to WG1.



Peter Constable



RE: Ietf Digest, Vol 16, Issue 95

2005-08-29 Thread Peter Constable
> From: "JFC (Jefsey) Morfin" <[EMAIL PROTECTED]>


> I am sorry to impose again the community, what starts amounting to
> ad-hominems. I am used to that, but the quality of the person and the
> serious looking of the mail calls for a reponse. In particular in
> this case, where two majors points are documented.
> Sorry, Peter. Please, Brian advise if inadequate.

My comments have not been ad hominem. If you feel they were in error,
please provide argumentation and evidence to that effect.


> >He has never provided any specific proposal except a request to permit
> >certain private-use tags, which I will return to below.
> 
> Dear Peter,
> This kind of repetition now abuse no one. I bored everyone enough in
> explaining that two additional subtags were necessary IMHO: the
> referent and the context. There is also - a way or another the need
> of the date of the reference (this can be a date or included in a
> subtag).

To my recollection, you never gave any concrete proposal indicating how
this should be done. If you did, please indicate where in the archive,
and I will gladly withdraw that statement. I don't question that you
mentioned inclusion of such information within a language tag, but at
first glance this is the very kind of thing that *should* be in a
distinct attribute.


> Just in case: the langtag is not supposed to only support the
> written-form attributes, but to be multimodal (cf. Peter Constable).
> Please quote the voice, signs, icons, mood, etc. subtags.

For any question regarding how a distinction in linguistic variety or
written form should be reflected in a tag, the members of the WG
provided an answer.


> >Two comments: First, Mr. Morfin suggested within the LTRU WG that the
> >syntax for language tags should be loosened to permit additional
> >characters, such as "." and ":".
> 
> This is a false affirmation. I did two things...

> - I supported the proposition of an African searcher (they treated of
> troll)

Please indicate the name of the person who you believe made this
contribution, or point to the relevant part of the archives. To my
recollection this idea was promoted by "JFC Morfin", aka "Jefsey
Morfin", aka "Jean-Francois Charles Morfin", and by "F. Charles". Please
provide reason to believe that "F. Charles" is not the sock-puppet of
"Jean-Francois Charles Morfin".

As has been mentioned, the request was rejected for technical, not
ad-hominem, reasons.



> The Draft addresses targets you defined a long ago. It was presented
> privately (twice) and is now presented as a WG document.  The
> document having not changed,

For clarification, the document underwent considerable change -- enough
to merit 12 drafts -- many changes being made in response to the
last-call comments on the earlier private drafts.


> In a nutshell, I do _not_ believe that a draft crafted by a few
> individuals can supports all the relevant distinctions needed to
> describe the linguistic and written-form attributes of content as may
> be needed for all purposes, commercial and otherwise.

In the six months since the WG was formed, you have not suggested any
distinctions that could not be made using the proposal in the draft and
that the WG found to be appropriate for integration into these tags. You
did suggest distinctions that were appropriate for these tags, but the
WG pointed out that they are already supported by the proposal.


> >- While Mr. Morfin cites ISO 11179, he has never made statements
> >   that clearly indicate that he actually understands those standards.
> 
> I propose everyone having time to spend to read ISO 11179 and to judge.

Since this was an opinion about your statements, they would need to read
both ISO 11179 and archives of the LTRU and IETF lists.


> In a recent mail, Peter acknowledged the need to consider ISO 11179
> and explained that ISO 12620 was its equivalent.

On the contrary, I indicated that TC37/SC2/WG1 had affirmed the choice
of the project team for ISO 639-6 to apply ISO 12620, a TC37 standard
that predates ISO 11179 but is being revised to make normative reference
to ISO 11179. This was in reporting activity of ISO TC37/SC2 and has no
direct bearing whatsoever on this draft or the work of the LTRU WG.


> >- While Mr. Morfin refers to "an ISO 11179 conformant system",
> >   none of the ISO 11179 series of standards contains any statement
> >   of conformance requirements. Thus, no such notion of "ISO 11179
> >   conformant" is defined anywhere.
> 
> :-) :-)
> 
> This is the second Historic statement!
> Too bad there is Google 

My point was merely that conformance is not defined in any formal way,
therefore must be measured in terms of consistency with the co

Re: Last Call: 'Tags for Identifying Languages' to BCP

2005-08-29 Thread Peter Constable
> than that of a combined conflation of characteristics; each
> characteristic can be assigned separate preference values, and
> irrelevant characteristics (e.g. script w.r.t. spoken language) can be
> easily ignored.

Negotiation of separate attributes involving inter-related
characteristics is *not* simpler, as pointed out above. The draft fully
allows for irrelevant characteristics (e.g. script wrt audio content) to
be ignored. Again, what has been provided in the draft is in accordance
with the charter of the WG.


> As negotiation and related issues represent a critical technical issue for
> the design of language tags (viz. keeping separate characteristics out of
> *language* tags), it is essential that such negotiation issues be
> considered carefully before specifying the format of tags.  Unfortunately,
> that has not been done, and considering the published WG milestones it
> appears that that issue has not been taken into consideration...  However,
> it appears that the WG has not considered the issues, with the effect that
> the WG product lacks the "particular care" expected of BCP documents (RFC
> 2026).

It is unclear on what basis it is asserted that these issues have not
been considered by the WG. I believe most of the WG members would feel
that they have been reasonably taken into consideration. Again, what has
been submitted for last call is in accordance with the charter; just as
it is not reliable to infer something about a content provider from a
language tag, so also it is not reliable to infer from the order of
milestones in the charter that matching issues were not taken into
consideration in preparation of these drafts.



> Note that it is not the registration procedural issues that are typical of
> BCP documents that are problematic; rather it is the conflation of separate
> characteristics into a single tag syntax, specified in the same document,
> which raises problems related to content negotiation.

Bruce asserts (a) that separate characteristics are conflated, and (b)
that this creates problems in content negotiation. The WG determined
that the characteristics combined into a single tag are not independent,
and that it is *separating* them into distinct attributes, not combining
them into a single attribute, that would cause problems in content
negotiation.



> Another large part of the problem is WG management; in addition to the
> issues raised by John Klensin the last time that LTRU participation was
> discussed on the IETF discussion list -- and with which I wholeheartedly
> agree -- it appears that management of WG participant conduct has been
> rather lax; proponents of the individual submission effort who are
> participating in the WG tend to resort to ad-hominem attacks when a problem
> is identified or when an alternative approach is raised, with no visible
> intervention by the WG co-chairs. That has also (i.e. in addition to the
> factors which John identified) had the effect of limiting WG participation
> by individuals.

It's unclear what bearing this has on what improvements can be made to
the drafts in fulfillment of the WG charter. I believe several WG
participants felt that management of conduct was lax, particularly in
relation to a very small number of participants with a penchant for
certain behaviours that would have challenged the best of moderators.

As for the accusation that proponents of an earlier individual
submission engaged in ad-hominem attacks that went without intervention
by the WG co-chairs, resulting in the limitation of participation in the
WG by other individuals, in the absence of specific evidence, this
appears itself to be no more than an ad-hominem attack on those
individuals and on the WG co-chairs. To my knowledge, there was only one
individual in relation to whom other members of the WG acted in any way
that might discourage or hinder his participation, and such actions
arose only in response to repeated provocation from that individual.



> Specification of "language" tag syntax which conflates other content
> characteristics prior to open and professional discussion of negotiation
> issues and alternative approaches would be a premature lock-in of a design
> choice.  As the document under discussion specifies a conflation of such
> characteristics without open discussion

It is asserted that there has been no open discussion of the matter of
conflation. This is untrue. It is asserted that there has been no open
discussion of alternatives; the only concrete alternative presented for
discussion was to have separate language and script tags, and that
alternative was considered and rejected due to problems that arise in
content negotiation. The drafts submitted for review are in accordance
with the charter, and I believe I can say that in the opinion of WG
members matters of conflation and of negotiation issues were taken into
consideration, and were discussed in an open and professional manner.



Peter Constable

___
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf


Re: Last Call: 'Tags for Identifying Languages' to BCP

2005-08-28 Thread Peter Constable
> From: "JFC (Jefsey) Morfin" <[EMAIL PROTECTED]>

> >XML, HTML, etc. are not IETF protocols and should not be the main
> >consideration in IETF work on IETF documents,
> 
> They are specifically quoted by the Charter. Also is CLDR...

These are cited in the charter only as examples in a statement to the
effect that "the RFC 3066 standard for language tags has been widely
adopted in various protocols and text formats..."



> It is to note that ISO 639-4 work is about discussing guidelines in
> that area. This work is under way and was not considered.

Mr. Morfin appears to me to have no more than a very vague sense of the
scope of ISO 639-4.



Peter Constable



RE: Last Call: 'Tags for Identifying Languages' to BCP

2005-08-28 Thread Peter Constable
> From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED]


> This means that the legitimate URI tag:
> "tags:x-tags.org:constable.english.x-tag.org"
> must be accommodated into the format
> "x----etc." instead of
> "0-x-tags.org:constable.english.x-tag.org"

As I mentioned in another message, Mr. Morfin submitted a request to the
WG that the syntax in the draft be loosened to permit tags of the form
indicated, and the consensus of everyone else in the WG was to reject
that request on the basis that (i) it would result in backward
incompatibility with existing processes designed to conform to RFC 3066,
and (ii) it was possible to create a scheme for semantically equivalent
tags without breaking compatibility with RFC 3066.
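To make point (i) concrete, here is a minimal Python sketch of a parser that validates strictly against the RFC 3066 ABNF. The regex and the function name `is_rfc3066_wellformed` are illustrative assumptions, not taken from any particular implementation. A tag containing "." or ":" falls outside that grammar, so an existing processor built this way would reject it.

```python
import re

# RFC 3066 tag syntax:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_wellformed(tag: str) -> bool:
    """Return True if `tag` is well-formed under the RFC 3066 ABNF."""
    return RFC3066_TAG.fullmatch(tag) is not None


for tag in ["en-US", "sr-Latn-CS",
            "tags:x-tags.org:constable.english.x-tag.org"]:
    print(tag, is_rfc3066_wellformed(tag))
```

The first two tags print True and the last prints False, since "." and ":" are neither ALPHA nor DIGIT.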


> Peter takes a loosely applied chancy non-exclusive proposition, to
> make it the significantly constrained exclusive rule of the Internet
> instead of correcting it and following the ISO innovation (ISO 639-6
> and ISO 11179) as directed by the Charter. This permits him to
> exclude competitive propositions following or preceding that innovation.

The LTRU charter makes no reference whatsoever to ISO 639-6 or to ISO
11179. As I have explained elsewhere, Mr. Morfin's suggestion that the
draft is incompatible with ISO 11179 while his alternative would be
conformant is far from valid. Finally, I have not excluded competing
propositions; I was one voice among many that rejected a request to
permit "." and ":" in the syntax, and to my recollection no other
concrete proposal wrt syntax, let alone an overall system of metadata
elements, was submitted by Mr. Morfin to the WG.


 
> With the trick above: length and character wise a private tag is a subtag.
>  and the lack of explanation of how billions of machines will
> know about the daily updated version of his 600 K file, without
> anyone paying for it, but me and the like.

It is completely unclear on what basis Mr. Morfin is suggesting that
billions of machines will need to update "my" (?? I did not create it!)
600K file on a daily basis. There is no indication or likelihood that
the language subtag registry proposed by this draft will change with a
frequency approaching anything close to daily. Indeed, it is entirely
likely that it will change rather less frequently than the RFC 3066
registry was likely to change.



Peter Constable



RE: STD (was: Last Call: 'Tags for Identifying Languages' to BCP)

2005-08-28 Thread Peter Constable
onstitute a simpler solution. Given the widespread
existing use of RFC 3066 tags, use of ISO 639-6 would have to go
alongside use of multi-part tags of the form permitted by RFC 3066,
which is certainly not simpler than what is specified in the draft.



> >Your statement doesn't contradict anything that Debbie has said,
> >provided the context is ISO 639-6 alone. If we were to talk about
> >incorporation of ISO 639-6 into a revision of RFC 3066, however, then
> >duplication would become an issue for consideration.
> 
> This is the WG-ltru Charter that all the ISO codes be included.

The charter makes reference to "the underlying ISO standards"; that is,
to the ISO standards referenced in RFC 3066 or those cited in the
charter to be incorporated into the update RFC. The charter does not
cite ISO 639-6, let alone state that "all the ISO codes be included".


> Nice to see that ISO 11179 is accepted now. Peter Constable and the
> WG-ltru have opposed the reference to ISO 11179 model. This model
> permits to conceptualise languages and to include in their
> description an unlimited number of additional elements.

This is in no way implied by ISO 11179. The model of that standard
assumes that metadata elements designate concepts within some conceptual
system, and that the system of metadata elements includes a meta-model
that reflects that conceptual system. This would have the effect of
*constraining* the concepts represented to entities within that
conceptual model. Those entities may be an infinite set, but the set of
entities that can be represented by the tags defined by this draft would
not increase in number if the draft were changed to reference ISO 11179.


> But ISO 11179 totally open the concept...

Clearly either Mr. Morfin does not understand ISO 11179 or, if he does,
he has totally failed to express a statement consistent with that
understanding.


> I would then advise that the Draft is sent back to the WG-ltru, with
> the suggestion that a lexicon is provided which would define what is
> a "language", a "script", a "country", and the purpose (informative,
> descriptive, normative?) of a langtag. This might be a big step ahead.

Mr. Morfin submitted a request to the WG that these terms be defined.
The consensus of everyone else in the WG was that this was not necessary
since it would not significantly alter the ability of anyone to
implement or use the specification.



Peter Constable



Re: Last Call: 'Tags for Identifying Languages' to BCP

2005-08-28 Thread Peter Constable
> From: Bruce Lilly <[EMAIL PROTECTED]>


> It's unclear what you're trying to get at here.  A URI scheme is a
> protocol element (an "assigned number") registered by IANA, not a
> piece of text (see RFCs 1958 and 2277).  As such, it has no need of
> an indication of language, for it has no language; it is a language-
> independent protocol element.

This point was made in response to Mr. Morfin on more than one occasion
within the LTRU WG. He appears to be unwilling to accept it, however.


> ought to be a means of indicating language in IDNs.  However, that is
> primarily an issue with the IDN specification(s), not with the document
> under discussion (except to the extent that the document under
> discussion extends the likely length of tags

In comparison to RFC 3066, the draft does not extend the likely length
of tags. The likely length of tags is precisely the same as before; the
main difference is that this draft imposes significant structural
constraints on tags.



Peter Constable



Re: Unicode points

2005-02-24 Thread Peter Constable
> From: Bruce Lilly <[EMAIL PROTECTED]>

> I apologize for not being sufficiently clear.

But part of the issue appears to be one of being sufficiently informed.


> Given the flip-flop on musical notation, I expect that the consortium
> will have no trouble finding other non-text things to encode (smileys,
> aromatic hydrocarbon chemical symbols (very fertile territory, no pun
> intended), dance notation, logos, traffic symbols, etc.).

There has been no flip-flop on such things. There were never any
guarantees that musical symbols would not be part of the UCS. There will
be further symbols added to the UCS, and there is no certainty of
exactly what, but it is by no means open-ended.


> > The range of Unicode characters is defined in
> > <http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf>, page 24, as 0 to
> > 10FFFF(hex), which is 1.114.111 decimal - quite a bit larger than 65536,
> > but quite a bit smaller than 4 billion.
> 
> Now, yes, but I have about as much faith that that won't expand as I
> now have in the "Unicode characters are sixteen bits" statement which
> was true in its day.

That merely indicates that you are not fully informed about the
development of Unicode and ISO 10646. Nothing whatsoever has happened to
increase the likelihood of the codespace ever going beyond U+10FFFF.
Rather, much has been done to ensure that it does not, as reflected by
recent action within JTC1/SC2/WG2 to that effect.
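For reference, the arithmetic behind the quoted figure, as a small Python sketch: 17 planes of 65,536 code points give a maximum of U+10FFFF (1,114,111 decimal), which is also exactly as far as UTF-16 surrogate pairs can reach. The constant names are illustrative; `sys.maxunicode` and the `ValueError` raised by `chr()` are standard Python 3 behaviour.

```python
import sys

# The Unicode / ISO 10646 code space: 17 planes of 65,536 code points each.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000
MAX_CODE_POINT = PLANES * CODE_POINTS_PER_PLANE - 1

print(hex(MAX_CODE_POINT), MAX_CODE_POINT)  # 0x10ffff 1114111

# UTF-16 reaches exactly this far: 1024 high x 1024 low surrogate
# combinations, starting at U+10000 just above the BMP.
assert 0x10000 + 1024 * 1024 - 1 == MAX_CODE_POINT

# Python 3 enforces the same ceiling on code points.
assert sys.maxunicode == MAX_CODE_POINT
try:
    chr(MAX_CODE_POINT + 1)
except ValueError:
    print("U+110000 is outside the code space")
```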


> I did; in case you missed it, I quoted from the Unicode Standard
> itself, viz. "Graphologies unrelated to text, such as musical and
> dance notations, are outside the scope of the Unicode Standard".

That means that musical or dance notation and complete notational
systems are beyond the scope of the Unicode Standard itself, as are
mathematical formulas. That does *not* mean that the text elements --
the individual symbols -- that are used in those notation systems are
necessarily out of scope for the Unicode Standard.


> That appeared in the description of the "plain text" principle,
> before that sentence was elided following the abandonment of that
> principle.

You seem to think that some principle has been abandoned, but it has
not.



Peter Constable



Re: The process/WG/BCP/langtags mess...

2005-01-11 Thread Peter Constable
> From: Vernon Schryver <[EMAIL PROTECTED]>
> Subject: Re: The process/WG/BCP/langtags mess...

> That's fine, but does suggest some questions:
> 
>  - Is the Last Call over?
> 
>  - If so, was its result "no supporting consensus"?
> 
>  - If the result was "no supporting consensus", will the current document
>  nevertheless be published as a BCP?
> 
>  - If the result was "no supporting consensus", will a revision of
> the document be published as a BCP without a new Last Call?
> 
> Last week I saw a comment that seemed to answer first question with Yes.
> If the answers to the other questions are not Yes, No, and No, then
> as others have said, the IETF has far more serious process problems
> than how to account for the expenses of the to be hired help.

This comment is a general one rather than being directed toward the
particular case at hand. It seems to me that your comment presumes, as a
participant on the IETF list, what the outcome of the question regarding
the result must be. Perhaps I am wrong, but I would have thought it is the
role of the IESG to make that determination, not members of this list;
and if that is the case, I would certainly think it possible for them to
weigh the concerns that have been raised against the responses provided
and conclude that there has been adequate disposition of the comments
raised. Again, I am not saying that in this case I think that is what the
IESG will or might or should do; only that in general it is something
they *could* do, in which case the outcome of their decision, even when
concerns have been raised, cannot be assumed a priori.


> If outside groups can publish IETF BCPs without the let, leave, or
> hindrance of the IETF, then the honest thing to do is to get rid
> of all of that tiresome WG stuff.

No outside group is doing this.



> On the other hand, if the answers are Yes, Yes, No, and No, then
> contrary to the other person's request, there is no good reason to
> talk about the language tags document here and now.

I agree that a yes to the first question -- is the last call closed? --
would appear to be adequate grounds for there to be no further
discussion on this list in relation to the I-D in question. Whether
there may be grounds for discussing other process-related questions
possibly including the area of work to which this I-D pertained is, of
course, a separate question.


Peter Constable



Re: The process/WG/BCP/langtags mess...

2005-01-11 Thread Peter Constable
> From: Vernon Schryver <[EMAIL PROTECTED]>

> >    In fact we feel that we've been very considerate
> > and open in the development of this draft in the language tagging
> > community and continue to be open to comments and criticism, no
> > matter the source.
> 
> Based on what I have seen in this mailing list, I disagree.

I'd be curious to know what has led to the impression that the authors
have not been open to comments or criticism. 


> He is basically saying "You must publish our
> BCP because we followed all of the steps as we understood them and the
> default result of that is surely to publish."

I am unable to see how you derive that from his message. Rather, he
appears to be saying, if there is not enough consensus for acceptance of
this draft, then surely we should be able to find a way for stakeholders
to continue working together toward a draft that does achieve consensus.


Peter Constable



RE: Language tags, the phillips draft, and procedures

2005-01-07 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of John C Klensin

> I'd like
> to suggest that everyone voluntarily declare a cooling-off
> period...

> Please don't try to
> answer that question today, especially on the IETF list.

I'll respect that request. I'll only comment that I think both you and
Kristin failed to identify where the real dichotomies lie. For
instance, your second suggestion was to think about the contradiction
between the two positions, but in fact the supporters of the draft would
describe their position as involving elements of both of the two
opposing positions in your analysis.

Some of those who have raised concerns with the draft have expressed
frustration at not being heard, which is a reasonable complaint, and I
have made a real attempt to understand those concerns. (E.g. the latest
exchanges between Ned and me, and my acknowledgment of
comments you've made wrt process and WGs.) Please understand that there
may also be frustration for supporters of the draft from a perception
that their position is not being understood, which may result for
instance from analyses of the opposing views that really don't capture
their position at all. 

For my part, I won't say I'm frustrated by the analysis you gave; just
disappointed that I haven't been able to get us closer to the place
where we agree on what the dichotomies are, which I had hoped to do.


Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-06 Thread Peter Constable
John:

> Peter, just to clarify... In my opinion (which isn't necessarily
> worth much)

(I sincerely doubt that's the case.)


>, the procedures that were followed were perfectly
> reasonable.   Anyone can form a design team and put a document
> together, and there are no rules that bar such a design team
> from using and building on a mailing list set up for something
> else.  That may or may not be wise, but it is certainly
> permitted.  The only place this runs into a problem is if
> someone presumes that a document developed in the way this one
> was developed is equivalent to a WG product, or that it is
> entitled to the presumptions of relevancy and correctness that
> go with a WG product.

I can't speak for the authors. I was not familiar with those
distinctions when the process began, and I suspect that is true of
others on the IETF-languages list who contributed. In my mind we were
following a precedent that implied not only a permitted procedure but an
entirely appropriate one. I think all of us now understand, at least in
part, that some distinctions exist that may have practical implications
on how something is received by the IETF community and processed by the
IESG.


> From that point of view, it is nothing
> more or less than an individual submission (or the output of a
> self-defined design team) and the comments Dave and I have been
> making apply.

I don't think I have questioned the applicability of your comments in
this regard at any point.



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-06 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of John C Klensin


> > This reflects a fundamental misunderstanding of what the draft
> > does compared to what RFC 3066 does.  It imposes *more*
> > restraints on language tags, not fewer.
> 
> It also very explicitly permits talking about scripts, not just
> languages and countries. That, to me, is an extension,
> regardless of the additional constraints.

There may be a disagreement here due to a difference of perspective: one
could say that the grammar is more extensive, but that makes the formal
language (the set of tags the grammar permits) less extensive. So whether
one considers such a revision "an extension" depends on one's perspective.

Note that while the draft permits "talking about" scripts, RFC 3066
permits "talking about" *anything*. More extensive grammar, less
extensive language (and vice versa). 
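The "more extensive grammar, less extensive language" point can be checked mechanically. The Python sketch below pairs the RFC 3066 ABNF with a hypothetical, heavily simplified reading of the draft's structure (language, optional script, optional region, variants; extended-language, extension, private-use and grandfathered forms are omitted), so it illustrates the subset relationship rather than reproducing the draft's full grammar.

```python
import re

# RFC 3066 ABNF: 1*8ALPHA *( "-" 1*8(ALPHA / DIGIT) )
RFC3066 = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# Hypothetical, simplified reading of the draft's tag structure:
#   language [-script] [-region] *(-variant)
DRAFT_SIMPLIFIED = re.compile(
    r"[A-Za-z]{2,3}"                                   # language: ISO 639 code
    r"(?:-[A-Za-z]{4})?"                               # script: 4 letters
    r"(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"                  # region: 2 letters or 3 digits
    r"(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"  # variants
)


def wellformed(pattern: re.Pattern, tag: str) -> bool:
    """True if the whole tag matches the given grammar."""
    return pattern.fullmatch(tag) is not None


for tag in ["en-US", "sr-Latn-CS", "de-CH-1996", "en-abcdefgh-US"]:
    print(tag, wellformed(RFC3066, tag), wellformed(DRAFT_SIMPLIFIED, tag))
```

Each sample tag accepted by the simplified draft pattern is also accepted by RFC 3066, while "en-abcdefgh-US" is accepted only by RFC 3066.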


> And, as Ned as
> pointed out repeatedly, there are things that can be done in
> 3066 parsers/interpreters in practice that have to be done
> differently in this new system.

I think this claim can only be made on the basis of assumptions not
found in RFC 3066. Ned has most recently said, 

"3066bis provides a reliable way to locate country codes in all cases,
but the algorithm is different. And this is a non-backwards-compatible
change."

The fact that it can identify country codes in all cases but requires a
different algorithm does not imply a non-backwards-compatible change,
since it is new functionality -- it is doing something that wasn't even
possible in RFC 3066.

Backwards compatibility cannot be measured in terms of whether new
processors require different algorithms to achieve new functionality. It
can only be measured in terms of whether new processors can perform
correct operations (correct according to the specification for those
processors -- the proposed draft) on existing tags, and whether existing
processors can perform correct operations (correct according to the
specification of those processors -- RFC 3066) on new tags. This draft
permits this.
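One way to read the second half of this claim: take the matching rule RFC 3066 itself states (a range matches a tag if it equals the tag, or equals a prefix of it immediately followed by "-") and apply it, unmodified, to a draft-style tag. A minimal Python sketch; the function name `rfc3066_range_matches` is illustrative.

```python
def rfc3066_range_matches(language_range: str, tag: str) -> bool:
    """RFC 3066 prefix rule: the range equals the tag, or equals a prefix
    of it that is immediately followed by '-'. Case-insensitive."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


# An unmodified RFC 3066 matcher applied to a draft-style tag.
new_tag = "sr-Latn-CS"
for rng in ["sr", "sr-Latn", "sr-Cyrl", "s"]:
    print(rng, rfc3066_range_matches(rng, new_tag))
```

This prints True for "sr" and "sr-Latn", and False for "sr-Cyrl" and "s"; nothing in the matcher had to change to handle the script subtag.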



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-06 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]


> > RFC 3066 left us with bigger problems: it doesn't give us any
> > way to identify pieces that we would be encountering in registered tags
> > (apart from hard-coded tables compiled from versions of the registry
> > that pre-exist a given implementation).
> 
> With, as you point out below, one important exception: It did have a way
> to reliably identify a country code in most cases (but not all).

If "in most cases" means from among tags in use today under the terms of
RFC 3066 (as John Cowan would say, "what is true"), then yes. But if "in
most cases" means from among tags permitted by RFC 3066 (as John Cowan
might say, "what is the rule") -- including some that users have been
wanting to use but have delayed using pending a revision of RFC 3066 --
then no: RFC 3066 allowed for reliable identification of a country code
in only a small portion of all possible cases, namely when it occurred as
the second subtag following an ISO 639 code (RFC 3066 does not prohibit a
country code from occurring anywhere after the first subtag).



> And this ability to say "2 character subtag in the second position, most
> be a country code" was quite useful even though it might miss other
> occurences of country codes in some cases.

The draft would still grant the ability to make that statement, and
would permit new implementations never to miss *any* occurrences of
country codes.
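The difference between the positional rule quoted above and identification by subtag shape can be sketched in Python. Both functions are hypothetical helpers: the first encodes the "2-letter subtag in the second position" rule, and the second assumes a simplified version of the draft's ordering (ISO 639 language, optional 4-letter script, optional region of 2 letters or 3 digits), ignoring extended-language and grandfathered tags.

```python
import re
from typing import Optional

LANGUAGE = re.compile(r"[A-Za-z]{2,3}")        # ISO 639 code
SCRIPT = re.compile(r"[A-Za-z]{4}")            # ISO 15924 code
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")   # ISO 3166 alpha-2 or 3 digits


def country_by_position(tag: str) -> Optional[str]:
    """Positional rule: after an ISO 639 primary subtag, a 2-letter
    second subtag is a country code; later subtags are not examined."""
    subtags = tag.split("-")
    if (len(subtags) >= 2 and LANGUAGE.fullmatch(subtags[0])
            and re.fullmatch(r"[A-Za-z]{2}", subtags[1])):
        return subtags[1].upper()
    return None


def country_by_shape(tag: str) -> Optional[str]:
    """Hypothetical sketch of the draft's ordering: language, then an
    optional script, then an optional region recognised by its shape."""
    subtags = tag.split("-")
    if not LANGUAGE.fullmatch(subtags[0]):
        return None
    i = 1
    if i < len(subtags) and SCRIPT.fullmatch(subtags[i]):
        i += 1  # step over the script subtag
    if i < len(subtags) and REGION.fullmatch(subtags[i]):
        return subtags[i].upper()
    return None


for tag in ["en-US", "sr-Latn-CS", "es-419", "zh-Hant"]:
    print(tag, country_by_position(tag), country_by_shape(tag))
```

For "sr-Latn-CS" the positional rule finds nothing, while the shape-based lookup returns "CS"; for "en-US" both agree.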


 
> 3066bis provides a reliable way to locate country codes in all cases, but
> the algorithm is different. And this is a non-backwards-compatible change.

Surely this has been the point of greatest contention in this
discussion, and it is clearly not obvious, for several people have
repeatedly indicated that they do not see any such backwards
incompatibility. Please, anyone claiming there would be
incompatibility, be pedantic: define whatever terms, make explicit
whatever assumptions are required to support this claim. (I suspect the
root of this disagreement lies in unstated assumptions.) 

Those who claim backward compatibility do so on the basis that every
existing implementation conformant to RFC 3066 will continue to operate
precisely as designed and in conformance with RFC 3066 regardless
whether they encounter a tag presently well-formed and valid under the
terms of RFC 3066 or one that would be sanctioned by this draft. If
there is any term needing clarification in that statement or any
suspected assumption not made plain, please ask for clarification.
 

 
> Of course there's the option Dave Singer has raised: Reverse the positions
> of script and country codes in 3066bis. I see two problems with this:
> 
> (1) Script codes are in general more important than country codes, and
>     therefore really should come first so that simple truncation matches
>     work "better". (There are probably exceptions to this assertion
>     lurking out there somewhere, but I believe it is mostly true.)

Thank you for voicing support for this position.

 
> (2) I believe it increases the number of grandfathered codes that won't
>     conform to the new format.

If I'm not mistaken, I think there would be no difference in this
regard.

 

Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-06 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of John C Klensin


> (3) Finally, there is apparently a procedural oddity with this
> document.  The people who put it together apparently held
> extended discussions on the ietf-languages mailing list, a list
> that was established largely or completely to review
> registrations under 3066 and its predecessors. My
> understanding at this point is that their good-faith impression
> was that the discussions on that list were essentially
> equivalent to those of a WG.

I believe I can say that it was done this way because it followed the
example of the development of RFC 3066, which to my knowledge (as a
member of the IETF-languages list at that time) happened in the same
way. It was certainly done with a good-faith impression that appropriate
procedures were being followed.



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-06 Thread Peter Constable
p://www.sil.org/silewp/abstract.asp?ref=2002-003.) 


> or what sort of naming scheme is politically acceptable, and is there a
> conflict there. This does get back to the algorithmic matching issue in a
> sense though, which is that if one wants some sort of hierarchical
> structure to the tags (to allow easier matching),

Insofar as tags are structured as linearly sequenced elements, and
matching algorithms based on left-prefix truncation are in use, there is
no debate over *wanting* a hierarchical structure: it's a reality we must
live with.


> or indeed [wants to] define any sort of matching rules (as an
> implementor wants), you're probably getting right into some political
> questions about how matching "should work". So for those who wanted to
> stick just to linguistic accuracy and try to avoid political issues,
> trying to avoid discussion of algorithmic matching may have seemed
> appealing (but then provides no help to what I've termed the
> "implementors").

This seems to assume that those promoting an ordering of script and
country subtags as found in the draft are supporting that order for
reasons of linguistic purity and have no interest in discussion of
algorithmic matching, which is completely wrong: the reason for
supporting that order of subtags has everything to do with matching
behaviour in certain widely-deployed algorithms.



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-06 Thread Peter Constable
> From: Dave Singer [mailto:[EMAIL PROTECTED]


> Sorry, I should have gone on to conclude:  the important aspect of
> sub-tags is that their nature and purpose be identifiable and
> explained (e.g. that this is a country code), and that we retain
> compatibility with previous specifications.

Ah! Then the proposed draft ensures that the nature of subtags is
always identifiable, which RFC 3066 (as I mentioned earlier) fails to
do.

And the draft retains compatibility with previous specifications using
an assumption (thoroughly discussed and concluded on the IETF-languages
list a year ago) that, in left-prefix matching processes, script
distinctions are generally far more important than country distinctions.


> I don't believe that simple
> truncation is a necessarily useful operation in all circumstances,

I don't think anyone would dispute that.


> and it probably should not be in the spec. at all.  For example, I'd
> say that we should retain the 3066 ordering of language-country and
> therefore script, if needed, comes later.  However, my typesetting
> subsystem doesn't care a jot about language or country, it just needs
> to find the script code ('can I render this script'?).

Here I disagree. For other purposes, I think it's very clear that the
only time that choice of order matters is with matching algorithms that
use simple truncation, and for the most common implementations, which
use left-prefix truncation, the order lang-script-country will be far
more useful in the long run precisely because script distinctions are
generally far more important in matching than country distinctions. I
don't know of any case in which a tag might be used that contained all
three subtags but in which the country distinction generally matters
more than the script distinction.
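A small Python sketch of the kind of truncation-based lookup at issue, showing why subtag order matters. The available tags, and the assumption that the resource tagged "sr-CS" is written in Cyrillic, are hypothetical; `truncation_fallback` and `lookup` are illustrative helpers, not any particular deployed implementation.

```python
from typing import Iterator, Optional, Sequence


def truncation_fallback(tag: str) -> Iterator[str]:
    """Yield the tag, then each shorter prefix obtained by dropping the
    last subtag, e.g. sr-Latn-CS, sr-Latn, sr."""
    subtags = tag.split("-")
    for n in range(len(subtags), 0, -1):
        yield "-".join(subtags[:n])


def lookup(requested: str, available: Sequence[str]) -> Optional[str]:
    """Return the first available tag reached by truncating `requested`."""
    by_lower = {a.lower(): a for a in available}
    for candidate in truncation_fallback(requested):
        if candidate.lower() in by_lower:
            return by_lower[candidate.lower()]
    return None


# Hypothetical content: a generic Latin-script Serbian version, and a
# version tagged only for the country, assumed here to be in Cyrillic.
available = ["sr-Latn", "sr-CS"]

print(lookup("sr-Latn-CS", available))  # lang-script-country order
print(lookup("sr-CS-Latn", available))  # lang-country-script order
```

With lang-script-country order the lookup lands on "sr-Latn", preserving the script; with lang-country-script order it falls back to "sr-CS", i.e. the right country but (under the stated assumption) the wrong script.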


Peter Constable



RE: draft-phillips-langtags-08, process, specifications,

2005-01-06 Thread Peter Constable
> From: Dave Crocker <[EMAIL PROTECTED]>

> It occurs to me that a Last Call for an independent submission has an
> added requirement to satisfy, namely that the community supports
> adoption of the work. We take a working group as a demonstration of
> community support.

You say "the community", though surely a working group is only
representative of "a community" or a portion of "the community".

In this case, there is at least "a community", represented by members of
the IETF-languages list.


> And, indeed, I haven't seen much support for the document under
> discussion.

??!! If there wasn't much support, surely the discussion would have died
a few weeks ago shortly after it started.



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-06 Thread Peter Constable
> From: Dave Singer [mailto:[EMAIL PROTECTED]

> > This is similar to the reason why the language code comes before the
> > country code. If we had the order CH-fr, then we could end up mixing
> > French and German in the same page, because we would fall back (for
> > one of the data sources) from CH-fr to CH, which could be German.
> 
> It has to be application-specific which fallback happens.  If the
> user says he's swiss french, and the content has alternative
> offers for swiss german or french french, which do you present?  If
> the content actually differs for legal or geographic reasons ('the
> legal representative in your country is', 'for copyright reasons this
> edition differs in material ways from other countries'), then the
> correct country but wrong language is the best answer.  If the desire
> is simply for maximum intelligibility, then the reverse is true.

But that is a level of decision making that goes well beyond any
algorithm that simply uses truncation of tags, which is the only case in
which the ordering of sub-tags matters.



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-06 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]


> Again, your pejorative dismissal of other people's concerns does not
> mean your position is valid...


> Parsing almost never is. But simply parsing these tags is not, and
> never has been, the issue.

I think you guys are in violent agreement over country codes within a
tag, and that the debate over interpreting the wording of RFC 3066
serves no purpose.

I think the intent of Mark's dismissal has been to refute objections he
perceives as invalid, in which case we need to consider that the line
between perceived-invalid and truly-invalid objections has been blurred
simply by the volume of discussion (the noise factor). There have been
some invalid objections that bear some similarity to comments Ned has
made as he has tried to make his point. (For example, Bruce Lilly has
claimed back-compatibility problems that do not exist, on the incorrect
premise that RFC 3066 does not permit ISO 3166 country codes except as
second subtags, or that it does not permit second subtags that are not
country codes; at the moment I forget whether it was one, the other, or
both.)

But Ned's concerns are legitimate, I think. I'd say they are not
necessarily blocking issues for this draft, because I think a possible
outcome of discussion is to characterize them as concerns about
outstanding issues that need to be solved rather than as concerns over
the draft itself; but I do think they are valid concerns that deserve
attention.

In a nutshell, Ned was elaborating on a comment from Dave Singer that,
once we have parsed a pair of tags and identified all the pieces, it's
not a trivial matter to decide in every case how the two tags compare,
and that there are factors that would exist if the draft were approved
that didn't exist under RFC 3066.

Again, I think this is a question that deserves discussion. In relation
to the proposed draft, I don't see it as a particular problem with the
draft. It is a problem that doesn't exist in RFC 3066, but that is only
because RFC 3066 left us with bigger problems: it doesn't give us any
way to identify pieces that we would be encountering in registered tags
(apart from hard-coded tables compiled from versions of the registry
that pre-exist a given implementation).

RFC 3066 permits tags that have all kinds of internal structures. That
is a problem, as it will never allow us to derive much useful
information from a tag with any confidence: only the ISO 639 language
category and, in some cases, a country category. I predict that in the
future we will be seeing a significant number of tags (whether
sanctioned without registration by a successor to RFC 3066 or as tags
registered under RFC 3066) that go beyond the patterns "ll(-CC)" and
"lll(-CC)". If we stick with RFC 3066, we will have no way of writing
forward-compatible processors that are able to do very useful matching.

What this draft does is impose some order on all the other patterns
within tags that are permitted, and tell us what the different pieces
must be. As a result, we have more named pieces to deal with, and we are
presented with the question that Ned raised: "Now we have more named
pieces than we did before; what do we do with them?" That is a problem
that will need to be addressed. But I don't think it's a reason to
oppose the draft, since opposing the draft (or at least opposing any
revision that introduces a richer internal structure) leaves us in a
situation that must be characterized either as a worse problem or as
turning our backs on increased functionality to meet real user needs.
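The practical gain from a fixed internal structure is that a processor can label each subtag by its position and shape without consulting the registry. The Python sketch below is only an approximation: it assumes the subtag shapes later published in RFC 4646 (a 4-letter script, a 2-letter or 3-digit region, a 5-8 character or digit-initial 4-character variant, and a single-character singleton introducing an extension, with 'x' for private use). It ignores extended language subtags, grandfathered tags, and tags that begin with 'x' or 'i'. The `classify` function and its labels are hypothetical.

```python
import re

# Simplified subtag shapes, in the order they may appear after the
# language subtag (modelled on RFC 4646; not a full validator).
SHAPES = [
    ("script", re.compile(r"[A-Za-z]{4}")),
    ("region", re.compile(r"[A-Za-z]{2}|[0-9]{3}")),
    ("variant", re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")),
]


def classify(tag):
    """Return (kind, subtag) pairs for `tag`, or raise ValueError when a
    subtag has an unknown shape or appears out of order."""
    subtags = tag.split("-")
    labels = [("language", subtags[0])]
    expected = 0  # index into SHAPES of the earliest kind still allowed
    for i, subtag in enumerate(subtags[1:], start=1):
        if len(subtag) == 1:
            # A singleton starts an extension ('x' = private use); the rest
            # of the tag belongs to that sequence.
            kind = "privateuse" if subtag.lower() == "x" else "extension"
            labels.append((kind, "-".join(subtags[i:])))
            break
        for k in range(expected, len(SHAPES)):
            name, pattern = SHAPES[k]
            if pattern.fullmatch(subtag):
                labels.append((name, subtag))
                # Variants may repeat; script and region appear at most once.
                expected = k if name == "variant" else k + 1
                break
        else:
            raise ValueError(f"unexpected subtag {subtag!r} in {tag!r}")
    return labels


print(classify("sr-Latn-CS"))
# [('language', 'sr'), ('script', 'Latn'), ('region', 'CS')]
print(classify("de-CH-1901"))
# [('language', 'de'), ('region', 'CH'), ('variant', '1901')]
```

Under these assumptions a tag such as "en-US-Latn" is rejected, because a script subtag may not follow a region. A parser working from RFC 3066 alone could not say what "Latn" means in that position without a registry lookup.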



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-06 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]


> My reading of that text is that it goes out of its way to try and
> avoid direct discussion of a matching algorithm, talking instead about
> "rules" and "constructs". I no longer recall the circumstances behind
> this, but my guess would be that talking about algorithms directly
> moved this specification a bit too close to implementation work, which
> in turn would argue for the normal standards track and its ability to
> assess interop status, not BCP.
>
> This presents yet another problem for the current draft, BTW.

You say that it avoids direct discussion of an algorithm, but then imply
that it talks directly about algorithms. Which is it?

If it talks about principles that may be used in processing tags in a
general sense, but not a specific algorithm, then I don't see that there
is any problem. All that it is doing is giving guidance regarding the
semantic relationships that may exist between tags of different types,
and pointing out what processes may or should not change about a tag to
preserve its well-formedness and preferred ('canonical') structure, all
things within the scope of a BCP that doesn't specify any specific
matching algorithms.
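One rule of the kind described above, concerning what a process may change about a tag without breaking its well-formedness, is that shortening a tag should never leave a dangling single-character subtag: under the syntax later published as RFC 4646, a tag ending in "-x" is not well-formed. The Python sketch below applies that rule while generating a fallback chain. It follows the approach RFC 4647's Lookup later adopted rather than any text quoted from this draft, and `truncate` and `fallbacks` are hypothetical names.

```python
def truncate(tag):
    """Drop the rightmost subtag; if a single-character subtag (an
    extension singleton or 'x') is left at the end, drop it as well so the
    result stays well-formed. Return None when nothing remains."""
    subtags = tag.split("-")[:-1]
    while subtags and len(subtags[-1]) == 1:
        subtags.pop()
    return "-".join(subtags) or None


def fallbacks(tag):
    """Yield `tag` followed by each successively truncated form."""
    while tag:
        yield tag
        tag = truncate(tag)


print(list(fallbacks("de-CH-1901-x-foo")))
# ['de-CH-1901-x-foo', 'de-CH-1901', 'de-CH', 'de']
```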


Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-05 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of Bruce Lilly

> > [...] RFC 1766/3066 need to be able to deal with tags that contain
> > pieces they don't know about -- the only subtags they can know about
> > are initial subtags of "i", "x" or ISO 639 IDs, or a second subtag
> > consisting of an ISO 3166 code in case the first subtag is an ISO 639
> > ID.
> 
> Right. I.e. they should be able to deal with superfluous stuff
> on the right.  But not script tags that suddenly appear between
> language code and country code.

For purposes of an RFC 1766/3066 parser, a script tag plus anything after it 
would be "stuff on the right I don't know anything specific about". It could 
not be described as superfluous -- the process can still compare tags and make 
matches according to whatever rules it uses, such as left-prefix matching.



> For the triple of
> language/country/script to match usefully in the general case by
> RFC 3066 parsers (which are unaware of script in general), the first
> and second subtags would have to remain language code and country
> code respectively.

If you consider realistic scenarios, this makes the wrong assumption that 
country distinctions generally matter more to users.


> not on a Quixotic quest for "stability"
> of nations.

The draft doesn't try to achieve stability of nations. Only stability in the 
semantics of metadata elements.


Peter Constable



Re: Language Tags: Response to a part of Jefsey's comments concerning the W3C

2005-01-05 Thread Peter Constable
> From: "JFC (Jefsey) Morfin" <[EMAIL PROTECTED]>

> why not to follow under IAB guidance (or to review) the charter I
> proposed yesterday, in an IETF way everyone could participate, and to
> have all these applications supported one shot in working on a
> linguistic ontology where each language instance would be documented
> by an ad hoc authoritative source. Otherwise it could not be the
> standard you wish.

The objective of RFC 3066 or any successor is not language documentation
(which I understand to mean more or less language description). Perhaps
I misunderstand what you're saying here.


Peter Constable



RE: Last Call on Language Tags (RE: draft-phillips-langtags-08)

2005-01-04 Thread Peter Constable
> From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED]


> 2. I never objected the scripting-ID. I objected that it was not given
> the same importance as language and country codes. I plead (and act)
> for 25 years for the support of authoritative distinctions among users
> contexts. But I am not paid by a big employer.

I don't have time to offer many comments. Let me say, for the benefit of
people who don't know much about me, that up to a year ago I was not
paid by a big employer, but was a volunteer working for a non-profit
organization, SIL International, and it was in *that* context that I
became involved in the development of ISO 639 (including being SIL
liaison to the ISO 639-RA/JAC, a member of the US TAG for TC 37, and
project editor for ISO 639-3), a contributor to the development of RFC
3066 and a regular participant in the activity of the IETF-languages
list.



> There is NO consensus in the community and huge technical,
> societal, economical and political concerns. Because one does not
> understand what the Draft wants to achieve, for who and how. The main
> request is to clarify. There are no real objections (except to the
> paucity of the proposition) but concerns.

I haven't seen many requests for clarification. If that is what people
want, then I think the authors, or others, can provide it, if it's made
clear at what points clarification is needed.


> > > It would be very helpful, to me at least, if you or he could
> > > identify the specific context in which such tags would be used
> > > and are required.  The examples should ideally be of
> > > IETF-standard software, not proprietary products.
> 
> You respond none. Just an application level problem.

I was asked to respond with examples that pertain to IETF-standard
software, so that's what I did.


> > I've used Chinese as one example, but there are many other cases,
> > some familiar to many and some less well known

> Full agreement. But this is to be done through an open and inclusive
> semantic, not on an exclusive first come first serve registration
> basis.

Which is why one of the aims of the proposed draft is to fully
incorporate script IDs as sanctioned sub-tags rather than leaving
individual parties to make ad hoc registrations for such distinctions.



> Why do you want there would be an exclusive _unique_ matching
> algorithm?

I have never said I want that.


> We had a long talk at the end of the August Paris meeting at AUF over
> ISO 639-2 and the need to aggregate language ID, scripting ID, usage
> description, authoritative sources and also country codes and on the
> complexity to take into account "sub-code" and private codes and to
> add accidental or new descriptors in order to document venacular ways
> of speaking, thinking, talking. Obviously it was a private discussion
> with a few people sharing the same ideas ... May be you were there (we
> were the last to leave the room and the building).

I don't know. I don't recall this discussion, and I can't put a face to
your name. I know I was not last to leave the room. Obviously I have
ideas on those issues.


Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-04 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of John Cowan


> > The whole question of what is a language, a variant or dialect of a
> > language, or a suitable substitute for a language, would benefit
> > some thought in any tagging scheme, though I agree the problem is not
> > generally soluble.
> 
> See the editor's draft of ISO 639-3 at http://tinyurl.com/6kky2 ...

I would say that all of clause 4.2 is relevant; in addition to 4.2.1, I
would especially include 4.2.2, in relation to which I have presented
ideas that led to the inclusion of the Extensions subtag in the proposed
draft. (I originally thought of it as a way to capture some existing
registered tags as part of a consistent scheme rather than merely as
ad-hoc tags, but I think it may be more generally useful as well for
dealing with some of the issues regarding different perceptions of what
is a language.) I'm afraid I don't have time at the moment to elaborate
further.



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-04 Thread Peter Constable
> From: Dave Singer [mailto:[EMAIL PROTECTED]

> The whole question of what is a language, a variant or dialect of a
> language, or a suitable substitute for a language, would benefit some
> thought in any tagging scheme, though I agree the problem is not
> generally soluble.

These are questions that have been given some thought. No time to delve
into it at the moment, however.



Peter Constable



RE: Last Call on Language Tags (RE: draft-phillips-langtags-08)

2005-01-03 Thread Peter Constable
ics of the whole explicitly in such
cases. And there *is* a need to avoid the problem you alluded to...

> while also
> believing that it is possible to make registrations under the
> rules of 3066 that would make quite a mess of things.

Part of my reluctance to have script IDs included in RFC 3066 was due to
the fact that a set of tags had just been registered (some of which I
now wish didn't exist) which used various subtags in combination, and I
sensed that there was a lack of collective understanding of what the
internal structure of tags and relationships between subtags should be
(which is what led me to write the paper I referred to earlier). Not
long after RFC 3066 was approved, there were several
further tags registered that used various subtags in combinations that
concerned me then (I voiced my reservations at the time) and still do.
RFC 3066 *is* too flexible to use without some kind of constraints.
While the proposed draft is not what I would have drafted had I gotten
there first, I have been willing to support it because I feel it
provides helpful constraints on the internal structure of RFC 3066
language tags.



> We have
> tag review processes to prevent just that eventuality.

I have been party to the review process for the past five or so years,
and can say that the review process did not, IMO, always succeed in
avoiding regrettable tags (I do not consider those that include script
IDs to be among them), because there was no model of the ontology that
needed to be described, nor of which elements a tag should contain and
what relationships they should stand in to one another. This draft
doesn't describe such a model, but it does impose one, which I think is
moving in a good direction.


> > There may be implementations that use a more complex approach
> > to matching involving inspection of the tagged content itself,
> > or inspecting the particular subtags of a language tag.
> >...
> 
> Peter, you are talking, I think, about different applications
> doing different things given the greater range of options and
> flexibility that the new specification provides.

Actually, no; I was trying to guess at existing applications that might
have particular problems with complexity, as you mentioned. Certainly
language-range matching is no more complex in the proposed draft than it
is today. I personally suspect that the language-range matching
algorithm is too simplistic, but I haven't gone beyond that myself to
start suggesting it needs to be replaced with something more complex.



> Let me also comment on the ISO 3166 issues here...   But
> the solution to the problem of various ISO TCs not having an
> adequate understanding of the impact on the Internet and IT
> communities (and, in the case of TC46, even the
> library/information sciences community that are one of their
> historical main constituencies) is, IMO, to get that message
> across via liaison statements and, if necessary and appropriate,
> encouraging national member bodies to cast "no" votes on
> standards and registration procedures that are insufficiently
> stable.  After the "CS" decision, the statements from the
> British Library advocating a much longer time-to-reuse and from
> the IAB suggesting that a century might be adequate were, again,
> IMO, just the right sort of approach.   In particular, I presume
> that TC 37 has an adequate liaison mechanism in place with TC 46
> to insist that a much more conservative position be adopted with
> regard to changes.  If TC 37 isn't able or inclined to do that
> job effectively, I'm not persuaded that shifting the task to the
> IETF is an appropriate solution or one that is likely to be
> effective.

For my part, I made a point of informing TC 37 members of the
re-assignment of CS, and that led to a resolution at our Paris meeting
last August expressing strong concern over this. I did not ever hear any
response from either TC 46 or the ISO 3166 MA on this matter, however. I
don't know that I would have devised the approach to the handling of
this issue used in this draft had I been its author. I am deeply
concerned that stability be ensured in language tags, however, and if
this is the only way to ensure it I can accept it. 

Of course, your point is that it probably is neither the only nor the
best way to ensure this. I have no comments to counter that opinion.

Regards,
Peter Constable




RE: Last Call on Language Tags (RE: draft-phillips-langtags-08)

2005-01-03 Thread Peter Constable
ly comparisons that are easy
> involve bit-string identity.  Working out, at an application
> level, when two "languages" under the 3066 system are close
> enough that the differences can be ignored for practical
> purposes is quite uncomfortable.   Attempting similar logic for
> this new proposal is mind-boggling, especially if one begins to
> contemplate comparison of a language-locale specification with a
> language-script one -- a situation that I believe from reading
> the spec is easily possible.

RFC 3066 makes reference to a fairly simplistic matching algorithm using
the notion of language-range. The proposed draft would continue to
support that same algorithm with an expectation that implementations of
language-range matching as defined in RFC 3066 would continue to operate
using exactly the same algorithm on new tags permitted by the proposed
revision -- and with generally desirable results. 
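For reference, the language-range rule RFC 3066 describes (a range matches a tag if it equals the tag, or equals a prefix of the tag that is immediately followed by "-", with "*" matching any tag) fits in a few lines of Python. Nothing in it depends on what the subtags mean, which is why it applies unchanged to tags that carry script subtags. The function name is illustrative, and the comparison is case-insensitive because language tags are.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """RFC 3066-style language-range matching (case-insensitive)."""
    if language_range == "*":
        return True  # protocols such as HTTP may narrow what '*' means
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


for language_range, tag in [("zh", "zh-Hant-TW"), ("zh-Hant", "zh-Hant-TW"),
                            ("zh-Han", "zh-Hant"), ("zh-TW", "zh-Hant-TW")]:
    print(language_range, tag, range_matches(language_range, tag))
# zh zh-Hant-TW True
# zh-Hant zh-Hant-TW True
# zh-Han zh-Hant False
# zh-TW zh-Hant-TW False
```

The last case shows that the match is purely positional: a range such as zh-TW does not match zh-Hant-TW, which is where the choice of subtag order discussed in this thread comes into play.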

There may be implementations that use a more complex approach to
matching involving inspection of the tagged content itself, or
inspecting the particular subtags of a language tag. Certainly an
existing RFC 3066 implementation that does the latter will not be aware
of the specific syntax of the proposed revision, though it also cannot
be aware of registered RFC 3066 tags defined after the implementation
was created -- there is no categorical difference here. 

As for how difficult it would be to update such an implementation to use
a sophisticated matching algorithm based on interpretation of individual
subtags permitted by this draft, I grant that there is greater
complexity, but the draft specifically imposes syntactic constraints
that allow different types of sub-elements to be identified quite
readily. 

As for how the different sub-elements would be used for matching, for
instance in recognizing a relationship between a language-region tag and
a language-script tag, those are issues that already exist with valid
RFC 3066 tags such as zh-CN and zh-Hans. I agree that it is not a
trivial matter to decide exactly how such tags relate. 

That does not, however, change the fact that language tags that
incorporate script IDs are useful and appropriate; for instance, in this
particular example, all that was available for tagging Chinese content
for some time were tags like zh-CN and zh-TW, and this was causing very
significant problems for implementations and users, which is precisely
why zh-Hans and zh-Hant have been registered, and why many of us are
eager to see a revision of RFC 3066 that incorporates script IDs.

(Granted, that does not speak to other changes proposed by the draft.)



> That situation almost invites
> profiling of how this specification should be used in different
> circumstances...

I have no particular counter to the opinions you expressed in your
remaining comments.



Peter Constable




RE: Last Call on Language Tags (RE: draft-phillips-langtags-08)

2005-01-03 Thread Peter Constable
> From: Peter Constable

> I'd also like to observe that various members of TC 37 and the ISO
> 639-RA/JAC have observed or participated in the development of this
> draft. For my part, it is not the draft I would have developed if I
> had undertaken it, but I see no problems with it from a TC 37 or ISO
> 639-RA/JAC perspective.

I realized there are some additional comments I should make on the
proposed revision of RFC 3066 from a TC 37 perspective. 

(Note: these comments are offered as an active participant in the work
of TC 37 and as a member of the ISO 639-RA/JAC. They are not official
statements of TC 37 or any of its sub-committees, but I believe they are
a reasonable representation of prevailing opinion within TC 37.)

One of the issues this draft attempts to deal with is potential
instability of ISO identifiers and the damaging impact that can have on
existing content and implementations.

ISO 639-2:1998 specifies that its language identifiers may be changed
given compelling reasons, but that an identifier may not be reassigned
for a period of five years after such a change. ISO 639-1:2002 specifies
that an identifier may not be reassigned for a period of *ten* years
after such a change. In practice, there has been no case in which an ISO
639-1 or ISO 639-2 identifier was withdrawn and later reassigned. The
ISO 639-RA/JAC and TC 37/SC 2 have increasingly taken up concerns for
stability to the point that ISO DIS 639-3 has a very strict stability
policy designed to ensure that declarations made on existing information
objects do not undergo any adverse changes. This includes a restriction that
identifiers that are deprecated may never be reassigned with a different
meaning.

To the extent that this draft attempts to protect language tags from
instability of ISO identifiers, TC 37 considers it very important to
ensure that metadata elements declaring linguistic properties of
information objects have stability in relation to their meaning, but
feels that there is no significant risk of such instability coming from
ISO 639. 

On the other hand, TC 37 has been very concerned about changes that have
been made in ISO 3166 country identifiers in which identifiers that had
prior meanings were reassigned with new meanings. ISO 3166 country
identifiers have been used by applications of the ISO 639 family of
standards to indicate sub-language distinctions, such as differences in
spelling or lexical items. Such changes to country identifiers have
potential for very detrimental effects on applications of the ISO 639
standards.

I note with interest that ccTLDs make use of ISO 3166 in spite of its
potential for instability. In the case of ccTLDs, however, there is a
considerable infrastructure for dealing with this: the DNS and
strict procedures for deploying changes in ccTLDs onto domains. In the
case of language tags, there are no such procedures for deploying
changes in meanings of country identifiers across instances of metadata
elements used to declare linguistic properties of information objects,
nor is anything of that sort feasible in the general case. It may be
that in the context of certain Internet protocols it is feasible to
deploy changes in ISO 3166 across instances of language tags used by
those protocols -- I don't know if this is true for any Internet
protocols or not. It is certainly not true of all applications of ISO
639 standards that also make use of ISO 3166.

In the latter regard, I would like to point out that the IETF
specification RFC 3066 is referenced for use in metadata in many other
places than IETF protocols, one important application of this
specification being its use for the xml:lang attribute in XML. To the
extent that ISO 3166 country codes can be reassigned with new meanings,
the potential for detrimental effects on RFC 3066 language tags at least
in contexts such as XML is of concern to TC 37.

To the extent that the proposed draft aims to protect language tags from
instability of ISO 3166 country identifiers where there is potential for
detrimental effects on metadata elements declaring linguistic properties
of language resources and other information objects, TC 37 would view
the intent to achieve stability as a good thing. It may be that the way in
which it aims to achieve this may not be the best in the IETF context --
that is for IETF and not TC 37 to say. In the long term, though, TC 37
would support measures that would lead to ensuring that language tags
defined by RFC 3066 or its successors are not subject to detrimental
changes in semantics.



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-03 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of Bruce Lilly

> > I don't think it's that uncommon to refer to a specification A that
> makes use of another specification B as an application of B.
> 
> Perhaps, but I think it's best to avoid misunderstanding in
> technical discussion by being precise in use of terminology.

I was being precise. Note that ISO 639 uses "application of language 
identifiers" in exactly the same sense in which I have used "application of RFC 
3066".



Peter Constable



RE: Last Call on Language Tags (RE: draft-phillips-langtags-08)

2005-01-03 Thread Peter Constable
> From: John C Klensin <[EMAIL PROTECTED]>

>   (iii) One way to read this document, and 3066 itself for
>   that matter, is that they constitute a critique of IS
>   639 in terms of its adequacy for Internet use.

Not exactly. It reflects that ISO 639 alone does not support all of the
linguistically-related distinctions that need to be declared about
content on the Internet -- something that ISO 639 itself acknowledges
(in general, not just in relation to the Internet). 

Just as RFC 1766/3066 also use ISO 3166 country codes to make
sub-language distinctions (e.g. to distinguish vocabulary or spelling),
so also there is a need to use ISO 15924 to distinguish between
different written forms of a given language. The proposed draft
incorporates ISO 15924 -- something that very nearly happened in RFC
3066, but did not since ISO 15924 was still in process and (as I see it)
those of us involved needed more time to evaluate the idea (which has
happened in the years since then, to the point that we have confidence
about this step).

RFC 1766/3066 also allowed tags to include subtags used for various
purposes, and some tags have been registered to reflect sub-language
variations other than those that can be captured using country (or
script) IDs. This is another way in which ISO 639 alone is not
sufficient, and the need for tags that include such variant subtags has
been demonstrated. The proposed draft constrains the structure of tags
including such variant subtags so as to avoid haphazard and inconsistent
structuring of tags, which would present significant problems.

(Of course, that is not all that the proposed draft does.)

Thus, I would not describe this as a critique of ISO 639. It simply
reflects a recognition, which ISO 639 itself makes, that there are
language distinctions that often need to be made but that ISO 639 does
not make.



>   From
>   that perspective, the difference between the two is that
>   3066 was prepared specifically to meet known and
>   identifiable Internet protocol requirements that were
>   not in the scope of IS 639.  The new proposal is more
>   general and seems to have much the same scope as ISO
>   639-2 has, or should have.

The scope of what is needed for Internet language tags is greater than
the scope of ISO 639-2; the gap is even larger than my general comments
about ISO 639 above would suggest (and those comments apply equally to
ISO 639-1, ISO 639-2 or ISO DIS 639-3).


>  It is not in the IETF's
>   interest to second-guess the established standards of
>   other standards bodies when that can be avoided and,
>   despite the good efforts of an excellent and qualified
>   choice or tag reviewer, this is not an area in which the
>   IETF (and still less the IANA) are deeply expert.  So
>   there is a case to be made that this draft should be
>   handed off to ISO TC 37 for processing, either for
>   integration into IS 639-2 or, perhaps, as the basis of a
>   new document that integrates the language coding of
>   639-2 with the script coding of IS 15924.

Speaking as a member of TC 37, of the ISO 639-RA Joint Advisory
Committee, and project editor for ISO 639-3, I can say that it would be
possible for TC 37 to take on a project to develop a standard for
language-tags that addresses some of the needs this draft is attempting
to meet, such as integrating ISO 15924. Note, though, that incorporation
of this draft (or even RFC 1766/3066) into ISO 639-2 would be well
beyond the scope of ISO 639-2. Something of this nature would
necessarily involve a distinct standard, and perhaps one that is not
part of the ISO 639 series. 

I'd also like to observe that various members of TC 37 and the ISO
639-RA/JAC have observed or participated in the development of this
draft. For my part, it is not the draft I would have developed if I had
undertaken it, but I see no problems with it from a TC 37 or ISO
639-RA/JAC perspective.


Peter Constable

___
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf


RE: draft-phillips-langtags-08, process, specifications, and extensions

2005-01-02 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of Bruce Lilly

> > There is nothing in RFC 3066 that says a registered tag must have 3 to 8
> > characters in the second subtag. It simply requires that any tag in which
> > the second subtag is 3 to 8 letters must be registered.
> 
>The following rules apply to the second subtag...

> That does not permit tags with two-letter second subtags to be registered
> in the IANA registry; it permits that only for "Tags with second subtags
> of 3 to 8 letters".  Granted, it could be clearer.

Are you familiar with the term "eisegesis"? You are "putting words in the 
mouth" of the RFC. It does not say what you claim. You are very clearly 
mis-interpreting it, as evidenced by the registry and by the text of the RFC 
itself:


   This procedure MAY also be used to register information with the IANA
   about a tag defined by this document, for instance if one wishes to
   make publicly available a reference to the definition for a language
   such as sgn-US (American Sign Language).


You are, quite simply, mistaken on this point.


> > There is no reason to create a separate mechanism. When identifying
> > textual content,
> 
> Language is not exclusively associated with text.  It is also a
> characteristic of spoken (sung, etc.) material (but script is
> not).

True, though at present, the vast majority of linguistic content on the 
Internet is in the form of text. But this draft easily accommodates non-text 
content: don't put in a script ID when it's not an appropriate thing to declare 
about the content.


 
> > the identity of the writing system
> 
> Writing doesn't apply to spoken material, etc.  There is nothing
> in RFC 3282 or MIME that requires that Content-Language and/or
> Accept-Language fields be used exclusively with written text.

And there's nothing in the draft that would require a language tag to have a 
script ID.


> > > In an inappropriate way. Without consideration for backwards
> > > compatibility.  In violation of the BCP that specified the syntax
> > > and registration procedure.
> >
> > Not inappropriate at all.
> 
> Specifying script for audio material is as inappropriate as
> specifying charset. In Internet protocols, we do not burden
> protocols with having to interpret charset information for
> non-text material; we should not do so for script information.

This is silly. It's like arguing that xml:lang is a bad idea in the XML spec 
because it can be used as an attribute on any kind of element, including 
elements that happen not to contain linguistic content. So the draft would make 
it possible for someone to tag audio material with a tag containing a script 
ID; that doesn't mean people are going to be foolish enough to do so.


 
> > And all your repeated comments about lack of consideration for backwards
> > compatibility and violation of syntax and procedures of BCP47 have been
> > shown to be invalid.
> 
> Sorry -- saying so doesn't make it so.  I have explained in
> detail that an RFC 1766/3066 parser cannot be expected to
> make sense of unregistered "sr-Latn-CS" etc.  I have pointed
> to specific second subtag length requirements in RFC 3066 for
> registration.

You have misread RFC 3066 (see above), and that has already been pointed out in 
earlier messages. I've been willing to be corrected when you have shown me to 
be wrong; it's a bit frustrating that you don't seem willing to acknowledge 
such a clear mistake.

And your misreading of RFC 3066 has misled you regarding what an RFC 3066 
parser should or should not be able to do with "sr-Latn-CS".



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-02 Thread Peter Constable
ching
> legal "sr-Cs-Latn" containing script designation with legal "sr-CS"
> (no script specified). 

In your comments here, you are being rather loose in your assessment of what is 
or isn't valid. The tag "sr-Latn" is a registered, valid RFC 3066 language tag. 
The tag "sr-Latn-CS" is not registered, but could be and would be valid if 
registered. The tag "sr-CS" is certainly valid; I have no idea how widely it is 
used. The tag "sr-CS-Latn" would be valid if registered, but is not registered 
(and it is unlikely that, if requested, a consensus could be obtained to 
register it, given the preference among those involved in reviewing requests 
for a different ordering of subtags).

*If* "sr-CS-Latn" were registered (it is not), then a language-range matcher 
*must* match a request of "sr-CS" with content tagged "sr-CS-Latn". In 
precisely the same way, if "sr-Latn-CS" were registered, a language-range 
matcher would, and without modification could, match a request of "sr-Latn" 
with "sr-Latn-CS".

You cannot say that "sr-Latn-CS" has any less or more likelihood of being 
handled by existing language-range matchers than "sr-CS-Latn". Either the 
matchers work per the terms of RFC 3066 or they do not, and RFC 3066 does not 
indicate that either of these is any less valid than the other.



> The proposed draft would make "sr-CS-Latn"
> illegal and would instead require "sr-Latn-CS" which cannot be
> recognized as a valid language tag by an RFC 1766/3066 parser, let
> alone matching against "sr-CS".

There is no reason why an RFC 1766/3066 parser should not recognize 
"sr-Latn-CS" as valid since it conforms to the syntax specified.
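To make the syntax point concrete, here is a minimal sketch of a well-formedness check against the RFC 3066 grammar (Language-Tag = Primary-subtag *( "-" Subtag ), Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)). It checks syntax only, not registration status; the function name is hypothetical and chosen for illustration.

```python
import re

# RFC 3066 grammar, matched case-insensitively:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed_rfc3066(tag: str) -> bool:
    """True if `tag` fits the RFC 3066 syntax; registration is not checked."""
    return _RFC3066_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ["sr-Latn-CS", "sr-CS-Latn", "sr-CS", "haw-US", "en-alcoholic"]:
        print(tag, is_well_formed_rfc3066(tag))
```

Both orderings of script and country pass; "en-alcoholic" fails only because its second subtag has 9 letters.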

A language-range matcher should match "sr-Latn-CS" against a request for 
"sr-Latn", but not "sr-CS". That is by design since a left-prefix matching 
algorithm is limited in what tags it can match, and it is considered more 
important to match for script than for regional variations.
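The left-prefix behaviour described here can be sketched as follows, assuming the RFC 3066 matching rule: a range matches a tag if it equals the tag, or equals a prefix of the tag that is immediately followed by "-", compared case-insensitively. The function name is illustrative, not taken from any specification.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Prefix match: exact equality, or a prefix ending at a '-' boundary."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


if __name__ == "__main__":
    print(range_matches("sr-Latn", "sr-Latn-CS"))  # script-level request matches
    print(range_matches("sr-CS", "sr-Latn-CS"))    # region-level request does not
    print(range_matches("sr", "sr-Latn-CS"))       # language-level request matches
```

The "-" boundary check is what keeps a range such as "sr-L" from matching "sr-Latn".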


> > But you are speaking as though it's a problem that these tags are
> > registered. I have no idea why.
> 
> Registration of a complete tag is not itself a problem.  Registration
> of a complete tag which incorporates script information is not an
> ideal solution to the issue of conveying script information; that
> would be more appropriately done using an orthogonal mechanism to
> convey the orthogonal information...

That's one opinion; there are many who hold a different opinion.



> > But speaking of selective usage, have you noticed that RFC 3454
> > identifies specific characters from ISO/IEC 10646 as prohibited? Various
> > space and control characters are not permitted, INVISIBLE TIMES isn't
> > permitted, END OF AYAH isn't permitted, COMBINING GRAVE TONE MARK isn't
> > permitted... How is what is proposed in this draft any more "cherry-
> > picking" than that?
> 
> 1. RFC 3454 is not BCP, and isn't being pushed through for immediate
>Standards status without a phased roll-in. The draft under discussion
>has been proposed as BCP which would lack phased roll-in.

So acceptability of selective usage depends upon whether the document is a BCP 
or a proposed standard? I cannot see anything in RFC 2026 that suggests that 
(and it seems pretty odd).


> 2. RFC 3454 does not declare any parts of ISO 10646 as not valid and
>does not call for setting up an IANA registry of code points for the
>purpose of effectively declaring ISO 10646 code points invalid.  The
>draft under discussion explicitly seeks to set up a registry to
>replace use of ISO standard list.

RFC 3454 does say that some parts of ISO 10646 are not valid in strings output 
by stringprep implementations, and this draft is analogous. If new characters 
are added to ISO 10646, RFC 3454 could certainly be updated to exclude some of 
those new characters as well; what is proposed in this draft works the same 
way. The only difference is that the values considered invalid for the given 
purpose are documented in the IANA registry rather than in an RFC -- which is 
certainly the easier way to maintain things, though perhaps not the preferred 
means of doing this in the IETF context.


> 3. RFC 3454 does not seek to redefine the meaning of any ISO 10646 code
>points.  The draft under discussion does, as specifically noted in
>the case of the ISO 3166 code "CS".

This draft would not change the meaning of an ISO identifier; it simply does 
not use the latest assigned meaning in case a prior ISO-assigned meaning in use 
on the Internet exists. 

(Note: the draft itself does not entail that CS in particular should be

RE: draft-phillips-langtags-08, process, specifications, and extensions

2005-01-01 Thread Peter Constable
l your repeated comments about lack of 
consideration for backwards compatibility and violation of syntax and 
procedures of BCP47 have been shown to be invalid.



> RFC 3066 doesn't require "haw-US", and if encountered provides for
> matching it (in an "accept" role) with "haw" (as content to be
> provided). "sr-Latn" and "sr-Latn-CS" cannot be matched by an
> RFC 3066-compliant process to anything, since they do not fit the
> RFC 3066 syntax for well-formed language tags.

Certainly they do; and certainly an RFC 3066 parser will match "sr" with 
"sr-Latn" or "sr-Latn-CS", and "sr-Latn" with "sr-Latn-CS".


 
Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-01 Thread Peter Constable
to loss of backwards compatibility.
> >
> > But, as noted above, this is not an issue that is peculiar to the
> > proposed revision -- it already existed in RFC 3066.
> 
> No, given a primary subtag which is a language code (and per RFCs
> 1766 and 3066, that's any primary subtag with 2 or more (RFC 3066
> only, more being limited to 3) characters), the second subtag --
> in either RFC 1766 or RFC 3066 language tags -- is always a country
> code and never a script code.

Go back and read RFC 3066 again. It does not impose that constraint:


   The following rules apply to the second subtag:

   - All 2-letter subtags are interpreted as ISO 3166 alpha-2 country
     codes from [ISO 3166], or subsequently assigned by the ISO 3166
     maintenance agency or governing standardization bodies, denoting
     the area to which this language variant relates.

   - Tags with second subtags of 3 to 8 letters may be registered with
     IANA, according to the rules in chapter 5 of this document.


It must be a country ID *if* it is two letters, but not otherwise.
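Applying only the two quoted rules makes the point mechanical: a two-letter second subtag is always a country code, while a 3-to-8-letter one (such as "Latn") simply requires the whole tag to be registered. This is an illustrative sketch; the function name and the returned labels are hypothetical.

```python
def classify_second_subtag(tag: str) -> str:
    """Apply only the two RFC 3066 second-subtag rules quoted above."""
    subtags = tag.split("-")
    if len(subtags) < 2:
        return "no second subtag"
    second = subtags[1]
    is_letters = second.isascii() and second.isalpha()
    if is_letters and len(second) == 2:
        # Rule 1: two letters are always an ISO 3166 country code.
        return "ISO 3166 country code"
    if is_letters and 3 <= len(second) <= 8:
        # Rule 2: 3 to 8 letters are allowed; the whole tag must be registered.
        return "registered tag"
    # Anything else (digits, a single letter) is outside these two rules.
    return "outside the quoted rules"


if __name__ == "__main__":
    for tag in ["sr-CS", "sr-Latn", "sr-Latn-CS", "haw-US"]:
        print(tag, "->", classify_second_subtag(tag))
```

Nothing in either rule says the second subtag must be a country code when it is longer than two letters.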


> The proposed draft pulls the rug out
> from under existing parsers by changing that.

You are completely mistaken on this point -- the proposed draft does not change 
the constraint you assumed, because that constraint never existed.


> Again you seem to be conflating established Internet Standards Track
> protocols with "applications"

I apparently am using "applications" in a sense you're not familiar with. I 
don't think it's that uncommon to refer to a specification A that makes use of 
another specification B as an application of B.


> and ignoring the critical importance of
> backwards compatibility.

As stated earlier, I quite disagree that back-compat issues have been ignored.


> > Note that there is nothing that prevents other applications from using
> > other matching algorithms, including perhaps something that is able to
> > recognize in "az-AZ" and "az-Latn-AZ" that both involve Azeri and used in
> > Azerbaijan.
> 
> The issue at hand is the existing deployed base of RFC 3066
> implementations that depend on the matching algorithm specified
> therein (which doesn't work with a script tag interposed between
> language code and country code).

You say that these do not work; these implementations will still work, but they 
will match "sr-Latn" but not "sr-CS" with "sr-Latn-CS". If that is a problem, 
please explain why.


> > This is all a discussion we on the IETF-languages list went through five
> > years ago, and in the intervening five years I think we have reached a
> > consensus on these issues, that consensus being reflected in the proposed
> > revision to RFC 3066. (Note that we made the relevant decisions over a
> > year and a half ago when we reached a consensus to register az-Latn etc.
> > The precedent was established then; the proposed revision adds nothing new
> > in this regard.)
> 
> As previously noted, that is a danger recognized by RFC 2026 in
> activity that does not conform to IETF procedures; it is
> possible to reach good consensus on the wrong approach.

Well, that potential was created when RFC 1766 was first approved. Tags like 
az-Latn could have been registered under the terms of that RFC just as readily 
as under RFC 3066.

But you are speaking as though it's a problem that these tags are registered. I 
have no idea why.


> > 7.1 says...

> > The proposed revision does not create Internet-specific versions of ISO
> standards...

> By cherry-picking, it effectively seeks to establish such a version.

I would not call what is done "cherry-picking". Any identifier defined in the 
source standard is valid for use, except in the case that the identifier was 
previously defined with a different meaning in that ISO standard. That isn't 
cherry-picking; that is a blindly-applied general principle, created with 
reasoned motivation: to provide stability.

But speaking of selective usage, have you noticed that RFC 3454 identifies 
specific characters from ISO/IEC 10646 as prohibited? Various space and control 
characters are not permitted, INVISIBLE TIMES isn't permitted, END OF AYAH 
isn't permitted, COMBINING GRAVE TONE MARK isn't permitted... How is what is 
proposed in this draft any more "cherry-picking" than that?


> > 10.1 states a general policy regarding IP...

> The ISO, as developers of ISO 639 and 3166, have rights. In particular,
> they have the right to determine what those standards specify -- in
> whole -- and they have the right to revise and amend those standards,
> and are the sole arbiters of what is (and what is not) "valid".

They certainly have and retain rights over standards for language, script and 
country identifiers. They do not, however, determine what is valid for use in 
Internet protocols. Just as it is appropriate for an IETF document RFC 3454 to 
specify for particular reasons that certain encoded entities of ISO/IEC 10646 
are not valid for Stringprep output, so also it is appropriate for an IETF 
document to specify for particular reasons that certain encoded entities of an 
ISO standard are not valid for use in language tags used on the Internet.



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2004-12-30 Thread Peter Constable
> From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED]


> >Of course it would not be clear if you don't have a conceptual model of
> >what "language" tags are identifiers *of*. When RFC 3066 was being
> >developed, there was a suggestion that script IDs be incorporated, but
> >some were reluctant, raising the same question you have here. I was one of
> >those. But I didn't remain obstructionist over the issue; instead, I gave
> >a fair amount of thought to the ontology that underlies "language" tags,
> >and subsequently published a white paper and presented on the topic at two
> >conferences in the spring and fall of 2002. (Paper is available online at
> >http://www.sil.org/silewp/abstract.asp?ref=2002-003 -- my thinking has
> >evolved since then, but some key results remain valid, I think.)
> 
> May us know which ones?

It would be easier to identify two key points on which my thinking has changed.

IIRC, I was uncertain at the time about what to do wrt sorting. I have since 
concluded that sort order is a presentation issue that, while linguistically 
related, is out of scope for language identifiers. (Note that there is no 
common usage scenario in which it makes sense to declare the sorted order of 
content.) Sort order may certainly be in scope for a locale identifier, but not 
for a "language" tag.

The bigger change is that I have abandoned the fourth main category in the 
ontological model I proposed. At the time, I was still trying to work out where 
something like "Latin America Spanish" fit in. I saw the similarity to 
sub-language varieties / dialects, but at the time thought it needed to be a 
distinct category, for which reason I concocted the notion "domain-specific 
data set". 

I was never very satisfied with that: it wasn't a particularly consistent model 
(a data set is quite a different kind of thing from a language variety) and it 
ignored the similarity with sub-language variety. (And the name was a bit 
unwieldy.) 

I have since realized that I was tripping up on the very problem that was 
blocking the Language Tag Reviewer from accepting the requested registration 
for "es-americas": the assumption that a language tag necessarily refers to a 
conventionally-recognized linguistic identity that exists in the world. 
Language tags are not attributes declared on language varieties; they are 
attributes declared on information objects, indicating linguistic properties of 
those information objects. And the linguistic attributes of an information 
object do not necessarily coincide with conventionally-recognized linguistic 
identities. Of course, in the majority of useful cases they will; but it's not 
hard to show that this is not always the case: e.g. if I present "chat" as an 
expression that could be interpreted in relation to several different 
languages, it would be entirely appropriate for me to declare a linguistic 
attribute of that expression of "indeterminate" since that is precisely my 
intent -- but clearly "indeterminate" doesn't correspond with any particular 
language identity out in the world.

Thus, I came to realize that the kind of distinction intended by "es-americas" 
was just the same kind of distinction made for any sub-language variety: it 
declares that the information object is not only in some particular language, 
but is even more constrained in terms of the language variety in use. It is 
simply coincidental that the more constrained usage in this case doesn't 
coincide with a single dialect used by some identifiable speaker community.



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2004-12-30 Thread Peter Constable
 a *non goal* of the proposed draft to accommodate that level of detail as 
it is not appropriate to try to capture that level of ad hoc detail in a 
general-purpose metadata element.



> >The bigger problem you're pointing out is the limitations of using
> >suffix-truncation alone as a matching algorithm...

> This shows that language matching algorithm should not be addressed in the
> same document. I also submit that this kind of matching policy should be a
> possible decision of the user. Obviously IA rules should be mentionned.

It doesn't show that matching should not be addressed in the same document; it 
merely shows that one particular algorithm doesn't meet all needs. It would be 
possible to move all discussion of matching to another document, but I don't 
see any reason why that must be done. The draft discusses some general 
considerations and leaves plenty of room for separate specifications for 
particular matching algorithms for use in particular applications.
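As an illustration of the kind of separate algorithm an application might specify (this is a hypothetical example, not something defined by RFC 3066 or the draft), the sketch below compares only language and region, skipping a four-letter script subtag. It assumes the draft's language-script-region ordering and two-letter region codes. Unlike prefix matching, it treats "az-AZ" and "az-Latn-AZ" as the same language used in the same country.

```python
from typing import Optional, Tuple


def language_and_region(tag: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: (language, region), ignoring a 4-letter script."""
    parts = tag.lower().split("-")
    language, rest = parts[0], parts[1:]
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        rest = rest[1:]  # skip a script subtag such as "latn"
    region = rest[0] if rest and len(rest[0]) == 2 and rest[0].isalpha() else None
    return language, region


def same_language_and_region(a: str, b: str) -> bool:
    """True if both tags name the same language and the same (or no) region."""
    return language_and_region(a) == language_and_region(b)


if __name__ == "__main__":
    print(same_language_and_region("az-AZ", "az-Latn-AZ"))
    print(same_language_and_region("sr-CS", "sr-Latn-CS"))
```

Because it throws away script information, such a matcher suits some applications and not others, which is an argument for leaving it to separate specifications.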



> > > Surely some types
> > > of script is indicated by the charset; in situations where that
> > > is not the case, a separate mechanism could be used for that
> > > orthogonal parameter without breaking compatibility with
> > > existing parsers of language tags.
> >
> >This is all a discussion we on the IETF-languages list went through five
> >years ago, and in the intervening five years I think we have reached a
> >consensus on these issues, that consensus being reflected in the proposed
> >revision to RFC 3066. (Note that we made the relevant decisions over a
> >year and a half ago when we reached a consensus to register az-Latn etc.
> >The precedent was established then; the proposed revision adds nothing new
> >in this regard.)
> 
> Are we sure that this "others have reached a consensus without your
> objections, so we will not consider them" is a valid form of consensus?

I was merely trying to point out that the questions you are asking are not new, 
that decisions *have* been taken, and that the results are now part of the 
Internet legacy. You are certainly welcome to consider whether there's a better 
way and to propose some entirely new infrastructure for the Internet, but that 
should not prevent those of us who have been working on the evolution of the 
existing infrastructure for the past several years from continuing to move 
forward in that evolution.

Or were you suggesting that at any time anybody should be able to question 
whether standards that have been in use for some time were formed with adequate 
consensus?


 
> > > Please see RFC 2026 sections 7.1, 7.1.1, 7.1.3, and 10.1...

> >7.1 says... The fact that not all are used, or that some are
> >used as they were specified in dated version of the ISO standard is not in
> >contradiction with 7.1 -- it's just one of "several ways in which an
> >external specification... may be adopted."
> 
> I am sorry but this does not stand. The proposed revision directly refers
> to ISO standards while there are Internet documentation of the way they
> should be used.
> 
> Examples/
> 1. OSI 3166 is refered to. RFC 1591 should. RFC 1591 introduces differences
> (we all live with) with OSI 3166 which is taken as a reference to know what
> is a country.
> 2. OSI 639 scripting fr-FR is used while RFC 1958 leads to fr-fr or FR-FR
> or FR-fr indifferently and calls for fra-fr to avoid confusion.
> 
> In RFC 1591and RFC 1958 parlance "en-GB" should therefore be "eng-uk"

RFC 1591 and RFC 1958 are specifications for completely separate protocols. Not 
only is it completely inappropriate to suggest that RFC 1766 or its successors 
must be subject to these unrelated specifications, to do so would break a large 
number of existing implementations of RFC 1766 and RFC 3066 (let alone the 
proposed revision). This is truly nonsense.



> >Thus, I see no difference between RFC 3066 and this proposed revision in
> >relation to compliance with the sections of RFC 2026 you referred to.
> 
> Full agreement. So there is no need for it - except to enhance the RFC 3066
> for its specific applications.  This is OK as long as this is clearly
> stated.

The goals for the proposed revision in enhancing RFC 3066 are clearly stated in 
the draft.


 

Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2004-12-30 Thread Peter Constable
s an explicit cooperative arrangement to do so has been made.
   However, there are several ways in which an external specification
   that is important for the operation and/or evolution of the Internet
   may be adopted for Internet use.


The proposed revision does not create Internet-specific versions of ISO 
standards; it uses IDs drawn from ISO standards with semantics defined in those 
source standards at the time they were adopted for use in language tags -- the 
source for the IDs, the symbols and their meanings all reside in the ISO 
standards. The fact that not all are used, or that some are used as they were 
specified in a dated version of the ISO standard is not in contradiction with 7.1 
-- it's just one of "several ways in which an external specification... may be 
adopted."


7.1.1 simply says that an open external standard may be incorporated merely by 
reference. There is no requirement here that is not met by the proposed 
revision.

7.1.3 simply says that an Internet specification may be an adaptation of an 
external specification provided certain conditions are met. Neither RFC 3066 nor 
the proposed revision is an adaptation of any existing external specification, 
so this is not applicable.

10.1 states a general policy regarding IP: 


   In all matters of intellectual property rights and procedures, the
   intention is to benefit the Internet community and the public at
   large, while respecting the legitimate rights of others.


Again, there is no requirement stated here that is not met by the proposed 
revision. Clearly, the intent of the proposed draft is to benefit the Internet 
community and the public at large. There are no rights of others that are in 
any way violated by the proposed revision.

Thus, I see no difference between RFC 3066 and this proposed revision in 
relation to compliance with the sections of RFC 2026 you referred to.


> Agreed.  But the activity on the ietf-languages list regarding the
> draft under discussion isn't an IETF process -- there is no WG or
> Chair, no charter, etc.  Like the fictional Topsy, it jes' growed up.

RFC 3066 was developed in exactly the same manner as this proposed revision has 
been developed -- as an Internet-Draft prepared by a member of the 
IETF-languages list and processed among members of that list until it was 
submitted for last call and subsequent IESG action.



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2004-12-30 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of Bruce Lilly


> > Do what you feel is warranted, Bruce. You don't appear to be trying to
> > achieve consensus, which is the touchstone of the IETF process as I
> > understand it. If you feel issues should be taken to the IESG, then do so.
> 
> You have yourself noted that the draft is an individual
> submission, not the result of an IETF process. "consensus"
> doesn't apply to an individual effort.  IF you want to
> adhere to IETF process, by all means ask the IESG to set
> up a working group, with a charter, a Chair, etc.; I
> fully support that.

I don't understand why these kinds of comments are arising. To my understanding 
(Harald can correct me if I'm wrong), the process that has been taken in 
preparing the proposed revision of RFC 3066 is the same as what was done in 
development of RFC 3066 as a replacement for RFC 1766. A general consensus was 
achieved on the IETF-languages list in preparing the draft for "RFC1766bis", 
and in exactly the same way a general consensus was achieved on this list in 
the preparation of "RFC3066bis". Subsequent steps were taken with RFC 3066 for 
it to be given BCP status, but that did not involve establishment of a working 
group; I don't understand what should prevent the same thing happening in this 
case.


Peter Constable



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-16 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of Bruce Lilly


> > > The point is that under RFC 3066,
> > > the bilingual ISO language and country code lists are
> > > considered definitive.
> >
> > That is nowhere stated or even suggested in RFC 3066.
> 
> RFC 3066 section 2.2 states, in part:
> 
>    - All 2-letter subtags are interpreted according to assignments found
>      in ISO standard 639, "Code for the representation of names of
>      languages" [ISO 639], or assignments subsequently made by the ISO
>      639 part 1 maintenance agency or governing standardization bodies.
> 
> and has a similar statement regarding ISO 3166.
> 
> "interpreted according to assignments found in" certainly
> sounds as if the ISO lists are considered definitive for
> their respective categories of subtags, since their
> interpretation is specified as that given in those lists.
> I don't see how the RFC 3066 text can be interpreted
> otherwise.

You're now quoting things so far removed from their context that they
are no longer being evaluated fairly. I believed we were talking about
the specific strings, as you had made reference to implementers of
bilingual products not having access to that data. Perhaps I
misunderstood you, but whether or not, the relevant facts are that RFC
3066 referred to ISO source standards to establish the denotation of
identifiers drawn from those standards, and the proposed revision does
the same.


Peter Constable
Microsoft Corporation



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-15 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of John Cowan


> But absolutely nothing except his good sense prevents Michael from
> registering
> en-the-dialect-spoken-on-the-bowery-between-1933-and-1945-by-alcoholic-drug-users-who-live-in-flophouses.

Sub-tags can be at most 8 chars long, so Michael would ask for it to be
changed to something like
en-the-dialect-spoken-on-the-bowery-between-1933-and-1945-by-alcoholc-drug-users-who-live-in-flophses. :-)


Peter Constable
Microsoft Corporation



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-15 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of Bruce Lilly


> > By reading both RFC 2047 and RFC 2231, one
> > finds that they assume that a language tag must be at most 64 characters
> > long...

> > - the shortest charset names are 2 characters long (e.g. "IT")
> 
> Not all charsets have 2-character names...

In determining the longest language tag permitted, one must identify the
shortest possibilities for all other components. 


> > - the minimum encoded-text length is 1 character long
> 
> That is strictly only true for text that meets all of the
> following conditions...

Hey, I just said what the EBNF said.



> > An encoded-word must contain at least 11 characters that are not part of
> > the language tag and have a total length of no more than 75 characters.
> > Therefore, an upper bound on language tags that can be used in an RFC
> > 2047/2231 encoded-word production is 64 characters.
> 
> That is a best case upper bound...

I identified it as such.
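The arithmetic behind the 64-character best case, and the corresponding bound when the 43-character charset name mentioned in this thread is used, can be reproduced as below. This sketch assumes the RFC 2231 form of the RFC 2047 encoded-word, "=?" charset ["*" language] "?" encoding "?" encoded-text "?=", with the 75-character limit and a 1-character encoded-text; the function name is illustrative.

```python
# RFC 2231 form of an RFC 2047 encoded-word:
#   "=?" charset "*" language "?" encoding "?" encoded-text "?="
MAX_ENCODED_WORD = 75  # total length limit for one encoded-word
DELIMITERS = len("=?") + len("*") + len("?") + len("?") + len("?=")  # 7
ENCODING = 1  # "B" or "Q"


def max_language_tag_length(charset: str, encoded_text_len: int = 1) -> int:
    """Longest language tag that still fits in a single encoded-word."""
    overhead = DELIMITERS + ENCODING + len(charset) + encoded_text_len
    return MAX_ENCODED_WORD - overhead


if __name__ == "__main__":
    print(max_language_tag_length("IT"))  # 75 - (7 + 1 + 2 + 1) = 64
    print(max_language_tag_length("Extended_UNIX_Code_Fixed_Width_for_Japanese"))  # 23
```

The 11 non-tag characters are the 7 delimiters, 1 encoding letter, a 2-character charset and 1 character of encoded text; any longer charset or text shrinks the room left for the tag one-for-one.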


> The worst case appears to be the charset named
> Extended_UNIX_Code_Fixed_Width_for_Japanese (43 characters)...

> As mentioned, use of an encoded-word
> plus the necessary whitespace around it to represent a
> single character is rather wasteful, so a brief language tag
> is indicated; fortunately "ja" suffices for text likely to
> be used with that charset.

Of course, the length limitations must be balanced between the charset
tag, the language tag and the encoded-word itself.


> > I see no reason why limits must be added as a
> > constraint in a revision of RFC 3066.
> 
> The primary reason for specifying limits is due to the
> proposed removal of the review/registration process
> which currently limits the length of non-private-use
> tags.

The review/registration process for RFC 3066 registrations does not
impose pre-defined limits that implementers of RFC 3066 can assume in
their parsers.



> > It would be a good idea, however,
> > to point out in section 2.1 of the draft that some applications of this
> > specification may impose limits on the length of accepted language tags,
> > and perhaps to cite RFC 2231 as an example.
> 
> As a general principle, that's fine, however I would point
> out that given the inability of experts to be able to
> accurately point out the limits quickly...  I do
> not think it is sufficient merely to state the fact that
> there are limits, with or without a pointer to RFC 2231 as
> an example.  Some indication of the magnitude of worst-case
> restrictions is at least advisable...

How is it possible to identify the worst-case bound assumed in the
implementations that are out there?

How is it possible to predict ahead of time the worst-case length for
an RFC 3066-registered language tag?

Neither is possible. In light of that, I think it best to remind
implementers of the revised RFC 3066 that some implementations may
impose limits (whether those implementers are constructing tags or
passing them from one process to another), and to encourage them to
build robustness into their implementations so that they can respond
gracefully if an unexpectedly long tag is encountered. After all, no
matter what limit might be imposed in a revision to RFC 3066, there's no
way to stop malware from sending bad data.
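A minimal sketch of such graceful handling, assuming the RFC 3066 syntax (a primary subtag of 1 to 8 ASCII letters, then hyphen-separated subtags of 1 to 8 ASCII letters or digits) and an application-chosen limit. The function name accept_language_tag and the default of 64 (borrowed from the encoded-word best case) are illustrative assumptions, not anything the spec defines.

```python
import re

# RFC 3066 syntax, restricted to ASCII letters and digits.
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

def accept_language_tag(raw: str, max_len: int = 64):
    """Return the tag if well formed and within max_len, else None.

    max_len is an application-specific limit (an assumption here); the
    language-tag spec itself sets no overall length limit.
    """
    if len(raw) > max_len:
        return None  # reject gracefully instead of truncating or crashing
    if _RFC3066_TAG.fullmatch(raw) is None:
        return None  # malformed input is rejected the same way
    return raw
```

Checking the length before running the pattern means a 2k-octet bogus tag is turned away cheaply, and the caller decides what "gracefully" means (fall back to no tag, log, etc.).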

(How *do* encoded-word parsers react if a bogus charset or language tag
that's 2k octets long is encountered? The encoded-word spec already
allows for segmenting long strings; could it not also be revised to
allow segmenting for the parameters, which would also make it more
robust?)


Peter Constable
Microsoft Corporation



___
Ietf mailing list
[EMAIL PROTECTED]
https://www1.ietf.org/mailman/listinfo/ietf


RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-15 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of Bruce Lilly


> Currently sr-CS has a specific
> meaning under RFC 3066; it has had for some time.

The meaning "Serbia and Montenegro" was introduced relatively recently
(a little more than a year ago) and was immediately received with alarm
by many in the IT sector. There were vain attempts to get it reversed,
and that failure was an impetus to introduce protection against such
changes in the revision of RFC 3066. I am not aware of "CS" being used in
the IT sector with the new meaning, though I cannot guarantee that.


Peter Constable
Microsoft Corporation



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-15 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of Bruce Lilly


> > This is a situation we do not intend to repeat.
> 
> That is precisely what would be repeated, and the problem
> would remain.  "CS" currently means "Serbia and Montenegro",
> and its use in accordance with RFC 3066 has precisely that
> meaning.

And that is a significant problem we wish to remedy as there is some
unknown amount of data or implementations out there that use "CS" but
with a different meaning intended.


> > > > The usability flaw in treating ISO 639 and ISO 3166 as human-readable is
> > > > evident in the confusion between ja and JP (or is it jp and JA?),
> [...]
> > It is not uncommon for users to confuse "JA" and "JP".
> 
> Which clearly demonstrates why mere codes in the absence
> of definitions associated with the codes is a pointless
> proposition.

I believe you have confirmed my point, that codes are not meant to be
human readable.

As for your concern regarding definition, it has been clearly pointed
out that codes will not be lacking definitions -- the same definitions
they have today from the same sources (with references made to the same
sources) will still be available.


> > Again, not hypothetical at all.
> 
> Last time I checked, "US" didn't mean France, and "CN"
> didn't mean Canada -- I suggest that you might want to
> brush up on the definition of hypothetical...

The case is hypothetical, but the hypothetical case serves to illustrate
a general scenario, and the general scenario is not hypothetical.



> > You didn't use the term "display names", but it is clearly implied by
> > your reference to bilingual implementations.
> 
> Your inference (which you incorrectly claim as my implication)
> is different from my claim. My claim is that under RFC 3066,
> the definitions...

You have failed to quote what you originally wrote which I claimed made
this implication: you spoke not of definitions but of bilingual
applications.



> > Definitions in multiple languages are not a requisite to establishing
> > the denotation of a coded element.
> 
> True but irrelevant to the point.

Oh? Simply because you make this assertion?


> We now have definitions of
> specific types of elements (viz. country and language tags) in
> multiple languages, and the objection is to the unnecessary
> removal of that characteristic.

The definitions we have now will remain; they will continue to be
referenced and available. I do not see how you can say they are being
removed.


Peter Constable
Microsoft Corporation



Re: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-14 Thread Peter Constable
> From: Vernon Schryver <[EMAIL PROTECTED]>
> Subject: Re: New Last Call: 'Tags for Identifying Languages' to BCP
> To: [EMAIL PROTECTED]
> Message-ID: <[EMAIL PROTECTED]>

> Besides, I didn't say that one should ignore the English, but that
> implementors give precedence to the ABNF.  When you are writing an RFC
> that you hope will be implemented, you MUST remember that programmers
> are lazy.  We transliterate the ABNF to build the parser and so implement
> the syntax and read the English to figure out and so build the semantics.
> As I said, if you must have contradictions between your ABNF and your
> English, you must accept the fact that most technical people will
> assume your ABNF is right and your English is wrong.  That fact seemed
> to me to conflict with statements in this thread, and that suggests a
> problem in your working group and your RFC.

This is somewhat moot, since the author has indicated that the relevant
portion of the ABNF will be revised. In this case, though, the ABNF
could not be said to contradict the English prose: anything permitted by
the constraints specified in the English prose would be recognized by
the ABNF.

It is true that there are strings the ABNF would recognize that the
English prose would not permit, but the revision being made to bring the
ABNF production in question in line with what Bruce Lilly thought it
should be does not change that. The only way to write the ABNF so that
it permits no more and no less than what the English prose specifies
would be to have the production rule simply enumerate a specific set of
terminal strings, which does not seem particularly helpful, especially
when the RFC would establish a machine-readable registry, maintained by
IANA, in which those very strings are enumerated.
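The split described here (a grammar that over-accepts, plus constraints checked against an enumerated registry) can be sketched as two stages. The pattern below is the RFC 3066 syntax standing in for "the ABNF", and REGISTERED_PRIMARY is a tiny hypothetical stand-in for the IANA registry; the function name classify_tag is likewise illustrative.

```python
import re

# Stage 1: context-free syntax check (accepts a superset of valid tags).
SYNTAX = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# Stage 2: constraints stated in prose, enforced by lookup rather than grammar.
# Hypothetical excerpt; a real implementation would load the IANA registry.
REGISTERED_PRIMARY = {"de", "en", "fr", "ja", "sr"}

def classify_tag(tag: str) -> str:
    if not SYNTAX.fullmatch(tag):
        return "syntax error"
    primary = tag.split("-", 1)[0].lower()  # tags are case-insensitive
    if primary not in REGISTERED_PRIMARY:
        return "well formed, not valid"
    return "valid"
```

A string such as "qqqqq-JP" passes the grammar but fails the registry lookup, which is exactly the gap between ABNF and prose that enumeration in the registry closes without bloating the grammar.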


Peter Constable



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-14 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of Bruce Lilly


> > > The point is that under RFC 3066,
> > > the bilingual ISO language and country code lists are
> > > considered definitive.
> >
> > That is nowhere stated or even suggested in RFC 3066.
> 
> RFC 3066 section 2.2 states, in part:
> 
>   - All 2-letter subtags are interpreted according to assignments found
>     in ISO standard 639, "Code for the representation of names of
>     languages" [ISO 639], or assignments subsequently made by the ISO
>     639 part 1 maintenance agency or governing standardization bodies.
> 
> and has a similar statement regarding ISO 3166.
> 
> "interpreted according to assignments found in" certainly
> sounds as if the ISO lists are considered definitive for
> their respective categories of subtags, since their
> interpretation is specified as that given in those lists.
> I don't see how the RFC 3066 text can be interpreted
> otherwise.

RFC 3066 indicates that the *interpretation* is determined by the source
ISO standards. You were discussing display names. (Though, now that I've
shown that display names are out of scope, you appear to be attempting
to recast the discussion as though you had been discussing definitions.)
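To make the distinction concrete: what RFC 3066 section 2.2 fixes is which source governs the interpretation of a subtag, not any display string. A rough sketch of that mapping, covering only the first-subtag rules ("i", "x", 2-letter, 3-letter) and the 2-letter second subtag; the function names are hypothetical and other cases are deliberately left out.

```python
def primary_subtag_source(tag: str) -> str:
    """Source that RFC 3066 section 2.2 assigns to a tag's first subtag."""
    primary = tag.split("-", 1)[0].lower()
    if primary == "i":
        return "IANA registration"
    if primary == "x":
        return "private use"
    if primary.isascii() and primary.isalpha() and len(primary) == 2:
        return "ISO 639 (part 1)"
    if primary.isascii() and primary.isalpha() and len(primary) == 3:
        return "ISO 639-2"
    return "outside this sketch"

def region_source(tag: str):
    """A 2-letter second subtag is interpreted as an ISO 3166 alpha-2 code."""
    parts = tag.lower().split("-")
    if parts[0] in ("i", "x") or len(parts) < 2:
        return None
    second = parts[1]
    if len(second) == 2 and second.isascii() and second.isalpha():
        return "ISO 3166 alpha-2"
    return None
```

Nothing here yields a name in English, French or any other language; a display name would have to come from a separate localization resource.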



Peter Constable
Microsoft Corporation



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-14 Thread Peter Constable
for the RFC 1766/3066/... sequence of specifications.


> > > One possibility would be two description fields.  But the
> > > registry would need a charset closer to ISO-8859-1 than
> > > to ANSI X3.4 as currently specified.  Or an encoding
> > > scheme.
> >
> > Personally, I don't see the value in something like that. Given the
> > intent to have a registry that can be machine-readable, changing its
> > charset from ANSI X3.4 in order to gain descriptors in just one more
> > language is not worth it IMO.
> 
> Fine, use utf-8, which encompasses ANSI X3.4 and
> ISO-8859-1 (plus others).  The point is that ANSI
> X3.4 is inadequate.

There is no point changing the charset to support something that is out of 
scope for the specification.


> > Speaking at least for Microsoft, we're interested in having descriptors
> > in far more than two languages, and we certainly would not blindly base
> > the descriptors we present to our customers solely on what a registry
> > provides, no matter what its charset.
> 
> Surely in going from two (the current situation per
> RFC 3066) to "more than two" indicates that decreasing
> to one (as in the draft proposal) is heading in the
> wrong direction.  It certainly invalidates the claim
> that the proposal is compatible with existing
> implementations, at least one of which does make use
> of the descriptions currently provided in both
> languages in the ISO lists specified by RFC 3066.

Incorrect; you are making false claims about what is specified in RFC 3066. 



Peter Constable
Microsoft Corporation



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-13 Thread Peter Constable
se in RFC 3066. This draft
does not change that; it merely provides some info that may save you
having to go look up the ISO standard, but that info is not the last
word.


> So, you're saying that the ISO definition of "CS" as
> "Serbia and Montenegro" will continue to be valid, with
> that meaning, in a language-tag?

The meaning of an ID in the registry that came from an ISO standard is
the meaning it had in the version of that ISO standard from which it was
obtained. (Typically, that is the current version of the ISO standard at
the time the ID is added to the registry, though the initial registry
being prepared will have some exceptions to resolve pre-existing
ambiguous cases, such as CS.)

If you're really wanting to know what the meaning of "CS" would be per
the proposed draft, the proposal is that it will forever remain valid
with the meaning "Czechoslovakia" as it was originally defined in ISO
3166.
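One way to picture this stability rule is a registry record that carries a disambiguating description and the source it was taken from, and that is never reassigned afterwards. The field names, the _REGISTRY dictionary and the lookup function below are assumptions for illustration; the "CS" entry mirrors the proposal described above rather than any published registry.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RegistryEntry:
    subtag: str
    kind: str         # e.g. "region" (field name is an assumption)
    description: str  # disambiguating annotation, not the ISO definition itself
    source: str       # the standard, and version, the code was taken from

# Hypothetical entry reflecting the proposal described above for "CS".
_REGISTRY = {
    ("region", "cs"): RegistryEntry(
        subtag="CS",
        kind="region",
        description="Czechoslovakia",
        source="ISO 3166 (edition in which CS denoted Czechoslovakia)",
    ),
}

def lookup(kind: str, subtag: str) -> RegistryEntry:
    # Codes are compared case-insensitively. Entries are frozen and never
    # reassigned, so a later ISO change to "CS" does not alter this record.
    return _REGISTRY[(kind, subtag.lower())]
```

Making the record immutable models the intent: the meaning is pinned to the ISO version it came from, and anyone needing more than the description consults that version of the ISO standard.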


> The foolishness is your insistence on trying to tie
> the definitions to a localization issue.

It was you who established it as a localization issue, very clearly:


> Surely, though, this is not a technical argument against the proposal.

Not purely technical, though it presents problems for existing
implementors who provide bilingual support.
Eliminating bilingual descriptions for the language, country (and UN
region) codes leaves implementors in a quandary.




> I haven't specifically discussed "display names"; that is your
> assertion, and not my basis for objection.

You didn't use the term "display names", but it is clearly implied by
your reference to bilingual implementations.


> I refer to the
> definitions and the need to map to and from those definitions
> at either end of the communications channel.  Whether or not
> that happens by "display" is incidental to the issue of the
> number of languages that the definitions are provided in.

Definitions in multiple languages are not a requisite to establishing
the denotation of a coded element. There are widely-adopted coding
standards that establish denotations using one language only. In this
case, though, the denotations of ISO IDs are established by the ISO
standard (and the particular version of that standard) from which they
were obtained. The registry contains a description that disambiguates
which ISO definition is to be used, but is not a replacement for the ISO
definition.


Peter Constable
Microsoft Corporation



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-13 Thread Peter Constable
ot replacements for content of the source
standards themselves

- that we do not need to change the proposed format of the registry to
include descriptions in multiple languages



Peter Constable
Microsoft Corporation



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-13 Thread Peter Constable
he goal that we had wrt stability while eliminating the concern
that English-only annotations for some reason apparently create for you.
Personally, I think the English annotation is helpful, but it seems that
the real solution you're looking for is to remove any annotation
whatsoever so that the situation is closer to what we have under RFC
3066.



> > Display names for languages and countries are not within the scope of
> > RFC 1766 or RFC 3066. It is preposterous to suggest that this draft is
> > not compatible with existing implementations of RFC 3066 on that basis.
> 
> On the contrary, it is preposterous to suggest that codes
> will be attached to text by magic; some human somewhere,
> somehow is going to have to indicate the language to
> something, and it certainly isn't going to be by way of
> a 2- or 3-letter code without some reference to what those
> codes *mean*.  And at the present time, the meaning of
> those codes is defined -- bilingually -- in the ISO
> lists.

RFC 3066 did not even discuss, let alone provide, a means for attaching
display text to codes. It *is* preposterous to suggest that this draft
is incompatible with RFC 3066 on that basis. Again, the more you press
this, the sillier it seems.


> > But
> > you are simply adding localization requirements to a spec for i18n
> > infrastructure, and I consider that not at all appropriate.
> 
> No, I am complaining about removal of internationalized
> definitions associated with language tag components.

No definitions are removed. The draft points to the source ISO standards
just as RFC 3066 does.


> "Localization" would be translation of the French definition
> into some other language.  That is not my concern. My concern
> is the elimination of the French definition in the first place.

No, you have not commented on definitions; you have repeatedly commented
on strings to present to users. Please accept that your arguments on this
matter are empty.


> > > One part of my claim is that non-private-use RFC 3066 tags
> > > up to the present time are no longer than 11 octets in length.
> >
> > Only co-incidently at the present time.
> 
> As mentioned, under RFC 1766/3066 review/registration rules,
> excessively long tags would certainly raise objections. That's
> no coincidence -- it's an intentional design feature.

But "excessive" is not defined anywhere in RFC 1766/3066, and if a very
good reason were presented why a tag x characters long was needed, it
would have to be considered.


> > And so that limit would be a constraint applying for all time to the
> > 'grandfathered' production which concerned you so much.
> 
> And so it can easily be incorporated into that ABNF production.

The productive thing would be for you to provide a suggested revision of
the ABNF to the authors.



Peter Constable
Microsoft Corporation



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-13 Thread Peter Constable
grammar.
> >
> > The main concern was with the "grandfathered" production, but I've shown
> > that that is a non-issue.
> 
> Again, it is an issue that imposes requirements on language
> tag parsers.  What you've shown is that the ABNF is not
> consistent with what was desired to be expressed, and
> that makes it an issue that needs to be addressed.

Again, I believe the bigger issue is not getting the ABNF to express what was 
desired, but rather whether parsers are written to consider only the ABNF or 
the ABNF plus other specified constraints as well.


> > The maximal length issue exists just as much
> > in RFC 3066 due to private-use tags; it is a technical concern that
> > might worth reviewing in RFC 3066bis, however; but it is not
> > insurmountable, and not a new problem.
> 
> Private-use carries its own considerable baggage; aside from
> that, the draft proposal increases the length of non-private
> tags that affect both protocol design and implementations
> from a worst case maximum of 11 octets under RFC 3066...

Worst case at present; a month from now it could be arbitrarily larger. But
I've accepted that it would be an improvement to add constraints on overall
length.


Peter Constable
Microsoft Corporation



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-13 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of Bruce Lilly


> > What is silly is saying that every language tag has to have a date/time
> > attribute associated with it so that computer software managing that text
> > knows the language of that text.
> 
> In the specific cases of the core Internet protocols that
> I have mentioned, there *is* a date/time attribute in the
> form of an RFC [2]822 Date field.  If we're talking about
> some file stored on some machine, every OS that I know of
> has a date/time stamp associated with that file.  If you
> have something else in mind, a concrete description and/
> or example might help.

That is not sufficient for many other implementations of RFC 3066. For
instance, an XML document may well be stored in a file system that has
date/time stamps associated with the file; it might also be stored in a
content management system that does not report creation dates when
returning content. And elements from within that XML document may be
returned as the result of an XPath query or a call into a DOM API, and
those surely cannot be assumed to have creation date/time stamps, though
one certainly must assume that they can have RFC 3066 tags as xml:lang
attributes.
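The contrast can be seen with Python's standard library: an RFC 2822 message exposes a Date field, while an element pulled out of an XML tree carries an inherited xml:lang value but nothing resembling a date. The sample message, the sample document and the helper effective_langs are made up for illustration.

```python
import email
import email.utils
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# An RFC 2822 message carries its own Date field alongside the language tag.
msg = email.message_from_string(
    "Date: Mon, 13 Dec 2004 10:00:00 -0800\n"
    "Content-Language: sr-CS\n"
    "\n"
    "body\n"
)
sent = email.utils.parsedate_to_datetime(msg["Date"])

def effective_langs(elem, inherited=None):
    """Yield (element, xml:lang in scope) pairs; xml:lang is inherited."""
    lang = elem.get(XML_LANG, inherited)
    yield elem, lang
    for child in elem:
        yield from effective_langs(child, lang)

# An element extracted from an XML document has a language but no date.
doc = ET.fromstring(
    '<doc xml:lang="en"><p>Hello</p><p xml:lang="sr-CS">Zdravo</p></doc>'
)
langs = {e.text: lang for e, lang in effective_langs(doc) if e.tag == "p"}
```

Once a `<p>` element is handed to other code on its own, the only language information left is the tag; any interpretation that depends on a timestamp has nothing to work with.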


> I'm not "eager to abolish" "uniqueness".  There never was
> any guarantee that codes would never change. Both RFCs
> 1766 and 3066 specifically mention changes as a fact of
> life.

Some of us consider that fact and the instability particularly of ISO
3166 to be a serious problem. That (not accessibility) was one of the
key reasons for this revision.


> > > SO where are the French definitions?
> >
> > Ask a person who is bilingual in English and French to provide one.
> 
> That would lack definitiveness which characterizes the
> ISO lists.

You started out this thread by talking about display names, not
definitions; hence Mark's suggestion. Now you have switched to talking
about definitions. The draft clearly indicates where one finds the
definitions:

"   o  All 2-character language subtags were defined in the IANA registry
      according to the assignments found in the standard ISO 639..."

I.e. the definition is provided in the registry on the basis of what is
defined in ISO 639; hence if what is indicated in the registry is for
any reason insufficient for your purposes, you consult the definitive
source, the ISO standard.



Peter Constable
Microsoft Corporation



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-13 Thread Peter Constable
> The specification of the
> draft is *NOT* compatible with that existing implementation
> because it removes the existing functionality of official
> descriptions in French of language and country codes. As a
> result of that incompatibility,  the newly proposed
> specification does not work with (at least that one)
> existing implementation (but I agree that that is a crucial
> concern).

Display names for languages and countries are not within the scope of
RFC 1766 or RFC 3066. It is preposterous to suggest that this draft is
not compatible with existing implementations of RFC 3066 on that basis.

 
> > There are 6000 languages spoken on Earth, of which
> > perhaps 600 have a standard written form.
> 
> ISO 639 lists about 650, not precisely 6000.

Between ISO 639-1 and ISO 639-2, fewer than 400 individual languages
are listed. The number 6000 was given as a rough figure, and it is
fairly well known that the number of living languages is on that order.
ISO 639-3 will list over 7000 different individual languages.


> It might be worthwhile considering the differences in the
> way languages tags are used, by whom they are used, and for
> what purpose.  There may well be a substantial difference
> between use of a tag to represent an obscure dialect of a
> dead language in a research paper vs. tagging a piece of
> text in one of the core Internet protocols such as SMTP.
> The draft seems to ignore the needs of the core Internet
> protocols (e.g. unbounded tag length which is incompatible
> with those protocols).

IETF language tags are used in a wide variety of applications. The
parties involved in development of this spec (the authors and others)
have examined these issues for the past several years and have arrived
at this architecture.


> > What is supposed to
> > be privileged about English and French?
> 
> They happen to be the languages in which international
> standards (q.v. the ISO and UN lists) are published.

That is true for ISO standards because the official languages of ISO are
English and French. (Russian is also an official language of ISO, but is
not required.) But this spec is not an ISO standard; it is an IETF
standard. If you can point to IETF requirements that IETF specs must
contain English and French, then that would be a legitimate concern. But
you are simply adding localization requirements to a spec for i18n
infrastructure, and I consider that not at all appropriate.


> > > ABNF from the draft:
> >
> > You're technically right, but your underlying claim (that RFC 3066 tags
> > are bounded in length) is false, as has been shown
> 
> One part of my claim is that non-private-use RFC 3066 tags
> up to the present time are no longer than 11 octets in length.

Only coincidentally, at the present time.


> As the draft, if/when approved, would close that registration
> process, that limit (unless a longer tag is registered in
> the interim) would apply for all time. 

And so that limit would be a constraint applying for all time to the
'grandfathered' production which concerned you so much.
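Once the set of grandfathered tags is closed, a fixed bound for that production can be computed rather than guessed. The set below is only an illustrative subset of tags registered under RFC 1766/3066; the authoritative list would be the registry content on the date of approval.

```python
# Illustrative subset of tags registered under RFC 1766/3066.
# The authoritative list is the registry as of the spec's approval date.
GRANDFATHERED_SAMPLE = {
    "art-lojban", "cel-gaulish", "en-GB-oed", "i-default", "i-enochian",
    "i-klingon", "i-navajo", "sgn-BE-fr", "zh-guoyu", "zh-min-nan",
}

longest = max(GRANDFATHERED_SAMPLE, key=len)
print(longest, len(longest))  # cel-gaulish 11
```

Because no new tags can join the closed set, whatever maximum is computed from the full list on that date holds permanently for the 'grandfathered' production.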



Peter Constable
Microsoft Corporation



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-13 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:ietf-languages-
> [EMAIL PROTECTED] On Behalf Of Bruce Lilly


> The point is that under RFC 3066,
> the bilingual ISO language and country code lists are
> considered definitive.

That is nowhere stated or even suggested in RFC 3066.


Peter Constable
Microsoft Corporation



Re: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-13 Thread Peter Constable
ication to incorporate mechanisms expected in a new part of ISO 639 that
is in preparation but is not available for use at this time.

 

Another (‘variant’) requires sub-tags
to be registered, and requires that the registration indicate the prefix sub-tags
with which they are recommended to be used. While it may still be technically
valid to use a registered variant in some way other than the recommendation,
that will be unlikely (just as certain combinations valid under RFC 3066, such
as ja-DE, are unlikely). Thus, implementers will have a reasonable chance of anticipating
what combinations will be used.
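A sketch of how an implementation might use the recommended prefixes in a variant registration to flag unusual combinations. VARIANT_PREFIXES is hypothetical registry data (the German orthography variants are used only as familiar examples), and uses_recommended_prefix is an illustrative name.

```python
# Hypothetical registry data: each variant lists the prefixes it is
# recommended to be used with.
VARIANT_PREFIXES = {
    "1901": ["de"],
    "1996": ["de"],
}

def uses_recommended_prefix(tag: str, variant: str) -> bool:
    """True if `variant` occurs in `tag` and the tag starts with a
    recommended prefix for that variant (comparison is case-insensitive)."""
    subtags = tag.lower().split("-")
    variant = variant.lower()
    if variant not in subtags:
        return False
    return any(
        subtags[: len(p.split("-"))] == p.lower().split("-")
        for p in VARIANT_PREFIXES.get(variant, [])
    )
```

A tag failing this check is still technically valid; an implementation would treat it as unexpected (as with ja-DE) rather than reject it.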

 

The third of these (‘extension’) is
defined as a mechanism for extending language tags for use in future protocols. There
is an upper limit of 25 extensions, though this RFC does not define limits on
the length of each extension. There are no extensions defined at this time, and
any extension would require specification in the form of a separate RFC. At
such time as one or more extension RFCs are defined, those specifications would
provide some indication of what limits they do or don’t impose on the
length of extensions. In the case of any protocol that supports this proposed
revision to RFC 3066 but does not support extensions, any extensions that may
be included in a language tag are ignorable.

 

Apart from extensions, all of the mechanisms
introduced in the proposed revision were in response to the direction users and
implementers were already going with registered tags under RFC 3066. Thus,
while the proposed revision gives greater provision for lengthy tags, this is
not completely unrestrained, and the practical likelihood of encountering tags
of any given length would be no greater under the proposed revision than it was
under RFC 3066.

 

Even so, various changes were suggested to
highlight issues related to length, specifically with a view to the possibility
that some applications of RFC 3066 (or this proposed revision) would impose
fixed limits on the length of tags. These suggestions included notes in that
regard at key points within the RFC, but also in sub-tag registrations and in
RFCs defining extensions. (For instance, a variant registration would include
not only a recommendation on appropriate prefixes, but also specific comments
on maximal length of tags using the given variant.) There were no suggestions
to impose limits on the length of tags in the RFC itself (just as RFC 3066 does
not impose limits). Basically, limits on length were seen to be a concern belonging
to particular applications of the language-tag spec and not the spec itself,
but significant text would be added to the RFC so that these concerns are
highlighted.

 

 

6. Re an i18n-considerations section: It was
pointed out that language tags are symbolic identifiers with no
culture-specific content; the only i18n consideration related to the
identifiers themselves is the charset, and charset issues are covered in the
section on syntax. Bruce was also concerned about i18n considerations in the
registry (see issue #2, above – lack of French-language descriptions),
but it was pointed out that the content of the registry is not intended as
localization data, that there are well-established precedents for code sets
that are not documented in terms of multilingual content, and therefore that it
was not really necessary to discuss i18n concerns in relation to the registry
(no more than it is necessary to have a section to discuss i18n issues in
relation to the IANA charset registry in RFC 2978).

 

 

In conclusion, I think that some of Bruce’s
concerns were valid, and suggestions for changes have been presented to the
authors accordingly. I believe all of these changes can be considered to be for
clarification purposes, rather than technical changes. (No changes affecting the
set of valid tags have been made.)

 

 

 

Thanks.

 

Peter Constable

GIFT | GPTS | MICROSOFT

 








RE: Ietf-languages Digest, Vol 24, Issue 5

2004-12-13 Thread Peter Constable
 specified.  Or an encoding
> scheme.

Personally, I don't see the value in something like that. Given the intent to 
have a registry that can be machine-readable, changing its charset from ANSI 
X3.4 in order to gain descriptors in just one more language is not worth it 
IMO. 

Speaking at least for Microsoft, we're interested in having descriptors in far 
more than two languages, and we certainly would not blindly base the 
descriptors we present to our customers solely on what a registry provides, no 
matter what its charset.



> > > The ABNF in the draft permits all of the following tags which
> > > are not legal per the RFC 3066 ABNF:
> > >    supercalifragilisticexpialidoceus
> > >    y-----
> > >    x1234567890abc
> > >    a123-xyz
> >
> > In fact, none of these is permitted by the ABNF of the draft.
> 
> ABNF from the draft...

> That means that the "grandfathered"
> production (which is an alternative in the Language-Tag
> production) will match any of the following text tags (comments
> to the right separated by a semicolon):
>    x  ; ALPHA followed by zero repetitions
>    xa ; ALPHA followed by one ALPHA (see alphanum)
>    x- ; ALPHA followed by one HYPHEN
>    supercalifragilisticexpialidoceus ; ALPHA followed by many ALPHAs
>       (see alphanum) (example previously given)
>    x1234567890abc ; ALPHA followed by 13 alphanums
>       (as previously given)
>    a123-xyz ; ALPHA followed by three DIGITs (see alphanum)
>       followed by one HYPHEN followed by three ALPHAs
>       (example previously given)
>    y----- ; ALPHA followed by five HYPHENs (example previously
>       given)
> I say the ABNF from draft -08 (quoted above) allows those;
> you say no.

My mistake; I was thinking beyond the ABNF alone to other constraints imposed 
by the proposed spec.

As you know, the 'grandfathered' production is loose in the ABNF given in the 
draft, but is very tightly constrained elsewhere in the draft: it is limited to 
only items registered under RFC 1766 or RFC 3066 up to the date of acceptance 
of this proposed spec. (In fact, only a subset of those, all explicitly 
identified in the sub-tag registry.) On the date of acceptance, you will be 
able to know precisely what the valid tags that fit under the 'grandfathered' 
production are and will forever be, and it is 100% guaranteed that none of them 
will have any of the forms that seem to concern you.
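The difference between the loose production and the closed set can be shown directly. The pattern LOOSE_GRANDFATHERED is an assumption reconstructed from the comments in the quoted message (ALPHA followed by any mix of alphanums and hyphens), not the draft's exact ABNF, and GRANDFATHERED is an illustrative subset of registered tags.

```python
import re

# Assumed shape of the draft's loose 'grandfathered' production:
#   ALPHA *(alphanum / "-")
LOOSE_GRANDFATHERED = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# Closed set fixed on the date the spec is accepted (illustrative subset).
GRANDFATHERED = {t.lower() for t in (
    "art-lojban", "cel-gaulish", "en-GB-oed", "i-klingon", "zh-min-nan",
)}

def is_grandfathered(tag: str) -> bool:
    # The grammar alone admits far more than the prose allows; the prose
    # limits the production to registered tags, so membership is the real test.
    return tag.lower() in GRANDFATHERED

EXAMPLES = ["supercalifragilisticexpialidoceus", "y-", "y-----",
            "x1234567890abc", "a123-xyz"]
```

Every example string matches the loose pattern, yet none is in the closed set, so a parser that applies the registry constraint never accepts them.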



Peter Constable
Microsoft Corporation



RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-13 Thread Peter Constable
ng with language
> lacks an "Internationalization considerations" section as
> recommended by RFC 2277 (a.k.a. BCP 18).

No more or less shocking than for RFC 3066, regarding which I'm not
aware of any complaints.

I don't quite understand what the critique is here: what is there to
internationalize about language tags? They are symbolic identifiers that
have no culture-specific content. The only possible consideration is the
charset, which for this spec involves ALPHA, DIGIT and "-" only. It's
true that ALPHA and DIGIT are not defined and that it would be better to
do so; it couldn't hurt to have a section for i18n considerations
(wouldn't need to be long). These are very minor concerns, and hardly
"shocking".
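The point about ALPHA and DIGIT matters in practice because common library predicates are Unicode-aware. The sketch below uses the RFC 2234 core rules (ALPHA = %x41-5A / %x61-7A, DIGIT = %x30-39); the helper names are illustrative.

```python
# RFC 2234 core rules: ALPHA = %x41-5A / %x61-7A, DIGIT = %x30-39.
def is_abnf_alpha(ch: str) -> bool:
    return ("A" <= ch <= "Z") or ("a" <= ch <= "z")

def is_abnf_digit(ch: str) -> bool:
    return "0" <= ch <= "9"

def uses_tag_charset(tag: str) -> bool:
    """True if every character is ALPHA, DIGIT or "-" in the ABNF sense."""
    return all(is_abnf_alpha(c) or is_abnf_digit(c) or c == "-" for c in tag)

# str.isalpha() accepts non-ASCII letters, so it is the wrong test here:
print("é".isalpha(), is_abnf_alpha("é"))  # True False
```

Spelling out that ALPHA and DIGIT mean the RFC 2234 core rules would settle this, which is about all an i18n-considerations section would need to say about the identifiers themselves.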


 
> Perhaps even more disturbing is the content of the "IANA
> Considerations" section; the draft predicts that certain things
> will happen ("IANA will"[...]), but doesn't actually direct
> (e.g. "IANA shall") IANA to do anything.  The placement of that
> section does not correspond to current RFC-Editor guidelines
> (it should appear after Security Considerations); also on that
> point, Appendices should precede References.

There is a process issue here, but I have assumed that the authors have
dealt with IANA on that. Otherwise, these are editorial issues -- "even
more disturbing" seems to me to be somewhat overstated.


> Many of the references are obsolete (e.g. RFCs 1327,
> 1521)... and at least one reference ([19])
> gives a bracketed URI rather than the correctly formatted
> RFC reference.  Although reference is made to the "Accept-
> Language" header field, RFC 3282 (the defining RFC for that
> field) is not listed among the references... 

> The formatting of the draft is atrocious

All editorial.


> there is no differentiation between normative and
> informative references, 

A valid concern.

 
> I am extremely surprised that the draft has been published
> at least nine times in such a state of poor formatting and
> poor attention to editorial content (e.g. obsolete and
> missing references), and that it progressed as far as IESG
> last call in such a state, with no Internationalization
> considerations section, etc.

In fairness to the authors, page-oriented plain text is not exactly
conducive to authoring and revising a long document, and a lot of energy
was spent focusing on details that have far more consequence than
formatting. And, as mentioned above, the lack of an i18n-concerns
section is hardly without precedent, and not particularly significant in
the case of this spec. This really feels like nit-picking, IMO. I'm left
wondering if Bruce has been looking for nits to pick because he is...


> ... particularly concerned about the implementation
> ramifications of the proposed changes, especially (as
> noted in detail above):
> 1. the apparent contradiction between the stated
> objectives w.r.t. accessibility of relevant ISO data and
> standards and the reality of the proposal's
> implications (ISO 8601 date format parsing).

As mentioned above, this really is a non-issue.


> 2. the clear contradiction between the claims about
> ABNF compatibility with RFC 3066 and the factual
> incompatibility of certain provisions in the grammar.

The main concern was with the "grandfathered" production, but I've shown
that that is a non-issue. The maximal-length issue exists just as much
in RFC 3066 because of private-use tags. It is a technical concern that
might be worth reviewing in RFC 3066bis, but it is not insurmountable,
and it is not a new problem.
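The maximal-length point can be seen directly in the RFC 3066 grammar, where the repetition of subtags has no upper bound. A minimal Python sketch transcribing the RFC 3066 productions (quoted in the comments) into a regular expression; `RFC3066_TAG` and `matches_rfc3066` are illustrative names, not from either spec:

```python
import re

# RFC 3066, section 2.1:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
# ALPHA and DIGIT are the ASCII core rules, hence the explicit ranges.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def matches_rfc3066(tag: str) -> bool:
    """True if tag is well-formed under the RFC 3066 ABNF."""
    return RFC3066_TAG.fullmatch(tag) is not None


# The "*" places no limit on the number of subtags, so a private-use tag
# with a hundred 8-character subtags is still well-formed.
long_tag = "x-" + "-".join(["abcdefgh"] * 100)
print(len(long_tag), matches_rfc3066(long_tag))
```

Each subtag is bounded at eight characters, but the tag as a whole is not, so RFC 3066 by itself sets no maximum tag length.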



Peter Constable
Microsoft Corporation
