RE: Possible RFC 3683 PR-action
> From: Simon Josefsson [mailto:[EMAIL PROTECTED] > > Frankly, it strikes me as somewhat odd that a body acting as a > > standards-setting organization with public impact might allow any > > technical decision on its specifications to be driven by people > > operating under a cloak of anonymity. Expressing an anonymous voice? > > No problem. Influencing determination of a consensus with public > > impact? That should not be allowed, IMO. > > What if the pseudonymous voice raise a valid technical concern, provide > useful text for a specification, or even co-author a specification? That's having a voice. We can be open to any voice. If a concern has valid technical merits, then that should be evident to others and should drive a consensus on its own. But the consensus can still be determined by identifiable people. > I think decisions should be based on technically sound arguments. Just so. > Whether someone wants to reveal their real identity is not necessarily > correlated to the same person providing useful contributions. True. But neither is the ability to provide useful contributions necessarily correlated with being counted as part of a consensus. Peter ___ IETF mailing list IETF@ietf.org https://www.ietf.org/mailman/listinfo/ietf
Re: Possible RFC 3683 PR-action
From: Russ Housley... > > Since IETF does not vote, it is certainly not an issue here? > > This is not totally true. A WG Chair or Area Director cannot > judge rough consensus if they are unsure if the portion of the > population that is representing a dissenting view is one person > or many different people. This is especially true when there > are a large number of silent observers. Frankly, it strikes me as somewhat odd that a body acting as a standards-setting organization with public impact might allow any technical decision on its specifications to be driven by people operating under a cloak of anonymity. Expressing an anonymous voice? No problem. Influencing determination of a consensus with public impact? That should not be allowed, IMO. Peter Constable
Re: [Ltru] Possible RFC 3683 PR-action
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of > [EMAIL PROTECTED] Randy Presuhn wrote: > However, the vocabulary, style, content, and peculiar world-view of > this latest missive leave me more convinced than ever that "LB" > is indeed JFC Morphin, and that under the terms of RFC 3683 > we are well justified in suspending the posting privileges for that > address. I'll mention a few things in relation to this: First, I believe the record in the archives of the LTRU list will show that at times in the past JFC attempted to circumvent actions taken to limit his posting privileges to that list by using alternate email addresses, signed under his same name. Secondly, at various times since his posting privileges to that list were first limited, the list received mail from posters presenting themselves by a different name but whose vocabulary, style, content and world view were decidedly similar to those of JFC. In short, "LB" is not the first poster to LTRU that I have suspected of being a sock-puppet for JFC. Nor is it just of late that "LB" was suspected of being a sock-puppet for JFC. Also, a quick search for "lbleriot" did not point to anybody engaging in public discussions except on the LTRU or IETF lists. I find that somewhat curious given LB's claim to be a member of a "multilinguistic working list": I suppose there could be a group of multiple individuals working on protocol specifications intended for the Internet or other such public systems but all of whose discussions are conducted on a private list, but the situation would be readily accounted for if, in fact, LB were no more than a sock-puppet. Granted, LB's posts to LTRU have been neither as frequent nor as long as JFC's were before his privileges were suspended. And granted that there is one other online presence for a "lbleriot" who, over some period of time ending early last year, sold a number of items on eBay and apparently had very positive feedback.
Neither of these points leads me to consider Randy's actions to be unreasonable, however. Peter
Re: Last Call: draft-klensin-unicode-escapes (ASCII Escaping ofUnicode Characters) to BCP
I have a terminological objection to this draft, mainly in section 2. I also have other comments on section 2, which I'll mention below. First, terminology: the heading for section 2 has "...Table Position...", and the body refers to "code point position in the table". While the term "code table" could have been used in the Unicode Standard to refer to the encoded entities and their encoding, it is not. The Unicode Standard uses these terms: - It uses "character set" and "character repertoire" for the collection of elements being encoded, and "coded character set" for the set of pairs of such elements and their encoded representations. - It uses "codespace" to refer to a range of numeric values used as encoded representations, and specifically "Unicode codespace" for the range 0 to 10FFFF (hex). - It uses "code point" or "code position" (synonyms) for values in the Unicode codespace. Thus, the appropriate term here is simply "code point" or "code position". "Table position" and "position in the table" are not appropriate since the Standard never uses "table" in this regard. And "code point position" is redundant. Perhaps the wording was attempting to differentiate between code points and various encoded representations of code points. But the latter are not code points, so there isn't really any ambiguity. A possible refinement might be to use "Unicode Scalar Value": this refers to code points other than surrogate code points. By definition in the Standard, encoded characters can only be assigned to a Unicode Scalar Value. I don't see this as a necessary change in the draft, however. Now for other comments on section 2. The draft has: "However, when information about characters is to be processed by people, information about the Unicode code point is preferable to a further encoding of the encoded form of the character." Information about the code point? (It is numeric. It is an integer. It is non-negative. It is in the range 0 to 10FFFF. It is even. It is divisible by 17.
It is the same value as is found on the license plate of John Doe's car.) I think it is the code point itself that is to be preferred. Also, "a further encoding of the encoding form" isn't going to be clear to readers. (I'm not sure myself what these words mean; I can guess at what the author meant, though I am not positive.) Thus, I'd change this text to: "However, when information about characters is to be processed by people, reference to the Unicode code point is preferable to encoded representations of the code point." Now, section 2 is talking about alternate representations of an encoded character, but the flow is a bit mixed up, IMO. The first paragraph says that there are different equivalent representations but that the Unicode code point is preferred. Then the next paragraph revisits the same thing in more detail. The sentence from the first paragraph discussed above, once revised so that it makes a clear statement, already says what paragraph two says in greater detail. Whether a more succinct or more detailed statement is preferred, just say it once. Of course, if the more detailed paragraph two is kept, "code point position in the table" should be changed to "code point". Also from paragraph two: "the UTF-8 encoding or some other short-form encoding" The term "short-form encoding" isn't explained here and may not be understood. I can only guess what is meant. If the intended meaning is what I think (a reference to shortest-form versus non-shortest-form UTF-8), then I don't think it's really relevant. Either way, I'd change the wording to: "the UTF-8 encoding or some other encoding form" (Encoding form is a term defined in the Unicode Standard.) Also: "the other encodes the octets of" I don't think octets are encoded; they are simply referenced using some notational system. Thus, change to: "the other uses the octets of ... in some representation." (This gives parallel wording for the two kinds of reference.)
Finally: "the Unicode code point forms" Drop "forms": "the Unicode code points" Peter Constable
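The distinction drawn in this message, between a code point and its encoded representations, can be made concrete with a short Python sketch (an illustration only, not text from the draft). It treats a code point as an integer in the Unicode codespace 0 to 10FFFF (hex), applies the Unicode Scalar Value rule (surrogate code points D800 to DFFF excluded), and contrasts the conventional U+XXXX reference to the code point itself with the byte sequences produced by the UTF-8, UTF-16 and UTF-32 encoding forms. The function names are hypothetical.

```python
# Sketch: a Unicode code point versus encoded representations of it.
# All function names here are illustrative, not from the draft.

MAX_CODE_POINT = 0x10FFFF                 # upper bound of the Unicode codespace
SURROGATES = range(0xD800, 0xDFFF + 1)    # surrogate code points


def is_code_point(cp: int) -> bool:
    """True if cp lies in the Unicode codespace, 0 to 10FFFF (hex)."""
    return 0 <= cp <= MAX_CODE_POINT


def is_scalar_value(cp: int) -> bool:
    """True for code points other than surrogate code points."""
    return is_code_point(cp) and cp not in SURROGATES


def code_point_ref(cp: int) -> str:
    """A reference to the code point itself, in the usual U+XXXX notation."""
    return f"U+{cp:04X}"


def encoded_bytes(ch: str) -> dict:
    """Bytes of one character in each encoding form (big-endian for UTF-16/32)."""
    return {
        "UTF-8": ch.encode("utf-8").hex(" ").upper(),
        "UTF-16": ch.encode("utf-16-be").hex(" ").upper(),
        "UTF-32": ch.encode("utf-32-be").hex(" ").upper(),
    }


for ch in ["\u00e9", "\U0001F600"]:
    # One code point, three different encoded representations.
    print(code_point_ref(ord(ch)), encoded_bytes(ch))
```

Each character yields different byte sequences under the three encoding forms, yet each has exactly one code point, which is why a reference to the code point itself is the unambiguous choice.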
RE: Last Call: draft-klensin-unicode-escapes (ASCII Escaping ofUnicode Characters) to BCP
From: Stephane Bortzmeyer [mailto:[EMAIL PROTECTED] Sent: Monday, October 22, 2007 4:03 AM >> Also, "a further encoding of the encoding form" isn't going to be >> clear to readers. > > It is a reference to a bad practice (used in URLs, for instance) to > encode twice (for instance in UTF-8, then in %xx escapes of the > bytes). The discussion in that section is about references to characters in general human-readable content, not in URLs. If that is what the wording is referring to, it's extremely opaque. If that's really what the authors intend to talk about, it should be explained -- and the section should be organized better so that it makes sense why that particular thing is being discussed. >> "However, when information about characters is to be processed by >> people, reference to the Unicode code point is preferable to >> encoded representations of the code point." > > That's not more clear to me. How can it not be clear? Human-readable content is discussing a Unicode character and needs to refer to the character in some way. The whole point of this document is about how to refer. Since Unicode character identity is established by the name, the code point and the reference glyph, reference can be made using one of those three things. It appears to me that this document focuses on references based in some way on the code point: is not the key distinction between the code point itself and some encoded representation of the code point? Peter Constable
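The two-layer practice Stephane describes (UTF-8 first, then `%xx` escapes of the resulting bytes) can be shown with Python's standard `urllib.parse` module, whose `quote()` encodes a string as UTF-8 and then writes each octet as a `%XX` escape. This is only a sketch contrasting that representation with a direct reference to the code point; the character chosen is arbitrary.

```python
from urllib.parse import quote, unquote

ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Direct reference to the code point.
code_point_ref = f"U+{ord(ch):04X}"

# Layer 1: the UTF-8 encoding form turns the character into octets.
utf8_octets = ch.encode("utf-8")

# Layer 2: each octet is written as a %XX escape, as in URLs.
# quote() performs both layers, since it encodes str input as UTF-8 by default.
percent_escaped = quote(ch)

# A reader (or program) has to undo both layers to get the character back.
recovered = unquote(percent_escaped)

print(code_point_ref, utf8_octets.hex(" ").upper(), percent_escaped, recovered)
```

The `%C3%A9` form names bytes of an encoding form rather than the character, which illustrates Peter's point that such escapes are a representation of the code point, not a reference to it.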
Re: Last Call: draft-klensin-unicode-escapes (ASCII Escaping ofUnicode Characters) to BCP
I have a terminological objection to this draft, mainly in section 2. I also have other comments on section 2, which I'll mention below. First, terminology: the heading for section 2 has "...Table Position...", and the body refers to "code point position in the table". While the term "code table" could have been used in the Unicode Standard to refer to the encoded entities and their encoding, it is not. The Unicode Standard uses these terms: - It uses "character set" and "character repertoire" for the collection of elements being encoded, and "coded character set" for the set of pairs of such elements and their encoded representations. - It uses "codespace" to refer to a range of numeric values used as encoded representations, and specifically "Unicode codespace" for the range 0 to 10FFFF (hex). - It uses "code point" or "code position" (synonyms) for values in the Unicode codespace. Thus, the appropriate term here is simply "code point" or "code position". "Table position" and "position in the table" are not appropriate since the Standard never uses "table" in this regard. And "code point position" is redundant. Perhaps the wording was attempting to differentiate between code points and various encoded representations of code points. But the latter are not code points per se, so there isn't really any ambiguity. A possible refinement might be to use "Unicode Scalar Value": this refers to code points other than surrogate code points. By definition in the Standard, encoded characters can only be assigned to a Unicode Scalar Value. I don't see this as a necessary change in the draft, however. Now for other comments on section 2. The draft has: "However, when information about characters is to be processed by people, information about the Unicode code point is preferable to a further encoding of the encoded form of the character." Information about the code point?
(The code point of that character is numeric / is an integer / is non-negative / is in the range 0 to 10FFFF / is even / is divisible by 17 / is the same value as the number of days the song "Hey Jude" was on the Top 40 list.) I think it is the code point itself that is to be preferred, not information about it. Also, "a further encoding of the encoding form" isn't going to be clear to readers. (I'm not sure myself what these words mean; I can guess at what the author meant, though I am not positive.) Thus, I'd change this text to: "However, when information about characters is to be processed by people, reference to the Unicode code point is preferable to encoded representations of the code point." Now, section 2 is talking about alternate representations of an encoded character, but the flow is a bit mixed up, IMO. The first paragraph says that there are different equivalent representations but that the Unicode code point is preferred. Then the next paragraph revisits the same thing in more detail. The sentence from the first paragraph discussed above, once revised so that it makes a clear statement, already says what paragraph two says in greater detail. Whether a more succinct or more detailed statement is preferred, just say it once. Of course, if the more detailed paragraph two is kept, "code point position in the table" should be changed to "code point". Also from paragraph two: "the UTF-8 encoding or some other short-form encoding" The term "short-form encoding" isn't explained here and may not be understood. I can only guess what is meant. If the intended meaning is what I think (a reference to shortest-form versus non-shortest-form UTF-8), then I don't think it's really relevant. Either way, I'd change the wording to: "the UTF-8 encoding or some other encoding form" (Encoding form is a term defined in the Unicode Standard.)
Also: "the other encodes the octets of" I don't think octets are encoded; they are simply referenced using some notational system. Thus, change to: "the other uses the octets of ... in some representation." (This gives parallel wording for the two kinds of reference.) Finally: "the Unicode code point forms" Drop "forms": "the Unicode code points" Peter Constable
RE: Petition to the IESG for a PR-action against Jefsey Morfin posted
> From: Dean Anderson [mailto:[EMAIL PROTECTED] > > If it were representative, then one would expect that several others > > monitoring the WG discussions would be providing that confirmation. I > have not > > seen any indication of that happening. > > This is a false premise. First, silence does not indicate agreement. On a list with a couple of dozen members, if someone is generally making useful contributions and a few cranks start harassing that person, it would be completely reasonable to expect that the rest will not simply jump on the bandwagon of the few. If you think otherwise, then I think you have an unreasonably pessimistic prejudice against the majority of participants. As you say, silence does not indicate agreement. Quite so. If your sample was representative, and the majority on the list agreed but were silent, then it would be reasonable to expect they would be prepared to give that confirmation. IOW, if your sample is really representative, then it should be easy to get testimony from witnesses to that effect. > But even > if everyone on the WG did want him removed, their reasons must be due to > actual > and unreasonable misbehavior that prevents the working group from > functioning. > Personal dislike, even if unanimous, is insufficient. No serious > engineering > can be done as a personal popularity contest. Just so. And that is precisely my point: Harald and Doug indicated that they want him removed due to unreasonable misbehaviour, not simply personal dislike; the accuracy or sincerity of Doug's comment was questioned, and so I have offered my testimony supporting what he said. Peter Constable
Re: Petition to the IESG for a PR-action against Jefsey Morfin posted
> From: Dean Anderson <[EMAIL PROTECTED]> > In the message Randy concludes that > > "If anyone wishes to raise an issue, (s)he should do on on the working group > mailing list by posting a message detailing the concern and, if possible, > supplying proposed replacement text." > > But it would seem that Morfin did just exactly that, with a lot of supporting > documentation. It seems to me that Randy Presuhn just doesn't want to address > the concerns raised, nor does he want anyone _else_ to address the concerns. Not the case at all. Everyone else in the WG that was voicing pertinent concerns was doing so (i) in a reasonably clear manner that all could understand (ii) on the list and (iii) whenever appropriate supplying specific suggested revisions to the text. There were occasions on which Mr. Morfin made clear and pertinent comments on the list, and when he did they were welcomed. On some occasions, he suggested specific text, and when he did those suggestions were considered openly. On several occasions, however, he posted messages that tended toward being opaque or overly long or both, and far more often than not he didn't give concrete suggestions for specific textual changes. Within some of those often-lengthy posts he pointed to documents he had placed on other sites, and there were many things that led others in the WG to believe that the material on those other sites was supporting his entirely different agenda rather than the work of the WG. Perhaps some of that content was useful to the work of the WG, but by that point there was already a high level of frustration among many WG members, such that there really was an onus on him to demonstrate that it would be worthwhile to spend the time going off to review them. This he did not do. > In > fact, Randy actually admits in the same message to having advised others _not_ > to review Morfin's objections. That seems to be contrary to Last Call. 
I'm not aware of any occasion on which Randy advised members of the WG not to review Last Call comments that had been submitted in the expected manner on the WG or IETF lists. > The sample, limited as it is, seems to confirm an unjustifiable personal attack > on Morfin based, it seems, on personal dislike and intolerance for his English > language skills IMO your limited sample is not sufficient to support your point. If it were representative, then one would expect that several others monitoring the WG discussions would be providing that confirmation. I have not seen any indication of that happening. Peter Constable
Re: Petition to the IESG for a PR-action against Jefsey Morfin posted
> From: "Doug Ewell" <[EMAIL PROTECTED]> > I wish that everyone who trivializes Harald's proposal as a matter of > "personal dislike" or "silencing anyone with a different opinion" could > have experienced life in the LTRU Working Group for the past 6 months, > where list members were constantly insulted for being Americans or for > being employed by large companies, where "resolved" and out-of-scope > issues were raised over and over again, and where list members became > wary of posting anything at all, for fear their words would be twisted > to mean something completely different. It would not have been > tolerated in any face-to-face working environment. I concur. This is not IMO a matter of personal dislike. Mr. Morfin has made some positive contributions to the LTRU WG which have been appreciated. But those were more than offset by regular and repeated occurrences of the kinds of behaviours Harald and Doug have described. It *has* hindered the WG, and repeated requests for change in behaviour have been of no avail. Peter Constable
RE: RFC 3066 bis Libraries list
> From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED] > Dear Peter, > whatever the way you want to say it, these libraries have now to meet > new specs they had not to meet before. I cannot say whether every existing software library written to conform to RFC 3066 *has to* meet new specs. Certainly there will be some, perhaps many, that would benefit from revision to the new specs. If that was your intent, it would have been clearer to me had you asked people to identify libraries they feel should be revised if the new spec is adopted. Peter Constable
RE: RFC 3066 bis Libraries list
> From: "JFC (Jefsey) Morfin" <[EMAIL PROTECTED]> > RFC 3066 Bis imposes new constraints on the existing language tags > software libraries. We need to be more careful in describing the proposed revision to RFC 3066 (aka RFC 3066 bis) with respect to existing libraries that conform to RFC 3066: every tag valid under the terms of RFC 3066 bis will be recognized by an existing library written to conform to RFC 3066. Not every tag that *could* be recognized by such a library would be valid under RFC 3066 bis, but every tag actually valid today under RFC 3066 is also valid under RFC 3066 bis. The draft was written with careful attention to ensuring compatibility with existing libraries written to support RFC 3066. The draft can be said to impose new constraints that existing libraries would not impose; I don't see how it could be said that the draft imposes new constraints on those libraries. Peter Constable
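The compatibility claim in this message is about syntax, so the direction "RFC 3066 bis tags are recognized by RFC 3066 libraries" can be spot-checked mechanically. The sketch below encodes the RFC 3066 grammar (`Primary-subtag = 1*8ALPHA`, `Subtag = 1*8(ALPHA / DIGIT)`) as a regular expression and runs it over illustrative tags of the shapes the revision introduces: script subtags, a UN M.49 numeric region, and a variant. The function name and sample tags are assumptions for illustration; the check covers well-formedness only, not registration or validity.

```python
import re

# RFC 3066 grammar:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
# Tags are case-insensitive, so both cases are accepted.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_well_formed(tag: str) -> bool:
    """Syntax check only; says nothing about registration or validity."""
    return RFC3066_TAG.fullmatch(tag) is not None


# Illustrative tags in the style of the revision: script subtags (Hant, Latn),
# a numeric region (419), and a variant (1996).
samples = ["en-US", "zh-Hant-TW", "sr-Latn-CS", "es-419", "de-CH-1996"]
for tag in samples:
    print(tag, is_rfc3066_well_formed(tag))
```

As the message notes, the converse does not hold: passing this RFC 3066 check says nothing about whether a tag is valid under RFC 3066 bis.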
RE: [Ltru] Last call comments on LTRU registry andinitialization documents
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John C > Klensin > Aside on the example above (LTRU participants can skip > unless they want to check my logic): "en-Hang" and > "en-Hant" would imply writing English in Korean Hangul > or Traditional Chinese characters respectively. In > addition to those not exactly being common cases, it is > not clear that they are feasible... > Hangul is problematic in a different way. > Unlike Chinese characters, it is definitely phonetic. > But because it is rather carefully designed and > structured around the needs of Korean, it is not clear > to me, in my ignorance, that it could be used to > represent the full range of English phonemes and > syllables with reasonable accuracy. Actually, that's not quite true. There are Korean linguists who have promoted the idea that Hangul script can be adapted for general phonemic transcription of languages. Thus, it is plausible that a user might have English data written in Hangul script. This really is no different from the da-CO kinds of examples: they are not particularly useful because no actual variant of the given language is associated with the given country, but that is only coincidental, and in principle the state of affairs in human demographics could well change such that a variant of this kind does exist. Just as there is no limit to the regions in which a given language can be spoken by a significant population (given sufficient mobility), in principle there is no limit to the scripts that can be used to write a given language (given sufficient ingenuity). The *only* difference for these cases in RFC 3066 is that generative use without registration is sanctioned for country IDs but not for script IDs. And that is not so by explicit design; it resulted simply because we weren't yet sure how script IDs should be integrated into the tags, and because ISO 15924 had not yet been published.
Peter Constable
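The asymmetry described in this message (country IDs usable generatively, script IDs not) can be sketched as a rough reading of RFC 3066's second-subtag rules: a two-letter second subtag is interpreted as an ISO 3166 alpha-2 country code and needs no registration, as in `da-CO`, while a second subtag of 3 to 8 characters, such as a four-letter script code in `en-Hang`, has to be registered with IANA. The function below is a hypothetical simplification: it looks only at the second subtag's shape, treats the `x` (private use) and `i` (IANA-registered) prefixes coarsely, and consults neither the ISO 3166 list nor the IANA registry.

```python
def second_subtag_status(tag: str) -> str:
    """Rough, illustrative RFC 3066 reading of a tag's second subtag."""
    subtags = tag.split("-")
    primary = subtags[0].lower()
    if primary == "x":
        return "private use"
    if primary == "i":
        return "IANA-registered tag"
    if len(subtags) < 2:
        return "no second subtag"
    second = subtags[1]
    if len(second) == 2 and second.isascii() and second.isalpha():
        # Interpreted as an ISO 3166 alpha-2 country code; usable without registration.
        return "country code (generative)"
    if 3 <= len(second) <= 8:
        # Anything else of this length, including a script code, must be registered.
        return "requires IANA registration"
    return "not covered by this sketch"


for tag in ["da-CO", "en-Hang", "en-Hant", "en-US"]:
    print(tag, "->", second_subtag_status(tag))
```

Under this reading `da-CO` and `en-Hang` are equally implausible in practice, but only the latter needs registration, which is the difference the message attributes to RFC 3066's history rather than to deliberate design.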
Re: Enough is enough: Intent to file an RFC 3683 against Jefsey Morfin (Harald Tveit Alvestrand)
> From: Harald Tveit Alvestrand <[EMAIL PROTECTED]> > At the moment, learning that Jefsey's opinions can be ignored is a part of > the initiation process for new IETF participants in the fora he frequents. > I think that's a steep learning curve. The difficulty isn't in learning when they should be ignored, but rather in knowing that they *will* be ignored by others. People can form their own opinions of his comments very quickly, but it takes a while to discover what others' opinions might be. Peter Constable
RE: Last Call: 'Tags for Identifying Languages' to BCP [Re: Ietf Digest, Vol 16, Issue 95]
> From: Brian E Carpenter [mailto:[EMAIL PROTECTED] > If we are reading here about a point of view that was expressed within > the WG, and that the WG did not accept, that seems to be clear enough > for the purposes of a Last Call. I think discussion of details and > of the history could be conducted on the WG list. For my part, I have been responding to what appeared to me to suggest that JFC had found cause to oppose the draft. Since he has since clarified that he supports the draft, to wit, > You do not need to sell your solution. I explained again and again I > support it. then I have no reason to comment further on this list. Peter Constable
Re: Last Call: 'Tags for Identifying Languages' to BCP
> From: "JFC (Jefsey) Morfin" <[EMAIL PROTECTED]> > Subject: Re: Last Call: 'Tags for Identifying Languages' to BCP > >Mr. Morfin appears to me to have no more than a very vague sense of the > >scope of ISO 639-4. > > This is somewhat fun as I am a contributor. To my knowledge, you have never been a member of TC37/SC2/WG1. I cannot rule out the possibility that you have submitted suggestions that found their way to WG1. Peter Constable
RE: Ietf Digest, Vol 16, Issue 95
> From: "JFC (Jefsey) Morfin" <[EMAIL PROTECTED]> > I am sorry to impose again the community, what starts amounting to > ad-hominems. I am used to that, but the quality of the person and the > serious looking of the mail calls for a reponse. In particular in > this case, where two majors points are documented. > Sorry, Peter. Please, Brian advise if inadequate. My comments have not been ad hominem. If you feel they were in error, please provide argumentation and evidence to that effect. > He > >has never provided any specific proposal except a request to permit > >certain private-use tags, which I will return to below. > > Dear Peter, > This kind of repetition now abuse no one. I bored everyone enough in > explaining that two additional subtags were necessary IMHO: the > referent and the context. There is also - a way or another the need > of the date of the reference (this can be a date or included in a subtag). To my recollection, you never gave any concrete proposal indicating how this should be done. If you did, please indicate where in the archive, and I will gladly withdraw that statement. I don't question that you mentioned inclusion of such information within a language tag, but at first glance this is the very kind of thing that *should* be in a distinct attribute. > Just in case: the langtag is not supposed to only support the > written-form attributes, but to be multimodal (cf. Peter Constable). > Please quote the voice, signs, icons, mood, etc. subtags. For any question regarding how a distinction in linguistic variety or written form should be reflected in a tag, the members of the WG provided an answer. > >Two comments: First, Mr. Morfin suggested within the LTRU WG that the > >syntax for language tags should be loosened to permit additional > >characters, such as "." and ":". > > This is a false affirmation. I did two things... 
> - I supported the proposition of an African searcher (they treated of > troll) Please indicate the name of the person who you believe made this contribution, or point to the relevant part of the archives. To my recollection this idea was promoted by "JFC Morfin", aka "Jefsey Morfin", aka "Jean-Francois Charles Morfin", and by "F. Charles". Please provide reason to believe that "F. Charles" is not the sock-puppet of "Jean-Francois Charles Morfin". As has been mentioned, the request was rejected for technical, not ad-hominem, reasons. > The Draft addresses targets you defined a long ago. It was presented > privately (twice) and is now presented as a WG document. The > document having not changed, For clarification, the document underwent considerable change -- enough to merit 12 drafts -- many changes being made in response to the last-call comments on the earlier private drafts. > In a nutshell, I do _not_ believe that a draft crafted by a few > individuals can supports all the relevant distinctions needed to > describe the linguistic and written-form attributes of content as may > be needed for all purposes, commercial and otherwise. In the six months since the WG was formed, you have not suggested any distinctions that could not be made using the proposal in the draft and that the WG found to be appropriate for integration into these tags. You did suggest distinctions that were appropriate for these tags, but the WG pointed out that they are already supported by the proposal. > >- While Mr. Morfin cites ISO 11179, he has never made statements > > that clearly indicate that he actually understands those standards. > > I propose everyone having time to spend to read ISO 11179 and to judge. Since this was an opinion about your statements, they would need to read both ISO 11179 and the archives of the LTRU and IETF lists. > In a recent mail, Peter acknowledged the need to consider ISO 11179 > and explained that ISO 12620 was its equivalent.
On the contrary, I indicated that TC37/SC2/WG1 had affirmed the choice of the project team for ISO 639-6 to apply ISO 12620, a TC37 standard that predates ISO 11179 but is being revised to make normative reference to ISO 11179. This was in a report on the activity of ISO TC37/SC2 and has no direct bearing whatsoever on this draft or the work of the LTRU WG. > >- While Mr. Morfin refers to "an ISO 11179 conformant system", > > none of the ISO 11179 series of standards contains any statement > > of conformance requirements. Thus, no such notion of "ISO 11179 > > conformant" is defined anywhere. > > :-) :-) > > This is the second Historic statement! > Too bad there is Google My point was merely that conformance is not defined in any formal way, therefore must be measured in terms of consistency with the co
Re: Last Call: 'Tags for Identifying Languages' to BCP
than that of a combined conflation of characteristics; each > characteristic can be assigned separate preference values, and irrelevant > characteristics (e.g. script w.r.t. spoken language) can be easily ignored. Negotiation of separate attributes involving inter-related characteristics is *not* simpler, as pointed out above. The draft fully allows for irrelevant characteristics (e.g. script wrt audio content) to be ignored. Again, what has been provided in the draft is in accordance with the charter of the WG. > As negotiation and related issues represent a critical technical issue for > the design of language tags (viz. keeping separate characteristics out of > *language* tags), it is essential that such negotiation issues be > considered > carefully before specifying the format of tags. Unfortunately, that has > not > been done, and considering the published WG milestones it appears that > that > issue has not been taken into consideration... However, it > appears that the WG has not considered the issues, with the effect that > the > WG product lacks the "particular care" expected of BCP documents (RFC > 2026). It is unclear on what basis it is asserted that these issues have not been considered by the WG. I believe most of the WG members would feel that they have been reasonably taken into consideration. Again, what has been submitted for last call is in accordance with the charter; just as it is not reliable to infer something about a content provider from a language tag, so also it is not reliable to infer from the order of milestones in the charter that matching issues were not taken into consideration in preparation of these drafts. > Note that it is not the registration procedural issues that are typical of > BCP documents that are problematic; rather it is the conflation of > separate > characteristics into a single tag syntax, specified in the same document, > which raises problems related to content negotiation. 
Bruce asserts (a) that there is conflation of separate characteristics, and that (b) this creates problems in content negotiation. The WG determined that the characteristics conflated into a single tag are not independent, and that it would be *separation* into separate attributes that would result in problems in content negotiation, not their combination into a single attribute. > Another large part of > the problem is WG management; in addition to the issues raised by John > Klensin the last time that LTRU participation was discussed on the IETF > discussion list -- and with which I wholeheartedly agree -- it appears > that > management of WG participant conduct has been rather lax; proponents of > the > individual submission effort who are participating in the WG tend to > resort > to ad-hominem attacks when a problem is identified or when an alternative > approach is raised, with no visible intervention by the WG co-chairs. > That > has also (i.e. in addition to the factors which John identified) had the > effect of limiting WG participation by individuals. It's unclear what bearing this has on what improvements can be made to the drafts in fulfillment of the WG charter. I believe several WG participants felt that management of conduct was lax, particularly in relation to a very small number of participants with a penchant for certain behaviours that would have challenged the best of moderators. As for the accusation that proponents of an earlier individual submission engaged in ad-hominem attacks that went without intervention by the WG co-chairs, resulting in the limitation of participation in the WG by other individuals, in the absence of specific evidence, this appears itself to be no more than an ad-hominem attack on those individuals and on the WG co-chairs. 
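A minimal sketch of the point that language and script are not independent, so negotiating them as separate attributes loses information that a combined tag keeps. The preference lists, tags and function names here are hypothetical, invented for illustration, and are not taken from the draft.

```python
# Hypothetical illustration: preferences and tags are invented for the example.

def matches_combined(doc_tag, accepted_tags):
    """Combined tags: the reader lists language+script pairs together."""
    return doc_tag.lower() in {t.lower() for t in accepted_tags}


def matches_separate(doc_lang, doc_script, accepted_langs, accepted_scripts):
    """Separate attributes: language and script are checked independently."""
    return doc_lang in accepted_langs and doc_script in accepted_scripts


# A reader who wants Chinese in Traditional script, or English in Latin script.
combined_prefs = ["zh-Hant", "en-Latn"]
langs, scripts = {"zh", "en"}, {"Hant", "Latn"}

# A romanized (Latin-script) Chinese document.
print(matches_combined("zh-Latn", combined_prefs))    # False
print(matches_separate("zh", "Latn", langs, scripts))  # True: a false match
```

With separate attributes, each characteristic matches on its own, so the pairing the reader actually asked for is lost.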
To my knowledge, there was only one individual in relation to whom other members of the WG acted in any way that might discourage or hinder his participation, and such actions arose only in response to repeated provocation from that individual. > Specification of "language" tag syntax which conflates other content > characteristics prior to open and professional discussion of negotiation > issues and alternative approaches would be a premature lock-in of a design > choice. As the document under discussion specifies a conflation of such > characteristics without open discussion It is asserted that there has been no open discussion of the matter of conflation. This is untrue. It is asserted that there has been no open discussion of alternatives; the only concrete alternative presented for discussion was to have separate language and script tags, which alternative was considered and rejected due to problems that arise in content negotiation. The drafts submitted for review are in accordance with the charter, and I believe I can say that in the opinion of WG members matters of conflation and of negotiation issues were taken into consideration, and were discussed in an open and professional manner. Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: Last Call: 'Tags for Identifying Languages' to BCP
> From: "JFC (Jefsey) Morfin" <[EMAIL PROTECTED]> > >XML, HTML, etc. are not IETF protocols and should not be the main > >consideration in IETF work on IETF documents, > > They are specifically quoted by the Charter. Also is CLDR... These are cited in the charter only as examples in a statement to the effect that "the RFC 3066 standard for language tags has been widely adopted in various protocols and text formats..." > It is to note that ISO 639-4 work is about discussing guidelines in > that area. This work is under way and was not considered. Mr. Morfin appears to me to have no more than a very vague sense of the scope of ISO 639-4. Peter Constable
RE: Last Call: 'Tags for Identifying Languages' to BCP
> From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED] > This means that the legitimate URI tag: > "tags:x-tags.org:constable.english.x-tag.org" > must be accommodated into the format > "x----etc." instead of > "0-x-tags.org:constable.english.x-tag.org" As I mentioned in another message, Mr. Morfin submitted a request to the WG that the syntax in the draft be loosened to permit tags of the form indicated, and the consensus of everyone else in the WG was to reject that request on the basis that (i) it would result in backward incompatibility with existing processes designed to conform to RFC 3066, and (ii) it was possible to create a scheme for semantically equivalent tags without breaking compatibility with RFC 3066. > Peter takes a loosely applied chancy non-exclusive proposition, to > make it the significantly constrained exclusive rule of the Internet > instead of correcting it and following the ISO innovation (ISO 639-6 > and ISO 11179) as directed by the Charter. This permits him to > exclude competitive propositions following or preceding that innovation. The LTRU charter makes no reference whatsoever to ISO 639-6 or to ISO 11179. As I have explained elsewhere, Mr. Morfin's suggestion that the draft is incompatible with ISO 11179 while his alternative would be conformant is far from valid. Finally, I have not excluded competing propositions; I was one voice among many that rejected a request to permit "." and ":" in the syntax, and to my recollection no other concrete proposal wrt syntax, let alone an overall system of metadata elements, was submitted by Mr. Morfin to the WG. > With the trick above: length and character wise a private tag is a subtag. > and the lack of explanation of how billions of machines will > know about the daily updated version of his 600 K file, without > anyone paying for it, but me and the like. It is completely unclear on what basis Mr. Morfin is suggesting that billions of machines will need to update "my" (??
I did not create it!) 600K file on a daily basis. There is no indication or likelihood that the language subtag registry proposed by this draft will change with a frequency approaching anything close to daily. Indeed, it is entirely likely that it will change rather less frequently than the RFC 3066 registry was likely to change. Peter Constable
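Reason (i) above can be made concrete. RFC 3066 (section 2.1) defines a tag as a primary subtag of 1 to 8 letters followed by any number of "-"-separated subtags of 1 to 8 letters or digits. A validator built on that grammar, sketched here as a regular expression (the regex and function name are illustrative, not taken from any particular implementation), rejects "." and ":" outright, which is why loosening the syntax would break existing processes.

```python
import re

# RFC 3066, section 2.1:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_well_formed(tag):
    """Hypothetical helper: True if the whole string fits the RFC 3066 grammar."""
    return RFC3066_TAG.fullmatch(tag) is not None


print(is_rfc3066_well_formed("en-US"))  # True
print(is_rfc3066_well_formed("tags:x-tags.org:constable.english.x-tag.org"))  # False
```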
RE: STD (was: Last Call: 'Tags for Identifying Languages' to BCP)
onstitute a simpler solution. Given the widespread existing use of RFC 3066 tags, use of ISO 639-6 would have to go alongside use of multi-part tags of the form permitted by RFC 3066, which is certainly not simpler than what is specified in the draft. > >Your statement doesn't contradict anything that Debbie has said, > >provided the context is ISO 639-6 alone. If we were to talk about > >incorporation of ISO 639-6 into a revision of RFC 3066, however, then > >duplication would become an issue for consideration. > > This is the WG-ltru Charter that all the ISO codes be included. The charter makes reference to "the underlying ISO standards"; that is, to the ISO standards referenced in RFC 3066 or those cited in the charter to be incorporated into the updated RFC. The charter does not cite ISO 639-6, let alone state that "all the ISO codes be included". > Nice to see that ISO 11179 is accepted now. Peter Constable and the > WG-ltru have opposed the reference to ISO 11179 model. This model > permits to conceptualise languages and to include in their > description an unlimited number of additional elements. This is in no way implied by ISO 11179. The model of that standard assumes that metadata elements designate concepts within some conceptual system, and that the system of metadata elements includes a meta-model that reflects that conceptual system. This would have the effect of *constraining* the concepts represented to entities within that conceptual model. Those entities may be an infinite set, but the set of entities that can be represented by the tags defined by this draft would not increase in number if the draft were changed to reference ISO 11179. > But ISO 11179 totally open the concept... Clearly either Mr. Morfin does not understand ISO 11179 or, if he does, he has totally failed to express a statement consistent with that understanding.
> I would then advise that the Draft is sent back to the WG-ltru, with > the suggestion that a lexicon is provided which would define what is > a "language", a "script", a "country", and the purpose (informative, > descriptive, normative?) of a langtag. This might be a big step ahead. Mr. Morfin submitted a request to the WG that these terms be defined. The consensus of everyone else in the WG was that this was not necessary since it would not significantly alter the ability of anyone to implement or use the specification. Peter Constable
Re: Last Call: 'Tags for Identifying Languages' to BCP
> From: Bruce Lilly <[EMAIL PROTECTED]> > It's unclear what you're trying to get at here. A URI scheme is a > protocol element (an "assigned number") registered by IANA, not a > piece of text (see RFCs 1958 and 2277). As such, it has no need of > an indication of language, for it has no language; it is a language-independent protocol element. This point was made in response to Mr. Morfin on more than one occasion within the LTRU WG. He appears to be unwilling to accept it, however. > ought to be a means of indicating language in IDNs. However, that is > primarily an issue with the IDN specification(s), not with the document > under discussion (except to the extent that the document under > discussion extends the likely length of tags In comparison to RFC 3066, the draft does not extend the likely length of tags. The likely length of tags is precisely the same as before; the main difference is that this draft imposes significant structural constraints on tags. Peter Constable
Re: Unicode points
> From: Bruce Lilly <[EMAIL PROTECTED]> > I apologize for not being sufficiently clear. But part of the issue appears to be one of being sufficiently informed. > Given the flip-flop on musical notation, I expect that the consortium > will have no trouble finding other non-text things to encode (smileys, > aromatic hydrocarbon chemical symbols (very fertile territory, no pun > intended), dance notation, logos, traffic symbols, etc.). There has been no flip-flop on such things. There were never any guarantees that musical symbols would not be part of the UCS. There will be further symbols added to the UCS, and there is no certainty of exactly what, but it is by no means open-ended. > > The range of Unicode characters is defined in > > <http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf>, page 24, as 0 to > > 10FFFF(hex), which is 1.114.111 decimal - quite a bit larger than 65536, > > but quite a bit smaller than 4 billion. > > Now, yes, but I have about as much faith that that won't expand as I > now have in the "Unicode characters are sixteen bits" statement which > was true in its day. That merely indicates that you are not fully informed about the development of Unicode and ISO 10646. Nothing whatsoever has happened to increase the likelihood of the codespace ever going beyond U+10FFFF. Rather, much has been done to ensure that it does not, as reflected by recent action within JTC1/SC2/WG2 to that effect. > I did; in case you missed it, I quoted from the Unicode Standard > itself, viz. "Graphologies unrelated to text, such as musical and > dance notations, are outside the scope of the Unicode Standard". That means that musical or dance notation and complete notational systems are beyond the scope of the Unicode Standard itself, as are mathematical formulas. That does *not* mean that the text elements -- the individual symbols -- that are used in those notation systems are necessarily out of scope for the Unicode Standard.
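The arithmetic behind the quoted range is easy to check: U+10FFFF is 1,114,111 decimal, so the codespace holds 1,114,112 code points, i.e. 17 planes of 65,536. Python's `sys.maxunicode` reports the same ceiling (Python 3.3 and later), and `chr()` refuses anything above it.

```python
import sys

MAX_CODE_POINT = 0x10FFFF  # U+10FFFF, the top of the Unicode/ISO 10646 codespace

print(MAX_CODE_POINT)                      # 1114111
print((MAX_CODE_POINT + 1) // 0x10000)     # 17 planes of 65,536 code points
print(sys.maxunicode == MAX_CODE_POINT)    # True on Python 3.3 and later

try:
    chr(MAX_CODE_POINT + 1)
except ValueError:
    print("U+110000 is outside the codespace")
```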
> That appeared in the description of the "plain text" principle, > before that sentence was elided following the abandonment of that > principle. You seem to think that some principle has been abandoned, but it has not. Peter Constable
Re: The process/WG/BCP/langtags mess...
> From: Vernon Schryver <[EMAIL PROTECTED]> > Subject: Re: The process/WG/BCP/langtags mess... > That's fine, but does suggest some questions: > > - Is the Last Call over? > > - If so, was its result "no supporting consensus"? > > - If the result was "no supporting consensus", will the current document > nevertheless be published as a BCP? > > - If the result was "no supporting consensus", will a revision of > the document be published as a BCP without a new Last Call? > > Last week I saw a comment that seemed to answer first question with Yes. > If the answers to the other questions are not Yes, No, and No, then > as others have said, the IETF has far more serious process problems > than how to account for the expenses of the to be hired help. This comment is a general one rather than being directed toward the particular case at hand. It seems to me that your comment is making a presumption, as a participant on the IETF list, regarding what the answer to the question about the result must be. Perhaps I am wrong, but I would have thought it is the role of the IESG to make that determination, not members of this list; and if that is the case I would certainly think it possible for them to weigh concerns that have been raised against responses provided and reach a conclusion that there has been adequate disposition of the comments raised. Again, I am not saying that in this case I think that is what the IESG will or might or should do; only that in general I would think it is something that they *could* do, in which case the outcome of their decision even when concerns have been raised cannot be assumed a priori. > If outside groups can publish IETF BCPs without the let, leave, or > hindrance of the IETF, then the honest thing to do is to get rid > of all of that tiresome WG stuff. No outside group is doing this.
> On the other hand, if the answers are Yes, Yes, No, and No, then > contrary to the other person's request, there is no good reason to > talk about the language tags document here and now. I agree that a yes to the first question -- is the last call closed? -- would appear to be adequate grounds for there to be no further discussion on this list in relation to the I-D in question. Whether there may be grounds for discussing other process-related questions possibly including the area of work to which this I-D pertained is, of course, a separate question. Peter Constable
Re: The process/WG/BCP/langtags mess...
> From: Vernon Schryver <[EMAIL PROTECTED]> > > In fact we feel that we've been very considerate > > and open in the development of this draft in the language tagging > > community and continue to be open to comments and criticism, no > > matter the source. > > Based on what I have seen in this mailing list, I disagree. I'd be curious to know what has led to the impression that the authors have not been open to comments or criticism. > He is basically saying "You must publish our > BCP because we followed all of the steps as we understood them and the > default result of that is surely to publish." I am unable to see how you derive that from his message. Rather, he appears to be saying that if there is not enough consensus for acceptance of this draft, then surely we should be able to find a way for stakeholders to continue to work together toward a draft that does achieve consensus. Peter Constable
RE: Language tags, the phillips draft, and procedures
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of John C Klensin > I'd like > to suggest that everyone voluntarily declare a cooling-off > period... > Please don't try to > answer that question today, especially on the IETF list. I'll respect that request. I'll only comment that I think both you and Kristin failed to identify where the real dichotomies lie. For instance, your second suggestion was to think about the contradiction between the two positions, but in fact the supporters of the draft would describe their position as involving elements of both of the two opposing positions in your analysis. Some of those who have raised concerns with the draft have expressed frustration at not being heard, which is a reasonable complaint, and I have made a real attempt to understand those concerns. (E.g. the last sequence of exchanges between Ned and me, and my acknowledgment of comments you've made wrt process and WGs.) Please understand that there may also be frustration for supporters of the draft from a perception that their position is not being understood, which may result for instance from analyses of the opposing views that really don't capture their position at all. For my part, I won't say I'm frustrated by the analysis you gave; just disappointed that I haven't been able to get us closer to the place where we agree on what the dichotomies are, which I had hoped to do. Peter Constable
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
John: > Peter, just to clarify... In my opinion (which isn't necessarily > worth much) (I sincerely doubt that's the case.) >, the procedures that were followed were perfectly > reasonable. Anyone can form a design team and put a document > together, and there are no rules that bar such a design team > from using and building on a mailing list set up for something > else. That may or may not be wise, but it is certainly > permitted. The only place this runs into a problem is if > someone presumes that a document developed in the way this one > was developed is equivalent to a WG product, or that it is > entitled to the presumptions of relevancy and correctness that > go with a WG product. I can't speak for the authors. I was not familiar with those distinctions when the process began, and I suspect that is true of others on the IETF-languages list who contributed. In my mind we were following a precedent that implied not only a permitted procedure but an entirely appropriate one. I think all of us now understand, at least in part, that some distinctions exist that may have practical implications for how something is received by the IETF community and processed by the IESG. > From that point of view, it is nothing > more or less than an individual submission (or the output of a > self-defined design team) and the comments Dave and I have been > making apply. I don't think I have questioned the applicability of your comments in this regard at any point. Peter Constable
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of John C Klensin > > This reflects a fundamental misunderstanding of what the draft > > does compared to what RFC 3066 does. It imposes *more* > > restraints on language tags, not fewer. > > It also very explicitly permits talking about scripts, not just > languages and countries. That, to me, is an extension, > regardless of the additional constraints. There may be a disagreement here due to a difference of perspective: one could say that the grammar is more extensive, but that makes the formal language less extensive. So, I suppose whether one considers such a revision "an extension" depends on one's perspective. Note that while the draft permits "talking about" scripts, RFC 3066 permits "talking about" *anything*. More extensive grammar, less extensive language (and vice versa). > And, as Ned as > pointed out repeatedly, there are things that can be done in > 3066 parsers/interpreters in practice that have to be done > differently in this new system. I think this claim can only be made on the basis of assumptions not found in RFC 3066. Ned has most recently said, "3066bis provides a reliable way to locate country codes in all cases, but the algorithm is different. And this is a non-backwards-compatible change." The fact that it can identify country codes in all cases but requires a different algorithm does not imply a non-backwards-compatible change, since it is new functionality -- it is doing something that wasn't even possible in RFC 3066. Backwards compatibility cannot be measured in terms of whether new processors require different algorithms to achieve new functionality.
It can only be measured in terms of whether new processors can perform correct operations (correct according to the specification for those processors -- the proposed draft) on existing tags, and whether existing processors can perform correct operations (correct according to the specification of those processors -- RFC 3066) on new tags. This draft permits both. Peter Constable
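One way to make the second half of that test concrete: model an existing RFC 3066 processor as (a) a check against the RFC 3066 tag grammar and (b) the RFC 3066 language-range rule (a range matches a tag if it equals the tag, or equals a prefix of it followed by "-"), then feed it tags of the shapes the draft sanctions. This is a minimal sketch assuming those two operations are what an existing processor does; the function names are hypothetical.

```python
import re

# RFC 3066, section 2.1: a primary subtag of 1-8 letters, followed by any
# number of "-"-separated subtags of 1-8 letters or digits.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def existing_processor_accepts(tag):
    """Hypothetical stand-in for an RFC 3066 well-formedness check."""
    return RFC3066_TAG.fullmatch(tag) is not None


def rfc3066_range_matches(language_range, tag):
    """RFC 3066 language-range rule: exact match, or a prefix of the tag
    followed by "-". Comparison is case-insensitive."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


# Tags of the shapes the draft sanctions (language, script, region, variant).
draft_style_tags = ["zh-Hant-TW", "sr-Latn-CS", "es-419", "de-CH-1996"]

for tag in draft_style_tags:
    print(tag, existing_processor_accepts(tag))  # True for every tag

print(rfc3066_range_matches("zh", "zh-Hant-TW"))    # True
print(rfc3066_range_matches("zh-H", "zh-Hant-TW"))  # False
```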
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > > RFC 3066 left us with bigger problems: it doesn't give us any > > way to identify pieces that we would be encountering in registered tags > > (apart from hard-coded tables compiled from versions of the registry > > that pre-exist a given implementation). > > With, as you point out below, one important exception: It did have a way > to > reliably identify a country code in most cases (but not all). If "in most cases" means from among tags in use today under the terms of RFC 3066 (as John Cowan would say, "what is true"), then yes. But if "in most cases" means from among tags permitted by RFC 3066 (as John Cowan might say, "what is the rule") -- including some that users have been wanting to use but have delayed using pending a revision of RFC 3066 -- then no: RFC 3066 allowed for reliable identification of a country code in only a small portion of all possible cases: only if it occurred as the second subtag following an ISO 639 code (it does not prohibit a country code from occurring anywhere after the first subtag). > And this ability > to say "2 character subtag in the second position, most be a country code" > was > quite useful even though it might miss other occurences of country codes > in some cases. The draft would still grant the ability to make that statement, and would permit new implementations never to miss *any* occurrences of country codes. > 3066bis provides a reliable way to locate country codes in all cases, but > the > algorithm is different. And this is a non-backwards-compatible change. Surely this has been the point of greatest contention in this discussion, and is clearly not obvious, for there are several who have repeatedly indicated that they do not see any such backwards non-compatibility. Please, anyone claiming there would be incompatibility, be pedantic: define whatever terms, make explicit whatever assumptions are required to support this claim.
(I suspect the root of this disagreement lies in unstated assumptions.) Those who claim backward compatibility do so on the basis that every existing implementation conformant to RFC 3066 will continue to operate precisely as designed and in conformance with RFC 3066 regardless of whether they encounter a tag presently well-formed and valid under the terms of RFC 3066 or one that would be sanctioned by this draft. If there is any term needing clarification in that statement or any suspected assumption not made plain, please ask for clarification. > Of course there's the option Dave Singer has raised: Reverse the positions > of > script and country codes in 3066bis. I see two problems with this: > > (1) Script codes are in general more important than country codes, and > therefore really should come first so that simple truncation matches > work "better". (There are probably exceptions to this assertion > lurking > out there somewhere, but I believe it is mostly true.) Thank you for voicing support for this position. > (2) I believe it increases the number of grandfathered codes that won't > conform > to the new format. If I'm not mistaken, there would be no difference in this regard. Peter Constable
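The two ways of finding a country code discussed above can be compared on a few tags. This is a sketch under simplifying assumptions: the draft-style lookup assumes the subtag shapes used in this thread (an optional 4-letter script subtag after the language, then a 2-letter or 3-digit region subtag) and skips private-use and grandfathered tags; both function names are hypothetical. On draft-style tags the positional heuristic never reports a script as a country; it simply misses some regions, which the shape-based lookup finds.

```python
import re


def country_by_position(tag):
    """Heuristic quoted above: a 2-letter second subtag is a country code."""
    parts = tag.split("-")
    if len(parts) >= 2 and len(parts[1]) == 2 and parts[1].isalpha():
        return parts[1].upper()
    return None


def country_by_shape(tag):
    """Assumed draft-style lookup: skip an optional 4-letter script subtag,
    then accept a 2-letter or 3-digit region subtag."""
    parts = tag.split("-")
    if parts[0].lower() in ("x", "i"):  # private-use / grandfathered: skip
        return None
    i = 1
    if i < len(parts) and re.fullmatch(r"[A-Za-z]{4}", parts[i]):
        i += 1
    if i < len(parts) and re.fullmatch(r"(?:[A-Za-z]{2}|[0-9]{3})", parts[i]):
        return parts[i].upper()
    return None


for tag in ["en-US", "zh-Hant-TW", "es-419", "sr-Latn"]:
    print(tag, country_by_position(tag), country_by_shape(tag))
# en-US US US
# zh-Hant-TW None TW
# es-419 None 419
# sr-Latn None None
```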
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of John C Klensin > (3) Finally, there is apparently a procedural oddity with this > document. The people who put it together apparently held > extended discussions on the ietf-languages mailing list, a list > that was established largely or completely to review > registrations under 3066 and its predecessors. My > understanding at this point is that their good-faith impression > was that the discussions on that list were essentially > equivalent to those of a WG. I believe I can say that it was done this way because it followed the example of the development of RFC 3066, which to my knowledge (as a member of the IETF-languages list at that time) happened in the same way. It was certainly done with a good-faith impression that appropriate procedures were being followed. Peter Constable
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
p://www.sil.org/silewp/abstract.asp?ref=2002-003.) > or what sort of naming scheme is politically acceptable, and is there a conflict there. > This does get back to the > algorithmic matching issue in a sense though, which is that if one wants some sort of > hierarchical structure to > the tags (to allow easier matching), Insofar as tags are structured as linearly sequenced elements and there are matching algorithms in use that are based on left-prefix truncation, there is no debate over *wanting* a hierarchical structure: it's a reality we must live with. > or indeed [wants to] define any sort of matching rules (as an > implementor wants), you're > probably getting right into some political questions about how matching "should work". > So for those who wanted > to stick just to linguistic accuracy and try to avoid political issues, trying to avoid > discussion of algorithmic matching > may have seemed appealing (but then provides no help to what I've termed the > "implementors"). This seems to assume that those promoting an ordering of script and country subtags as found in the draft are supporting that order for reasons of linguistic purity and have no interest in discussion of algorithmic matching, which is completely wrong: the reason for supporting that order of subtags has everything to do with matching behaviour in certain widely-deployed algorithms. Peter Constable
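A minimal sketch of left-prefix truncation, the kind of matching referred to above: drop the last subtag until the provider has a match. Function names and the sample catalogue are hypothetical; real implementations (for example the "Lookup" scheme later described in RFC 4647) add refinements that this sketch omits.

```python
def truncation_fallback(tag):
    """Yield the tag, then successively shorter prefixes obtained by
    dropping the last subtag (simple left-prefix truncation)."""
    parts = tag.split("-")
    while parts:
        yield "-".join(parts)
        parts.pop()


def lookup(requested, available):
    """Return the first truncation candidate the provider has, or None."""
    have = {t.lower(): t for t in available}
    for candidate in truncation_fallback(requested):
        if candidate.lower() in have:
            return have[candidate.lower()]
    return None


print(list(truncation_fallback("zh-Hant-TW")))       # ['zh-Hant-TW', 'zh-Hant', 'zh']
print(lookup("zh-Hant-TW", ["zh-Hans", "zh-Hant"]))  # zh-Hant
```

Because the subtags form a linear sequence, whatever sits at the end of the tag is the first thing such an algorithm gives up.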
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: Dave Singer [mailto:[EMAIL PROTECTED] > Sorry, I should have gone on to conclude: the important aspect of > sub-tags is that their nature and purpose be identifiable and > explained (e.g. that this is a country code), and that we retain > compatibility with previous specifications. Ah! Then the proposed draft ensures that the nature of subtags is always identifiable, which RFC 3066 (as I mentioned earlier) fails to do. And the draft retains compatibility with previous specifications using an assumption (thoroughly discussed and concluded on the IETF-languages list a year ago) that, in the case of left-prefix matching processes, script distinctions are generally far more important than country distinctions. > I don't believe that simple > truncation is a necessarily useful operation in all circumstances, I don't think anyone would dispute that. > and it probably should not be in the spec. at all. For example, I'd > say that we should retain the 3066 ordering of language-country and > therefore script, if needed, comes later. However, my typesetting > subsystem doesn't care a jot about language or country, it just needs > to find the script code ('can I render this script'?). Here I disagree. For other purposes, I think it's very clear that the only time that choice of order matters is with matching algorithms that use simple truncation, and for the most common implementations, which use left-prefix truncation, the order lang-script-country will be far more useful in the long run precisely because script distinctions are generally far more important in matching than country distinctions. I don't know of any case in which a tag might be used that contained all three subtags but in which the country distinction generally matters more than the script distinction. Peter Constable
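The ordering argument can be shown with the same truncation step. Under the hypothetical catalogue below (Chinese content distinguished by script only), a lang-script-country request falls back to the right script, while the same preference written lang-country-script drops the script first and lands on an untagged "zh" that may be in either script. The function name and catalogue are invented for illustration.

```python
def truncation_lookup(requested, available):
    """Drop trailing subtags until a tag the provider has is found."""
    have = {t.lower() for t in available}
    parts = requested.split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate.lower() in have:
            return candidate
        parts.pop()
    return None


# Hypothetical catalogue: Chinese content distinguished by script only.
available = ["zh-Hant", "zh-Hans", "zh"]

# The same preference (Chinese, Traditional script, Hong Kong) in two orders.
print(truncation_lookup("zh-Hant-HK", available))  # zh-Hant: script preserved
print(truncation_lookup("zh-HK-Hant", available))  # zh: script preference lost
```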
RE: draft-phillips-langtags-08, process, specifications,
> From: Dave Crocker <[EMAIL PROTECTED]> > It occurs to me that a Last Call for an independent submission has an added > requirement to satisfy, namely that the community supports adoption of the work. > We take a working group as a demonstration of community support. You say "the community", though surely a working group is only representative of "a community" or a portion of "the community". In this case, there is at least "a community", represented by members of the IETF-languages list. > And, indeed, I haven't seen much support for the document under discussion. ??!! If there wasn't much support, surely the discussion would have died a few weeks ago shortly after it started. Peter Constable
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: Dave Singer [mailto:[EMAIL PROTECTED] > >This is similar to the reason why the language code comes before the country > >code. If we had the order CH-fr, then we could end up mixing French and > >German in the same page, because we would fall back (for one of the data > >sources) from CH-fr to CH, which could be German. > > It has to be application-specific which fallback happens. If the > user says he's swiss french, and the the content has alternative > offers for swiss german or french french, which do you present? If > the content actually differs for legal or geographic reasons ('the > legal representative in your country is', 'for copyright reasons this > edition differs in material ways from other countries'), then the > correct country but wrong language is the best answer. If the desire > is simply for maximum intelligibility, then the reverse is true. But that is a level of decision making that goes well beyond any algorithm that simply uses truncation of tags, which is the only case in which the ordering of sub-tags matters. Peter Constable
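To separate the two kinds of decision in this exchange: truncation can only drop trailing subtags, so the order fixes which fallbacks are reachable, while choosing between "right country, wrong language" and "right language, wrong country" needs an explicit application policy. Both functions are hypothetical sketches, not taken from the draft or from any deployed matcher.

```python
def truncation_chain(tag):
    """All candidates simple truncation can reach, longest first."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def choose_offer(user_lang, user_country, offers, prefer_country):
    """Hypothetical application policy over (language, country) offers:
    exact match first, then same country or same language, in the order
    the application prefers."""
    if (user_lang, user_country) in offers:
        return (user_lang, user_country)
    same_lang = [o for o in offers if o[0] == user_lang]
    same_country = [o for o in offers if o[1] == user_country]
    if prefer_country:
        first, second = same_country, same_lang
    else:
        first, second = same_lang, same_country
    candidates = first or second
    return candidates[0] if candidates else None


print(truncation_chain("fr-CH"))  # ['fr-CH', 'fr']
print(truncation_chain("CH-fr"))  # ['CH-fr', 'CH']

offers = [("de", "CH"), ("fr", "FR")]
print(choose_offer("fr", "CH", offers, prefer_country=True))   # ('de', 'CH')
print(choose_offer("fr", "CH", offers, prefer_country=False))  # ('fr', 'FR')
```

The `prefer_country` flag stands in for the legal/regional versus intelligibility choice described in the quoted message; no truncation order can express it.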
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] > Again, your pejorative dismissal of other people's concerns does not > mean your position is valid... > Parsing almost never is. But simply parsing these tag is not, and never > has > been, the issue. I think you guys are in violent agreement over country codes within a tag, and that the debate over interpreting the wording of RFC 3066 serves no purpose. I think the intent of Mark's dismissal has been to refute perceived-invalid objections, in which case we need to consider that the line between perceived-invalid and truly-invalid has been blurred simply by the volume of discussion (the noise factor). There have been some invalid objections that bear some similarity to comments Ned has made as he has tried to make his point. (E.g. Bruce Lilly has claimed invalid back-compat problems on the incorrect premises that RFC 3066 does not permit ISO 3166 country codes except as second subtags or does not permit second subtags that are not country codes (at the moment I forget if it was one or the other or both).) But Ned's concerns are legitimate, I think. I'd say they are not necessarily blocking issues for this draft, because I think a possible outcome of discussion is to characterize them as concerns about outstanding issues that need to be solved rather than as concerns over the draft itself; but I do think they are valid concerns that deserve attention. In a nutshell, Ned was elaborating on a comment from Dave Singer that, once we have parsed a pair of tags and identified all the pieces, it's not a trivial matter to decide in every case how the two tags compare, and that there are factors that would exist if the draft were approved that didn't exist under RFC 3066. Again, I think this is a question that deserves discussion. In relation to the proposed draft, I don't see it as a particular problem with the draft.
It is a problem that doesn't exist in RFC 3066, but that is only because RFC 3066 left us with bigger problems: it doesn't give us any way to identify pieces that we would be encountering in registered tags (apart from hard-coded tables compiled from versions of the registry that pre-exist a given implementation). RFC 3066 permits tags that have all kinds of internal structures. That is a problem as it will never allow us to derive much useful information from a tag with any confidence -- only the ISO 639 language category and in some cases a country category. I predict that in the future we will be seeing a significant number of tags (whether sanctioned without registration by a successor to RFC 3066 or as tags registered under RFC 3066) that go beyond the patterns "ll(-CC)" and "lll(-CC)". If we stick with RFC 3066, we will have no way of writing forward-compatible processors that will be able to do very useful matching. What this draft does is impose some order on all the other patterns within tags that are permitted, and tell us what the different pieces must be. As a result, we have more named pieces to deal with, and we are presented with the question that Ned raised: "Now we have more named pieces than we did before; what do we do with them?" That is a problem that will need to be addressed. But I don't think it's a reason to oppose the draft, since opposing the draft (or at least opposing any revision that introduces a richer internal structure) leaves us in a situation that must be characterized either as a worse problem or as turning our backs on increased functionality to meet real user needs. Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
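To make the "named pieces" point concrete, here is a minimal Python sketch that splits a tag into language, script and region pieces once an ordering is imposed. The classification rules (a 4-letter subtag right after the language is a script ID; a following 2-letter or 3-digit subtag is a region ID) are a simplified assumption for illustration, not the draft's full grammar, and `split_tag` and `TagPieces` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional


class TagPieces(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    rest: List[str]  # variants, extensions, etc., left unclassified here


def split_tag(tag: str) -> TagPieces:
    """Split a tag into named pieces using simplified, hypothetical rules."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    script: Optional[str] = None
    region: Optional[str] = None
    i = 1
    # Assumption: a 4-letter subtag right after the language is a script ID.
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{4}", subtags[i]):
        script = subtags[i].title()
        i += 1
    # Assumption: a 2-letter or 3-digit subtag in the next position is a region ID.
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtags[i]):
        region = subtags[i].upper()
        i += 1
    return TagPieces(language, script, region, subtags[i:])
```

With a fixed order, "sr-Latn-CS" yields language "sr", script "Latn" and region "CS"; the question Ned raised, what to do with those pieces once identified, is left entirely to the caller.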
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] > My reading of that text is that it goes out of its way to try and avoid > direct > discussion of a matching algorithm, talking instead about "rules" and > "constructs". I no longer recall the circumstances behind this, but my > guess > would be that talking about algorithms directly moved this specification a > bit > too close to implementation work, which in turn would argue for the normal > standards track and its ability to assess interop status, not BCP. > > This present yet another problem for the current draft, BTW. You say that it avoids direct discussion of an algorithm, but then imply that it talks directly about algorithms. Which is it? If it talks about principles that may be used in processing tags in a general sense, but not a specific algorithm, then I don't see that there is any problem. All that it is doing is giving guidance regarding the semantic relationships that may exist between tags of different types, and pointing out what processes may or should not change about a tag to preserve its well-formedness and preferred ('canonical') structure, all things within the scope of a BCP that doesn't specify any specific matching algorithms. Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of Bruce Lilly > > [...] RFC 1766/3066 need to be able to deal with tags that contain pieces > > they don't > know about -- the only subtags they can know about are initial subtags of > "i", "x" or > ISO 639 IDs, or a second subtag consisting of an ISO 3166 code in case the > first > subtag is and ISO 639 ID. > > Right. I.e. they should be able to deal with superfluous stuff > on the right. But not script tags that suddenly appear between > language code and country code. For purposes of an RFC 1766/3066 parser, a script tag plus anything after it would be "stuff on the right I don't know anything specific about". It could not be described as superfluous -- the process can still compare tags and make matches according to whatever rules it uses, such as left-prefix matching. > For the triple of > language/country/script to match usefully in the general case by > RFC 3066 parsers (which are unaware of script in general), the first > and second subtags would have to remain language code and country > code respectively. If you consider realistic scenarios, this makes the wrong assumption that country distinctions generally matter more to users. > not on a Quixotic quest for "stability" > of nations. The draft doesn't try to achieve stability of nations. Only stability in the semantics of metadata elements. Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
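The claim that an RFC 1766/3066 parser simply treats a script subtag and everything after it as "stuff on the right" can be illustrated with a small Python sketch. It assumes a parser that assigns meaning only to the primary subtag and to a 2-letter second subtag (read as an ISO 3166 country code) and carries the rest along uninterpreted; `rfc3066_view` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def rfc3066_view(tag: str) -> Tuple[str, Optional[str], List[str]]:
    """Return (primary, country, opaque_rest) as seen by a parser that
    knows only the primary subtag and a 2-letter second subtag
    (an ISO 3166 country code); everything else is uninterpreted."""
    subtags = tag.split("-")
    primary, rest = subtags[0].lower(), subtags[1:]
    if rest and len(rest[0]) == 2 and rest[0].isascii() and rest[0].isalpha():
        return primary, rest[0].upper(), rest[1:]
    return primary, None, rest
```

For "sr-Latn-CS" such a parser sees language "sr", no country, and an opaque remainder ["Latn", "CS"]. Nothing fails; the tag is still a well-formed string that left-prefix matching can operate on.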
Re: Language Tags: Response to a part of Jefsey's comments concerning the W3C
> From: "JFC (Jefsey) Morfin" <[EMAIL PROTECTED]> > why not to follow under IAB guidance (or to review) the charter I proposed > yesterday, in an IETF way everyone could participate, and to have all these > applications supported one shot in working on a linguistic ontology where > each language instance would be documented by an ad hoc authoritative > source. Otherwise it could not be the standard you wish. The objective of RFC 3066 or any successor is not language documentation (which I understand to mean more or less language description). Perhaps I misunderstand what you're saying here. Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: Last Call on Language Tags (RE: draft-phillips-langtags-08)
> From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED] > 2. I never objected the scripting-ID. I objected that it was not given the > same importance as language and country codes. I plead (and act) for 25 > years for the support of authoritative distinctions among users contexts. > But I am not paid by a big employer. I don't have time to offer many comments. Let me say for the benefit of people that don't know much about me that up to a year ago I was not paid by a big employer, but was a volunteer working for a non-profit organization, SIL International, and it was in *that* context that I became involved in the development of ISO 639 (including being SIL liaison to the ISO 639-RA/JAC, a member of the US TAG for TC 37, and project editor for ISO 639-3), a contributor to the development of RFC 3066 and a regular participant in the activity of the IETF-languages list. > There is NO consensus in the community and huge technical, > societal, economical and political concerns. Because one does not > understand what the Draft wants to achieve, for who and how. The main > request is to clarify. There are no real objections (except to the paucity > of the proposition) but concerns. I haven't seen many requests for clarification. If that is what people are wanting, then I think the authors, or others, can provide that, if it's made clear at what points clarification is needed. > > > It would be very helpful, to me at least, if you or he could > > > identify the specific context in which such tags would be used > > > and are required. The examples should ideally be of > > > IETF-standard software, not proprietary products. > > You respond none. Just an application level problem. I was asked to respond with examples that pertain to IETF-standard software, so that's what I did. > >I've used Chinese as one example, but there are many other cases, some > >familiar to many and some less well known > Full agreement. 
But this is to be done through an open and inclusive > semantic, not on an exclusive first come first serve registration basis. Which is why one of the aims of the proposed draft is to fully incorporate script IDs as sanctioned sub-tags rather than leaving individual parties to make ad hoc registrations for such distinctions. > Why do you want there would be an exclusive _unique_ matching algorithm? I have never said I want that. > We had a long talk at the end of the August Paris meeting at AUF over ISO > 639-2 and the need to aggregate language ID, scripting ID, usage > description, authoritative sources and also country codes and on the > complexity to take into account "sub-code" and private codes and to add > accidental or new descriptors in order to document venacular ways of > speaking, thinking, talking. Obviously it was a private discussion with a > few people sharing the same ideas ... May be you were there (we were the > last to leave the room and the building). I don't know. I don't recall this discussion, and I can't put a face to your name. I know I was not last to leave the room. Obviously I have ideas on those issues. Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of John Cowan > > The whole question of what is a language, a variant or dialect of a > > language, or a suitable substitute for a language, would benefit some > > thought in any tagging scheme, though I agree the problem is not > > generally soluble. > > See the editor's draft of ISO 639-3 at http://tinyurl.com/6kky2 ... I would say that all of clause 4.2 is relevant; in addition to 4.2.1, I would especially include 4.2.2, in relation to which I have presented ideas that led to the inclusion of the Extensions subtag in the proposed draft. (I originally thought of it as a way to capture some existing registered tags as part of a consistent scheme rather than merely as ad-hoc tags, but I think it may be more generally useful as well for dealing with some of the issues regarding different perceptions of what is a language.) I'm afraid I don't have time at the moment to elaborate further. Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: Dave Singer [mailto:[EMAIL PROTECTED] > The whole question of what is a language, a variant or dialect of a > language, or a suitable substitute for a language, would benefit some > thought in any tagging scheme, though I agree the problem is not > generally soluble. These are questions that have been given some thought. No time to delve into it at the moment, however. Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: Last Call on Language Tags (RE: draft-phillips-langtags-08)
ics of the whole explicitly in such cases. And there *is* a need to avoid the problem you alluded to... > while also > believing that it is possible to make registrations under the > rules of 3066 that would make quite a mess of things. Part of my reluctance to have script IDs included in RFC 3066 was due to the fact that a set of tags had just been registered (some of which I now wish didn't exist) which used various subtags in combination, and I sensed that there was a lack of collective understanding of what the internal structure of tags and relationships between subtags should be (which directly led me to write the paper I referred to earlier). Not long after RFC 3066 was approved, there were several further tags registered that used various subtags in combinations that concerned me then (I voiced my reservations at the time) and still do. RFC 3066 *is* too flexible to use without some kind of constraints. While the proposed draft is not what I would have drafted had I gotten there first, I have been willing to support it because I feel it provides helpful constraints on the internal structure of RFC 3066 language tags. > We have > tag review processes to prevent just that eventuality. I have been party to the review process for the past five or so years, and can say that the review process did not, IMO, always succeed in avoiding regrettable tags (I do not consider those that include script IDs to be among them) because there was no model of what ontology needed to be described, of which elements a tag should contain, or of how those elements should relate to one another. This draft doesn't describe such a model, but it does impose one, which I think is moving in a good direction. > > There may be implementations that use a more complex approach > > to matching involving inspection of the tagged content itself, > > or inspecting the particular subtags of a language tag. > >... 
> > Peter, you are talking, I think, about different applications > doing different things given the greater range of options and > flexibility that the new specification provides. Actually, no; I was trying to guess at existing applications that might have particular problems with complexity, as you mentioned. Certainly language-range matching is no more complex in the proposed draft than it is today. I personally suspect that the language-range matching algorithm is too simplistic, but I haven't gone beyond that myself to start suggesting it needs to be replaced with something more complex. > Let me also comment on the ISO 3166 issues here... But > the solution to the problem of various ISO TCs not having an > adequate understanding of the impact on the Internet and IT > communities (and, in the case of TC46, even the > library/information sciences community that are one of their > historical main constituencies) is, IMO, to get that message > across via liaison statements and, if necessary and appropriate, > encouraging national member bodies to cast "no" votes on > standards and registration procedures that are insufficiently > stable. After the "CS" decision, the statements from the > British Library advocating a much longer time-to-reuse and from > the IAB suggesting that a century might be adequate were, again, > IMO, just the right sort of approach. In particular, I presume > that TC 37 has an adequate liaison mechanism in place with TC 46 > to insist that a much more conservative position be adopted with > regard to changes. If TC 37 isn't able or inclined to do that > job effectively, I'm not persuaded that shifting the task to the > IETF is an appropriate solution or one that is likely to be > effective. For my part, I made a point of informing TC 37 members of the re-assignment of CS, and that led to a resolution at our Paris meeting last August expressing strong concern over this. 
I did not ever hear any response from either TC 46 or the ISO 3166 MA on this matter, however. I don't know that I would have devised the approach to the handling of this issue used in this draft had I been its author. I am deeply concerned that stability be ensured in language tags, however, and if this is the only way to ensure it I can accept it. Of course, your point is that it probably is neither the only nor the best way to ensure this. I have no comments to counter that opinion. Regards, Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: Last Call on Language Tags (RE: draft-phillips-langtags-08)
ly comparisons that are easy > involve bit-string identity. Working out, at an application > level, when two "languages" under the 3066 system are close > enough that the differences can be ignored for practical > purposes is quite uncomfortable. Attempting similar logic for > this new proposal is mind-boggling, especially if one begins to > contemplate comparison of a language-locale specification with a > language-script one -- a situation that I believe from reading > the spec is easily possible. RFC 3066 makes reference to a fairly simplistic matching algorithm using the notion of language-range. The proposed draft would continue to support that same algorithm with an expectation that implementations of language-range matching as defined in RFC 3066 would continue to operate using exactly the same algorithm on new tags permitted by the proposed revision -- and with generally desirable results. There may be implementations that use a more complex approach to matching involving inspection of the tagged content itself, or inspecting the particular subtags of a language tag. Certainly an existing RFC 3066 implementation that does the latter will not be aware of the specific syntax of the proposed revision, though it also cannot be aware of registered RFC 3066 tags defined after the implementation was created -- there is no categorical difference here. As for how difficult it would be to update such an implementation to use a sophisticated matching algorithm based on interpretation of individual subtags permitted by this draft, I grant that there is greater complexity, but the draft specifically imposes syntactic constraints that allow different types of sub-elements to be identified quite readily. As for how the different sub-elements would be used for matching, for instance in recognizing a relationship between a language-region tag and a language-script tag, those are issues that already exist with valid RFC 3066 tags such as zh-CN and zh-Hans. 
I agree that it is not a trivial matter to decide exactly how such tags relate. That does not, however, change the fact that language tags that incorporate script IDs are useful and appropriate; for instance, in this particular example, all that was available for tagging Chinese content for some time were tags like zh-CN and zh-TW, and this was causing very significant problems for implementations and users, which is precisely why zh-Hans and zh-Hant have been registered, and why many of us are eager to see a revision of RFC 3066 that incorporates script IDs. (Granted, that does not speak to other changes proposed by the draft.) > That situation almost invites > profiling of how this specification should be used in different > circumstances... I have no particular counter to the opinions you expressed in your remaining comments. Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
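The language-range matching referred to above is simple enough to sketch in a few lines of Python. This follows the rule as RFC 3066 states it (a range matches a tag if it equals the tag, or equals a prefix of the tag followed by "-"), compares case-insensitively, and treats "*" as matching any tag; the function name is hypothetical.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """RFC 3066-style language-range matching: the range matches if it
    equals the tag or equals a prefix of it followed by '-'.
    Tags are compared case-insensitively; '*' matches any tag."""
    r, t = language_range.lower(), tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")
```

The range "zh" reaches both zh-CN and zh-Hans, but neither of those two matches the other. Any judgment about how a region-qualified tag relates to a script-qualified one therefore has to come from logic beyond prefix matching, which is the point at issue with tags already valid under RFC 3066.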
RE: Last Call on Language Tags (RE: draft-phillips-langtags-08)
> From: Peter Constable > I'd also like to observe that various members of TC 37 and the ISO 639- > RA/JAC have observed or participated in the development of this draft. For > my part, it is not the draft I would have developed if I had undertaken it, > but I see no problems with it from a TC 37 or ISO 639-RA/JAC perspective. I realized there are some additional comments I should make on the proposed revision of RFC 3066 from a TC 37 perspective. (Note: these comments are offered as an active participant in the work of TC 37 and as a member of the ISO 639-RA/JAC. They are not official statements of TC 37 or any of its sub-committees, but I believe they are a reasonable representation of prevailing opinion within TC 37.) One of the issues this draft attempts to deal with is potential instability of ISO identifiers and the damaging impact that can have on existing content and implementations. ISO 639-2:1998 specifies that its language identifiers may be changed given compelling reasons, but that an identifier may not be reassigned for a period of five years after such a change. ISO 639-1:2002 specifies that an identifier may not be reassigned for a period of *ten* years after such a change. In practice, there has been no case in which an ISO 639-1 or ISO 639-2 identifier was withdrawn and later reassigned. The ISO 639-RA/JAC and TC 37/SC 2 have increasingly taken up concerns for stability to the point that ISO DIS 639-3 has a very strict stability policy designed to ensure that declarations made on existing information objects do not undergo any adverse changes. This includes a restriction that identifiers that are deprecated may never be reassigned with a different meaning. 
To the extent that this draft attempts to protect language tags from instability of ISO identifiers, TC 37 considers it very important to ensure that metadata elements declaring linguistic properties of information objects have stability in relation to their meaning, but feels that there is no significant risk of such instability coming from ISO 639. On the other hand, TC 37 has been very concerned about changes that have been made in ISO 3166 country identifiers in which identifiers that had prior meanings were reassigned with new meanings. ISO 3166 country identifiers have been used by applications of the ISO 639 family of standards to indicate sub-language distinctions, such as differences in spelling or lexical items. Such changes to country identifiers have the potential for very detrimental effects on applications of the ISO 639 standards. I note with interest that ccTLDs make use of ISO 3166 in spite of its potential for instability. In the case of ccTLDs, however, there is a considerable infrastructure for dealing with this: the DNS and strict procedures for deploying changes in ccTLDs onto domains. In the case of language tags, there are no such procedures for deploying changes in meanings of country identifiers across instances of metadata elements used to declare linguistic properties of information objects, nor is anything of that sort feasible in the general case. It may be that in the context of certain Internet protocols it is feasible to deploy changes in ISO 3166 across instances of language tags used by those protocols -- I don't know if this is true for any Internet protocols or not. It is certainly not true of all applications of ISO 639 standards that also make use of ISO 3166. In the latter regard, I would like to point out that the IETF specification RFC 3066 is referenced for use in metadata in many other places than IETF protocols, one important application of this specification being its use for the xml:lang attribute in XML. 
To the extent that ISO 3166 country codes can be reassigned with new meanings, the potential for detrimental effects on RFC 3066 language tags at least in contexts such as XML is of concern to TC 37. To the extent that the proposed draft aims to protect language tags from instability of ISO 3166 country identifiers where there is potential for detrimental effects on metadata elements declaring linguistic properties of language resources and other information objects, TC 37 would view the intent to achieve stability as a good thing. The way in which the draft aims to achieve this may not be the best in the IETF context -- that is for IETF and not TC 37 to say. In the long term, though, TC 37 would support measures that would lead to ensuring that language tags defined by RFC 3066 or its successors are not subject to detrimental changes in semantics. Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of Bruce Lilly > > I don't think it's that uncommon to refer to a specification A that > makes use of another specification B as an application of B. > > Perhaps, but I think it's best to avoid misunderstanding in > technical discussion by being precise in use of terminology. I was being precise. Note that ISO 639 uses "application of language identifiers" in exactly the same sense in which I have used "application of RFC 3066". Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: Last Call on Language Tags (RE: draft-phillips-langtags-08)
> From: John C Klensin <[EMAIL PROTECTED]> > (iii) One way to read this document, and 3066 itself for > that matter, is that they constitute a critique of IS > 639 in terms of its adequacy for Internet use. Not exactly. It reflects that ISO 639 alone does not support all of the linguistically-related distinctions that need to be declared about content on the Internet -- something that ISO 639 itself acknowledges (in general, not just in relation to the Internet). Just as RFC 1766/3066 use ISO 3166 country codes to make sub-language distinctions (e.g. to distinguish vocabulary or spelling), so also there is a need to use ISO 15924 to distinguish between different written forms of a given language. The proposed draft incorporates ISO 15924 -- something that very nearly happened in RFC 3066, but did not since ISO 15924 was still in process and (as I see it) those of us involved needed more time to evaluate the idea (which has happened in the years since then, to the point that we have confidence about this step). RFC 1766/3066 also allowed tags to include subtags used for various purposes, and some tags have been registered to reflect sub-language variations other than those that can be captured using country (or script) IDs. This is another way in which ISO 639 alone is not sufficient, and the need for tags that include such variant subtags has been demonstrated. The proposed draft constrains the structure of tags including such variant subtags so as to avoid haphazard and inconsistent structuring of tags, which would present significant problems. (Of course, that is not all that the proposed draft does.) Thus, I would not describe this as a critique of ISO 639. It simply recognizes, as ISO 639 itself does, that there are language distinctions that often need to be made but that ISO 639 does not make. 
> From > that perspective, the difference between the two is that > 3066 was prepared specifically to meet known and > identifiable Internet protocol requirements that were > not in the scope of IS 639. The new proposal is more > general and seems to have much the same scope as ISO > 639-2 has, or should have. The scope of what is needed for Internet language tags is greater than the scope of ISO 639-2, which is even more limited than the general comments I made regarding ISO 639 (which comments are equally applicable to ISO 639-1, ISO 639-2 or ISO DIS 639-3). > It is not in the IETF's > interest to second-guess the established standards of > other standards bodies when that can be avoided and, > despite the good efforts of an excellent and qualified > choice or tag reviewer, this is not an area in which the > IETF (and still less the IANA) are deeply expert. So > there is a case to be made that this draft should be > handed off to ISO TC 37 for processing, either for > integration into IS 639-2 or, perhaps, as the basis of a > new document that integrates the language coding of > 639-2 with the script coding of IS 15924. Speaking as a member of TC 37, of the ISO 639-RA Joint Advisory Committee, and project editor for ISO 639-3, I can say that it would be possible for TC 37 to take on a project to develop a standard for language-tags that addresses some of the needs this draft is attempting to meet, such as integrating ISO 15924. Note, though, that incorporation of this draft (or even RFC 1766/3066) into ISO 639-2 would be well beyond the scope of ISO 639-2. Something of this nature would necessarily involve a distinct standard, and perhaps one that is not part of the ISO 639 series. I'd also like to observe that various members of TC 37 and the ISO 639-RA/JAC have observed or participated in the development of this draft. 
For my part, it is not the draft I would have developed if I had undertaken it, but I see no problems with it from a TC 37 or ISO 639-RA/JAC perspective. Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: draft-phillips-langtags-08, process, specifications, and extensions
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of Bruce Lilly > > There is nothing in RFC 3066 that says a registered tag must have 3 to 8 > characters in the second subtag. It simply requires that any tag in which > the second subtag is 3 to 8 letters must be registered. > >The following rules apply to the second subtag... > That does not permit tags with two-letter second subtags to be registered > in the IANA registry; it permits that only for "Tags with second subtags > of 3 to 8 letters". Granted, it could be clearer. Are you familiar with the term "eisegesis"? You are "putting words in the mouth" of the RFC. It does not say what you claim. You are very clearly mis-interpreting it, as evidenced by the registry and by the text of the RFC itself: This procedure MAY also be used to register information with the IANA about a tag defined by this document, for instance if one wishes to make publicly available a reference to the definition for a language such as sgn-US (American Sign Language). You are, quite simply, mistaken on this point. > > There is no reason to create a separate mechanism. When identifying > textual content, > > Language is not exclusively associated with text. It is also a > characteristic of spoken (sung, etc.) material (but script is > not). True, though at present, the vast majority of linguistic content on the Internet is in the form of text. But this draft easily accommodates non-text content: don't put in a script ID when it's not an appropriate thing to declare about the content. > > the identity of the writing system > > Writing doesn't apply to spoken material, etc. There is nothing > in RFC 3282 or MIME that requires that Content-Language and/or > Accept-Language fields be used exclusively with written text. And there's nothing in the draft that would require a language tag to have a script ID. > > > In an inappropriate way. Without consideration for backwards > > > compatibility. 
In violation of the BCP that specified the syntax > > > and registration procedure. > > > > Not inappropriate at all. > > Specifying script for audio material is as inappropriate as > specifying charset. In Internet protocols, we do not burden > protocols with having to interpret charset information for > non-text material; we should not do so for script information. This is silly. It's like arguing that xml:lang is a bad idea in the XML spec because it can be used as an attribute on any kind of element, including elements that happen not to contain linguistic content. So the draft would make it possible for someone to tag audio material with a tag containing a script ID; that doesn't mean people are going to be foolish enough to do so. > > And all your repeated comments about lack of consideration for backwards > compatibility and violation of syntax and procedures of BCP47 have been > shown to be invalid. > > Sorry -- saying so doesn't make it so. I have explained in > detail that an RFC 1766/3066 parser cannot be expected to > make sense of unregistered "sr-Latn-CS" etc. I have pointed > to specific second subtag length requirements in RFC 3066 for > registration. You have misread RFC 3066 (see above), and that has already been pointed out in earlier messages. I've been willing to be corrected when you have shown me to be wrong; it's a bit frustrating that you don't seem willing to acknowledge such a clear mistake. And your misreading of RFC 3066 has misled you regarding what an RFC 3066 parser should or should not be able to do with "sr-Latn-CS". Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
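Since this exchange turns on what RFC 3066 says about second subtags, a small Python sketch may help separate syntactic well-formedness from registration status. The regular expression follows the RFC 3066 grammar (a primary subtag of 1 to 8 letters, further subtags of 1 to 8 letters or digits); the category labels paraphrase the second-subtag rules quoted above and are simplified. The function name is hypothetical, and the sketch knows nothing about what is actually in the IANA registry.

```python
import re
from typing import Optional

# RFC 3066 grammar: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def second_subtag_category(tag: str) -> Optional[str]:
    """Classify a tag's second subtag under RFC 3066 (simplified)."""
    if not RFC3066_TAG.fullmatch(tag):
        raise ValueError(f"not well-formed under RFC 3066: {tag!r}")
    subtags = tag.split("-")
    if len(subtags) < 2:
        return None  # no second subtag
    second = subtags[1]
    if re.fullmatch(r"[A-Za-z]{2}", second):
        return "ISO 3166 country code"
    if re.fullmatch(r"[A-Za-z]{3,8}", second):
        return "registration required"
    return "other (not covered by this sketch)"
```

"sr-Latn-CS" passes the syntax check; its 4-letter second subtag places it in the category of tags that must be registered. Whether it is registered is a question about the registry, not about whether a parser can process the string.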
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
ching > legal "sr-Cs-Latn" containing script designation with legal "sr-CS" > (no script specified). In your comments here, you are being rather loose in your assessment of what is or isn't valid. The tag "sr-Latn" is a registered, valid RFC 3066 language tag. The tag "sr-Latn-CS" is not registered, but could be and would be valid if registered. The tag "sr-CS" is certainly valid; I have no idea how widely it is used. The tag "sr-CS-Latn" would be valid if registered, but is not registered (and it is unlikely that, if requested, a consensus could be obtained to register it, given the preference among those involved in reviewing requests for a different ordering of subtags). *If* "sr-CS-Latn" were registered (it is not), then a language-range matcher *must* match a request of "sr-CS" with content tagged "sr-CS-Latn". In precisely the same way, if "sr-Latn-CS" were registered, a language-range matcher would, and without modification could, match a request of "sr-Latn" with "sr-Latn-CS". You cannot say that "sr-Latn-CS" has any less or more likelihood of being handled by existing language-range matchers than "sr-CS-Latn". Either the matchers work per the terms of RFC 3066 or they do not, and RFC 3066 does not indicate that either of these is any less valid than the other. > The proposed draft would make "sr-CS-Latn" > illegal and would instead require "sr-Latn-CS" which cannot be > recognized as a valid language tag by an RFC 1766/3066 parser, let > alone matching against "sr-CS". There is no reason why an RFC 1766/3066 parser should not recognize "sr-Latn-CS" as valid since it conforms to the syntax specified. A language-range matcher should match "sr-Latn-CS" against a request for "sr-Latn", but not "sr-CS". That is by design since a left-prefix matching algorithm is limited in what tags it can match, and it is considered more important to match for script than for regional variations. 
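The effect of subtag order on left-prefix matching can be made explicit by listing, for a given tag, the ranges that would match it. The Python sketch below does that; the function name is hypothetical, and it ignores case variants of each range as well as the "*" wildcard.

```python
from typing import List


def ranges_matching(tag: str) -> List[str]:
    """Return the language-ranges (ignoring case variants and '*') that
    match `tag` under left-prefix matching, longest first: the tag
    itself and each shorter prefix ending at a subtag boundary."""
    subtags = tag.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]
```

For "sr-Latn-CS" the list is sr-Latn-CS, sr-Latn, sr, so a request for "sr-Latn" reaches the content and a request for "sr-CS" does not; for "sr-CS-Latn" the reverse holds. Placing script before region means the script distinction is the last one dropped as a range is shortened.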
> > But you are speaking as though it's a problem that these tags are > registered. I have no idea why. > > Registration of a complete tag is not itself a problem. Registration > of a complete tag which incorporates script information is not an > ideal solution to the issue of conveying script information; that > would be more appropriately done using an orthogonal mechanism to > convey the orthogonal information... That's one opinion; there are many who hold a different opinion. > > But speaking of selective usage, have you noticed that RFC 3454 > identifies specific characters from ISO/IEC 10646 as prohibited? Various > space and control characters are not permitted, INVISIBLE TIMES isn't > permitted, END OF AYAH isn't permitted, COMBINING GRAVE TONE MARK isn't > permitted... How is what is proposed in this draft any more "cherry- > picking" than that? > > 1. RFC 3454 is not BCP, and isn't being pushed through for immediate >Standards status without a phased roll-in. The draft under discussion >has been proposed as BCP which would lack phased roll-in. So acceptability of selective usage depends upon whether the document is a BCP or a proposed standard? I cannot see anything in RFC 2026 that suggests that (and it seems pretty odd). > 2. RFC 3454 does not declare any parts of ISO 10646 as not valid and >does not call for setting up an IANA registry of code points for the >purpose of effectively declaring ISO 10646 code points invalid. The >draft under discussion explicitly seeks to set up a registry to >replace use of ISO standard list. RFC 3454 does say that some parts of ISO 10646 are not valid in strings output by stringprep implementations. This draft is analogous. 
If new characters are added to ISO 10646, it is certainly possible that RFC 3454 could be updated to exclude some of those new characters as well; what is proposed in this draft is analogous; the only difference is that the values considered invalid for the given purpose are documented in the IANA registry rather than in an RFC -- which is certainly the easier way to maintain things, though perhaps it's not considered the preferred means of doing this in the IETF context. > 3. RFC 3454 does not seek to redefine the meaning of any ISO 10646 code >points. The draft under discussion does, as specifically noted in >the case of the ISO 3166 code "CS". This draft would not change the meaning of an ISO identifier; it simply does not use the latest assigned meaning in case a prior ISO-assigned meaning in use on the Internet exists. (Note: the draft itself does not entail that CS in particular should be
RE: draft-phillips-langtags-08, process, specifications, and extensions
l your repeated comments about lack of consideration for backwards compatibility and violation of syntax and procedures of BCP47 have been shown to be invalid. > RFC 3066 doesn't require "haw-US", and if encountered provides for > matching it (in an "accept" role) with "haw" (as content to be > provided). "sr-Latn" and "sr-Latn-CS" cannot be matched by an > RFC 3066-compliant process to anything, since they do not fit the > RFC 3066 syntax for well-formed language tags. Certainly they do; and certainly an RFC 3066 parser will match "sr" with "sr-Latn" or "sr-Latn-CS", and "sr-Latn" with "sr-Latn-CS". Peter Constable ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
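To check the syntactic claim directly, here is a hedged sketch that transliterates the RFC 3066 ABNF (section 2.1, as I understand it) into a Python regular expression. It tests well-formedness only, not registration; `is_well_formed_3066` is a hypothetical helper name.

```python
import re

# RFC 3066 section 2.1 syntax, transliterated:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed_3066(tag: str) -> bool:
    """True if `tag` fits the RFC 3066 production (registration is not checked)."""
    return RFC3066_TAG.fullmatch(tag) is not None


for tag in ["haw-US", "sr-Latn", "sr-Latn-CS", "sr_Latn", "sr-"]:
    print(tag, is_well_formed_3066(tag))
```

Both "sr-Latn" and "sr-Latn-CS" pass; only genuinely malformed strings such as "sr_Latn" or "sr-" are rejected.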
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
to loss of backwards compatibility. > > > > But, as noted above, this is not an issue that is peculiar to the > proposed revision -- it already existed in RFC 3066. > > No, given a primary subtag which is a language code (and per RFCs > 1766 and 3066, that's any primary subtag with 2 or more (RFC 3066 > only, more being limited to 3) characters), the second subtag -- > in either RFC 1766 or RFC 3066 language tags -- is always a country > code and never a script code. Go back and read RFC 3066 again. It does not impose that constraint: The following rules apply to the second subtag: - All 2-letter subtags are interpreted as ISO 3166 alpha-2 country codes from [ISO 3166], or subsequently assigned by the ISO 3166 maintenance agency or governing standardization bodies, denoting the area to which this language variant relates. - Tags with second subtags of 3 to 8 letters may be registered with IANA, according to the rules in chapter 5 of this document. It must be a country ID *if* it is two letters, but not otherwise. > The proposed draft pulls the rug out > from under existing parsers by changing that. You are completely mistaken on this point -- the proposed draft does not change the constraint you assumed as that constraint never existed. > Again you seem to be conflating established Internet Standards Track > protocols with "applications" I apparently am using "applications" in a sense you're not familiar with. I don't think it's that uncommon to refer to a specification A that makes use of another specification B as an application of B. > and ignoring the critical importance of > backwards compatibility. As stated earlier, I quite disagree that back-compat issues have been ignored. > > Note that there is nothing that prevents other applications from using > other matching algorithms, including perhaps something that is able to > recognize in "az-AZ" and "az-Latn-AZ" that both involve Azeri and used in > Azerbaijan. 
> > The issue at hand is the existing deployed base of RFC 3066 > implementations that depend on the matching algorithm specified > therein (which doesn't work with a script tag interposed between > language code and country code). You say that these do not work; in fact these implementations will still work: they will match a request for "sr-Latn", though not one for "sr-CS", with content tagged "sr-Latn-CS". If that is a problem, please explain why. > > This is all a discussion we on the IETF-languages list went through five > years ago, and in the intervening five years I think we have reached a > consensus on these issues, that consensus being reflected in the proposed > revision to RFC 3066. (Note that we made the relevant decisions over a > year and a half ago when we reached a consensus to register az-Latn etc. > The precedent was established then; the proposed revision adds nothing new > in this regard.) > > As previously noted, that is a danger recognized by RFC 2026 in > activity that does not conform to IETF procedures; it is > possible to reach good consensus on the wrong approach. Well, that potential was created when RFC 1766 was first approved. Tags like az-Latn could have been registered under the terms of that RFC just as readily as under RFC 3066. But you are speaking as though it's a problem that these tags are registered. I have no idea why. > > 7.1 says... > > The proposed revision does not create Internet-specific versions of ISO > standards... > By cherry-picking, it effectively seeks to establish such a version. I would not call what is done "cherry-picking". Any identifier defined in the source standard is valid for use, except in the case that the identifier was previously defined with a different meaning in that ISO standard. That isn't cherry-picking; that is a blindly-applied general principle, created with reasoned motivation: to provide stability. But speaking of selective usage, have you noticed that RFC 3454 identifies specific characters from ISO/IEC 10646 as prohibited? 
Various space and control characters are not permitted, INVISIBLE TIMES isn't permitted, END OF AYAH isn't permitted, COMBINING GRAVE TONE MARK isn't permitted... How is what is proposed in this draft any more "cherry-picking" than that? > > 10.1 states a general policy regarding IP... > The ISO, as developers of ISO 639 and 3166, have rights. In particular, > they have the right to determine what those standards specify -- in > whole -- and they have the right to revise and amend those standards, > and are the sole arbiters of what is (and what is not) "valid". They certainly have and retain rights over standards for language, script and country identifiers. They do not, however, determine what is valid for use in Internet protocols. Just as it is appropriate for an IETF document, RFC 3454, to specify for particular reasons that certain encoded entities of ISO/IEC 10646 are not valid for Stringprep output, so also it is appropriate for an IETF document to specify for particular reasons that certain encoded entities of an ISO standard are not valid for use in language tags used on the Internet. Peter Constable
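The RFC 3066 second-subtag rules quoted in this message can be expressed as a small classifier. This sketch assumes only the two quoted rules and deliberately reports anything else as uncovered rather than guessing; `second_subtag_role` is a hypothetical name.

```python
def second_subtag_role(tag: str) -> str:
    """Classify the second subtag using only the two RFC 3066 rules quoted above."""
    subtags = tag.split("-")
    if len(subtags) < 2:
        return "no second subtag"
    second = subtags[1]
    letters_only = second.isascii() and second.isalpha()
    if letters_only and len(second) == 2:
        return "ISO 3166 country code"
    if letters_only and 3 <= len(second) <= 8:
        return "3-8 letters: meaning depends on IANA registration"
    return "not covered by the quoted rules"


for tag in ["az-AZ", "az-Latn-AZ", "sr-Latn-CS", "en"]:
    print(tag, "->", second_subtag_role(tag))
```

Under these rules "Latn" in "az-Latn-AZ" is simply a four-letter second subtag whose meaning comes from registration; nothing forces it to be a country code.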
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED] > >Of course it would not be clear if you don't have a conceptual model of > >what "language" tags are identifiers *of*. When RFC 3066 was being > >developed, there was a suggestion that script IDs be incorporated, but > >some were reluctant, raising the same question you have here. I was one of > >those. But I didn't remain obstructionist over the issue; instead, I gave > >a fair amount of thought to the ontology that underlies "language" tags, > >and subsequently published a white paper and presented on the topic at two > >conferences in the spring and fall of 2002. (Paper is available online at > >http://www.sil.org/silewp/abstract.asp?ref=2002-003 -- my thinking has > >evolved since then, but some key results remain valid, I think.) > > May us know which ones? It would be easier to identify two key points on which my thinking has changed. IIRC, I was uncertain at the time about what to do wrt sorting. I have since concluded that sort order is a presentation issue that, while linguistically related, is out of scope for language identifiers. (Note that there is no common usage scenario in which it makes sense to declare the sorted order of content.) Sort order may certainly be in scope for a locale identifier, but not for a "language" tag. The bigger change is that I have abandoned the fourth main category in the ontological model I proposed. At the time, I was still trying to work out where something like "Latin America Spanish" fit in. I saw the similarity to sub-language varieties / dialects, but at the time thought it needed to be a distinct category, for which reason I concocted the notion "domain-specific data set". I was never very satisfied with that: it wasn't a particularly consistent model (a data set is quite a different kind of thing from a language variety) and it ignored the similarity with sub-language variety. (And the name was a bit unwieldy.) 
I have since realized that I was tripping up on the very problem that was blocking the Language Tag Reviewer from accepting the requested registration for "es-americas": the assumption that a language tag necessarily refers to a conventionally-recognized linguistic identity that exists in the world. Language tags are not attributes declared on language varieties; they are attributes declared on information objects, indicating linguistic properties of those information objects. And the linguistic attributes of an information object do not necessarily coincide with conventionally-recognized linguistic identities. Of course, in the majority of useful cases they will; but it's not hard to show that this is not always the case: e.g. if I present "chat" as an expression that could be interpreted in relation to several different languages, it would be entirely appropriate for me to declare a linguistic attribute of that expression of "indeterminate" since that is precisely my intent -- but clearly "indeterminate" doesn't correspond with any particular language identity out in the world. Thus, I came to realize that the kind of distinction intended by "es-americas" was just the same kind of distinction made for any sub-language variety: it declares that the information object is not only in some particular language, but is even more constrained in terms of the language variety in use. It is simply coincidental that the more constrained usage in this case doesn't coincide with a single dialect used by some identifiable speaker community. Peter Constable
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
a *non-goal* of the proposed draft to accommodate that level of detail as it is not appropriate to try to capture that level of ad hoc detail in a general-purpose metadata element. > >The bigger problem you're pointing out is the limitations of using > >suffix-truncation alone as a matching algorithm... > This shows that language matching algorithm should not be addressed in the > same document. I also submit that this kind of matching policy should be a > possible decision of the user. Obviously IA rules should be mentionned. It doesn't show that matching should not be addressed in the same document; it merely shows that one particular algorithm doesn't meet all needs. It would be possible to move all discussion of matching to another document, but I don't see any reason why that must be done. The draft discusses some general considerations and leaves plenty of room for separate specifications for particular matching algorithms for use in particular applications. > > > Surely some types > > > of script is indicated by the charset; in situations where that > > > is not the case, a separate mechanism could be used for that > > > orthogonal parameter without breaking compatibility with > > > existing parsers of language tags. > > > >This is all a discussion we on the IETF-languages list went through five > >years ago, and in the intervening five years I think we have reached a > >consensus on these issues, that consensus being reflected in the proposed > >revision to RFC 3066. (Note that we made the relevant decisions over a > >year and a half ago when we reached a consensus to register az-Latn etc. > >The precedent was established then; the proposed revision adds nothing new > >in this regard.) > > Are we sure that this "others have reached a consensus without your > objections, so we will not consider them" is a valid form of consensus? 
I was merely trying to point out that the questions you are asking are not new, that decisions *have* been taken, and that the results are now part of the Internet legacy. You are certainly welcome to consider whether there's a better way and to propose some entirely new infrastructure for the Internet, but that should not prevent those of us who have been working on the evolution of the existing infrastructure for the past several years from continuing to move forward in that evolution. Or were you suggesting that at any time anybody should be able to question whether standards that have been in use for some time were formed with adequate consensus? > > > Please see RFC 2026 sections 7.1, 7.1.1, 7.1.3, and 10.1... > >7.1 says... The fact that not all are used, or that some are > >used as they were specified in dated version of the ISO standard is not in > >contradiction with 7.1 -- it's just one of "several ways in which an > >external specification... may be adopted." > > I am sorry but this does not stand. The proposed revision directly refers > to ISO standards while there are Internet documentation of the way they > should be used. > > Examples/ > 1. OSI 3166 is refered to. RFC 1591 should. RFC 1591 introduces differences > (we all live with) with OSI 3166 which is taken as a reference to know what > is a country. > 2. OSI 639 scripting fr-FR is used while RFC 1958 leads to fr-fr or FR-FR > or FR-fr indifferently and calls for fra-fr to avoid confusion. > > In RFC 1591 and RFC 1958 parlance "en-GB" should therefore be "eng-uk" RFC 1591 and RFC 1958 are specifications for completely separate protocols. Not only is it completely inappropriate to suggest that RFC 1766 or its successors must be subject to these unrelated specifications, to do so would break a large number of existing implementations of RFC 1766 and RFC 3066 (let alone the proposed revision). This is truly nonsense. 
> >Thus, I see no difference between RFC 3066 and this proposed revision in > >relation to compliance with the sections of RFC 2026 you referred to. > > Full agreement. So there is no need for it - except to enhance the RFC 3066 > for its specific applications. This is OK as long as this is clearly stated. The goals for the proposed revision in enhancing RFC 3066 are clearly stated in the draft. Peter Constable
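The limitation of suffix-truncation alone, raised in this message, shows up clearly in a lookup that repeatedly drops the last subtag of a request until some available tag matches. This is a hypothetical sketch, not the matching scheme of RFC 3066 or of the draft; `truncation_lookup` is an invented name.

```python
from typing import Optional, Set


def truncation_lookup(requested: str, available: Set[str]) -> Optional[str]:
    """Return the first available tag found by dropping trailing subtags."""
    by_key = {tag.lower(): tag for tag in available}  # case-insensitive lookup
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_key:
            return by_key[candidate]
        subtags.pop()  # drop the last subtag and try the shorter prefix
    return None


print(truncation_lookup("sr-Latn-CS", {"sr-CS", "sr-Latn"}))  # sr-Latn
print(truncation_lookup("sr-Latn-CS", {"sr-CS"}))             # None
```

Content tagged "sr-CS" is never selected for a request of "sr-Latn-CS", because truncation can only remove subtags from the end; any smarter behaviour needs a different algorithm, which the draft leaves room to specify separately.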
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
s an explicit cooperative arrangement to do so has been made. However, there are several ways in which an external specification that is important for the operation and/or evolution of the Internet may be adopted for Internet use. The proposed revision does not create Internet-specific versions of ISO standards; it uses IDs drawn from ISO standards with semantics defined in those source standards at the time they were adopted for use in language tags -- the source for the IDs, the symbols and their meanings all reside in the ISO standards. The fact that not all are used, or that some are used as they were specified in dated version of the ISO standard is not in contradiction with 7.1 -- it's just one of "several ways in which an external specification... may be adopted." 7.1.1 simply says that an open external standard may be incorporated merely by reference. There is no requirement here that is not met by the proposed revision. 7.1.3 simply says that an Internet specification may be an adaptation of an external specification provided certain conditions are met. Neither RFC 3066 nor the proposed revision is an adaptation of any existing external specification, so this is not applicable. 10.1 states a general policy regarding IP: In all matters of intellectual property rights and procedures, the intention is to benefit the Internet community and the public at large, while respecting the legitimate rights of others. Again, there is no requirement stated here that is not met by the proposed revision. Clearly, the intent of the proposed draft is to benefit the Internet community and the public at large. There are no rights of others that are in any way violated by the proposed revision. Thus, I see no difference between RFC 3066 and this proposed revision in relation to compliance with the sections of RFC 2026 you referred to. > Agreed. 
But the activity on the ietf-languages list regarding the > draft under discussion isn't an IETF process -- there is no WG or > Chair, no charter, etc. Like the fictional Topsy, it jes' growed up. RFC 3066 was developed in exactly the same manner as this proposed revision has been developed -- as an Internet draft prepared by a member of the IETF-languages list and processed among members of that list until it was submitted for last call and subsequent IESG action. Peter Constable
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of Bruce Lilly > > Do what you feel is warranted, Bruce. You don't appear to be trying to > achieve consensus, which is the touchstone of the IETF process as I > understand it. If you feel issues should be taken to the IESG, then do so. > > You have yourself noted that the draft is an individual > submission, not the result of an IETF process. "consensus" > doesn't apply to an individual effort. IF you want to > adhere to IETF process, by all means ask the IESG to set > up a working group, with a charter, a Chair, etc.; I > fully support that. I don't understand why these kinds of comments are arising. To my understanding (Harald can correct me if I'm wrong), the process followed in preparing the proposed revision of RFC 3066 is the same as what was done in development of RFC 3066 as a replacement for RFC 1766. A general consensus was achieved on the IETF-languages list in preparing the draft for "RFC1766bis", and in exactly the same way a general consensus was achieved on this list in the preparation of "RFC3066bis". Subsequent steps were taken with RFC 3066 for it to be given BCP status, but that did not involve establishment of a working group; I don't understand what should prevent the same thing from happening in this case. Peter Constable
RE: New Last Call: 'Tags for Identifying Languages' to BCP
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of Bruce Lilly > > > The point is that under RFC 3066, > > > the bilingual ISO language and country code lists are > > > considered definitive. > > > > That is nowhere stated or even suggested in RFC 3066. > > RFC 3066 section 2.2 states, in part: > >- All 2-letter subtags are interpreted according to assignments found > in ISO standard 639, "Code for the representation of names of > languages" [ISO 639], or assignments subsequently made by the ISO > 639 part 1 maintenance agency or governing standardization bodies. > > and has a similar statement regarding ISO 3166. > > "interpreted according to assignments found in" certainly > sounds as if the ISO lists are considered definitive for > their respective categories of subtags, since their > interpretation is specified as that given in those lists. > I don't see how the RFC 3066 text can be interpreted > otherwise. You're now quoting things so far removed from their context that they are no longer being evaluated fairly. I believed we were talking about the specific strings, as you had made reference to implementers of bilingual products not having access to that data. Perhaps I misunderstood you, but either way, the relevant facts are that RFC 3066 referred to ISO source standards to establish the denotation of identifiers drawn from those standards, and the proposed revision does the same. Peter Constable Microsoft Corporation
RE: New Last Call: 'Tags for Identifying Languages' to BCP
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of John Cowan > But absolutely nothing except his good sense prevents Michael from > registering > en-the-dialect-spoken-on-the-bowery-between-1933-and-1945-by-alcoholic-drug-users-who-live-in-flophouses. Sub-tags can be at most 8 chars long, so Michael would ask for it to be changed to something like en-the-dialect-spoken-on-the-bowery-between-1933-and-1945-by-alcoholc-drug-users-who-live-in-flophses. :-) Peter Constable Microsoft Corporation
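For the record, the 8-character subtag limit behind the joke is easy to check mechanically. The helper below is a hypothetical illustration that splits on "-" and reports every overlong subtag in both versions of the tag.

```python
from typing import List

MAX_SUBTAG_LEN = 8  # RFC 3066: each subtag is 1 to 8 characters


def overlong_subtags(tag: str) -> List[str]:
    """List the subtags of `tag` that exceed the 8-character limit."""
    return [s for s in tag.split("-") if len(s) > MAX_SUBTAG_LEN]


cowan = ("en-the-dialect-spoken-on-the-bowery-between-1933-and-1945-by-"
         "alcoholic-drug-users-who-live-in-flophouses")
constable = ("en-the-dialect-spoken-on-the-bowery-between-1933-and-1945-by-"
             "alcoholc-drug-users-who-live-in-flophses")

print(overlong_subtags(cowan))      # ['alcoholic', 'flophouses']
print(overlong_subtags(constable))  # []
```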
RE: New Last Call: 'Tags for Identifying Languages' to BCP
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of Bruce Lilly > > By reading both RFC 2047 and RFC 2231, one > > finds that they assume that a language tag must be at most 64 characters > > long... > > - the shortest charset names are 2 characters long (e.g. "IT") > > Not all charsets have 2-character names... In determining the longest language tag permitted, one must identify the shortest possibilities for all other components. > > - the minimum encoded-text length is 1 character long > > That is strictly only true for text that meets all of the > following conditions... Hey, I just said what the EBNF said. > > An encoded-word must contain at least 11 characters that are not part of > > the language tag and have a total length of no more than 75 characters. > > Therefore, an upper bound on language tags that can be used in an RFC > > 2047/2231 encoded-word production is 64 characters. > > That is a best case upper bound... I identified it as such. > The worst case appears to be the charset named > Extended_UNIX_Code_Fixed_Width_for_Japanese (43 characters)... > As mentioned, use of an encoded-word > plus the necessary whitespace around it to represent a > single character is rather wasteful, so a brief language tag > is indicated; fortunately "ja" suffices for text likely to > be used with that charset. Of course, the length limitations must be balanced between the charset tag, the language tag and the encoded-word itself. > > I see no reason why limits must be added as a > > constraint in a revision of RFC 3066. > > The primary reason for specifying limits is due to the > proposed removal of the review/registration process > which currently limits the length of non-private-use > tags. The review/registration process for RFC 3066 registrations does not impose pre-defined limits that implementers of RFC 3066 can assume in their parsers. 
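Peter's best-case arithmetic can be reproduced from the RFC 2231 encoded-word form `=?charset*language?encoding?encoded-text?=` and the 75-character limit of RFC 2047. The sketch assumes a one-character encoding name and, by default, one character of encoded text (the syntactic minimum Peter uses); as Bruce notes, real encoded text is often longer, so treat the results as upper bounds. The constant and function names are hypothetical.

```python
ENCODED_WORD_MAX = 75  # RFC 2047: an encoded-word is at most 75 characters

# Fixed punctuation in =?charset*language?E?encoded-text?=
# "=?" + "*" + "?" + one-letter encoding ("B" or "Q") + "?" + "?="
FIXED_OVERHEAD = len("=?") + len("*") + len("?") + 1 + len("?") + len("?=")


def max_language_tag_len(charset: str, encoded_text_len: int = 1) -> int:
    """Characters left for the language tag once everything else is counted."""
    return ENCODED_WORD_MAX - FIXED_OVERHEAD - len(charset) - encoded_text_len


print(FIXED_OVERHEAD)              # 8
print(max_language_tag_len("IT"))  # 64: shortest charset name
print(max_language_tag_len("Extended_UNIX_Code_Fixed_Width_for_Japanese"))  # 23
```

The 11 characters Peter counts are the 8 characters of fixed overhead plus a 2-character charset name and 1 character of encoded text; with the 43-character charset name Bruce cites, only 23 characters remain even in that best case.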
> > It would be a good idea, however, > > to point out in section 2.1 of the draft that some applications of this > > specification may impose limits on the length of accepted language tags, > > and perhaps to cite RFC 2231 as an example. > > As a general principle, that's fine, however I would point > out that given the inability of experts to be able to > accurately point out the limits quickly... I do > not think it is sufficient merely to state the fact that > there are limits, with or without a pointer to RFC 2231 as > an example. Some indication of the magnitude of worst-case > restrictions is at least advisable... How is it possible to identify the worst-case bound assumed in implementations that are out there? How is it possible to predict ahead of time the worst-case length for an RFC 3066-registered language tag? Neither is possible. In light of that, I think it best to remind implementers of the revised RFC 3066 that some implementations may impose limits (whether those implementers are constructing tags or passing them from one process to another), and to urge them to build robustness into their implementations so that they can respond gracefully if an unexpectedly long tag is encountered -- after all, no matter what limit could be imposed in a revision to RFC 3066, there's no way to stop malware from sending bad data. (How *do* encoded-word parsers react if a bogus charset or language tag that's 2k octets long is encountered? The encoded-word spec already allows for segmenting long strings; could it not also be revised to allow segmenting for the parameters, which would also make it more robust?) Peter Constable Microsoft Corporation
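As an illustration of the robustness being recommended, the sketch below rejects an oversized or malformed tag with an error value instead of raising or silently truncating. The 64-character default is an assumption borrowed from the encoded-word best case, not a limit from RFC 3066 or the draft; the per-subtag check follows RFC 3066 syntax, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TagParse:
    ok: bool
    subtags: List[str] = field(default_factory=list)
    error: str = ""


def parse_tag_defensively(raw: str, max_len: int = 64) -> TagParse:
    """Validate a tag, reporting problems instead of raising or truncating."""
    if len(raw) > max_len:
        # Reject oversized input up front rather than processing it further.
        return TagParse(False, error=f"tag is {len(raw)} chars; limit is {max_len}")
    subtags = raw.split("-")
    for i, s in enumerate(subtags):
        # RFC 3066 shape: 1-8 ASCII letters first, then 1-8 letters or digits.
        allowed = s.isalpha() if i == 0 else s.isalnum()
        if not (1 <= len(s) <= 8 and s.isascii() and allowed):
            return TagParse(False, error=f"bad subtag {s!r} at position {i}")
    return TagParse(True, subtags)


print(parse_tag_defensively("sr-Latn-CS"))
print(parse_tag_defensively("x" * 2048).error)  # tag is 2048 chars; limit is 64
print(parse_tag_defensively("en--US").error)    # bad subtag '' at position 1
```

The caller decides what "gracefully" means (fall back to an untagged default, log, or drop the item); the parser only guarantees it never crashes on hostile input.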
RE: New Last Call: 'Tags for Identifying Languages' to BCP
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of Bruce Lilly > Currently sr-CS has a specific > meaning under RFC 3066; it has had for some time. The meaning "Serbia and Montenegro" was introduced relatively recently (a little more than a year ago) and was immediately received with alarm by many in the IT sector. There were vain attempts to get it reversed, and that failure was an impetus to introduce protection against such changes in the revision of RFC 3066. I am not aware of "CS" being used in the IT sector with the new meaning, though I cannot guarantee that. Peter Constable Microsoft Corporation
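The stability principle at issue (an ISO identifier is usable unless it was previously defined with a different meaning) can be sketched as a lookup over a code's assignment history. The two-entry history table is a hand-written, hypothetical excerpt, and declining to adopt "CS" at all is a simplification; how the draft's registry actually records such codes is not modelled here.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt of ISO 3166 alpha-2 history: each code maps to the
# meanings it has carried, oldest first.
ASSIGNMENT_HISTORY: Dict[str, List[str]] = {
    "CS": ["Czechoslovakia", "Serbia and Montenegro"],
    "FR": ["France"],
}


def usable_region_meaning(code: str) -> Optional[str]:
    """Return the meaning to adopt for `code`, or None if it should not be adopted."""
    meanings = ASSIGNMENT_HISTORY.get(code.upper(), [])
    if len(meanings) != 1:
        return None  # unknown code, or one reassigned to a different meaning
    return meanings[0]


print(usable_region_meaning("fr"))  # France
print(usable_region_meaning("CS"))  # None
```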
RE: New Last Call: 'Tags for Identifying Languages' to BCP
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of Bruce Lilly > > This is a situation we do not intend to repeat. > > That is precisely what would be repeated, and the problem > would remain. "CS" currently means "Serbia and Montenegro", > and its use in accordance with RFC 3066 has precisely that > meaning. And that is a significant problem we wish to remedy as there is some unknown amount of data or implementations out there that use "CS" but with a different meaning intended. > > > > The usability flaw in treating ISO 639 and ISO 3166 as > > human-readable is > > > > evident in the confusion between ja and JP (or is it jp and JA?), > [...] > > It is not uncommon for users to confuse "JA" and "JP". > > Which clearly demonstrates why mere codes in the absence > of definitions associated with the codes is a pointless > proposition. I believe you have confirmed my point, that codes are not meant to be human readable. As for your concern regarding definition, it has been clearly pointed out that codes will not be lacking definitions -- the same definitions they have today from the same sources (with references made to the same sources) will still be available. > > Again, not hypothetical at all. > > Last time I checked, "US" didn't mean France, and "CN" > didn't mean Canada -- I suggest that you might want to > brush up on the definition of hypothetical... The case is hypothetical, but the hypothetical case serves to illustrate a general scenario, and the general scenario is not hypothetical. > > You didn't use the term "display names", but it is clearly implied by > > your reference to bilingual implementations. > > Your inference (which you incorrectly claim as my implication) > is different from my claim. My claim is that under RFC 3066, > the definitions... You have failed to quote what you originally wrote which I claimed made this implication: you spoke not of definitions but of bilingual applications. 
> > Definitions in multiple languages are not a requisite to establishing > > the denotation of a coded element. > > True but irrelevant to the point. Oh? Simply because you make this assertion? > We now have definitions of > specific types of elements (viz. country and language tags) in > multiple languages, and the objection is to the unnecessary > removal of that characteristic. The definitions we have now will remain; they will continue to be referenced and available. I do not see how you can say they are being removed. Peter Constable Microsoft Corporation
Re: New Last Call: 'Tags for Identifying Languages' to BCP
> From: Vernon Schryver <[EMAIL PROTECTED]> > Subject: Re: New Last Call: 'Tags for Identifying Languages' to BCP > To: [EMAIL PROTECTED] > Message-ID: <[EMAIL PROTECTED]> > Besides, I didn't say that one should ignore the English, but that > implementors give precedence to the ABNF. When you are writing an RFC > that you hope will be implemented, you MUST remember that programmers > are lazy. We transliterate the ABNF to build the parser and so implement > the syntax and read the English to figure out and so build the semantics. > As I said, if you must have contradictions between your ABNF and your > English, you must accept the fact that most technical people will > assume your ABNF is right and your English is wrong. That fact seemed > to me to conflict with statements in this thread, and that suggests a > problem in your working group and your RFC. This is somewhat moot since the author has indicated the relevant portion of the ABNF will be revised. In this case, though, the ABNF could not be said to be in contradiction with the English prose: anything permitted by the constraints specified in the English prose would be recognized using the ABNF. It is true that there are strings that could be recognized by the ABNF that would not be permitted by the English prose, but the revision being made to make the ABNF production in question match what Bruce Lilly thought it should be does not change that. The only way to write the ABNF so that it permits exactly what is specified by the English prose, no more and no less, would be to have the production rule simply enumerate a specific set of terminal strings, which does not seem to be particularly helpful, especially when the RFC would establish a machine-readable registry maintained by IANA in which those very strings are enumerated. Peter Constable
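The split described here, between a grammar that admits a superset of tags and a registry that enumerates the valid subtags, can be sketched as a two-stage check. Both the pattern and the five-entry registry set are hypothetical stand-ins, not the draft's ABNF or the IANA registry, and checking each subtag without regard to its position is a simplification.

```python
import re

# Syntax layer: a deliberately simplified stand-in for an ABNF production.
SYNTAX = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# Registry layer: a tiny hypothetical stand-in for an IANA-maintained list.
REGISTERED = {"sr", "az", "latn", "cyrl", "cs"}


def classify(tag: str) -> str:
    """Separate 'fits the grammar' from 'every subtag is registered'."""
    if SYNTAX.fullmatch(tag) is None:
        return "ill-formed"
    if all(s.lower() in REGISTERED for s in tag.split("-")):
        return "valid"
    return "well-formed, but not in the registry"


for tag in ["sr-Latn-CS", "sr-Abcd-CS", "sr_Latn"]:
    print(tag, "->", classify(tag))
```

The grammar stays short and stable while the registry, not the ABNF, carries the enumeration of acceptable strings.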
RE: New Last Call: 'Tags for Identifying Languages' to BCP
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of Bruce Lilly > > > The point is that under RFC 3066, > > > the bilingual ISO language and country code lists are > > > considered definitive. > > > > That is nowhere stated or even suggested in RFC 3066. > > RFC 3066 section 2.2 states, in part: > >- All 2-letter subtags are interpreted according to assignments found > in ISO standard 639, "Code for the representation of names of > languages" [ISO 639], or assignments subsequently made by the ISO > 639 part 1 maintenance agency or governing standardization bodies. > > and has a similar statement regarding ISO 3166. > > "interpreted according to assignments found in" certainly > sounds as if the ISO lists are considered definitive for > their respective categories of subtags, since their > interpretation is specified as that given in those lists. > I don't see how the RFC 3066 text can be interpreted > otherwise. RFC 3066 indicates that the *interpretation* is determined by the source ISO standards. You were discussing display names. (Though, now that I've shown that display names are out of scope, you appear to be recasting the discussion as though you had been discussing definitions.) Peter Constable Microsoft Corporation
RE: New Last Call: 'Tags for Identifying Languages' to BCP
for the RFC 1766/3066/... sequence of specifications. > > > One possibility would be two description fields. But the > > > registry would need a charset closer to ISO-8859-1 than > > > to ANSI X3.4 as currently specified. Or an encoding > > > scheme. > > > > Personally, I don't see the value in something like that. Given the > intent to have a registry that can be machine-readable, changing its > charset from ANSI X3.4 in order to gain descriptors in just one more > language is not worth it IMO. > > Fine, use utf-8, which encompasses ANSI X3.4 and > ISO-8859-1 (plus others). The point is that ANSI > X3.4 is inadequate. There is no point changing the charset to support something that is out of scope for the specification. > > Speaking at least for Microsoft, we're interested in having descriptors > in far more than two languages, and we certainly would not blindly base > the descriptors we present to our customers solely on what a registry > provides, no matter what its charset. > > Surely in going from two (the current situation per > RFC 3066) to "more than two" indicates that decreasing > to one (as in the draft proposal) is heading in the > wrong direction. It certainly invalidates the claim > that the proposal is compatible with existing > implementations, at least one of which does make use > of the descriptions currently provided in both > languages in the ISO lists specified by RFC 3066. Incorrect; you are making false claims about what is specified in RFC 3066. Peter Constable Microsoft Corporation
RE: New Last Call: 'Tags for Identifying Languages' to BCP
se in RFC 3066. This draft does not change that; it merely provides some info that may save you having to go look up the ISO standard, but that info is not the last word. > So, you're saying that the ISO definition of "CS" as > "Serbia and Montenegro" will continue to be valid, with > that meaning, in a language-tag? The meaning of an ID in the registry that came from an ISO standard is the meaning it had in the version of that ISO standard from which it was obtained. (Typically, that is the current version of the ISO standard at the time the ID is added to the registry, though the initial registry being prepared will have some exceptions to resolve pre-existing ambiguous cases, such as CS.) If you really want to know what "CS" would mean under the proposed draft, the proposal is that it will forever remain valid with the meaning "Czechoslovakia", as originally defined in ISO 3166. > The foolishness is your insistence on trying to tie > the definitions to a localization issue. It was you who established it as a localization issue, very clearly: > Surely, though, this is not a technical argument against the proposal. Not purely technical, though it presents problems for existing implementors who provide bilingual support. Eliminating bilingual descriptions for the language, country (and UN region) codes leaves implementors in a quandary. > I haven't specifically discussed "display names"; that is your > assertion, and not my basis for objection. You didn't use the term "display names", but it is clearly implied by your reference to bilingual implementations. > I refer to the > definitions and the need to map to and from those definitions > at either end of the communications channel. Whether or not > that happens by "display" is incidental to the issue of the > number of languages that the definitions are provided in. Definitions in multiple languages are not a requisite to establishing the denotation of a coded element. 
There are widely adopted coding standards that establish denotations using one language only. In this case, though, the denotations of ISO IDs are established by the ISO standard (and the particular version of that standard) from which they were obtained. The registry contains a description that disambiguates which ISO definition is to be used, but it is not a replacement for the ISO definition. Peter Constable Microsoft Corporation
RE: New Last Call: 'Tags for Identifying Languages' to BCP
ot replacements for content of the source standards themselves - that we do not need to change the proposed format of the registry to include descriptions in multiple languages. Peter Constable Microsoft Corporation
RE: New Last Call: 'Tags for Identifying Languages' to BCP
he goal that we had wrt stability while eliminating the concern that English-only annotations for some reason apparently create for you. Personally, I think the English annotation is helpful, but it seems that the real solution you're looking for is to remove any annotation whatsoever so that the situation is closer to what we have under RFC 3066. > > Display names for languages and countries are not within the scope of > > RFC 1766 or RFC 3066. It is preposterous to suggest that this draft is > > not compatible with existing implementations of RFC 3066 on that basis. > > On the contrary, it is preposterous to suggest that codes > will be attached to text by magic; some human somewhere, > somehow is going to have to indicate the language to > something, and it certainly isn't going to be by way of > a 2- or 3-letter code without some reference to what those > codes *mean*. And at the present time, the meaning of > those codes is defined -- bilingually -- in the ISO > lists. RFC 3066 did not even discuss, let alone provide, a means for attaching display text to codes. It *is* preposterous to suggest that this draft is incompatible with RFC 3066 on that basis. Again, the more you press this, the more silly it seems. > > But > > you are simply adding localization requirements to a spec for i18n > > infrastructure, and I consider that not at all appropriate. > > No, I am complaining about removal of internationalized > definitions associated with language tag components. No definitions are removed. The draft points to the source ISO standards just as RFC 3066 does. > "Localization" would be translation of the French definition > into some other language. That is not my concern. My concern > is the elimination of the French definition in the first place. No, you have not commented on definitions; you have repeatedly commented on strings to present to users. Please accept that your arguments on this matter are empty. 
> > > One part of my claim is that non-private-use RFC 3066 tags > > > up to the present time are no longer than 11 octets in length. > > > > Only co-incidently at the present time. > > As mentioned, under RFC 1766/3066 review/registration rules, > excessively long tags would certainly raise objections. That's > no coincidence -- it's an intentional design feature. But "excessive" is not defined anywhere in RFC 1766/3066, and if a very good reason were presented why a tag x characters long was needed, it would have to be considered. > > And so that limit would be a constraint applying for all time to the > > 'grandfathered' production which concerned you so much. > > And so it can easily be incorporated into that ABNF production. The productive thing would be for you to suggest a revision of the ABNF to the authors. Peter Constable Microsoft Corporation
RE: New Last Call: 'Tags for Identifying Languages' to BCP
grammar. > > > > The main concern was with the "grandfathered" production, but I've shown > > that that is a non-issue. > > Again, it is an issue that imposes requirements on language > tag parsers. What you've shown is that the ABNF is not > consistent with what was desired to be expressed, and > that makes it an issue that needs to be addressed. Again, I believe the bigger issue is not getting the ABNF to express what was desired, but rather whether parsers are written to consider only the ABNF or the ABNF plus other specified constraints as well. > > The maximal length issue exists just as much > > in RFC 3066 due to private-use tags; it is a technical concern that > > might worth reviewing in RFC 3066bis, however; but it is not > > insurmountable, and not a new problem. > > Private-use carries its own considerable baggage; aside from > that, the draft proposal increases the length of non-private > tags that affect both protocol design and implementations > from a worst case maximum of 11 octets under RFC 3066... Worst case at present; a month from now it could be arbitrarily larger. But I've accepted that it would be an improvement to add constraints on overall length. Peter Constable Microsoft Corporation
RE: New Last Call: 'Tags for Identifying Languages' to BCP
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of Bruce Lilly > > What is silly is saying that every language tag has to have a date/time > > attribute associated with it so that computer software managing that text > > knows the language of that text. > > In the specific cases of the core Internet protocols that > I have mentioned, there *is* a date/time attribute in the > form of an RFC [2]822 Date field. If we're talking about > some file stored on some machine, every OS that I know of > has a date/time stamp associated with that file. If you > have something else in mind, a concrete description and/ > or example might help. That is not sufficient for many other implementations of RFC 3066. For instance, an XML document may well be stored in a file system that has date/time stamps associated with the file; it might also be stored in a content management system that does not report creation dates when returning content. And elements from within that XML document may be returned as the result of an XPath query or a call into a DOM API, and those surely cannot be assumed to have creation date/time stamps, though one certainly must assume that they can have RFC 3066 tags as xml:lang attributes. > I'm not "eager to abolish" "uniqueness". There never was > any guarantee that codes would never change. Both RFCs > 1766 and 3066 specifically mention changes as a fact of > life. Some of us consider that fact, and particularly the instability of ISO 3166, to be a serious problem. That (not accessibility) was one of the key reasons for this revision. > > > SO where are the French definitions? > > > > Ask a person who is bilingual in English and French to provide one. > > That would lack definitiveness which characterizes the > ISO lists. You started out this thread by talking about display names, not definitions; hence Mark's suggestion. Now you have switched to talking about definitions. 
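The XML case described above can be illustrated with Python's standard `xml.etree.ElementTree`: an element pulled out of a document carries its `xml:lang` value but nothing that dates it. The sample document and the `element_languages` helper are hypothetical; the `sr-CS` tag is used only because "CS" is the code whose meaning is disputed in this thread.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this namespace by the XML specification,
# so ElementTree reports xml:lang under this expanded attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<doc>
  <p xml:lang="sr-CS">Zdravo</p>
  <p xml:lang="en">Hello</p>
</doc>"""


def element_languages(xml_text):
    """Return (text, xml:lang, other attribute names) for each <p> element."""
    root = ET.fromstring(xml_text)
    results = []
    # findall() accepts a limited XPath subset; ".//p" finds every <p>.
    for elem in root.findall(".//p"):
        others = sorted(name for name in elem.attrib if name != XML_LANG)
        results.append((elem.text, elem.get(XML_LANG), others))
    return results


if __name__ == "__main__":
    for text, lang, others in element_languages(DOC):
        # Nothing on the element records when it was written, so nothing
        # here tells a consumer which historical meaning of "CS" applies.
        print(text, lang, others)
```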
The draft clearly indicates where one finds the definitions: " o All 2-character language subtags were defined in the IANA registry according to the assignments found in the standard ISO 639..." I.e. the definition is provided in the registry on the basis of what is defined in ISO 639; hence if what is indicated in the registry is for any reason insufficient for your purposes, you consult the definitive source, the ISO standard. Peter Constable Microsoft Corporation
RE: New Last Call: 'Tags for Identifying Languages' to BCP
> The specification of the > draft is *NOT* compatible with that existing implementation > because it removes the existing functionality of official > descriptions in French of language and country codes. As a > result of that incompatibility, the newly proposed > specification does not work with (at least that one) > existing implementation (but I agree that that is a crucial > concern). Display names for languages and countries are not within the scope of RFC 1766 or RFC 3066. It is preposterous to suggest that this draft is not compatible with existing implementations of RFC 3066 on that basis. > > There are 6000 languages spoken on Earth, of which > > perhaps 600 have a standard written form. > > ISO 639 lists about 650, not precisely 6000. Between ISO 639-1 and ISO 639-2, there are fewer than 400 individual languages listed. The number 6000 was given as a rough figure, and it is fairly well known that the number of living languages is on that order. ISO 639-3 will list over 7000 different individual languages. > It might be worthwhile considering the differences in the > way languages tags are used, by whom they are used, and for > what purpose. There may well be a substantial difference > between use of a tag to represent an obscure dialect of a > dead language in a research paper vs. tagging a piece of > text in one of the core Internet protocols such as SMTP. > The draft seems to ignore the needs of the core Internet > protocols (e.g. unbounded tag length which is incompatible > with those protocols). IETF language tags are used in a wide variety of applications. The parties involved in the development of this spec (the authors and others) have examined these issues for the past several years and have arrived at this architecture. > > What is supposed to > > be privileged about English and French? > > They happen to be the languages in which international > standards (q.v. the ISO and UN lists) are published. 
That is true for ISO standards because the official languages of ISO are English and French. (Russian is also an official language of ISO, but is not required.) But this spec is not an ISO standard; it is an IETF standard. If you can point to IETF requirements that IETF specs must contain English and French, then that would be a legitimate concern. But you are simply adding localization requirements to a spec for i18n infrastructure, and I consider that not at all appropriate. > > > ABNF from the draft: > > > > You're technically right, but your underlying claim (that RFC 3066 tags are > > bounded in length) is false, as has been shown > > One part of my claim is that non-private-use RFC 3066 tags > up to the present time are no longer than 11 octets in length. Only coincidentally at the present time. > As the draft, if/when approved, would close that registration > process, that limit (unless a longer tag is registered in > the interim) would apply for all time. And so that limit would be a constraint applying for all time to the 'grandfathered' production which concerned you so much. Peter Constable Microsoft Corporation
RE: New Last Call: 'Tags for Identifying Languages' to BCP
> From: [EMAIL PROTECTED] [mailto:ietf-languages- > [EMAIL PROTECTED] On Behalf Of Bruce Lilly > The point is that under RFC 3066, > the bilingual ISO language and country code lists are > considered definitive. That is nowhere stated or even suggested in RFC 3066. Peter Constable Microsoft Corporation
Re: New Last Call: 'Tags for Identifying Languages' to BCP
ication to incorporate mechanisms expected in a new part of ISO 639 that is in preparation but is not made available for use at this time. Another (‘variant’) requires sub-tags to be registered, and requires that the registration indicate the prefix sub-tags with which they are recommended to be used. While it may still be technically valid to use a registered variant in some way other than the recommendation, that will be unlikely (just as certain combinations valid under RFC 3066, such as ja-DE, are unlikely). Thus, implementers will have a reasonable chance of anticipating what combinations will be used. The third of these (‘extension’) is defined as a mechanism for extending language tags for use in future protocols. There is an upper limit of 25 extensions, though this RFC does not define limits on the length of each extension. There are no extensions defined at this time, and any extension would require specification in the form of a separate RFC. When one or more extension RFCs are defined, those specifications would provide some indication of what limits, if any, they impose on the length of extensions. For any protocol that supports this proposed revision to RFC 3066 but does not support extensions, any extensions that may be included in a language tag are ignorable. Apart from extensions, all of the mechanisms introduced in the proposed revision were in response to the direction users and implementers were already taking with registered tags under RFC 3066. Thus, while the proposed revision makes greater provision for lengthy tags, this is not completely unrestrained, and the practical likelihood of encountering tags of any given length would be no greater under the proposed revision than it was under RFC 3066. 
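Treating extensions as ignorable suggests a simple filtering step for a protocol that does not support them. The sketch below assumes the singleton-based layout used by the language-tag syntax eventually published as RFC 4646: an extension starts with a one-character subtag other than "x", and "x" starts a private-use sequence that is kept as-is. The function name is hypothetical, and tags such as `i-klingon` pass through untouched because their leading singleton sits in the first position.

```python
def strip_extensions(tag):
    """Drop extension sequences from a language tag, keeping private use.

    Assumed layout: after the first position, a one-character subtag other
    than "x" opens an extension whose subtags run up to the next singleton;
    "x" opens private use, which runs to the end of the tag.
    """
    subtags = tag.split("-")
    kept = []
    in_extension = False
    for i, subtag in enumerate(subtags):
        if i > 0 and subtag.lower() == "x":
            kept.extend(subtags[i:])  # private use: keep the rest verbatim
            break
        if i > 0 and len(subtag) == 1:
            in_extension = True       # a singleton: skip until private use
            continue
        if not in_extension:
            kept.append(subtag)
    return "-".join(kept)


if __name__ == "__main__":
    for tag in ["de-CH-1996", "en-US-a-bbb-ccc-x-private", "en-a-xyz-b-abc", "i-klingon"]:
        print(tag, "->", strip_extensions(tag))
```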
Even so, various changes were suggested to highlight issues related to length, specifically with a view to the possibility that some applications of RFC 3066 (or this proposed revision) would impose fixed limits on the length of tags. These suggestions included notes in that regard at key points within the RFC, but also in sub-tag registrations and in RFCs defining extensions. (For instance, a variant registration would include not only a recommendation on appropriate prefixes, but also specific comments on the maximal length of tags using the given variant.) There were no suggestions to impose limits on the length of tags in the RFC itself (just as RFC 3066 does not impose limits). Basically, limits on length were seen to be a concern belonging to particular applications of the language-tag spec and not to the spec itself, but significant additions would be made to the RFC so that these concerns are highlighted. 6. Re an i18n-considerations section: It was pointed out that language tags are symbolic identifiers with no culture-specific content; the only i18n consideration related to the identifiers themselves is charset, and charset issues are covered in the section on syntax. Bruce was also concerned about i18n considerations in the registry (see issue #2, above – lack of French-language descriptions), but it was pointed out that the content of the registry is not intended as localization data, that there are well-established precedents for code sets that are not documented in terms of multilingual content, and therefore that it was not really necessary to discuss i18n concerns in relation to the registry (any more than it is necessary to have a section discussing i18n issues in relation to the IANA charset registry in RFC 2978). In conclusion, I think that some of Bruce’s concerns were valid, and suggestions for changes have been presented to the authors accordingly. 
I believe all of these changes can be considered to be for clarification purposes, rather than technical changes. (No changes affecting the set of valid tags have been made.) Thanks. Peter Constable GIFT | GPTS | MICROSOFT
RE: Ietf-languages Digest, Vol 24, Issue 5
specified. Or an encoding > scheme. Personally, I don't see the value in something like that. Given the intent to have a registry that can be machine-readable, changing its charset from ANSI X3.4 in order to gain descriptors in just one more language is not worth it IMO. Speaking at least for Microsoft, we're interested in having descriptors in far more than two languages, and we certainly would not blindly base the descriptors we present to our customers solely on what a registry provides, no matter what its charset. > > > The ABNF in the draft permits all of the following tags which > > > are not legal per the RFC 3066 ABNF: > > > supercalifragilisticexpialidoceus > > > y----- > > > x1234567890abc > > > a123-xyz > > > > In fact, none of these is permitted by the ABNF of the draft. > > ABNF from the draft... > That means that the "grandfathered" > production (which is an alternative in the Language-Tag > production) will match any of the following text tags (comments > to the right separated by a semicolon): >x ; ALPHA followed by zero repetitions >xa ; ALPHA followed by one ALPHA (see alphanum) >x- ; ALPHA followed by one HYPHEN > supercalifragilisticexpialidoceus ; ALPHA followed by many ALPHAs >(see alphanum) (example previously given) >x1234567890abc ; ALPHA followed by 13 alphanums >(as previously given) >a123-xyz ; ALPHA followed by three DIGITs (see alphanum) >followed by one HYPHEN followed by three ALPHAs >(example previously given) >y----- ; ALPHA followed by five HYPHENs (example previously >given) > > I say the ABNF from draft -08 (quoted above) allows those; > you say no. My mistake; I was thinking beyond the ABNF alone to other constraints imposed by the proposed spec. As you know, the 'grandfathered' production is loose in the ABNF given in the draft, but is very tightly constrained elsewhere in the draft: it is limited to only items registered under RFC 1766 or RFC 3066 up to the date of acceptance of this proposed spec. 
(In fact, only a subset of those, all explicitly identified in the sub-tag registry.) On the date of acceptance, you will be able to know precisely which valid tags fit under the 'grandfathered' production, and that set will never change; it is 100% guaranteed that none of them will have any of the forms that seem to concern you. Peter Constable Microsoft Corporation
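The exchange above turns on the gap between the loose 'grandfathered' production and the registry constraint. Here is a minimal Python sketch of that gap. The regular expression is an assumption reconstructed from the quoted comments (an ALPHA followed by any run of letters, digits and hyphens), not a transcription of the draft -08 ABNF, and the registry sample is illustrative rather than the real list of RFC 1766/3066 registrations.

```python
import re

# Assumed shape of the loose production: ALPHA *(alphanum / "-").
LOOSE_GRANDFATHERED = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# Illustrative sample of tags registered under RFC 1766/3066 (lowercased).
REGISTERED = {"i-klingon", "art-lojban", "en-gb-oed", "zh-min-nan"}

# The strings quoted in the thread as matching the loose production.
QUOTED = ["x", "xa", "x-", "supercalifragilisticexpialidoceus",
          "x1234567890abc", "a123-xyz", "y-----"]


def matches_loose(tag):
    return LOOSE_GRANDFATHERED.fullmatch(tag) is not None


def is_valid_grandfathered(tag):
    """Syntax alone is not enough: the tag must also be in the closed set."""
    return matches_loose(tag) and tag.lower() in REGISTERED


if __name__ == "__main__":
    for tag in QUOTED + ["en-GB-oed"]:
        print(tag, matches_loose(tag), is_valid_grandfathered(tag))
```

Every quoted string passes the syntax check and fails the registry check, which is the distinction between "permitted by the ABNF" and "permitted by the spec" drawn in the reply.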
RE: New Last Call: 'Tags for Identifying Languages' to BCP
ng with language > lacks an "Internationalization considerations" section as > recommended by RFC 2277 (a.k.a. BCP 18). No more or less shocking than for RFC 3066, regarding which I'm not aware of any complaints. I don't quite understand what the critique is here: what is there to internationalize about language tags? They are symbolic identifiers that have no culture-specific content. The only possible consideration is the charset, which for this spec involves ALPHA, DIGIT and "-" only. It's true that ALPHA and DIGIT are not defined and that it would be better to do so; it couldn't hurt to have a section for i18n considerations (wouldn't need to be long). These are very minor concerns, and hardly "shocking". > Perhaps even more disturbing is the content of the "IANA > Considerations" section; the draft predicts that certain things > will happen ("IANA will"[...]), but doesn't actually direct > (e.g. "IANA shall") IANA to do anything. The placement of that > section does not correspond to current RFC-Editor guidelines > (it should appear after Security Considerations); also on that > point, Appendices should precede References. There is a process issue here, but I have assumed that the authors have dealt with IANA on that. Otherwise, these are editorial issues -- "even more disturbing" seems to me to be somewhat overstated. > Many of the references are obsolete (e.g. RFCs 1327, > 1521)... and at least one reference ([19]) > gives a bracketed URI rather than the correctly formatted > RFC reference. Although reference is made to the "Accept- > Language" header field, RFC 3282 (the defining RFC for that > field) is not listed among the references... > The formatting of the draft is atrocious All editorial. > there is no differentiation between normative and > informative references, A valid concern. > I am extremely surprised that the draft has been published > at least nine times in such a state of poor formatting and > poor attention to editorial content (e.g. 
obsolete and > missing references), and that it progressed as far as IESG > last call in such a state, with no Internationalization > considerations section, etc. In fairness to the authors, page-oriented plain text is not exactly conducive to authoring and revising a long document, and a lot of energy was spent focusing on details that have far more consequence than formatting. And, as mentioned above, the lack of an i18n-concerns section is hardly without precedent, and not particularly significant in the case of this spec. This really feels like nit-picking, IMO. I'm left wondering if Bruce has been looking for nits to pick because he is... > ... particularly concerned about the implementation > ramifications of the proposed changes, especially (as > noted in detail above): > 1. the apparent contradiction between the stated > objectives w.r.t. accessibility of relevant ISO data and > standards and the reality of the proposal's > implications (ISO 8601 date format parsing). As mentioned above, this really is a non-issue. > 2. the clear contradiction between the claims about > ABNF compatibility with RFC 3066 and the factual > incompatibility of certain provisions in the grammar. The main concern was with the "grandfathered" production, but I've shown that that is a non-issue. The maximal length issue exists just as much in RFC 3066 due to private-use tags; it is a technical concern that might be worth reviewing in RFC 3066bis; however, it is not insurmountable, and not a new problem. Peter Constable Microsoft Corporation
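On the i18n point raised in this message: the only character-level consideration for the tags themselves is the repertoire of ALPHA, DIGIT and "-", and the draft was faulted for not defining ALPHA and DIGIT. Assuming they take their usual ABNF core-rule meanings (ASCII A-Z, a-z and 0-9), a check might look like the sketch below; the function name is hypothetical.

```python
import string

# ABNF core rules: ALPHA = %x41-5A / %x61-7A, DIGIT = %x30-39.
TAG_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def uses_tag_charset(tag):
    """True if the tag is non-empty and uses only ASCII letters, digits and '-'."""
    return bool(tag) and all(ch in TAG_CHARS for ch in tag)


if __name__ == "__main__":
    for tag in ["en-US", "de-CH-1996", "de_DE", "fr-\u00c7A"]:
        print(tag, uses_tag_charset(tag))
```

An explicit ASCII set is used rather than `str.isalpha()` or `str.isdigit()`, because those accept non-ASCII letters such as "Ç" and non-ASCII digits, which is exactly the ambiguity that leaving ALPHA and DIGIT undefined invites.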