RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

Dave Crocker wrote:
> And, indeed, I haven't seen much support for the document under discussion.

I find statements such as this mind-boggling. Please explain what you mean by "much support". There have been at least as many individuals writing mails in favour of the document as against it. Furthermore, it has been made clear that the individuals writing the document and supporting it represent *very* large communities.

--
Misha Wolf
Standards Manager
Chief Architecture Office
Reuters

___
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

On Mon, 10 Jan 2005 11:33:54 GMT, Misha Wolf said:
> I find statements such as this mind-boggling. Please explain what you mean by "much support". There have been at least as many individuals writing mails in favour of the document as against it. Furthermore, it has been made clear that the individuals writing the document and supporting it represent *very* large communities.

Support is there. Consensus, however, is quite lacking on this one.
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

Let me take this opportunity to say that Apple, too, strongly supports 3066bis.

Deborah Goldsmith
Internationalization, Unicode liaison
Apple Computer, Inc.
[EMAIL PROTECTED]

On Jan 10, 2005, at 3:33 AM, Misha Wolf wrote:
> I find statements such as this mind-boggling. Please explain what you mean by "much support". There have been at least as many individuals writing mails in favour of the document as against it. Furthermore, it has been made clear that the individuals writing the document and supporting it represent *very* large communities.
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

JFC (Jefsey) Morfin scripsit:
> Dear John, thank you for acknowledging that the proposed draft _imposes_ something! It therefore does not report on an existing practice. Thank you for acknowledging that the proposed draft even _limits_ the current practice! Thank you for explaining that the decision of the user is replaced by an a-priori obligation... resulting from a decision of a member of this list.

The practice that is being limited is that of the language tag review process (the list, the Reviewer, IANA), not of any user. Users are free to use language tags or not, of course.

> Technically, these remarks, however, have no bearing on John Klensin's remark: a limitation is only a (negative) extension.

And tyranny is only negative liberty, I suppose? Hogwash.

> (except that the IANA registrations should be transferred to IANA now [...])

Is this supposed to mean something?

--
John Cowan
[EMAIL PROTECTED]
http://www.ccil.org/~cowan
http://www.reutershealth.com
How they ever reached any conclusion at all is starkly unknowable to the human mind. --Backstage Lensman, Randall Garrett
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

[EMAIL PROTECTED] scripsit:
> What would be really nice is to specify a parameterized matching algorithm (or more precisely, an algorithm family) along the lines of the stringprep family of string normalization algorithms. But I'm unsure if there's sufficient time and interest available to do this. But it is nice to dream...

That would be a Good Thing indeed. However, it is definitely out of scope for this draft, as it would stretch the definition of BCP well beyond the breaking point. If there's any defending the presence of an *algorithm* in a BCP at all, it's because we are not making the algorithm normative, but just saying "The most commonly used algorithm is...".

--
John Cowan
[EMAIL PROTECTED]
www.ccil.org/~cowan
www.reutershealth.com
[W]hen I wrote it I was more than a little febrile with foodpoisoning from an antique carrot that I foolishly ate out of an illjudged faith in the benignancy of vegetables. --And Rosta
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

John C Klensin scripsit:
> > In RFC 3066, it is only a heuristic (or examination of the IANA registry, which is not machine-parseable) that tells the meaning of the second subtag of the existing registered tag sr-Latn. In the draft, its meaning is unambiguously specified a priori.
>
> So?

So it is meaningless to talk about breaking backward compatibility when the behavior in question is a heuristic (or, to quote Ned, "it works in most but not all cases"). Registration of new tags under RFC 3066 can and will break the heuristic all by itself. The new draft talks about scripts, but the existing registered tags talk about scripts too.

--
John Cowan
[EMAIL PROTECTED]
www.ccil.org/~cowan
www.reutershealth.com
'Tis the Linux rebellion / Let coders take their place,
The Linux-nationale / Shall Microsoft outpace,
We can write better programs / Our CPUs won't stall,
So raise the penguin banner of / The Linux-nationale. --Greg Baker
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

[EMAIL PROTECTED] scripsit:
> > Now, it may be the case that all _registered_ tags have avoided the use of non-country code two letter codes in the third and later position. But this is 100% irrelevant.
>
> If you say so.

The point is that conformant code implementing RFC 3066 is broken if it simply assumes any 2-letter code after the first subtag is a country code.

> > Rather, the rule is simply that a country code, if present, always appears as a two letter second subtag.
>
> Not quite. The rule is that a 2-letter second subtag is a country code. Country codes could have appeared elsewhere, and may still wind up doing so before RFC 3066 is obsoleted.

But it is wrong for a compliant 3066 implementation to assume that such a two letter code is a country code! I really cannot fathom why this issue is so hard for you to understand.

> > The new draft changes this rule, so applications that pay attention to country codes in language tags have to change and the new algorithm for finding the country code is trickier.
>
> But not much. As an advantage, country codes can always be found in the new draft, whereas in RFC 3066 they could in principle be anywhere.

Not really. Anything that puts a country code in some other location in the 3066 world isn't going to get the benefit of automatic recognition of the code as such by a 3066-compliant parser.

> > > (A private correspondent notes that the reference to -x- should in fact be a reference to any singleton, though -x- and i- are the only singletons currently usable.)
> >
> > I have to say I find it quite interesting that one of the main proponents of the new draft, while arguing that the new draft doesn't make the matching problem a lot harder, ended up giving an erroneous rule for extracting country codes from a language tag.
>
> Like other people, I sometimes post when tired; I don't think this is particularly interesting.

Whereas everyone who writes code when they implement this stuff will be as fresh as a daisy?

> > Sure, in the general case most if not all of these nasty corner cases you've created can be blithely assumed away because they only appear in specific problem domains. Actual applications that operate in those specific domains aren't so lucky, however. And the metric we're supposed to apply in the IETF is real world implementability.
>
> I fail to see what this has to do with the merit of marking scripts in language tags. The preferred IETF charset, UTF-8, contains no information about script whatever.

Sadly, the IETF's preferences haven't managed to catch on in many parts of the world. As it happens I deal with messaging applications, and in this space text/plain with all sorts of nasty charset issues is the rule, not the exception.

> Extended language tags will neither help nor harm you, then.

This actually may be true, because as I have said before, the likely outcome if this draft is adopted in its present form will be that it will simply be ignored in its entirety. But is this what we want?

Ned
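A literal reading of the RFC 3066 rule being argued over here can be sketched in Python. This is a minimal illustration, not code from any participant: the function name `rfc3066_country` is hypothetical, and it deliberately does not special-case the `i-` and `x-` primary subtags. It recognizes a country code only when it is a 2-letter second subtag, so a code in any other position is invisible to it unless the implementation consults registry tables.

```python
from typing import Optional


def rfc3066_country(tag: str) -> Optional[str]:
    """Return the ISO 3166 country code that a purely rule-based RFC 3066
    parser can recognize in `tag`, or None.

    RFC 3066 only says that a 2-letter *second* subtag is an ISO 3166
    alpha-2 code; it says nothing about codes in other positions.
    Hypothetical helper: the i- and x- primary subtags are not special-cased.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        second = subtags[1]
        if len(second) == 2 and second.isalpha():
            return second.upper()
    return None


for tag in ["en-US", "sr-Latn", "en-Latin-US"]:
    print(tag, "->", rfc3066_country(tag))
```

The last case is the hypothetical `en-Latin-US` registration raised in the next message: the `US` is present, but a parser that relies only on the RFC 3066 rule has no way to know it is a country code.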
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

> > Rather, the rule is simply that a country code, if present, always appears as a two letter second subtag. The new draft changes this rule, so applications that pay attention to country codes in language tags have to change and the new algorithm for finding the country code is trickier.
>
> Your text above says (a) if there is a country code in the tag, it is the second subtag. That is not what the text of RFC 3066 actually says, which is: "The following rules apply to the second subtag: All 2-letter subtags are interpreted as ISO 3166 alpha-2 country..." That is, it says (b) if a second subtag has 2 letters, then it is an ISO 3166 code, which is not the same as (a). (It is almost, but not quite, the converse.)

Fine, whatever.

> The current RFC certainly does not forbid the use of country codes in other positions in language tags. One could absolutely register en-Latin-US, for example, meaning English as spoken in the US written in Latin script.

Sure, but my point was, is, and always has been that any 3066-compliant implementation won't see this as a country code (unless it is table driven, which brings up its own set of issues).

> There has been a lot of noise on this issue, and too few concrete examples.

No, what there has been is a lot of discussion of a real problem with no apparent recognition of it as such by the draft authors. Your pejorative characterization of this as noise does not make it so.

> In the so-called 3066bis draft, we have striven very hard to ensure that: (c) Every single tag that could be generated under RFC 3066bis is a tag that could have been registered under RFC 3066.

True but irrelevant.

> Thus if someone wrote a parser that is future-compatible -- that could parse all RFC 3066 language tags including those registered after the parser was deployed -- then that parser can handle all 3066bis language tags. This is a significant advance over RFC 3066, whose registered (not generated) language tags are atomic, and cannot be effectively parsed at all. 3066bis adds more structure so as to allow effective parsing of tags. If you *can* come up with tags that would show that (c) is invalid, that would be a concrete case that we would have to make adjustments in the draft for.

(c) is frankly not an issue I care one whit about. (Perhaps I should, but I don't.) I don't register tags. I write code that processes, and more to the point matches, tags. That's why I have issues with this draft.

> Moreover, all the talk about this being *too* complex is far overblown.

Again, your pejorative dismissal of other people's concerns does not mean your position is valid.

> All 3066bis language tags can be parsed, including all the grandfathered codes, with a very short piece of code, or even with a regular expression (such as in Perl).

Of course you can write a short piece of code to parse this stuff. It's what you do with it after you parse it that's a problem.

> This is not rocket science.

Parsing almost never is. But simply parsing these tags is not, and never has been, the issue.

Ned
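Both halves of this exchange, that a regular expression can parse 3066bis-style tags and that parsing is the easy part, can be seen in a short Python sketch. The pattern is a simplified approximation of the draft's tag structure (language, optional script, optional region, variants, singleton-introduced extensions, private use), written for illustration only: it is not the draft's ABNF, it omits grandfathered tags and tags consisting only of a private-use part, and the names `LANGTAG` and `parse_tag` are hypothetical.

```python
import re
from typing import Dict, Optional

# Simplified approximation of the 3066bis structure (not the draft's ABNF).
LANGTAG = re.compile(
    r"""
    (?P<language>[A-Za-z]{2,3}|[A-Za-z]{5,8})              # ISO 639 code or registered language
    (?:-(?P<script>[A-Za-z]{4}))?                          # ISO 15924 script, e.g. Hans
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?                 # ISO 3166 or UN M.49 region
    (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)
    (?P<extensions>(?:-[A-WYZa-wyz0-9](?:-[A-Za-z0-9]{2,8})+)*)  # singleton other than x
    (?:-(?P<privateuse>[Xx](?:-[A-Za-z0-9]{1,8})+))?       # private use, x-...
    """,
    re.VERBOSE,
)


def parse_tag(tag: str) -> Optional[Dict[str, object]]:
    """Split a tag into named fields, or return None if it is not well formed
    under the simplified pattern above."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        return None
    fields: Dict[str, object] = dict(m.groupdict())
    # The repeated groups capture strings such as "-1901"; split them up.
    fields["variants"] = m.group("variants").split("-")[1:]
    fields["extensions"] = m.group("extensions")[1:] or None
    return fields


print(parse_tag("zh-Hans-HK"))
print(parse_tag("de-CH-1901"))
print(parse_tag("en-Latin-US"))  # variant before region: not well formed here
```

Every field comes out labeled, which RFC 3066 alone could not offer. What the code does not say is how `zh-Hans-HK` should compare with `zh-Hant` or `zh-HK`; that matching question is the one being raised above.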
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

From: [EMAIL PROTECTED] [mailto:ietf-languages- [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
> Again, your pejorative dismissal of other people's concerns does not mean your position is valid... Parsing almost never is. But simply parsing these tags is not, and never has been, the issue.

I think you guys are in violent agreement over country codes within a tag, and that the debate over interpreting the wording of RFC 3066 serves no purpose. I think the intent of Mark's dismissal has been to refute perceived-invalid objections, in which case we need to consider that the line between perceived-invalid and truly-invalid has been blurred simply by the volume of discussion (the noise factor). There have been some invalid objections that bear some similarity to comments Ned has made as he has tried to make his point. (E.g. Bruce Lilly has claimed invalid back-compat problems on the incorrect premises that RFC 3066 does not permit ISO 3166 country codes except as second subtags or does not permit second subtags that are not country codes (at the moment I forget if it was one or the other or both).)

But Ned's concerns are legitimate, I think. I'd say they are not necessarily blocking issues for this draft, because I think a possible outcome of discussion is to characterize them as concerns about outstanding issues that need to be solved rather than as concerns over the draft itself; but I do think they are valid concerns that deserve attention.

In a nutshell, Ned was elaborating on a comment from Dave Singer that, once we have parsed a pair of tags and identified all the pieces, it's not a trivial matter to decide in every case how the two tags compare, and that there are factors that would exist if the draft were approved that didn't exist under RFC 3066. Again, I think this is a question that deserves discussion.

In relation to the proposed draft, I don't see it as a particular problem with the draft. It is a problem that doesn't exist in RFC 3066, but that is only because RFC 3066 left us with bigger problems: it doesn't give us any way to identify pieces that we would be encountering in registered tags (apart from hard-coded tables compiled from versions of the registry that pre-exist a given implementation). RFC 3066 permits tags that have all kinds of internal structures. That is a problem as it will never allow us to derive much useful information from a tag with any confidence -- only the ISO 639 language category and in some cases a country category. I predict that in the future we will be seeing a significant number of tags (whether sanctioned without registration by a successor to RFC 3066 or as tags registered under RFC 3066) that go beyond the patterns ll(-CC) and lll(-CC). If we stick with RFC 3066, we will have no way of writing forward-compatible processors that will be able to do very useful matching.

What this draft does is impose some order on all the other patterns within tags that are permitted, and tell us what the different pieces must be. As a result, we have more named pieces to deal with, and we are presented with the question that Ned raised: "Now we have more named pieces than we did before; what do we do with them?" That is a problem that will need to be addressed. But I don't think it's a reason to oppose the draft, since opposing the draft (or at least opposing any revision that introduces a richer internal structure) leaves us in a situation that must be characterized either as a worse problem or as turning our backs on increased functionality to meet real user needs.

Peter Constable
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

--On Thursday, 06 January, 2005 06:35 -0800 [EMAIL PROTECTED] wrote:
> ...
> > Extended language tags will neither help nor harm you, then.
>
> This actually may be true, because as I have said before, the likely outcome if this draft is adopted in its present form will be that it will simply be ignored in its entirety. But is this what we want?

Actually, Ned, my concern goes a little beyond "ignored in its entirety". If this thing were adopted as a separate standard, with some scope of applicability, and it were completely ignored, that would not really be harmful except on the great scoreboard of standards issued versus standards used. But if, in the process, it succeeds in identifying 3066 as Obsolete without replacing it with anything that is usable and compatible, that could cause a serious reduction in interoperability in the areas where 3066, today, is good enough or just about good enough.

That brings me back to what I've tried to suggest several times but which, as you observe, the authors of this draft seem to dismiss either as noise or as a "no change to 3066 under any circumstances" position. If the purpose is to get this model out there, rather than replacing 3066 because it is generally offensive, there is a fairly quick way to get this document unstuck:

(1) Remove the notion and statements about obsoleting 3066. This would probably require a change of title and some introductory material, but that shouldn't be a big deal if the real goal is to get it finished and published.

(2) Create a new section, called "Applicability", that contains at least one example of how and where this system would be used. I'm not wild about it but, personally, I'd settle for some vague handwaving like "places where more comprehensive identification is needed than is provided by 3066".

(3) Ask the IESG to approve publication of the thing as either a Proposed Standard or an Experimental document, as they believe best matches the needs and consensus of the community.

The document is not a good candidate for BCP for three reasons: (i) as you and I have commented, it contains algorithmic and protocol specifications, rather than just specifying a registration procedure. 3066 was, IMO, marginal in that regard; this is well over the line. (ii) As Jefsey has pointed out, this is not yet a current practice, best or otherwise. (iii) Two BCP documents covering the same space would be certain to create confusion unless the applicability differences were much more clearly stated than I think anyone is prepared to do.

Then we let the market sort the situation out. If the proposers of this specification are right about how important the additional details are, I'd expect to see

   Content-language: 3066-tag
   X-Extended-Content-language: new-tag

and its equivalents show up all over the email environment and the web. The interpretation and matching issues would either sort themselves out or they wouldn't. If it became clear, in practice, that this was the right way to go for a broader range of applications, writing a short RFC to update the applicability statement (and to move the thing from Experimental to Proposed Standard if the IETF went the experimental route) would be pretty trivial. If that range appeared to the community to be subsuming all of the applications of 3066, the same document could provide the "obsoletes 3066" decision.

You've probably got a prediction about how likely the broadest form of outcome is, and it probably matches mine. But the IETF does not, and should not try to, impose technologies by replacing working standards, and, despite my biases about experience and processes, our predictions should ultimately be no more determinative than those of the authors. Let's separate this from "replaces 3066", get it out there, see how important and useful the marketplace thinks it is, and let the marketplace (and the experimental/proposed standard models) sort out the implementability and usability problems.

john
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

--On Thursday, 06 January, 2005 07:42 -0800 Peter Constable [EMAIL PROTECTED] wrote:
> ...
> But Ned's concerns are legitimate, I think. I'd say they are not necessarily blocking issues for this draft, because I think a possible outcome of discussion is to characterize them as concerns about outstanding issues that need to be solved rather than as concerns over the draft itself; but I do think they are valid concerns that deserve attention.

Peter, as soon as we get to "valid concerns that deserve attention", we remove the proposed document, I believe, as a candidate for BCP. We don't have any provision in the BCP rules for pushing a document forward that identifies valid concerns and other loose ends rather than having those issues resolved sufficiently that we can talk about a practice. So it means that either

* The document needs to be withdrawn, these (and other) concerns sorted out, and a new document produced that addresses them.

or

* The document needs to be recast into Proposed Standard or Experimental form, because we do have ways, there, to say "these are known outstanding issues that deserve attention".

That, of course, doesn't solve some other strategic/positioning issues with it; see my recent other note.

john
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

On Thu, 06 Jan 2005 11:04:54 -0500, John C Klensin wrote:
> Peter, as soon as we get to valid concerns that deserve attention, we remove the proposed document, I believe, as a candidate for BCP.

That pretty much applies to all specifications. A Last Call that produces any sort of serious concern that folks feel should be taken seriously means that the document is not yet ready for approval.

It occurs to me that a Last Call for an independent submission has an added requirement to satisfy, namely that the community supports adoption of the work. We take a working group as a demonstration of community support. (However, we used to pressure for explicit statements during Last Call.) My feeling is that an independent submission MUST show significant support during Last Call.

In other words, a working group document getting IETF Last Call has something of a "Default Yes". An independent submission needs to be "Default No".

And, indeed, I haven't seen much support for the document under discussion.

d/
--
Dave Crocker
Brandenburg InternetWorking
+1.408.246.8253
dcrocker a t ...
WE'VE MOVED to: www.bbiw.net
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

Peter Constable wrote:
> ... that go beyond the patterns ll(-CC) and lll(-CC). If we stick with RFC 3066, we will have no way of writing forward-compatible processors that will be able to do very useful matching.

I want to reinforce what Peter has said. In RFC 3066 we have already registered language tags like zh-Hans and zh-Hant. Nobody can parse out the script in the language tag because RFC 3066 does not provide for identification of the pieces. During the development of 3066bis, we have been holding off on registering all of the country variants of these, because we didn't want them to be redundant with the generated codes in 3066bis. If we don't get 3066bis, then we will end up needing to register the combinations zh-Hans-CN, zh-Hant-CN, zh-Hans-HK, zh-Hant-HK, zh-Hans-MO, zh-Hant-MO, zh-Hans-SG, zh-Hant-SG, zh-Hans-TW, and zh-Hant-TW.

And zh is just one example. There are many languages that can be written in different scripts, where it is important as a matter of practice to be able to distinguish the script as well as the country.

There are very good reasons to have the script code before the country code, because differences by script swamp differences by country. Suppose that you are composing a web page by pulling together different pieces of data, and your target is Chinese simplified for Hong Kong. For one of those data sources, there is not an exact match. Given a choice between a data source in Chinese simplified, or a data source in Chinese Hong Kong (but traditional), you really want to pick the Chinese simplified. That is reflected in placing the script value second (zh-Hans-HK), so that the common process of truncation will get the right result.

This is similar to the reason why the language code comes before the country code. If we had the order CH-fr, then we could end up mixing French and German in the same page, because we would fall back (for one of the data sources) from CH-fr to CH, which could be German.

Mark
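The truncation fallback described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `lookup_by_truncation` is a hypothetical helper, matching is case-insensitive, and subtags are dropped one at a time from the end with no special handling of singletons or private-use subtags. The second call models the hypothetical region-first ordering, in which a bare `CH` label could mean German content.

```python
from typing import Iterable, Optional


def lookup_by_truncation(requested: str, available: Iterable[str]) -> Optional[str]:
    """Return the first available tag reached by repeatedly removing the
    last subtag of `requested`, or None if nothing matches."""
    by_lower = {tag.lower(): tag for tag in available}
    subtags = requested.split("-")
    while subtags:
        candidate = "-".join(subtags).lower()
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()  # zh-Hans-HK -> zh-Hans -> zh
    return None


# Script before region: simplified Chinese for Hong Kong falls back to zh-Hans,
# not to the traditional-script zh-Hant-HK source.
print(lookup_by_truncation("zh-Hans-HK", ["zh-Hant-HK", "zh-Hans"]))

# Hypothetical region-first order: CH-fr falls back to CH, which may be German.
print(lookup_by_truncation("CH-fr", ["CH", "fr"]))
```

Truncation gives the intended answer only because the more important distinction sits further left in the tag; the code itself has no notion of which subtags matter more.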
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

Dave Crocker wrote:
> It occurs to me that a Last Call for an independent submission has an added requirement to satisfy, namely that the community supports adoption of the work. We take a working group as a demonstration of community support. (However we used to pressure for explicit statements during Last Call.) My feeling is that an independent submission MUST show significant support during Last Call.
>
> In other words, a working group document getting IETF Last Call has something of a Default Yes. And independent submission needs to be Default No.

Pretty close. Certainly the default can't be "Yes". However, the reason why many things come in as individual submissions is that the community doesn't care much. So if the IESG is satisfied enough to put out a last call, and nobody responds -- it doesn't have community support -- the default community position shouldn't be "no" but "no objection".

(In this specific case it appears to me, as an outsider, that there has been significant objection, and not all of it dismissible.)

swb
RE: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
Dave, While we are pretty much in agreement, three observations, one based on Scott's default no objection observation. (1) I think you are right that there are two issues with an independent submission, one of which is the notion of support that doing something is a good idea. And I agree that, for WG efforts, we pretty much sort that issue out at charter time so it would take strong evidence at Last Call time to be persuasive that the community is not interested. For that part, I agree with you on the default no part -- either the community cares enough that the document should be part of our corpus of standards-track materials, or it doesn't and no objection doesn't seem to be a good option (even though the appropriate threshold for cares enough might be debated). But there is also the set of issues associated with given that we are interested, it is technically adequate, does it solve the stated (or implied) problems, and similar questions. And, for that piece, I think Scott's default no objection is about the right target: it is reasonable for the community, as it figures out how to allocate resources, to say this is worth doing, looks close enough, and we trust the people who have done the work sufficiently. (2) It is orthogonal to the issues you have raised, but I believe that barriers to approval to something that is intended to replace something that is deployed and working should be higher than the barrier for a new piece of work in a new area of application with no potential for screwing things up. One could try to further grade that (or not) based on degree of backward compatibility in applications and use. I think that treating replacement of different protocol or procedure or model as needing a higher degree of certainty than new work is what the IETF has done all along without ever being explicit about it as a principle. But it seems important to flag in this case. (3) Finally, there is apparently a procedural oddity with this document. 
The people who put it together apparently held extended discussions on the ietf-languages mailing list, a list that was established largely or completely to review registrations under 3066 and its predecessors.My understanding at this point is that their good-faith impression was that the discussions on that list were essentially equivalent to those of a WG. As a result, they came into this Last Call process believing that their document ought to be treated very much like a WG one with the presumption that all of the relevant people were aware of their efforts and on their list and hence that their consensus on the document should create a default yes to both of the key questions that you identify. I think that conclusion is wrong, precisely because they didn't have the benefit of the struggles about scope and charter details, and the announcements that the effort was going on, that invariably accompany a WG formation process. And I know of at least a case or two of IETF participants who would have felt obligated to carefully track a WG chartered to replace 3066 who were not sufficiently aware of this effort to track it except as an evolving individual submission draft. The question of how that process confusion arose, and whether we are doing the right things in cases like this, might be something the community should examine at some point, but it is largely independent of how this document should be treated going forward. regards and best new year's wishes, john --On Thursday, 06 January, 2005 09:02 -0800 Dave Crocker [EMAIL PROTECTED] wrote: On Thu, 06 Jan 2005 11:04:54 -0500, John C Klensin wrote: Peter, as soon as we get to valid concerns that deserve attention, we remove the proposed document, I believe, as a candidate for BCP. That pretty much applies to all specifications. A Last Call that produces any sort of serious concern that folks feel should be taken seriously means that the document is not yet ready for approval. 
It occurs to me that a Last Call for an independent submission has an added requirement to satisfy, namely that the community supports adoption of the work. We take a working group as a demonstration of community support. (However, we used to press for explicit statements during Last Call.) My feeling is that an independent submission MUST show significant support during Last Call. In other words, a working group document getting IETF Last Call has something of a Default Yes. An independent submission needs to be Default No. And, indeed, I haven't seen much support for the document under discussion. d/ -- Dave Crocker Brandenburg InternetWorking +1.408.246.8253 dcrocker a t ... WE'VE MOVED to: www.bbiw.net
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions
This is similar to the reason why the language code comes before the country code. If we had the order CH-fr, then we could end up mixing French and German in the same page, because we would fall back (for one of the data sources) from CH-fr to CH, which could be German. It has to be application-specific which fallback happens. If the user says he's Swiss French, and the content has alternative offers for Swiss German or French French, which do you present? If the content actually differs for legal or geographic reasons ('the legal representative in your country is', 'for copyright reasons this edition differs in material ways from other countries'), then the correct country but wrong language is the best answer. If the desire is simply for maximum intelligibility, then the reverse is true. -- David Singer Apple Computer/QuickTime
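The truncation fallback Dave describes can be sketched in a few lines of Python. This is a minimal illustration, not text from either specification: the function names and the resource lists are hypothetical, and the country-first tags model the ordering he argues against rather than anything RFC 3066 allows.

```python
def truncation_fallback(tag):
    """Yield the tag itself, then each shorter prefix made by dropping the last subtag."""
    subtags = tag.split("-")
    for end in range(len(subtags), 0, -1):
        yield "-".join(subtags[:end])


def lookup(tag, available):
    """Return the first fallback candidate found in `available` (case-insensitive), else None."""
    by_lower = {a.lower(): a for a in available}
    for candidate in truncation_fallback(tag):
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match
    return None


# Language-first (RFC 3066) order: falling back from fr-CH stays in French.
print(lookup("fr-CH", ["fr", "de-CH"]))   # fr
# Hypothetical country-first order: falling back from CH-fr lands on a
# generic Swiss resource, which might well be in German.
print(lookup("CH-fr", ["FR-fr", "CH"]))   # CH
```

The only thing that changes between the two calls is subtag order; the fallback mechanism itself is identical, which is why order alone decides whether the page can end up mixing languages.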
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions
I notice two main types of arguments going on in this thread, where it seems to me that there is frustration and talking past each other occurring due to fundamentally different concerns and assumptions between different constituencies. One type of conflict seems to me to be between what I will term, for convenience (and please, I don't want to get side-tracked on my choice of terms -- I just want convenient words), implementors vs. linguists. By implementors, I mean those whose concern is primarily how to interpret (act on) received language tags -- consumers of language tags, for whom falling back to a general or compatible match may be desirable when an exact match is not available. From their point of view, the most important aspect of language tags is being able to parse and match them -- exact linguistic purity and accuracy is a secondary issue. From their point of view, the addition of new tags, regardless of whether the new tags improve language tagging accuracy, may be actively harmful unless accompanied by improved matching rules. To the extent that the adding of tags moves beyond simple registration of new tags, and instead into new forms of tags and new rules for interpreting tags, that is, that the new tags bring up fundamental matching algorithm questions, that becomes the main concern for this group. Then there are what I will refer to as linguistic purists, whose concern is primarily having precise, accurate tags available for languages. (These may be people whose orientation is toward generating content and labelling it accurately.) For this group, the most important aspect of language tags is having them be accurate and precise. Any matching issue (and in particular issues of trying to fall back to a more generic match when an exact match is not available) are secondary. The opinion on whether a tag is useful then varies: "it's useful if I know how to match it" vs. "it's useful if it's accurate."
An example where the difference in orientation shows up is with the position of script vs. country in tags. From the linguistic point of view, there are arguments for having script come first. But from the implementation point of view, that is less backwards-compatible with 3066, hence more problematic. The process question of whether this is appropriately a BCP, or whether it is at least implicitly bringing up algorithmic implementation issues and hence ought perhaps instead to be a Proposed Standard or an Experimental RFC, also has something to do with this difference in orientation. A second type of argument (which I should mention I have largely tuned out, so this is my superficial and not very informed take on it) seems to me to be more linguistic/political in nature, which is what is the correct (linguistically correct? politically correct?) way to name the tags: what sort of naming scheme corresponds to linguistic reality, or what sort of naming scheme is politically acceptable, and is there a conflict there. This does get back to the algorithmic matching issue in a sense though, which is that if one wants some sort of hierarchical structure to the tags (to allow easier matching), or indeed to define any sort of matching rules (as an implementor wants), you're probably getting right into some political questions about how matching should work. So for those who wanted to stick just to linguistic accuracy and try to avoid political issues, trying to avoid discussion of algorithmic matching may have seemed appealing (but that then provides no help to what I've termed the implementors). If we can keep in mind that there are different constituencies interested in language tags, with different main concerns, then I would hope for less frustration and irritation with others missing the main point, so that constructive discussions can occur, leading to some compromise useful to everyone.
Regards, Kristin
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions
Dave Singer scripsit: It has to be application-specific which fallback happens. If the user says he's Swiss French, and the content has alternative offers for Swiss German or French French, which do you present? If the content actually differs for legal or geographic reasons ('the legal representative in your country is', 'for copyright reasons this edition differs in material ways from other countries'), then the correct country but wrong language is the best answer. If the desire is simply for maximum intelligibility, then the reverse is true. Absolutely, which is why the fallback rule isn't and can't be a protocol-level transaction or (a fortiori) an interop issue. It's simply a useful default in many circumstances. -- John Cowan [EMAIL PROTECTED] http://www.reutershealth.com All Norstrilians knew what laughter was: it was pleasurable corrigible malfunction. --Cordwainer Smith, Norstrilia
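Because the choice is application-specific rather than a protocol rule, one way to picture it is a selector that takes the policy as a parameter. The sketch below is hypothetical (neither RFC 3066 nor the draft defines such a function) and assumes simple language-REGION tags only.

```python
def split_lang_region(tag):
    """Return (language, region-or-None), lowercased; assumes a simple language[-REGION] tag."""
    parts = tag.lower().split("-")
    return parts[0], (parts[1] if len(parts) > 1 else None)


def choose(requested, available, prefer):
    """Pick a tag from `available`: exact match first, then fall back per `prefer`.

    prefer="region"   -> right country, possibly wrong language (legal/geographic content)
    prefer="language" -> right language, possibly wrong country (intelligibility)
    """
    if prefer not in ("region", "language"):
        raise ValueError("prefer must be 'region' or 'language'")
    for tag in available:
        if tag.lower() == requested.lower():
            return tag
    want_lang, want_region = split_lang_region(requested)
    for tag in available:
        lang, region = split_lang_region(tag)
        if prefer == "region" and want_region is not None and region == want_region:
            return tag
        if prefer == "language" and lang == want_lang:
            return tag
    return None


offers = ["de-CH", "fr-FR"]
print(choose("fr-CH", offers, prefer="region"))    # de-CH
print(choose("fr-CH", offers, prefer="language"))  # fr-FR
```

The same request and the same offers yield different answers depending only on the caller's policy, which is the sense in which the fallback is a useful default rather than an interoperability requirement.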
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
From: Dave Singer [mailto:[EMAIL PROTECTED] This is similar to the reason why the language code comes before the country code. If we had the order CH-fr, then we could end up mixing French and German in the same page, because we would fall back (for one of the data sources) from CH-fr to CH, which could be German. It has to be application-specific which fallback happens. If the user says he's Swiss French, and the content has alternative offers for Swiss German or French French, which do you present? If the content actually differs for legal or geographic reasons ('the legal representative in your country is', 'for copyright reasons this edition differs in material ways from other countries'), then the correct country but wrong language is the best answer. If the desire is simply for maximum intelligibility, then the reverse is true. But that is a level of decision making that goes well beyond any algorithm that simply uses truncation of tags, which is the only case in which the ordering of sub-tags matters. Peter Constable
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
At 11:34 AM -0800 1/6/05, Peter Constable wrote: From: Dave Singer [mailto:[EMAIL PROTECTED] This is similar to the reason why the language code comes before the country code. If we had the order CH-fr, then we could end up mixing French and German in the same page, because we would fall back (for one of the data sources) from CH-fr to CH, which could be German. It has to be application-specific which fallback happens. If the user says he's Swiss French, and the content has alternative offers for Swiss German or French French, which do you present? If the content actually differs for legal or geographic reasons ('the legal representative in your country is', 'for copyright reasons this edition differs in material ways from other countries'), then the correct country but wrong language is the best answer. If the desire is simply for maximum intelligibility, then the reverse is true. But that is a level of decision making that goes well beyond any algorithm that simply uses truncation of tags, which is the only case in which the ordering of sub-tags matters. Sorry, I should have gone on to conclude: the important aspect of sub-tags is that their nature and purpose be identifiable and explained (e.g. that this is a country code), and that we retain compatibility with previous specifications. This tagging uses order (and size) of sub-tags rather than explicit labels to say what something is, and we're stuck with that. I don't believe that simple truncation is a necessarily useful operation in all circumstances, and it probably should not be in the spec. at all. For example, I'd say that we should retain the 3066 ordering of language-country and therefore script, if needed, comes later. However, my typesetting subsystem doesn't care a jot about language or country, it just needs to find the script code ('can I render this script'?). This spec.
should unambiguously allow me to extract the language, country, script etc., should say under what circumstances two sub-tags of any type match, state the obvious that two tags exactly match if they have the same sub-tags and they all match, that partial perfect matches (of tags with differing numbers of sub-tags) are possible and may be applicable, and that the use of imperfect matches (in which not all sub-tags match) has to be application-specific. Examples of why on the latter would be helpful. -- David Singer Apple Computer/QuickTime
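Dave's wish list (extracting language, script and country, plus exact and partial matching) can be made concrete with a rough Python sketch. The regular expressions approximate the draft's shapes for language (2 or 3 letters), script (4 letters) and region (2 letters or 3 digits) subtags; that is an assumption of this sketch, not a restatement of the draft's full grammar. Imperfect matching is deliberately left out since, as he says, it has to be application-specific.

```python
import re

# Approximate subtag shapes (assumed for this sketch, not the draft's full ABNF).
LANGUAGE = re.compile(r"[a-z]{2,3}")        # e.g. en, sr
SCRIPT = re.compile(r"[a-z]{4}")            # e.g. latn, hant
REGION = re.compile(r"[a-z]{2}|[0-9]{3}")   # e.g. us, cs, 419


def parse(tag):
    """Split a tag into language, optional script, optional region, and remaining subtags.

    Subtags are lowercased. Extended language subtags, extensions, private use and
    grandfathered tags are not handled.
    """
    subtags = tag.lower().split("-")
    if not LANGUAGE.fullmatch(subtags[0]):
        raise ValueError(f"unsupported primary subtag in {tag!r}")
    parsed = {"language": subtags[0], "script": None, "region": None}
    i = 1
    if i < len(subtags) and SCRIPT.fullmatch(subtags[i]):
        parsed["script"] = subtags[i]
        i += 1
    if i < len(subtags) and REGION.fullmatch(subtags[i]):
        parsed["region"] = subtags[i]
        i += 1
    parsed["rest"] = subtags[i:]
    return parsed


def exact_match(a, b):
    """Exact match: identical apart from letter case."""
    return a.lower() == b.lower()


def prefix_match(shorter, longer):
    """Partial perfect match: `shorter` equals `longer` with trailing subtags removed."""
    s, l = shorter.lower().split("-"), longer.lower().split("-")
    return len(s) <= len(l) and l[: len(s)] == s


print(parse("sr-Latn-CS")["script"])   # latn  (all a typesetter needs)
print(exact_match("EN-us", "en-US"))   # True
print(prefix_match("en", "en-US"))     # True
```

Position alone is not enough here: the sketch relies on subtag length and character class to tell a script from a region, which is exactly the "order (and size)" convention Dave mentions.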
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
From: Dave Singer [mailto:[EMAIL PROTECTED] Sorry, I should have gone on to conclude: the important aspect of sub-tags is that their nature and purpose be identifiable and explained (e.g. that this is a country code), and that we retain compatibility with previous specifications. Ah! Then the proposed draft ensures that the nature of subtags is always identifiable, which RFC 3066 (as I mentioned earlier) fails to do. And the draft retains compatibility with previous specifications using an assumption (thoroughly discussed and concluded on the IETF-languages list a year ago) that, in case of left-prefix matching processes, script distinctions are generally far more important than country distinctions. I don't believe that simple truncation is a necessarily useful operation in all circumstances, I don't think anyone would dispute that. and it probably should not be in the spec. at all. For example, I'd say that we should retain the 3066 ordering of language-country and therefore script, if needed, comes later. However, my typesetting subsystem doesn't care a jot about language or country, it just needs to find the script code ('can I render this script'?). Here I disagree. For other purposes, I think it's very clear that the only time that choice of order matters is with matching algorithms that use simple truncation, and for the most common implementations, which use left-prefix truncation, the order lang-script-country will be far more useful in the long run precisely because script distinctions are generally far more important in matching than country distinctions. I don't know of any case in which a tag might be used that contained all three subtags but in which the country distinction generally matters more than the script distinction. Peter Constable
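Peter's ordering argument can be checked against a small, hypothetical example. Assume a content set where plain "zh" is a default resource written in Simplified script; the function below is a generic left-prefix truncation lookup written for illustration, not code from the draft, and "zh-HK-Hant" models the country-before-script order only for comparison.

```python
def truncation_lookup(requested, available):
    """Drop trailing subtags from `requested` until a case-insensitive match is found."""
    wanted = {a.lower(): a for a in available}
    subtags = requested.split("-")
    while subtags:
        match = wanted.get("-".join(subtags).lower())
        if match is not None:
            return match
        subtags.pop()
    return None


# Hypothetical content set: plain "zh" is a default resource in Simplified script.
content = ["zh", "zh-Hant", "zh-Hans"]

# Draft order (language-script-country): the script survives truncation.
print(truncation_lookup("zh-Hant-HK", content))  # zh-Hant
# Country-before-script order: the script is dropped first, so a reader
# who asked for Traditional script gets the Simplified default.
print(truncation_lookup("zh-HK-Hant", content))  # zh
```

With left-prefix truncation, whichever subtag sits last is the first to be sacrificed, so placing script before country keeps the distinction Peter considers more important.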
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions
Dave Singer scripsit: This spec. should unambiguously allow me to extract the language, country, script etc., It does (and RFC 3066 does not). should say under what circumstances two sub-tags of any type match, state the obvious that two tags exactly match if they have the same sub-tags and they all match, that partial perfect matches (of tags with differing numbers of sub-tags) are possible and may be applicable, and that the use of imperfect matches (in which not all sub-tags match) has to be application-specific. Except for the point that tags match if they are (case-insensitively) identical, which is already made, we don't believe that any of these other things can be normatively enunciated. -- John Cowan [EMAIL PROTECTED] http://www.reutershealth.com http://www.ccil.org/~cowan Schlingt dreifach einen Kreis vom dies! / Schliesst euer Aug vor heiliger Schau, / Denn er genoss vom Honig-Tau, / Und trank die Milch vom Paradies. -- Coleridge (tr. Politzer) [Coleridge's original: "Weave a circle round him thrice, / And close your eyes with holy dread, / For he on honey-dew hath fed, / And drunk the milk of Paradise."]
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions
John C Klensin scripsit: "Content-language: 3066-tag" and "X-Extended-Content-language: new-tag". This reflects a fundamental misunderstanding of what the draft does compared to what RFC 3066 does. It imposes *more* restraints on language tags, not fewer. The RFC 3066 language tag registration process can register tags with almost unpredictable meaning once one gets past the first subtag. The draft *limits* the possible tags to a small subset, and tightens up the allowable semantics. It allows no tag to be used that was not already registerable under RFC 3066. In RFC 3066, it is only a heuristic (or examination of the IANA registry, which is not machine-parseable) that tells the meaning of the second subtag in the existing registered tag sr-Latn. In the draft, its meaning is unambiguously specified a priori. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan Raffiniert ist der Herrgott, aber boshaft ist er nicht. ("Subtle is the Lord, but malicious He is not.") --Albert Einstein
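John's point about sr-Latn can be seen by coding up the only rule a registry-free RFC 3066 parser has for the second subtag: a 2-letter second subtag is an ISO 3166 country code, and anything else has to be looked up in the IANA registry. The helper below is hypothetical and written for illustration only.

```python
import re


def second_subtag_meaning_rfc3066(tag):
    """Interpret the second subtag using only RFC 3066's built-in rule.

    A 2-letter second subtag is an ISO 3166 country code; any other second
    subtag can only be understood by looking the whole tag up in the IANA
    registry. Subtags after the second are not examined at all.
    """
    subtags = tag.split("-")
    if len(subtags) < 2:
        return "none"
    if re.fullmatch(r"[A-Za-z]{2}", subtags[1]):
        return "country"
    return "unknown without registry lookup"


for t in ["en-US", "sr-Latn", "en-Latn-US"]:
    print(t, "->", second_subtag_meaning_rfc3066(t))
```

Note that "US" in en-Latn-US is never recognized as a country code under this rule, because it is not in second position; that is the gap a shape-based scheme such as the draft's is meant to close.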
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
At 12:14 PM -0800 1/6/05, Peter Constable wrote: From: Dave Singer [mailto:[EMAIL PROTECTED] Sorry, I should have gone on to conclude: the important aspect of sub-tags is that their nature and purpose be identifiable and explained (e.g. that this is a country code), and that we retain compatibility with previous specifications. Ah! Then the proposed draft ensures that the nature of subtags is always identifiable, which RFC 3066 (as I mentioned earlier) fails to do. And the draft retains compatibility with previous specifications using an assumption (thoroughly discussed and concluded on the IETF-languages list a year ago) that, in case of left-prefix matching processes, script distinctions are generally far more important than country distinctions. As has been beautifully pointed out on the list, that is a view that is lingo-centric. If what I am trying to differentiate is the price (and the currency of the price) of an item, the country may be much more important than the script that the price is written in. (This is also an example for the last point below.) I repeat, I don't think truncation -- and hence prefix-matching -- is very stable or nearly universally applicable enough to be mentioned. Whereas I do believe compatibility of ordering with 3066 is important. I don't believe that simple truncation is a necessarily useful operation in all circumstances, I don't think anyone would dispute that. and it probably should not be in the spec. at all. For example, I'd say that we should retain the 3066 ordering of language-country and therefore script, if needed, comes later. However, my typesetting subsystem doesn't care a jot about language or country, it just needs to find the script code ('can I render this script'?). Here I disagree.
For other purposes, I think it's very clear that the only time that choice of order matters is with matching algorithms that use simple truncation, and for the most common implementations, which use left-prefix truncation, the order lang-script-country will be far more useful in the long run precisely because script distinctions are generally far more important in matching than country distinctions. I don't know of any case in which a tag might be used that contained all three subtags but in which the country distinction generally matters more than the script distinction. -- David Singer Apple Computer/QuickTime
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions
Dave Singer scripsit: as has been beautifully pointed out on the list, that is a view that is lingo-centric. If what I am trying to differentiate is the price (and the currency of the price) of an item, the country may be much more important than the script that the price is written in. (this is also an example for the last point below). Using the language-tag to retrieve the country implicitly referenced in the content is far more unreliable than prefix-matching. Just because this document is written in en-US doesn't mean I can't refer to the price of some consumer device as 200,000 yen. I repeat, I don't think truncation -- and hence prefix-matching -- is very stable or nearly universally applicable enough to be mentioned. It's there to clarify the rule already given in RFC 3066. Whereas I do believe compatibility of ordering with 3066 is important. RFC 3066 already supports tags that don't fit. -- John Cowan www.reutershealth.com www.ccil.org/~cowan [EMAIL PROTECTED] Arise, you prisoners of Windows / Arise, you slaves of Redmond, Wash, The day and hour soon are coming / When all the IT folks say Gosh! It isn't from a clever lawsuit / That Windowsland will finally fall, But thousands writing open source code / Like mice who nibble through a wall. --The Linux-nationale by Greg Baker
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] I notice two main types of arguments going on in this thread, where it seems to me that there is frustration and talking past each other occurring due to fundamentally different concerns and assumptions between different constituencies... I have feet in both the implementors' and linguistic purists' camps, and so think I understand both. But there are many points on which I don't agree with your assessment. From [the implementors'] point of view, the most important aspect of language tags is being able to parse and match them -- exact linguistic purity and accuracy is a secondary issue. I would say as an implementor that it's important to find appropriate ways to match tags that meet legitimate needs in realistic scenarios in the best way we can, and to be aware of behaviour that will be experienced when using existing implementations, making sure that any degradation of behaviour is known and accepted to be offset by benefits, and that there are no really bad behaviours that may result. I would consider exact linguistic purity secondary, since this system is not intended to document linguistic realities but to provide useful behaviours related to differences in language usage in information systems. From [the implementors'] point of view, the addition of new tags, regardless of whether the new tags improve language tagging accuracy, may be actively harmful unless accompanied by improved matching rules. Here, I disagree, unless this statement is to be understood in a hypothetical way -- a priori, it would be possible to make changes that are harmful, but I do not assume that addition of new tags is necessarily harmful. To the extent that the adding of tags moves beyond simple registration of new tags, and instead into new forms of tags and new rules for interpreting tags, that is, that the new tags bring up fundamental matching algorithm questions, that becomes the main concern for this group.
There are no new forms of tags proposed! The draft would impose *restraints* on the forms that tags can take, and define precisely what forms tags could take. This is a point where there may be some talking past each other. Some people are speaking from a position in which it is assumed that the part of a tag that refers to country can be predicted to be in the second-subtag position. Those supporting the draft are responding that RFC 3066 does not assume this; it only implies that the only case in which a country code can be reliably recognized as such is when it is the second subtag. The former assume that we should continue to keep country codes in second position because that's the place where we've been able, up to now, to recognize it. The latter respond that - existing implementations will still be able to recognize it when it's in that place - RFC 3066 permits it to come in other places, but existing implementations will never be able to recognize it more than heuristically - the new draft would allow new implementations to *always* recognize it in any tag, and - *as implementors*, they think that requiring that country codes only ever come in that place is *not* what will provide the best behaviours for users (specifically in cases where script and country subtags are both used). For [linguistic purists], the most important aspect of language tags is having them be accurate and precise... Any matching issue (and in particular issues of trying to fall back to a more generic match when an exact match is not available) are secondary. For the linguist, what matters is the functional behaviour of the system, including matching, but not the implementation. The linguist, per se, has no opinion on what the internal structure of tags should look like; they only specify what the functional requirements of the overall system should be, and which tradeoffs in functionality are better or worse.
But maybe I haven't got the same picture of the distinction between the implementor and the linguistic purist that you intend. A second type of argument... seems to me to be more linguistic/political in nature, which is what is the correct (linguistically correct? politically correct?) way to name the tags: what sort of naming scheme corresponds to linguistic reality, The question of what the relationship is between the naming scheme and ontologies is important inasmuch as knowing the ontology informs us of what kinds of distinctions need to be made and what kinds of relationships may exist between those kinds of distinctions; that guides us in determining functional requirements, which should be the basis of implementations. (Once again, a pointer to a white paper on these issues from a few years ago: http://www.sil.org/silewp/abstract.asp?ref=2002-003.) or what sort of naming scheme is politically acceptable, and is there a conflict there. This does get back to the algorithmic matching issue in a sense though, which is that if
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
I'm sorry, this example I gave doesn't correspond to *language* matching. My error. My apologies. (Nor should my questions on this subject be seen as suggesting either that I as an individual, or particularly Apple as a company, is unhappy about revising RFC 3066.) At 12:35 PM -0800 1/6/05, Dave Singer wrote: At 12:14 PM -0800 1/6/05, Peter Constable wrote: From: Dave Singer [mailto:[EMAIL PROTECTED] Sorry, I should have gone on to conclude: the important aspect of sub-tags is that their nature and purpose be identifiable and explained (e.g. that this is a country code), and that we retain compatibility with previous specifications. Ah! Then the proposed draft ensures that the nature of subtags is always identifiable, which RFC 3066 (as I mentioned earlier) fails to do. And the draft retains compatibility with previous specifications using an assumption (thoroughly discussed and concluded on the IETF-languages list a year ago) that, in case of left-prefix matching processes, script distinctions are generally far more important than country distinctions. as has been beautifully pointed out on the list, that is a view that is lingo-centric. If what I am trying to differentiate is the price (and the currency of the price) of an item, the country may be much more important than the script that the price is written in. (this is also an example for the last point below). I repeat, I don't think truncation -- and hence prefix-matching -- is very stable or nearly universally applicable enough to be mentioned. Whereas I do believe compatibility of ordering with 3066 is important. I don't believe that simple truncation is a necessarily useful operation in all circumstances, I don't think anyone would dispute that. and it probably should not be in the spec. at all. For example, I'd say that we should retain the 3066 ordering of language-country and therefore script, if needed, comes later.
However, my typesetting subsystem doesn't care a jot about language or country, it just needs to find the script code ('can I render this script'?). Here I disagree. For other purposes, I think it's very clear that the only time that choice of order matters is with matching algorithms that use simple truncation, and for the most common implementations, which use left-prefix truncation, the order lang-script-country will be far more useful in the long run precisely because script distinctions are generally far more important in matching than country distinctions. I don't know of any case in which a tag might be used that contained all three subtags but in which the country distinction generally matters more than the script distinction. -- David Singer Apple Computer/QuickTime
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions
First, I apologize for the statement "there has been a lot of noise on this issue." By that, I wasn't really referring to your message in particular. I was commenting more on the general status of quite a number of statements that have been made on the overall topic. And by noise, I really mean high-level statements without explicit examples or scenarios, where it is very hard for people not familiar with the details to be able to judge the correctness of the statements. And I will assume that it was that perceived insult that caused you to be dismissive, with your statement below, "Fine, whatever." I assume that otherwise you would not so readily conclude that it didn't matter whether RFC 3066 said "if X then Y" vs. "if Y then X". Those are, after all, very different statements, and a confusion between them would cause incorrect conclusions to be drawn. (c) Every single tag that could be generated under RFC 3066bis is a tag that could have been registered under RFC 3066. True but irrelevant. Not at all irrelevant. Suppose someone is using an RFC 3066 parser, and is faced with either: (a) a registered tag from a future version of the RFC 3066 registry, or (b) a 3066bis tag (that uses generative features not in RFC 3066). Their parser will work *exactly* the same way; they would parse both as being equally well-formed, and they will be unable to determine any of the structure of either tag, and just treat each as a blob. So they are no better off, but *no worse off either*. (Had we not followed (c), this would not have been true.) Of course, if they try parsing a tag that is generated according to RFC 3066 (e.g., not in the registry), then they would be able to parse out the language code and/or country. If they update to a 3066bis parser, then they can reliably extract much more information from the tag. And because 3066bis was written to be backwards compatible, any RFC 3066-generated language tag parses out exactly the same as it would with an RFC 3066 parser.
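Mark's point (c) can be illustrated with a check for the generic RFC 3066 syntax, which is all a registry-agnostic RFC 3066 parser can apply. The regular expression transcribes the RFC 3066 ABNF; the function name and sample tags are chosen for illustration. A tag built from the draft's generative rules passes exactly as a registered tag does, and in neither case does the check reveal any internal structure.

```python
import re

# RFC 3066 syntax:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_wellformed_rfc3066(tag):
    """True if `tag` satisfies the generic RFC 3066 syntax (registry not consulted)."""
    return RFC3066_TAG.fullmatch(tag) is not None


for t in ["en-US", "i-klingon", "sr-Latn-CS", "es-419", "en_US"]:
    print(t, is_wellformed_rfc3066(t))
```

Every sample except the underscore form is accepted, and the check treats the registered tag i-klingon and the generated sr-Latn-CS alike: well-formed blobs, no better and no worse off.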
Now you yourself may not care much about the extra information in the 3066bis language tag. But IBM and many other companies and organizations do. This is not some theoretical problem; it is a real current issue that many are faced with. For example, without reliable script information many languages are severely underspecified. One simply cannot mix content with different scripts and have happy customers. And if you don't care about the extra information, you are no worse off than if you were trying to parse a registered RFC 3066 tag. For matching purposes, the commonly used truncation mechanism will work just as well with all 3066bis tags as it does with RFC 3066 tags, for all tags you will encounter. Mark - Original Message - From: [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; ietf@ietf.org Sent: Thursday, January 06, 2005 06:44 Subject: Re: draft-phillips-langtags-08, process, specifications, stability, and extensions Rather, the rule is simply that a country code, if present, always appears as a two-letter second subtag. The new draft changes this rule, so applications that pay attention to country codes in language tags have to change and the new algorithm for finding the country code is trickier. Your text above says (a) if there is a country code in the tag, it is the second subtag. That is not what the text of RFC 3066 actually says, which is: The following rules apply to the second subtag: All 2-letter subtags are interpreted as ISO 3166 alpha-2 country... That is, it says (b) if a second subtag has 2 letters, then it is an ISO 3166 code, which is not the same as (a). (It is almost, but not quite, the converse.) Fine, whatever. The current RFC certainly does not forbid the use of country codes in other positions in language tags. One could absolutely register en-Latin-US, for example, meaning English as spoken in the US written in Latin script.
Sure, but my point was, is, and always has been that any 3066-compliant implementation won't see this as a country code (unless it is table driven, which brings up its own set of issues). There has been a lot of noise on this issue, and too few concrete examples. No, what there has been is a lot of discussion of a real problem with no apparent recognition of it as such by the draft authors. Your pejorative characterization of this as noise does not make it so. In the so-called 3066bis draft, we have striven very hard to ensure that: (c) Every single tag that could be generated under RFC 3066bis is a tag that could have been registered under RFC 3066. True but irrelevant. Thus if someone wrote a parser that is future-compatible -- that could parse all RFC 3066 language tags including those registered after the parser was deployed -- then that parser can handle all 3066bis language tags. This is a significant advance over RFC 3066, whose registered
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
From: [EMAIL PROTECTED] [mailto:ietf-languages- [EMAIL PROTECTED] On Behalf Of John C Klensin (3) Finally, there is apparently a procedural oddity with this document. The people who put it together apparently held extended discussions on the ietf-languages mailing list, a list that was established largely or completely to review registrations under 3066 and its predecessors. My understanding at this point is that their good-faith impression was that the discussions on that list were essentially equivalent to those of a WG. I believe I can say that it was done this way because it followed the example of the development of RFC 3066, which to my knowledge (as a member of the IETF-languages list at that time) happened in the same way. It was certainly done with a good-faith impression that appropriate procedures were being followed. Peter Constable
Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
And I will assume that it was that perceived insult that caused you to be dismissive, I was dismissive because your correction, while accurate, was irrelevant to the current discussion of the change to country code semantics. with your statement below about Fine, whatever. I assume that otherwise you would not so readily conclude that it didn't matter whether RFC 3066 said if X then Y vs. if Y then X. Those are, after all, very different statements, and a confusion between them would cause incorrect conclusions to be drawn. Of course they are, but I fail to see how any of this impacts the country code issue I have been discussing. (c) Every single tag that could be generated under RFC 3066bis is a tag that could have been registered under RFC 3066. True but irrelevant. Not at all irrelevant. I meant, of course, that it is irrelevant to the issue at hand. Suppose someone is using an RFC 3066 parser, and is faced with either: (a) a registered tag from a future version of the RFC 3066 registry, or (b) a 3066bis tag (that uses generative features not in RFC 3066). ... I am well aware of the value of this sort of backwards compatibility. I am not, I hope, a total fool, which I would have to be to be unaware of this. If they update to a 3066bis parser, then they can reliably extract much more information from the tag. And because 3066bis was written to be backwards compatible, any RFC 3066-generated language tag parses out exactly the same as it would with an RFC 3066 parser. Now you yourself may not care much about the extra information in the 3066bis language tag. I never said that. In fact I have repeatedly said exactly the opposite. But IBM, and many other companies and organizations do. This is not some theoretical problem; it is a real current issue that many are faced with. For example, without reliable script information many languages are severely underspecified. One simply cannot mix content with different scripts and have happy customers.
I am well aware of this. You are presenting a strawman argument here. And if you don't care about the extra information, you are no worse off than if you were trying to parse a registered RFC 3066 tag. It is somewhat axiomatic that if I don't care about something I don't care when that something changes. For matching purposes, the commonly used truncation mechanism will work just as well with all 3066bis tags as it does with RFC 3066 tags, for all tags you will encounter. Given that the matching approach I have been talking about is not simple truncation, I'm afraid this is yet another strawman. Ned
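For readers following the matching argument, the "commonly used truncation mechanism" Mark refers to, together with the prefix rule stated in RFC 3066 section 2.5, can be sketched roughly as follows. This is a minimal illustration, not any participant's implementation: the names `range_matches` and `lookup_by_truncation` are hypothetical, and it covers only simple prefix matching, which Ned says is not the approach he has in mind.

```python
from typing import List, Optional


def range_matches(language_range: str, tag: str) -> bool:
    """Prefix rule in the style of RFC 3066 section 2.5 (case-insensitive)."""
    r, t = language_range.lower(), tag.lower()
    if r == "*":
        return True  # the wildcard range matches any tag
    # Exact match, or a prefix that ends exactly on a subtag boundary.
    return t == r or t.startswith(r + "-")


def lookup_by_truncation(preferred: str, available: List[str]) -> Optional[str]:
    """Try the full preferred tag, then drop trailing subtags one at a time."""
    by_lower = {a.lower(): a for a in available}
    subtags = preferred.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()  # truncate, e.g. de-de -> de
    return None


print(range_matches("en", "en-US"))                 # True
print(range_matches("en", "eng"))                   # False: not a subtag boundary
print(lookup_by_truncation("de-DE", ["de", "fr"]))  # de
```

Note that `lookup_by_truncation("de-DE", ["de-CH", "fr-CH"])` returns `None`: truncation alone never reaches a sibling region, which is the kind of case David Singer raises about de-CH versus de-DE content.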
RE: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
In a nutshell, Ned was elaborating on a comment from Dave Singer that, once we have parsed a pair of tags and identified all the pieces, it's not a trivial matter to decide in every case how the two tags compare, and that there are factors that would exist if the draft were approved that didn't exist under RFC 3066. Finally! Thank you! This is exactly what I have been trying to say. Again, I think this is a question that deserves discussion. In relation to the proposed draft, I don't see it as a particular problem with the draft. It is a problem that doesn't exist in RFC 3066, but that is only because RFC 3066 left us with bigger problems: it doesn't give us any way to identify pieces that we would be encountering in registered tags (apart from hard-coded tables compiled from versions of the registry that pre-exist a given implementation). With, as you point out below, one important exception: It did have a way to reliably identify a country code in most cases (but not all). And this ability to say 2 character subtag in the second position, must be a country code was quite useful even though it might miss other occurrences of country codes in some cases. 3066bis provides a reliable way to locate country codes in all cases, but the algorithm is different. And this is a non-backwards-compatible change. Of course there's the option Dave Singer has raised: Reverse the positions of script and country codes in 3066bis. I see two problems with this: (1) Script codes are in general more important than country codes, and therefore really should come first so that simple truncation matches work better. (There are probably exceptions to this assertion lurking out there somewhere, but I believe it is mostly true.) (2) I believe it increases the number of grandfathered codes that won't conform to the new format.
Now, it may be that, after full consideration of all the issues, especially given that the 3066 algorithm could not locate country codes in all cases, the right way forward is to make this non-backwards-compatible change, fully document the change and its consequences (although I will again point out that assessing the true impact on the installed base is a practical impossibility), and move on. But as you say, it does deserve discussion. RFC 3066 permits tags that have all kinds of internal structures. That is a problem as it will never allow us to derive much useful information from a tag with any confidence -- only the ISO 639 language category and in some cases a country category. I predict that in the future we will be seeing a significant number of tags (whether sanctioned without registration by a successor to RFC 3066 or as tags registered under RFC 3066) that go beyond the patterns 'll(-CC) and lll(-CC). If we stick with RFC 3066, we will have no way of writing forward-compatible processors that will be able to do very useful matching. A very good point. What this draft does is impose some order to all the other patterns within tags that are permitted, and tell us what the different pieces must be. As a result, we have more named pieces to deal with, and we are presented with the question that Ned raised: Now we have more named pieces than we did before; what do we do with them? That is a problem that will need to be addressed. But I don't think it's a reason to oppose the draft, since opposing the draft (or at least opposing any revision that introduces a richer internal structure) leaves us in a situation that must be characterized either as a worse problem or as turning our backs on increased functionality to meet real user needs. What would be really nice is to specify a parameterized matching algorithm (or more precisely, an algorithm family) along the lines of the stringprep family of string normalization algorithms. 
But I'm unsure if there's sufficient time and interest available to do this. But it is nice to dream... Ned
Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
Dear John, thank you for acknowledging that the proposed draft _imposes_ something! It therefore does not report on an existing practice. Thank you for acknowledging that the proposed draft even _limits_ the current practice! Thank you for explaining that the decision of the user is replaced by an a-priori obligation... resulting from a decision of a member of this list. Technically, however, these remarks have no bearing on John Klensin's remark: a limitation is only a (negative) extension. I support _every_ position of John Klensin today (except that the IANA registrations should be transferred to IANA now, which would set up a global procedure, equal for all, to address the possible discrepancies between RFC 3066 and the draft). This would permit an acceptance by the IESG. Otherwise such an acceptance is inadvisable. jfc At 21:28 06/01/2005, John Cowan wrote: John C Klensin scripsit: Content-language: 3066-tag X-Extended-Content-language: new-tag This reflects a fundamental misunderstanding of what the draft does compared to what RFC 3066 does. It imposes *more* restraints on language tags, not fewer. The RFC 3066 language tag registration process can register tags with almost unpredictable meaning once one gets past the first subtag. The draft *limits* the possible tags to a small subset, and tightens up the allowable semantics. It allows no tag to be used that was not already registerable under RFC 3066. In RFC 3066, it is only a heuristic (or examination of the IANA registry, which is not machine-parseable) that tells the meaning of the second subtag in the existing registered tag sr-Latn. In the draft, its meaning is unambiguously specified a priori. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan Subtle is the Lord, but malicious He is not. --Albert Einstein
RE: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] RFC 3066 left us with bigger problems: it doesn't give us any way to identify pieces that we would be encountering in registered tags (apart from hard-coded tables compiled from versions of the registry that pre-exist a given implementation). With, as you point out below, one important exception: It did have a way to reliably identify a country code in most cases (but not all). If in most cases means from among tags in use today under the terms of RFC 3066 (as John Cowan would say, what is true), then yes. But if in most cases means from among tags permitted by RFC 3066 (as John Cowan might say, what is the rule) -- including some that users have been wanting to use but have delayed using pending a revision of RFC 3066 -- then no: RFC 3066 allowed for reliable identification of a country code in only a small portion of all possible cases: only if it occurred as the second subtag following an ISO 639 code (it does not prohibit a country code from occurring anywhere after the first subtag). And this ability to say 2 character subtag in the second position, must be a country code was quite useful even though it might miss other occurrences of country codes in some cases. The draft would still grant the ability to make that statement, and would permit new implementations never to miss *any* occurrences of country codes. 3066bis provides a reliable way to locate country codes in all cases, but the algorithm is different. And this is a non-backwards-compatible change. Surely this has been the point of greatest contention in this discussion, and is clearly not obvious, for there are several who have repeatedly indicated that they do not see any such backwards non-compatibility. Please, anyone claiming there would be incompatibility, be pedantic: define whatever terms, make explicit whatever assumptions are required to support this claim. (I suspect the root of this disagreement lies in unstated assumptions.)
Those who claim backward compatibility do so on the basis that every existing implementation conformant to RFC 3066 will continue to operate precisely as designed and in conformance with RFC 3066 regardless of whether they encounter a tag presently well-formed and valid under the terms of RFC 3066 or one that would be sanctioned by this draft. If there is any term needing clarification in that statement or any suspected assumption not made plain, please ask for clarification. Of course there's the option Dave Singer has raised: Reverse the positions of script and country codes in 3066bis. I see two problems with this: (1) Script codes are in general more important than country codes, and therefore really should come first so that simple truncation matches work better. (There are probably exceptions to this assertion lurking out there somewhere, but I believe it is mostly true.) Thank you for voicing support for this position. (2) I believe it increases the number of grandfathered codes that won't conform to the new format. If I'm not mistaken, I think there would be no difference in this regard. Peter Constable
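Ned's point (1), which Peter endorses, is that putting the script subtag before the country subtag makes simple truncation fall back more usefully. A small sketch, assuming Chinese content tagged only by script (`zh-Hans`, `zh-Hant`); `zh-HK-Hant` is a hypothetical region-first spelling of the kind Dave Singer's reversal would produce, used here only for contrast. The helper names are likewise hypothetical.

```python
from typing import List, Optional


def truncation_chain(tag: str) -> List[str]:
    """All prefixes of a tag, longest first: a-b-c, a-b, a."""
    subtags = tag.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def first_match(preferred: str, available: List[str]) -> Optional[str]:
    """First entry of the truncation chain that exists in the content."""
    lowered = {a.lower(): a for a in available}
    for candidate in truncation_chain(preferred):
        if candidate.lower() in lowered:
            return lowered[candidate.lower()]
    return None


available = ["zh-Hans", "zh-Hant"]
print(first_match("zh-Hant-HK", available))  # zh-Hant (script survives truncation)
print(first_match("zh-HK-Hant", available))  # None (script is dropped first)
```

With script first, dropping the region still leaves a script-qualified tag; with region first, the script is the first thing truncation throws away.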
Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
--On Thursday, 06 January, 2005 15:28 -0500 John Cowan [EMAIL PROTECTED] wrote: John C Klensin scripsit: Content-language: 3066-tag X-Extended-Content-language: new-tag This reflects a fundamental misunderstanding of what the draft does compared to what RFC 3066 does. It imposes *more* restraints on language tags, not fewer. It also very explicitly permits talking about scripts, not just languages and countries. That, to me, is an extension, regardless of the additional constraints. But I could have used a different word; this was just an example. The RFC 3066 language tag registration process can register tags with almost unpredictable meaning once one gets past the first subtag. The draft *limits* the possible tags to a small subset, and tightens up the allowable semantics. It allows no tag to be used that was not already registerable under RFC 3066. The extension that I see involves more semantics and formal variations, not more possible registered tags. And, as Ned has pointed out repeatedly, there are things that can be done in 3066 parsers/interpreters in practice that have to be done differently in this new system. I could, of course, have used X-Incompatible-Content-Language in my example, but that presumably would have set you off in some other direction. In RFC 3066, it is only a heuristic (or examination of the IANA registry, which is not machine-parseable) that tells the meaning of the second subtag in the existing registered tag sr-Latn. In the draft, its meaning is unambiguously specified a priori. So? john
RE: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
--On Thursday, 06 January, 2005 16:30 -0800 Peter Constable [EMAIL PROTECTED] wrote: From: [EMAIL PROTECTED] [mailto:ietf-languages- [EMAIL PROTECTED] On Behalf Of John C Klensin (3) Finally, there is apparently a procedural oddity with this document. The people who put it together apparently held extended discussions on the ietf-languages mailing list, a list that was established largely or completely to review registrations under 3066 and its predecessors. My understanding at this point is that their good-faith impression was that the discussions on that list were essentially equivalent to those of a WG. I believe I can say that it was done this way because it followed the example of the development of RFC 3066, which to my knowledge (as a member of the IETF-languages list at that time) happened in the same way. It was certainly done with a good-faith impression that appropriate procedures were being followed. Peter, just to clarify... In my opinion (which isn't necessarily worth much), the procedures that were followed were perfectly reasonable. Anyone can form a design team and put a document together, and there are no rules that bar such a design team from using and building on a mailing list set up for something else. That may or may not be wise, but it is certainly permitted. The only place this runs into a problem is if someone presumes that a document developed in the way this one was developed is equivalent to a WG product, or that it is entitled to the presumptions of relevancy and correctness that go with a WG product. From that point of view, it is nothing more or less than an individual submission (or the output of a self-defined design team) and the comments Dave and I have been making apply. john
RE: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
From: [EMAIL PROTECTED] [mailto:ietf-languages- [EMAIL PROTECTED] On Behalf Of John C Klensin This reflects a fundamental misunderstanding of what the draft does compared to what RFC 3066 does. It imposes *more* restraints on language tags, not fewer. It also very explicitly permits talking about scripts, not just languages and countries. That, to me, is an extension, regardless of the additional constraints. There may be a disagreement here due to a difference of perspective: one could say that the grammar is more extensive, but that makes the formal language less extensive. So, I suppose whether one considers such a revision an extension depends on one's perspective. Note that while the draft permits talking about scripts, RFC 3066 permits talking about *anything*. More extensive grammar, less extensive language (and vice versa). And, as Ned has pointed out repeatedly, there are things that can be done in 3066 parsers/interpreters in practice that have to be done differently in this new system. I think this claim can only be made on the basis of assumptions not found in RFC 3066. Ned has most recently said, 3066bis provides a reliable way to locate country codes in all cases, but the algorithm is different. And this is a non-backwards-compatible change. The fact that it can identify country codes in all cases but requires a different algorithm does not imply a non-backwards-compatible change since it is new functionality -- it is doing something that wasn't even possible in RFC 3066. Backwards compatibility cannot be measured in terms of whether new processors require different algorithms to achieve new functionality. It can only be measured in terms of whether new processors can perform correct operations (correct according to the specification for those processors -- the proposed draft) on existing tags, and whether existing processors can perform correct operations (correct according to the specification of those processors -- RFC 3066) on new tags.
This draft permits this. Peter Constable
RE: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
John: Peter, just to clarify... In my opinion (which isn't necessarily worth much) (I sincerely doubt that's the case.), the procedures that were followed were perfectly reasonable. Anyone can form a design team and put a document together, and there are no rules that bar such a design team from using and building on a mailing list set up for something else. That may or may not be wise, but it is certainly permitted. The only place this runs into a problem is if someone presumes that a document developed in the way this one was developed is equivalent to a WG product, or that it is entitled to the presumptions of relevancy and correctness that go with a WG product. I can't speak for the authors. I was not familiar with those distinctions when the process began, and I suspect that is true of others on the IETF-languages list who contributed. In my mind we were following a precedent that implied not only a permitted procedure but an entirely appropriate one. I think all of us now understand, at least in part, that some distinctions exist that may have practical implications on how something is received by the IETF community and processed by the IESG. From that point of view, it is nothing more or less than an individual submission (or the output of a self-defined design team) and the comments Dave and I have been making apply. I don't think I have questioned the applicability of your comments in this regard at any point. Peter Constable
Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
[EMAIL PROTECTED] scripsit: Finding country codes is straightforward: any non-initial subtag of two letters (not appearing to the right of x- or -x-) is a country code. This is true in RFC 1766, RFC 3066, and the current draft. On the contrary, in RFC 3066 the rule is any 2 letter value that appears as the second subtag is a country code. The rule in the new draft is either the formulation you give above or any 2 letter value that appears as a subtag after the initial subtag and some number of 3 and 4 letter subtags. I didn't state it as a rule, but as true. Every non-initial 2-letter tag in RFC 3066 is a country code; the same is true in the draft. (A private correspondent notes that the reference to -x- should in fact be a reference to any singleton, though -x- and i- are the only singletons currently usable.) Just because something doesn't necessarily do something doesn't mean it never does it. It does mean it can't be counted on in the general case. Well, maybe I'm missing something obvious, but I see nothing in RFC 3066 that qualifies as a description of a matching algorithm. Section 2.5 (2.4.1 in the draft) states the matching rule in a succinct fashion. Everything in 2.4.2 is a non-normative elaboration of this. -- John Cowan www.reutershealth.com www.ccil.org/~cowan [EMAIL PROTECTED] 'Tis the Linux rebellion / Let coders take their place, The Linux-nationale / Shall Microsoft outpace, We can write better programs / Our CPUs won't stall, So raise the penguin banner of / The Linux-nationale. --Greg Baker
Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
Finding country codes is straightforward: any non-initial subtag of two letters (not appearing to the right of x- or -x-) is a country code. This is true in RFC 1766, RFC 3066, and the current draft. On the contrary, in RFC 3066 the rule is any 2 letter value that appears as the second subtag is a country code. The rule in the new draft is either the formulation you give above or any 2 letter value that appears as a subtag after the initial subtag and some number of 3 and 4 letter subtags. I didn't state it as a rule, but as true. Every non-initial 2-letter tag in RFC 3066 is a country code; the same is true in the draft. Again, that is not what RFC 3066 says. From section 2.2: There are no rules apart from the syntactic ones for the third and subsequent subtags. Sure sounds to me like a third two letter subtag is (a) Allowed and (b) Isn't supposed to be treated as a country code. Now, it may be the case that all _registered_ tags have avoided the use of non-country code two letter codes in the third and later position. But this is 100% irrelevant. The point is that conformant code implementing RFC 3066 is broken if it simply assumes any 2 letter code after the first subtag is a country code. Rather, the rule is simply that a country code, if present, always appears as a two letter second subtag. The new draft changes this rule, so applications that pay attention to country codes in language tags have to change and the new algorithm for finding the country code is trickier. (A private correspondent notes that the reference to -x- should in fact be a reference to any singleton, though -x- and i- are the only singletons currently usable.) I have to say I find it quite interesting that one of the main proponents of the new draft, while arguing that the new draft doesn't make the matching problem a lot harder, ended up giving an erroneous rule for extracting country codes from a language tag.
Just because something doesn't necessarily do something doesn't mean it never does it. It does mean it can't be counted on in the general case. Sure, in the general case most if not all of these nasty corner cases you've created can be blithely assumed away because they only appear in specific problem domains. Actual applications that operate in those specific domains aren't so lucky, however. And the metric we're supposed to apply in the IETF is real-world implementability. As it happens I deal with messaging applications, and in this space text/plain with all sorts of nasty charset issues is the rule, not the exception. Well, maybe I'm missing something obvious, but I see nothing in RFC 3066 that qualifies as a description of a matching algorithm. Section 2.5 (2.4.1 in the draft) states the matching rule in a succinct fashion. Everything in 2.4.2 is a non-normative elaboration of this. ??? Which in no way refutes my assertion that no matching rule algorithm was given in RFC 3066! Ned
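The two country-code readings argued over in this exchange can be put side by side. `rfc3066_country` applies the RFC 3066 text as Mark quotes it (a 2-letter second subtag is an ISO 3166 code); `draft_country` applies Cowan's rule with his correspondent's correction (the first non-initial 2-letter subtag, stopping at any singleton such as `x` or `i`). Both names are hypothetical, and this sketch ignores the 3-digit UN region numbers Ned mentions.

```python
from typing import Optional


def _is_two_letters(subtag: str) -> bool:
    return len(subtag) == 2 and subtag.isascii() and subtag.isalpha()


def rfc3066_country(tag: str) -> Optional[str]:
    """RFC 3066 reading: only a 2-letter *second* subtag is a country code."""
    subtags = tag.split("-")
    if len(subtags) > 1 and _is_two_letters(subtags[1]):
        return subtags[1].upper()
    return None


def draft_country(tag: str) -> Optional[str]:
    """Draft reading: first non-initial 2-letter subtag before any singleton."""
    for subtag in tag.split("-")[1:]:
        if len(subtag) == 1:
            break  # singleton (e.g. x): what follows is not a country code
        if _is_two_letters(subtag):
            return subtag.upper()
    return None


for tag in ["de-CH", "en-Latn-US", "sr-Latn", "en-x-us"]:
    print(tag, rfc3066_country(tag), draft_country(tag))
# de-CH CH CH
# en-Latn-US None US
# sr-Latn None None
# en-x-us None None
```

The `en-Latn-US` row is the disagreement in miniature: a processor following the RFC 3066 text literally finds no country code, while one written to the draft finds `US`.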
Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
[EMAIL PROTECTED] scripsit: Now, it may be the case that all _registered_ tags have avoided the use of non-country code two letter codes in the third and later position. But this is 100% irrelevant. If you say so. The point is that conformant code implementing RFC 3066 is broken if it simply assumes any 2 letter code after the first subtag is a country code. Rather, the rule is simply that a country code, if present, always appears as a two letter second subtag. Not quite. The rule is that a 2-letter second subtag is a country code. Country codes could have appeared elsewhere, and may still wind up doing so before RFC 3066 is obsoleted. The new draft changes this rule, so applications that pay attention to country codes in language tags have to change and the new algorithm for finding the country code is trickier. But not much. As an advantage, country codes can always be found in the new draft, whereas in RFC 3066 they could in principle be anywhere. (A private correspondent notes that the reference to -x- should in fact be a reference to any singleton, though -x- and i- are the only singletons currently usable.) I have to say I find it quite interesting that one of the main proponents of the new draft, while arguing that the new draft doesn't make the matching problem a lot harder, ended up giving an erroneous rule for extracting country codes from a language tag. Like other people, I sometimes post when tired; I don't find this particularly interesting. Sure, in the general case most if not all of these nasty corner cases you've created can be blithely assumed away because they only appear in specific problem domains. Actual applications that operate in those specific domains aren't so lucky, however. And the metric we're supposed to apply in the IETF is real-world implementability. I fail to see what this has to do with the merit of marking scripts in language tags. The preferred IETF charset, UTF-8, contains no information about script whatever.
As it happens I deal with messaging applications, and in this space text/plain with all sorts of nasty charset issues is the rule, not the exception. Extended language tags will neither help nor harm you, then. -- We are lost, lost. No name, no business, no Precious, nothing. Only empty. Only hungry: yes, we are hungry. A few little fishes, nassty bony little fishes, for a poor creature, and they say death. So wise they are; so just, so very just. --Gollum [EMAIL PROTECTED] www.ccil.org/~cowan
Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions
Rather, the rule is simply that a country code, if present, always appears as a two letter second subtag. The new draft changes this rule, so applications that pay attention to country codes in language tags have to change and the new algorithm for finding the country code is trickier. Your text above says (a) if there is a country code in the tag, it is the second subtag. That is not what the text of RFC 3066 actually says, which is: The following rules apply to the second subtag: All 2-letter subtags are interpreted as ISO 3166 alpha-2 country... That is, it says (b) if a second subtag has 2 letters, then it is an ISO 3166 code, which is not the same as (a). (It is almost, but not quite, the converse.) The current RFC certainly does not forbid the use of country codes in other positions in language tags. One could absolutely register en-Latin-US, for example, meaning English as spoken in the US written in Latin script. There has been a lot of noise on this issue, and too few concrete examples. In the so-called 3066bis draft, we have striven very hard to ensure that: (c) Every single tag that could be generated under RFC 3066bis is a tag that could have been registered under RFC 3066. Thus if someone wrote a parser that is future-compatible -- that could parse all RFC 3066 language tags including those registered after the parser was deployed -- then that parser can handle all 3066bis language tags. This is a significant advance over RFC 3066, whose registered (not generated) language tags are atomic, and cannot be effectively parsed at all. 3066bis adds more structure so as to allow effective parsing of tags. If you *can* come up with tags that would show that (c) is invalid, that would be a concrete case that we would have to make adjustments in the draft for. A second issue that has come up is complexity. Admittedly, 3066bis is more complex than RFC 3066.
Part of that is due to adding additional structure, and part due to necessary clarifications (such as the distinction between well-formed and valid). But we did not add the additional structure at a whim. RFC 3066, while a significant advance, is simply not now powerful enough to meet the current needs for distinctions in language needed by the industry. The companies and organizations in the Unicode consortium, for example, are supporting 3066bis for improved software internationalization. For more information on the reasons behind the enhancements in 3066bis see http://www.inter-locale.com/ID/why-rfc3066bis.html. Moreover, all the talk about this being *too* complex is far overblown. All 3066bis language tags can be parsed, including all the grandfathered codes, with a very short piece of code, or even with a regular expression (such as in Perl). This is not rocket science. Mark - Original Message - From: [EMAIL PROTECTED] To: John Cowan [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; ietf@ietf.org Sent: Wednesday, January 05, 2005 07:33 Subject: Re: draft-phillips-langtags-08, process, sp ecifications,stability, and extensions Finding country codes is straightforward: any non-initial subtag of two letters (not appearing to the right of x- or -x-) is a country code. This is true in RFC 1766, RFC 3066, and the current draft. On the contrary, in RFC 3066 the rule is any 2 letter value that appears as the second subtag is a country code. The rule in the new draft is either the formulation you give above or any 2 letter value that appears as a subtag after the initial subtag and some number of 3 and 4 letter subtags. I didn't state it as a rule, but as true. Every non-initial 2-letter tag in RFC 3066 is a country code; the same is true in the draft. Again, that is not what RFC 3066 says. From section 2.2: There are no rules apart from the syntactic ones for the third and subsequent subtags. 
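Mark's claim that well-formed tags can be parsed with a regular expression is easy to try. The pattern below is an assumption: it follows the language/script/region/variant/extension/private-use ordering that was later standardized in RFC 4646, not necessarily draft-08 verbatim, and it omits extended-language subtags. Grandfathered tags such as `i-klingon` do not match and would need a separate lookup table. `parse_tag` and `LANGTAG` are hypothetical names.

```python
import re
from typing import Dict, List, Optional, Union

# Simplified shape: language [-script] [-region] *(-variant) *(-extension) [-x-private]
LANGTAG = re.compile(
    r"""(?P<language>[a-z]{2,3})
        (?:-(?P<script>[a-z]{4}))?
        (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
        (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
        (?P<extensions>(?:-[a-wyz0-9](?:-[a-z0-9]{2,8})+)*)
        (?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?""",
    re.IGNORECASE | re.VERBOSE | re.ASCII,  # ASCII keeps [a-z] from matching e.g. U+212A
)


def parse_tag(tag: str) -> Optional[Dict[str, Union[str, List[str], None]]]:
    """Split a tag into named pieces; None if it does not fit the pattern."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        return None  # ill-formed here, or grandfathered (needs a table lookup)
    return {
        "language": m.group("language"),
        "script": m.group("script"),
        "region": m.group("region"),
        "variants": [v for v in m.group("variants").split("-") if v],
        "extensions": m.group("extensions").lstrip("-") or None,
        "privateuse": m.group("privateuse"),
    }


print(parse_tag("sr-Latn-CS"))
print(parse_tag("en-US-Latn"))  # None: a script subtag may not follow the region
```

Because each position is fixed, `en-US-Latn` is rejected; this is one concrete sense in which the draft "imposes *more* restraints," as John Cowan puts it.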
Sure sounds to me like a third two letter subtag is (a) Allowed and (b) Isn't supposed to be treated as a country code. Now, it may be the case that all _registered_ tags have avoided the use of non-country code two letter codes in the third and later position. But this is 100% irrelevant. The point is that conformant code implementing RFC 3066 is broken if it simply assumes any 2 letter code after the first subtag is a country code. Rather, the rule is simply that a country code, if present, always appears as a two letter second subtag. The new draft changes this rule, so applications that pay attention to country codes in language tags have to change and the new algorithm for finding the country code is trickier. (A private correspondent notes that the reference to -x- should in fact be a reference to any singleton, though -x- and i- are the only singletons currently usable.) I have to say I find it quite interesting that one of the main proponents of the new draft, while arguing that the new draft doesn't make the matching problem a lot harder, ended up giving
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
This whole question of what 'matches' is subtle. Consider the case when I have a document that has variant content by language (e.g. different soundtracks), and the user indicates a set of preferred languages. If the content has de-CH and fr-CH (Swiss German and French), and a default en (English), and the user says he speaks de-DE and fr-FR, on the face of it nothing matches, and I fall back to the catch-all default, which is almost certainly not the best result. David, this isn't the half of it. The case you describe is actually one of the easy ones, in that it can be handled by doing a preferred match on the entire tag, with a generic match on the primary tag only having lesser precedence but higher precedence than a fallback to a default. I know of two other wrinkles in the RFC 1766 world: (1) Matching may want to take into account the distinguished nature of country subtags in some way. (2) SGN- requires special handling, in that SGN-FR and SGN-EN are in fact sufficiently different languages that a primary tag match should not be taken to be a generic match. (Of course this only matters if sign languages are relevant to your situation - in many cases they aren't. In retrospect I think it was a mistake to register sign languages this way.) This proposed revision, however, opens Pandora's box in regards to matching. Consider: (a) Extension tags appear as the first subtags, and as such have to be taken into account when looking for country subtags. (b) Script tags change the complexion of the matching problem significantly, in that they can interact with external factors like charset information in odd ways. (c) UN country numbers have been added (IMO for no good reason), requiring handling similar to country codes. The bottom line is that while I know how to write reasonable code to do RFC 1766 matching (and have in fact done so for widely deployed software), I haven't a clue how to handle this new draft competently in regards to matching.
And the immediate consequence of this is that I, and I suspect many other implementors, are going to adopt a wait-and-see attitude in regards to implementing any of this. Ned
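Ned's three-tier approach (a match on the entire tag first, then a generic match on the primary subtag only, then the default) can be sketched in a few lines of Python. This is a minimal illustration, not a recommended algorithm: `choose_variant` and `NO_GENERIC_MATCH` are hypothetical names, and the exception set holds sgn (per Ned's sign-language point) plus zh (raised about the registered zh-* tags elsewhere in the thread) as an assumption about which primary subtags should not fall back.

```python
from typing import Sequence

# Primary subtags for which a primary-only match should NOT count as a
# fallback, because e.g. sgn-FR and sgn-US are different languages.
# Illustrative assumption, not a registry-derived list.
NO_GENERIC_MATCH = {"sgn", "zh"}


def primary_subtag(tag: str) -> str:
    return tag.split("-", 1)[0].lower()


def choose_variant(available: Sequence[str], preferred: Sequence[str], default: str) -> str:
    by_lower = {t.lower(): t for t in available}
    # Tier 1: exact match on the entire tag, in the user's preference order.
    for want in preferred:
        if want.lower() in by_lower:
            return by_lower[want.lower()]
    # Tier 2: generic match on the primary subtag only.
    for want in preferred:
        p = primary_subtag(want)
        if p in NO_GENERIC_MATCH:
            continue
        for tag in available:
            if primary_subtag(tag) == p:
                return tag
    # Tier 3: the catch-all default.
    return default


# David Singer's example: content in de-CH, fr-CH and en; user prefers de-DE, fr-FR.
print(choose_variant(["de-CH", "fr-CH", "en"], ["de-DE", "fr-FR"], "en"))  # de-CH
print(choose_variant(["sgn-FR", "en"], ["sgn-US"], "en"))                  # en
```

The first call lands on de-CH instead of the English default; the second shows the sgn exception declining a cross-country sign-language match.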
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
Small typo: In my previous response I referred to RFC 1766 when I meant RFC 3066. Too many documents open at once, sorry. Ned
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
At 9:14 AM -0800 1/4/05, [EMAIL PROTECTED] wrote: This whole question of what 'matches' is subtle. Consider the case when I have a document that has variant content by language (e.g. different soundtracks), and the user indicates a set of preferred languages. If the content has de-CH and fr-CH (Swiss German and French), and a default en (English), and the user says he speaks de-DE and fr-FR, on the face of it nothing matches, and I fall back to the catch-all default, which is almost certainly not the best result. David, this isn't the half of it. The case you describe is actually one of the easy ones, in that it can be handled by doing a preferred match on the entire tag, with a generic match on the primary tag only having lesser precedence but higher precedence than a fallback to a default. Yes, I picked off an easy example for which the 'matching' section of the draft didn't seem adequate. This really is a tar-pit, of course. Serbo-Croatian used to be a language; now it's Serbian and Croatian. I assume that they are mutually intelligible. Serbian is probably a better substitute for Croatian than some general default (or silence), though saying this in some parts of the world might start wars. The whole question of what is a language, a variant or dialect of a language, or a suitable substitute for a language, would benefit from some thought in any tagging scheme, though I agree the problem is not generally soluble. I know of two other wrinkles in the RFC 1766 world: (1) Matching may want to take into account the distinguished nature of country subtags in some way. (2) SGN- requires special handling, in that SGN-FR and SGN-EN are in fact sufficiently different languages that a primary tag match should not be taken to be a generic match. (Of course this only matters if sign languages are relevant to your situation - in many cases they aren't. In retrospect I think it was a mistake to register sign languages this way.)
This proposed revision, however, opens Pandora's box in regards to matching. Consider: (a) Extension tags appear as the first subtags, and as such have to be taken into account when looking for country subtags. (b) Script tags change the complexion of the matching problem significantly, in that they can interact with external factors like charset information in odd ways. (c) UN country numbers have been added (IMO for no good reason), requiring handling similar to country codes. The bottom line is that while I know how to write reasonable code to do RFC 1766 matching (and have in fact done so for widely deployed software), I haven't a clue how to handle this new draft competently in regards to matching. And the immediate consequence of this is that I, and I suspect many other implementors, are going to adopt a wait-and-see attitude in regards to implementing any of this. Ned -- David Singer Apple Computer/QuickTime
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions
[EMAIL PROTECTED] scripsit: I know of two other wrinkles in the RFC 1766 world: Are you aware that RFC 1766 has been obsolete for four years now? (2) SGN- requires special handling, in that SGN-FR and SGN-EN are in fact sufficiently different languages that a primary tag match should not be taken to be a generic match. The same is true of the various registered zh-* tags. (a) Extension tags appear as the first subtags, and as such have to be taken into account when looking for country subtags. Finding country codes is straightforward: any non-initial subtag of two letters (not appearing to the right of x- or -x-) is a country code. This is true in RFC 1766, RFC 3066, and the current draft. (b) Script tags change the complexion of the matching problem significantly, in that they can interact with external factors like charset information in odd ways. Can you clarify this? Charset information neither specifies nor necessarily restricts (except in text/plain) the script used to write a document. (c) UN country numbers have been added (IMO for no good reason), requiring handling similar to country codes. They provide for supranational language varieties and for stability in country codes which is inappropriate for ISO 3166 alphabetic codes (which are codes for country *names*). The bottom line is that while I know how to write reasonable code to do RFC 1766 matching (and have in fact done so for widely deployed software), I haven't a clue how to handle this new draft competently in regards to matching. The draft describes only the RFC 1766 (3066) algorithm, without excluding other algorithms to be defined later. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan http://www.reutershealth.com "Clear? Huh! Why a four-year-old child could understand this report. Run out and find me a four-year-old child. I can't make head or tail out of it." --Rufus T. Firefly on government reports
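John Cowan's country-code rule can be written as a short Python sketch: scan the non-initial subtags, stop at the first singleton (generalizing x- to any singleton, as a private correspondent suggests elsewhere in the thread), and treat a two-letter subtag as the country code. Handling 3-digit UN numeric codes the same way is an added assumption, following Ned's remark that they require handling similar to country codes. `find_region` and its `include_numeric` flag are hypothetical names.

```python
from typing import Optional


def find_region(tag: str, include_numeric: bool = True) -> Optional[str]:
    """Return the country/region subtag of a tag, or None if there is none."""
    subtags = tag.split("-")
    if len(subtags[0]) == 1:             # i-..., x-...: no region by this rule
        return None
    for sub in subtags[1:]:              # the initial subtag is never a region
        if len(sub) == 1:                # any singleton ends the search
            return None
        if len(sub) == 2 and sub.isascii() and sub.isalpha():
            return sub.upper()
        if include_numeric and len(sub) == 3 and sub.isascii() and sub.isdigit():
            return sub                   # UN numeric code, e.g. 419
    return None
```

With this rule, `find_region("zh-Hant-TW")` returns `"TW"` and `find_region("en-x-us")` returns `None`, because the two-letter subtag sits to the right of a singleton.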
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions
Dave Singer scripsit: Yes, I picked off an easy example for which the 'matching' section of the draft didn't seem adequate. This really is a tar-pit, of course. Indeed it is, which is why the draft provides only one simple algorithm (described as the most common implementation, which it is) and explicitly allows for cleverer techniques for those who want them. I assume that they are mutually intelligible. Among speakers of good will, yes. The whole question of what is a language, a variant or dialect of a language, or a suitable substitute for a language, would benefit from some thought in any tagging scheme, though I agree the problem is not generally soluble. See the editor's draft of ISO 639-3 at http://tinyurl.com/6kky2 . This is a PDF file about 4 MB in size, so I excerpt the relevant text here (clause 4.2.1, pp. 3-4): # There is no one definition of language that is agreed upon by all and # appropriate for all purposes. As a result, there can be disagreement, # even among speakers or linguistic experts, as to whether two varieties # represent dialects of a single language or two distinct languages. For # this part of ISO 639, judgments regarding when two varieties are # considered to be the same or different languages are based on a number # of factors, including linguistic similarity, intelligibility, a common # literature, the views of speakers concerning the relationship between # language and identity, and other factors. The following basic criteria # are followed: # # Two related varieties are normally considered varieties of the same # language if speakers of each variety have inherent understanding # of the other variety (that is, can understand based on knowledge of # their own variety without needing to learn the other variety) at a # functional level.
# # Where spoken intelligibility between varieties is marginal, the # existence of a common literature or of a common ethnolinguistic # identity with a central variety that both understand can be strong # indicators that they should nevertheless be considered varieties of # the same language. # # Where there is enough intelligibility between varieties to # enable communication, the existence of well-established distinct # ethnolinguistic identities can be a strong indicator that they should # nevertheless be considered to be different languages. # # Some of the distinctions made on this basis may not be considered # appropriate by some users or for certain applications. These basic # criteria are thought to best fit the intended range of applications, # however. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan http://www.reutershealth.com First known example of political correctness: After Nurhachi had united all the other Jurchen tribes under the leadership of the Manchus, his successor Abahai (1592-1643) issued an order that the name Jurchen should be banned, and from then on, they were all to be called Manchus. --S. Robert Ramsey, The Languages of China
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
From: Dave Singer [mailto:[EMAIL PROTECTED] The whole question of what is a language, a variant or dialect of a language, or a suitable substitute for a language, would benefit from some thought in any tagging scheme, though I agree the problem is not generally soluble. These are questions that have been given some thought. No time to delve into it at the moment, however. Peter Constable
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
From: [EMAIL PROTECTED] [mailto:ietf-languages- [EMAIL PROTECTED] On Behalf Of John Cowan The whole question of what is a language, a variant or dialect of a language, or a suitable substitute for a language, would benefit from some thought in any tagging scheme, though I agree the problem is not generally soluble. See the editor's draft of ISO 639-3 at http://tinyurl.com/6kky2 ... I would say that all of clause 4.2 is relevant; in addition to 4.2.1, I would especially include 4.2.2, in relation to which I have presented ideas that led to the inclusion of the Extensions subtag in the proposed draft. (I originally thought of it as a way to capture some existing registered tags as part of a consistent scheme rather than merely as ad-hoc tags, but I think it may be more generally useful as well for dealing with some of the issues regarding different perceptions of what is a language.) I'm afraid I don't have time at the moment to elaborate further. Peter Constable
Re: draft-phillips-langtags-08, process, specifications, stability, and extensions
[EMAIL PROTECTED] scripsit: I know of two other wrinkles in the RFC 1766 world: Are you aware that RFC 1766 has been obsolete for four years now? Of course I am. (2) SGN- requires special handling, in that SGN-FR and SGN-EN are in fact sufficiently different languages that a primary tag match should not be taken to be a generic match. The same is true of the various registered zh-* tags. Yes, forgot to mention that one. It is actually different and more important in that the use-cases aren't the same as those for sign languages. (a) Extension tags appear as the first subtags, and as such have to be taken into account when looking for country subtags. Finding country codes is straightforward: any non-initial subtag of two letters (not appearing to the right of x- or -x-) is a country code. This is true in RFC 1766, RFC 3066, and the current draft. On the contrary, in RFC 3066 the rule is any 2 letter value that appears as the second subtag is a country code. The rule in the new draft is either the formulation you give above or any 2 letter value that appears as a subtag after the initial subtag and some number of 3 and 4 letter subtags. These aren't the same. (b) Script tags change the complexion of the matching problem significantly, in that they can interact with external factors like charset information in odd ways. Can you clarify this? Charset information neither specifies nor necessarily restricts (except in text/plain) the script used to write a document. And what if you're dealing with text/plain, as many applications do? Just because something doesn't necessarily do something doesn't mean it never does it. (c) UN country numbers have been added (IMO for no good reason), requiring handling similar to country codes. They provide for supranational language varieties and for stability in country codes which is inappropriate for ISO 3166 alphabetic codes (which are codes for country *names*).
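The difference Ned draws between the two country-code rules can be made concrete. Both functions below are sketches of his paraphrases, not normative text from RFC 3066 or the draft; the function names are hypothetical, and skipping tags that begin with a singleton (i-, x-) is an added assumption.

```python
from typing import List, Optional


def _subtags(tag: str) -> Optional[List[str]]:
    subtags = tag.split("-")
    # Assumption: tags that begin with a singleton (i-..., x-...) have no region.
    return None if len(subtags[0]) == 1 else subtags


def region_rfc3066(tag: str) -> Optional[str]:
    """RFC 3066 reading: only a 2-letter *second* subtag is a country code."""
    subtags = _subtags(tag)
    if subtags and len(subtags) >= 2 and len(subtags[1]) == 2 and subtags[1].isalpha():
        return subtags[1].upper()
    return None


def region_draft(tag: str) -> Optional[str]:
    """Draft reading: skip any 3- and 4-letter subtags after the initial one;
    a 2-letter subtag reached that way is the country code."""
    subtags = _subtags(tag)
    if not subtags:
        return None
    for sub in subtags[1:]:
        if len(sub) in (3, 4) and sub.isalpha():
            continue                      # extension or script subtag
        if len(sub) == 2 and sub.isalpha():
            return sub.upper()
        break                             # anything else ends the search
    return None


for tag in ["en-US", "zh-Hant-TW", "sr-Latn-CS"]:
    print(tag, region_rfc3066(tag), region_draft(tag))
# en-US US US
# zh-Hant-TW None TW
# sr-Latn-CS None CS
```

The two readings agree on simple tags like en-US and diverge as soon as a script subtag precedes the country code, which is the case Ned says forces existing code to change.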
I'm aware of what they provide (although I see no explanation of this in the draft). I'm just not convinced that their addition is warranted. The bottom line is that while I know how to write reasonable code to do RFC 1766 matching (and have in fact done so for widely deployed software), I haven't a clue how to handle this new draft competently in regards to matching. The draft describes only the RFC 1766 (3066) algorithm, without excluding other algorithms to be defined later. Well, maybe I'm missing something obvious, but I see nothing in RFC 3066 that qualifies as a description of a matching algorithm. The new draft does include such a description in section 2.4.2 - an improvement - but leaves any number of details open. And we all know where the devil lives. Side note: I don't think item 4 really belongs in the list in section 2.4.2. It is a warning to implementors, not part of the matching mechanism. Ned
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
The *meaning* of any given language tag would be no more or less a problem under the proposed revision than it was for RFC 3066 or RFC 1766. For instance, there is a concurrent thread that has been discussing when country distinctions are appropriate or recommended (ca or ca-ES?); this discussion pertains to RFC 3066, and part of the issue is that meanings of tags are implied rather than specified -- and always have been even under RFC 1766 (I pointed this out five years ago when we were working on preparing RFC 3066). So, for instance, when an author uses de-CH, what does he intend recipients to understand to be the difference between that and de-DE or even de? Neither RFC 1766 nor RFC 3066 sheds any light on this, and ultimately only the author knows for sure. Under RFC 3066, it was the *exceptional* case that a complete tag was registered, allowing some indication of the meaning of the whole (though even in that regard nothing really required that the documentation provide clear indication of the meaning). The 98% cases were those like de-CH in which it was assumed that everyone would understand what the intended meaning is. This whole question of what 'matches' is subtle. Consider the case when I have a document that has variant content by language (e.g. different soundtracks), and the user indicates a set of preferred languages. If the content has de-CH and fr-CH (Swiss German and French), and a default en (English), and the user says he speaks de-DE and fr-FR, on the face of it nothing matches, and I fall back to the catch-all default, which is almost certainly not the best result. -- David Singer Apple Computer/QuickTime
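David's observation that "on the face of it nothing matches" follows from the prefix rule that HTTP/1.1 (RFC 2616, section 14.4) applies to Accept-Language language-ranges: a range matches a tag if it equals the tag, or equals a prefix of the tag such that the next character is '-'. A minimal Python sketch of that rule, with hypothetical function names and a simplified treatment of the "*" range:

```python
from typing import Iterable, List


def range_matches(lang_range: str, tag: str) -> bool:
    """Prefix rule: equal, or a prefix followed by '-'. Case-insensitive."""
    r, t = lang_range.lower(), tag.lower()
    if r == "*":
        # Simplified: in HTTP, "*" matches only languages not matched
        # by any other range in the same header.
        return True
    return t == r or t.startswith(r + "-")


def filter_tags(lang_range: str, tags: Iterable[str]) -> List[str]:
    return [t for t in tags if range_matches(lang_range, t)]


print(range_matches("de", "de-CH"))                    # True
print(range_matches("de-DE", "de-CH"))                 # False: siblings never match
print(range_matches("en", "eng"))                      # False: '-' boundary required
print(filter_tags("sgn", ["sgn-FR", "sgn-US", "en"]))  # ['sgn-FR', 'sgn-US']
```

Under this rule a user asking for de-DE never reaches de-CH, while a bare sgn range matches every sign language, which are the two opposite failure modes raised in this thread.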
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED] Dear Peter, please let us focus on the discussion of draft to be approved by the IESG and on its role. Eh???!! I can't imagine what on earth you think I was talking about if not that. This document intends to replace RFC 3066 but does not want to take into account RFC published since the RFC 3006, the current IANA procedures, the work chartered in some WG, the internet architectural principle (RFC 1958). There is no problem in having it been accepted for information or experimental. There are serious objections to get it approved otherwise. (RFC 1958 was published since RFC 3066??!!) Look, the IESG chair is the list administrator for the IETF-language list, and a participant in its deliberations. If there has been a serious lacuna in process for moving this draft toward BCP, I think he would have mentioned it on the IETF-languages list a *long* time ago. It was the IESG that issued the Last Call announcement, not me, not the authors of the draft, not anybody else on the IETF-languages list. It appears that IESG *is* moving it through a process toward BCP, and one can only assume they feel their process is adequate and appropriate. The *meaning* of any given language tag would be no more or less a problem under the proposed revision than it was for RFC 3066 or RFC 1766. (a) RFC 3066 was published without considering different usages of the proposed language tag format. Eh???!! That is simply not true. RFC 3066 was developed with full awareness and consideration of all the usage scenarios for RFC 1766, which it replaced. RFC 3066 discusses various IETF and W3C protocols that use language tags. (Have you actually read RFC 3066?) (b) nor which authority would document their meanings (plural) Eh???!! Section 2.2 clearly identifies the authorities that document the meanings of subtags.
I have pointed out that there are aspects of meaning that it does not address (which, btw, are not easily resolvable), but that does not imply that RFC 3066 was published without consideration of what authorities document meanings. I think we can all agree that there's no much less likelihood of someone I suggest that we not dwell on pathological cases that we aren't really likely to encounter. This kind of thinking is not appropriate when standardizing a format. Julius Caesar would have thought it a pathological case to propose that Romans should speak Londinium's language. If Romans had started speaking a variant of Londinium's language, the proposed draft could easily accommodate that situation. That is not pathological. A tag like sr-Latn-CS-gaulish-boont-guoyu-i-enochian is pathological. It most certainly *is* appropriate to identify what kinds of examples are or are not valid, as we need to design for *valid* usage scenarios. For any given character set encoding standard, the fact that nonsense character sequences can be devised is not a determining factor in development of that encoding; the same is true here. At this point, I feel confident that it is not a problem to combine script IDs into language tags, and this is the consensus of the domain experts that have been discussing this proposed revision for the past year and more. This may mean that current reluctances to incorporate originating source authority, destination, format conformance, internationalization, icons support (and may be additional needs) could be a further consensus. I suggest that we save time this time. ??? You want to incorporate these things into the draft, or into language tags themselves? The latter is either not necessary or not appropriate (language tags should *not* include anything to indicate destination). As for inclusion in the draft, the proposed draft is quite clear about source authorities for subtags and about conformance; destination is out of scope and irrelevant.
Internationalization? These are symbolic identifiers; they are intrinsically not localizable. Icon support??? I haven't a clue what you're talking about! Not a problem: the proposed revision *allows* for the use of script IDs but does not require them. In the case of audio content, one simply would never include a script ID. Accents and types of voice have been documented as necessary items. They could use the script and police fields ? ??? If someone needs to tag content to distinguish a particular dialect, the proposed draft can easily accommodate that. If one wants to tag content for minor linguistic details (this utterance was spoken by someone who has a cleft palate, who was intoxicated at the time, and uses tag-question intonation), it is a *non-goal* of the proposed draft to accommodate that level of detail as it is not appropriate to try to capture that level of ad hoc detail in a general-purpose metadata element. The bigger problem you're pointing out is the limitations of using suffix-truncation alone as a
RE: draft-phillips-langtags-08, process, specifications, stability, and extensions
From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED] Of course it would not be clear if you don't have a conceptual model of what language tags are identifiers *of*. When RFC 3066 was being developed, there was a suggestion that script IDs be incorporated, but some were reluctant, raising the same question you have here. I was one of those. But I didn't remain obstructionist over the issue; instead, I gave a fair amount of thought to the ontology that underlies language tags, and subsequently published a white paper and presented on the topic at two conferences in the spring and fall of 2002. (Paper is available online at http://www.sil.org/silewp/abstract.asp?ref=2002-003 -- my thinking has evolved since then, but some key results remain valid, I think.) May we know which ones? It would be easier to identify two key points on which my thinking has changed. IIRC, I was uncertain at the time about what to do wrt sorting. I have since concluded that sort order is a presentation issue that, while linguistically related, is out of scope for language identifiers. (Note that there is no common usage scenario in which it makes sense to declare the sorted order of content.) Sort order may certainly be in scope for a locale identifier, but not for a language tag. The bigger change is that I have abandoned the fourth main category in the ontological model I proposed. At the time, I was still trying to work out where something like Latin America Spanish fit in. I saw the similarity to sub-language varieties / dialects, but at the time thought it needed to be a distinct category, for which reason I concocted the notion "domain-specific data set". I was never very satisfied with that: it wasn't a particularly consistent model (a data set is quite a different kind of thing from a language variety) and it ignored the similarity with sub-language variety. (And the name was a bit unwieldy.)
I have since realized that I was tripping up on the very problem that was blocking the Language Tag Reviewer from accepting the requested registration for es-americas: the assumption that a language tag necessarily refers to a conventionally-recognized linguistic identity that exists in the world. Language tags are not attributes declared on language varieties; they are attributes declared on information objects, indicating linguistic properties of those information objects. And the linguistic attributes of an information object do not necessarily coincide with conventionally-recognized linguistic identities. Of course, in the majority of useful cases they will; but it's not hard to show that this is not always the case: e.g. if I present "chat" as an expression that could be interpreted in relation to several different languages, it would be entirely appropriate for me to declare a linguistic attribute of that expression of "indeterminate" since that is precisely my intent -- but clearly "indeterminate" doesn't correspond with any particular language identity out in the world. Thus, I came to realize that the kind of distinction intended by es-americas was just the same kind of distinction made for any sub-language variety: it declares that the information object is not only in some particular language, but is even more constrained in terms of the language variety in use. It is simply coincidental that the more constrained usage in this case doesn't coincide with a single dialect used by some identifiable speaker community. Peter Constable