RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-10 Thread Misha Wolf
Dave Crocker wrote: 

 And, indeed, I haven't seen much support for the document under
discussion.

I find statements such as this mind-boggling.  Please explain what you 
mean by "much support".  There have been at least as many individuals 
writing mails in favour of the document as against it.  Furthermore, 
it has been made clear that the individuals writing the document and 
supporting it represent *very* large communities.

-- 
Misha Wolf
Standards Manager
Chief Architecture Office
Reuters






___
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf


Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-10 Thread Valdis . Kletnieks
On Mon, 10 Jan 2005 11:33:54 GMT, Misha Wolf said:

 I find statements such as this mind-boggling.  Please explain what you 
 mean by much support.  There have been at least as many individuals 
 writing mails in favour of the document as against it.  Furthermore, 
 it has been made clear that the individuals writing the document and 
 supporting it represent *very* large communities.

Support is there.  Consensus, however, is quite lacking on this one.




Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-10 Thread Deborah Goldsmith
Let me take this opportunity to say that Apple, too, strongly supports 
3066bis.

Deborah Goldsmith
Internationalization, Unicode liaison
Apple Computer, Inc.
[EMAIL PROTECTED]
On Jan 10, 2005, at 3:33 AM, Misha Wolf wrote:
I find statements such as this mind-boggling.  Please explain what you
mean by much support.  There have been at least as many individuals
writing mails in favour of the document as against it.  Furthermore,
it has been made clear that the individuals writing the document and
supporting it represent *very* large communities.



Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-07 Thread John Cowan
JFC (Jefsey) Morfin scripsit:
 Dear John,
 thank you to acknowledge that the proposed draft _impose_ something ! 
 It therefore do not report on an existing practice.
 thank you to acknowledge that the proposed draft even _limits_ the 
 current practice !
 thank you to explain that the decision of the user is replaced by an 
 a-priori obligation .. resulting from a decision of a member of this list.

The practice that is being limited is that of the language tag review
process (the list, the Reviewer, IANA), not of any user.  Users are
free to use language tags or not, of course.

 Technically, these remarks are however without incidence on John Klenin's 
 remark: a limitation is only a (negative) extension. 

And tyranny is only negative liberty, I suppose?  Hogwash.

 (except that the IANA registrations should be transfered to IANA now  [...])

Is this supposed to mean something?

-- 
How they ever reached any conclusion at all    [EMAIL PROTECTED]
is starkly unknowable to the human mind.       http://www.reutershealth.com
    --Backstage Lensman, Randall Garrett       http://www.ccil.org/~cowan



Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-07 Thread John Cowan
[EMAIL PROTECTED] scripsit:

 What would be really nice is to specify a parameterized matching
 algorithm (or more precisely, an algorithm family) along the lines
 of the stringprep family of string normalization algorithms. But
 I'm unsure if there's sufficient time and interest available to do
 this. But it is nice to dream...

That would be a Good Thing indeed.  However, it is definitely out of
scope for this draft, as it would stretch the definition of BCP well
beyond the breaking point.  If there's any defending the presence of an
*algorithm* in a BCP at all, it's because we are not making the algorithm
normative, but just saying "The most commonly used algorithm is".
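
For readers who have not looked at the algorithm in question: a minimal sketch, assuming it is the prefix matching of a language-range against a language tag described in RFC 3066. The function name is illustrative only and is not taken from either document.

```python
def rfc3066_matches(language_range: str, tag: str) -> bool:
    """Prefix matching in the style of RFC 3066 (a sketch, not normative).

    A range matches a tag if it equals the tag, or if it equals a prefix
    of the tag and the next character after that prefix is "-".
    The special range "*" matches any tag. Comparison ignores case.
    """
    if language_range == "*":
        return True
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")
```

Under this sketch the range "en" matches "en-US" but not "eng", because the match has to end on a subtag boundary.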

-- 
[W]hen I wrote it I was more than a little  John Cowan
febrile with foodpoisoning from an antique carrot   [EMAIL PROTECTED]
that I foolishly ate out of an illjudged faith  www.ccil.org/~cowan
in the benignancy of vegetables.  --And Rosta   www.reutershealth.com



Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-07 Thread John Cowan
John C Klensin scripsit:

  In RFC 3066, it is only a heuristic (or examination of the
  IANA registry, which is not machine-parseable) that tells the
  meaning of the second subtag the existing registered tag
  sr-Latn.  In the draft, its meaning is unambiguously specified
  a priori.
 
 So?

So it is meaningless to talk about breaking backward compatibility
when the behavior in question is a heuristic (or, to quote Ned,
"it works in most but not all cases").  Registrations of new tags
under RFC 3066 can and will break the heuristic all by themselves.
The new draft talks about scripts, but the existing registered tags
talk about scripts too.

-- 
John Cowan  www.reutershealth.com  www.ccil.org/~cowan  [EMAIL PROTECTED]
'Tis the Linux rebellion / Let coders take their place,
The Linux-nationale / Shall Microsoft outpace,
We can write better programs / Our CPUs won't stall,
So raise the penguin banner of / The Linux-nationale.  --Greg Baker



Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread ned . freed
 [EMAIL PROTECTED] scripsit:

  Now, it may be the case that all _registered_ tags have avoided the use of
  non-country code two letter codes in the third and later position. But this
  is 100% irrelevant.

 If you say so.

  The point is that conformant code implementing RFC 3066 is
  broken if it simply assumes any 2 letter code after the first subtag is a
  country code. Rather, the rule is simply that a country code, if present,
  always appears as a two letter second subtag.

 Not quite.  The rule is that a 2-letter second subtag is a country code.
 Country codes could have appeared elsewhere, and may still wind up doing so
 before RFC 3066 is obsoleted.

But it is wrong for a compliant 3066 implementation to assume that such a two
letter code is a country code!

I really cannot fathom why this issue is so hard for you to understand.

  The new draft changes this rule,
  so applications that pay attention to country codes in language tags have to
  change and the new algorithm for finding the country code is trickier.

 But not much.  As an advantage, country codes can always be found in the new
 draft, whereas in RFC 3066 they could in principle be anywhere.

Not really. Anything that puts a country code in some other location in the
3066 world isn't going to get the benefit of automatic recognition of the
code as such by a 3066-compliant parser.
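
To make the disputed rule concrete, here is a minimal sketch of how a parser that follows only the RFC 3066 second-subtag rule would look for a country code. The function name and the handling of "x-" and "i-" tags are assumptions made for illustration; they come from neither RFC 3066 nor the draft.

```python
from typing import Optional


def rfc3066_country(tag: str) -> Optional[str]:
    """Return the country code a strict second-subtag parser would see.

    Only a 2-letter *second* subtag is treated as an ISO 3166 code.
    Assumption: tags whose first subtag is "x" or "i" are skipped.
    """
    subtags = tag.split("-")
    if len(subtags) < 2 or subtags[0].lower() in ("x", "i"):
        return None
    second = subtags[1]
    if len(second) == 2 and second.isascii() and second.isalpha():
        return second.upper()
    return None
```

With this rule "en-US" yields "US", while a tag such as "en-Latn-US" yields nothing: the country code is present but not in the position the parser inspects.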

   (A private correspondent notes that the reference to -x- should
   in fact be a reference to any singleton, though -x- and i- are
   the only singletons currently usable.)
 
  I have to say I find it quite interesting that one of the main proponents of
  the new draft, while arguing that the new draft doesn't make the matching
  problem a lot harder, ended up giving an erroneous rule for extracting
  country codes from a language tag.

 Like other people, I sometimes post when tired; I don't think this
 particularly interesting.

Whereas everyone who writes code when they implement this stuff will be as
fresh as a daisy?

  Sure, in the general case most if not all of these nasty corner cases you've
  created can be blithely assumed away because they only appear in specific
  problem domains. Actual applications that operate in those specific domains
  aren't so lucky, however. And the metric we're supposed to apply in the
  IETF is real world implementability.

 I fail to see what this has to do with the merit of marking scripts in
 language tags.  The preferred IETF charset, UTF-8, contains no information
 about script whatever.

Sadly, the IETF's preferences haven't managed to catch on in many parts of the
world.

  As it happens I deal with messaging applications, and in this space
  text/plain with all sorts of nasty charset issues is the rule, not the
  exception.

 Extended language tags will neither help nor harm you, then.

This actually may be true, because as I have said before, the likely outcome if
this draft is adopted in its present form will be that it will simply be
ignored in its entirety. But is this what we want?

Ned



Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread ned . freed
  Rather, the rule is simply that a country code, if present,
  always appears as a two letter second subtag. The new draft changes this
  rule, so applications that pay attention to country codes in language tags
  have to change and the new algorithm for finding the country code is
  trickier.

 Your text above says (a) if there is a country code in the tag, it is the
 second subtag. That is not what text of RFC 3066 actually says, which is:

  The following rules apply to the second subtag:
  All 2-letter subtags are interpreted as ISO 3166 alpha-2 country...

 That is, it says (b) if a second subtag has 2 letters, then it is an ISO
 3166 code, which is not the same as (a). (It is almost, but not quite, the
 converse.)

Fine, whatever.

 The current RFC certainly does not forbid the use of country
 codes in other positions in language tags. One could absolutely register
 en-Latin-US, for example, meaning English as spoken in the US written in
 Latin script.

Sure, but my point was, is, and always has been that any 3066-compliant
implementation won't see this as a country code (unless it is table driven,
which brings up its own set of issues).

 There has been a lot of noise on this issue, and too few concrete examples.

No, what there has been is a lot of discussion of a real problem with no
apparent recognition of it as such by the draft authors. Your pejorative
characterization of this as noise does not make it so.

 In the so-called 3066bis draft, we have striven very hard to ensure that:

 (c) Every single tag that could be generated under RFC 3066bis is a tag that
 could have been registered under RFC 3066.

True but irrelevant.

 Thus if someone wrote a parser that is future-compatible -- that could parse
 all RFC 3066 language tags including those registered after the parser was
 deployed -- then that parser can handle all 3066bis language tags. This is a
 significant advance over RFC 3066, whose registered (not generated) language
 tags are atomic, and cannot be effectively parsed at all. 3066bis adds more
 structure so as to allow effective parsing of tags.

 If you *can* come up with tags that would show that (c) is invalid, that
 would be a concrete case that we would have to make adjustments in the draft
 for.

(c) is frankly not an issue I care one whit about. (Perhaps I should, but I
don't.) I don't register tags. I write code that processes, and more to the
point matches, tags. That's why I have issues with this draft.

 Moreover, all the talk about this being *too* complex is far overblown.

Again, your pejorative dismissal of other people's concerns does not
mean your position is valid.

 All
 3066bis language tags can be parsed, including all the grandfathered codes,
 with a very short piece of code, or even with a regular expression (such as
 in Perl).

Of course you can write a short piece of code to parse this stuff. It's what you
do with it after you parse it that's a problem.

 This is not rocket science.

Parsing almost never is. But simply parsing these tags is not, and never has
been, the issue.
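
As an illustration of the parsing step both sides agree is easy, here is a hedged sketch using a regular expression. It is a simplification and not the draft's ABNF: it assumes a 2-3 letter language subtag, an optional 4-letter script, and an optional 2-letter or 3-digit region, and it does not handle grandfathered or private-use tags. All names are hypothetical.

```python
import re
from typing import Dict, Optional

# Simplified pattern: language[-script][-region][-anything else].
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<rest>(?:-[A-Za-z0-9]{1,8})*)$"
)


def parse_tag(tag: str) -> Optional[Dict[str, object]]:
    """Split a tag into named pieces, or return None if it does not fit."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    script = m.group("script")
    region = m.group("region")
    return {
        "language": m.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
        # Remaining subtags are kept uninterpreted.
        "rest": m.group("rest").split("-")[1:],
    }
```

The sketch names the pieces of "zh-Hant-HK" by subtag length alone, with no registry lookup. What a matcher should then do with those pieces is the open question and is not addressed here.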

Ned



RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread Peter Constable
 From: [EMAIL PROTECTED] [mailto:ietf-languages-
 [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]


 Again, your pejorative dismissal of other people's concerns does not
 mean your position is valid...


 Parsing almost never is. But simply parsing these tags is not, and never
 has been, the issue.

I think you guys are in violent agreement over country codes within a
tag, and that the debate over interpreting the wording of RFC 3066
serves no purpose.

I think the intent of Mark's dismissal has been to refute
perceived-invalid objections, in which case we need to consider that the
line between perceived-invalid and truly-invalid has been blurred simply
by the volume of discussion (the noise factor). There have been some
invalid objections that bear some similarity to comments Ned has made as
he has tried to make his point. (E.g. Bruce Lilly has claimed invalid
back-compat problems on the incorrect premises that RFC 3066 does not
permit ISO 3166 country codes except as second subtags or does not
permit second subtags that are not country codes (at the moment I forget
if it was one or the other or both).)

But Ned's concerns are legitimate, I think. I'd say they are not
necessarily blocking issues for this draft, because I think a possible
outcome of discussion is to characterize them as concerns about
outstanding issues that need to be solved rather than as concerns over
the draft itself; but I do think they are valid concerns that deserve
attention.

In a nutshell, Ned was elaborating on a comment from Dave Singer that,
once we have parsed a pair of tags and identified all the pieces, it's
not a trivial matter to decide in every case how the two tags compare,
and that there are factors that would exist if the draft were approved
that didn't exist under RFC 3066.

Again, I think this is a question that deserves discussion. In relation
to the proposed draft, I don't see it as a particular problem with the
draft. It is a problem that doesn't exist in RFC 3066, but that is only
because RFC 3066 left us with bigger problems: it doesn't give us any
way to identify pieces that we would be encountering in registered tags
(apart from hard-coded tables compiled from versions of the registry
that pre-exist a given implementation).

RFC 3066 permits tags that have all kinds of internal structures. That
is a problem as it will never allow us to derive much useful information
from a tag with any confidence -- only the ISO 639 language category and
in some cases a country category. I predict that in the future we will
be seeing a significant number of tags (whether sanctioned without
registration by a successor to RFC 3066 or as tags registered under RFC
3066) that go beyond the patterns 'll(-CC)' and 'lll(-CC)'. If we stick
with RFC 3066, we will have no way of writing forward-compatible
processors that will be able to do very useful matching.

What this draft does is impose some order to all the other patterns
within  tags that are permitted, and tell us what the different pieces
must be. As a result, we have more named pieces to deal with, and we are
presented with the question that Ned raised: Now we have more named
pieces than we did before; what do we do with them? That is a problem
that will need to be addressed. But I don't think it's a reason to
oppose the draft, since opposing the draft (or at least opposing any
revision that introduces a richer internal structure) leaves us in a
situation that must be characterized either as a worse problem or as
turning our backs on increased functionality to meet real user needs.



Peter Constable



Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread John C Klensin


--On Thursday, 06 January, 2005 06:35 -0800
[EMAIL PROTECTED] wrote:

...
 Extended language tags will neither help nor harm you, then.
 
 This actually may be true, because as I have said before, the
 likely outcome if this draft is adopted in its present form
 will be that it will simply be ignored in its entirety. But is
 this what we want?

Actually, Ned, my concern goes a little beyond "ignored in its
entirety".   If this thing were adopted as a separate standard,
with some scope of applicability, and it were completely
ignored, that would not really be harmful except on the great
scoreboard of standards issued versus standards used.   But, if,
in the process, it succeeds in identifying 3066 as "Obsolete"
without replacing it with anything that is usable and
compatible, that could cause a serious reduction in
interoperability in the areas where 3066, today, is good enough
or just about good enough.

That brings me back to what I've tried to suggest several times
but which, as you observe, the authors of this draft seem to
dismiss either as noise or as a "no change to 3066 under any
circumstances" position.  If the purpose is to get this model
out there, rather than replacing 3066 because it is generally
offensive, there is a fairly quick way to get this document
unstuck:

(1) Remove the notion and statements about obsoleting 3066.
This would probably require a change of title and some
introductory material, but that shouldn't be a big deal if the
real goal is to get it finished and published.

(2) Create a new section, called "applicability", that contains
at least one example of how and where this system would be used.
I'm not wild about it but, personally, I'd settle for some vague
handwaving like "places where more comprehensive identification
is needed than is provided by 3066".

(3) Ask the IESG to approve publication of the thing as either a
Proposed Standard or an Experimental document, as they believe
best matches the needs and consensus of the community.  The
document is not a good candidate for BCP for three reasons: (i)
as you and I have commented, it contains algorithmic and
protocol specifications, rather than just specifying a
registration procedure.  3066 was, IMO, marginal in that regard;
this is well over the line.  (ii) As Jefsey has pointed out,
this is not yet a current practice, best or otherwise.
(iii) Two BCP documents covering the same space would be certain
to create confusion unless the applicability differences were
much more clearly stated than I think anyone is prepared to do.

Then we let the market sort the situation out.  If the proposers
of this specification are right about how important the
additional details are, I'd expect to see 
   Content-language: 3066-tag
   X-Extended-Content-language: new-tag
and its equivalents show up all over the email environment and
the web.  The interpretation and matching issues would either
sort themselves out or they wouldn't.   If it became clear, in
practice, that this was the right way to go for a broader range
of applications, writing a short RFC to update the applicability
statement (and to move the thing from Experimental to Proposed
Standard if the IETF went the experimental route), would be
pretty trivial.   If that range appeared to the community to be
subsuming all of the applications of 3066, the same document
could provide the "obsoletes 3066" decision.
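
The dual-header arrangement described above could be sketched as follows, using Python's standard email package. This is illustrative only: the X-Extended-Content-Language header is the hypothetical name from the message above and is not a registered header field, and the helper name and example tags are assumptions.

```python
from email.message import EmailMessage


def tag_message(msg: EmailMessage, rfc3066_tag: str, extended_tag: str) -> EmailMessage:
    """Attach both a plain RFC 3066 tag and a richer, experimental tag."""
    # Existing consumers keep reading the established header.
    msg["Content-Language"] = rfc3066_tag
    # Hypothetical header carrying the new-style tag for software that opts in.
    msg["X-Extended-Content-Language"] = extended_tag
    return msg


msg = EmailMessage()
msg.set_content("Hello")
tag_message(msg, "zh", "zh-Hans-HK")
```

Software that knows nothing about the extended header simply ignores it, which is the property that lets deployment experience decide the question.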

You've probably got a prediction about how likely the broadest
form of outcome is, and it probably matches mine.  But the IETF
does not, and should not try, to impose technologies by
replacing working standards, and, despite my biases about
experience and processes, our predictions should ultimately be
no more determinative than that of the authors.  Let's separate
this from "replaces 3066", get it out there, see how important
and useful the marketplace thinks it is, and let the marketplace
(and the experimental/proposed standard models) sort out the
implementability and usability problems.

john





RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread John C Klensin


--On Thursday, 06 January, 2005 07:42 -0800 Peter Constable
[EMAIL PROTECTED] wrote:

...
 But Ned's concerns are legitimate, I think. I'd say they are
 not necessarily blocking issues for this draft, because I
 think a possible outcome of discussion is to characterize them
 as concerns about outstanding issues that need to be solved
 rather than as concerns over the draft itself; but I do think
 they are valid concerns that deserve attention.

Peter, as soon as we get to "valid concerns that deserve
attention", we remove the proposed document, I believe, as a
candidate for BCP.  We don't have any provision in the BCP rules
for pushing a document forward that identifies "valid concerns"
and other loose ends rather than having those issues resolved
sufficiently that we can talk about a "practice".  So it means
that either

* The document needs to be withdrawn, these (and other)
concerns sorted out, and a new document produced that
addresses them.

or

* The document needs to be recast into Proposed Standard
or Experimental form, because we do have ways, there, to
say "these are known outstanding issues that deserve
attention".

That, of course, doesn't solve some other strategic/ positioning
issues with it; see my recent other note.

john





RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread Dave Crocker
On Thu, 06 Jan 2005 11:04:54 -0500, John C Klensin wrote:
  Peter, as soon as we get to valid concerns that deserve
  attention, we remove the proposed document, I believe, as a
  candidate for BCP.

That pretty much applies to all specifications.  A Last Call that produces any 
sort of serious concern that folks feel should be taken seriously means that 
the document is not yet ready for approval.

It occurs to me that a Last Call for an independent submission has an added 
requirement to satisfy, namely that the community supports adoption of the 
work.  We take a working group as a demonstration of community support.  
(However we used to pressure for explicit statements during Last Call.)  My 
feeling is that an independent submission MUST show significant support during 
Last Call.

In other words, a working group document getting IETF Last Call has something 
of a Default Yes. An independent submission needs to be Default No.

And, indeed, I haven't seen much support for the document under discussion.


d/
--
Dave Crocker
Brandenburg InternetWorking
+1.408.246.8253
dcrocker  a t ...
WE'VE MOVED to:  www.bbiw.net




Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread Mark Davis
3066) that go beyond the patterns 'll(-CC)' and 'lll(-CC)'. If we stick
with RFC 3066, we will have no way of writing forward-compatible
processors that will be able to do very useful matching.

I want to reinforce what Peter has said. In RFC 3066 we have already
registered language tags like zh-Hans, and zh-Hant. Nobody can parse out the
script in the language tag because RFC 3066 does not provide for
identification of the pieces. During the development of 3066bis, we have
been holding off on registering all of the country variants of these,
because we didn't want them to be redundant with the generated codes in
3066bis. If we don't get 3066bis, then we will end up needing to register
the combinations zh-Hans-CN, zh-Hant-CN, zh-Hans-HK, zh-Hant-HK, zh-Hans-MO,
zh-Hant-MO, zh-Hans-SG, zh-Hant-SG, zh-Hans-TW, zh-Hant-TW. And zh is just
one example. There are many languages that can be written in different
scripts, where it is important as a matter of practice to be able to
distinguish the script as well as the country.
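
The registration burden described above can be sketched as a simple cross product. The script and region lists are taken from the tags named in this message; the function name is illustrative only.

```python
from itertools import product
from typing import List


def enumerate_tags(language: str, scripts: List[str], regions: List[str]) -> List[str]:
    """List every language-script-region tag that would need registering."""
    # Region varies slowest, script fastest, matching the order listed above.
    return [f"{language}-{s}-{r}" for r, s in product(regions, scripts)]


zh_tags = enumerate_tags("zh", ["Hans", "Hant"], ["CN", "HK", "MO", "SG", "TW"])
```

Two scripts and five regions already give ten registrations for one language; a generative scheme replaces that list with a rule.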

There are very good reasons to have the script code before the country code,
because differences by script swamp differences by country. Suppose that you
are composing a web page by pulling together different pieces of data, and
your target is Chinese simplified for Hong Kong. For one of those data
sources, there is not an exact match. Given a choice between a data source
in Chinese simplified, or a data source in Chinese Hong Kong (but
traditional), you really want to pick the Chinese simplified. That is
reflected in the use of the script value second (zh-Hant-HK), so that the
common process of truncation will get the right result.

This is similar to the reason why the language code comes before the country
code. If we had the order CH-fr, then we could end up mixing French and
German in the same page, because we would fall back (for one of the data
sources) from CH-fr to CH, which could be German.
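
The truncation argument can be made concrete with a minimal sketch, assuming that "truncation" means dropping subtags from the right until an available tag is found. The function names are hypothetical and the lookup is a simplification of what real software does.

```python
from typing import List, Optional


def fallback_chain(tag: str) -> List[str]:
    """Return the tag followed by its successively truncated prefixes."""
    subtags = tag.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def best_match(target: str, available: List[str]) -> Optional[str]:
    """Pick the first available tag found along the truncation chain."""
    lookup = {a.lower(): a for a in available}
    for candidate in fallback_chain(target):
        hit = lookup.get(candidate.lower())
        if hit is not None:
            return hit
    return None
```

With the script in second position, a request for zh-Hans-HK falls back to zh-Hans and never reaches a traditional-script Hong Kong source. With the hypothetical country-first order, CH-fr falls back to CH, which says nothing about the language.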

Mark


Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread Scott W Brim
Dave Crocker wrote:
It occurs to me that a Last Call for an independent submission has an added 
requirement to satisfy, namely that the community supports adoption of the 
work.  We take a working group as a demonstration of community support.  
(However we used to pressure for explicit statements during Last Call.)  My 
feeling is that an independent submission MUST show significant support during 
Last Call.
In other words, a working group document getting IETF Last Call has something 
of a Default Yes. An independent submission needs to be Default No.
 

Pretty close.  Certainly the default can't be Yes.  However the reason 
why many things come in as individual submissions is that the community 
doesn't care much.  So if the IESG is satisfied enough to put out a last 
call, and nobody responds -- it doesn't have community support -- the 
default community position shouldn't be "no" but "no objection". 

(In this specific case it appears to me, as an outsider, that there has 
been significant objection, and not all of it dismissible.)

swb


RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread John C Klensin
Dave,

While we are pretty much in agreement, three observations, one
based on Scott's default "no objection" observation.

(1) I think you are right that there are two issues with an
independent submission, one of which is the notion of support
that doing something is a good idea.  And I agree that, for WG
efforts, we pretty much sort that issue out at charter time so
it would take strong evidence at Last Call time to be persuasive
that the community is not interested.  For that part, I agree
with you on the "default no" part -- either the community cares
enough that the document should be part of our corpus of
standards-track materials, or it doesn't and "no objection"
doesn't seem to be a good option (even though the appropriate
threshold for "cares enough" might be debated).  But there is
also the set of issues associated with "given that we are
interested, is it technically adequate, does it solve the stated
(or implied) problems", and similar questions.  And, for that
piece, I think Scott's default "no objection" is about the right
target: it is reasonable for the community, as it figures out
how to allocate resources, to say "this is worth doing, looks
close enough, and we trust the people who have done the work
sufficiently".

(2) It is orthogonal to the issues you have raised, but I
believe that barriers to approval to something that is intended
to replace something that is deployed and working should be
higher than the barrier for a new piece of work in a new area of
application with no potential for screwing things up.  One could
try to further grade that (or not) based on degree of backward
compatibility in applications and use.   I think that treating
replacement of different protocol or procedure or model as
needing a higher degree of certainty than new work is what the
IETF has done all along without ever being explicit about it as
a principle.  But it seems important to flag in this case.

(3) Finally, there is apparently a procedural oddity with this
document.  The people who put it together apparently held
extended discussions on the ietf-languages mailing list, a list
that was established largely or completely to review
registrations under 3066 and its predecessors.  My
understanding at this point is that their good-faith impression
was that the discussions on that list were essentially
equivalent to those of a WG.  As a result, they came into this
Last Call process believing that their document ought to be
treated very much like a WG one with the presumption that all of
the relevant people were aware of their efforts and on their
list and hence that their consensus on the document should
create a "default yes" to both of the key questions that you
identify.  I think that conclusion is wrong, precisely because
they didn't have the benefit of the struggles about scope and
charter details, and the announcements that the effort was going
on, that invariably accompany a WG formation process.  And I
know of at least a case or two of IETF participants who would
have felt obligated to carefully track a WG chartered to replace
3066 who were not sufficiently aware of this effort to track it
except as an evolving individual submission draft.   The
question of how that process confusion arose, and whether we are
doing the right things in cases like this,  might be something
the community should examine at some point, but it is largely
independent of how this document should be treated going forward.

regards and best new year's wishes,
john


--On Thursday, 06 January, 2005 09:02 -0800 Dave Crocker
[EMAIL PROTECTED] wrote:

 On Thu, 06 Jan 2005 11:04:54 -0500, John C Klensin wrote:
   Peter, as soon as we get to valid concerns that deserve
   attention, we remove the proposed document, I believe, as a
   candidate for BCP.
 
 That pretty much applies to all specifications.  A Last Call
 that produces any sort of serious concern that folks feel
 should be taken seriously means that the document is not yet
 ready for approval.
 
 It occurs to me that a Last Call for an independent submission
 has an added requirement to satisfy, namely that the community
 supports adoption of the work.  We take a working group as a
 demonstration of community support.  (However we used to
 pressure for explicit statements during Last Call.)  My
 feeling is that an independent submission MUST show
 significant support during Last Call.
 
 In other words, a working group document getting IETF Last
 Call has something of a Default Yes. An independent
 submission needs to be Default No.
 
 And, indeed, I haven't seen much support for the document
 under discussion.
 
 
 d/
 --
 Dave Crocker
 Brandenburg InternetWorking
 +1.408.246.8253
 dcrocker  a t ...
 WE'VE MOVED to:  www.bbiw.net
 
 

Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread Dave Singer
This is similar to the reason why the language code comes before the country
code. If we had the order CH-fr, then we could end up mixing French and
German in the same page, because we would fall back (for one of the data
sources) from CH-fr to CH, which could be German.

It has to be application-specific which fallback happens.  If the 
user says he's swiss french, and the content has alternative 
offers for swiss german or french french, which do you present?  If 
the content actually differs for legal or geographic reasons ('the 
legal representative in your country is', 'for copyright reasons this 
edition differs in material ways from other countries'), then the 
correct country but wrong language is the best answer.  If the desire 
is simply for maximum intelligibility, then the reverse is true.
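
The two policies described above can be sketched as follows. This is only
an illustration under stated assumptions: the function name pick_variant,
the (language, country) tuple representation, and the "prefer" parameter
are hypothetical and do not come from RFC 3066 or the draft.

```python
def pick_variant(user, available, prefer="language"):
    """Return the best available (language, country) pair for `user`.

    prefer="language": maximise intelligibility (same language wins).
    prefer="country":  content differs by jurisdiction (same country wins).
    Both the policy names and the tuple layout are assumptions of this sketch.
    """
    lang, country = user
    if user in available:
        return user  # an exact match always wins
    same_lang = [v for v in available if v[0] == lang]
    same_country = [v for v in available if v[1] == country]
    if prefer == "language":
        ordered = same_lang + same_country
    else:
        ordered = same_country + same_lang
    return ordered[0] if ordered else None


swiss_french = ("fr", "CH")
offers = [("de", "CH"), ("fr", "FR")]
print(pick_variant(swiss_french, offers, prefer="language"))  # ('fr', 'FR')
print(pick_variant(swiss_french, offers, prefer="country"))   # ('de', 'CH')
```

The same request and the same offers give different answers depending on
the policy, which is the sense in which the choice is application-specific.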
--
David Singer
Apple Computer/QuickTime



Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread kristin . hubner
I notice two main types of arguments going on in this thread, where it
seems to me that there is frustration and talking past each other
occurring due to fundamentally different concerns and assumptions
between different constituencies.

One type of conflict seems to me to be between what I will term, for
convenience (and please, I don't want to get side-tracked on my choice
of terms -- I just want convenient words) "implementors" vs.
"linguists".

By implementors, I mean those whose concern is primarily with how to
interpret (act on) received language tags -- consumers of language
tags, where falling back to a general or compatible match may be
desirable when an exact match is not available.  From their point of
view, the most important aspect of language tags is being able to parse
and match them -- exact linguistic purity and accuracy is a secondary
issue.  From their point of view, the addition of new tags, regardless
of whether the new tags improve language tagging accuracy, may be
actively harmful unless accompanied by improved matching rules.  To the
extent that the adding of tags moves beyond simple registration of new
tags, and instead into new forms of tags and new rules for interpreting
tags, that is, to the extent that the new tags bring up fundamental
matching algorithm questions, that becomes the main concern for this
group.

Then there are what I will refer to as linguistic purists, whose concern
is primarily with having precise, accurate tags available for
languages.  (These may be people whose orientation is toward generating
content, and labelling it accurately.)  For this group, the most
important aspect of language tags is having them be accurate and
precise.  Any matching issues (and in particular issues of trying to
fall back to a more generic match when an exact match is not available)
are secondary.

The opinion on whether a tag is useful then varies: "it's useful if I
know how to match it" vs. "it's useful if it's accurate".

An example where the difference in orientation shows up is with the
position of script vs. country in tags.  From the linguistic point of
view, there are arguments for having script come first.  But from the
implementation point of view, that is less backwards-compatible with
3066, hence more problematic.

The process question of whether this is appropriately a BCP, or whether
it is at least implicitly bringing up algorithmic implementation issues
and hence instead ought to be perhaps a Proposed Standard or an
Experimental Standard, also has something to do with this difference in
orientation.
 
A second type of argument (which I should mention I have largely tuned
out, so this is my superficial and not very informed take on it) seems
to me to be more linguistic/political in nature: what is the "correct"
(linguistically correct? politically correct?) way to name the tags,
what sort of naming scheme corresponds to linguistic reality, what sort
of naming scheme is politically acceptable, and is there a conflict
there.  This does get back to the algorithmic matching issue in a sense
though, which is that if one wants some sort of hierarchical structure
to the tags (to allow easier matching), or indeed wants to define any
sort of matching rules (as an implementor wants), you're probably
getting right into some political questions about how matching should
work.  So for those who wanted to stick just to linguistic accuracy and
try to avoid political issues, trying to avoid discussion of
algorithmic matching may have seemed appealing (but that then provides
no help to what I've termed the implementors).

If we can keep in mind that there are different constituencies
interested in language tags, with different main concerns, then I would
hope for less frustration and irritation with others "missing the main
point", so that constructive discussions can occur, leading to some
compromise useful to everyone.

Regards,

Kristin



Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread John Cowan
Dave Singer scripsit:

 It has to be application-specific which fallback happens.  If the 
 user says he's swiss french, and the content has alternative 
 offers for swiss german or french french, which do you present?  If 
 the content actually differs for legal or geographic reasons ('the 
 legal representative in your country is', 'for copyright reasons this 
 edition differs in material ways from other countries'), then the 
 correct country but wrong language is the best answer.  If the desire 
 is simply for maximum intelligibility, then the reverse is true.

Absolutely, which is why the fallback rule isn't and can't be a
protocol-level transaction or (a fortiori) an interop issue.  It's
simply a useful default in many circumstances.

-- 
All Norstrilians knew what laughter was:      John Cowan
it was pleasurable corrigible malfunction.    http://www.reutershealth.com
        --Cordwainer Smith, Norstrilia        [EMAIL PROTECTED]



RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread Peter Constable
 From: Dave Singer [mailto:[EMAIL PROTECTED]

 This is similar to the reason why the language code comes before the
 country code. If we had the order CH-fr, then we could end up mixing
 French and German in the same page, because we would fall back (for
 one of the data sources) from CH-fr to CH, which could be German.
 
 It has to be application-specific which fallback happens.  If the
 user says he's swiss french, and the content has alternative
 offers for swiss german or french french, which do you present?  If
 the content actually differs for legal or geographic reasons ('the
 legal representative in your country is', 'for copyright reasons this
 edition differs in material ways from other countries'), then the
 correct country but wrong language is the best answer.  If the desire
 is simply for maximum intelligibility, then the reverse is true.

But that is a level of decision making that goes well beyond any
algorithm that simply uses truncation of tags, which is the only case in
which the ordering of sub-tags matters.
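
For reference, the truncation being discussed can be sketched as below.
This is a minimal illustration, not text from RFC 3066 or the draft; the
names truncation_chain and lookup are hypothetical.

```python
def truncation_chain(tag):
    """Yield `tag`, then each shorter left prefix: 'a-b-c', 'a-b', 'a'."""
    subtags = tag.split("-")
    for end in range(len(subtags), 0, -1):
        yield "-".join(subtags[:end])


def lookup(requested, available):
    """Return the first left prefix of `requested` found in `available`.

    Comparison is case-insensitive; returns None when no prefix is available.
    """
    have = {t.lower(): t for t in available}
    for candidate in truncation_chain(requested):
        if candidate.lower() in have:
            return have[candidate.lower()]
    return None


print(list(truncation_chain("fr-CH")))      # ['fr-CH', 'fr']
print(lookup("fr-CH", ["de-CH", "fr"]))     # fr
print(lookup("fr-CH", ["de-CH", "fr-FR"]))  # None
```

Truncation only ever walks from a tag to its own prefixes, so it can never
arrive at de-CH or fr-FR from fr-CH. Choosing between those two needs a
decision outside this kind of algorithm.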



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread Dave Singer
At 11:34 AM -0800 1/6/05, Peter Constable wrote:
  From: Dave Singer [mailto:[EMAIL PROTECTED]
 This is similar to the reason why the language code comes before the
 country code. If we had the order CH-fr, then we could end up mixing
 French and German in the same page, because we would fall back (for
 one of the data sources) from CH-fr to CH, which could be German.

 It has to be application-specific which fallback happens.  If the
 user says he's swiss french, and the content has alternative
 offers for swiss german or french french, which do you present?  If
 the content actually differs for legal or geographic reasons ('the
 legal representative in your country is', 'for copyright reasons this
 edition differs in material ways from other countries'), then the
 correct country but wrong language is the best answer.  If the desire
 is simply for maximum intelligibility, then the reverse is true.

But that is a level of decision making that goes well beyond any
algorithm that simply uses truncation of tags, which is the only case in
which the ordering of sub-tags matters.

Sorry, I should have gone on to conclude:  the important aspect of 
sub-tags is that their nature and purpose be identifiable and 
explained (e.g. that this is a country code), and that we retain 
compatibility with previous specifications.  This tagging uses order 
(and size) of sub-tags rather than explicit labels to say what 
something is, and we're stuck with that.  I don't believe that simple 
truncation is a necessarily useful operation in all circumstances, 
and it probably should not be in the spec. at all.  For example, I'd 
say that we should retain the 3066 ordering of language-country and 
therefore script, if needed, comes later.  However, my typesetting 
subsystem doesn't care a jot about language or country, it just needs 
to find the script code ('can I render this script'?).

This spec. should unambiguously allow me to extract the language, 
country, script etc., should say under what circumstances two 
sub-tags of any type match, state the obvious that two tags exactly 
match if they have the same sub-tags and they all match, that partial 
perfect matches (of tags with differing numbers of sub-tags) are 
possible and may be applicable, and that the use of imperfect matches 
(in which not all sub-tags match) has to be application-specific. 
Examples of why on the latter would be helpful.
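
The three kinds of match listed above could be distinguished roughly as
follows. This is a sketch under assumptions: the name classify_match and
the purely positional comparison of subtags are choices made for this
example, not rules taken from RFC 3066 or the draft.

```python
def classify_match(a, b):
    """Classify two tags as 'exact', 'partial', or 'imperfect'.

    exact:     same number of subtags, all equal (case-insensitively)
    partial:   the shorter tag is a left prefix of the longer one
    imperfect: at least one subtag in a shared position differs
    Subtags are compared by position only (an assumption of this sketch).
    """
    xs, ys = a.lower().split("-"), b.lower().split("-")
    shared = min(len(xs), len(ys))
    if xs[:shared] != ys[:shared]:
        return "imperfect"
    return "exact" if len(xs) == len(ys) else "partial"


print(classify_match("fr-CH", "FR-ch"))  # exact
print(classify_match("fr", "fr-CH"))     # partial
print(classify_match("fr-CH", "de-CH"))  # imperfect
```

Whether an "imperfect" result is still acceptable is exactly the part that
would be left to the application.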
--
David Singer
Apple Computer/QuickTime



RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread Peter Constable
 From: Dave Singer [mailto:[EMAIL PROTECTED]


 Sorry, I should have gone on to conclude:  the important aspect of
 sub-tags is that their nature and purpose be identifiable and
 explained (e.g. that this is a country code), and that we retain
 compatibility with previous specifications.

Ah! Then the proposed draft ensures that the nature of subtags is
always identifiable, which RFC 3066 (as I mentioned earlier) fails to
do. 

And the draft retains compatibility with previous specifications using
an assumption (thoroughly discussed and concluded on the IETF-languages
list a year ago) that, in case of left-prefix matching processes, script
distinctions are generally far more important than country distinctions.


 I don't believe that simple
 truncation is a necessarily useful operation in all circumstances,

I don't think anyone would dispute that.


 and it probably should not be in the spec. at all.  For example, I'd
 say that we should retain the 3066 ordering of language-country and
 therefore script, if needed, comes later.  However, my typesetting
 subsystem doesn't care a jot about language or country, it just needs
 to find the script code ('can I render this script'?).

Here I disagree. For other purposes, I think it's very clear that the
only time that choice of order matters is with matching algorithms that
use simple truncation, and for the most common implementations, which
use left-prefix truncation, the order lang-script-country will be far
more useful in the long run precisely because script distinctions are
generally far more important in matching than country distinctions. I
don't know of any case in which a tag might be used that contained all
three subtags but in which the country distinction generally matters
more than the script distinction.
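
To illustrate the ordering argument, here is a small sketch of what
left-prefix truncation produces for the two candidate orderings. The tags
are illustrative only and the function name left_prefixes is hypothetical.

```python
def left_prefixes(tag):
    """Return the fallback candidates obtained by dropping trailing subtags."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts) - 1, 0, -1)]


# lang-script-country: the first fallback still carries the script.
print(left_prefixes("zh-Hant-TW"))  # ['zh-Hant', 'zh']

# lang-country-script: the first fallback has already lost the script.
print(left_prefixes("zh-TW-Hant"))  # ['zh-TW', 'zh']
```

Under truncation, whichever subtag comes last is the first to be discarded,
so the ordering decides which distinction survives the first fallback step.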


Peter Constable



Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread John Cowan
Dave Singer scripsit:

 This spec. should unambiguously allow me to extract the language, 
 country, script etc.,

It does (and RFC 3066 does not).

 should say under what circumstances two 
 sub-tags of any type match, state the obvious that two tags exactly 
 match if they have the same sub-tags and they all match, that partial 
 perfect matches (of tags with differing numbers of sub-tags) are 
 possible and may be applicable, and that the use of imperfect matches 
 (in which not all sub-tags match) has to be application-specific. 

Except for the point that tags match if they are (case-insensitively)
identical, which is already made, we don't believe that any of these
other things can be normatively enunciated.

-- 
Schlingt dreifach einen Kreis vom dies!    John Cowan [EMAIL PROTECTED]
Schliesst euer Aug vor heiliger Schau,     http://www.reutershealth.com
Denn er genoss vom Honig-Tau,              http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.          -- Coleridge (tr. Politzer)



Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread John Cowan
John C Klensin scripsit:

Content-language: 3066-tag
X-Extended-Content-language: new-tag

This reflects a fundamental misunderstanding of what the draft does
compared to what RFC 3066 does.  It imposes *more* restraints on language
tags, not fewer.  The RFC 3066 language tag registration process can
register tags with almost unpredictable meaning once one gets past the
first subtag.  The draft *limits* the possible tags to a small subset,
and tightens up the allowable semantics.  It allows no tag to be used
that was not already registerable under RFC 3066.

In RFC 3066, it is only a heuristic (or examination of the IANA registry,
which is not machine-parseable) that tells the meaning of the second
subtag of the existing registered tag sr-Latn.  In the draft, its meaning
is unambiguously specified a priori.
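
As a rough illustration of "specified a priori", a parser can label a
subtag from its position and length alone, without consulting a registry.
The rules below (4 letters = script, 2 letters = country) are a simplified
assumption for this sketch and are not the draft's actual grammar, which
covers more kinds of subtag; the name classify_subtags is hypothetical.

```python
def classify_subtags(tag):
    """Label each subtag by position and length (simplified, illustrative)."""
    subtags = tag.split("-")
    labelled = [("language", subtags[0].lower())]
    for sub in subtags[1:]:
        if len(sub) == 4 and sub.isalpha():
            labelled.append(("script", sub.title()))
        elif len(sub) == 2 and sub.isalpha():
            labelled.append(("country", sub.upper()))
        else:
            labelled.append(("other", sub))  # not interpreted by this sketch
    return labelled


print(classify_subtags("sr-Latn"))
# [('language', 'sr'), ('script', 'Latn')]
print(classify_subtags("en-Latn-US"))
# [('language', 'en'), ('script', 'Latn'), ('country', 'US')]
```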

-- 
John Cowan  [EMAIL PROTECTED]  http://www.ccil.org/~cowan
Raffiniert ist der Herrgott, aber boshaft ist er nicht.
--Albert Einstein



RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread Dave Singer
At 12:14 PM -0800 1/6/05, Peter Constable wrote:
  From: Dave Singer [mailto:[EMAIL PROTECTED]

 Sorry, I should have gone on to conclude:  the important aspect of
 sub-tags is that their nature and purpose be identifiable and
 explained (e.g. that this is a country code), and that we retain
 compatibility with previous specifications.
Ah! Then the proposed draft ensures that the nature of subtags is
always identifiable, which RFC 3066 (as I mentioned earlier) fails to
do.

And the draft retains compatibility with previous specifications using
an assumption (thoroughly discussed and concluded on the IETF-languages
list a year ago) that, in case of left-prefix matching processes, script
distinctions are generally far more important than country distinctions.

As has been beautifully pointed out on the list, that is a view that 
is lingo-centric.  If what I am trying to differentiate is the price 
(and the currency of the price) of an item, the country may be much 
more important than the script that the price is written in.  (this 
is also an example for the last point below).  I repeat, I don't 
think truncation -- and hence prefix-matching -- is very stable or 
nearly universally applicable enough to be mentioned.  Whereas I do 
believe compatibility of ordering with 3066 is important.


 I don't believe that simple
 truncation is a necessarily useful operation in all circumstances,
I don't think anyone would dispute that.

 and it probably should not be in the spec. at all.  For example, I'd
 say that we should retain the 3066 ordering of language-country and
 therefore script, if needed, comes later.  However, my typesetting
 subsystem doesn't care a jot about language or country, it just needs
 to find the script code ('can I render this script'?).
Here I disagree. For other purposes, I think it's very clear that the
only time that choice of order matters is with matching algorithms that
use simple truncation, and for the most common implementations, which
use left-prefix truncation, the order lang-script-country will be far
more useful in the long run precisely because script distinctions are
generally far more important in matching than country distinctions. I
don't know of any case in which a tag might be used that contained all
three subtags but in which the country distinction generally matters
more than the script distinction.
--
David Singer
Apple Computer/QuickTime


Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread John Cowan
Dave Singer scripsit:

 as has been beautifully pointed out on the list, that is a view that 
 is lingo-centric.  If what I am trying to differentiate is the price 
 (and the currency of the price) of an item, the country may be much 
 more important than the script that the price is written in.  (this 
 is also an example for the last point below).

Using the language-tag to retrieve the country implicitly referenced
in the content is far more unreliable than prefix-matching.  Just
because this document is written in en-US doesn't mean I can't
refer to the price of some consumer device as 200,000 yen.

 I repeat, I don't 
 think truncation -- and hence prefix-matching -- is very stable or 
 nearly universally applicable enough to be mentioned.

It's there to clarify the rule already given in RFC 3066.

 Whereas I do 
 believe compatibility of ordering with 3066 is important.

RFC 3066 already supports tags that don't fit.

-- 
John Cowan  www.reutershealth.com  www.ccil.org/~cowan  [EMAIL PROTECTED]
Arise, you prisoners of Windows / Arise, you slaves of Redmond, Wash,
The day and hour soon are coming / When all the IT folks say Gosh!
It isn't from a clever lawsuit / That Windowsland will finally fall,
But thousands writing open source code / Like mice who nibble through a wall.
--The Linux-nationale by Greg Baker



RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread Peter Constable
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]

 I notice two main types of arguments going on in this thread, where it
 seems to me that there is frustration and talking past each other
 occurring due to fundamentally different concerns and assumptions
 between different constituencies...

I have feet in both the "implementors" and "linguistic purists" camps,
and so think I understand both. But there are many points on which I
don't agree with your assessment.


 From [the implementors'] point of view, the most important aspect of
 language tags is being able to parse and match them -- exact
 linguistic purity and accuracy is a secondary issue.

I would say as an implementor that it's important to find appropriate
ways to match tags that meet legitimate needs in realistic scenarios in
the best way we can, and to be aware of behaviour that will be
experienced when using existing implementations, making sure that any
degradation of behaviour is known and accepted to be offset by benefits,
and that there are no really bad behaviours that may result. I would
consider exact linguistic purity secondary since this system is not
intended to document linguistic realities but to provide useful
behaviours related to differences in language usage in information
systems.



 From [the implementors'] point of view, the addition of new tags,
 regardless of whether the new tags improve language tagging accuracy,
 may be actively harmful unless accompanied by improved matching rules.

Here, I disagree, unless this statement is to be understood in a
hypothetical way -- a priori, it would be possible to make changes that
are harmful, but I do not assume that addition of new tags is
necessarily harmful.


 To the extent that the adding of tags moves beyond simple registration
 of new tags, and instead into new forms of tags and new rules for
 interpreting tags, that is, that the new tags bring up fundamental
 matching algorithm questions, that becomes the main concern for this
 group.

There are no new forms of tags proposed! 

The draft would impose *restraints* on the forms that tags can take, and
define precisely what forms tags could take. 

This is a point where there may be some talking past each other. Some
people are speaking from a position in which it is assumed that the part
of a tag that refers to country can be predicted to be in the
second-subtag position. Those supporting the draft are responding that
RFC 3066 does not assume this; it only implies that the only case in
which a country code can be reliably recognized as such is when it is
the second subtag. The former assume that we should continue to keep
country codes in second position because that's the place we've been
able up to now to recognize it. The latter respond that 

- existing implementations will still be able to recognize it when it's
in that place

- RFC 3066 permits it to come in other places, but existing
implementations will never be able to recognize it more than
heuristically

- that the new draft would allow new implementations to *always*
recognize it in any tag, and

- *as implementors* it is thought that requiring that country codes only
ever come in that place is *not* what will provide the best behaviours
for users (specifically in cases where script and country subtags are
both used).



 For [linguistic purists], the most important aspect of language tags
 is having them be accurate and precise...
 Any matching issue (and in particular issues of trying to fall back to
 a more generic match when an exact match is not available) are
 secondary.

For the linguist, what matters is the functional behaviour of the
system, including matching, but not the implementation. The linguist,
per se, has no opinion on what the internal structure of tags should
look like; they only specify what the functional requirements of the
overall system should be, and which tradeoffs in functionality are
better or worse.

But maybe I haven't got the same picture of the distinction between the
implementor and the linguistic purist that you intend. 


 
 A second type of argument... seems to me to be more
 linguistic/political in nature, which is what is the correct
 (linguistically correct? politically correct?) way to name the tags:
 what sort of naming scheme corresponds to linguistic reality,

The question of the relationship between the naming scheme and
ontologies is important inasmuch as knowing the ontology informs us of
what kinds of distinctions need to be made and what kinds of relationships
may exist between those kinds of distinctions, and that guides us in
determining functional requirements, which should be the basis of
implementations. (Once again, a pointer to a white paper on these issues
from a few years ago:
http://www.sil.org/silewp/abstract.asp?ref=2002-003.) 


 or what sort of naming scheme is politically acceptable, and is there
 a conflict there.  This does get back to the algorithmic matching
 issue in a sense though, which is that if 

RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread Dave Singer
I'm sorry, this example I gave doesn't correspond to *language* 
matching.  My error. My apologies.

(Nor should my questions on this subject be seen as suggesting either 
that I as an individual, or particularly Apple as a company, is 
unhappy revising RFC 3066.)

At 12:35 PM -0800 1/6/05, Dave Singer wrote:
At 12:14 PM -0800 1/6/05, Peter Constable wrote:
  From: Dave Singer [mailto:[EMAIL PROTECTED]
 Sorry, I should have gone on to conclude:  the important aspect of
 sub-tags is that their nature and purpose be identifiable and
 explained (e.g. that this is a country code), and that we retain
 compatibility with previous specifications.
Ah! Then the proposed draft ensures that the nature of subtags is
always identifiable, which RFC 3066 (as I mentioned earlier) fails to
do.

And the draft retains compatibility with previous specifications using
an assumption (thoroughly discussed and concluded on the IETF-languages
list a year ago) that, in case of left-prefix matching processes, script
distinctions are generally far more important than country distinctions.

As has been beautifully pointed out on the list, that is a view that 
is lingo-centric.  If what I am trying to differentiate is the price 
(and the currency of the price) of an item, the country may be much 
more important than the script that the price is written in.  (this 
is also an example for the last point below).  I repeat, I don't 
think truncation -- and hence prefix-matching -- is very stable or 
nearly universally applicable enough to be mentioned.  Whereas I do 
believe compatibility of ordering with 3066 is important.


 I don't believe that simple
 truncation is a necessarily useful operation in all circumstances,
I don't think anyone would dispute that.
 and it probably should not be in the spec. at all.  For example, I'd
 say that we should retain the 3066 ordering of language-country and
 therefore script, if needed, comes later.  However, my typesetting
 subsystem doesn't care a jot about language or country, it just needs
 to find the script code ('can I render this script'?).
Here I disagree. For other purposes, I think it's very clear that the
only time that choice of order matters is with matching algorithms that
use simple truncation, and for the most common implementations, which
use left-prefix truncation, the order lang-script-country will be far
more useful in the long run precisely because script distinctions are
generally far more important in matching than country distinctions. I
don't know of any case in which a tag might be used that contained all
three subtags but in which the country distinction generally matters
more than the script distinction.

--
David Singer
Apple Computer/QuickTime


Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-06 Thread Mark Davis
First, I apologize about the statement "there has been a lot of noise on
this issue". By that, I wasn't really meaning your message in particular. I
was commenting more on the general status of quite a number of statements
that have been made on the overall topic. And by "noise", I really mean
high-level statements without explicit examples or scenarios, where it is
very hard for people not familiar with the details to be able to judge the
correctness of the statements.

And I will assume that it was that perceived insult that caused you to be
dismissive, with your statement below about "Fine, whatever". I assume that
otherwise you would not so readily conclude that it didn't matter whether
RFC 3066 said "if X then Y" vs. "if Y then X". Those are, after all, very
different statements, and a confusion between them would cause incorrect
conclusions to be drawn.

  (c) Every single tag that could be generated under RFC 3066bis is a tag
  that could have been registered under RFC 3066.

 True but irrelevant.

Not at all irrelevant. Suppose someone is using a RFC 3066 parser, and is
faced with either:

(a) a registered tag from a future version of the RFC 3066 registry, or
(b) a 3066bis tag (that uses generative features not in RFC 3066).

Their parser will work *exactly* the same way; they would parse both as
being equally well-formed, and they will be unable to determine any of the
structure of either tag, and just treat each as a blob. So they are no
better off, but *no worse off either*. (Had we not followed (c), this would
not have been true.) Of course, if they try parsing a tag that is generated
according to RFC 3066 (e.g. not in the registry), then they would be able to
parse out the language code and/or country.

If they update to a 3066bis parser, then they can reliably extract much more
information from the tag. And because 3066bis was written to be backwards
compatible, any RFC 3066 generated language tag parses out exactly the
same as it would with an RFC 3066 parser.
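
The behaviour of an RFC 3066-era parser described above can be sketched as
follows. It applies only the second-subtag rule quoted later in this
message (a 2-letter second subtag is an ISO 3166 country code) and leaves
everything else uninterpreted. The function name and the result layout are
hypothetical, and special cases such as "i-" and "x-" tags are ignored.

```python
def parse_rfc3066_style(tag):
    """Apply the RFC 3066 second-subtag rule and nothing else.

    A 2-letter second subtag is read as a country code; any other
    subtags after the first are kept as an uninterpreted remainder.
    """
    subtags = tag.split("-")
    result = {"language": subtags[0].lower(), "country": None, "opaque": []}
    rest = subtags[1:]
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        result["country"] = rest[0].upper()
        rest = rest[1:]
    result["opaque"] = rest
    return result


print(parse_rfc3066_style("en-US"))
# {'language': 'en', 'country': 'US', 'opaque': []}
print(parse_rfc3066_style("sr-Latn"))
# {'language': 'sr', 'country': None, 'opaque': ['Latn']}
print(parse_rfc3066_style("en-Latn-US"))
# {'language': 'en', 'country': None, 'opaque': ['Latn', 'US']}
```

The registered tag sr-Latn and the illustrative generated tag en-Latn-US
are handled identically by such a parser: both are well-formed, and in both
the part after the first subtag stays opaque.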

Now you yourself may not care much about the extra information in the
3066bis language tag. But IBM, and many other companies and organizations
do. This is not some theoretical problem; it is a real current issue that
many are faced with. For example, without reliable script information many
languages are severely underspecified. One simply cannot mix content with
different scripts and have happy customers.

And if you don't care about the extra information, you are no worse off than
if you were trying to parse a registered RFC 3066 tag. For matching
purposes, the commonly used truncation mechanism will work just as well with
all 3066bis tags as it does with RFC 3066 tags, for all tags you will
encounter.
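The thread does not spell out the "commonly used truncation mechanism", so the following Python sketch is one plausible reading of it: drop trailing subtags one at a time until something matches. The names `truncation_candidates` and `lookup` are hypothetical.

```python
def truncation_candidates(tag):
    """Yield the tag, then each shorter prefix made by dropping the last subtag."""
    subtags = tag.lower().split("-")
    while subtags:
        yield "-".join(subtags)
        subtags = subtags[:-1]

def lookup(tag, available):
    """Return the first truncation of `tag` found in `available`, or None."""
    by_lowercase = {a.lower(): a for a in available}
    for candidate in truncation_candidates(tag):
        if candidate in by_lowercase:
            return by_lowercase[candidate]
    return None

# The same loop serves an RFC 3066 tag and a 3066bis-style tag.
print(lookup("en-US", ["en", "fr"]))
print(lookup("sr-Latn-CS", ["sr-Latn", "en"]))
```

The code never inspects what a subtag means, which is why this style of matching is indifferent to whether the tag came from RFC 3066 or from the draft.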

Mark

- Original Message - 
From: [EMAIL PROTECTED]
To: Mark Davis [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; ietf@ietf.org
Sent: Thursday, January 06, 2005 06:44
Subject: Re: draft-phillips-langtags-08, process, specifications, stability, and extensions


   Rather, the rule is simply that a country code, if present,
   always appears as a two letter second subtag. The new draft changes this
   rule, so applications that pay attention to country codes in language tags
   have to change and the new algorithm for finding the country code is
   trickier.

  Your text above says (a) if there is a country code in the tag, it is the
  second subtag. That is not what the text of RFC 3066 actually says, which is:

   The following rules apply to the second subtag:
   All 2-letter subtags are interpreted as ISO 3166 alpha-2 country...

  That is, it says (b) if a second subtag has 2 letters, then it is an ISO
  3166 code, which is not the same as (a). (It is almost, but not quite, the
  converse.)

 Fine, whatever.

  The current RFC certainly does not forbid the use of country
  codes in other positions in language tags. One could absolutely register
  en-Latin-US, for example, meaning English as spoken in the US written in
  Latin script.

 Sure, but my point was, is, and always has been that any 3066-compliant
 implementation won't see this as a country code (unless it is table driven,
 which brings up its own set of issues).

  There has been a lot of noise on this issue, and too few concrete examples.

 No, what there has been is a lot of discussion of a real problem with no
 apparent recognition of it as such by the draft authors. Your pejorative
 characterization of this as noise does not make it so.

  In the so-called 3066bis draft, we have striven very hard to ensure that:

  (c) Every single tag that could be generated under RFC 3066bis is a tag that
  could have been registered under RFC 3066.

 True but irrelevant.

  Thus if someone wrote a parser that is future-compatible -- that could parse
  all RFC 3066 language tags including those registered after the parser was
  deployed -- then that parser can handle all 3066bis language tags. This is a
  significant advance over RFC 3066, whose registered

RE: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-06 Thread Peter Constable
 From: [EMAIL PROTECTED] [mailto:ietf-languages-
 [EMAIL PROTECTED] On Behalf Of John C Klensin


 (3) Finally, there is apparently a procedural oddity with this
 document.  The people who put it together apparently held
 extended discussions on the ietf-languages mailing list, a list
 that was established largely or completely to review
 registrations under 3066 and its predecessors.  My
 understanding at this point is that their good-faith impression
 was that the discussions on that list were essentially
 equivalent to those of a WG.

I believe I can say that it was done this way because it followed the
example of the development of RFC 3066, which to my knowledge (as a
member of the IETF-languages list at that time) happened in the same
way. It was certainly done with a good-faith impression that appropriate
procedures were being followed.



Peter Constable



Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-06 Thread ned . freed
 And I will assume that it was that perceived insult that caused you to be
 dismissive,

I was dismissive because your correction, while accurate, was irrelevant
to the current discussion of the change to country code semantics.

 with your statement below about "Fine, whatever." I assume that
 otherwise you would not so readily conclude that it didn't matter whether
 RFC 3066 said "if X then Y" vs. "if Y then X". Those are, after all, very
 different statements, and a confusion between them would cause incorrect
 conclusions to be drawn.

Of course they are, but I fail to see how any of this impacts the country
code issue I have been discussing.

   (c) Every single tag that could be generated under RFC 3066bis is a tag
   that could have been registered under RFC 3066.
 
  True but irrelevant.

 Not at all irrelevant.

I meant, of course, that it is irrelevant to the issue at hand.

 Suppose someone is using an RFC 3066 parser, and is
 faced with either:

 (a) a registered tag from a future version of the RFC 3066 registry, or
 (b) a 3066bis tag (that uses generative features not in RFC 3066).

 ...

I am well aware of the value of this sort of backwards compatibility. I
am not, I hope, a total fool, which I would have to be to be unaware of this.

 If they update to a 3066bis parser, then they can reliably extract much more
 information from the tag. And because 3066bis was written to be backwards
 compatible, any RFC 3066 generated language tag parses out exactly the
 same as it would with an RFC 3066 parser.

 Now you yourself may not care much about the extra information in the
 3066bis language tag.

I never said that. In fact I have repeatedly said exactly the opposite.

 But IBM, and many other companies and organizations
 do. This is not some theoretical problem; it is a real current issue that
 many are faced with. For example, without reliable script information many
 languages are severely underspecified. One simply cannot mix content with
 different scripts and have happy customers.

I am well aware of this. You are presenting a strawman argument here.

 And if you don't care about the extra information, you are no worse off than
 if you were trying to parse a registered RFC 3066 tag.

It is somewhat axiomatic that if I don't care about something I don't
care when that something changes.

 For matching
 purposes, the commonly used truncation mechanism will work just as well with
 all 3066bis tags as it does with RFC 3066 tags, for all tags you will
 encounter.

Given that the matching approach I have been talking about is not simple
truncation, I'm afraid this is yet another strawman.

Ned



RE: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-06 Thread ned . freed
 In a nutshell, Ned was elaborating on a comment from Dave Singer that,
 once we have parsed a pair of tags and identified all the pieces, it's
 not a trivial matter to decide in every case how the two tags compare,
 and that there are factors that would exist if the draft were approved
 that didn't exist under RFC 3066.

Finally! Thank you! This is exactly what I have been trying to say.

 Again, I think this is a question that deserves discussion. In relation
 to the proposed draft, I don't see it as a particular problem with the
 draft. It is a problem that doesn't exist in RFC 3066, but that is only
 because RFC 3066 left us with bigger problems: it doesn't give us any
 way to identify pieces that we would be encountering in registered tags
 (apart from hard-coded tables compiled from versions of the registry
 that pre-exist a given implementation).

With, as you point out below, one important exception: It did have a way to
reliably identify a country code in most cases (but not all). And this ability
to say "a 2 character subtag in the second position must be a country code" was
quite useful even though it might miss other occurrences of country codes in
some cases.

3066bis provides a reliable way to locate country codes in all cases, but the
algorithm is different. And this is a non-backwards-compatible change.

Of course there's the option Dave Singer has raised: Reverse the positions of
script and country codes in 3066bis. I see two problems with this:

(1) Script codes are in general more important than country codes, and
therefore really should come first so that simple truncation matches
work better. (There are probably exceptions to this assertion lurking
out there somewhere, but I believe it is mostly true.)

(2) I believe it increases the number of grandfathered codes that won't conform
to the new format.

Now, it may be that, after full consideration of all the issues, especially
given that the 3066 algorithm could not locate country codes in all cases, the
right way forward is to make this non-backwards-compatible change, fully
document the change and its consequences (although I will again point out that
assessing the true impact on the installed base is a practical impossibility),
and move on. But as you say, it does deserve discussion.

 RFC 3066 permits tags that have all kinds of internal structures. That
 is a problem as it will never allow us to derive much useful information
 from a tag with any confidence -- only the ISO 639 language category and
 in some cases a country category. I predict that in the future we will
 be seeing a significant number of tags (whether sanctioned without
 registration by a successor to RFC 3066 or as tags registered under RFC
 3066) that go beyond the patterns ll(-CC) and lll(-CC). If we stick
 with RFC 3066, we will have no way of writing forward-compatible
 processors that will be able to do very useful matching.

A very good point.

 What this draft does is impose some order to all the other patterns
 within tags that are permitted, and tell us what the different pieces
 must be. As a result, we have more named pieces to deal with, and we are
 presented with the question that Ned raised: Now we have more named
 pieces than we did before; what do we do with them? That is a problem
 that will need to be addressed. But I don't think it's a reason to
 oppose the draft, since opposing the draft (or at least opposing any
 revision that introduces a richer internal structure) leaves us in a
 situation that must be characterized either as a worse problem or as
 turning our backs on increased functionality to meet real user needs.

What would be really nice is to specify a parameterized matching algorithm (or
more precisely, an algorithm family) along the lines of the stringprep family
of string normalization algorithms. But I'm unsure if there's sufficient time
and interest available to do this. But it is nice to dream...
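No such algorithm family is specified anywhere in this thread, so the Python sketch below is purely illustrative of what "parameterized" could mean here: a matcher factory whose parameter is the list of named pieces that must agree. The field split in `split_fields`, the `make_matcher` factory, and the `wildcard_missing` flag are all hypothetical choices for this example.

```python
import re

def split_fields(tag):
    """Hypothetical split: language, optional 4-letter script, optional 2-letter country."""
    subtags = tag.split("-")
    fields = {"language": subtags[0].lower(), "script": None, "country": None}
    rest = subtags[1:]
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        fields["script"] = rest.pop(0).title()
    if rest and re.fullmatch(r"[A-Za-z]{2}", rest[0]):
        fields["country"] = rest.pop(0).upper()
    return fields

def make_matcher(required, wildcard_missing=True):
    """Build a matcher that compares only the fields named in `required`."""
    def matches(range_tag, content_tag):
        wanted, actual = split_fields(range_tag), split_fields(content_tag)
        for name in required:
            if wanted[name] is None and wildcard_missing:
                continue  # a field absent from the range matches anything
            if wanted[name] != actual[name]:
                return False
        return True
    return matches

# Two "profiles" from the same family, differing only in which pieces matter.
script_sensitive = make_matcher(("language", "script"))
country_sensitive = make_matcher(("language", "country"))

print(script_sensitive("sr-Latn", "sr-Cyrl-CS"))
print(country_sensitive("sr-CS", "sr-Latn-CS"))
```

An application that cares mostly about script would pick one profile and one that cares about country another, without either being forced into plain truncation.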

Ned



Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-06 Thread JFC (Jefsey) Morfin
Dear John,
Thank you for acknowledging that the proposed draft _imposes_ something! It
therefore does not report on an existing practice.
Thank you for acknowledging that the proposed draft even _limits_ the
current practice!
Thank you for explaining that the decision of the user is replaced by an
a-priori obligation, resulting from a decision of a member of this list.

Technically, however, these remarks have no bearing on John Klensin's
remark: a limitation is only a (negative) extension. I support _every_
position of John Klensin today (except that the IANA registrations should
be transferred to IANA now, which would set up a global procedure equal for
all, to address the possible discrepancies between RFC 3066 and the
draft). This would permit an acceptance by the IESG. Otherwise such an
acceptance is inadvisable.
jfc

At 21:28 06/01/2005, John Cowan wrote:
John C Klensin scripsit:
Content-language: 3066-tag
X-Extended-Content-language: new-tag
This reflects a fundamental misunderstanding of what the draft does
compared to what RFC 3066 does.  It imposes *more* restraints on language
tags, not fewer.  The RFC 3066 language tag registration process can
register tags with almost unpredictable meaning once one gets past the
first subtag.  The draft *limits* the possible tags to a small subset,
and tightens up the allowable semantics.  It allows no tag to be used
that was not already registerable under RFC 3066.
In RFC 3066, it is only a heuristic (or examination of the IANA registry,
which is not machine-parseable) that tells the meaning of the second
subtag in the existing registered tag sr-Latn.  In the draft, its meaning
is unambiguously specified a priori.
--
John Cowan  [EMAIL PROTECTED]  http://www.ccil.org/~cowan
Raffiniert ist der Herrgott, aber boshaft ist er nicht.
--Albert Einstein
___
Ietf-languages mailing list
[EMAIL PROTECTED]
http://www.alvestrand.no/mailman/listinfo/ietf-languages



RE: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-06 Thread Peter Constable
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]


  RFC 3066 left us with bigger problems: it doesn't give us any
  way to identify pieces that we would be encountering in registered tags
  (apart from hard-coded tables compiled from versions of the registry
  that pre-exist a given implementation).

 With, as you point out below, one important exception: It did have a way
 to reliably identify a country code in most cases (but not all).

If "in most cases" means "from among tags in use today under the terms of
RFC 3066" (as John Cowan would say, "what is true"), then yes. But if "in
most cases" means "from among tags permitted by RFC 3066" (as John Cowan
might say, "what is the rule") -- including some that users have been
wanting to use but have delayed using pending a revision of RFC 3066 --
then no: RFC 3066 allowed for reliable identification of a country code
in only a small portion of all possible cases: only if it occurred as
the second subtag following an ISO 639 code (it does not prohibit a
country code from occurring anywhere after the first subtag).



 And this ability to say "a 2 character subtag in the second position must
 be a country code" was quite useful even though it might miss other
 occurrences of country codes in some cases.

The draft would still grant the ability to make that statement, and
would permit new implementations never to miss *any* occurrences of
country codes.


 
 3066bis provides a reliable way to locate country codes in all cases, but
 the algorithm is different. And this is a non-backwards-compatible change.

Surely this has been the point of greatest contention in this
discussion, and is clearly not obvious, for there are several who have
repeatedly indicated that they do not see any such backwards
non-compatibility. Please, anyone claiming there would be
incompatibility, be pedantic: define whatever terms, make explicit
whatever assumptions are required to support this claim. (I suspect the
root of this disagreement lies in unstated assumptions.) 

Those who claim backward compatibility do so on the basis that every
existing implementation conformant to RFC 3066 will continue to operate
precisely as designed and in conformance with RFC 3066 regardless of
whether they encounter a tag presently well-formed and valid under the
terms of RFC 3066 or one that would be sanctioned by this draft. If
there is any term needing clarification in that statement or any
suspected assumption not made plain, please ask for clarification.
 

 
 Of course there's the option Dave Singer has raised: Reverse the positions
 of script and country codes in 3066bis. I see two problems with this:

 (1) Script codes are in general more important than country codes, and
 therefore really should come first so that simple truncation matches
 work better. (There are probably exceptions to this assertion lurking
 out there somewhere, but I believe it is mostly true.)

Thank you for voicing support for this position.

 
 (2) I believe it increases the number of grandfathered codes that won't
 conform to the new format.

If I'm not mistaken, I think there would be no difference in this
regard.

 

Peter Constable



Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-06 Thread John C Klensin


--On Thursday, 06 January, 2005 15:28 -0500 John Cowan
[EMAIL PROTECTED] wrote:

 John C Klensin scripsit:
 
Content-language: 3066-tag
X-Extended-Content-language: new-tag
 
 This reflects a fundamental misunderstanding of what the draft
 does compared to what RFC 3066 does.  It imposes *more*
 restraints on language tags, not fewer.

It also very explicitly permits talking about scripts, not just
languages and countries.  That, to me, is an extension,
regardless of the additional constraints.  But I could have used
a different word; this was just an example.

  The RFC 3066 language
 tag registration process can register tags with almost
 unpredictable meaning once one gets past the first subtag.
 The draft *limits* the possible tags to a small subset, and
 tightens up the allowable semantics.  It allows no tag to be
 used that was not already registerable under RFC 3066.

The extension that I see involves more semantics and formal
variations, not more possible registered tags.  And, as Ned has
pointed out repeatedly, there are things that can be done in
3066 parsers/interpreters in practice that have to be done
differently in this new system.  I could, of course, have used
X-Incompatible-Content-Language in my example, but that
presumably would have set you off in some other direction.

 In RFC 3066, it is only a heuristic (or examination of the
 IANA registry, which is not machine-parseable) that tells the
 meaning of the second subtag in the existing registered tag
 sr-Latn.  In the draft, its meaning is unambiguously specified
 a priori.

So?

john






RE: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-06 Thread John C Klensin


--On Thursday, 06 January, 2005 16:30 -0800 Peter Constable
[EMAIL PROTECTED] wrote:

 From: [EMAIL PROTECTED]
 [mailto:ietf-languages- [EMAIL PROTECTED] On Behalf Of
 John C Klensin
 
 
 (3) Finally, there is apparently a procedural oddity with this
 document.  The people who put it together apparently held
 extended discussions on the ietf-languages mailing list, a
 list that was established largely or completely to review
 registrations under 3066 and its predecessors.  My
 understanding at this point is that their good-faith
 impression was that the discussions on that list were
 essentially equivalent to those of a WG.
 
 I believe I can say that it was done this way because it
 followed the example of the development of RFC 3066, which to
 my knowledge (as a member of the IETF-languages list at that
 time) happened in the same way. It was certainly done with a
 good-faith impression that appropriate procedures were being
 followed.

Peter, just to clarify... In my opinion (which isn't necessarily
worth much), the procedures that were followed were perfectly
reasonable.   Anyone can form a design team and put a document
together, and there are no rules that bar such a design team
from using and building on a mailing list set up for something
else.  That may or may not be wise, but it is certainly
permitted.  The only place this runs into a problem is if
someone presumes that a document developed in the way this one
was developed is equivalent to a WG product, or that it is
entitled to the presumptions of relevancy and correctness that
go with a WG product.  From that point of view, it is nothing
more or less than an individual submission (or the output of a
self-defined design team) and the comments Dave and I have been
making apply.

   john




RE: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-06 Thread Peter Constable
 From: [EMAIL PROTECTED] [mailto:ietf-languages-
 [EMAIL PROTECTED] On Behalf Of John C Klensin


  This reflects a fundamental misunderstanding of what the draft
  does compared to what RFC 3066 does.  It imposes *more*
  restraints on language tags, not fewer.
 
 It also very explicitly permits talking about scripts, not just
 languages and countries.  That, to me, is an extension,
 regardless of the additional constraints.

There may be a disagreement here due to a difference of perspective: one
could say that the grammar is more extensive, but that makes the formal
language less extensive. So, I suppose whether one considers such a
revision an extension depends on one's perspective. 

Note that while the draft permits talking about scripts, RFC 3066
permits talking about *anything*. More extensive grammar, less
extensive language (and vice versa). 


 And, as Ned has
 pointed out repeatedly, there are things that can be done in
 3066 parsers/interpreters in practice that have to be done
 differently in this new system.

I think this claim can only be made on the basis of assumptions not
found in RFC 3066. Ned has most recently said, 

3066bis provides a reliable way to locate country codes in all cases,
but the algorithm is different. And this is a non-backwards-compatible
change.

The fact that it can identify country codes in all cases but requires a
different algorithm does not imply a non-backwards-compatible change
since it is a new functionality -- it is doing something that wasn't
even possible in RFC 3066. 

Backwards compatibility cannot be measured in terms of whether new
processors require different algorithms to achieve new functionality. It
can only be measured in terms of whether new processors can perform
correct operations (correct according to the specification for those
processors -- the proposed draft) on existing tags, and whether existing
processors can perform correct operations (correct according to the
specification of those processors -- RFC 3066) on new tags. This draft
permits this.
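One half of that test, whether an existing processor can still handle new tags, can be sketched in Python using the RFC 3066 syntax (a primary subtag of 1 to 8 letters, followed by any number of subtags of 1 to 8 letters or digits). The names `RFC3066_WELL_FORMED` and `is_well_formed_3066` are hypothetical, and the check covers well-formedness only, not registry validity.

```python
import re

# RFC 3066 syntax: 1-8 letters, then any number of "-" plus 1-8 letters or digits.
RFC3066_WELL_FORMED = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def is_well_formed_3066(tag):
    """True if `tag` is syntactically acceptable to an RFC 3066 processor."""
    return RFC3066_WELL_FORMED.fullmatch(tag) is not None

# An RFC 3066 tag and a draft-style tag with a script subtag both pass.
print(is_well_formed_3066("en-US"))
print(is_well_formed_3066("sr-Latn-CS"))
# Malformed input is rejected either way.
print(is_well_formed_3066("en--US"))
```

This only shows that the older syntax check does not reject the newer tags; it says nothing about what an older processor can extract from them, which is the point the other participants are disputing.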



Peter Constable



RE: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-06 Thread Peter Constable
John:

 Peter, just to clarify... In my opinion (which isn't necessarily
 worth much)

(I sincerely doubt that's the case.)


, the procedures that were followed were perfectly
 reasonable.   Anyone can form a design team and put a document
 together, and there are no rules that bar such a design team
 from using and building on a mailing list set up for something
 else.  That may or may not be wise, but it is certainly
 permitted.  The only place this runs into a problem is if
 someone presumes that a document developed in the way this one
 was developed is equivalent to a WG product, or that it is
 entitled to the presumptions of relevancy and correctness that
 go with a WG product.

I can't speak for the authors. I was not familiar with those
distinctions when the process began, and I suspect that is true of
others on the IETF-languages list who contributed. In my mind we were
following a precedent that implied not only a permitted procedure but an
entirely appropriate one. I think all of us now understand, at least in
part, that some distinctions exist that may have practical implications
on how something is received by the IETF community and processed by the
IESG.


 From that point of view, it is nothing
 more or less than an individual submission (or the output of a
 self-defined design team) and the comments Dave and I have been
 making apply.

I don't think I have questioned the applicability of your comments in
this regard at any point.



Peter Constable



Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-05 Thread John Cowan
[EMAIL PROTECTED] scripsit:

  Finding country codes is straightforward: any non-initial subtag of
  two letters (not appearing to the right of x- or -x-) is a country
  code.  This is true in RFC 1766, RFC 3066, and the current draft.
 
 On the contrary, in RFC 3066 the rule is any 2 letter value that
 appears as the second subtag is a country code. The rule in the new
 draft is either the formulation you give above or  any 2 letter value
 that appears as a subtag after the initial subtag and some number of
 3 and 4 letter subtags.

I didn't state it as a rule, but as true.  Every non-initial 2-letter
tag in RFC 3066 is a country code; the same is true in the draft.
(A private correspondent notes that the reference to -x- should
in fact be a reference to any singleton, though -x- and i- are
the only singletons currently usable.)
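The observation above, with the singleton correction applied, could be sketched in Python roughly as follows. The function name `two_letter_candidates` is hypothetical, and the code encodes the claim as stated in this message rather than any normative rule.

```python
import re

def two_letter_candidates(tag):
    """Return non-initial 2-letter subtags that appear before any singleton."""
    subtags = tag.split("-")
    if len(subtags[0]) == 1:
        # Tags beginning with x- or i-: everything is to the right of a singleton.
        return []
    found = []
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            break  # a singleton such as -x- ends the search
        if re.fullmatch(r"[A-Za-z]{2}", subtag):
            found.append(subtag.upper())
    return found

print(two_letter_candidates("en-US"))
print(two_letter_candidates("en-Latn-US"))
print(two_letter_candidates("en-x-US"))
```

Whether every subtag this returns really is a country code depends on what has been registered, which is the distinction between "true" and "a rule" drawn above.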

 Just because something doesn't necessarily do something doesn't mean it
 never does it.

It does mean it can't be counted on in the general case.

 Well, maybe I'm missing something obvious, but I see nothing in RFC
 3066 that qualifies as a description of a matching algorithm.

Section 2.5 (2.4.1 in the draft) states the matching rule in a succinct
fashion.  Everything in 2.4.2 is a non-normative elaboration of this.

-- 
John Cowan  www.reutershealth.com  www.ccil.org/~cowan  [EMAIL PROTECTED]
'Tis the Linux rebellion / Let coders take their place,
The Linux-nationale / Shall Microsoft outpace,
We can write better programs / Our CPUs won't stall,
So raise the penguin banner of / The Linux-nationale.  --Greg Baker



Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-05 Thread ned . freed
   Finding country codes is straightforward: any non-initial subtag of
   two letters (not appearing to the right of x- or -x-) is a country
   code.  This is true in RFC 1766, RFC 3066, and the current draft.

  On the contrary, in RFC 3066 the rule is any 2 letter value that
  appears as the second subtag is a country code. The rule in the new
  draft is either the formulation you give above or  any 2 letter value
  that appears as a subtag after the initial subtag and some number of
  3 and 4 letter subtags.

 I didn't state it as a rule, but as true.  Every non-initial 2-letter
 tag in RFC 3066 is a country code; the same is true in the draft.

Again, that is not what RFC 3066 says. From section 2.2:

 There are no rules apart from the syntactic ones for the third and subsequent
 subtags.

Sure sounds to me like a third two letter subtag is (a) Allowed and (b)
Isn't supposed to be treated as a country code.

Now, it may be the case that all _registered_ tags have avoided the use of
non-country code two letter codes in the third and later position. But this is
100% irrelevant. The point is that conformant code implementing RFC 3066 is
broken if it simply assumes any 2 letter code after the first subtag is a
country code. Rather, the rule is simply that a country code, if present,
always appears as a two letter second subtag. The new draft changes this rule,
so applications that pay attention to country codes in language tags have to
change and the new algorithm for finding the country code is trickier.
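To make the difference between the two rules concrete, here is a Python sketch of both as they are worded in this message. The function names are hypothetical, and the second function follows the informal wording quoted above ("the initial subtag and some number of 3 and 4 letter subtags"), not the draft's actual grammar.

```python
import re

TWO_LETTERS = re.compile(r"[A-Za-z]{2}")
THREE_OR_FOUR_LETTERS = re.compile(r"[A-Za-z]{3,4}")

def country_second_subtag(tag):
    """RFC 3066 reading: only a 2-letter second subtag is a country code."""
    subtags = tag.split("-")
    if len(subtags) > 1 and TWO_LETTERS.fullmatch(subtags[1]):
        return subtags[1].upper()
    return None

def country_after_longer_subtags(tag):
    """Draft reading: skip 3- and 4-letter subtags, then accept a 2-letter one."""
    for subtag in tag.split("-")[1:]:
        if TWO_LETTERS.fullmatch(subtag):
            return subtag.upper()
        if not THREE_OR_FOUR_LETTERS.fullmatch(subtag):
            break  # anything else (singleton, longer subtag) ends the search
    return None

for tag in ("en-US", "sr-Latn-CS"):
    print(tag, country_second_subtag(tag), country_after_longer_subtags(tag))
```

Both functions agree on `en-US`; they differ on `sr-Latn-CS`, where only the second one reports a country. That disagreement is the change an application written against the first rule would have to absorb.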

 (A private correspondent notes that the reference to -x- should
 in fact be a reference to any singleton, though -x- and i- are
 the only singletons currently usable.)

I have to say I find it quite interesting that one of the main proponents of
the new draft, while arguing that the new draft doesn't make the matching
problem a lot harder, ended up giving an erroneous rule for extracting country
codes from a language tag. 

  Just because something doesn't necessarily do something doesn't mean it
  never does it.

 It does mean it can't be counted on in the general case.

Sure, in the general case most if not all of these nasty corner cases you've
created can be blithely assumed away because they only appear in specific
problem domains. Actual applications that operate in those specific domains
aren't so lucky, however. And the metric we're supposed to apply in the IETF is
real world implementability.

As it happens I deal with messaging applications, and in this space text/plain
with all sorts of nasty charset issues is the rule, not the exception.

  Well, maybe I'm missing something obvious, but I see nothing in RFC
  3066 that qualifies as a description of a matching algorithm.

 Section 2.5 (2.4.1 in the draft) states the matching rule in a succinct
 fashion.  Everything in 2.4.2 is a non-normative elaboration of this.

??? Which in no way refutes my assertion that no matching rule algorithm
was given in RFC 3066!

Ned



Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-05 Thread John Cowan
[EMAIL PROTECTED] scripsit:

 Now, it may be the case that all _registered_ tags have avoided the use of
 non-country code two letter codes in the third and later position. But this is
 100% irrelevant.

If you say so.

 The point is that conformant code implementing RFC 3066 is
 broken if it simply assumes any 2 letter code after the first subtag is a
 country code. Rather, the rule is simply that a country code, if present,
 always appears as a two letter second subtag.

Not quite.  The rule is that a 2-letter second subtag is a country code.
Country codes could have appeared elsewhere, and may still wind up doing so
before RFC 3066 is obsoleted.

 The new draft changes this rule,
 so applications that pay attention to country codes in language tags have to
 change and the new algorithm for finding the country code is trickier.

But not much.  As an advantage, country codes can always be found in the new
draft, whereas in RFC 3066 they could in principle be anywhere.

  (A private correspondent notes that the reference to -x- should
  in fact be a reference to any singleton, though -x- and i- are
  the only singletons currently usable.)
 
 I have to say I find it quite interesting that one of the main proponents of
 the new draft, while arguing that the new draft doesn't make the matching
 problem a lot harder, ended up giving an erroneous rule for extracting country
 codes from a language tag. 

Like other people, I sometimes post when tired; I don't think this particularly
interesting.

 Sure, in the general case most if not all of these nasty corner cases you've
 created can be blithely assumed away because they only appear in specific
 problem domains. Actual applications that operate in those specific domains
 aren't so lucky, however. And the metric we're supposed to apply in the IETF 
 is
 real world implementability.

I fail to see what this has to do with the merit of marking scripts in language
tags.  The preferred IETF charset, UTF-8, contains no information about script
whatever.

 As it happens I deal with messaging applications, and in this space text/plain
 with all sorts of nasty charset issues is the rule, not the exception.

Extended language tags will neither help nor harm you, then.

-- 
We are lost, lost.  No name, no business, no Precious, nothing.  Only empty.
Only hungry: yes, we are hungry.  A few little fishes, nassty bony little
fishes, for a poor creature, and they say death.  So wise they are; so just,
so very just.  --Gollum        [EMAIL PROTECTED]  www.ccil.org/~cowan



Re: draft-phillips-langtags-08, process, sp ecifications, stability, and extensions

2005-01-05 Thread Mark Davis
 Rather, the rule is simply that a country code, if present,
 always appears as a two letter second subtag. The new draft changes this
 rule, so applications that pay attention to country codes in language tags
 have to change and the new algorithm for finding the country code is
 trickier.

Your text above says (a) if there is a country code in the tag, it is the
second subtag. That is not what the text of RFC 3066 actually says, which is:

 The following rules apply to the second subtag:
 All 2-letter subtags are interpreted as ISO 3166 alpha-2 country...

That is, it says (b) if a second subtag has 2 letters, then it is an ISO
3166 code, which is not the same as (a). (It is almost, but not quite, the
converse.) The current RFC certainly does not forbid the use of country
codes in other positions in language tags. One could absolutely register
en-Latin-US, for example, meaning English as spoken in the US written in
Latin script.

There has been a lot of noise on this issue, and too few concrete examples.
In the so-called 3066bis draft, we have striven very hard to ensure that:

(c) Every single tag that could be generated under RFC 3066bis is a tag that
could have been registered under RFC 3066.

Thus if someone wrote a parser that is future-compatible -- that could parse
all RFC 3066 language tags including those registered after the parser was
deployed -- then that parser can handle all 3066bis language tags. This is a
significant advance over RFC 3066, whose registered (not generated) language
tags are atomic, and cannot be effectively parsed at all. 3066bis adds more
structure so as to allow effective parsing of tags.

If you *can* come up with tags that would show that (c) is invalid, that
would be a concrete case that we would have to make adjustments in the draft
for.

A second issue that has come up is complexity. Admittedly, 3066bis is more
complex than RFC 3066. Part of that is due to adding additional structure,
and part due to necessary clarifications (such as the distinction between
well-formed and valid). But we did not add the additional structure on a
whim. RFC 3066, while a significant advance, is simply not now powerful
enough to meet the current needs for distinctions in language needed by the
industry. The companies and organizations in the Unicode consortium, for
example, are supporting 3066bis for improved software internationalization.
For more information on the reasons behind the enhancements in 3066bis see
http://www.inter-locale.com/ID/why-rfc3066bis.html.

Moreover, all the talk about this being *too* complex is far overblown. All
3066bis language tags can be parsed, including all the grandfathered codes,
with a very short piece of code, or even with a regular expression (such as
in Perl). This is not rocket science.
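
As a rough illustration of the regular-expression claim, here is a minimal
Python sketch. It assumes a simplified tag shape (language, optional 4-letter
script, optional 2-letter or 3-digit region, then variants of 5 to 8
characters). It is not the draft's ABNF, and the names TAG_RE and parse_tag
are made up for this example; extensions, private-use subtags and
grandfathered tags are deliberately left out.

```python
import re

# Assumed, simplified grammar: language[-script][-region][-variant...].
# This is an illustration only, not the ABNF from the draft.
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-[A-Za-z0-9]{5,8})*)$"
)


def parse_tag(tag):
    """Split a tag into named parts, or return None if it does not fit."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    parts = match.groupdict()
    # Turn "-boont-scouse" into ["boont", "scouse"]; no variants gives [].
    parts["variants"] = [v for v in parts["variants"].split("-") if v]
    return parts
```

With this sketch, parse_tag("sr-Latn-CS") yields language "sr", script "Latn"
and region "CS", while a tag that does not fit the assumed shape yields None.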

Mark

- Original Message - 
From: [EMAIL PROTECTED]
To: John Cowan [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; ietf@ietf.org
Sent: Wednesday, January 05, 2005 07:33
Subject: Re: draft-phillips-langtags-08, process, specifications, stability, and extensions


Finding country codes is straightforward: any non-initial subtag of
two letters (not appearing to the right of x- or -x-) is a country
code.  This is true in RFC 1766, RFC 3066, and the current draft.

   On the contrary, in RFC 3066 the rule is any 2 letter value that
   appears as the second subtag is a country code. The rule in the new
   draft is either the formulation you give above or  any 2 letter value
   that appears as a subtag after the initial subtag and some number of
   3 and 4 letter subtags.

  I didn't state it as a rule, but as true.  Every non-initial 2-letter
  tag in RFC 3066 is a country code; the same is true in the draft.

 Again, that is not what RFC 3066 says. From section 2.2:

  There are no rules apart from the syntactic ones for the third and
  subsequent subtags.

 Sure sounds to me like a third two letter subtag is (a) Allowed and (b)
 Isn't supposed to be treated as country code.

 Now, it may be the case that all _registered_ tags have avoided the use of
 non-country code two letter codes in the third and later position. But
 this is 100% irrelevant. The point is that conformant code implementing
 RFC 3066 is broken if it simply assumes any 2 letter code after the first
 subtag is a country code. Rather, the rule is simply that a country code,
 if present, always appears as a two letter second subtag. The new draft
 changes this rule, so applications that pay attention to country codes in
 language tags have to change and the new algorithm for finding the country
 code is trickier.

  (A private correspondent notes that the reference to -x- should
  in fact be a reference to any singleton, though -x- and i- are
  the only singletons currently usable.)

 I have to say I find it quite interesting that one of the main proponents
 of the new draft, while arguing that the new draft doesn't make the matching
 problem a lot harder, ended up giving

RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-04 Thread ned . freed
 This whole question of what 'matches' is subtle.  Consider the case
 when I have a document that has variant content by language (e.g.
 different sound tracks), and the user indicates a set of preferred
 languages.  If the content has de-CH and fr-CH (Swiss German and
 French), and a default en (English) and the user says he speaks
 de-DE and fr-FR, on the face of it nothing matches, and I fall
 back to the catch-all default, which is almost certainly not the best
 result.

David, this isn't the half of it. The case you describe is actually one of
the easy ones, in that it can be handled by doing a preferred match on the
entire tag, with a generic match on the primary tag only having lesser
precedence but higher precedence than a fallback to a default.
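
The three tiers just described (whole-tag match, then primary-subtag match,
then default) can be sketched in Python roughly as follows. The function name
pick_variant and its arguments are hypothetical, and the sketch ignores the
special cases listed next (sign languages, country subtags and so on).

```python
def pick_variant(available, preferred, default):
    """Choose one of the available tags for a user's preferred tags."""
    by_lower = {tag.lower(): tag for tag in available}

    # Tier 1: exact match on the entire tag, in the user's order.
    for want in preferred:
        if want.lower() in by_lower:
            return by_lower[want.lower()]

    # Tier 2: generic match on the primary (first) subtag only.
    for want in preferred:
        primary = want.split("-")[0].lower()
        for tag in available:
            if tag.split("-")[0].lower() == primary:
                return tag

    # Tier 3: nothing matched, fall back to the catch-all default.
    return default
```

For the example above, pick_variant(["de-CH", "fr-CH", "en"],
["de-DE", "fr-FR"], "en") selects "de-CH" instead of falling through to "en".
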

I know of two other wrinkles in the RFC 1766 world:

(1) Matching may want to take into account the distinguished nature
   of country subtags in some way.
(2) SGN- requires special handling, in that SGN-FR and SGN-EN are in fact
   sufficiently different languages that a primary tag match should not be
   taken to be a generic match. (Of course this only matters if sign
   languages are relevant to your situation - in many cases they aren't.
   In retrospect I think it was a mistake to register sign languages this
   way.)

This proposed revision, however, opens Pandora's box in regards to matching.
Consider:

(a) Extension tags appear as the first subtags, and as such have to
   be taken into account when looking for country subtags.
(b) Script tags change the complexion of the matching problem significantly,
   in that they can interact with external factors like charset information
   in odd ways.
(c) UN country numbers have been added (IMO for no good reason), requiring
   handling similar to country codes.

The bottom line is that while I know how to write reasonable code to do RFC
1766 matching (and have in fact done so for widely deployed software), I
haven't a clue how to handle this new draft competently in regards to matching.
And the immediate consequence of this is that I, and I suspect many other
implementors, are going to adopt a wait and see attitude in regards to
implementing any of this.

Ned


RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-04 Thread ned . freed
Small typo: In my previous response I referred to RFC 1766 when I meant
RFC 3066. Too many documents open at once, sorry.
Ned


RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-04 Thread Dave Singer
At 9:14 AM -0800 1/4/05, [EMAIL PROTECTED] wrote:
  This whole question of what 'matches' is subtle.  Consider the case
  when I have a document that has variant content by language (e.g.
  different sound tracks), and the user indicates a set of preferred
  languages.  If the content has de-CH and fr-CH (Swiss German and
  French), and a default en (English) and the user says he speaks
  de-DE and fr-FR, on the face of it nothing matches, and I fall
  back to the catch-all default, which is almost certainly not the best
  result.

 David, this isn't the half of it. The case you describe is actually one of
 the easy ones, in that it can be handled by doing a preferred match on the
 entire tag, with a generic match on the primary tag only having lesser
 precedence but higher precedence than a fallback to a default.

Yes, I picked off an easy example for which the 'matching' section of
the draft didn't seem adequate.  This really is a tar-pit, of course.
Serbo-Croatian used to be a language;  now it's Serbian and Croatian.
I assume that they are mutually intelligible.  Serbian is probably a
better substitute for Croatian than some general default (or
silence), though saying this in some parts of the world might start
wars.

The whole question of what is a language, a variant or dialect of a 
language, or a suitable substitute for a language, would benefit some 
thought in any tagging scheme, though I agree the problem is not 
generally soluble.

--
David Singer
Apple Computer/QuickTime


Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-04 Thread John Cowan
[EMAIL PROTECTED] scripsit:

 I know of two other wrinkles in the RFC 1766 world:

Are you aware that RFC 1766 has been obsolete for four years now?

 (2) SGN- requires special handling, in that SGN-FR and SGN-EN are in fact
 sufficiently different languages that a primary tag match should not be
 taken to be a generic match.

The same is true of the various registered zh-* tags.

 (a) Extension tags appear as the first subtags, and as such have to
 be taken into account when looking for country subtags.

Finding country codes is straightforward: any non-initial subtag of two letters
(not appearing to the right of x- or -x-) is a country code.
This is true in RFC 1766, RFC 3066, and the current draft.
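
The rule as worded here can be written down in a few lines of Python. This
is only a sketch of the stated rule, under the assumption that a subtag "x"
(initial or not) starts private-use material; the function name
country_codes_cowan is invented for the example.

```python
def country_codes_cowan(tag):
    """Return every two-letter, non-initial subtag that precedes any 'x'."""
    found = []
    for index, subtag in enumerate(tag.split("-")):
        if subtag.lower() == "x":
            # Assumption: everything to the right of x is private use.
            break
        if index > 0 and len(subtag) == 2 and subtag.isalpha():
            found.append(subtag.upper())
    return found
```

Under this reading, "de-CH" and "sr-Latn-CS" each yield one country code,
and "en-x-US" yields none.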

 (b) Script tags change the complexion of the matching problem significantly,
 in that they can interact with external factors like charset information
 in odd ways.

Can you clarify this?  Charset information neither specifies nor necessarily
restricts (except in text/plain) the script used to write a document.

 (c) UN country numbers have been added (IMO for no good reason), requiring
 handling similar to country codes.

They provide for supranational language varieties and for stability in
country codes which is inappropriate for ISO 3166 alphabetic codes (which
are codes for country *names*).

 The bottom line is that while I know how to write reasonable code to do RFC
 1766 matching (and have in fact done so for widely deployed software), I
 haven't a clue how to handle this new draft competently in regards to 
 matching.

The draft describes only the RFC 1766 (3066) algorithm, without excluding
other algorithms to be defined later.
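
For readers without the RFC at hand: the RFC 3066 rule is understood here to
be the language-range prefix match of its section 2.5, where a range matches
a tag if it equals the tag, or equals a prefix of the tag that is immediately
followed by "-", and the range "*" matches any tag. A minimal Python sketch
of that reading, with a hypothetical function name:

```python
def range_matches(language_range, tag):
    """Prefix matching of a language-range against a language tag."""
    if language_range == "*":
        return True
    wanted = language_range.lower()
    actual = tag.lower()
    # Exact match, or a prefix that ends exactly at a "-" boundary.
    return actual == wanted or actual.startswith(wanted + "-")
```

So the range "de" matches the tag "de-CH", but the range "de-DE" does not,
which is the situation the earlier de-CH / de-DE example complains about.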

-- 
Clear?  Huh!  Why a four-year-old child         John Cowan
could understand this report.  Run out          [EMAIL PROTECTED]
and find me a four-year-old child.  I           http://www.ccil.org/~cowan
can't make head or tail out of it.              http://www.reutershealth.com
        --Rufus T. Firefly on government reports



Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-04 Thread John Cowan
Dave Singer scripsit:

 Yes, I picked off an easy example for which the 'matching' section of 
 the draft didn't seem adequate.  This really is a tar-pit, of course. 

Indeed it is, which is why the draft provides only one simple algorithm
(described as the most common implementation, which it is) and explicitly
allows for cleverer techniques for those who want them.

 I assume that they are mutually intelligible.  

Among speakers of good will, yes.

 The whole question of what is a language, a variant or dialect of a 
 language, or a suitable substitute for a language, would benefit some 
 thought in any tagging scheme, though I agree the problem is not 
 generally soluble.

See the editor's draft of ISO 639-3 at http://tinyurl.com/6kky2 .  This is
a PDF file about 4 MB in size, so I excerpt the relevant text here (clause
4.2.1, pp. 3-4):

# There is no one definition of language that is agreed upon by all and
# appropriate for all purposes. As a result, there can be disagreement,
# even among speakers or linguistic experts, as to whether two varieties
# represent dialects of a single language or two distinct languages. For
# this part of ISO 639, judgments regarding when two varieties are
# considered to be the same or different languages are based on a number
# of factors, including linguistic similarity, intelligibility, a common
# literature, the views of speakers concerning the relationship between
# language and identity, and other factors. The following basic criteria
# are followed:
#
#   Two related varieties are normally considered varieties of the same
#   language if speakers of each variety have inherent understanding
#   of the other variety (that is, can understand based on knowledge of
#   their own variety without needing to learn the other variety) at a
#   functional level.
#
#   Where spoken intelligibility between varieties is marginal, the
#   existence of a common literature or of a common ethnolinguistic
#   identity with a central variety that both understand can be strong
#   indicators that they should nevertheless be considered varieties of
#   the same language.
# 
#   Where there is enough intelligibility between varieties to
#   enable communication, the existence of well-established distinct
#   ethnolinguistic identities can be a strong indicator that they should
#   nevertheless be considered to be different languages.
#
# Some of the distinctions made on this basis may not be considered
# appropriate by some users or for certain applications. These basic
# criteria are thought to best fit the intended range of applications,
# however.

-- 
First known example of political correctness:   John Cowan
After Nurhachi had united all the other         http://www.reutershealth.com
Jurchen tribes under the leadership of the      http://www.ccil.org/~cowan
Manchus, his successor Abahai (1592-1643)       [EMAIL PROTECTED]
issued an order that the name Jurchen should    --S. Robert Ramsey,
be banned, and from then on, they were all      The Languages of China
to be called Manchus.



RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-04 Thread Peter Constable
 From: Dave Singer [mailto:[EMAIL PROTECTED]

 The whole question of what is a language, a variant or dialect of a
 language, or a suitable substitute for a language, would benefit some
 thought in any tagging scheme, though I agree the problem is not
 generally soluble.

These are questions that have been given some thought. No time to delve
into it at the moment, however.



Peter Constable



RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-04 Thread Peter Constable
 From: [EMAIL PROTECTED] [mailto:ietf-languages-
 [EMAIL PROTECTED] On Behalf Of John Cowan


  The whole question of what is a language, a variant or dialect of a
  language, or a suitable substitute for a language, would benefit some
  thought in any tagging scheme, though I agree the problem is not
  generally soluble.
 
 See the editor's draft of ISO 639-3 at http://tinyurl.com/6kky2 ...

I would say that all of clause 4.2 is relevant; in addition to 4.2.1, I
would especially include 4.2.2, in relation to which I have presented
ideas that led to the inclusion of the Extensions subtag in the proposed
draft. (I originally thought of it as a way to capture some existing
registered tags as part of a consistent scheme rather than merely as
ad-hoc tags, but I think it may be more generally useful as well for
dealing with some of the issues regarding different perceptions of what
is a language.) I'm afraid I don't have time at the moment to elaborate
further.



Peter Constable



Re: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-04 Thread ned . freed
 [EMAIL PROTECTED] scripsit:

  I know of two other wrinkles in the RFC 1766 world:

 Are you aware that RFC 1766 has been obsolete for four years now?

Of course I am.

  (2) SGN- requires special handling, in that SGN-FR and SGN-EN are in fact
 sufficiently different languages that a primary tag match should not be
 taken to be a generic match.

 The same is true of the various registered zh-* tags.

Yes, forgot to mention that one. It is actually different and more important in
that the use-cases aren't the same as those for sign languages.

  (a) Extension tags appear as the first subtags, and as such have to
 be taken into account when looking for country subtags.

 Finding country codes is straightforward: any non-initial subtag of two
 letters (not appearing to the right of x- or -x-) is a country code.
 This is true in RFC 1766, RFC 3066, and the current draft.

On the contrary, in RFC 3066 the rule is any 2 letter value that appears as
the second subtag is a country code. The rule in the new draft is either the
formulation you give above or  any 2 letter value that appears as a subtag
after the initial subtag and some number of 3 and 4 letter subtags.

These aren't the same.
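
The difference between the two formulations can be made concrete with a short
Python sketch. Both functions are hypothetical illustrations of the rules as
worded above, not code taken from RFC 3066 or from the draft.

```python
def country_rfc3066(tag):
    """RFC 3066 reading: only a two-letter SECOND subtag is a country code."""
    subtags = tag.split("-")
    if len(subtags) > 1 and len(subtags[1]) == 2 and subtags[1].isalpha():
        return subtags[1].upper()
    return None


def country_draft(tag):
    """Draft reading: a two-letter subtag after the initial subtag and
    some number of 3 and 4 letter subtags."""
    for subtag in tag.split("-")[1:]:
        if len(subtag) == 2 and subtag.isalpha():
            return subtag.upper()
        if len(subtag) not in (3, 4):
            # Anything else ends the run of 3 and 4 letter subtags.
            break
    return None
```

Both agree on "de-CH", but for "sr-Latn-CS" the first returns nothing and the
second returns "CS", so code written to the first rule has to change.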

  (b) Script tags change the complexion of the matching problem significantly,
 in that they can interact with external factors like charset information
 in odd ways.

 Can you clarify this?  Charset information neither specifies nor necessarily
 restricts (except in text/plain) the script used to write a document.

And what if you're dealing with text/plain, as many applications do?

Just because something doesn't necessarily do something doesn't mean it
never does it.

  (c) UN country numbers have been added (IMO for no good reason), requiring
 handling similar to country codes.

 They provide for supranational language varieties and for stability in
 country codes which is inappropriate for ISO 3166 alphabetic codes (which
 are codes for country *names*).

I'm aware of what they provide (although I see no explanation of this
in the draft). I'm just not convinced that their addition is warranted.

  The bottom line is that while I know how to write reasonable code to do RFC
  1766 matching (and have in fact done so for widely deployed software), I
  haven't a clue how to handle this new draft competently in regards to
  matching.

 The draft describes only the RFC 1766 (3066) algorithm, without excluding
 other algorithms to be defined later.

Well, maybe I'm missing something obvious, but I see nothing in RFC 3066 that
qualifies as a description of a matching algorithm. The new draft does include
such a description in section 2.4.2 - an improvement - but leaves any number of
details open. And we all know where the devil lives.

Side note: I don't think item 4 really belongs in the list in section 2.4.2.
It is a warning to implementors, not part of the matching mechanism.

Ned



RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2005-01-03 Thread Dave Singer
 The *meaning* of any given language tag would be no more or less a
 problem under the proposed revision than it was for RFC 3066 or RFC
 1766. For instance, there is a concurrent thread that has been
 discussing when country distinctions are appropriate or recommended
 (ca or ca-ES?); this discussion pertains to RFC 3066, and part
 of the issue is that meanings of tags are implied rather than
 specified -- and always have been even under RFC 1766 (I pointed
 this out five years ago when we were working on preparing RFC 3066).

 So, for instance, when an author uses de-CH, what does he intend
 recipients to understand to be the difference between that and
 de-DE or even de? Neither RFC 1766 nor RFC 3066 sheds any light on
 this, and ultimately only the author knows for sure.

 Under RFC 3066, it was the *exceptional* case that a complete tag
 was registered, allowing some indication of the meaning of the whole
 (though even in that regard nothing really required that the
 documentation provide clear indication of the meaning). The 98%
 cases were those like de-CH in which it was assumed that everyone
 would understand what the intended meaning is.

This whole question of what 'matches' is subtle.  Consider the case
when I have a document that has variant content by language (e.g.
different sound tracks), and the user indicates a set of preferred
languages.  If the content has de-CH and fr-CH (Swiss German and
French), and a default en (English) and the user says he speaks
de-DE and fr-FR, on the face of it nothing matches, and I fall
back to the catch-all default, which is almost certainly not the best
result.
--
David Singer
Apple Computer/QuickTime



RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2004-12-30 Thread Peter Constable
 From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED]


 Dear Peter,
 please let focus on the discussion of draft to be approved by the IESG and
 on its role.

Eh???!! I can't imagine what on earth you think I was talking about if not
that.


 This document intends to replace RFC 3066 but does not want to
 take into account RFC published since the RFC 3006, the current IANA
 procedures, the work chartered in some WG, the internet architectural
 principle (RFC 1958).
 
 There is no problem in having it been accepted for information or
 experimental. There are serious objections to get it approved otherwise.

(RFC 1958 was published since RFC 3066??!!)

Look, the IESG chair is the list administrator for the IETF-languages list, and 
a participant in its deliberations. If there has been a serious lacuna in 
process for moving this draft toward BCP, I think he would have mentioned it on 
the IETF-languages list a *long* time ago. It was the IESG that issued the Last 
Call announcement, not me, not the authors of the draft, not anybody else on 
the IETF-languages list. It appears that IESG *is* moving it through a process 
toward BCP, and one can only assume they feel their process is adequate and 
appropriate. 



 The *meaning* of any given language tag would be no more or less a
 problem under the proposed revision than it was for RFC 3066 or RFC 1766.
 
 (a) RFC 3066 was published without considering different usages of the
 proposed language tag format.

Eh???!! That is simply not true. RFC 3066 was developed with full awareness and 
consideration of all the usage scenarios for RFC 1766, which it replaced. RFC 
3066 discusses various IETF and W3C protocols that use language tags. (Have you 
actually read RFC 3066?)


 (b) nor which authority would document their meanings (plural)

Eh???!! Section 2.2 clearly identifies the authorities that document the 
meanings of subtags. I have pointed out that there are aspects of meaning that 
it does not address (which, btw, are not easily resolvable), but that does not 
imply that RFC 3066 was published without consideration of what authorities 
document meanings.


 I think we can all agree that there's no much less likelihood of someone
  I suggest that we not dwell on pathological cases that we aren't
 really likely to encounter.
 
 This kind of thinking is not appropriate when standardizing a format.
 Julius Caesar would have though a pathological case to propose that Roman
 should speak Londinium's language.

If Romans had started speaking a variant of Londinium's language, the proposed 
draft could easily accommodate that situation. That is not pathological. A tag 
like sr-Latn-CS-gaulish-boont-guoyu-i-enochian is pathological. It most 
certainly *is* appropriate to identify what kinds of examples are or are not 
valid, as we need to design for *valid* usage scenarios. For any given character set 
encoding standard, the fact that nonsense character sequences can be devised is 
not a determining factor in development of that encoding; the same is true here.


 At this point, I feel confident that it is not a problem to combine
 script IDs into language tags, and this is the consensus of the domain
 experts that have been discussing this proposed revision for the past
 year and more.
 
 This may mean that current reluctances to incorporate originating source
 authority, destination, format conformance, internationalization, icons
 support (and may be additional needs) could be a further consensus. I
 suggest that we save time this time.

??? You want to incorporate these things into the draft, or into language tags 
themselves? The latter is either not necessary or not appropriate (language 
tags should *not* include anything to indicate destination). As for inclusion 
in the draft, the proposed draft is quite clear about source authorities for 
subtags and about conformance; destination is out of scope and irrelevant. 
Internationalization? These are symbolic identifiers; they are intrinsically 
not localizable. Icon support??? I haven't a clue what you're talking about!


 
 Not a problem: the proposed revision *allows* for the use of script IDs
 but does not require them. In the case of audio content, one simply would
 never include a script ID.
 
 Accents and types of voice have been documented as necessary items. They
 could use the script and police fields ?

??? If someone needs to tag content to distinguish a particular dialect, the 
proposed draft can easily accommodate that. If one wants to tag content for 
minor linguistic details (this utterance was spoken by someone who has a cleft 
palate, who was intoxicated at the time, and uses tag-question intonation), 
it is a *non goal* of the proposed draft to accommodate that level of detail as 
it is not appropriate to try to capture that level of ad hoc detail in a 
general-purpose metadata element.



 The bigger problem you're pointing out is the limitations of using
 suffix-truncation alone as a 

RE: draft-phillips-langtags-08, process, specifications, stability, and extensions

2004-12-30 Thread Peter Constable
 From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED]


 Of course it would not be clear if you don't have a conceptual model of
 what language tags are identifiers *of*. When RFC 3066 was being
 developed, there was a suggestion that script IDs be incorporated, but
 some were reluctant, raising the same question you have here. I was one
 of those. But I didn't remain obstructionist over the issue; instead, I gave
 a fair amount of thought to the ontology that underlies language tags,
 and subsequently published a white paper and presented on the topic at
 two conferences in the spring and fall of 2002. (Paper is available online at
 http://www.sil.org/silewp/abstract.asp?ref=2002-003 -- my thinking has
 evolved since then, but some key results remain valid, I think.)
 
 May us know which ones?

It would be easier to identify two key points on which my thinking has changed.

IIRC, I was uncertain at the time about what to do wrt sorting. I have since 
concluded that sort order is a presentation issue that, while linguistically 
related, is out of scope for language identifiers. (Note that there is no 
common usage scenario in which it makes sense to declare the sorted order of 
content.) Sort order may certainly be in scope for a locale identifier, but not 
for a language tag.

The bigger change is that I have abandoned the fourth main category in the 
ontological model I proposed. At the time, I was still trying to work out where 
something like Latin America Spanish fit in. I saw the similarity to 
sub-language varieties / dialects, but at the time thought it needed to be a 
distinct category, for which reason I concocted the notion of a 
"domain-specific data set". 

I was never very satisfied with that: it wasn't a particularly consistent model 
(a data set is quite a different kind of thing from a language variety) and it 
ignored the similarity with sub-language variety. (And the name was a bit 
unwieldy.) 

I have since realized that I was tripping up on the very problem that was 
blocking the Language Tag Reviewer from accepting the requested registration 
for es-americas: the assumption that a language tag necessarily refers to a 
conventionally-recognized linguistic identity that exists in the world. 
Language tags are not attributes declared on language varieties; they are 
attributes declared on information objects, indicating linguistic properties of 
those information objects. And the linguistic attributes of an information 
object do not necessarily coincide with conventionally-recognized linguistic 
identities. Of course, in the majority of useful cases they will; but it's not 
hard to show that this is not always the case: e.g. if I present "chat" as an 
expression that could be interpreted in relation to several different 
languages, it would be entirely appropriate for me to declare a linguistic 
attribute of that expression of "indeterminate" since that is precisely my 
intent -- but clearly "indeterminate" doesn't correspond with any particular 
language identity out in the world.

Thus, I came to realize that the kind of distinction intended by es-americas 
was just the same kind of distinction made for any sub-language variety: it 
declares that the information object is not only in some particular language, 
but is even more constrained in terms of the language variety in use. It is 
simply coincidental that the more constrained usage in this case doesn't 
coincide with a single dialect used by some identifiable speaker community.



Peter Constable
