Re: [Ltru] RE: STD (was: Last Call: 'Tags for Identifying Languages' to BCP)

2005-08-29 Thread r&d afrac


I am sorry to impose on the community again over what is starting to amount to
ad hominems.
Brian, please advise if this is inappropriate.
At 04:26 29/08/2005, Peter Constable wrote:
> > From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED]]
> > The
> > proposed langtag is an arbitrary limited compound of three
> > information: language name, script and country. A language
> > identification MAY call for far more elements, and deliver much more
> > information.
> Mr. Morfin has often suggested to the LTRU WG that language tags should
> be able to provide greater information than is allowed by the draft. He
> has never provided any specific proposal except a request to permit
> certain private-use tags, which I will return to below.
Dear Peter,
This kind of repetition no longer fools anyone. I have bored everyone enough
explaining that, IMHO, two additional subtags are necessary: the referent
and the context. One way or another, there is also a need for the date of
the reference (this can be a separate date or be included in a subtag).
This is documented at length in a mail of mine today. I will not repeat
it. I will only suggest that you study Word.
> The consensus of
> the remainder of the LTRU WG is that the draft supports all relevant
> distinctions needed to describe the linguistic and written-form
> attributes of content as may be needed for all purposes, commercial and
> otherwise.
This is a historic statement that I hope no one will forget.
Every researcher and engineer knows the value of such a final
"all".
Just in case: the langtag is not supposed to support only the
written-form attributes, but to be multimodal (cf. Peter Constable).
Please cite the subtags for voice, signs, icons, mood, etc.
> > This means that:
> > - "fr-Latn-fr" is the default tag based upon ISO 639-1/2/3
> > - "x-fran" is a private use tag based upon ISO 639-6
> > - "0-jefsey.com:franver" is my vision of the French at the Palace of
> > Versailles. Documented by an ISO 11179 conformant system (see below)
> Two comments: First, Mr. Morfin suggested within the LTRU WG that the
> syntax for language tags should be loosened to permit additional
> characters, such as "." and ":".
This is a false statement. I did two things:
- Taking advantage of the marvelous ability to steer WG-ltru decisions by
proposing the opposite of what is needed, I made sure the ABNF would be
foolproof (this is not yet exactly the case, as they did not always find the
proper [cf. Peter] "constraints").
- I supported the proposal of an African researcher (whom they called a
troll) to reconcile the WG affinity group's desire for a strict ABNF with
the need of users, R&D and innovation (following the evolution of ISO) to
use the URI-tags RFC, by proposing first to use the
"private use" area. As indicated, a remark showed me that this was the
wrong choice, since the private use area also addresses other needs.
I then came to the conclusion that using the present Draft as a default,
non-exclusive solution, with some reserved numeric "singleton"
as the hook for URI-tags, would preserve the work done by the WG while
addressing the needs of the rest of the world and avoiding an unnecessary
conflict.
> The remainder of the WG was in
> consensus that this was unacceptable due to backward incompatibility
> with processes designed to conform to RFC 3066.

> Secondly, Mr. Morfin has repeatedly made mention of ISO 11179, a series
> of ISO standards on metadata and metadata registries, indicating his
> view that language tags used on the Internet should be maintained in a
> registry conformant with ISO 11179, and therefore that the draft should
> make reference to those standards. He has also, on several occasions
> such as his comments above, cited ISO 11179 in relation to his views in
> a manner that appears to be intended to suggest that his views are
> superior to the draft because he has cited that series of standards
> while the draft does not.
The Draft addresses targets you defined long ago. It was presented
privately (twice) and is now presented as a WG document. The
document not having changed, one can expect that it keeps the same
targets. You consider that it addresses them "all".
There can therefore be no "superior" views. There are different
targets. My target is to protect R&D, users, and Internet
innovation.
In a nutshell, I do _not_ believe that a draft crafted by a few
individuals can support all the relevant distinctions needed to describe
the linguistic and written-form attributes of content as may be needed
for all purposes, commercial and otherwise. And I want to protect the
right of other researchers and cultures to have their own solutions,
_without_conflict_ with, or detriment to, _your_solution_.
The real solution is IRI-tags, which we will document as soon as the URI-tags
RFC is published. But that will create a deployment conflict with your
application, due to your sponsors. No one needs that.
> A reality check is in need here:
> - While Mr. Morfin cites ISO 11179, he has never made statements
>   that clearly indicate that he actually understands those
>   standards.
I propose everyone having time to spend to rea

Re: [Ltru] RE: STD (was: Last Call: 'Tags for Identifying Languages' to BCP)

2005-08-29 Thread JFC (Jefsey) Morfin


I am sorry to impose on the community again over what is starting to amount to
ad hominems. I am used to that, but the standing of the person and the
serious look of the mail call for a response, particularly in this
case, where two major points are documented.
Sorry, Peter. Brian, please advise if this is inappropriate.
At 04:26 29/08/2005, Peter Constable wrote:
> > From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED]]
> > The
> > proposed langtag is an arbitrary limited compound of three
> > information: language name, script and country. A language
> > identification MAY call for far more elements, and deliver much more
> > information.
> Mr. Morfin has often suggested to the LTRU WG that language tags should
> be able to provide greater information than is allowed by the draft. He
> has never provided any specific proposal except a request to permit
> certain private-use tags, which I will return to below.
Dear Peter,
This kind of repetition no longer fools anyone. I have bored everyone enough
explaining that, IMHO, two additional subtags are necessary: the referent
and the context. One way or another, there is also a need for the date of
the reference (this can be a separate date or be included in a subtag).
This is documented at length in a mail of mine today. I will not repeat
it. I will only suggest that you study Word.
> The consensus of
> the remainder of the LTRU WG is that the draft supports all relevant
> distinctions needed to describe the linguistic and written-form
> attributes of content as may be needed for all purposes, commercial and
> otherwise.
This is a historic statement that I hope no one will forget.
Every researcher and engineer knows the value of such a final
"all".
Just in case: the langtag is not supposed to support only the
written-form attributes, but to be multimodal (cf. Peter Constable).
Please cite the subtags for voice, signs, icons, mood, etc.
> > This means that:
> > - "fr-Latn-fr" is the default tag based upon ISO 639-1/2/3
> > - "x-fran" is a private use tag based upon ISO 639-6
> > - "0-jefsey.com:franver" is my vision of the French at the Palace of
> > Versailles. Documented by an ISO 11179 conformant system (see below)
> Two comments: First, Mr. Morfin suggested within the LTRU WG that the
> syntax for language tags should be loosened to permit additional
> characters, such as "." and ":".
This is a false statement. I did two things:
- Taking advantage of the marvelous ability to steer WG-ltru decisions by
proposing the opposite of what is needed, I made sure the ABNF would be
foolproof (this is not yet exactly the case, as they did not always find the
proper [cf. Peter] "constraints").
- I supported the proposal of an African researcher (whom they called a
troll) to reconcile the WG affinity group's desire for a strict ABNF with
the need of users, R&D and innovation (following the evolution of ISO) to
use the URI-tags RFC, by proposing first to use the
"private use" area. As indicated, a remark showed me that this was the
wrong choice, since the private use area also addresses other needs.
I then came to the conclusion that using the present Draft as a default,
non-exclusive solution, with some reserved numeric "singleton"
as the hook for URI-tags, would preserve the work done by the WG while
addressing the needs of the rest of the world and avoiding an unnecessary
conflict.
> The remainder of the WG was in
> consensus that this was unacceptable due to backward incompatibility
> with processes designed to conform to RFC 3066.
> Secondly, Mr. Morfin has repeatedly made mention of ISO 11179, a series
> of ISO standards on metadata and metadata registries, indicating his
> view that language tags used on the Internet should be maintained in a
> registry conformant with ISO 11179, and therefore that the draft should
> make reference to those standards. He has also, on several occasions
> such as his comments above, cited ISO 11179 in relation to his views in
> a manner that appears to be intended to suggest that his views are
> superior to the draft because he has cited that series of standards
> while the draft does not.
The Draft addresses targets you defined long ago. It was presented
privately (twice) and is now presented as a WG document. The
document not having changed, one can expect that it keeps the same
targets. You consider that it addresses them "all".
There can therefore be no "superior" views. There are different
targets. My target is to protect R&D, users, and Internet
innovation.
In a nutshell, I do _not_ believe that a draft crafted by a few
individuals can support all the relevant distinctions needed to describe
the linguistic and written-form attributes of content as may be needed
for all purposes, commercial and otherwise. And I want to protect the
right of other researchers and cultures to have their own solutions,
_without_conflict_ with, or detriment to, _your_solution_.
The real solution is IRI-tags, which we will document as soon as the URI-tags
RFC is published. But that will create a deployment conflict with your
application, due to your sponsors. No one needs that.
A reality check is in need

RE: STD (was: Last Call: 'Tags for Identifying Languages' to BCP)

2005-08-28 Thread Peter Constable
> From: JFC (Jefsey) Morfin [mailto:[EMAIL PROTECTED]]


> An exchange on WG-ltru documents...

In this post from Mr. Morfin, it is difficult (at least for me) to
ascertain his point other than in relation to certain specifics:


> The
> proposed langtag is an arbitrary limited compound of three
> information: language name, script and country. A language
> identification MAY call for far more elements, and deliver much more
> information.

Mr. Morfin has often suggested to the LTRU WG that language tags should
be able to provide greater information than is allowed by the draft. He
has never provided any specific proposal except a request to permit
certain private-use tags, which I will return to below. The consensus of
the remainder of the LTRU WG is that the draft supports all relevant
distinctions needed to describe the linguistic and written-form
attributes of content as may be needed for all purposes, commercial and
otherwise.


> This means that:
> - "fr-Latn-fr" is the default tag based upon ISO 639-1/2/3
> - "x-fran" is a private use tag based upon ISO 639-6
> - "0-jefsey.com:franver" is my vision of the French at the Palace of
> Versailles. Documented by an ISO 11179 conformant system (see below)

Two comments: First, Mr. Morfin suggested within the LTRU WG that the
syntax for language tags should be loosened to permit additional
characters, such as "." and ":". The remainder of the WG was in
consensus that this was unacceptable due to backward incompatibility
with processes designed to conform to RFC 3066.
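
To make the syntax point concrete, here is a minimal Python sketch of the RFC 3066 tag grammar (a primary subtag of 1 to 8 letters, followed by hyphen-separated subtags of 1 to 8 letters or digits), applied to the three example tags quoted above. The function name `is_rfc3066_syntax` is an illustrative assumption, not something taken from the draft or from any library, and the check covers well-formedness only, not registration or meaning.

```python
import re

# RFC 3066 grammar: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT),
# with subtags joined by "-". This tests syntax only, not registry contents.
_RFC3066 = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_syntax(tag: str) -> bool:
    """Return True if `tag` is well-formed under the RFC 3066 grammar."""
    return _RFC3066.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ("fr-Latn-fr", "x-fran", "0-jefsey.com:franver"):
        print(tag, is_rfc3066_syntax(tag))
```

Under this grammar the first two tags are well-formed, while the third is not: its primary subtag is a digit, and "." and ":" fall outside the permitted character set, so a process written to RFC 3066 could not be expected to accept it.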

Secondly, Mr. Morfin has repeatedly made mention of ISO 11179, a series
of ISO standards on metadata and metadata registries, indicating his
view that language tags used on the Internet should be maintained in a
registry conformant with ISO 11179, and therefore that the draft should
make reference to those standards. He has also, on several occasions
such as his comments above, cited ISO 11179 in relation to his views in
a manner that appears to be intended to suggest that his views are
superior to the draft because he has cited that series of standards
while the draft does not. A reality check is in need here:

- While Mr. Morfin cites ISO 11179, he has never made statements 
  that clearly indicate that he actually understands those 
  standards.

- While Mr. Morfin refers to "an ISO 11179 conformant system", 
  none of the ISO 11179 series of standards contains any statement 
  of conformance requirements. Thus, no such notion of "ISO 11179 
  conformant" is defined anywhere. All that can be said is that a 
  system of metadata elements is maintained and administered using 
  a certain amount of the conceptual model, practice and 
  administrative infrastructure specified in the ISO 11179 standards. 
  The draft uses some measure of these, though it does not make 
  normative reference to ISO 11179.

  In terms of ISO 11179 notions, each entry in the proposed registry 
  includes the two essential components of a metadata element: a 
  representation, and a data element concept. Each item in the 
  registry indicates (i) the representation used in language tags, 
  (ii) a designator that indicates the value meaning and that can 
  also serve as the data identifier, (iii) the object class (its 
  "type"), (iv) the administrative status (limited to deprecated or 
  not deprecated), as well as other properties.

  Thus, while it cannot formally be said that the draft conforms
  to ISO 11179 (since no terms of conformance are defined), I think 
  it *can* reasonably be said that the draft creates a registry and
  system of metadata elements that is consistent with the model
  presented in ISO 11179.

- The primary reason that the LTRU WG chose not to reference ISO
  11179 in this draft had nothing to do with whether the WG 
  considered ISO 11179 appropriate or valuable in general. Rather,
  it was that it was not deemed that reference to ISO 11179 would
  add significant value in the context of an IETF language subtag
  registry. Taken together, the ISO 11179 standards are long and
  complex, and have not to our knowledge been referenced in any
  other IETF metadata registry -- and certainly not in relation
  to RFC 1766 or RFC 3066, which specifications accomplish their 
  purposes in spite of that absence of reference.
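
As a rough illustration of the four properties listed above for each registry item, the following Python sketch models one entry as a small record. The class name, the field names, and the sample values ("fr", "French", "language") are assumptions made for illustration; they are not the draft's registry format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryItem:
    representation: str       # (i) the string used in language tags
    designator: str           # (ii) indicates the value meaning; may double as identifier
    object_class: str         # (iii) the "type" of the item
    deprecated: bool = False  # (iv) administrative status: deprecated or not


# Hypothetical sample entry, not copied from any actual registry.
french = RegistryItem(representation="fr", designator="French", object_class="language")

if __name__ == "__main__":
    print(french)
```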


Thus, when I see Mr. Morfin citing ISO 11179 in the course of arguing
for some view that he holds, I consider that citation to have added
nothing of significance in support of his view.



> This means that this debate is only to lock a _final_ ABNF via an
> accepted RFC and a loaded operational IANA registry _before_ a simpler
> solution [ISO 639-6] is available three months from now

This statement makes several assumptions of uncertain validity, not the
least of which is that use of alpha-4 symbols from ISO 639-6 for IETF
language tags would constitute a simpler solution. Given the widespread
existing use of RFC 3066 tags, use of ISO 639-6 would have to go
alongside use

STD (was: Last Call: 'Tags for Identifying Languages' to BCP)

2005-08-28 Thread JFC (Jefsey) Morfin
An exchange on WG-ltru documents (I do not say "supports"; the reader
will judge) the positions I hold.


It involves:
- Peter Constable: one of the initiators of the project and author of
ISO 639-3, which lists 7500 languages and is used in building langtags
- Doug Ewell: author of the Draft concerning the initial content of the registry
- Debbie Garside: the author of ISO 639-6

At 22:20 28/08/2005, Peter Constable wrote:

> [I'll preface this reply by saying that we don't want to spend too much
> time discussing issues that are not of immediate concern while we've got
> the matching draft and IETF last call on the registry drafts to deal
> with. So, I won't pursue this thread much longer.]


The proposed Draft is based upon ISO 639-1,2,3 lists of language 
names. ISO 639-6 is a list of language use names and IDs. The 
proposed langtag is an arbitrary limited compound of three 
information: language name, script and country. A language 
identification MAY call for far more elements, and deliver much more 
information. However, these three basic elements are necessary to sell
language-related products (contracts, ads, documentation, bills) and to
identify the current state-of-the-art "locales" (CLDR project).


The choice seems to be:
- GO for an e-commerce-only multilingual Internet, forever.
- NO, we do not want the Multilingual Internet to be only commercial.
The decision is NOW. And we understand that Peter and the authors want to
win now, because they have real needs to address now.


But I do not think there is a need for anyone to "win". There is a
third response.
- GO for e-commerce multilingual Internet support now, as the
default/immediate solution
- YES to a generalised Multilingual Internet hooked to the RFC 3066
Draft, however poor it is, using its reserved ABNF hooks.


This means that:
- "fr-Latn-fr" is the default tag based upon ISO 639-1/2/3
- "x-fran" is a private use tag based upon ISO 639-6
- "0-jefsey.com:franver" is my vision of the French at the Palace of 
Versailles. Documented by an ISO 11179 conformant system (see below)



> > From: Doug Ewell
> > I'm a bit surprised that a work characterized as a work-in-progress
> > and not yet ready for public review is nevertheless deemed ready
> > to be considered as a draft international standard.
>
> Debbie at no point said that it was -- and it is not. It will be
> December at the earliest that it can be registered as a CD, and it must
> successfully complete a three-month ballot as CD before it can be
> registered as a Draft International Standard. So last spring of 2006 at
> the earliest.


This means that this debate is only to lock a _final_ ABNF via an
accepted RFC and a loaded operational IANA registry _before_ a simpler
solution is available three months from now.



> > > In other words, in the system as proposed, you could
> > > use either the alpha-4 representation or the unique DI to find the
> > > closest 639-1,-2,-3 or -5 tags should you so wish.
> >
> > But in language tags, either one value needs to be canonical -- sorry,
> > "preferred" -- over the others, or else the duplicative values should
> > not be added at all.
>
> Your statement doesn't contradict anything that Debbie has said,
> provided the context is ISO 639-6 alone. If we were to talk about
> incorporation of ISO 639-6 into a revision of RFC 3066, however, then
> duplication would become an issue for consideration.


It is the WG-ltru Charter that requires all the ISO codes to be included.
As a user, I am not much interested in mixing four formats only to please
Peter Constable and/or Debbie Garside. All the more so since the issue is
the addition of script information to document ... oral
expression, while they omit computer(ised?) languages (definition?), even
though all of this goes through computers.



> For clarification of Debbie's statement, in the model of ISO 11179, we
> have metadata elements that consist of a data element concept, such as
> 'English', and a representation for that, such as "en" or "eng" (these
> would be distinct representations belonging to different value domains).
> Within a metadata registry, a registry item corresponding to 'English'
> can have a Data Identifier (DI), which is a unique identifier *within
> the registry* for that administered item; in this example, that DI could
> be any number of strings, though "English" would be among the better
> choices.


Nice to see that ISO 11179 is accepted now. Peter Constable and the
WG-ltru have opposed any reference to the ISO 11179 model. This model
makes it possible to conceptualise languages and to include in their
description an unlimited number of additional elements. Roughly, it
means that ISO 639-3 is a table of codes (names) for undocumented
languages, whereas ISO 639-6 aims to be the root of a base of
objects describing languages.
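
The registry-item model described in the clarification quoted above (one concept, several representations in distinct value domains, and a Data Identifier unique within the registry) can be sketched in Python as follows. The value-domain labels "alpha-2" and "alpha-3" and the function names are hypothetical, chosen for illustration; only 'English', "en" and "eng" come from the message.

```python
from typing import Optional

# One administered item per concept, keyed by its Data Identifier (DI).
# Each item maps a value domain to the representation used in that domain.
REGISTRY = {
    "English": {"alpha-2": "en", "alpha-3": "eng"},  # domain labels are assumed
}


def find_di(representation: str) -> Optional[str]:
    """Return the DI of the item that has `representation` in any value domain."""
    for di, representations in REGISTRY.items():
        if representation in representations.values():
            return di
    return None


def representation_in(di: str, value_domain: str) -> Optional[str]:
    """Return the item's representation in the given value domain, if any."""
    return REGISTRY.get(di, {}).get(value_domain)
```

Going from either representation to the DI, and from the DI back to another value domain, is the kind of lookup discussed in the exchange; supporting a further value domain would mean one more key per item.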


The Draft proposes a very limited version of that base, with only three
columns. This is enough in many cases, but not in an increasing
number of cases. Hence the possibility of using the Draft as a default.
Since the three ele