Re: IDN and language

2005-01-05 Thread Bruce Lilly
>  Date: 2005-01-04 12:06
>  From: John C Klensin <[EMAIL PROTECTED]>

> the
> IDN situation is not an issue except in a very narrow sense and
> similar situation would apply to local-parts if we ever do
> something there.  In the IDN case, the protocols are written in
> terms of arbitrary Unicode strings and just about have to be --
> there has never been a DNS restriction requiring that the labels
> be names or words in a language.  The protocols apply some
> mapping rules that reject a few characters (and hence the labels
> that contain them) and change some characters into others, but
> the net effect is still a set of standards written in terms of
> strings, not languages.

My concern is the distinction between "names" (in the sense used
in RFC 1958, i.e. protocol elements) vs. "text" (RFC 2277),
and internationalizing domain names seems to make sense only if
the domain names are being treated in some way as text, i.e.
human-readable content, and therefore (possibly, at least) in
some language (see 2277 section 2, third paragraph and section
4.1, first sentence).  RFC 2277, a.k.a. BCP 18, requires (sect.
4.2) that protocols which transfer text must have provision for
carrying language information.  The considerations for making
provision for language-tagging (N.B. not requiring that every
IDN be tagged) are clear when one considers presentation
issues for the visually impaired; a screen reader needs to be
able to identify language to correctly present at least some
subset of labels which might appear in either an IDN or
internationalized local-part.

> The situation with local-parts will, most of us are convinced,
> work out in much the same way.  There is a long history of
> strings used in local-parts that are not "names", "words", or
> otherwise bound to a particular language. [...]

Agreed, and I'd be happy if local parts and domain names were
to be treated purely as protocol elements.  But given the
decision to internationalize and hence the treatment as text
at least for presentation, there ought to be provision for
indicating language where necessary for correct presentation.

___
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf


Re: IDN and language

2005-01-04 Thread JFC (Jefsey) Morfin
At 23:37 04/01/2005, John Cowan wrote:
> John C Klensin scripsit:
> I know that -- I did read 3743 first.  But in that case, whatever did
> you mean by "ICANN has created a recommendation [...]  that languages
> not be mixed within a label"?

The first question (see my mail from yesterday) is to define what we are
talking about. What is a language? You are not talking about the same
thing as ICANN.

> How could it?  There is no requirement that there be a table for
> every possible language tag, after all; all existing language tags
> remain valid.  These tables are just tagged content like any other,
> though the application of the tag is different from the usual
> application.

I do not understand. What is the "usual application"? Are we talking about
a standard?
jfc



Re: IDN and language

2005-01-04 Thread John Cowan
John C Klensin scripsit:

> I suppose there are always exceptions.  In particular, the
> recommendations of RFC 3743 are about tables of characters, not
> dictionary lookup.   

I know that -- I did read 3743 first.  But in that case, whatever did
you mean by "ICANN has created a recommendation [...]  that languages
not be mixed within a label"?

> If, however, a domain decided to adopt a
> canonical dictionary and lookup in it as a registration
> criterion, that rule would be perfectly enforceable.  

Certainly.  But that is not the same as saying "languages [SHOULD]
not be mixed in a label."  That is a stricture about linguistic entities,
not about entries in a dictionary.

> Other issues occur if the writing order of
> characters in a language obeys specific rules and one chooses to
> enforce them (a potential issue with, e.g., Hangul, although,
> again, the choice of whether or not to try to enforce is up to
> the registry).  

This is even more confusing.  What languages do *not* impose a specific
writing order on their characters?

> It is not clear that the current proposal is much better than 3066
> for handling those cases, but I wonder if anyone has carefully
> evaluated whether it would make things worse.

How could it?  There is no requirement that there be a table for
every possible language tag, after all; all existing language tags
remain valid.  These tables are just tagged content like any other,
though the application of the tag is different from the usual
application.

-- 
XQuery Blueberry DOM        John Cowan
Entity parser dot-com       [EMAIL PROTECTED]
Abstract schemata           http://www.reutershealth.com
XPointer errata             http://www.ccil.org/~cowan
Infoset Unicode BOM         --Richard Tobin



Re: IDN and language

2005-01-04 Thread John C Klensin


--On Tuesday, 04 January, 2005 12:52 -0500 John Cowan
<[EMAIL PROTECTED]> wrote:

> John C Klensin scripsit:
> 
>> Returning to the DNS/IDN situation, ICANN has created a
>> recommendation for all TLDs, and a requirement on at least
>> some gTLDs, that languages not be mixed within a label and for
>> registration and use of tables similar to those recommended by
>> RFC 3743.  
> 
> This regulation is going to be completely unenforceable, since
> with a few exceptions (hexagonal French), languages do not
> have bright-line rules saying what words they do and do not
> contain.  Are we to be in the position of saying that
> eigenvector.com may be registered (and is) because the word
> appears in dictionaries, whereas eigenevent.com is ruled out
> because it "mixes" English and German?

John, I am sure that ICANN would welcome your participation as
the various rules/ guidelines evolve -- those rules are not an
IETF problem, even though changes to the standard that is used
to label them might be.  One of the things their processes have
in common with the IETF is that they prefer that people actually
try to read and understand documents before attacking them, but
I suppose there are always exceptions.  In particular, the
recommendations of RFC 3743 are about tables of characters, not
dictionary lookup.   If, however, a domain decided to adopt a
canonical dictionary and lookup in it as a registration
criterion, that rule would be perfectly enforceable.  I'd
recommend against it for many reasons, but this would be more or
less up to them.

> Forbidding the mixing of scripts is another matter, although
> in fact some languages are written using more than one
> (Unicode) script.

Whether those languages are a problem or not in the DNS context
depends on whether one wishes to permit a single label to use
both (or all three in at least a few cases I know of) scripts.
Again a per-registry decision and again perfectly enforceable
either way.  Other issues occur if the writing order of
characters in a language obeys specific rules and one chooses to
enforce them (a potential issue with, e.g., Hangul, although,
again, the choice of whether or not to try to enforce is up to
the registry).  But one of the notational problems with using
3066 would be a rule that one can have a label that contains the
characters of a given language written in, e.g., either a
modified Arabic script or a modified Cyrillic one but not in a
modified Roman ("Latin") one.  Another issue arises when one
wants to permit a character collection that includes the
characters from a given script that are used by two separate
languages -- not all of the characters of that script, but
exactly those characters that fall into the union of the
characters from the script used by the relevant languages.  It
is not clear that the current proposal is much better than 3066
for handling those cases, but I wonder if anyone has carefully
evaluated whether it would make things worse.
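
The "union of characters" case can be made concrete with a rough sketch.
Everything in it is an assumption for illustration: the TABLES contents are
small made-up subsets, the permitted() helper is hypothetical, and none of
this is the RFC 3743 table format or any registry's real data.  It only
shows that the zone's rule is a set of characters, which no single 3066
tag names.

```python
# Hypothetical per-language character tables for one zone.  Illustrative
# subsets only: not real registry data and not the RFC 3743 table format.
TABLES = {
    "de": set("abcdefghijklmnopqrstuvwxyz\u00e4\u00f6\u00fc\u00df"),
    "fr": set("abcdefghijklmnopqrstuvwxyz"
              "\u00e0\u00e2\u00e7\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc"),
}
LDH_EXTRA = set("0123456789-")   # digits and hyphen, allowed in any table


def permitted(label, languages):
    """True if every character of label is in the union of the given tables."""
    allowed = set(LDH_EXTRA)
    for tag in languages:
        allowed |= TABLES[tag]
    return all(ch in allowed for ch in label)


print(permitted("m\u00fcller", ["de"]))                  # within the "de" table
print(permitted("caf\u00e9", ["de"]))                    # U+00E9 is not in "de"
print(permitted("caf\u00e9-m\u00fcller", ["de", "fr"]))  # needs the union
```

The third label is acceptable only under the union, so a registry wanting
that rule would have to label its table with something other than "de" or
"fr" alone.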

  john





Re: IDN and language

2005-01-04 Thread JFC (Jefsey) Morfin
At 18:06 04/01/2005, John C Klensin wrote:
> Returning to the DNS/IDN situation, ICANN has created a
> recommendation for all TLDs, and a requirement on at least some
> gTLDs, that languages not be mixed within a label and for
> registration and use of tables similar to those recommended by
> RFC 3743.  Those tables are identified by a combination of the
> Domain name associated with the registering TLD registry and a
> 3066 code.  That system is not, IMO, working especially well and
> the 3066 code model will, I think, have to be extended to deal
> with some unusual situations.   But, interestingly,
> draft-phillips... doesn't appear to solve that particular
> problem: what is needed is a way to specify odd mixtures of
> languages and/or scripts that may be appropriate to a particular
> zone, and that means less specificity and more
> linguistically-strange constructions, not more specificity and
> structure.

The real problem is the confusion all this introduces, because this is not
a consensus draft from an IETF WG working under an IAB-approved charter.
That is odd when the RFC under discussion was authored by the IESG Chair
and the private mailing list is hosted under his name with the name
"ietf-language", which is confusing to many.

At this stage, we can only say that there is no consensus on what is being
discussed, on the problems to solve, or on the proposed solutions. But
there is no reason why such a consensus could not be reached once the
charter I outlined yesterday has been carried through.
jfc 



Re: IDN and language

2005-01-04 Thread Peter Sylvester
> ruled out because it "mixes" English and German?
> 

Sorry I can't resist: like in EdelWeb.fr 



Re: IDN and language

2005-01-04 Thread John Cowan
John C Klensin scripsit:

> Returning to the DNS/IDN situation, ICANN has created a
> recommendation for all TLDs, and a requirement on at least some
> gTLDs, that languages not be mixed within a label and for
> registration and use of tables similar to those recommended by
> RFC 3743.  

This regulation is going to be completely unenforceable, since with a
few exceptions (hexagonal French), languages do not have bright-line
rules saying what words they do and do not contain.  Are we to be in
the position of saying that eigenvector.com may be registered (and is)
because the word appears in dictionaries, whereas eigenevent.com is
ruled out because it "mixes" English and German?

Forbidding the mixing of scripts is another matter, although in fact
some languages are written using more than one (Unicode) script.
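
The difference can be sketched in a few lines: script membership is a
property of each character, so a mixing rule over scripts is mechanically
checkable, whereas "eigenevent" passes or fails a language rule only by
someone's judgment.  This is a rough illustration under stated
assumptions: scripts_in() and mixes_scripts() are hypothetical helpers,
and the first word of the Unicode character name is used as a crude
stand-in for the script, since Python's standard library does not expose
the Unicode Script property.

```python
import unicodedata


def scripts_in(label):
    """Rough per-character script guess: the first word of the Unicode name.

    A stand-in only; a real check would use the Unicode Script property.
    """
    found = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue                 # treat digits and hyphen as script-neutral
        found.add(unicodedata.name(ch).split()[0])
    return found


def mixes_scripts(label):
    """True if the label draws characters from more than one guessed script."""
    return len(scripts_in(label)) > 1


print(mixes_scripts("eigenevent"))      # all LATIN, whatever the "language"
print(mixes_scripts("p\u0430ypal"))     # U+0430 is CYRILLIC SMALL LETTER A
```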

-- 
"And it was said that ever after, if any        John Cowan
man looked in that Stone, unless he had a       [EMAIL PROTECTED]
great strength of will to turn it to other      www.ccil.org/~cowan
purpose, he saw only two aged hands withering   www.reutershealth.com
in flame."   --"The Pyre of Denethor"



Re: IDN and language

2005-01-04 Thread John C Klensin


--On Tuesday, 04 January, 2005 09:38 -0500 Bruce Lilly
<[EMAIL PROTECTED]> wrote:

>> One is not.  Domain names are strings of characters; only
>> incidentally do they spell out one or more words in one or
>> more languages.  I doubt whether the names "Google," "Yahoo,"
>> and "AltaVista" can be pinned down as belonging to one
>> specific language.
> 
> I was referring specifically to internationalized domain names
> (IDN, RFCs 3490, 3491, 3492, 3743) where the on-the-wire
> domain name continues to be of traditional form (ANSI X3.4
> letters, digits, and hyphen (with restrictions on combinations
> and placement)), but where a certain class of names (those
> beginning with "xn--") are "internationalized" and might be
> presented to users in a different form (which can include
> non-ASCII characters).  That came about because of the
> tendency to associate a domain name (tag) with a natural
> language "name" or legally-registered name (trademark, etc.).
> Whether one considers such associations logical or
> irrational, that is what has happened.  So one could have
> a domain name (beginning with xn--) that is presented by
> an application as "Nestlé.com".  Now certainly some names,
> such as your examples, Kodak, Häagen-Dazs, etc. have no
> language (because they are made-up strings of characters),
> but others do have a specific language.  In skimming through
> the RFCs mentioned above, it appears that there is now some
> provision for language tagging (which was not present in
> earlier versions of IDN).  However, I have not thoroughly
> reviewed those recent additions; therefore it should be
> clear that I have not reviewed the impact of the proposed
> draft changes on IDN or vice versa.  Such a review should
> take place (ideally before the deadline for the New Last
> Call on draft-phillips-langtags-08 (tomorrow!)), but I'm
> not the person to do so as I have only slight interest in
> IDN (I'm one of those who considers associating a tag
> with natural language and/or legally registered names to
> be irrational).  One potential issue is that domain names
> are case-insensitive, and whether lower-case accented
> characters map to/compare with unaccented upper-case
> letters may be a function of language (or culture, or
> political fiat).
>...
> I would add that there is apparently some discussion of
> wreaking similar havoc on local-parts, which appear in
> message-identifiers and email mailbox identifiers (STD 11).
> That too should be evaluated w.r.t. specification of
> language and the proposed changes.

Bruce,

While I'm sympathetic to many of the points you have raised, the
IDN situation is not an issue except in a very narrow sense and
a similar situation would apply to local-parts if we ever do
something there.  In the IDN case, the protocols are written in
terms of arbitrary Unicode strings and just about have to be --
there has never been a DNS restriction requiring that the labels
be names or words in a language.  The protocols apply some
mapping rules that reject a few characters (and hence the labels
that contain them) and change some characters into others, but
the net effect is still a set of standards written in terms of
strings, not languages.  There has been a good deal of concern
in the DNS community about the potential for deliberately or
accidentally misleading users about domain names and the
consequent opportunities for confusion or outright fraud.  Those
concerns have led to a good deal of work on restrictions about
what strings can be registered, imposing, e.g., rules that the
holder of one string may be the only permitted holder of a
related one and rules that prohibit mixing scripts within a
single label.  These types of rules, especially the latter, are
the "very narrow sense" mentioned above, but they have no impact
on the protocols themselves.  The registration rules actually
differ from zone to zone and can safely do so because, to the
user of the DNS, an unregistered name is an unregistered name
and the distinction as to whether a name is unregistered because
no one wanted it or because some subtle rule prohibited its
registration is not of importance.
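
As a small illustration of the mapping rules mentioned above (some
characters changed into others, a few rejected along with the labels
that contain them), the following sketch uses Python's built-in "idna"
codec, which follows the IDNA 2003 rules of RFC 3490.  The to_wire()
helper is a hypothetical name, and the two sample labels are chosen only
to show one mapping and one rejection.

```python
def to_wire(label):
    """Apply Python's IDNA 2003 codec (nameprep mapping, then Punycode)."""
    return label.encode("idna")


# Mapping: some characters are changed into others before encoding.
# U+00DF (sharp s) is folded to "ss" and "S" to "s" by the mapping step.
mapped = to_wire("Stra\u00dfe")

# Rejection: a private-use code point is prohibited, so the whole label fails.
try:
    to_wire("a\ue000b")
    rejected = False
except UnicodeError:
    rejected = True

print(mapped, rejected)
```

Nothing in either step asks what language the string is in; the rules
operate on code points only.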

The situation with local-parts will, most of us are convinced,
work out in much the same way.  There is a long history of
strings used in local-parts that are not "names", "words", or
otherwise bound to a particular language.  Worse, different
destination systems apply different internal syntax rules and
interpretations to local-part strings.  Protocols will need to
be designed to reflect that history and avoid unreasonable
restrictions.  At the same time, I would expect the
administrators of a given local system to impose restrictions
on what local-parts can be used for mailboxes there (just
as is often done today).   Those restrictions may, in many
cases, reflect assumptions about languages and/or scripts but,
since they are purely local conventions, there is no need for
external registration.

Returning to the DNS/

Re: IDN and language

2005-01-04 Thread Bruce Lilly
> Re: draft-phillips-langtags-08, process, specifications, "stability",  and 
> extensions
>  Date: 2005-01-01 19:56
>  From: "Doug Ewell" <[EMAIL PROTECTED]>
>  To: [EMAIL PROTECTED]
>  
> Bruce Lilly  wrote:

> > Domain names and
> > language tags are different types of names, used for
> > different purposes, and with different scope (largely
> > non-overlapping, though one might legitimately ask how
> > one is supposed to determine the language of an
> > "internationalized" domain name...)
> 
> One is not.  Domain names are strings of characters; only incidentally
> do they spell out one or more words in one or more languages.  I doubt
> whether the names "Google," "Yahoo," and "AltaVista" can be pinned down
> as belonging to one specific language.

I was referring specifically to internationalized domain names
(IDN, RFCs 3490, 3491, 3492, 3743) where the on-the-wire
domain name continues to be of traditional form (ANSI X3.4
letters, digits, and hyphen (with restrictions on combinations
and placement)), but where a certain class of names (those
beginning with "xn--") are "internationalized" and might be
presented to users in a different form (which can include
non-ASCII characters).  That came about because of the
tendency to associate a domain name (tag) with a natural
language "name" or legally-registered name (trademark, etc.).
Whether one considers such associations logical or
irrational, that is what has happened.  So one could have
a domain name (beginning with xn--) that is presented by
an application as "Nestlé.com".  Now certainly some names,
such as your examples, Kodak, Häagen-Dazs, etc. have no
language (because they are made-up strings of characters),
but others do have a specific language.  In skimming through
the RFCs mentioned above, it appears that there is now some
provision for language tagging (which was not present in
earlier versions of IDN).  However, I have not thoroughly
reviewed those recent additions; therefore it should be
clear that I have not reviewed the impact of the proposed
draft changes on IDN or vice versa.  Such a review should
take place (ideally before the deadline for the New Last
Call on draft-phillips-langtags-08 (tomorrow!)), but I'm
not the person to do so as I have only slight interest in
IDN (I'm one of those who considers associating a tag
with natural language and/or legally registered names to
be irrational).  One potential issue is that domain names
are case-insensitive, and whether lower-case accented
characters map to/compare with unaccented upper-case
letters may be a function of language (or culture, or
political fiat).
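
For readers who have not seen the two forms side by side, here is a
minimal sketch of the round trip using Python's built-in "idna" codec,
which implements the ToASCII/ToUnicode operations of RFC 3490 (IDNA
2003).  The variable names are illustrative only.  Note that the
case mapping applied here is the fixed one from the IDNA specifications,
independent of language.

```python
# "Nestlé.com" as an application might present it to a user.
displayed = "Nestl\u00e9.com"

# ToASCII: nameprep (including case folding), then Punycode with "xn--".
wire = displayed.encode("idna")
print(wire)

# ToUnicode: what a client could show for the on-the-wire name.
shown_again = wire.decode("idna")
print(shown_again)

# Only labels carrying the "xn--" prefix are internationalized; the rest
# are ordinary letter-digit-hyphen labels.
is_ace = [label.startswith(b"xn--") for label in wire.split(b".")]
print(is_ace)
```

The upper-case "N" does not survive the round trip, because case folding
happens before encoding.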

I would add that there is apparently some discussion of
wreaking similar havoc on local-parts, which appear in
message-identifiers and email mailbox identifiers (STD 11).
That too should be evaluated w.r.t. specification of
language and the proposed changes.
