Re: Names for control characters (Was: "(in 6429)" in allkeys.txt)

2014-03-12 Thread Eli Zaretskii
> From: "Whistler, Ken" 
> Date: Wed, 12 Mar 2014 16:48:25 +
> Cc: "Whistler, Ken" ,
> "unicode@unicode.org" 
> 
> Please be very careful here. Having a non-empty value in field 1 of
> UnicodeData.txt is *not* the same as "having a Unicode name".

You will see that I didn't refer to the Name attribute; I referred to
the old name attribute (called Unicode_1_Name in UAX #44).
___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


RE: Names for control characters (Was: "(in 6429)" in allkeys.txt)

2014-03-12 Thread Whistler, Ken
Please be very careful here. Having a non-empty value in field 1 of
UnicodeData.txt is *not* the same as "having a Unicode name".

See:

http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf#G135207

for the gory details.

The "Unicode name" is formally defined in terms of the Name property,
which itself is a combination of enumerated values extracted from
UnicodeData.txt, plus a number of rules.

For all characters whose General_Category=Cc, the formal definition
of the Name property is a null string. The string "<control>" is *never*
to be interpreted as a "Unicode name". It is a field placeholder with
legacy status. See "Interpretation of Field 1 of UnicodeData.txt" in
the section I cited above.
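To make the distinction concrete, here is a minimal Python sketch (an illustration only, not code from any Unicode tool) that reads field 1 and field 10 of a UnicodeData.txt record and derives the Name property using just the Cc rule described above. The helpers `parse_record` and `name_property` are hypothetical, and the two sample records are hand-copied rather than read from the file.

```python
from typing import NamedTuple


class Record(NamedTuple):
    code_point: int
    field1: str            # raw field 1; may be a placeholder such as "<control>"
    general_category: str  # field 2
    unicode_1_name: str    # field 10 (Unicode_1_Name), e.g. the ISO 6429 name


def parse_record(line: str) -> Record:
    """Pick out the fields used here from one semicolon-separated record."""
    fields = line.split(";")
    return Record(int(fields[0], 16), fields[1], fields[2], fields[10])


def name_property(rec: Record) -> str:
    """Return the Name property value; "" means the character has no Unicode name."""
    if rec.general_category == "Cc":
        # "<control>" is a legacy placeholder, never a name.
        return ""
    if rec.field1.startswith("<"):
        # Range markers such as "<CJK Ideograph, First>" follow other
        # derivation rules that this sketch does not implement.
        raise NotImplementedError(f"derived name for U+{rec.code_point:04X}")
    return rec.field1


# Two records copied by hand from UnicodeData.txt (not read from the file).
SAMPLE = [
    "0007;<control>;Cc;0;BN;;;;;N;BELL;;;;",
    "1F514;BELL;So;0;ON;;;;;N;;;;;",
]

for line in SAMPLE:
    rec = parse_record(line)
    print(f"U+{rec.code_point:04X}: Name={name_property(rec)!r}, "
          f"Unicode_1_Name={rec.unicode_1_name!r}")
```

Running it prints `U+0007: Name='', Unicode_1_Name='BELL'` and `U+1F514: Name='BELL', Unicode_1_Name=''`: "BELL" is only the old ISO 6429 name of U+0007, but it is the Unicode name of U+1F514.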

As for user interfaces and other applications needing "names" for
Unicode control characters -- one of the reasons that the namespace
for Unicode characters includes all of the formal name aliases provided
in NameAliases.txt is so that applications can safely treat any formal
name alias for a control character (or the other abbreviations, etc.,
also listed in NameAliases.txt) *as if* they were Unicode names, without
running into name collisions with the actual Name property value
for Unicode characters.
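A minimal sketch of why the shared namespace matters, assuming a tool builds a single lookup table from Name values plus every NameAliases.txt entry (data lines have the form `code point;alias;type`). `build_namespace` and the sample data are hypothetical, and the sketch uses exact string matching rather than the loose matching rules of UAX #44; it assumes Python 3.9 or later for the built-in generic annotations.

```python
NAMES = {  # code point -> Name property value (a tiny hand-copied sample)
    0x0041: "LATIN CAPITAL LETTER A",
    0x1F514: "BELL",
}

# NameAliases.txt excerpt; each data line is "code point;alias;type".
NAME_ALIASES = """\
# Note: no formal alias "BELL" for U+0007 (collision with U+1F514).
0000;NULL;control
0000;NUL;abbreviation
0007;ALERT;control
0007;BEL;abbreviation
"""


def build_namespace(names: dict[int, str], alias_text: str) -> dict[str, int]:
    """Map every name and alias to one code point; raise on any collision."""
    namespace: dict[str, int] = {}

    def add(label: str, cp: int) -> None:
        owner = namespace.get(label)
        # "is not None" matters: U+0000 is a valid owner with value 0.
        if owner is not None and owner != cp:
            raise ValueError(f"{label!r} already identifies U+{owner:04X}")
        namespace[label] = cp

    for cp, name in names.items():
        add(name, cp)
    for raw in alias_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cp_hex, alias, _alias_type = line.split(";")
        add(alias, int(cp_hex, 16))
    return namespace


ns = build_namespace(NAMES, NAME_ALIASES)
print(f"ALERT -> U+{ns['ALERT']:04X}")  # ALERT -> U+0007
print(f"BELL  -> U+{ns['BELL']:04X}")   # BELL  -> U+1F514
```

Feeding it a hypothetical alias line `0007;BELL;control` makes `build_namespace` raise `ValueError`, which is the kind of collision the UTC avoided by not giving U+0007 a formal alias "BELL".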

The history of the name collision for the (relatively) recently encoded
U+1F514 BELL with the traditional usage for the U+0007 control function
"BELL" led the UTC to extend the namespace as noted, so we won't be
running into more such problems in the future.

If Emacs were to use "ALERT" or the abbreviation "BEL" for U+0007,
instead of "<control>", that would avoid the collision with U+1F514 BELL,
be conformant to the Unicode Standard, and presumably be helpful
to users, as well. See the entries for U+0007 in NameAliases.txt:

# Note that no formal name alias for the ISO 6429 "BELL" is
# provided for U+0007, because of the existing name collision
# with U+1F514 BELL.

0007;ALERT;control
0007;BEL;abbreviation
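For example, a describe-character command could pick its label roughly as follows. This is a hypothetical policy written in Python for illustration, not Emacs's actual implementation; `display_label` and the two small sample tables are assumptions.

```python
NAMES = {0x1F514: "BELL"}  # non-empty Name property values (sample)

ALIASES = {  # code point -> [(alias, type)], in NameAliases.txt order (sample)
    0x0000: [("NULL", "control"), ("NUL", "abbreviation")],
    0x0007: [("ALERT", "control"), ("BEL", "abbreviation")],
}


def display_label(cp: int) -> str:
    """Hypothetical policy: Name, else a control alias, else an abbreviation."""
    name = NAMES.get(cp, "")
    if name:
        return name
    aliases = ALIASES.get(cp, [])
    for wanted in ("control", "abbreviation"):
        for alias, alias_type in aliases:
            if alias_type == wanted:
                return alias
    # Never fall back to the "<control>" placeholder; show the code point instead.
    return f"U+{cp:04X}"


for cp in (0x0007, 0x1F514, 0x0001):
    print(f"U+{cp:04X}: {display_label(cp)}")
```

This prints `U+0007: ALERT`, `U+1F514: BELL`, and `U+0001: U+0001` (U+0001 is simply missing from the sample table). Preferring the control alias over the abbreviation is a judgment call; a tool could just as well show both.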

--Ken


> > Regarding these names in ISO 6429 again, how come these control
> > characters don't have Unicode names?
> 
> They have a non-empty "old name" field:
> 
>   0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
>




Re: Names for control characters (Was: "(in 6429)" in allkeys.txt)

2014-03-12 Thread Eli Zaretskii
> From: starb...@stp.lingfil.uu.se (Per Starbäck)
> Date: Wed, 12 Mar 2014 13:32:15 +0100
> Cc: "unicode@unicode.org" 
> 
> Regarding these names in ISO 6429 again, how come these control
> characters don't have Unicode names?

They have a non-empty "old name" field:

  0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
   

> Emacs should do better regarding this

As you yourself say, it already does, so I don't see the point in this
rant.


Re: Names for control characters (Was: "(in 6429)" in allkeys.txt)

2014-03-12 Thread Mark Davis ☕
They do have aliases in NameAliases.txt

0000;NULL;control

0000;NUL;abbreviation

0001;START OF HEADING;control

0001;SOH;abbreviation

0002;START OF TEXT;control

0002;STX;abbreviation

...
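As one concrete data point, Python's standard `unicodedata` module follows this model: `name()` has nothing to return for control characters, while `lookup()` accepts the aliases from NameAliases.txt (supported since Python 3.3). A quick check, assuming Python 3.3 or later:

```python
import unicodedata

# Cc characters have no Name property value, so name() needs a default.
print(unicodedata.name("\x07", "(no name)"))  # (no name)

# lookup() also accepts aliases from NameAliases.txt (Python 3.3 and later).
for label in ("NULL", "NUL", "START OF HEADING", "SOH", "ALERT", "BELL"):
    print(f"{label!r:20} -> U+{ord(unicodedata.lookup(label)):04X}")
```

Each alias resolves to its control character, while "BELL" resolves to U+1F514, because NameAliases.txt deliberately gives U+0007 no alias of that name.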


Mark 

 *Il meglio è l’inimico del bene* (The best is the enemy of the good)



Names for control characters (Was: "(in 6429)" in allkeys.txt)

2014-03-12 Thread Per Starbäck
Ken Whistler wrote:
> Ah, I see what the interpretation problem was. Yes, that is
> a straightforward kind of improvement -- easily enough done.
> Look for a change the next time the file is updated. (It will not
> be immediately changed, pending other review comments.)

Thanks! Then I'll skip making a formal request about this.

Regarding these names in ISO 6429 again, how come these control
characters don't have Unicode names? For many uses of names, the control
characters have as much need for them as any other character.
Since it seems so straightforward, it must have been suggested several
times to introduce names like

  CONTROL CHARACTER NULL
  CONTROL CHARACTER START OF HEADING
  CONTROL CHARACTER START OF TEXT

etc., so I assume there are good reasons for not doing that, but I can't
see what they are.

Since applications want names, they will use other things as names when
there isn't a real name, and that leads to problems. Take Emacs, where
the command describe-char currently describes U+0007 as

  name: <control>
  old-name: BELL

(I reported the misuse of "<control>" as a name here in 2009, but it
wasn't fixed until this year, so the fix is not yet in a released version.)
The usage of "BELL" here invites confusion with U+1F514 BELL.

Emacs should do better regarding this, but still, with a proper name
all of this would have been averted.