Re: Greek Characters Duplicated as Latin (was: Sanskrit nasalized L)

2011-08-27 Thread tulasi
Thanks for the info.
Are there other Greek letters/symbols, in addition to the following,
that were copied and renamed as LATIN?

Thanks,
Tulasi


From: Richard Wordingham 
Date: Sun, Aug 14, 2011 at 1:39 PM
Subject: Greek Characters Duplicated as Latin (was: Sanskrit nasalized L)
To: unicode Unicode Discussion 


On Sat, 6 Aug 2011 17:25:11 -0700
tulasi  wrote:

>- Why did Unicode Inc copy some letters/symbols from Greek script
>irresponsibly and rename them as Latin script?
>- Why didn't it (Unicode Inc) use the same Greek letters/symbols?

U+00B5 MICRO SIGN is an ISO-8859-1 character, and was therefore
included as U+00B5.  It normally precedes a Latin-script letter, and
therefore it actually makes sense to treat it as a Latin-script
character, and possibly give it a different shape in these contexts to
the shape of the Greek letter in Greek text.

The glyphs of U+0251 LATIN SMALL LETTER ALPHA are glyphs of U+0061
LATIN SMALL LETTER A - they have been given separate character status
because IPA uses it as a contrasting character, as with U+0261 LATIN
SMALL LETTER SCRIPT G.

U+1E9F LATIN SMALL LETTER DELTA looks to me like a glyph variant of
U+0064 LATIN SMALL LETTER D, but I may be wrong - look up the proposal
if you're really interested.

U+2126 OHM SIGN is similar to U+00B5 MICRO SIGN, except that it is used
on its own.  Whether it should be merged with U+03A9 GREEK CAPITAL
LETTER OMEGA is debatable, but that is what has been done.
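
The different treatment of these two signs shows up in normalization. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `normal_forms` is made up for illustration. It assumes the Unicode data shipped with a current Python, where OHM SIGN has a canonical decomposition to the Greek omega and MICRO SIGN only a compatibility decomposition to the Greek mu.

```python
import unicodedata

OHM_SIGN = "\u2126"
MICRO_SIGN = "\u00b5"

def normal_forms(ch):
    """Return the (NFC, NFKC) forms of a single character."""
    return unicodedata.normalize("NFC", ch), unicodedata.normalize("NFKC", ch)

for ch in (OHM_SIGN, MICRO_SIGN):
    nfc, nfkc = normal_forms(ch)
    # Print the code point each form maps to, plus the raw decomposition field.
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"NFC -> U+{ord(nfc):04X}, NFKC -> U+{ord(nfkc):04X}, "
        f"decomposition = {unicodedata.decomposition(ch)!r}"
    )
```

Under that assumption, OHM SIGN is replaced by U+03A9 even under NFC, while MICRO SIGN survives NFC and is only folded to U+03BC under NFKC.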

The reason for the encoding of the next four letters as Latin
characters is that they have a special role in the IPA.  Three of them
have been used in extensions of the Roman alphabets for various
languages, and thereby acquired capital letters.

U+0263 LATIN SMALL LETTER GAMMA is for IPA usage, and tends to have
different glyphs to the Greek letter.  When used to extend the Roman
alphabet, its capital is different to the Greek form, so this fact also
calls for a different lower case letter.

U+025B LATIN SMALL LETTER OPEN E has the same explanation as
U+0263.

U+0278 LATIN SMALL LETTER PHI is for IPA usage, and, unlike Greek,
always has an ascender.

There is also the principle of script separation, whereby different
scripts do not share base characters.  This has led to some
duplication, e.g. U+0269 LATIN SMALL LETTER IOTA, originally for IPA.
Its capital, U+0196 LATIN CAPITAL LETTER IOTA, is not the same as the
Greek capital iota.
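
The case pairs mentioned above can be inspected with Python's standard library. This is only a sketch: `describe_case_pair` is a hypothetical helper, and the results depend on the Unicode version bundled with the Python in use.

```python
import unicodedata

def describe_case_pair(ch):
    """Return (name of ch, name of its uppercase form or None if it has none)."""
    upper = ch.upper()
    if upper == ch:
        return unicodedata.name(ch), None
    return unicodedata.name(ch), unicodedata.name(upper)

# Latin letters discussed in this message, followed by the Greek iota for contrast.
for ch in ("\u0263", "\u025b", "\u0278", "\u0269", "\u03b9"):
    lower_name, upper_name = describe_case_pair(ch)
    print(f"U+{ord(ch):04X} {lower_name} -> {upper_name}")
```

The Latin iota and the Greek iota map to different capitals, and LATIN SMALL LETTER PHI has no uppercase mapping at all.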

I hope this makes things clearer.

Richard.


Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-27 Thread Asmus Freytag

On 8/27/2011 1:31 AM, Andrew West wrote:
> On 27 August 2011 09:25, Andrew West  wrote:
>> On 27 August 2011 03:52, Benjamin M Scarborough
>>  wrote:
>>> Are name aliases exempted from the normal character naming conventions? I
>>> ask because four of the entries have words that begin with numbers.
>>>
>>> 008E;SINGLE-SHIFT 2;control
>>> 008F;SINGLE-SHIFT 3;control
>>> 0091;PRIVATE USE 1;control
>>> 0092;PRIVATE USE 2;control
>>>
>>
>> ISO 6429 (and consequently ISO/IEC 10646 Section 11) calls these characters:
>> SINGLE-SHIFT TWO
>> SINGLE-SHIFT THREE
>> PRIVATE USE ONE
>> PRIVATE USE TWO
>>
>> Changing their names to "SINGLE-SHIFT 2" or "SINGLE-SHIFT-2" etc is
>> surely contrary to the whole point of the exercise.
>
> Sorry, ignore that. I hadn't noticed that the digit forms were in
> addition to the forms with numbers written as words.

Actually, you brought something to my attention that I had missed on 
reading the file, so I won't ignore this.


Having these ill-formatted names *in addition* to essentially the same 
name, but one that follows the naming conventions, strikes me as silly. 
It would set a potential precedent for adding aliases for any character 
name containing either a digit or the name for that digit. The PRI 
gives no rationale for the inclusion of names "valid in earlier versions".
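
The property that was flagged, a word beginning with a digit, is easy to test mechanically. The sketch below assumes that a "word" is a space-separated token, so a digit following a hyphen does not count; that reading and the function name are assumptions made for illustration, not a statement of the official naming rules.

```python
def words_starting_with_digit(name):
    """Return the space-separated words of a name that begin with a digit."""
    return [word for word in name.split(" ") if word[:1].isdigit()]

# The four aliases quoted earlier in the thread, plus one of the ISO 6429 word forms.
for alias in ("SINGLE-SHIFT 2", "SINGLE-SHIFT 3", "PRIVATE USE 1",
              "PRIVATE USE 2", "SINGLE-SHIFT TWO"):
    print(f"{alias!r}: {words_starting_with_digit(alias)}")
```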


If there's a known deviation that is currently supported (as named 
character ID, such as in regular expressions) in widely distributed 
software, I would support the addition on compatibility grounds (with 
tweaks that follow the naming rules). But simply because a name existed 
once (but was later deprecated) strikes me as going into the same 
"encyclopedic" direction that Ken himself has disavowed.


I do think now that grouping the file is a bad idea, because several 
people in this discussion, myself included, missed these particular near 
duplicates. The natural thing is wanting to know all names/aliases for a 
character. If someone needs grouping for some purposes, a spreadsheet or 
other tool can easily be used to filter by status field.
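
As a rough illustration of that point, the sketch below parses semicolon-separated alias records, collects all aliases per code point, and filters by status afterwards. The four "control" rows are the ones quoted in this thread; the two "iso6429" rows are assumed for illustration and are not copied from the draft file. The function names are likewise hypothetical.

```python
from collections import defaultdict

SAMPLE = """\
# Rows quoted in the thread
008E;SINGLE-SHIFT 2;control
008F;SINGLE-SHIFT 3;control
0091;PRIVATE USE 1;control
0092;PRIVATE USE 2;control
# Assumed rows (status label inferred, not quoted from the draft file)
008E;SINGLE-SHIFT TWO;iso6429
008F;SINGLE-SHIFT THREE;iso6429
"""

def parse_aliases(text):
    """Map each code point to its list of (alias, status) pairs, in file order."""
    by_code_point = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cp_hex, alias, status = (field.strip() for field in line.split(";"))
        by_code_point[int(cp_hex, 16)].append((alias, status))
    return dict(by_code_point)

def filter_by_status(by_code_point, status):
    """Keep only the aliases with the given status, dropping empty entries."""
    result = {}
    for cp, entries in by_code_point.items():
        kept = [alias for alias, st in entries if st == status]
        if kept:
            result[cp] = kept
    return result

aliases = parse_aliases(SAMPLE)
for cp in sorted(aliases):
    print(f"U+{cp:04X}: {aliases[cp]}")
```

With the data keyed by code point, near duplicates such as the digit and word forms for the same character sit next to each other, and a per-status view is still one call to `filter_by_status` away.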


I also think that the status field "iso6429" is badly named. It should 
be "control", and what is named control should be "control-alternate", 
or perhaps, both of these groups should become simply "control". I think 
the labels chosen by the data file just set up bad precedents. If 6429, 
why not a section for 9535 (or whatever the kbd standard is) etc.


A./



Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-27 Thread Andrew West
On 27 August 2011 09:25, Andrew West  wrote:
> On 27 August 2011 03:52, Benjamin M Scarborough
>  wrote:
>> Are name aliases exempted from the normal character naming conventions? I 
>> ask because four of the entries have words that begin with numbers.
>>
>> 008E;SINGLE-SHIFT 2;control
>> 008F;SINGLE-SHIFT 3;control
>> 0091;PRIVATE USE 1;control
>> 0092;PRIVATE USE 2;control
>>
>
> ISO 6429 (and consequently ISO/IEC 10646 Section 11) calls these characters:
> SINGLE-SHIFT TWO
> SINGLE-SHIFT THREE
> PRIVATE USE ONE
> PRIVATE USE TWO
>
> 
>
> Changing their names to "SINGLE-SHIFT 2" or "SINGLE-SHIFT-2" etc is
> surely contrary to the whole point of the exercise.

Sorry, ignore that. I hadn't noticed that the digit forms were in
addition to the forms with numbers written as words.

Andrew



Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-27 Thread Andrew West
On 27 August 2011 03:52, Benjamin M Scarborough
 wrote:
> Are name aliases exempted from the normal character naming conventions? I ask 
> because four of the entries have words that begin with numbers.
>
> 008E;SINGLE-SHIFT 2;control
> 008F;SINGLE-SHIFT 3;control
> 0091;PRIVATE USE 1;control
> 0092;PRIVATE USE 2;control
>

ISO 6429 (and consequently ISO/IEC 10646 Section 11) calls these characters:
SINGLE-SHIFT TWO
SINGLE-SHIFT THREE
PRIVATE USE ONE
PRIVATE USE TWO



Changing their names to "SINGLE-SHIFT 2" or "SINGLE-SHIFT-2" etc is
surely contrary to the whole point of the exercise.

Andrew