Re: Greek Characters Duplicated as Latin (was: Sanskrit nasalized L)

2011-08-28 Thread tulasi
Thanks for the info.
I wonder whether there are other Greek letters/symbols, in addition to
the following, that were copied and renamed as LATIN?

Thanks,
Tulasi


From: Richard Wordingham richard.wording...@ntlworld.com
Date: Sun, Aug 14, 2011 at 1:39 PM
Subject: Greek Characters Duplicated as Latin (was: Sanskrit nasalized L)
To: unicode Unicode Discussion unicode@unicode.org


On Sat, 6 Aug 2011 17:25:11 -0700
tulasi tulas...@gmail.com wrote:

- Why did Unicode Inc copy some letters/symbols from Greek script
irresponsibly and rename them as Latin script?
- Why didn't it (Unicode Inc) use the same Greek letters/symbols?

U+00B5 MICRO SIGN is an ISO-8859-1 character, and was therefore
included as U+00B5.  It normally precedes a Latin-script letter, and
therefore it actually makes sense to treat it as a Latin-script
character, and possibly give it a different shape in these contexts to
the shape of the Greek letter in Greek text.

The glyphs of U+0251 LATIN SMALL LETTER ALPHA are glyphs of U+0061
LATIN SMALL LETTER A - they have been given separate character status
because IPA uses it as a contrasting character, as with U+0261 LATIN
SMALL LETTER SCRIPT G.

U+1E9F LATIN SMALL LETTER DELTA looks to me like a glyph variant of
U+0064 LATIN SMALL LETTER D, but I may be wrong - look up the proposal
if you're really interested.

U+2126 OHM SIGN is similar to U+00B5 MICRO SIGN, except that it is used
on its own.  Whether it should be merged with U+03A9 GREEK CAPITAL
LETTER OMEGA is debatable, but that is what has been done.
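
The difference between the two signs can be inspected in the Unicode
Character Database. The following is only a minimal sketch using
Python's standard unicodedata module; the helper name describe() is made
up for illustration. It suggests that OHM SIGN is folded to the Greek
letter by canonical normalization, while MICRO SIGN is folded only by
compatibility normalization.

```python
import unicodedata

MICRO, GREEK_MU = "\u00b5", "\u03bc"
OHM, GREEK_OMEGA = "\u2126", "\u03a9"


def describe(ch):
    """Return (code point, character name, raw decomposition field)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))


for ch in (MICRO, OHM):
    print(describe(ch))

# OHM SIGN carries a canonical decomposition, so NFC already yields omega.
ohm_nfc = unicodedata.normalize("NFC", OHM)

# MICRO SIGN carries only a <compat> decomposition:
# NFC leaves it alone, NFKC replaces it with Greek mu.
micro_nfc = unicodedata.normalize("NFC", MICRO)
micro_nfkc = unicodedata.normalize("NFKC", MICRO)
```

In practice this means that text passed through NFC keeps U+00B5 but
loses U+2126.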

The reason for the encoding of the next four letters as Latin
characters is that they have a special role in the IPA.  Three of them
have been used in extensions of the Roman alphabets for various
languages, and thereby acquired capital letters.

U+0263 LATIN SMALL LETTER GAMMA is for IPA usage, and tends to have
different glyphs to the Greek letter.  When used to extend the Roman
alphabet, its capital is different to the Greek form, so this fact also
calls for a different lower case letter.

U+025B LATIN SMALL LETTER OPEN E has the same explanation as
U+0263.

U+0278 LATIN SMALL LETTER PHI is for IPA usage, and, unlike Greek,
always has an ascender.

There is also the principle of script separation, whereby different
scripts do not share base characters.  This has led to some
duplication, e.g. U+0269 LATIN SMALL LETTER IOTA, originally for IPA.
Its capital, U+0196 LATIN CAPITAL LETTER IOTA, is not the same as the
Greek capital iota.
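
The case pairs mentioned above can be checked with a short sketch. It
relies on Python's built-in str.upper() and the unicodedata module; the
helper name upper_info() is hypothetical. It is an illustration of the
point about separate capitals, not a statement of how any particular
font draws them.

```python
import unicodedata

LATIN_IOTA, GREEK_IOTA = "\u0269", "\u03b9"
LATIN_GAMMA, LATIN_OPEN_E, LATIN_PHI = "\u0263", "\u025b", "\u0278"


def upper_info(ch):
    """Return (code point, name) of the uppercase mapping of ch."""
    up = ch.upper()
    return (f"U+{ord(up):04X}", unicodedata.name(up))


# The Latin and Greek iotas map to capitals in their own scripts.
print(upper_info(LATIN_IOTA))
print(upper_info(GREEK_IOTA))

# Gamma and open e also have Latin capitals.
print(upper_info(LATIN_GAMMA))
print(upper_info(LATIN_OPEN_E))

# LATIN SMALL LETTER PHI has no uppercase mapping, so upper() returns it unchanged.
phi_has_capital = LATIN_PHI.upper() != LATIN_PHI
```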

I hope this makes things clearer.

Richard.


Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-28 Thread Philippe Verdy
2011/8/27 Asmus Freytag asm...@ix.netcom.com:
 I also think that the status field iso6429 is badly named. It should be
 control, and what is named control should be control-alternate, or
 perhaps, both of these groups should become simply control. I think the
 labels chosen by the data file just set up bad precedents. If 6429, why not
 a section for 9535 (or whatever the kbd standard is) etc.

Thanks a lot for admitting what I was trying to demonstrate to you in a
prior message (which was early dismissed as a complete non-starter).

I also think that there are too many aliases for controls, if the only
need is for Perl to have a name to uniquely designate those controls.
Choose one alias name, but there's absolutely no emergency for now for
adding four aliases at once for them, when there's no demonstration
that all those aliases are needed! This is just unnecessary pollution
of the UCS namespace.

If there are other mappings to do with other standards, and those
standards must be normative, these mappings don't have to be with
aliases belonging to the same namespace, it can be just separate
properties of characters.

If there are other mappings to do with other standards, and those
standards must be only informative, we already have the /MAPPINGS
directory beside the /UNIDATA directory where the UCD belongs too.

(And in fact, I think that the mappings found in the informative UTN
for mathematics symbols should be stored in this sister /MAPPINGS
folder, as soon as it has been reviewed and the UTN is no longer in a
prerelease state; this could also include the default Postscript names
or Postscript id's also used in TrueType, OpenType, and in the open
PDF file format, because these names or id's are supported by an ISO
standard; this would not be different from the mappings used in the
GSM encoding, because there's no perfect one-to-one match between
those standards and the UCS).

-- Philippe.



RE: Encoding of Emoji in SMS, and UCS-2 vs UTF-16

2011-08-28 Thread Craig McQueen
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Craig McQueen
Sent: Tuesday, 16 August 2011 1:28 PM
To: unicode@unicode.org
Subject: Encoding of Emoji in SMS, and UCS-2 vs UTF-16

The SMS standard specifies UCS-2 encoding:
http://www.3gpp.org/ftp/Specs/html-info/23038.htm

I see many “emoji” have been defined in Unicode 6. But many emoji are outside 
the BMP, so can’t be encoded in UCS-2. Does anyone know, is the intention that 
these emoji should be encoded in SMS using UTF-16 rather than UCS-2? Are there 
any plans in-progress to update the SMS standards to specify UTF-16 rather than 
UCS-2?
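
To make the BMP limitation concrete, here is a rough sketch in Python.
The function names fits_in_ucs2() and utf16_code_units() are invented
for this illustration, and nothing here describes what any handset or
network actually does with such a message. It only shows that a
character above U+FFFF needs two 16-bit code units (a surrogate pair) in
UTF-16, which plain UCS-2 cannot express.

```python
def fits_in_ucs2(text):
    """True if every character is a single BMP code point (no surrogates)."""
    return all(
        ord(c) <= 0xFFFF and not (0xD800 <= ord(c) <= 0xDFFF) for c in text
    )


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp_text = "caf\u00e9 \u263a"   # U+263A WHITE SMILING FACE is inside the BMP
emoji = "\U0001F601"            # GRINNING FACE WITH SMILING EYES, outside the BMP

print(fits_in_ucs2(bmp_text), utf16_code_units("\u263a"))
print(fits_in_ucs2(emoji), [hex(u) for u in utf16_code_units(emoji)])
```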

Perhaps this question could be added to the Emoji FAQ. 
http://unicode.org/faq/emoji_dingbats.html

Regards,
Craig McQueen



I haven’t heard from anyone regarding this. Should I ask on some GSM or other 
mobile standards mailing list instead?

I do think it would be worth adding to the Unicode Emoji FAQ though.

Regards,
Craig McQueen



Re: Encoding of Emoji in SMS, and UCS-2 vs UTF-16

2011-08-28 Thread Doug Ewell
I would think all standards that specify the use of UCS-2 should be updated to 
specify UTF-16 instead.  There is simply no excuse for any technology that 
deals with characters to be arbitrarily limited to the BMP.
 
--
Doug Ewell • d...@ewellic.org
Sent via BlackBerry by ATT

-Original Message-
From: Craig McQueen craig.mcqu...@beamcommunications.com
Sender: unicode-bou...@unicode.org
Date: Sun, 28 Aug 2011 19:16:25 
To: unicode@unicode.org
Subject: RE: Encoding of Emoji in SMS, and UCS-2 vs UTF-16




Re: Greek Characters Duplicated as Latin (was: Sanskrit nasalized L)

2011-08-28 Thread Philippe Verdy
Richard Wordingham richard.wording...@ntlworld.com wrote:
 U+0278 LATIN SMALL LETTER PHI is for IPA usage, and, unlike Greek,
 always has an ascender.

For linguistic Greek usage, the two variants are considered
equivalent. This is not the case in maths, where the two variants are
clearly distinct. That's why (La)TeX preserves a distinction between
\phi (with an ascender, in fact drawn with a separate stroke on top of
a circle, also typically used in linguistic Greek for the non-cursive
style of books) and \varphi (without the ascender, in fact wholly drawn
with a single self-intersecting curved stroke, also typically used in
linguistic Greek for more cursive styles, either handwritten or, in
books and even in monospaced fonts, for italic styles).

Note that both variants exist in roman/upright and
italic/slanted styles.

You'll immediately see that the variant with the ascender is not
favored in linguistic uses for the italic style, because it comes
too close to a slashed lowercase Greek omicron (or Latin/Cyrillic
o). It is also easily confused with the notation for the empty
set, so the \phi variant of LaTeX is avoided in most
formulas, in favor of \varphi (unless there's a real need to use both
distinctly in the same article text).

But these \phi and \varphi variants are generally not distinct, except
(once again) in some mathematical formulas that need a rich set of
variables, or need a convention to distinguish between operands
and operators, or between scalars, vectors, tensors, torsors, fields,
differentiators and so on (or between variables belonging to distinct
definition domains, or in dual sets). The same reasons explain why
there are similar distinctions for all basic Latin
letters (and digits, as well as Hebrew letters) between italic, bold,
serif, sans-serif, and monospaced styles, with additional code points
defined as symbols rather than letters. These preserve the semantic
distinctions needed in formulas, but are not needed for normal linguistic
orthographies, which should always avoid these symbols.
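
As a rough illustration of how these distinctions are encoded, the
sketch below uses Python's unicodedata module to compare three phi code
points and one styled mathematical letter. The helper name
nfkc_target() is hypothetical, and which code point a given TeX macro or
font maps to is deliberately left out, since the message does not state
it. The sketch only shows which characters compatibility normalization
(NFKC) folds together and which it keeps apart.

```python
import unicodedata

GREEK_PHI = "\u03c6"          # GREEK SMALL LETTER PHI
GREEK_PHI_SYMBOL = "\u03d5"   # GREEK PHI SYMBOL
LATIN_PHI = "\u0278"          # LATIN SMALL LETTER PHI (IPA)
MATH_BOLD_A = "\U0001d400"    # MATHEMATICAL BOLD CAPITAL A


def nfkc_target(ch):
    """Return (name of ch, name of what NFKC turns ch into)."""
    folded = unicodedata.normalize("NFKC", ch)
    return (unicodedata.name(ch), unicodedata.name(folded))


for ch in (GREEK_PHI, GREEK_PHI_SYMBOL, LATIN_PHI, MATH_BOLD_A):
    print(nfkc_target(ch))
```

The phi symbol and the styled mathematical letter fold to their plain
counterparts, while the Latin (IPA) phi stays a separate character.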

-- Philippe.




Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-28 Thread Doug Ewell

Philippe Verdy wrote:


If there are other mappings to do with other standards, and those
standards must be only informative, we already have the /MAPPINGS
directory beside the /UNIDATA directory where the UCD belongs too.


But in general, with the exception of MAPPINGS/VENDORS/MISC/SGML.TXT, 
the MAPPINGS directory isn't really a place for character *name* 
mappings.  It's primarily a place for *code point* mappings, for 
identifying U+0430 CYRILLIC SMALL LETTER A with 0xD0 in ISO 8859-5, and 
0xE0 in Windows-1251, and 0xE0 in MacCyrillic, and 0xC1 in KOI8-R. 
Character names in other standards, like 'acy' for U+0430, are 
comparatively less important.
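
A code point mapping of this kind can be looked up with the codecs that
ship with Python, which is one convenient way to double-check byte
values. This is a sketch only: the codec names are Python's own aliases
rather than anything taken from the MAPPINGS directory, and the helper
byte_for() is made up for the example.

```python
# Python codec names for the four legacy Cyrillic encodings mentioned above.
CODECS = {
    "ISO 8859-5": "iso8859_5",
    "Windows-1251": "cp1251",
    "MacCyrillic": "mac_cyrillic",
    "KOI8-R": "koi8_r",
}

CYRILLIC_SMALL_A = "\u0430"


def byte_for(ch, codec):
    """Return the single byte that encodes ch in a single-byte codec."""
    encoded = ch.encode(codec)
    if len(encoded) != 1:
        raise ValueError(f"{codec} does not encode U+{ord(ch):04X} as one byte")
    return encoded[0]


mapping = {label: byte_for(CYRILLIC_SMALL_A, codec) for label, codec in CODECS.items()}

for label, value in mapping.items():
    print(f"{label}: 0x{value:02X}")
```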


--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell




Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-28 Thread Asmus Freytag

On 8/28/2011 9:46 PM, Doug Ewell wrote:

Philippe Verdy wrote:


If there are other mappings to do with other standards, and those
standards must be only informative, we already have the /MAPPINGS
directory beside the /UNIDATA directory where the UCD belongs too.


But in general, with the exception of MAPPINGS/VENDORS/MISC/SGML.TXT, 
the MAPPINGS directory isn't really a place for character *name* 
mappings.  It's primarily a place for *code point* mappings, for 
identifying U+0430 CYRILLIC SMALL LETTER A with 0xD0 in ISO 8859-5, 
and 0xE0 in Windows-1251, and 0xE0 in MacCyrillic, and 0xC1 in KOI8-R. 
Character names in other standards, like 'acy' for U+0430, are 
comparatively less important.


Right, however NAME mapping has not been a major issue - except for 
control codes, since Unicode did not name these, even though they were 
routinely named by people dealing with them.


It's really important to not jump off the deep-end and appear to create 
a precedent for name MAPPING across standards when what is desired is to 
have IDENTIFIERS for certain characters as well as SHORT IDENTIFIERS for 
characters very commonly referred to by identifier in source code 
(regular expressions, etc.).


A./




Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-28 Thread Asmus Freytag

On 8/28/2011 6:43 PM, Philippe Verdy wrote:

2011/8/27 Asmus Freytag asm...@ix.netcom.com:

I also think that the status field iso6429 is badly named. It should be
control, and what is named control should be control-alternate, or
perhaps, both of these groups should become simply control. I think the
labels chosen by the data file just set up bad precedents. If 6429, why not
a section for 9535 (or whatever the kbd standard is) etc.

Thanks a lot for admitting what I was trying to demonstrate to you in a
prior message (which was early dismissed as a complete non-starter).


You appeared to be making a non-starter proposal, rather than clearly 
making a hypothetical proposal designed only to showcase certain logical 
flaws in the PRI. If the latter was your intention, well we 
misunderstood you, but everybody seems to be on the same page, which is 
good.


I also think that there are too many aliases for controls, if the only
need is for Perl to have a name to uniquely designate those controls.
Choose one alias name, but there's absolutely no emergency for now for
adding four aliases at once for them, when there's no demonstration
that all those aliases are needed! This is just unnecessary pollution
of the UCS namespace.


I tend to agree - however, I do think giving the common abbreviations 
some formal status is useful.


If I remember correctly, even in Perl there were some names that are 
legacy names. If programs other than Perl have an active need to support 
legacy names, then I would favor adding these one-by-one as demonstrated 
needs arise, but NOT wholesale, just because they existed in 6429 in 
some version.



Now, here's a subtle point: adding certain alias strings to the file is 
a cheap way for the editing tools that verify the uniqueness of the 
namespace to reserve a name (so it can't ever be given to a different 
character). Kind of like what happened to BELL. I bet a big motivation 
behind the long list (all for control codes) was to prevent any 
non-control code from ever getting a name that happens to match a known 
control code name.
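

The shared-namespace point can be illustrated from the consumer side.
The sketch below assumes a Python whose unicodedata.lookup() resolves
name aliases (documented since Python 3.3, so it reflects a later state
of the data than this discussion); the helper resolve() is hypothetical.
It shows a control code reachable only through an alias, and BELL
resolving to the character that actually bears that name.

```python
import unicodedata


def resolve(name):
    """Return the character for a name or alias, or None if it is unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


# Control codes have no character name of their own ...
line_feed_name = unicodedata.name("\n", None)

# ... but an alias in the same namespace can still identify one.
line_feed = resolve("LINE FEED")

# BELL is the name of an encoded symbol, so it cannot also mean U+0007.
bell = resolve("BELL")

print(line_feed_name, repr(line_feed), f"U+{ord(bell):04X}")
```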


While I appreciate that sentiment, I think this part of the proposal 
should not be rushed - aliases are forever, and warehousing all known 
obsolete names for control codes is a bit bizarre. I think you and I are 
possibly in agreement on that.




If there are other mappings ...


I've replied on the issue of mappings in reply to Doug's message.

A./