Re: Greek Characters Duplicated as Latin (was: Sanskrit nasalized L)
Thanks for the info. Are there other Greek letters/symbols, in addition to the following, that were copied and renamed as LATIN? Thanks, Tulasi From: Richard Wordingham richard.wording...@ntlworld.com Date: Sun, Aug 14, 2011 at 1:39 PM Subject: Greek Characters Duplicated as Latin (was: Sanskrit nasalized L) To: unicode Unicode Discussion unicode@unicode.org On Sat, 6 Aug 2011 17:25:11 -0700 tulasi tulas...@gmail.com wrote: - Why did Unicode Inc copy some letters/symbols from Greek script irresponsibly and rename them as Latin script? - Why didn't it (Unicode Inc) use the same Greek letters/symbols? U+00B5 MICRO SIGN is an ISO-8859-1 character, and was therefore included as U+00B5. It normally precedes a Latin-script letter, and therefore it actually makes sense to treat it as a Latin-script character, and possibly give it a different shape in these contexts to the shape of the Greek letter in Greek text. The glyphs of U+0251 LATIN SMALL LETTER ALPHA are glyphs of U+0061 LATIN SMALL LETTER A - it has been given separate character status because IPA uses it as a contrasting character, as with U+0261 LATIN SMALL LETTER SCRIPT G. U+1E9F LATIN SMALL LETTER DELTA looks to me like a glyph variant of U+0064 LATIN SMALL LETTER D, but I may be wrong - look up the proposal if you're really interested. U+2126 OHM SIGN is similar to U+00B5 MICRO SIGN, except that it is used on its own. Whether it should be merged with U+03A9 GREEK CAPITAL LETTER OMEGA is debatable, but that is what has been done. The reason for the encoding of the next four letters as Latin characters is that they have a special role in the IPA. Three of them have been used in extensions of the Roman alphabet for various languages, and thereby acquired capital letters. U+0263 LATIN SMALL LETTER GAMMA is for IPA usage, and tends to have different glyphs to the Greek letter. 
When used to extend the Roman alphabet, its capital is different to the Greek form, so this fact also calls for a different lower case letter. U+025B LATIN SMALL LETTER OPEN E has the same explanation as U+0263. U+0278 LATIN SMALL LETTER PHI is for IPA usage, and, unlike Greek, always has an ascender. There is also the principle of script separation, whereby different scripts do not share base characters. This has led to some duplication, e.g. U+0269 LATIN SMALL LETTER IOTA, originally for IPA. Its capital, U+0196 LATIN CAPITAL LETTER IOTA, is not the same as the Greek capital iota. I hope this makes things clearer. Richard.
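The relationships described above can be checked against the Unicode Character Database. What follows is only a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the helper name `describe` is made up for illustration, and the exact results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

def describe(ch):
    """Return (code point, character name, NFC form, NFKC form) for one character."""
    nfc = unicodedata.normalize("NFC", ch)
    nfkc = unicodedata.normalize("NFKC", ch)
    return (f"U+{ord(ch):04X}", unicodedata.name(ch),
            f"U+{ord(nfc):04X}", f"U+{ord(nfkc):04X}")

# The characters discussed in the message, in order of mention.
for ch in "\u00b5\u2126\u0251\u0263\u025b\u0278\u0269\u1e9f":
    print(*describe(ch))

# Case pairs: the Latin small letters map to Latin capitals, not Greek ones.
for small in "\u0263\u025b\u0269":
    cap = small.upper()
    print(unicodedata.name(small), "->", f"U+{ord(cap):04X}", unicodedata.name(cap))
```

In this sketch OHM SIGN folds to GREEK CAPITAL LETTER OMEGA even under NFC, MICRO SIGN folds to GREEK SMALL LETTER MU only under NFKC, and the IPA-derived Latin letters are left unchanged by both forms.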
Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0
2011/8/27 Asmus Freytag asm...@ix.netcom.com: I also think that the status field iso6429 is badly named. It should be control, and what is named control should be control-alternate, or perhaps, both of these groups should become simply control. I think the labels chosen by the data file just set up bad precedents. If 6429, why not a section for 9535 (or whatever the kbd standard is) etc. Thanks a lot for admitting what I was trying to demonstrate to you in a prior message (which was dismissed early as a complete non-starter). I also think that there are too many aliases for controls, if the only need is for Perl to have a name to uniquely designate those controls. Choose one alias name, but there's absolutely no emergency for now for adding four aliases at once for them, when there's no demonstration that all those aliases are needed! This is just unnecessary pollution of the UCS namespace. If there are other mappings to do with other standards, and those standards must be normative, these mappings don't have to be made with aliases belonging to the same namespace; they can just be separate properties of characters. If there are other mappings to do with other standards, and those standards must be only informative, we already have the /MAPPINGS directory beside the /UNIDATA directory where the UCD belongs too. (And in fact, I think that the mappings found in the informative UTN for mathematics symbols should be stored in this sister /MAPPINGS folder, as soon as it has been reviewed and the UTN is no longer in a prerelease state; this could also include the default PostScript names or PostScript IDs also used in TrueType, OpenType, and in the open PDF file format, because these names or IDs are supported by an ISO standard; this would not be different from the mappings used in the GSM encoding, because there's no perfect one-to-one match between those standards and the UCS). -- Philippe.
RE: Encoding of Emoji in SMS, and UCS-2 vs UTF-16
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of Craig McQueen Sent: Tuesday, 16 August 2011 1:28 PM To: unicode@unicode.org Subject: Encoding of Emoji in SMS, and UCS-2 vs UTF-16 The SMS standard specifies UCS-2 encoding: http://www.3gpp.org/ftp/Specs/html-info/23038.htm I see many “emoji” have been defined in Unicode 6. But many emoji are outside the BMP, so can’t be encoded in UCS-2. Does anyone know, is the intention that these emoji should be encoded in SMS using UTF-16 rather than UCS-2? Are there any plans in-progress to update the SMS standards to specify UTF-16 rather than UCS-2? Perhaps this question could be added to the Emoji FAQ. http://unicode.org/faq/emoji_dingbats.html Regards, Craig McQueen I haven’t heard from anyone regarding this. Should I ask on some GSM or other mobile standards mailing list instead? I do think it would be worth adding to the Unicode Emoji FAQ though. Regards, Craig McQueen
Re: Encoding of Emoji in SMS, and UCS-2 vs UTF-16
I would think all standards that specify the use of UCS-2 should be updated to specify UTF-16 instead. There is simply no excuse for any technology that deals with characters to be arbitrarily limited to the BMP. -- Doug Ewell • d...@ewellic.org -Original Message- From: Craig McQueen craig.mcqu...@beamcommunications.com Sender: unicode-bou...@unicode.org Date: Sun, 28 Aug 2011 19:16:25 To: unicode@unicode.org Subject: RE: Encoding of Emoji in SMS, and UCS-2 vs UTF-16
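To make the UCS-2 versus UTF-16 difference concrete: a character outside the BMP needs two 16-bit code units (a surrogate pair) in UTF-16, and has no representation at all in UCS-2. The following is a minimal sketch in Python 3; the helper names `utf16_code_units` and `fits_in_ucs2` are hypothetical, and nothing here models the actual SMS (3GPP 23.038) framing.

```python
def utf16_code_units(ch):
    """Split text into its UTF-16 code units (as integers)."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def fits_in_ucs2(text):
    """True if every character is a BMP character other than a surrogate."""
    return all(ord(c) <= 0xFFFF and not 0xD800 <= ord(c) <= 0xDFFF for c in text)

bmp_char = "\u263a"      # WHITE SMILING FACE, inside the BMP
emoji = "\U0001F601"     # GRINNING FACE WITH SMILING EYES, outside the BMP

for ch in (bmp_char, emoji):
    units = [f"{u:04X}" for u in utf16_code_units(ch)]
    print(f"U+{ord(ch):04X}", units, "UCS-2 ok" if fits_in_ucs2(ch) else "needs UTF-16")
```

A receiver that treats the payload strictly as UCS-2 would see the two surrogate code units as two unassigned values rather than as one character, which is why the choice of label in the standard matters.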
Re: Greek Characters Duplicated as Latin (was: Sanskrit nasalized L)
Richard Wordingham richard.wording...@ntlworld.com wrote: U+0278 LATIN SMALL LETTER PHI is for IPA usage, and, unlike Greek, always has an ascender. For linguistic Greek usage, the two variants are considered equivalent. This is not the case in maths, where the two variants are clearly distinct. That's why (La)TeX preserves a distinction between \phi (with an ascender, in fact drawn with a separate stroke on top of a circle, also typically used in linguistic Greek for the non-cursive style of books) and \varphi (without the ascender, in fact wholly drawn with a single self-intersecting curved stroke, also typically used in linguistic Greek for more cursive styles, either handwritten or, in books and even in monospaced fonts, for italic styles). Note that both variants exist in roman/upright and italic/slanted styles. You'll immediately see that the variant with the ascender is not favored in linguistic uses for the italic style, because it comes too close to a slashed lowercase Greek omicron (or Latin/Cyrillic o). It is also easily confused with the notation for the empty set, so the \phi variant of LaTeX is avoided in most formulas in favor of \varphi (unless there's a real need to use both distinctly in the same article text). 
But these \phi and \varphi variants are generally not distinguished, except (once again) in some mathematical formulas that need a rich set of variables, or need a convention to distinguish operands from operators, or scalars, vectors, tensors, torsors, fields, differentiators and so on (or variables belonging to distinct definition domains, or to dual sets). The same reasons explain why there are similar distinctions for all basic Latin letters (and digits, as well as Hebrew letters) between italic, bold, serif, sans-serif, and monospaced styles, with additional code points defined as symbols rather than letters. These preserve the semantic distinctions needed in formulas, but are not needed for normal linguistic orthographies, which should always avoid these symbols. -- Philippe.
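The symbol-versus-letter split mentioned above is visible in the character data: the styled mathematical code points carry compatibility decompositions to the plain letters, so compatibility normalization erases the distinction that formulas rely on. This is a minimal sketch assuming Python 3's standard `unicodedata` module; the helper name `fold` is made up, and which of U+03C6 and U+03D5 a given TeX font uses for \phi or \varphi is not something this sketch asserts.

```python
import unicodedata

samples = [
    "\u03c6",      # GREEK SMALL LETTER PHI
    "\u03d5",      # GREEK PHI SYMBOL
    "\U0001d44e",  # MATHEMATICAL ITALIC SMALL A
    "\U0001d41a",  # MATHEMATICAL BOLD SMALL A
]

def fold(ch):
    """Apply compatibility normalization (NFKC) to a single character."""
    return unicodedata.normalize("NFKC", ch)

for ch in samples:
    folded = fold(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}",
          "->", f"U+{ord(folded):04X} {unicodedata.name(folded)}")
```

Text meant as ordinary orthography survives NFKC unchanged, whereas a formula that relies on the styled symbols loses information, which matches the advice to keep those symbols out of normal text.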
Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0
Philippe Verdy wrote: If there are other mappings to do with other standards, and those standards must be only informative, we already have the /MAPPINGS directory beside the /UNIDATA directory where the UCD belongs too. But in general, with the exception of MAPPINGS/VENDORS/MISC/SGML.TXT, the MAPPINGS directory isn't really a place for character *name* mappings. It's primarily a place for *code point* mappings, for identifying U+0430 CYRILLIC SMALL LETTER A with 0xD0 in ISO 8859-5, and 0xE0 in Windows-1251, and 0xE0 in MacCyrillic, and 0xC1 in KOI8-R. Character names in other standards, like 'acy' for U+0430, are comparatively less important. -- Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14 www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell
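A code point mapping of the kind described here can be illustrated with the codecs that ship with Python 3, which are assumed (not verified here) to follow the same tables as the files under MAPPINGS. This is only a sketch; the helper name `legacy_bytes` and the `CODECS` list are made up for illustration.

```python
# Python codec names for ISO 8859-5, Windows-1251, MacCyrillic and KOI8-R.
CODECS = ["iso8859_5", "cp1251", "mac_cyrillic", "koi8_r"]

def legacy_bytes(ch):
    """Map one character to its single-byte value in each legacy encoding."""
    return {name: ch.encode(name)[0] for name in CODECS}

# U+0430 is the small letter; U+0410 is the capital, shown for comparison.
# In Windows-1251 the small letter is 0xE0, while 0xC0 is the capital.
for ch in ("\u0430", "\u0410"):
    row = ", ".join(f"{name}=0x{b:02X}" for name, b in legacy_bytes(ch).items())
    print(f"U+{ord(ch):04X}: {row}")
```

Each table is a plain code point to byte correspondence with no character names involved, which is the distinction being drawn between MAPPINGS and NameAliases.txt.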
Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0
On 8/28/2011 9:46 PM, Doug Ewell wrote: Philippe Verdy wrote: If there are other mappings to do with other standards, and those standards must be only informative, we already have the /MAPPINGS directory beside the /UNIDATA directory where the UCD belongs too. But in general, with the exception of MAPPINGS/VENDORS/MISC/SGML.TXT, the MAPPINGS directory isn't really a place for character *name* mappings. It's primarily a place for *code point* mappings, for identifying U+0430 CYRILLIC SMALL LETTER A with 0xD0 in ISO 8859-5, and 0xE0 in Windows-1251, and 0xE0 in MacCyrillic, and 0xC1 in KOI8-R. Character names in other standards, like 'acy' for U+0430, are comparatively less important. Right, however NAME mapping has not been a major issue - except for control codes, since Unicode did not name these, even though they were routinely named by people dealing with them. It's really important not to jump off the deep end and appear to create a precedent for name MAPPING across standards when what is desired is to have IDENTIFIERS for certain characters as well as SHORT IDENTIFIERS for characters very commonly referred to by identifier in source code (regular expressions, etc.). A./
Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0
On 8/28/2011 6:43 PM, Philippe Verdy wrote: 2011/8/27 Asmus Freytag asm...@ix.netcom.com: I also think that the status field iso6429 is badly named. It should be control, and what is named control should be control-alternate, or perhaps, both of these groups should become simply control. I think the labels chosen by the data file just set up bad precedents. If 6429, why not a section for 9535 (or whatever the kbd standard is) etc. Thanks a lot for admitting what I was trying to demonstrate to you in a prior message (which was dismissed early as a complete non-starter). You appeared to be making a non-starter proposal, rather than clearly making a hypothetical proposal designed only to showcase certain logical flaws in the PRI. If the latter was your intention, well, we misunderstood you, but everybody seems to be on the same page, which is good. I also think that there are too many aliases for controls, if the only need is for Perl to have a name to uniquely designate those controls. Choose one alias name, but there's absolutely no emergency for now for adding four aliases at once for them, when there's no demonstration that all those aliases are needed! This is just unnecessary pollution of the UCS namespace. I tend to agree - however, I do think giving the common abbreviations some formal status is useful. If I remember correctly, even in Perl there were some names that are legacy names. If programs other than Perl have an active need to support legacy names, then I would favor adding these one-by-one as demonstrated needs arise, but NOT wholesale, just because they existed in 6429 in some version. Now, here's a subtle point: adding certain alias strings to the file is a cheap way for the editing tools that verify the uniqueness of the namespace to reserve a name (so it can't ever be given to a different character). Kind of like what happened to BELL. 
I bet a big motivation behind the long list (all for control codes) was to prevent any non-control code from ever getting a name that happens to match a known control code name. While I appreciate that sentiment, I think this part of the proposal should not be rushed - aliases are forever, and warehousing all known obsolete names for control codes is a bit bizarre. I think you and I are possibly in agreement on that. If there are other mappings ... I've replied on the issue of mappings in reply to Doug's message. A./
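The namespace point can be seen from a consumer's side. The sketch below assumes Python 3.3 or later, whose `unicodedata.lookup` is documented to accept name aliases, and a bundled Unicode version of 6.1 or later; the helper name `formal_name` is made up, and the specific alias strings used (LINE FEED, ALERT) are taken from later published versions of NameAliases.txt, not from the PRI text under discussion.

```python
import unicodedata

def formal_name(ch):
    """Return the formal character name, or None if the character has none."""
    return unicodedata.name(ch, None)

# Control codes have no formal name, so only an alias can identify them.
print(formal_name("\n"))
print(hex(ord(unicodedata.lookup("LINE FEED"))))

# BELL is the formal name of an emoji, so U+0007 needs a different alias.
print(hex(ord(unicodedata.lookup("BELL"))))
print(hex(ord(unicodedata.lookup("ALERT"))))
```

Because names and aliases share one namespace, a lookup by string resolves to exactly one character, which is the uniqueness property the alias list is meant to protect.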