Re: Hot Beverage font
My objection was not that the mail was about one character; that is fine, and the announcement itself was welcome. I was objecting to the length of the mail and to what I thought were unnecessary details. Is there a reason to expect a TTF not to work in the scenarios described? I simply suggested that we not see an email about availability character by character. The other font developers make infrequent announcements about substantive collections of characters, and I just wanted to establish that perspective if he was going to work on more characters. William makes some interesting points from time to time, but it is difficult to read through all the (I think irrelevant) details to find them. If his mails were completely uninteresting, I would just delete them and it wouldn't be an issue at all. It probably didn't help that I was catching up on my Unicode email and wading through 200 or so other mails on the list at the same time. Hope that's clearer. tex [EMAIL PROTECTED] wrote: one down, 95000+ to go. Can we not have a detailed mail for each character describing 3 places it was used and "it looks good to me"? I'm curious whether you would have sent the same message if Michael Everson had sent a message about one character. We've had threads on this list about one character before. Sure, if every character gets a message like this, it will get tedious, but messages like this certainly aren't off-topic. That was the most productive message William Overtoning has ever sent to the list, so let's not jump all over him for it. -- Tex Texin, Xen Master, http://www.i18nGuy.com, XenCraft http://www.XenCraft.com
Re: Hot Beverage font
William's Hot Beverage glyph is actually quite a good interpretation of the character, and it displays well at all point sizes. Perhaps he could add a glyph for the Hot Pizza character (U+2668) whilst he's on a roll. But why is the Hot Beverage character listed under the heading "Weather Symbol" in the Miscellaneous Symbols code chart? Does it rain tea and coffee in North Korea? Or does the annotation "can be used to indicate a wait" imply "Oh look, it's raining again ... let's go inside and have a nice cup of tea while we wait for the sun to come out" (Korean translation forthcoming)? Andrew
CJK Unified Ideographs Range
I've asked this question before, but I've never had a satisfactory response, so I'll ask it again now that Unicode 4 is due to be released soon. Section 10.1 of the Unicode Standard, as well as Blocks-4.0.0.txt, gives the range of the CJK Unified Ideographs block as U+4E00 through U+9FFF, whereas the top of the CJK Unified Ideographs code chart clearly states "Range: 4E00-9FAF" and does not show the columns 9FB0-9FBF, 9FC0-9FCF, 9FD0-9FDF, 9FE0-9FEF and 9FF0-9FFF. Is there a reason for this discrepancy? Given that new CJK unified ideographs are added to supplementary CJK blocks (CJK-A, CJK-B and CJK-C), and I understand that no more characters are intended to be added to the basic CJK block, why then are U+9FB0 through U+9FFF reserved for the CJK Unified Ideographs block? Surely these eighty code points would be better utilised if freed for use by new scripts. Andrew
Wrong Charakter Categories (was: Hot Beverage font)
Hello. But why is the Hot Beverage character listed under the heading "Weather Symbol" in the Miscellaneous Symbols code chart? This is by far not the only place where the category in the character description is simply wrong, or has gone wrong through the introduction of new characters that don't fit. Especially in the charts that were already pretty full, new characters often have no place under the category they would fit, so the charts become more and more mixed up (e.g. the new Arabic presentation form is not a currency symbol). I know there is no way to avoid this (nothing worse than a reordering could be done to an ongoing standard), but the category names can (and I think should) be revised. Adding even more categories is no solution (we would end up with each character in its own category); instead we should find new category names that better fit a fair number of characters. Best regards, Dominikus -- Dominikus Scherkl, Senior Developer, Glück Kanja Technology AG, Offenbach, Germany, http://www.glueckkanja.com
Unicode keyboard layouts oddity in OS X 10.2.4
Greetings. I have created several Unicode keyboard layouts for OS X 10.2.x (available at http://quinon.com/files/keylayouts/). Usually I have two of them activated: LatinTL and ArabicQWERTY. After updating to OS X 10.2.4, the Unicode keyboard layouts checked in the Input Menu tab of International Preferences do not stick anymore; i.e. with each restart, they vanish from the Flag menu and become unchecked in Input Menu. My settings in the Input Menu tab had not always been retained even before 10.2.4: sometimes one of the checked keyboard layouts vanished or was replaced with another, e.g. my ArabicQWERTY was replaced with Apple's Arabic. But these glitches were not always reproducible, at least with 10.2.1-10.2.3. Now even common keyboard layouts such as Unicode Hex Input do not seem to stick. I have not tested extensively with Apple keyboard layouts, though. Of course, I suspected my system installation, so I did a clean install of OS X 10.2 on another partition and created a new user. I have not tested with each updater, but OS X 10.2.1 retains the Unicode keyboard layouts I have chosen, whereas 10.2.4 does not. Is this a bug? Or is something wrong with my keyboard layouts? Yusuke Kinoshita
Re: Ångström symbol
Doug Ewell wrote: As Stefan Persson already observed, U+212B ANGSTROM SIGN (Å) exists in Unicode alongside U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE (Å) only because both characters were present in some legacy character set with which Unicode had to maintain round-trip compatibility. Does anyone know which legacy character set we're talking about? I can only think of character sets that include one of them. Stefan
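Whatever the legacy source turns out to be, the Unicode Character Database records U+212B with a canonical singleton decomposition to U+00C5, so every normalization form replaces the Angstrom sign with the letter. A short sketch using Python's standard `unicodedata` module (the helper name `describe` is hypothetical); it illustrates the equivalence but does not answer which legacy set motivated the duplicate.

```python
import unicodedata

ANGSTROM_SIGN = "\u212B"  # U+212B ANGSTROM SIGN
A_RING = "\u00C5"         # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# A decomposition with no <tag> is canonical; a single code point makes it
# a singleton, which normalization always replaces.
print(describe(ANGSTROM_SIGN), "decomposes to", unicodedata.decomposition(ANGSTROM_SIGN))
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, ANGSTROM_SIGN)
    print(form, ", ".join(describe(c) for c in normalized))
```

NFC yields U+00C5 and NFD yields A plus U+030A COMBINING RING ABOVE, so a normalizing process never preserves the distinction.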
PS glyph `phi' vs `phi1'
In the file U0370.pdf, describing Unicode 3.2, I find the following:
03C6 GREEK SMALL LETTER PHI
 • the ordinary Greek letter, showing considerable glyph variation
 • in mathematical contexts, the loopy glyph is preferred, to contrast with 03D5
03D5 GREEK PHI SYMBOL
 • used as a technical symbol, with a stroked glyph
 • maps to phi1 symbol entities
Looking into Adobe's `Symbol' font (version 001.007, coming with Acrobat Reader 4), I see exactly the opposite: `phi1' is the loopy glyph, and `phi' is the stroked variant. Either the Unicode charts are incorrect, or `phi1' doesn't denote an Adobe glyph name, or the Symbol font is wrong, or ... Please clarify. Werner
Re: RFC, 5-6 octets sequence in UTF8, non short form in UTF8
Yung-Fong Tang ftang at netscape dot com wrote: I read RFC 2279 again (http://www.cis.ohio-state.edu/cs/Services/rfc/rfc-text/rfc2279.txt). 1. I cannot find any text in it mentioning that the non-shortest form is invalid UTF-8, and First, we've already established that a revision to RFC 2279 is in the works. That said, the existing RFC 2279 says the following: "Encoding from UCS-4 to UTF-8 proceeds as follows: 1) Determine the number of octets required from the character value and the first column of the table above. It is important to note that the rows of the table are mutually exclusive, i.e. there is only one valid way to encode a given UCS-4 character." The phrase "only one valid way" makes it very clear, at least to me, that non-shortest forms are invalid. And in the Security Considerations section, overlong sequences are referred to as "illegal UTF-8 sequences". This has not changed in the draft replacement, probably because it is already sufficient. 3. It mentions how to encode a surrogate pair in UTF-8, but it does not say that UTF-8 sequences mapping directly to high and low surrogates are illegal. Again, from RFC 2279: "UTF-16 is a scheme for transforming a subset of the UCS-4 repertoire into pairs of UCS-2 values from a reserved range. UTF-16 impacts UTF-8 in that UCS-2 values from the reserved range must be treated specially in the UTF-8 transformation." and again: "The algorithm for encoding UCS-2 (or Unicode) to UTF-8 can be obtained from the above, in principle, by simply extending each UCS-2 character with two zero-valued octets. However, pairs of UCS-2 values between D800 and DFFF (surrogate pairs in Unicode parlance), being actually UCS-4 characters transformed through UTF-16, need special treatment: the UTF-16 transformation must be undone, yielding a UCS-4 character that is then transformed as above." It's pretty hard to read these paragraphs and come away with the impression that it's OK to map directly between UTF-8 and UTF-16 code units. 
Only by ignoring the existence of UTF-16 and these passages in RFC 2279, and treating every 16-bit code unit as a character (as some database vendors evidently did), would this even be necessary. The only shortcoming in the RFC is that it doesn't use the word "illegal" to describe this. The draft replacement adds the following, which should remove all doubt: "The definition of UTF-8 prohibits encoding character numbers between U+D800 and U+DFFF, which are reserved for use with the UTF-16 encoding form (as surrogate pairs) and do not directly represent characters. When encoding in UTF-8 from UTF-16 data, it is necessary to first decode the UTF-16 data to obtain character numbers, which are then encoded in UTF-8 as described above." Side note: I'm a little disappointed that the draft replacement goes on to include a description of CESU-8, which is basically a perversion of UTF-8 for processes that are ignorant of UTF-16, and which the RFC later (and correctly) refers to as a "naive implementation". CESU-8 is best kept in a dark closet and used internally only by processes that have no choice, and not publicized any more than necessary. -Doug Ewell Fullerton, California
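Doug's two points, that non-shortest forms are invalid and that surrogate code points must never be encoded directly, can be illustrated with a minimal Python sketch (Python 3.9+ assumed; all function names are hypothetical). The strict decoder rejects both kinds of sequence; the two encoders contrast undoing UTF-16 first, as the draft replacement requires, with the naive per-code-unit mapping that produces CESU-8. One assumption goes beyond the passages quoted here: like the Unicode Standard, the sketch accepts at most four bytes per character and nothing above U+10FFFF, whereas RFC 2279 still describes 5- and 6-octet forms.

```python
# Smallest code point that legitimately needs each sequence length;
# anything smaller is a non-shortest ("overlong") form.
_MIN_FOR_LENGTH = {2: 0x80, 3: 0x800, 4: 0x10000}


def decode_utf8_strict(data: bytes) -> list[int]:
    """Decode UTF-8 to code points, raising ValueError on overlong forms,
    encoded surrogates, malformed bytes, or values above U+10FFFF."""
    result = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, cp = 1, lead
        elif 0xC0 <= lead <= 0xDF:
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead <= 0xF7:
            length, cp = 4, lead & 0x07
        else:
            # Stray continuation bytes, and the 5-/6-octet leads (0xF8-0xFD)
            # that this sketch deliberately does not accept.
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if length > 1 and cp < _MIN_FOR_LENGTH[length]:
            raise ValueError(f"non-shortest form of U+{cp:04X} at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"encoded surrogate U+{cp:04X} at offset {i}")
        if cp > 0x10FFFF:
            raise ValueError(f"U+{cp:X} is beyond U+10FFFF at offset {i}")
        result.append(cp)
        i += length
    return result


def encode_utf8(cp: int) -> bytes:
    """Shortest-form UTF-8 bytes for one value (performs no validity checks)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def utf16_to_utf8(units: list[int]) -> bytes:
    """Undo UTF-16 first (join each surrogate pair into one code point),
    then encode the resulting code points as UTF-8."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            raise ValueError(f"unpaired surrogate 0x{u:04X} at index {i}")
        else:
            cp = u
            i += 1
        out += encode_utf8(cp)
    return bytes(out)


def utf16_to_cesu8(units: list[int]) -> bytes:
    """The naive mapping: every 16-bit unit, surrogates included, is encoded
    on its own. The result is CESU-8, not valid UTF-8."""
    return b"".join(encode_utf8(u) for u in units)


# U+10400 is the surrogate pair D801 DC00 in UTF-16.
pair = [0xD801, 0xDC00]
print(utf16_to_utf8(pair).hex(" "))    # f0 90 90 80
print(utf16_to_cesu8(pair).hex(" "))   # ed a0 81 ed b0 80
```

Feeding the CESU-8 output back into `decode_utf8_strict` raises on the first encoded surrogate, which is exactly the rejection the draft's wording calls for.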
Re: Wrong Charakter Categories (was: Hot Beverage font)
At 12:57 PM 2/19/03 +0100, Dominikus Scherkl wrote: Hello. But why is the Hot Beverage character listed under the heading "Weather Symbol" in the Miscellaneous Symbols code chart? This is by far not the only place where the category in the character description is simply wrong - or gone wrong by the introduction of new characters which doesn't fit. If you have issues with the Unicode BETA charts that you would like to see addressed, please follow the instructions on http://www.unicode.org/versions/beta.html about providing beta feedback. Nobody monitors this list for the purpose of extracting feedback buried in the general discussions. Especially in the charts which already were pretty full new characters often have no place under the category they would fit - the charts become more and more mixed up. (e.g. the new Arabic presentation form is no currency symbol) I knew, there is no way to avoid this (nothing worse than an re-ordering can be done to an ongoing standard), but the category-names can (and I think should) be revised. In fact, they will be. The names list file that was used for the beta code chart was machine-merged from the Unicode 3.2 nameslist plus the list of proposed new characters. The tool does a good job merging blocks and characters (since they have a code position or range that gives them a fixed location in the list), but category headers and general comments (the ones in italics in the list) don't always get merged correctly, or the fact that the new characters interrupt an existing category is not apparent when we work with the list of proposed characters. The final nameslist will be machine-merged from a different source of data. It will take the character codes, names and decompositions from the Unidata.txt file in the Unicode Character Database, and all the other information from an 'annotation' file, from which the correct annotations will be inserted into the nameslist at the correct place. 
In the process of preparing the annotation file we review the information and add subheaders and comments and make other changes. To see the best available state of the nameslist, look at http://www.unicode.org/Public/4.0-Update/NamesList-4.0.0.txt where '' are some letters that change with each beta draft level. [I just looked, the new file is not there yet, but will be in a few days.] That file is the plain text file that drives the charts generator. If you see headers still missing in that file when it comes out, you might want to send an official beta comment and we'll fix it. [We will not republish the PDF beta charts before they are final, since that's a very time consuming process.] Asmus Freytag Technical Vice President The Unicode Consortium
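As Asmus describes it, the category headers are positional: a subheader applies to every entry after it until the next one, so a character merged into the middle of a run silently inherits whatever heading precedes it. A minimal Python sketch of that rule, assuming a simplified version of the NamesList.txt layout (the `@@`/`@+`/`@` conventions below are a simplification, the function name is hypothetical, and the two excerpts are illustrations modeled on this thread, not copies of any released file).

```python
import string


def subheader_for(nameslist_lines, target):
    """Return the '@' subheader in force at the entry for code point `target`.

    Simplified layout assumed here: '@@' opens a block (resetting the
    subheader), '@+' introduces a block-level note, '@' sets a subheader,
    tab-indented lines are annotations, other lines start with a hex code."""
    current = None
    for line in nameslist_lines:
        if line.startswith("@@"):
            current = None
        elif line.startswith("@+"):
            continue
        elif line.startswith("@"):
            current = line[1:].strip()
        elif line and not line[0].isspace():
            code = line.split("\t", 1)[0]
            if code and all(c in string.hexdigits for c in code):
                if int(code, 16) == target:
                    return current
    return None


# Hypothetical excerpts: the beta-style merge vs. a corrected list with
# an extra subheader inserted before U+2615.
BETA = [
    "@@\t2600\tMiscellaneous Symbols\t26FF",
    "@\t\tWeather symbol",
    "2614\tUMBRELLA WITH RAIN DROPS",
    "2615\tHOT BEVERAGE",
]
FINAL = [
    "@@\t2600\tMiscellaneous Symbols\t26FF",
    "@\t\tWeather symbol",
    "2614\tUMBRELLA WITH RAIN DROPS",
    "@\t\tMiscellaneous symbol",
    "2615\tHOT BEVERAGE",
]

print(subheader_for(BETA, 0x2615))
print(subheader_for(FINAL, 0x2615))
```

The only difference between the two lists is one inserted header line, which is why the fix belongs in the annotation data rather than in the character entries themselves.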
Re: Unicode keyboard layouts oddity in OS X 10.2.4
There are two problems we have seen with keyboard preferences. 1. Bringing up the force-quit dialog (command-option-escape) can sometimes disable keyboards in ~/Library/Keyboard Layouts. This can be worked around by moving them to /Library/Keyboard Layouts. Please let me know if this is part of the problem. 2. Sometimes other keyboards will not remain enabled over logoff/logon, even if they are not in ~/Library/Keyboard Layouts. Please run the following in Terminal: defaults read com.apple.HIToolbox "Keyboard Menu" The normal result is: The domain/default pair of (com.apple.HIToolbox, Keyboard Menu) does not exist If you get a different response, please contact me by private e-mail. Thanks, Deborah Goldsmith, Manager, Fonts & Unicode, Apple Computer, Inc. On Wednesday, February 19, 2003, at 05:34 AM, Kino wrote: [...] After updating to OS X 10.2.4, Unicode keyboard layouts checked in Input Menu tag of Internet Preferences do not stick anymore. [...] Is this a bug? Or something is wrong with my keyboard layouts? Yusuke Kinoshita
Re: [OpenType] PS glyph `phi' vs `phi1'
From: Barbara Beeton
Subject: re: [OpenType] PS glyph `phi' vs `phi1'
Date: Wed, 19 Feb 2003 11:56:03 -0500 (EST)
[Dear Barbara, I took the liberty of quoting your message almost completely while CCing the opentype and unicode lists.] the shapes of the two `phi's haven't changed since unicode 2.0; the change for unicode 3.2 is in the additional text. the naming in unicode of 03D5 as a symbol is the unicode technical committee's convention for indicating an established variant that we have to include. while i disagree with the designation of 03D5 as a symbol to the exclusion of 03C6 (resulting in the note "in mathematical contexts ..."), the fact that both shapes already existed in unicode meant that they shouldn't be switched, since they had presumably been used in documents whose meaning could be corrupted thereby. i have to regard the unicode use as correct regarding codes and shapes. there *could* be an error in the annotations; i'm not familiar with the name phi1. the only entity names i know are these:
 - isogrk3:
   - phis = straight phi
   - phiv = curly or open phi
 - isogrk1:
   - phgr = small phi, greek (shown as a curly phi)
   - there is no straight phi in this entity set
unlike the main unicode names (which can't be changed -- a rule that ensures that iso 10646 will be identical to the relevant subset of unicode), the annotations can be changed, so i will forward your query to my contacts on the utc. Thanks. As a conclusion, it seems that both Adobe's mapping of U+03D5 and U+03C6 to glyph names and the Unicode annotation for U+03D5 are incorrect (in case backwards compatibility is of importance). The right mapping should be: phi = 03D5, phi1 = 03C6. Werner
Re: Hot Beverage font
I know y'all are having fun with this thread, but in case Andrew's inquiry is at least half-serious: But why is the Hot Beverage character listed under the heading "Weather Symbol" in the Miscellaneous Symbols code chart? Does it rain tea and coffee in North Korea? Or does the annotation "can be used to indicate a wait" imply "Oh look, it's raining again ... let's go inside and have a nice cup of tea while we wait for the sun to come out" (Korean translation forthcoming)? It won't be listed under the heading "Weather symbol" in the final charts, but instead under "Miscellaneous symbol". The current charts are a beta production, based on preliminary name list annotations derived from the WG2 meeting last December in Tokyo. The editorial committee is busy improving the name list annotations -- and eventually an improved set of charts, with many fixes, will be posted for your delectation. In case it is raining, you can sit and have a cup of coffee (or tea) while you wait for them. --Ken
Re: Unicode keyboard layouts oddity in OS X 10.2.4
Thank you very much for your prompt reply. On Thursday, Feb 20, 2003, at 03:50 Asia/Tokyo, Deborah Goldsmith wrote: There are two problems we have seen with keyboard preferences. 1. Bringing up the force-quit dialog (command-option-escape) can sometimes disable keyboards in ~/Library/Keyboard Layouts. This can be worked around by moving them to /Library/Keyboard Layouts. Please let me know if this is part of the problem. I have never noticed it. BTW, 2. Sometimes other keyboards will not remain enabled over logoff/logon, even if they are not in ~/Library/Keyboard Layouts. After logoff/login, my custom keyboard layouts are not lost, though Arabic QWERTY is often replaced by Arabic. But after a restart, they vanish from the Flag menu and become unchecked in the Input Menu tab of International Preferences. Please do the following in Terminal: defaults read com.apple.HIToolbox "Keyboard Menu" The normal result is: The domain/default pair of (com.apple.HIToolbox, Keyboard Menu) does not exist I got the normal result. So you have not experienced a similar problem with 10.2.4? At first, I thought it was a problem with my own setup; I have installed a lot of software, some of it uncommon. So I had been struggling to fix the oddity by all conceivable means: trashing ~/Library/Preferences/com.apple.HIToolbox.plist, ~/Library/Preferences/ByHost/com.apple.HIToolbox.00039394fd48.plist and the files under the Caches folders; repairing permissions; installing 10.2 and the Combo updater to 10.2.4 on another partition. Nothing has worked for me. But yesterday, on another list, I read a posting that *seems* to complain about the same problem: http://listserv.dartmouth.edu/scripts/wa.exe?A2=ind0302&L=nisus&T=0&F=&S=&P=18572 If you read the other messages in the same thread, you'll notice that the others do not seem to have the problem. If I'm not mistaken, the author of that message is using my Latin TL and AsianExtended, created by Nobumi Iyanaga.
http://www.bekkoame.ne.jp/~n-iyanag/researchTools/asianextended.html Both Latin TL and AsianExtended have the same structure, because both were created by modifying U.S. Extended. So I thought something might be wrong with my keylayout files, though Console has not reported a single error. Could this kind of oddity be caused by inappropriate owner/permission settings? If so, what is the appropriate setting? In "About the Mac OS X 10.2.4 Update" (http://docs.info.apple.com/article.html?artnum=107362), it is written that the update "Addresses an issue in which the Web browser selection could unexpectedly change to a different browser after updating your default browser." Does this fix have something to do with my problem? Another possibility: could this oddity occur only on specific Mac models? I'm running OS X 10.2.4 English International on a Power Mac G4 dual 1 GHz MDD. It's very late, almost morning here in Japan. Good night, good day. Yusuke Kinoshita
A new font called Gentium
Sharing with you a message I received today from a friend. How good is Gentium, and can it be used on a Mac? Has anyone put it through all its paces (punctum delens, etc.)? mg = Dear colleagues, Just thought I'd share a discovery about a new font called Gentium, which is excellent for diacritics. It supports a wide range of Latin-based alphabets and includes glyphs that correspond to all the Latin ranges of Unicode. It can be downloaded for free from http://www.sil.org/~gaultney/gentium/index.html and used like any other font in Microsoft Word etc. With Gentium you can even place a dot / punctum delens over consonants, which is a godsend to students of Old Irish. Another thing I learnt recently is that in Microsoft Word for Windows 97-2000, a much more painless way than trawling the Symbol box for letters with diacritics is to install a freeware add-on called UNIQODER. This adds two menus to the menu bar, which make entering Unicode characters much easier. It is available from http://hem.fyristorg.com/dahloe/uniqoder/ [...] -- Marion Gunn * EGT (Estab. 1991) * http://www.egt.ie
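For anyone testing a font against Old Irish text: a dot above a consonant can reach the font either as a precomposed letter or as the base letter followed by U+0307 COMBINING DOT ABOVE, and NFC picks the precomposed form when Unicode has one. A small sketch using Python's standard `unicodedata` (the function name is hypothetical) shows the NFC result for a few consonants; it says nothing about how Gentium itself renders either form.

```python
import unicodedata

COMBINING_DOT_ABOVE = "\u0307"


def with_dot_above(consonant):
    """Attach U+0307 and return the NFC form: a single precomposed character
    when Unicode has one, otherwise base letter plus combining mark."""
    return unicodedata.normalize("NFC", consonant + COMBINING_DOT_ABOVE)


for c in "bcdfgmpst":
    dotted = with_dot_above(c)
    codes = " ".join(f"U+{ord(ch):04X}" for ch in dotted)
    print(f"{c} -> {dotted} ({codes})")
```

A font that covers only the precomposed letters can still fail on text that keeps the combining mark, so it is worth checking both forms.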
Re: CJK Unified Ideographs Range
Andrew asked: I've asked this question before, but I've never had a satisfactory response, so I'll ask it again now that Unicode 4 is due to be released soon. Section 10.1 of the Unicode Standard, as well as Blocks-4.0.0.txt, gives the range of the CJK Unified Ideographs block as U+4E00 through U+9FFF, whereas the top of the CJK Unified Ideographs code chart clearly states "Range: 4E00-9FAF" and does not show the columns 9FB0-9FBF, 9FC0-9FCF, 9FD0-9FDF, 9FE0-9FEF and 9FF0-9FFF. Is there a reason for this discrepancy? Given that new CJK unified ideographs are added to supplementary CJK blocks (CJK-A, CJK-B and CJK-C), and I understand that no more characters are intended to be added to the basic CJK block, why then are U+9FB0 through U+9FFF reserved for the CJK Unified Ideographs block? Surely these eighty code points would be better utilised if freed for use by new scripts. The UTC dealt with this issue of block boundaries back in October 2001, in the context of the review of Blocks.txt for Unicode 3.2. There is mention of this issue and the changes made in Article VII of UAX #28, Unicode 3.2. In particular, the inconsistency in block ending range handling for CJK Unified Ideographs versus the Hangul and Extension A and Extension B blocks was resolved in favor of ending each block on a round hex boundary, i.e. at XXXF, regardless of whether that was the last character in the block or not. The extra space of reserved code points in the CJK Unified Ideographs block is an artifact of block decisions made way back in 1992, well before the BMP looked as tight as it does now. In case you are interested, the particular anomaly regarding the end of the CJK Unified Ideographs block versus the header printed in the code charts is just one of thirteen different types of anomalies that I analyzed and reported on for the 2001 UTC discussion. Below is the relevant excerpt. 
--Ken <quote from L2/01-412> Title: Response to L2/01-419 Block Boundary Fixes Author: Ken Whistler Date: October 30, 2001 Mark Davis has suggested a number of fixes to Blocks.txt, to eliminate some inconsistencies and to try to establish an invariant that all block boundaries end on an XXXF boundary. As usual, in all things Unicode-related, there are some worms (I'm not sure whether they should be considered big wriggly earthworms or just nematodes) in this can. So as a response to The Great Innovator (Mark), The Great Disinnovator (me) has assembled the analysis below of *all* anomalies in block names. These fall into 13 distinct types, for each of which I give a separate analysis and a suggested disposition. In some instances, I think Mark's suggestions are fine, but in other cases, I'd rather we left well-enough alone and abandoned the quest for the invariant. </quote from L2/01-412> By the way, I lost that particular argument. The UTC *did* decide to end all the blocks on an XXXF boundary, and that change was made for Unicode 3.2. Anyone wanting to examine the resultant changes in detail can compare http://www.unicode.org/Public/3.1-Update/Blocks-4.txt with http://www.unicode.org/Public/3.2-Update/Blocks-3.2.0.txt. What follows is my assessment of Anomaly Type #11, which was the one Andrew was referring to, describing the technical production reason for the way the header is constructed in NamesList.txt. <quote from L2/01-412> TYPE 11: Block ranges match in Unicode and 10646, for blocks with generated character names, but NamesList.txt shows a mismatched range.
  NamesList.txt:  4E00 CJK Unified Ideographs 9FA5
  Blocks.txt:     4E00..9FFF; CJK Unified Ideographs
  ISO/IEC 10646:  CJK UNIFIED IDEOGRAPHS 4E00-9FFF
Analysis: The range distinction in NamesList.txt is deliberate, to enable calculation of the cutoff point in the charts, where there are no actual character name entries in NamesList.txt to drive this. Suggested resolution: No action. </quote from L2/01-412>
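The XXXF invariant the UTC adopted is easy to check mechanically. A minimal Python sketch, assuming only the "4E00..9FFF; CJK Unified Ideographs" line format quoted in Type 11 (the three ranges are the Unicode 3.2 values for the CJK ideograph blocks; helper names are hypothetical). The last step rounds NamesList.txt's 9FA5 up to the end of its 16-code-point chart column, which gives the 9FAF Andrew saw on the chart; that the chart generator computes its cutoff this way is an inference, not something stated in the thread.

```python
import re

# Blocks.txt data lines: "start..end; Block Name"; '#' starts a comment.
BLOCK_LINE = re.compile(r"^([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6});\s*(.+)$")


def parse_blocks(lines):
    """Return (start, end, name) tuples for each block range found."""
    blocks = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        match = BLOCK_LINE.match(line)
        if match:
            start, end, name = match.groups()
            blocks.append((int(start, 16), int(end, 16), name.strip()))
    return blocks


def ends_on_xxxf(end):
    """True if the block ends on a code point whose last hex digit is F."""
    return end & 0xF == 0xF


def chart_column_end(last_char):
    """Last code point of the 16-code-point chart column holding last_char."""
    return last_char | 0xF


SAMPLE = [
    "# comment lines like this one are skipped",
    "3400..4DBF; CJK Unified Ideographs Extension A",
    "4E00..9FFF; CJK Unified Ideographs",
    "20000..2A6DF; CJK Unified Ideographs Extension B",
]

for start, end, name in parse_blocks(SAMPLE):
    print(f"{start:04X}..{end:04X} {name}: ends on XXXF = {ends_on_xxxf(end)}")

# Presumed chart cutoff (an inference): round 9FA5 up to its column end.
print(f"{chart_column_end(0x9FA5):04X}")   # 9FAF
print(0x9FFF - 0x9FB0 + 1)                 # 80 code points Andrew asked about
```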
Re: [OpenType] PS glyph `phi' vs `phi1'
Thanks. As a conclusion, it seems that both Adobe's mapping of U+03D5 and U+03C6 to glyph names and the Unicode annotation for U+03D5 are incorrect (in case backwards compatibility is of importance). The right mapping should be: phi = 03D5, phi1 = 03C6. I have to correct myself, fortunately. After looking into the printed version of Unicode 2.0, I see that the glyphs of 03D5 and 03C6 in the file U0370.pdf are exchanged. Your assumption is correct that the annotation in Unicode 3.2 is wrong. Werner
Re: [OpenType] PS glyph `phi' vs `phi1'
On Wednesday, February 19, 2003, at 04:13 PM, Werner LEMBERG wrote: I have to correct myself, fortunately. After looking into the printed version of Unicode 2.0, I see that the glyphs of 03D5 and 03C6 in the file U0370.pdf are exchanged. Your assumption is correct that the annotation in Unicode 3.2 is wrong. I'm sorry, but you've lost me here. The Unicode 3.2 text states: <quote> With Unicode 3.0 and the concurrent second edition of ISO/IEC 10646-1, the representative glyphs for U+03C6 GREEK SMALL LETTER PHI and U+03D5 GREEK PHI SYMBOL were swapped. In ordinary Greek text, the character U+03C6 is used exclusively, although this character has considerable glyphic variation, sometimes represented with a glyph more like the representative glyph shown for U+03C6 (the loopy form) and less often with a glyph more like the representative glyph shown for U+03D5 (the straight form). For mathematical and technical use, the straight form of the small phi is an important symbol and needs to be consistently distinguishable from the loopy form. The straight form phi glyph is used as the representative glyph for the symbol phi at U+03D5 to satisfy this distinction. The reversed assignment of representative glyphs in versions of the Unicode Standard prior to Unicode 3.0 had the problem that the character explicitly identified as the mathematical symbol did not have the straight form of the character that is the preferred glyph for that use. Furthermore, it made it unnecessarily difficult for general purpose fonts supporting ordinary Greek text to also add support for Greek letters used as mathematical symbols. This resulted from the fact that many of those fonts already used the loopy form glyph for U+03C6, as preferred for Greek body text; to support the phi symbol as well, they would have had to disrupt glyph choices already optimized for Greek text.
When mapping symbol sets or SGML entities to the Unicode Standard, it is important to make sure that codes or entities that require the straight form of the phi symbol be mapped to U+03D5 and not to U+03C6. Mapping to the latter should be reserved for codes or entities that represent the small phi as used in ordinary Greek text. Fonts used primarily for Greek text may use either glyph form for U+03C6, but fonts that also intend to support technical use of the Greek letters should use the loopy form to ensure appropriate contrast with the straight form used for U+03D5. </quote> What annotation in 3.2 do you feel is incorrect? == John H. Jenkins http://www.tejat.net/
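One practical consequence of the distinction John quotes: in the Unicode Character Database, U+03D5 carries only a compatibility decomposition to U+03C6, so canonical normalization (NFC/NFD) keeps the two apart while NFKC/NFKD fold the symbol into the ordinary letter and lose the straight/loopy contrast that mathematical text relies on. A sketch using Python's standard `unicodedata` module; note that Python ships a much later UCD than the versions discussed here, and these two entries are assumed not to have changed in the relevant respects.

```python
import unicodedata

PHI_LETTER = "\u03C6"  # GREEK SMALL LETTER PHI
PHI_SYMBOL = "\u03D5"  # GREEK PHI SYMBOL

for ch in (PHI_LETTER, PHI_SYMBOL):
    decomposition = unicodedata.decomposition(ch) or "(none)"
    nfc = unicodedata.normalize("NFC", ch)
    nfkc = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"decomposition {decomposition}; "
          f"NFC -> U+{ord(nfc):04X}; NFKC -> U+{ord(nfkc):04X}")
```

A process that applies NFKC to mathematical text would therefore turn every U+03D5 into U+03C6, whatever glyphs the fonts assign.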
Re: DBCS and Unicode 3.1
On Tue, 18 Feb 2003, Markus Scherer wrote: Jungshik Shin wrote: On Mon, 17 Feb 2003, Markus Scherer wrote: Other examples: There are EUC-JP (1/2/3 bytes per character) and EUC-CN (1/2/4 BpC), which are quite old (much older than GB 18030). Markus's fingers made a mistake here :-). It's EUC-TW (not EUC-CN) that encodes CNS 11643 plane 2(1) through plane 7 using SS2. MBCS. By the way, the encoding scheme for EUC-TW has space for 16 CNS planes, and some vendor implementations use planes higher than 7. Yup. BTW, EUC-KR also uses more than 2 bytes. 8-byte sequences can be used to represent the 8,822 precomposed modern Korean syllables not representable with 2 bytes in EUC-KR (ref. KS X 1001:1998/KS C 5601-1987 annex 2). So the full set of 11,172 precomposed syllables in Unicode can be round-tripped between Unicode and EUC-KR. This is used by the most popular web mail service in Korea (well, they should switch to UTF-8 instead of lengthening the life of EUC-KR this way) and is implemented in Mozilla/Netscape and in a variant of xterm for Korean (hanterm). Jungshik
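Jungshik's figures follow from the standard Hangul syllable arithmetic: 19 leading consonants x 21 vowels x 28 trailing options (including none) give 11,172 precomposed syllables, and 11,172 - 8,822 = 2,350 have their own 2-byte codes in KS X 1001. A Python sketch of that arithmetic using the Unicode Standard's Hangul decomposition constants (function names are hypothetical); it checks the counts only and does not build the 8-byte EUC-KR sequences, whose jamo table is not reproduced here.

```python
import unicodedata

# Hangul syllable constants from the Unicode Standard's decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT      # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT      # 11,172 precomposed syllables

KS_X_1001_SYLLABLES = 2350       # syllables with their own 2-byte EUC-KR code


def decompose_syllable(ch):
    """Split a precomposed syllable into (leading, vowel, trailing) indices."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError(f"U+{ord(ch):04X} is not a precomposed Hangul syllable")
    l_index, rest = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return l_index, v_index, t_index


def to_conjoining_jamo(ch):
    """Conjoining jamo for a syllable; t_index 0 means no trailing consonant."""
    l_index, v_index, t_index = decompose_syllable(ch)
    jamo = chr(L_BASE + l_index) + chr(V_BASE + v_index)
    if t_index:
        jamo += chr(T_BASE + t_index)
    return jamo


print(S_COUNT, S_COUNT - KS_X_1001_SYLLABLES)   # 11172 8822
last = "\uD7A3"                                  # last precomposed syllable
print(decompose_syllable(last))                  # (18, 20, 27)
print(to_conjoining_jamo(last) == unicodedata.normalize("NFD", last))  # True
```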