Re: Hot Beverage font.

2003-02-19 Thread Tex Texin
I was not concerned with the mail because it was about one character. That is
fine. The announcement itself was welcome.

I was objecting to the length of the mail and what I thought were unnecessary
details.
Is there a reason to expect a TTF not to work in the scenarios described?

I simply suggested that we not see an email about availability, character by
character.
The other font developers make infrequent announcements about substantive
collections of characters.
I just wanted to establish that perspective if he was going to work on more
characters. 

William makes some interesting points from time to time, but it is difficult
to read through all the (I think irrelevant) details to find them.
If his mails were completely uninteresting, I would just delete them and it
wouldn't be an issue at all.

It probably didn't help that I was catching up on my Unicode email and was wading
through 200 or so other mails on the list at the same time.
Hope that's clearer.

tex

[EMAIL PROTECTED] wrote:
 
 one down, 95000+ to go.
 
 Can we not have a detailed mail for each character describing 3 places it was
 used and it looks good to me?
 
 I'm curious if you would have sent the same message if Michael Everson had
 sent a message about one character. We've had threads on this list about one
 character before. Sure, if every character gets a message like this, it will
 get tedious, but messages like this certainly aren't off-topic. That
 was the most productive message William Overtoning has ever sent to the list,
 so let's not jump all over him for it.

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: Hot Beverage font

2003-02-19 Thread Andrew C. West
William's Hot Beverage glyph is actually quite a good interpretation of the
character, that displays well at all point sizes. Perhaps he could add a glyph
for the Hot Pizza character (U+2668) whilst he's on a roll.

But why is the Hot Beverage character listed under the heading "Weather Symbol"
in the Miscellaneous Symbols code chart ? Does it rain tea and coffee in North
Korea ? Or does the annotation "can be used to indicate a wait" imply "Oh look,
it's raining again ... let's go inside and have a nice cup of tea while we wait
for the sun to come out" (Korean translation forthcoming) ?

Andrew




CJK Unified Ideographs Range

2003-02-19 Thread Andrew C. West
I've asked this question before, but I've never had a satisfactory response, so
I'll ask it again now that Unicode 4 is due to be released soon.

Section 10.1 of the Unicode Standard, as well as Blocks-4.0.0.txt, give the
range of the CJK Unified Ideographs block as U+4E00 through U+9FFF, whereas at
the top of the CJK Unified Ideographs code chart it clearly states Range:
4E00–9FAF, and does not show the columns 9FB0-9FBF, 9FC0-9FCF, 9FD0-9FDF,
9FE0-9FEF and 9FF0-9FFF. Is there a reason for this discrepancy ?

Given that new CJK unified ideographs are added to supplementary CJK blocks
(CJK-A, CJK-B and CJK-C), and I understand that no more characters are intended
to be added to the basic CJK block, why then are U+9FB0 through U+9FFF reserved
for the CJK Unified Ideographs block ? Surely these eighty code points would be
better utilised if freed for use by new scripts.
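
The "eighty code points" figure can be checked with a little arithmetic. This is only a sketch using the two range endpoints quoted in the message; the function name is made up for illustration.

```python
# Endpoints taken from the message: the block is declared as 4E00..9FFF,
# while the code chart stops showing columns after 9FAF.
BLOCK_END = 0x9FFF
FIRST_UNSHOWN = 0x9FB0


def reserved_tail(first_unshown: int, block_end: int) -> int:
    """Count the code points from first_unshown through block_end, inclusive."""
    return block_end - first_unshown + 1


reserved = reserved_tail(FIRST_UNSHOWN, BLOCK_END)
columns = reserved // 16  # each chart column holds 16 code points

print(f"{reserved} code points in {columns} chart columns")
```

The five columns correspond to 9FB0-9FBF through 9FF0-9FFF as listed above.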

Andrew




Wrong Character Categories (was: Hot Beverage font)

2003-02-19 Thread Dominikus Scherkl
Hello.

 But why is the Hot Beverage character listed under the 
 heading Weather Symbol in the Miscellaneous Symbols
 code chart ?

This is by far not the only place where the category in
the character description is simply wrong - or has gone wrong
through the introduction of new characters which don't fit.

Especially in the charts which were already pretty full,
new characters often have no place under the category
they would fit - the charts become more and more mixed up
(e.g. the new Arabic presentation form is not a currency symbol).

I know there is no way to avoid this (nothing worse than
a re-ordering can be done to an ongoing standard), but
the category names can (and I think should) be revised.

It's no solution to add even more categories (we would end
up with each character being in its own category); instead we
should find new category names which better fit a fair number of
characters.

Best regards,

Dominikus
 
Dominikus Scherkl (mailto:[EMAIL PROTECTED])
Senior Developer
Glück & Kanja Technology AG
Christian-Pless-Str. 11-13, D-63069 Offenbach, Germany
Web http://www.glueckkanja.com




Unicode keyboard layouts oddity in OS X 10.2.4

2003-02-19 Thread Kino
Greetings

I have created several Unicode keyboard layouts for OS X 10.2.x which 
are available at
	http://quinon.com/files/keylayouts/
Usually I have activated two of them: LatinTL and ArabicQWERTY.

After updating to OS X 10.2.4, Unicode keyboard layouts checked in the
Input Menu tab of International Preferences do not stick anymore. I.e. with
each restart, they vanish from the Flag menu and become unchecked in Input
Menu.

My settings in the Input Menu tab of International Preferences have not always
been retained even before 10.2.4. Sometimes one of the checked keyboard
layouts vanished or was replaced with another, e.g. my ArabicQWERTY
replaced with Apple's Arabic. But these glitches were not always
reproducible, at least with 10.2.1-10.2.3.

Now, even common keyboard layouts such as Unicode Hex Input do not seem 
to stick. I have not tested extensively with Apple keyboard layouts 
though.

Of course, I suspected my system installation. So I clean installed OS 
X 10.2 on another partition and created a new user. I have not tested 
with each updater, but OS X 10.2.1 retains Unicode keyboard layouts I 
have chosen whereas 10.2.4 does not.

Is this a bug? Or is something wrong with my keyboard layouts?

Yusuke Kinoshita







Re: Ångström symbol

2003-02-19 Thread Stefan Persson
Doug Ewell wrote:


As Stefan Persson already observed, U+212B ANGSTROM SIGN (Å) exists in
Unicode alongside U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE (Å) only
because both characters were present in some legacy character set with
which Unicode had to maintain round-trip compatibility.
 

Does anyone know which legacy character set we're talking about?  I can 
only think of character sets including one of them.

Stefan
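
For readers who want to see how the two characters relate in practice, here is a minimal sketch using Python's standard `unicodedata` module. It does not answer which legacy set contained both; it only shows that Unicode treats U+212B as a canonical equivalent of U+00C5, so normalization folds one into the other.

```python
import unicodedata

ANGSTROM_SIGN = "\u212B"   # ANGSTROM SIGN
A_WITH_RING = "\u00C5"     # LATIN CAPITAL LETTER A WITH RING ABOVE

for ch in (ANGSTROM_SIGN, A_WITH_RING):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# U+212B has a canonical (singleton) decomposition to U+00C5.
singleton = unicodedata.decomposition(ANGSTROM_SIGN)

# Any normalization form therefore replaces the sign with the letter.
nfc = unicodedata.normalize("NFC", ANGSTROM_SIGN)
nfd = unicodedata.normalize("NFD", ANGSTROM_SIGN)

print(singleton, nfc == A_WITH_RING, [f"U+{ord(c):04X}" for c in nfd])
```

The practical consequence is that a round trip through a normalizing process will not preserve the distinction between the two code points.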





PS glyph `phi' vs `phi1'

2003-02-19 Thread Werner LEMBERG

In the file U0370.pdf, describing Unicode 3.2, I find the following

  03C6  GREEK SMALL LETTER PHI
. the ordinary Greek letter, showing
  considerable glyph variation
. in mathematical contexts, the loopy glyph
  is preferred, to contrast with 03D5

  03D5  GREEK PHI SYMBOL
. used as a technical symbol, with a stroked
  glyph
. maps to phi1 symbol entities

Looking into Adobe's `Symbol' font (version 001.007, coming with
Acrobat Reader 4), I see exactly the opposite: `phi1' is the loopy
glyph, and `phi' is the stroked variant.

Either the Unicode charts are incorrect or `phi1' doesn't denote an
Adobe Glyph name or the Symbol font is wrong or ...

Please clarify.


Werner




Re: RFC, 5-6 octets sequence in UTF8, non short form in UTF8

2003-02-19 Thread Doug Ewell
Yung-Fong Tang <ftang at netscape dot com> wrote:

 I read the RFC 2279 again (
 http://www.cis.ohio-state.edu/cs/Services/rfc/rfc-text/rfc2279.txt )
 1.  I cannot find any text in it mentioned about. non short form is
 invalid UTF8, and

First, we've already established that a revision to RFC 2279 is in the
works.

That said, the existing RFC 2279 says the following:

Encoding from UCS-4 to UTF-8 proceeds as follows:

1) Determine the number of octets required from the character value
and the first column of the table above.  It is important to note
that the rows of the table are mutually exclusive, i.e. there is
only one valid way to encode a given UCS-4 character.

The phrase "only one valid way" makes it very clear, at least to me,
that non-shortest forms are invalid.  And in the Security
Considerations section, overlong sequences are referred to as illegal
UTF-8 sequences.  This has not changed in the draft replacement,
probably because it is already sufficient.
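
To make the shortest-form rule concrete, here is a small sketch. The helper names are invented for illustration, and the length table covers only the range up to U+10FFFF (RFC 2279 itself allowed longer sequences for larger UCS-4 values). Python's built-in UTF-8 decoder is used as a stand-in for a strict decoder; it rejects the overlong two-octet form of U+002F.

```python
def utf8_length(code_point: int) -> int:
    """Octets required for a code point; the rows are mutually exclusive."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the octets."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


SLASH = b"\x2F"              # the only valid encoding of U+002F
OVERLONG_SLASH = b"\xC0\xAF"  # same bits padded into two octets: non-shortest

print(utf8_length(0x2F), is_valid_utf8(SLASH), is_valid_utf8(OVERLONG_SLASH))
```

Accepting the overlong form would let the same character slip past any check that compares raw octets, which is why the Security Considerations section singles it out.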

 3. It mentioned about how to encode surrogate pair to UTF-8. But it
 does not say the UTF8 sequence mapping directly to Surrogate High and
 Surrogate Low are illegal

Again, from RFC 2279:

UTF-16 is a scheme for transforming a subset of the UCS-4 repertoire
into pairs of UCS-2 values from a reserved range.  UTF-16 impacts
UTF-8 in that UCS-2 values from the reserved range must be treated
specially in the UTF-8 transformation.

and again:

The algorithm for encoding UCS-2 (or Unicode) to UTF-8 can be
obtained from the above, in principle, by simply extending each
UCS-2 character with two zero-valued octets.  However, pairs of
UCS-2 values between D800 and DFFF (surrogate pairs in Unicode
parlance), being actually UCS-4 characters transformed through
UTF-16, need special treatment: the UTF-16 transformation must be
undone, yielding a UCS-4 character that is then transformed as
above.

It's pretty hard to read these paragraphs and come away with the
impression that it's OK to map directly between UTF-8 and UTF-16 code
units.  Only by ignoring the existence of UTF-16 and these passages in
RFC 2279, and treating every 16-bit code unit as a character (as some
database vendors evidently did), would this even be necessary.  The only
shortcoming in the RFC is that it doesn't use the word "illegal" to
describe this.

The draft replacement adds the following, which should remove all doubt:

The definition of UTF-8 prohibits encoding character numbers between
U+D800 and U+DFFF, which are reserved for use with the UTF-16
encoding form (as surrogate pairs) and do not directly represent
characters.  When encoding in UTF-8 from UTF-16 data, it is necessary
to first decode the UTF-16 data to obtain character numbers, which
are then encoded in UTF-8 as described above.
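
The "decode the UTF-16 first" step can be sketched as follows. The function names are hypothetical; the arithmetic is the standard surrogate-pair formula, and Python's strict UTF-8 encoder does the final step (it refuses to encode a lone surrogate).

```python
def combine_surrogates(high: int, low: int) -> int:
    """Undo the UTF-16 transformation for one surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_units_to_utf8(units: list) -> bytes:
    """Turn 16-bit code units into character numbers, then encode as UTF-8."""
    chars = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF and i + 1 < len(units):
            chars.append(chr(combine_surrogates(unit, units[i + 1])))
            i += 2
        else:
            chars.append(chr(unit))  # a lone surrogate fails at encode time
            i += 1
    return "".join(chars).encode("utf-8")


pair = [0xD801, 0xDC00]                 # UTF-16 for U+10400
code_point = combine_surrogates(*pair)
encoded = utf16_units_to_utf8(pair)

print(f"U+{code_point:04X}", encoded.hex(" "))
```

The result is a single four-octet sequence, not two three-octet sequences.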

Side note:  I'm a little disappointed that the draft replacement goes on
to include a description of CESU-8, which is basically a perversion of
UTF-8 for processes that are ignorant of UTF-16, and which the RFC later
(and correctly) refers to as a naive implementation.  CESU-8 is best
kept in a dark closet and used internally only by processes that have no
choice, and not publicized any more than necessary.
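
For comparison, this is roughly what the CESU-8 style of output looks like next to real UTF-8. Python has no CESU-8 codec, so the sketch below emulates it with the `surrogatepass` error handler; the function name is made up, and this is an illustration of the byte patterns only, not a conforming CESU-8 implementation.

```python
def cesu8_like_encode(text: str) -> bytes:
    """Encode each UTF-16 code unit separately, as a UTF-16-unaware process would."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            pair = chr(0xD800 + (cp >> 10)) + chr(0xDC00 + (cp & 0x3FF))
            out += pair.encode("utf-8", "surrogatepass")  # 3 + 3 octets
        else:
            out += ch.encode("utf-8")
    return bytes(out)


sample = "\U00010400"
proper = sample.encode("utf-8")        # four octets
six_octets = cesu8_like_encode(sample)  # two three-octet surrogate encodings

try:
    six_octets.decode("utf-8")
    strict_decoder_accepts = True
except UnicodeDecodeError:
    strict_decoder_accepts = False

print(proper.hex(" "), "|", six_octets.hex(" "), "|", strict_decoder_accepts)
```

A strict UTF-8 decoder rejects the six-octet form, which is the interoperability problem in a nutshell.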

-Doug Ewell
 Fullerton, California





Re: Wrong Character Categories (was: Hot Beverage font)

2003-02-19 Thread Asmus Freytag
At 12:57 PM 2/19/03 +0100, Dominikus Scherkl wrote:

Hello.

 But why is the Hot Beverage character listed under the
 heading Weather Symbol in the Miscellaneous Symbols
 code chart ?

This is by far not the only place where the category in
the character description is simply wrong - or has gone wrong
through the introduction of new characters which don't fit.


If you have issues with the Unicode BETA charts that you would like to see 
addressed, please follow the instructions on 
http://www.unicode.org/versions/beta.html about providing beta feedback. 
Nobody monitors this list for the purpose of extracting feedback buried in 
the general discussions.

Especially in the charts which were already pretty full,
new characters often have no place under the category
they would fit - the charts become more and more mixed up
(e.g. the new Arabic presentation form is not a currency symbol).

I know there is no way to avoid this (nothing worse than
a re-ordering can be done to an ongoing standard), but
the category names can (and I think should) be revised.


In fact, they will be. The names list file that was used for the beta code 
chart was machine-merged from the Unicode 3.2 nameslist plus the list of 
proposed new characters. The tool does a good job merging blocks and 
characters (since they have a code position or range that gives them a 
fixed location in the list), but category headers and general comments (the 
ones in italics in the list) don't always get merged correctly, or the fact 
that the new characters interrupt an existing category is not apparent when 
we work with the list of proposed characters.

The final nameslist will be machine-merged from a different source of data. 
It will take the character codes, names and decompositions from the 
UnicodeData.txt file in the Unicode Character Database, and all the other 
information from an 'annotation' file, from which the correct annotations 
will be inserted into the nameslist at the correct place.

In the process of preparing the annotation file we review the information 
and add subheaders and comments and make other changes. To see the best 
available state of the nameslist, look at 
http://www.unicode.org/Public/4.0-Update/NamesList-4.0.0.txt where 
'' are some letters that change with each beta draft level. [I just 
looked, the new file is not there yet, but will be in a few days.]

That file is the plain text file that drives the charts generator. If you 
see headers still missing in that file when it comes out, you might want to 
send an official beta comment and we'll fix it. [We will not republish the 
PDF beta charts before they are final, since that's a very time consuming 
process.]

Asmus Freytag
Technical Vice President
The Unicode Consortium



Re: Unicode keyboard layouts oddity in OS X 10.2.4

2003-02-19 Thread Deborah Goldsmith
There are two problems we have seen with keyboard preferences.

1. Bringing up the force-quit dialog (command-option-escape) can 
sometimes disable keyboards in ~/Library/Keyboard Layouts. This can be 
worked around by moving them to /Library/Keyboard Layouts. Please let 
me know if this is part of the problem.

2. Sometimes other keyboards will not remain enabled over logoff/logon, 
even if they are not in ~/Library/Keyboard Layouts.

Please do the following in Terminal:

defaults read com.apple.HIToolbox "Keyboard Menu"

The normal result is:

The domain/default pair of (com.apple.HIToolbox, Keyboard Menu) does 
not exist

If you get a different response, please contact me by private e-mail.

Thanks,

Deborah Goldsmith
Manager, Fonts & Unicode
Apple Computer, Inc.
[EMAIL PROTECTED]









Re: [OpenType] PS glyph `phi' vs `phi1'

2003-02-19 Thread Werner LEMBERG
From: Barbara Beeton [EMAIL PROTECTED]
Subject: re: [OpenType] PS glyph `phi' vs `phi1'
Date: Wed, 19 Feb 2003 11:56:03 -0500 (EST)

[Dear Barbara, I took the liberty to cite your message almost
 completely while CCing the opentype and unicode lists.]

 the shapes of the two `phi's haven't changed since unicode 2.0; the
 change for unicode 3.2 is in the additional text.  the naming in
 unicode of 03D5 as a symbol is the unicode technical committee's
 convention for indicating an established variant that we have to
 include.  while i disagree with the designation of 03D5 as a symbol
 to the exclusion of 03C6 (resulting in the note "in mathematical
 contexts ..."), the fact that both shapes already existed in unicode
 meant that they shouldn't be switched, since they had presumably
 been used in documents whose meaning could be corrupted thereby.

 i have to regard the unicode use as correct regarding codes and
 shapes.  there *could* be an error in the annotations; i'm not
 familiar with the name phi1.  the only entity names i know are
 these:

  - isogrk3:
- phis = straight phi
- phiv = curly or open phi
  - isogrk1:
- phgr = small phi, greek (shown as a curly phi)
- there is no straight phi in this entity set

 unlike the main unicode names (which can't be changed -- a rule that
 ensures that iso 10646 will be identical to the relevant subset of
 unicode), the annotations can be changed, so i will forward your
 query to my contacts on the utc.

Thanks.  As a conclusion it seems that both Adobe's mapping of U+03D5
and U+03C6 to glyph names and the Unicode annotation for U+03D5 are
incorrect (in case backwards compatibility is of importance).

The right mapping should be

  phi   03D5
  phi1  03C6


Werner




Re: Hot Beverage font

2003-02-19 Thread Kenneth Whistler
I know y'all are having fun with this thread, but in
case Andrew's inquiry is at least half-serious:

 But why is the Hot Beverage character listed under the heading "Weather Symbol"
 in the Miscellaneous Symbols code chart ? Does it rain tea and coffee in North
 Korea ? Or does the annotation "can be used to indicate a wait" imply "Oh look,
 it's raining again ... let's go inside and have a nice cup of tea while we wait
 for the sun to come out" (Korean translation forthcoming) ?

It won't be listed under the heading "Weather symbol" in
the final charts, but instead under "Miscellaneous symbol".
The current charts are a beta production, based on
preliminary name list annotations derived from the WG2
meeting last December in Tokyo. The editorial committee
is busy improving the name list annotations -- and eventually
an improved set of charts, with many fixes, will be posted
for your delectation. In case it is raining, you can sit
and have a cup of coffee (or tea) while you wait for them.

--Ken






Re: Unicode keyboard layouts oddity in OS X 10.2.4

2003-02-19 Thread Kino
Thank you very much for your prompt reply.

On Thursday, Feb 20, 2003, at 03:50 Asia/Tokyo, Deborah Goldsmith wrote:


There are two problems we have seen with keyboard preferences.

1. Bringing up the force-quit dialog (command-option-escape) can  
sometimes disable keyboards in ~/Library/Keyboard Layouts. This can be  
worked around by moving them to /Library/Keyboard Layouts. Please let  
me know if this is part of the problem.

I have never noticed it. BTW,


2. Sometimes other keyboards will not remain enabled over  
logoff/logon, even if they are not in ~/Library/Keyboard Layouts.

After logoff/login, my custom keyboard layouts are not lost though  
Arabic QWERTY is often replaced by Arabic. But after restart, they will  
vanish from Flag menu and become unchecked in Input Menu tag of  
International Preferences.

Please do the following in Terminal:

defaults read com.apple.HIToolbox "Keyboard Menu"

The normal result is:

The domain/default pair of (com.apple.HIToolbox, Keyboard Menu) does  
not exist

I got the normal result.

So you have not experienced a similar problem with 10.2.4? At first, I
thought it was a problem specific to my setup. I have installed a lot of
stuff, some of it uncommon. So I had been struggling to fix the oddity by
all conceivable means: trashing
~/Library/Preferences/com.apple.HIToolbox.plist,
~/Library/Preferences/ByHost/com.apple.HIToolbox.00039394fd48.plist and
files under Caches folders; repairing permissions; installing 10.2 and the
Combo updater to 10.2.4 on another partition. Nothing has worked for me.

But yesterday, on another list, I read a posting which *seems* to  
complain about the same problem.
http://listserv.dartmouth.edu/scripts/wa.exe?A2=ind0302&L=nisus&T=0&F=&S=&P=18572
If you read messages on the same thread, you'll notice that the others  
do not seem to have the problem.
If I'm not mistaken, the author of the message is using my Latin TL and  
AsianExtended created by Nobumi Iyanaga.
http://www.bekkoame.ne.jp/~n-iyanag/researchTools/asianextended.html
Both Latin TL and AsianExtended have the same structure for they have  
been created by modifying U.S. Extended. So I thought something might  
be wrong with my keylayout files though Console has not reported a  
single error.

Could this kind of oddity be caused by inappropriate owner/permission
settings? If so, what is the appropriate setting?

In "About the Mac OS X 10.2.4 Update"
http://docs.info.apple.com/article.html?artnum=107362, it is written
that it "Addresses an issue in which the Web browser selection could
unexpectedly change to a different browser after updating your default
browser." Does this fix have something to do with my problem?

Another possibility: could this oddity occur only on specific model(s)
of Mac? I'm running OS X 10.2.4 English International on a PM G4 dual
1 GHz MDD.

It's too late, almost morning here in Japan. Good night, good day.


Yusuke Kinoshita











A new font called Gentium

2003-02-19 Thread Marion Gunn
Sharing with you a msg received today from a friend.

How good is Gentium, and can it be used on a Mac?

Anyone put it through all its paces - punctum delens, etc.?
mg


=
Dear colleagues,
Just thought I'd share a discovery about a new font called Gentium
which is excellent for diacritics. It supports a wide range of
Latin-based alphabets and includes glyphs that correspond to all the
Latin ranges of Unicode.

It can be downloaded for free from

http://www.sil.org/~gaultney/gentium/index.html

and used like any other font in Microsoft Word etc.

With Gentium you can even place a dot / punctum delens over consonants,
which is a godsend to students of Old Irish.

Another thing I learnt recently is that in Microsoft Word for Windows
97-2000 a much more painless way than trawling the Symbol Box for letters
with diacritics is to install a freeware add-on called UNIQODER.  This
adds two menus to the menu bar which make entering Unicode characters
much easier.
This is available from

http://hem.fyristorg.com/dahloe/uniqoder/
[...]


--
Marion Gunn * EGT (Estab.1991) * http://www.egt.ie *
fiosruithe/enquiries: [EMAIL PROTECTED] * [EMAIL PROTECTED] *






Re: CJK Unified Ideographs Range

2003-02-19 Thread Kenneth Whistler
Andrew asked:

 I've asked this question before, but I've never had a satisfactory response, so
 I'll ask it again now that Unicode 4 is due to be released soon.
 
 Section 10.1 of the Unicode Standard, as well as Blocks-4.0.0.txt, give the
 range of the CJK Unified Ideographs block as U+4E00 through U+9FFF, whereas at
 the top of the CJK Unified Ideographs code chart it clearly states Range:
 4E00–9FAF, and does not show the columns 9FB0-9FBF, 9FC0-9FCF, 9FD0-9FDF,
 9FE0-9FEF and 9FF0-9FFF. Is there a reason for this discrepancy ?
 
 Given that new CJK unified ideographs are added to supplementary CJK blocks
 (CJK-A, CJK-B and CJK-C), and I understand that no more characters are intended
 to be added to the basic CJK block, why then are U+9FB0 through U+9FFF reserved
 for the CJK Unified Ideographs block ? Surely these eighty code points would be
 better utilised if freed for use by new scripts.

The UTC dealt with this issue of block boundaries back in October, 2001,
in the context of the review of Blocks.txt for Unicode 3.2. There
is mention of this issue and the changes made in Article VII of
UAX #28, Unicode 3.2.

In particular, the inconsistency in block ending range handling for
CJK Unified Ideographs versus the Hangul and Extension A and Extension B
blocks was resolved in favor of ending each block on a round hex
boundary, i.e. at XXXF, regardless of whether that was the last character
in the block or not. The extra space of reserved code points in
the CJK Unified Ideographs block is an artifact of block decisions made
way back in 1992, well before the BMP looked as tight as it does now.
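
The XXXF invariant is easy to test mechanically. The sketch below assumes the "start..end; name" line format shown in the excerpt further down; the sample lines are illustrative only (the CJK line is the one quoted in this thread, and the last line is a made-up example of a block that stops short of the boundary), and the function names are hypothetical.

```python
SAMPLE_BLOCKS = """\
# illustrative lines only, not a copy of any published Blocks.txt
4E00..9FFF; CJK Unified Ideographs
AC00..D7AF; Hangul Syllables
AC00..D7A3; Hangul Syllables (older-style ending, for contrast)
"""


def parse_blocks(text: str):
    """Yield (start, end, name) for each non-comment 'start..end; name' line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        code_range, name = line.split(";", 1)
        start, end = code_range.split("..")
        yield int(start, 16), int(end, 16), name.strip()


def ends_on_xxxf(end: int) -> bool:
    """True if the last code point of the block has F as its final hex digit."""
    return end & 0xF == 0xF


results = [(name, ends_on_xxxf(end)) for _, end, name in parse_blocks(SAMPLE_BLOCKS)]
for name, ok in results:
    print(f"{name}: {'ok' if ok else 'does not end on XXXF'}")
```

Running the same check over the two Blocks files mentioned below would show which block endings were changed to satisfy the invariant.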

In case you are interested, the particular anomaly regarding the end of
the CJK Unified Ideographs block versus the header printed in the code
charts is just one of thirteen different types of anomalies that I
analyzed and reported on for the 2001 UTC discussion. Below is the
relevant excerpt.

--Ken

quote from L2/01-412

Title: Response to L2/01-419 Block Boundary Fixes
Author: Ken Whistler
Date: October 30, 2001

Mark Davis has suggested a number of fixes to Blocks.txt, to
eliminate some inconsistencies and to try to establish an
invariant that all block boundaries end on an XXXF boundary.
As usual, in all things Unicode-related, there are some
worms (I'm not sure whether they should be considered big
wriggly earthworms or just nematodes) in this can.

So as a response to The Great Innovator (Mark), The
Great Disinnovator (me), has assembled the analysis below of
*all* anomalies in block names. These fall into 13 distinct
types, for each of which I give a separate analysis and
a suggested disposition.

In some instances, I think Mark's suggestions are fine, but
in other cases, I'd rather we left well-enough alone and
abandoned the quest for the invariant.
 
/quote from L2/01-412

By the way, I lost that particular argument. The UTC *did*
decide to end all the blocks on an XXXF boundary, and that
change was made for Unicode 3.2. Anyone wanting to examine
the resultant changes in detail can compare:

http://www.unicode.org/Public/3.1-Update/Blocks-4.txt

with

http://www.unicode.org/Public/3.2-Update/Blocks-3.2.0.txt

What follows is my assessment of Anomaly Type #11, which
was the one Andrew was referring to, describing the technical
production reason for the way the header is constructed in
NamesList.txt.

quote from L2/01-412



TYPE 11: Block ranges match in Unicode and 10646, for
blocks with generated character names, but NamesList.txt
shows a mismatched range.

4E00  CJK Unified Ideographs  9FA5
4E00..9FFF; CJK Unified Ideographs
CJK UNIFIED IDEOGRAPHS  4E00-9FFF

Analysis: The range distinction in NamesList.txt is deliberate,
to enable calculation of the cutoff point in the charts,
where there are no actual character name entries in NamesList.txt
to drive this.

Suggested resolution: No action.



/quote from L2/01-412





Re: [OpenType] PS glyph `phi' vs `phi1'

2003-02-19 Thread Werner LEMBERG

 Thanks.  As a conclusion it seems that both Adobe's mapping of
 U+03D5 and U+03C6 to glyph names and the Unicode annotation for
 U+03D5 are incorrect (in case backwards compatibility is of
 importance).
 
 The right mapping should be
 
   phi   03D5
   phi1  03C6

I have to correct myself, fortunately.  After looking into the printed
version of Unicode 2.0 I see that the glyphs of 03D5 and 03C6 in the
file U0370.pdf are exchanged.  Your assumption is correct that the
annotation in Unicode 3.2 is wrong.


Werner




Re: [OpenType] PS glyph `phi' vs `phi1'

2003-02-19 Thread John H. Jenkins

On Wednesday, February 19, 2003, at 04:13 PM, Werner LEMBERG wrote:


I have to correct myself, fortunately.  After looking into the printed
version of Unicode 2.0 I see that the glyphs of 03D5 and 03C6 in the
file U0370.pdf are exchanged.  Your assumption is correct that the
annotation in Unicode 3.2 is wrong.



I'm sorry, but you've lost me here.  The Unicode 3.2 text states:

quote

With Unicode 3.0 and the concurrent second edition of ISO/IEC 10646-1, 
the representative glyphs for U+03C6 GREEK SMALL LETTER PHI and U+03D5 
GREEK PHI SYMBOL were swapped. In ordinary Greek text, the character 
U+03C6 is used exclusively, although this character has considerable 
glyphic variation, sometimes represented with a glyph more like the 
representative glyph shown for U+03C6 (the “loopy” form) and less often 
with a glyph more like the representative glyph shown for U+03D5 (the 
“straight” form).

For mathematical and technical use, the straight form of the small phi 
is an important symbol and needs to be consistently distinguishable 
from the loopy form. The straight form phi glyph is used as the 
representative glyph for the symbol phi at U+03D5 to satisfy this 
distinction.

The reversed assignment of representative glyphs in versions of the 
Unicode Standard prior to Unicode 3.0 had the problem that the 
character explicitly identified as the mathematical symbol did not have 
the straight form of the character that is the preferred glyph for that 
use. Furthermore, it made it unnecessarily difficult for general 
purpose fonts supporting ordinary Greek text to also add support for 
Greek letters used as mathematical symbols. This resulted from the fact 
that many of those fonts already used the loopy form glyph for U+03C6, 
as preferred for Greek body text; to support the phi symbol as well, 
they would have had to disrupt glyph choices already optimized for 
Greek text.

When mapping symbol sets or SGML entities to the Unicode Standard, it 
is important to make sure that codes or entities that require the 
straight form of the phi symbol be mapped to U+03D5 and not to U+03C6. 
Mapping to the latter should be reserved for codes or entities that 
represent the small phi as used in ordinary Greek text.

Fonts used primarily for Greek text may use either glyph form for 
U+03C6, but fonts that also intend to support technical use of the 
Greek letters should use the loopy form to ensure appropriate contrast 
with the straight form used for U+03D5.

/quote

What annotation in 3.2 do you feel is incorrect?
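
For anyone mapping entity or glyph names to code points, the identities of the two characters can be inspected with Python's standard `unicodedata` module. This is a sketch of the character properties only; it says nothing about which glyph a given font draws, and the data reflects whatever Unicode version the Python build ships.

```python
import unicodedata

SMALL_PHI = "\u03C6"    # the ordinary Greek letter
PHI_SYMBOL = "\u03D5"   # the technical symbol

for ch in (SMALL_PHI, PHI_SYMBOL):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The symbol is only a compatibility variant of the letter ...
compat = unicodedata.decomposition(PHI_SYMBOL)

# ... so NFC keeps the two apart while NFKC folds the symbol into the letter.
kept_by_nfc = unicodedata.normalize("NFC", PHI_SYMBOL) == PHI_SYMBOL
folded_by_nfkc = unicodedata.normalize("NFKC", PHI_SYMBOL) == SMALL_PHI

print(compat, kept_by_nfc, folded_by_nfkc)
```

Text that relies on the straight/loopy contrast therefore should not be passed through a compatibility normalization.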

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.tejat.net/




Re: DBCS and Unicode 3.1

2003-02-19 Thread Jungshik Shin
On Tue, 18 Feb 2003, Markus Scherer wrote:

 Jungshik Shin wrote:
  On Mon, 17 Feb 2003, Markus Scherer wrote:
 Other examples: There are EUC-JP (1/2/3 bytes per character) and
 EUC-CN (1/2/4 BpC) which are quite  old (much older than GB 18030).
 
Markus's fingers made a mistake here :-). It's EUC-TW (not EUC-CN)
  that encodes CNS 11643 plane 2(1) thru plane 7 using SS2.

 MBCS. By the way, the encoding scheme for EUC-TW has space for 16 CNS
 planes, and some vendor implementations use higher planes than 7.

  Yup. BTW, EUC-KR also uses more than 2 bytes. 8-byte (eight-byte) sequences
can be used to represent the 8,822 precomposed modern Korean syllables
not representable with 2 bytes in EUC-KR (ref.
KS X 1001:1998/KS C 5601-1987 annex 2). So, the full set
of 11,172 precomposed syllables in Unicode can be round-tripped
between Unicode and EUC-KR. This is used by the most popular
web mail service in Korea (well, they should switch to UTF-8
instead of lengthening the life of EUC-KR this way) and implemented
in Mozilla/Netscape and a variant of xterm for Korean (hanterm).

  Jungshik
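
The eight-byte sequences described above can be observed with Python's built-in `euc_kr` codec, which, as far as I can tell, implements this make-up sequence; other EUC-KR implementations may simply reject such syllables. The two sample syllables are my own choice for illustration: U+AC00 is in the two-byte repertoire, while U+C4D4 is assumed not to be.

```python
MODERN_SYLLABLES = 11172
NEEDING_EIGHT_BYTES = 8822
two_byte_syllables = MODERN_SYLLABLES - NEEDING_EIGHT_BYTES

IN_KS_X_1001 = "\uAC00"       # encodable directly with two bytes
OUTSIDE_KS_X_1001 = "\uC4D4"  # assumed to be outside the two-byte repertoire

short_form = IN_KS_X_1001.encode("euc_kr")
long_form = OUTSIDE_KS_X_1001.encode("euc_kr")

# Round trip back to Unicode through the same codec.
round_tripped = long_form.decode("euc_kr")

print(two_byte_syllables, len(short_form), len(long_form),
      round_tripped == OUTSIDE_KS_X_1001)
```

The long form is built from four two-byte units (a filler followed by the initial, medial and final jamo), which is why it is exactly eight bytes.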