Re: creating a test font w/ CJKV Extension B characters.

2003-12-01 Thread Frank Yung-Fong Tang

As far as I remember, even though IE can render GB18030, it still
handles multi-byte characters that straddle a TCP block boundary poorly.
For example, if a 4-byte GB18030 character spans a TCP block boundary
(4k? 8k?), it will be trashed.
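
The failure described above is what one would expect if each network chunk is decoded on its own. As a minimal sketch (in Python, not IE's actual code; the chunk boundary and the helper name `decode_chunks` are made up for illustration), an incremental decoder carries the unfinished bytes of a character over to the next chunk:

```python
import codecs

def decode_chunks(chunks, encoding="gb18030"):
    """Decode an iterable of byte chunks, keeping partial sequences between chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # Flush at end of input; this raises if the data stops mid-character.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)

# Hypothetical split: the 4-byte sequence 95 32 82 36 straddles two chunks.
chunks = [b"ab\x95\x32", b"\x82\x36cd"]
text = decode_chunks(chunks)
```

Decoding each chunk separately with `bytes.decode` would instead fail, or produce replacement characters, at the boundary.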



Andrew C. West wrote:

 > On Mon, 24 Nov 2003 10:12:52 +, [EMAIL PROTECTED] wrote:
 > >
 > > Even with the registry changes that allow Uniscribe to work with such
 > > characters?
 >
 > Oops, my mistake. I had forgotten that I had deliberately deleted the
 > registry settings that control how IE deals with surrogate pairs some
 > time ago in order to prove a point (that IE won't display surrogate
 > pairs without them?). Anyway, restore the registry to its original
 > state and Frank's page displays OK without any tweaking whatsoever -
 > both NCR and GB18030 encoded CJK-B characters render correctly with my
 > preferred CJK-B font.
 >
 > To install the registry keys necessary for IE to display surrogate
 > pairs simply copy the code below to a file named "something.reg" and
 > double-click on it. Replace "Code2001" with the name of your preferred
 > Supra-BMP font if necessary.
 >
 > 
 > Windows Registry Editor Version 5.00
 >
 > [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]
 > "SURROGATE"=dword:0002
 >
 > [HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42]
 > "IEFixedFontName"="Code2001"
 > "IEPropFontName"="Code2001"
 > 
 >
 > Andrew
 >

-- 
--
Frank Yung-Fong Tang
System Architect, International Development, AOL Interactive Services
AIM:yungfongta   mailto:[EMAIL PROTECTED] Tel:650-937-2913
Yahoo! Msg: frankyungfongtan

John 3:16 "For God so loved the world that he gave his one and only Son,
that whoever believes in him shall not perish but have eternal life."

Does your software display Thai language text correctly for Thailand users?
-> Basic Concept of Thai Language, linked from Frank Tang's
Internationalization Secrets
Want to translate your English text to something Thailand users can
understand ?
-> Try English-to-Thai machine translation at
http://c3po.links.nectec.or.th/parsit/





Re: creating a test font w/ CJKV Extension B characters.

2003-11-24 Thread John Jenkins
On Nov 19, 2003 7:29 PM, Frank Yung-Fong Tang wrote:

> Can you point out which document and chapter in those docs talks about
> what we need to do to add non-BMP characters?

I don't think it's specifically addressed, but…

> which of the following MacOSX font tools should be used for that
> purpose?
>
> # ftxdumperfuser

This is the tool to use.  Dump the 'cmap' table, add the surrogate 
mappings, and fuse it back in.


John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



Re: creating a test font w/ CJKV Extension B characters.

2003-11-24 Thread John Jenkins
On Nov 20, 2003 1:12 AM, Arcane Jill wrote:

> Does FontLab support generating TTF in format 12 (32 bits)?
> Which "cheaper solutions" could generate TTF in format 12 (32 bits)?

Not at the moment.


John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



Re: creating a test font w/ CJKV Extension B characters.

2003-11-24 Thread jon
> To install the registry keys necessary for IE to display surrogate pairs
> simply copy the code below to a file named "something.reg" and
> double-click on it. Replace "Code2001" with the name of your preferred
> Supra-BMP font if necessary.
> 
> 
> Windows Registry Editor Version 5.00
> 
> [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]
> "SURROGATE"=dword:0002
> 
> [HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42]
> "IEFixedFontName"="Code2001"
> "IEPropFontName"="Code2001"
> 

On this note, does anyone know why this isn't enabled by default? Presumably
there is some risk and/or performance issue if this is enabled. Microsoft
recommends (or at least used to) that you only make this registry change if
you have a strong use case for it.
Does anyone know the details?



Re: creating a test font w/ CJKV Extension B characters.

2003-11-24 Thread Andrew C. West
On Mon, 24 Nov 2003 10:12:52 +, [EMAIL PROTECTED] wrote:
> 
> Even with the registry changes that allow Uniscribe to work with such 
> characters?

Oops, my mistake. I had forgotten that I had deliberately deleted the registry
settings that control how IE deals with surrogate pairs some time ago in order to
prove a point (that IE won't display surrogate pairs without them?). Anyway,
restore the registry to its original state and Frank's page displays OK without
any tweaking whatsoever - both NCR and GB18030 encoded CJK-B characters render
correctly with my preferred CJK-B font.

To install the registry keys necessary for IE to display surrogate pairs simply
copy the code below to a file named "something.reg" and double-click on it.
Replace "Code2001" with the name of your preferred Supra-BMP font if necessary.


Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]
"SURROGATE"=dword:0002

[HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42]
"IEFixedFontName"="Code2001"
"IEPropFontName"="Code2001"


Andrew



Re: creating a test font w/ CJKV Extension B characters.

2003-11-24 Thread jon
> As far as W2K/IE6 is concerned, if you have GB-18030 support installed
> (and it is not installed by default) then it seems to open, display and
> save GB-18030 pages with no problem. The problem is that W2K/IE6 won't
> render supra-BMP characters no matter what encoding you use unless they
> are represented as NCRs (either as a single 32-bit value or as two 16-bit
> surrogates, in hex or decimal format) and the encoding is set to "User
> Defined" (either in the encoding declaration on the page or manually by
> the end-user).

Even with the registry changes that allow Uniscribe to work with such 
characters?



RE: creating a test font w/ CJKV Extension B characters.

2003-11-21 Thread Philippe Verdy
From: Doug Ewell [mailto:[EMAIL PROTECTED]
> Unless GB 18030 prohibits invalid sequences the way Unicode does, I
> suppose there's no reason you couldn't map invalid GB 18030 sequences to
> PUA code points *within the privacy of your own application* if you
> really want to preserve them in some way, and have some idea what you
> want to do with them.  You MAY NOT map them to Unicode noncharacters or
> anything outside the Unicode/10646 range (i.e. beyond U+10FFFD).

I did not propose to use such a map externally. An application or system
can use whatever internal encoding it thinks may be useful to handle
legacy cases, even invalid ones, provided that this internal encoding
is not used to create external data claiming to be Unicode. If that
module preserves the invalid sequences that were present in its input,
and provided that the input did not claim to be Unicode (as is the case
for GB18030), I don't think it violates Unicode conformance, simply
because there's no Unicode interface on this system.

Such a system could be built explicitly to conform only to GB18030,
without claiming anything else about Unicode. The internal use of
Unicode mappings for some sequences, and extra mappings for characters
or sequences not in Unicode, is an internal decision that only influences
the design of the implementation: Unicode in that case is used as a
convenient tool to perform some things, but there's no required
dependency. Using Unicode algorithms or mappings internally just
eases the implementation of the other encoding.

The solution that would map invalid sequences into Unicode PUAs may
have the problem of colliding with other valid PUAs used in GB18030.
These invalid sequences may as well contain information which is not
plain-text for Unicode, such as markup or presentation elements, and
this does not violate the Unicode model, which is used to encode ONLY
plain-text and leaves other non-standard uses free for markup or
upper-layer protocols.

So my question remains: does GB18030 permanently bind out-of-range
or invalid sequences to non-characters? If not, GB18030 applications
may use them to encode something other than plain-text, and there
will be a need to map them to extra planes if the internal handling
of text is best done with an extended Unicode encoding form like
UCS-4.

Another solution could be that GB18030 mandates the mapping of invalid
sequences to a well-defined set of Unicode PUAs. This would allow them
to become usable in UTF-16 encoding forms. But as this mapping is not
done for now, the question of the current assignment of GB18030 invalid
sequences to non-characters remains open: is the mapping of GB18030
to Unicode completely closed, or left open for further applications
like markup (annotation or visual formatting and layout, font selection,
text alternatives, semantic or syntactic data, pointers or links to
associated information, images, custom bitmap-glyphs, sets of character
properties, phonetic variants...)?



Re: creating a test font w/ CJKV Extension B characters.

2003-11-21 Thread Doug Ewell
Philippe Verdy  wrote:

> Could an editor loading such incorrect but legacy GB-18030 file accept
> to load it and work with it using an internal-only UCS-4 mapping (or
> an extended UTF-8 mapping), to preserve those out of range sequences,
> as if they were mapped in a extra PUA range?
>
> Of course saving the file into a UTF encoding would be forbidden, but
> saving the internal UCS-4 file back to GB-18030 would preserve those
> out-of-range GB-18030 sequences, without making any other
> interpretation, and without changing them arbitrarily into the GB18030
> equivalent of U+FFFD?

We talked about this not long ago concerning invalid UTF-8 sequences,
and the same arguments would apply here.  Most people agreed that:

(1) There is no particular reason to preserve invalid code unit
sequences, as if they had some kind of paleographic value.

(2) It is not the responsibility of encoding scheme A to provide a
mapping for an invalid sequence in encoding scheme B.

Unless GB 18030 prohibits invalid sequences the way Unicode does, I
suppose there's no reason you couldn't map invalid GB 18030 sequences to
PUA code points *within the privacy of your own application* if you
really want to preserve them in some way, and have some idea what you
want to do with them.  You MAY NOT map them to Unicode noncharacters or
anything outside the Unicode/10646 range (i.e. beyond U+10FFFD).

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/




RE: creating a test font w/ CJKV Extension B characters.

2003-11-21 Thread Andrew C. West
On Fri, 21 Nov 2003 15:12:26 +0100, "Philippe Verdy" wrote:
> 
> Could an editor loading such incorrect but legacy GB-18030 file accept to
> load it and work with it using an internal-only UCS-4 mapping (or an
> extended UTF-8 mapping), to preserve those out of range sequences, as if
> they were mapped in a extra PUA range?
> 

An editor which stored data internally as extended UTF-32 or extended UTF-8
could easily preserve such invalid codepoints, but BabelPad stores data
internally as UTF-16 so it couldn't, and even if it could it wouldn't as it's a
Unicode editor, and codepoints beyond U+10 are not Unicode (nor for that
matter are codepoints beyond  valid GB-18030 as far as I'm aware).
The first thing I'll do this evening is change BabelPad so that GB-18030
codepoints beyond  are converted to U+FFFD.

> Of course saving the file into a UTF encoding would be forbidden, but saving
> the internal UCS-4 file back to GB-18030 would preserve those out-of-range
> GB-18030 sequences, without making any other interpretation, and without
> changing them arbitrarily into the GB18030 equivalent of U+FFFD?
> 
> The editor could still use the Unicode rules for all valid GB18030
> sequences. And the invalid characters could be then represented for example
> with a colored/highlighted glyph such as . As both the input and
> output are not a Unicode scheme, I don't think this invalidates the Unicode
> conformance: the behavior would just be conforming to GB18030 or other
> legacy GB PUAs mappings.
> 

I'm pretty sure that there are no such legacy GB mappings, and I doubt that China
will ever want to map characters to extra-Unicode codepoints in GB-18030 ...
they seem far more interested in trying to force everyone else to accept their
unwanted characters in the BMP than putting them in some limbo beyond Plane 16.

Andrew



Re: creating a test font w/ CJKV Extension B characters.

2003-11-21 Thread Doug Ewell
Andrew C. West  wrote:

>> An invalid GB18030 sequence, like , or a valid but out-of-
>> range sequence, like , should be treated just like an
>> invalid or out-of-range UTF-8 sequence.  Issue an error message,
>> format the hard disk, whatever; just don't try to treat it like a
>> normal character.
>
> Hmm, surely  is a valid GB-18030 sequence = U+FA0C according to
> my reckoning (although Word fails to correctly convert  when
> told to open a file as GB-18030, it does save U+FA0C as  when
> told to save as GB-18030).

Oops, sorry.  I goofed in my example.  Substitute  or something
similar.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/




RE: creating a test font w/ CJKV Extension B characters.

2003-11-21 Thread Philippe Verdy
From: Andrew C. West
> (Unfortunately I've just noticed that BabelPad has a slight
> bug with out of range GB-18030 values such as
>  = U+11.)

Could an editor that loads such an incorrect but legacy GB-18030 file agree
to load it and work with it using an internal-only UCS-4 mapping (or an
extended UTF-8 mapping), to preserve those out-of-range sequences, as if
they were mapped in an extra PUA range?

Of course saving the file into a UTF encoding would be forbidden, but saving
the internal UCS-4 file back to GB-18030 would preserve those out-of-range
GB-18030 sequences, without making any other interpretation, and without
changing them arbitrarily into the GB18030 equivalent of U+FFFD?

The editor could still use the Unicode rules for all valid GB18030
sequences. And the invalid characters could be then represented for example
with a colored/highlighted glyph such as . As both the input and
output are not a Unicode scheme, I don't think this invalidates the Unicode
conformance: the behavior would just be conforming to GB18030 or other
legacy GB PUAs mappings.

Of course this editor will not be able to work on this text if its internal
encoding form is UTF-16, unless the editor uses additional internal markup or
storage for the GB sequences that were mapped in the edit buffer to a
0xFFFD UTF-16 code unit. This "augmented text", with annotated values for
each U+FFFD present in the text, would then not be handled as if it were only
Unicode plain-text, but could constitute what Unicode calls an upper-layer
protocol, used to keep the original code sequences that were used in a
non-Unicode charset encoding and have no clear equivalent in Unicode.
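
A rough sketch of that "augmented text" idea, in Python: each undecodable byte run becomes U+FFFD in the edit buffer, and a side table (the out-of-band annotation) remembers the original bytes so they can be written back unchanged. The function names and the dictionary layout are hypothetical, chosen only for illustration; this is not how any editor mentioned in this thread works.

```python
def decode_with_annotations(data: bytes, encoding: str = "gb18030"):
    """Return (text, annotations); annotations maps a text index to the raw bytes it replaced."""
    parts, annotations = [], {}
    pos = length = 0
    while pos < len(data):
        try:
            parts.append(data[pos:].decode(encoding))
            break
        except UnicodeDecodeError as err:
            # Everything before err.start decoded cleanly; keep it as ordinary text.
            good = data[pos:pos + err.start].decode(encoding)
            parts.append(good)
            length += len(good)
            # Remember the offending bytes out of band and show U+FFFD in the buffer.
            annotations[length] = data[pos + err.start:pos + err.end]
            parts.append("\ufffd")
            length += 1
            pos += err.end
    return "".join(parts), annotations

def encode_with_annotations(text: str, annotations: dict, encoding: str = "gb18030") -> bytes:
    """Re-encode, restoring the original bytes wherever an annotation exists."""
    out = bytearray()
    for index, char in enumerate(text):
        out += annotations[index] if index in annotations else char.encode(encoding)
    return bytes(out)

original = b"A\xffB\x95\x32\x82\x36"   # 0xFF is not a valid GB18030 lead byte
text, notes = decode_with_annotations(original)
restored = encode_with_annotations(text, notes)
```

The annotations are keyed by position, so an editor built this way would have to shift them whenever text is inserted or deleted before them; the sketch ignores that.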

The same thing could be used, for example, to map the "Apple logo" registered
character in files coded with MacRoman, instead of remapping it to a weakly
interchangeable PUA: the out-of-band annotation of U+FFFD in the plain-text
part of the edited file would keep track of the original encoding of this
character, and the file may then be transmitted either in an altered form
with a UTF, or by using some other text encapsulation format: for example an
XML named entity (like "&apple-logo;") or a  element, or a  reference (in HTML files).



Re: creating a test font w/ CJKV Extension B characters.

2003-11-21 Thread Andrew C. West
On Thu, 20 Nov 2003 21:02:49 -0800, "Doug Ewell" wrote:
> 
> An invalid GB18030 sequence, like , or a valid but out-of-range
> sequence, like , should be treated just like an invalid or
> out-of-range UTF-8 sequence.  Issue an error message, format the hard
> disk, whatever; just don't try to treat it like a normal character.
> 

Hmm, surely  is a valid GB-18030 sequence = U+FA0C according to my
reckoning (although Word fails to correctly convert  when told to open a
file as GB-18030, it does save U+FA0C as  when told to save as GB-18030).

In BabelPad I convert any invalid GB-18030 characters to U+FFFD ("used to
replace an incoming character whose value is unknown or unrepresentable in
Unicode"), and notify the user that the file has been opened with errors, which
I think is a compliant and sensible implementation. (Unfortunately I've just
noticed that BabelPad has a slight bug with out of range GB-18030 values such as
 = U+11.)
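
The policy described above (replace what cannot be decoded with U+FFFD and tell the user) can be sketched in a few lines of Python. This uses Python's built-in gb18030 codec rather than anything from BabelPad, and `open_gb18030` is a hypothetical helper name:

```python
def open_gb18030(data: bytes):
    """Return (text, had_errors); invalid sequences become U+FFFD."""
    try:
        return data.decode("gb18030"), False
    except UnicodeDecodeError:
        # Fall back to lenient decoding and report that something was replaced.
        return data.decode("gb18030", errors="replace"), True

text, had_errors = open_gb18030(b"A\xffB")   # 0xFF is not a valid lead byte
if had_errors:
    print("File opened with errors: invalid sequences were replaced with U+FFFD")
```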

Andrew



Re: creating a test font w/ CJKV Extension B characters.

2003-11-21 Thread Andrew C. West
On Thu, 20 Nov 2003 11:45:35 -0800, "Frank Yung-Fong Tang" wrote:
> 
> so.. in summary, what is your conclusion about the quality of "GB18030" 
> support on IE6/Win2K ? If you run the same test on Mozilla / Netscape 
> 7.0, what is your conclusion about that quality of support?

For the benefit of those who seem willing to trash Frank's page without actually
having looked at it, it is indeed encoded as GB-18030 with the declaration . The SIP
characters are represented three ways on the page : in native GB-18030 encoding,
as hexadecimal NCR entities, and as gif images. It is therefore a fine test for
browser support of GB-18030.

As far as W2K/IE6 is concerned, if you have GB-18030 support installed (and it
is not installed by default) then it seems to open, display and save GB-18030
pages with no problem. The problem is that W2K/IE6 won't render supra-BMP
characters no matter what encoding you use unless they are represented as NCRs
(either as a single 32-bit value or as two 16-bit surrogates, in hex or decimal
format) and the encoding is set to "User Defined" (either in the encoding
declaration on the page or manually by the end-user).
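
For readers unfamiliar with the NCR variants mentioned above, here is a small Python sketch that produces them for one code point. The helper name `ncr_forms` is made up for illustration. The surrogate-pair forms are included only because they are the ones this message says IE accepts; other HTML parsers should not be assumed to honor them.

```python
def ncr_forms(cp: int) -> dict:
    """Return the numeric character reference spellings for a code point."""
    forms = {"hex": f"&#x{cp:X};", "decimal": f"&#{cp};"}
    if cp > 0xFFFF:
        # Split a supplementary code point into a UTF-16 surrogate pair.
        offset = cp - 0x10000
        high = 0xD800 + (offset >> 10)
        low = 0xDC00 + (offset & 0x3FF)
        forms["surrogate_hex"] = f"&#x{high:X};&#x{low:X};"
        forms["surrogate_decimal"] = f"&#{high};&#{low};"
    return forms

forms = ncr_forms(0x20000)   # first code point of CJK Extension B
```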

Andrew



Re: creating a test font w/ CJKV Extension B characters.

2003-11-21 Thread Doug Ewell
Philippe Verdy  wrote:

> What is a browser supposed to do if it finds an out-of-range GB
> sequence that is NOT mapped to Unicode? Does GB18030 specify that
> these sequences are now "invalid" (and permanently assigned to non-
> characters, like U+ in Unicode), and not "reserved" for future use
> (like "unassigned" code points in Unicode) ?

An invalid GB18030 sequence, like , or a valid but out-of-range
sequence, like , should be treated just like an invalid or
out-of-range UTF-8 sequence.  Issue an error message, format the hard
disk, whatever; just don't try to treat it like a normal character.

> This is critical, because I could fear that some future release of
> GB18030 may assign some functions to these sequences, which will be
> impossible to map onto Unicode, but only onto ISO/IEC-10646 "extra"
> planes.



There ARE no "extra" planes in ISO/IEC 10646.  They will not be used.
Ever.  Forget you ever heard about them.

There are one hundred thirty-seven THOUSAND private-use code points.  If
the Chinese insist on encoding characters in GB18030 that haven't been
approved by UTC and WG2, rest assured there will be plenty of room for
them in the PUA or EPUA.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/




RE: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Peter Constable
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of [EMAIL PROTECTED]


> Have you tried BabelMap or BabelPad?  Both can show non-BMP...

As can ViewGlyph. For links to that and other tools, see
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=fonttoollinks


Peter Constable





Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Frank Yung-Fong Tang


Michael (michka) Kaplan wrote:

 > From: "Frank Yung-Fong Tang" <[EMAIL PROTECTED]>
 >
 > > so.. in summary, what is your conclusion about the quality of "GB18030"
 > > support on IE6/Win2K ? If you run the same test on Mozilla / Netscape
 > > 7.0, what is your conclusion about that quality of support?
 >
 > In Summary?
 >
 > Well, in summary, I fail to see how testing for NCRs has anything to do
 > with support of *any* encoding in a browser. It seems like an
 > inadequate test of functionality of "gb18030" support.
 >
 > If you want to test gb18030 support, then please encode a web page in
 > gb18030 and test *that* in the browser of your choice.

Did you look at that page, or at its HTML source, before you said this?
The page displays five pieces of information for each character (fewer
for BMP characters):
1. The GB18030-encoded value in hex. The hex value of the first two
bytes is displayed at the top of the page for the Plane 2 characters. The
hex value of the third byte is displayed at the left of each row. The 4th
byte is displayed at the top of each column.
2. The 4 bytes of the character encoded in GB18030.
3. The same character encoded using a hex escape in HTML, as &#xhhh;
4. a  pointing to the image on www.unicode.org
5. The equivalent Unicode hex value, displayed as U+ at the bottom.

Ideally, if the browser does things right and the font is installed, the
tester can compare 2, 3, and 4 to see what happens.
Therefore, it can be used to test BOTH the NCR and GB18030.
If 2 displays differently from 4 (assuming the server is up and running and
you do see the glyph in the gif), then the converter has a problem.
If 3 displays differently from 4 (assuming the gif can be viewed), then
your HTML parser has a problem.
If both 2 and 3 display differently from 4, then both could have a problem,
the rendering engine itself could have a problem, or all of them could.

Of course, you don't really need the  part; you can compare with the
Unicode 4.0 standard yourself. But my tool was written 1 year before I
got my hard copy of the Unicode 4.0 standard, so that image helped us to QA.

If you SAVE the page locally and then look at the result, note that the
"save" operation could already have damaged your page.

And YES, I DO encode that page in GB18030 and use bytes to encode. I did
put ADDITIONAL information encoded in NCR and  there to help with
verification. You may have missed the real GB18030-encoded characters there
if you did not pay close attention.

 >
 > Now if you want to discuss NCR support then that may also be interesting.
 > But it would be nice to have tests that actually cover what they claim to
 > cover
I do make an actual claim about what it covers, and it covers more than
that. The problem is that you looked at the additional part, which is
beyond what I claimed in the last email.

 >
 > MichKa [MS]
 > NLS Collation/Locale/Keyboard Development
 > Globalization Infrastructure and Font Technologies







Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Philippe Verdy
From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>
> If you want to test gb18030 support, then please encode a web page in
> gb18030 and test *that* in the browser of your choice.
>
> Now if you want to discuss NCR support then that may also be interesting.
> But it would be nice to have tests that actually cover what they claim to
> cover

Aren't NCRs supposed to contain ONLY a Unicode code point, even on
GB18030-encoded codepages?
Testing a page with NCR will only test Unicode support, not GB18030 support
even if the Unicode codepoint in the NCR indicates a character in the
ideographic plane 2...

To really test GB18030, you need to encode the page with it, without using
NCRs.
I.e. you need to know the mapping tables between GB18030 code positions and
Unicode code points, and implement the ranges table for those GB18030 code
positions that are algorithmically mapped on Unicode.
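
To make the "algorithmically mapped" part concrete, here is a Python sketch of the linear mapping between four-byte GB18030 sequences and the supplementary planes (U+10000 and above). It deliberately covers only that range; the BMP portion of GB18030 also needs the lookup and ranges tables mentioned above, which are not reproduced here. The function names are illustrative only.

```python
def gb18030_four_byte_to_supplementary(seq: bytes) -> int:
    """Map a four-byte GB18030 sequence to a code point in U+10000..U+10FFFF."""
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a well-formed four-byte sequence")
    # Bytes 2 and 4 have 10 values each, byte 3 has 126; 0x90308130 maps to U+10000.
    linear = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    if not 0 <= linear <= 0xFFFFF:
        raise ValueError("outside the supplementary-plane range")
    return 0x10000 + linear

def supplementary_to_gb18030_four_byte(cp: int) -> bytes:
    """Inverse mapping for code points in U+10000..U+10FFFF."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    rest, b4 = divmod(cp - 0x10000, 10)
    rest, b3 = divmod(rest, 126)
    b1, b2 = divmod(rest, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])

code_point = gb18030_four_byte_to_supplementary(b"\x95\x32\x82\x36")
```

The sequence used in the example, 95 32 82 36, is the one quoted elsewhere in this thread for the first CJK Extension B character.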

One subsidiary question.
What is a browser supposed to do if it finds an out-of-range GB sequence
that is NOT mapped to Unicode? Does GB18030 specify that these sequences are
now "invalid" (and permanently assigned to non-characters, like U+ in
Unicode), and not "reserved" for future use (like "unassigned" code points
in Unicode) ?

This is critical, because I could fear that some future release of GB18030
may assign some functions to these sequences, which will be impossible to
map onto Unicode, but only onto ISO/IEC-10646 "extra" planes. My worst fear
is that these sequences could be used to define EUDCS ideographic character,
using some extra convention that allows encoding glyph forms (or sequences
of strokes and layout info) and assign them to a PUA, directly within a
plain-text GB18030 document.

The alternative to it would be to create a model for grapheme clusters
adapted to Han ideographs, using ideographic description characters and
assigning code points to the composite Han strokes that make up the
ideograph. Then it would become possible to create a normative dictionary
between all existing Han ideographs and their composed strokes (with an
additional benefit as it could allow implementing collation order by stroke
more easily, using the normative Han description decomposition). This would
also help unifying new collections of ideographs and avoid duplicate
assignments for those ideographs that merit a distinct encoding as a single
code point.




Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Frank Yung-Fong Tang
James:
I think the first thing you need to make sure of is that you have properly
installed all of the following:
a. you are running W2K or WinXP
b. you have installed the surrogate support from Microsoft if you are
running Win2K
c. you have configured your font in the IE font preferences

Try the following:
1. open Notepad
2. select the surrogate font
3. open Netscape 7 or Mozilla
4. view the URL I gave you in Mozilla
If you did a and b (even without c), you should see the Chinese text there.
5. click the [text link]
6. copy and paste the text into Word XP
7. Do you see it correctly in Word XP?
8. Do the same thing and paste it into Notepad
If that doesn't show it, then the real problem is that you didn't install
things correctly.

[EMAIL PROTECTED] wrote:

 > .
 > Andrew C. West wrote,
 >
 > > Using W2K and IE6, if you have a CJK-B font configured for "User
 > > Defined" scripts under the "Options : Fonts" settings, and manually
 > > select the encoding for the page as "User Defined", then the second
 > > CJK-B character in each box (just above the gif image) displays just
 > > fine.
 >
 > Yes.  The page was downloaded and heavily tweaked off line.
 >
 > First I substituted a decimal numeric character reference for one
 > of the hexadecimal entries.  No dice.
 >
 > I did a couple of other tricks to no avail.
 >
 > I removed the GB character set declaration and tried to manually
 > set the [View] to user defined.  The page loaded again, but didn't
 > display, checking the [View] showed that the page was still being
 > loaded as UTF-8!  Tried it again and again.
 >
 > At this point, *I* was heavily tweaked, so I didn't even try to
 > insert 'x-user-defined' character set into the HTML header.
 > I just went back on line and opened the page successfully with
 > a different browser.
 >
 > Best regards,
 >
 > James Kass
 > .
 >






Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Michael \(michka\) Kaplan
From: "Frank Yung-Fong Tang" <[EMAIL PROTECTED]>

> so.. in summary, what is your conclusion about the quality of "GB18030"
> support on IE6/Win2K ? If you run the same test on Mozilla / Netscape
> 7.0, what is your conclusion about that quality of support?

In Summary?

Well, in summary, I fail to see how testing for NCRs has anything to do with
support of *any* encoding in a browser. It seems like an inadequate test of
functionality of "gb18030" support.

If you want to test gb18030 support, then please encode a web page in
gb18030 and test *that* in the browser of your choice.

Now if you want to discuss NCR support then that may also be interesting.
But it would be nice to have tests that actually cover what they claim to
cover

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Frank Yung-Fong Tang

So, in summary, what is your conclusion about the quality of "GB18030"
support in IE6/Win2K? If you run the same test on Mozilla / Netscape
7.0, what is your conclusion about the quality of that support?


Andrew C. West wrote:

 > On Thu, 20 Nov 2003 01:32:16 +, [EMAIL PROTECTED] wrote:
 > >
 > > Frank Yung-Fong Tang wrote,
 > > > If you visit
 > > > http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=596
 > > > and your machine have surrogate support install correctly and
 > > > surrogate font install correctly then you should see surrogate
 > > > characters show up match the gif.
 > >
 > > It isn't working, but I have surrogate support and a font correctly
 > > installed.
 > >
 >
 > Using W2K and IE6, if you have a CJK-B font configured for "User Defined"
 > scripts under the "Options : Fonts" settings, and manually select the
 > encoding for the page as "User Defined", then the second CJK-B character
 > in each box (just above the gif image) displays just fine.
 >
 > The top character in each box appears to be encoded as GB-18030 (e.g.
 > GB-18030 0x95328236 = U+2), and the second character is encoded as hex
 > NCR values (e.g. &#x20000; for U+2).
 >
 > If GB-18030 is selected as the encoding for the page (as explicitly
 > given in the file), then IE won't display the CJK-B characters correctly
 > (even if you configure a CJK-B font as your default font for displaying
 > Chinese), but you can copy and paste them to a Unicode editor, where
 > both the GB-18030 and NCR encoded forms of CJK-B characters will display
 > correctly with an appropriate CJK-B font.
 >
 > If User Defined is selected as the encoding for the page (either
 > manually or by changing the meta tag in the file to
 > charset="x-user-defined"), then the GB-18030 encoded characters turn to
 > gunk, but the NCR representations are displayed using whatever font you
 > have configured for user defined scripts, and if that is a CJK-B font
 > then hey presto !
 >
 > Andrew
 >






Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread jameskass
.
Andrew C. West wrote,

> Using W2K and IE6, if you have a CJK-B font configured for "User Defined"
> scripts under the "Options : Fonts" settings, and manually select the encoding
> for the page as "User Defined", then the second CJK-B character in each box
> (just above the gif image) displays just fine.

Yes.  The page was downloaded and heavily tweaked off line.

First I substituted a decimal numeric character reference for one
of the hexadecimal entries.  No dice.

I did a couple of other tricks to no avail.

I removed the GB character set declaration and tried to manually
set the [View] to user defined.  The page loaded again, but didn't
display, checking the [View] showed that the page was still being
loaded as UTF-8!  Tried it again and again.

At this point, *I* was heavily tweaked, so I didn't even try to
insert 'x-user-defined' character set into the HTML header.  
I just went back on line and opened the page successfully with 
a different browser.

Best regards,

James Kass
.



Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread jameskass
.
Gary P. Grosso wrote,


> On Win2K, "Character Map" (charmap.exe) does not show anything 
> beyond the BMP.  I haven't tried this on XP.

Have you tried BabelMap or BabelPad?  Both can show non-BMP...

http://uk.geocities.com/BabelStone1357

Best regards,

James Kass
.



Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Gary P. Grosso
Since we're comparing notes on font tools, I recently was
asked to look over an experimental font which had, among
other things, characters in the Supplemental Multilingual
Plane and used CFF format.  (I had to look up what CFF
format even was.)

PFAEdit was able to load the font.  At least I could see the 
SMP characters; I didn't attempt any editing, kerning, etc.
I've always been fairly impressed with PFAEdit, which probably
deserves a name which reflects the fact that it goes well
beyond PFA files or even Type 1 fonts.  In fact, I'd like
to see it ported to Windows.

Font Creator Program couldn't load the font due to the CFF 
format, which was disappointing, because I like FCP's
interface and other features, and was hoping to get an up
close and personal look at some of the glyphs, which seemed
to have some sort of height anomaly.

On Win2K, "Character Map" (charmap.exe) does not show anything 
beyond the BMP.  I haven't tried this on XP.

Gary

At 09:00 AM 11/20/2003 -0500, Mark E. Shoulson wrote:
>I haven't tested this myself, but from a look at the source code, it appears that 
>pfaedit (pfaedit.sourceforge.net) can generate format12 TTFs.  (Open Source, for 
>UNIX).
>
>~mark
>
>On 11/20/03 03:12, Arcane Jill wrote:
>
>>
>>Is anyone able to answer this? I for one would really like to know.
>>Thanks
>>
>>
>>> -Original Message-
>>> From: Frank Yung-Fong Tang [mailto:[EMAIL PROTECTED]
>>> Sent: Thursday, November 20, 2003 2:29 AM
>>> To: John Jenkins
>>> Cc: [EMAIL PROTECTED]
>>> Subject: Re: creating a test font w/ CJKV Extension B characters.
>>
>>
>>> Does FontLab support generating TTF in format12 (32 bits)?
>>> Which "cheaper solutions" could  generating TTF in format12 (32 bits)?


---
Gary Grosso
Arbortext, Inc.
Ann Arbor, MI, USA




Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Mark E. Shoulson
I haven't tested this myself, but from a look at the source code, it 
appears that pfaedit (pfaedit.sourceforge.net) can generate format12 
TTFs.  (Open Source, for UNIX).

~mark
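
For anyone who wants to check what a font tool actually produced, here is a minimal Python sketch that lists the cmap subtable formats in a TrueType/OpenType file using only the standard library. It assumes a single-font sfnt file (not a .ttc collection) and does no validation; the function name cmap_formats and the tiny hand-built sample are illustrative, not taken from any tool mentioned in this thread.

```python
import struct

def cmap_formats(font_bytes):
    """Return (platform_id, encoding_id, format) for each cmap subtable."""
    # sfnt header: uint32 version, uint16 numTables, then 3 more uint16 fields.
    num_tables = struct.unpack_from(">H", font_bytes, 4)[0]
    for i in range(num_tables):
        # Each table record: 4-byte tag, checksum, offset, length.
        tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", font_bytes, 12 + 16 * i)
        if tag == b"cmap":
            break
    else:
        raise ValueError("no cmap table found")
    # cmap header: uint16 version, uint16 number of encoding records.
    _version, num_records = struct.unpack_from(">HH", font_bytes, offset)
    found = []
    for i in range(num_records):
        # Encoding record: platform ID, encoding ID, offset from cmap start.
        platform_id, encoding_id, sub_offset = struct.unpack_from(
            ">HHI", font_bytes, offset + 4 + 8 * i)
        # The first uint16 of every subtable is its format number.
        fmt = struct.unpack_from(">H", font_bytes, offset + sub_offset)[0]
        found.append((platform_id, encoding_id, fmt))
    return found

# A hand-built, deliberately incomplete sample: one table record ("cmap")
# whose two subtables consist only of their format numbers (4 and 12).
sample = (
    struct.pack(">IHHHH", 0x00010000, 1, 16, 0, 0)
    + struct.pack(">4sIII", b"cmap", 0, 28, 24)
    + struct.pack(">HH", 0, 2)
    + struct.pack(">HHI", 3, 1, 20)
    + struct.pack(">HHI", 3, 10, 22)
    + struct.pack(">HH", 4, 12)
)
print(cmap_formats(sample))
```

With a real font, read the file in binary mode and pass the bytes in; a (3, 10, 12) entry indicates a format 12 subtable.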

On 11/20/03 03:12, Arcane Jill wrote:

> Is anyone able to answer this? I for one would really like to know.
> Thanks
>
> > -Original Message-
> > From: Frank Yung-Fong Tang [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, November 20, 2003 2:29 AM
> > To: John Jenkins
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: creating a test font w/ CJKV Extension B characters.
> >
> > Does FontLab support generating TTF in format12 (32 bits)?
> > Which "cheaper solutions" could  generating TTF in format12 (32 bits)?





Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Andrew C. West
On Thu, 20 Nov 2003 01:32:16 +0000, [EMAIL PROTECTED] wrote:
> 
> Frank Yung-Fong Tang wrote,
> > If you visit 
> > http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=596 
> > and your machine have surrogate support install correctly and surrogate 
> > font install correctly then you should see surrogate characters show up 
> > match the gif. 
> 
> It isn't working, but I have surrogate support and a font correctly
> installed.
> 

Using W2K and IE6, if you have a CJK-B font configured for "User Defined"
scripts under the "Options : Fonts" settings, and manually select the encoding
for the page as "User Defined", then the second CJK-B character in each box
(just above the gif image) displays just fine.

The top character in each box appears to be encoded as GB-18030 (e.g. GB-18030
0x95328236 = U+20000), and the second character is encoded as hex NCR values
(e.g. &#x20000; for U+20000).
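
The equivalence between the GB-18030 bytes 0x95328236 and U+20000 can be checked with a short Python sketch. It relies on Python's built-in "gb18030" codec and html.unescape; the helper gb18030_supplementary_to_code_point is a hypothetical name, and it only covers four-byte sequences for the supplementary planes (lead byte 0x90 and up), which are mapped linearly onto U+10000 onward.

```python
import html

def gb18030_supplementary_to_code_point(four_bytes):
    """Linear mapping for supplementary-plane four-byte GB18030 sequences."""
    b1, b2, b3, b4 = four_bytes
    # Byte ranges: b1 from 0x90, b2 and b4 from 0x30 (10 values each),
    # b3 from 0x81 (126 values); 0x90308130 corresponds to U+10000.
    linear = ((b1 - 0x90) * 12600 + (b2 - 0x30) * 1260
              + (b3 - 0x81) * 10 + (b4 - 0x30))
    return 0x10000 + linear

gb_bytes = bytes.fromhex("95328236")

# Three routes to the same character: arithmetic, codec, and hex NCR.
by_formula = gb18030_supplementary_to_code_point(gb_bytes)
by_codec = gb_bytes.decode("gb18030")
by_ncr = html.unescape("&#x20000;")

print(f"U+{by_formula:04X}", f"U+{ord(by_codec):04X}", f"U+{ord(by_ncr):04X}")
```

All three routes should agree, which is consistent with both forms displaying correctly once pasted into a Unicode editor.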

If GB-18030 is selected as the encoding for the page (as explicitly given in the
file), then IE won't display the CJK-B characters correctly (even if you
configure a CJK-B font as your default font for displaying Chinese), but you can
copy and paste them to a Unicode editor, where both the GB-18030 and NCR encoded
forms of CJK-B characters will display correctly with an appropriate CJK-B font.

If User Defined is selected as the encoding for the page (either manually or by
changing the meta tag in the file to charset="x-user-defined"), then the
GB-18030 encoded characters turn to gunk, but the NCR representations are
displayed using whatever font you have configured for user defined scripts, and
if that is a CJK-B font then hey presto !

Andrew



RE: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Arcane Jill
Is anyone able to answer this? I for one would really like to know.
Thanks
> -Original Message-
> From: Frank Yung-Fong Tang [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 20, 2003 2:29 AM
> To: John Jenkins
> Cc: [EMAIL PROTECTED]
> Subject: Re: creating a test font w/ CJKV Extension B characters.
> Does FontLab support generating TTF in format12 (32 bits)?
> Which "cheaper solutions" could  generating TTF in format12 (32 bits)?




RE: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Arcane Jill
Actually, I'd also like to know how to create OTF fonts, not just TTF 
fonts, as OTF seems to be the new big thing, and TTF's successor.

Jill





Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread John Cowan
[EMAIL PROTECTED] scripsit:

> The page looks like it is calling for Unicode characters to display,
> example &#x20000;, but the HTML header says "GB-18030" for the characters
> set.  Could this be the problem, or are Unicode and GB18030 matched 
> for plane two and for HTML numeric characters references?

HTML character references, like XML ones, always refer to the Unicode
character set; the GB-18030 definition refers only to the interpretation
of bytes, not character references.
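
A small Python sketch of this point, using only the standard library: the same markup is serialized in two charsets, and the numeric character reference resolves to the same code point either way. The sample string and variable names are made up for illustration, and html.unescape stands in for what a browser's parser does.

```python
import html

# One raw Plane 2 character plus a reference to the same character.
markup = "raw: \U00020000 ncr: &#x20000;"

results = {}
for charset in ("gb18030", "utf-8"):
    wire_bytes = markup.encode(charset)
    # The charset only governs how bytes become characters...
    decoded = wire_bytes.decode(charset)
    # ...while the reference is resolved afterwards, always against Unicode.
    results[charset] = (wire_bytes, html.unescape(decoded))

for charset, (wire_bytes, text) in results.items():
    print(charset, len(wire_bytes), [f"U+{ord(c):04X}" for c in text if ord(c) > 0x7F])
```

The raw character takes different bytes in each charset, but the reference itself is plain ASCII in both and ends up as U+20000 in both.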

-- 
Using RELAX NG compact syntax to        John Cowan
develop schemas is one of the simple    http://www.reutershealth.com
pleasures in life                       http://www.ccil.org/~cowan
        --Jeni Tennison <[EMAIL PROTECTED]>



Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Frank Yung-Fong Tang


John Jenkins wrote:

 >
 > On Nov 19, 2003, at 10:30 PM, Ostermueller, Erik wrote:
 >
 > > Could someone recommend a good tutorial or 'font creator' application
 > > that addresses surrogate pairs?
 > >
 >
 > FontLab is probably the best cross-platform font creation software out
 > there, although it's not cheap.  Cheaper solutions are to be found IIRC
 > on Windows, and there's .
Does FontLab support generating TTFs with a format 12 (32-bit) cmap?
Which of the "cheaper solutions" can generate TTFs with a format 12 (32-bit) cmap?

 > If you're on a Mac, Apple's font tool suite
 > (http://developer.apple.com/fonts/) is free and lets you add non-BMP
 > support to fonts.
Can you point out which document and chapter there explains what we need
to do to add non-BMP characters?

Which of the following Mac OS X font tools should be used for that purpose?
# ftxanalyzer
# ftxdiff
# ftxdumperfuser
# ftxenhancer
# ftxinstalledfonts
# ftxruler
# ftxvalidator

 >
 > 
 > John H. Jenkins
 > [EMAIL PROTECTED]
 > [EMAIL PROTECTED]
 > http://homepage.mac.com/jhjenkins/
 >
 >






Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Frank Yung-Fong Tang


Philippe Verdy wrote:

 > From: "Frank Yung-Fong Tang" <[EMAIL PROTECTED]>
 > > It is not that easy for you from "don't know beans about fonts" to
 > > "creat a test font that contains ... \u20050". If you are lucky, it
 > will
 > > take you several month if not year. There are commercial base font
 > tool.
 > > But I am not sure they support 32 bits cmap or not (probably not).
 >
 > According to:
 > http://www.microsoft.com/typography/otspec/cmap.htm
 >
 > The so-called "Microsoft Unicode" cmap format 4 (platfom id=3, encoding
 > id=1) is the one recommanded for all fonts, except those than need to
 > encode
 > supplementary planes.
 >
 > Format 0 is deprecated (was used to map 8-bit encodings to glyph ids), as
 > well as now Format 2 (was used to map DBCS encodings with leadbyte/trail
 > bytes in East Asia, as a mix of 8 and 16 bit codes)
 >
 > For supplementary planes, like a font built to support GB18030, the cmap
 > format 12 must be used instead with the same platform id, but the
 > encoding
 > id 10 (UCS-4).
 >
 > Format 8 is used to create a mix of 16-bit and 32-bit maps (with the
 > assumption that no 16bit Unicode character will have the same code
 > point as
 > the highest 16-bit of a character out of the BMP, meaning that it
 > works as
 > long as there's no glyph to assign for Unicode codepoints X between
 > U+0000
 > and U+0010 simultaneously with codepoints between X<<16 and (X+1)<<16 -
 > 1).This compresses a bit the size of the cmap.
 >
 > Format 10 is not portable unlike format 12 which must be provided in
 > addition to the recommanded format 4 for characters present in the
 > BMP. In
 > practice, this format is used mostly for GB18030 support, and
 > supported by
 > Windows 2000 and later. So you won't have to wait for years to create a
 > GB18030 font, using UCS-4 mappings...

Which font tools currently support generating TTFs with format 12? While
it is true that the font format and application software (such as the Mozilla
code I wrote, WinXP, Office XP, etc.) are ready to deal with it, I don't know
of many font tools that can create a TTF with format 12 and that are designed
for someone who says he "don't know beans about fonts" and wants to "create a
test font that contains ... \u20050" now.







Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread jameskass
.
> It isn't working, but I have surrogate support and a font correctly
> installed.
>

Right.
 
> The page looks like it is calling for Unicode characters to display,
> example &#x20000;, but the HTML header says "GB-18030" for the characters
> set.  Could this be the problem

Apparently it isn't.  For some reason, MSIE 6.0 just can't deal with
this page, even downloaded, archived, and heavily tweaked.  

But, the Opera browser displays the Plane Two Unicode material
just fine, without any tweaking to the HTML.

Best regards,

James Kass
.



Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Frank Yung-Fong Tang
Are you using Netscape 7 / Mozilla or IE?
If you are using IE, then IE may have a bug there.
I think Mozilla should not have the problem, since I developed and tested it
myself.

[EMAIL PROTECTED] wrote:

 > .
 > Frank Yung-Fong Tang wrote,
 >
 > > If you visit
 > >
 > http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=596
 > > and your machine have surrogate support install correctly and surrogate
 > > font install correctly then you should see surrogate characters show up
 > > match the gif.
 >
 > It isn't working, but I have surrogate support and a font correctly
 > installed.

Are you running on XP or 2K? Did you install all the necessary surrogate
support? Did you tweak your font prefs to use the surrogate font for
Chinese pages?


 >
 > The page looks like it is calling for Unicode characters to display,
 > example &#x20000;, but the HTML header says "GB-18030" for the characters
 > set.  Could this be the problem, or are Unicode and GB18030 matched
 > for plane two and for HTML numeric characters references?
It should not matter. But again, it could be a bug in IE.

 >
 > Each single Plane Two character is displaying as two missing glyphs,
 > if that is an extra clue.
 >
 > Best regards,
 >
 > James Kass
 > .







Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread jameskass
.
Frank Yung-Fong Tang wrote,

> If you visit 
> http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=596 
> and your machine have surrogate support install correctly and surrogate 
> font install correctly then you should see surrogate characters show up 
> match the gif. 

It isn't working, but I have surrogate support and a font correctly
installed.

The page looks like it is calling for Unicode characters to display,
example &#x20000;, but the HTML header says "GB-18030" for the characters
set.  Could this be the problem, or are Unicode and GB18030 matched 
for plane two and for HTML numeric characters references?

Each single Plane Two character is displaying as two missing glyphs,
if that is an extra clue.
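
One possible reading of the "two missing glyphs" symptom, offered as an assumption rather than a diagnosis, is that each UTF-16 surrogate is being treated as a separate character. The Python sketch below shows how a supplementary code point such as U+20000 splits into a high and a low surrogate; the function name to_surrogates is illustrative.

```python
def to_surrogates(code_point):
    """Split a supplementary-plane code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP have surrogate pairs")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the 20-bit offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

high, low = to_surrogates(0x20000)
print(f"U+20000 -> {high:04X} {low:04X}")

# Cross-check against Python's own UTF-16 encoder.
utf16 = "\U00020000".encode("utf-16-be")
print(utf16.hex())
```

A renderer without surrogate support would see those two 16-bit units as two unknown characters, which would match the description above.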

Best regards,

James Kass
.



Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Philippe Verdy
From: "Frank Yung-Fong Tang" <[EMAIL PROTECTED]>
> It is not that easy for you from "don't know beans about fonts" to
> "creat a test font that contains ... \u20050". If you are lucky, it will
> take you several month if not year. There are commercial base font tool.
> But I am not sure they support 32 bits cmap or not (probably not).

According to:
http://www.microsoft.com/typography/otspec/cmap.htm

The so-called "Microsoft Unicode" cmap format 4 (platform id=3, encoding
id=1) is the one recommended for all fonts, except those that need to encode
supplementary planes.

Format 0 is deprecated (it was used to map 8-bit encodings to glyph ids), as
is Format 2 now (it was used to map DBCS encodings with lead byte/trail
bytes in East Asia, as a mix of 8 and 16 bit codes).

For supplementary planes, as in a font built to support GB18030, the cmap
format 12 must be used instead with the same platform id, but the encoding
id 10 (UCS-4).

Format 8 is used to create a mix of 16-bit and 32-bit maps (with the
assumption that no 16-bit Unicode character will have the same code point as
the highest 16 bits of a character outside the BMP, meaning that it works as
long as there's no glyph to assign for Unicode codepoints X between U+0000
and U+0010 simultaneously with codepoints between X<<16 and (X+1)<<16 -
1). This compresses the size of the cmap a bit.

Format 10 is not portable, unlike format 12, which must be provided in
addition to the recommended format 4 for characters present in the BMP. In
practice, this format is used mostly for GB18030 support, and is supported by
Windows 2000 and later. So you won't have to wait for years to create a
GB18030 font, using UCS-4 mappings...
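
To make the format 12 layout concrete, here is a rough Python sketch that packs a format 12 subtable (a 16-byte header followed by 12-byte groups of start code, end code, and start glyph ID) with the struct module. The glyph IDs are invented for the example, build_cmap_format12 is a hypothetical helper, and the result is only the subtable, not a complete cmap table or font.

```python
import struct

def build_cmap_format12(mapping):
    """Pack {code point: glyph ID} into a cmap format 12 subtable."""
    groups = []
    for code_point in sorted(mapping):
        glyph_id = mapping[code_point]
        last = groups[-1] if groups else None
        # Extend the previous group when both code point and glyph ID
        # continue sequentially; otherwise start a new group.
        if (last and code_point == last[1] + 1
                and glyph_id == last[2] + (code_point - last[0])):
            last[1] = code_point
        else:
            groups.append([code_point, code_point, glyph_id])
    length = 16 + 12 * len(groups)
    # Header: format=12, reserved=0, byte length, language=0, group count.
    header = struct.pack(">HHIII", 12, 0, length, 0, len(groups))
    body = b"".join(struct.pack(">III", *group) for group in groups)
    return header + body

# "A" and "B" plus the two characters asked about in this thread.
table = build_cmap_format12({0x41: 1, 0x42: 2, 0x5000: 3, 0x20050: 4})
print(len(table), table[:16].hex())
```

Because every field in a group is 32 bits wide, U+20050 fits directly, which is what the 16-bit format 4 cannot do.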




Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread John Jenkins
On Nov 19, 2003, at 10:30 PM, Ostermueller, Erik wrote:

> Could someone recommend a good tutorial or 'font creator' application
> that addresses surrogate pairs?

FontLab is probably the best cross-platform font creation software out 
there, although it's not cheap.  Cheaper solutions are to be found IIRC 
on Windows, and there's .  If you're on a Mac, Apple's font tool suite 
(http://developer.apple.com/fonts/) is free and lets you add non-BMP 
support to fonts.


John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Markus Scherer
Ostermueller, Erik wrote:

> I'd like to create a test font that contains
> a standard US Latin alphabet and the following characters:
>
> \u5000
> \u20050
>
> We need this for testing a software app that supports GB18030.

Why not use an existing font that has glyphs for supplementary code points?

markus
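
A side note on the "\u20050" notation in the request, in case it ends up in test code: in several languages a \u escape takes exactly four hex digits, so the fifth digit becomes a separate character. The Python sketch below shows the difference; which language the original poster's application uses is not stated in this thread, so this is only an illustration.

```python
# "\u" consumes exactly four hex digits: this is U+2005 followed by "0".
four_digit_escape = "\u20050"

# "\U" takes eight hex digits and reaches the supplementary planes.
eight_digit_escape = "\U00020050"

print(len(four_digit_escape), [f"U+{ord(c):04X}" for c in four_digit_escape])
print(len(eight_digit_escape), [f"U+{ord(c):04X}" for c in eight_digit_escape])
```

Only the second string contains the Extension B character U+20050.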




Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Frank Yung-Fong Tang
Why don't you find a font which already supports it?
You can find some info here -
http://www.microsoft.com/globaldev/DrIntl/columns/015/default.mspx

It is not that easy to go from "don't know beans about fonts" to
"create a test font that contains ... \u20050". If you are lucky, it will
take you several months, if not a year. There are commercial font tools,
but I am not sure whether they support a 32-bit cmap (probably not). You
can start from http://www.microsoft.com/typography/users.htm , but I
think it will take you a while: you need the 32-bit cmap support in
OpenType to add \u20050. I don't know which commercial tools currently
support that.


Ostermueller, Erik wrote:

 > Hello all,
 >
 > I'd like to create a test font that contains a
 > a standard US Latin alphabet and the following characters:
 >
 > \u5000
 > \u20050
 >
 > We need this for testing a software app that supports GB18030.

If you want to get GB18030 test data, one thing you can do is to visit
my GB18030 test page at
http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=10

That is what I designed to test Mozilla's GB18030 support. The page numbers
and the layout exactly match the paper copy of GB18030, so you can
do a screen-to-paper comparison. I did add additional pages on the
web (from page 284 onward, which are not in the hard copy of GB18030).
If you visit
http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=596
and your machine has surrogate support installed correctly and a surrogate
font installed correctly, then you should see surrogate characters show up
matching the gif. If you click the [Text] link in the upper left corner, it
will open a new window and show that GB18030 text in plain text format.
Good luck.
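
If a test page is not at hand, a few GB18030 test bytes can also be generated locally. The sketch below uses Python's built-in "gb18030" codec on the two characters from the original request; the variable names are illustrative, and this is no substitute for checking against the printed standard.

```python
# The two characters from the original request: one BMP, one Extension B.
samples = {"U+5000": "\u5000", "U+20050": "\U00020050"}

encoded = {}
for label, char in samples.items():
    data = char.encode("gb18030")
    # Round-trip to confirm the bytes decode back to the same character.
    assert data.decode("gb18030") == char
    encoded[label] = data
    print(label, len(data), "bytes:", data.hex().upper())
```

The BMP ideograph should come out as a two-byte sequence and the Extension B character as a four-byte one, which is the case a GB18030-aware application has to handle.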

 > My main problem is that I don't know beans about fonts.
 > Could someone recommend a good tutorial or 'font creator' application
 > that addresses surrogate pairs?
 >
 > Thanks,
 >
 > Erik Ostermueller
 >
