Re: Microsoft input method, 950, and Unicode mapping

2001-12-20 Thread Kevin Bracey

In message <[EMAIL PROTECTED]>
  Asmus Freytag <[EMAIL PROTECTED]> wrote:

> Because of this, you get better interoperation among CJK code sets by
> using CIRCLED PLUS instead of EARTH, but at the cost of obscuring
> the semantics (i.e., compromising interoperation with Unicode-based
> systems).

I see. In constructing my tables, I was trying to identify semantics by
comparing each character with its neighbours and with the other characters
in its group, so EARTH/SUN was my choice.

> > I was able to come up with a good Big5 mapping by taking the best ideas
> > from various Big5 and CNS11643 tables on the net, then making sure each
> > of those Unicode compatibility characters was used once, AND IN THE ORDER
> > THEY APPEAR IN UNICODE.
> 
> That's not always a good idea. Unicode order often does not follow any 
> standard, even when characters are intended to map.

But in this case, it seems clear that the correlation is too close to be
coincidental. U+FE30 to U+FE4E can quite plausibly be found in order
in CNS11643/Big5. U+FE4F is out of order - the only exception. In the next
group, U+FE50 to U+FE6B again appear to be in order. I would love to have
this confirmed by whoever placed the characters in Unicode. Here's my deduced
correlation for Big5:

0xA14A  0xFE30  # PRESENTATION FORM FOR VERTICAL TWO DOT LEADER
0xA155  0xFE31  # PRESENTATION FORM FOR VERTICAL EM DASH
0xA157  0xFE32  # PRESENTATION FORM FOR VERTICAL EN DASH
0xA159  0xFE33  # PRESENTATION FORM FOR VERTICAL LOW LINE
0xA15B  0xFE34  # PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
0xA15C  0xFE4F  # WAVY LOW LINE
0xA15F  0xFE35  # PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
0xA160  0xFE36  # PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS
0xA163  0xFE37  # PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
0xA164  0xFE38  # PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET
0xA167  0xFE39  # PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
0xA168  0xFE3A  # PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET
0xA16B  0xFE3B  # PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
0xA16C  0xFE3C  # PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET
0xA16F  0xFE3D  # PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
0xA170  0xFE3E  # PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET
0xA173  0xFE3F  # PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
0xA174  0xFE40  # PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET
0xA177  0xFE41  # PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
0xA178  0xFE42  # PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET
0xA17B  0xFE43  # PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
0xA17C  0xFE44  # PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
0xA1C6  0xFE49  # DASHED OVERLINE
0xA1C7  0xFE4A  # CENTRELINE OVERLINE
0xA1C8  0xFE4D  # DASHED LOW LINE
0xA1C9  0xFE4E  # CENTRELINE LOW LINE
0xA1CA  0xFE4B  # WAVY OVERLINE
0xA1CB  0xFE4C  # DOUBLE WAVY OVERLINE


0xA14D  0xFE50  # SMALL COMMA
0xA14E  0xFE51  # SMALL IDEOGRAPHIC COMMA
0xA14F  0xFE52  # SMALL FULL STOP
0xA151  0xFE54  # SMALL SEMICOLON
0xA152  0xFE55  # SMALL COLON
0xA153  0xFE56  # SMALL QUESTION MARK
0xA154  0xFE57  # SMALL EXCLAMATION MARK
0xA15A  0xFE58  # SMALL EM DASH
0xA17D  0xFE59  # SMALL LEFT PARENTHESIS
0xA17E  0xFE5A  # SMALL RIGHT PARENTHESIS
0xA1A1  0xFE5B  # SMALL LEFT CURLY BRACKET
0xA1A2  0xFE5C  # SMALL RIGHT CURLY BRACKET
0xA1A3  0xFE5D  # SMALL LEFT TORTOISE SHELL BRACKET
0xA1A4  0xFE5E  # SMALL RIGHT TORTOISE SHELL BRACKET
0xA1CC  0xFE5F  # SMALL NUMBER SIGN
0xA1CD  0xFE60  # SMALL AMPERSAND
0xA1CE  0xFE61  # SMALL ASTERISK
0xA1DE  0xFE62  # SMALL PLUS SIGN
0xA1DF  0xFE63  # SMALL HYPHEN-MINUS
0xA1E0  0xFE64  # SMALL LESS-THAN SIGN
0xA1E1  0xFE65  # SMALL GREATER-THAN SIGN
0xA1E2  0xFE66  # SMALL EQUALS SIGN
0xA242  0xFE68  # SMALL REVERSE SOLIDUS
0xA24C  0xFE69  # SMALL DOLLAR SIGN
0xA24D  0xFE6A  # SMALL PERCENT SIGN
0xA24E  0xFE6B  # SMALL COMMERCIAL AT
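To check the ordering observation mechanically, one could parse lines in the table's own "0xBIG5  0xUNICODE  # NAME" format and list every place where, walking in Big5 order, the Unicode value goes down. This is a minimal sketch; the function names are illustrative and not taken from any existing tool, and each group should be checked on its own, since the two groups interleave in Big5 order.

```python
import re

# Matches lines like "0xA14A  0xFE30  # NAME" as used in the table above.
LINE_RE = re.compile(r"^\s*0x([0-9A-Fa-f]{4})\s+0x([0-9A-Fa-f]{4})")


def parse_table(text):
    """Return (big5, unicode) integer pairs; non-mapping lines are skipped."""
    pairs = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            pairs.append((int(m.group(1), 16), int(m.group(2), 16)))
    return pairs


def order_breaks(pairs):
    """Return (big5_before, big5_after) for each point where, taking the
    entries in Big5 order, the Unicode value drops instead of rising."""
    ordered = sorted(pairs)
    breaks = []
    for (b1, u1), (b2, u2) in zip(ordered, ordered[1:]):
        if u2 < u1:
            breaks.append((b1, b2))
    return breaks


if __name__ == "__main__":
    sample = """\
0xA15B  0xFE34  # PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
0xA15C  0xFE4F  # WAVY LOW LINE
0xA15F  0xFE35  # PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
"""
    for before, after in order_breaks(parse_table(sample)):
        print(f"order breaks between 0x{before:04X} and 0x{after:04X}")
```

Run over the first group of the table, this reports two breaks: one around U+FE4F (0xA15C/0xA15F) and one at 0xA1C9/0xA1CA, where U+FE4B and U+FE4C follow U+FE4D and U+FE4E. The second group comes out fully ascending.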

-- 
Kevin Bracey, Principal Software Engineer
Pace Micro Technology plc           Tel: +44 (0) 1223 518566
645 Newmarket Road                  Fax: +44 (0) 1223 518526
Cambridge, CB5 8PB, United Kingdom  WWW: http://www.pace.co.uk/




Re: Microsoft input method, 950, and Unicode mapping

2001-12-19 Thread Asmus Freytag

At 10:38 AM 12/19/01 +, Kevin Bracey wrote:
>In message <[EMAIL PROTECTED]>
>   Asmus Freytag <[EMAIL PROTECTED]> wrote:
>
> > On top of that, it looks like 950 maps a bogus symbol or punctuation
> > character to U+2574. (U+2574 is one of a set of four box-drawing
> > characters, U+2574 to U+2577, and only one of them is mapped, for
> > starters.) Fonts covering CP950 show a very different glyph for that
> > character than you'd expect from either the charts or the names...
>
>I recently had to sort out our systems' Big5<->Unicode mapping table, and
>there seems to be great confusion in the punctuation space. The table (that
>used to be) on the Unicode site was unsatisfactory, and Microsoft's CP950
>mapping also doesn't seem to make sense (e.g. with that U+2574 mapping, and
>CIRCLED PLUS and DOT OPERATOR instead of EARTH and SUN).

The new JIS X 0213 (as mapped in a mapping table somebody sent out for
comments a while back) contains CIRCLED PLUS, CIRCLED MINUS and CIRCLED
TIMES, deciding at least the '+' form in favor of the mathematical
operators. (It does not contain a circled dot.) Incidentally, GBK (as
mapped for MS 936) has both the DOT OPERATOR and SUN (it is the only one of
the mapping tables I found that maps to U+2609 SUN). Because of this, you
get better interoperation among CJK code sets by using CIRCLED PLUS
instead of EARTH, but at the cost of obscuring the semantics (i.e.,
compromising interoperation with Unicode-based systems).

>One point of note is that there is a whole cluster of characters in the
>compatibility area of Unicode from U+FE30 to U+FE6B that are designed to
>handle mapping CNS11643, whose punctuation area is almost identical to
>Big5's. Mapping tables I've seen don't make proper use of them.

I tend to agree.

>I was able to come up with a good Big5 mapping by taking the best ideas from
>various Big5 and CNS11643 tables on the net, then making sure each of those
>Unicode compatibility characters was used once, AND IN THE ORDER THEY APPEAR
>IN UNICODE.

That's not always a good idea. Unicode order often does not follow any
standard, even when characters are intended to map. The reasons range
from transcription mistakes to attempts at presenting a more orderly
arrangement, with the effects of piecemeal additions layered on top to
confuse matters further. However, if both Unicode and the other standard
group related symbols, I would try to find mapping targets nearby rather
than far away for characters of the same group.
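One way to make the "nearby rather than far away" heuristic concrete: when a native character has several plausible Unicode targets, prefer the one closest in code point to the targets already chosen for other members of its group. This is a hypothetical helper, not a procedure from any published mapping table; code-point distance is only a rough stand-in for "same group", and the candidate list itself still has to come from human judgment.

```python
def nearest_candidate(candidates, group_targets):
    """Pick the candidate code point closest to any already-mapped target
    in the same group. Ties go to the earlier candidate in the list.

    candidates: list of Unicode code points (ints) under consideration.
    group_targets: code points already chosen for the character's neighbours.
    """
    if not candidates:
        raise ValueError("no candidates")
    if not group_targets:
        return candidates[0]  # nothing to compare against; keep first choice

    def distance(cp):
        return min(abs(cp - t) for t in group_targets)

    return min(candidates, key=distance)  # min() keeps the first of equals


# Hypothetical illustration: if a neighbour in the same group is already
# mapped to U+2641 EARTH, U+2609 SUN wins over U+2299 CIRCLED DOT OPERATOR.
print(hex(nearest_candidate([0x2299, 0x2609], [0x2641])))
```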

>This ends up mapping A15A to U+FE58 SMALL EM DASH, which still
>might not be right, but it looks like a confused character anyway - it
>appears different in Big5 and CNS11643 tables, so it could just be a glyph
>variant issue.

And it appears as an underline in some fonts, e.g. Win2K's version of
PMingLiU. I wish I had more hardcopy documentation for some of the
standards. Lunde's book is usually a good resource, but he glosses over the
punctuation in favor of the ideographs, especially for Big-5/Eten/CNS.

A./






Re: Microsoft input method, 950, and Unicode mapping

2001-12-19 Thread Kevin Bracey

In message <[EMAIL PROTECTED]>
  Asmus Freytag <[EMAIL PROTECTED]> wrote:

> On top of that, it looks like 950 maps a bogus symbol or punctuation
> character to U+2574. (U+2574 is one of a set of four box-drawing
> characters, U+2574 to U+2577, and only one of them is mapped, for
> starters.) Fonts covering CP950 show a very different glyph for that
> character than you'd expect from either the charts or the names...

I recently had to sort out our systems' Big5<->Unicode mapping table, and
there seems to be great confusion in the punctuation space. The table (that
used to be) on the Unicode site was unsatisfactory, and Microsoft's CP950
mapping also doesn't seem to make sense (e.g. with that U+2574 mapping, and
CIRCLED PLUS and DOT OPERATOR instead of EARTH and SUN).

One point of note is that there is a whole cluster of characters in the
compatibility area of Unicode from U+FE30 to U+FE6B that are designed to
handle mapping CNS11643, whose punctuation area is almost identical to
Big5's. Mapping tables I've seen don't make proper use of them.

I was able to come up with a good Big5 mapping by taking the best ideas from
various Big5 and CNS11643 tables on the net, then making sure each of those
Unicode compatibility characters was used once, AND IN THE ORDER THEY APPEAR
IN UNICODE. This ends up mapping A15A to U+FE58 SMALL EM DASH, which still
might not be right, but it looks like a confused character anyway - it
appears different in Big5 and CNS11643 tables, so it could just be a glyph
variant issue.
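The "use each compatibility character exactly once" rule is easy to check automatically. A minimal sketch, assuming the mapping is already a list of (big5, unicode) integer pairs; it relies on Python's `unicodedata` to decide which code points in U+FE30..U+FE6B are assigned, so the result depends on the Unicode version Python ships with, and any characters added to the block after a table was drawn up will show up as unused.

```python
import unicodedata
from collections import Counter


def compat_usage(pairs, start=0xFE30, end=0xFE6B):
    """Report which assigned code points in [start, end] a mapping table
    never uses, and which it uses more than once.

    pairs: iterable of (big5_code, unicode_code_point) integers.
    Returns (unused, reused), both sorted lists of code points.
    """
    counts = Counter(u for _, u in pairs if start <= u <= end)
    # unicodedata.name() returns the given default for unassigned code
    # points, so reserved holes in the block are skipped here.
    assigned = [cp for cp in range(start, end + 1)
                if unicodedata.name(chr(cp), None) is not None]
    unused = [cp for cp in assigned if counts[cp] == 0]
    reused = sorted(cp for cp, n in counts.items() if n > 1)
    return unused, reused
```

Two empty lists mean every assigned character in the range was used exactly once; the ordering half of the rule needs a separate check.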





Re: Microsoft input method, 950, and Unicode mapping

2001-12-18 Thread Asmus Freytag

On top of that, it looks like 950 maps a bogus symbol or punctuation
character to U+2574. (U+2574 is one of a set of four box-drawing
characters, U+2574 to U+2577, and only one of them is mapped, for
starters.) Fonts covering CP950 show a very different glyph for that
character than you'd expect from either the charts or the names...

I let some people know about this, but one assumes that fixing it would
cause even more problems.
A./

At 11:13 PM 12/18/01 -0500, Tex Texin wrote:
>Ken,
>
>Thanks for commiserating.
>Yes, I noticed the differences in mapping tables.
>I am glad Sybase gave different character sets different names.
>I am curious how you deal with Unicode and HKSCS in the private use
>area, sometimes
>For that matter I wonder what a user in HK does when their Windows
>operating system is upgraded and their files that had HKSCS characters
>in the private use area now expect them in other locations.
>
>With respect to messy tables, and HKSCS and GB18030 in particular, it is
>a damn shame that there is no entity making a case to governments and
>others creating character set standards that they not consider the set
>defined until it is registered with ISO and Unicode, so that some of the
>silly mistakes get worked out first. A little public relations work here,
>using as evidence the recent history, the problems it caused, and the
>corrections that came about once registration was attempted, would show
>that working these things out in committee is helpful and not a threat to
>national sovereignty.
>
>Oh well. Surely this won't happen again in 2002.
>tex
>
>
>
>Kenneth Whistler wrote:
> >
> > Tex,
> >
> > >
> > > Thanks for this and the several private responses.
> > >
> > > For anyone interested, in addition to the Microsoft page:
> > > http://www.microsoft.com/hk/hkscs/
> > >
> > > The HK Gov't has a web page, fonts and mapping tables:
> > > http://www.info.gov.hk/digital21/eng/hkscs/introduction.html
> >
> > And to add to the chaos and confusion, note that the HKSCS
> > patch for Windows Code Page 950 does not map exactly the
> > same as the HK Government mapping table. And that the HK
> > Government mapping table has at least a couple of blatant
> > errors in it. And that the HKSCS patch for Windows Code Page 950
> > (like Code Page 950 without the extension, but even more so)
> > has duplicate mappings in it that need to be resolved in
> > order to roundtrip through Unicode. And you have no guarantee
> > that various vendors' attempts to sort out the HK Government
> > mapping table and Windows Code Page 950 + HKSCS path behavior
> > will themselves produce matching results.
> >
> > >
> > > Oracle gave a nice paper at a recent Unicode conference:
> > > http://www.unicode.org/iuc/iuc18/papers/b19.ppt
> > >
> > > It amazes me that in the year 2000, organizations are still creating
> > > chaos by amending definitions of standards, especially code pages,
> > > without giving the new creation its own name or some other way of
> > > distinguishing it, and then on top of that creating multiple mapping
> > > tables.
> > >
> > > I understand the desire to get new functionality into users' hands, but
> > > would it have been a problem to rename either big5 or 950 to something
> > > like big-6 or big-5hk or 950HK or 951?
> >
> > Sybase is now supporting "cp950" (+euro, by the way -- another addition
> > that may or may not be supported in a particular Windows implementation,
> > depending on date) and a separate "big5hk", so if you interoperate
> > with Sybase, you should know what you are getting. However, like
> > everybody else, it is hit or miss for us when a platform or other
> > data announces itself to us as "cp950" or "big-5", whether it
> > is with or without the HKSCS extensions.
> >
> > > So now we can't tell if big-5 or 950 will or won't have this data, or
> > > even whether Unicode data will have these characters in the private use
> > > area or elsewhere, or whether software that may be on the other end of
> > > the pipe supports HKSCS or not, or even if their operating system has
> > > the patch or not.
> > >
> > > Although "that which we call a rose by any other name would smell as
> > > sweet",
> > > calling everything a rose makes it hard to know when you are getting a
> > > rose.
> >
> > I think this was all part of a conspiracy for Chinese to catch up
> > with Japanese, since the Chinese code pages (until now) didn't have
> > a mess on the scale of SJIS. But between HKSCS and GB 18030, they are
> > making up for lost time.
> >
> > --Ken
> >
> > >
> > > Here's hoping for less chaos in 2002!
> > > tex
>
>--
>-
>Tex Texin                 Director, International Business
>mailto:[EMAIL PROTECTED]  Tel: +1-781-280-4271
>the Progress Company      Fax: +1-781-280-4655
>-
>For a compelling demonstration for Unicode:
>http://www.geocities.com/i18nguy/unicode-example.html





Re: Microsoft input method, 950, and Unicode mapping

2001-12-18 Thread Thomas Chan

On Tue, 18 Dec 2001, Kenneth Whistler wrote:

> And to add to the chaos and confusion, note that the HKSCS
> patch for Windows Code Page 950 does not map exactly the
> same as the HK Government mapping table. And that the HK

And that's in addition to the confusion caused by the semi-official,
semi-published precursor version, GCCS. I've got here a 2001 edition
(published after HKSCS) of an atlas + CD-ROM of Hong Kong, which
includes a GCCS (!) support add-on (and not the same one that used
to be available from http://www.info.gov.hk/gccs/ before that URL became
a redirect to the present HKSCS site).


Thomas Chan
[EMAIL PROTECTED]





Re: Microsoft input method, 950, and Unicode mapping

2001-12-18 Thread Thomas Chan

On Tue, 18 Dec 2001, Tex Texin wrote:

> I am glad Sybase gave different character sets different names.

There's a "Big5-HKSCS" tag[1]--is anyone using that?

[1] http://www.iana.org/assignments/character-sets (see MIBenum 2101;
I don't understand why it's in the "vendor" range, though)
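Distinct registered labels only help if software resolves them to distinct converters. As a small illustration, using Python's codec registry as the consumer (an assumption; other platforms keep their own alias lists), the IANA name "Big5-HKSCS" resolves to a converter separate from "cp950" and "big5", while an unregistered name such as the "big-6" suggested in this thread is simply unknown.

```python
import codecs


def resolve_charset(label):
    """Map a charset label to Python's canonical codec name, or None."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None  # label not known to this registry


for label in ("Big5-HKSCS", "cp950", "big5", "big-6"):
    print(f"{label!r:14} -> {resolve_charset(label)}")
```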


> For that matter I wonder what a user in HK does when their Windows
> operating system is upgraded and their files that had HKSCS characters
> in the private use area now expect them in other locations.

Or distinguishing between data in HKSCS, GCCS, pre-GCCS vendor extensions,
and privately-created extensions, all of which can occupy the same
encoding space.  Too bad that GCCS and HKSCS first existed as
government-anointed waizi/gaiji extensions, and were (and still are) 
implemented that way, rather than as part of a proper and separate
character set.

(I wish GCCS and HKSCS had proper numbers and dates to refer to them
by--the names are really too similar, and easily garbled and confused.
Recently I gave feedback on an article where HKSCS was used and discussed,
but under the GCCS name and with arguments that were only true for GCCS.)


Thomas Chan
[EMAIL PROTECTED]





Re: Microsoft input method, 950, and Unicode mapping

2001-12-18 Thread Tex Texin

Ken,

Thanks for commiserating.
Yes, I noticed the differences in mapping tables.
I am glad Sybase gave different character sets different names.
I am curious how you deal with Unicode and HKSCS, given that the HKSCS
characters are sometimes in the private use area.
For that matter I wonder what a user in HK does when their Windows
operating system is upgraded and their files that had HKSCS characters
in the private use area now expect them in other locations.

With respect to messy tables, and HKSCS and GB18030 in particular, it is
a damn shame that there is no entity making a case to governments and
others creating character set standards that they not consider the set
defined until it is registered with ISO and Unicode, so that some of the
silly mistakes get worked out first. A little public relations work here,
using as evidence the recent history, the problems it caused, and the
corrections that came about once registration was attempted, would show
that working these things out in committee is helpful and not a threat to
national sovereignty.

Oh well. Surely this won't happen again in 2002.
tex



Kenneth Whistler wrote:
> 
> Tex,
> 
> >
> > Thanks for this and the several private responses.
> >
> > For anyone interested, in addition to the Microsoft page:
> > http://www.microsoft.com/hk/hkscs/
> >
> > The HK Gov't has a web page, fonts and mapping tables:
> > http://www.info.gov.hk/digital21/eng/hkscs/introduction.html
> 
> And to add to the chaos and confusion, note that the HKSCS
> patch for Windows Code Page 950 does not map exactly the
> same as the HK Government mapping table. And that the HK
> Government mapping table has at least a couple of blatant
> errors in it. And that the HKSCS patch for Windows Code Page 950
> (like Code Page 950 without the extension, but even more so)
> has duplicate mappings in it that need to be resolved in
> order to roundtrip through Unicode. And you have no guarantee
> that various vendors' attempts to sort out the HK Government
> mapping table and Windows Code Page 950 + HKSCS path behavior
> will themselves produce matching results.
> 
> >
> > Oracle gave a nice paper at a recent Unicode conference:
> > http://www.unicode.org/iuc/iuc18/papers/b19.ppt
> >
> > It amazes me that in the year 2000, organizations are still creating
> > chaos by amending definitions of standards, especially code pages,
> > without giving the new creation its own name or some other way of
> > distinguishing it, and then on top of that creating multiple mapping
> > tables.
> >
> > I understand the desire to get new functionality into users' hands, but
> > would it have been a problem to rename either big5 or 950 to something
> > like big-6 or big-5hk or 950HK or 951?
> 
> Sybase is now supporting "cp950" (+euro, by the way -- another addition
> that may or may not be supported in a particular Windows implementation,
> depending on date) and a separate "big5hk", so if you interoperate
> with Sybase, you should know what you are getting. However, like
> everybody else, it is hit or miss for us when a platform or other
> data announces itself to us as "cp950" or "big-5", whether it
> is with or without the HKSCS extensions.
> 
> > So now we can't tell if big-5 or 950 will or won't have this data, or
> > even whether Unicode data will have these characters in the private use
> > area or elsewhere, or whether software that may be on the other end of
> > the pipe supports HKSCS or not, or even if their operating system has
> > the patch or not.
> >
> > Although "that which we call a rose by any other name would smell as
> > sweet",
> > calling everything a rose makes it hard to know when you are getting a
> > rose.
> 
> I think this was all part of a conspiracy for Chinese to catch up
> with Japanese, since the Chinese code pages (until now) didn't have
> a mess on the scale of SJIS. But between HKSCS and GB 18030, they are
> making up for lost time.
> 
> --Ken
> 
> >
> > Here's hoping for less chaos in 2002!
> > tex





Re: Microsoft input method, 950, and Unicode mapping

2001-12-18 Thread Tex Texin

Thanks for this and the several private responses.

For anyone interested, in addition to the Microsoft page:
http://www.microsoft.com/hk/hkscs/

The HK Gov't has a web page, fonts and mapping tables:
http://www.info.gov.hk/digital21/eng/hkscs/introduction.html

Oracle gave a nice paper at a recent Unicode conference:
http://www.unicode.org/iuc/iuc18/papers/b19.ppt

It amazes me that in the year 2000, organizations are still creating
chaos by amending definitions of standards, especially code pages,
without giving the new creation its own name or some other way of
distinguishing it, and then on top of that creating multiple mapping
tables.

I understand the desire to get new functionality into users' hands, but
would it have been a problem to rename either big5 or 950 to something
like big-6 or big-5hk or 950HK or 951?

So now we can't tell if big-5 or 950 will or won't have this data, or
even whether Unicode data will have these characters in the private use
area or elsewhere, or whether software that may be on the other end of
the pipe supports HKSCS or not, or even if their operating system has
the patch or not.

Although "that which we call a rose by any other name would smell as
sweet",
calling everything a rose makes it hard to know when you are getting a
rose.

Here's hoping for less chaos in 2002!
tex








Re: Microsoft input method, 950, and Unicode mapping

2001-12-18 Thread Kenneth Whistler

Tex,

> 
> Thanks for this and the several private responses.
> 
> For anyone interested, in addition to the Microsoft page:
> http://www.microsoft.com/hk/hkscs/
> 
> The HK Gov't has a web page, fonts and mapping tables:
> http://www.info.gov.hk/digital21/eng/hkscs/introduction.html

And to add to the chaos and confusion, note that the HKSCS
patch for Windows Code Page 950 does not map exactly the
same as the HK Government mapping table. And that the HK
Government mapping table has at least a couple of blatant
errors in it. And that the HKSCS patch for Windows Code Page 950
(like Code Page 950 without the extension, but even more so)
has duplicate mappings in it that need to be resolved in
order to roundtrip through Unicode. And you have no guarantee
that various vendors' attempts to sort out the HK Government
mapping table and Windows Code Page 950 + HKSCS path behavior
will themselves produce matching results.
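Resolving duplicate mappings usually means choosing, for each Unicode value reached from more than one native code, which native code the Unicode-to-native direction returns; the others become decode-only. A minimal sketch, assuming the table is a list of (native, unicode) pairs in which each native code appears once. The "lowest native code wins" default is an arbitrary placeholder policy, and different vendors may choose differently, which is one way their results end up not matching.

```python
from collections import defaultdict


def split_roundtrip(pairs, preferred=None):
    """Build decode/encode tables from a many-to-one native->Unicode table.

    pairs: list of (native_code, unicode_code_point) ints; each native
           code is assumed to appear only once.
    preferred: optional {unicode: native} overrides for duplicates.
    Returns (decode, encode, one_way), where one_way lists native codes
    that decode fine but will not come back out of encode.
    """
    preferred = preferred or {}
    decode = {}
    natives_by_uni = defaultdict(list)
    for native, uni in pairs:
        if native in decode:
            raise ValueError(f"native code 0x{native:04X} listed twice")
        decode[native] = uni
        natives_by_uni[uni].append(native)

    encode, one_way = {}, []
    for uni, natives in natives_by_uni.items():
        keep = preferred.get(uni, min(natives))  # placeholder policy
        if keep not in natives:
            raise ValueError(f"0x{keep:04X} does not map to U+{uni:04X}")
        encode[uni] = keep
        one_way.extend(n for n in natives if n != keep)
    return decode, encode, sorted(one_way)
```

Every code outside `one_way` round-trips exactly; the codes in `one_way` are where two implementations that resolve the duplicates differently will disagree.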

> 
> Oracle gave a nice paper at a recent Unicode conference:
> http://www.unicode.org/iuc/iuc18/papers/b19.ppt
> 
> It amazes me that in the year 2000, organizations are still creating
> chaos by amending definitions of standards, especially code pages,
> without giving the new creation its own name or some other way of
> distinguishing it, and then on top of that creating multiple mapping
> tables.
> 
> I understand the desire to get new functionality into users' hands, but
> would it have been a problem to rename either big5 or 950 to something
> like big-6 or big-5hk or 950HK or 951?

Sybase is now supporting "cp950" (+euro, by the way -- another addition
that may or may not be supported in a particular Windows implementation,
depending on date) and a separate "big5hk", so if you interoperate
with Sybase, you should know what you are getting. However, like
everybody else, it is hit or miss for us when a platform or other
data announces itself to us as "cp950" or "big-5", whether it
is with or without the HKSCS extensions.
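When a label such as "cp950" or "big-5" may or may not include the HKSCS extensions, one defensive approach is to decode strictly with each candidate converter in turn and record which one accepted the data. A minimal sketch using Python's built-in `cp950` and `big5hkscs` codecs as the candidates (an assumption; those codecs follow their own mapping tables, which need not match the Windows HKSCS patch or the HK Government table). A clean decode only shows that the bytes are valid under that converter, not that it is the one the sender used.

```python
def decode_first(data, encodings=("cp950", "big5hkscs")):
    """Decode bytes with the first encoding that accepts them strictly.

    Returns (text, encoding_name). Raises ValueError if none succeed.
    """
    for enc in encodings:
        try:
            return data.decode(enc, errors="strict"), enc
        except UnicodeDecodeError:
            continue  # unmapped or malformed under this encoding; try next
    raise ValueError(f"no candidate encoding accepted the data: {encodings}")
```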

> So now we can't tell if big-5 or 950 will or won't have this data, or
> even whether Unicode data will have these characters in the private use
> area or elsewhere, or whether software that may be on the other end of
> the pipe supports HKSCS or not, or even if their operating system has
> the patch or not.
> 
> Although "that which we call a rose by any other name would smell as
> sweet",
> calling everything a rose makes it hard to know when you are getting a
> rose.

I think this was all part of a conspiracy for Chinese to catch up
with Japanese, since the Chinese code pages (until now) didn't have
a mess on the scale of SJIS. But between HKSCS and GB 18030, they are
making up for lost time.

--Ken

> 
> Here's hoping for less chaos in 2002!
> tex




Re: Microsoft input method, 950, and Unicode mapping

2001-12-17 Thread Y M Chan

The answer is yes and no, depending on whether you've applied the patch.

To cut a long story short, you may want to check out:

http://www.microsoft.com/hk/hkscs/


On Tue, 18 Dec 2001 00:50:24 -0500
"Tex Texin" <[EMAIL PROTECTED]> wrote:

texin> Unicoders,
texin> I am sure there is a simple answer, but at the moment I am confused.
texin> 
texin> On Windows 2k with default locale "Traditional Chinese" and input locale
texin> "Chinese (Taiwan)" and using the
texin> "Chinese Traditional - Quick" method, users can enter Characters with
texin> the code points:
texin> 
texin> 0xFA44 0xFA41 0x916F
texin> 
texin> These values are outside the range of codepage 950.
texin> 
texin> So a subsequent conversion to Unicode fails, as these values are also
texin> not in the Microsoft mapping tables to Unicode.
texin> 
texin> The characters represent things such as:
texin> 0x916F for Quarry Bay on Hong Kong Island
texin> another is a character for an island in Macau
texin> 
texin> 
texin> So my questions are:
texin> a) Is win 2k using an extended version of 950?
texin> b) Is the Trad. Chinese input method generating characters outside 950
texin> or perhaps generating 936 values?
texin> c) Perhaps these characters are in the HKSCS extension and there is a
texin> 950 + HKSCS code page?
texin> 
texin> Anyway, my goal is to ensure that users can input any character the
texin> input method supports, bring it into an application in the native code
texin> page, and map it to unicode. To do that I need a consistent definition
texin> for the code page.
texin> Any clues?
texin> 
texin> tex
texin> 
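To see which characters in a byte stream a given converter cannot map (as with the 0xFA44, 0xFA41 and 0x916F codes in the question above), one could split the stream into Big5-style characters and try each on its own. A minimal sketch, assuming the usual Big5 byte structure (lead byte 0x81 to 0xFE, trail byte 0x40 to 0x7E or 0xA1 to 0xFE) and Python's strict codecs; the function names are hypothetical, and the check only says that the converter does not know a code, not what the code means.

```python
def split_big5(data):
    """Split bytes into single-byte and double-byte Big5-style units."""
    units, i = [], 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0xFE and i + 1 < len(data):
            t = data[i + 1]
            if 0x40 <= t <= 0x7E or 0xA1 <= t <= 0xFE:
                units.append(data[i:i + 2])  # lead + trail byte
                i += 2
                continue
        units.append(data[i:i + 1])  # ASCII or a stray byte
        i += 1
    return units


def unmapped_units(data, encoding="cp950"):
    """Return the units that fail strict decoding under `encoding`."""
    bad = []
    for unit in split_big5(data):
        try:
            unit.decode(encoding)
        except UnicodeDecodeError:
            bad.append(unit)
    return bad
```

Whether the rejected codes then belong in the private use area or at standard code points depends on which extension, if any, the converter implements.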

-- 
Y M Chan <[EMAIL PROTECTED]>