Re: GB18030

2001-09-27 Thread David Starner

On Thu, Sep 27, 2001 at 03:03:22PM -0700, Yung-Fong Tang wrote:
> David Starner wrote:
> 
> > If you can't recognize the
> > character, then just don't convert it.
> 
> That may be the quality of others' software; we have a higher standard, however.

Higher standard? If I'm working on "Old High German" on a system that only
supports Unicode 2.1, I'd be much happier for it to look for U+0225 in my
font and display what it finds there, rather than not displaying the 
character, refusing to read the file, or silently munging the file (in order
of evilness.) It is more important for me to be able to process the file
and lose some functionality than not to be able to read the file. 

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I saw a daemon stare into my face into my face, and an angel touch my 
breast; each one softly calls my name . . . the daemon scares me less."
- "Disciple", Stuart Davis




INFO: Extension A and B on Windows... and/or in your browser

2001-09-27 Thread Michael \(michka\) Kaplan

[With permission from Microsoft], I had the help file for the Extension A &
B IME/font that ships with Office XP (CHS and Hong Kong editions) translated
into English. Those who are interested in reading a bit on Microsoft's
efforts to provide an IME to handle these characters can see it here:

http://www.i18nwithvb.com/surrogate_ime/

(no warranty about the translation or even the original text, of course!)


Now, most people will be more interested in the code charts that are
included in the help file, reproduced in this online version (divided into
the same 16 parts as the help file does). They can be found at

http://www.i18nwithvb.com/surrogate_ime/code_charts/

There are three ways to load each page:

1) DEFAULT -- the tables are loaded with the font specified as "Simsun
(Founder Extended)" -- the font that ships with the CHS and HK versions of
Office XP.

2) NO FONT SPECIFIED -- the tables are loaded with no font specified -- good
for people who do not have the font on their machine and have hope that
their browser will make up for this with an alternate font.

3) EOT -- the tables are loaded with an EOT file produced by WEFT -- good
for people who do not have the font but do have IE and so have a hope that
the EOT files will allow one to see the characters

You will want to pay attention to the instructions on that default "code
charts" page, which describe the configuration where I was able to verify
that these pages render properly -- every other browser and OS combination
I tried failed in some way, in whole or in part.

It was not my personal choice to prove that IE seems to have superior
support for Extension A and B, but I was unable to get either of the non-IE
browsers I tried to do such a hot job here. Even the IE case required
Win2000 or XP, properly set up for supplementary characters.

The two most interesting failures (IMHO):

1) Mozilla 0.9.3 -- Almost all of Ext. A shows up as a question mark (?) and
all of Ext. B shows up as a double question mark (??) -- these are *not*
corrupted chars though -- copy and paste to Unicode notepad proves that the
actual characters are present; Mozilla is just showing ? or ?? where it
should be showing the default char. Very weird and unexpected, to me at
least. :-)

2) IE >= 5.0 on a machine set up for surrogate support and with the font on
the machine but without the font specified -- almost all of Extension A is
not visible but all of Extension B is -- I have been told by various font
experts outside of MS that this is a known bug with displaying Extension A
characters in IE without the font explicitly specified.


Enjoy!


MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/






Re: GB18030

2001-09-27 Thread Michael \(michka\) Kaplan

From: Yung-Fong Tang

> Case mapping ? You have no way to generate mapping table for
> case mapping with knowing the character unless you already
> define those character have no case or only one case.

Um, Unicode defines a behavior and even properties for unassigned code
points. If you choose not to implement this because you only handle
"assigned code points" then that is actually a problem with your software.

No one is arguing the point you make that until a code point is assigned,
its exact *FINAL* behavior is not completely understood with regards to
casing, collation, and everything else. So you do not need to continue
arguing this point -- I am sure everyone agrees with it.

But do you understand that there is certainly a defined behavior for it in
the interim? In the time before it is assigned an actual character? That is
I think the crux of the matter here.

> Don't tell me there any people how implemented
> HanCharacterStokeNumber(U+2) in 1996, no body have a
> implementation of HanCharacterStokeNumber(U+2) until
> U+2 got defined.

Actually, several companies had the mechanisms defined to convert that to a
surrogate pair. Or to treat it as a single unassigned character for the
purposes of collation.
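
As an illustration of the kind of mechanism mentioned above, here is a minimal
Python sketch (not taken from any vendor's code) of splitting a supplementary
code point into a UTF-16 surrogate pair and recombining it. The function names
are made up for this example. The arithmetic is the standard UTF-16 definition
and does not depend on whether the code point has been assigned a character.

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("U+%04X is not a supplementary code point" % cp)
    offset = cp - 0x10000            # 20-bit offset into planes 1..16
    high = 0xD800 + (offset >> 10)   # upper 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # lower 10 bits select the low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    # Works the same for assigned and unassigned supplementary code points.
    for cp in (0x20000, 0x4FF3A, 0x10FFFF):
        high, low = to_surrogate_pair(cp)
        print("U+%X -> %04X %04X" % (cp, high, low))
```

Nothing in either function consults a character database, which is the point
being made: the pair can be formed long before the code point is assigned.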

The difference between them and you would be that you do not recognize the
existence of this state -- the time before direct assignment?


MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/






Re: GB18030

2001-09-27 Thread Yung-Fong Tang


 
Markus Scherer wrote:

> Yung-Fong Tang wrote:
> > ... But you
> > still need to know what U+4ff3a to define such mapping table, right?
>
> Wrong. You just need to know the mapping between code points, whether
> assigned, used, or whatever.
>
> > ... So, whatever the software the user currently have today, without an
> > upgrade (either upgrade the code or mapping table) still won't know how to
> > convert U+4ff3a to lower case or upper case, right ?
>
> No, but that's irrelevant for character conversion. Once you update
> the Unicode character database in your product, your software will
> do it - if it knows how to deal with supplementary characters in general.
> (That part is a technicality which is, again, independent of whether there
> _are_ assigned characters.)

It still takes a "Once you update the Unicode character database in your
product" to make it happen, right? From a software distribution
point of view, that means a different version number and therefore usually
requires a QA cycle. As I said, you CANNOT do it WITHOUT an upgrade. Anything
could happen WITH an upgrade - either a change to the code or a change to
the mapping table.

> > But how can you generate such mapping table without
> > knowing that character ?
>
> By specifying which _code point_ in one encoding gets mapped to which
> other _code point_ in the other encoding.
> Character conversion never looks at whether the code points that it
> maps are actual _characters_.
>
> When you map between the GBK or Shift-JIS user-defined areas and Unicode
> PUA or similar, then you also map code points that don't have characters.
> What's new?

Case mapping? You have no way to generate a mapping table for case mapping
without knowing the character, unless you have already defined those
characters to have no case or only one case.

> > ...
> > How many years does it take for people to realize that give a new
> > mappint to
> > their customer still need a complete life cycle of QA and distribution?
> > And
> > there will be a new version number attach to the software for that.
>
> Is this about the existence of supplementary characters again?
> They exist since 1996, and a vendor who followed the UTC/ISO negotiations
> could see it coming since 1993.
> Surely most everyone had the time to roll out a new release of their
> software to get the support for them in - in more than five years?

Don't tell me there are any people who implemented
HanCharacterStokeNumber(U+2) in 1996; nobody had an implementation of
HanCharacterStokeNumber(U+2) until U+2 got defined.

> (I know that few actually worked on this in time. But time there was.)
>
> markus



Re: GB18030

2001-09-27 Thread Yung-Fong Tang

ok... you beat me :)

David Starner wrote:

> On Thu, Sep 27, 2001 at 12:27:11PM -0700, Yung-Fong Tang wrote:
> > looks like I beat ICU by checkin my mapping table at April 9 (to
> > mozilla) , 10 days before they check in their first version of GB18030
> > xml mapping table :)  I probably can still claim the first open source
> > project which support GB18030 to Unicode conversion, althought I didn't
> > do anything beyond BMP 
>
> GNU libc CVS claims that the first version of the GB18030 iconv modules
> was uploaded to CVS on Jul 14, 2000, and the version corresponding to
> the current version of GB18030 was uploaded to CVS on Feb 14, 2001, with
> only minor changes since then. It has supported non-BMP characters since
> Jun 6, 2001.
>
> --
> David Starner - [EMAIL PROTECTED]
> Pointless website: http://dvdeug.dhis.org
> "I saw a daemon stare into my face into my face, and an angel touch my
> breast; each one softly calls my name . . . the daemon scares me less."
> - "Disciple", Stuart Davis





Re: GB18030

2001-09-27 Thread Yung-Fong Tang



David Starner wrote:

> On Thu, Sep 27, 2001 at 01:07:43PM -0700, Yung-Fong Tang wrote:
> > Draw a glyph from a font to implement case conversion, property mapping ?
> > I don't know how can you do that.
>
> When is case conversion a panic situation?

I never said it is "a panic situation".

> If you can't recognize the
> character, then just don't convert it.

That may be the quality of others' software; we have a higher standard, however.

> All unassigned characters have
> default properties - use them. No, you don't know all about the
> character, but you know enough to load a font and display it, which is
> all a webbrowser or a wordprocessor needs 90% of the time.
>
> > That is my quetion DOES it define so. I don't have the access to THE
> > specification itself and asking help to get one. Do you have the
> > access to the specification and DOES it specify so?
>
> Do you not have access to the web? It took me 4 minutes to find the
> information on the web. Start with www.google.com and type in GB18030,
> and you'll find most of the information right there.  Others have
> pointed out more specific links.

No, I am NOT asking for "the information" about the GB18030 standard. I am
asking for the GB18030 standard ITSELF. None of the Google results show me
THE GB18030 standard ITSELF. All of them show me INFORMATION about GB18030.
Since I have worked on supporting standards for years, I only trust the
standard itself these days. Tell me any link you can find from Google which
points to THE GB18030 standard. I really hope you can give me one. Sorry, I
am really a picky guy about standards. I have seen too much false
information and interpretation in the past.

Kenneth gave me a direct quote from the paper copy of the standard he had,
which is what I need.


> --
> David Starner - [EMAIL PROTECTED]
> Pointless website: http://dvdeug.dhis.org
> When the aliens come, when the deathrays hum, when the bombers bomb,
> we'll still be freakin' friends. - "Freakin' Friends"





Re: GB18030

2001-09-27 Thread Markus Scherer

Yung-Fong Tang wrote:
> ... But you
> still need to know what U+4ff3a to define such mapping table, right?

Wrong. You just need to know the mapping between code points, whether assigned, used, 
or whatever.

> ... So, whatever the software the user currently have today, without an
> upgrade (either upgrade the code or mapping table) still won't know how to
> convert U+4ff3a to lower case or upper case, right ?

No, but that's irrelevant for character conversion. Once you update the Unicode 
character database in your product, your software will do it - if it knows how to deal 
with supplementary characters in general. (That part is a technicality which is, 
again, independent of whether there _are_ assigned characters.)

> But how can you generate such mapping table without knowing that character ?

By specifying which _code point_ in one encoding gets mapped to which other _code 
point_ in the other encoding.
Character conversion never looks at whether the code points that it maps are actual 
_characters_.

When you map between the GBK or Shift-JIS user-defined areas and Unicode PUA or 
similar, then you also map code points that don't have characters. What's new?

> ...
> How many years does it take for people to realize that give a new mappint to
> their customer still need a complete life cycle of QA and distribution?  And
> there will be a new version number attach to the software for that.

Is this about the existence of supplementary characters again?
They exist since 1996, and a vendor who followed the UTC/ISO negotiations could see it 
coming since 1993.
Surely most everyone had the time to roll out a new release of their software to get 
the support for them in - in more than five years?

(I know that few actually worked on this in time. But time there was.)


markus




Re: GB18030

2001-09-27 Thread Markus Scherer

Yung-Fong Tang wrote:
> ...
> > http://www-106.ibm.com/developerworks/library/u-china.html
> >
> > Markus Scherer's excellent documentation of GB 18030, with
> > code snippets and pointer to a complete ICU implementation.
> 
> That paper itself does not specify any details mapping table.

True, but it explains that they are treated algorithmically, and how to do that.

> I look at
> http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml .
> 
> It is interesting that the mapping between U+1 and U+10 is check
> in only 5 weeks ago in the version 1.3

We had this same, correct mapping table up elsewhere on our server since February, I 
believe.

When we imported the .xml mapping tables into our newish charset cvs repository, we 
accidentally ran the tool that generates .xml from our internal format on this one as 
well. That does not work since the internal 18030 file is missing all algorithmic 
parts (we don't have an equivalent of the <range> element). This is the one file that 
we cannot fully generate from our internal table...

I sent an email to this list 5 weeks ago pointing out this mistake. Sorry for the 
confusion.

> ...
> looks like I beat ICU by checkin my mapping table at April 9 (to
> mozilla) , 10 days before they check in their first version of GB18030
> xml mapping table :)

I am sorry to disappoint you. ICU 1.7, released in December 2000, had the GB 18030 
converter. I implemented it in October, and updated it with the new mapping table from 
2000-nov-30 on that same day. That all includes support for the supplementary planes!
:-)

> I probably can still claim the first open source
> project which support GB18030 to Unicode conversion, althought I didn't
> do anything beyond BMP 

Nope ;-)


markus




Re: GB18030

2001-09-27 Thread Yung-Fong Tang



Kenneth Whistler wrote:

> Frank,
>
> But on p. 5, clause 7.3 of the original GB 18030-2000, it states (in
> Chinese):
>
> "From 0x90308130 to 0xE339FE39, altogether 1058400 code points, correspond
> to GB 13000's 16 supplementary planes..."
> --Ken

OK, I have filed a bug against Mozilla for this; see
http://bugzilla.mozilla.org/show_bug.cgi?id=101998
I also submitted a patch there (see the bug report). Unfortunately, I have
not had time to test it yet. It would be nice if someone could code review
that change for me.

Sun folks, do you care about GB18030 to surrogate conversion in Mozilla?
Please help code review and QA it. Thanks.
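
For anyone reviewing, here is a rough Python sketch of what a GB18030 to
surrogate conversion has to compute, assuming the linear layout Ken quotes
from clause 7.3 (four-byte sequences starting at 0x90308130 map to U+10000
onward, in order). This is not the patch attached to the bug; the function
names and the constant FIRST_LINEAR are hypothetical and only illustrate the
arithmetic.

```python
# Linear index of the four-byte sequence 90 30 81 30, i.e. of U+10000.
FIRST_LINEAR = (0x90 - 0x81) * 10 * 126 * 10  # 189000


def decode_supplementary_gb18030(data):
    """Map one four-byte GB18030 sequence to a supplementary code point."""
    if len(data) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = data
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39 and
            0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a well-formed four-byte sequence")
    # Treat the four bytes as digits in a mixed radix of 126 x 10 x 126 x 10.
    linear = (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 \
        + (b4 - 0x30)
    cp = linear - FIRST_LINEAR + 0x10000
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("sequence is outside the supplementary range")
    return cp


def utf16_units(cp):
    """Return the UTF-16 code units for a code point (two for a surrogate pair)."""
    raw = chr(cp).encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


if __name__ == "__main__":
    cp = decode_supplementary_gb18030(bytes.fromhex("90308130"))
    print("U+%X" % cp, ["%04X" % u for u in utf16_units(cp)])
```

Four-byte sequences that map into the BMP go through the table-driven part of
the converter and are deliberately rejected by this sketch.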






Re: GB18030

2001-09-27 Thread David Starner

On Thu, Sep 27, 2001 at 12:27:11PM -0700, Yung-Fong Tang wrote:
> looks like I beat ICU by checkin my mapping table at April 9 (to
> mozilla) , 10 days before they check in their first version of GB18030
> xml mapping table :)  I probably can still claim the first open source
> project which support GB18030 to Unicode conversion, althought I didn't
> do anything beyond BMP 

GNU libc CVS claims that the first version of the GB18030 iconv modules
was uploaded to CVS on Jul 14, 2000, and the version corresponding to
the current version of GB18030 was uploaded to CVS on Feb 14, 2001, with
only minor changes since then. It has supported non-BMP characters since
Jun 6, 2001.

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I saw a daemon stare into my face into my face, and an angel touch my 
breast; each one softly calls my name . . . the daemon scares me less."
- "Disciple", Stuart Davis




Re: GB18030

2001-09-27 Thread David Starner

On Thu, Sep 27, 2001 at 01:07:43PM -0700, Yung-Fong Tang wrote:
> Draw a glyph from a font to implement case conversion, property mapping ?
> I don't know how can you do that.

When is case conversion a panic situation? If you can't recognize the
character, then just don't convert it. All unassigned characters have
default properties - use them. No, you don't know all about the 
character, but you know enough to load a font and display it, which is
all a webbrowser or a wordprocessor needs 90% of the time. 
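
To make the "default properties" point concrete, here is a small Python sketch
using the standard unicodedata module. It assumes U+4FF3A is still unassigned
in the Unicode version bundled with your Python, which is true of every
release to date as far as I know; the helper name default_behaviour is made
up for the example.

```python
import unicodedata


def default_behaviour(cp):
    """Report what the runtime does with a code point it may know nothing about."""
    ch = chr(cp)
    return {
        "category": unicodedata.category(ch),  # "Cn" means unassigned
        "name": unicodedata.name(ch, None),    # None when there is no name
        "lower": ch.lower(),                   # case mapping falls back to identity
        "upper": ch.upper(),
    }


if __name__ == "__main__":
    info = default_behaviour(0x4FF3A)
    print(info["category"], info["name"])
    print(info["lower"] == chr(0x4FF3A), info["upper"] == chr(0x4FF3A))
```

The unassigned code point is left unconverted rather than rejected, which is
the "just don't convert it" behaviour described above.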

> That is my quetion DOES it define so. I don't have the access to THE
> specification itself and asking help to get one. Do you have the
> access to the specification and DOES it specify so?

Do you not have access to the web? It took me 4 minutes to find the
information on the web. Start with www.google.com and type in GB18030,
and you'll find most of the information right there.  Others have
pointed out more specific links.

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
When the aliens come, when the deathrays hum, when the bombers bomb,
we'll still be freakin' friends. - "Freakin' Friends"




Re: GB18030

2001-09-27 Thread Yung-Fong Tang


 
Kenneth Whistler wrote:

> Frank,
>
> > You don't need to explain to me
> > the concept of GB18030. The question I have is about details mapping
> > information.
>
> Now, now, there's no need to get snippy with me. It sounded
> like you were unclear from the kinds of questions you were
> asking.

Sorry for that. I did not intend any flame in my message.

> > I look at
> > http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml .
> >
> > It is interesting that the mapping between U+1 and U+10 is check
> > in only 5 weeks ago in the version 1.3
> >
> >  |  30910:
> > bFirst="90 30 81 30" bLast="E3 32 9A 35"  bMin="81 30 81 30" bMax="FE 39
> > FE 39"/>
> >
> > Is the U+1 - U+10 mapping between Unicode and GB18030 specified
> > in the GB18030 standard itself? can someone fax me that page ? Thanks.
>
> Unfortunately, I don't have the revised and corrected version of
> the standard to hand.

Is it possible for you to fax me the old original version? My fax number
is +1 650 937 5413. Thanks.

> But on p. 5, clause 7.3 of the original GB 18030-2000, it states (in
> Chinese):
>
> "From 0x90308130 to 0xE339FE39, altogether 1058400 code points, correspond
> to GB 13000's 16 supplementary planes..."

Thank you very much. This is the information I need. It clearly defines
the mapping between GB18030 and the Unicode supplementary planes at the
character level. Thanks. With this information, we can implement the
conversion between GB18030 and Unicode.

> If you look at the ICU specification, bFirst="90 30 81 30" and
> bLast="E3 32 9A 35" corresponds to:
>
> 83 "groups" (90..E2) of GB 18030:        83 x 10 x 1260 = 1045800 code points
>  2 "planes" (E3 30..31) of GB 18030:           2 x 1260 =    2520 code points
> 25 "rows"   (E3 32 81..99) of GB 18030:         25 x 10 =     250 code points
>  6 "cells"  (E3 32 9A 30..35) of GB 18030:                      6 code points
>                                                    Total  1048576 code points
>
> And 1048576 code points = 16 x 65536 code points = 16 planes of 10646.
>
> So GB 18030 and ICU agree. Start at 0x90308130 and lay out all the
> rest of the Unicode supplementary code points in order.
>
> --Ken



Re: GB18030

2001-09-27 Thread Yung-Fong Tang


 
David Starner wrote:

> On Wed, Sep 26, 2001 at 06:17:15PM -0700, Yung-Fong Tang wrote:
> > Sure Unicode defined those planes, but defining planes without defining
> > the characters in it mean not too much to people. How can
> > you implement case conversion, property mapping without knowing what
> > is inside.
>
> How do you do that for BMP characters? There's a whole lot you can do
> without knowing the identity of a character. You can draw the glyph from
> a font, which will suffice for a lot of purposes.

Draw a glyph from a font to implement case conversion, property
mapping? I don't know how you can do that.

> > In particular, DOES GB18030 define code point to
> > code point mapping (beyond BMP) between Unicode? Unless you can said
> > that is YES and show me the specification how to map between
> > them, there are no way people can implement code set conversion between
> > GB18030 and Unicode.

That is my question: DOES it define so? I don't have access to THE
specification itself and am asking for help to get one. Do you have access
to the specification, and DOES it specify so?

> Have you looked for the specification? Or are you just going to complain
> on the list?

I am not complaining on the list. I am asking for confirmation about what
is in the specification.

> According to GNU libc, the algorithm for converting a Unicode character
> ch outside the BMP to GB18030 to outptr (1 .. 4) is:
>     idx := ch + 16#1E248#;
>     outptr (4) := (idx mod 10) + 16#30#;
>     idx := idx / 10;
>     outptr (3) := (idx mod 126) + 16#81#;
>     idx := idx / 126;
>     outptr (2) := (idx mod 10) + 16#30#;
>     outptr (1) := (idx / 10) + 16#81#;
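
For reference, 16#1E248# in the notation above is a base-16 literal, i.e.
0x1E248. The following Python sketch restates the same computation, taking
each trailing digit as a remainder and using integer division to move to the
next digit. It is an illustration only, not the glibc source; the function
name is hypothetical. The offset works because 0x1E248 equals 189000 - 0x10000,
where 189000 is the linear index of the first supplementary sequence
90 30 81 30.

```python
def encode_supplementary_gb18030(cp):
    """Encode a code point in U+10000..U+10FFFF as four GB18030 bytes."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("U+%04X is not a supplementary code point" % cp)
    idx = cp + 0x1E248        # linear index counted from byte sequence 81 30 81 30
    b4 = idx % 10 + 0x30      # fourth byte: 0x30..0x39
    idx //= 10
    b3 = idx % 126 + 0x81     # third byte: 0x81..0xFE
    idx //= 126
    b2 = idx % 10 + 0x30      # second byte: 0x30..0x39
    b1 = idx // 10 + 0x81     # first byte: 0x90..0xE3 for this range
    return bytes([b1, b2, b3, b4])


if __name__ == "__main__":
    for cp in (0x10000, 0x20000, 0x10FFFF):
        print("U+%X -> %s" % (cp, encode_supplementary_gb18030(cp).hex(" ")))
```

The first and last results line up with the bFirst and bLast values in the ICU
mapping file quoted elsewhere in this thread.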
Thanks for providing this information, although I have no clue what
"16#1E248#" means here. I assume it means 0x1e248; is that true?
 
 
 
 
--
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
When the aliens come, when the deathrays hum, when the bombers bomb,
we'll still be freakin' friends. - "Freakin' Friends"



Re: GB18030

2001-09-27 Thread Michael \(michka\) Kaplan

From: "Yung-Fong Tang" <[EMAIL PROTECTED]>

> Can anyone tell me where can I find a online version of the GB18030
> standard (yes, I want the STANDARD itself. Not someone's paper talk
> about the standard) . Or anyone could tell me where to get a copy of the
> standard.

You mean the original Chinese? Hmmm I remember that folks were
frantically sending that link around last year as they struggled to get it
translated into English. I am not sure where the links were pointing to,
though.

> Is the U+1 - U+10 mapping between Unicode and GB18030 specified
> in the GB18030 standard itself? can someone fax me that page ? Thanks.

The mapping is defined (how else could anyone have implemented it?).

> looks like I beat ICU by checkin my mapping table at April 9 (to
> mozilla) , 10 days before they check in their first version of GB18030
> xml mapping table :)  I probably can still claim the first open source
> project which support GB18030 to Unicode conversion, althought I didn't
> do anything beyond BMP 

Considering the fact that neither Netscape 6.1 nor Mozilla 0.9.3 seem to be
able to handle supplementary characters, even on a machine that has the
support turned on and the font available, I can verify that there is no
"beyond the BMP" support there. :-)

IE 5.5 and IE 6.0 seem to do a much better job here, on the whole, but
there is always hope for the future.


MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/






Re: GB18030

2001-09-27 Thread Kenneth Whistler

Frank,

> You don't need to explain to me
> the concept of GB18030. The question I have is about details mapping
> information.

Now, now, there's no need to get snippy with me. It sounded
like you were unclear from the kinds of questions you were
asking.

> I look at
> http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml .
> 
> It is interesting that the mapping between U+1 and U+10 is check
> in only 5 weeks ago in the version 1.3
> 
>  |  30910:bFirst="90 30 81 30" bLast="E3 32 9A 35"  bMin="81 30 81 30" bMax="FE 39
> FE 39"/>
> 

> Is the U+1 - U+10 mapping between Unicode and GB18030 specified
> in the GB18030 standard itself? can someone fax me that page ? Thanks.

Unfortunately, I don't have the revised and corrected version of
the standard to hand.

But on p. 5, clause 7.3 of the original GB 18030-2000, it states (in
Chinese):

"From 0x90308130 to 0xE339FE39, altogether 1058400 code points, correspond
to GB 13000's 16 supplementary planes..."

If you look at the ICU specification, bFirst="90 30 81 30" and
bLast="E3 32 9A 35" corresponds to:

83 "groups" (90..E2) of GB 18030:        83 x 10 x 1260 = 1045800 code points
 2 "planes" (E3 30..31) of GB 18030:           2 x 1260 =    2520 code points
25 "rows"   (E3 32 81..99) of GB 18030:         25 x 10 =     250 code points
 6 "cells"  (E3 32 9A 30..35) of GB 18030:                      6 code points
                                                   Total  1048576 code points

And 1048576 code points = 16 x 65536 code points = 16 planes of 10646.

So GB 18030 and ICU agree. Start at 0x90308130 and lay out all the
rest of the Unicode supplementary code points in order.
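
The counts above can be checked mechanically. The Python sketch below treats
a four-byte sequence as a mixed-radix number (126 x 10 x 126 x 10 possible
values) and compares the range in the ICU file with the range quoted from
clause 7.3. The helper name linear_index is made up for this check.

```python
def linear_index(seq):
    """Position of a four-byte GB18030 sequence, counting from 81 30 81 30."""
    b1, b2, b3, b4 = seq
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 \
        + (b4 - 0x30)


first = linear_index(bytes.fromhex("90308130"))         # start of the range
icu_last = linear_index(bytes.fromhex("E3329A35"))      # bLast in the ICU file
clause_last = linear_index(bytes.fromhex("E339FE39"))   # end quoted from clause 7.3

mapped = icu_last - first + 1       # sequences that map to a code point
reserved = clause_last - first + 1  # sequences set aside by the standard

if __name__ == "__main__":
    print(mapped, mapped == 16 * 65536)
    print(reserved, reserved - mapped)
```

So the standard sets aside 1058400 sequences, of which the first 1048576 cover
the 16 supplementary planes, and the remaining 9824 have no code point to map
to.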

--Ken




Re: GB18030

2001-09-27 Thread Yung-Fong Tang



Kenneth Whistler wrote:

> Frank,

>
> Yes. Absolutely it does. It is spelled out in the standard
> itself.
>
> GB 18030 <--> Unicode conversion is basically like a big
> UTF, with an enormous table for all the GBK part of the
> encoding, and a bunch of offset ranges to convert all the
> other code points.

I know. I have already implemented the Unicode BMP to GB18030 conversion
(back and forth) in Mozilla. The 4-byte GB18030 to Unicode BMP conversion
table only takes about 1488 bytes (see
http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvcn/gb180304bytes.ut
). The Unicode BMP to GB18030 4-byte part (not including the 2-byte
part) only takes 1036 bytes to code the table (see
http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvcn/gb180304bytes.uf
). I got the original mapping from Sun Microsystems. Unfortunately, I
did not find a mapping table beyond the BMP. You don't need to explain to me
the concept of GB18030. The question I have is about detailed mapping
information.

>
>
> > Unless you
> > can said that is YES and show me the specification how to
> > map between
> > them, there are no way people can implement code set
> > conversion between GB18030 and Unicode.
>
> http://www-106.ibm.com/developerworks/library/u-china.html
>
> Markus Scherer's excellent documentation of GB 18030, with
> code snippets and pointer to a complete ICU implementation.

That paper itself does not specify any detailed mapping table.

I looked at
http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml .

It is interesting that the mapping between U+1 and U+10 was checked
in only 5 weeks ago, in version 1.3:

 |  30910:   

Can anyone tell me where I can find an online version of the GB18030
standard? (Yes, I want the STANDARD itself, not someone's paper talking
about the standard.) Or could anyone tell me where to get a copy of the
standard?

Is the U+1 - U+10 mapping between Unicode and GB18030 specified
in the GB18030 standard itself? Can someone fax me that page? Thanks.

Looks like I beat ICU by checking in my mapping table on April 9 (to
Mozilla), 10 days before they checked in their first version of the GB18030
XML mapping table :)  I probably can still claim the first open source
project which supports GB18030 to Unicode conversion, although I didn't
do anything beyond the BMP.

>
>
> >
> > That question is not wheather they should define the
> > relationship or not, but have they defined it yet.
>
> They have.
>
> --Ken





Re: GB18030

2001-09-27 Thread Yung-Fong Tang

Sure, I know it could (and will) be implemented by a mapping table. But you
still need to know what U+4ff3a is to define such a mapping table, right?
And the mapping table will still be part of the software package, right? And
the user still won't get your new version of the mapping table until they
upgrade, right? So whatever software the user has today, without an upgrade
(either of the code or of the mapping table), still won't know how to
convert U+4ff3a to lower case or upper case, right?

But how can you generate such a mapping table without knowing that character?
When we deliver software to the customer, it contains both code and mapping
tables. Once the software is distributed to the customer, unless you
redistribute a newer version or a patch, the customer won't have new code or
a new mapping table. From a software engineering point of view, upgrading a
mapping table is the same as upgrading code. You need to run the full QA
cycle, you need to rebuild the installer, and you need to distribute it to
the end user. Although it is safer to change the mapping table than to
change the code, it is the same in terms of software distribution.

Geoffrey Waigh wrote:

> On Wed, 26 Sep 2001, Yung-Fong Tang wrote:
>
> > how can you implement tolower(U+4ff3a) without knowing what U+4ff3a is ?
>
> With a data table.  One set of debugged code that handles surrogates,
> composing characters, bidirectionality etc. coupled with a datafile that
> gets upgraded with each release of Unicode.  How many years does it take
> to implement some of these concepts?  It shouldn't require
> honest-to-goodness we-were't-kidding see-here's-one-defined-now characters
> for developers to slap themselves on the head and start developing support
> for these things.

How many years does it take for people to realize that giving a new mapping
to their customers still needs a complete life cycle of QA and distribution?
And there will be a new version number attached to the software for that.
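
To make the two positions concrete, here is a minimal Python sketch of the
data-table approach Geoffrey describes. The table contents and the names
LOWER_TABLE and to_lower are hypothetical; a real product would load its
table from the Unicode Character Database release it ships with. The lookup
code never changes, and a code point missing from the table maps to itself.

```python
# Illustrative excerpt of a case-mapping table (code point -> lowercase).
LOWER_TABLE = {
    0x0041: 0x0061,    # LATIN CAPITAL LETTER A
    0x0410: 0x0430,    # CYRILLIC CAPITAL LETTER A
    0x10400: 0x10428,  # DESERET CAPITAL LETTER LONG I (a supplementary character)
}


def to_lower(cp, table=LOWER_TABLE):
    """Table-driven lowercase mapping; unknown code points map to themselves."""
    return table.get(cp, cp)


if __name__ == "__main__":
    # With today's table, the unassigned U+4FF3A is simply left alone.
    print("U+%X" % to_lower(0x4FF3A))

    # A later data release, shipped as an upgrade, could add an entry.
    # The mapping below is invented purely for illustration.
    upgraded = dict(LOWER_TABLE)
    upgraded[0x4FF3A] = 0x4FF3B
    print("U+%X" % to_lower(0x4FF3A, upgraded))
```

Both sides of the argument are visible here: the code handles U+4FF3A today
without modification, but a real mapping for it only reaches users when a new
table is distributed.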

>
>
> Geoffrey





Re: Cyrillic Q

2001-09-27 Thread James E. Agenbroad

On Thu, 27 Sep 2001, John Hudson wrote:

> At 02:48 9/27/2001, Marco Cimarosti wrote:
> 
> >A lot of time ago, someone on this list mentioned a language, written in the
> >Cyrillic alphabet, which employed letter "Q", taken from the Latin alphabet.
> >
> >Which language is it?
> 
> Kurdish. The common Cyrillic orthography includes four Latin letterforms 
> that are, as far as I know, unique to Kurdish:
> 
>  U+0051, U+0071  Capital, Small Q
>  U+0057, U+0077  Capital, Small W
> 
> John Hudson
> 
> Tiro Typeworks        www.tiro.com
> Vancouver, BC [EMAIL PROTECTED]
> 
> Type is something that you can pick up and hold in your hand.
>- Harry Carter
> 
> 
 Thursday, September 27, 2001
Besides Kurdish, the section on transliteration of non-Slavic languages
using Cyrillic in the ALA-LC romanization tables (1997) shows Q used with
four other languages: Aisor, Chechen (the 1862 and 1908 orthographies but
not the 1938 one), Dargwa (Uslar) and Lak (1864 but not 1938). For Kurdish,
Q seems also to have an alternative glyph that appears as "O" followed by
a vertical bar, which is also used with Lezghian (Uslar).

 Regards,
  Jim Agenbroad ( [EMAIL PROTECTED] )
 The above are purely personal opinions, not necessarily the official
views of any government or any agency of any.
Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Dev.Gp.4, Library
of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.  





Re: Egyptian Transliteration Characters

2001-09-27 Thread Mark Davis



You need to get a Unicode-enabled browser and font ;-)

Attached is a screen shot, and here is the html (sorry for the decimal, but
I'm in a rush, and that's what MS gives you):

"shape to ỉ and ʻ or ʿ that they cannot be used?"
 
Mark
—
 
Δός μοι ποῦ στῶ, καὶ κινῶ τὴν γῆν — 
Ἀρχιμήδης[http://www.macchiato.com]
----- Original Message -----
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, September 27, 2001 3:41 AM
Subject: Re: Egyptian Transliteration Characters

> At 15:05 -0700 2001-09-26, §§Û§S§¶§Í§Â§¶§½ wrote:
> >Is this the same Unicode that encodes characters and not glyphs?
>
> Yes, it is, and I am not certain that Mark's "strong" suspicion is
> correct because I have seen a lot of data. But I'll be asking
> Egyptologists.
>
> >  >1. LATIN CAPITAL LETTER EGYPTOLOGICAL YOD
> >>LATIN SMALL LETTER EGYPTOLOGICAL YOD
> >>2. LATIN CAPITAL LETTER EGYPTOLOGICAL AYIN
> >>LATIN SMALL LETTER EGYPTOLOGICAL AYIN
> >>
> >>I strongly suspect that current diacritics (for 1) and modifier letters (for
> >>2) are similar enough in shape to what is required that they can be used.
> >>Are there any other characters used by Egyptologist that are so close in
> >shape to i?? and ?? or ?? that they cannot be used?
>
> I don't know what i?? and ?? or ?? were meant to be, Mark.
> --
> Michael Everson *** Everson Typography *** http://www.evertype.com
> 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
> Telephone +353 86 807 9169 *** Fax +353 1 478 2597 (by arrangement)
 eqypt.gif


Re: Egyptian Transliteration Characters

2001-09-27 Thread Spencer_Tasker


For what it's worth, I did not think of doing anything with the YODs because
of their close correspondence to

1F30 GREEK SMALL LETTER IOTA WITH PSILI
1F38 GREEK CAPITAL LETTER IOTA WITH PSILI

which in practice would look all the more like the YODs because of the
standard Egyptological practice of italicising transliterations.

But having said that, I certainly have no problem with these characters, and
this is somewhat more systematic than would be the case were one to use
iotas.
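
For anyone who wants to check the two Greek characters named above, here is a
small Python sketch that looks them up in the Unicode character database
shipped with Python. The helper name `describe` is made up for this example.

```python
import unicodedata

def describe(code_point):
    """Return (character name, canonical decomposition) for a code point."""
    ch = chr(code_point)
    return unicodedata.name(ch), unicodedata.decomposition(ch)

# The two iotas with psili mentioned in the message.
for cp in (0x1F30, 0x1F38):
    name, decomposition = describe(cp)
    print(f"U+{cp:04X}  {name}  (decomposes to {decomposition})")
```

Both decompose to a Greek iota plus U+0313 COMBINING COMMA ABOVE, so using
them in a Latin transliteration would mix scripts, which may be part of why
dedicated characters look more systematic.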

- Spencer




   

Michael Everson
Sent by: unicode-bounce@unicode.org
To: [EMAIL PROTECTED]
cc:
Subject: Re: Egyptian Transliteration Characters
27.09.01 12:41

   

   





At 15:05 -0700 2001-09-26, §?§Û§?§¶§Í§Â§¶§½ wrote:
>Is this the same Unicode that encodes characters and not glyphs?

Yes, it is, and I am not certain that Mark's "strong" suspicion is
correct because I have seen a lot of data. But I'll be asking
Egyptologists.

>  >1. LATIN CAPITAL LETTER EGYPTOLOGICAL YOD
>>LATIN SMALL LETTER EGYPTOLOGICAL YOD
>>2. LATIN CAPITAL LETTER EGYPTOLOGICAL AYIN
>>LATIN SMALL LETTER EGYPTOLOGICAL AYIN
>>
>>I strongly suspect that current diacritics (for 1) and modifier letters (for
>>2) are similar enough in shape to what is required that they can be used.
>>Are there any other characters used by Egyptologist that are so close in
>shape to i?? and ?? or ?? that they cannot be used?

I don't know what i?? and ?? or ?? were meant to be, Mark.
--
Michael Everson *** Everson Typography *** http://www.evertype.com
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Telephone +353 86 807 9169 *** Fax +353 1 478 2597 (by arrangement)








Re: Missing Arabic and Syriac characters in Unicode

2001-09-27 Thread Miikka-Markus Alhonen


On 24-Sep-01 Michael Everson wrote:
> Miikka-Markus,
> 
> I'd suggest that you write this up as a PDF document (with scanned 
> examples) and submit it to the UTC and WG2 for consideration. 

OK. I'll start working on it. I mean, at least the Arabic part of my message.
I'm not a professional Semiticist and I live in Finland which is quite far from
Syriac-writing countries, so I'm not sure if I can get access to any material
about the early forms of Edessan vowels here. I'll see what I can find in
Finnish university libraries and consult professionals, too. Is there anyone on
this list who could provide more information/samples about the things I wrote?

Best regards!

--
E-Mail: Miikka-Markus Alhonen <[EMAIL PROTECTED]>
Date: 27-Sep-01
Time: 15:45:30

This message was sent by XFMail
--




Re: GB18030

2001-09-27 Thread Tom Emerson

GB 18030 is aligned to ISO 10646, which does not define the semantic
properties that Unicode does.
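
One way to see the code-point-level alignment in practice is a round trip
through a GB 18030 codec. The sketch below uses Python's built-in "gb18030"
codec; the sample characters and the helper name `round_trips` are arbitrary
choices for this example, and it says nothing about character properties.

```python
def round_trips(text):
    """True if text survives encoding to GB 18030 and decoding back."""
    return text.encode("gb18030").decode("gb18030") == text

# An ASCII letter, a BMP ideograph, and a supplementary-plane ideograph.
samples = ["A", "\u4e2d", "\U00020000"]

for ch in samples:
    encoded = ch.encode("gb18030")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), round trip: {round_trips(ch)}")
```

The encoded length varies (one, two or four bytes), but each code point maps
to exactly one byte sequence and back.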

-- 
Tom Emerson  Basis Technology Corp.
Sr. Sinostringologist  http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"




Re: Egyptian Transliteration Characters

2001-09-27 Thread Michael Everson

At 15:05 -0700 2001-09-26, §§Û§Š§¶§Í§Â§¶§½ wrote:
>Is this the same Unicode that encodes characters and not glyphs?

Yes, it is, and I am not certain that Mark's "strong" suspicion is 
correct because I have seen a lot of data. But I'll be asking 
Egyptologists.

>  >1. LATIN CAPITAL LETTER EGYPTOLOGICAL YOD
>>LATIN SMALL LETTER EGYPTOLOGICAL YOD
>>2. LATIN CAPITAL LETTER EGYPTOLOGICAL AYIN
>>LATIN SMALL LETTER EGYPTOLOGICAL AYIN
>>
>>I strongly suspect that current diacritics (for 1) and modifier letters (for
>>2) are similar enough in shape to what is required that they can be used.
>>Are there any other characters used by Egyptologist that are so close in
>shape to i?? and ?? or ?? that they cannot be used?

I don't know what i?? and ?? or ?? were meant to be, Mark.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Telephone +353 86 807 9169 *** Fax +353 1 478 2597 (by arrangement)




Re: Cyrillic Q

2001-09-27 Thread John Hudson

At 02:48 9/27/2001, Marco Cimarosti wrote:

>A lot of time ago, someone on this list mentioned a language, written in the
>Cyrillic alphabet, which employed letter "Q", taken from the Latin alphabet.
>
>Which language is it?

Kurdish. The common Cyrillic orthography includes four Latin letterforms 
that are, as far as I know, unique to Kurdish:

 U+0051, U+0071  Capital, Small Q
 U+0057, U+0077  Capital, Small W
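
A quick way to confirm that these code points are the ordinary Latin letters,
and to spot them inside otherwise Cyrillic text, is sketched below in Python.
The helper `latin_letters` and the test string are made up for illustration;
the string is not a real Kurdish word.

```python
import unicodedata

def latin_letters(text):
    """Return the letters in text whose Unicode names start with LATIN."""
    return [ch for ch in text
            if ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")]

# The four code points listed above.
for cp in (0x0051, 0x0071, 0x0057, 0x0077):
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}")

# Arbitrary mix: Cyrillic ka, Latin q, Cyrillic a, Latin w.
sample = "\u043aq\u0430w"
print(latin_letters(sample))
```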

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

Type is something that you can pick up and hold in your hand.
   - Harry Carter





Re: Cyrillic Q

2001-09-27 Thread Roozbeh Pournader

On Thu, 27 Sep 2001, Marco Cimarosti wrote:

> A lot of time ago, someone on this list mentioned a language, written in the
> Cyrillic alphabet, which employed letter "Q", taken from the Latin alphabet.
>
> Which language is it?

IIRC, it was Kurdish.

roozbeh





Cyrillic Q

2001-09-27 Thread Marco Cimarosti

A lot of time ago, someone on this list mentioned a language, written in the
Cyrillic alphabet, which employed letter "Q", taken from the Latin alphabet.

Which language is it?

Are the glyphs for that "Q" identical to Latin in both cases?

What is the status of this "Q" in Unicode: is it still unified with Latin Q,
or has it been allocated as a Cyrillic letter?

Thanks in advance.

_ Marco