Re: GB18030
On Thu, Sep 27, 2001 at 03:03:22PM -0700, Yung-Fong Tang wrote: > David Starner wrote: > > > If you can't recognize the > > character, then just don't convert it. > > It could be the quality of other's software, we have higher standard however. Higher standard? If I'm working on "Old High German" on a system that only supports Unicode 2.1, I'd be much happier for it to look for U+0225 in my font and display what it finds there, rather than not displaying the character, refusing to read the file, or silently munging the file (in order of evilness.) It is more important for me to be able to process the file and lose some functionality than not to be able to read the file. -- David Starner - [EMAIL PROTECTED] Pointless website: http://dvdeug.dhis.org "I saw a daemon stare into my face into my face, and an angel touch my breast; each one softly calls my name . . . the daemon scares me less." - "Disciple", Stuart Davis
INFO: Extension A and B on Windows... and/or in your browser
[With permission from Microsoft], I had the help file for the Extension A & B IME/font that ships with Office XP (CHS and Hong Kong editions) translated into English. Those who are interested in reading a bit about Microsoft's efforts to provide an IME for these characters can see it here: http://www.i18nwithvb.com/surrogate_ime/ (no warranty about the translation or even the original text, of course!) Now, most people will be more interested in the code charts included in the help file, which are reproduced in this online version (divided into the same 16 parts as in the help file). They can be found at http://www.i18nwithvb.com/surrogate_ime/code_charts/ There are three ways to load each page:
1) DEFAULT -- the tables are loaded with the font specified as "Simsun (Founder Extended)", the font that ships with the CHS and HK versions of Office XP.
2) NO FONT SPECIFIED -- the tables are loaded with no font specified; good for people who do not have the font on their machine and hope that their browser will make up for this with an alternate font.
3) EOT -- the tables are loaded with an EOT file produced by WEFT; good for people who do not have the font but do have IE, and so can hope that the EOT files will let them see the characters.
You will want to pay attention to the instructions on the default "code charts" page about the configuration where I was able to verify that these pages render properly; every other browser and OS combination I tried had some type of failure, in whole or in part. It was not my personal choice to prove that IE seems to have superior support for Extension A and B, but I was unable to get either of the non-IE browsers I tried to do such a hot job here. Even the IE case required Win2000 or XP, properly set up for supplementary characters. The two most interesting failures (IMHO):
1) Mozilla 0.9.3 -- almost all of Ext. A shows up as a question mark (?) and all of Ext. B shows up as a double question mark (??)
-- these are *not* corrupted chars, though; copying and pasting into Unicode notepad proves that the actual characters are present. Mozilla is just using ? or ?? where it should be using the default char. Very weird and unexpected, to me at least. :-)
2) IE >= 5.0 on a machine set up for surrogate support, with the font installed but not specified -- almost all of Extension A is invisible, but all of Extension B is visible. I have been told by various font experts outside of MS that this is a known bug with displaying Extension A characters in IE when the font is not explicitly specified. Enjoy! MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/
Re: GB18030
From: Yung-Fong Tang > Case mapping ? You have no way to generate mapping table for > case mapping with knowing the character unless you already > define those character have no case or only one case. Um, Unicode defines a behavior and even properties for unassigned code points. If you choose not to implement this because you only handle "assigned code points" then that is actually a problem with your software. No one is arguing the point you make that until a code point is assigned, its exact *FINAL* behavior is not completely understood with regards to casing, collation, and everything else. So you do not need to continue arguing this point -- I am sure everyone agrees with it. But do you understand that there is certainly a defined behavior for it in the interim? In the time before it is assigned an actual character? That is I think the crux of the matter here. > Don't tell me there any people how implemented > HanCharacterStokeNumber(U+2) in 1996, no body have a > implementation of HanCharacterStokeNumber(U+2) until > U+2 got defined. Actually, several companies had the mechanisms defined to convert that to a surrogate pair. Or to treat it as a single unassigned character for the purposes of collation. The difference between them and you would be that you do not recognize the existence of this state -- the time before direct assignment? MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/
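MichKa's point that vendors could convert a supplementary code point to a surrogate pair long before any character was assigned there can be illustrated with the UTF-16 arithmetic itself. The following is a minimal Python sketch; the function names are hypothetical, chosen for illustration. Nothing in it asks whether the code point is assigned, so U+4FF3A (an empty slot in plane 4) goes through exactly the same path as U+20000 (CJK Extension B, assigned in Unicode 3.1).

```python
from __future__ import annotations


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000             # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# An assigned and an unassigned code point take the same path.
for cp in (0x20000, 0x4FF3A):
    hi, lo = to_surrogate_pair(cp)
    print(f"U+{cp:X} -> {hi:04X} {lo:04X}")
```

This prints `U+20000 -> D840 DC00` and `U+4FF3A -> D8FF DF3A`. Python's own codec agrees: `"\U0004FF3A".encode("utf-16-be")` yields the bytes D8 FF DF 3A.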
Re: GB18030
Markus Scherer wrote: Yung-Fong Tang wrote: > ... But you > still need to know what U+4ff3a to define such mapping table, right? Wrong. You just need to know the mapping between code points, whether assigned, used, or whatever. > ... So, whatever the software the user currently have today, without an > upgrade (either upgrade the code or mapping table) still won't know how to > convert U+4ff3a to lower case or upper case, right ? No, but that's irrelevant for character conversion. Once you update the Unicode character database in your product, your software will do it - if it knows how to deal with supplementary characters in general. (That part is a technicality which is, again, independent of whether there _are_ assigned characters.) It still takes a "Once you update the Unicode character database in your product" to make it happen, right? From a software distribution point of view, it means a different version number and therefore usually requires a QA cycle. As I said, you CANNOT do it WITHOUT an upgrade. Anything can happen WITH an upgrade, either a change to the code or a change to the mapping table. > But how can you generate such mapping table without knowing that character ? By specifying which _code point_ in one encoding gets mapped to which other _code point_ in the other encoding. Character conversion never looks at whether the code points that it maps are actual _characters_. When you map between the GBK or Shift-JIS user-defined areas and Unicode PUA or similar, then you also map code points that don't have characters. What's new? Case mapping? You have no way to generate a mapping table for case mapping without knowing the character, unless you have already defined those characters as having no case or only one case. > ... > How many years does it take for people to realize that give a new mappint to > their customer still need a complete life cycle of QA and distribution? And > there will be a new version number attach to the software for that.
Is this about the existence of supplementary characters again? They have existed since 1996, and a vendor who followed the UTC/ISO negotiations could have seen them coming since 1993. Surely most everyone has had the time to roll out a new release of their software with support for them - in more than five years? Don't tell me anyone implemented HanCharacterStokeNumber(U+2) in 1996; nobody had an implementation of HanCharacterStokeNumber(U+2) until U+2 got defined. (I know that few actually worked on this in time. But time there was.) markus
Re: GB18030
ok... you beat me :) David Starner wrote: > On Thu, Sep 27, 2001 at 12:27:11PM -0700, Yung-Fong Tang wrote: > > looks like I beat ICU by checkin my mapping table at April 9 (to > > mozilla) , 10 days before they check in their first version of GB18030 > > xml mapping table :) I probably can still claim the first open source > > project which support GB18030 to Unicode conversion, althought I didn't > > do anything beyond BMP > > GNU libc CVS claims that the first version of the GB18030 iconv modules > was uploaded to CVS on Jul 14, 2000, and the version corresponding to > the current version of GB18030 was uploaded to CVS on Feb 14, 2001, with > only minor changes since then. It has supported non-BMP characters since > Jun 6, 2001. > > -- > David Starner - [EMAIL PROTECTED] > Pointless website: http://dvdeug.dhis.org > "I saw a daemon stare into my face into my face, and an angel touch my > breast; each one softly calls my name . . . the daemon scares me less." > - "Disciple", Stuart Davis
Re: GB18030
David Starner wrote: > On Thu, Sep 27, 2001 at 01:07:43PM -0700, Yung-Fong Tang wrote: > > Draw a glyph from a font to implement case conversion, property mapping ? I don't know how can you do that. > > When is case conversion a panic situation? I never said it is "a panic situation". > If you can't recognize the > character, then just don't convert it. That may be the quality of other people's software; we have a higher standard, however. > All unassigned characters have > default properties - use them. No, you don't know all about the > character, but you know enough to load a font and display it, which is > all a webbrowser or a wordprocessor needs 90% of the time. > > > That is my quetion DOES it define so. I don't have the access to THE specification itself and asking help to get one. Do you have the > > access to the specification and DOES it specify so? > > Do you not have access to the web? It took me 4 minutes to find the > information on the web. Start with www.google.com and type in GB18030, > and you'll find most of the information right there. Others have > pointed out more specific links. No, I am NOT asking for "the information" about the GB18030 standard. I am asking for the GB18030 standard ITSELF. None of the Google results show me THE GB18030 standard ITSELF; all of them show me INFORMATION about GB18030. Since I have worked on supporting standards for years, I only trust the standard itself these days. Tell me any link you can find on Google that points to THE GB18030 standard. I really hope you can give me one. Sorry, I am a really picky guy about standards; I have seen too much false information and misinterpretation in the past. Kenneth gave me a direct quote from the paper copy of the standard he had, which is what I need. > -- > David Starner - [EMAIL PROTECTED] > Pointless website: http://dvdeug.dhis.org > When the aliens come, when the deathrays hum, when the bombers bomb, > we'll still be freakin' friends. - "Freakin' Friends"
Re: GB18030
Yung-Fong Tang wrote: > ... But you > still need to know what U+4ff3a to define such mapping table, right? Wrong. You just need to know the mapping between code points, whether assigned, used, or whatever. > ... So, whatever the software the user currently have today, without an > upgrade (either upgrade the code or mapping table) still won't know how to > convert U+4ff3a to lower case or upper case, right ? No, but that's irrelevant for character conversion. Once you update the Unicode character database in your product, your software will do it - if it knows how to deal with supplementary characters in general. (That part is a technicality which is, again, independent of whether there _are_ assigned characters.) > But how can you generate such mapping table without knowing that character ? By specifying which _code point_ in one encoding gets mapped to which other _code point_ in the other encoding. Character conversion never looks at whether the code points that it maps are actual _characters_. When you map between the GBK or Shift-JIS user-defined areas and Unicode PUA or similar, then you also map code points that don't have characters. What's new? > ... > How many years does it take for people to realize that give a new mappint to > their customer still need a complete life cycle of QA and distribution? And > there will be a new version number attach to the software for that. Is this about the existence of supplementary characters again? They have existed since 1996, and a vendor who followed the UTC/ISO negotiations could have seen them coming since 1993. Surely most everyone has had the time to roll out a new release of their software with support for them - in more than five years? (I know that few actually worked on this in time. But time there was.) markus
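Markus's point that conversion maps code points, not characters, can be made concrete for the supplementary range discussed in this thread (90 30 81 30 through E3 32 9A 35 for U+10000..U+10FFFF). The sketch below is a hedged Python illustration, not ICU's code, and its function names are hypothetical. It computes a linear index over the four-byte space (126 x 10 x 126 x 10 byte combinations) and offsets it, without consulting any list of assigned characters.

```python
def gb18030_linear(seq: bytes) -> int:
    """Linear index of a GB 18030 four-byte sequence; 81 30 81 30 has index 0."""
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError(f"not a GB 18030 four-byte sequence: {seq.hex()}")
    # byte 1: 126 values, byte 2: 10, byte 3: 126, byte 4: 10
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


SUPP_FIRST = gb18030_linear(bytes.fromhex("90308130"))  # maps to U+10000
SUPP_LAST = gb18030_linear(bytes.fromhex("E3329A35"))   # maps to U+10FFFF


def decode_supplementary(seq: bytes) -> int:
    """Map a four-byte sequence in the supplementary range to its code point."""
    lin = gb18030_linear(seq)
    if not SUPP_FIRST <= lin <= SUPP_LAST:
        raise ValueError("outside the range mapped to U+10000..U+10FFFF")
    return 0x10000 + (lin - SUPP_FIRST)


for hexseq in ("90308130", "95328236", "E3329A35"):
    print(hexseq, f"U+{decode_supplementary(bytes.fromhex(hexseq)):04X}")
```

Whether the resulting code point has a glyph, a name, or case properties never enters into it; the converter needs only the range boundaries.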
Re: GB18030
Yung-Fong Tang wrote: > ... > > http://www-106.ibm.com/developerworks/library/u-china.html > > > > Markus Scherer's excellent documentation of GB 18030, with > > code snippets and pointer to a complete ICU implementation. > > That paper itself does not specify any details mapping table. True, but it explains that they are treated algorithmically, and how to do that. > I look at > http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml . > > It is interesting that the mapping between U+1 and U+10 is check > in only 5 weeks ago in the version 1.3 We had this same, correct mapping table up elsewhere on our server since February, I believe. When we imported the .xml mapping tables into our newish charset cvs repository, we accidentally ran the tool that generates .xml from our internal format on this one as well. That does not work since the internal 18030 file is missing all algorithmic parts (we don't have an equivalent of the element). This is the one file that we cannot fully generate from our internal table... I sent an email to this list 5 weeks ago pointing out this mistake. Sorry for the confusion. > ... > looks like I beat ICU by checkin my mapping table at April 9 (to > mozilla) , 10 days before they check in their first version of GB18030 > xml mapping table :) I am sorry to disappoint you. ICU 1.7, released in December 2000, had the GB 18030 converter. I implemented it in October, and updated it with the new mapping table from 2000-nov-30 on that same day. That all includes support for the supplementary planes! :-) > I probably can still claim the first open source > project which support GB18030 to Unicode conversion, althought I didn't > do anything beyond BMP Nope ;-) markus
Re: GB18030
Kenneth Whistler wrote: > Frank, > > But on p. 5, clause 7.3 of the original GB 18030-2000, it states (in > Chinese): > > "From 0x90308130 to 0xE339FE39, altogether 1058400 code points, correspond > to GB 13000's 16 supplementary planes..." > --Ken OK, I have filed a bug against Mozilla for this; see http://bugzilla.mozilla.org/show_bug.cgi?id=101998 I have also submitted a patch there (see the bug report). Unfortunately, I haven't had time to test it yet. It would be nice if someone could code-review that change for me. Sun folks, do you care about GB18030-to-surrogate conversion in Mozilla? Please help code-review and QA it. Thanks.
Re: GB18030
On Thu, Sep 27, 2001 at 12:27:11PM -0700, Yung-Fong Tang wrote: > looks like I beat ICU by checkin my mapping table at April 9 (to > mozilla) , 10 days before they check in their first version of GB18030 > xml mapping table :) I probably can still claim the first open source > project which support GB18030 to Unicode conversion, althought I didn't > do anything beyond BMP GNU libc CVS claims that the first version of the GB18030 iconv modules was uploaded to CVS on Jul 14, 2000, and the version corresponding to the current version of GB18030 was uploaded to CVS on Feb 14, 2001, with only minor changes since then. It has supported non-BMP characters since Jun 6, 2001. -- David Starner - [EMAIL PROTECTED] Pointless website: http://dvdeug.dhis.org "I saw a daemon stare into my face into my face, and an angel touch my breast; each one softly calls my name . . . the daemon scares me less." - "Disciple", Stuart Davis
Re: GB18030
On Thu, Sep 27, 2001 at 01:07:43PM -0700, Yung-Fong Tang wrote: > Draw a glyph from a font to implement case conversion, property mapping ? I don't know how can you do that. When is case conversion a panic situation? If you can't recognize the character, then just don't convert it. All unassigned characters have default properties - use them. No, you don't know all about the character, but you know enough to load a font and display it, which is all a web browser or a word processor needs 90% of the time. > That is my quetion DOES it define so. I don't have the access to THE specification itself and asking help to get one. Do you have the > access to the specification and DOES it specify so? Do you not have access to the web? It took me 4 minutes to find the information on the web. Start with www.google.com and type in GB18030, and you'll find most of the information right there. Others have pointed out more specific links. -- David Starner - [EMAIL PROTECTED] Pointless website: http://dvdeug.dhis.org When the aliens come, when the deathrays hum, when the bombers bomb, we'll still be freakin' friends. - "Freakin' Friends"
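Starner's advice to fall back on default properties can be sketched with Python's standard `unicodedata` module (used here only as a convenient stand-in for any implementation of the Unicode Character Database). An unassigned code point reports general category Cn, has no name, and its case mappings default to itself, so a program can proceed instead of refusing the text. The `describe()` helper is hypothetical.

```python
import unicodedata


def describe(cp: int) -> str:
    """Report the properties a program can rely on, assigned or not."""
    ch = chr(cp)
    category = unicodedata.category(ch)           # 'Cn' means unassigned
    name = unicodedata.name(ch, "<unassigned>")   # unassigned code points have no name
    # The default case mapping is the identity, so lower()/upper() leave it alone.
    caseless = ch.lower() == ch and ch.upper() == ch
    return f"U+{cp:04X} {name} gc={category} case-invariant={caseless}"


print("UCD version:", unicodedata.unidata_version)
for cp in (0x0041, 0x4FF3A):
    print(describe(cp))
```

When the interpreter's database (reported by `unicodedata.unidata_version`) is upgraded, the same function picks up new assignments without any change to its code.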
Re: GB18030
Kenneth Whistler wrote: Frank, > You don't need to explain to me > the concept of GB18030. The question I have is about details mapping > information. Now, now, there's no need to get snippy with me. It sounded like you were unclear from the kinds of questions you were asking. Sorry for that. I didn't mean any flame in my message. > I look at > http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml . > > It is interesting that the mapping between U+1 and U+10 is check > in only 5 weeks ago in the version 1.3 > > | 30910: > bFirst="90 30 81 30" bLast="E3 32 9A 35" bMin="81 30 81 30" bMax="FE 39 > FE 39"/> > > Is the U+1 - U+10 mapping between Unicode and GB18030 specified > in the GB18030 standard itself? can someone fax me that page ? Thanks. Unfortunately, I don't have the revised and corrected version of the standard to hand. Is it possible for you to fax me the old original version? My fax number is +1 650 937 5413. Thanks. But on p. 5, clause 7.3 of the original GB 18030-2000, it states (in Chinese): "From 0x90308130 to 0xE339FE39, altogether 1058400 code points, correspond to GB 13000's 16 supplementary planes..." Thank you very much. This is the information I need. It clearly defines the mapping between GB18030 and the Unicode supplementary planes at the character level. Thanks. With this information, we can implement the conversion between GB18030 and Unicode. If you look at the ICU specification, bFirst="90 30 81 30" and bLast="E3 32 9A 35" corresponds to: 83 "groups" (90..E2) of GB 18030: 83 x 10 x 1260 = 1045800 code points 2 "planes" (E3 30..31) of GB 18030: 2 x 1260 = 2520 code points 25 "rows" (E3 32 81..99) of GB 18030: 25 x 10 = 250 code points 6 "cells" (E3 32 9A 30..35) of GB 18030: 6 code points Total 1048576 code points And 1048576 code points = 16 x 65536 code points = 16 planes of 10646. So GB 18030 and ICU agree. Start at 0x90308130 and lay out all the rest of the Unicode supplementary code points in order. --Ken
Re: GB18030
David Starner wrote: On Wed, Sep 26, 2001 at 06:17:15PM -0700, Yung-Fong Tang wrote: > Sure Unicode defined those planes, but defining planes without defining the characters in it mean not too much to people. How can > you implement case conversion, property mapping without knowing what is inside. How do you do that for BMP characters? There's a whole lot you can do without knowing the identity of a character. You can draw the glyph from a font, which will suffice for a lot of purposes. Draw a glyph from a font to implement case conversion or property mapping? I don't know how you can do that. > In particular, DOES GB18030 define code point to > code point mapping (beyond BMP) between Unicode? Unless you can said that is YES and show me the specification how to map between > them, there are no way people can implement code set conversion between GB18030 and Unicode. That is my question: DOES it define so? I don't have access to THE specification itself and am asking for help to get one. Do you have access to the specification, and DOES it specify so? Have you looked for the specification? Or are you just going to complain on the list? I am not complaining on the list. I am asking for confirmation about what is in the specification. According to GNU libc, the algorithm for converting a Unicode character ch outside the BMP to GB18030, writing the bytes to outptr (1 .. 4), is:
    idx := ch + 16#1E248#;
    outptr (4) := (idx mod 10) + 16#30#;
    idx := idx / 10;
    outptr (3) := (idx mod 126) + 16#81#;
    idx := idx / 126;
    outptr (2) := (idx mod 10) + 16#30#;
    outptr (1) := (idx / 10) + 16#81#;
Thanks for providing me with such information, although I have no clue what "16#1E248#" means here. I assume it means 0x1e248; is that true? -- David Starner - [EMAIL PROTECTED] Pointless website: http://dvdeug.dhis.org When the aliens come, when the deathrays hum, when the bombers bomb, we'll still be freakin' friends. - "Freakin' Friends"
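For readers following along, here is a hedged Python transcription of the libc steps Starner describes; it is a sketch, not the glibc source, and the function name is hypothetical. `16#1E248#` is Ada based-literal notation for hexadecimal 0x1E248. Each step takes a remainder (Ada `mod`) to produce one byte and an integer quotient (Ada `/`) to move on to the next. The offset is not magic: 90 30 81 30 has linear index 189000 in the four-byte space, and 189000 - 0x10000 = 123464 = 0x1E248.

```python
def encode_supplementary(cp: int) -> bytes:
    """Encode U+10000..U+10FFFF as a GB 18030 four-byte sequence."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is outside the supplementary planes")
    idx = cp + 0x1E248          # linear index: U+10000 -> 189000 (90 30 81 30)
    b4 = idx % 10 + 0x30        # 4th byte: 0x30..0x39
    idx //= 10
    b3 = idx % 126 + 0x81       # 3rd byte: 0x81..0xFE
    idx //= 126
    b2 = idx % 10 + 0x30        # 2nd byte: 0x30..0x39
    b1 = idx // 10 + 0x81       # 1st byte: 0x90..0xE3 for this range
    return bytes((b1, b2, b3, b4))


for cp in (0x10000, 0x20000, 0x10FFFF):
    print(f"U+{cp:04X} -> {encode_supplementary(cp).hex().upper()}")
```

This prints 90308130, 95328236 and E3329A35 for U+10000, U+20000 and U+10FFFF, matching the bFirst/bLast values from the ICU table quoted elsewhere in the thread.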
Re: GB18030
From: "Yung-Fong Tang" <[EMAIL PROTECTED]> > Can anyone tell me where can I find a online version of the GB18030 > standard (yes, I want the STANDARD itself. Not someone's paper talk > about the standard) . Or anyone could tell me where to get a copy of the > standard. You mean the original Chinese? Hmmm, I remember that folks were frantically sending that link around last year as they struggled to get it translated into English. I am not sure where the links pointed, though. > Is the U+1 - U+10 mapping between Unicode and GB18030 specified > in the GB18030 standard itself? can someone fax me that page ? Thanks. The mapping is defined (how else could anyone have implemented it?). > looks like I beat ICU by checkin my mapping table at April 9 (to > mozilla) , 10 days before they check in their first version of GB18030 > xml mapping table :) I probably can still claim the first open source > project which support GB18030 to Unicode conversion, althought I didn't > do anything beyond BMP Considering the fact that neither Netscape 6.1 nor Mozilla 0.9.3 seems to be able to handle supplementary characters, even on a machine that has the support turned on and the font available, I can verify that there is no "beyond the BMP" support there. :-) IE 5.5 and IE 6.0 seem to do a much better job here, on the whole, but there is always hope for the future. MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/
Re: GB18030
Frank, > You don't need to explain to me > the concept of GB18030. The question I have is about details mapping > information. Now, now, there's no need to get snippy with me. It sounded like you were unclear from the kinds of questions you were asking. > I look at > http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml . > > It is interesting that the mapping between U+1 and U+10 is check > in only 5 weeks ago in the version 1.3 > > | 30910: bFirst="90 30 81 30" bLast="E3 32 9A 35" bMin="81 30 81 30" bMax="FE 39 > FE 39"/> > > Is the U+1 - U+10 mapping between Unicode and GB18030 specified > in the GB18030 standard itself? can someone fax me that page ? Thanks. Unfortunately, I don't have the revised and corrected version of the standard to hand. But on p. 5, clause 7.3 of the original GB 18030-2000, it states (in Chinese): "From 0x90308130 to 0xE339FE39, altogether 1058400 code points, correspond to GB 13000's 16 supplementary planes..." If you look at the ICU specification, bFirst="90 30 81 30" and bLast="E3 32 9A 35" correspond to:
  83 "groups" (90..E2) of GB 18030: 83 x 10 x 1260 = 1045800 code points
  2 "planes" (E3 30..31) of GB 18030: 2 x 1260 = 2520 code points
  25 "rows" (E3 32 81..99) of GB 18030: 25 x 10 = 250 code points
  6 "cells" (E3 32 9A 30..35) of GB 18030: 6 code points
  Total: 1048576 code points
And 1048576 code points = 16 x 65536 code points = 16 planes of 10646. So GB 18030 and ICU agree. Start at 0x90308130 and lay out all the rest of the Unicode supplementary code points in order. --Ken
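Ken's tally can be checked mechanically. This small Python sketch simply redoes his arithmetic in his own terms: a "row" spans the 10 values of the fourth byte, a "plane" the 126 values of the third byte, and a "group" the 10 values of the second byte. It also confirms the 1058400 figure quoted from clause 7.3, which covers the whole first-byte range 90..E3.

```python
# Sizes of the nested units in the GB 18030 four-byte space (Ken's terminology).
CELL = 1            # one value of the 4th byte (0x30..0x39)
ROW = 10 * CELL     # all 10 values of the 4th byte
PLANE = 126 * ROW   # all 126 values of the 3rd byte (0x81..0xFE) = 1260
GROUP = 10 * PLANE  # all 10 values of the 2nd byte (0x30..0x39) = 12600

# bFirst = 90 30 81 30 through bLast = E3 32 9A 35, broken down as in the post.
parts = {
    '83 "groups" (90..E2)': 83 * GROUP,
    '2 "planes" (E3 30..31)': 2 * PLANE,
    '25 "rows" (E3 32 81..99)': 25 * ROW,
    '6 "cells" (E3 32 9A 30..35)': 6 * CELL,
}
total = sum(parts.values())

for label, count in parts.items():
    print(f"{label:30} {count:>8}")
print(f"{'Total':30} {total:>8}")

assert total == 16 * 0x10000   # 16 planes of 65536 code points
assert 84 * GROUP == 1058400   # first bytes 90..E3: the clause 7.3 range
```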
Re: GB18030
Kenneth Whistler wrote: > Frank, > > Yes. Absolutely it does. It is spelled out in the standard > itself. > > GB 18030 <--> Unicode conversion is basically like a big > UTF, with an enormous table for all the GBK part of the > encoding, and a bunch of offset ranges to convert all the > other code points. I know. I have already implemented the Unicode BMP to GB18030 conversion (in both directions) in Mozilla. The table for the 4-byte GB18030 to Unicode BMP conversion takes only about 1488 bytes (see http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvcn/gb180304bytes.ut ). The table for the Unicode BMP to GB18030 4-byte part (not including the 2-byte part) takes only 1036 bytes (see http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvcn/gb180304bytes.uf ). I got the original mapping from Sun Microsystems. Unfortunately, I did not find a mapping table beyond the BMP. You don't need to explain to me the concept of GB18030. My question is about the detailed mapping information. > > > > Unless you > > can said that is YES and show me the specification how to > > map between > > them, there are no way people can implement code set > > conversion between GB18030 and Unicode. > > http://www-106.ibm.com/developerworks/library/u-china.html > > Markus Scherer's excellent documentation of GB 18030, with > code snippets and pointer to a complete ICU implementation. That paper itself does not specify the detailed mapping table. I looked at http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml . It is interesting that the mapping between U+1 and U+10 was checked in only 5 weeks ago, in version 1.3 | 30910: Can anyone tell me where I can find an online version of the GB18030 standard? (Yes, I want the STANDARD itself, not someone's paper about the standard.) Or can anyone tell me where to get a copy of the standard? Is the U+1 - U+10 mapping between Unicode and GB18030 specified in the GB18030 standard itself? Can someone fax me that page? Thanks.
Looks like I beat ICU by checking in my mapping table on April 9 (to Mozilla), 10 days before they checked in their first version of the GB18030 XML mapping table :) I can probably still claim to be the first open source project that supports GB18030-to-Unicode conversion, although I didn't do anything beyond the BMP. > > > > > > That question is not wheather they should define the > > relationship or not, but have they defined it yet. > > They have. > > --Ken
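Ken's description of the conversion as "a big UTF" with a table for the GBK part and "a bunch of offset ranges" for everything else suggests a simple data structure: a sorted list of (first linear index, first code point, length) entries searched with binary search. The Python sketch below is illustrative only, and its names are hypothetical. Its table is a deliberately partial excerpt: the supplementary entry uses the boundaries discussed in this thread, the single BMP entry (81 30 81 30 through 81 30 84 35 for U+0080..U+00A3) should be checked against the official mapping, and a real converter also needs every other range plus the two-byte GBK table.

```python
import bisect
from typing import Optional

# (first linear index, first code point, number of code points), sorted by index.
# Partial, illustrative table only.
RANGES = [
    (0, 0x0080, 36),              # 81 30 81 30 .. 81 30 84 35 <-> U+0080..U+00A3
    (189000, 0x10000, 0x100000),  # 90 30 81 30 .. E3 32 9A 35 <-> U+10000..U+10FFFF
]
STARTS = [start for start, _, _ in RANGES]


def linear(seq: bytes) -> int:
    """Linear index of a four-byte sequence (81 30 81 30 -> 0); bytes assumed valid."""
    b1, b2, b3, b4 = seq
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def four_byte_to_code_point(seq: bytes) -> Optional[int]:
    """Look up a four-byte sequence in the range table; None if not covered."""
    lin = linear(seq)
    i = bisect.bisect_right(STARTS, lin) - 1   # last range starting at or before lin
    if i >= 0:
        start, first_cp, length = RANGES[i]
        if lin < start + length:
            return first_cp + (lin - start)
    return None


for hexseq in ("81308130", "81308435", "81308436", "95328236"):
    cp = four_byte_to_code_point(bytes.fromhex(hexseq))
    print(hexseq, "->", "not in this table" if cp is None else f"U+{cp:04X}")
```

The reverse direction is the same search keyed on the first code point. Adding ranges grows only the table, not the lookup code.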
Re: GB18030
Sure, I know it could (and will) be implemented with a mapping table. But you still need to know what U+4ff3a is to define such a mapping table, right? And the mapping table will still be part of the software package, right? And the user still won't get your new version of the mapping table until they upgrade, right? So whatever software the user has today, without an upgrade (of either the code or the mapping table), still won't know how to convert U+4ff3a to lower case or upper case, right? But how can you generate such a mapping table without knowing that character? When we deliver software to the customer, it contains both code and mapping tables, and once the software is distributed, unless you redistribute a newer version or a patch to the customer, the customer won't have new code or a new mapping table. From a software engineering point of view, upgrading a mapping table is the same as upgrading code. You need to run the full QA cycle, you need to rebuild the installer, and you need to distribute it to the end user. Although it is safer to change the mapping table than to change the code, it is the same in terms of software distribution. Geoffrey Waigh wrote: > On Wed, 26 Sep 2001, Yung-Fong Tang wrote: > > > how can you implement tolower(U+4ff3a) without knowing what U+4ff3a is ? > > With a data table. One set of debugged code that handles surrogates, > composing characters, bidirectionality etc. coupled with a datafile that > gets upgraded with each release of Unicode. How many years does it take > to implement some of these concepts? It shouldn't require > honest-to-goodness we-were't-kidding see-here's-one-defined-now characters > for developers to slap themselves on the head and start developing support > for these things. How many years does it take for people to realize that giving a new mapping to their customers still requires a complete QA and distribution life cycle? And there will be a new version number attached to the software for that.
> > > Geoffrey
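Geoffrey's design (one debugged code path plus a data file that is replaced with each Unicode release) can be sketched in a few lines of Python. The "CODE;LOWER" file format and the function names are hypothetical; real implementations typically derive their tables from the simple lowercase field of UnicodeData.txt. The Deseret pair U+10400/U+10428 is used because Deseret, a cased script outside the BMP, arrived with Unicode 3.1 in 2001. Whether swapping the data file still needs a full QA and release cycle is exactly the point Yung-Fong raises; the sketch only shows that the code path itself does not change.

```python
from typing import Dict


def load_lower_table(text: str) -> Dict[int, int]:
    """Parse a hypothetical 'CODE;LOWER' data file (hex values, '#' starts a comment)."""
    table: Dict[int, int] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        src, dst = (field.strip() for field in line.split(";"))
        table[int(src, 16)] = int(dst, 16)
    return table


def to_lower(cp: int, table: Dict[int, int]) -> int:
    """Table-driven lowercase; anything not listed, assigned or not, maps to itself."""
    return table.get(cp, cp)


# Two releases of the data file; the code above is identical for both.
OLD_DATA = "0041;0061  # LATIN CAPITAL LETTER A\n"
NEW_DATA = OLD_DATA + "10400;10428  # DESERET CAPITAL LETTER LONG I\n"

old_table = load_lower_table(OLD_DATA)
new_table = load_lower_table(NEW_DATA)

for cp in (0x0041, 0x10400, 0x4FF3A):
    print(f"U+{cp:04X}: old -> U+{to_lower(cp, old_table):04X}, "
          f"new -> U+{to_lower(cp, new_table):04X}")
```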
Re: Cyrillic Q
On Thu, 27 Sep 2001, John Hudson wrote: > At 02:48 9/27/2001, Marco Cimarosti wrote: > > >A lot of time ago, someone on this list mentioned a language, written in the > >Cyrillic alphabet, which employed letter "Q", taken from the Latin alphabet. > > > >Which language is it? > > Kurdish. The common Cyrillic orthography includes four Latin letterforms > that are, as far as I know, unique to Kurdish: > > U+0051, U+0071 Capital, Small Q > U+0057, U+0077 Capital, Small W > > John Hudson > > Tiro Typeworks www.tiro.com > Vancouver, BC [EMAIL PROTECTED] > > Type is something that you can pick up and hold in your hand. > - Harry Carter > > Thursday, September 27, 2001 Besides Kurdish, the section on transliteration of non-Slavic languages using Cyrillic in the ALA-LC romanization tables (1997) shows Q used with four other languages: Aisor, Chechen (the 1862 and 1908 orthographies but not the 1938 one), Dargwa (Uslar) and Lak (1864 but not 1938). For Kurdish, Q also seems to have an alternative glyph that appears as "O" followed by a vertical bar, which is also used with Lezghian (Uslar). Regards, Jim Agenbroad ( [EMAIL PROTECTED] ) The above are purely personal opinions, not necessarily the official views of any government or any agency of any. Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.
Re: Egyptian Transliteration Characters
You need to get a Unicode-enabled browser and font ;-) Attached is a screen shot, and here is the html (sorry for the decimal, but I'm in a rush, and that's what MS gives you): "shape to ỉ and ʻ or ʿ that they cannot be used?"
Mark
Δός μοι ποῦ στῶ, καὶ κινῶ τὴν γῆν - Ἀρχιμήδης
[http://www.macchiato.com]
- Original Message -
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, September 27, 2001 3:41 AM
Subject: Re: Egyptian Transliteration Characters
> At 15:05 -0700 2001-09-26, §§Û§S§¶§Í§Â§¶§½ wrote:
> >Is this the same Unicode that encodes characters and not glyphs?
>
> Yes, it is, and I am not certain that Mark's "strong" suspicion is
> correct because I have seen a lot of data. But I'll be asking
> Egyptologists.
>
> > >1. LATIN CAPITAL LETTER EGYPTOLOGICAL YOD
> >>LATIN SMALL LETTER EGYPTOLOGICAL YOD
> >>2. LATIN CAPITAL LETTER EGYPTOLOGICAL AYIN
> >>LATIN SMALL LETTER EGYPTOLOGICAL AYIN
> >>
> >>I strongly suspect that current diacritics (for 1) and modifier letters (for
> >>2) are similar enough in shape to what is required that they can be used.
> >>Are there any other characters used by Egyptologist that are so close in
> >shape to i?? and ?? or ?? that they cannot be used?
>
> I don't know what i?? and ?? or ?? were meant to be, Mark.
> --
> Michael Everson *** Everson Typography *** http://www.evertype.com
> 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
> Telephone +353 86 807 9169 *** Fax +353 1 478 2597 (by arrangement)
Re: Egyptian Transliteration Characters
For what it's worth, I did not think of doing anything with the YODs because of their close correspondence to
U+1F30 GREEK SMALL LETTER IOTA WITH PSILI
U+1F38 GREEK CAPITAL LETTER IOTA WITH PSILI
which in practice would look all the more like the YODs because of the standard Egyptological practice of italicising transliterations. But having said that, I certainly have no problem with these characters, and this is somewhat more systematic than would be the case were one to use iotas. - Spencer
From: Michael Everson (sent by: unicode-bounce@unicode.org)
To: [EMAIL PROTECTED]
Date: 27.09.01 12:41
Subject: Re: Egyptian Transliteration Characters
At 15:05 -0700 2001-09-26, §?§Û§?§¶§Í§Â§¶§½ wrote: >Is this the same Unicode that encodes characters and not glyphs? Yes, it is, and I am not certain that Mark's "strong" suspicion is correct because I have seen a lot of data. But I'll be asking Egyptologists. > >1. LATIN CAPITAL LETTER EGYPTOLOGICAL YOD >>LATIN SMALL LETTER EGYPTOLOGICAL YOD >>2. LATIN CAPITAL LETTER EGYPTOLOGICAL AYIN >>LATIN SMALL LETTER EGYPTOLOGICAL AYIN >> >>I strongly suspect that current diacritics (for 1) and modifier letters (for >>2) are similar enough in shape to what is required that they can be used. >>Are there any other characters used by Egyptologist that are so close in >shape to i?? and ?? or ?? that they cannot be used? I don't know what i?? and ?? or ?? were meant to be, Mark. -- Michael Everson *** Everson Typography *** http://www.evertype.com 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland Telephone +353 86 807 9169 *** Fax +353 1 478 2597 (by arrangement)
Re: Missing Arabic and Syriac characters in Unicode
On 24-Sep-01 Michael Everson wrote: > Miikka-Markus, > > I'd suggest that you write this up as a PDF document (with scanned > examples) and submit it to the UTC and WG2 for consideration. OK. I'll start working on it, at least on the Arabic part of my message. I'm not a professional Semiticist, and I live in Finland, which is quite far from Syriac-writing countries, so I'm not sure whether I can get access to any material about the early forms of Edessan vowels here. I'll see what I can find in Finnish university libraries and consult professionals, too. Is there anyone on this list who could provide more information or samples about the things I wrote? Best regards! -- E-Mail: Miikka-Markus Alhonen <[EMAIL PROTECTED]> Date: 27-Sep-01 Time: 15:45:30
Re: GB18030
GB 18030 is aligned to ISO 10646, which does not define the semantic properties that Unicode does. -- Tom Emerson Basis Technology Corp. Sr. Sinostringologist http://www.basistech.com "Beware the lollipop of mediocrity: lick it once and you suck forever"
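Tom's point is that GB 18030 supplies a byte mapping for code points but not character semantics. Python's built-in `gb18030` codec can illustrate this. This is a sketch, and the sample characters (U+4E2D and the Extension B character U+20000) are our own choice, not taken from the thread. The codec round-trips each code point, one as two bytes and one as four. Properties such as the general category come from Unicode's own database via `unicodedata`, not from the encoding:

```python
import unicodedata

# Sample characters chosen for illustration: a common CJK ideograph (BMP)
# and the first CJK Unified Ideographs Extension B character (plane 2).
samples = ["\u4e2d", "\U00020000"]

for ch in samples:
    raw = ch.encode("gb18030")           # byte sequence defined by the GB 18030 mapping
    assert raw.decode("gb18030") == ch   # the mapping round-trips the code point
    # The general category is looked up in the Unicode Character Database, not in GB 18030.
    print(f"U+{ord(ch):04X}: {len(raw)} bytes, category {unicodedata.category(ch)}")
```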
Re: Egyptian Transliteration Characters
At 15:05 -0700 2001-09-26, §§Û§§¶§Í§Â§¶§½ wrote: >Is this the same Unicode that encodes characters and not glyphs? Yes, it is, and I am not certain that Mark's "strong" suspicion is correct because I have seen a lot of data. But I'll be asking Egyptologists. > >1. LATIN CAPITAL LETTER EGYPTOLOGICAL YOD >>LATIN SMALL LETTER EGYPTOLOGICAL YOD >>2. LATIN CAPITAL LETTER EGYPTOLOGICAL AYIN >>LATIN SMALL LETTER EGYPTOLOGICAL AYIN >> >>I strongly suspect that current diacritics (for 1) and modifier letters (for >>2) are similar enough in shape to what is required that they can be used. >>Are there any other characters used by Egyptologist that are so close in >shape to i?? and ?? or ?? that they cannot be used? I don't know what i?? and ?? or ?? were meant to be, Mark. -- Michael Everson *** Everson Typography *** http://www.evertype.com 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland Telephone +353 86 807 9169 *** Fax +353 1 478 2597 (by arrangement)
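The thread does not say what produced Michael's "i?? and ?? or ??", so the following is only one plausible explanation. Suppose a client received UTF-8 but displayed each non-ASCII byte as "?". Suppose also that Mark's yod was typed as i plus U+0309 COMBINING HOOK ABOVE. Then each of the three sequences contributes exactly two non-ASCII bytes, which reproduces the observed pattern. The function `show_as_ascii` is a hypothetical illustration:

```python
def show_as_ascii(text: str) -> str:
    """Mimic a client that shows every non-ASCII UTF-8 byte as '?'."""
    return "".join(chr(b) if b < 0x80 else "?" for b in text.encode("utf-8"))


# Assumed originals: i + COMBINING HOOK ABOVE, MODIFIER LETTER TURNED COMMA,
# MODIFIER LETTER LEFT HALF RING.
print(show_as_ascii("i\u0309 and \u02bb or \u02bf"))  # i?? and ?? or ??
```

The precomposed U+1EC9 is three bytes in UTF-8 and would show as "???" with no leading "i". The observed text therefore fits the decomposed spelling better, though this remains a guess.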
Re: Cyrillic Q
At 02:48 9/27/2001, Marco Cimarosti wrote: >A lot of time ago, someone on this list mentioned a language, written in the >Cyrillic alphabet, which employed letter "Q", taken from the Latin alphabet. > >Which language is it? Kurdish. The common Cyrillic orthography includes four Latin letterforms that are, as far as I know, unique to Kurdish: U+0051, U+0071 Capital, Small Q; U+0057, U+0077 Capital, Small W. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] Type is something that you can pick up and hold in your hand. - Harry Carter
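Because the Kurdish Cyrillic Q and W listed above use the ordinary Latin code points, text in that orthography mixes scripts at the code-point level. Later versions of Unicode also encode look-alike Cyrillic letters QA (U+051A/U+051B) and WE (U+051C/U+051D) as separate code points. The minimal Python sketch below tags each letter with the first word of its Unicode name. This is a rough heuristic, not the formal Script property, which the standard `unicodedata` module does not expose. The function name `script_prefixes` is hypothetical:

```python
import unicodedata


def script_prefixes(text: str) -> list[tuple[str, str]]:
    """Pair each letter with the first word of its Unicode name (e.g. LATIN, CYRILLIC)."""
    return [(ch, unicodedata.name(ch).split()[0]) for ch in text if ch.isalpha()]


# The four Latin code points from John's list.
for ch in "\u0051\u0071\u0057\u0077":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# A made-up mixed string: Latin Q, Cyrillic small a (U+0430), Latin W.
print(script_prefixes("Q\u0430W"))
```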
Re: Cyrillic Q
On Thu, 27 Sep 2001, Marco Cimarosti wrote: > A lot of time ago, someone on this list mentioned a language, written in the > Cyrillic alphabet, which employed letter "Q", taken from the Latin alphabet. > > Which language is it? IIRC, it was Kurdish. roozbeh
Cyrillic Q
A long time ago, someone on this list mentioned a language, written in the Cyrillic alphabet, which employed the letter "Q", taken from the Latin alphabet. Which language is it? Are the glyphs for that "Q" identical to the Latin ones in both cases? What is the status of this "Q" in Unicode: is it still unified with Latin Q, or has it been allocated as a Cyrillic letter? Thanks in advance. _ Marco