Re: Chinese in VB
Hi Violet, Thank you very much for your reply. Can you please explain in detail of your last paragraph? What do you mean by using TC/SC conversion module to convert between the DBCS encoding? How can I implement Unicode in this situation? What I mean is this: If you *already have* a module that converts between TC and SC, using a DBCS encoding, and you are satisfied with the results, then it would be possible to modify such a converter to use Unicode instead of the DBCS. The data in Unihan.txt can assist you here. If you *do not* have a satisfactory TC/SC conversion routine, then switching to Unicode will not get you one, because the Unicode and ISO/IEC 10646 people have very wisely decided not to get themselves tangled up in that cobweb. You need to define (or explain) what you mean by implementing Unicode. If it means converting your input data to Unicode from GB 2312 or CNS 11643 or Big Five or whatever, that's fine. But don't be misled, Unicode support does not in any way imply support for TC/SC conversion. I had written: Of course, if you already have the TC/SC conversion module and just need to convert between a DBCS encoding (e.g. GB 2312) in order to implement Unicode in the coding, the Unihan.txt file does include these mappings. -Doug Ewell Fullerton, California
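For anyone who wants to see what "the data in Unihan.txt can assist you" might look like in practice, here is a minimal sketch in Python. It assumes the tab-separated Unihan layout (code point, field name, value) and the kSimplifiedVariant / kTraditionalVariant fields; the helper names load_variants and convert are hypothetical, and the five sample lines stand in for the real file. It is a naive character-by-character lookup, not a satisfactory TC/SC converter.

```python
import re

# Sample lines in the Unihan tab-separated layout (code point, field, value).
# They stand in for the real Unihan data file, which is far larger.
SAMPLE_UNIHAN = (
    "U+570B\tkSimplifiedVariant\tU+56FD\n"
    "U+56FD\tkTraditionalVariant\tU+570B\n"
    "U+767C\tkSimplifiedVariant\tU+53D1\n"
    "U+9AEE\tkSimplifiedVariant\tU+53D1\n"
    "U+53D1\tkTraditionalVariant\tU+767C U+9AEE\n"
)


def load_variants(lines, field):
    """Build {character: [variant characters]} for one Unihan field."""
    table = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(parts) != 3 or parts[1] != field:
            continue
        source = chr(int(parts[0][2:], 16))
        targets = [chr(int(h, 16))
                   for h in re.findall(r"U\+([0-9A-F]{4,6})", parts[2])]
        if targets:
            table[source] = targets
    return table


def convert(text, table):
    """Naive conversion: take the first listed variant, else keep the character."""
    return "".join(table.get(ch, [ch])[0] for ch in text)


t2s = load_variants(SAMPLE_UNIHAN.splitlines(), "kSimplifiedVariant")
s2t = load_variants(SAMPLE_UNIHAN.splitlines(), "kTraditionalVariant")

print(convert("國發", t2s))  # traditional to simplified
print(s2t["发"])             # one simplified form, two traditional candidates
```

The second lookup shows why this is only a starting point: one simplified character can map back to more than one traditional character, and choosing between them needs context that a mapping table does not carry.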
Re: Hindi keyboard with the Microsoft Hindi font Mangal
nbarman at randomhouse dot com wrote: I am trying to use the Hindi keyboard with the Hindi font Mangal provided by Microsoft. I'm not understanding what key I should press to get the ZWJ (Zero-Width Joiner) as well as the Zero-Width-Non-Joiner (ZWNJ). I've looked at various keyboard mappings online and not found how to get these characters. I'm able to use these two in Microsoft Word if I go into the character map and find them by their Unicode number value and then assign a keyboard shortcut to them, but otherwise, I've not met with success. I visited Microsoft's keyboard site at: http://www.microsoft.com/globaldev/keyboards/keyboards.asp and couldn't find any mappings for ZWJ or ZWNJ on the Hindi keyboard either. I know that ZWJ is used in Devanagari to form explicit half-consonants. What would ZWNJ do? -Doug Ewell Fullerton, California
Re: Origin of the term i18n
At 10:25 PM 10/14/2002 -0700, you wrote: Hmmph. It was a mildly interesting question at first, and it wouldn't have been too bad to see six or eight responses, but by my count we are up to 52 messages in this thread. (53, counting this one.) The participants have either fallen into a religious debate over which group or individual first came up with the idea -- as if that could ever be proved conclusively -- or have started a fad of coining silly new I don't see it as a religious debate or even a debate at all - after all, the conclusion was for all intents and purposes on my web site already. What is more interesting to me is an exploration of the history of internationalization now that we have more or less settled when i18n was coined. The history goes through a period of hand wringing about what to even call what we now know as internationalization and localization. It wasn't always so clear cut - I made some calls to people I know who aren't in this community anymore, but who were long ago, and who might provide some insight. I have an article, written for me last week at my request by the source in my article, covering some of the history - further back than we have covered in this thread. I intend to post it ASAP on i18n.com, except I had a server crash over the weekend. Hopefully that will be fixed in the morning and I can get the article to you. There is an interesting twist in the story about why, at that time and place, internationalization itself was not sufficient as Mark suggested, and it is persuasive to me. Then I intend to raise the question with those who were around longer than me: just how far back does the idea of internationalization actually go, and when was that term first used? To me, the two holy grails of computer science from day one have been good chess playing programs and machine translation. So at least back into the mid 1950s there was a need for multilingual computing of some type. 
I am sure there were a lot of roll-your-own techniques for a good long time. When did these techniques get a name at all, and what was the name and definition? Was it something other than internationalization? If so, how did it morph into what we know now? When did localization come into it? These are important historical questions and I think wholly appropriate for this list. You won't see *this* happen every day, but I'm in almost total agreement with Mark Davis. Some of these number-based abbreviations may be useful at times, but for the most part they're like emoticons -- overuse them, or cross the line inventing new ones, and they immediately become trite and cutesy. One of the signs of a mature specialty is a set of jargon and a set of inside humor. To me, l10n and i18n are the only ones we should use every day. I respectfully disagree about g11n. The rest may be overdoing it a bit, but I see the point if they express a concept of i18n/l10n as applied to a specific region or locale beyond the word spelled out itself. That is the power of jargon and branding both. It has nil to do with Unicode. My research over the last week indicates that the origins of Unicode are very definitely of the same era and from the same community of the people who brought the idea of internationalization to a critical mass, and coined the term i18n. One has not been separable from the other since at least 1989. I can do all that, if it would help kill this thread. Personally I would love to see it all end up being moved to i18n.com. There has been a fair amount of off-list discussion going on, btw. Barry Caplan www.i18n.com
Re: Hindi keyboard with the Microsoft Hindi font Mangal
Doug Ewell wrote, I know that ZWJ is used in Devanagari to form explicit half-consonants. What would ZWNJ do? ZWNJ prevents conjuncts (or half-letter forms) from appearing which forces explicit virama in the display. See Explicit Virama on pp 214-215 of TUS 3.0. Best regards, James Kass.
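To make the difference concrete, here is a small Python sketch of the three code point sequences involved, using KA + VIRAMA + SSA as the example. The variable names are my own, and how each string is drawn depends entirely on the font and rendering engine; the script only builds and lists the sequences.

```python
import unicodedata

KA = "\u0915"      # DEVANAGARI LETTER KA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
SSA = "\u0937"     # DEVANAGARI LETTER SSA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

sequences = {
    "conjunct (default)": KA + VIRAMA + SSA,
    "half form (ZWJ)": KA + VIRAMA + ZWJ + SSA,
    "explicit virama (ZWNJ)": KA + VIRAMA + ZWNJ + SSA,
}

# List the code points of each sequence so the invisible joiners are visible.
for label, text in sequences.items():
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{label}: {code_points}")
```

Pasting the three strings into an application with a Devanagari-capable font is the quickest way to see the conjunct, the half form, and the visible virama side by side.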
Re: Origin of the term i18n
Barry Caplan bcaplan at i18n dot com wrote: My research over the last week indicates that the origins of Unicode are very definitely of the same era and from the same community of the people who brought the idea of internationalization to a critical mass, and coined the term i18n. One has not been separable from the other since at least 1989. Just to make sure everyone is clear on this: I am not arguing against the concept of internationalization, or even against occasional use of the abbreviation i18n. I use it myself sometimes, just as I use smileys sometimes. What I am arguing against is going hog-wild making up new obscure abbreviations from the same template, and clogging the Unicode list with them. Anything beyond i18n and l10n is tantamount to the man with glasses smoking a cigar and drooling type of smiley. -Doug Ewell Fullerton, California
Re: Hindi keyboard with the Microsoft Hindi font Mangal
According to this letter recently posted on another list: quote Shift-Control-1 will insert a ZWJ; and Shift-Control-2 will insert a ZWNJ. This combination is present across all Indic keyboards that ship with Windows 2K and later. kr /quote Best regards, James Kass.
Re: the carnival of lost souls
Pavla OR Francis Frazier scripsit: the carnival of lost souls What an expression! Almost makes me want to view the poster to see what inspired it... Googling suggests that this is the title of a film, but the Internet Movie Database (imdb.com) knows it not. -- John Cowan http://www.ccil.org/~cowan http://www.reutershealth.com My corporate data's a mess! / It's all semi-structured, no less. / But I'll be carefree / Using XSLT / In an XML DBMS.
Re: Manchu/Mongolian in Unicode
On Tue, 15 Oct 2002, Stefan Persson wrote: That font also includes some characters mapped to the PUA: A € sign, and several 漢 characters, many of which look like radicals. Why? Is that something that's also required by that law? It's my experience that many fonts include gunk in the Private Use Area. A quick check of some of the CJK glyphs in the PUA of SimSun-18030 shows that they are not unique, but are also mapped to codepoints in the CJK Radical Supplement and CJK-A blocks for example. I believe that it is intended to maintain a one-to-one correspondence between the GB18030 standard and Unicode, and so there should be no need for any supplementary glyphs in the PUA. The new PRC law is, as you hint, overly restrictive and prescriptive, and is, I think, a serious setback for popularisation of Unicode on the Web. The intent is that GB18030 should replace GB2312 and Big5, so that instead of the current mishmash of GB2312 (SC) and Big5 (TC) websites, in the future Traditional and Simplified Chinese sites (at least those hosted in China) will use the same GB18030 encoding. Where does this leave websites written in Unicode Chinese? Out in the cold! At present web pages written in Unicode Chinese (some of mine for example) are not being indexed by Google, and are ignored by both Yahoo China (SC) and Chinese Yahoo (TC). The situation will certainly not be improved by the replacement of GB2312 and Big5 with GB18030. Andrew
Re: the carnival of lost souls
It's Carnival of Souls, actually. http://us.imdb.com/Title?0055830 is the original version, made by a fellow whose stock-in-trade was those old movies they used to show in high school to teach hygiene and the like. He shot it in something like a week while he was supposed to be on vacation, mostly in Lawrence, Kansas, and Salt Lake City, using the abandoned spa on the Great Salt Lake, Saltair, as a major set. Now, do you think I could have gotten any *more* off-topic than that? On Tuesday, October 15, 2002, at 06:43 AM, John Cowan wrote: Pavla OR Francis Frazier scripsit: the carnival of lost souls What an expression! Almost makes me want to view the poster to see what inspired it... Googling suggests that this is the title of a film, but the Internet Movie Database (imdb.com) knows it not. -- John Cowan http://www.ccil.org/~cowan http://www.reutershealth.com My corporate data's a mess! / It's all semi-structured, no less. / But I'll be carefree / Using XSLT / In an XML DBMS. == John H. Jenkins http://www.tejat.net/
Re: Origin of the term i18n
At 12:37 AM 10/15/2002 -0700, Doug Ewell wrote: Barry Caplan bcaplan at i18n dot com wrote: What I am arguing against is going hog-wild making up new obscure abbreviations from the same template, and clogging the Unicode list with them. Anything beyond i18n and l10n is tantamount to the man with glasses smoking a cigar and drooling type of smiley. Well, truth be told, some were used in jest by correspondents who often engage in wordplay on list and off list. But I pointed out that the scheme is a meme picking up steam, and not just in software. I didn't make up a12n, even though I hadn't seen it used before. I also didn't make up c17g or m17n. I provided evidence for my claim that this is spreading by pointers to the sites. The only reason I did that is because someone (Mark I think, but I could be wrong) objected to the entire abbreviation scheme. The point is that it is not going away, and it will probably be used more and more in different types of places. It occurred to me the other day, I haven't had a chance to check this and maybe someone else will, that all 4 character domain names under dot com domain, which means there may be a lot more sites of the form xdx.com or xddx.com. Barry Caplan www.i18n.com
Re: Hindi keyboard with the Microsoft Hindi font Mangal
James Kass jameskass at att dot net wrote: ZWNJ prevents conjuncts (or half-letter forms) from appearing which forces explicit virama in the display. See Explicit Virama on pp 214-215 of TUS 3.0. I looked all over that section, but got caught up in all the little ZWJ boxes and missed that passage. Thanks. I know I had heard that before. Shift-Control-1 will insert a ZWJ; and Shift-Control-2 will insert a ZWNJ. This combination is present across all Indic keyboards that ship with Windows 2K and later. Good news. It's too bad you can't see that combination on the Javascript keyboards at Globaldev. -Doug Ewell Fullerton, California
Re: Hindi keyboard with the Microsoft Hindi font Mangal
From: Doug Ewell [EMAIL PROTECTED] It's too bad you can't see that combination on the Javascript keyboards at Globaldev. The use of either CTRL or CTRLSHIFT shift states in Microsoft-supplied keyboards is very rare. The reason it is rare is that it interferes with programs that use those shift states to perform control actions (such as Microsoft Word). It is also difficult (though not impossible) to query the actual information on these shift states due to the fact that USER will automatically map such keystrokes to control characters (if there is no assigned keystroke in the keyboard layout itself). MichKa
Re: Hindi keyboard with the Microsoft Hindi font Mangal
On Mon, 14 Oct 2002, Doug Ewell wrote: I visited Microsoft's keyboard site at: http://www.microsoft.com/globaldev/keyboards/keyboards.asp and couldn't find any mappings for ZWJ or ZWNJ on the Hindi keyboard either. Which also means that the layouts are a bad reference for keyboard layout implementors/designers. The MS Farsi keyboard layout also lacks the ZWNJ (which is very frequent in Persian) and ZWJ. MS developers were made aware of this long ago. roozbeh
Re: Manchu/Mongolian in Unicode
Andrew C. West wrote: On Tue, 15 Oct 2002, Stefan Persson wrote: That font also includes some characters mapped to the PUA: A € sign, and several 漢 characters, many of which look like radicals. Why? Is that something that's also required by that law? It's my experience that many fonts include gunk in the Private Use Area. A quick check of some of the CJK glyphs in the PUA of SimSun-18030 shows that they are not unique, but are also mapped to codepoints in the CJK Radical Supplement and CJK-A blocks for example. I may be able to shed some light on this. GB 18030 is really an extension not only of GB 2312, but also of GBK. GBK contained all ideographs from Unicode 2.0, plus of course many other characters. GB 18030 is based on Unicode 3.0. Between 2.0 and 3.0 some characters were added to Unicode that GBK had mapped to the Unicode Private Use Area. GB 18030 maps those characters to their Unicode 3.0 code points instead of PUA ones, and the PUA ones now map instead to linearly enumerated 4-byte sequences. About 80 such characters are affected, among them the Euro sign and the Ideographic Description Sequence characters. (Listed in Appendix E of the GB 18030 standard.) I assume that the font shows glyphs for those 80 or so characters in both the old GBK/Unicode PUA position and for the new GB 18030/Unicode 3.0 real code point. See http://oss.software.ibm.com/icu/docs/papers/gb18030.html I believe that it is intended to maintain a one-to-one correspondence between the GB18030 standard and Unicode, and so there should be no need for any supplementary glyphs in the PUA. The new PRC law is, as you hint, overly restrictive and prescriptive, and is, I think, a serious setback for popularisation of Unicode on the Web. The intent is that GB18030 should replace GB2312 ... and GBK ... 
and Big5, so that instead of the current mishmash of GB2312 (SC) and Big5 (TC) websites, in the future Traditional and Simplified Chinese sites (at least those hosted in China) will use the same GB18030 encoding. I am not sure about this. GB 18030 requires products to _support_ its new encoding, but I believe it does not require them to _use_ it. Most implementations have a converter to/from Unicode, and GB 18030 works quite well for that because it is defined _in terms of_ Unicode. As such, it actually boosts the spread of Unicode-based software. The drawback is of course that a GB 18030 converter requires special code on top of a large mapping table. Where does this leave websites written in Unicode Chinese? Out in the cold! At present web pages written in Unicode Chinese (some of mine for example) are not being indexed by Google, and are ignored by both Yahoo China (SC) and Chinese Yahoo (TC). The situation will certainly not be improved by the replacement of GB2312 and Big5 with GB18030. There is no reason for that. You should contact Google to get that fixed. markus -- Opinions expressed here may not reflect my company's positions unless otherwise noted.
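As a quick illustration of GB 18030 being defined in terms of Unicode, here is a short Python sketch that round-trips a few characters through Python's built-in gb18030 codec and prints the resulting bytes. The choice of sample characters and the helper name gb18030_bytes are mine, and a given codec may track a later revision of the standard than the one discussed in this thread, so treat the exact bytes as whatever your Python version produces.

```python
# Characters chosen to show the two encoded lengths: some fit in the
# GBK-compatible two-byte range, others need a four-byte sequence.
SAMPLES = {
    "GB 2312 ideograph": "\u4E2D",
    "Euro sign": "\u20AC",
    "Devanagari KA": "\u0915",
    "Supplementary-plane ideograph": "\U00020000",
}


def gb18030_bytes(ch):
    """Encode one character to GB 18030 and check that it decodes back."""
    data = ch.encode("gb18030")
    if data.decode("gb18030") != ch:
        raise ValueError(f"U+{ord(ch):04X} did not round-trip")
    return data


for label, ch in SAMPLES.items():
    data = gb18030_bytes(ch)
    print(f"{label}: U+{ord(ch):04X} -> {data.hex()} ({len(data)} bytes)")
```

Because every Unicode code point has a GB 18030 encoding, a site can keep its data in Unicode internally and convert only at the edges, which is the point made above about the standard boosting Unicode-based software.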
Re: Hindi keyboard with the Microsoft Hindi font Mangal
On Tue, 15 Oct 2002 [EMAIL PROTECTED] wrote: quote Shift-Control-1 will insert a ZWJ; and Shift-Control-2 will insert a ZWNJ. This combination is present across all Indic keyboards that ship with Windows 2K and later. kr /quote But sadly, apart from the obvious pain in that certain place to type the shortcut, the trick doesn't work in MS Word. roozbeh
RE: Hindi keyboard with the Microsoft Hindi font Mangal
Both the Persian (Farsi) keyboard and the Devanagari keyboard include both ZWJ and ZWNJ. ZWJ is 1+Shift+Ctrl, while ZWNJ is 2+Shift+Ctrl. The problem appears to be with the reference tool. I've notified the folks who maintain that site. Thanks for pointing out the problem. FWIW, the same keys are supported by three Arabic, both Divehi and both Syriac, Urdu, and many of our Indic keyboards. John Global Infrastructure -Original Message- From: Roozbeh Pournader Sent: Tuesday, October 15, 2002 9:44 AM To: Doug Ewell Cc: Unicode Mailing List Subject: Re: Hindi keyboard with the Microsoft Hindi font Mangal On Mon, 14 Oct 2002, Doug Ewell wrote: I visited Microsoft's keyboard site at: http://www.microsoft.com/globaldev/keyboards/keyboards.asp and couldn't find any mappings for ZWJ or ZWNJ on the Hindi keyboard either. Which also means that the layouts are a bad reference for keyboard layout implementors/designers. The MS Farsi keyboard layout also lacks the ZWNJ (which is very frequent in Persian) and ZWJ. MS developers were made aware of this long ago. roozbeh
RE: Hindi keyboard with the Microsoft Hindi font Mangal
On Tue, 15 Oct 2002, John McConnell wrote: Both the Persian (Farsi) keyboard and the Devanagari keyboard include both ZWJ and ZWNJ. ZWJ is 1+Shift+Ctrl, while ZWNJ is 2+Shift+Ctrl. But these don't work in many applications, including, most important of all, Microsoft Word. The problem appears with the reference tool. I've notified the folks who maintain that site. Thanks for pointing out the problem. Then also please notify the corresponding maintainer of keyboard layouts in Windows about the Microsoft Word incompatibility, and ask him/her to put them in a natural place like Shift+Space for ZWNJ, which is both the practice and the standard. For the national Iranian keyboard layout, see: http://www.farsiweb.info/table/2901-unicode.txt Thanks a lot, roozbeh
Sorting on number of strokes for Traditional Chinese
-Original Message- Date/Time: Tue Oct 15 05:13:41 EDT 2002 Contact: [EMAIL PROTECTED] Report Type: Other Question, Problem, or Feedback To whom it may concern, I wonder whether Unicode provides a way to sort Traditional Chinese characters by number of strokes. This is urgent, please advise. regards Tony (End of Report)
Re: Sorting on number of strokes for Traditional Chinese
The Unihan database has total stroke count for many (but not all) characters. It may provide an adequate first-order set of data for a pure stroke-based ordering in TC. On Tuesday, October 15, 2002, at 12:02 PM, Magda Danish (Unicode) wrote: -Original Message- Date/Time: Tue Oct 15 05:13:41 EDT 2002 Contact: [EMAIL PROTECTED] Report Type: Other Question, Problem, or Feedback To whom it may concern, I wonder whether Unicode provides a way to sort Traditional Chinese characters by number of strokes. This is urgent, please advise. regards Tony (End of Report) == John H. Jenkins http://www.tejat.net/
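A rough sketch of that first-order approach in Python, assuming the tab-separated Unihan layout and the kTotalStrokes field. The function names load_stroke_counts and stroke_sort are hypothetical, the five sample lines stand in for the real file, and the tie-break by code point is my own choice rather than anything Unihan prescribes.

```python
# Sample lines in the Unihan tab-separated layout (code point, field, value).
SAMPLE_UNIHAN = (
    "U+4E00\tkTotalStrokes\t1\n"
    "U+4EBA\tkTotalStrokes\t2\n"
    "U+5C71\tkTotalStrokes\t3\n"
    "U+6C34\tkTotalStrokes\t4\n"
    "U+570B\tkTotalStrokes\t11\n"
)


def load_stroke_counts(lines):
    """Build {character: total stroke count} from kTotalStrokes lines."""
    counts = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(parts) != 3 or parts[1] != "kTotalStrokes":
            continue
        # If the field lists several values, this sketch keeps only the first.
        counts[chr(int(parts[0][2:], 16))] = int(parts[2].split()[0])
    return counts


def stroke_sort(chars, counts):
    """Sort by stroke count, then code point; characters with no data go last."""
    return sorted(chars, key=lambda ch: (counts.get(ch, float("inf")), ord(ch)))


counts = load_stroke_counts(SAMPLE_UNIHAN.splitlines())
print(stroke_sort("國水一山人", counts))
```

Characters with the same stroke count are conventionally ordered further by radical or by first stroke type, which this sketch does not attempt; that is the sense in which the data is only first-order.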
Re: Hindi keyboard with the Microsoft Hindi font Mangal
Michael (michka) Kaplan michka at trigeminal dot com wrote: The use of either CTRL or CTRLSHIFT shift states in Microsoft-supplied keyboards is very rare. The reason it is rare is that it interferes with programs that use those shift states to perform control actions (such as Microsoft Word). I agree completely that assigning characters to Ctrl+keys, Shift+Ctrl+keys, Alt+keys, and Shift+Alt+keys is a Bad Idea, for the reason you state. It's not just about conflicts with large programs like Word, either. Every Windows program with an edit control allocates at least Ctrl+X for cut, Ctrl+C for copy, and Ctrl+V for paste. It seems inconsistent for Ctrl+1, or even Shift+Ctrl+1, to be a character and not an action. I'm pretty sure I've seen at least one of Microsoft's Globaldev (Javascript) keyboards that used either Ctrl or Alt as a shifting key. I remember thinking that such a thing was very un-Windows-like. It is also difficult (though not impossible) to query the actual information on these shift states due to the fact that USER will automatically map such keystrokes to control characters (if there is no assigned keystroke in the keyboard layout itself). -Doug Ewell Fullerton, California
Call for Papers: IUC23 (23rd Internationalization and Unicode Conference)
Folks...at long last, here is the announcement for the next Unicode conference. We look forward to seeing your proposals and hope you will join us in Prague! Best regards, Lisa

Call for Papers!
Twenty-third Internationalization and Unicode Conference (IUC23)
Unicode, Internationalization, the Web: The Global Connection
Week of March 24-28, 2003
Prague, Czech Republic

Send in your submission now!
Submissions due: November 15, 2002
Notification date: November 29, 2002
Completed papers due: January 6, 2003 (in electronic form and camera-ready paper form)
Just 4 weeks to go!

The Internationalization and Unicode Conference is the premier technical conference worldwide for both software and Web internationalization. The conference (renamed from Unicode Conference to more accurately reflect its content) features tutorials, lectures, and panel discussions that provide coverage of standards, best practices, and recent advances in the globalization of software and the Internet. Attendees benefit from the wide range of basic to advanced topics and the opportunities for dialog and idea exchange with experts in the field. The conference runs multiple sessions simultaneously to maximize the value provided.

New technologies, innovative Internet applications, and the evolving Unicode Standard bring new challenges along with their new capabilities. This technical conference will explore the opportunities created by the latest advances and how to leverage them for global users, as well as potential pitfalls to be aware of, and problem areas that need further research. There will also be demonstrations of best practices for designing applications that can accommodate any language. We invite you to submit papers that relate to Unicode or any aspect of software and Web Internationalization.
You can view the programs of previous conferences at: http://www.unicode.org/unicode/conference/about-conf.html

CONFERENCE ATTENDEES
Conference attendees are generally involved in either the development and deployment of Unicode software, or the globalization of software and the Internet. They include managers, software engineers, systems analysts, font designers, graphic designers, content developers, web designers, web administrators, technical writers, and product marketing personnel.

THEME TOPICS
International computing is the overall theme of the Conference. Presentations should be geared towards a technical audience. Topics of interest include, but are not limited to, the following (within the context of Unicode, internationalization or localizability):
- Internationalization issues with new technologies
- XML and Web protocols
- The World Wide Web (WWW)
- Security concerns e.g. Avoiding the spoofing of UTF-8 data
- Impact of new encoding standards
- Implementing Unicode: Practical and political hurdles
- Implementing new features of recent versions of Unicode
- Evaluations (case studies, usability studies)
- Natural language processing
- Algorithms (e.g. normalization, collation, bidirectional)
- Programming languages and libraries (Java, Perl, et al)
- Optimizing performance of systems and applications
- Search engines
- Library and archival concerns
- Portable devices
- Migrating legacy applications
- Cross platform issues
- Printing and imaging
- Operating systems
- Databases
- Large scale networks
- Government applications
- Testing applications
- Business models for software development (e.g. Open source)

We invite you to submit papers which define tomorrow's computing, demonstrate best practices in computing today, or articulate problems that must be solved before further advances can occur.
SESSIONS
The Conference Program will provide a wide range of sessions including:
- Keynote presentations
- Workshops/Tutorials
- Technical presentations
- Panel sessions
All sessions except the Workshops/Tutorials will be of 40 minute duration. In some cases, two consecutive 40 minute program slots may be devoted to a single session. The Workshops/Tutorials will each last approximately three hours. They should be designed to stimulate discussion and participation, using slides and demonstrations.

PUBLICATIONS
If your paper is accepted, your details will be included in the Conference brochure and Web pages and the paper itself will appear on a Conference CD, with an optional printed book of Conference Proceedings.

CONFERENCE LANGUAGE
The Conference language is English. All submissions, papers and presentations should be provided in English.

SUBMISSIONS
Submissions MUST contain:
1. An abstract of 150-250 words, consisting of statement of purpose, paper description, and your conclusions or final summary. Also, if this is a paper for an intermediate or advanced audience, please specify what assumptions you are making about the attendees' prior knowledge.
2. A