Re: Who to make OTF
At 19:48 3/7/2002, K S Rohilla wrote:

Hi, all, I am font designer Pl. suggest me Who to make Open Type Fonts

Lots of people are making or trying to make OT fonts. The user community for Microsoft's VOLT tool now numbers almost 2,500 people, many of them Indic and Arabic developers. I'm not sure what percentage of these are professional type designers and font developers. There seem to be a lot of enthusiastic but not very experienced amateurs keen to increase the number of fonts supporting their native scripts, but I'm afraid most of the products I have seen are not very good.

What are you looking for? Someone to help you make an OT font? Someone to make an OT font from your existing designs?

If you are not already a member of the VOLT community, you should join and ask your question there. There are developers, both professional and amateur, working on Devanagari and Bengali fonts, and probably on other Indian scripts. See http://www.microsoft.com/typography/developers/volt/default.htm for more information.

John Hudson

Tiro Typeworks www.tiro.com
Vancouver, BC [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably.

Walter Benjamin
RE: Devanagari variations
Peter Constable wrote:

On 03/07/2002 02:16:10 PM James E. Agenbroad wrote:

A similar but not the same situation is found in the fourth example in figure 9-3 of Unicode 3.0 (page 214) where an independent vowel has the reph (an abridged form of the consonant 'ra') above it. Unicode wants this encoded as consonant + halant + independent vowel. I believe it is better considered as a consonant + vowel sign combination which happens to have an odd display and at least one Sanskrit textbook agrees.

I may be wrong, but I believe that example has ra, halant, ra, independent i . The first ra is the one that transforms into the reph.

You are wrong, in fact, sorry. Although figure 9-3 does not show code point values, both the glyphs and the abbreviated letter names make it clear that the sequence is:

U+0930 (DEVANAGARI LETTER RA)
U+094D (DEVANAGARI SIGN VIRAMA)
U+090B (DEVANAGARI LETTER VOCALIC R)

James' idea is that the same graphemes could have been better represented with the sequence:

U+0930 (DEVANAGARI LETTER RA)
U+0943 (DEVANAGARI VOWEL SIGN VOCALIC R)

It is an interesting idea, because ra never occurs with matra r., so there is no danger of confusion. But it is probably too late to change it: doing so would break compatibility with ISCII and existing Unicode fonts.

_ Marco
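For readers who want to inspect the two spellings discussed above, here is a minimal Python sketch. It only lists the code points and checks that normalization keeps the two sequences distinct; the variable names `standard` and `proposed` and the helper `describe` are illustrative labels, not terminology from the Unicode Standard, and how either sequence renders depends entirely on the font and shaping engine.

```python
import unicodedata

# The sequence shown in figure 9-3, as read off in the message above.
standard = "\u0930\u094D\u090B"
# The alternative James suggested: RA followed by the dependent vowel sign.
proposed = "\u0930\u0943"


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, seq in (("standard", standard), ("proposed", proposed)):
    print(label)
    for line in describe(seq):
        print("  " + line)

# The two spellings are different strings, and NFC does not unify them,
# so searching and comparison treat them as unrelated text.
same_after_nfc = (
    unicodedata.normalize("NFC", standard)
    == unicodedata.normalize("NFC", proposed)
)
print("equal after NFC:", same_after_nfc)
```

Because the strings stay distinct, a renderer that draws both the same way would still leave two different encodings of one visual form in stored text.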
RE: Keyboard Layouts for Office XP in WIndows 98
Lateef Sagar wrote: How can I create such a keyboard layout that can be used with Office XP (in Windows 98). http://www.tavultesoft.com/keyman/ It also works on Win 98. _ Marco
Re: Devanagari variations
At 15:36 -0600 07/03/2002, [EMAIL PROTECTED] wrote: I may be wrong, but I believe that example has ra, halant, ra, independent i . The first ra is the one that transforms into the reph. You're wrong. RI in this case is a way of writing the vocalic r. Compare Kr.s.n.a and Krishna. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Devanagari variations
At 15:16 -0500 07/03/2002, James E. Agenbroad wrote:

On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote:

On 03/06/2002 08:25:18 AM Michael Everson wrote:

[snip]

In Cham, independent vowels can take dependent vowel signs. In Devanagari, I guess that doesn't occur, but the Brahmic model shouldn't be understood to preclude this behaviour.

[snip]

- Peter

A similar but not the same situation is found in the fourth example in figure 9-3 of Unicode 3.0 (page 214) where an independent vowel has the reph (an abridged form of the consonant 'ra') above it. Unicode wants this encoded as consonant + halant + independent vowel. I believe it is better considered as a consonant + vowel sign combination which happens to have an odd display and at least one Sanskrit textbook agrees.

Is that the sample you showed me when I was a-photocopying at the Library of Congress in August, James? You're saying that RA + virama + INDEPENDENT VOCALIC R and RA + VOWEL SIGN VOCALIC R should both produce the same glyph?

--
Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Devanagari variations
Using Apple's WorldText, I can confirm that short I did not reorder correctly when preceded by 0294. But the 0294 glyph was in another font. I wonder could we see some samples of this in actual Limbu text? -- Michael Everson *** Everson Typography *** http://www.evertype.com
RE: Devanagari variations
At 11:26 +0100 2002-03-08, Marco Cimarosti wrote:

You are wrong, in fact, sorry. Although figure 9-3 does not show code point values, both the glyphs and the abbreviated letter names make it clear that the sequence is:

U+0930 (DEVANAGARI LETTER RA)
U+094D (DEVANAGARI SIGN VIRAMA)
U+090B (DEVANAGARI LETTER VOCALIC R)

James' idea is that the same graphemes could have been better represented with sequence:

U+0930 (DEVANAGARI LETTER RA)
U+0943 (DEVANAGARI VOWEL SIGN VOCALIC R)

It is an interesting idea, because ra never occurs with matra r., so there is no danger of confusion. But it is probably too late for changing it: it would break compatibility with ISCII and existing Unicode fonts.

Well, I just typed both of these in Apple's WorldText version 1.1. The first one displayed as RA VIRAMA (visible) VOCALIC R and the second displayed as REPHA VOCALIC R. So in at least one implementation the latter is supported.

--
Michael Everson *** Everson Typography *** http://www.evertype.com
MS Command Prompt
From: Doug Ewell [EMAIL PROTECTED]

Indie was doing the right thing by typing Alt+0248 to get the Latin-1 character, instead of Alt+248 to get the MS-DOS character. That isn't the problem.

In Windows 95, 98, and NT 4, everything that happens in the command prompt goes through the MS-DOS code page -- 437, 850 or whatever. Since Indie's code page is set to 437, and U+00F8 LATIN SMALL LETTER O WITH STROKE is not in code page 437, the internal conversion tables in NT 4 converted 'ø' to 'o', a reasonable if imperfect fallback. Note that Alt+0243 works just fine, because U+00F3 is in code page 437. Also note that if Indie had been using 850 instead of 437, there would have been no problem, since 850 does include U+00F8.

Windows 2000 is different. You can set your command prompt code page to 437 and type Alt+0248, and you will still get the 'ø' you want. The Alt+0xxx logic has been decoupled from the active code page issue, which is nice.

Martin is right, you can change the code page; but I don't know if that will help Indie. What's kind of fun is that in Windows 2000, you can change your code page to 65001 and do all your command-prompt work in UTF-8.

In Windows XP, if I type the Alt+0248 in the command prompt with the font set to raster fonts, I get an o. If I type it in a command prompt with the font set to Lucida Console, I get the ø. However, it only works if I change the font before I type the character. So I am guessing that in XP, whatever code page you have selected, if the default font for the command line doesn't have the character you want, you're stuck with the closest approximation in that font.

Don't know if this will help any with NT.

Patrick Rourke [EMAIL PROTECTED]
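The code page claims in this message can be checked without a Windows machine. The sketch below uses Python's built-in `cp437` and `cp850` codecs as stand-ins for the console code pages; the helper name `in_code_page` is made up for illustration, and the script says nothing about how any particular Windows version maps Alt codes or picks fallback glyphs.

```python
def in_code_page(ch, code_page):
    """Return True if the character can be encoded in the named code page."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# U+00F8 (o with stroke) and U+00F3 (o with acute), tested against both pages.
for ch in ("\u00F8", "\u00F3"):
    for cp in ("cp437", "cp850"):
        print(f"U+{ord(ch):04X} in {cp}: {in_code_page(ch, cp)}")

# Without the leading zero, Alt+248 selects byte 0xF8 of the OEM code page,
# which in code page 437 is a different character altogether.
oem_char = bytes([248]).decode("cp437")
print(f"byte 0xF8 in cp437 is U+{ord(oem_char):04X}")
```

A check like this is a quick way to tell a code page limitation apart from a font limitation when a character comes out wrong at the prompt.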
Re: Keyboard Layouts for Office XP in WIndows 98
At 07:37 +0100 2002.03.08, Lateef Sagar wrote:

MS Office XP installs many keyboard layouts (like Arabic etc) in Windows 98. For Windows NT/2000/XP there is a shareware software Keyboard Layout Manager 32 bit, but I haven't found out any software yet that allows making a non-ASCII keyboard layout for Windows 98. How can I create such a keyboard layout that can be used with Office XP (in Windows 98).

Do you mean the Keyboard Layout Manager at http://www.klm.freeservers.com/index.html ?

<quote>
This program allows you to create and modify Microsoft keyboard layout files. It works with Windows 95, Windows 95-OSR/2, Windows 98 and Windows ME operating systems. Also, it works with Windows NT 4.0, Windows XP, and Windows 2000 operating systems.
</quote>

How can I create such a keyboard layout that can be used with Office XP (in Windows 98).

Office XP in Windows 98 ??

--
RE: Concerning mathematics
Stefan Persson [mailto:[EMAIL PROTECTED]] asks how the italic å in the formula m<sub>fågel</sub> = 1 kg would be encoded.

Mathematics has a set of standard letters for mathematical symbols. They can include diacritics, which can be expressed using the appropriate combining marks. In your formula m<sub>fågel</sub> = 1 kg the m is a mathematical symbol, while the fågel is a natural language subscript. Italic shouldn't be used for such a subscript, since italic is used for symbols in mathematical notation (and consequently mathematical journals will change it to upright fågel in this case). Otherwise one might construe fågel to be a subscript consisting of the product of the five variables.

Such natural language text is conveniently done with characters from the BMP, although you need some kind of markup to turn it into a subscript. If you insist on using italic for this kind of text and for characters like the italic ø that aren't used in standard mathematical notation, you can fall back to markup. Since such usage is extremely rare and not recommended for mathematical text, it wasn't perceived as important to represent unambiguously in plain text.

Murray
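As a rough illustration of "a mathematical letter plus a combining mark", the Python sketch below builds an italic a with a ring above from U+1D44E and U+030A and shows how normalization treats it. The constant names are illustrative, and the example only demonstrates the character-level behaviour; it does not suggest that this is how the subscript in the formula should be written.

```python
import unicodedata

MATH_ITALIC_SMALL_A = "\U0001D44E"   # MATHEMATICAL ITALIC SMALL A
COMBINING_RING_ABOVE = "\u030A"      # COMBINING RING ABOVE

symbol = MATH_ITALIC_SMALL_A + COMBINING_RING_ABOVE
print([unicodedata.name(ch) for ch in symbol])

# There is no precomposed "mathematical italic a with ring", so NFC keeps
# the base letter and the combining mark as two separate code points.
nfc = unicodedata.normalize("NFC", symbol)
print("code points after NFC:", len(nfc))

# NFKC discards the mathematical styling, after which the plain letter and
# the ring compose into the ordinary U+00E5.
nfkc = unicodedata.normalize("NFKC", symbol)
print("NFKC gives U+%04X" % ord(nfkc))
```

The NFKC result is one reason the styled letters are meant for mathematical symbols only: compatibility normalization erases the distinction that the italic form was carrying.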
Re: MS Command Prompt
Patrick Rourke [EMAIL PROTECTED] wrote: In Windows XP, if I type the Alt+0248 in the command prompt with the font set to raster fonts, I get an o. If I type it in a command prompt with the font set to Lucida Console, I get the ø. However, it only works if I change the font before I type the character. So I am guessing that in XP, whatever code page you have selected, if the default font for the command line doesn't have the character you want, you're stuck with the closest approximation in that font. I hadn't thought of that. In Windows 2000 I am using Lucida Console, while my colleague's NT 4 computer on which I conducted the test was using the Terminal bitmap font. I didn't know the NT 4 system was doing substitutions based on what was available in the font, but it seems that's what's happening. Thanks for the info. -Doug Ewell Fullerton, California
RE: Keyboard Layouts for Office XP in WIndows 98
On 03/08/2002 04:39:49 AM Marco Cimarosti wrote: Lateef Sagar wrote: How can I create such a keyboard layout that can be used with Office XP (in Windows 98). http://www.tavultesoft.com/keyman/ It also works on Win 98. There are some issues to keep in mind in relation to Win9x/Me. I won't explain all the gory details (I probably have sometime earlier on this list), but in a nutshell, for most of the life of Win9x/Me, the characters that could be entered from a keyboard were limited to only those in some Windows codepage, and a given layout couldn't mix characters from different codepages. Late in 2000, MS added a new mechanism that involved using the system message WM_UNICHAR rather than WM_CHAR. This invention was quite slick since it could be used without breaking existing software and without requiring any patches to Windows itself. With old apps, it would just get ignored (not perfect, but not bad). All it would take to use it is (a) an input method that will generate it, and (b) apps that will recognise it. Tavultesoft Keyman will attempt to communicate with an app using WM_UNICHAR. If the app doesn't recognise that message, then Keyman will gracefully resort to plan B -- if the developer of the particular input method included rules for ANSI mode as well as Unicode, then Keyman will fall back to ANSI mode; otherwise, it deactivates that input method (the IM can be reactivated when focus is switched to another app). There are not many apps at this point that support WM_UNICHAR, but Word 2002 is one of them. The other apps in the Office suite do not, however, with the minor exception that the RichEdit control does support it, so it is supported wherever those other apps use the RichEdit control (e.g. the text boxes in search/replace dialogs). (I've been told that Keyman can be used to give full Unicode input support on Win 98 with Internet Messenger; I'm guessing it must be using RichEdit.) 
If you are using Word 2000, you can obtain an add-in (WordLink) from Tavultesoft that will add support for WM_UNICHAR.

One last point: Keyman 5 did not provide support for supplementary plane characters. This will be added in Keyman 6, which will be available this spring.

So, if you are on Win9x/Me and want to use Unicode characters that are *not* supported by a Windows codepage, it can be done with certain limitations. Here's a summary:

Unicode characters that can be input using

                         Keyman 5                       Keyman 6 (when released)
Word 2000                limited by Windows codepages   limited by Windows codepages
Word 2000 w/ WordLink    all of BMP                     all (planes 0 - 16)
other Office 2000 apps   limited by Windows codepages   limited by Windows codepages
Word 2002                all of BMP                     all (planes 0 - 16)
other Office XP apps     limited by Windows codepages   limited by Windows codepages

I'm hoping that when Office dotNet appears that support for WM_UNICHAR will have been added to other apps in the Office suite. (Chris Pratley, can you comment on that?)

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
Re: Devanagari variations
On 03/08/2002 06:54:54 AM Michael Everson wrote: Using Apple's WorldText, I can confirm that short I did not reorder correctly when preceded by 0294. But the 0294 glyph was in another font. I wonder could we see some samples of this in actual Limbu text? It's on its way. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: Devanagari variations
On 03/08/2002 05:09:46 AM Michael Everson wrote: At 15:36 -0600 07/03/2002, [EMAIL PROTECTED] wrote: I may be wrong, but I believe that example has ra, halant, ra, independent i . The first ra is the one that transforms into the reph. You're wrong. RI in this case is a way of writing the vocalic r. Compare Kr.s.n.a and Krishna. I guess that's what I get for comment on things beyond my ken. Mea culpa. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: Concerning mathematics
On 03/08/2002 04:09:14 AM Stefan Persson wrote:

The Standard contains several special mathematics characters, such as, for example, A (italic A). But I thought of some letters that might not be fully supported: Let's say that you find a formula like this in some Swedish book: m<sub>fågel</sub> = 1 kg

Surely sub-scripted qualifiers of this sort -- which, being from a spoken language rather than math, could contain any string using any script -- are something to be handled by MathML and not character encoding.

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
Re: Devanagari variations
Jim Agenbroad responded (off list):

Not quite. On page 214 of 3.0 there is one RA vowel, a halant and a RI vowel: RA(d) + RI(n) --> RI(n) + RA(sup) (parens in lieu of subscript)

I didn't realise that RI meant the vocalic R. I mistook it to mean something else. I find it a weakness of that section that such notations are not defined and prominently displayed in an easy-to-find location.

Thanks for setting me straight. I should have known you knew what you were talking about.

Peter
Re: Devanagari variations
[EMAIL PROTECTED] scripsit:

I didn't realise that RI meant the vocalic R.

It reflects the modern Hindi pronunciation of Skt /r=/.

--
John Cowan [EMAIL PROTECTED] http://www.reutershealth.com http://www.ccil.org/~cowan
I amar prestar aen, han mathon ne nen,
han mathon ne chae, a han noston ne 'wilith. --Galadriel, _LOTR:FOTR_
RE: Devanagari enthousiasm!
It appears that hindi.exe installs Uniscribe - which, AFAIK, is not permitted by Microsoft - so much for honouring license agreements! That's another reason why they'd package it as an EXE. - rick cameron -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Wednesday, 6 March 2002 12:14 To: Yaap Raaf Cc: [EMAIL PROTECTED] Subject: Re: Devanagari enthousiasm! On 06-03-2002 04:29:20 PM Yaap Raaf wrote: At 14:02 +0100 2002.03.06, [EMAIL PROTECTED] wrote: I am on a Mac and can't open it, Well, this is going to be a problem for non-Windows clients, I admit. it's a 244K .exe Why an .exe? I don't know if this is what the BBC was trying to do, but using an executable installer package is at least one way to make sure people see the license agreement... Bob
Support for Japanese characters
Need help please.

Problem:
1. Current library is built for unix and supports ASCII characters only.
2. This library must now accept wide characters from a Japanese client.

Facts:
1. The library does not really evaluate the Japanese characters to make logical decisions. We believe we can base64 encode the character array to avoid any "bad things happening in the code" (such as hitting a null value or other values that could potentially cause problems).
2. Cannot rewrite library in time allowed and don't really need to based on Fact item #1. Plus, pressure to get product to market is greater than internationalizing the library.

What I need help with:
1. How do I set up an ASCII based unix machine, test application and test environment to send Japanese characters to the library in question?
2. Do I need to create hex input or binary input to represent Japanese characters? Since I'm using a standard keyboard, how do we get Japanese characters into the application?
3. What am I not considering here? What gotchas will I come across by not making my library i18nized?

Unfortunately, I've never done any i18n or l10n work before so I'm really having trouble figuring out where and how to get started. Any advice is appreciated.

Thanks.

Eric Ray
Re: Devanagari variations
At 10:29 -0600 2002-03-08, [EMAIL PROTECTED] wrote:

Jim Agenbroad responded (off list):

Not quite. On page 214 of 3.0 there is one RA vowel, a halant and a RI vowel: RA(d) + RI(n) --> RI(n) + RA(sup) (parens in lieu of subscript)

I didn't realise that RI meant the vocalic R. I mistook it to mean something else. I find it a weakness of that section that such notations are not defined and prominently displayed in an easy-to-find location.

Actually, I would like to see that written R with dot below. We should use decent transliteration in those notations; why not?

--
Michael Everson *** Everson Typography *** http://www.evertype.com
RE: Keyboard Layouts for Office XP in WIndows 98
I should point out that Word2002 does not actually support WM_UNICHAR (actually no OfficeXP app does). Only RichEdit 4.0 (riched20.dll) does. RichEdit is used in many places in the system and in Office and various applets such as WordPad, and likely Messenger, so that can be handy but it is not universal. However, the recommended method for communicating in Unicode to apps including Office is to a) use an NT-based OS such as NT4/Win2000/WindowsXP. Everything just works. b) or use the Text Services Framework, which is shipped in WindowsXP and also in OfficeXp. This is what, I believe Keyman actually uses now to get Unicode in Word2002 on Win98/Me - or the specific Word (object model based) method Peter mentions below. Keep in mind that most OfficeXP installations are now running on either Win2k or WinXP, and this trend is accelerating. The large majority of customers upgrade their OS or their entire machine at the time they acquire major new software. By the time we ship the next release of Office, the % of people who a) want to get a new version of Office and who b) insist on remaining with their old Win9x/ME OS will be very small indeed (not zero, I understand). Generally speaking, the Office team tries to make sure you can do everything on older OSes that we offer on the newer ones, but there is a limit to how much back-porting and investment in workarounds for older OS limitations we will make . We'd rather invest in more powerful features for the newer OSes that most people are using. So it is unlikely we will be improving our multilingual support on Win9x/Me - instead we'll extend it even further on the newer OSes. 
Chris Sent with OfficeXP on WindowsXP -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: March 8, 2002 08:29 To: [EMAIL PROTECTED] Subject: RE: Keyboard Layouts for Office XP in WIndows 98 On 03/08/2002 04:39:49 AM Marco Cimarosti wrote: Lateef Sagar wrote: How can I create such a keyboard layout that can be used with Office XP (in Windows 98). http://www.tavultesoft.com/keyman/ It also works on Win 98. There are some issues to keep in mind in relation to Win9x/Me. I won't explain all the gory details (I probably have sometime earlier on this list), but in a nutshell, for most of the life of Win9x/Me, the characters that could be entered from a keyboard were limited to only those in some Windows codepage, and a given layout couldn't mix characters from different codepages. Late in 2000, MS added a new mechanism that involved using the system message WM_UNICHAR rather than WM_CHAR. This invention was quite slick since it could be used without breaking existing software and without requiring any patches to Windows itself. With old apps, it would just get ignored (not perfect, but not bad). All it would take to use it is (a) an input method that will generate it, and (b) apps that will recognise it. Tavultesoft Keyman will attempt to communicate with an app using WM_UNICHAR. If the app doesn't recognise that message, then Keyman will gracefully resort to plan B -- if the developer of the particular input method included rules for ANSI mode as well as Unicode, then Keyman will fall back to ANSI mode; otherwise, it deactivates that input method (the IM can be reactivated when focus is switched to another app). There are not many apps at this point that support WM_UNICHAR, but Word 2002 is one of them. The other apps in the Office suite do not, however, with the minor exception that the RichEdit control does support it, so it is supported wherever those other apps use the RichEdit control (e.g. the text boxes in search/replace dialogs). 
(I've been told that Keyman can be used to give full Unicode input support on Win 98 with Internet Messenger; I'm guessing it must be using RichEdit.)

If you are using Word 2000, you can obtain an add-in (WordLink) from Tavultesoft that will add support for WM_UNICHAR.

One last point: Keyman 5 did not provide support for supplementary plane characters. This will be added in Keyman 6, which will be available this spring.

So, if you are on Win9x/Me and want to use Unicode characters that are *not* supported by a Windows codepage, it can be done with certain limitations. Here's a summary:

Unicode characters that can be input using

                         Keyman 5                       Keyman 6 (when released)
Word 2000                limited by Windows codepages   limited by Windows codepages
Word 2000 w/ WordLink    all of BMP                     all (planes 0 - 16)
other Office 2000 apps   limited by Windows codepages   limited by Windows codepages
Word 2002                all of BMP                     all (planes 0 - 16)
other Office XP apps     limited by                     limited by
RE: Keyboard Layouts for Office XP in WIndows 98
On 03/08/2002 01:11:37 PM Chris Pratley wrote:

I should point out that Word2002 does not actually support WM_UNICHAR (actually no OfficeXP app does).

My mistake (how could I forget -- I was disappointed when it didn't quite make it). Word 2002 still needs WordLink, but Publisher 2002 does support WM_UNICHAR.

However, the recommended method for communicating in Unicode to apps including Office is to a) use an NT-based OS such as NT4/Win2000/WindowsXP. Everything just works.

I quite agree. There are many users who will be on Win98 for a while though (at least, many that I need to support).

b) or use the Text Services Framework, which is shipped in WindowsXP and also in OfficeXp. This is what, I believe Keyman actually uses now to get Unicode in Word2002 on Win98/Me - or the specific Word (object model based) method Peter mentions below.

Not yet. It will in Keyman 6.

I'll revise my summary in relation to MS apps and Win9x/Me:

Unicode characters that can be input using

                         Keyman 5                       Keyman 6 (when released)
Word 2000                limited by Windows codepages   limited by Windows codepages
Word 2000 w/ WordLink    all of BMP                     all (planes 0 - 16)
other Office 2000 apps   limited by Windows codepages   limited by Windows codepages
Word 2002                limited by Windows codepages   all (planes 0 - 16)
Word 2002 w/ WordLink    all of BMP                     all (planes 0 - 16)
Publisher 2002           all of BMP                     all (planes 0 - 16)
other Office XP apps     limited by Windows codepages   limited by Windows codepages

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
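The summary distinguishes "all of BMP" from "all (planes 0 - 16)". For anyone unsure which of their characters fall outside the BMP, here is a small Python sketch; the function names `plane` and `utf16_units` are invented for this example and have nothing to do with Keyman or the Windows messages discussed above.

```python
def plane(code_point):
    """Return the Unicode plane number (0 to 16) of a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return code_point >> 16


def utf16_units(ch):
    """Return how many 16-bit code units the character needs in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


samples = {
    "DEVANAGARI LETTER RA": 0x0930,
    "MATHEMATICAL ITALIC SMALL A": 0x1D44E,
    "last code point": 0x10FFFF,
}

for name, cp in samples.items():
    kind = "BMP" if plane(cp) == 0 else "supplementary"
    print(f"U+{cp:04X} {name}: plane {plane(cp)} ({kind})")

# A supplementary character occupies two UTF-16 code units (a surrogate
# pair), which is why input paths built around one 16-bit value per
# keystroke cannot deliver it as a single unit.
print(utf16_units(chr(0x0930)), utf16_units(chr(0x1D44E)))
```

Any character reported as plane 0 is covered by the "all of BMP" rows; anything in planes 1 to 16 needs the "all (planes 0 - 16)" support.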
Re: Devanagari variations
On Fri, 8 Mar 2002, Michael Everson wrote:

At 15:16 -0500 07/03/2002, James E. Agenbroad wrote:

On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote:

On 03/06/2002 08:25:18 AM Michael Everson wrote:

[snip]

In Cham, independent vowels can take dependent vowel signs. In Devanagari, I guess that doesn't occur, but the Brahmic model shouldn't be understood to preclude this behaviour.

[snip]

- Peter

A similar but not the same situation is found in the fourth example in figure 9-3 of Unicode 3.0 (page 214) where an independent vowel has the reph (an abridged form of the consonant 'ra') above it. Unicode wants this encoded as consonant + halant + independent vowel. I believe it is better considered as a consonant + vowel sign combination which happens to have an odd display and at least one Sanskrit textbook agrees.

Is that the sample you showed me when I was a-photocopying at the Library of Congress in August, James? You're saying that RA + virama + INDEPENDENT VOCALIC R and RA + VOWEL SIGN VOCALIC R should both produce the same glyph?

--
Michael Everson *** Everson Typography *** http://www.evertype.com

Friday, March 8, 2002

Michael,

Yes. [Call me Jim]

Regards,
Jim Agenbroad ( [EMAIL PROTECTED] )

It is not true that people stop pursuing their dreams because they grow old, they grow old because they stop pursuing their dreams. Adapted from a letter by Gabriel Garcia Marquez.

The above are purely personal opinions, not necessarily the official views of any government or any agency of any.

Addresses:
Office: Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Sys.Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.
Home: Phone: 301 946-7326; US mail: Box 291, Garrett Park, MD 20896.
Re: Support for Japanese characters
At 12:21 PM 3/8/2002 -0600, Eric Ray wrote:

Need help please. Problem: 1. Current library built for unix and supports ASCII characters only. 2. This library must now accept wide characters from Japanese client.

You need to double-byte enable the library except for the most trivial uses. Doing so is not trivial.

Facts: 1. The library does not really evaluate the Japanese characters to make logical decisions.

If the data just passes through, that might be relatively trivial.

We believe base64 encode the character array to avoid any bad things happening in the code (such as hitting a null value or other values that could potential cause problems).

Is the (non-Japanese) data already base64 encoded? If so, why? Why create trouble handling that just to avoid checking for null values? Anyway, if you really aren't going to process the Japanese characters in this library except to pass them thru, then you need to take the Japanese text, base64 encode it, and then pass it to the library the usual way. Then retrieve it the usual way, base64 decode it, and voila!

Of course this may just move your questions to other parts of your program, but you haven't asked about those places. Without knowing what the application is or what the configuration is, except that it is unix, it is hard to say more.

2. Cannot rewrite library in time allowed and don't really need to based on Fact item #1. Plus, pressure to get product to market is greater than internationalizing the library.

This is probably a guaranteed method to fail in Japan. Japanese users, and your Japanese partners if you have them, have had many years of experience with bad software from the US that claims to work. They will know how to break it quickly. Then you will learn a hard lesson about doing business with Japanese while not taking heed of the well known requirement for quality.

What I need help with: 1. How do I set up an ASCII based unix machine, test application and test environment to send Japanese characters to the library in question.

I see from your web site that the application is likely some sort of encryption device, possibly for email. Having run the Japanese software group at an email company in the past, I can tell you Japanese email is fraught with its own perils under any circumstances. Without knowing what the actual channel is that you want to pass the text thru, it is hard to say how you will want to test it.

You also have not described the time schedule and why you consider it tight. Is it safe to assume that your plan to counteract any lack of experience and time is to spend money to hire someone who has both?

2. Do I need to create hex input or binary input to represent Japanese characters. Since I'm using a standard keyboard how do we get Japanese characters into the application?

Use the Japanese Input Method Editor supplied with or for the operating system. But that does not guarantee that the data will actually get to the application properly if the application has not been coded to handle it. This is part of internationalizing your code, and now you see why cutting corners during the initial development is coming back to haunt you.

3. What am I not considering here? What gotchas will I come across by not making my library i18nized?

The gotchas are going to fall into the categories of "won't work" or "data passes thru OK, but the rest of the application doesn't know how to handle it". OTTOMH, I would watch out for endianness when you base64 encode Japanese multibyte text too. Probably OK, but worth taking a close look at.

Unfortunately, I've never done any i18n or l10n work before so I'm really having trouble figuring out where and how to get started. Any advice is appreciated.

There is no magic bullet here in general. If Zixit values the opportunity in Japan, I would suggest you be open to the offers you are sure to get from experienced folks to assist you. If you don't get any, contact me off-list and I will put you in touch with some.

Barry Caplan
Publisher, www.i18n.com
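The pass-through approach described in this reply (encode the Japanese text to bytes, base64 it, hand the ASCII result to the library, then reverse the steps) can be sketched in Python as follows. Everything named here is hypothetical: `ascii_only_library` is a made-up stand-in for the real library, and UTF-8 is only an assumed choice of byte encoding, since the thread does not say which Japanese encoding the client uses.

```python
import base64


def wrap(text, encoding="utf-8"):
    """Encode text to bytes, then to a base64 string of plain ASCII."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")


def unwrap(token, encoding="utf-8"):
    """Reverse wrap(): base64-decode, then decode the bytes to text."""
    return base64.b64decode(token).decode(encoding)


def ascii_only_library(payload):
    """Hypothetical stand-in for the library: rejects non-ASCII and NUL."""
    payload.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    if "\x00" in payload:
        raise ValueError("embedded NUL")
    return payload


# Test data can be written with escapes, so no Japanese keyboard or input
# method is needed on the test machine.
message = "\u65E5\u672C\u8A9E\u306E\u30C6\u30B9\u30C8"

token = wrap(message)
round_trip = unwrap(ascii_only_library(token))
print(token, round_trip == message)

# base64 works on bytes, so both ends must agree on the byte encoding.
# With UTF-16 the byte order changes the result, which is the endianness
# concern raised above.
print(wrap("\u65E5", "utf-16-le") != wrap("\u65E5", "utf-16-be"))
```

Fixing one explicit encoding on both sides, and testing with the escapes above before bringing in an input method, keeps the library itself out of the picture while the surrounding application is being checked.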
Re: Devanagari variations
On Fri, 8 Mar 2002 [EMAIL PROTECTED] wrote:

Jim Agenbroad responded (off list):

Not quite. On page 214 of 3.0 there is one RA vowel, a halant and a RI vowel: RA(d) + RI(n) --> RI(n) + RA(sup) (parens in lieu of subscript)

I didn't realise that RI meant the vocalic R. I mistook it to mean something else. I find it a weakness of that section that such notations are not defined and prominently displayed in an easy-to-find location.

Thanks for setting me straight. I should have known you knew what you were talking about.

Peter

Friday, March 8, 2002

Peter,

I agree there is a weakness there. Maybe more than one. I have mailed you (Peter) the Deshpande and Monier Williams examples I cited. Have a nice weekend all!

Regards,
Jim Agenbroad ( [EMAIL PROTECTED] )

It is not true that people stop pursuing their dreams because they grow old, they grow old because they stop pursuing their dreams. Adapted from a letter by Gabriel Garcia Marquez.

The above are purely personal opinions, not necessarily the official views of any government or any agency of any.

Addresses:
Office: Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Sys.Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.
Home: Phone: 301 946-7326; US mail: Box 291, Garrett Park, MD 20896.