RE: Devanagari
I came across the 'Where is my character?' page and read that Indic half forms are represented by a combination of keystrokes; for example, KA and Halant combine to form half KA. There is also a list of other letters that are represented through combinations of Devanagari letters. Please email me the list for my ready reference. Best Regards, Vipul Garg Mind Axis (I) Solutions Pvt. Ltd. Phone: +91 (22) 55994860 / 61 -Original Message- From: John Cowan [mailto:[EMAIL PROTECTED]] Sent: Tuesday, December 03, 2002 5:33 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: Devanagari Vipul Garg scripsit: > I have downloaded your font chart for Devanagari, which is in the range > from 0900 to 097F. I have also installed the Arial Unicode font supplied > by Microsoft office XP suite. I found that not all characters are > available for Devanagari. For example letters such as Aadha KA, Aadha > KHA, Aadha GA etc. are not available. This is not a Unicode problem. Arial Unicode is not designed to handle Indic scripts; it does not contain the necessary ligatures and half forms. You need to use a more suitable font. -- John Cowan <[EMAIL PROTECTED]> http://www.ccil.org/~cowan "One time I called in to the central system and started working on a big thick 'sed' and 'awk' heavy duty data bashing script. One of the geologists came by, looked over my shoulder and said 'Oh, that happens to me too. Try hanging up and phoning in again.'" --Beverly Erlebacher
Re: Default properties for PUA characters???
> characters*, we have found that is generally best practice to interpret the I should make it clear that the "we" above does not refer to the Unicode consortium! Mark __ http://www.macchiato.com ► “Eppur si muove” ◄ - Original Message - From: "Mark Davis" <[EMAIL PROTECTED]> To: "John Cowan" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Tuesday, December 03, 2002 10:23 Subject: Re: Default properties for PUA characters??? > Ken is correct: the default properties are somewhat different for ideographs > than for PUAs. In addition, PUAs are a special case compared to other > characters; implementations are free, within very broad limits, to change > the default properties associated with a PUA code point to whatever is > appropriate to whatever private-use character definition the application > gives to that code point. > > In other words, an application, if it treats a particular PUA as an > ideograph, is free to change the default properties to match Ken's list (and > for other properties): > > gc=Lo (general category = Other_Letter) > ccc=0 (combining class = 0, i.e. Not_Reordered) > bc=L(bidi class = strong Left_To_Right) > sc=Hani (script = Han) > lb=ID (line break = Ideographic) > ea=W(east asian width = Wide) > > If an application treated a particular PUA character as a Greek Linear B > character, on the other hand, it would assign yet different properties. > > Now in practice, the vast majority of PUA characters in use are representing > ideographs, mapped from East Asian standards. Due to this fact, *in the > absence of other protocols establishing the precise usage of the PUA > characters*, we have found that is generally best practice to interpret the > PUA characters as ideographs. However, applications are free to interpret > them however they want. 
Re: Possible proposal for new Hebrew accent character
On 11/28/2002 06:05:27 AM Julian Gilbey wrote: >I'd like to ask for people's advice before submitting a proposal. > >In the Hebrew part of Unicode, there are a range of positions >allocated to Biblical accents (U0591-U05AE). In particular, one of >them is: > 05AA: HEBREW ACCENT YERAH BEN YOMO >with a note "= galgal". > >Now, from my recent studies in the field, it appears that in the books >of Psalms, Proverbs and Job, which use a different accentuation system >from the rest of the Bible (Old Testament), there are two similar >accents which are often printed in the same way, but which were >clearly distinguished and written differently in the early >manuscripts. One is usually called the GALGAL, the other is called >ETNAH HAFUKH. (I don't recall offhand which one is the same symbol as >the YERAH BEN YOMO.) > >Would it be reasonable to propose the addition of a new accent symbol >ETNAH HAFUKH (or GALGAL, with the note attached to YERAH BEN YOMO >changed to "= etnah hafukh")? It would be helpful to see some visual samples. In the sources I have access to, I haven't come across any references to "etnah hafukh" (though I'm just getting into these, so may well have missed something that's there), so I'm not sure what it is or what it looks like. Yeivin (Introduction to the Tiberian Masorah) mentions two shapes for galgal, a semi-circle open on top used in manuscripts, and a "v" shape used in printed texts. Are these the two accents you're referring to? If so, are both used contrastively in MSS (e.g. both used in close proximity in a single mss)? If there's a distinct character out there, it's always possible to add it. We just need enough info making clear what's needed. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485
Re: Default properties for PUA characters???
On 12/02/2002 07:27:40 PM Christian Wittern wrote: >Leaving aside the red light that flashed in my head on the notion of >the W3C recommending PUA (for interchange?) That isn't necessarily a problem given that the fundamental assumption of PUA characters is that their semantics is determined by prior agreement of the parties involved in interchange. As a solution for unencoded characters, it has some shortcomings (and it's possible to imagine other solutions to the problem), but it can be useful if users understand the limitations. BTW, Christian, I have continued to be interested in where TEI goes with this issue as it's still an issue I need to deal with for other contexts, though I haven't had any opportunity to give any time to it. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485
RE: Proposal to add Bengali Khanda Ta
On 12/02/2002 10:23:54 AM "Andy White" wrote: >> > Marco wrote >> >> My counter-proposal is: >> >>09A4 + 034F + 09CD [= Khanda Ta] >>(TA + CGJ + VIRAMA) >I thought about your proposal and checked up on the semantics of CGJ. The >Standard states, > "In particular, inserting a combining grapheme joiner between two >characters has no effect on their ligation or cursive joining behaviour" >Would that mean that CGJ should not change the shape of Ta Virama? >Any way I have a new counter, counter proposal. See my nest message. I don't think it would be a good idea to use CGJ for some new Indic shaping control; that kind of thing is only likely to get us into trouble down the road. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485
Re: Proposal to add Bengali Khanda Ta
Hi folks,

This post is a bit long, so here is a summary:
- Regarding the encodings of TMA, there are currently several possibilities, so it should be possible to sort out all "normal" cases with current characters.
- However, this shows that ISCII provides a character, INV, with no counterpart in Unicode. Perhaps this is the problem to be solved.

Andy White wrote on 2002-11-29 13:21:14Z:

Marco wrote - Does ISCII have a way to distinguish the two cases above and the other possible combinations? I mean: 1. Ta_Ma_Ligature, 2. Khanda_Ta + Ma, 3. Half_Ta + Ma, 4. Ta + Virama + Ma.

1. Ta_Ma_Ligature is simply 'ta virama ma'

2. Khanda_Ta + Ma, is 'ta virama virama ma' (equivalent to 'ta virama zwnj ma')

3. Half_Ta + Ma is 'ta virama inv ma' (equivalent to 'ta virama zwj ma')

I fail to understand why it cannot (also) be coded as 'ta halant nukta ma', using the "soft halant" feature of ISCII, which is supposed to do just that (see IS13194:1991 6.3.2). I know iLeap (and ISFA in general) renders it incorrectly, but when I read 6.3.2 ("prevents it from combining with the following consonant"), I believe that the iLeap software is in error here.

4. Ta + Virama + Ma should be 'ta virama virama inv ma' but this is not implemented in the iLeap application I am using!

I got an acceptable result with 'ta inv halant ma'. Of course this is a complete hack (for example, a romanisation of the result will show the incorrectness), but for visual purposes only, it does the job. And since Ta + visible halant is not supposed to be anything useful for normal writing (i.e. only useful for school teaching or similar tasks, as I understand things; at least no Bengali words are supposed to be written this way), it seems to me

The problem I have, and it is very well summarised by Andy and Marco here, is that in ISCII-91 I see *three* mechanisms to vary the rendering:
- "Explicit Halant", coded E8 E8, described in 6.3.1
- "Soft Halant", coded E8 E9, described in 6.3.2
- "Invisible consonant INV", coded D9, described in 6.4, which may further combine with the other two, but is intended only for rendering purposes

At the same time, Unicode (3.0) provides only *two* mechanisms:
- inserting ZWNJ after virama, called "Explicit Virama"
- inserting ZWJ after virama, called "Explicit Half-Consonant"

There is little doubt that "Explicit Virama" and "Explicit Halant" can be paired: their descriptions are very similar. However, I remember reading in Unicode 1.0 (unfortunately, I do not have it at hand) that the position at DA (INV consonant, according to ISCII-88) was equated to the ZWJ. While it might appear correct for some cases, I do not believe this is correct. The Indic FAQ also has words on the topic, but there are many things to comment on in this FAQ, so I won't elaborate further (however, if the editor is reading, please contact me).

I believe ZWJ could be equated to Soft Halant, as the descriptions are similar (except for the well-known exception of the eyelash-ra, as stated in Unicode 2.0), despite the important difference in wording. I understand that Malayalam cillus are now to be encoded with ZWJ, too.

As a result, we are left with one code in ISCII-91, INV (D9), which is indeed quite special (its description makes clear that it is not used to write a sound; it is merely an artefact, useful for specialized tasks), and which ends up with no counterpart in Unicode, at least none that I can spot at once (remember, it should be a character that shares the properties of the "regular" consonants, i.e. ligating before or after virama, or before vowel signs).

Perhaps, as the discussion above showed, it is really this character that is missing for performing specialized tasks with Indic scripts (such as the Malayalam Half-U that I was speaking about last month)?

Andy's new proposal, CBM, is a bit different, since it affects precise rules to solve some cases. The thing that makes me a bit reluctant is that there is no prior art with CBM, so we could get it wrong a couple of times, with subsequent rectifications, errata and changes of meaning, all bad things. On the other hand, including a new character, with the same semantics as already present in ISCII, would ease some conversions (I know they would be few), and also provide a reference to implement.

Having said that, Andy's first example, with the relative priorities of reph versus jophola (and similar examples between reph and rakar-vattu/vakar/yakar/lakar), remains to be examined in more detail.

Regards, Antoine
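For readers who want to experiment with the sequences discussed in this message, here is a minimal Python sketch. It only builds the code point sequences and records the ISCII-91 pairings as the message describes them; the names SEQUENCES, ISCII_TO_UNICODE and describe are made up for illustration, and how each sequence is actually rendered depends entirely on the font and shaping engine.

```python
import unicodedata

TA, MA = "\u09A4", "\u09AE"        # BENGALI LETTER TA, BENGALI LETTER MA
VIRAMA = "\u09CD"                  # BENGALI SIGN VIRAMA (hasant)
ZWNJ, ZWJ = "\u200C", "\u200D"     # zero width non-joiner / joiner

# The three Unicode spellings mentioned in the thread.
SEQUENCES = {
    "plain": TA + VIRAMA + MA,
    "explicit virama": TA + VIRAMA + ZWNJ + MA,
    "explicit half-consonant": TA + VIRAMA + ZWJ + MA,
}

# ISCII-91 byte sequences and the pairings suggested in the message.
# The soft halant pairing is the author's suggestion, not an established mapping.
ISCII_TO_UNICODE = {
    bytes([0xE8, 0xE8]): VIRAMA + ZWNJ,  # explicit halant
    bytes([0xE8, 0xE9]): VIRAMA + ZWJ,   # soft halant
    bytes([0xD9]): None,                 # INV: no counterpart identified
}


def describe(sequence):
    """Return 'U+XXXX NAME' for every code point in the sequence."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in sequence]


if __name__ == "__main__":
    for label, seq in SEQUENCES.items():
        print(label, describe(seq))
```

Printing the descriptions makes it easy to check that a test string really contains the joiner you intended, since ZWJ and ZWNJ are invisible in most editors.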
Re: Proposal for addition of CONSONANT BASE MARKER to the BMP
On 12/02/2002 10:40:16 AM "Andy White" wrote: >Please see and comment on my first rough draft proposal for the new CBM >character. >http://www.exnet.btinternet.co.uk/uniprop/proposalform.htm > >Question: Is there a better name for this character? >Would there be any use for it in non-Indic scripts? This seems to be adding one additional shaping control for Indic scripts. I don't think there's any obvious application to non-Indic scripts, but if we had one more shaping control and something came up for some non-Indic script, then someone could well suggest the two functions be controlled by the same character. Mind you, there *is* a way to add a new Indic shaping control function without adding a new character: shaping behaviours are defined for virama + ZWJ and virama + ZWNJ; it would be possible to define Indic shaping behaviours for virama + ZWJ + ZWNJ, and other shaping functions using other permutations. Just mentioning possibilities... - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485
Re: code points in MS word
On 12/03/2002 01:18:54 PM Rick McGowan wrote: >> I am new to this list. I like to know if there is method to input >> hexadecimal code points into a file on Windows and use MS Word to see the >> actual character ? > >Yes, if you have a recent enough MS Word (2000 or newer). Please see the >FAQ page below for some info: > >http://www.unicode.org/unicode/faq/font_keyboard.html Actually, I might be wrong, but I got the impression that he wants to have a bunch of USVs as text, like "0041 0020 0042 ... " in the document. You can't do that in Word, but I have written a function that allows me to do it in Excel: you enter USVs in a cell, and display the corresponding characters in another cell by entering a formula in that other cell. I suppose it would be possible to write a VBA program for Word that reads USV strings and enters the corresponding characters somewhere else in the doc. The only problem is telling it where to look for the source USV strings and where to enter the corresponding characters. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485
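The conversion described above (a string of USVs in, the corresponding characters out) can be sketched outside Word or Excel. The following Python function is an illustration only, not the Excel function mentioned in the message; the name usv_to_text and the accepted "U+" and "0x" prefixes are assumptions.

```python
def usv_to_text(usv_string):
    """Convert whitespace-separated hexadecimal scalar values to text."""
    chars = []
    for token in usv_string.split():
        # Accept optional "U+" or "0x" prefixes (an assumption for convenience).
        if token[:2].upper() in ("U+", "0X"):
            token = token[2:]
        value = int(token, 16)
        # Surrogates and values above U+10FFFF are not Unicode scalar values.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {token}")
        chars.append(chr(value))
    return "".join(chars)


if __name__ == "__main__":
    print(usv_to_text("0041 0020 0042"))
```

The result can be written to a UTF-8 or UTF-16 file and opened in any Unicode-aware editor, which sidesteps the question of where a macro should look for its input.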
Re: Localized names of character ranges
Doug, seconding a suggestion by Marco, wrote: > I agree > that a multilingual Unicode glossary should be assembled (possibly as a > volunteer project) and officially endorsed by the Unicode Consortium, so > users and vendors will be on common terminological ground. In general, I favor such an activity, although at the moment it would have to be something done by outside volunteers, as the UTC editorial committee doesn't have the bandwidth now (in the crunch for Unicode 4.0) to undertake more open-ended responsibilities. My caution, however, is that the terminology used by the Unicode Standard is still evolving -- as witness the ongoing arguments about some of the terminology related to the character encoding model. The glossary in Unicode 4.0 will be substantially revised in some of the key points having a bearing on the Unicode encoding model. And as more content is added to the standard, additional terms keep accumulating in the glossary as well. And it will be some time before the online glossary can be completely synched back up with the Unicode 4.0 glossary. Once people start maintaining a multilingual glossary based on the online glossary (or supplemented from other sources), the burden of maintenance will escalate rapidly for any change introduced to terminology. These things only work if there is an ongoing institutional commitment to maintenance and updates. Otherwise all the translated versions start to get out-of-synch quickly, both with the English original and with each other. This can lead to dangerous misunderstandings among people who assume that their own translated version is accurate. So if anyone wants to undertake such an effort, don't forget to provide for ongoing maintenance and for the fact that eager volunteers tend to drop like flies when repeatedly forced to update their work at irregular intervals. --Ken
Re: code points in MS word
> Raghupathy, Ramesh asked... > Hi, > > I am new to this list. I like to know if there is method to input > hexadecimal code points into a file on Windows and use MS Word to see the > actual character ? Yes, if you have a recent enough MS Word (2000 or newer). Please see the FAQ page below for some info: http://www.unicode.org/unicode/faq/font_keyboard.html Rick
Re: Default properties for PUA characters???
Ken is correct: the default properties are somewhat different for ideographs than for PUAs. In addition, PUAs are a special case compared to other characters; implementations are free, within very broad limits, to change the default properties associated with a PUA code point to whatever is appropriate to whatever private-use character definition the application gives to that code point. In other words, an application, if it treats a particular PUA as an ideograph, is free to change the default properties to match Ken's list (and for other properties): gc=Lo (general category = Other_Letter) ccc=0 (combining class = 0, i.e. Not_Reordered) bc=L(bidi class = strong Left_To_Right) sc=Hani (script = Han) lb=ID (line break = Ideographic) ea=W(east asian width = Wide) If an application treated a particular PUA character as a Greek Linear B character, on the other hand, it would assign yet different properties. Now in practice, the vast majority of PUA characters in use are representing ideographs, mapped from East Asian standards. Due to this fact, *in the absence of other protocols establishing the precise usage of the PUA characters*, we have found that is generally best practice to interpret the PUA characters as ideographs. However, applications are free to interpret them however they want. Mark __ http://www.macchiato.com ► “Eppur si muove” ◄ - Original Message - From: "John Cowan" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Monday, December 02, 2002 21:08 Subject: Re: Default properties for PUA characters??? > Kenneth Whistler scripsit: > > > So I'd say that the XML Core WG has got the situation only > > partially correct for Unicode PUA characters. > > As the actual author of that Core WG text, mea culpa. But I was basing > my remarks on things said on this list. 
> > -- > John Cowan [EMAIL PROTECTED] www.ccil.org/~cowan www.reutershealth.com > All Gaul is divided into three parts: the part that cooks with lard and goose fat, the part that cooks with olive oil, and the part that cooks with butter. -- David Chessler > >
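As a rough illustration of the point made in this message, the sketch below looks up the default properties of a private-use code point with Python's unicodedata module and then applies an application-defined override table. The override values are the ones listed in the message; the names IDEOGRAPH_OVERRIDES and properties are hypothetical, and script (sc) and line break (lb) appear only in the override table because unicodedata does not expose them.

```python
import unicodedata

# Property values from the message, for an application that treats
# its private-use characters as ideographs.
IDEOGRAPH_OVERRIDES = {
    "gc": "Lo",    # general category = Other_Letter
    "ccc": 0,      # combining class = Not_Reordered
    "bc": "L",     # bidi class = Left_To_Right
    "sc": "Hani",  # script = Han
    "lb": "ID",    # line break = Ideographic
    "ea": "W",     # east asian width = Wide
}


def is_private_use(ch):
    """True if the code point's general category is Co (Private_Use)."""
    return unicodedata.category(ch) == "Co"


def properties(ch, pua_overrides=None):
    """Return default properties, with overrides applied to PUA code points only."""
    props = {
        "gc": unicodedata.category(ch),
        "ccc": unicodedata.combining(ch),
        "bc": unicodedata.bidirectional(ch),
        "ea": unicodedata.east_asian_width(ch),
    }
    if pua_overrides and is_private_use(ch):
        props.update(pua_overrides)
    return props


if __name__ == "__main__":
    print(properties("\uE000"))
    print(properties("\uE000", IDEOGRAPH_OVERRIDES))
```

An application treating the same code point as, say, a Linear B character would pass a different table; the overrides never touch assigned characters.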
Indo-Iranian scripts
At the 3rd Iranian Unicode Conference, held this past weekend in Prague, repertoires and character names were agreed for the Old Persian script, the Manichaean script, and for the Avestan script. Pahlavi extensions are to be part of the Avestan block. The code tables and proposal summary forms will, I hope, be available prior to the WG2 meeting. -- Michael Everson * * Everson Typography * * http://www.evertype.com
code points in MS word
Hi, I am new to this list. I would like to know if there is a method to input hexadecimal code points into a file on Windows and use MS Word to see the actual character. For example: I would like to input 0x0041 in a file, and when I use MS Word to read this file it should show me LATIN CAPITAL LETTER A. Thanks.
Re: Localized names of character ranges
Marco Cimarosti wrote: > I suggest that the 400-odd property and value name be listed in a > text file on the Unicode FTP site (with each English term well > commented and explained) and translations be collected on a voluntary > basis has was done for the "What's Unicode?" text. The copyright on > this material should grant free and unrestricted usage to any > implementation such as ICU. Microsoft has done this on a few occasions for Windows terminology. There was a book in 1993 called "The GUI Guide" which included translations for about 450 common terms and menu items used in Windows, such as "combo box" and "Match Whole Word Only." Similar lists have been published in other Microsoft books, such as "The Microsoft Guide to User Interface" (but not, I believe, in its successor, "The Windows User Experience"). Obviously Microsoft's intent is that, if it's important for these terms to be clearly defined and distinguished in English, it's important in other languages as well. This is equally true for Unicode, which, as Marco said, uses many common words such as "character" and "case" in very specific ways. I agree that a multilingual Unicode glossary should be assembled (possibly as a volunteer project) and officially endorsed by the Unicode Consortium, so users and vendors will be on common terminological ground. -Doug Ewell Fullerton, California
Re: Devanagari
[EMAIL PROTECTED] scripsit: > Au contraire! You might find the attached gif of interest. (This is version > 1.0 of the font. Some people might have earlier versions.) Ah, excellent. It has not always been so. > If you're not getting Indic shaping with Arial Unicode MS, it's very likely > the fault of your software, not the font (and, of course, not Unicode). Indeed, but the original poster specified the use of XP (Windows or Office, I forget which), so I discounted that. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan http://www.reutershealth.com They do not preach that their God will rouse them A little before the nuts work loose. They do not teach that His Pity allows them to drop their job when they damn-well choose. --Rudyard Kipling, "The Sons of Martha"
[OT] document summarization
FYI, from The Linguist List: "MEAD is a multi-document summarization system with multi-lingual capabilities..." For details go to http://linguistlist.org/issues/13/13-3157.html. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485
Re: CJK fonts
Hello Andrew, Many thanks. No this font is new to me, and I will download it. Meanwhile I have been on the phone to order Word 2002, so I should be up to date now. When I have finished debugging it I will pass on to you the look-up program that I wrote for unihan.txt. It is a pity about all the mistakes that you mentioned. Raymond At 03:45 AM 12/3/2002 -0800, you wrote: don't know if anyone mentioned the SimSun-18030 font. This font has all of CJK-A (but none of CJK-B), and is freely available as part of Microsoft's GB18030 support package, downloadable from http://www.microsoft.com/china/windows2000/downloads/18030.asp Regards, Andrew
RE: Devanagari
Vipul Garg wrote: > I have downloaded your font chart for Devanagari, which is in > the range from 0900 to 097F. I have also installed the Arial > Unicode font supplied by Microsoft office XP suite. I found > that not all characters are available for Devanagari. For > example letters such as Aadha KA, Aadha KHA, Aadha GA etc. > are not available. > > These letters are required in the devanagari words such as > KANYA, NANHA, PARMATMA etc. > > If you could provide the above letters then our requirement > for formation of Devanagari words would be possible. This > requirement is very crucial as we have a large volume project > on Devanagari language involving data storage in Oracle database. > > Would appreciate an early reply. Please, see document "Where is my character": http://www.unicode.org/unicode/standard/where/ Also have a look to question 17 in the "Indic" FAQ: http://www.unicode.org/unicode/faq/indic.html#17 All is explained in more detail in Section 9.1 "Devanagari" of the Unicode manual: http://www.unicode.org/unicode/uni2book/ch09.pdf Regards. M.C.
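To make the pointers above concrete, here is a small Python sketch of how half forms and conjuncts are stored: as ordinary consonants joined by the virama (halant), with no separate "Aadha" code points. The helper names half_form and conjunct are made up for illustration, and whether the result displays as a half form or a ligature depends on the font and rendering software, not on the stored text.

```python
KA, NA, YA = "\u0915", "\u0928", "\u092F"   # DEVANAGARI LETTER KA, NA, YA
AA_SIGN = "\u093E"                          # DEVANAGARI VOWEL SIGN AA
VIRAMA = "\u094D"                           # DEVANAGARI SIGN VIRAMA (halant)
ZWJ = "\u200D"                              # ZERO WIDTH JOINER


def half_form(consonant):
    """Request an explicit half form of a consonant: consonant + virama + ZWJ."""
    return consonant + VIRAMA + ZWJ


def conjunct(*consonants):
    """Join consonants with virama; the renderer chooses half forms or ligatures."""
    return VIRAMA.join(consonants)


# KANYA: KA, then the conjunct NA + virama + YA, then the vowel sign AA.
KANYA = KA + conjunct(NA, YA) + AA_SIGN

if __name__ == "__main__":
    print(KANYA, [f"U+{ord(ch):04X}" for ch in KANYA])
    print(half_form(KA))
```

Because only code points in the 0900 to 097F block (plus the optional joiner) are stored, text like this can go into a database as plain Unicode; the half forms appear only at display time.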
RE: Devanagari
Vipul Garg was asking why half characters were not included in the Unicode code charts and in his copy of the Arial Unicode font. More recent versions of Arial Unicode do contain half characters etc. for Devanagari. As for the code charts, you need to explore the Unicode web site a bit more to find the answer. Please see the following for detailed information regarding the half characters etc: http://www.unicode.org/unicode/standard/where/ http://www.unicode.org/unicode/faq/indic.html http://www.unicode.org/unicode/uni2book/ch09.pdf Best Regards Andy You Wrote: I have downloaded your font chart for Devanagari, which is in the range from 0900 to 097F. I have also installed the Arial Unicode font supplied by Microsoft office XP suite. I found that not all characters are available for Devanagari. For example letters such as Aadha KA, Aadha KHA, Aadha GA etc. are not available. These letters are required in the devanagari words such as KANYA, NANHA, PARMATMA etc.
RE: Devanagari
Vipul Garg wrote: > I have downloaded your font chart for Devanagari, which is in the range > from 0900 to 097F. I have also installed the Arial Unicode font supplied > by Microsoft office XP suite. I found that not all characters are > available for Devanagari. For example letters such as Aadha KA, Aadha KHA, > Aadha GA etc. are not available. > > These letters are required in the devanagari words such as KANYA, NANHA, > PARMATMA etc. > > If you could provide the above letters then our requirement for formation > of Devanagari words would be possible. This requirement is very crucial as > we have a large volume project on Devanagari language involving data > storage in Oracle database. > You could try using a different font, for example one of the specialist Devanagari fonts listed at: http://www.alanwood.net/unicode/fonts.html#devanagari Alan Wood http://www.alanwood.net (Unicode, special characters, pesticide names)
Re: Devanagari
Vipul Garg scripsit: > I have downloaded your font chart for Devanagari, which is in the range > from 0900 to 097F. I have also installed the Arial Unicode font supplied > by Microsoft office XP suite. I found that not all characters are > available for Devanagari. For example letters such as Aadha KA, Aadha > KHA, Aadha GA etc. are not available. This is not a Unicode problem. Arial Unicode is not designed to handle Indic scripts; it does not contain the necessary ligatures and half forms. You need to use a more suitable font. -- John Cowan <[EMAIL PROTECTED]> http://www.ccil.org/~cowan "One time I called in to the central system and started working on a big thick 'sed' and 'awk' heavy duty data bashing script. One of the geologists came by, looked over my shoulder and said 'Oh, that happens to me too. Try hanging up and phoning in again.'" --Beverly Erlebacher
Devanagari
I have downloaded your font chart for Devanagari, which is in the range from 0900 to 097F. I have also installed the Arial Unicode font supplied by Microsoft office XP suite. I found that not all characters are available for Devanagari. For example letters such as Aadha KA, Aadha KHA, Aadha GA etc. are not available. These letters are required in the devanagari words such as KANYA, NANHA, PARMATMA etc. If you could provide the above letters then our requirement for formation of Devanagari words would be possible. This requirement is very crucial as we have a large volume project on Devanagari language involving data storage in Oracle database. Would appreciate an early reply. Best Regards, Vipul Garg Phone: (022) 55994861
Re: CJK fonts
Michka, I see I will have to upgrade to XP from my Office 2000 (on Win2000). I suppose I can install it on Win2000 without also going up to Win XP. I have tried the font Ming (for ISO10646), but that has only a small part of Ext A. Thanks to all for your help. Raymond
Re: Unihan Mandarin Readings
"John H. Jenkins" wrote: > Certainly in the Unicode 4.0 time-frame we can improve things. I can't > make any guarantees, however. Thanks for the response. I've got an old 3.1 version of the Unihan database at home, and I was going to complain that the Radical.Stroke index values given for U+20003 through U+200ED inclusive are in fact the Radical and Stroke index values for the preceding ideograph. Checking the latest (3.2) version of the Unihan database this morning I found that this problem has now been fixed (not the Mandarin readings though), so I guess it pays to ensure that you always have the latest version of Unihan. BTW, is it possible for Unicode to provide a Unihan.xml version of the Unihan database ? The first thing I do is convert the Unihan.txt file into XML format for ease of processing. Andrew (P.S. Sorry about the bouncing [EMAIL PROTECTED] cc - this address seems no longer to be accepting errata)
RE: Localized names of character ranges
Mark Davis wrote: > While not a trivial task (about 400 terms), it is many, many > times easier than translating all the significant character > names. That might someday be worth considering for the > Common XML Locale Repository > (http://oss.software.ibm.com/icu/locale/). The problem is not the number of terms involved (400 strings is not a big deal: it corresponds to a small localization project), but rather the utter idiosyncrasy of Unicode-related terminology. Terms such as "title case", "caseless", "reordering" or "combining" are nearly impossible to translate satisfactorily into other languages, and even simple terms such as "character", "letter" or "ideograph" can be tricky if they have to remain distinguished from each other. I tried to translate the Unicode glossary into Italian myself, but I have yet to find satisfactory translations for several entries, although I consulted experts in disciplines ranging from typography (I still don't have a valid equivalent for the "case" in "case sensitive", etc.) to Hebraism (I am still at odds with "cantillation marks"). IMHO, if such an effort is really worth doing, it should be organized and promoted by the Unicode Consortium itself, rather than in an OEM library like ICU. I suggest that the 400-odd property and value names be listed in a text file on the Unicode FTP site (with each English term well commented and explained) and that translations be collected on a voluntary basis, as was done for the "What's Unicode?" text. The copyright on this material should grant free and unrestricted usage to any implementation such as ICU. _ Marco