Re: Unicode 3.2: BETA files updated
Kenneth Whistler wrote:

> And StandardizedVariants.html has been updated again, with more of the missing glyphs provided.

I can't see any difference between plain U+2278 (either in the draft code chart or StandardizedVariants.html) and U+2278 with VS1. Is plain U+2278 supposed to have an oblique stroke? Same for U+2279.

For U+2A9D, the tilde-like part of the glyph is reversed left-to-right relative to what it should be (compare U+2272 and U+2273, and look at the code chart for plain U+2A9D). This is more important than it sounds!

Less importantly, U+2268 and U+2269 with VS1 should use the same style of glyph (i.e. opening angle) for the less-than/greater-than sign as the other characters.

The Mongolian descriptions say "second form", "third form", and "fourth form". Unless these are already defined somewhere, I suggest "variation one", "variation two", and "variation three" instead.

Is "variant" or "variation" the preferred term? If "variant" is preferred, then why VARIATION SELECTOR ONE, etc.? If not, why StandardizedVariants?

--
David Hopwood [EMAIL PROTECTED]
Re: Multiple script Handling (kanji - kana)
Hi Rajat,

> Any solutions to handle the same (or, in other words, to compare 2 Japanese strings written in different scripts or by a mixture of two scripts)?

It is definitely a non-trivial task. If you just want to transform a katakana string into hiragana (or vice versa), it is very easy. But as soon as you start dealing with kanji, it gets really, really tricky.

Whereas Chinese mostly has only one reading per character, kanji very often have multiple readings. If you want to compare a kanji string to a hiragana string, you have to find out the reading of the kanji, and there is no 1-to-1 table for doing this, rather 1-to-n. You would need a dictionary of Japanese to determine the reading of a compound. But some compounds have various readings, depending on the context. So you would also need a semantic analysis of the sentence!

生物 = 1. seibutsu, 2. namamono
今日 = 1. konnichi, 2. kyou
上手 = 1. jouzu, 2. uwate, 3. kamite
下 = 1. ka, 2. ge, 3. shita, 4. shimo, 5. moto, 6. sa(...), 7. kuda(...), 8. o(riru)

Readings of place names and personal names are especially difficult to figure out.

However, it really depends on what kind of data you are about to process. If you have, e.g., two fields for a Japanese person's name, one in kanji and the transcription in kana, you could at least check whether the transcription is among the correct readings of the name ... (sigh!)

Regards,
Berthold
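The "very easy" katakana/hiragana case mentioned above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that comparing the basic kana letters is enough; the function names are made up for this example, and characters such as the prolonged sound mark or half-width katakana are deliberately left alone.

```python
# Katakana U+30A1..U+30F6 and hiragana U+3041..U+3096 are laid out in
# parallel, 0x60 code points apart, so the mapping is a fixed offset.
KATAKANA_FIRST = 0x30A1  # KATAKANA LETTER SMALL A
KATAKANA_LAST = 0x30F6   # KATAKANA LETTER SMALL KE
KANA_OFFSET = 0x60


def katakana_to_hiragana(text: str) -> str:
    """Return text with katakana letters replaced by their hiragana twins.

    Everything outside the mapped range (kanji, the prolonged sound mark,
    punctuation, Latin letters) is passed through unchanged.
    """
    result = []
    for ch in text:
        code = ord(ch)
        if KATAKANA_FIRST <= code <= KATAKANA_LAST:
            result.append(chr(code - KANA_OFFSET))
        else:
            result.append(ch)
    return "".join(result)


def same_kana(a: str, b: str) -> bool:
    """Compare two kana strings, ignoring the hiragana/katakana distinction."""
    return katakana_to_hiragana(a) == katakana_to_hiragana(b)
```

Kanji pass through untouched, which is exactly where the hard part of the problem begins.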
RE: [OT] Rich man Bill (RE: Issues with Unicode Hindi)
Michael Kaplan wrote:

> We rob NO ONE. We behave with honor and we wish others to do the same with us. It's a respect thing.

For sure. But you understand that this is politics as well. Many aspects of copyright and intellectual property, and even the very concept of private property, are still the subjects of political debate. In order for your noble statement to be totally effective, there must be no rich men or poor men to play Robin Hood with. But solving such economic dilemmas, as Sarasvati might remind us, is not the task of the public mailing list of a character encoding standard.

Real-life, down-to-earth issues pop up every day on the Unicode List (probably because this is the roll-out phase of the standard). We recently discovered how IT people in India have to work sharing a rented modem on old and slow telephone lines. It is not the task of Unicode to fix the telecom infrastructure of India, not to mention the problem of the uneven distribution of resources in the world.

So, on that occasion, this forum concentrated on answering a single pertinent question: is the size overhead of UTF-8 compatible with the state of the Internet in India? Luckily, the answer was yes: however slow your Internet connection is, the fact that UTF-8 uses 3 bytes for each Indic character won't make things worse, because plain text is an insignificant part of the overall size of an Internet document.

About the issue that you raised regarding software piracy, you should consider that in many countries it is easy to step into huge multi-floor shops which sell illegal copies of software and manuals. Feel free to disagree with this state of things, but please avoid publicly calling people pirates when they just did what it is customary to do in most parts of the world. Or, probably, just did a quick test. This is not fair, not appropriate, and not a technical approach to problems.

By the way, I would not be so sure that at Microsoft Corp. they need your help to do their math. It is not the task of this forum to decide whether, for a major corporation, it is more important to be strict about copyright than to be the first and best-behaved player in one of the most promising markets on the planet.

It is OK to point out that such-and-such font is not supposed to be free (or that it is, but only provided you install the latest version of such-and-such operating system), or to mention that there are also free or shareware fonts out there that cover such-and-such Indic scripts. But I feel that this is not the proper place to settle legal issues about software distribution.

JMHO.

_ Marco
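The "3 bytes for each Indic character" figure mentioned in this message is easy to check. The following is a small Python sketch; the helper name and the sample word are chosen for this illustration only, and the word is spelled with escapes so the exact code points are visible.

```python
def utf8_stats(text: str) -> tuple:
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# "Hindi" written in Devanagari: HA, VOWEL SIGN I, NA, VIRAMA, DA,
# VOWEL SIGN II. Every code point in U+0900..U+097F needs 3 bytes.
hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"

print(utf8_stats(hindi))    # (6, 18)
print(utf8_stats("Hindi"))  # (5, 5)
```

Any ASCII markup around such text still costs one byte per character, which is consistent with the point that the overhead is small relative to a whole page.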
Re: Multiple script Handling (kanji - kana)
From my database with roughly 50,000 lexical entries (compounds), I get 1431 compounds with at least two readings and 71 with at least three readings. If I also include 60,000 personal and place names, I get 6840 compounds with two or more readings, 1344 with three or more, and 348 with four or more.

Kay Genenz, Bonn

> Berthold Frommann wrote:
> > 生物 = 1. seibutsu, 2. namamono
> > 今日 = 1. konnichi, 2. kyou
> > 上手 = 1. jouzu, 2. uwate, 3. kamite
>
> Do you have a rough evaluation of how many compound words have multiple readings?
Re: Wade - Pinyin transliteration (Unihan ?)
On Thu, 24 Jan 2002, Patrick Andries wrote:

> John Cowan wrote:
> > Patrick Andries scripsit:
> > > Let's assume I want to transliterate a large Wade-Giles database into pinyin. Is this a purely algorithmic process? For all nouns? Common and proper (cf. Chiang Kai-Shek vs Jiang Jieshi)? Even for dialectal words?
> >
> > Chiang Kai-Shek isn't Wade-Giles; it isn't even Mandarin.
>
> I did mention dialectal forms (I believe final -k no longer occurs in Mandarin); I just wondered whether I would find such nouns (proper or common) in a dictionary edited in Taiwan. I asked because I could see no algorithmic way of converting this name using traditional Wade-to-Pinyin tables. Incidentally, if this is not Wade-Giles applied to a dialectal pronunciation, what is it? Genuinely interested.

It should be noted that "Wade-Giles" is commonly misused as a cover term for many old, ad hoc, non-Mandarin-based, or non-Pinyin romanization systems. Chiang Kai-shek is a mixture of what looks like Wade-Giles (surname CHIANG) and some kind of archaic romanization based on Cantonese (given name Kai-shek).

For placenames, there are many postal romanizations that are often erroneously considered to be Wade-Giles, e.g., the city Nanking (postal)/Nan-ching (Wade-Giles)/Nanjing (Pinyin).

In any case, one should also beware of degenerate Wade-Giles forms where details such as apostrophes (denoting aspiration) are omitted, e.g., the city Changchun (degenerate Wade-Giles)/Ch'ang-ch'un (Wade-Giles)/Changchun (Pinyin). If Changchun were accepted as proper Wade-Giles input, then a corrupt *Zhangzhun pinyin form would be generated.

Thomas Chan [EMAIL PROTECTED]
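The apostrophe problem described above can be made concrete with a toy table-driven converter in Python. This is only a sketch: the table is a hypothetical, six-entry excerpt rather than a full Wade-Giles syllabary, tones are ignored, and the input is assumed to be hyphenated into syllables (real degenerate data often is not).

```python
# Hypothetical, deliberately tiny syllable table covering only the
# examples in this thread; a real converter needs the full syllabary.
WADE_GILES_TO_PINYIN = {
    "ch'ang": "chang",  # aspirated initial: ch' -> ch
    "ch'un": "chun",
    "chang": "zhang",   # unaspirated initial: ch -> zh
    "chun": "zhun",
    "nan": "nan",
    "ching": "jing",    # ch before i -> j
}


def wade_giles_to_pinyin(name: str) -> str:
    """Convert a hyphenated Wade-Giles name to toneless pinyin.

    Raises KeyError for any syllable missing from the table, rather
    than guessing.
    """
    syllables = name.lower().split("-")
    converted = [WADE_GILES_TO_PINYIN[s] for s in syllables]
    return "".join(converted).capitalize()


print(wade_giles_to_pinyin("Ch'ang-ch'un"))  # Changchun
print(wade_giles_to_pinyin("Chang-chun"))    # Zhangzhun
print(wade_giles_to_pinyin("Nan-ching"))     # Nanjing
```

The second call shows the corrupt form: once the apostrophes are gone, the table has no way to tell the aspirated syllables from the unaspirated ones. Names such as Chiang Kai-shek or Nanking are not Wade-Giles at all and would simply fail the lookup.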
Re: Multiple script Handling (kanji - kana)
Dear Prof. Genenz,

> From my database with roughly 50.000 lexical entries (compounds) I get a number of 1431 compounds with at least two readings and 71 with at least three readings.

That takes into account only compounds with multiple readings. But imagine this: if a program merely had access to a database containing the readings of single characters, it still couldn't figure out the reading of a compound (reliably). How could it "know" that 人間 is "ningen" and not, for instance, *jinkan? This means that it would be vital to have access to a database containing entries for kanji-compound-reading(s).

Without semantic analysis of the sentences concerned, it is not possible to determine the correct contextual reading of every Japanese compound. It is only possible to check whether the reading given in the second string is _one_ of the correct readings.

Greetings from Edo,
Berthold Frommann
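The check described above, whether a given kana string is _one_ of the known readings of a compound, amounts to a set lookup. The Python sketch below assumes a hypothetical in-memory dictionary holding just the compounds mentioned in this thread, with the readings written in hiragana; a real application would load a full lexicon instead.

```python
# Hypothetical mini-lexicon: compound -> set of known readings (hiragana).
COMPOUND_READINGS = {
    "生物": {"せいぶつ", "なまもの"},
    "今日": {"こんにち", "きょう"},
    "上手": {"じょうず", "うわて", "かみて"},
    "人間": {"にんげん"},
}


def is_known_reading(compound: str, kana: str) -> bool:
    """Return True if kana is one of the listed readings of compound.

    This says nothing about which reading is right in context; that
    would need the semantic analysis discussed in the thread.
    """
    return kana in COMPOUND_READINGS.get(compound, set())


print(is_known_reading("今日", "きょう"))    # True
print(is_known_reading("人間", "じんかん"))  # False
```

A compound that is missing from the lexicon yields False, so a negative answer only means "not listed", not "wrong".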
Re: Has anyone looked at Laban dance notation?
At 12:46 -0800 2002-01-24, Kenneth Whistler wrote:

> This heads immediately into a rathole where any scheme of dynamic notation for anything whatsoever becomes a candidate for character encoding.

Any candidate for encoding has to meet certain criteria. Like Klingon didn't. One of those criteria would be "doable". Another would be "meets user requirements". A priori rejection of things makes me nervous, though.

--
Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Unicode 3.2 Beta Period Finishing
Hm, I still don't see 'em all -- left column images starting at U+2A3C are still missing.

Bob

On 22-01-2002 05:02:24 Mark Davis wrote:

> I sent the message out somewhat prematurely -- the images in the first column (the normal representative glyphs) should be there tomorrow.
>
> Mark
Re: Unicode 3.2: BETA files updated
At 06:29 AM 1/24/02, David Hopwood wrote:

> Kenneth Whistler wrote:
> > And StandardizedVariants.html has been updated again, with more of the missing glyphs provided.
>
> I can't see any difference between plain U+2278 (either in the draft code chart or StandardizedVariants.html) and U+2278 with VS1. Is plain U+2278 supposed to have an oblique stroke? Same for U+2279.

The plain ones are supposed to have the oblique stroke in the *reference* glyphs. As with all mathematical glyph variations, *both* variations are acceptable in common, unmarked situations.

> For U+2A9D, the tilde-like part of the glyph is reversed left-to-right relative to what it should be (compare U+2272 and U+2273, and look at the code chart for plain U+2A9D). This is more important than it sounds!

After the last update, I sent Rick a font for these variations that pays attention to these details.

> Less importantly, U+2268 and U+2269 with VS1 should use the same style of glyph (i.e. opening angle) for the less than/greater than sign, as the other characters.

Where possible, we've taken the variations from actual fonts; that means there may be minor differences of this kind that are unrelated to the feature called out in the description.

> The Mongolian descriptions say second form, third form, and fourth form. Unless these are already defined somewhere, I suggest variation one, variation two, and variation three instead.

This list is being published as Amd 1:ISO/IEC 10646-1:2000 (2002), so it's essentially frozen. The list of variants has been out as a UNU TR for a long time now with these terms.

> Is variant or variation the preferred term? If variant is preferred, then why VARIATION SELECTOR ONE, etc.? If not, why StandardizedVariants?

While VARIATION SELECTOR is the formal name of the character (and therefore fixed), referring to the selected thing as a 'variation' sounds really odd; that's why the more common term 'variant' is used all over the place. Perhaps we ought to make them formally synonyms, somewhat like code point and code location.

I think it's a subtle thing. Without context, *VARIANT SELECTOR could be understood as a VARIANT of a SELECTOR. Equally, without context, referring to the 'variation' of a character is less clear than saying 'variant'.

A./
Re: Has anyone looked at Laban dance notation?
Michael Everson wrote:

> Any candidate for encoding has to meet certain criteria. Like Klingon didn't. One of those criteria would be doable. Another would be meets user requirements. A priori rejection of things makes me nervous, though.

Yeah. I agree that a priori rejection of Labanotation, or any other of various symbolic notations, might be imprudent. But these are cases where the burden of proof -- that a character-based encoding is doable and useful to the user community -- should be squarely on the proposers.

So far, nobody has even proposed Labanotation, nor done anything near the analysis and inventory that would be required to really engage in a discussion of suitability for character encoding. The same applies to other symbologies, like chemical notation, for that matter.

Rick
Re: [Very-OT] Re: ü
Michael Everson wrote (Wednesday, January 23, 2002 12:35 AM):

[snip]

> > Garçon in Oxford English Dictionary but garconnière (bachelor's housing) in my Webster's New Lexicon (no cedilla, grave accent).
>
> Webster's Third New International (1961): garçon
> Supplement (n.d.): garçonnière.
> Oxford New Dictionary of English (2001): garçon, garçonnière.

New Shorter Oxford English Dictionary, January 1997, on CD-ROM has: garçon, garconnière.

How's that for consistency? Of course, given the evidence above, they may have revised that by now.

Mike.
RE: Unicode 3.2: BETA files updated
At 06:29 AM 1/24/02, David Hopwood wrote:

> Kenneth Whistler wrote:
> > And StandardizedVariants.html has been updated again, with more of the missing glyphs provided.

Can anyone send me the URL for this chart? I can't seem to find it.
Re: Unicode 3.2 Beta Period Finishing
Bob,

> Hm, I still don't see 'em all -- left column images starting at U+2A3C are still missing.

They're all there. I just checked. Try reloading the page.

--Ken

> On 22-01-2002 05:02:24 Mark Davis wrote:
> > I sent the message out somewhat prematurely -- the images in the first column (the normal representative glyphs) should be there tomorrow.
> >
> > Mark
Re: Fontlab 4.0, Opentype and supplementary characters
Yuri Yarmola has written to me again to say that he is working on 4-byte cmap support, but needs an existing font with such a cmap in order to test his import function. Does anyone have a font with Plane One characters encoded in such a cmap? If so, please contact Yuri directly at [EMAIL PROTECTED]. Thanks. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] ... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte. ... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably. Walter Benjamin
Re: Unicode 3.2: BETA files updated
Asmus Freytag scripsit:

> While VARIATION SELECTOR is the formal name of the character (and therefore fixed), referring to the selected thing as a 'variation' sounds really odd, that's why the more common term 'variant' is used all over the place. Perhaps we ought to make them formally synonyms, somewhat like code point and code location.
>
> I think it's a subtle thing. Without context, *VARIANT SELECTOR could be understood as a VARIANT of a SELECTOR. Equally, without context, referring to the 'variation' of a character is less clear than saying 'variant'.

The variation selector specifies the variation which will produce the variant.

--
John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED]
Please leave your values at the front desk. --sign in Paris hotel
Check your assumptions. In fact, check your assumptions at the door. --Miles Vorkosigan
Re: Microsoft's Japanese IME has no Unicode option
Michael (michka) Kaplan wrote (Fri, 25 Jan 2002 10:15:32 -0800):

> You are wrong.

わぁぁぁぁぁっっっっ！！！

If that is so, how do I get the thing to give me Unicode? All I saw in the list is JIS, Shift-JIS, and Kuten.
Re: Microsoft's Japanese IME has no Unicode option
If you use a scripting language like VBScript or JScript, it is converted to Unicode for all of the strings in your code. I have explained that you can actually see Unicode in a particular scenario, but it is not going to convert your pages for you or anything like that.

Rather than complaining about what is not happening here, why don't you stop and calmly explain what you WANT to happen? Perhaps then someone can answer your question.

(And also, no need to be offensive here!)

MichKa

Michael Kaplan
Trigeminal Software, Inc. -- http://www.trigeminal.com/
RE: Unicode 3.2: BETA files updated
John Hudson asked,

> As Unicode continues to grow, I wonder if we can expect another book -- or multiple volumes -- at some stage, or if the standard will become a purely electronic document? Has any decision been taken about this?

Speaking in my official capacity as editor, the answer is yes, you can expect another book. The editorial committee is already hard at work on 4.0, which we expect to publish as one volume. Publication is tentatively scheduled for spring 2003.

As to the form and timing of 5.0, that would be pure speculation at this point. Someone else on the committee might be willing to speculate, but I won't!

Julie Allen
Re: Microsoft's Japanese IME has no Unicode option
ろ〇〇〇〇 ろ〇〇〇 wrote:

> Okay, here's the scoop: I have a page with some (poorly written) Japanese in it, and it is in Unicode. I want to be able to edit the page without having to port the whole doggone thing into Unipad and then curse when I can't use my IME in Unipad so I have to cut-and-paste from MS Word and THEN go thru the whole rigmarole of replacing my page. No. I want it to work in the Geocities page editor. And I am using the .com (not .co.jp) version of Geocities for this page.

You should use a program (such as FrontPage 2000 or FrontPage XP) that can support any encoding you choose to use for your pages. A browser is for displaying pages in the encoding they are in, NOT a tool for editing web pages.

If you do not like this kind of option, I do not know what else to tell you. There are many programs out there designed to do what you are asking; you cannot insist on using one that will not and then be surprised if it does not work.

MichKa

Michael Kaplan
Trigeminal Software, Inc. -- http://www.trigeminal.com/
RE: Unicode 3.2: BETA files updated
Julie said:

> As to the form and timing of 5.0, that would be pure speculation at this point. Someone else on the committee might be willing to speculate, but I won't!

Ummm... Unicode 5.0 will be published on December 22, 2007, in DVD3 holographic format, complete with a remastered Unicode hymn and MSNBC, E!, and MTV interviews with the cast of thousands who contributed to the project. We'll get David Kelley to do the producing.

--Ken
Re: Unicode 3.2: BETA files updated
On Fri, Jan 25, 2002 at 11:31:19AM -0800, Julie Allen wrote:

> Speaking in my official capacity as editor, the answer is yes, you can expect another book. The editorial committee is already hard at work on 4.0, which we expect to publish as one volume.

So are you worried about 4.0 being 2,000 pages long, or do you have a solution to that problem?

--
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing with the youth. -- Information Society, Peace and Love, Inc.
Re: RE: Unicode 3.2: BETA files updated
Ken let the cat out of the bag:

> Unicode 5.0 will be published on December 22, 2007... complete with a remastered Unicode hymn...

It's true. We've already booked an Abbey Road studio for five days in March 2007, and we've signed 75 of the hottest young voices in the world to be in the chorus, including Oumou Sangare, Nityasree Mahadevan, and Ning Liang...

Send in your major cash donation today and you, too, can be in the chorus rubbing shoulders with the divas!

Rick
Re: Problems with viewing Hindi Unicode Page
On 01/23/2002 02:50:58 AM John Hudson wrote:

> The problem for Win 9x users, even with current browsers, is lack of a system installed Devanagari font with OpenType layout tables. The version of Arial Unicode that ships with pre-XP versions of Windows does not contain layout tables for Indic scripts (I have not checked the XP version, but I know that this is something that Monotype have been working on for Microsoft).

The version of Arial Unicode MS on my system does have layout tables for Devanagari. I don't know with what product this version was introduced to my system -- I've got Win2K, IE5.5 and Office XP.

BTW, don't go and borrow Mangal.ttf from a Win2K user and install it on Win98; you won't get the results you're looking for.

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
Re: Problems with viewing Hindi Unicode Page
At 12:56 1/25/2002, [EMAIL PROTECTED] wrote:

> The version of Arial Unicode MS on my system does have layout tables for Devanagari. I don't know with what product this version was introduced to my system -- I've got Win2K, IE5.5 and Office XP.

Interesting. What's the file date on that font?

John Hudson
Tiro Typeworks www.tiro.com
Vancouver, BC [EMAIL PROTECTED]
RE: Microsoft's Japanese IME has no Unicode option
There is quite a simple way to do what you want:

If you want to input directly into an HTML form on the Geocities site, all you have to do is pull down your "View" menu (I presuppose IE here) and choose "UTF-8" from the Encoding submenu. Since Geocities doesn't send a META tag, your browser will now encode all of the data you type as UTF-8 for you, and those are the bytes that will get stored in your page on the back end.

The reason you're getting Shift-JIS now is that your browser is probably set to "Japanese auto-detect", and ASCII is certainly valid Shift-JIS.

Note that adding a META tag to your page is a very good idea if you decide to use UTF-8 as the encoding.

You can see that this works here: http://www.geocities.com/apphillips2000/index.html

You will note that I included a META tag. Otherwise you have to manually select UTF-8 as the page encoding.

Regards,

Addison

Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc. | The Business Integration Company
432 Lakeside Drive, Sunnyvale, California, USA
+1 408.962.5487 (phone) +1 408.210.3569 (mobile)
-
Internationalization is an architecture. It is not a feature.
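Two points from this message can be checked with Python's standard codecs: pure ASCII bytes decode identically as Shift-JIS and as UTF-8 (so auto-detection has nothing to go on), while Japanese text produces different bytes in the two encodings. The sketch below is illustrative only; the sample strings and the helper name are invented for this example, and the META tag shown is the usual HTML form, not a copy of the one on the page linked above.

```python
# The customary way to declare UTF-8 in an HTML page of this era.
META_TAG = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'


def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8 and in Shift-JIS."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "shift_jis": len(text.encode("shift_jis")),
    }


ascii_bytes = b"Hello, Geocities"
# Pure ASCII reads the same under either interpretation.
same_text = ascii_bytes.decode("shift_jis") == ascii_bytes.decode("utf-8")
print(same_text)  # True

japanese = "日本語"
print(encoded_sizes(japanese))  # {'utf-8': 9, 'shift_jis': 6}
print(japanese.encode("utf-8") == japanese.encode("shift_jis"))  # False
```

Because the bytes differ as soon as non-ASCII text appears, a page saved as UTF-8 needs either the META declaration or a manual encoding choice in the browser to be read back correctly.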
Re: Multiple script Handling (kanji - kana)
Marco Cimarosti wrote (25 January 2002 12:06):

> In plain-text Unicode, furigana might be encoded using this set of control characters:
>
> U+FFF9 (INTERLINEAR ANNOTATION ANCHOR)
> U+FFFA (INTERLINEAR ANNOTATION SEPARATOR)
> U+FFFB (INTERLINEAR ANNOTATION TERMINATOR)
>
> The format of a word with furigana should be:
>
> U+FFF9 kanji(s) U+FFFA hiragana(s) U+FFFB

Do you know any font that supports these characters?

Stefan
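The anchor/separator/terminator layout quoted above can be sketched in Python as follows. The helper names are made up for this illustration, nested annotations are not handled, and nothing here says whether any renderer or font will actually display the result as furigana.

```python
import re

ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR

# Matches one non-nested annotated run: anchor, base text, separator,
# annotation text, terminator. Group 1 is the base text.
_ANNOTATED = re.compile(
    "\uFFF9([^\uFFFA\uFFFB]*)\uFFFA[^\uFFFB]*\uFFFB"
)


def annotate(base: str, reading: str) -> str:
    """Wrap base text and its reading in interlinear annotation characters."""
    return f"{ANCHOR}{base}{SEPARATOR}{reading}{TERMINATOR}"


def strip_annotations(text: str) -> str:
    """Drop the readings and control characters, keeping only base text."""
    return _ANNOTATED.sub(r"\1", text)


word = annotate("千年", "ミレニアム")
print(len(word))                # 10: 2 kanji + 5 kana + 3 controls
print(strip_annotations(word))  # 千年
```

Stripping the annotation is useful for searching or comparing the base text, since the three control characters and the reading would otherwise take part in the comparison.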
RE: Unicode 3.2: BETA files updated
At 11:31 AM 1/25/02 -0800, Julie Allen wrote:

> John Hudson asked,
> > As Unicode continues to grow, I wonder if we can expect another book -- or multiple volumes -- at some stage, or if the standard will become a purely electronic document? Has any decision been taken about this?

There are lots of space-saving options. Printing the code charts in 7 pt type (they are now in 22 pt type) would allow us to print nine times as many characters per page. We would have room left over for adding one of the handy little magnifying glasses like the ones that accompany the boxed set of the OED.

A./
RE: Unicode 3.2: BETA files updated
David Starner wrote:

> On Fri, Jan 25, 2002 at 11:31:19AM -0800, Julie Allen wrote:
> > Speaking in my official capacity as editor, the answer is yes, you can expect another book. The editorial committee is already hard at work on 4.0, which we expect to publish as one volume.
>
> So are you worried about 4.0 being 2,000 pages long, or do you have a solution to that problem?

We're estimating that 4.0 will be roughly 1500 pages, which the publisher says is not a problem for one volume. Now whether you can carry it with one hand is a different question. :-)

--Julie
Furigana can be katakana
In my Love Hina vol 7, 千年 has furigana ミレニアム. Just thought you might wanna know.
Re: Furigana can be katakana
ろ ろ〇〇〇 wrote (25 January 2002 23:23):

> In my Love Hina vol 7, 千年 has furigana ミレニアム.

In cases such as ?瑞典?スウェーデン? (is the furigana encoded correctly?) the furigana should always be written in katakana, right?

Stefan
Carrying it around [was RE: Unicode 3.2: BETA files updated]
Julie> We're estimating that 4.0 will be roughly 1500 pages, which the
Julie> publisher says is not a problem for one volume. Now whether you can
Julie> carry it with one hand is a different question. :-)

We Unicode acolytes have a rule that requires using both hands when carrying the holy book anyway :-)

Mind you, revealed wisdom should never exceed 4.5 kilograms in weight (in Earth-normal gravity) so that it remains suitable for slamming authoritatively on the tops of podiums, desks and other flattish surfaces.

-
Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

"... I get my ideas from reading the news, which is probably why my writing has the intellectual depth of Saran Wrap." -- Michael Swaine
RE: Multiple script Handling (kanji - kana)
Here I am in the list

Stefan Persson wrote (Friday, January 25, 2002 4:49 PM):

> Marco Cimarosti wrote (25 January 2002 12:06):
> > In plain-text Unicode, furigana might be encoded using this set of control characters:
> >
> > U+FFF9 (INTERLINEAR ANNOTATION ANCHOR)
> > U+FFFA (INTERLINEAR ANNOTATION SEPARATOR)
> > U+FFFB (INTERLINEAR ANNOTATION TERMINATOR)
> >
> > The format of a word with furigana should be:
> >
> > U+FFF9 kanji(s) U+FFFA hiragana(s) U+FFFB
>
> Do you know any font that supports these characters?
>
> Stefan
Re: Problems with viewing Hindi Unicode Page
[EMAIL PROTECTED] wrote:

> The version of Arial Unicode MS on my system does have layout tables for Devanagari. I don't know with what product this version was introduced to my system -- I've got Win2K, IE5.5 and Office XP.

I guess the question becomes, which version of Arial Unicode MS? I suspect that the version of Arial Unicode MS you have must be from Office XP.

Andj
iMode to Unicode mapping data
Hi all,

Does anybody know whether mappings exist from any of the iMode (DoCoMo) vendor-specific character codes to Unicode? The iMode characters (164 of them) exist in Shift-JIS from 0xF89F to 0xF9AF. Most of them would be considered dingbats.

Yes, I could eyeball the glyphs and come up with mappings on my own, but I'm wondering if this has already been done.

Thanks,

-- Ken

Ken Krugler
TransPac Software, Inc.
http://www.transpac.com
+1 530-470-9200
Re: Unicode 3.2: BETA files updated
Asmus Freytag wrote:

> At 06:29 AM 1/24/02, David Hopwood wrote:
> > Kenneth Whistler wrote:
> > > And StandardizedVariants.html has been updated again, with more of the missing glyphs provided.
> >
> > I can't see any difference between plain U+2278 (either in the draft code chart or StandardizedVariants.html) and U+2278 with VS1. Is plain U+2278 supposed to have an oblique stroke? Same for U+2279.
>
> The plain ones are supposed to have the oblique stroke in the *reference* glyphs. As with all mathematical glyph variations, *both* variations are acceptable in common, unmarked situations.

In that case how do I specify that the reference glyph is required? I.e. there's an asymmetry here between the VS1 glyph, which can be specified explicitly, and the reference glyph, which can't.

One possibility is to make VS1 specify what is now the reference glyph, and VS2 specify the alternate glyph. Unmarked would mean either.

The other possibility is to say that to be strictly Unicode-conformant, fonts should always use the reference glyph for unmarked characters (ignoring differences only of style). I think this is actually a better solution in practice; it avoids having to add selectors that would usually be redundant, and that would interfere with normalisation. It's also consistent with the Mongolian variant selectors, where unmarked should mean the first form.

[...]

> > The Mongolian descriptions say second form, third form, and fourth form. Unless these are already defined somewhere, I suggest variation one, variation two, and variation three instead.
>
> This list is being published as Amd 1:ISO/IEC 10646-1:2000 (2002), so it's essentially frozen.

OK.

> > Is variant or variation the preferred term? If variant is preferred, then why VARIATION SELECTOR ONE, etc.? If not, why StandardizedVariants?
>
> While VARIATION SELECTOR is the formal name of the character (and therefore fixed), referring to the selected thing as a 'variation' sounds really odd, that's why the more common term 'variant' is used all over the place. Perhaps we ought to make them formally synonyms, somewhat like code point and code location.

Yes, we should.

> I think it's a subtle thing. Without context, *VARIANT SELECTOR could be understood as a VARIANT of a SELECTOR.

But there is always sufficient context for Unicode character names: the Unicode standard :-)

I realise that the VS character names can't be changed now, though (because they have been accepted for ISO 10646).

--
David Hopwood [EMAIL PROTECTED]
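For readers who want to see what "U+2278 with VS1" looks like as data, here is a small Python sketch. It assumes VS1 is U+FE00 and uses the standard `unicodedata` module; the helper name is invented for this example. The last step shows one way a selector can interact with normalisation, which may or may not be the exact concern raised above: U+2278 has a canonical decomposition, so after NFD the selector no longer directly follows the character it was attached to.

```python
import unicodedata

VS1 = "\uFE00"  # first variation selector


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


plain = "\u2278"      # NEITHER LESS-THAN NOR GREATER-THAN
marked = plain + VS1  # the same character followed by VS1

print(code_points(plain))   # U+2278
print(code_points(marked))  # U+2278 U+FE00

# Canonical decomposition splits U+2278 into U+2276 plus a combining
# overlay, leaving the selector after the overlay rather than the base.
decomposed = unicodedata.normalize("NFD", marked)
print(code_points(decomposed))
```

The unmarked and marked strings are distinct code point sequences, so any rule about what the unmarked form "should" look like is a statement about rendering, not about the encoded text.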
[OT] weight vs. mass
On 01/25/2002 04:45:04 PM Mark Leisher wrote: Mind you, revealed wisdom should never exceed 4.5 kilograms in weight (in Earth-normal gravity) Uh, Mark? Kilograms are units of mass, not weight, so something that's 4.5 kilograms or less will be 4.5 kilograms or less whether in Earth-normal gravity or on the surface of a neutron star. Its weight in those two locations would, of course, be quite different (and would be measured in units like kg m / s^2 or something like that -- I forget). Peter
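For what it's worth, the unit half-remembered above, kg m / s^2, is the newton. A minimal sketch of the arithmetic, added for illustration: the function name is hypothetical, the Earth value is the conventional standard gravity, and the neutron-star figure is only an assumed order of magnitude.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2, conventional standard value for Earth


def weight_newtons(mass_kg, gravity=STANDARD_GRAVITY):
    """Weight is a force: mass times gravitational acceleration, in newtons."""
    return mass_kg * gravity


mass = 4.5  # kilograms; the mass is the same wherever the book is

earth_weight = weight_newtons(mass)

# Assumed order-of-magnitude surface gravity for a neutron star (illustrative only).
neutron_star_weight = weight_newtons(mass, gravity=1e12)

print(f"{earth_weight:.1f} N in Earth-normal gravity")
print(f"{neutron_star_weight:.1e} N under the assumed neutron-star gravity")
```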
POSITIVELY MUST READ! Bytext is here!
Hello Unicode list members, Unicode now has a serious competitor. Please read about it at www.bytext.org. Everyone on this list should find it extremely interesting. I hope people concerned with Unicode see this as an opportunity for growth. I think you will find Bytext to be a superior technology that is worthy of the work that will be required to implement it. It is sufficiently different that there is no possible way it can be turned into a transformation format of Unicode or something like that. I guess this means that many of you are now officially my colleagues. After following this list and reading various things you've written you all seem very friendly and intelligent. I look forward to future conversations between us. Sincerely, Bernard Miller
RE: Unicode 3.2: BETA files updated
At 13:48 1/25/2002, Julie Allen wrote: We're estimating that 4.0 will be roughly 1500 pages, which the publisher says is not a problem for one volume. Now whether you can carry it with one hand is a different question. :-) Please try to ensure a sturdy binding. The binding of 3.0 is a little weaker than it should be for a book this size, and 1500 pages is going to make this more of an issue. Thanks. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] ... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte. ... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably. Walter Benjamin
Re: POSITIVELY MUST READ! Bytext is here!
Unicode now has a serious competitor. Please read about it at www.bytext.org. Everyone on this list should find it extremely interesting. Goll dang! Just what ah've bin waitin' fer! Code points is gettin' way too expensive in Unicode, so I sure hope bytext is sellin' 'em cheaper. Yer ol' pal, Youtie
Re: Problems with viewing Hindi Unicode Page
On 01/25/2002 03:17:59 PM John Hudson wrote: The version of Arial Unicode MS on my system does have layout tables for Devanagari. I don't know with what product this version was introduced to my system -- I've got Win2K, IE5.5 and Office XP. Interesting. What's the file date on that font? Nov 30, 2000 - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: POSITIVELY MUST READ! Bytext is here!
From Bernard's personal site: "There are so many people smarter than me." Indeed. But few are so presumptuous as to believe that they are a serious competitor on such a basis. I can offer you a deal on personalized tutorials to help you with your misconceptions about Unicode, though it may be too late to do any good. :-) Good luck with your crusade; you will need it. MichKa Michael Kaplan Trigeminal Software, Inc. -- http://www.trigeminal.com/
Re: POSITIVELY MUST READ! Bytext is here!
On Fri, Jan 25, 2002 at 08:33:15PM -0800, Bernard Miller wrote: Unicode now has a serious competitor. Please read about it at www.bytext.org. Everyone on this list should find it extremely interesting. Let's see. Bytext has no corporate supporters, nor is it supported by any standards organizations. It has a hard-to-read standard that requires intimate knowledge of Unicode to understand (I think), and that shows no typographical sophistication. At several points (keyboard design, markup language), the author seems to want to change the world instead of being compatible with what's there. For all their problems, I find Rosetta and Tron to be more serious competitors and more interesting than Bytext. -- David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber) Pointless website: http://dvdeug.dhis.org What we've got is a blue-light special on truth. It's the hottest thing with the youth. -- Information Society, Peace and Love, Inc.
Re: POSITIVELY MUST READ! Bytext is here!
In a message dated 2002-01-25 20:45:46 Pacific Standard Time, [EMAIL PROTECTED] writes: Unicode now has a serious competitor. Please read about it at www.bytext.org. Everyone on this list should find it extremely interesting. I just downloaded the PDF file and spent about 10 minutes skimming through it. This is a joke, right? -Doug Ewell Fullerton, California
Re: Unicode 3.2: BETA files updated
At 10:58 PM 1/24/02 +, David Hopwood wrote: One possibility is to make VS1 specify what is now the reference glyph, and VS2 specify the alternate glyph. Unmarked would mean either. Boy, great minds do think alike. I proposed that in a paper to the UTC last year. ;-) You realize that this issue is not limited to variation selectors? Read the section on Greek phi in http://www.unicode.org/unicode/reports/tr28 The other possibility is to say that to be strictly Unicode-conformant, fonts should always use the reference glyph for unmarked characters (ignoring differences only of style). I think this is actually a better solution in practice; it avoids having to add selectors that would usually be redundant, and that would interfere with normalisation. It's also consistent with the Mongolian variant selectors, where unmarked should mean the first form. Boy, great minds do think alike. Mark Davis just proposed that in a paper to the UTC this week. Unfortunately, this is not a model that's always usable. Please read the section on phi for background. By adding a variation, we cannot restrict the glyph range for the unmarked character - Mongolian being an exception since the unmarked character's glyph range has been *explicitly* restricted from the outset to the standard positional forms. For VS1, the situation is different in that the glyph range of the *unmarked* character *also* includes the glyph identified by VS1. A./
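The asymmetry described in this message can be sketched as a small table of glyph ranges. This is an illustrative model only, not anything defined by the standard: the dictionary, the labels "reference", "alternate" and "standard positional form", the "FVS1" key for the Mongolian selector, and the function name are all hypothetical.

```python
# Hypothetical model: each (character, selector) pair maps to the set of
# glyphs a conformant font may show. A selector of None means "unmarked".
GLYPH_RANGES = {
    # Mathematical symbol: the unmarked range also includes the VS1 glyph.
    ("math symbol", None): {"reference", "alternate"},
    ("math symbol", "VS1"): {"alternate"},
    # Mongolian letter: the unmarked range was restricted from the outset.
    ("mongolian letter", None): {"standard positional form"},
    ("mongolian letter", "FVS1"): {"second form"},
}


def selectors_forcing(character, glyph):
    """Return the selectors (None = unmarked) whose range is exactly {glyph}."""
    return [
        selector
        for (name, selector), glyph_range in GLYPH_RANGES.items()
        if name == character and glyph_range == {glyph}
    ]


# No sequence pins the math symbol to its reference glyph ...
print(selectors_forcing("math symbol", "reference"))
# ... whereas VS1 pins the alternate glyph,
print(selectors_forcing("math symbol", "alternate"))
# and the unmarked Mongolian letter is already pinned.
print(selectors_forcing("mongolian letter", "standard positional form"))
```

Under this model, narrowing the unmarked math symbol to the reference glyph alone would change an existing entry rather than add a new one, which is the restriction the message says cannot be introduced after the fact.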
Re: POSITIVELY MUST READ! Bytext is here!
So, is there a script -- something along the lines of the dialectizer recently mentioned here -- that automatically generates 'Competitor to Unicode' websites? I wonder, because they all make the same set of claims, display the same confusion about or misrepresentation of Unicode, and offer eerily identical absence of industry support. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] ... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte. ... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably. Walter Benjamin
Re: POSITIVELY MUST READ! Bytext is here!
At 08:33 PM 2002-01-25 -0800, Bernard Miller wrote: Hello Unicode list members, Unicode now has a serious competitor. Please read about it at www.bytext.org. Everyone on this list should find it extremely interesting. Juvenal said, "It is hard NOT to write satire." But in future I suggest you resist the temptation. -- Sean M. Burke [EMAIL PROTECTED] http://www.spinn.net/~sburke/
Re: POSITIVELY MUST READ! Bytext is here!
Unicode now has a serious competitor. Kllhk!! Kllhk!! Kllhk! Whoa! Almost choked on my tofu burger! Oh dewd, you have it so, like, all wrong... Universal character encoding isn't about Competition and Marketing, it's about everybody doin' it in the road, all together like, in love, peace, and harmony. One of the major, major take-home points is the U word in universal character encoding. There's only supposed to be one of them, otherwise, sorry to say, it's just as bloody pointless as saddlin' a herd of cats to cart your horse to the flea market. Please, get a grip. Rick