Forwarded question....
Hi Unicoders...

I received this question and I didn't have a good answer... perhaps someone else here can help?

I have a Japanese text file in Shift JIS and I need to convert it to escaped Unicode. Does anyone know of any tools or utilities that can do this? The standard character encoding sets available in text editing tools like Hidemaru don't appear to do this. Any suggestions would be helpful. Thank you.

By escaped Unicode, she means \u format.

Barry Caplan
http://www.i18n.com
22nd Unicode Conference, Sep 2002, San Jose, CA -- Just 1 week to go!
OK, Unicoders, we're almost there! About a week to go before the conference... hope to see you there...

Lisa

*** Register now! Just 1 week to go! ***

Twenty-second International Unicode Conference (IUC22)
Unicode and the Web: Evolution or Revolution?
http://www.unicode.org/iuc/iuc22
September 9-13, 2002
San Jose, California

*** Full program now live! Five days of 3 tracks! Check the Web site! ***

NEWS

Visit the Conference Web site (http://www.unicode.org/iuc/iuc22) to check the Conference program and register. To help you choose Conference sessions, we've included abstracts of talks and speakers' biographies.

Guest rooms at the DoubleTree Hotel San Jose are still available at the conference rate.

CONFERENCE SPONSORS

Agfa Monotype Corporation
Basis Technology Corporation
Microsoft Corporation
Netscape Communications
Oracle Corporation
Reuters Ltd.
Sun Microsystems, Inc.
World Wide Web Consortium (W3C)

GLOBAL COMPUTING SHOWCASE

Visit the Showcase to find out more about products supporting the Unicode Standard, and products and services that can help you globalize/localize your software, documentation and Internet content. For details, visit the Conference Web site.

CONFERENCE VENUE

The Conference will take place at:

DoubleTree Hotel San Jose
2050 Gateway Place
San Jose, CA 95110 USA
Tel: +1 408 453 4000
Fax: +1 408 437 2898

CONFERENCE MANAGEMENT

Global Meeting Services Inc.
8949 Lombard Place, #416
San Diego, CA 92122, USA
Tel: +1 858 638 0206 (voice)
     +1 858 638 0504 (fax)
Email: [EMAIL PROTECTED] or: [EMAIL PROTECTED]

THE UNICODE CONSORTIUM

The Unicode Consortium was founded as a non-profit organization in 1991. It is dedicated to the development, maintenance and promotion of The Unicode Standard, a worldwide character encoding. The Unicode Standard encodes the characters of the world's principal scripts and languages, and is code-for-code identical to the international standard ISO/IEC 10646. In addition to cooperating with ISO on the future development of ISO/IEC 10646, the Consortium is responsible for providing character properties and algorithms for use in implementations. Today the membership base of the Unicode Consortium includes major computer corporations, software producers, database vendors, research institutions, international agencies and various user groups.

For further information on the Unicode Standard, visit the Unicode Web site at http://www.unicode.org or e-mail [EMAIL PROTECTED]

* * * * *

Unicode(r) and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.
Re: Forwarded question....
Hi, Barry,

The uniconv utility, which comes with Gaspar Sinai's Unicode editor, yudit (http://www.yudit.org), should work quite nicely.

On Thu, 29 Aug 2002, Barry Caplan wrote: I have a Japanese text file in Shift JIS and I need to convert it to escaped Unicode. By escaped Unicode, she means \u format.
Re: Forwarded question....
Barry Caplan [EMAIL PROTECTED] wrote: I have a Japanese text file in Shift JIS and I need to convert it to escaped Unicode. By escaped Unicode, she means \u format. This type of conversion can also be done with UniPad (http://www.unipad.org). Import file as Shift-JIS, Save As ASCII + UCN, or Copy As ASCII + UCN via clipboard. UCN means Universal Character Name (i.e. \u sequences). --Torsten
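For anyone who would rather script the conversion than go through a GUI tool, here is a minimal sketch in Python. The function names are made up for this illustration. It assumes Python's built-in shift_jis codec matches the file's actual encoding (files produced on Windows may need cp932 instead). Characters outside the BMP are written as surrogate pairs, following the Java convention, although plain Shift JIS text should not normally contain any.

```python
def escape_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character written as a \\uXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")
        else:
            # Supplementary characters become a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


def sjis_bytes_to_escaped(data: bytes) -> str:
    """Decode Shift JIS bytes (assumed encoding) and escape the result."""
    return escape_non_ascii(data.decode("shift_jis"))


if __name__ == "__main__":
    sample = "日本語".encode("shift_jis")
    print(sjis_bytes_to_escaped(sample))
```

To convert a whole file, read it in binary mode and pass the bytes to sjis_bytes_to_escaped; the result is pure ASCII and can be written out with any encoding.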
Re: Forwarded question....
"native2ascii" in the JDK. The following command produces exactly what she wants:

    native2ascii -encoding SJIS shift_jis_file

Thanks,
Naoto

Barry Caplan wrote: I have a Japanese text file in Shift JIS and I need to convert it to escaped Unicode. By "escaped Unicode", she means "\u" format.

-- Naoto Sato
[OT] looking for electronic dictionaries
For my personal use, I would like to acquire electronic dictionaries, principally for the major European languages, with the following characteristics:

- reputable source
- raw datafiles accessible: I appreciate the interfaces that dictionary vendors may provide, but I want to be able to write my own code to find the data I am looking for
- the wordlist is the principal aspect; I can live without definitions
- markup about the structure of words, for things like hyphenation, etc. (or from which hyphenation can be derived)
- some form of frequency count would be nice

For example, I'd like to compute something like: the average French character occupies x bytes in UTF-8, with the average weighted by the frequency count. And I'd like to compute things like the spelling changes introduced by hyphenation in Dutch.

Any pointers?

Thanks,
Eric.
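To make the UTF-8 example concrete, here is a rough sketch in Python of the frequency-weighted average. The function name and the word-to-count mapping are hypothetical; a real dictionary file would need its own parser. It also assumes the words are stored in precomposed form, since decomposed accents would count as extra characters.

```python
from typing import Mapping


def average_utf8_bytes(word_counts: Mapping[str, int]) -> float:
    """Frequency-weighted average number of UTF-8 bytes per character."""
    total_bytes = 0
    total_chars = 0
    for word, count in word_counts.items():
        # Each occurrence of the word contributes its characters and bytes.
        total_bytes += len(word.encode("utf-8")) * count
        total_chars += len(word) * count
    if total_chars == 0:
        raise ValueError("word list is empty")
    return total_bytes / total_chars


if __name__ == "__main__":
    # Toy counts for illustration only, not real frequency data.
    print(average_utf8_bytes({"été": 2, "le": 3}))
```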