Forwarded question....

2002-08-29 Thread Barry Caplan

Hi Unicoders...


I received this question and I didn't have a good answer... perhaps someone
else here can help?

I have a Japanese text file in Shift JIS and I need
to convert it to escaped Unicode. 

Does anyone know of any tools or utilities that can do this?

The standard character encoding sets available in
text editing tools like Hidemaru don't appear to do this.

Any suggestions would be helpful.

Thank you.

By escaped Unicode, she means \u format.

Barry Caplan
http://www.i18n.com





22nd Unicode Conference, Sep 2002, San Jose, CA -- Just 1 week to go!

2002-08-29 Thread Lisa Moore

OK, Unicoders, we're almost there!  About a week to go before the
conference...hope to see you there...

Lisa

***
Register now!  Just 1 week to go!  Register now!  Just 1 week to go!
***

   Twenty-second International Unicode Conference (IUC22)
   Unicode and the Web: Evolution or Revolution?
   http://www.unicode.org/iuc/iuc22
   September 9-13, 2002
   San Jose, California

***
Full program now live!  Five days of 3 tracks!  Check the Web site!
***

NEWS

  Visit the Conference Web site ( http://www.unicode.org/iuc/iuc22 )
   to check the Conference program and register.  To help you choose
   Conference sessions, we've included abstracts of talks and speakers'
   biographies.

  Guest rooms at the DoubleTree Hotel San Jose still available at the
   conference rate.

CONFERENCE SPONSORS

   Agfa Monotype Corporation
   Basis Technology Corporation
   Microsoft Corporation
   Netscape Communications
   Oracle Corporation
   Reuters Ltd.
   Sun Microsystems, Inc.
   World Wide Web Consortium (W3C)

GLOBAL COMPUTING SHOWCASE

   Visit the Showcase to find out more about products supporting the
   Unicode Standard, and products and services that can help you
   globalize/localize your software, documentation and Internet content.
   For details, visit the Conference Web site.

CONFERENCE VENUE

The Conference will take place at:

   DoubleTree Hotel San Jose
   2050 Gateway Place
   San Jose, CA 95110
   USA

   Tel: +1 408 453 4000
   Fax: +1 408 437 2898

CONFERENCE MANAGEMENT

   Global Meeting Services Inc.
   8949 Lombard Place, #416
   San Diego, CA 92122, USA

   Tel: +1 858 638 0206 (voice)
        +1 858 638 0504 (fax)

   Email: [EMAIL PROTECTED]
      or: [EMAIL PROTECTED]

THE UNICODE CONSORTIUM

The Unicode Consortium was founded as a non-profit organization in 1991.
It is dedicated to the development, maintenance and promotion of The
Unicode Standard, a worldwide character encoding. The Unicode Standard
encodes the characters of the world's principal scripts and languages,
and is code-for-code identical to the international standard ISO/IEC
10646. In addition to cooperating with ISO on the future development of
ISO/IEC 10646, the Consortium is responsible for providing character
properties and algorithms for use in implementations. Today the
membership base of the Unicode Consortium includes major computer
corporations, software producers, database vendors, research
institutions, international agencies and various user groups.

For further information on the Unicode Standard, visit the Unicode Web
site at http://www.unicode.org or e-mail [EMAIL PROTECTED]

   *  *  *  *  *

Unicode(r) and the Unicode logo are registered trademarks of Unicode,
Inc. Used with permission.












Re: Forwarded question....

2002-08-29 Thread Edward H Trager

Hi, Barry,

The uniconv utility, which comes with Gaspar Sinai's Unicode editor
Yudit (http://www.yudit.org), should work quite nicely.








Re: Forwarded question....

2002-08-29 Thread Torsten Mohrin

Barry Caplan [EMAIL PROTECTED] wrote:

> I have a Japanese text file in Shift JIS and I need
> to convert it to escaped Unicode.
> By escaped Unicode, she means \u format.

This type of conversion can also be done with UniPad
(http://www.unipad.org). Import the file as Shift-JIS, then either
Save As "ASCII + UCN" or Copy As "ASCII + UCN" via the clipboard.
UCN stands for Universal Character Name (i.e. \u sequences).

--Torsten





Re: Forwarded question....

2002-08-29 Thread Naoto Sato

"native2ascii" in the JDK does this.  The following command produces exactly
what she wants:

native2ascii -encoding SJIS  shift_jis_file
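
For anyone without a JDK at hand, the same conversion can be approximated
in a few lines of Python. This is only a sketch, not a description of what
native2ascii does internally. The function names escape_non_ascii and
convert_file are made up for illustration, the codec name "shift_jis" is an
assumption about the input (files written on Windows may need "cp932"
instead), and characters outside the BMP are written as surrogate pairs on
the assumption that the consumer expects Java-style \u escapes.

```python
def escape_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character replaced by \\uXXXX escapes.

    Illustrative helper (not part of any library). Characters above U+FFFF
    are written as two escapes, one per UTF-16 surrogate, which is an
    assumption about what the consumer of the output expects.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            # Split a supplementary character into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)


def convert_file(src_path: str, dst_path: str, encoding: str = "shift_jis") -> None:
    """Read src_path in the given encoding and write an escaped ASCII copy."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src_path, encoding=encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="ascii", newline="") as dst:
        dst.write(escape_non_ascii(text))
```

Calling convert_file("input.txt", "output.txt") (hypothetical file names)
would write an ASCII-only copy of a Shift JIS file. Decoding fails with a
UnicodeDecodeError if the input is not valid in the chosen encoding, which
is usually preferable to silently producing wrong escapes.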

Thanks, Naoto



-- 
Naoto Sato


[OT] looking for electronic dictionaries

2002-08-29 Thread Eric Muller

For my personal use, I would like to acquire electronic dictionaries, 
principally for the major European languages, with the following 
characteristics:

- reputable source

- raw datafiles accessible - I appreciate the interfaces that 
dictionary vendors may provide, but I want to be able to write my own 
code to find the data I am looking for

- the wordlist is the principal aspect; I can live without definitions.

- markup about the structure of words, for things like hyphenation, 
etc. (or from which hyphenation can be derived)

- some form of frequency count would be nice

For example, I'd like to compute something like: the average French 
character occupies x bytes in UTF-8, with average defined in sync with 
the frequency count. And I'd like to compute things like spelling 
changes introduced by hyphenation in Dutch.
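
The first of these computations is straightforward once a word list with
counts is available. Here is a rough sketch in Python, assuming the data
can be loaded as a mapping from word to frequency count. The function name
and the tiny sample are invented for illustration and do not come from any
real dictionary, and accented letters are assumed to be precomposed (NFC).

```python
def average_utf8_bytes_per_char(word_counts):
    """Frequency-weighted average number of UTF-8 bytes per character.

    word_counts maps each word to how often it occurs (hypothetical input
    format). Each word contributes its characters and bytes once per
    occurrence.
    """
    total_bytes = 0
    total_chars = 0
    for word, count in word_counts.items():
        total_bytes += count * len(word.encode("utf-8"))
        total_chars += count * len(word)
    if total_chars == 0:
        raise ValueError("word list contains no characters")
    return total_bytes / total_chars


# Made-up sample: "\u00e9t\u00e9" is 3 characters and 5 bytes in UTF-8,
# "le" is 2 characters and 2 bytes.
sample = {"\u00e9t\u00e9": 2, "le": 6}
print(average_utf8_bytes_per_char(sample))
```

For the sample this is 22 bytes over 18 characters, a little over 1.2
bytes per character. A real answer depends entirely on the quality of the
frequency data and on whether the text is normalized to precomposed or
decomposed form.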

Any pointers?

Thanks,
Eric.