Re: UTF-8 file to ASCII file converter

2002-04-12 Thread Vasilis Vasaitis

On Thu, Apr 11, 2002 at 12:04:18AM -0700, Pedro Ferreira wrote:
> I already have a Perl script (thanks to Oyvind A.
> Holm) that converts an ASCII file with U+ Unicode
> codes to a UTF-8 file.
> Now I would like to do the opposite: convert a UTF-8
> file to an ASCII file, with each UTF-8 character
> encoded back to U+. Many thanks in advance for any
> help!

  Just like the opposite conversion, this one can also be easily achieved
with a one-liner. The following seems to do the job:

  perl -ne 'for (unpack "U*", $_) { printf $_ > 255 ? "U+%04X" : "%c", $_ }'

-- 
Vasilis Vasaitis
[EMAIL PROTECTED]

Don't do drugs. Santa Claus is watching.
-- winamp.com


--
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/




Re: UTF-8 file to ASCII file converter

2002-04-12 Thread jshin

On Fri, 12 Apr 2002, Bruno Haible wrote:

> H. Peter Anvin writes:
>
> > You'd probably be better off using C-like escape codes \u and
> > \U with \ escaped as \\.
>
> And when you use this C/Java syntax, you get the converter for free:
> it is contained in libiconv. Try iconv -f UTF-8 -t JAVA.

  That's nice to know. BTW, anybody who wants to 'torture' their
processor with a simple task that a Perl one-liner or iconv can handle
can run the following:

   native2ascii -encoding UTF-8 file.utf8 file.java
   native2ascii -reverse -encoding UTF-8 file.java file.utf8

native2ascii comes with the JDK.

  Jungshik Shin





Re: UTF-8 file to ASCII file converter

2002-04-12 Thread jshin

On Fri, 12 Apr 2002, Vasilis Vasaitis wrote:
> On Thu, Apr 11, 2002 at 12:04:18AM -0700, Pedro Ferreira wrote:
> > Now I would like to do the opposite: convert a UTF-8
> > file to an ASCII file, with each UTF-8 character
> > encoded back to U+. Many thanks in advance for any
>
>   Just like in the case of the opposite conversion, this conversion can also
> be easily achieved with a one-liner. The following seems to be able to do
> the job:
>
>   perl -ne 'for (unpack "U*", $_) { printf $_ > 255 ? "U+%04X" : "%c", $_ }'

 Unless you regard ISO-8859-1 as a synonym for US-ASCII, '255' has to
be '127' :-)





Re: UTF-8 file to ASCII file converter

2002-04-12 Thread James H. Cloos Jr.

>>>>> "Bruno" == Bruno Haible [EMAIL PROTECTED] writes:

Bruno> And when you use this C/Java syntax, you get the converter for
Bruno> free: it is contained in libiconv. Try iconv -f UTF-8 -t JAVA.

Cool.  But when was that added?  iconv (GNU libc) 2.2.4 as included
in SuSE 7.3's glibc-2.2.4-64.i386.rpm does not support it.  RH7.2 also
has 2.2.4, and also lacks JAVA (one never knows what patches vendors
add...).  I don't see any support for JAVA in CVS either, though I've
only browsed the tree.

-JimC





Re: UTF-8 file to ASCII file converter

2002-04-12 Thread Vasilis Vasaitis

On Fri, Apr 12, 2002 at 11:11:41AM -0400, [EMAIL PROTECTED] wrote:
> On Fri, 12 Apr 2002, Vasilis Vasaitis wrote:
>
> >   Just like in the case of the opposite conversion, this conversion can also
> > be easily achieved with a one-liner. The following seems to be able to do
> > the job:
> >
> >   perl -ne 'for (unpack "U*", $_) { printf $_ > 255 ? "U+%04X" : "%c", $_ }'
>
>  Unless you regard ISO-8859-1 as a synonym for US-ASCII, '255' has to
> be '127' :-)

  Er, right. That's what I meant, actually, but I guess I wasn't thinking
much at that moment :^). And since I only tested this with an ISO-8859-7
text iconv'ed to UTF-8, I didn't even notice...

Cheers,

-- 
Vasilis Vasaitis
[EMAIL PROTECTED]

Don't do drugs. Santa Claus is watching.
-- winamp.com

