Re: UTF-8 file to ASCII file converter

2002-04-12 Thread Vasilis Vasaitis

On Thu, Apr 11, 2002 at 12:04:18AM -0700, Pedro Ferreira wrote:
 I already have a perl script (thanks to Oyvind A.
 Holm) that converts an ascii file with U+ unicode
 codes to a UTF-8 file.
 Now I would like to do the opposite, convert a UTF-8
 file to an ascii file, each UTF-8 character would be
 encoded back to U+. Many thanks in advance for any
 help!

  Just like in the case of the opposite conversion, this conversion can also
be easily achieved with a one-liner. The following seems to be able to do
the job:

  perl -ne 'for (unpack "U*", $_) { printf $_ > 255 ? "U+%04X" : "%c", $_ }'
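The same logic is easy to reproduce outside Perl. Below is a minimal Python sketch, assuming the input file is UTF-8 and is decoded on reading (the job `unpack "U*"` does in the Perl version). The names `to_uplus`, `convert_file` and the `cutoff` parameter are illustrative, not from the thread. The cutoff defaults to 127, the top of US-ASCII, whereas the one-liner above uses 255.

```python
def to_uplus(text: str, cutoff: int = 127) -> str:
    # Illustrative helper (not from the thread): characters whose code point
    # is <= cutoff are copied through; every other character becomes "U+"
    # followed by at least four upper-case hex digits, like printf "U+%04X".
    parts = []
    for ch in text:
        cp = ord(ch)
        parts.append(f"U+{cp:04X}" if cp > cutoff else ch)
    return "".join(parts)


def convert_file(src_path: str, dst_path: str, cutoff: int = 127) -> None:
    # Read a UTF-8 file and write the escaped result as ASCII.
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding="utf-8", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="ascii", newline="") as dst:
        dst.write(to_uplus(text, cutoff))
```

With `cutoff=255`, an é (U+00E9) would pass through unescaped and the ASCII writer in `convert_file` would raise `UnicodeEncodeError`, which makes that kind of slip hard to miss.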

-- 
Vasilis Vasaitis
[EMAIL PROTECTED]

Don't do drugs. Santa Claus is watching.
-- winamp.com


--
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/




Re: UTF-8 file to ASCII file converter

2002-04-12 Thread jshin

On Fri, 12 Apr 2002, Bruno Haible wrote:

 H. Peter Anvin writes:
 
  You'd probably be better off using C-like escape codes \u and
  \U with \ escaped as \\.
 
 And when you use this C/Java syntax, you get the converter for free:
 it is contained in libiconv. Try iconv -f UTF-8 -t JAVA.

  That's nice to know. BTW, in case somebody wants to 'torture'
her/his computer/processor for this simple task doable by a Perl one-liner
or iconv, (s)he can run the following:

   native2ascii -encoding UTF-8 file.utf8 file.java
   native2ascii -reverse -encoding UTF-8 file.java file.utf8

native2ascii comes with the JDK.
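For comparison with the JAVA target and native2ascii, here is a minimal Python sketch of a Java-style escaper and its reverse. It assumes the usual Java convention: each non-ASCII character becomes `\uXXXX`, and characters above U+FFFF are written as two escapes, one per UTF-16 surrogate. The lower-case hex digits and the function names are assumptions, not a description of either tool's exact output. Backslashes are not doubled here, so input that already contains a literal `\u` sequence will not round-trip.

```python
import re


def java_escape(text: str) -> str:
    # Hypothetical helper: ASCII passes through, everything else becomes
    # \uXXXX; code points above U+FFFF are split into a surrogate pair.
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")
        else:
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)   # leading surrogate
            low = 0xDC00 + (v & 0x3FF)  # trailing surrogate
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


# A backslash, "u", and exactly four hex digits (either case).
_JAVA_ESC = re.compile(r"\\u([0-9A-Fa-f]{4})")


def java_unescape(text: str) -> str:
    # Map each escape to its raw UTF-16 code unit (possibly a lone surrogate),
    # then round-trip through UTF-16 so surrogate pairs fuse into one
    # character. An unpaired surrogate makes the final decode raise.
    units = _JAVA_ESC.sub(lambda m: chr(int(m.group(1), 16)), text)
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
```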

  Jungshik Shin





Re: UTF-8 file to ASCII file converter

2002-04-12 Thread jshin

On Fri, 12 Apr 2002, Vasilis Vasaitis wrote:
 On Thu, Apr 11, 2002 at 12:04:18AM -0700, Pedro Ferreira wrote:
  Now I would like to do the opposite, convert a UTF-8
  file to an ascii file, each UTF-8 character would be
  encoded back to U+. Many thanks in advance for any

   Just like in the case of the opposite conversion, this conversion can also
 be easily achieved with a one-liner. The following seems to be able to do
 the job:
 
   perl -ne 'for (unpack "U*", $_) { printf $_ > 255 ? "U+%04X" : "%c", $_ }'

 Unless you regard ISO-8859-1 as a synonym for US-ASCII, '255' has to
be '127' :-) 
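The difference is easy to demonstrate. In this Python sketch (the helper names are hypothetical), an é (U+00E9) survives the 255 cutoff unescaped, so the result is not US-ASCII; with 127 it is escaped. In the Perl version that unescaped character is written by `%c`, which on a filehandle without a UTF-8 layer comes out as the single Latin-1 byte 0xE9.

```python
def escape_above(text: str, cutoff: int) -> str:
    # Same rule as the one-liner: keep code points <= cutoff, escape the rest.
    return "".join(f"U+{ord(c):04X}" if ord(c) > cutoff else c for c in text)


def is_us_ascii(text: str) -> bool:
    # US-ASCII covers code points 0 through 127 only.
    return all(ord(c) <= 127 for c in text)


sample = "caf\u00e9 \u3400"
for cutoff in (255, 127):
    result = escape_above(sample, cutoff)
    print(cutoff, ascii(result), is_us_ascii(result))
# 255 'caf\xe9 U+3400' False
# 127 'cafU+00E9 U+3400' True
```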





Re: UTF-8 file to ASCII file converter

2002-04-12 Thread James H. Cloos Jr.

 Bruno == Bruno Haible [EMAIL PROTECTED] writes:

Bruno> And when you use this C/Java syntax, you get the converter for
Bruno> free: it is contained in libiconv. Try iconv -f UTF-8 -t JAVA.

Cool.  But when was that added?  iconv (GNU libc) 2.2.4 as included
in SuSE 7.3's glibc-2.2.4-64.i386.rpm does not support it.  RH7.2 also
has 2.2.4, and also lacks JAVA (one never knows what patches vendors
add...).  I don't see any support for JAVA in CVS either, though I've
only browsed the tree.

-JimC





Re: UTF-8 file to ASCII file converter

2002-04-12 Thread Vasilis Vasaitis

On Fri, Apr 12, 2002 at 11:11:41AM -0400, [EMAIL PROTECTED] wrote:
 On Fri, 12 Apr 2002, Vasilis Vasaitis wrote:
 
Just like in the case of the opposite conversion, this conversion can also
  be easily achieved with a one-liner. The following seems to be able to do
  the job:
  
perl -ne 'for (unpack "U*", $_) { printf $_ > 255 ? "U+%04X" : "%c", $_ }'
 
  Unless you regard ISO-8859-1 as a synonym for US-ASCII, '255' has to
 be '127' :-) 

  Er, right. That's what I meant, actually, but I guess I wasn't thinking
much at that moment :^). And since I only tested this on ISO-8859-7 text
that I had iconv'ed to UTF-8, I didn't even notice...

Cheers,

-- 
Vasilis Vasaitis
[EMAIL PROTECTED]







UTF-8 file to ASCII file converter

2002-04-11 Thread Pedro Ferreira

I already have a perl script (thanks to Oyvind A.
Holm) that converts an ascii file with U+ unicode
codes to an utf-8 file.
Now I would like to do the opposite, convert a UTF-8
file to an ascii file, each UTF-8 character would be
encoded back to U+. Many thanks in advance for any
help!
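The direction Pedro already has working (U+ codes to UTF-8; Oyvind's script is cut off in the quoted message below) can be sketched the same way. This Python version rests on an assumed input format: `U+` followed by four to six hex digits, matched greedily. Because the U+ notation has no terminator, a code followed directly by a hex-looking letter (say `U+00E9a`) is ambiguous, and this sketch reads it as U+00E9A.

```python
import re

# Assumed format: "U+" then 4 to 6 hex digits, matched greedily.
_UPLUS = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def from_uplus(text: str) -> str:
    # Hypothetical helper: replace each U+XXXX code with the character it names.
    def replace(m: re.Match) -> str:
        cp = int(m.group(1), 16)
        # Leave values that are not valid scalar values (surrogates, or above
        # U+10FFFF) untouched rather than raising.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return m.group(0)
        return chr(cp)

    return _UPLUS.sub(replace, text)
```

Writing the result through `open(path, "w", encoding="utf-8")` then produces the UTF-8 file.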

--- Pedro Ferreira [EMAIL PROTECTED] wrote:
 
 Works fine, thank you!
 
 
 --- Oyvind A. Holm [EMAIL PROTECTED] wrote:
  On 2002-03-26 06:58-0800 Pedro Ferreira wrote:
  
   Please, what is the best tool to convert an ascii file
   with unicode character codes like this:
   U+3400
   U+3405
   to another UTF-8 file with the corresponding unicode
   characters?
  
  This Perl script should do the job:
  
  == CUT HERE ==
  
  #!/usr/bin/perl -w
  
 
 
 


=
Pedro Ferreira
Grenoble - France

Everything should be made as simple as possible, but not simpler. - Einstein





Re: UTF-8 file to ASCII file converter

2002-04-11 Thread H. Peter Anvin

Followup to:  [EMAIL PROTECTED]
By author:Pedro Ferreira [EMAIL PROTECTED]
In newsgroup: linux.utf8

 I already have a perl script (thanks to Oyvind A.
 Holm) that converts an ascii file with U+ unicode
 codes to an utf-8 file.
 Now I would like to do the opposite, convert a UTF-8
 file to an ascii file, each UTF-8 character would be
 encoded back to U+. Many thanks in advance for any
 help!
 

You'd probably be better off using C-like escape codes \u and
\U with \ escaped as \\.
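A minimal Python sketch of this scheme, assuming `\u` takes exactly four hex digits and `\U` exactly eight, as in C99. The upper-case hex and the function names are assumptions. Doubling every literal backslash is what makes decoding unambiguous: in the escaped text a backslash always starts `\\`, `\u` or `\U`, and the fixed widths mean a following letter is never swallowed.

```python
import re


def c_escape(text: str) -> str:
    # Hypothetical helper: double backslashes, keep other ASCII as-is,
    # write BMP characters as \uXXXX and anything above U+FFFF as \UXXXXXXXX.
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(f"\\U{cp:08X}")
    return "".join(out)


# One escape: a doubled backslash, \u + 4 hex digits, or \U + 8 hex digits.
_C_ESC = re.compile(r"\\(\\|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})")


def c_unescape(text: str) -> str:
    # Reverse c_escape, scanning left to right so "\\u1234" decodes to a
    # literal backslash followed by the plain text "u1234".
    def replace(m: re.Match) -> str:
        token = m.group(1)
        return "\\" if token == "\\" else chr(int(token[1:], 16))

    return _C_ESC.sub(replace, text)
```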

-hpa
-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt  [EMAIL PROTECTED]