Re: UTF-8 file to ASCII file converter
On Thu, Apr 11, 2002 at 12:04:18AM -0700, Pedro Ferreira wrote:
> I already have a Perl script (thanks to Oyvind A. Holm) that converts an ASCII file with U+ Unicode codes to a UTF-8 file. Now I would like to do the opposite: convert a UTF-8 file to an ASCII file, with each UTF-8 character encoded back to U+. Many thanks in advance for any help!

Just like the opposite conversion, this one can easily be done with a one-liner. The following seems to do the job:

    perl -ne 'for (unpack "U*", $_) { printf $_ > 255 ? "U+%04X" : "%c", $_ }'

-- Vasilis Vasaitis [EMAIL PROTECTED]
Don't do drugs. Santa Claus is watching. -- winamp.com

-- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
Re: UTF-8 file to ASCII file converter
On Fri, 12 Apr 2002, Bruno Haible wrote:
> H. Peter Anvin writes:
> > You'd probably be better off using C-like escape codes \u and \U, with \ escaped as \\.
>
> And when you use this C/Java syntax, you get the converter for free: it is contained in libiconv. Try iconv -f UTF-8 -t JAVA.

That's nice to know. BTW, in case somebody wants to 'torture' her/his computer/processor with this simple task, which a Perl one-liner or iconv can do, (s)he can run the following:

    native2ascii -encoding UTF-8 file.utf8 file.java
    native2ascii -reverse -encoding UTF-8 file.java file.utf8

native2ascii comes with the JDK.

Jungshik Shin
Re: UTF-8 file to ASCII file converter
On Fri, 12 Apr 2002, Vasilis Vasaitis wrote:
> On Thu, Apr 11, 2002 at 12:04:18AM -0700, Pedro Ferreira wrote:
> > Now I would like to do the opposite: convert a UTF-8 file to an ASCII file, with each UTF-8 character encoded back to U+. Many thanks in advance for any help!
>
> Just like the opposite conversion, this one can easily be done with a one-liner. The following seems to do the job:
>
>     perl -ne 'for (unpack "U*", $_) { printf $_ > 255 ? "U+%04X" : "%c", $_ }'

Unless you regard ISO-8859-1 as a synonym for US-ASCII, '255' has to be '127' :-)
Re: UTF-8 file to ASCII file converter
>>>>> "Bruno" == Bruno Haible [EMAIL PROTECTED] writes:

Bruno> And when you use this C/Java syntax, you get the converter for
Bruno> free: it is contained in libiconv. Try iconv -f UTF-8 -t JAVA.

Cool. But when was that added? iconv (GNU libc) 2.2.4, as included in SuSE 7.3's glibc-2.2.4-64.i386.rpm, does not support it. RH7.2 also has 2.2.4 and also lacks JAVA (one never knows what patches vendors add...). I don't see any support for JAVA in CVS either, though I've only browsed the tree.

-JimC
Re: UTF-8 file to ASCII file converter
On Fri, Apr 12, 2002 at 11:11:41AM -0400, [EMAIL PROTECTED] wrote:
> On Fri, 12 Apr 2002, Vasilis Vasaitis wrote:
> > Just like the opposite conversion, this one can easily be done with a one-liner. The following seems to do the job:
> >
> >     perl -ne 'for (unpack "U*", $_) { printf $_ > 255 ? "U+%04X" : "%c", $_ }'
>
> Unless you regard ISO-8859-1 as a synonym for US-ASCII, '255' has to be '127' :-)

Er, right. That's what I meant, actually, but I guess I wasn't thinking much at that moment :^). And since I only tested this on ISO-8859-7 text that had been iconv'ed to UTF-8, I didn't even notice...

Cheers,
-- Vasilis Vasaitis [EMAIL PROTECTED]
UTF-8 file to ASCII file converter
I already have a Perl script (thanks to Oyvind A. Holm) that converts an ASCII file with U+ Unicode codes to a UTF-8 file. Now I would like to do the opposite: convert a UTF-8 file to an ASCII file, with each UTF-8 character encoded back to U+. Many thanks in advance for any help!

--- Pedro Ferreira [EMAIL PROTECTED] wrote:
> Works fine, thank you!
>
> --- Oyvind A. Holm [EMAIL PROTECTED] wrote:
> > On 2002-03-26 06:58-0800 Pedro Ferreira wrote:
> > > Please, what is the best tool to convert an ASCII file with Unicode character codes like this:
> > >     U+3400 U+3405
> > > to another UTF-8 file with the corresponding Unicode characters?
> >
> > This Perl script should do the job:
> >
> > == CUT HERE ==
> > #!/usr/bin/perl -w

Pedro Ferreira
Grenoble - France
Everything should be made as simple as possible, but not simpler. - Einstein
Re: UTF-8 file to ASCII file converter
Followup to: [EMAIL PROTECTED]
By author: Pedro Ferreira [EMAIL PROTECTED]
In newsgroup: linux.utf8

> I already have a Perl script (thanks to Oyvind A. Holm) that converts an ASCII file with U+ Unicode codes to a UTF-8 file. Now I would like to do the opposite: convert a UTF-8 file to an ASCII file, with each UTF-8 character encoded back to U+. Many thanks in advance for any help!

You'd probably be better off using C-like escape codes \u and \U, with \ escaped as \\.

-hpa
-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt [EMAIL PROTECTED]