Re: Converting to UTF-8 encoding

Shlomi Fish Sat, 17 Feb 2007 15:56:56 -0800

Hi Uri! (and all)

On Sunday 18 February 2007, Uri Even-Chen wrote:
> Dear Linux people,
>
> In addition to my previous message, I decided it's about time to
> convert my Hebrew websites from Windows-1255 encoding to Unicode
> (UTF-8).  (By the way, is it a smart decision?)
>
> Anyway, since I didn't find a better way to do it - that's how I did
> it: I opened each file that contains Hebrew text (English files didn't
> need any conversion) with Windows Notepad, saved it as UTF-8 encoding,
> then FTP'd it to my Linux server, converted it to unix using dos2unix,
> edited it with pico and removed the first 3 characters, which are
> created by Notepad but don't work well with PHP (at least my version,
> PHP 4.4.2), then I FTP'd it back and replaced the original file.  The
> problem is - it takes too much time for each file, and I have hundreds
> of files.  And also, Notepad doesn't recognize these files as UTF-8
> encoded files.  Is there a way to do it simultaneously to hundreds of
> files?
>


Maybe I'm missing something, but perhaps you should look at iconv:

http://www.gnu.org/software/libiconv/

You can use the following command to convert a single file:

<<<<
iconv -f WINDOWS-1255 -t UTF-8 oldfile > newfile
>>>>

You can convert a group of files with a shell script or a find command.
If you would rather do it the Perl way, read perldoc Encode:

http://perldoc.perl.org/Encode.html
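
For the shell route, the batch conversion mentioned above could look roughly like the sketch below. It is only an illustration under a few assumptions: the function name convert_tree is made up, the *.html and *.php patterns are a guess at which files hold the Hebrew text, and every matching file is assumed to still be in Windows-1255 (plain English/ASCII files pass through unchanged, because Windows-1255 agrees with ASCII in the lower 128 characters).

```shell
#!/bin/sh
# Hypothetical helper (name and file patterns are assumptions):
# convert every *.html and *.php file under a directory from
# WINDOWS-1255 to UTF-8, in place.
convert_tree() {
    dir=$1
    find "$dir" -type f \( -name '*.html' -o -name '*.php' \) |
    while IFS= read -r f; do
        # Convert into a temporary file first, so a failed conversion
        # never clobbers the original.
        if iconv -f WINDOWS-1255 -t UTF-8 "$f" > "$f.utf8"; then
            # Copy the contents back instead of using mv, so the
            # original file keeps its permissions and ownership.
            cat "$f.utf8" > "$f"
        else
            echo "skipped (conversion failed): $f" >&2
        fi
        rm -f "$f.utf8"
    done
}

# Example call (the path is an assumption):
#   convert_tree /path/to/site
```

Back up the tree and run this only once: a second pass would treat the already converted UTF-8 bytes as Windows-1255 and garble the text. iconv does not write a byte order mark when producing UTF-8, so the three leading characters that Notepad adds should not appear.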

Regards,

        Shlomi Fish

---------------------------------------------------------------------
Shlomi Fish      [EMAIL PROTECTED]
Homepage:        http://www.shlomifish.org/

Chuck Norris wrote a complete Perl 6 implementation in a day but then
destroyed all evidence with his bare hands, so no one will know his secrets.