Re: encoding latin1 to utf-8

J. Clifford Dyer Mon, 10 Sep 2007 05:49:53 -0700

On Mon, Sep 10, 2007 at 12:25:46PM -0000, Harshad Modi wrote regarding encoding 
latin1 to utf-8:
> hello ,
>  I make one function for encoding latin1 to utf-8. but i think it is
> not work proper.
> plz guide me.
> 
> it is not get proper result . such that i got "Belgi???" using this
> method, (Belgium)  :
> 
> import codecs
> import sys
> # Encoding / decoding functions
> def encode(filename):
>  file = codecs.open(filename, encoding="latin-1")
>  data = file.read()
>  file = codecs.open(filename,"wb", encoding="utf-8")
>  file.write(data)
> 
> file_name=sys.argv[1]
> encode(file_name)


Some tips to help you out. 

1.  Close your filehandles when you're done with them.
2.  Don't shadow builtin names.  Python already uses the name file for a 
builtin type, and rebinding it to your own object can have ugly side effects 
that manifest down the road.

So perhaps try the following:

import codecs

def encode(filename):
    # Read the whole file, decoding its bytes as Latin-1.
    read_handle = codecs.open(filename, encoding='latin-1')
    data = read_handle.read()
    read_handle.close()
    # Overwrite the same file, encoding the text as UTF-8.
    write_handle = codecs.open(filename, 'wb', encoding='utf-8')
    write_handle.write(data)
    write_handle.close()
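
If you can use the with statement (Python 2.6 or later, which is an assumption 
about your setup), the handles get closed for you even when an error occurs.  
A minimal sketch along those lines; the name reencode and the separate 
destination path are my own choices for illustration, not anything from your 
code:

```python
import io


def reencode(src_path, dst_path, src_encoding="latin-1", dst_encoding="utf-8"):
    """Copy src_path to dst_path, converting the text encoding.

    Hypothetical helper for illustration; the names and defaults are
    assumptions, not part of the original function.
    """
    # newline="" keeps the line endings exactly as they are in the source.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # The source handle is already closed here, so dst_path may equal src_path.
    with io.open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
```

Writing to a second path keeps the original file around in case the source 
encoding turns out not to be what you expected.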

For what it's worth, though, I couldn't reproduce your problem with either 
your code or mine.  This is not too surprising, as all the ASCII characters 
are encoded identically in UTF-8 and Latin-1.  So if the contents of the file 
just read "Belgium", your program should output exactly the same file as it 
reads.
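
To see the point about ASCII for yourself, here is a small check.  The 
accented string is purely an assumed example of non-ASCII input; I don't know 
what your file actually contains.

```python
# Pure ASCII text produces the same bytes in both encodings.
ascii_text = u"Belgium"
ascii_same = ascii_text.encode("latin-1") == ascii_text.encode("utf-8")

# Assumed example of non-ASCII input: "Belgi" followed by U+00EB
# (e with diaeresis).  This is an illustration, not the poster's data.
accented = u"Belgi\u00eb"
latin1_bytes = accented.encode("latin-1")  # one byte for U+00EB
utf8_bytes = accented.encode("utf-8")      # two bytes for U+00EB

print(ascii_same)
print(len(latin1_bytes), len(utf8_bytes))
```

Only characters outside the ASCII range come out as different bytes, so those 
are the ones to look at if the output is wrong.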

Cheers,
Cliff
-- 
http://mail.python.org/mailman/listinfo/python-list
