On Mon, Sep 10, 2007 at 12:25:46PM -0000, Harshad Modi wrote regarding encoding
latin1 to utf-8:
>
> hello ,
> I make one function for encoding latin1 to utf-8. but i think it is
> not work proper.
> plz guide me.
>
> it is not get proper result . such that i got "Belgi???" using this
> method, (Belgium) :
>
> import codecs
> import sys
> # Encoding / decoding functions
> def encode(filename):
>     file = codecs.open(filename, encoding="latin-1")
>     data = file.read()
>     file = codecs.open(filename,"wb", encoding="utf-8")
>     file.write(data)
>
> file_name=sys.argv[1]
> encode(file_name)
Some tips to help you out.
1. Close your file handles when you're done with them.
2. Don't shadow builtin names. Python already uses the name file for a
builtin, and rebinding it to your own object can have ugly side effects that
only show up further down the road.
So perhaps try the following:
import codecs

def encode(filename):
    read_handle = codecs.open(filename, encoding='latin-1')
    data = read_handle.read()
    read_handle.close()
    write_handle = codecs.open(filename, 'wb', encoding='utf-8')
    write_handle.write(data)
    write_handle.close()
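If you are on Python 3 (an assumption on my part, since the code above is
written for Python 2), the same idea can be sketched with "with" blocks, which
close each handle for you even if an exception is raised. The name reencode and
its default arguments are just mine for illustration:

```python
def reencode(path, source_encoding="latin-1", target_encoding="utf-8"):
    """Rewrite the file at `path` in place, converting its text encoding."""
    # newline="" turns off newline translation, so line endings are kept as-is.
    with open(path, encoding=source_encoding, newline="") as read_handle:
        data = read_handle.read()
    # The read handle is already closed here, so reopening the same path is safe.
    with open(path, "w", encoding=target_encoding, newline="") as write_handle:
        write_handle.write(data)
```

Called as reencode(sys.argv[1]), this should behave like the function above.
Note that it overwrites the input, so it is worth keeping a backup while
experimenting.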
For what it's worth though, I couldn't reproduce your problem with either your
code or mine. This is not too surprising, as all ASCII characters are encoded
identically in utf-8 and latin-1. So your program should output exactly the
same file as it reads if the file contains just the text "Belgium".
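To make the point about ASCII concrete, here is a small Python 3 sketch that
compares the bytes each encoding produces. The accented spelling is only my
guess at what your file might contain; I haven't seen your data:

```python
ascii_text = "Belgium"
accented_text = "Belgi\u00eb"  # e with diaeresis; a guess at the real file contents

# Pure ASCII text yields identical bytes in both encodings.
ascii_same = ascii_text.encode("latin-1") == ascii_text.encode("utf-8")

# A non-ASCII character is one byte in latin-1 but two bytes in utf-8.
latin1_bytes = accented_text.encode("latin-1")
utf8_bytes = accented_text.encode("utf-8")

print(ascii_same)    # True
print(latin1_bytes)  # b'Belgi\xeb'
print(utf8_bytes)    # b'Belgi\xc3\xab'
```

So a conversion only changes the file if it holds characters outside ASCII.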
Cheers,
Cliff
--
http://mail.python.org/mailman/listinfo/python-list