Jeffrey Barish wrote:

>>Luke Dunstan wrote:
>>
>>>----- Original Message -----
>>>From: "Jeffrey Barish" <[EMAIL PROTECTED]>
>>>To: <pythonce@python.org>
>>>Sent: Friday, February 24, 2006 11:03 AM
>>>Subject: [PythonCE] Unicode default encoding
>>>
>>>>What is the correct way to set PythonCE's default Unicode encoding?  My
>>>>reading (Python in a Nutshell) indicates that I am supposed to make a
>>>>change to site.py, but there doesn't seem to be a site.py in
>>>>PythonCE.  (The closest I came is a site.pyc in python23.zip.)  Nutshell
>>>>suggests that in desperation one could put the following at the start of
>>>>the main script:
>>>>
>>>>import sys
>>>>reload(sys)
>>>>sys.setdefaultencoding('iso-8859-15')
>>>>del sys.setdefaultencoding
>>>>
>>>>This code solved the problem I was having reading and processing text that
>>>>contains Unicode characters, but I am uncomfortable leaving a desperation
>>>>solution in place.
>>>>
>>>I don't think modifying site.py would be a good solution, because if you
>>>upgrade or reinstall python then the script will be overwritten. If you
>>>only want to run your program on your own system then a better solution is
>>>to create a file sitecustomize.py in your Python\Lib directory containing
>>>this:
>>>
>>>import sys
>>>sys.setdefaultencoding('iso-8859-15')
>>>
>>>If you want to distribute your program to other people though, you can't
>>>expect them to change their default encoding so it is better not to rely on
>>>the default encoding at all.
>>>
>>Yep, using unicode and explicitly encoding/decoding is a better approach.
>>
>>Fuzzyman
>>
>
>Once again, I am forced to display my ignorance.  Sorry guys.  I really don't
>know much about Unicode.  The solution that Luke suggested (sitecustomize.py
>in my Python\Lib directory) works fine for me, but I am concerned about the
>suggestion from him and Fuzzyman that explicit encoding/decoding is a better
>approach.  What is explicit encoding/decoding?  Can someone point me to a
>good resource for learning how to deal with Unicode correctly?
>

Unicode, and text encodings in general, is a bit of a learning curve. Once
you get your head round it, Python makes it pretty straightforward.
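
To make "explicit encoding/decoding" concrete, here is a minimal sketch. It is
not from the thread: the sample string and the variable names are made up for
illustration, and it only assumes the standard 'utf_8' and 'iso-8859-15'
codecs that ship with Python.

```python
# Illustrative sample text (an assumption, not from the thread):
# the word "cafe" with an acute accent on the e, as a unicode string.
text = u'caf\xe9'

# Encoding: unicode text -> byte string, naming the encoding explicitly.
utf8_bytes = text.encode('utf_8')
latin9_bytes = text.encode('iso-8859-15')

# The same four characters take a different number of bytes per encoding.
assert len(text) == 4
assert len(utf8_bytes) == 5      # the accented e needs two bytes in UTF-8
assert len(latin9_bytes) == 4    # and one byte in ISO-8859-15

# Decoding: byte string -> unicode text, naming the encoding that
# produced the bytes. Nothing here depends on the default encoding.
assert utf8_bytes.decode('utf_8') == text
assert latin9_bytes.decode('iso-8859-15') == text
```

Because every encode and decode call names its encoding, the result should be
the same whatever the default encoding is set to.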

Simple rules:

* In Python, text *really* means a unicode string
* Because ordinary strings are really just strings of bytes
* If you know the encoding, decode the byte string to turn it into unicode
* When writing or printing, encode it to turn it back into bytes
* If you don't know the encoding, then you had better pray that the data is
  encoded in the system default encoding. ;-)

    # read a file as a byte string
    byte_string = open(filename).read()
    # we know it is UTF8, so we decode to unicode
    text = byte_string.decode('utf_8')

    # ....code that uses the text

    # we encode it back to UTF8
    byte_string = text.encode('utf_8')
    # so we can write it back out
    open(filename, 'w').write(byte_string)

Decoding turns a byte string into a unicode object. Encoding turns a unicode
object into a byte string.

If this still confuses you (which it probably does) then there are lots of
good resources. I happen to like:

http://www.pyzine.com/Issue008/Section_Articles/article_Encodings.html

which seems to be down at the moment. :-(

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

_______________________________________________
PythonCE mailing list
PythonCE@python.org
http://mail.python.org/mailman/listinfo/pythonce