On Wed, 2004-12-29 at 23:54, Thomas Heller wrote:
> I found the discussion of unicode, in any python book I have, insufficient.
I couldn't agree more. I think explicit treatment of implicit conversion, the role of the default encoding (sys.getdefaultencoding()), the u'' literal prefix, the unicode() built-in, etc. would help many people. A clear explanation of why Python strings, despite being assumed to be ASCII, can contain any 8-bit data in any text encoding (or no text encoding at all) may also help newbies.

A while ago I spent some time fighting to understand how Python handles encodings, and I benefited significantly from the effort, but there really needs to be a good explanation. In particular, the relationship between 'str' and 'unicode' objects, the way implicit conversion uses the default encoding, and how explicit conversions work (between encodings, and to and from unicode) all need attention.

It'd also be REALLY good to mention the role and importance of the coding: line. An explanation of how it relates to the interpretation of strings in the script, and to the default encoding, would also help, as IMO the source encodings PEP (PEP 263) only really makes sense once you already understand it.

It wouldn't hurt to point C extension authors at things like the 'es' format unit for PyArg_ParseTuple, to help them make their code better behaved with non-ASCII text.

-- 
Craig Ringer

-- 
http://mail.python.org/mailman/listinfo/python-list
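A minimal sketch of the explicit conversions discussed in the post: decoding a byte string into a unicode object using the encoding it is known to be in, then encoding that object into another encoding. It assumes Python 2.6 or later; the b'' and u'' literals also run unchanged on Python 3.3+. The sample word, the UTF-8/Latin-1 pair, and the helper name transcode are illustrative choices only, not part of any library API.

```python
# -*- coding: utf-8 -*-
# The line above is the source-encoding declaration (PEP 263): it tells the
# parser how to read this file's bytes. It does not change the default
# encoding reported by sys.getdefaultencoding().
import sys

# A byte string holding the UTF-8 encoding of u'caf\xe9'.
# The bytes themselves carry no record of which encoding produced them.
utf8_bytes = b'caf\xc3\xa9'

# Explicit conversion, bytes -> unicode, using the encoding we know was used.
text = utf8_bytes.decode('utf-8')

# Explicit conversion, unicode -> bytes, in a different encoding.
latin1_bytes = text.encode('latin-1')

# Decoding with the wrong codec does not raise here; it silently
# produces different text (every byte is a valid Latin-1 character).
wrong = utf8_bytes.decode('latin-1')


def transcode(data, src, dst):
    """Hypothetical helper: re-encode bytes from src to dst via unicode."""
    return data.decode(src).encode(dst)


if __name__ == '__main__':
    # 'ascii' on a stock Python 2, 'utf-8' on Python 3.
    print(sys.getdefaultencoding())
    print(repr(text))
    print(repr(latin1_bytes))
    print(repr(wrong))
```

The last decode shows why the encoding of a byte string has to be tracked outside the string: Latin-1 maps all 256 byte values to characters, so decoding with it never fails; it just yields the wrong text.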