>>>>> Dave Angel <da...@ieee.org> (DA) wrote:

[snip]

>DA> Thanks for the correction. What I meant by "works for me" is that the
>DA> single example in the docstring translated okay. But I do have a lot to
>DA> learn about using Unicode in sources, and I want to learn.

>DA> So tell me, how were we supposed to guess what encoding the original
>DA> message used? I originally had the mailing list message (in Thunderbird
>DA> email). When I copied (copy/paste) to Komodo IDE (text editor), it wouldn't
>DA> let me save because the file type was ASCII. So I randomly chose latin-1
>DA> for file type, and it seemed to like it.

You can see the encoding of the message in its headers. But that is not
important: the Unicode characters you see are what it is about. You just
copy and paste them into your Python file. The Python file does not have
to use the same encoding as the message you pasted from; the editor will
do the proper conversion. (If it doesn't, throw it away immediately.)

For the Python file, the only requirement is that you choose an encoding
that can encode all the characters in it. In this case utf-8 is the only
reasonable choice, but if the file contains only latin-1 characters then
of course latin-1 (iso-8859-1) will also be fine. Any decent editor will
only let you save in an encoding that can encode all the characters in
the file; otherwise you would lose some characters.

Because Python must also know which encoding you used, and this cannot
in general be deduced from the file contents alone, you need the coding
declaration. It must name the same encoding the file is saved in;
otherwise Python will see something different from what you saw in your
editor. Sooner or later this will give you a big headache.

>DA> At that point I expected and got errors from Python because I had no coding
>DA> declaration. I used latin-1, and still had problems, though I forget what
>DA> they were. Only when I changed the file encoding type again, to utf-8, did
>DA> the errors go away. I agree that they should agree, but I don't know how to
>DA> reconcile the copy/paste boundary, the file type (without BOM, which is
>DA> another variable), the coding declaration, and the stdout implicit ASCII
>DA> encoding.
>DA> I understand a bunch of it, but not enough to be able to safely
>DA> walk through the choices.
>DA> Is this all written up in one place, to where an experienced programmer can
>DA> make sense of it? I've nibbled at the edges (even wrote a UTF-8
>DA> encoder/decoder a dozen years ago).

I don't know of such a place. Usually utf-8 is a safe bet, but in some
cases it can be overkill. And in your Python input/output (read/write)
you may have to use a different encoding if the programs you have to
communicate with expect something different.

-- 
Piet van Oostrum <p...@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
-- 
http://mail.python.org/mailman/listinfo/python-list
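
A minimal sketch of Piet's point that the coding declaration must match the encoding the file is actually saved in. It assumes Python 3, where a source file without a declaration is read as UTF-8 (the Python 2 of this thread assumed ASCII and refused non-ASCII source without a declaration). The sketch hands raw source bytes to `compile()`, which decodes them according to the PEP 263 declaration, and uses `tokenize.detect_encoding()` to show which encoding Python reads from that declaration. The helper names `run_source` and `declared_encoding` and the sample string are made up for illustration.

```python
import io
import tokenize


def run_source(source_bytes):
    """Compile and run raw source bytes; return the value bound to `s`."""
    namespace = {}
    # Given bytes, compile() decodes them using the coding declaration.
    exec(compile(source_bytes, "<example>", "exec"), namespace)
    return namespace["s"]


def declared_encoding(source_bytes):
    """Return the encoding Python's tokenizer picks for these bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# What you see in the editor: a string literal with one non-ASCII character.
body = 's = "caf\u00e9"\n'
# What the editor writes to disk when the file is saved as UTF-8.
saved = body.encode("utf-8")

# Declaration agrees with the bytes on disk.
matching = b"# -*- coding: utf-8 -*-\n" + saved
# Declaration claims latin-1, but the bytes on disk are UTF-8.
mismatched = b"# -*- coding: latin-1 -*-\n" + saved

print(declared_encoding(matching), repr(run_source(matching)))
print(declared_encoding(mismatched), repr(run_source(mismatched)))
```

The mismatched file raises no error at all: each UTF-8 byte of "é" is silently read as its own latin-1 character, so Python sees a different string from the one in the editor. That silent difference is the headache described above.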
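
Piet's closing remark, that the encoding used for reading and writing data can differ from the source file's encoding, can be sketched the same way. This assumes Python 3, where `open()` accepts explicit `encoding` and `errors` arguments instead of relying on a platform default. The helper `encode_via_file` and the sample text are hypothetical. Latin-1 can store "é" but not "€", so a strict latin-1 write fails, while `errors="replace"` substitutes "?".

```python
import os
import tempfile

# Hypothetical sample text: "é" exists in latin-1, "€" does not.
TEXT = "caf\u00e9 \u20ac"


def encode_via_file(text, encoding, errors="strict"):
    """Write `text` with an explicit encoding and return the bytes on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        with open(path, "w", encoding=encoding, errors=errors) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()


# A consumer that expects UTF-8 can receive every character.
utf8_bytes = encode_via_file(TEXT, "utf-8")
# A consumer that expects latin-1: "€" is replaced by "?".
latin1_bytes = encode_via_file(TEXT, "latin-1", errors="replace")

print(utf8_bytes)
print(latin1_bytes)

try:
    encode_via_file(TEXT, "latin-1")  # strict: refuses to lose characters
except UnicodeEncodeError as exc:
    print("latin-1 cannot encode:", exc.object[exc.start:exc.end])
```

Whether silently replacing characters is acceptable depends on the receiving program. The strict default is usually the safer choice because it makes the loss visible.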