Ben Finney wrote:
I'd phrase that as:

* Text is a sequence of characters. Most inputs to the program,
   including files, sockets, etc., contain a sequence of bytes.

* Always know whether you're dealing with text or with bytes. No object
   can be both.

* In Python 2, ‘str’ is the type for a sequence of bytes. ‘unicode’ is
   the type for text.

* In Python 3, ‘str’ is the type for text. ‘bytes’ is the type for a
   sequence of bytes.
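
A minimal sketch of the distinction in Python 3 (the sample string and variable names are arbitrary illustrations, not something from the thread):

```python
# In Python 3, text (str) and bytes are distinct types.
text = "naïve café"              # str: a sequence of characters
data = text.encode("utf-8")      # bytes: the same text encoded as UTF-8

# Decoding turns bytes back into text; you have to know the encoding.
round_trip = data.decode("utf-8")

print(type(text).__name__, len(text))   # length counted in characters
print(type(data).__name__, len(data))   # length counted in bytes
print(round_trip == text)

# Mixing the two raises TypeError instead of guessing an encoding.
try:
    text + data
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print("TypeError:", exc)
```

The two lengths differ because each accented character takes more than one byte in UTF-8.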


That is very helpful...   thanks


MRAB, Steve, John, Terry, Ben F, Ben K, Ian...
...thank you guys so much, I think I've got a better picture now of what is going on... this is also one place where I don't think the books (Lutz, Summerfield) are as clear as they need to be, at least for me.

So, the UTF-16/UTF-32 representation is INTERNAL only, for Python... and text in/out is based on locale... in my case UTF-8... that is enormously helpful for me... understanding locale on this system is as mystifying as Unicode is in the first place. Well, after reading about Unicode tonight (about four hours) I realize that it's not really that hard... there are just a lot of details that have to come together. Straightening out that whole tower-of-babel thing is sure a pain in the butt. I also was not aware that UTF-8 chars could be several bytes long from left to right (the original design allowed up to six; the current standard, RFC 3629, caps it at four). I see now that the little-endianness I was ascribing to Python is just a function of hexdump... and I was a little disappointed to find that hexdump does not support UTF-8, just ASCII... doh.
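
For anyone who wants to poke at this from the interpreter, here is a rough Python 3 sketch (the sample characters are arbitrary picks, not from the thread) that prints how many bytes each character takes in UTF-8, and shows that byte order only comes into play for UTF-16:

```python
# Arbitrary sample characters from different code point ranges.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Map each character to the length of its UTF-8 encoding.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, n in utf8_lengths.items():
    print("U+%04X -> %d byte(s): %s" % (ord(ch), n, ch.encode("utf-8").hex()))

# The highest code point Python accepts, for comparison.
max_len = len(chr(0x10FFFF).encode("utf-8"))
print("longest UTF-8 sequence seen:", max_len)

# UTF-8 is a plain byte sequence with no byte order to worry about;
# UTF-16 comes in little-endian and big-endian flavours.
little = "A".encode("utf-16-le")
big = "A".encode("utf-16-be")
print(little.hex(), big.hex())
```

The hex output is the same thing hexdump would show for a file containing those characters, just without the ASCII column.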
Anyway, thanks again... I've got enough now to play around a bit...

PS thanks Steve for that link, informative and entertaining too... Joel says, "If you are a programmer . . . and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. I swear I will". :)

kind regards,
m harris

--
http://mail.python.org/mailman/listinfo/python-list
