Dave Angel wrote:
¯º¿Â wrote:
On 3 Aug, 18:41, Dave Angel <da...@ieee.org> wrote:
Different encodings are just different ways of storing the data on the
media, correct?
Exactly. The file is a stream of bytes, and Unicode has more than 256
possible characters. Further, even the subset of characters that *do*
take one byte differs from one encoding to another. So you need to tell
the editor which encoding you want to use.
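Roughly, in Python 3 (the file names below are just placeholders), saving the same text under two encodings gives different bytes on disk, and reading it back means naming the encoding again:

    text = 'αβγ'                       # three Greek letters
    with open('greek-8859-7.txt', 'w', encoding='iso-8859-7') as f:
        f.write(text)                  # 3 bytes on disk
    with open('greek-utf8.txt', 'w', encoding='utf-8') as f:
        f.write(text)                  # 6 bytes on disk
    with open('greek-8859-7.txt', encoding='iso-8859-7') as f:
        assert f.read() == text       # must name the same encoding to read it back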

For example, an 'a' char in iso-8859-1 is stored differently than an 'a'
char in iso-8859-7 and an 'a' char in utf-8?


Nope, the ASCII subset is identical. It's the codes between 0x80 and 0xFF that differ, and of course not all of those. Further, the characters that are one byte in the 8859 encodings but lie outside ASCII take two bytes in utf-8.
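A quick Python 3 sketch of all three points (easy to verify in an interpreter):

    # ASCII subset: identical in all three encodings.
    assert 'a'.encode('iso-8859-1') == 'a'.encode('iso-8859-7') == 'a'.encode('utf-8') == b'a'

    # 0x80-0xff: the same byte is a different character in each 8859 variant.
    assert b'\xe9'.decode('iso-8859-1') == 'é'    # e with acute accent
    assert b'\xe9'.decode('iso-8859-7') == 'ι'    # Greek small iota

    # One byte in iso-8859-7, two bytes in utf-8.
    assert 'ι'.encode('iso-8859-7') == b'\xe9'
    assert 'ι'.encode('utf-8') == b'\xce\xb9'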

You *could* just decide that you're going to hardwire the assumption that you'll be dealing with a single character set that does fit in 8 bits, and most of this complexity goes away. But if you do that, do *NOT* use utf-8.
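The reason utf-8 is the wrong choice for that shortcut, as a small sketch: iso-8859-1 maps every possible byte to some character, but arbitrary bytes are usually not valid utf-8.

    raw = bytes(range(256))       # every possible byte value

    raw.decode('iso-8859-1')      # works: every byte maps to a character
    try:
        raw.decode('utf-8')       # fails: a lone byte >= 0x80 is illegal utf-8
    except UnicodeDecodeError as e:
        print(e)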

But if you do want to be able to handle more than 256 characters, or more than one encoding, read on.

Many people confuse encoding and decoding. A Unicode character is an abstraction, identified by a number called a code point. For convenience, the first 128 code points map directly onto the 7-bit encoding called ASCII. But before Unicode there were several incompatible 8-bit extensions of ASCII to 256 characters. For example, a byte that was a European character in one such encoding might be a katakana character in another. Each encoding was 8 bits, but it was difficult for a single program to handle more than one of them at once.
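For instance, the same single byte decodes to a European letter or a katakana character depending on which legacy encoding you assume (Shift-JIS is used here just as a convenient Japanese example; the post doesn't name one):

    b = b'\xc0'
    print(b.decode('iso-8859-1'))    # 'À'  (Latin capital A with grave)
    print(b.decode('iso-8859-7'))    # 'ΐ'  (a Greek letter)
    print(b.decode('shift_jis'))     # 'ﾀ'  (halfwidth katakana TA)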

One encoding might be ASCII + accented Latin, another ASCII + Greek,
another ASCII + Cyrillic, etc. If you wanted ASCII + accented Latin +
Greek then you'd need more than 1 byte per character.

If you're working with multiple alphabets it gets very messy, which is
where Unicode comes in. It contains all those characters, and UTF-8 can
encode all of them in a straightforward manner.
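A small sketch of that (the sample string is just an illustration): a string mixing accented Latin and Greek can't be encoded in either single-byte encoding, but utf-8 takes it without complaint.

    text = 'Ärger, ελληνικά'          # accented Latin plus Greek in one string

    for enc in ('iso-8859-1', 'iso-8859-7', 'utf-8'):
        try:
            data = text.encode(enc)
            print(enc, 'ok:', len(data), 'bytes')
        except UnicodeEncodeError as e:
            print(enc, 'cannot represent', e.object[e.start])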

So along comes Unicode, which is typically implemented in 16- or 32-bit code units. And it has an 8-bit encoding called utf-8, which uses one byte for the first 128 code points, two bytes for a couple of thousand more, and three or four bytes beyond that.

[snip]
In UTF-8 the first 128 codepoints are encoded to 1 byte.
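Concretely, in Python 3 (the euro sign and the emoji are just sample characters from the higher ranges):

    for ch in ('a', 'ά', '€', '😀'):   # U+0061, U+03AC, U+20AC, U+1F600
        print(hex(ord(ch)), '->', len(ch.encode('utf-8')), 'byte(s)')
    # 1 byte up to U+007F (ASCII), 2 bytes up to U+07FF,
    # 3 bytes up to U+FFFF, 4 bytes beyond that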