In article <499f397c.7030...@v.loewis.de>, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
> > Yes, I know that. But every concrete representation of a unicode string
> > has to have an encoding associated with it, including unicode strings
> > produced by the Python parser when it parses the ascii string "u'\xb5'"
> >
> > My question is: what is that encoding?
>
> The internal representation is either UTF-16, or UTF-32; which one is
> a compile-time choice (i.e. when the Python interpreter is built).
>
> > Put this another way: I would have thought that when the Python parser
> > parses "u'\xb5'" it would produce the same result as calling
> > unicode('\xb5'), but it doesn't.
>
> Right. In the former case, \xb5 denotes a Unicode character, namely
> U+00B5, MICRO SIGN. It is the same as u"\u00b5", and still the same
> as u"\N{MICRO SIGN}". By "the same", I mean "the very same".
>
> OTOH, unicode('\xb5') is something entirely different. '\xb5' is a
> byte string with length 1, with a single byte with the numeric
> value 0xb5, or 181. It does not, per se, denote any specific character.
> It only gets a character meaning when you try to decode it to unicode,
> which you do with unicode('\xb5'). This is short for
>
>    unicode('\xb5', sys.getdefaultencoding())
>
> and sys.getdefaultencoding() is (or should be) "ascii". Now, in
> ASCII, byte 0xb5 does not have a meaning (i.e. it does not denote
> a character at all), hence you get a UnicodeError.
>
> > Instead it seems to produce the same
> > result as calling unicode('\xb5', 'latin-1').
>
> Sure. However, this is only by coincidence, because latin-1 has the same
> code points as Unicode (for 0..255).
>
> > But my default encoding
> > is not latin-1, it's ascii. So where is the Python parser getting its
> > encoding from? Why does parsing "u'\xb5'" not produce the same error as
> > calling unicode('\xb5')?
>
> Because \xb5 *directly* refers to character U+00b5, with no
> byte-oriented encoding in-between.
>
> Regards,
> Martin

OK, I think I get it now. Thanks!
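
For the archive, the distinction can be checked with a short sketch. It is
written in Python 3 syntax, which is an assumption on my part since the thread
uses Python 2: here b'\xb5' plays the role of the byte string '\xb5', a plain
string literal plays the role of u'\xb5', and an explicit .decode('ascii')
stands in for unicode('\xb5') under an "ascii" default encoding. The variable
names are just illustrative.

```python
import unicodedata

# In a text literal, \xb5 names a code point directly; no codec is involved.
micro = "\xb5"
same_char = (micro == "\u00b5" == "\N{MICRO SIGN}")
char_name = unicodedata.name(micro)

# A byte string holds one raw byte with the value 0xb5 (181), not a character.
raw = b"\xb5"

# Decoding as ASCII fails, because ASCII only assigns meaning to bytes 0..127.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding as latin-1 maps each byte 0..255 to the code point with the same
# number, which is why it happens to agree with the literal above.
latin1_matches = (raw.decode("latin-1") == micro)

print(same_char, char_name, ascii_failed, latin1_matches)
```

So the escape in the literal never goes through an encoding at all, while the
byte string only becomes a character once some codec is applied to it.
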
rg