In article <499f3a8f.9010...@v.loewis.de>, "Martin v. Löwis" <mar...@v.loewis.de> wrote:

> >> >>> u'\xb5'
> >> u'\xb5'
> >> >>> print u'\xb5'
> >> µ
> >
> > Unicode literals are *in the source file*, which can only have one
> > encoding (for a given source file).
> >
> >> (That last character shows up as a micron sign despite the fact that
> >> my default encoding is ascii, so it seems to me that that unicode
> >> string must somehow have picked up a latin-1 encoding.)
> >
> > I think latin-1 was the default without a coding cookie line. (May be
> > utf-8 in 3.0).
>
> It is, but that's irrelevant for the example. In the source
>
>   u'\xb5'
>
> all characters are ASCII (i.e. all of "letter u", "single
> quote", "backslash", "letter x", "letter b", "digit 5").
> As a consequence, this source text has the same meaning in all
> supported source encodings (as source encodings must be ASCII
> supersets).
>
> The Unicode literal shown here does not get its interpretation
> from Latin-1. Instead, it directly gets its interpretation from
> the Unicode coded character set. The string is a short-hand
> for
>
>   u'\u00b5'
>
> and this denotes character U+00B5 (just as u'\u20ac' denotes
> U+20AC; the same holds for any other u'\uXXXX').
>
> HTH,
> Martin

Ah, that makes sense. Thanks!

rg

--
http://mail.python.org/mailman/listinfo/python-list
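Martin's point can be checked with a short sketch. It is not from the thread: it is written for Python 3 (which accepts the u'' prefix again from 3.3 on), and eval_literal is a hypothetical helper made up for illustration. It compiles the ASCII-only source text u'\xb5' as though it sat in files declaring several different coding cookies, and shows that every one yields the same single character, U+00B5.

```python
"""Sketch (Python 3): an escape such as \\xb5 in a Unicode literal names a
code point directly, so its meaning does not depend on the source encoding."""

# Source text of the literal u'\xb5', made only of ASCII characters:
# letter u, single quote, backslash, x, b, 5, single quote.
LITERAL = "u'\\xb5'"


def eval_literal(encoding: str) -> str:
    """Hypothetical helper: compile LITERAL as if it appeared in a source
    file whose coding cookie declares `encoding`, and return its value."""
    source = f"# -*- coding: {encoding} -*-\nvalue = {LITERAL}\n"
    namespace: dict = {}
    # Passing bytes (not str) to compile() makes it honour the coding cookie.
    exec(compile(source.encode(encoding), "<demo>", "exec"), namespace)
    return namespace["value"]


if __name__ == "__main__":
    assert LITERAL.isascii()  # the literal's source text is pure ASCII
    results = {enc: eval_literal(enc)
               for enc in ("ascii", "latin-1", "utf-8", "cp1252")}
    # Every declared encoding produces the same one-character string.
    assert len(set(results.values())) == 1
    value = results["ascii"]
    print(hex(ord(value)))              # 0xb5
    print(value == "\u00b5")            # True: \xb5 is short-hand for \u00b5
    print(value == "\N{MICRO SIGN}")    # True: U+00B5 is MICRO SIGN
    print("\u20ac" == "\N{EURO SIGN}")  # True: any \uXXXX works the same way
```

The source encoding only comes into play for non-ASCII bytes written directly in the file. A raw 0xB5 byte, for instance, would decode to MICRO SIGN under latin-1 but is not valid UTF-8 on its own.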