Peter Otten wrote:
> ygao wrote:
>
> >>>> compile('U"中"','c:/test','single')
> > <code object ? at 00F06B60, file "c:/test", line 1>
> >>>> d=compile('U"中"','c:/test','single')
> >>>> d
> > <code object ? at 00F06BA0, file "c:/test", line 1>
> >>>> exec(d)
> > u'\xd6\xd0'
> >>>> U"中"
> > u'\u4e2d'
> >>>>
> >
> > why is the result different?
> > a bug or another reason?
>
> How that particular output came to be I don't know, but you should be able
> to avoid the confusion by either passing a unicode string to compile() or
> specifying the encoding:
>
> >>> exec compile(u'u"中"','c:/test','single')
> u'\u4e2d'
> >>> exec compile('# -*- coding: utf8 -*-\nu"中"','c:/test','single')
> u'\u4e2d'
>
> Peter
>
> PS: In an all-UTF-8 environment I would have /expected/ to see
>
> >>> your_encoding = "utf8"
> >>> identity = "latin1"
> >>> u'\u4e2d'.encode(your_encoding).decode(identity)
> u'\xe4\xb8\xad'
>
> and that's indeed what I get over here:
>
> >>> exec compile('u"中"','c:/test','single')
> u'\xe4\xb8\xad'
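For readers on Python 3, here is a minimal sketch of the arithmetic in Peter's PS. It assumes, as his transcript suggests, that Python 2's compile() turned each byte of an undeclared byte-string source into one code point (a Latin-1 decode). Python 3 instead assumes UTF-8 for undeclared byte source, so the sketch models the old behaviour by hand rather than through compile(). The helper name `misread_as_latin1` is hypothetical.

```python
# Sketch only: models, in Python 3, what Python 2's compile() did with an
# undeclared byte-string source in this thread (one code point per byte).

def misread_as_latin1(text: str, source_encoding: str) -> str:
    """Hypothetical helper: encode as the console did, then decode as Latin-1."""
    raw = text.encode(source_encoding)  # the bytes actually typed at the console
    return raw.decode("latin-1")        # byte value N becomes code point U+00NN

# Peter's all-UTF-8 environment: your_encoding = "utf8", identity = "latin1"
print(ascii(misread_as_latin1("\u4e2d", "utf-8")))  # '\xe4\xb8\xad'
```

`ascii()` is used instead of `repr()` because Python 3's repr() prints characters such as ä literally, whereas the thread shows Python 2's `\x` escapes.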
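
But it's not an all-UTF-8 environment; his_encoding = 'gb2312' or one of its heirs/successors (GBK, GB18030) :-) Plug that into your formula: u'\u4e2d'.encode('gb2312').decode('latin1') gives u'\xd6\xd0', which is exactly what his exec printed.

Cheers,
John
-- 
http://mail.python.org/mailman/listinfo/python-list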
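Following John's point, a similar Python 3 sketch with his_encoding = 'gb2312' (assumed, as John suggests, to be ygao's console encoding). It reproduces ygao's u'\xd6\xd0' and then applies the idea behind Peter's second fix, but with a coding declaration that names the encoding the bytes were really typed in; a utf8 declaration would not match GB2312 bytes. Python 3's compile() accepts a bytes source and honours a PEP 263 coding line in it. `compile_declared` and the `result` variable are hypothetical names chosen for illustration.

```python
# Sketch only, assuming ygao's console encoding is GB2312 (John's guess).

def compile_declared(source: bytes, encoding: str) -> str:
    """Hypothetical helper: prepend a PEP 263 coding line, compile, return `result`."""
    header = f"# -*- coding: {encoding} -*-\n".encode("ascii")
    code = compile(header + source, "c:/test", "exec")
    namespace: dict = {}
    exec(code, namespace)  # runs the compiled source; it assigns `result`
    return namespace["result"]

typed = "\u4e2d".encode("gb2312")  # what his console sent for 中: b'\xd6\xd0'

# Peter's formula with his_encoding = "gb2312" reproduces ygao's output
print(ascii(typed.decode("latin-1")))  # '\xd6\xd0'

# Declaring the encoding the bytes are really in gives back the character
print(ascii(compile_declared(b'result = u"' + typed + b'"\n', "gb2312")))  # '\u4e2d'
```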