[EMAIL PROTECTED] wrote:
> John Machin wrote:
> > On 16 May 2005 10:15:22 -0700, [EMAIL PROTECTED] wrote:
> >
> > >[EMAIL PROTECTED] wrote:
> > >> Python's InteractiveInterpreter uses the built-in compile function.
> > >>
> > >> According to the ref. manual, it doesn't seem to be concerned with
> > >> the encoding of the source string.
> > >>
> > >> When I hand in a unicode object, it is encoded in utf-8 automatically.
> > >> That can be a problem when I'm building an interactive environment
> > >> using "compile" with an encoding other than utf-8.
> >
> > I don't understand this. Suppose your "different encoding" is cp125x
> > (where x is a digit). Would you not do something like this?
> >
> > compile_input = user_input.decode('cp125x')
> > code_object = compile(compile_input, ......
> >
> >
> > >> IDLE itself has the same problem.
> > >> ( '<a string with non-ascii-encoding>' is treated okay
> > >> but u'<a string with non-ascii-encoding>' is treated wrong.)
> > >>
> > >> Any suggestions, or any plans for future python versions?
> > >
> > >I've read a posting from Martin Von Loewis mentioning an attempt to
> > >build in that feature (optionally marking the encoding when calling
> > >"compile"). Does anyone know how it is going?
> >
> > Firstly, it would help those who might be trying to help you if you
> > could post a simple example: input, output, what error message, what
> > you mean by 'is treated wrong' ... and when it comes to Unicode
> > objects (indeed any text), show us repr(text) -- "what you see is
> > often not what others see and often not what you've actually got".
> >
> > Secondly, are any of the contents of PEP 263 of any use to you?
> > http://www.python.org/peps/pep-0263.html
>
>
> Okay, I'll use one of the CJK codecs as the example. EUC-KR is the
> default encoding.
>
> >>> import sys;sys.getdefaultencoding()
> 'euc-kr'
> >>> '한글'
> '\xc7\xd1\xb1\xdb'
> >>> u'한글'
> u'\ud55c\uae00'
> >>> s=compile("inside=u'한글'",'','single')
> >>> exec s
> >>> inside #wrong
> u'\xc7\xd1\xb1\xdb'
> >>> s=compile(u"inside=u'한글'",'','single')
> >>> exec s
> >>> inside #correct
> u'\ud55c\uae00'
>
> So I reckon that "compile" should get a unicode object. However...
>
> C:\Python24\Lib>python code.py
> > <string>(1)?()
> (Pdb) c
> Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> (InteractiveConsole)
> >>> '한글'
> '\xc7\xd1\xb1\xdb'
> >>> u'한글' #wrong.. should be u'\ud55c\uae00' instead
> u'\xc7\xd1\xb1\xdb'
> >>> import sys;sys.getdefaultencoding()
> 'euc-kr'
> >>> ^Z
>
> Am I right in assuming that the problem lies in code.py (and therefore
> in codeop.py)? To correct the problem, it seems I would have to parse
> each string and change the unicode literals... Hmm... That sounds like
> a bad approach.
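
The byte values in the session above can be checked independently of the interpreter's behaviour. The following is only a sketch, written for Python 3 rather than the Python 2.4 used in the session, and it assumes the standard "euc_kr" and "latin-1" codecs. It shows that the value marked "wrong" is what you get when the four EUC-KR bytes are each read as a separate Latin-1 character instead of being decoded as EUC-KR.

```python
# Python 3 sketch; the session in the post was run on Python 2.4.
text = "\ud55c\uae00"  # the two-syllable string used in the session

# Encoding to EUC-KR gives the four bytes shown for the plain string literal.
euc_kr_bytes = text.encode("euc_kr")
print(euc_kr_bytes)

# Reading those bytes one-to-one as Latin-1 reproduces the "wrong" value.
misread = euc_kr_bytes.decode("latin-1")
print(ascii(misread))

# Decoding them as EUC-KR gives back the expected two code points.
roundtrip = euc_kr_bytes.decode("euc_kr")
print(ascii(roundtrip))
```

This says nothing about where in code.py or codeop.py the decoding step goes missing; it only confirms what the two printed values represent.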
Oh, I forgot one more thing.

C:\Python24\Lib>python
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s=compile(u"'한글'",'','single')
>>> exec s #wrong. the result is encoded in utf-8 instead of euc-kr
'\xed\x95\x9c\xea\xb8\x80'
>>> s=compile(u"u'한글'",'','single')
>>> exec s #correct
u'\ud55c\uae00'
>>>
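
For reference, the decode-first idea John Machin suggested in the quoted reply might be sketched as follows. This is written for Python 3, not Python 2.4. The helper name compile_user_input and the "<input>" filename are made up for illustration, and the encoding is assumed to be known in advance. In Python 3 every string literal is already Unicode, so the byte-string case from the session does not arise in the same form. The last two lines only confirm that the two byte sequences seen in this thread are the UTF-8 and EUC-KR encodings of the same text.

```python
def compile_user_input(raw_bytes, encoding="euc_kr"):
    """Hypothetical helper: decode console bytes first, then compile the text."""
    source = raw_bytes.decode(encoding)
    return compile(source, "<input>", "single")


# Simulate a line typed on an EUC-KR console (bytes, not text).
raw = "inside = '\ud55c\uae00'\n".encode("euc_kr")

namespace = {}
exec(compile_user_input(raw), namespace)

print(ascii(namespace["inside"]))            # the two expected code points
print(namespace["inside"].encode("utf-8"))   # six bytes, as in the "wrong" result
print(namespace["inside"].encode("euc_kr"))  # four bytes, as on the console
```

Whether the same approach fits code.py depends on how it reads its input, which the sketch does not cover.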