John Machin wrote:
> On 16 May 2005 10:15:22 -0700, [EMAIL PROTECTED] wrote:
>
> >[EMAIL PROTECTED] wrote:
> >> Python's InteractiveInterpreter uses the built-in compile function.
> >>
> >> According to the ref. manual, it doesn't seem to concern about the
> >> encoding of the source string.
> >>
> >> When I hand in an unicode object, it is encoded in utf-8 automatically.
> >> It can be a problem when I'm building an interactive environment using
> >> "compile", with a different encoding from utf-8.
>
> I don't understand this. Suppose your "different encoding" is cp125x
> (where x is a digit). Would you not do something like this?
>
> compile_input = user_input.decode('cp125x')
> code_object = compile(compile_input, ......
>
> >> IDLE itself has the
> >> same problem. ( '<a string with non-ascii-encoding>' is treated okay
> >> but u'<a string with non-ascii-encoding>' is treated wrong.)
> >>
> >> Any suggestions or any plans in future python versions?
> >
> >I've read a posting from Martin Von Loewis mentioning trying to build
> >in that feature(optionally marking encoding when calling "compile").
> >Anyone knows how it is going on?
>
> Firstly, it would help those who might be trying to help you if you
> could post a simple example: input, output, what error message, what
> you mean by 'is treated wrong' ... and when it comes to Unicode
> objects (indeed any text), show us repr(text) -- "what you see is
> often not what others see and often not what you've actually got".
>
> Secondly, are any of the contents of PEP 263 of any use to you?
> http://www.python.org/peps/pep-0263.html

Okay, I'll use one of the CJK codecs as the example. EUC-KR is the
default encoding.

>>> import sys;sys.getdefaultencoding()
'euc-kr'
>>> '한글'
'\xc7\xd1\xb1\xdb'
>>> u'한글'
u'\ud55c\uae00'
>>> s=compile("inside=u'한글'",'','single')
>>> exec s
>>> inside #wrong
u'\xc7\xd1\xb1\xdb'
>>> s=compile(u"inside=u'한글'",'','single')
>>> exec s
>>> inside #correct
u'\ud55c\uae00'

So I reckon that "compile" should be given a unicode object. However...

C:\Python24\Lib>python code.py
> <string>(1)?()
(Pdb) c
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> '한글'
'\xc7\xd1\xb1\xdb'
>>> u'한글' #wrong.. should be u'\ud55c\uae00' instead
u'\xc7\xd1\xb1\xdb'
>>> import sys;sys.getdefaultencoding()
'euc-kr'
>>> ^Z

Am I right to assume that the problem lies in code.py (and therefore in
codeop.py)? To correct the problem, it seems I would have to parse each
string and rewrite the unicode literals myself... Hmm, that sounds like
a bad approach.
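
An alternative to rewriting the literals by hand might be to decode the raw console line before it reaches compile, along the lines of the decode() suggestion in the quoted reply. The following is only a minimal sketch under some assumptions: the line arrives as bytes in the console encoding, compile_console_input and its console_encoding parameter are hypothetical names (they are not part of code.py or codeop.py), and the sample is written for a current Python where bytes literals and exec() as a function are available.

```python
def compile_console_input(raw_line, console_encoding="euc-kr"):
    """Decode raw console bytes to text, then compile the text.

    Hypothetical helper for illustration; not part of code.py or codeop.py.
    """
    # Assumption: raw_line holds the bytes exactly as read from the console.
    text = raw_line.decode(console_encoding)
    # compile() now receives text, so the literal is no longer read byte by byte.
    return compile(text, "<input>", "single")


# The same four EUC-KR bytes that appear in the sessions above.
raw = b"inside=u'\xc7\xd1\xb1\xdb'"

namespace = {}
exec(compile_console_input(raw), namespace)
```

With this approach the literal should come out as the two Hangul characters (u'\ud55c\uae00') rather than four separate byte-valued characters, which matches the "correct" case in the first session.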