On 16 May 2005 16:44:30 -0700, [EMAIL PROTECTED] wrote:

>
>[EMAIL PROTECTED] wrote:
>> John Machin wrote:
>> > On 16 May 2005 10:15:22 -0700, [EMAIL PROTECTED] wrote:
>> >
>> > >[EMAIL PROTECTED] wrote:
>> > >> Python's InteractiveInterpreter uses the built-in compile function.
>> > >>
>> > >> According to the ref. manual, it doesn't seem to be concerned about
>> > >> the encoding of the source string.
>> > >>
>> > >> When I hand in a unicode object, it is encoded in utf-8 automatically.
>> > >> It can be a problem when I'm building an interactive environment using
>> > >> "compile", with a different encoding from utf-8.
>> >
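For comparison, here is a minimal sketch in modern Python 3 (not the Python 2 used in this thread, where the str/unicode split works differently). It shows how compile() treats the encoding of *byte-string* source: without a PEP 263 coding declaration the bytes are assumed to be UTF-8, and with one they are decoded using the declared codec. The names `raw`, `declared`, `namespace` and `failed_without_cookie` are illustrative only.

```python
# Python 3 sketch: how compile() treats the encoding of byte-string source.
# The EUC-KR bytes for the two Hangul characters are the ones shown in the
# thread: '\xc7\xd1\xb1\xdb'.
raw = "inside = '\ud55c\uae00'\n".encode("euc-kr")

# Without a coding declaration, byte-string source is assumed to be UTF-8,
# so these EUC-KR bytes are rejected.
try:
    compile(raw, "<input>", "exec")
    failed_without_cookie = False
except (SyntaxError, UnicodeDecodeError):
    failed_without_cookie = True

# With a PEP 263 coding declaration on the first line, compile() decodes
# the bytes as EUC-KR before parsing them.
declared = b"# -*- coding: euc-kr -*-\n" + raw
namespace = {}
exec(compile(declared, "<input>", "exec"), namespace)

print(failed_without_cookie)                   # True
print(namespace["inside"] == "\ud55c\uae00")   # True
```

Prepending a declaration is only practical when you control the source text; for keyboard input in an interactive environment, decoding to str first is simpler.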
==== This is *EXACTLY* what your problem is ====

>> > I don't understand this. Suppose your "different encoding" is cp125x
>> > (where x is a digit). Would you not do something like this?
>> >
>> > compile_input = user_input.decode('cp125x')
>> > code_object = compile(compile_input, ......
=================================================

==== It would have helped had you followed this ==========
>> > and when it comes to Unicode
>> > objects (indeed any text), show us repr(text) -- "what you see is
>> > often not what others see and often not what you've actually got".
===========================================================

>> Okay, I'll use one of the CJK codecs as the example. EUC-KR is the
>> default encoding.
>>
>> >>> import sys;sys.getdefaultencoding()
>> 'euc-kr'
>> >>> '한글'

There's a very strong assumption that the above was originally encoded in euc-kr but by the time I copied the 2 chars out of my browser it was definitely Unicode. See what I mean about using repr()?

>> '\xc7\xd1\xb1\xdb'
>> >>> u'한글'
>> u'\ud55c\uae00'
>> >>> s=compile("inside=u'한글'",'','single')
>> >>> exec s
>> >>> inside #wrong

[big snip]

Like I said, *ALL* you have to do (like in any other Unicode-aware app) is decode your user input into Unicode (you *don't* need to parse bits and pieces of it) and feed it in ... like this:

>>> user_input_kr = "inside=u'\xc7\xd1\xb1\xdb'"
>>> user_input_uc = user_input_kr.decode('euc-kr')
>>> user_input_uc
u"inside=u'\ud55c\uae00'"
>>> s = compile(user_input_uc, '', 'single')
>>> exec s
>>> inside
u'\ud55c\uae00'
>>> # right

HTH,
John
--
http://mail.python.org/mailman/listinfo/python-list
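The same decode-first approach can be sketched for Python 3's code.InteractiveInterpreter, the class the original poster is building on. This is a minimal sketch, not a drop-in fix for the Python 2 code above. `run_encoded_input` is a hypothetical helper, and it assumes the front end receives the user's input as raw bytes and already knows their encoding (EUC-KR here). The variable name `user_input_kr` is borrowed from the session above.

```python
import code


def run_encoded_input(interp: code.InteractiveInterpreter,
                      raw: bytes, encoding: str) -> bool:
    """Hypothetical helper: decode the user's raw bytes first, then pass
    the resulting str to the interpreter. Returns runsource()'s result:
    True if more input is needed, False otherwise."""
    source = raw.decode(encoding)  # the one step that matters
    return interp.runsource(source, "<input>", "single")


namespace = {}
interp = code.InteractiveInterpreter(namespace)

# Simulated keyboard input in the user's encoding (the bytes from the thread).
user_input_kr = "inside = '\ud55c\uae00'".encode("euc-kr")
more = run_encoded_input(interp, user_input_kr, "euc-kr")

print(more)                         # False: the statement was complete
print(ascii(namespace["inside"]))   # '\ud55c\uae00'
```

Printing with ascii() rather than the raw string follows the repr() advice in the thread: it shows what you actually got, regardless of how the terminal renders Hangul.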