Claudio Grondi wrote:
> [EMAIL PROTECTED] wrote:
> > Here is my script:
> >
> > from mechanize import *
> > from BeautifulSoup import *
> > import StringIO
> > b = Browser()
> > f = b.open("http://www.translate.ru/text.asp?lang=ru")
> > b.select_form(nr=0)
> > b["source"] = "hello python"
> > html = b.submit().get_data()
> > soup = BeautifulSoup(html)
> > print soup.find("span", id = "r_text").string
> >
> > OUTPUT:
> > &#1087;&#1088;&#1080;&#1074;&#1077;&#1090;
> > &#1087;&#1080;&#1090;&#1086;&#1085;
> > ----------
> > In Russian it looks like:
> > "привет питон"
> >
> > How can I translate this using standard Python libraries??
> >
> > --
> > Pak Andrei, http://paxoblog.blogspot.com, icq://97449800
> >
> Translate to what and with what purpose?
>
> Assuming your intention is to get a Python Unicode string, what about:
>
> strHTML = '&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;'
> strUnicodeHexCode = strHTML.replace('&#','\u').replace(';','')
> strUnicode = eval("u'%s'"%strUnicodeHexCode)
>
> ?
>
> I am sure there is a more elegant and direct solution, but I just wanted
> to provide a quick response here.
>
> Claudio Grondi
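Claudio's closing remark invites a more direct route. A minimal sketch, written for present-day Python 3 rather than the Python 2 used in this thread: the standard library's `html.unescape` decodes such references in one call, and the hand-rolled `decode_numeric_refs` helper (a hypothetical name, for illustration only) shows what that involves, namely reading the digits after `&#` as decimal, or as hexadecimal only after `&#x`.

```python
import html
import re

# Server output from the thread: Cyrillic sent as decimal numeric character
# references ("&#1087;" is code point 1087 == U+043F, Cyrillic small "pe").
raw = "&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;"

# Option 1: let the standard library decode all HTML references.
text = html.unescape(raw)

# Option 2 (hypothetical helper): decode numeric references by hand.
# "&#NNNN;" is decimal, "&#xHHHH;" is hexadecimal.
_NUMREF = re.compile(r"&#([xX][0-9a-fA-F]+|[0-9]+);")


def decode_numeric_refs(s):
    """Replace each numeric character reference in s with its character."""
    def repl(match):
        ref = match.group(1)
        if ref[0] in "xX":
            return chr(int(ref[1:], 16))
        return chr(int(ref, 10))
    return _NUMREF.sub(repl, s)


# Both routes give the Unicode string "привет питон".
assert text == "\u043f\u0440\u0438\u0432\u0435\u0442 \u043f\u0438\u0442\u043e\u043d"
assert decode_numeric_refs(raw) == text
```

`html.unescape` is usually the better choice since it also handles named entities such as `&amp;`; the regex version only touches numeric references and leaves everything else unchanged.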
Thank you, Claudio. Really interesting solution, but it doesn't work...

In [19]: strHTML = '&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;'

In [20]: strUnicodeHexCode = strHTML.replace('&#','\u').replace(';','')

In [21]: strUnicode = eval("u'%s'"%strUnicodeHexCode)

In [22]: print strUnicode
---------------------------------------------------------------------------
exceptions.UnicodeEncodeError                Traceback (most recent call last)

C:\Documents and Settings\dron\<ipython console>

C:\usr\lib\encodings\cp866.py in encode(self, input, errors)
     16     def encode(self,input,errors='strict'):
     17
---> 18         return codecs.charmap_encode(input,errors,encoding_map)
     19
     20     def decode(self,input,errors='strict'):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>

In [23]: print strUnicode.encode("utf-8")
сВЗсВИсВАсБ┤сБ╖сВР сВЗсВАсВРсВЖсВЕ    <-- it's not my string "привет питон"

In [24]: strUnicode.encode("utf-8")
Out[24]: '\xe1\x82\x87\xe1\x82\x88\xe1\x82\x80\xe1\x81\xb4\xe1\x81\xb7\xe1\x82\x90 \xe1\x82\x87\xe1\x82\x80\xe1\x82\x90\xe1\x82\x86\xe1\x82\x85'    <-- and too many chars

--
http://mail.python.org/mailman/listinfo/python-list
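The symptoms above appear to share one cause, assuming the page returned decimal references such as `&#1087;` (the bytes in Out[24] are consistent with this). Replacing `&#` with `\u` turns decimal 1087 into the escape `\u1087`, and `\u` reads its four digits as hexadecimal. The result is U+1087, a code point in the Myanmar block, instead of U+043F (Cyrillic "п"). Each such character takes three bytes in UTF-8, hence "too many chars", and none of them exists in the cp866 console encoding, hence the UnicodeEncodeError. A short Python 3 sketch reproducing both effects:

```python
# Digits from the references "&#1087;&#1088;&#1080;&#1074;&#1077;&#1090;".
digits = ["1087", "1088", "1080", "1074", "1077", "1090"]

# What the replace/eval trick produced: "\u1087" reads the digits as HEX.
wrong = "".join(chr(int(d, 16)) for d in digits)
# What the references mean: the digits are DECIMAL code points.
right = "".join(chr(int(d, 10)) for d in digits)

print(wrong.encode("utf-8")[:3])  # b'\xe1\x82\x87' (U+1087: 3 bytes)
print(right.encode("utf-8")[:2])  # b'\xd0\xbf' (U+043F: 2 bytes)

# cp866 (the console codec in the traceback) covers Cyrillic,
# so only the misdecoded string fails to encode.
right.encode("cp866")
try:
    wrong.encode("cp866")
except UnicodeEncodeError as exc:
    print("cp866 cannot encode:", exc.reason)
```

Parsing the digits as decimal (for example with `int(d, 10)`, or by calling `html.unescape` on the whole string) gives the intended Cyrillic text, which a cp866 console should then be able to print.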