You might want to try this; it works for me.

    # Python 2 (uses unichr). Handles only two-hex-digit entities
    # of the form &#xNN; and copies everything else unchanged.
    def convert_HTMLToUnicode(string):
        u = ""
        n = len(string)
        i = 0
        while i < n:
            # An entity needs six characters: '&#x', two hex digits, ';'
            if i < n-5:
                if string[i:i+3] == '&#x' and string[i+5] == ';':
                    # Decode the two hex digits and append the UTF-8 bytes
                    u += unichr(int(string[i+3:i+5], 16)).encode('utf-8')
                    i += 6
                    continue
            # Not an entity: copy the character as it is
            u += string[i]
            i += 1
        return u

On Mon, Dec 28, 2009 at 2:17 PM, David López Luengo <ole...@gmail.com> wrote:
> Hi everybody there!, here's a quick question.
>
> I'm getting the text from an entry:
>
> <gtk.Entry-instance>.get_text()
>
> Which actually has 'text text \xff text text'. This string is returned "as
> is", this means, with each character, including \ and x and f and f. All I
> want to do is just get the same string BUT with the escaped character "\xff"
> as just one byte. I have read the gtk.Entry reference and I think it is not
> possible from there; instead I have to get the text "as is" and then
> manipulate it to transform those four unescaped bytes into just one escaped
> character. Do you know how to do that? I suppose it is possible using
> functions of the Python str class, but which and how? This could be a question
> for a "python strings mailing list", but I'm sure someone has had this problem
> before.
>
> Thanks for your wisdom :)
>
> --
> David
>
> _______________________________________________
> pygtk mailing list pygtk@daa.com.au
> http://www.daa.com.au/mailman/listinfo/pygtk
> Read the PyGTK FAQ: http://faq.pygtk.org/

--
b3rx
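
P.S. For the literal \xff case in the quoted question (a backslash, an x and two hex digits typed into the entry), a regex-based variant might be simpler. This is only a rough sketch and not the function above: it assumes Python 3, the name unescape_hex is made up for illustration, and the latin-1 step at the end assumes the rest of the text contains no characters above U+00FF.

```python
import re

# Matches a literal backslash, 'x', and exactly two hex digits,
# i.e. the four characters \ x f f as typed into the entry.
_HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def unescape_hex(text):
    # Hypothetical helper (not part of PyGTK or the standard library):
    # replaces each literal \xNN sequence with the character U+00NN.
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

raw = r'text text \xff text text'   # stand-in for what get_text() returns
fixed = unescape_hex(raw)           # the four characters become one

# Assumption: every character is below U+0100, so latin-1 maps each
# character to exactly one byte. Other text would raise UnicodeEncodeError.
as_bytes = fixed.encode('latin-1')
```

Anything that is not a \xNN sequence is passed through unchanged, so ordinary text in the entry is left alone.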