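You might want to try this; it works for me. Note that it matches two-digit
hex entities of the form &#xHH;, so it would need adapting to match your
\xHH sequences.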

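def convert_HTMLToUnicode(string):
    # Python 2: replace two-digit hex entities such as '&#xff;' with the
    # UTF-8 encoding of the corresponding character.
    u = ""
    n = len(string)
    i = 0
    while i < n:
        # An entity spans six characters: '&#x', two hex digits, and ';'.
        if i < n - 5 and string[i:i+3] == '&#x' and string[i+5] == ';':
            u += unichr(int(string[i+3:i+5], 16)).encode('utf-8')
            i += 6
            continue
        # Not an entity: copy the character through unchanged.
        u += string[i]
        i += 1
    return u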
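
The function above targets &#xHH; entities, whereas the question quoted below
is about literal \xHH sequences. Here is a minimal sketch of the same idea for
that form, assuming Python 3 and a regular expression instead of a manual
scan. The names unescape_hex and _HEX_ESCAPE are hypothetical and not part of
PyGTK.

```python
import re

# Hypothetical helper, not part of PyGTK. The pattern matches a literal
# backslash, 'x', and exactly two hex digits (the four characters \xff).
_HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')


def unescape_hex(text):
    """Replace each literal \\xHH sequence in text with the character U+00HH."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# The raw string stands in for what get_text() returns in the question.
raw = r'text text \xff text text'
converted = unescape_hex(raw)

# latin-1 maps U+0000..U+00FF to the byte of the same value, so each
# converted escape becomes exactly one byte.
as_bytes = converted.encode('latin-1')
```

On Python 2, as used in this thread, unichr would take the place of chr.
Python 2 byte strings also accept text.decode('string_escape'), which handles
every backslash escape and not only \xHH.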

On Mon, Dec 28, 2009 at 2:17 PM, David López Luengo <ole...@gmail.com> wrote:

> Hi everybody! Here's a quick question.
>
> I'm getting the text from an entry:
>
> <gtk.Entry-instance>.get_text()
>
> Which actually has 'text text \xff text text'. This string is returned "as
> is", meaning with each character separate, including \ and x and f and f. All
> I want is the same string BUT with the escape sequence "\xff" as just one
> byte. I have read the gtk.Entry reference and I think that is not possible
> from there; instead I have to get the text "as is" and then manipulate it to
> turn those four literal characters into the single character they represent.
> Do you know how to do that? I suppose it is possible using methods of the
> Python str class, but which ones and how? This could be a question for a
> "python strings mailing list", but I'm sure someone has had this problem
> before.
>
> Thanks for your wisdom :)
>
>
>
> --
> David
>
> _______________________________________________
> pygtk mailing list   pygtk@daa.com.au
> http://www.daa.com.au/mailman/listinfo/pygtk
> Read the PyGTK FAQ: http://faq.pygtk.org/
>



-- 
b3rx