On 12/01/2011 16:24, Giuseppe Penone wrote:
> Yes I also was thinking that, being the first two chars not valid (\0xff and
> \0xfe)

That would be the BOM (Byte Order Mark)...

> , the problem is that I cannot find a reference to understand what is
> the encoding according to those chars.

... for little-endian UTF-16 (the bytes 0xFF 0xFE at the start tell the
decoder the byte order). You'll also want to be careful about NUL
characters left in the decoded text.
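
To make the BOM part concrete, here is a minimal sketch of picking the
decoder by hand from the first two bytes. The helper name
decode_clipboard_bytes and the UTF-8 fallback for data without a BOM are
assumptions of mine for illustration, not anything GTK provides.

```python
import codecs


def decode_clipboard_bytes(raw):
    """Decode raw clipboard bytes, using a leading BOM to pick the byte order.

    Hypothetical helper for illustration only.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):    # b'\xff\xfe'
        text = raw[len(codecs.BOM_UTF16_LE):].decode('utf-16-le')
    elif raw.startswith(codecs.BOM_UTF16_BE):  # b'\xfe\xff'
        text = raw[len(codecs.BOM_UTF16_BE):].decode('utf-16-be')
    else:
        # Assumption: data without a BOM is treated as UTF-8.
        text = raw.decode('utf-8')
    # Drop any NUL characters that survive decoding.
    return text.replace(u'\x00', u'')
```

In practice Python's 'utf_16' codec does the BOM check and removal for
you, so the explicit version is mostly useful for seeing what happens.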

The attached fragment accepts "html" pastes from firefox/thunderbird
and correctly shows the Arabic fragment from your original message
when copied from thunderbird.

Hey, it even honors RTL, which is kinda neat :)

mvg,
Dieter

import gtk


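def on_paste(textview, clipboard):
    # Take over the paste only when the clipboard offers HTML; otherwise
    # leave the default handler alone so plain-text pastes still work.
    targets = clipboard.wait_for_targets()  # may be None
    if targets and 'text/html' in targets:
        textview.stop_emission("paste-clipboard")
        clipboard.request_contents('text/html', paste_html,
                                   textview.get_buffer())
        return True


def paste_html(clipboard, selectiondata, textbuffer):
    # The 'utf_16' codec reads the BOM to pick the byte order and strips it;
    # any leftover NUL characters are removed before inserting.
    selection_data = selectiondata.data.decode('utf_16').replace('\x00', '')
    textbuffer.insert_at_cursor(selection_data)
    return True
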

if __name__ == '__main__':
    clipboard = gtk.clipboard_get()
    
    window = gtk.Window()
    window.connect('delete-event', gtk.main_quit)
    buffer = gtk.TextBuffer()
    textview = gtk.TextView(buffer)
    textview.connect('paste-clipboard', on_paste, clipboard)
    window.add(textview)
    window.show_all()

    gtk.main()