On 12/01/2011 16:24, Giuseppe Penone wrote:
> Yes I also was thinking that, being the first two chars not valid (\0xff and
> \0xfe)

That would be the BOM (Byte Order Mark)...

> , the problem is that I cannot find a reference to understand what is
> the encoding according to those chars.

... for UTF-16LE: \xff\xfe is the little-endian BOM, so decoding as plain UTF-16 picks up the byte order from it.

You'll also want to be careful about NULL characters. The attached fragment accepts "html" pastes from Firefox/Thunderbird and correctly shows the Arabic fragment from your original message when copied from Thunderbird.

Hey, it even honors RTL, which is kinda neat :)

mvg,
Dieter

import gtk


def on_paste(textview, clipboard):
    # Suppress the default paste handler; we insert the text ourselves.
    textview.stop_emission("paste-clipboard")
    # wait_for_targets() may return None when the clipboard is empty.
    targets = clipboard.wait_for_targets() or ()
    if 'text/html' in targets:
        clipboard.request_contents('text/html', paste_html,
                                   textview.get_buffer())
    return True


def paste_html(clipboard, selectiondata, textbuffer):
    # The 'utf_16' codec uses the BOM to pick the byte order and drops it;
    # any NULL characters left in the text are stripped afterwards.
    selection_data = selectiondata.data.decode('utf_16').replace('\x00', '')
    textbuffer.insert_at_cursor(selection_data)
    return True


if __name__ == '__main__':
    clipboard = gtk.clipboard_get()
    window = gtk.Window()
    window.connect('delete-event', gtk.main_quit)
    buffer = gtk.TextBuffer()
    textview = gtk.TextView(buffer)
    textview.connect('paste-clipboard', on_paste, clipboard)
    window.add(textview)
    window.show_all()
    gtk.main()
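
The decoding step in paste_html can be tried without GTK. Below is a minimal sketch: decode_clipboard_html is a hypothetical helper, not part of PyGTK, and the UTF-8 fallback for data without a BOM is an assumption on my part (the fragment above always decodes as UTF-16).

```python
import codecs


def decode_clipboard_html(raw):
    """Decode raw 'text/html' clipboard bytes to text (hypothetical helper)."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The 'utf_16' codec reads the BOM, picks the byte order and drops it.
        text = raw.decode('utf_16')
    else:
        # Assumption: a payload without a BOM is treated as UTF-8.
        text = raw.decode('utf_8')
    # Strip stray NULL characters, as the fragment above does.
    return text.replace(u'\x00', u'')


if __name__ == '__main__':
    # Simulated clipboard payload: little-endian BOM, UTF-16LE text, trailing NUL.
    sample = codecs.BOM_UTF16_LE + u'<b>\u0645\u0631\u062d\u0628\u0627</b>\x00'.encode('utf_16_le')
    print(repr(decode_clipboard_html(sample)))
```

Checking for the BOM first means the same helper also copes with sources that hand over big-endian data, since the codec detects that case from \xfe\xff.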