[pygtk] problem pasting clipboard content from arabic website (target text/html)
Hi,
I have a problem with gtk.Clipboard and gtk.SelectionData in my open source app http://giuspen.com/cherrytree when pasting clipboard content copied from an Arabic website, taking the data from the text/html target.
When I print selectiondata.data to the terminal, I can see that the content is:
��<span style="font-size: 130%;"><span style="color: red;"><span style="color: black;">'DE4GH1 H'D1'9 AI 'D'5/'1 'D,/J/ *E '5D'- (96 'D9DD H*E 9ED</span></span></span>
where the first two chars are 255 and 254.
When I write it to the textbuffer I see:
'DE4GH1 H'D1'9 AI 'D'5/'1 'D,/J/ *E '5D'- (96 'D9DD H*E 9ED
instead of:
المشهور والرائع فى الاصدار الجديد تم اصلاح بعض العلل وتم عمل
I checked the source code of the website (http://ubuntu-c.blogspot.com/) and the encoding is UTF-8, so I have no idea how to handle this. If anybody can give me a clue, please help me.
Regards,
Giuseppe.
___
pygtk mailing list pygtk@daa.com.au http://www.daa.com.au/mailman/listinfo/pygtk
Read the PyGTK FAQ: http://faq.pygtk.org/
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
On Wed, Jan 12, 2011 at 13:34, Giuseppe Penone gius...@gmail.com wrote: "Hi, I have a problem with gtk.clipboard() and gtk.selectiondata() in my open source app http://giuspen.com/cherrytree when pasting clipboard content from arabic website taking data from the target text/html. [...]"

What browser is it? I think Mozilla used to put UTF-16 in text/html.

Regards,
Tomeu
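Tomeu's UTF-16 hypothesis matches the garbage in the original report. The sketch below (Python 3, not the thread's Python 2/PyGTK) encodes the first word of the expected Arabic text as UTF-16LE behind an FF FE byte order mark, which is an assumption about what Firefox put on the clipboard, and keeps only the bytes that display as printable ASCII. The helper name `printable_ascii` is hypothetical.

```python
# Assumption: the text/html target holds UTF-16LE text preceded by the
# byte order mark FF FE (bytes 255 and 254).
word = "المشهور"
raw = b"\xff\xfe" + word.encode("utf-16-le")

def printable_ascii(data):
    """Hypothetical helper: keep only bytes that show up as printable ASCII."""
    return "".join(chr(b) for b in data if 0x20 <= b < 0x7F)

# Arabic letters live in U+0600..U+06FF, so each one is stored as
# <low byte> 0x06, and the low byte is often a printable ASCII character.
print(list(raw[:4]))         # [255, 254, 39, 6]
print(printable_ascii(raw))  # 'DE4GH1

# Decoding with the 'utf-16' codec consumes the BOM and restores the text.
print(raw.decode("utf-16") == word)  # True
```

The result is exactly the first token of the garbled line, so each Arabic letter surfaces as one ASCII character once its 0x06 high byte is dropped.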
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
Yes, it's Mozilla Firefox. I tried input_string.decode('utf-16', 'ignore'), but what I get is:
猼慰瑳汹㵥昢湯楳敺›㌱┰∻㰾灳湡猠祴敬∽潣潬㩲爠摥∻㰾灳湡猠祴敬∽潣潬㩲戠慬正∻䐾㑅䡇‱❈ㅄ☧‹䥁✠❄⼵ㄧ✠ⱄ䨯 䔪✠䐵ⴧ⠠㘹✠㥄䑄䠠䔪㤠䑅⼼灳湡㰾猯慰㹮⼼灳湡
Regards,
Giuseppe.

On Wed, Jan 12, 2011 at 3:17 PM, Tomeu Vizoso to...@sugarlabs.org wrote: [...]
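The CJK characters hint at what went wrong with that decode: each one is two adjacent ASCII bytes of the HTML read as a single UTF-16LE code unit (猼 is U+733C, i.e. the bytes of "<s"). That would happen if the bytes being decoded were one byte per character with no NUL high bytes, for example because the NULs had already been removed. This is an inference from the output, not something confirmed in the thread. A minimal Python 3 reproduction:

```python
# A one-byte-per-character fragment of the HTML, with no NUL padding.
html = b'<span style="font-size: 130%;">'

# Reading it as UTF-16LE pairs up adjacent bytes:
# b"<s" -> U+733C, b"pa" -> U+6170, and so on.
# With 'ignore', the odd trailing byte (">") is silently dropped.
text = html.decode("utf-16-le", "ignore")
print(text[:2])              # 猼慰
print(len(html), len(text))  # 31 15
```

The output starts with the same characters as the failed decode above, which is why decoding is only correct when it runs on the original bytes, BOM and NULs included.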
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
On 12/01/2011 16:03, Giuseppe Penone wrote: "Yes it's mozilla firefox. I tried with input_string.decode('utf-16', 'ignore') but what I get is [...]"

I've never needed it myself, but I've stumbled over this a couple of times: the tuple returned by clipboard.wait_for_targets() seems to contain a hint about the encoding of the data, at least when you copy from a Mozilla application (Firefox, Thunderbird, maybe others).

hth,
Dieter
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
Yes, I was thinking that too, since the first two chars (\xff and \xfe) are not valid. The problem is that I cannot find a reference explaining which encoding those chars indicate. The page's HTML declares UTF-8, but Firefox probably uses a different encoding to fill the clipboard.
Regards,
Giuseppe.

On Wed, Jan 12, 2011 at 5:18 PM, Dieter Verfaillie diet...@optionexplicit.be wrote: [...]
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
On Wed, Jan 12, 2011 at 16:03, Giuseppe Penone gius...@gmail.com wrote: "Yes it's mozilla firefox. I tried with input_string.decode('utf-16', 'ignore') [...]"

Reading this code helped me fix interoperability between Sugar and Mozilla, but that was a long time ago and I don't remember the details:
http://mxr.mozilla.org/mozilla-central/source/widget/src/gtk2/nsClipboard.cpp#707

HTH,
Tomeu
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
On 12/01/2011 16:24, Giuseppe Penone wrote: "Yes I also was thinking that, being the first two chars not valid (\xff and \xfe), the problem is that I cannot find a reference to understand what is the encoding according to those chars."

That would be the BOM (Byte Order Mark) for UTF-16LE (little-endian UTF-16). Python's 'utf_16' codec reads the BOM and picks the byte order for you. You'll also want to be careful about NULL characters.

The attached fragment accepts HTML pastes from Firefox/Thunderbird and correctly shows the Arabic fragment from your original message when copied from Thunderbird. Hey, it even honors RTL, which is kinda neat :)

Regards,
Dieter

import gtk

def on_paste(textview, clipboard):
    # wait_for_targets() returns None when no targets are available.
    targets = clipboard.wait_for_targets() or ()
    if 'text/html' in targets:
        # Take over the paste only when HTML is on offer; otherwise
        # the default handler pastes plain text as usual.
        textview.stop_emission('paste-clipboard')
        clipboard.request_contents('text/html', paste_html,
                                   textview.get_buffer())
    return True

def paste_html(clipboard, selectiondata, textbuffer):
    if selectiondata.data is None:
        return
    # Firefox/Thunderbird put UTF-16 with a BOM in text/html; decode it
    # and drop any NUL characters.
    selection_data = selectiondata.data.decode('utf_16').replace(u'\x00', u'')
    textbuffer.insert_at_cursor(selection_data)

if __name__ == '__main__':
    clipboard = gtk.clipboard_get()
    window = gtk.Window()
    window.connect('delete-event', gtk.main_quit)
    text_buffer = gtk.TextBuffer()
    textview = gtk.TextView(text_buffer)
    textview.connect('paste-clipboard', on_paste, clipboard)
    window.add(textview)
    window.show_all()
    gtk.main()
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
Great help, thank you very much.
Regards,
Giuseppe.

On Wed, Jan 12, 2011 at 8:00 PM, Dieter Verfaillie diet...@optionexplicit.be wrote: [...]
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
It helped, thank you: the first two bytes being 0xFF 0xFE or 0xFE 0xFF indicate UTF-16 (little-endian and big-endian respectively).
Regards,
Giuseppe.

On Wed, Jan 12, 2011 at 6:15 PM, Tomeu Vizoso to...@sugarlabs.org wrote: [...]
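Giuseppe's conclusion can be folded into a small decoding helper. This is a sketch in Python 3 (the thread's code is Python 2/PyGTK), and the name `decode_clipboard_html` is hypothetical. It checks for a UTF-16 byte order mark using the standard `codecs` constants, falls back to UTF-8 otherwise (an assumption, since the thread only covers the Firefox case), and strips NUL characters as Dieter's snippet does.

```python
import codecs

def decode_clipboard_html(data):
    """Hypothetical helper: decode raw text/html clipboard bytes by BOM.

    Assumes data without a UTF-16 BOM is UTF-8. UTF-32 BOMs are not
    handled (FF FE 00 00 would be taken for UTF-16LE).
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # FF FE = little-endian, FE FF = big-endian; the 'utf-16' codec
        # reads the BOM, picks the byte order and drops the BOM itself.
        text = data.decode("utf-16")
    else:
        # 'utf-8-sig' also strips an optional UTF-8 BOM (EF BB BF).
        text = data.decode("utf-8-sig", "replace")
    # Remove any NUL characters left in the decoded text.
    return text.replace("\x00", "")

# Usage: UTF-16LE with a BOM and a trailing NUL, as in the Firefox case.
raw = codecs.BOM_UTF16_LE + "<b>المشهور</b>".encode("utf-16-le") + b"\x00\x00"
print(decode_clipboard_html(raw))  # <b>المشهور</b>
```

Checking the BOM first keeps the UTF-16 path exactly as in Dieter's fragment, while other clipboard sources that send plain UTF-8 still decode instead of raising.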