Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
It helped, thank you. The first two bytes being 0xFF 0xFE (or 0xFE 0xFF) are a byte order mark, which indicates UTF-16.

Regards,
Giuseppe.

On Wed, Jan 12, 2011 at 6:15 PM, Tomeu Vizoso wrote:
> Reading this code helped me with fixing interoperability between Sugar
> and Mozilla, but it was a long time ago and I don't remember the details:
>
> http://mxr.mozilla.org/mozilla-central/source/widget/src/gtk2/nsClipboard.cpp#707
>
> HTH,
>
> Tomeu

___
pygtk mailing list pygtk@daa.com.au
http://www.daa.com.au/mailman/listinfo/pygtk
Read the PyGTK FAQ: http://faq.pygtk.org/
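A minimal sketch of the byte order mark check described above, written for Python 3 `bytes` rather than the Python 2 strings used in the thread. The helper name `sniff_utf16_bom` is hypothetical and not part of PyGTK; it only looks at the first two bytes and does not try to tell UTF-16 apart from UTF-32.

```python
import codecs


def sniff_utf16_bom(raw):
    """Return the UTF-16 codec name implied by a leading BOM, or None.

    Hypothetical helper for illustration only.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):   # bytes 0xFF 0xFE
        return 'utf-16-le'
    if raw.startswith(codecs.BOM_UTF16_BE):   # bytes 0xFE 0xFF
        return 'utf-16-be'
    return None                               # no UTF-16 BOM found


# The data in this thread started with the bytes 255 and 254.
print(sniff_utf16_bom(bytes([255, 254]) + b'<\x00'))
```

When a BOM is present, Python's generic `utf-16` codec does this detection itself, so the explicit check is mainly useful for deciding whether to try UTF-16 at all.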
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
Great help, thank you very much.

Regards,
Giuseppe.

On Wed, Jan 12, 2011 at 8:00 PM, Dieter Verfaillie <diet...@optionexplicit.be> wrote:
> The attached fragment accepts "html" pastes from firefox/thunderbird
> and correctly shows the Arabic fragment from your original message
> when copied from thunderbird.
>
> Hey, it even honors RTL, which is kinda neat :)
>
> mvg,
> Dieter
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
On 12/01/2011 16:24, Giuseppe Penone wrote:
> Yes I also was thinking that, being the first two chars not valid (0xff and
> 0xfe)

That would be the BOM (Byte Order Mark)...

> , the problem is that I cannot find a reference to understand what is
> the encoding according to those chars.

... for UTF-16LE (little-endian UTF-16). You'll also want to be careful about NULL characters.

The attached fragment accepts "html" pastes from firefox/thunderbird and correctly shows the Arabic fragment from your original message when copied from thunderbird.

Hey, it even honors RTL, which is kinda neat :)

mvg,
Dieter

    import gtk


    def on_paste(textview, clipboard):
        # Stop the default handler so the paste can be done by hand.
        textview.stop_emission("paste-clipboard")
        # wait_for_targets() returns None when the clipboard offers nothing.
        targets = clipboard.wait_for_targets()
        if targets and 'text/html' in targets:
            clipboard.request_contents('text/html', paste_html,
                                       textview.get_buffer())
        return True


    def paste_html(clipboard, selectiondata, textbuffer):
        # The 'utf_16' codec reads the BOM to pick the byte order;
        # NULL characters are dropped afterwards.
        selection_data = selectiondata.data.decode('utf_16').replace('\x00', '')
        textbuffer.insert_at_cursor(selection_data)
        return True


    if __name__ == '__main__':
        clipboard = gtk.clipboard_get()
        window = gtk.Window()
        window.connect('delete-event', gtk.main_quit)
        buffer = gtk.TextBuffer()
        textview = gtk.TextView(buffer)
        textview.connect('paste-clipboard', on_paste, clipboard)
        window.add(textview)
        window.show_all()
        gtk.main()
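The decoding step of the fragment can be tried without GTK. The sketch below is written for Python 3 and uses a hypothetical helper, `decode_html_selection`. It assumes the NULL characters mentioned above are stray U+0000 code points, such as a trailing terminator, and the sample bytes are made up for illustration rather than captured from Firefox.

```python
import codecs


def decode_html_selection(raw):
    """Decode BOM-prefixed UTF-16 clipboard bytes and drop NUL characters.

    Hypothetical helper mirroring the decode line in paste_html().
    """
    # The generic 'utf-16' codec consumes the BOM and uses it to
    # choose the byte order.
    text = raw.decode('utf-16')
    return text.replace('\x00', '')


# Made-up sample: BOM + two Arabic letters + a trailing NUL code unit.
sample = codecs.BOM_UTF16_LE + '\u062a\u0645'.encode('utf-16-le') + b'\x00\x00'

print(repr(decode_html_selection(sample)))

# By contrast, the explicit 'utf-16-le' codec keeps the BOM as U+FEFF.
print(repr(sample.decode('utf-16-le')[0]))
```

This is why the fragment decodes with `utf_16` rather than `utf_16_le`: with the explicit little-endian codec the BOM would end up in the text buffer as an extra character.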
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
On Wed, Jan 12, 2011 at 16:03, Giuseppe Penone wrote:
> Yes it's mozilla firefox.
> I tried with
>
> input_string.decode("utf-16", "ignore")

Reading this code helped me with fixing interoperability between Sugar and Mozilla, but it was a long time ago and I don't remember the details:

http://mxr.mozilla.org/mozilla-central/source/widget/src/gtk2/nsClipboard.cpp#707

HTH,

Tomeu
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
Yes, I was also thinking that, since the first two chars are not valid (0xff and 0xfe). The problem is that I cannot find a reference that tells me which encoding those chars indicate. The HTML of the webpage declares utf-8, but Firefox probably uses another encoding to fill the clipboard.

Regards,
Giuseppe.

On Wed, Jan 12, 2011 at 5:18 PM, Dieter Verfaillie <diet...@optionexplicit.be> wrote:
> Never had a need for it myself, but I've stumbled over this a couple
> of times: the tuple returned by clipboard.wait_for_targets() seems to
> contain a hint to the encoding of the data. At least in the case where
> you copied from a mozilla application (firefox/thunderbird/maybe
> others).
>
> hth,
> Dieter
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
On 12/01/2011 16:03, Giuseppe Penone wrote:
> Yes it's mozilla firefox.
> I tried with
>
> input_string.decode("utf-16", "ignore")
>
> but what I get is
> [snip]

Never had a need for it myself, but I've stumbled over this a couple of times: the tuple returned by clipboard.wait_for_targets() seems to contain a hint to the encoding of the data. At least in the case where you copied from a mozilla application (firefox/thunderbird/maybe others).

hth,
Dieter
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
Yes, it's Mozilla Firefox.
I tried with

    input_string.decode("utf-16", "ignore")

but what I get is

猼慰瑳汹㵥昢湯楳敺›㌱┰∻㰾灳湡猠祴敬∽潣潬㩲爠摥∻㰾灳湡猠祴敬∽潣潬㩲戠慬正∻䐾㑅䡇‱❈ㅄ☧‹䥁✠❄⼵ㄧ✠ⱄ䨯
䔪✠䐵ⴧ⠠㘹✠㥄䑄䠠䔪㤠䑅⼼灳湡㰾猯慰㹮⼼灳湡

Regards,
Giuseppe.

On Wed, Jan 12, 2011 at 3:17 PM, Tomeu Vizoso wrote:
> What browser is it? I think Mozilla used to put utf-16 in "text/html".
>
> Regards,
>
> Tomeu
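The CJK-looking output above appears consistent with single-byte text being read as UTF-16: each pair of ASCII bytes becomes one 16-bit code unit, and many of those values land in the CJK range. The sketch below (Python 3, hypothetical helper `misdecode_as_utf16le`) only illustrates that effect. It assumes `input_string` had already lost its BOM and high bytes before the decode call, which the thread does not confirm.

```python
def misdecode_as_utf16le(data):
    """Decode bytes as little-endian UTF-16, whatever they really are.

    Hypothetical helper for illustration only.
    """
    return data.decode('utf-16-le', 'ignore')


# Two pairs of ASCII bytes: '<s' is 0x3C 0x73, 'pa' is 0x70 0x61.
wrong = misdecode_as_utf16le(b'<spa')

# Each pair is combined low byte first into one code unit.
print([hex(ord(ch)) for ch in wrong])  # ['0x733c', '0x6170']
```

Decoding the untouched `selectiondata.data`, BOM included, avoids this pairing of unrelated bytes.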
Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)
On Wed, Jan 12, 2011 at 13:34, Giuseppe Penone wrote:
> Hi,
> I have a problem with gtk.clipboard() and gtk.selectiondata() in my open
> source app http://giuspen.com/cherrytree
> when pasting clipboard content from arabic website taking data from the
> target "text/html".

What browser is it? I think Mozilla used to put utf-16 in "text/html".

Regards,

Tomeu

> when printing the selectiondata.data I can see that the content is:
> (print to terminal)
>
> �� style="color: black;">' D E 4 G H 1 H ' D 1 ' & 9 A I ' D ' 5 / ' 1 ' D , /
> J / * E ' 5 D ' - ( 9 6 ' D 9 D D H * E 9 E D
>
> where the first two chars are 255 and 254.
>
> when I write to the textbuffer I see:
> 'DE4GH1 H'D1'&9 AI 'D'5/'1 'D,/J/ *E '5D'- (96 'D9DD H*E 9ED
>
> instead of:
> المشهور والرائع فى الاصدار الجديد تم اصلاح بعض العلل وتم عمل
>
> I checked the source code of the website (http://ubuntu-c.blogspot.com/) and
> the encoding is utf-8, so I don't have idea of how to behave.
> Please if anybody can give me a clue help me.
> Regards,
> Giuseppe.
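The symptoms in the quoted report match what BOM-prefixed little-endian UTF-16 looks like when its bytes are treated as single-byte text. The sketch below (Python 3) encodes the first Arabic word from the message and picks out the low byte of each code unit. The helper name `low_bytes` is hypothetical, and the sample is rebuilt from the expected text, not taken from the real clipboard data.

```python
import codecs

# First word of the expected text, written as escapes: "المشهور".
word = '\u0627\u0644\u0645\u0634\u0647\u0648\u0631'

# Rebuild a BOM-prefixed UTF-16LE payload of the kind Mozilla is said
# to put in "text/html".
raw = codecs.BOM_UTF16_LE + word.encode('utf-16-le')


def low_bytes(data):
    """Return the first byte of every 2-byte unit after the BOM.

    In little-endian order that is the low byte of each code unit.
    Hypothetical helper for illustration only.
    """
    return data[2::2]


print(raw[0], raw[1])       # 255 254
print(low_bytes(raw))       # b"'DE4GH1"
print(set(raw[3::2]))       # {6}
```

The low bytes spell the same `'DE4GH1` seen in the text buffer, and every high byte is 0x06, the Arabic block, which a terminal shows as a gap between the visible characters. Decoding `raw` with the `utf-16` codec gives back the original word.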