Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Giuseppe Penone
It helped, thank you.

The first two bytes being 0xff 0xfe or 0xfe 0xff indicate UTF-16.

regards,
Giuseppe.
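
For reference, the check described above can be sketched in a few lines. This is only an illustration, not code from the thread: sniff_utf16_bom is a made-up helper name, it is written for Python 3 bytes (PyGTK itself runs on Python 2), and it relies on the codecs.BOM_UTF16_LE and codecs.BOM_UTF16_BE constants from the standard library.

```python
import codecs


def sniff_utf16_bom(data):
    """Return the codec name implied by a leading UTF-16 BOM, or None.

    Hypothetical helper for illustration; `data` is the raw bytes
    taken from the clipboard selection.
    """
    if data.startswith(codecs.BOM_UTF16_LE):   # b'\xff\xfe'
        return 'utf-16-le'
    if data.startswith(codecs.BOM_UTF16_BE):   # b'\xfe\xff'
        return 'utf-16-be'
    return None


# Sample payload: BOM (255, 254) followed by little-endian UTF-16 text.
expected = '\u062a\u0645'                       # the Arabic word "تم"
sample = codecs.BOM_UTF16_LE + expected.encode('utf-16-le')
encoding = sniff_utf16_bom(sample)
text = sample[2:].decode(encoding)              # skip the 2-byte BOM
```

Note that a UTF-32 little-endian BOM also starts with 0xff 0xfe, so a stricter check would look at the following two bytes as well.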


On Wed, Jan 12, 2011 at 6:15 PM, Tomeu Vizoso  wrote:

> On Wed, Jan 12, 2011 at 16:03, Giuseppe Penone  wrote:
> > Yes it's mozilla firefox.
> > I tried with
> >
> > input_string.decode("utf-16", "ignore")
>
> Reading this code helped me fix interoperability between Sugar
> and Mozilla, but it was a long time ago and I don't remember the details:
>
>
> http://mxr.mozilla.org/mozilla-central/source/widget/src/gtk2/nsClipboard.cpp#707
>
> HTH,
>
> Tomeu
>
> > but what I get is
> >
> > 猼慰瑳汹㵥昢湯⵴楳敺›㌱┰∻㰾灳湡猠祴敬∽潣潬㩲爠摥∻㰾灳湡猠祴敬∽潣潬㩲戠慬正∻䐾㑅䡇‱❈ㅄ☧‹䥁✠❄⼵ㄧ✠ⱄ䨯
> 䔪✠䐵ⴧ⠠㘹✠㥄䑄䠠䔪㤠䑅⼼灳湡㰾猯慰㹮⼼灳湡
> >
> > Regards,
> > Giuseppe.
___
pygtk mailing list   pygtk@daa.com.au
http://www.daa.com.au/mailman/listinfo/pygtk
Read the PyGTK FAQ: http://faq.pygtk.org/

Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Giuseppe Penone
Great help, thank you very much.
Regards,
Giuseppe.


On Wed, Jan 12, 2011 at 8:00 PM, Dieter Verfaillie <
diet...@optionexplicit.be> wrote:

> On 12/01/2011 16:24, Giuseppe Penone wrote:
> > Yes, I was also thinking that, since the first two chars are not valid
> > (0xff and 0xfe)
>
> That would be the BOM (Byte Order Mark)...
>
> > , the problem is that I cannot find a reference that explains which
> > encoding those chars indicate.
>
> ... for UTF-16, little-endian in this case (0xff 0xfe). You'll also want
> to be careful about NULL characters.
>
> The attached fragment accepts "html" pastes from firefox/thunderbird
> and correctly shows the Arabic fragment from your original message
> when copied from thunderbird.
>
> Hey, it even honors RTL, which is kinda neat :)
>
> mvg,
> Dieter
>

Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Dieter Verfaillie
On 12/01/2011 16:24, Giuseppe Penone wrote:
> Yes, I was also thinking that, since the first two chars are not valid
> (0xff and 0xfe)

That would be the BOM (Byte Order Mark)...

> , the problem is that I cannot find a reference that explains which
> encoding those chars indicate.

... for UTF-16, little-endian in this case (0xff 0xfe). You'll also want
to be careful about NULL characters.

The attached fragment accepts "html" pastes from firefox/thunderbird
and correctly shows the Arabic fragment from your original message
when copied from thunderbird.

Hey, it even honors RTL, which is kinda neat :)

mvg,
Dieter

import gtk


def on_paste(textview, clipboard):
    # Block the default handler so the paste is handled here instead.
    textview.stop_emission("paste-clipboard")
    targets = clipboard.wait_for_targets()

    # wait_for_targets() returns None when the clipboard offers no targets.
    if targets and 'text/html' in targets:
        clipboard.request_contents('text/html', paste_html,
                                   textview.get_buffer())
    return True


def paste_html(clipboard, selectiondata, textbuffer):
    # 'utf_16' consumes the BOM and picks the byte order from it;
    # any NUL characters left after decoding are dropped.
    selection_data = selectiondata.data.decode('utf_16').replace('\x00', '')
    textbuffer.insert_at_cursor(selection_data)
    return True


if __name__ == '__main__':
    clipboard = gtk.clipboard_get()

    window = gtk.Window()
    window.connect('delete-event', gtk.main_quit)
    buffer = gtk.TextBuffer()
    textview = gtk.TextView(buffer)
    textview.connect('paste-clipboard', on_paste, clipboard)
    window.add(textview)
    window.show_all()

    gtk.main()
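
A GTK-free sketch of the decoding step may make it easier to test outside a running application. It is not part of the attachment: decode_html_selection is an illustrative name, the code targets Python 3 bytes rather than the Python 2 strings PyGTK uses, and the UTF-8 fallback for data without a BOM is an assumption that the thread does not establish.

```python
import codecs


def decode_html_selection(data, fallback='utf-8'):
    """Decode raw "text/html" clipboard bytes into text.

    Illustrative helper: the UTF-16 branch follows the fragment above;
    the fallback for BOM-less data is an assumption.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The 'utf-16' codec consumes the BOM and picks the byte order.
        text = data.decode('utf-16')
    else:
        text = data.decode(fallback, 'replace')
    # Strip NULs only after decoding; removing b'\x00' from the raw
    # bytes would break the two-byte code units.
    return text.replace('\x00', '')


html = '<span>\u062a\u0645 \u0639\u0645\u0644</span>'
# BOM, little-endian payload, and one trailing NUL code unit.
payload = codecs.BOM_UTF16_LE + html.encode('utf-16-le') + b'\x00\x00'
decoded = decode_html_selection(payload)
```

Keeping the decoding in a plain function like this means the paste callback only has to pass selectiondata.data through it.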

Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Tomeu Vizoso
On Wed, Jan 12, 2011 at 16:03, Giuseppe Penone  wrote:
> Yes it's mozilla firefox.
> I tried with
>
> input_string.decode("utf-16", "ignore")

Reading this code helped me fix interoperability between Sugar
and Mozilla, but it was a long time ago and I don't remember the details:

http://mxr.mozilla.org/mozilla-central/source/widget/src/gtk2/nsClipboard.cpp#707

HTH,

Tomeu

> but what I get is
>
> 猼慰瑳汹㵥昢湯⵴楳敺›㌱┰∻㰾灳湡猠祴敬∽潣潬㩲爠摥∻㰾灳湡猠祴敬∽潣潬㩲戠慬正∻䐾㑅䡇‱❈ㅄ☧‹䥁✠❄⼵ㄧ✠ⱄ䨯 
> 䔪✠䐵ⴧ⠠㘹✠㥄䑄䠠䔪㤠䑅⼼灳湡㰾猯慰㹮⼼灳湡
>
> Regards,
> Giuseppe.
>

Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Giuseppe Penone
Yes, I was also thinking that, since the first two chars are not valid (0xff and
0xfe), the problem is that I cannot find a reference that explains which
encoding those chars indicate.
The HTML of the webpage declares utf-8, but Firefox probably uses a different
encoding when it fills the clipboard.

Regards,
Giuseppe.
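
The garbled text in the original report is consistent with that guess. A small check, written for Python 3 and using only the first Arabic word of the expected text: in little-endian UTF-16 each of these letters is stored as a low byte followed by 0x06, and the low bytes happen to be printable ASCII.

```python
# First word of the expected text, "المشهور", spelled as code points.
word = '\u0627\u0644\u0645\u0634\u0647\u0648\u0631'
raw = word.encode('utf-16-le')      # two bytes per letter, no BOM

low_bytes = raw[0::2]               # first byte of each code unit
high_bytes = raw[1::2]              # second byte of each code unit

# The same word in UTF-8 (the encoding declared by the web page)
# uses a different byte sequence altogether.
utf8_raw = word.encode('utf-8')

print(low_bytes)                    # b"'DE4GH1"
```

The low bytes come out as 'DE4GH1, which matches the start of what appeared in the textbuffer, so the clipboard data is UTF-16 regardless of the encoding the page itself declares.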


On Wed, Jan 12, 2011 at 5:18 PM, Dieter Verfaillie <
diet...@optionexplicit.be> wrote:

> On 12/01/2011 16:03, Giuseppe Penone wrote:
> > Yes it's mozilla firefox.
> > I tried with
> >
> > input_string.decode("utf-16", "ignore")
> >
> > but what I get is
> >
> > 猼慰瑳汹㵥昢湯⵴楳敺›㌱┰∻㰾灳湡猠祴敬∽潣潬㩲爠摥∻㰾灳湡猠祴敬∽潣潬㩲戠慬正∻䐾㑅䡇‱❈ㅄ☧‹䥁✠❄⼵ㄧ✠ⱄ䨯
> > 䔪✠䐵ⴧ⠠㘹✠㥄䑄䠠䔪㤠䑅⼼灳湡㰾猯慰㹮⼼灳湡
>
> Never had a need for it myself, but I've stumbled over this a couple
> of times: the tuple returned by clipboard.wait_for_targets() seems to
> contain a hint to the encoding of the data. At least in the case where
> you copied from a mozilla application (firefox/thunderbird/maybe
> others).
>
> hth,
> Dieter
>

Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Dieter Verfaillie
On 12/01/2011 16:03, Giuseppe Penone wrote:
> Yes it's mozilla firefox.
> I tried with
> 
> input_string.decode("utf-16", "ignore")
> 
> but what I get is
> 
> 猼慰瑳汹㵥昢湯⵴楳敺›㌱┰∻㰾灳湡猠祴敬∽潣潬㩲爠摥∻㰾灳湡猠祴敬∽潣潬㩲戠慬正∻䐾㑅䡇‱❈ㅄ☧‹䥁✠❄⼵ㄧ✠ⱄ䨯
> 䔪✠䐵ⴧ⠠㘹✠㥄䑄䠠䔪㤠䑅⼼灳湡㰾猯慰㹮⼼灳湡

Never had a need for it myself, but I've stumbled over this a couple
of times: the tuple returned by clipboard.wait_for_targets() seems to
contain a hint to the encoding of the data. At least in the case where
you copied from a mozilla application (firefox/thunderbird/maybe
others).

hth,
Dieter
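
One way to look at that hint without a running GTK main loop is to treat the tuple as plain strings. The sketch below is hypothetical: pick_target and the sample tuple are made up for illustration, and the real tuple returned by clipboard.wait_for_targets() depends on the application that owns the clipboard. Target names such as UTF8_STRING name their encoding outright, while "text/html" does not.

```python
def pick_target(targets, preferred=('text/html', 'UTF8_STRING', 'STRING')):
    """Return the first preferred target offered by the clipboard, or None.

    `targets` is the tuple from clipboard.wait_for_targets(), which may
    be None when nothing is offered.
    """
    if not targets:
        return None
    for name in preferred:
        if name in targets:
            return name
    return None


# Illustrative sample only; print the real tuple to see what the
# source application actually advertises.
sample_targets = ('TIMESTAMP', 'TARGETS', 'text/html', 'UTF8_STRING', 'STRING')
chosen = pick_target(sample_targets)
```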

Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Giuseppe Penone
Yes it's mozilla firefox.
I tried with

input_string.decode("utf-16", "ignore")

but what I get is

猼慰瑳汹㵥昢湯⵴楳敺›㌱┰∻㰾灳湡猠祴敬∽潣潬㩲爠摥∻㰾灳湡猠祴敬∽潣潬㩲戠慬正∻䐾㑅䡇‱❈ㅄ☧‹䥁✠❄⼵ㄧ✠ⱄ䨯
䔪✠䐵ⴧ⠠㘹✠㥄䑄䠠䔪㤠䑅⼼灳湡㰾猯慰㹮⼼灳湡

Regards,
Giuseppe.
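
One possible explanation for this output, offered as an assumption rather than something the thread confirms: if the NUL bytes had already been removed from input_string, each pair of ASCII bytes would be read as a single UTF-16 code unit. The Python 3 sketch below reproduces the first two characters of the output above from the bytes of "<spa".

```python
markup = '<span style="color: black;">'

# What the clipboard would carry: UTF-16LE, i.e. every ASCII byte
# followed by a NUL byte.
utf16_bytes = markup.encode('utf-16-le')

# Assumed mistake: drop the NUL bytes first, then decode as UTF-16.
stripped = utf16_bytes.replace(b'\x00', b'')
garbled = stripped.decode('utf-16-le')

# Decoding the untouched bytes gives the markup back.
correct = utf16_bytes.decode('utf-16-le')
```

Here garbled starts with U+733C and U+6170, the same two characters that open the output above, while correct is the original markup.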


On Wed, Jan 12, 2011 at 3:17 PM, Tomeu Vizoso  wrote:

> On Wed, Jan 12, 2011 at 13:34, Giuseppe Penone  wrote:
> > Hi,
> > I have a problem with gtk.Clipboard and gtk.SelectionData in my open
> > source app http://giuspen.com/cherrytree
> > when pasting clipboard content from an Arabic website, taking the data from
> > the target "text/html".
>
> What browser is it? I think Mozilla used to put utf-16 in "text/html".
>
> Regards,
>
> Tomeu
>

Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Tomeu Vizoso
On Wed, Jan 12, 2011 at 13:34, Giuseppe Penone  wrote:
> Hi,
> I have a problem with gtk.Clipboard and gtk.SelectionData in my open
> source app http://giuspen.com/cherrytree
> when pasting clipboard content from an Arabic website, taking the data from
> the target "text/html".

What browser is it? I think Mozilla used to put utf-16 in "text/html".

Regards,

Tomeu

>
> When printing selectiondata.data to the terminal, I can see that the
> content is:
>
> �� style="color: black;">' D E 4 G H 1 H ' D 1 ' & 9 A I ' D ' 5 / ' 1 ' D , /
> J / * E ' 5 D ' - ( 9 6 ' D 9 D D H * E 9 E D 
>
> where the first two chars are 255 and 254.
>
> when I write to the textbuffer I see:
> 'DE4GH1 H'D1'&9 AI 'D'5/'1 'D,/J/ *E '5D'- (96 'D9DD H*E 9ED
>
> instead of:
> المشهور والرائع فى الاصدار الجديد تم اصلاح بعض العلل وتم عمل
>
> I checked the source code of the website (http://ubuntu-c.blogspot.com/) and
> the encoding is utf-8, so I have no idea how to proceed.
> If anybody can give me a clue, please help me.
> Regards,
> Giuseppe.