[pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Giuseppe Penone
Hi,
I have a problem with gtk.Clipboard and gtk.SelectionData in my open
source app http://giuspen.com/cherrytree
when pasting clipboard content from an Arabic website, taking the data from the
target text/html.

When I print selectiondata.data to the terminal, the content is:

��<span style="font-size: 130%;"><span style="color: red;"><span
style="color: black;">'DE4GH1 H'D1'&9 AI 'D'5/'1
'D,/J/ *E '5D'- (96 'D9DD H*E
9ED</span></span></span>

where the first two characters are the bytes 255 and 254 (0xFF 0xFE).

When I write it to the text buffer, I see:
'DE4GH1 H'D1'&9 AI 'D'5/'1 'D,/J/ *E '5D'- (96 'D9DD H*E 9ED

instead of:
المشهور والرائع فى الاصدار الجديد تم اصلاح بعض العلل وتم عمل

I checked the source code of the website (http://ubuntu-c.blogspot.com/) and
the encoding is UTF-8, so I have no idea how to proceed.
If anybody can give me a clue, please help.
Regards,
Giuseppe.
___
pygtk mailing list   pygtk@daa.com.au
http://www.daa.com.au/mailman/listinfo/pygtk
Read the PyGTK FAQ: http://faq.pygtk.org/

Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Tomeu Vizoso
On Wed, Jan 12, 2011 at 13:34, Giuseppe Penone gius...@gmail.com wrote:
 Hi,
 I have a problem with gtk.clipboard() and gtk.selectiondata() in my open
 source app http://giuspen.com/cherrytree
 when pasting clipboard content from arabic website taking data from the
 target text/html.

Which browser is it? I think Mozilla used to put UTF-16 in text/html.
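Tomeu's guess also explains the exact garbage. Arabic letters sit in the Unicode block U+0600 to U+06FF, so in UTF-16LE each one is a printable ASCII byte followed by 0x06, a control byte that a terminal does not show (the spaces get a 0x00 byte instead). The Python 3 sketch below involves no GTK at all; it simply re-encodes the expected sentence from the first message as UTF-16LE and keeps only the low bytes, which yields the same 'DE4GH1 ... 9ED string:

```python
# Expected text from the first message of the thread.
expected = "المشهور والرائع فى الاصدار الجديد تم اصلاح بعض العلل وتم عمل"

# UTF-16LE stores each character as two bytes, low byte first.
raw = expected.encode("utf-16-le")

low_bytes = raw[0::2]   # low byte of every code unit
high_bytes = raw[1::2]  # high byte: 0x06 for the Arabic letters, 0x00 for spaces

# U+0627 ALEF -> 0x27 "'", U+0644 LAM -> 0x44 "D", U+0645 MEEM -> 0x45 "E", ...
# so once the high bytes are dropped, only printable ASCII is left.
garbled = low_bytes.decode("ascii")
print(garbled)
print(sorted(set(high_bytes)))  # [0, 6]
```

The same mapping is why the HTML tags in the raw dump were still readable: an ASCII character in UTF-16LE is just its ASCII byte followed by 0x00.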

Regards,

Tomeu



Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Giuseppe Penone
Yes, it's Mozilla Firefox.
I tried with

input_string.decode('utf-16', 'ignore')

but what I get is

猼慰瑳汹㵥昢湯⵴楳敺›㌱┰∻㰾灳湡猠祴敬∽潣潬㩲爠摥∻㰾灳湡猠祴敬∽潣潬㩲戠慬正∻䐾㑅䡇‱❈ㅄ☧‹䥁✠❄⼵ㄧ✠ⱄ䨯
䔪✠䐵ⴧ⠠㘹✠㥄䑄䠠䔪㤠䑅⼼灳湡㰾猯慰㹮⼼灳湡
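The CJK output is itself a clue. Read as UTF-16LE code units, 猼 (U+733C) is the byte pair 0x3C 0x73, i.e. "<s", and 慰 (U+6170) is 0x70 0x61, i.e. "pa": the ASCII of `<span` with no NUL bytes in between. That suggests (an assumption, since the code that built input_string is not shown) that the NUL bytes had already been removed before decoding, so every two ASCII characters fused into one code unit. This Python 3 sketch contrasts the two orders; the sample HTML string is made up for illustration:

```python
# Hypothetical sample mimicking the text/html payload described in the
# thread: a little-endian BOM (0xFF 0xFE) followed by UTF-16LE text.
html = '<span style="font-size: 130%;">المشهور</span>'
raw = b"\xff\xfe" + html.encode("utf-16-le")

# Right order: decode the raw bytes; the "utf-16" codec consumes the BOM.
good = raw.decode("utf-16")

# Wrong order: strip the NUL bytes first, then decode. Pairs of ASCII
# bytes now fuse into single code units, mostly CJK ideographs.
bad = raw.replace(b"\x00", b"").decode("utf-16", "ignore")

print(good == html)                                # True
print(f"U+{ord(bad[0]):04X} U+{ord(bad[1]):04X}")  # U+733C U+6170
```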

Regards,
Giuseppe.



Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Dieter Verfaillie
On 12/01/2011 16:03, Giuseppe Penone wrote:
 Yes it's mozilla firefox.

Never had a need for it myself, but I've stumbled over this a couple
of times: the tuple returned by clipboard.wait_for_targets() seems to
contain a hint to the encoding of the data, at least when you copy
from a Mozilla application (Firefox, Thunderbird, maybe others).

hth,
Dieter

Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Giuseppe Penone
Yes, I was also thinking that, since the first two bytes (\xff and \xfe) are
not valid text. The problem is that I cannot find a reference explaining which
encoding those bytes indicate.
The HTML of the web page declares UTF-8, but Firefox probably uses a different
encoding when it fills the clipboard.

Regards,
Giuseppe.



Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Tomeu Vizoso
On Wed, Jan 12, 2011 at 16:03, Giuseppe Penone gius...@gmail.com wrote:
 Yes it's mozilla firefox.
 I tried with

 input_string.decode('utf-16', 'ignore')

Reading this code helped me fix interoperability between Sugar and
Mozilla, but that was a long time ago and I don't remember the details:

http://mxr.mozilla.org/mozilla-central/source/widget/src/gtk2/nsClipboard.cpp#707

HTH,

Tomeu


Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Dieter Verfaillie
On 12/01/2011 16:24, Giuseppe Penone wrote:
 Yes I also was thinking that, being the first two chars not valid (\xff and
 \xfe)

That would be the BOM (Byte Order Mark)...

 , the problem is that I cannot find a reference to understand what is
 the encoding according to those chars.

... for UTF-16LE, i.e. little-endian UTF-16 (Python's 'utf-16' codec reads
the BOM and picks the byte order itself). You'll also want to be careful
about NULL characters.
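Dieter's two points (the BOM and the NUL characters) can be folded into one decoding step. A minimal Python 3 sketch, assuming that a leading UTF-16 BOM means UTF-16 and that anything else is UTF-8; the helper name decode_clipboard_html is hypothetical, while the BOM constants come from the standard codecs module:

```python
import codecs


def decode_clipboard_html(data: bytes) -> str:
    """Decode a text/html clipboard payload (hypothetical helper).

    Assumption: a UTF-16 BOM means UTF-16; otherwise the data is UTF-8.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # b"\xff\xfe" or b"\xfe\xff": the "utf-16" codec reads the BOM,
        # picks the byte order and drops the BOM from the result.
        text = data.decode("utf-16")
    else:
        # "utf-8-sig" strips a UTF-8 BOM if present; bad bytes become U+FFFD.
        text = data.decode("utf-8-sig", "replace")
    # Remove NUL characters (e.g. a terminating NUL, if any) *after*
    # decoding; removing 0x00 bytes beforehand would corrupt UTF-16 data.
    return text.replace("\x00", "")


# Usage with sample data: a UTF-16LE payload carrying a trailing NUL.
payload = codecs.BOM_UTF16_LE + "<b>عمل</b>\x00".encode("utf-16-le")
print(decode_clipboard_html(payload) == "<b>عمل</b>")  # True
```

Checking the BOM rather than always calling decode('utf-16') keeps pastes from applications that use plain UTF-8 for text/html working as well.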

The attached fragment accepts HTML pastes from Firefox/Thunderbird
and correctly shows the Arabic fragment from your original message
when copied from Thunderbird.

Hey, it even honors RTL, which is kinda neat :)

Regards,
Dieter
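import gtk


def on_paste(textview, clipboard):
    # Take over the paste only when the clipboard offers HTML; otherwise
    # let the default GtkTextView handler paste plain text as usual.
    targets = clipboard.wait_for_targets() or ()
    if 'text/html' in targets:
        textview.stop_emission('paste-clipboard')
        clipboard.request_contents('text/html', paste_html,
                                   textview.get_buffer())


def paste_html(clipboard, selectiondata, textbuffer):
    data = selectiondata.data
    if not data:
        return
    # Mozilla applications put text/html on the clipboard as UTF-16 with a
    # BOM (0xFF 0xFE); the 'utf_16' codec consumes the BOM. Data without a
    # BOM is assumed to be UTF-8.
    if data[:2] in ('\xff\xfe', '\xfe\xff'):
        text = data.decode('utf_16')
    else:
        text = data.decode('utf_8', 'replace')
    # Drop stray NUL characters after decoding, never before.
    textbuffer.insert_at_cursor(text.replace(u'\x00', u''))


if __name__ == '__main__':
    clipboard = gtk.clipboard_get()

    window = gtk.Window()
    window.connect('delete-event', gtk.main_quit)
    textbuffer = gtk.TextBuffer()
    textview = gtk.TextView(textbuffer)
    textview.connect('paste-clipboard', on_paste, clipboard)
    window.add(textview)
    window.show_all()

    gtk.main()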

Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Giuseppe Penone
Great help, thank you very much.
Regards,
Giuseppe.



Re: [pygtk] problem pasting clipboard content from arabic website (target text/html)

2011-01-12 Thread Giuseppe Penone
It helped, thank you.

The first two bytes being 0xFF 0xFE or 0xFE 0xFF (the BOM U+FEFF in
little- or big-endian order) indicate UTF-16.

regards,
Giuseppe.

