Re: [pygtk] utf8 validating string

2007-12-03 Thread Yann Leboulanger
John Ehresman wrote:
> Yann Leboulanger wrote:
>> John Ehresman wrote:
>>> I'm confused here; I think your last example passes '\x0' to a gtk
>>> function which does not work.  Either remove the '\x0' or do something
>>> else with \x0 here.  Or am I missing something?
>>>
>>
>> removing the \x0 isn't a problem, a replace can do that, but is it the
>> only char that will cause this problem?
> 
> Yes as long as the rest is valid utf8.  \x0 is a problem because it
> terminates C strings so you can never have a C string with a \x0 in it
> (it's not quite that simple, but if you don't know C it's probably close
> enough).  Python strings can contain \x0 so there's a problem when
> passing the length to the conversion function.
> 
> Cheers,
> 
> John
> 

ok great, thanks, python's greater than C ;)
Ok ok I go out ->[] ;)
-- 
Yann
___
pygtk mailing list   pygtk@daa.com.au
http://www.daa.com.au/mailman/listinfo/pygtk
Read the PyGTK FAQ: http://www.async.com.br/faq/pygtk/

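John's point about C strings can be seen without GTK. The sketch below is plain Python with no PyGTK needed. `c_string_view` is a hypothetical helper, not a GTK or PyGTK API. It shows that a Python byte string keeps an embedded NUL and counts it in its length, while a reader that treats the same buffer as a NUL-terminated C string stops at the first `\x00`. GLib's reference documentation also notes that `g_utf8_validate()` returns FALSE when it is given a positive length and any of those bytes is NUL. That would explain the assertion in this thread if the binding passes the full Python length.

```python
def c_string_view(data):
    """Return the bytes a C function would see if it read `data` as a
    NUL-terminated string (hypothetical helper, for illustration only)."""
    # Everything from the first NUL byte onward is invisible to strlen().
    return data.split(b'\x00', 1)[0]


t = b"Let's check this out.\x00"

# Python keeps the trailing NUL and counts it in the length (prints 22) ...
print(len(t))
# ... but a C-style reader stops just before it (prints 21).
print(len(c_string_view(t)))
```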

Re: [pygtk] utf8 validating string

2007-12-03 Thread Yann Leboulanger
John Ehresman wrote:
> I'm confused here; I think your last example passes '\x0' to a gtk
> function which does not work.  Either remove the '\x0' or do something
> else with \x0 here.  Or am I missing something?
> 

removing the \x0 isn't a problem, a replace can do that, but is it the
only char that will cause this problem?

-- 
Yann

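On whether NUL is the only troublemaker: for input that is otherwise valid UTF-8 it is, but bytes that are not valid UTF-8 at all fail too. Examples include Latin-1 text or a multi-byte sequence cut short. The minimal sketch below assumes Python's strict UTF-8 decoder is a close enough stand-in for GLib's validator. `first_bad_offset` is a hypothetical name. It reports the first offending byte offset, roughly like the `end` out-parameter of `g_utf8_validate()`.

```python
def first_bad_offset(data):
    """Return the offset of the first byte that would make a GLib-style
    UTF-8 check fail, or None if `data` looks valid.

    Hypothetical helper: approximates g_utf8_validate() by combining
    Python's strict UTF-8 decoder with an explicit NUL check.
    """
    candidates = []
    nul = data.find(b'\x00')
    if nul != -1:
        candidates.append(nul)
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the decoder rejected.
        candidates.append(exc.start)
    return min(candidates) if candidates else None
```

Edge cases such as UTF-8-encoded surrogate code points may be judged differently depending on the Python version. A `None` result is therefore a strong hint rather than a guarantee.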

Re: [pygtk] utf8 validating string

2007-12-03 Thread Yann Leboulanger
John Ehresman wrote:
> Yann Leboulanger wrote:
>> I'd like not to have it. But I get this string by gpg-decoding a message
>> sent by Miranda IM. I think it's a bug in their GnuPG implementation,
>> but anyway I'd like my client to detect those bad strings and a) print the
>> message correctly if I can or b) not traceback and print a warning
>> message instead. But for that I need a function that tells me that
>> g_utf8_validate will fail ...
> 
> You probably should explicitly decide how to handle \0.  If it's always
> at the end, it's probably just a simple bug and can be chopped off but
> it may be something more if valid text follows the \0.
> 
> But in general, I think this'll work:
> 
> def valid_glib_utf8(s):
>   try:
>     unicode(s, 'utf-8')
>   except Exception:
>     return False
>   else:
>     return '\x00' not in s  # GLib also rejects embedded NULs
> 
> In case you need it, s.replace('\x00', '') will remove the \0's.
> 
> Cheers,
> 
> John
> 

That doesn't work:
>>> import gtk
>>> tv = gtk.TextView()
>>> b = tv.get_buffer()
>>> t = "test\x00"
>>> u = unicode(t, 'utf-8')
>>> b.set_text(t)
__main__:1: GtkWarning: gtk_text_buffer_emit_insert: assertion
`g_utf8_validate (text, len, NULL)' failed

it's the same if I try with the unicode:
>>> import gtk
>>> tv = gtk.TextView()
>>> b = tv.get_buffer()
>>> t = "test\x00"
>>> u = unicode(t, 'utf-8')
>>> b.set_text(u)
__main__:1: GtkWarning: gtk_text_buffer_emit_insert: assertion
`g_utf8_validate (text, len, NULL)' failed

-- 
Yann

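Yann's second session suggests that decoding to unicode first does not help. The decoded text still contains U+0000, presumably because it is converted back to UTF-8 with the NUL intact before GTK sees it. A check that accepts either a byte string or a unicode object therefore has to look for the NUL after decoding as well. The sketch below follows the same idea as John's `valid_glib_utf8`, under the hypothetical name `valid_glib_text`. It assumes it only needs to run on Python 2.6+ or Python 3.

```python
def valid_glib_text(value):
    """Return True if `value` (bytes or unicode) looks safe to pass to
    GTK text APIs. Hypothetical helper: strict UTF-8 plus no NUL."""
    if isinstance(value, bytes):
        try:
            value = value.decode('utf-8')
        except UnicodeDecodeError:
            return False  # not UTF-8 at all
    # Decoding keeps U+0000, so the NUL test is done on the text itself.
    return u'\x00' not in value
```

For the `"test\x00"` strings above, both the byte string and its decoded unicode form are rejected, while a plain `"test"` passes.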

Re: [pygtk] utf8 validating string

2007-11-30 Thread John Ehresman

Yann Leboulanger wrote:
> >>> import gtk
> >>> tv = gtk.TextView()
> >>> b = tv.get_buffer()
> >>> t = "Let's check this out.\x00"
> >>> u = unicode(t, 'utf-8')
> >>> b.set_text(t)
> __main__:1: GtkWarning: gtk_text_buffer_emit_insert: assertion
> `g_utf8_validate (text, len, NULL)' failed
>
> but b.set_text(u) works ... is it the way to go?


Your mistake might be the final '\x00'.  Is there a reason you're 
including it?  Python handles \x00 in strings, but gtk (& most C libs) 
probably doesn't.


Cheers,

John

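John also suggested deciding explicitly how to handle `\0`. One way to do that: a trailing run of NULs, as in `"Let's check this out.\x00"`, is most likely a stray terminator and is simply chopped off. A NUL with more data after it is still removed but flagged, since it may mean something else went wrong upstream. `strip_nuls` is a hypothetical helper, and this policy is only one reasonable choice.

```python
def strip_nuls(data):
    """Remove NUL bytes from `data` before handing it to GTK.

    Hypothetical helper. Returns (cleaned, suspicious), where `suspicious`
    is True when a NUL was followed by other bytes, which may mean more
    than a stray terminator.
    """
    trimmed = data.rstrip(b'\x00')      # drop trailing terminators
    suspicious = b'\x00' in trimmed     # any NUL left sits mid-data
    return trimmed.replace(b'\x00', b''), suspicious
```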

Re: [pygtk] utf8 validating string

2007-11-30 Thread John Ehresman

Yann Leboulanger wrote:
> Hi,
>
> I have a string that a textview can't display. It contains invalid chars:
>
> >>> t = "Let's check this out.\x00"
> >>> import gtk
> >>> tv = gtk.TextView()
> >>> b = tv.get_buffer()
> >>> b.set_text(t)
> __main__:1: GtkWarning: gtk_text_buffer_emit_insert: assertion
> `g_utf8_validate (text, len, NULL)' failed
>
> but when I do that I have no problem:
> >>> t.decode('utf-8')
> u"Let's check this out.\x00"


try:
  u = unicode(t, 'utf-8')
except Exception:
  print 'not utf8'
else:
  b.set_text(t)

John

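Yann's two goals can be combined with the try/except shape John used: display the message if possible, otherwise warn instead of raising a traceback. The minimal sketch below assumes U+FFFD replacement characters are an acceptable way to show undecodable bytes. It also assumes the standard `logging` module stands in for however the client reports warnings. The logger name and the `to_display_text` helper are hypothetical.

```python
import logging

log = logging.getLogger('example.display')  # hypothetical logger name


def to_display_text(data):
    """Turn received bytes into text without NULs or undecodable bytes.

    Hypothetical helper: strict decoding is tried first; on failure the
    bad bytes become U+FFFD and a warning is logged instead of raising.
    NUL characters are dropped in either case.
    """
    try:
        text = data.decode('utf-8')
    except UnicodeDecodeError as exc:
        log.warning('invalid UTF-8 at byte %d, replacing', exc.start)
        text = data.decode('utf-8', 'replace')
    if u'\x00' in text:
        log.warning('dropping NUL characters')
        text = text.replace(u'\x00', u'')
    return text
```

The returned text avoids both failure causes discussed in this thread: invalid UTF-8 and embedded NULs.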

Re: [pygtk] utf8 validating string

2007-11-30 Thread Yann Leboulanger
John Ehresman wrote:
> Yann Leboulanger wrote:
>> Hi,
>>
>> I have a string that a textview can't display. It contains invalid chars:
>>
>> >>> t = "Let's check this out.\x00"
>> >>> import gtk
>> >>> tv = gtk.TextView()
>> >>> b = tv.get_buffer()
>> >>> b.set_text(t)
>> __main__:1: GtkWarning: gtk_text_buffer_emit_insert: assertion
>> `g_utf8_validate (text, len, NULL)' failed
>>
>> but when I do that I have no problem:
>> >>> t.decode('utf-8')
>> u"Let's check this out.\x00"
> 
> try:
>   u = unicode(t, 'utf-8')
> except Exception:
>   print 'not utf8'
> else:
>   b.set_text(t)
> 
> John
> 


>>> import gtk
>>> tv = gtk.TextView()
>>> b = tv.get_buffer()
>>> t = "Let's check this out.\x00"
>>> u = unicode(t, 'utf-8')
>>> b.set_text(t)
__main__:1: GtkWarning: gtk_text_buffer_emit_insert: assertion
`g_utf8_validate (text, len, NULL)' failed

but b.set_text(u) works ... is it the way to go?


[pygtk] utf8 validating string

2007-11-30 Thread Yann Leboulanger
Hi,

I have a string that a textview can't display. It contains invalid chars:

>>> t = "Let's check this out.\x00"
>>> import gtk
>>> tv = gtk.TextView()
>>> b = tv.get_buffer()
>>> b.set_text(t)
__main__:1: GtkWarning: gtk_text_buffer_emit_insert: assertion
`g_utf8_validate (text, len, NULL)' failed

but when I do that I have no problem:
>>> t.decode('utf-8')
u"Let's check this out.\x00"

so what could I do to validate the string before sending it to GTK?