[google-appengine] Re: Sporadic Unicode error in POST queries

Jarek Zgoda Wed, 25 Feb 2009 14:56:01 -0800

This will not work as expected, for many reasons.

The line "value = unicode(value)" will break with UnicodeEncodeError
(not Decode) if the value is a unicode object that contains
characters outside the ASCII range.
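For readers on Python 3, where Python 2's unicode and str types became str and bytes, the Encode/Decode distinction still holds. Here is a minimal illustration; which_error and the sample strings are made up for the example. Encoding text that contains a non-ASCII character with the ASCII codec raises UnicodeEncodeError. Decoding non-ASCII bytes with the ASCII codec raises UnicodeDecodeError.

```python
def which_error(action):
    """Run action() and return the name of the Unicode exception it raises, or None."""
    try:
        action()
    except UnicodeError as exc:  # common base of UnicodeEncodeError and UnicodeDecodeError
        return type(exc).__name__
    return None


text = "caf\u00e9"            # text (a unicode object in Python 2 terms)
raw = text.encode("utf-8")    # bytes: b'caf\xc3\xa9'

print(which_error(lambda: text.encode("ascii")))  # UnicodeEncodeError: text -> bytes failed
print(which_error(lambda: raw.decode("ascii")))   # UnicodeDecodeError: bytes -> text failed
print(which_error(lambda: raw.decode("utf-8")))   # None: the right codec works
```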


The line "unicode(str(value).encode('string_escape'))" can be
translated into plain English as "encode the value twice, then decode".
str() produces a byte string. Then you call encode() on that byte
string, but encode() is meant for unicode objects, not byte strings.
Then you call "naive" unicode() without specifying an encoding, so
ASCII will be used.
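If the intent of that line is only to keep undecodable bytes visible instead of failing, a single decode that names its encoding can do the job. Below is a Python 3 sketch, assuming the incoming bytes are supposed to be UTF-8. Python 2's string_escape codec was removed in Python 3. The name escape_undecodable is a hypothetical helper, not an App Engine API.

```python
def escape_undecodable(raw: bytes) -> str:
    """Decode raw as UTF-8; each byte that is not valid UTF-8 is kept as a
    visible backslash escape (for example \\xe9) instead of raising an error."""
    # errors="backslashreplace" is accepted by bytes.decode() on Python 3.5+.
    return raw.decode("utf-8", errors="backslashreplace")


print(escape_undecodable(b"caf\xc3\xa9") == "caf\u00e9")  # True: valid UTF-8 decodes normally
print(escape_undecodable(b"caf\xe9"))  # caf\xe9: the stray byte is escaped, not lost
```

Note that the escaped result is still a lossy stand-in. The original bytes can only be recovered by reversing the escapes.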

See http://effbot.org/zone/unicode-objects.htm for a brief explanation
of what a byte string is and what a unicode object is.
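Here is a short Python 3 illustration of the difference; the sample word is arbitrary. The same text has a different length as a byte string. Decoding bytes with the wrong encoding silently produces garbled text (mojibake) rather than an error, which is one way bad data can display without anything failing.

```python
text = "caf\u00e9"             # text: 4 characters (a unicode object in Python 2)
data = text.encode("utf-8")    # bytes: 5 bytes, because U+00E9 needs two bytes in UTF-8

print(len(text), len(data))    # 4 5
print(data)                    # b'caf\xc3\xa9'
print(data.decode("utf-8") == text)    # True: same encoding, clean round trip
print(ascii(data.decode("latin-1")))   # 'caf\xc3\xa9', i.e. "cafÃ©": wrong encoding, no exception
```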

On 25 Feb, 23:35, lenza <le...@lenza.org> wrote:
> Hey Brian, I am trying to deal with a similar situation in my app.  I
> would love it if someone offered a good general solution for dealing
> with unexpected non-UTF-8 data in a string.  There must be a way,
> because my web browsers can display the data without crashing! =)
>
> The admittedly poor solution that I have been using as a workaround is
> creating a "CheckStringProperty" and using it in place of Google's
> db.StringProperty:
>
> import logging
> from google.appengine.ext import db
>
> class CheckStringProperty(db.StringProperty):
>     def validate(self, value):
>         try:
>             value = unicode(value)
>         except UnicodeDecodeError:
>             # string contains bad values
>             logging.warning("Encoding bad string with escapes: " +
>                             str([ch for ch in value]))
>             value = unicode(str(value).encode('string_escape'))
>
>         # Name this class (not db.StringProperty) in super() so that
>         # StringProperty's own validate() still runs.
>         return super(CheckStringProperty, self).validate(value)
>
> Lenza
> blog.lenza.org
>
> On Feb 25, 1:17 pm, Brian <bsmcconn...@gmail.com> wrote:
>
> > Hi. I have observed a sporadic Unicode-related bug that appears to be
> > browser-specific. It causes db.put() to fail. I am not doing
> > anything unusual with the incoming text except putting it in a
> > variable and then storing it in a record. I have determined that this
> > issue is browser-specific, and probably also related to the user's
> > language preferences and similar configuration.
>
> > I have run out of ideas for tracking this down. My understanding of
> > the CGI interface is that it is supposed to force everything to UTF-8
> > by default. It would be nice to be able to set some global options to
> > manage encodings on incoming queries. I suspect that the text is being
> > sent in some encoding other than UTF-8, but Python assumes it is
> > UTF-8.
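If that suspicion is right and some browsers send a legacy encoding, one common workaround is to try strict UTF-8 first and fall back to a guess. Below is a Python 3 sketch. The candidate list (UTF-8, then Windows-1252) and the helper name decode_with_fallback are assumptions for illustration; neither the CGI layer nor App Engine does this for you. A successful strict UTF-8 decode is strong but not conclusive evidence that the sender actually used UTF-8.

```python
from typing import Tuple

# Hypothetical order of guesses: strict UTF-8 first, then Windows-1252,
# a common legacy encoding for Western European text.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252")


def decode_with_fallback(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used) for the first candidate that decodes raw
    strictly. Fall back to Latin-1, which maps every byte to a character
    and therefore never raises."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next guess
    return raw.decode("latin-1"), "latin-1"


print(decode_with_fallback(b"caf\xc3\xa9")[1])  # utf-8
print(decode_with_fallback(b"caf\xe9")[1])      # cp1252
```

Logging which encoding was used can help track down which browsers send what.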
>
> > Unfortunately, Python crashes in a situation like this. It would
> > be far better if the CGI interface made a best effort to deal
> > with the incoming text, inserting garbage characters where necessary.
> > That is less damaging than an outright failure, as most people can
> > live with an occasional [] in place of a tilde, etc.
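The "best effort" behaviour described here is available when you decode explicitly. errors="replace" substitutes U+FFFD, the Unicode replacement character (often rendered as a box or question mark), for each invalid sequence instead of raising. Below is a Python 3 sketch, assuming the request body is available as bytes and is meant to be UTF-8. best_effort_decode is a hypothetical name.

```python
def best_effort_decode(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw, substituting U+FFFD for any byte sequence that is not
    valid in the given encoding, so decoding never raises."""
    return raw.decode(encoding, errors="replace")


print(ascii(best_effort_decode(b"caf\xe9 ok")))  # 'caf\ufffd ok': one bad byte replaced
print(best_effort_decode(b"caf\xc3\xa9") == "caf\u00e9")  # True: valid input is untouched
```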
>
> > On this subject, there really needs to be better documentation on
> > Unicode, encoding conversions, etc. It is also poorly documented in
> > Python, and I am sure a lot of people are making the same mistakes
> > trying to figure out what works and what breaks.
>
> > Thanks,
>
> > Brian McConnell
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---
