Hey Brian, I am dealing with a similar situation in my app. I
would love it if someone offered a good general solution for handling
unexpected non-UTF-8 data in a string. There must be a way,
because my web browsers can display the data without crashing! =)
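
For what it's worth, browsers usually get away with this by substituting
the Unicode replacement character U+FFFD for any bytes that aren't valid
in the expected encoding. Python's codecs can do the same thing with the
"replace" error handler. A minimal sketch, written for Python 3; to_text
is just a hypothetical helper name, not an App Engine API:

```python
def to_text(data, encoding="utf-8"):
    """Decode bytes on a best-effort basis, roughly like a browser does.

    Any byte sequence that is not valid in `encoding` becomes U+FFFD
    (the Unicode replacement character) instead of raising
    UnicodeDecodeError.
    """
    if isinstance(data, str):
        return data  # already text; nothing to decode
    return data.decode(encoding, errors="replace")


# ascii() keeps the printed output plain ASCII regardless of terminal.
print(ascii(to_text(b"caf\xc3\xa9")))  # valid UTF-8 -> 'caf\xe9'
print(ascii(to_text(b"caf\xe9")))      # stray byte  -> 'caf\ufffd'
```

You lose the bad characters, but you never lose the whole record.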

The admittedly poor workaround I have been using is a
"CheckStringProperty" class that I use in place of Google's
db.StringProperty:

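import logging

from google.appengine.ext import db


class CheckStringProperty(db.StringProperty):
    def validate(self, value):
        # Only byte strings need decoding; unicode and None pass through.
        if isinstance(value, str):
            try:
                # Decode as UTF-8 explicitly: unicode(value) would use the
                # default ASCII codec and reject valid UTF-8 as well.
                value = value.decode('utf-8')
            except UnicodeDecodeError:
                # Not valid UTF-8: store a backslash-escaped copy instead.
                logging.warning("Encoding bad string with escapes: %r", value)
                value = unicode(value.encode('string_escape'))

        # super(CheckStringProperty, ...) keeps StringProperty's own checks.
        return super(CheckStringProperty, self).validate(value)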
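
The same escaping idea without the datastore dependency, as a Python 3
sketch. Python 3 dropped the string_escape codec, so this assumes the
"backslashreplace" error handler for decoding (available since Python
3.5) as the closest substitute; clean_string is a hypothetical name.
Unlike escaping the whole string, it only escapes the bytes that aren't
valid UTF-8:

```python
import logging


def clean_string(value):
    """Return `value` as text, escaping bytes that are not valid UTF-8.

    Valid UTF-8 decodes normally; each invalid byte becomes a literal
    backslash escape such as \\xe9, so nothing is lost and nothing raises.
    """
    if not isinstance(value, bytes):
        return value  # str or None: leave it for normal validation
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        logging.warning("Encoding bad string with escapes: %r", value)
        # Python 3.5+: escape only the offending bytes.
        return value.decode("utf-8", errors="backslashreplace")
```

For example, clean_string(b"caf\xe9") returns the text caf\xe9 with a
literal backslash, while properly encoded UTF-8 input decodes unchanged.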

Lenza
blog.lenza.org

On Feb 25, 1:17 pm, Brian <bsmcconn...@gmail.com> wrote:
> Hi. I have observed a sporadic Unicode-related bug that appears to be
> browser-specific. It causes db.put() to fail. I am not doing
> anything unusual with the incoming text except putting it in a
> variable and then storing it in a record. I have determined that this
> issue is browser-specific, and probably also related to the user's
> configuration for language preferences, etc.
>
> I have run out of ideas for tracking this down. My understanding of
> the CGI interface is that it is supposed to force everything to UTF-8
> by default. It would be nice to be able to set some global options to
> manage encodings on incoming queries. I suspect that the text is being
> sent in an encoding other than UTF-8, but Python treats it as if it
> were UTF-8.
>
> Unfortunately, Python crashes in a situation like this. It would
> be far better if the CGI interface made a best effort to deal
> with the incoming text, inserting garbage characters where necessary.
> That is "less damaging" than an outright failure, as most people can
> live with an occasional [] in place of a tilde, etc.
>
> On this subject, there really needs to be better documentation on
> Unicode, encoding conversions, etc. It is poorly documented in Python
> as well, and I am sure a lot of people are making the same mistakes
> trying to figure out what works and what breaks.
>
> Thanks,
>
> Brian McConnell
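
If Brian's suspicion is right and some browsers submit form data in a
legacy 8-bit encoding (which one depends on the user's locale and
language settings), another option is to try UTF-8 first and fall back
to other encodings. A Python 3 sketch; decode_with_fallback is a
hypothetical helper, and the candidate list (Windows-1252, then
Latin-1) is an assumption you would tune to your users:

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Decode bytes with the first encoding in `encodings` that succeeds.

    Returns (text, encoding_used). UTF-8 goes first because accented text
    in a legacy 8-bit encoding is usually not valid UTF-8, so a wrong guess
    tends to fail loudly. latin-1 maps every byte to a character, so it
    never fails and serves as the last resort.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reached if `encodings` omits a catch-all such as latin-1.
    return data.decode("utf-8", errors="replace"), "utf-8 (replaced)"
```

Returning the encoding that worked makes it easy to log which browsers
are sending what, which may help track down the original bug.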
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---
