Hi Karen,

thanks for investigating...
I solved the problem. There were two causes:
- the PHP code was not passing a correctly encoded POST
- urllib.unquote_plus was not working as expected
and, last but not least, raw_post_data is not decoded while a POST var is... (my blindness)

Thanks again for your help.

--
Hinnack

2009/11/26 Karen Tracey <kmtra...@gmail.com>

> On Thu, Nov 26, 2009 at 7:03 AM, Hinnack <henrik.gens...@googlemail.com> wrote:
>
>> Hi Karen,
>>
>> thanks again for your reply.
>> I use Aptana with pydev extension.
>> Debugging the app shows the following for search:
>> dict: {u'caption': u'f\\xfcr', u'showold': False}
>>
>
> That's confusing to me, because other than having an extra \ (which could
> be an artifact of how it's being displayed), that looks like a
> correctly-built unicode object für.
>
>> and for qs:
>> str: für
>> although it seems to be � instead of ASCII 252 - but this could be,
>> because I am sitting on a MAC
>> while debugging.
>>
>
> Using python manage.py shell might shed more light, I fear the tool here is
> assuming an incorrect bytestring encoding and getting in the way.
>
> I cannot recreate anything like what you are seeing. I have a model Thing
> stored in a MySQL DB (using a utf-8 encoded table) with CharField name.
> There are two instances of this Thing in the DB that contain für in the
> name. From a python manage.py shell, using Django 1.1.1:
>
> >>> from ttt.models import Thing
> >>> import django
> >>> django.get_version()
> '1.1.1'
> >>> ufur = u'f\u00fcr'
> >>> print ufur
> für
> >>> ufur
> u'f\xfcr'
> >>> ufur.encode('utf-8')
> 'f\xc3\xbcr'
> >>> ufur.encode('iso-8859-1')
> 'f\xfcr'
>
> Small u with umlaut is U+00FC; encoded in utf-8 it takes the 2 bytes C3 BC,
> and encoded in iso-8859-1 it is the 1 byte FC.
>
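
For readers following along on Python 3, here is a minimal sketch of the byte facts just described, and of how the same two forms look once they are percent-encoded in a POST body. It is not part of the original session: it assumes Python 3 (where the function lives at urllib.parse.unquote_plus rather than urllib.unquote_plus), the variable names are illustrative, and the two percent-encoded strings are hypothetical examples of what a client could send, not the actual data from this thread.

```python
from urllib.parse import unquote_plus

# Assumption: Python 3, where str is Unicode and encode() returns bytes.
ufur = "f\u00fcr"  # 'für'; U+00FC is small u with umlaut

utf8_bytes = ufur.encode("utf-8")         # ü becomes the two bytes C3 BC
latin1_bytes = ufur.encode("iso-8859-1")  # ü becomes the single byte FC
print(utf8_bytes.hex(), latin1_bytes.hex())

# Hypothetical percent-encoded forms of the same word in a request body.
utf8_form = "f%C3%BCr"   # client encoded the text as UTF-8
latin1_form = "f%FCr"    # client encoded the text as ISO-8859-1

# unquote_plus defaults to encoding='utf-8', errors='replace' in Python 3.
decoded_ok = unquote_plus(utf8_form)
# The lone FC byte is not valid UTF-8, so it is replaced with U+FFFD.
decoded_lossy = unquote_plus(latin1_form)
# Naming the real encoding recovers the text.
decoded_latin1 = unquote_plus(latin1_form, encoding="iso-8859-1")

print(repr(decoded_ok), repr(decoded_lossy), repr(decoded_latin1))
```

The point of the sketch is only that the decoder has to be told (or has to correctly assume) which encoding the sender used; whether that matches the exact failure seen with the PHP client is not something the thread states.
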
> Filtering with icontains, using either the Unicode object or the utf-8
> encoded bytestring version, works properly:
>
> >>> Thing.objects.filter(name__icontains=ufur)
> [<Thing: für inserted as unicode>, <Thing: für inserted as utf8 bytestring>]
> >>> Thing.objects.filter(name__icontains=ufur.encode('utf-8'))
> [<Thing: für inserted as unicode>, <Thing: für inserted as utf8 bytestring>]
>
> Attempting to filter with an iso-8859-1 encoded bytestring raises an error:
>
> >>> Thing.objects.filter(name__icontains=ufur.encode('iso-8859-1'))
> Traceback (most recent call last):
>   File "<console>", line 1, in <module>
>   File "/usr/lib/python2.5/site-packages/django/db/models/manager.py", line 129, in filter
>     return self.get_query_set().filter(*args, **kwargs)
>   File "/usr/lib/python2.5/site-packages/django/db/models/query.py", line 498, in filter
>     return self._filter_or_exclude(False, *args, **kwargs)
>   File "/usr/lib/python2.5/site-packages/django/db/models/query.py", line 516, in _filter_or_exclude
>     clone.query.add_q(Q(*args, **kwargs))
>   File "/usr/lib/python2.5/site-packages/django/db/models/sql/query.py", line 1675, in add_q
>     can_reuse=used_aliases)
>   File "/usr/lib/python2.5/site-packages/django/db/models/sql/query.py", line 1614, in add_filter
>     connector)
>   File "/usr/lib/python2.5/site-packages/django/db/models/sql/where.py", line 56, in add
>     obj, params = obj.process(lookup_type, value)
>   File "/usr/lib/python2.5/site-packages/django/db/models/sql/where.py", line 269, in process
>     params = self.field.get_db_prep_lookup(lookup_type, value)
>   File "/usr/lib/python2.5/site-packages/django/db/models/fields/__init__.py", line 214, in get_db_prep_lookup
>     return ["%%%s%%" % connection.ops.prep_for_like_query(value)]
>   File "/usr/lib/python2.5/site-packages/django/db/backends/__init__.py", line 364, in prep_for_like_query
>     return smart_unicode(x).replace("\\", "\\\\").replace("%", "\%").replace("_", "\_")
>   File "/usr/lib/python2.5/site-packages/django/utils/encoding.py", line 44, in smart_unicode
>     return force_unicode(s, encoding, strings_only, errors)
>   File "/usr/lib/python2.5/site-packages/django/utils/encoding.py", line 92, in force_unicode
>     raise DjangoUnicodeDecodeError(s, *e.args)
> DjangoUnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2:
> unexpected end of data. You passed in 'f\xfcr' (<type 'str'>)
>
> This is because Django assumes the bytestring is utf-8 encoded, and runs
> into trouble attempting to convert to unicode specifying utf-8 as the
> string's encoding, since it is not valid utf-8 data.
>
> The only way I have been able to recreate anything like what you are
> describing is to incorrectly construct the original unicode object from a
> utf-8 bytestring assuming an iso-8859-1 encoding:
>
> >>> badufur = ufur.encode('utf-8').decode('iso-8859-1')
> >>> badufur
> u'f\xc3\xbcr'
> >>> print badufur
> für
> >>> print badufur.encode('utf-8')
> für
> >>> print badufur.encode('iso-8859-1')
> für
>
> Using that unicode object doesn't produce any hits in the DB:
>
> >>> Thing.objects.filter(name__icontains=badufur)
> []
>
> But encoding it to iso-8859-1 does, because that has the effect of
> restoring the original utf-8 bytestring:
>
> >>> Thing.objects.filter(name__icontains=badufur.encode('iso-8859-1'))
> [<Thing: für inserted as unicode>, <Thing: für inserted as utf8 bytestring>]
>
> However, the debug info you show above doesn't show an incorrectly-built
> unicode object, so I'm very confused by it.
>
> Karen
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To post to this group, send email to django-us...@googlegroups.com.
> To unsubscribe from this group, send email to
> django-users+unsubscr...@googlegroups.com<django-users%2bunsubscr...@googlegroups.com>.
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.
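
The two failure modes walked through in the quoted message (a strict utf-8 decode rejecting iso-8859-1 bytes, and a wrong-codec decode silently producing a damaged unicode object) can be reproduced without a database. What follows is a minimal sketch assuming Python 3, not the Python 2.5 and Django 1.1.1 setup used above; it reuses the names ufur and badufur from the quoted session, and the other names are illustrative.

```python
# Assumption: Python 3; no Django or database involved.
ufur = "f\u00fcr"  # 'für'

# 1. iso-8859-1 bytes are not valid utf-8, so a strict decode raises,
#    which is the same kind of failure the traceback above reports.
latin1_bytes = ufur.encode("iso-8859-1")  # b'f\xfcr'
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("strict utf-8 decode failed:", exc.reason)

# 2. Decoding utf-8 bytes as iso-8859-1 never raises, because every byte
#    maps to some character, but the result is the wrong text.
badufur = ufur.encode("utf-8").decode("iso-8859-1")
print(repr(badufur))

# 3. Encoding the damaged text back to iso-8859-1 restores the original
#    utf-8 bytes, which then decode correctly.
restored_bytes = badufur.encode("iso-8859-1")
repaired = restored_bytes.decode("utf-8")
print(repr(repaired))
```

Step 3 is why the last filter call in the quoted session found its rows: the iso-8859-1 encoding of the damaged object is byte-for-byte the correct utf-8 encoding of the original. The round trip is only lossless when the wrong codec was iso-8859-1, which maps all 256 byte values; it should not be assumed to work for other codec mix-ups.
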