Re: german umlaute on search querys

Hinnack Sun, 29 Nov 2009 05:07:14 -0800

Hi Karen,

thanks for investigating...
I solved the problem.
There were 2 reasons:
- PHP code not passing a correctly encoded POST
- urllib.unquote_plus not working as expected


and, last but not least, that raw_post_data is not decoded whereas a POST var is...
(my blindness)
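
For anyone hitting the same thing, here is a minimal sketch of how the choice of encoding changes what a percent-encoded form value decodes to. It is written for Python 3 (`urllib.parse.unquote_plus`), not the Python 2 `urllib.unquote_plus` used in this thread, and the `caption=...` bodies and the `decode_caption` helper are made up for illustration; they are not the actual PHP or Django code.

```python
from urllib.parse import unquote_plus

# 'für' percent-encoded two ways, as it might appear in a form-encoded POST body.
# Both bodies are illustrative assumptions, not data from the real application.
utf8_body = "caption=f%C3%BCr"    # client encoded the value as UTF-8
latin1_body = "caption=f%FCr"     # client encoded the value as ISO-8859-1


def decode_caption(body, encoding):
    """Hypothetical helper: take the value after '=' and percent-decode it."""
    _, _, raw_value = body.partition("=")
    # errors="replace" substitutes U+FFFD for bytes that are invalid in `encoding`.
    return unquote_plus(raw_value, encoding=encoding, errors="replace")


right_utf8 = decode_caption(utf8_body, "utf-8")           # matching encoding
right_latin1 = decode_caption(latin1_body, "iso-8859-1")  # matching encoding
replaced = decode_caption(latin1_body, "utf-8")           # FC is not valid UTF-8
mojibake = decode_caption(utf8_body, "iso-8859-1")        # each UTF-8 byte read as a char

for label, value in [("utf-8/utf-8", right_utf8),
                     ("latin-1/latin-1", right_latin1),
                     ("latin-1 read as utf-8", replaced),
                     ("utf-8 read as latin-1", mojibake)]:
    print(label, ascii(value))
```

The third case yields the U+FFFD replacement character and the fourth yields the two-character "Ã¼" form, so both symptoms discussed in this thread can come from a mismatch between sender and receiver encodings.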

Thanks again for your help.

-- Hinnack

2009/11/26 Karen Tracey <kmtra...@gmail.com>

> On Thu, Nov 26, 2009 at 7:03 AM, Hinnack <henrik.gens...@googlemail.com> wrote:
>
>> Hi Karen,
>>
>> thanks again for your reply.
>> I use Aptana with pydev extension.
>> Debugging the app shows the following for search:
>> dict: {u'caption': u'f\\xfcr', u'showold': False}
>>
>>
> That's confusing to me, because other than having an extra \ (which could
> be an artifact of how it's being displayed), that looks like a
> correctly-built unicode object für.
>
>> and for qs:
>> str: für
>> although it seems to be U+FFFD (the replacement character) instead of
>> ASCII 252 - but this could be because I am sitting on a Mac
>> while debugging.
>>
>
> Using python manage.py shell might shed more light; I fear the tool here is
> assuming an incorrect bytestring encoding and getting in the way.
>
> I cannot recreate anything like what you are seeing.  I have a model Thing
> stored in a MySQL DB (using a utf-8 encoded table) with CharField name.
> There are two instances of this Thing in the DB that contain für in the
> name.  From a python manage.py shell, using Django 1.1.1:
>
> >>> from ttt.models import Thing
> >>> import django
> >>> django.get_version()
> '1.1.1'
> >>> ufur = u'f\u00fcr'
> >>> print ufur
> für
> >>> ufur
> u'f\xfcr'
> >>> ufur.encode('utf-8')
> 'f\xc3\xbcr'
> >>> ufur.encode('iso-8859-1')
> 'f\xfcr'
>
> Small u with umlaut is U+00FC. Encoded in utf-8 it takes the 2 bytes C3 BC;
> encoded in iso-8859-1 it is the single byte FC.
>
> Filtering with icontains, using either the Unicode object or the utf-8
> encoded bytestring version, works properly:
>
> >>> Thing.objects.filter(name__icontains=ufur)
> [<Thing: für inserted as unicode>, <Thing: für inserted as utf8
> bytestring>]
> >>> Thing.objects.filter(name__icontains=ufur.encode('utf-8'))
> [<Thing: für inserted as unicode>, <Thing: für inserted as utf8
> bytestring>]
>
> Attempting to filter with an iso-8859-1 encoded bytestring raises an error:
>
> >>> Thing.objects.filter(name__icontains=ufur.encode('iso-8859-1'))
> Traceback (most recent call last):
>   File "<console>", line 1, in <module>
>   File "/usr/lib/python2.5/site-packages/django/db/models/manager.py", line
> 129, in filter
>     return self.get_query_set().filter(*args, **kwargs)
>   File "/usr/lib/python2.5/site-packages/django/db/models/query.py", line
> 498, in filter
>     return self._filter_or_exclude(False, *args, **kwargs)
>   File "/usr/lib/python2.5/site-packages/django/db/models/query.py", line
> 516, in _filter_or_exclude
>     clone.query.add_q(Q(*args, **kwargs))
>   File "/usr/lib/python2.5/site-packages/django/db/models/sql/query.py",
> line 1675, in add_q
>     can_reuse=used_aliases)
>   File "/usr/lib/python2.5/site-packages/django/db/models/sql/query.py",
> line 1614, in add_filter
>     connector)
>   File "/usr/lib/python2.5/site-packages/django/db/models/sql/where.py",
> line 56, in add
>     obj, params = obj.process(lookup_type, value)
>   File "/usr/lib/python2.5/site-packages/django/db/models/sql/where.py",
> line 269, in process
>     params = self.field.get_db_prep_lookup(lookup_type, value)
>   File
> "/usr/lib/python2.5/site-packages/django/db/models/fields/__init__.py", line
> 214, in get_db_prep_lookup
>     return ["%%%s%%" % connection.ops.prep_for_like_query(value)]
>   File "/usr/lib/python2.5/site-packages/django/db/backends/__init__.py",
> line 364, in prep_for_like_query
>     return smart_unicode(x).replace("\\", "\\\\").replace("%",
> "\%").replace("_", "\_")
>   File "/usr/lib/python2.5/site-packages/django/utils/encoding.py", line
> 44, in smart_unicode
>     return force_unicode(s, encoding, strings_only, errors)
>   File "/usr/lib/python2.5/site-packages/django/utils/encoding.py", line
> 92, in force_unicode
>     raise DjangoUnicodeDecodeError(s, *e.args)
> DjangoUnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2:
> unexpected end of data. You passed in 'f\xfcr' (<type 'str'>)
>
> This is because Django assumes the bytestring is utf-8 encoded, and runs
> into trouble attempting to convert to unicode specifying utf-8 as the
> string's encoding, since it is not valid utf-8 data.
>
> The only way I have been able to recreate anything like what you are
> describing is to incorrectly construct the original unicode object from a
> utf-8 bytestring assuming an iso-8859-1 encoding:
>
> >>> badufur = ufur.encode('utf-8').decode('iso-8859-1')
> >>> badufur
> u'f\xc3\xbcr'
> >>> print badufur
> fÃ¼r
> >>> print badufur.encode('utf-8')
> fÃ¼r
> >>> print badufur.encode('iso-8859-1')
> für
>
> Using that unicode object doesn't produce any hits in the DB:
>
> >>> Thing.objects.filter(name__icontains=badufur)
> []
>
> But encoding it to iso-8859-1 does, because that has the effect of
> restoring the original utf-8 bytestring:
>
> >>> Thing.objects.filter(name__icontains=badufur.encode('iso-8859-1'))
> [<Thing: für inserted as unicode>, <Thing: für inserted as utf8
> bytestring>]
>
> However, the debug info you show above doesn't show an incorrectly-built
> unicode object, so I'm very confused by it.
>
> Karen
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To post to this group, send email to django-us...@googlegroups.com.
> To unsubscribe from this group, send email to
> django-users+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.
>
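
As a footnote to the quoted shell session, the same round trip can be sketched in Python 3, where text (`str`) and bytes are separate types. This is only an illustration under that assumption: the thread itself used Python 2.5 and Django 1.1.1, and the variable names below simply mirror the ones in the quoted transcript.

```python
# Python 3 sketch of the encodings discussed in the quoted session.
ufur = "f\u00fcr"                          # 'für' as text

utf8_bytes = ufur.encode("utf-8")          # two bytes (C3 BC) for the umlaut
latin1_bytes = ufur.encode("iso-8859-1")   # one byte (FC) for the umlaut

# Decoding ISO-8859-1 bytes as UTF-8 fails under strict error handling,
# which is the kind of failure the quoted traceback reports.
try:
    latin1_bytes.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# Decoding UTF-8 bytes as ISO-8859-1 never fails, but gives the wrong text.
badufur = utf8_bytes.decode("iso-8859-1")

# Re-encoding as ISO-8859-1 restores the original UTF-8 bytes, so the
# damage is reversible when the wrong guess was ISO-8859-1.
repaired = badufur.encode("iso-8859-1").decode("utf-8")

print(ascii(badufur), ascii(repaired), strict_decode_failed)
```

The asymmetry is the useful part: a wrong UTF-8 guess tends to fail loudly, while a wrong ISO-8859-1 guess silently produces text that no longer matches what is stored in the database.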

