I've run into a strange problem using Django's [i]regex lookups with
non-ASCII characters.
I'm using [i]regex as follows:
-------- begin code --------
from django.db import models
from django.db.models import Q

class Tag(models.Model):
    keyword = models.CharField(max_length=64)

    def __unicode__(self):
        return self.keyword

kwlist = []  # placeholder: some list of keywords

# OR together one whole-word regex condition per keyword
tag_query = Q()
for k in kwlist:
    rx = r'\b' + k + r'\b'
    tag_query = tag_query | Q(keyword__iregex=rx)  # or keyword__regex; neither works

for t in Tag.objects.filter(tag_query):
    print t
-------- end code --------
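To narrow this down outside Django, the lookup can be reproduced with the standard sqlite3 module. This is only a sketch under an assumption I haven't verified against the 1.0.2 source: that Django's sqlite3 backend implements REGEXP as a Python function wrapping re.search(), and that iregex prepends '(?i)' to the pattern in SQL. The table name `tag`, the helpers make_regexp() and matching_tags(), and the sample keywords are made up for illustration; ascii_only=True mimics Python 2's default (non-UNICODE) regex semantics.

```python
import re
import sqlite3


def make_regexp(ascii_only):
    """Return a callback for SQLite's REGEXP operator (hypothetical helper).

    ascii_only=True mimics Python 2's default (no re.UNICODE), where \\b and
    \\w treat only ASCII letters, digits and '_' as word characters.
    """
    flags = re.ASCII if ascii_only else 0

    def regexp(pattern, value):
        # SQLite evaluates "X REGEXP Y" as regexp(Y, X): the pattern comes first.
        if value is None:
            return False
        return re.search(pattern, value, flags) is not None

    return regexp


def matching_tags(keywords, ascii_only):
    """OR together one iregex-style condition per keyword (list must be non-empty)."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp", 2, make_regexp(ascii_only))
    conn.execute("CREATE TABLE tag (keyword TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO tag VALUES (?)",
                     [("\u010das",), ("ko\u010dka",), ("pes",)])
    # Assumption: iregex prepends the case-insensitive flag via '(?i)' || pattern.
    where = " OR ".join(["keyword REGEXP '(?i)' || ?"] * len(keywords))
    params = [r"\b" + k + r"\b" for k in keywords]
    rows = conn.execute("SELECT keyword FROM tag WHERE " + where, params)
    result = sorted(row[0] for row in rows)
    conn.close()
    return result


print(matching_tags(["\u010das"], ascii_only=True))   # ASCII-only \b -> []
print(matching_tags(["\u010das"], ascii_only=False))  # Unicode-aware \b -> ['čas']
```

If the ASCII-only run misses the keyword while the Unicode-aware run finds it, the problem would lie in the regex semantics rather than in SQLite itself.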
When a Tag's keyword *begins* with a non-ASCII character (in my case
\u010d, "LATIN SMALL LETTER C WITH CARON", i.e. "č"), the [i]regex
lookup fails to match it.
The strange thing is that words containing such characters elsewhere,
i.e. not at the beginning of the string, seem to work fine. I've also
tried the following:
- Q(keyword__iexact=k): works OK
- modified regex r'\b.*' + k[1:] + r'\b': works, but obviously may
  return many false positives
- modified regex r'\b.' + k[1:] + r'\b': doesn't work (?)
- Python's re.search() works OK on such strings
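One hypothesis that would fit the regex items in the list above, assuming Django's sqlite3 backend hands the pattern to Python's re module: in Python 2, unless re.UNICODE is set, \b and \w only count ASCII letters, digits and underscore as word characters. "č" is then a non-word character, so a keyword that starts with it has no word boundary in front of it, while a keyword that only contains it in the middle still does. The sketch below checks the listed variants with Python 3's re, using re.ASCII to stand in for Python 2's default behaviour; found() and observations() are hypothetical helpers.

```python
import re


def found(pattern, text, ascii_only):
    # re.ASCII makes \b and \w ASCII-only, mimicking Python 2 without re.UNICODE;
    # without it, Python 3 str patterns are Unicode-aware by default.
    flags = re.ASCII if ascii_only else 0
    return re.search(pattern, text, flags) is not None


def observations(k, ascii_only):
    """Check the regex variants from the list against the keyword k itself."""
    variants = {
        "exact": r"\b" + k + r"\b",
        "dot_star": r"\b.*" + k[1:] + r"\b",
        "single_dot": r"\b." + k[1:] + r"\b",
    }
    return {name: found(rx, k, ascii_only) for name, rx in variants.items()}


print(observations("\u010das", ascii_only=True))
# {'exact': False, 'dot_star': True, 'single_dot': False}
print(observations("\u010das", ascii_only=False))
# {'exact': True, 'dot_star': True, 'single_dot': True}
print(observations("ko\u010dka", ascii_only=True)["exact"])
# True
```

If this is the cause, adding the (?u) inline flag in Python 2, e.g. rx = r'(?u)\b' + k + r'\b', should make \b Unicode-aware; I haven't checked that against Django 1.0.2. The same reasoning predicts that a plain Python 2 re.search() without re.UNICODE would also fail on these strings, so it may be worth re-checking how that test was run.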
Is this a known issue with the Django + sqlite3 combination? I've seen
the docs mention that iexact might be problematic, but that one
actually works fine.
I'm using Django 1.0.2 with sqlite3 3.4.0 and Python 2.5.1 (Mac OS X
10.5).
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Django users" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---