I've run into a strange problem using Django's [i]regex lookups with
non-ASCII characters.
I'm using [i]regex in the following manner:
-------- begin code --------
from django.db import models
from django.db.models import Q

class Tag(models.Model):
  keyword = models.CharField(max_length=64)

  def __unicode__(self):
    return self.keyword

kwlist = [some list of keywords]
tag_query = Q()
for k in kwlist:
  # match k only as a whole word
  rx = r'\b' + k + r'\b'
  # keyword__regex instead of keyword__iregex fails in the same way
  tag_query = tag_query | Q(keyword__iregex=rx)

for t in Tag.objects.filter(tag_query):
  print t
-------- end code --------

When a Tag's keyword *begins* with a non-ASCII character (e.g. in my
case \u010d, which is "latin small letter c with caron"), the [i]regex
lookup fails for some reason.
The strange thing is that there seem to be no problems with words
containing such characters elsewhere, i.e. not at the beginning of a
string. I've also tried the following:
- Q(keyword__iexact): works OK
- modified regex r'\b.*' + k[1:] + r'\b': works OK, but obviously may
  return many false positives
- modified regex r'\b.' + k[1:] + r'\b': doesn't work(?)
- Python's re.search(): works OK on such strings
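
For anyone who wants to poke at the three regex variants above outside
Django, here is a minimal sketch. It is written for Python 3, not the
2.5.1 mentioned in this post, and it rests on an assumption: that
re.ASCII is a fair stand-in for a Python 2 regex compiled without
re.UNICODE, where \w and \b only recognise ASCII word characters. The
helper name matches() and the keyword "\u010daj" are made up for
illustration.

```python
import re

def matches(pattern, text, ascii_only=False):
    # With ascii_only=True, re.ASCII makes \w (and therefore \b) treat
    # only ASCII letters, digits and underscore as word characters.
    flags = re.ASCII if ascii_only else 0
    return re.search(pattern, text, flags) is not None

k = "\u010daj"  # hypothetical keyword beginning with U+010D

variants = [
    ("whole word", r"\b" + k + r"\b"),
    ("any prefix", r"\b.*" + k[1:] + r"\b"),
    ("one char", r"\b." + k[1:] + r"\b"),
]

for label, rx in variants:
    # columns: variant, Unicode-aware result, ASCII-only result
    print(label, matches(rx, k), matches(rx, k, ascii_only=True))
```

Under that assumption the ASCII-only column comes out as False, True,
False, which is the same pattern as the lookups described above. That
would point at \b not seeing a word boundary in front of \u010d, but
it is a guess about the cause, not something confirmed here.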

Is this a known issue with the Django + sqlite3 combination? I've seen
the docs mention that iexact might be problematic, but that one
actually works fine.
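
To narrow down whether the database layer is involved, the lookup can
be tried against sqlite3 directly. SQLite ships no REGEXP
implementation of its own: "X REGEXP Y" calls a user-supplied function
regexp(Y, X). As far as I know, Django's sqlite3 backend registers a
Python function built on re.search() for this, but treat that as an
assumption. The sketch below is Python 3, and the table, the two
keywords, and the helper names make_regexp() and matching_keywords()
are hypothetical.

```python
import re
import sqlite3

def make_regexp(flags=0):
    # Build the two-argument callable SQLite invokes for "X REGEXP Y";
    # SQLite passes the pattern (Y) first and the column value (X) second.
    def regexp(pattern, value):
        if value is None:
            return False
        return re.search(pattern, value, flags) is not None
    return regexp

def matching_keywords(pattern, flags=0):
    # Fresh in-memory database with two hypothetical keywords:
    # one starts with U+010D, the other contains it mid-word.
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, make_regexp(flags))
    conn.execute("CREATE TABLE tag (keyword TEXT)")
    conn.executemany(
        "INSERT INTO tag (keyword) VALUES (?)",
        [("\u010daj",), ("ko\u010da",)],
    )
    rows = conn.execute(
        "SELECT keyword FROM tag WHERE keyword REGEXP ? ORDER BY keyword",
        (pattern,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

rx = r"\b" + "\u010daj" + r"\b"
print(matching_keywords(rx))            # Unicode-aware word boundaries
print(matching_keywords(rx, re.ASCII))  # ASCII-only word boundaries
```

If the ASCII-only call comes back empty while the Unicode-aware one
finds the row, the difference lies in how the regexp function is
compiled and not in sqlite3's storage of the string.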

I'm using Django 1.0.2 with sqlite3 3.4.0 and Python 2.5.1 (Mac OS X
10.5).
