Re: UPPER vs. ILIKE for __icontains

2011-10-05 Thread Ted Gruenloh
Yeah, I've played around with the trigram indexes a little in 9.1, with mixed 
results.  For example, they didn't appear to improve the queries I used in my 
examples below.

Anyway, I think the case for using ILIKE for __icontains on postgres is 
worthwhile, and it's dead simple to change the 2 lines of code required.  I 
hope you'll consider it for the next release.

thanks
ted

+


Though apparently pg 9.1's trigram indexes can be used for ILIKE 
searches (but aren't a win for all query patterns). It seems to me it'd 
be worthwhile to have a setting to tell the backend whether to use ILIKE vs. 
LIKE LOWER, though personally I'm not using __icontains so I don't 
have a dog in this fight :)


____
From: Ted Gruenloh 
To: "django-developers@googlegroups.com" 
Sent: Tuesday, October 4, 2011 12:37 PM
Subject: UPPER vs. ILIKE for __icontains

Jonas:

Thanks for the response.  The ticket you referenced refers to __iexact.   In 
that case, I agree with the use of UPPER.  My question/suggestion is about 
__icontains, which must either use '...LIKE UPPER()' or ILIKE.  In my quick 
tests, ILIKE wins by about 10% on a table with 1 million records.  ILIKE gets 
better the more columns you're comparing, too (30% better, in my example).  See 
my examples below.


Anyway, I changed it in my implementation because it helped significantly in my 
specific case.  Hope this helps.

thanks
ted


sdb=# explain analyze select * from activity where msg ilike '%exploit abc 
akndkf%';
  QUERY 
PLAN  
--
 Seq Scan on activity  (cost=0.00..42147.00 rows=1 width=192) (actual 
time=1802.493..1802.493 rows=0 loops=1)
   Filter: ((msg)::text ~~* '%exploit abc akndkf%'::text)
 Total runtime: 1802.553 ms
(3 rows)

sdb=# explain analyze select * from activity where upper(msg) like 
upper('%exploit abc akndkf%');
   QUERY 
PLAN   

 Seq Scan on activity  (cost=0.00..44647.00 rows=100 width=192) (actual 
time=2074.101..2074.101 rows=0 loops=1)
   Filter: (upper((msg)::text) ~~ '%EXPLOIT ABC AKNDKF%'::text)
 Total runtime: 2074.158 ms
(3 rows)



sdb=# explain analyze select * from activity where msg ilike '%123%' or 
text_src ilike '%123%' or text_dst ilike '%123%' order by msg;
    QUERY 
PLAN 
---
 Sort  (cost=48622.98..48646.30 rows=9327 width=192) (actual 
time=2241.154..2270.481 rows=3774 loops=1)
   Sort Key: msg
   Sort Method: external merge  Disk: 808kB
   ->  Seq Scan on activity  (cost=0.00..47147.00 rows=9327 width=192) (actual 
time=11.701..2178.329 rows=3774 loops=1)
 Filter: (((msg)::text ~~* '%123%'::text) OR ((text_src)::text ~~* 
'%123%'::text) OR ((text_dst)::text ~~* '%123%'::text))
 Total runtime: 2298.757 ms
(6 rows)

sdb=# explain analyze select * from activity where upper(msg) like 
upper('%123%') or upper(text_src) like upper('%123%') or upper(text_dst) like 
upper('%123%') order by msg;
 QUERY 
PLAN  
-
 Sort  (cost=85617.58..85905.74 rows=115264 width=192) (actual 
time=3117.558..3147.699 rows=3774 loops=1)
   Sort Key: msg
   Sort Method: external merge  Disk: 808kB
   ->  Seq Scan on activity  (cost=0.00..54647.00 rows=115264 width=192) 
(actual time=8.239..3064.163 rows=3774 loops=1)
 Filter: ((upper((msg)::text) ~~ '%123%'::text) OR 
(upper((text_src)::text) ~~ '%123%'::text) OR (upper((text_dst)::text) ~~ 
'%123%'::text))
 Total runtime: 3177.240 ms
(6 rows)






- Original Message -
From: Jonas H. 
To: django-developers@googlegroups.com
Cc: 
Sent: Tuesday, October 4, 2011 11:19 AM
Subject: Re:

On 10/04/2011 05:51 PM, Ted Gruenloh wrote:
> The django online documentation mentions that the SQL equivalent for  
> __icontains is something like:
>
>      SELECT ... WHERE headline ILIKE '%Lennon%';
> However, for postgresql - one of the dbs that actually supports ILIKE - I

UPPER vs. ILIKE for __icontains

2011-10-04 Thread Ted Gruenloh
Jonas:

Thanks for the response.  The ticket you referenced refers to __iexact.   In 
that case, I agree with the use of UPPER.  My question/suggestion is about 
__icontains, which must either use '...LIKE UPPER()' or ILIKE.  In my quick 
tests, ILIKE wins by about 10% on a table with 1 million records.  ILIKE gets 
better the more columns you're comparing, too (30% better, in my example).  See 
my examples below.


Anyway, I changed it in my implementation because it helped significantly in my 
specific case.  Hope this helps.

thanks
ted


sdb=# explain analyze select * from activity where msg ilike '%exploit abc 
akndkf%';
  QUERY 
PLAN  
--
 Seq Scan on activity  (cost=0.00..42147.00 rows=1 width=192) (actual 
time=1802.493..1802.493 rows=0 loops=1)
   Filter: ((msg)::text ~~* '%exploit abc akndkf%'::text)
 Total runtime: 1802.553 ms
(3 rows)

sdb=# explain analyze select * from activity where upper(msg) like 
upper('%exploit abc akndkf%');
   QUERY 
PLAN   

 Seq Scan on activity  (cost=0.00..44647.00 rows=100 width=192) (actual 
time=2074.101..2074.101 rows=0 loops=1)
   Filter: (upper((msg)::text) ~~ '%EXPLOIT ABC AKNDKF%'::text)
 Total runtime: 2074.158 ms
(3 rows)



sdb=# explain analyze select * from activity where msg ilike '%123%' or 
text_src ilike '%123%' or text_dst ilike '%123%' order by msg;
    QUERY 
PLAN 
---
 Sort  (cost=48622.98..48646.30 rows=9327 width=192) (actual 
time=2241.154..2270.481 rows=3774 loops=1)
   Sort Key: msg
   Sort Method: external merge  Disk: 808kB
   ->  Seq Scan on activity  (cost=0.00..47147.00 rows=9327 width=192) (actual 
time=11.701..2178.329 rows=3774 loops=1)
 Filter: (((msg)::text ~~* '%123%'::text) OR ((text_src)::text ~~* 
'%123%'::text) OR ((text_dst)::text ~~* '%123%'::text))
 Total runtime: 2298.757 ms
(6 rows)

sdb=# explain analyze select * from activity where upper(msg) like 
upper('%123%') or upper(text_src) like upper('%123%') or upper(text_dst) like 
upper('%123%') order by msg;
 QUERY 
PLAN  
-
 Sort  (cost=85617.58..85905.74 rows=115264 width=192) (actual 
time=3117.558..3147.699 rows=3774 loops=1)
   Sort Key: msg
   Sort Method: external merge  Disk: 808kB
   ->  Seq Scan on activity  (cost=0.00..54647.00 rows=115264 width=192) 
(actual time=8.239..3064.163 rows=3774 loops=1)
 Filter: ((upper((msg)::text) ~~ '%123%'::text) OR 
(upper((text_src)::text) ~~ '%123%'::text) OR (upper((text_dst)::text) ~~ 
'%123%'::text))
 Total runtime: 3177.240 ms
(6 rows)






- Original Message -
From: Jonas H. 
To: django-developers@googlegroups.com
Cc: 
Sent: Tuesday, October 4, 2011 11:19 AM
Subject: Re:

On 10/04/2011 05:51 PM, Ted Gruenloh wrote:
> The django online documentation mentions that the SQL equivalent for  
> __icontains is something like:
>
>      SELECT ... WHERE headline ILIKE '%Lennon%';
> However, for postgresql - one of the dbs that actually supports ILIKE - I 
> noticed __icontains was actually performing something similar to:
>      SELECT ... WHERE LOWER(headline) LIKE LOWER('%Lennon%');
>
> The ILIKE is obviously much faster [...]

https://code.djangoproject.com/ticket/3575

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



[no subject]

2011-10-04 Thread Ted Gruenloh


All:

I'm not a Django developer, nor will I pretend to be.  But I did come across 
something I thought you should be aware of.

The django online documentation mentions that the SQL equivalent for  
__icontains is something like:

    SELECT ... WHERE headline ILIKE '%Lennon%';
However, for postgresql - one of the dbs that actually supports ILIKE - I 
noticed __icontains was actually performing something similar to:
    SELECT ... WHERE LOWER(headline) LIKE LOWER('%Lennon%');

The ILIKE is obviously much faster, so, I dug around a little and changed a 
line or two in these files:

/usr/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py
/usr/local/lib/python2.7/site-packages/django/db/backends/postgresql/operations.py

...so __icontains actually uses ILIKE instead, and it's running great.

OK:

1) Any reason I shouldn't have done that, performance wise?
2) Should this be changed in the codebase going forward?


thanks, and I hope this helps,
ted

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.