Re: UPPER vs. ILIKE for __icontains
Yeah, I've played around with the trigram indexes a little in 9.1, with mixed results. For example, they didn't appear to improve the queries I used in my examples below. Anyway, I think the case for using ILIKE for __icontains on postgres is worthwhile, and it's dead simple to change the 2 lines of code required. I hope you'll consider it for the next release. thanks ted + Though apparently pg 9.1's trigram indexes can be used for ILIKE searches (but aren't a win for all query patterns). It seems to me it'd be worthwhile to have a setting to tell the backend whether to use ILIKE vs. LIKE LOWER, though personally I'm not using __icontains so I don't have a dog in this fight :) ____ From: Ted Gruenloh To: "django-developers@googlegroups.com" Sent: Tuesday, October 4, 2011 12:37 PM Subject: UPPER vs. ILIKE for __icontains Jonas: Thanks for the response. The ticket you referenced refers to __iexact. In that case, I agree with the use of UPPER. My question/suggestion is about __icontains, which must either use '...LIKE UPPER()' or ILIKE. In my quick tests, ILIKE wins by about 10% on a table with 1 million records. ILIKE gets better the more columns you're comparing, too (30% better, in my example). See my examples below. Anyway, I changed it in my implementation because it helped significantly in my specific case. Hope this helps. thanks ted sdb=# explain analyze select * from activity where msg ilike '%exploit abc akndkf%'; QUERY PLAN -- Seq Scan on activity (cost=0.00..42147.00 rows=1 width=192) (actual time=1802.493..1802.493 rows=0 loops=1) Filter: ((msg)::text ~~* '%exploit abc akndkf%'::text) Total runtime: 1802.553 ms (3 rows) sdb=# explain analyze select * from activity where upper(msg) like upper('%exploit abc akndkf%'); QUERY PLAN Seq Scan on activity (cost=0.00..44647.00 rows=100 width=192) (actual time=2074.101..2074.101 rows=0 loops=1) Filter: (upper((msg)::text) ~~ '%EXPLOIT ABC AKNDKF%'::text) Total runtime: 2074.158 ms (3 rows) sdb=# explain analyze select * from activity where msg ilike '%123%' or text_src ilike '%123%' or text_dst ilike '%123%' order by msg; QUERY PLAN --- Sort (cost=48622.98..48646.30 rows=9327 width=192) (actual time=2241.154..2270.481 rows=3774 loops=1) Sort Key: msg Sort Method: external merge Disk: 808kB -> Seq Scan on activity (cost=0.00..47147.00 rows=9327 width=192) (actual time=11.701..2178.329 rows=3774 loops=1) Filter: (((msg)::text ~~* '%123%'::text) OR ((text_src)::text ~~* '%123%'::text) OR ((text_dst)::text ~~* '%123%'::text)) Total runtime: 2298.757 ms (6 rows) sdb=# explain analyze select * from activity where upper(msg) like upper('%123%') or upper(text_src) like upper('%123%') or upper(text_dst) like upper('%123%') order by msg; QUERY PLAN - Sort (cost=85617.58..85905.74 rows=115264 width=192) (actual time=3117.558..3147.699 rows=3774 loops=1) Sort Key: msg Sort Method: external merge Disk: 808kB -> Seq Scan on activity (cost=0.00..54647.00 rows=115264 width=192) (actual time=8.239..3064.163 rows=3774 loops=1) Filter: ((upper((msg)::text) ~~ '%123%'::text) OR (upper((text_src)::text) ~~ '%123%'::text) OR (upper((text_dst)::text) ~~ '%123%'::text)) Total runtime: 3177.240 ms (6 rows) - Original Message - From: Jonas H. To: django-developers@googlegroups.com Cc: Sent: Tuesday, October 4, 2011 11:19 AM Subject: Re: On 10/04/2011 05:51 PM, Ted Gruenloh wrote: > The django online documentation mentions that the SQL equivalent for > __icontains is something like: > > SELECT ... WHERE headline ILIKE '%Lennon%'; > However, for postgresql - one of the dbs that actually supports ILIKE - I
UPPER vs. ILIKE for __icontains
Jonas: Thanks for the response. The ticket you referenced refers to __iexact. In that case, I agree with the use of UPPER. My question/suggestion is about __icontains, which must either use '...LIKE UPPER()' or ILIKE. In my quick tests, ILIKE wins by about 10% on a table with 1 million records. ILIKE gets better the more columns you're comparing, too (30% better, in my example). See my examples below. Anyway, I changed it in my implementation because it helped significantly in my specific case. Hope this helps. thanks ted sdb=# explain analyze select * from activity where msg ilike '%exploit abc akndkf%'; QUERY PLAN -- Seq Scan on activity (cost=0.00..42147.00 rows=1 width=192) (actual time=1802.493..1802.493 rows=0 loops=1) Filter: ((msg)::text ~~* '%exploit abc akndkf%'::text) Total runtime: 1802.553 ms (3 rows) sdb=# explain analyze select * from activity where upper(msg) like upper('%exploit abc akndkf%'); QUERY PLAN Seq Scan on activity (cost=0.00..44647.00 rows=100 width=192) (actual time=2074.101..2074.101 rows=0 loops=1) Filter: (upper((msg)::text) ~~ '%EXPLOIT ABC AKNDKF%'::text) Total runtime: 2074.158 ms (3 rows) sdb=# explain analyze select * from activity where msg ilike '%123%' or text_src ilike '%123%' or text_dst ilike '%123%' order by msg; QUERY PLAN --- Sort (cost=48622.98..48646.30 rows=9327 width=192) (actual time=2241.154..2270.481 rows=3774 loops=1) Sort Key: msg Sort Method: external merge Disk: 808kB -> Seq Scan on activity (cost=0.00..47147.00 rows=9327 width=192) (actual time=11.701..2178.329 rows=3774 loops=1) Filter: (((msg)::text ~~* '%123%'::text) OR ((text_src)::text ~~* '%123%'::text) OR ((text_dst)::text ~~* '%123%'::text)) Total runtime: 2298.757 ms (6 rows) sdb=# explain analyze select * from activity where upper(msg) like upper('%123%') or upper(text_src) like upper('%123%') or upper(text_dst) like upper('%123%') order by msg; QUERY PLAN - Sort (cost=85617.58..85905.74 rows=115264 width=192) (actual time=3117.558..3147.699 rows=3774 loops=1) Sort Key: msg Sort Method: external merge Disk: 808kB -> Seq Scan on activity (cost=0.00..54647.00 rows=115264 width=192) (actual time=8.239..3064.163 rows=3774 loops=1) Filter: ((upper((msg)::text) ~~ '%123%'::text) OR (upper((text_src)::text) ~~ '%123%'::text) OR (upper((text_dst)::text) ~~ '%123%'::text)) Total runtime: 3177.240 ms (6 rows) - Original Message - From: Jonas H. To: django-developers@googlegroups.com Cc: Sent: Tuesday, October 4, 2011 11:19 AM Subject: Re: On 10/04/2011 05:51 PM, Ted Gruenloh wrote: > The django online documentation mentions that the SQL equivalent for > __icontains is something like: > > SELECT ... WHERE headline ILIKE '%Lennon%'; > However, for postgresql - one of the dbs that actually supports ILIKE - I > noticed __icontains was actually performing something similar to: > SELECT ... WHERE LOWER(headline) LIKE LOWER('%Lennon%'); > > The ILIKE is obviously much faster [...] https://code.djangoproject.com/ticket/3575 -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en. -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
[no subject]
All: I'm not a Django developer, nor will I pretend to be. But I did come across something I thought you should be aware of. The django online documentation mentions that the SQL equivalent for __icontains is something like: SELECT ... WHERE headline ILIKE '%Lennon%'; However, for postgresql - one of the dbs that actually supports ILIKE - I noticed __icontains was actually performing something similar to: SELECT ... WHERE LOWER(headline) LIKE LOWER('%Lennon%'); The ILIKE is obviously much faster, so, I dug around a little and changed a line or two in these files: /usr/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py /usr/local/lib/python2.7/site-packages/django/db/backends/postgresql/operations.py ...so __icontains actually uses ILIKE instead, and it's running great. OK: 1) Any reason I shouldn't have done that, performance wise? 2) Should this be changed in the codebase going forward? thanks, and I hope this helps, ted -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.