Hi,

I have a query

select count(*)
 from result
 where exists
   (select * from item where item.url LIKE result.url || '%' limit 1);

which basically returns the number of items which exist in table result and match a URL in table item by its prefix. I read all about idexes (http://www.postgresql.org/docs/8.3/static/indexes-types.html ) and especially this part: "The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE and ~ if the pattern is a constant and is anchored to the beginning of the string — for example, col LIKE 'foo %' or col ~ '^foo', but not col LIKE '%bar'."

Since my server does not use the C locale I created the index with

CREATE INDEX test_index
  ON item
  USING btree
  (url varchar_pattern_ops);

which works fine for queries like

SELECT distinct url from item where url like 'http://www.micro%' limit 10;

explain analyze shows:
"Limit (cost=9.53..9.54 rows=1 width=34) (actual time=80.809..80.856 rows=10 loops=1)" " -> Unique (cost=9.53..9.54 rows=1 width=34) (actual time=80.806..80.835 rows=10 loops=1)" " -> Sort (cost=9.53..9.53 rows=1 width=34) (actual time=80.802..80.812 rows=11 loops=1)"
"              Sort Key: url"
"              Sort Method:  quicksort  Memory: 306kB"
" -> Index Scan using test_index on item (cost=0.00..9.52 rows=1 width=34) (actual time=0.030..6.165 rows=2254 loops=1)" " Index Cond: (((url)::text ~>=~ 'http:// www.micro'::text) AND ((url)::text ~<~ 'http://www.micrp'::text))"
"                    Filter: ((url)::text ~~ 'http://www.micro%'::text)"
"Total runtime: 80.908 ms"

which is great but if I run the query with the subselect it uses a sequence scan:

select *
 from result
 where exists
(select * from item where item.url LIKE result.url || '%' limit 1) limit 10;

"Limit (cost=0.00..96.58 rows=10 width=36) (actual time=12.660..35295.928 rows=10 loops=1)" " -> Seq Scan on result (cost=0.00..93886121.77 rows=9721314 width=36) (actual time=12.657..35295.906 rows=10 loops=1)"
"        Filter: (subplan)"
"        SubPlan"
" -> Limit (cost=0.00..4.81 rows=1 width=42) (actual time=2715.061..2715.061 rows=1 loops=13)" " -> Seq Scan on item (cost=0.00..109589.49 rows=22781 width=42) (actual time=2715.055..2715.055 rows=1 loops=13)" " Filter: ((url)::text ~~ (($0)::text || '%'::text))"
"Total runtime: 35295.994 ms"


The only explaination is that I don't use a constant when comparing the values. But actually it is a constant...


any help?

using postgres 8.3.3 on ubuntu.



Cheers,

moritz


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to