Re: [PERFORM] wildcard search performance with "like"

2006-01-18 Thread Michael Riess
As far as I know the index is only used when you do a prefix search, for 
example


col like 'xyz%'

I think that if you are looking for expressions such as 'A%B', you could 
rephrase them like this:


col like 'A%' AND col like 'A%B'

So the database could use the index to narrow down the result and then 
do a sequential search for the second condition.


Mike


Yantao Shi schrieb:

Hi,

I have a postges 8.1.1 table with over 29 million rows in it. The colunm 
(file_name) that I need to search on has entries like the following:


MOD04_L2.A2005311.1400.004.2005312013848.hdf

MYD04_L2.A2005311.0700.004.2005312013437.hdf
I have an index on this column. But an index search is performance only 
when I give the full file_name for search:


testdbspc=# explain select file_name from catalog where file_name = 
'MOD04_L2.A2005311.1400.004.2005312013848.hdf';

QUERY PLAN
Index Scan using catalog_pk_idx on catalog  (cost=0.00..6.01 rows=1 
width=404)
 Index Cond: (file_name = 
'MOD04_L2.A2005311.1400.004.2005312013848.hdf'::bpchar)

(2 rows)

What I really need to do most of the time is a multi-wildcard search on 
this column, which is now doing a whole table scan without using the 
index at all:


testdbspc=# explain select file_name from catalog where file_name like 
'MOD04_L2.A2005311.%.004.2005312013%.hdf';

QUERY PLAN
Seq Scan on catalog  (cost=0.00..429.00 rows=1 width=404)
 Filter: (file_name ~~ 'MOD04_L2.A2005311.%.004.2005312013%.hdf'::text)
(2 rows)

Obviously, the performance of the table scan on such a large table is 
not acceptable.


I tried full-text indexing and searching. It did NOT work on this column 
because all the letters and numbers are linked together with "." and 
considered one big single word by to_tsvector.


Any solutions for this column to use an index search with multiple wild 
cards?


Thanks a lot,
Yantao Shi




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] wildcard search performance with "like"

2006-01-17 Thread Tom Lane
Yantao Shi <[EMAIL PROTECTED]> writes:
> testdbspc=# explain select file_name from catalog where file_name like 
> 'MOD04_L2.A2005311.%.004.2005312013%.hdf';
> QUERY PLAN
> Seq Scan on catalog  (cost=0.00..429.00 rows=1 width=404)
>   Filter: (file_name ~~ 'MOD04_L2.A2005311.%.004.2005312013%.hdf'::text)
> (2 rows)

I'm betting you are using a non-C locale.  You need either to run the
database in C locale, or to create a special index type that is
compatible with LIKE searches.  See
http://www.postgresql.org/docs/8.1/static/indexes-opclass.html

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] wildcard search performance with "like"

2006-01-17 Thread Yantao Shi

Hi,

I have a postges 8.1.1 table with over 29 million rows in it. The colunm 
(file_name) that I need to search on has entries like the following:


MOD04_L2.A2005311.1400.004.2005312013848.hdf

MYD04_L2.A2005311.0700.004.2005312013437.hdf 

I have an index on this column. But an index search is performance only 
when I give the full file_name for search:


testdbspc=# explain select file_name from catalog where file_name = 
'MOD04_L2.A2005311.1400.004.2005312013848.hdf';

QUERY PLAN
Index Scan using catalog_pk_idx on catalog  (cost=0.00..6.01 rows=1 
width=404)
 Index Cond: (file_name = 
'MOD04_L2.A2005311.1400.004.2005312013848.hdf'::bpchar)

(2 rows)

What I really need to do most of the time is a multi-wildcard search on 
this column, which is now doing a whole table scan without using the 
index at all:


testdbspc=# explain select file_name from catalog where file_name like 
'MOD04_L2.A2005311.%.004.2005312013%.hdf';

QUERY PLAN
Seq Scan on catalog  (cost=0.00..429.00 rows=1 width=404)
 Filter: (file_name ~~ 'MOD04_L2.A2005311.%.004.2005312013%.hdf'::text)
(2 rows)

Obviously, the performance of the table scan on such a large table is 
not acceptable.


I tried full-text indexing and searching. It did NOT work on this column 
because all the letters and numbers are linked together with "." and 
considered one big single word by to_tsvector.


Any solutions for this column to use an index search with multiple wild 
cards?


Thanks a lot,
Yantao Shi




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq