Kevin, I am also interested in a deeper explanation of what you mean by, "The easy way to do that is just to add a trigram index and search for similar strings, and forget about full text search." Because I need to make a decision about whether to use full text search or use other pattern matching facilities such as LIKE and/or regular expressions. For me, the reason I don't just default to full text search is the documents are relative small (i.e, HTML <= 128K) and number fewer than 10,000 so I'm not sure if the effort expended to learn the ins/outs of full text search will be beneficial to my use case.
Regards, Dane On Thu, Jul 23, 2015 at 12:17 PM, Jeff Janes <jeff.ja...@gmail.com> wrote: > On Thu, Jul 23, 2015 at 2:55 AM, amihay gonen <agone...@gmail.com> wrote: > >> Hi I'm trying to implement a text search in PG . >> >> My goal to enable the user search on several columns also on partial >> words. >> >> here is sample code : >> create table test_test( text_data tsvector, text_a varchar,text_b >> varchar); >> >> insert into test_test(text_a,text_b) select 'name 10.10.2.3 ','name3 >> name' from generate_series(1,500); >> update test_test set text_data=to_tsvector(text_a||' '||text_b); >> CREATE INDEX test_test_idx ON test_test USING gin(text_data); >> >> explain ANALYZE select * from test_test where text_data@ >> @plainto_tsquery('name'); >> >> my questions are : >> 1. why the index is not used (I guess it is related to the way the data >> is generated) >> > > It returns the entire table, so there is no point in using an index. Yes, > it is the way it is generated, the same data repeated over and over is not > very realistic. If you just want random text, I use md5(random()::text). > But if you want text that looks vaguely like English, I don't have a nice > trick for that. Maybe load the sgml files into a table. > > >> 2, how can I use pg_trgm with ts_vector to enable to answer query like >> 10.10 or nam ? >> the idea is to use the gin index , maybe there are other option >> without using pg_trgm? >> > > Do you mean: > > WHERE text_a LIKE '%10.10%' or text_a LIKE '%nam%' ? > > With the or, that going to be hard to optimize. > > Anyway, pg_tgrm requires its own special GIN index, it can't piggy back on > the tsvector GIN index. > > CREATE INDEX whatever ON test_test USING gin(text_a gin_trgm_ops); > > or > > CREATE INDEX whatever ON test_test USING gin((text_a||' '||text_b) > gin_trgm_ops); > > But, LIKE '%10.10%' is going to be tough for a pg_trgm index to help with, > unless you compile your own code after removing "#define KEEPONLYALNUM" > > Cheers, > > Jeff >