On Wed, Nov 6, 2013 at 10:53 AM, Zev Benjamin <zev-pg...@strangersgate.com>wrote:
> On 11/06/2013 01:47 PM, bricklen wrote: > >> >> On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin >> <zev-pg...@strangersgate.com <mailto:zev-pg...@strangersgate.com>> wrote: >> >> Hi, >> >> I have Postgres full text search set up for my application and it's >> been working great! However, my users would like their searches to >> turn up parts of URLs. For example, they would like a search for >> "foobar" to turn up a document that contains the string >> "http://example.com/foobar/__blah <http://example.com/foobar/blah>" >> (and similarly for queries like "example" and "blah). With the >> default dictionaries for host, url, and url_path, the search query >> would have to contain the complete host or url path. >> >> What is the best way to accomplish this? Should I be looking at >> building a custom dictionary that breaks down hosts and urls or is >> there something simpler I can do? >> >> >> Have you looked into trigrams? >> http://www.postgresql.org/docs/current/static/pgtrgm.html >> > > I've looked at it in the context of adding fuzzy search. But my > understanding is that doing a fuzzy search here would only work if the > query were a significant fraction of, say, the url path. For example, I > would expect a fuzzy search of "foobar" on "/foobar/x" to return a high > similarity, but a fuzzy search of "foobar" on "/foobar/some/very/long/path/x" > to have a low similarity. > > Or are you suggesting using trigrams in a different way? > Yeah, I was thinking more along the lines of allowing wildcard searching, not similarity. Eg. CREATE INDEX yourtable_yourcol_gist_fbi ON yourtable using GIST ( yourcol gist_trgm_ops ); select * from yourtable where yourcol ~~ '%foobar%';