Hi,

On 08/02/2010 03:12 PM, Sushant Sinha wrote:
> The current text parser already returns url and url_path. That already increases the number of unique tokens.
Well, I think I simply turned that off to be able to search for plain words. It still works for complete URLs; those are then just treated like text.
> Earlier, people have expressed the need to index urls/emails, and currently the text parser already does so. Reverting that would be a regression of functionality. Further, a ranking function can take advantage of a direct match of a token.
That's a point, yes. However, simply making the same string turn up twice in the tokenizer's output doesn't sound like the right solution to me. Especially considering that the query parser uses the very same tokenizer.
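To illustrate the concern, here is a toy sketch in Python. It is not the PostgreSQL parser: the `tokenize` function, the `emit_parts` flag and the URL pattern are all made up for illustration. It only assumes a tokenizer that emits a URL as one token and then emits its parts again as plain words, with the query going through the same function.

```python
import re

# Hypothetical pattern for "looks like a URL"; far cruder than a real parser.
URL_RE = re.compile(r"[\w-]+(\.[\w-]+)+(/\S*)?")

def tokenize(text, emit_parts):
    """Toy tokenizer: returns (type, token) pairs for whitespace-separated chunks."""
    tokens = []
    for chunk in text.split():
        if URL_RE.fullmatch(chunk):
            tokens.append(("url", chunk))
            if emit_parts:
                # The same string shows up a second time, split into words.
                tokens.extend(("word", w) for w in re.findall(r"[A-Za-z0-9]+", chunk))
        else:
            tokens.append(("word", chunk))
    return tokens

# Document and query both go through the very same tokenizer.
doc_tokens = tokenize("see example.com/foo", emit_parts=True)
query_tokens = tokenize("example.com/foo", emit_parts=True)
print(doc_tokens)
print(query_tokens)
```

With `emit_parts=True`, a query for the URL expands to four tokens instead of one, because the query side gets the same duplication as the document side.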
Regards

Markus Wanner