Good day,

I am currently using Lucene for my searches. One of the problems I'm facing
is when the keyword is a URL. Tokens such as http, https, ://, index, html,
etc. seem to be messing up our search results, when the focus was supposed
to be only on the URL's domain.
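To make the problem concrete, here is a rough Java illustration. It splits a
URL on runs of non-alphanumeric characters, which only approximates what an
analyzer might produce (the real tokens depend on the analyzer in use), and
uses java.net.URI#getHost() to show the part we actually care about. The
class and method names are made up for this example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class UrlTokens {
    // Naive split on runs of non-alphanumeric characters. This is a rough
    // stand-in for an analyzer, not Lucene's actual tokenizer.
    public static List<String> naiveTokens(String url) {
        List<String> tokens = new ArrayList<>();
        for (String t : url.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/index.html";
        // Every URL contributes noise tokens like "http", "www", "index", "html".
        System.out.println(naiveTokens(url));
        // The part the search should focus on: the host/domain.
        System.out.println(URI.create(url).getHost());
    }
}
```

Only "example" (and arguably "com") says anything about which site the URL
points to; the rest show up in nearly every URL.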

The idea I have is to modify the idf so that rare terms get boosted much
more than they are with Lucene's default settings. Since there are probably
a lot of occurrences of http, https://, etc., matches on these terms should
score really, really low, while matches on the domain (which is rare) should
score high.
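A small sketch of the idea, in plain Java so it runs without Lucene on the
classpath. classicIdf follows the formula I believe Lucene 3.x's
DefaultSimilarity uses, 1 + ln(numDocs / (docFreq + 1)); steeperIdf is a
hypothetical variant that raises it to a power greater than 1. The document
counts are made-up numbers, not measurements from our index.

```java
public class IdfSketch {
    // Classic Lucene-style idf (assumed to match DefaultSimilarity in 3.x):
    // 1 + ln(numDocs / (docFreq + 1)).
    public static double classicIdf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Hypothetical steeper idf: raising the classic value to an exponent > 1
    // widens the gap between rare and common terms.
    public static double steeperIdf(int docFreq, int numDocs, double exponent) {
        return Math.pow(classicIdf(docFreq, numDocs), exponent);
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;
        int httpDf = 900_000; // hypothetical: "http" appears almost everywhere
        int domainDf = 50;    // hypothetical: a rare domain token

        double classicRatio = classicIdf(domainDf, numDocs) / classicIdf(httpDf, numDocs);
        double steeperRatio = steeperIdf(domainDf, numDocs, 2.0) / steeperIdf(httpDf, numDocs, 2.0);

        // How many times more a domain match weighs than an "http" match.
        System.out.printf("classic domain/http ratio: %.1f%n", classicRatio);
        System.out.printf("steeper domain/http ratio: %.1f%n", steeperRatio);
    }
}
```

In Lucene 3.x, plugging this in would presumably mean subclassing
DefaultSimilarity, overriding idf(int docFreq, int numDocs), and setting the
custom Similarity on both the IndexWriter and the IndexSearcher. Note that the
classic scoring formula already multiplies idf in twice (idf squared), so the
default gap between rare and common terms is wider than the idf value alone
suggests.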

Would this work, or am I totally misunderstanding Lucene's tf/idf? :-)

Thanks,

-- 
Franz Allan Valencia See | Java Software Engineer
franz....@gmail.com
LinkedIn: http://www.linkedin.com/in/franzsee
Twitter: http://www.twitter.com/franz_see
