Good day, I am currently using lucene for my searches. And one of the problems that Im facing is when keyword is a url. The tokens such as http, https, ://, index, html, etc seems to be messing up with our search results. The focus was supposed to be only on the url domain.
The idea that I have is modify the idf so that rare terms get boosted much more than the default settings in lucene. Since there are probably a lot of http, https://, etc, then matches to these terms should be really really low, while matches to the domain (which is rare) should be high. Would this work or am I totally misunderstanding lucene's tf/idf? :-) Thanks, -- Franz Allan Valencia See | Java Software Engineer franz....@gmail.com LinkedIn: http://www.linkedin.com/in/franzsee Twitter: http://www.twitter.com/franz_see