Instead of playing around with tf/idf, how about just indexing and searching only the domain?
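A minimal sketch of that idea, assuming the domain is pulled out with the JDK's java.net.URI before anything reaches Lucene. The class and method names (UrlDomain, extract) are made up for illustration and are not part of Lucene. The same extraction would have to be applied both to the URLs being indexed and to the user's keyword at query time, with the result stored in its own untokenized field.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: reduces a URL to its host so that
// only the domain is indexed and searched.
final class UrlDomain {
    private UrlDomain() {}

    // Returns the lower-cased host of the given URL, or null if no host can be parsed.
    static String extract(String url) {
        if (url == null) {
            return null;
        }
        String s = url.trim();
        if (s.isEmpty()) {
            return null;
        }
        // Inputs such as "example.com/index.html" carry no scheme;
        // prepend one so that URI can locate the host part.
        if (!s.contains("://")) {
            s = "http://" + s;
        }
        try {
            String host = new URI(s).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            // Not a parseable URL (for example, it contains spaces).
            return null;
        }
    }
}
```

With this, UrlDomain.extract("http://www.linkedin.com/in/franzsee") gives "www.linkedin.com", so http, ://, and the path never become terms at all. Whether to also strip a leading "www." is a separate choice that depends on how the domains should match.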
-- Ian.

On Fri, Jan 29, 2010 at 3:43 AM, Franz Allan Valencia See <franz....@gmail.com> wrote:
> Good day,
>
> I am currently using Lucene for my searches, and one of the problems I'm
> facing is when the keyword is a URL. Tokens such as http, https, ://, index,
> html, etc. seem to be messing up our search results. The focus was
> supposed to be only on the URL's domain.
>
> The idea I have is to modify the idf so that rare terms get boosted much
> more than with Lucene's default settings. Since there are probably a lot of
> http, https://, etc., matches on these terms should score really, really
> low, while matches on the domain (which is rare) should score high.
>
> Would this work, or am I totally misunderstanding Lucene's tf/idf? :-)
>
> Thanks,
>
> --
> Franz Allan Valencia See | Java Software Engineer
> franz....@gmail.com
> LinkedIn: http://www.linkedin.com/in/franzsee
> Twitter: http://www.twitter.com/franz_see