How should I go about identifying the domain? Thanks,
-- Franz Allan Valencia See | Java Software Engineer franz....@gmail.com LinkedIn: http://www.linkedin.com/in/franzsee Twitter: http://www.twitter.com/franz_see On Fri, Jan 29, 2010 at 6:42 PM, Ian Lea <ian....@gmail.com> wrote: > Instead of playing around with tf/idf, how about just indexing and > searching the domain. > > > -- > Ian. > > > On Fri, Jan 29, 2010 at 3:43 AM, Franz Allan Valencia See > <franz....@gmail.com> wrote: > > Good day, > > > > I am currently using lucene for my searches. And one of the problems that > Im > > facing is when keyword is a url. The tokens such as http, https, ://, > index, > > html, etc seems to be messing up with our search results. The focus was > > supposed to be only on the url domain. > > > > The idea that I have is modify the idf so that rare terms get boosted > much > > more than the default settings in lucene. Since there are probably a lot > of > > http, https://, etc, then matches to these terms should be really really > > low, while matches to the domain (which is rare) should be high. > > > > Would this work or am I totally misunderstanding lucene's tf/idf? :-) > > > > Thanks, > > > > -- > > Franz Allan Valencia See | Java Software Engineer > > franz....@gmail.com > > LinkedIn: http://www.linkedin.com/in/franzsee > > Twitter: http://www.twitter.com/franz_see > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >