Instead of playing around with tf/idf, how about just extracting the
domain from each URL, then indexing and searching only that?
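
A minimal sketch of the domain-only idea, using nothing but the JDK. The
class name DomainExtractor and the method extractDomain are hypothetical,
and stripping a leading "www." and assuming http when no scheme is given
are assumptions made for illustration, not anything Lucene requires.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

public class DomainExtractor {

    /**
     * Returns the lower-cased host of the given URL, or null if no host
     * can be parsed. Hypothetical helper, not part of Lucene.
     */
    static String extractDomain(String url) {
        if (url == null) {
            return null;
        }
        String candidate = url.trim();
        // URI.getHost() needs a scheme to recognise the authority part,
        // so assume http when none is given (an assumption of this sketch).
        if (!candidate.contains("://")) {
            candidate = "http://" + candidate;
        }
        try {
            String host = new URI(candidate).getHost();
            if (host == null) {
                return null;
            }
            host = host.toLowerCase(Locale.ROOT);
            // Assumption: treat "www.example.com" and "example.com" as the same domain.
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            // Not a parseable URL, so there is no domain to index.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(extractDomain("https://www.example.com/index.html"));
    }
}
```

The extracted value could then go into its own untokenized field at index
time (say, a field named "domain", which is only an example name), with the
same extraction applied to the user's keyword at query time, so that http,
https, index, html and so on never reach the index as terms.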


--
Ian.


On Fri, Jan 29, 2010 at 3:43 AM, Franz Allan Valencia See
<franz....@gmail.com> wrote:
> Good day,
>
> I am currently using Lucene for my searches, and one of the problems that I'm
> facing is when the keyword is a URL. Tokens such as http, https, ://, index,
> html, etc. seem to be messing up our search results. The focus was
> supposed to be only on the URL's domain.
>
> The idea that I have is to modify the idf so that rare terms get boosted much
> more than with the default settings in Lucene. Since there are probably a lot
> of http, https://, etc. terms, matches on these should score very
> low, while matches on the domain (which is rare) should score high.
>
> Would this work or am I totally misunderstanding lucene's tf/idf? :-)
>
> Thanks,
>
> --
> Franz Allan Valencia See | Java Software Engineer
> franz....@gmail.com
> LinkedIn: http://www.linkedin.com/in/franzsee
> Twitter: http://www.twitter.com/franz_see
>
