How should I go about identifying the domain?
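
One rough idea, only a sketch and not something I have tried against our data: parse the keyword with java.net.URI and keep just the host, so that it can be indexed and searched on its own. The class name DomainExtractor and the method extractDomain below are made up for illustration. Stripping a leading "www." and requiring a dot in the host are my own assumptions, not anything Lucene provides.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper (not part of Lucene): pulls the host out of a URL-like keyword.
final class DomainExtractor {

    /** Returns the lower-cased host of a URL-like keyword, or null if none can be found. */
    public static String extractDomain(String keyword) {
        if (keyword == null) {
            return null;
        }
        String candidate = keyword.trim();
        if (candidate.isEmpty()) {
            return null;
        }
        // URI.getHost() is only populated when a scheme is present,
        // so assume http for bare inputs such as "example.com/index.html".
        if (!candidate.contains("://")) {
            candidate = "http://" + candidate;
        }
        try {
            String host = new URI(candidate).getHost();
            // Assumption: a host without a dot is an ordinary search word, not a domain.
            if (host == null || host.indexOf('.') < 0) {
                return null;
            }
            host = host.toLowerCase(Locale.ROOT);
            // Assumption: "www.example.com" and "example.com" should match each other.
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            // Not parseable as a URL (e.g. contains spaces): treat as a normal keyword.
            return null;
        }
    }

    public static void main(String[] args) {
        // prints "example.com"
        System.out.println(extractDomain("https://www.example.com/index.html"));
    }
}
```

The returned value could then go into its own untokenized field at index time, with the same extraction applied to the query keyword, but I am not sure this covers every kind of URL we get.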

Thanks,

-- 
Franz Allan Valencia See | Java Software Engineer
franz....@gmail.com
LinkedIn: http://www.linkedin.com/in/franzsee
Twitter: http://www.twitter.com/franz_see

On Fri, Jan 29, 2010 at 6:42 PM, Ian Lea <ian....@gmail.com> wrote:

> Instead of playing around with tf/idf, how about just indexing and
> searching the domain?
>
>
> --
> Ian.
>
>
> On Fri, Jan 29, 2010 at 3:43 AM, Franz Allan Valencia See
> <franz....@gmail.com> wrote:
> > Good day,
> >
> > I am currently using Lucene for my searches, and one of the problems I'm
> > facing is when the keyword is a URL. Tokens such as http, https, ://, index,
> > html, etc. seem to be messing up our search results. The focus was
> > supposed to be only on the URL's domain.
> >
> > The idea I have is to modify the idf so that rare terms get boosted much
> > more than with Lucene's default settings. Since there are probably a lot of
> > http, https://, etc. terms, matches on these should score really
> > low, while matches on the domain (which is rare) should score high.
> >
> > Would this work, or am I totally misunderstanding Lucene's tf/idf? :-)
> >
> > Thanks,
> >
> > --
> > Franz Allan Valencia See | Java Software Engineer
> > franz....@gmail.com
> > LinkedIn: http://www.linkedin.com/in/franzsee
> > Twitter: http://www.twitter.com/franz_see
> >
>
