Re: Normalization of Documents

Peter Carlson Sat, 13 Apr 2002 08:24:36 -0700

Hi Bernhard,

I think this is a very interesting issue.


I think that changing the scoring algorithm is one part of it, the other is
to get the information from the Document to use in the ranking. Since this
is an expensive operation, there will have to be an alternative approach.

Do you have any suggestions to start?

Thanks

--Peter

On 4/13/02 6:05 AM, "Bernhard Messer" <[EMAIL PROTECTED]> wrote:


> 
> the topic you are focusing on is a never ending story in content
> retrieval in general. There is no perfect solution which fits in every
> environment. Retrieving a document's context based on a single query
> term seems to be very difficult also. In Lucene it isn't de very
> difficult to change the ranking algorithm. If you don't like the field
> normalization, you could comment the following in line in the TermScorer
> class.
> 
> score *= Similarity.norm(norms[d]);
> 
> If you put a comment around this line, youre scoring is based on the
> term frequency.
> 
> If more people are interested, we could think on a little bit more
> flexible ranking system within Lucene. There would be several parameters
> which from the environment which could be used to rank a document.
> Therefore we would need an interface where we could change the lucene
> document boost factor during runtime. For example, a document's ranking
> could be based on:
>   links pointing to that document (like Google)
>   last modification date,
>   size of the document,
>   term frequency,
>   how often was it displayed by other users, sending the same query
> terms to the system
>   .....


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Re: Normalization of Documents

Reply via email to