[EMAIL PROTECTED] wrote on 06/17/2006 10:52 PM:
> I am thinking of modifying Lucene's current ranking algorithm to include a
> recency weighting for each document, so that the most recently modified
> documents get preference over earlier-modified ones, which makes sense for
> news search.
>
> (I believe) To do this I have to tinker with TermScorer.score() method, and
> calculate document-score in its while (doc < end) {..} loop. The requirement
> is that document's lastModifiedTime is stored in the doc's field, and
> extracting this value could be quite expensive for every iteration in its
> posting stream. One approach could be to store it in a separate file (like
> Normalization) to avoid field-lookup.
>
> Any other ideas or suggestions? Or has anyone already implemented this?
>
Does recency correlate with the order in which documents are added to
your index? If so, then perhaps you can use doc-id as a measure of
recency and thereby avoid accessing a stored field. I'm not certain,
but based on a quick perusal of the relevant code, it appears that both
index opening and segment merging preserve the order of doc-ids. If you
take this approach, you should verify that this holds for your index.
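
As a rough illustration of the doc-id idea, here is a minimal sketch in
plain Java. It is not Lucene code: the class name DocIdRecencyBoost, the
boost() method, and the linear blending formula are all hypothetical,
and it assumes that a higher doc-id always means a more recent document.

```java
// Hypothetical sketch: plain Java, no Lucene classes involved.
final class DocIdRecencyBoost {

    /**
     * Blends a relevance score with a recency factor derived from the doc-id.
     *
     * Assumption: doc-ids grow with recency (newer documents were added later).
     * weight = 0 ignores recency; weight = 1 scales the score linearly from
     * 0 (oldest document) to the full raw score (newest document).
     */
    static float boost(float rawScore, int docId, int maxDoc, float weight) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IllegalArgumentException("docId out of range: " + docId);
        }
        if (weight < 0.0f || weight > 1.0f) {
            throw new IllegalArgumentException("weight must be in [0, 1]");
        }
        if (maxDoc == 1) {
            return rawScore; // a single document has no relative recency
        }
        // 0.0 for the oldest document, 1.0 for the newest one.
        float recency = (float) docId / (float) (maxDoc - 1);
        return rawScore * ((1.0f - weight) + weight * recency);
    }
}
```

With a weight of 0.5, the oldest document keeps half of its raw score
and the newest keeps all of it, so recency nudges the ranking without
overriding term relevance. Where such a factor would be applied inside
the scorer is left open here.
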
If you end up needing a stored field, then be sure to use the lazy-field
loading capability (recently committed) to access it.
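
If doc-id order turns out not to track recency, the "separate file, like
norms" idea from the original message could be approximated by an
in-memory table of timestamps indexed by doc-id, so that scoring never
touches a stored field. The sketch below is plain Java, not Lucene code;
the class RecencyTable, its methods, and the exponential half-life decay
are assumptions made for illustration only. How the table gets filled
(for example, by reading each document's lastModifiedTime once when the
index is opened) is not shown.

```java
// Hypothetical sketch: plain Java, no Lucene classes involved.
final class RecencyTable {

    // One timestamp per document, indexed by doc-id (similar in spirit to norms).
    private final long[] lastModifiedMillis;
    private final double halfLifeMillis;

    RecencyTable(long[] lastModifiedMillis, double halfLifeMillis) {
        if (halfLifeMillis <= 0.0) {
            throw new IllegalArgumentException("halfLifeMillis must be positive");
        }
        this.lastModifiedMillis = lastModifiedMillis.clone();
        this.halfLifeMillis = halfLifeMillis;
    }

    /** Returns a factor in (0, 1]: 1.0 for a brand-new document, halving every half-life. */
    double decay(int docId, long nowMillis) {
        // Documents stamped in the future are treated as age 0.
        long ageMillis = Math.max(0L, nowMillis - lastModifiedMillis[docId]);
        return Math.pow(0.5, ageMillis / halfLifeMillis);
    }

    /** Scales a raw relevance score by the document's recency factor. */
    double score(double rawScore, int docId, long nowMillis) {
        return rawScore * decay(docId, nowMillis);
    }
}
```

The lookup is a single array access per scored document. Because
doc-ids can change when segments are merged, a table like this would
presumably have to be rebuilt whenever the index is reopened.
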
Chuck