[EMAIL PROTECTED] wrote on 06/17/2006 10:52 PM:
> I am thinking of modifying Lucene's current ranking algorithm to include a 
> recency weight for each document, so that more recently modified documents 
> are preferred over older ones, which makes sense for news 
> search. 
>
> I believe that to do this I have to modify the TermScorer.score() method and 
> calculate the document score in its while (doc < end) {..} loop. This requires 
> that the document's lastModifiedTime is stored in a field of the document, and 
> extracting this value could be quite expensive on every iteration over the 
> posting stream. One approach could be to store it in a separate file (like 
> the norms) to avoid the field lookup. 
>
> Any other ideas or suggestions? Has anyone already implemented this? 
>   

Does recency correlate with the order in which documents are added to
your index?  If so, then perhaps you can use doc-id as a measure of
recency and thereby avoid accessing a stored field.  I'm not certain,
but based on a quick perusal of the relevant code, it appears that both
index opening and segment merging preserve the order of doc-ids.  If you
take this approach, you should verify that this holds.
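
To make the doc-id idea concrete, here is a minimal sketch in plain Java.
It uses no Lucene API; the class DocIdRecency, its weight parameter, and
the linear formula are all hypothetical, and it assumes that a higher
doc-id always means a newer document, which is exactly the property that
needs verifying.

```java
// Hypothetical helper, not part of Lucene: blends a relevance score with a
// recency factor derived from the doc-id, ASSUMING higher doc-ids are newer.
final class DocIdRecency {
    private final int maxDoc;   // number of doc-ids in the index, must be > 0
    private final float weight; // 0 = ignore recency; 1 = newest doc nearly doubles its score

    DocIdRecency(int maxDoc, float weight) {
        if (maxDoc <= 0) {
            throw new IllegalArgumentException("maxDoc must be positive");
        }
        if (weight < 0f) {
            throw new IllegalArgumentException("weight must not be negative");
        }
        this.maxDoc = maxDoc;
        this.weight = weight;
    }

    /** Recency in [0, 1): 0 for the oldest doc-id, approaching 1 for the newest. */
    float recency(int docId) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IllegalArgumentException("docId out of range: " + docId);
        }
        return (float) docId / maxDoc;
    }

    /** Scales the raw score by 1 + weight * recency(docId). */
    float boost(float rawScore, int docId) {
        return rawScore * (1f + weight * recency(docId));
    }
}
```

With maxDoc = 100 and weight = 1, a raw score of 2.0 stays 2.0 for doc-id 0
and becomes 3.0 for doc-id 50.  The factor costs one division per hit and
reads nothing from the index, which is the point of the approach.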

If you end up needing a stored field, then be sure to use the lazy fields
capability (recently committed) to access it.
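
However the value is read, the per-hit cost can be avoided by loading the
timestamps once into an array indexed by doc-id, in the spirit of the
norms-like file suggested in the quoted message.  The sketch below is plain
Java and makes no Lucene calls: LastModifiedCache, the loader callback, and
the half-life decay are hypothetical stand-ins, and the loader is assumed to
wrap whatever code reads the field for one document.

```java
import java.util.function.IntToLongFunction;

// Hypothetical cache, not a Lucene class: reads each document's
// last-modified time once, then serves it from an array by doc-id.
final class LastModifiedCache {
    private final long[] lastModified;

    // 'loader' is an assumed callback that returns the stored timestamp
    // (in milliseconds) for one doc-id; it is called exactly once per document.
    LastModifiedCache(int maxDoc, IntToLongFunction loader) {
        lastModified = new long[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            lastModified[doc] = loader.applyAsLong(doc);
        }
    }

    /** Array lookup only; no field access at scoring time. */
    long lastModified(int docId) {
        return lastModified[docId];
    }

    /** Exponential decay: 1.0 at age zero, 0.5 after one half-life. */
    float recencyFactor(int docId, long nowMillis, long halfLifeMillis) {
        long age = Math.max(0L, nowMillis - lastModified[docId]);
        return (float) Math.pow(0.5, (double) age / halfLifeMillis);
    }
}
```

The array costs eight bytes per document and has to be rebuilt whenever the
index is reopened, since doc-ids can change after a merge.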

Chuck

