[EMAIL PROTECTED] wrote on 06/17/2006 10:52 PM:
> I am thinking of modifying lucene's current ranking algorithm to include the
> document's recency-weightage. So that the latest modified documents gets
> preference over earlier modified documents, which makes sense for news
> search.
>
> (I believe) To do this I have to tinker with TermScorer.score() method, and
> calculate document-score in its while (doc < end) {..} loop. The requirement
> is that document's lastModifiedTime is stored in the doc's field, and
> extracting this value could be quite expensive for every iteration in its
> posting stream. One approach could be to store it in a separate file (like
> Normalization) to avoid field-lookup.
>
> Any other ideas/suggestions.. Or if anyone has already implemented this ?
>
Does recency correlate with the order in which documents are added to your index? If so, then perhaps you can use the doc-id as a measure of recency and thereby avoid accessing a stored field.

I'm not certain, but based on a quick perusal of the relevant code, it appears that both index opening and segment merging preserve the order of doc-ids. If you take this approach, you should verify that this actually holds.

If you end up needing a stored field, then be sure to use the lazy fields capability (recently committed) to access it.

Chuck
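
P.S. To make the doc-id idea concrete, here is a minimal sketch in plain Java. It does not touch any Lucene class. DocIdRecency, recencyBoost, score and the weight parameter are hypothetical names of my own, and the linear boost is only one possible choice. It assumes that a larger doc-id means a more recently added document, which is exactly the property that needs verifying.

```java
// Hypothetical helper, not part of Lucene: blends a relevance score with a
// recency boost derived purely from the document number.
public class DocIdRecency {

    // Maps docId in [0, maxDoc) to a boost in [1.0, 1.0 + weight].
    // Assumption: larger doc-ids were added later (verify for your index).
    static float recencyBoost(int docId, int maxDoc, float weight) {
        if (maxDoc <= 1) {
            return 1.0f; // a single document has nothing to be newer than
        }
        // 0.0 for the oldest document, 1.0 for the newest one
        float position = (float) docId / (float) (maxDoc - 1);
        return 1.0f + weight * position;
    }

    // Multiplies the ordinary relevance score by the recency boost.
    static float score(float relevance, int docId, int maxDoc, float weight) {
        return relevance * recencyBoost(docId, maxDoc, weight);
    }

    public static void main(String[] args) {
        int maxDoc = 1000;     // illustrative index size
        float weight = 0.5f;   // illustrative: newest doc gets up to +50%
        // Same relevance, different positions in the index.
        System.out.println("old doc: " + score(2.0f, 10, maxDoc, weight));
        System.out.println("new doc: " + score(2.0f, 990, maxDoc, weight));
    }
}
```

With equal relevance, the document with the higher doc-id ends up with the higher combined score. Inside a scorer the same arithmetic would be applied to the doc number already at hand in the scoring loop, so no stored field has to be read per hit.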