Using the doc-id itself as a recency metric is smart thinking. But the weight
is actually a sigmoidal function of the document's age (i.e.
currentTime - documentIndexingTime), so I can't just use the doc-id itself.
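To make the idea concrete, here is a minimal sketch of such a weight in plain
Java. The logistic form, the class name RecencyWeight, and the midpointMillis
and scaleMillis parameters are all assumptions for illustration; the actual
function and its tuning may differ.

```java
public final class RecencyWeight {
    private RecencyWeight() {}

    /**
     * Hypothetical sigmoidal recency weight in the range [0, 1].
     *
     * @param ageMillis      currentTime - documentIndexingTime
     * @param midpointMillis assumed tuning knob: the age at which the weight is 0.5
     * @param scaleMillis    assumed tuning knob: larger values give a gentler fall-off
     */
    public static double weight(long ageMillis, long midpointMillis, long scaleMillis) {
        if (scaleMillis <= 0) {
            throw new IllegalArgumentException("scaleMillis must be positive");
        }
        double x = (double) (ageMillis - midpointMillis) / scaleMillis;
        // Logistic curve, flipped so that older documents get smaller weights.
        return 1.0 / (1.0 + Math.exp(x));
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long now = System.currentTimeMillis();
        long indexedAt = now - 3 * day; // a document indexed three days ago
        System.out.println(weight(now - indexedAt, 7 * day, 2 * day));
    }
}
```

Because the weight depends on the current time, it has to be computed at
search time from a stored timestamp, which is why the doc-id alone is not
enough here.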
What is the JIRA bug ID for the lazy field capability? I would like to know
more about this feature.
Thanks for the help,
Prasen
-----Original Message-----
From: Chuck Williams <[EMAIL PROTECTED]>
To: [email protected]
Sent: Sun, 18 Jun 2006 07:47:40 -1000
Subject: Re: Recency weightage in Lucene
[EMAIL PROTECTED] wrote on 06/17/2006 10:52 PM:
> I am thinking of modifying Lucene's current ranking algorithm to include the
> document's recency weightage, so that the most recently modified documents
> get preference over earlier modified documents, which makes sense for news
> search.
>
> (I believe) to do this I have to tinker with the TermScorer.score() method
> and calculate the document score in its while (doc < end) {..} loop. The
> requirement is that the document's lastModifiedTime is stored in a field of
> the doc, and extracting this value could be quite expensive for every
> iteration over the posting stream. One approach could be to store it in a
> separate file (like normalization) to avoid the field lookup.
>
> Any other ideas or suggestions? Has anyone already implemented this?
>
Does recency correlate with the order in which documents are added to
your index? If so, then perhaps you can use doc-id as a measure of
recency and thereby avoid accessing a stored field. I'm not certain,
but based on a quick perusal of the relevant code, it appears that both
index opening and segment merging preserve the order of doc-ids. If you
take this approach, you should verify this.
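One way to run that verification, sketched in plain Java: the check below
assumes the per-document timestamps have already been read into an array
indexed by doc-id. How that array gets filled is not shown, and the class and
method names are hypothetical rather than part of Lucene.

```java
public final class DocIdRecencyCheck {
    private DocIdRecencyCheck() {}

    /**
     * Returns true if timestamps never decrease as the doc-id increases,
     * i.e. doc-id order is a valid stand-in for recency order.
     *
     * @param timestampByDocId assumed input: timestampByDocId[docId] holds the
     *                         indexing or last-modified time of that document
     */
    public static boolean isOrderedByTime(long[] timestampByDocId) {
        for (int docId = 1; docId < timestampByDocId.length; docId++) {
            if (timestampByDocId[docId] < timestampByDocId[docId - 1]) {
                return false; // an older document follows a newer one
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isOrderedByTime(new long[] {100L, 200L, 200L, 350L}));
        System.out.println(isOrderedByTime(new long[] {100L, 300L, 200L}));
    }
}
```

The same doc-id-indexed array could also serve as an in-memory side table of
timestamps, so that scoring never has to touch a stored field.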
If you end up needing a stored field, then be sure to use the lazy fields
capability (recently committed) to access it.
Chuck