On 3/10/11 8:32 PM, Felipe Hummel wrote:
Hi, I'm building a system where I want to show only results indexed in the
past few days. Furthermore, I don't want to maintain a giant index with
millions of documents if I only want to return results from a couple of days
(thousands of documents).

My system heavily relies that the occurrences of terms in documents stored
in the index have a realistic distribution (consequently: realistic IDF).

That said, I would like to use a small index to return results, but I want
to compute documents score using a IDF from a much greater Index (or even an
external source).

The Similarity API doesn't seem to allow me to do this. The *idf* method
does not receive as parameter the term being used.

Another possibility is to use TrieRangeQuery to make sure the documents
shown are within the last couple of days. Again, I rather not mantain a
large index. Also this kind of query is not cheap.

Am I missing something?

Take a look at SOLR-1632. Indeed, it's not possible to do this using Similarity alone. You will need something like the DFSource class in that patch, i.e. a subclass of IndexSearcher, where you populate term->DF map with values obtained from the full index, and then you use this map to calculate IDF.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to