Mark Miller wrote:
https://issues.apache.org/jira/browse/LUCENE-997

What this issue doesn't discuss is what to do with the partial results obtained when a timeout occurs. As the original poster points out, document lists are traversed in the order in which documents were added, not in order of importance. This biases the partial results: they reflect an essentially arbitrary sample that is unlikely to contain the most relevant documents, since more relevant results may lie later in the traversal order.

The answer to this issue is org.apache.nutch.indexer.IndexSorter, which rearranges the posting lists in an existing index according to an arbitrary measure of document importance, so that documents with smaller ids are more "important". The partial hit lists obtained from early termination then have a better chance of containing the more relevant documents.
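
To make the idea concrete, here is a minimal toy sketch in Python. It is not the IndexSorter code and does not use the Lucene or Nutch APIs; the names (collect_with_budget, build_id_mapping, remap_postings), the importance scores, and the use of a fixed count of examined postings as a stand-in for a wall-clock timeout are all assumptions made for illustration.

```python
from typing import List


def collect_with_budget(postings: List[int], budget: int) -> List[int]:
    """Traverse a posting list in doc-id order and stop after `budget` entries.

    The budget stands in for a timeout (an assumption for this sketch).
    """
    return sorted(postings)[:budget]


def build_id_mapping(importance: List[int]) -> List[int]:
    """Return old_to_new, where the most important document gets new id 0."""
    order = sorted(range(len(importance)), key=lambda d: importance[d], reverse=True)
    old_to_new = [0] * len(importance)
    for new_id, old_id in enumerate(order):
        old_to_new[old_id] = new_id
    return old_to_new


def remap_postings(postings: List[int], old_to_new: List[int]) -> List[int]:
    """Rewrite a posting list with the new ids, kept in increasing id order."""
    return sorted(old_to_new[d] for d in postings)


# Hypothetical importance score per document, indexed by original doc id.
importance = [1, 9, 3, 8, 2, 7]
# Documents containing some term, in the order they were added.
postings = [0, 1, 2, 4, 5]
budget = 3  # simulated early termination after 3 postings

# Unsorted index: the partial result is simply whatever was added first.
partial_unsorted = collect_with_budget(postings, budget)

# Sorted index: renumber documents by importance, then traverse as before.
old_to_new = build_id_mapping(importance)
new_to_old = {new: old for old, new in enumerate(old_to_new)}
partial_sorted_new = collect_with_budget(remap_postings(postings, old_to_new), budget)
partial_sorted = [new_to_old[d] for d in partial_sorted_new]

score_unsorted = sum(importance[d] for d in partial_unsorted)
score_sorted = sum(importance[d] for d in partial_sorted)
print(partial_unsorted, score_unsorted)
print(partial_sorted, score_sorted)
```

With the same budget, the traversal over the renumbered postings reaches the highest-scoring matching documents first, while the traversal over the original ids only sees the earliest additions. The traversal code itself is unchanged in both cases; only the id assignment differs.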

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

