Mark Miller wrote:
https://issues.apache.org/jira/browse/LUCENE-997
What this issue doesn't discuss is what to do with partial results obtained when a timeout occurred. As the original poster points out, document lists are traversed in the order they were added and not the order of their importance, which introduces a bias to partial results in that they reflect results from a random sample (which is likely not the most relevant, i.e. there could have been more relevant results later in the traversal order).
The answer to this issue is org.apache.nutch.indexer.IndexSorter, which rearranges posting lists in an already existing index, according to an arbitrary measure of document importance, so that documents with smaller id are more "important". Then the partial hit lists obtained as a result of early termination will have a better chance of being more relevant.
-- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __________________________________ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]