Re: how to get results without getting total number of found documents?

Andrzej Bialecki Tue, 26 Sep 2006 16:26:56 -0700

Vlad,

Please check published papers on sampling inverted indexes and multi-level caching - this is most probably what Google and other major search engines use.

You can see a simple implementation of this principle in Nutch - the index is sorted in decreasing order by a PageRank-like score (the logic for this is in IndexSorter.java), and then when running a query we only collect the top-N results and extrapolate the total number over the whole collection, assuming a certain model of term distributions (LuceneQueryOptimizer.LimitedCollector).
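
To make the idea concrete, here is a rough sketch in Python. It is not the Nutch code: the function name, the data layout, and the uniform-distribution model used for the extrapolation are all assumptions made for illustration. It scans a score-sorted index, stops after the first top-N matches, and scales the hit count up to the whole collection.

```python
def search_with_estimate(docs, term, top_n):
    """Collect the first top_n matches from a score-sorted index and
    estimate the total number of matches (illustrative sketch only).

    docs: list of (score, set_of_terms), already sorted by descending
          score, so a document's position doubles as its rank.
    """
    hits = []
    scanned = 0
    for doc_id, (_score, terms) in enumerate(docs):
        scanned += 1
        if term in terms:
            hits.append(doc_id)
            if len(hits) == top_n:
                break  # early termination: the rest of the index is never read

    if len(hits) < top_n:
        # The whole index was scanned, so the count is exact.
        return hits, len(hits)

    # Assumption: matches are spread uniformly over the index, so the hit
    # density seen in the scanned prefix holds for the remainder as well.
    estimated_total = round(len(hits) * len(docs) / scanned)
    return hits, estimated_total


# Toy index: 100 documents, every 4th one contains "lucene".
docs = [(1.0 - i / 100, {"lucene"} if i % 4 == 0 else {"other"})
        for i in range(100)]
hits, total = search_with_estimate(docs, "lucene", top_n=5)
```

In this toy index the scan stops after 17 documents and reports an estimate of 29 matches against a true count of 25. The error depends entirely on how well the assumed distribution model fits the data.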


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


