Hello, 小鱼儿.

On Tue, Dec 31, 2019 at 6:32 AM 小鱼儿 <ctengc...@gmail.com> wrote:
> Assume I first use keyword search to get a DocIDSet from the inverted index,
> then I want to sort these docIds by some numeric field, like an
> `updateTime`. Does Lucene do this without needing to load the Document
> objects, using only a sorted index on `updateTime`?

1. Lucene definitely doesn't load Document objects from the stored fields
files while sorting.
2. Lucene uses a dedicated columnar data structure (DocValues, built at index
time, or, in the worst case, the lazily loaded FieldCache) to obtain field
values while collecting search results from the inverted index.
3. One deviation from this generic algorithm is a sorted index with early
termination; that's probably what you meant by "Index-Only Sort
Optimization".

> Which I call
> "Index-Only Sort Optimization" (there MUST be some equivalent concept in RDBMS?)
>
> And since Lucene has a `SortField` API, how does it do the sort? I thought
> it brings up TopFieldCollector instead of the default TopScoreDocCollector.
> SortField is just a post-processing...
>

Not really. Scoring/sorting should be done alongside searching, to reduce the
memory footprint by keeping only the top candidate results in a binary heap.
IIRC, it's described in this classic paper:
http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf

--
Sincerely yours
Mikhail Khludnev
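The idea of sorting during collection, rather than as a post-processing step, can be sketched in plain Java. This is a toy illustration under stated assumptions, not Lucene's TopFieldCollector or any Lucene API: the hypothetical array `updateTime[docId]` stands in for a per-document columnar value (like a numeric DocValues field), matching docIds arrive in increasing order as they would from a postings iterator, and the class and method names are made up for the example. `topK` keeps only the k best candidates in a bounded min-heap, so memory stays O(k) regardless of how many docs match. `topKEarlyTermination` shows the sorted-index case: if docIds were assigned in descending `updateTime` order, the first k matches are already the answer and collection can stop.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical, simplified sketch; not Lucene's actual collector code.
final class TopKByField {

    /** Returns the k matching docIds with the largest updateTime, newest first. */
    static int[] topK(int[] matchingDocs, long[] updateTime, int k) {
        if (k <= 0) {
            return new int[0];
        }
        // "Weaker" ordering: smaller updateTime is weaker; on ties the larger
        // docId is weaker, so results are deterministic.
        Comparator<Integer> weakerFirst = (a, b) -> {
            int c = Long.compare(updateTime[a], updateTime[b]);
            return c != 0 ? c : Integer.compare(b, a);
        };
        // Min-heap: the root is the weakest of the current top-k candidates.
        PriorityQueue<Integer> heap = new PriorityQueue<>(k, weakerFirst);
        for (int doc : matchingDocs) {
            if (heap.size() < k) {
                heap.add(doc);
            } else if (weakerFirst.compare(doc, heap.peek()) > 0) {
                heap.poll(); // evict the weakest candidate
                heap.add(doc);
            }
            // Otherwise the doc cannot enter the top k and is dropped at once.
        }
        int[] result = new int[heap.size()];
        // poll() returns the weakest first, so fill the array from the back.
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    /**
     * Assumes the index is sorted by updateTime descending, i.e. a smaller
     * docId always means a newer document. The first k matches in docId
     * order are then the top k, and the remaining matches are never visited.
     */
    static int[] topKEarlyTermination(int[] matchingDocsInDocIdOrder, int k) {
        int[] result = new int[Math.max(0, Math.min(k, matchingDocsInDocIdOrder.length))];
        int n = 0;
        for (int doc : matchingDocsInDocIdOrder) {
            if (n == result.length) {
                break; // every later match is older: stop collecting
            }
            result[n++] = doc;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] updateTime = {50, 10, 40, 30, 20, 60}; // indexed by docId
        int[] matches = {0, 1, 2, 3, 5};              // docIds from the keyword query
        System.out.println(Arrays.toString(topK(matches, updateTime, 3))); // [5, 0, 2]
    }
}
```

Only `updateTime` values of matching docs are read, and the heap never holds more than k entries, which is the memory argument made in the reply.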