Hello, 小鱼儿.

On Tue, Dec 31, 2019 at 6:32 AM 小鱼儿 <ctengc...@gmail.com> wrote:

> Assume i first use keyword search to get a DocIDSet from inverted index,
> then i want to sort these docIds by some numeric field, like a
> `updateTime`, does Lucene do this without need of loading the Document
> objects but only with an sorted index on `updateTime`?

1. Lucene doesn't load Document objects from stored fields files while
sorting for sure.
2. Lucene uses dedicated columnar data structure (DocVaues made index time,
or in the worst case lazily loaded FieldCache) to obtain field values while
collecting search results from inverted index.
3. One deviation from this generic algorithm is sorted index and early
termination, that's probably what you meant in "Index-Only Search
Optimization".


> Which i call it
> "Index-Only Sort Optimization" (MUST be some equal concepts in RDBMS?)
>
> And since Lucene has a `SortField` API, what does it do the sort? I thought
>
It brings up TopFieldCollector instead of the default TopScoreDocCollector.

> SortField is just a post-processing...
>
Not really. Scoring/sorting should be done along side with searching to
reduce memory footprint by storing only top candidate results in a binary
heap.
IIRC it's described in this classic paper
http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf


-- 
Sincerely yours
Mikhail Khludnev

Reply via email to