> This means that even though you have eg. 15 segments, if you requested
> 50 documents, you will get the top 50 documents out of your
> TopHitsCollector.

Yes, we do get the top 50 docs in the end; I am not denying that. Let me
rephrase my question, and apologies if I was not clear.

How do we ensure a global sort order across all segments of the index
during search, when ETSC + SMP works only at the per-segment level?

> When wondering about stored fields vs doc values, the right trade-off
> is usually to use:
> - stored fields when looking up several field values for a few documents,
> - doc values when loading a few field values for many documents.

Thanks for this clarification. I shall surely move towards doc values...

--
Ravi

On Mon, Jun 23, 2014 at 5:36 PM, Adrien Grand <jpou...@gmail.com> wrote:

> On Sun, Jun 22, 2014 at 6:44 PM, Ravikumar Govindarajan
> <ravikumar.govindara...@gmail.com> wrote:
> > For a normal sorting-query, on a top-level searcher, I execute
> >
> > TopDocs docs = searcher.search(query, 50, sortField)
> >
> > Then I can issue reader.document() for final list of exactly 50 docs,
> > which gives me a global order across segments but at the obvious cost
> > of memory...
> >
> > SortingMergePolicy + ETSC will make me do 50*N [N=no.of.segments]
> > collects, which could increase cost of seeks when each segment collects
> > considerable hits...
>
> This is not correct. :) ETSC will collect segments one after another
> but in the end, what you will get are the top hits for all segments.
> This means that even though you have eg. 15 segments, if you requested
> 50 documents, you will get the top 50 documents out of your
> TopHitsCollector.
>
> >> - you can afford the merging overhead (ie. for heavy indexing
> >> workloads, this might not be the best solution)
> >> - there is a single sort order that is used for most queries
> >> - you don't need any feature that requires to collect all documents
> >> (like computing the total hit count or facets).
> >
> > Our use-case fits perfectly on all these 3 points and thats why we wanted
> > to explore this.
> > But our final set of results must also be globally
> > ordered. May be it's mistake to assume that Sorting can be entirely
> > replaced with SMP + ETSC...
>
> I don't think it is a mistake, this can help make the execution of
> search requests significantly faster.
>
> >> I would not advise to use the stored fields API, even in the context
> >> of early termination. Doc values should be more efficient here?
> >
> > I read your excellent blog on stored-fields compression, where you've
> > mentioned that stored-fields now take only one random seek. [
> > http://blog.jpountz.net/post/35667727458/stored-fields-compression-in-lucene-4-1
> > ]
> >
> > If so, then what could make DocValues still a winner?
>
> Yes. If you use eg. 2 doc values fields to run your query, it is true
> that the number of seeks in the worst case would be 2 for doc values
> and only 1 for stored fields, so stored fields might look more
> appropriate. However, doc values play much better with the operating
> system thanks to column-stride storage since:
> - it allows for lightweight and efficient compression,
> - the filesystem cache doesn't get loaded on field values that you
> are not interested in.
>
> When wondering about stored fields vs doc values, the right trade-off
> is usually to use:
> - stored fields when looking up several field values for a few documents,
> - doc values when loading a few field values for many documents.
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
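
On the global-order question raised in this thread, here is a minimal sketch
(plain Java, not Lucene code) of why per-segment early termination can still
give a globally ordered top N. It assumes every segment is already sorted by
the same key, which is what SortingMergePolicy is meant to provide, and, for
simplicity, that every entry matches the query. The class and method names are
made up for illustration. Hits from all segments go into one shared bounded
queue, and collection in a segment stops after N hits because anything later
in that segment is already outranked by N hits from the same segment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Toy model, NOT Lucene code: each "segment" is a list of sort keys that is
// already in index-sort order (smallest key first).
final class EarlyTerminationSketch {

    // Returns the n smallest keys across all segments, in ascending order.
    static List<Long> topN(List<List<Long>> sortedSegments, int n) {
        // Max-heap shared by ALL segments: holds the best n keys seen so far.
        PriorityQueue<Long> best = new PriorityQueue<>(Collections.reverseOrder());

        for (List<Long> segment : sortedSegments) {
            int collected = 0;
            for (long key : segment) {
                if (collected == n) {
                    break; // early termination: later keys in this segment cannot compete
                }
                collected++;
                if (best.size() < n) {
                    best.add(key);
                } else if (key < best.peek()) {
                    best.poll();   // evict the current worst of the global top n
                    best.add(key);
                }
            }
        }

        // The heap is only partially ordered, so sort once for the final result.
        List<Long> result = new ArrayList<>(best);
        Collections.sort(result);
        return result;
    }
}
```

In this model at most n entries are visited per segment, but only n entries
survive overall, so the documents that need to be loaded afterwards are the
final n, not n per segment.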