On Sun, Jun 22, 2014 at 6:44 PM, Ravikumar Govindarajan <ravikumar.govindara...@gmail.com> wrote:
> For a normal sorting-query, on a top-level searcher, I execute
>
> TopDocs docs = searcher.search(query, 50, sortField)
>
> Then I can issue reader.document() for final list of exactly 50 docs, which
> gives me a global order across segments but at the obvious cost of memory...
>
> SortingMergePolicy + ETSC will make me do 50*N [N=no.of.segments] collects,
> which could increase cost of seeks when each segment collects considerable
> hits...
This is not correct. :) ETSC will collect segments one after another, but in the end what you get are the top hits across all segments. This means that even if you have, e.g., 15 segments, requesting 50 documents gives you the global top 50 documents out of your top-hits collector (e.g. a TopFieldCollector), not 50 per segment.

>> - you can afford the merging overhead (ie. for heavy indexing
>> workloads, this might not be the best solution)
>> - there is a single sort order that is used for most queries
>> - you don't need any feature that requires to collect all documents
>> (like computing the total hit count or facets).
>
>
> Our use-case fits perfectly on all these 3 points and thats why we wanted
> to explore this. But our final set of results must also be globally
> ordered. May be it's mistake to assume that Sorting can be entirely
> replaced with SMP + ETSC...

I don't think it is a mistake: this can make the execution of search requests significantly faster.

>> I would not advise to use the stored fields API, even in the context
>> of early termination. Doc values should be more efficient here?
>
>
> I read your excellent blog on stored-fields compression, where you've
> mentioned that stored-fields now take only one random seek. [
> http://blog.jpountz.net/post/35667727458/stored-fields-compression-in-lucene-4-1
> ]
>
> If so, then what could make DocValues still a winner?

Yes, stored fields need only one random seek per document. If your query needs to load, e.g., 2 fields per document, the worst case is 2 seeks with doc values (one per field) against only 1 with stored fields, so stored fields might look more appropriate. However, doc values play much better with the operating system thanks to their column-stride storage, since:
- it allows for lightweight and efficient compression,
- the filesystem cache doesn't get loaded with values of fields that you are not interested in.
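To make the column-stride argument concrete, here is a toy cost model in Java. It is a hypothetical sketch, not Lucene code: the class, its methods and the field sizes are made up for illustration. It ignores compression, block boundaries and the filesystem cache, assumes fixed-size field values, and counts worst-case seeks (one per document for a row-oriented store like stored fields, one per document and requested field for a column-oriented store like doc values).

```java
/**
 * Hypothetical toy cost model (not Lucene code): row-oriented storage,
 * like stored fields, versus column-stride storage, like doc values.
 * Ignores compression and caching; assumes fixed-size field values.
 */
public class StorageCostModel {

    /** Size of one full document (row): the sum of all field sizes. */
    public static int rowSize(int[] fieldSizes) {
        int total = 0;
        for (int size : fieldSizes) {
            total += size;
        }
        return total;
    }

    /** Row store: each document is read as a whole, whatever fields are wanted. */
    public static long rowStoreBytes(int[] fieldSizes, int numDocs) {
        return (long) numDocs * rowSize(fieldSizes);
    }

    /** Row store, worst case: one seek per document. */
    public static long rowStoreSeeks(int numDocs) {
        return numDocs;
    }

    /** Column store: only the requested columns are read. */
    public static long columnStoreBytes(int[] fieldSizes, int[] wantedFields, int numDocs) {
        long total = 0;
        for (int field : wantedFields) {
            total += (long) numDocs * fieldSizes[field];
        }
        return total;
    }

    /** Column store, worst case: one seek per (document, requested field). */
    public static long columnStoreSeeks(int numWantedFields, int numDocs) {
        return (long) numWantedFields * numDocs;
    }

    public static void main(String[] args) {
        // Made-up field sizes in bytes: two numeric fields, a title and a body.
        int[] sizes = {8, 8, 200, 1000};

        // Few documents, several fields: the row store needs fewer seeks.
        System.out.println("2 fields, 1 doc  -> seeks: row=" + rowStoreSeeks(1)
                + " column=" + columnStoreSeeks(2, 1));

        // Many documents, one field: the column store reads far fewer bytes.
        int manyDocs = 1_000_000;
        System.out.println("1 field, 1M docs -> bytes: row=" + rowStoreBytes(sizes, manyDocs)
                + " column=" + columnStoreBytes(sizes, new int[] {0}, manyDocs));
    }
}
```

With these made-up sizes, loading a single 8-byte field for 1,000,000 documents touches 8,000,000 bytes column-wise versus 1,216,000,000 bytes row-wise, while loading 2 fields for a single document costs 2 worst-case seeks column-wise versus 1 row-wise. Real Lucene stored fields decompress whole blocks and doc values are compressed per field, so actual numbers differ, but the direction of the trade-off is the same.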
When wondering about stored fields vs doc values, the right trade-off is usually to use:
- stored fields when looking up several field values for a few documents,
- doc values when loading a few field values for many documents.

-- 
Adrien