No, you are incorrect. The point of a search engine is to return the top-N most relevant results.
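To make the top-N point concrete, here is a minimal sketch in plain Java. It is not Lucene's code, and the names TopN, Hit and topN are hypothetical, made up for illustration. It assumes hits arrive as an array of scores indexed by document id. What it shows is that keeping the best n hits only needs a bounded heap of size n, regardless of how many documents match, which is similar in spirit to how a top-N collector avoids holding the entire result set in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; these are not Lucene classes.
final class TopN {

    /** A (document id, score) pair standing in for a search hit. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Returns the n highest-scoring hits, best first, holding at most n hits at a time. */
    static List<Hit> topN(float[] scores, int n) {
        List<Hit> out = new ArrayList<>();
        if (n <= 0) {
            return out;
        }
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        // Min-heap on score: the weakest hit we are still keeping sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(n, byScore);
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < n) {
                heap.add(new Hit(doc, scores[doc]));
            } else if (scores[doc] > heap.peek().score) {
                // The new hit beats the weakest retained hit, so swap it in.
                heap.poll();
                heap.add(new Hit(doc, scores[doc]));
            }
        }
        out.addAll(heap);
        out.sort(byScore.reversed());
        return out;
    }
}
```

Calling TopN.topN(scores, 1000) over 60M scores would keep only 1000 Hit objects alive at any moment; the cost of scanning the matches remains, but the memory for the result does not grow with the size of the result set.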
If you insist you need to open an IndexReader on every single search, and then return huge amounts of docs, maybe you should use a database instead.

On Tue, Jun 3, 2014 at 6:42 AM, Jamie <ja...@mailarchiva.com> wrote:
> Vitaly / Robert
>
> I wouldn't go so far as to call our pagination naive!? Sub-optimal, yes.
> Unless I am mistaken, the Lucene library's pagination mechanism makes the
> assumption that you will cache the scoredocs for the entire result set. This
> is not practical when you have a result set that exceeds 60M. As stated
> earlier, in any case, it is the first query that is slow.
>
> We do open index readers, since we are using NRT search and documents
> are being added to the indexes on a continuous basis. When the user clicks
> on the Search button, the user will expect to see the latest result set.
> With regards to NRT search, my understanding is that we do need to open the
> index readers on each search operation to see the latest changes.
>
> Thus, on each search, we combine the index readers into a MultiReader, and
> open each reader based on its corresponding writer.
>
> protected IndexReader initIndexReader() throws IOException {
>     List<IndexReader> readers = new LinkedList<>();
>     for (Writer writer : writers) {
>         // NRT reader opened from the writer, applying all deletes
>         readers.add(DirectoryReader.open(writer, true));
>     }
>     // MultiReader takes an array; closing it also closes the sub-readers
>     return new MultiReader(readers.toArray(new IndexReader[0]), true);
> }
>
> Thank you for your ideas/suggestions.
>
> Regards
>
> Jamie
>
> On 2014/06/03, 12:29 PM, Vitaly Funstein wrote:
>>
>> Jamie,
>>
>> What if you were to forget for a moment the whole pagination idea, and
>> always capped your search at 1000 results for testing purposes only? This
>> is just to try and pinpoint the bottleneck here; if, regardless of the
>> query parameters, the search latency stays roughly the same and well below
>> 5 min, you now have the answer - the problem is your naive implementation
>> of pagination which results in snowballing result numbers and search
>> times, the closer you get to the end of the results range. Otherwise, I
>> would focus on your query and filter next.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org