No, you are incorrect. The point of a search engine is to return the
top-N most relevant results.

If you insist you need to open an indexreader on every single search,
and then return huge amounts of docs, maybe you should use a database
instead.
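
To make the top-N point concrete, here is a minimal plain-Java sketch of why a search engine only has to hold N hits in memory, however many documents match. This is an illustration only, not Lucene's actual collector code; the class name TopN and the idea of passing scores as an array indexed by doc id are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

final class TopN {
    /** Returns the ids (array indexes) of the n highest scores, best first. */
    static int[] topN(double[] scores, int n) {
        if (n <= 0) {
            return new int[0];
        }
        // Min-heap ordered by score: the weakest retained hit sits at the head.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>((a, b) -> Double.compare(scores[a], scores[b]));
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < n) {
                heap.add(doc);
            } else if (scores[doc] > scores[heap.peek()]) {
                heap.poll();   // evict the weakest retained hit
                heap.add(doc); // keep the better one; heap never exceeds n entries
            }
        }
        // Drain weakest-first, filling the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {0.2, 0.9, 0.5, 0.7};
        System.out.println(Arrays.toString(topN(scores, 2))); // prints [1, 3]
    }
}
```

The heap never grows past n entries, so the cost of asking for the first page does not depend on holding every matching doc. Asking for millions of hits at once gives up exactly that advantage.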

On Tue, Jun 3, 2014 at 6:42 AM, Jamie <ja...@mailarchiva.com> wrote:
> Vitaly / Robert
>
> I wouldn't go so far as to call our pagination naive!? Sub-optimal, yes.
> Unless I am mistaken, the Lucene library's pagination mechanism assumes
> that you will cache the ScoreDocs for the entire result set. This is not
> practical when you have a result set that exceeds 60M. As stated earlier,
> in any case, it is the first query that is slow.
>
> We do open index readers, since we are using NRT search and documents
> are being added to the indexes on a continuous basis. When the user clicks
> on the Search button, the user will expect to see the latest result set.
> With regards to NRT search, my understanding is that we do need to open the
> index readers on each search operation to see the latest changes.
>
> Thus, on each search, we combine the index readers into a MultiReader, and
> open each reader based on its corresponding writer.
>
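> protected IndexReader initIndexReader() throws IOException {
>     // Assumes 'writers' holds the IndexWriter for each index.
>     List<IndexReader> readers = new LinkedList<>();
>     for (IndexWriter writer : writers) {
>         // NRT reader from the writer; 'true' applies pending deletes.
>         readers.add(DirectoryReader.open(writer, true));
>     }
>     // MultiReader takes an array; 'true' closes the sub-readers with it.
>     return new MultiReader(readers.toArray(new IndexReader[0]), true);
> }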
>
> Thank you for your ideas/suggestions.
>
> Regards
>
> Jamie
>
> On 2014/06/03, 12:29 PM, Vitaly Funstein wrote:
>>
>> Jamie,
>>
>> What if you were to forget for a moment the whole pagination idea, and
>> always capped your search at 1000 results for testing purposes only? This
>> is just to try and pinpoint the bottleneck here; if, regardless of the
>> query parameters, the search latency stays roughly the same and well below
>> 5 min, you now have the answer - the problem is your naive implementation
>> of pagination, which results in snowballing result numbers and search
>> times the closer you get to the end of the results range. Otherwise, I
>> would focus on your query and filter next.
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>

