These are not document hits but text hits (more precisely, spans).
For the search result I need the exact counts of both document and text hits,
plus a relatively small number of matched text snippets.

I've tried several approaches to optimizing the search algorithm, but they 
didn't help: for these specific types of queries there really is a large 
amount of data to retrieve from the index.
At the moment I'm considering in-RAM caching of posting lists. Is that 
possible in Lucene?
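(For illustration only, here is a minimal sketch of the caching idea I have in mind: a map from term to its posting list, filled lazily on first access. This is not a Lucene API; the class and the loader function are hypothetical, and in Lucene itself the closest built-in option would be holding the whole index in memory, e.g. by copying the directory into a RAMDirectory.)

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of an in-RAM posting-list cache, keyed by term.
// The loader stands in for whatever actually reads postings from the index.
class PostingListCache {
    private final Map<String, List<Integer>> cache = new HashMap<>();
    private final Function<String, List<Integer>> loader;

    PostingListCache(Function<String, List<Integer>> loader) {
        this.loader = loader;
    }

    // Return the cached posting list for a term, loading it on first access.
    List<Integer> postings(String term) {
        return cache.computeIfAbsent(term, loader);
    }

    // Number of distinct terms currently cached.
    int size() {
        return cache.size();
    }
}
```

The point would be that repeated boolean queries over the same hot terms pay the index-read cost only once.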

-- 
Igor

02.04.2013, 20:44, "Adrien Grand" <jpou...@gmail.com>:
> On Tue, Apr 2, 2013 at 4:39 PM, Igor Shalyminov
> <ishalymi...@yandex-team.ru> wrote:
>
>>  Yes, the number of documents is not too large (about 90 000), but the 
>> queries are very hard. Although they're just boolean, a typical query can 
>> produce a result with tens of millions of hits.
>
> How can there be tens of millions of hits with only 90000 docs?
>
>>  Single-threaded, such a query runs ~20 seconds, which is too slow. 
>> Therefore, multithreading is vital for this task.
>
> Indeed, that's super slow. Multithreading could help a little, but
> maybe there is something to do to better index your data so that
> queries get faster?
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
