Re: BooleanQuery Performance Help

Paul Elschot Sat, 20 Dec 2008 07:03:32 -0800

On Saturday 20 December 2008 15:23:43, Prafulla Kiran wrote:
> Hi Everyone,
>
> I have an index of relatively small size (400 MB), containing roughly
> 0.7 million documents. The index is actually a copy of an existing
> database table. Hence, most of my queries are of the form
>
> " +field1:value1 +field2:value2 +field3:value3..... ~20 fields"
>
> I have been running performance tests using this query. Strangely, I
> noticed that if I remove some specific clauses, I get a performance
> improvement of at least 5 times. Here are the numbers and examples, so
> that I can be more precise:
>
> 1) Complete Query: 90 requests per second using 10 threads
> 2) If I remove a few specific clauses: 500 requests per second using
> 10 threads
> 3) If I form a new query using only 2 clauses from the set of removed
> clauses -> 100 requests per second using 10 threads
>
> Now, some of these specific clauses are such that they match around
> half of the entire document set.  Also, note that I need all the
> query terms to be present in the documents retrieved. My target is to
> obtain 300 requests per second with the given query (20 clauses). It
> includes 2 range queries. However, I am unable to get 300 rps unless
> I remove some of the clauses (which include these range queries).
> I have tried using filters without any significant improvement in
> performance. Also, I have more than enough RAM, so I am using the
> RAMDirectory to read the index. I have optimized my index before
> searching. All the tests have been warmed up for 5 seconds (the test
> duration is 10 seconds).
>
> My first question is: is this kind of decrease in performance
> expected as the number of clauses shoots up? Using a single clause
> out of these 20, I was able to get 2000 requests per second!
> Could someone please guide me if there are any other ways in which I
> can obtain an improvement in performance?


You might try and add brackets and a + around a group
of the less frequently occurring terms, like this:

+field1:frequentValue1 +field2:frequentValue2 +(+field3:inFrequentValue3 +field4:inFrequentValue4)

This may help, and at least it should not degrade performance much.
Also, it will affect score values somewhat.
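
A minimal sketch of that regrouping, assuming you already have a rough
count of matching documents for each clause (for example from the index
statistics or from the source database table). The class name, the
method name, the cutoff value and the document counts below are all
hypothetical and only serve the illustration; the sketch builds the
query string with plain Java and does not call Lucene itself:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupedQueryBuilder {

    /**
     * Builds a query string in which every clause is required (+) and the
     * less frequently occurring clauses are nested in one required group.
     *
     * clauseDocCounts: clause text such as "field1:value1" mapped to an
     *                  estimated number of matching documents (assumed input).
     * rareThreshold:   hypothetical cutoff; clauses matching fewer documents
     *                  than this are placed in the nested group.
     */
    public static String build(Map<String, Integer> clauseDocCounts, int rareThreshold) {
        List<String> frequent = new ArrayList<>();
        List<String> rare = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : clauseDocCounts.entrySet()) {
            String required = "+" + entry.getKey();
            if (entry.getValue() < rareThreshold) {
                rare.add(required);
            } else {
                frequent.add(required);
            }
        }
        List<String> parts = new ArrayList<>(frequent);
        if (rare.size() > 1) {
            // Wrap the rare clauses in brackets and require the whole group.
            parts.add("+(" + String.join(" ", rare) + ")");
        } else {
            // A single rare clause gains nothing from being grouped.
            parts.addAll(rare);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Illustrative counts only; they are not taken from the original index.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("field1:frequentValue1", 350000);
        counts.put("field2:frequentValue2", 300000);
        counts.put("field3:inFrequentValue3", 1200);
        counts.put("field4:inFrequentValue4", 800);
        System.out.println(build(counts, 10000));
    }
}
```

With the counts above, main prints the same query shape as the example
query. The resulting string still has to be handed to the query parser,
and whether the grouping actually helps has to be measured on the real
index.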

> Particularly, I am interested to know more about what further caching
> could be done apart from the default caching which Lucene does.

More caching is probably not going to help.

Regards,
Paul Elschot

