On Saturday 20 December 2008 15:23:43, Prafulla Kiran wrote:
> Hi Everyone,
>
> I have a relatively small index (400 MB) containing roughly
> 0.7 million documents. The index is actually a copy of an existing
> database table. Hence, most of my queries are of the form
>
> " +field1:value1 +field2:value2 +field3:value3..... ~20 fields"
>
> I have been running performance tests using this query. Strangely, I
> noticed that if I remove some specific clauses, I get a performance
> improvement of at least 5 times. Here are the numbers and examples, so
> that I can be more precise:
>
> 1) Complete query: 90 requests per second using 10 threads
> 2) If I remove a few specific clauses: 500 requests per second using
>    10 threads
> 3) If I form a new query using only 2 clauses from the set of removed
>    clauses: 100 requests per second using 10 threads
>
> Now, some of these specific clauses are such that they match around
> half of the entire document set. Also, note that I need all the
> query terms to be present in the documents retrieved. My target is to
> obtain 300 requests per second with the given query (20 clauses). It
> includes 2 range queries. However, I am unable to get 300 rps unless
> I remove some of the clauses (which include these range queries).
> I have tried using filters without any significant improvement in
> performance. Also, I have more than enough RAM, so I am using the
> RAMDirectory to read the index. I have optimized my index before
> searching. All the tests have been warmed up for 5 seconds (the test
> duration is 10 seconds).
>
> My first question is: is this kind of decrease in performance
> expected as the number of clauses shoots up? Using a single clause
> out of these 20, I was able to get 2000 requests per second!
> Could someone please guide me if there are any other ways in which I
> can obtain an improvement in performance?

You might try adding parentheses and a + around a group of the less
frequently occurring terms, like this:

+field1:frequentValue1 +field2:frequentValue2 +(+field3:inFrequentValue3 +field4:inFrequentValue4)

This may help, and at least it should not degrade performance much.
Note that it will also affect score values somewhat.

> Particularly, I am interested to know more about what further caching
> could be done apart from the default caching which Lucene does.

More caching is probably not going to help.

Regards,
Paul Elschot
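
The grouping suggested in the reply can be sketched as a small helper that builds the query string. This is only an illustration, not part of Lucene: the class name GroupedQuery and its methods are hypothetical, the caller is assumed to know which clauses are frequent and which are infrequent, and each clause is assumed to be already escaped for the query parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper (not a Lucene class) that builds a grouped query string. */
final class GroupedQuery {
    private GroupedQuery() {}

    /** Marks every clause as required with a leading '+' and joins them with spaces. */
    static String requireAll(List<String> clauses) {
        StringJoiner joined = new StringJoiner(" ");
        for (String clause : clauses) {
            joined.add("+" + clause);
        }
        return joined.toString();
    }

    /**
     * Keeps the frequent clauses at the top level and wraps the infrequent
     * clauses in one required, parenthesized group.
     * Clauses are assumed to be pre-escaped "field:value" strings.
     */
    static String build(List<String> frequent, List<String> infrequent) {
        List<String> top = new ArrayList<>(frequent);
        if (infrequent.size() > 1) {
            // Two or more infrequent clauses: nest them as +(+a +b ...).
            top.add("(" + requireAll(infrequent) + ")");
        } else {
            // Zero or one infrequent clause: a group would add nothing.
            top.addAll(infrequent);
        }
        return requireAll(top);
    }
}
```

Calling GroupedQuery.build with the two frequent and two infrequent clauses from the example in the reply yields the same string as that example. Whether the grouped form is actually faster on a given index would still need to be measured.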