Hi,

Here's the code which I am using to time the query:

long startTime = System.currentTimeMillis();
TopDocCollector collector = new TopDocCollector(10);
is.search(query,collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
long endTime = System.currentTimeMillis();

Most of the clauses which I removed, had very few unique terms: say 2 or 3.
I have started taking the timings after I've fired the warmup queries.
Also, I am not doing any kind of sorting or iterating through the hits object.

Regards,
Prafulla

Erick Erickson wrote:
What specifically are you measuring when you time the queries? I've been
mislead by including in my measurement say, creating the response. I realize
that throughput includes assembling the response, but the solution is
different
depending upon whether it's the actual search or what you do with the
results that takes the time.

Are you doing any sorting?

Are you using a Hits object and iterating on it? This gets very inefficient.

You might post your code where you time the query. Also what do the "few
specific clauses" you remove look like? Do they have anything to do with
time?
How many unique values do the fields have that you remove to see the
improvement?

Do you start your timings *after* you've fired up a few warmup queries?

Best
Erick

On Sat, Dec 20, 2008 at 9:23 AM, Prafulla Kiran <prafu...@tachyontech.net>wrote:

Hi Everyone,

I have an index of relatively small size (400mb) , containing roughly 0.7
million documents. The index is actually a copy of an existing database
table. Hence, most of my queries are of the form

" +field1:value1 +field2:value2 +field3:value3..... ~20 fields"

I have been running performance tests using this query. Strangely, I
noticed that if I remove some specific clauses... I get a performance
improvement of atleast 5 times. Here are the numbers and examples, so that I
could be more precise

1) Complete Query: 90 requests per second using 10 threads
2) If I remove few specific clauses : 500 requests per second using 10
threads
3) If I form a new query using only 2 clauses from the set of removed
clauses -> 100 requests per second using 10 threads

Now, some of these specific clauses are such that they match around half of
the entire document set.  Also, note that I need all the query terms to be
present in the documents retrieved. My target is to obtain 300 requests per
second with the given query (20 clauses). It includes 2 range queries.
However, I am unable to get 300 rps unless I remove some of the clauses
(which include these range queries) .
I have tried using filters without any significant improvement in
performance. Also, I have more than enough RAM, so I am using the
RAMDirectory to read the index. I have optimized my index before searching.
All the tests have been warmed for 5 seconds ( the test duration is 10
seconds).

My first question is, is this kind of decrease in performance expected as
the number of clauses shoot up ? Using a single clause out of these 20 , I
was able to get 2000 requests per second!
Could someone please guide me if there are any other ways in which I can
obtain improvement in performance ?
Particularly, I am interested to know more about what further caching could
be done apart from the default caching which lucene does.

Thanks In Advance,
Prafulla

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



------------------------------------------------------------------------


No virus found in this incoming message.
Checked by AVG - http://www.avg.com Version: 8.0.176 / Virus Database: 270.9.19/1857 - Release Date: 12/19/2008 10:09 AM



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to