[
https://issues.apache.org/jira/browse/LUCENE-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499290
]
Michael Busch commented on LUCENE-893:
--------------------------------------
I ran some performance tests with the same setup I used for LUCENE-866:
- 1.2 GB index, optimized, compound format, documents from Wikipedia
- 50,000 queries, each query has 3 AND terms, each term has a df>100,
each query has one or more hits
- 2.8 GHz Xeon, 4 GB RAM, SCSI HD, Windows Server 2003
My test simply executes all 50k queries in a row and measures the
overall time. I used the current trunk version patched with LUCENE-888
and LUCENE-866 and varied the buffer size of the cfs reader.
Here are the results:
1 KB: Time: 51703 ms.
2 KB: Time: 50672 ms.
4 KB: Time: 50969 ms.
8 KB: Time: 57047 ms.
16 KB: Time: 64547 ms.
It seems that it doesn't really matter if the buffer size is 1 KB, 2 KB,
or 4 KB. Above 4 KB the performance decreases significantly.
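For anyone who wants to repeat this kind of measurement, the approach
(run every query once, in order, and time the whole batch) can be
sketched roughly as follows. This is not the actual test code: the class
BatchTimer and the use of IntSupplier as a stand-in for "execute one
query and return its hit count" are assumptions made for illustration.

```java
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical helper, not part of Lucene or of the original test.
final class BatchTimer {
    /** Runs every query once, in order, and returns the total wall-clock time in ms. */
    static long timeAll(List<IntSupplier> queries) {
        long start = System.nanoTime();
        long totalHits = 0;
        for (IntSupplier query : queries) {
            // Each supplier stands in for "run one query, return its hit count".
            totalHits += query.getAsInt();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Printing the hit total keeps the query results in use.
        System.out.println("Time: " + elapsedMs + " ms. (total hits: " + totalHits + ")");
        return elapsedMs;
    }
}
```

Each supplier would wrap one real search call; the buffer size is then
changed between runs and the batch is timed again.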
Now the same test with a cfs reader buffer of 1 KB and varying buffer
sizes for the freq stream in SegmentTermDocs:
1 KB: Time: 51875 ms.
2 KB: Time: 46828 ms.
4 KB: Time: 44500 ms.
8 KB: Time: 50953 ms.
16 KB: Time: 64485 ms.
With 4 KB there is a performance improvement of 14%! But considering
the fact that this stream is cloned for every query term, I think
that 2 KB is the better choice, still a 10% improvement.
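The percentages above follow directly from the timings, taking the 1 KB
run as the baseline. A minimal sketch of the arithmetic (the class and
method names are hypothetical, chosen only for this example):

```java
// Hypothetical helper that reproduces the percentages from the timings above.
final class BufferGain {
    /** Percentage by which candidateMs is faster than baselineMs. */
    static double percentFaster(long baselineMs, long candidateMs) {
        return 100.0 * (baselineMs - candidateMs) / baselineMs;
    }

    public static void main(String[] args) {
        long baseline = 51875; // 1 KB freq stream buffer
        System.out.printf("2 KB: %.1f%%%n", percentFaster(baseline, 46828));
        System.out.printf("4 KB: %.1f%%%n", percentFaster(baseline, 44500));
    }
}
```

This gives roughly 9.7% for 2 KB and 14.2% for 4 KB, which matches the
rounded 10% and 14% quoted above.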
Now I simply vary the readBufferSize for all buffered inputs:
1 KB: Time: 51778 ms.
2 KB: Time: 46172 ms.
4 KB: Time: 49000 ms.
8 KB: Time: 52187 ms.
16 KB: Time: 69562 ms.
Now the same test with 50k disjunction queries, 3 terms per query:
1 KB: Time: 288422 ms.
2 KB: Time: 259672 ms.
4 KB: Time: 279563 ms.
2 KB for all input buffers seems to be a good compromise. It's about
10% faster than 1 KB for both types of queries.
Questions are:
- Can we afford the increased memory consumption?
- Is 2 KB also the best choice on other systems?
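On the memory question, a back-of-the-envelope estimate can use the
instance count from the issue description (# segments X # index files
per segment) multiplied by the buffer size. The sketch below assumes
that every open input has allocated its buffer and ignores the extra
clones created per query term; the segment and file counts are made-up
illustration values, not measurements.

```java
// Hypothetical estimate; the segment and file counts are illustrative only.
final class BufferHeapEstimate {
    /** Bytes held by read buffers if every open input has allocated one. */
    static long bufferBytes(int segments, int filesPerSegment, int bufferSizeBytes) {
        return (long) segments * filesPerSegment * bufferSizeBytes;
    }

    public static void main(String[] args) {
        int segments = 20;        // assumed, not measured
        int filesPerSegment = 10; // assumed, not measured
        for (int kb : new int[] {1, 2, 4}) {
            long bytes = bufferBytes(segments, filesPerSegment, kb * 1024);
            System.out.println(kb + " KB buffers: " + bytes + " bytes");
        }
    }
}
```

With these assumed counts, going from 1 KB to 2 KB doubles the total
from 204800 to 409600 bytes; the absolute numbers scale linearly with
the number of open inputs.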
> Increase buffer sizes used during searching
> -------------------------------------------
>
> Key: LUCENE-893
> URL: https://issues.apache.org/jira/browse/LUCENE-893
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Store
> Affects Versions: 2.1
> Reporter: Michael McCandless
>
> Spinoff of LUCENE-888.
> In LUCENE-888 we increased buffer sizes that impact indexing and found
> substantial (10-18%) overall performance gains.
> It's very likely that we can also gain some performance for searching
> by increasing the read buffers in BufferedIndexInput used by
> searching.
> We need to test performance impact to verify and then pick a good
> overall default buffer size, also being careful not to add too much
> overall HEAP RAM usage because a potentially very large number of
> BufferedIndexInput instances are created during searching
> (# segments X # index files per segment).