[ https://issues.apache.org/jira/browse/LUCENE-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499290 ]
Michael Busch commented on LUCENE-893:
--------------------------------------

I ran some performance tests with the same setup I used for LUCENE-866:

- 1.2 GB index, optimized, compound format, documents from Wikipedia
- 50,000 queries; each query has 3 AND terms, each term has a df > 100, and each query has one or more hits
- 2.8 GHz Xeon, 4 GB RAM, SCSI HD, Windows Server 2003

My test simply executes all 50k queries in a row and measures the overall time. I used the current trunk version patched with LUCENE-888 and LUCENE-866 and varied the buffer size of the cfs reader. Here are the results:

 1 KB: Time: 51703 ms.
 2 KB: Time: 50672 ms.
 4 KB: Time: 50969 ms.
 8 KB: Time: 57047 ms.
16 KB: Time: 64547 ms.

It seems that it doesn't really matter whether the buffer size is 1 KB, 2 KB, or 4 KB. Above 4 KB the performance decreases significantly.

Now the same test with a cfs reader buffer of 1 KB and varying buffer sizes for the freq stream in SegmentTermDocs:

 1 KB: Time: 51875 ms.
 2 KB: Time: 46828 ms.
 4 KB: Time: 44500 ms.
 8 KB: Time: 50953 ms.
16 KB: Time: 64485 ms.

With 4 KB there is a performance improvement of 14%! But considering that this stream is cloned for every query term, I think that 2 KB is the better choice; it is still a 10% improvement.

Now I simply vary the readBufferSize for all buffered inputs:

 1 KB: Time: 51778 ms.
 2 KB: Time: 46172 ms.
 4 KB: Time: 49000 ms.
 8 KB: Time: 52187 ms.
16 KB: Time: 69562 ms.

Now the same test with 50k disjunction queries, 3 terms per query:

 1 KB: Time: 288422 ms.
 2 KB: Time: 259672 ms.
 4 KB: Time: 279563 ms.

2 KB for all input buffers seems to be a good compromise. It's about 10% faster than 1 KB for both types of queries.

Questions are:

- Can we afford the increased memory consumption?
- Is 2 KB also the best choice on other systems?

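For readers who want to check the percentages quoted above, here is a minimal sketch of the arithmetic, using only the timings reported in this comment. The class and method names (`BufferSpeedup`, `improvementPercent`) are made up for illustration and are not part of Lucene.

```java
import java.util.Locale;

public class BufferSpeedup {

    /**
     * Relative improvement of candidateMs over baselineMs, in percent.
     * Positive means the candidate run was faster than the baseline.
     */
    public static double improvementPercent(long baselineMs, long candidateMs) {
        return 100.0 * (baselineMs - candidateMs) / baselineMs;
    }

    public static void main(String[] args) {
        // Timings copied from the comment; the 1 KB run of each series is the baseline.
        long[][] runs = {
            {51875, 44500},   // freq stream buffer: 1 KB vs 4 KB
            {51875, 46828},   // freq stream buffer: 1 KB vs 2 KB
            {51778, 46172},   // all buffered inputs, AND queries: 1 KB vs 2 KB
            {288422, 259672}  // all buffered inputs, disjunction queries: 1 KB vs 2 KB
        };
        for (long[] run : runs) {
            System.out.printf(Locale.ROOT, "%d ms -> %d ms: %.1f%% faster%n",
                    run[0], run[1], improvementPercent(run[0], run[1]));
        }
    }
}
```

The same helper returns a negative value for the 8 KB and 16 KB runs, which is one way to read "performance decreases" from the tables.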
> Increase buffer sizes used during searching
> -------------------------------------------
>
> Key: LUCENE-893
> URL: https://issues.apache.org/jira/browse/LUCENE-893
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Store
> Affects Versions: 2.1
> Reporter: Michael McCandless
>
> Spinoff of LUCENE-888.
> In LUCENE-888 we increased buffer sizes that impact indexing and found
> substantial (10-18%) overall performance gains.
> It's very likely that we can also gain some performance for searching
> by increasing the read buffers in BufferedIndexInput used by
> searching.
> We need to test performance impact to verify and then pick a good
> overall default buffer size, also being careful not to add too much
> overall HEAP RAM usage because a potentially very large number of
> BufferedIndexInput instances are created during searching
> (# segments X # index files per segment).
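
The heap concern in the issue description can be made concrete with a rough estimate: one read buffer per BufferedIndexInput instance, with the instance count taken as # segments X # index files per segment. The sketch below is only that multiplication. The segment and file counts are hypothetical numbers chosen for illustration, not measurements, and the estimate ignores the per-query-term clones mentioned in the comment, so real usage may be higher.

```java
public class ReadBufferHeapEstimate {

    /**
     * Rough heap used by read buffers, assuming one buffer of bufferBytes
     * per (segment, index file) pair. Clones are not counted.
     */
    public static long estimateBytes(int segments, int filesPerSegment, int bufferBytes) {
        return (long) segments * filesPerSegment * bufferBytes;
    }

    public static void main(String[] args) {
        int segments = 20;        // hypothetical: an unoptimized index
        int filesPerSegment = 8;  // hypothetical: depends on index format and fields

        for (int kb : new int[] {1, 2, 4}) {
            long bytes = estimateBytes(segments, filesPerSegment, kb * 1024);
            System.out.println(kb + " KB buffers: " + bytes + " bytes");
        }
    }
}
```

Under these assumed counts, going from 1 KB to 2 KB doubles the buffer heap, so the absolute cost depends almost entirely on how many inputs are open at once.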