[ https://issues.apache.org/jira/browse/LUCENE-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499332 ]
robert engels commented on LUCENE-893:
--------------------------------------

Some food for thought:

A couple of runs of XBench on hardware that is radically different in terms of raw performance show a nearly 4x performance improvement using 256K blocks during sequential access. For random reads the numbers are closer to 20x.

The trick is determining how much sequential data is (or should be) read: the locality of data for the current query along with future queries. Even if Lucene reads extra, unneeded data in this run, what is the chance that the data will be needed in future queries (and thus already be in the cache)?

It would seem that these numbers show the ideal solution would vary the buffer size: larger when the engine determines that it is going to read a lot of sequential data (e.g. a wide-open range query), and smaller when it expects only a few results.

Maybe this might shove Lucene down the path where the index is optimized so that common query terms are always put in a separate segment/index, providing a high degree of locality to optimize the reading.

Maybe there is some academic research in this area?
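As a rough illustration of the "vary the buffer size" idea, here is a minimal Java sketch. The class name, the method, and the notion of an "expected sequential bytes" estimate are all hypothetical and not part of Lucene; the 4K and 256K bounds are simply the two block sizes used in the XBench runs.

```java
// Hypothetical helper, not a Lucene API: picks a read buffer size from an
// estimate of how many bytes the caller expects to read sequentially.
final class ReadBufferSizeChooser {
    // Bounds taken from the two block sizes in the XBench runs (assumption).
    static final int SMALL_BUFFER = 4 * 1024;
    static final int LARGE_BUFFER = 256 * 1024;

    static int chooseBufferSize(long expectedSequentialBytes) {
        if (expectedSequentialBytes <= SMALL_BUFFER) {
            return SMALL_BUFFER;   // few results expected: keep the buffer small
        }
        if (expectedSequentialBytes >= LARGE_BUFFER) {
            return LARGE_BUFFER;   // e.g. a wide-open range query: use the cap
        }
        // In between: round up to the next power of two.
        int size = Integer.highestOneBit((int) expectedSequentialBytes);
        if (size < expectedSequentialBytes) {
            size <<= 1;
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(chooseBufferSize(100));
        System.out.println(chooseBufferSize(10_000));
        System.out.println(chooseBufferSize(1L << 30));
    }
}
```

How the engine would come up with the estimate (for example from term statistics) is the open question raised above; this sketch only shows the mapping from an estimate to a buffer size.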
Disk Test                     81.23
    Sequential                81.55
        Uncached Write        80.69    33.63 MB/sec [4K blocks]
        Uncached Write        80.94    33.15 MB/sec [256K blocks]
        Uncached Read         77.68    12.30 MB/sec [4K blocks]
        Uncached Read         87.48    35.35 MB/sec [256K blocks]
    Random                    80.92
        Uncached Write        62.67     0.94 MB/sec [4K blocks]
        Uncached Write        89.93    20.28 MB/sec [256K blocks]
        Uncached Read         89.01     0.59 MB/sec [4K blocks]
        Uncached Read         89.93    18.51 MB/sec [256K blocks]

Disk Test                     48.34
    Sequential                47.83
        Uncached Write        39.10    16.30 MB/sec [4K blocks]
        Uncached Write        59.73    24.46 MB/sec [256K blocks]
        Uncached Read         38.72     6.13 MB/sec [4K blocks]
        Uncached Read         64.56    26.08 MB/sec [256K blocks]
    Random                    48.87
        Uncached Write        35.51     0.53 MB/sec [4K blocks]
        Uncached Write        46.00    10.37 MB/sec [256K blocks]
        Uncached Read         66.61     0.44 MB/sec [4K blocks]
        Uncached Read         59.06    12.15 MB/sec [256K blocks]

> Increase buffer sizes used during searching
> -------------------------------------------
>
>                 Key: LUCENE-893
>                 URL: https://issues.apache.org/jira/browse/LUCENE-893
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>
> Spinoff of LUCENE-888.
> In LUCENE-888 we increased buffer sizes that impact indexing and found
> substantial (10-18%) overall performance gains.
> It's very likely that we can also gain some performance for searching
> by increasing the read buffers in BufferedIndexInput used by
> searching.
> We need to test performance impact to verify and then pick a good
> overall default buffer size, also being careful not to add too much
> overall HEAP RAM usage because a potentially very large number of
> BufferedIndexInput instances are created during searching
> (# segments X # index files per segment).
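The heap concern in the issue description (# segments X # index files per segment) can be made concrete with a little arithmetic. The Java sketch below is illustrative only: the class and method are hypothetical, and the segment count and files-per-segment values are made-up inputs, not measurements from any real index.

```java
// Hypothetical back-of-the-envelope estimate, not a Lucene API:
// heap held by read buffers if every open index file gets its own buffer.
final class BufferHeapEstimate {
    static long estimateBytes(int segments, int filesPerSegment, int bufferSizeBytes) {
        // Widen to long before multiplying to avoid int overflow.
        return (long) segments * filesPerSegment * bufferSizeBytes;
    }

    public static void main(String[] args) {
        int segments = 20;        // assumed, for illustration
        int filesPerSegment = 8;  // assumed, for illustration
        System.out.println("4K buffers:   " + estimateBytes(segments, filesPerSegment, 4 * 1024) + " bytes");
        System.out.println("256K buffers: " + estimateBytes(segments, filesPerSegment, 256 * 1024) + " bytes");
    }
}
```

With these assumed inputs the total grows from 655,360 bytes with 4K buffers to 41,943,040 bytes (40 MB) with 256K buffers, which is why a single large default needs care.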