[ https://issues.apache.org/jira/browse/LUCENE-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499332 ]

robert engels commented on LUCENE-893:
--------------------------------------

Some food for thought:

A couple of XBench runs on hardware that differs radically in raw performance 
show roughly a 3x to 4x improvement in sequential read throughput with 256K 
blocks compared to 4K blocks. For random access the gain is much larger: about 
20x for writes and closer to 30x for reads.
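
For anyone who wants to try a comparison like this without XBench, here is a 
minimal Java sketch that times a sequential read of one file with 4K and 256K 
blocks. The class and method names are made up for illustration. Unlike the 
XBench "Uncached" tests, these reads go through the OS file cache, so the 
absolute numbers will not be comparable.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical timing helper; not part of Lucene or XBench. */
final class BlockReadTimer {

    /** Reads the whole file front to back in blockSize chunks; returns bytes read. */
    static long readSequentially(Path file, int blockSize) throws IOException {
        byte[] block = new byte[blockSize];
        long total = 0;
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            int n;
            while ((n = in.read(block)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: BlockReadTimer <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        for (int blockSize : new int[] {4 * 1024, 256 * 1024}) {
            long start = System.nanoTime();
            long bytes = readSequentially(file, blockSize);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%dK blocks: %.2f MB/sec%n",
                    blockSize / 1024, bytes / 1e6 / seconds);
        }
    }
}
```

Run it against a file much larger than RAM, or after flushing the file cache, 
to get closer to uncached behavior.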

The trick is determining how much sequential data should be read. That depends 
on the locality of data for the current query as well as for future queries: 
even if Lucene reads extra data that this query does not need, there is some 
chance that a later query will need it and find it already in the cache.

These numbers suggest that the ideal solution would vary the buffer size: use 
a larger buffer when the engine determines that it is going to read a lot of 
sequential data (e.g. a wide open range query), and a smaller one when it 
expects only a few results.
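
As a rough illustration of that idea, the sketch below picks a read buffer size 
from an estimate of how many bytes a query will read sequentially. The class, 
the method, and the 4K/256K bounds are assumptions for illustration (the bounds 
simply mirror the block sizes in the XBench runs). Nothing here is an existing 
Lucene API, and how the engine would come up with the estimate is left open.

```java
/** Hypothetical helper; not part of Lucene. */
final class AdaptiveBufferSize {
    static final int MIN_BUFFER = 4 * 1024;    // assumed lower bound (4K)
    static final int MAX_BUFFER = 256 * 1024;  // assumed upper bound (256K)

    /**
     * Returns the smallest power-of-two buffer between MIN_BUFFER and
     * MAX_BUFFER that covers the expected sequential read.
     */
    static int choose(long expectedSequentialBytes) {
        if (expectedSequentialBytes <= MIN_BUFFER) {
            return MIN_BUFFER;
        }
        if (expectedSequentialBytes >= MAX_BUFFER) {
            return MAX_BUFFER;
        }
        int size = MIN_BUFFER;
        while (size < expectedSequentialBytes) {
            size <<= 1;  // double until the estimate fits
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(choose(100));         // a lookup expecting few results
        System.out.println(choose(50_000));      // a mid-sized read
        System.out.println(choose(10_000_000));  // e.g. a wide open range query
    }
}
```

The cap matters because every open input would carry its own buffer, so a 
large maximum multiplies across segments and files.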

This might push Lucene down a path where the index is organized so that common 
query terms are always put in a separate segment/index, providing a high degree 
of locality to optimize reading. Maybe there is some academic research in this 
area?

Disk Test                   81.23
    Sequential              81.55
        Uncached Write      80.69   33.63 MB/sec [4K blocks]
        Uncached Write      80.94   33.15 MB/sec [256K blocks]
        Uncached Read       77.68   12.30 MB/sec [4K blocks]
        Uncached Read       87.48   35.35 MB/sec [256K blocks]
    Random                  80.92
        Uncached Write      62.67    0.94 MB/sec [4K blocks]
        Uncached Write      89.93   20.28 MB/sec [256K blocks]
        Uncached Read       89.01    0.59 MB/sec [4K blocks]
        Uncached Read       89.93   18.51 MB/sec [256K blocks]

Disk Test                   48.34
    Sequential              47.83
        Uncached Write      39.10   16.30 MB/sec [4K blocks]
        Uncached Write      59.73   24.46 MB/sec [256K blocks]
        Uncached Read       38.72    6.13 MB/sec [4K blocks]
        Uncached Read       64.56   26.08 MB/sec [256K blocks]
    Random                  48.87
        Uncached Write      35.51    0.53 MB/sec [4K blocks]
        Uncached Write      46.00   10.37 MB/sec [256K blocks]
        Uncached Read       66.61    0.44 MB/sec [4K blocks]
        Uncached Read       59.06   12.15 MB/sec [256K blocks]
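
The read speedups can be worked out directly from the MB/sec columns. A small 
Java sketch follows, with the uncached read figures copied from the two runs 
above. The class name is made up for illustration.

```java
/** Hypothetical helper for the arithmetic; figures are the XBench numbers above. */
final class BlockSizeSpeedup {

    /** Throughput ratio of 256K-block access over 4K-block access. */
    static double speedup(double mbPerSec4k, double mbPerSec256k) {
        return mbPerSec256k / mbPerSec4k;
    }

    public static void main(String[] args) {
        // Each row is {4K MB/sec, 256K MB/sec} for uncached reads, one row per run.
        double[][] sequentialRead = {{12.30, 35.35}, {6.13, 26.08}};
        double[][] randomRead = {{0.59, 18.51}, {0.44, 12.15}};
        for (int run = 0; run < sequentialRead.length; run++) {
            System.out.printf("run %d: sequential read %.1fx, random read %.1fx%n",
                    run + 1,
                    speedup(sequentialRead[run][0], sequentialRead[run][1]),
                    speedup(randomRead[run][0], randomRead[run][1]));
        }
    }
}
```

With these figures the sequential read ratios come out at about 2.9 and 4.3, 
and the random read ratios at about 31 and 28.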


> Increase buffer sizes used during searching
> -------------------------------------------
>
>                 Key: LUCENE-893
>                 URL: https://issues.apache.org/jira/browse/LUCENE-893
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>
> Spinoff of LUCENE-888.
> In LUCENE-888 we increased buffer sizes that impact indexing and found
> substantial (10-18%) overall performance gains.
> It's very likely that we can also gain some performance for searching
> by increasing the read buffers in BufferedIndexInput used by
> searching.
> We need to test performance impact to verify and then pick a good
> overall default buffer size, also being careful not to add too much
> overall HEAP RAM usage because a potentially very large number of
> BufferedIndexInput instances are created during searching
> (# segments X # index files per segment).
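
To put the heap concern in numbers, here is a small Java sketch of the 
(# segments X # index files per segment) estimate from the description. The 
class name and the example figures (20 segments, 8 files per segment) are 
assumptions chosen only to show the arithmetic. It assumes one buffer per open 
file and ignores any additional inputs created while searching.

```java
/** Hypothetical estimator; not part of Lucene. */
final class SearchBufferHeapEstimate {

    /** Heap held by read buffers if every index file keeps one buffer open. */
    static long estimateBytes(int segments, int filesPerSegment, int bufferSizeBytes) {
        return (long) segments * filesPerSegment * bufferSizeBytes;
    }

    public static void main(String[] args) {
        int segments = 20;        // assumed for illustration
        int filesPerSegment = 8;  // assumed for illustration
        for (int bufferSize : new int[] {4 * 1024, 256 * 1024}) {
            System.out.println(bufferSize + " byte buffers: "
                    + estimateBytes(segments, filesPerSegment, bufferSize) + " bytes");
        }
    }
}
```

Under those assumptions, 4K buffers cost 655,360 bytes in total while 256K 
buffers cost 41,943,040 bytes (40 MB), which is why the default has to be 
chosen with the instance count in mind.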

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

