[ 
https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537995
 ] 

Ning Li commented on LUCENE-1035:
---------------------------------

> most lucene usecases store much more than just the document id... that would 
> really affect locality.

In the experiments, I was simulating the (Google) paradigm where you retrieve 
just the docids and go to document servers for everything else. If stored 
fields almost always hurt locality, we can make the buffer pool sit only on 
the data/files where we expect good locality (say posting lists), and not on 
the others.
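
To make the "only on some files" idea concrete, here is a minimal sketch of a 
filter that decides per file whether reads go through the pool. It is not from 
the attached patch. The class and method names are hypothetical, and it 
assumes a non-compound Lucene 2.x index where posting lists live in .frq and 
.prx files and stored fields in .fdt/.fdx.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of LUCENE-1035.patch.
final class PooledFileFilter {
    // Assumption: non-compound Lucene 2.x index, where posting lists are
    // stored in .frq (doc ids and frequencies) and .prx (positions) files.
    private static final Set<String> POOLED_EXTENSIONS =
            new HashSet<>(Arrays.asList("frq", "prx"));

    private PooledFileFilter() {}

    // Returns true if reads of this index file should go through the pool.
    static boolean shouldPool(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false; // e.g. "segments": no extension, leave it to the OS cache
        }
        return POOLED_EXTENSIONS.contains(fileName.substring(dot + 1));
    }
}
```

With this, "_0.frq" would be pooled while "_0.fdt" (stored fields) would 
bypass the pool and rely on the OS cache only.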

> It seems like a simple LRU cache could really be blown out of the water by 
> certain types of queries (retrieve a lot of stored fields, or do an expanding 
> term query) that would force out all previously cached hotspots. Most OS 
> level caching has protection against this (multi-level LRU or whatever). But 
> if our user-level LRU cache fails, we've also messed up the OS level cache 
> since we've been hiding page hits from it.

That's a good point. We can improve the algorithm but hopefully still keep it 
simple and general. This buffer pool is not a one-size-fits-all solution, but 
hopefully it will benefit a number of use cases. That's why I say "optional". :)
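
One possible improvement that stays simple is a two-segment ("segmented") LRU: 
new entries land in a small probation segment and are promoted to a protected 
segment only when they are hit again, so a one-off scan cannot push out the 
hotspots. The sketch below is an illustration under that assumption, not the 
algorithm in the patch. The class name and capacities are hypothetical, and 
values are assumed to be non-null.

```java
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical scan-resistant cache: probation segment + protected segment.
final class SegmentedLruCache<K, V> {
    private final int probationCapacity;
    private final int protectedCapacity;
    // accessOrder=true keeps each map ordered from least to most recently used.
    private final LinkedHashMap<K, V> probation = new LinkedHashMap<>(16, 0.75f, true);
    private final LinkedHashMap<K, V> protectedSeg = new LinkedHashMap<>(16, 0.75f, true);

    SegmentedLruCache(int probationCapacity, int protectedCapacity) {
        if (probationCapacity < 1 || protectedCapacity < 1) {
            throw new IllegalArgumentException("capacities must be >= 1");
        }
        this.probationCapacity = probationCapacity;
        this.protectedCapacity = protectedCapacity;
    }

    // Returns the cached value or null. A hit in probation promotes the entry.
    V get(K key) {
        V value = protectedSeg.get(key); // refreshes recency on a hit
        if (value != null) {
            return value;
        }
        value = probation.remove(key);
        if (value == null) {
            return null;
        }
        protectedSeg.put(key, value);
        if (protectedSeg.size() > protectedCapacity) {
            // Demote the coldest protected entry instead of dropping it outright.
            Map.Entry<K, V> demoted = removeEldest(protectedSeg);
            insertIntoProbation(demoted.getKey(), demoted.getValue());
        }
        return value;
    }

    // New entries always start in probation; existing protected entries are updated in place.
    void put(K key, V value) {
        if (protectedSeg.containsKey(key)) {
            protectedSeg.put(key, value);
            return;
        }
        insertIntoProbation(key, value);
    }

    boolean contains(K key) {
        return probation.containsKey(key) || protectedSeg.containsKey(key);
    }

    private void insertIntoProbation(K key, V value) {
        probation.put(key, value);
        if (probation.size() > probationCapacity) {
            removeEldest(probation); // evicted from the cache entirely
        }
    }

    private static <K, V> Map.Entry<K, V> removeEldest(LinkedHashMap<K, V> map) {
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        // Copy the entry before removing it so the key and value stay valid.
        Map.Entry<K, V> eldest = new AbstractMap.SimpleImmutableEntry<>(it.next());
        it.remove();
        return eldest;
    }
}
```

An entry that has been read twice survives a long run of one-time insertions, 
because those only churn the probation segment.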

> I'd like to see single term queries, "OR" queries, and queries across 
> multiple fields (also a common usecase) that match more documents tested also.

I'll switch the tests to "OR" queries and see what happens. The dataset is 
enwiki with four fields: docid, date (optional), title and body. Most terms 
are from title and body.


> Optional Buffer Pool to Improve Search Performance
> --------------------------------------------------
>
>                 Key: LUCENE-1035
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1035
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Ning Li
>         Attachments: LUCENE-1035.patch
>
>
> An index in a RAMDirectory performs better than one in an FSDirectory.
> But many indexes cannot fit in memory, or applications cannot afford to
> spend that much memory on the index. On the other hand, because of locality,
> a reasonably sized buffer pool may provide a good improvement over FSDirectory.
> This issue aims at providing such an optional buffer pool layer. In cases
> where it fits, i.e. where a reasonable hit ratio can be achieved, it should
> provide a good improvement over FSDirectory.
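
As a rough illustration of the description above, a buffer pool layer can be 
sketched as a fixed-size LRU map of file blocks that also tracks its hit 
ratio. This is a minimal sketch under stated assumptions, not the code in 
LUCENE-1035.patch: the class, the BlockLoader interface, and the keying by 
file name plus block index are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU pool of file blocks with hit-ratio accounting.
final class LruBlockPool {
    // Stand-in for the underlying read from disk (e.g. through FSDirectory).
    interface BlockLoader {
        byte[] load(String file, long blockIndex);
    }

    private final LinkedHashMap<String, byte[]> blocks;
    private long hits;
    private long misses;

    LruBlockPool(final int maxBlocks) {
        // accessOrder=true plus removeEldestEntry gives plain LRU eviction.
        this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    // Serves the block from the pool, loading and caching it on a miss.
    byte[] read(String file, long blockIndex, BlockLoader loader) {
        String key = file + "#" + blockIndex;
        byte[] block = blocks.get(key);
        if (block != null) {
            hits++;
            return block;
        }
        misses++;
        block = loader.load(file, blockIndex);
        blocks.put(key, block);
        return block;
    }

    // Fraction of reads served from the pool; 0.0 before any read.
    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

Watching hitRatio() under a real query load is one way to judge whether the 
pool "fits" a given index and workload.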
