[ 
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuck Williams updated LUCENE-1052:
-----------------------------------

    Attachment: termInfosConfigurer.patch

termInfosConfigurer.patch extends the termInfosIndexDivisor mechanism to allow 
dynamic management of this parameter.  A new interface, 
TermInfosConfigurer, allows specification of a method, getMaxTermsCached(), 
that bounds the size of the in-memory term infos as a function of the segment 
name, segment numDocs, and total segment terms.  This bound is then used to 
automatically set termInfosIndexDivisor whenever a TermInfosReader reads the 
term index.  This mechanism provides a simple way to ensure that the total 
amount of memory consumed by the term cache is bounded by, say, O(log(numDocs)).
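
The following is a minimal Java sketch of how a configurer of this kind could drive the divisor. It is not code from the patch, and every name in it is illustrative:

- It assumes a hypothetical signature, getMaxTermsCached(String segmentName, int numDocs, long totalTerms).
- It assumes the in-RAM term index holds about ceil(totalTerms / termIndexInterval) entries before division.
- LogBoundConfigurer, DivisorDemo and computeDivisor are hypothetical names.
- The log2-based policy is just one way to get an O(log(numDocs)) bound.

```java
// Hypothetical shape of the interface described above; the real patch's
// signature may differ.
interface TermInfosConfigurer {
    int getMaxTermsCached(String segmentName, int numDocs, long totalTerms);
}

// Example policy: cache at most scale * log2(numDocs + 2) index entries per
// segment, giving an O(log(numDocs)) bound. segmentName and totalTerms are
// ignored by this particular policy.
class LogBoundConfigurer implements TermInfosConfigurer {
    private final int scale;

    LogBoundConfigurer(int scale) {
        this.scale = scale;
    }

    @Override
    public int getMaxTermsCached(String segmentName, int numDocs, long totalTerms) {
        double log2 = Math.log(numDocs + 2.0) / Math.log(2.0);
        return Math.max(1, (int) Math.ceil(scale * log2));
    }
}

public class DivisorDemo {
    // Smallest divisor d such that ceil(indexedTerms / d) <= maxTermsCached,
    // where indexedTerms is the number of entries in the term index.
    static int computeDivisor(long totalTerms, int termIndexInterval, int maxTermsCached) {
        long indexedTerms = (totalTerms + termIndexInterval - 1) / termIndexInterval;
        if (indexedTerms <= maxTermsCached) {
            return 1; // the whole index already fits in the budget
        }
        return (int) ((indexedTerms + maxTermsCached - 1) / maxTermsCached);
    }

    public static void main(String[] args) {
        TermInfosConfigurer configurer = new LogBoundConfigurer(1000);
        // A hypothetical "polluted" segment: 1M docs, 500M terms, interval 128.
        int max = configurer.getMaxTermsCached("_7", 1_000_000, 500_000_000L);
        int divisor = computeDivisor(500_000_000L, 128, max);
        System.out.println("maxTermsCached=" + max + " divisor=" + divisor);
    }
}
```

Choosing the smallest divisor that satisfies the bound keeps as much of the index in RAM as the budget allows. Segments whose index already fits get a divisor of 1 and behave as before.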

All Lucene core tests pass.  I'm using another version of this same patch in 
Lucene 2.1+ in an application that has indexes with binary term pollution, 
using the TermInfosConfigurer to dynamically bound the term cache in the 
polluted segments.

I tried to test contrib as well, but gdata-server appears to need external 
libraries, which I don't have, in order to compile.

Michael, this patch applies cleanly to today's Lucene trunk.  I'd appreciate it 
if you could verify one thing.  Lucene 2.3 has the incremental reopen mechanism 
(new since Lucene 2.1; can't wait to get that!).  It appears that reopening a 
segment reuses the same TermInfosReader and thus does not need to configure a 
new one.  I've implemented that part of the patch under this assumption.


> Add an "termInfosIndexDivisor" to IndexReader
> ---------------------------------------------
>
>                 Key: LUCENE-1052
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1052
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1052.patch, termInfosConfigurer.patch
>
>
> The termIndexInterval, set at indexing time, lets you trade off
> how much RAM a reader uses to load the indexed terms vs. the cost of
> seeking to the specific term you want to load.
> But the downside is you must set it at indexing time.
> This issue adds an indexDivisor to TermInfosReader so that on opening
> a reader you could further sub-sample the termIndexInterval to use
> less RAM.  E.g. a setting of 2 means every (2 * termIndexInterval)th
> term is loaded into RAM.
> This is particularly useful if your index has a great many terms (e.g.
> you accidentally indexed binary terms).
> Spinoff from this thread:
>   http://www.gossamer-threads.com/lists/lucene/java-dev/54371
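
The following Java sketch roughly illustrates the sub-sampling described in the issue summary. It simulates which term-index entries stay in RAM for a given divisor. It is not TermInfosReader's actual code. IndexSubsample, subsample and maxScan are hypothetical names, and the term names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexSubsample {
    // indexedTerms holds the entries written to the term index (every
    // termIndexInterval-th term of the segment); keep every indexDivisor-th one.
    static List<String> subsample(List<String> indexedTerms, int indexDivisor) {
        if (indexDivisor < 1) {
            throw new IllegalArgumentException("indexDivisor must be >= 1");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < indexedTerms.size(); i += indexDivisor) {
            kept.add(indexedTerms.get(i));
        }
        return kept;
    }

    // Rough upper bound on how many terms must be scanned on disk after
    // seeking to the nearest preceding in-RAM entry.
    static int maxScan(int termIndexInterval, int indexDivisor) {
        return termIndexInterval * indexDivisor;
    }

    public static void main(String[] args) {
        int termIndexInterval = 128;
        // Pretend the segment has 1,280 terms named t0000..t1279.
        List<String> indexed = new ArrayList<>();
        for (int pos = 0; pos < 1280; pos += termIndexInterval) {
            indexed.add(String.format("t%04d", pos));
        }
        List<String> inRam = subsample(indexed, 2);
        System.out.println(inRam); // [t0000, t0256, t0512, t0768, t1024]
        System.out.println("max scan per lookup: " + maxScan(termIndexInterval, 2));
    }
}
```

With a divisor of 2, half as many index entries are held in RAM. Each lookup may then scan up to twice as many on-disk terms, which is the RAM vs. seek-cost trade-off the issue describes.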

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
