Re: setTermInfosIndexDivisor

Michael McCandless Thu, 25 Jun 2009 02:55:35 -0700

On Thu, Jun 25, 2009 at 5:40 AM, Ganesh<[email protected]> wrote:
> I am updating the status of the documents frequently. There will be a huge
> number of deletes. I optimize the index once a day.


OK

> I want to know the usage for setTermInfosIndexDivisor.
>
> Directory dir = FSDirectory.getDirectory(indexPath);
> IndexReader reader = IndexReader.open(dir, true);
> reader.setTermInfosIndexDivisor(5);
>
> I reopen the IndexReader whenever a document is added to the index.  Do I
> need to call setTermInfosIndexDivisor(5) again when reopening the index?
> I tried this: the first time it was accepted, but from the second time
> onwards it throws a "terms already loaded" exception.

In fact Lucene has a bug here: on reopen, your index divisor is not
properly carried over to the newly opened segments.  Worse, if you
attempt to call setTermInfosIndexDivisor, it'll throw an exception
because the already-opened readers have already loaded their terms
index.  I think the only workaround is to not use reopen.

>>Loaded terms might not dominate your memory consumption inside
>>Lucene. Again, you should provide more information about the indexing,
>>the environment and the situation where the error occurs.
> I do indexing with no norms with all default values.

OK.  As of 2.9 (not yet released), you should also call
IndexReader.setDisableFakeNorms(true), to prevent the creation of a
full array of "fake" norms.

> As per the documentation, it should subsample the terms loaded into memory.

That's what termInfosIndexDivisor does, but if the memory used by your
actual index's terms index is smallish (run CheckIndex to see), this
setting won't help much anyway.
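
To make the subsampling concrete, here is a rough, simplified sketch in
plain Java. It is not Lucene's actual code: the class and method names
are made up for illustration, and it only models the idea that the
on-disk terms index holds every termIndexInterval-th term (128 by
default) and that the divisor then keeps every N-th one of those in
memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Lucene.
public class TermsIndexDivisorSketch {

    // Keep every `step`-th entry of `items`, starting with the first one.
    static List<String> subsample(List<String> items, int step) {
        if (step < 1) {
            throw new IllegalArgumentException("step must be >= 1");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < items.size(); i += step) {
            kept.add(items.get(i));
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for the sorted term dictionary of one segment.
        List<String> allTerms = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            allTerms.add(String.format("term%04d", i));
        }

        int termIndexInterval = 128; // default interval used at indexing time
        int divisor = 5;             // value passed to setTermInfosIndexDivisor

        // Terms index as written to disk: every 128th term.
        List<String> onDiskIndex = subsample(allTerms, termIndexInterval);
        // Terms index as held in RAM: every 5th entry of the on-disk index.
        List<String> inMemoryIndex = subsample(onDiskIndex, divisor);

        System.out.println("index terms on disk:   " + onDiskIndex.size());
        System.out.println("index terms in memory: " + inMemoryIndex.size());
    }
}
```

With a divisor of 5 roughly one fifth of the index terms stay in RAM, at
the cost of a longer scan per term lookup. If the on-disk terms index is
already small, the saving is correspondingly small.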

Are you sorting by field for any of your queries?

Mike

