Hi,

I index several million small documents per day. Each day, I remove some 
of the older documents to keep the index at a stable number of documents. 
After each purge, I commit and then optimize the index. What I found is 
that if I keep optimizing with max num segments = 2, the index keeps 
growing on disk, but as soon as I optimize down to just 1 segment, the 
space is reclaimed. So I have currently adopted the following strategy: 
every night I optimize with 2 segments, except once per week, when I 
optimize with just 1 segment.

Is this expected behavior?
I guess I am doing something special, because I was not able to reproduce 
this behavior in a unit test. What could it be?

It would be nice to have some explanatory services within the product to 
help understand its behavior: for instance, something that tells you 
about your index (number of docs in the different states, how the space 
is being used, ...). Lucene is a wonderful product, but to me this is 
almost like black magic, and when I see a specific behavior like this, I 
have few clues to figure it out by myself. Some user-oriented logging 
would be nice as well (the index writer info stream is really verbose and 
very low level).
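
To make the "how the space is being used" part concrete, here is roughly 
the kind of report I have in mind. This is only a sketch that uses the 
plain Java file API and nothing from Lucene itself. The class name 
IndexDiskUsage and the method bytesByExtension are made up for 
illustration, and it assumes the index is a single flat directory on the 
local file system. It knows nothing about documents or deletions; it only 
sums file sizes per file extension.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: reports disk usage of an
// index directory grouped by file extension.
public class IndexDiskUsage {

    // Sums the sizes (in bytes) of the regular files directly inside dir,
    // keyed by file extension. Files without a dot go under "(none)".
    static Map<String, Long> bytesByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot < 0 ? "(none)" : name.substring(dot + 1);
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the index directory is passed as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long total = 0;
        for (Map.Entry<String, Long> e : bytesByExtension(dir).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue() + " bytes");
            total += e.getValue();
        }
        System.out.println("total\t" + total + " bytes");
    }
}
```

Running something like this before and after each nightly optimize would 
at least show which kinds of files account for the growth, even though it 
cannot explain why they are still there.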

Thanks for your help,


Vince
