On Wed, Apr 1, 2009 at 5:20 PM, Dan OConnor <docon...@acquiremedia.com> wrote:
> All:
>
> We are using java lucene 2.3.2 to index a fairly large number of documents 
> (roughly 400,000 per day). We have divided the time history into various 
> depths.
>
> Our first stage covers 8 days and our next stage covers 22. The index 
> directory for the first stage is approximately 20G when fully optimized. The 
> index directory of our second stage is over 250GB when optimized. Our third 
> stage (which is 60 days) is only ~80GB when optimized.

Are these divisions overlapping?  Ie, 2nd index includes all docs from
the first one (8 days) plus another 14 days, and 3rd index (60 days)
includes all 22 days from 2nd one plus another 38 days?

If so, I don't understand eg why the 60 day index is smaller than the
22 day index.  That's odd.

> The second stage index failed an optimization with a disk full exception (I 
> had to move it to another lucene machine with a larger disk partition to 
> complete the optimization. Is there a reason why a 22 day index would be 10x 
> the size of an 8 day index when the document indexing rate is fairly 
> constant? Also, is there a way to shrink the index without regenerating it?

You can't shrink the index without deleting docs... eg you could
delete docs, and add newer docs that store less stuff, to shrink the
index over time (merging/optimizing will reclaim disk space).

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to