I've recently had reports of OOM exceptions during optimize from a couple of 
large deployments (based on Lucene 2.4.0).
I've given the usual advice: turn off norms where they aren't needed, provide 
plenty of RAM, and consider raising the term index interval via 
IndexWriter.setTermIndexInterval().
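
To make that advice concrete, here is a rough sketch against the 2.4.0 API (the 
directory path, analyzer and field names are just placeholders): a larger term 
index interval means an IndexReader loads fewer indexed terms into RAM, and 
omitting norms on fields that don't benefit from length normalisation saves 
roughly one byte per document per field at search time.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class NormsAndTermIndexSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder path and analyzer - adjust to your own setup.
    IndexWriter writer = new IndexWriter(FSDirectory.getDirectory("/path/to/index"),
        new StandardAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);

    // Default is 128; a larger interval means readers load fewer indexed
    // terms into RAM, at the cost of slightly slower term lookups.
    writer.setTermIndexInterval(256);

    Document doc = new Document();
    // A unique-key field gains nothing from length normalisation, so omit
    // norms and avoid a byte per document for this field at search time.
    Field pk = new Field("primaryKey", "12345", Field.Store.YES,
        Field.Index.NOT_ANALYZED);
    pk.setOmitNorms(true);
    doc.add(pk);
    // Free-text field where norms are usually worth keeping.
    doc.add(new Field("body", "some free text", Field.Store.NO,
        Field.Index.ANALYZED));
    writer.addDocument(doc);
    writer.close();
  }
}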

I don't have access to these deployment environments and have tried hard to 
reproduce the circumstances that led to the errors. For the record, I've 
experimented with huge indexes with hundreds of fields: several "unique value" 
fields (e.g. primary keys), "fixed vocab" fields with a limited set of values 
(e.g. male/female), and fields with "power curve" value distributions (e.g. 
plain text).
I've wound my index up to 22GB with several commit sessions involving 
deletions, full optimises and partial optimises along the way (the test rig is 
sketched roughly below). Still no error.
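
For anyone who wants to try to reproduce this, the gist of my test rig looks 
roughly like the following - the document counts, vocabulary size and field 
names here are illustrative only, not the exact values I used:

import java.util.Random;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class BigIndexTestRig {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(FSDirectory.getDirectory("/tmp/bigindex"),
        new WhitespaceAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
    writer.setRAMBufferSizeMB(64);
    Random rand = new Random(42);
    for (int i = 0; i < 100000000; i++) {
      Document doc = new Document();
      // "unique value" field, e.g. a primary key
      doc.add(new Field("pk", Integer.toString(i), Field.Store.NO,
          Field.Index.NOT_ANALYZED));
      // "fixed vocab" field with a handful of values
      doc.add(new Field("gender", rand.nextBoolean() ? "male" : "female",
          Field.Store.NO, Field.Index.NOT_ANALYZED));
      // "power curve" field: free text drawn from a skewed vocabulary
      StringBuilder body = new StringBuilder();
      for (int w = 0; w < 50; w++) {
        body.append("word")
            .append((int) (10000 * Math.pow(rand.nextDouble(), 3)))
            .append(' ');
      }
      doc.add(new Field("body", body.toString(), Field.Store.NO,
          Field.Index.ANALYZED));
      writer.addDocument(doc);
      if (i > 0 && i % 10000000 == 0) {
        writer.commit();  // several commit sessions along the way
      }
    }
    writer.optimize();
    writer.close();
  }
}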

However, the errors that have been reported to me from two different 
environments with large indexes make me think there is still something to be 
uncovered here...




----- Original Message ----
From: Michael McCandless <luc...@mikemccandless.com>
To: java-user@lucene.apache.org
Cc: Utan Bisaya <lebi...@ymail.com>
Sent: Tuesday, 23 December, 2008 14:08:26
Subject: Re: Optimize and Out Of Memory Errors


How many indexed fields do you have, overall, in the index?

If you have a very large number of fields that are "sparse" (meaning any given 
document would only have a small subset of the fields), then norms could 
explain what you are seeing.

Norms are not stored sparsely, so when segments get merged the "holes" get 
filled (they occupy bytes on disk and in RAM) and consume more resources.  
Turning off norms on sparse fields would resolve it, but you must rebuild the 
entire index, since if even a single doc in the index has norms enabled for a 
given field, the norms "spread" to every other doc for that field.
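
To put rough numbers on that, using the document count reported below: norms 
cost one byte per document per field for each normed field that gets searched, 
held in RAM by the IndexReader, so at ~400,000,000 docs each normed field costs 
roughly 400MB of heap - a handful of such fields would exhaust a 1600MB heap on 
their own. If norms do turn out to be the cause, the rebuild would index the 
sparse/structured fields with norms omitted, along these lines (hypothetical 
field names, sketch only):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class OmitNormsOnRebuild {
  public static void main(String[] args) {
    // Hypothetical field names; the point is setOmitNorms(true) on the
    // sparse/structured fields when re-indexing every document.
    Document doc = new Document();

    Field id = new Field("id", "12345", Field.Store.YES, Field.Index.NOT_ANALYZED);
    id.setOmitNorms(true);   // a key field needs no length normalisation
    doc.add(id);

    Field sparse = new Field("optionalAttr", "some value", Field.Store.NO,
        Field.Index.ANALYZED);
    sparse.setOmitNorms(true);  // avoid one byte per doc across the whole index
    doc.add(sparse);
  }
}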

Mike

Utan Bisaya wrote:

> Recently, our Lucene version was upgraded to 2.3.1 and the index had to be 
> rebuilt, which took several weeks and left the entire index at a total of 
> 20GB or so.
> 
> After the rebuild, a weekly Sunday task was executed for optimization.
> 
> During that time, the optimization failed several times, complaining about 
> OOM errors, but after a couple of tries it completed.
> 
> So the entire index is now one 20GB segment.
> 
> The problem is that any subsequent search on that index fails with an OOM 
> error while Lucene is reading bytes.
> 
> Our environment:
> JVM -Xmx1600m (this is the max we could set on the box since it's Windows)
> 8GB of memory available on the box
> 4G CPU (8 cores), but only 12.5% is used (not sure if this would impact it)
> 120GB of hard disk available
> 
> mergeFactor and the other Lucene config settings are left at their defaults.
> 
> We checked this 20GB index using Luke and it has 400,000,000 documents in it. 
> Luke was able to count the docs, but when we run a search in Luke it fails, 
> giving us OOM errors.
> 
> We also ran the CheckIndex tool; it fails on the 20GB segment but succeeds on 
> the others.
> 
> We've managed to roll back to a previously unoptimized copy of the index 
> (about 20GB or so) and searches are fine now. This unoptimized index is made 
> up of several 8GB segments and a few smaller segments.
> 
> However, there is a strong possibility that the optimization error could 
> happen again...
> 
> Does anybody have insights on why this is happening?
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
