OK, it's early days and I'm holding my breath but I'm currently progressing 
further through my content without an OOM just by using a different GC setting.

Thanks to advice here and colleagues at work I've gone with a GC setting of 
-XX:+UseSerialGC for this indexing task.

The rationale that is emerging is there are potentially 2 different GC 
strategies that should be applied for Lucene work - one for indexing and one 
for search

1) Search GC model
Search typically generates much less state than indexing tasks and each search 
can often have an SLA associated with it (e.g. results returned in <0.5 
seconds). 
In this environment a non-invasive background GC is useful (e.g. the parallel 
collector) to avoid long interrupts on individual searches. ParallelCollector 
will throw OOM if these background GC tasks appear to lock-up ( 
http://java.sun.com/j2se/1.5.0/docs/guide/vm/gc-ergonomics.html ). I think this 
is the problem I was getting originally.
2) Index GC model
Indexing generates large volumes of objects which are kept in RAM while 
indexing then flushed. Individual document adds are typically not subject to an 
SLA but overall indexing time for a batch is. In this scenario a 
"stop-the-world" garbage collect mid-indexing is a welcome cleansing and has no 
business impact if it takes a while. This is what I beleive I'm now getting 
from the -UseSerialGC setting.


This does suggest that it might be a good idea to use a different VM for 
indexing to the one used for searching so that you can impose these different 
GC models.

That's the theory at least and I'm hoping this works out for me in 
practice......






----- Original Message ----
From: Michael McCandless <luc...@mikemccandless.com>
To: java-user@lucene.apache.org
Sent: Wednesday, 11 March, 2009 13:04:56
Subject: Re: A model for predicting indexing memory costs?


Mark Miller wrote:

> Michael McCandless wrote:
>> 
>> Ie, it's still not clear if you are running out of memory vs hitting some 
>> weird "it's too hard for GC to deal" kind of massive heap fragmentation 
>> situation or something.  It reminds me of the special ("I cannot be played 
>> on record player X") record (your application) that cannot be played on a 
>> given record player X (your JRE) in Gödel, Escher, Bach ;)
>> 
> Perhaps its been too long since I've seen that book, but if I remember right, 
> he only has to get himself a JRE Omega version and he should be all set...


That's right ;)  But JRE Omega will still have a [different] record that causes 
its GC to grind to a halt... (or maybe I don't remember the book very well!).

Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to