OK, it's early days and I'm holding my breath but I'm currently progressing further through my content without an OOM just by using a different GC setting.
Thanks to advice here and colleagues at work I've gone with a GC setting of -XX:+UseSerialGC for this indexing task. The rationale that is emerging is there are potentially 2 different GC strategies that should be applied for Lucene work - one for indexing and one for search 1) Search GC model Search typically generates much less state than indexing tasks and each search can often have an SLA associated with it (e.g. results returned in <0.5 seconds). In this environment a non-invasive background GC is useful (e.g. the parallel collector) to avoid long interrupts on individual searches. ParallelCollector will throw OOM if these background GC tasks appear to lock-up ( http://java.sun.com/j2se/1.5.0/docs/guide/vm/gc-ergonomics.html ). I think this is the problem I was getting originally. 2) Index GC model Indexing generates large volumes of objects which are kept in RAM while indexing then flushed. Individual document adds are typically not subject to an SLA but overall indexing time for a batch is. In this scenario a "stop-the-world" garbage collect mid-indexing is a welcome cleansing and has no business impact if it takes a while. This is what I beleive I'm now getting from the -UseSerialGC setting. This does suggest that it might be a good idea to use a different VM for indexing to the one used for searching so that you can impose these different GC models. That's the theory at least and I'm hoping this works out for me in practice...... ----- Original Message ---- From: Michael McCandless <[email protected]> To: [email protected] Sent: Wednesday, 11 March, 2009 13:04:56 Subject: Re: A model for predicting indexing memory costs? Mark Miller wrote: > Michael McCandless wrote: >> >> Ie, it's still not clear if you are running out of memory vs hitting some >> weird "it's too hard for GC to deal" kind of massive heap fragmentation >> situation or something. It reminds me of the special ("I cannot be played >> on record player X") record (your application) that cannot be played on a >> given record player X (your JRE) in Gödel, Escher, Bach ;) >> > Perhaps its been too long since I've seen that book, but if I remember right, > he only has to get himself a JRE Omega version and he should be all set... That's right ;) But JRE Omega will still have a [different] record that causes its GC to grind to a halt... (or maybe I don't remember the book very well!). Mike --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
