Thanks, I have a heap dump now from a run with reduced JVM memory (to reach a failure point faster) and am working through it offline with VisualVM. This test induced a proper OOM rather than one of those "timed out waiting for GC"-style OOMs, which can be misleading. The main culprit in this particular dump appears to be FreqProxTermsWriter$PostingsList, but the number of instances is in line with the volume of terms I would expect at that stage of indexing. I'll report back on my findings as I discover more.
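For anyone following along: if a JDK is available on the analysis box, the bundled jhat tool can serve an .hprof dump for browsing, including an OQL console for sanity-checking instance counts of a suspect class. A minimal sketch — the dump path and heap size are placeholders, and the fully qualified class name is my assumption for how the inner class appears in a dump:

```shell
# Serve the dump for browsing on http://localhost:7000
# (-J passes the flag through to jhat's own JVM; dumps need headroom)
jhat -J-Xmx2g /path/to/heap.hprof

# Then in the OQL console (http://localhost:7000/oql/), count the
# suspect objects, e.g. (class name assumed):
#   select count(heap.objects("org.apache.lucene.index.FreqProxTermsWriter$PostingsList"))
```

VisualVM can open the same .hprof file directly and shows a per-class instance histogram, which is usually the quicker first check.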
I have another run soldiering on with the -XX:-UseGCOverheadLimit setting to avoid GC-related timeouts; this has not hit OOM but is slowing to a crawl. I'll try capturing infoStream output too, if it doesn't generate terabytes of data.

Cheers
Mark

----- Original Message ----
From: Florian Weimer <f...@deneb.enyo.de>
To: java-user@lucene.apache.org
Sent: Wednesday, 11 March, 2009 10:42:33
Subject: Re: A model for predicting indexing memory costs?

* mark harwood:

>>> Could you get a heap dump (eg with YourKit) of what's using up all the
>>> memory when you hit OOM?
>
> On this particular machine I have a JRE, no admin rights and
> therefore limited profiling capability :(

Maybe this could give you a heap dump which you can analyze on a
different box?

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/file.dump

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
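For reference, the flags discussed in this thread can be combined on one command line. A sketch — the heap size, dump path, and main class are placeholders, not values from the thread:

```shell
# Dump the heap automatically on OutOfMemoryError, and disable the
# "GC overhead limit exceeded" early-exit so the run proceeds to a
# genuine OOM rather than a GC-timeout one.
java -Xmx512m \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/path/to/file.dump \
     -XX:-UseGCOverheadLimit \
     com.example.MyIndexer
```

The infoStream logging mentioned above is enabled in code, via IndexWriter's setInfoStream(PrintStream) method in the Lucene releases of this era (e.g. writer.setInfoStream(System.out)); it is verbose, so directing it to a file is advisable on a long run.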