mark harwood wrote:


Could you get a heap dump (e.g. with YourKit) of what's using up all the memory when you hit OOM?

On this particular machine I have a JRE, no admin rights and therefore limited profiling capability :( That's why I was trying to come up with some formula for estimating memory usage.

Hmm, OK.  What about turning on IndexWriter's infoStream?
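
Something along these lines (a rough sketch, assuming the Lucene 2.x-era
IndexWriter API; the log file name is just an example) would route the
infoStream diagnostics to a file so you can see what flushing and merging
are doing during each write session:

    import java.io.FileNotFoundException;
    import java.io.PrintStream;
    import org.apache.lucene.index.IndexWriter;

    public class InfoStreamSetup {
      // 'writer' is whatever IndexWriter the indexing code already holds.
      static void enableInfoStream(IndexWriter writer)
          throws FileNotFoundException {
        // Log file name is just an example.
        PrintStream log = new PrintStream("indexwriter-info.log");
        writer.setInfoStream(log);  // flush/merge diagnostics now go to the file
      }
    }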

Or maybe you could use jmap -histo <pid>?

When you say "write session", are you closing & opening a new IndexWriter each time?

Yes, commit then close.
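
Roughly, each write session looks like this (a minimal sketch, assuming a
Lucene 2.4-era API; the analyzer and batch handling are just placeholders):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;

    public class WriteSession {
      // One "write session": open a writer, add a batch of docs,
      // commit, then close.
      static void indexBatch(Directory dir, Iterable<Document> batch)
          throws Exception {
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
            IndexWriter.MaxFieldLength.UNLIMITED);
        try {
          for (Document doc : batch) {
            writer.addDocument(doc);
          }
          writer.commit();   // make the batch durable
        } finally {
          writer.close();    // end of the write session
        }
      }
    }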

OK.

It seems likely this has something to do with merging,

Presumably (discounting norms and the deleted-docs arrays) RAM usage for merges is not proportional to the number of terms or docs? I imagine the structures being merged are streamed rather than loaded whole, or as a fixed percentage of the whole.

Oh yeah, right.  No deleted docs to load, and no docMaps to create.
And no norms.  So merging should use negligible amounts of RAM.

So the exception really means "GC is working too hard", and your
graph, and the super-slow GC when you run with
-XX:-UseGCOverheadLimit, seem to indicate GC is indeed taking a lot of
time to collect.  But: how much RAM is actually in use?  Can you plot that?

I suppose you could be hitting some kind of horrible degenerate
fragmentation craziness worst-case-scenario situation, that's swamping
the GC...

Maybe start binary-searching -- turn features off (like Trie*) and see
if the problem goes away, then go back and drill down?

If you take that 98th batch of docs and index it into a fresh index, do
you also hit the GC-working-too-hard exception?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
