mark harwood wrote:


Could you get a heap dump (e.g. with YourKit) of what's using up all the memory when you hit OOM?

On this particular machine I have a JRE, no admin rights and therefore limited profiling capability :( That's why I was trying to come up with some formula for estimating memory usage.

Hmm, OK.  What about turning on IndexWriter's infoStream?
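
Something along these lines (a rough sketch, assuming the Lucene 2.x-era
IndexWriter API; the log file name is just an example) would route the
infoStream diagnostics to a file so you can see what flushing and merging
are doing during each write session:

    import java.io.FileNotFoundException;
    import java.io.PrintStream;
    import org.apache.lucene.index.IndexWriter;

    public class InfoStreamSetup {
      // 'writer' is whatever IndexWriter the indexing code already holds.
      static void enableInfoStream(IndexWriter writer)
          throws FileNotFoundException {
        // Log file name is just an example.
        PrintStream log = new PrintStream("indexwriter-info.log");
        writer.setInfoStream(log);  // flush/merge diagnostics now go to the file
      }
    }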

Or maybe you could use jmap -histo <pid>?

When you say "write session", are you closing & opening a new IndexWriter each time?

Yes, commit then close.
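
Roughly, each write session looks like this (a minimal sketch, assuming a
Lucene 2.4-era API; the analyzer and batch handling are just placeholders):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;

    public class WriteSession {
      // One "write session": open a writer, add a batch of docs,
      // commit, then close.
      static void indexBatch(Directory dir, Iterable<Document> batch)
          throws Exception {
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
            IndexWriter.MaxFieldLength.UNLIMITED);
        try {
          for (Document doc : batch) {
            writer.addDocument(doc);
          }
          writer.commit();   // make the batch durable
        } finally {
          writer.close();    // end of the write session
        }
      }
    }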

OK.

It seems likely this has something to do with merging,

Presumably (discounting norms and the deleted-docs arrays) RAM usage for merges is not proportional to the number of terms or docs? I imagine the structures being merged are streamed rather than loaded whole, or as a fixed percentage of the whole.

Oh yeah, right.  No deleted docs to load, and no docMaps to create.
And no norms.  So merging should use negligible amounts of RAM.

So the exception really means "GC is working too hard", and your
graph, and the super-slow GC when you run with
-XX:-UseGCOverheadLimit, seem to indicate GC is indeed taking a lot of
time to collect.  But: how much RAM is actually in use?  Can you plot that?

I suppose you could be hitting some kind of horrible degenerate
fragmentation craziness worst-case-scenario situation, that's swamping
the GC...

Maybe start binary-searching -- turn features off (like Trie*) and see
if the problem goes away, then go back and drill down?

If you take that 98th batch of docs and index it into a fresh index, do
you also hit the GC-working-too-hard exception?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
