In the past we've seen diminishing returns once the RAM buffer grew
larger than a few hundred MB, I believe.  But it'd be great to run
these tests again :)

A very large RAM buffer means you're not committing very often, which
means that if things go south you lose everything indexed since the
last commit.
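
(As a concrete, minimal sketch -- using the IndexWriterConfig API
mentioned below, with a placeholder index path, a placeholder version
constant, and an illustrative 256 MB buffer rather than a
recommendation:

  import java.io.File;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class RamBufferSketch {
    public static void main(String[] args) throws Exception {
      Directory dir = FSDirectory.open(new File("/path/to/index"));
      IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_31,
          new StandardAnalyzer(Version.LUCENE_31));
      iwc.setRAMBufferSizeMB(256.0);  // flush once ~256 MB of docs are buffered
      IndexWriter writer = new IndexWriter(dir, iwc);
      // ... writer.addDocument(...) calls here ...
      writer.commit();  // a crash now loses only docs indexed after this point
      writer.close();
    }
  }

On a released 3.0.x build, where IndexWriterConfig doesn't exist yet,
the equivalent knob is writer.setRAMBufferSizeMB(...).)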

IndexWriter also consumes sudden bursts of RAM when merging -- it
opens a SegmentReader per segment being merged (which does not load
the terms index, but does load norms), and it allocates 4 bytes per
doc to remap docIDs around deletions.
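
As an illustrative back-of-the-envelope (my numbers, not measured):
merging segments totaling 100M docs means ~400 MB of transient RAM
just for that remap array (100M docs * 4 bytes/doc).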

For resolving deletions, it opens a SegmentReader for every segment in
the index, and does load the terms index (which in your case, Tom, is
probably going to tie up a lot of RAM!).  If you pool these readers
(either by using IW.getReader (NRT reader) or by allowing pooling via
IndexWriterConfig (new in trunk)), you need to budget RAM for them.
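
(A rough sketch of the NRT path -- assuming "writer" is your
IndexWriter, with IndexReader / IndexSearcher imported from
org.apache.lucene.index and org.apache.lucene.search:

  // getReader() flushes buffered docs and returns a reader on the
  // current index state; its SegmentReaders stay pooled inside the
  // IndexWriter -- that pool is the RAM you need to budget for.
  IndexReader reader = writer.getReader();
  try {
    IndexSearcher searcher = new IndexSearcher(reader);
    // ... run searches against the near-real-time view ...
    searcher.close();  // does not close the underlying reader
  } finally {
    reader.close();  // releases this reader; the writer stays open
  }

Closing the reader does not close the writer, so roughly speaking the
pooled readers' RAM is held until the writer itself is closed.)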

Also, for searching, be sure to leave free RAM for the OS to use as
IO cache...

Mike

On Tue, Apr 27, 2010 at 1:28 AM, Shai Erera <ser...@gmail.com> wrote:
> Hi Tom
>
> I don't know of an easy way to understand the relationship between the max
> JVM heap and the RAM buffer size. I ran the test w/ an 8GB heap and a 2048 MB
> RAM buffer. Indexing 16M documents (roughly 288GB of data) took 7400 seconds
> (by 8 threads). I will post the full benchmark output when I finish indexing
> 25M documents w/ different RAM buffer sizes.
>
> My gut feeling (and after reading
> http://www.ibm.com/developerworks/java/library/j-jtp09275.html) tells me
> that if I need N MB of RAM, I should allocate at least 2*N on the
> heap. But that takes only the RAM buffer into consideration. Since other
> memory is allocated as well, GC might kick in, so to avoid that (as much
> as possible) I allocate at least 3*N, if N is large enough.
>
> In the current example, I need 2GB for the RAM buffer, so I'll allocate at
> least 4GB on the heap. Then, if I assume that the rest of the app won't
> allocate more than 2GB in total, I'll set the heap size to 6GB. Since I have
> lots of RAM and cannot otherwise use it w/ Lucene, I set the heap size to
> 8GB. I haven't, though, turned on any flags to log if and when GC ran, so I
> don't know whether I've hit any nasty GC issues. But given the total indexing
> throughput (~140GB / hour), I think these are good settings.
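>
> (A minimal sketch of how to check that, assuming a Sun HotSpot JVM;
> "MyIndexer" is just a placeholder for the indexing driver class:
>
>   java -Xms8g -Xmx8g -verbose:gc -XX:+PrintGCDetails \
>        -XX:+PrintGCTimeStamps MyIndexer
>
> -verbose:gc logs each collection, and the two -XX flags add
> per-generation detail and timestamps.)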
>
> BTW, I think that w/ parallel arrays
> (https://issues.apache.org/jira/browse/LUCENE-2329), performance should be
> better even w/ a lower heap size. You can also read there that Michael B.
> ran the test w/ a 200MB RAM buffer and a 2GB heap (and also a 256MB heap),
> which might give you another indication of the RAM buffer / heap size ratio.
>
> Hope this helps,
> Shai
>
> On Mon, Apr 26, 2010 at 8:26 PM, Tom Burton-West <tburtonw...@gmail.com>
> wrote:
>>
>> I'm looking forward to your results Shai.
>>
>>
>> Once we get our new test server we will be running tests with different
>> RAM buffer sizes.  We have 10 300GB indexes to re-index, so we need to
>> minimize any merging/disk I/O.
>>
>> See also this related thread on the Solr list:
>>
>> http://lucene.472066.n3.nabble.com/What-is-largest-reasonable-setting-for-ramBufferSizeMB-tc505964.html#a505964
>>
>> Is there any easy way to understand the relationship between the max RAM
>> buffer size and the total amount of memory you need to give the JVM?
>>
>>
>> Tom Burton-West
>> www.hathitrust.org
