Michael Busch wrote on 11/22/2006 08:47 AM:
> Ning Li wrote:
>> A possible design could be:
>> First, in addDocument(), compute the byte size of a ram segment after
>> the ram segment is created. In the synchronized block, when the newly
>> created segment is added to ramSegmentInfos, also add its byte size to
>> the total byte size of ram segments.
>> Then, in maybeFlushRamSegments(), either one of two conditions can
>> trigger a flush: number of ram segments reaching maxBufferedDocs, and
>> total byte size of ram segments exceeding a threshold.

The flaw in this approach is that the threshold has already been
exceeded by the time the flush is triggered.  With very large
documents, that overshoot can cause an OOM.
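
To make the overshoot concrete, here is a minimal sketch of the
add-then-check pattern.  PostAddBuffer and its methods are hypothetical
names for illustration, not IndexWriter code, and segment sizes are
passed in as plain numbers:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the buffering logic; not Lucene's IndexWriter.
class PostAddBuffer {
    private final long maxBytes;
    private final List<Long> buffered = new ArrayList<>();
    private long totalBytes = 0;
    private long peakBytes = 0;

    PostAddBuffer(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Add first, then check: the new segment is already held in memory
    // when the threshold is tested.
    synchronized void add(long segmentBytes) {
        buffered.add(segmentBytes);
        totalBytes += segmentBytes;
        peakBytes = Math.max(peakBytes, totalBytes);
        if (totalBytes > maxBytes) {
            flush();
        }
    }

    // Drop everything that is buffered (a real flush would write it out).
    private void flush() {
        buffered.clear();
        totalBytes = 0;
    }

    synchronized long peakBytes() {
        return peakBytes;
    }

    synchronized long totalBytes() {
        return totalBytes;
    }
}
```

With a threshold of 100 bytes, adding a 90-byte segment and then a
500-byte segment puts 590 bytes in the buffer before the flush runs.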

>
> This is exactly how I implemented it in my private version a couple of
> weeks ago. It works well and I don't see performance problems with
> this design. I named the new parameter in IndexWriter:
> setMaxBufferSize(long).

I implemented it externally because I need to check the size before
adding a new document.  To make this work, I have a notion of size of
Document (via a Sized interface).
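
Roughly, the external check looks like the sketch below.  This is an
illustration under assumptions, not my actual code: the sizeInBytes()
method on Sized, the PreCheckBuffer class, and the Runnable flush hook
are all hypothetical names, and the app is assumed to know how to flush
its own writer.

```java
// Hypothetical shape of the interface; only the name "Sized" is from
// this thread, the method is assumed.
interface Sized {
    long sizeInBytes();
}

// Hypothetical application-side wrapper that checks the size BEFORE
// the document is buffered.
class PreCheckBuffer {
    private final long maxBytes;
    private final Runnable flusher; // stands in for the app's flush call
    private long bufferedBytes = 0;
    private long peakBytes = 0;

    PreCheckBuffer(long maxBytes, Runnable flusher) {
        this.maxBytes = maxBytes;
        this.flusher = flusher;
    }

    synchronized void add(Sized doc) {
        long size = doc.sizeInBytes();
        // Flush first if the new document would push us over the limit.
        if (bufferedBytes > 0 && bufferedBytes + size > maxBytes) {
            flusher.run();
            bufferedBytes = 0;
        }
        bufferedBytes += size;
        peakBytes = Math.max(peakBytes, bufferedBytes);
    }

    synchronized long bufferedBytes() {
        return bufferedBytes;
    }

    synchronized long peakBytes() {
        return peakBytes;
    }
}
```

A single document larger than the threshold still has to be held, but
it is never held on top of what was already buffered.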

I agree that it would be better to do this in IndexWriter, but more
machinery would be needed.  Lucene would need to estimate the size of
the new ram segment and check the threshold prior to consuming the space.

The API that Yonik committed last night (thanks Yonik!) provides the
flexibility to address both use cases.  It's a tiny bit more work for
the app, but at least in my case it is necessary, both to tune for best
performance (by minimizing memory usage variance as a function of size
parameters) and to avoid OOMs.

Chuck

