Michael Busch wrote on 11/22/2006 08:47 AM:
> Ning Li wrote:
>> A possible design could be:
>> First, in addDocument(), compute the byte size of a ram segment after
>> the ram segment is created. In the synchronized block, when the newly
>> created segment is added to ramSegmentInfos, also add its byte size to
>> the total byte size of ram segments.
>> Then, in maybeFlushRamSegments(), either one of two conditions can
>> trigger a flush: number of ram segments reaching maxBufferedDocs, and
>> total byte size of ram segments exceeding a threshold.
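For reference, here is a minimal sketch of the quoted design, assuming each new RAM segment's byte size is already known when it is registered. RamSegmentBuffer, maxBufferBytes and flushRamSegments() are hypothetical names, not Lucene's actual IndexWriter API. Note that the segment's size is added to the total before the threshold is checked, so the memory has already been consumed by the time a flush is triggered.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the quoted design (not Lucene's API): track the
 * byte size of buffered RAM segments and flush when either the segment
 * count reaches maxBufferedDocs or the total byte size exceeds a threshold.
 */
class RamSegmentBuffer {
    private final int maxBufferedDocs;
    private final long maxBufferBytes;
    private final List<Long> ramSegmentSizes = new ArrayList<>();
    private long totalRamBytes = 0;
    private int flushCount = 0;

    RamSegmentBuffer(int maxBufferedDocs, long maxBufferBytes) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxBufferBytes = maxBufferBytes;
    }

    /** Called after a one-document RAM segment has been built; its size is passed in. */
    void addDocument(long segmentBytes) {
        synchronized (this) {
            // Register the new segment and add its size to the running total.
            ramSegmentSizes.add(segmentBytes);
            totalRamBytes += segmentBytes;
            // The check happens only after the memory is already in use.
            maybeFlushRamSegments();
        }
    }

    /** Either condition triggers a flush. Caller holds the lock. */
    private void maybeFlushRamSegments() {
        if (ramSegmentSizes.size() >= maxBufferedDocs || totalRamBytes > maxBufferBytes) {
            flushRamSegments();
        }
    }

    /** Placeholder: a real writer would merge the RAM segments to disk here. */
    private void flushRamSegments() {
        ramSegmentSizes.clear();
        totalRamBytes = 0;
        flushCount++;
    }

    synchronized long totalRamBytes() { return totalRamBytes; }

    synchronized int flushCount() { return flushCount; }
}
```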
There is a flaw in this approach: you exceed the threshold before
flushing. With very large documents, that can cause an OOM.

>
> This is exactly how I implemented it in my private version a couple of
> weeks ago. It works good and I don't see performance problems with
> this design. I named the new parameter in IndexWriter:
> setMaxBufferSize(long).

I implemented it externally because I need to check the size before
adding a new document. To make this work, I have a notion of the size of
a Document (via a Sized interface). I agree that it would be better to
do this in IndexWriter, but more machinery would be needed: Lucene would
need to estimate the size of the new ram segment and check the threshold
before consuming the space.

The API that Yonik committed last night (thanks Yonik!) provides the
flexibility to address both use cases. It's a tiny bit more work for the
app, but, at least in my case, it is necessary to tune for best
performance (by minimizing memory usage variance as a function of size
parameters) and to avoid OOMs.

Chuck
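For comparison, a minimal sketch of the check-before-add approach described above, assuming a hypothetical Sized interface whose sizeInBytes() returns an estimate of a document's in-memory size. PreCheckingBuffer and its methods are illustrative names only; this does not reproduce the API Yonik committed or any other Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: a document that can estimate its own in-memory size. */
interface Sized {
    long sizeInBytes();
}

/**
 * Hypothetical application-side buffer (not Lucene's API). It checks the
 * estimated size of an incoming document BEFORE buffering it and flushes
 * first if adding it would push the buffer past the threshold.
 */
class PreCheckingBuffer<D extends Sized> {
    private final long maxBufferBytes;
    private final List<D> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int flushCount = 0;

    PreCheckingBuffer(long maxBufferBytes) {
        this.maxBufferBytes = maxBufferBytes;
    }

    synchronized void add(D doc) {
        long size = doc.sizeInBytes();
        // Flush before consuming more memory, so the buffer never exceeds
        // the limit (unless a single document is larger than the limit).
        if (!pending.isEmpty() && pendingBytes + size > maxBufferBytes) {
            flush();
        }
        pending.add(doc);
        pendingBytes += size;
    }

    /** Placeholder: a real application would hand `pending` to the IndexWriter here. */
    private void flush() {
        pending.clear();
        pendingBytes = 0;
        flushCount++;
    }

    synchronized long pendingBytes() { return pendingBytes; }

    synchronized int flushCount() { return flushCount; }
}
```

The trade-off is that the size estimate has to come from the application, which is the extra machinery IndexWriter would otherwise need to provide itself.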