Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

Michael McCandless Tue, 03 Apr 2007 09:51:48 -0700

"Ning Li" <[EMAIL PROTECTED]> wrote:
> On 4/3/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote:
> 
> >  * With term vectors and/or stored fields, the new patch has
> >    substantially better RAM efficiency.
> 
> Impressive numbers! The new patch improves RAM efficiency quite a bit
> even with no term vectors nor stored fields, because of the periodic
> in-RAM merges of posting lists & term infos etc. The frequency of the
> in-RAM merges is controlled by flushedMergeFactor, which measures in
> doc count, right? How sensitive is performance to the value of
> flushedMergeFactor?


Right, the in-RAM merges seem to help *a lot* because you get great
compression of the terms dictionary, and also some compression of the
freq postings since the docIDs are delta encoded.  Also, you waste
less space at the ends of buffers (buffers are fixed-size) when you
merge together into a large segment.
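
To illustrate why delta encoding of docIDs pays off once postings from
many docs sit in one segment, here is a minimal sketch.  It is not code
from the patch: the class and method names are hypothetical, and it
assumes a variable-length int encoding with 7 data bits per byte.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration; not code from the LUCENE-843 patch.
class DeltaPostingsSketch {

    // Write a non-negative int as a variable-length byte sequence:
    // 7 data bits per byte, high bit set when more bytes follow.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encode ascending docIDs as gaps from the previous docID.
    static byte[] encodeDeltas(int[] sortedDocIDs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int docID : sortedDocIDs) {
            writeVInt(out, docID - last);
            last = docID;
        }
        return out.toByteArray();
    }

    // Encode the same docIDs as absolute values, for comparison.
    static byte[] encodeAbsolute(int[] docIDs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int docID : docIDs) {
            writeVInt(out, docID);
        }
        return out.toByteArray();
    }
}
```

With docIDs 1000, 1003, 1010 and 1200, the gaps 3 and 7 fit in one byte
each, so the delta form takes 6 bytes where the absolute form takes 8.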

The in-RAM merges are triggered by the number of bytes used vs the RAM buffer
size.  Each doc is indexed to its own RAM segment, then once these
level 0 segments use > 1/Nth of the RAM buffer size, I merge into
level 1.  Then once level 1 segments are using > 1/Mth of the RAM
buffer size, I merge into level 2.  I don't do any merges beyond that.
Right now N = 14 and M = 7 but I haven't really tuned them yet ...
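
The two byte-based triggers described above can be sketched roughly as
follows.  This is only an illustration, not the patch's code: the class
and field names are hypothetical, and it makes the simplifying
assumption that a merged segment is as large as its inputs (in practice
the merge should shrink them).

```java
// Hypothetical illustration of the byte-based triggers; not patch code.
class RamMergePolicySketch {
    static final int N = 14; // level 0 threshold divisor, from the email
    static final int M = 7;  // level 1 threshold divisor, from the email

    final long ramBufferBytes;
    long level0Bytes;
    long level1Bytes;
    long level2Bytes;

    RamMergePolicySketch(long ramBufferBytes) {
        this.ramBufferBytes = ramBufferBytes;
    }

    // Record one newly indexed doc (its own level 0 segment), then run
    // whichever in-RAM merges the thresholds call for.
    void addDocument(long docBytes) {
        level0Bytes += docBytes;
        if (level0Bytes > ramBufferBytes / N) {
            // Assumption: merged size equals the sum of the input sizes.
            level1Bytes += level0Bytes;
            level0Bytes = 0;
        }
        if (level1Bytes > ramBufferBytes / M) {
            level2Bytes += level1Bytes;
            level1Bytes = 0;
        }
        // No merges beyond level 2.
    }

    long totalBytes() {
        return level0Bytes + level1Bytes + level2Bytes;
    }
}
```

With a 1400 byte buffer the thresholds are 100 and 200 bytes, so adding
60 byte docs triggers a level 0 merge on every second doc and a level 1
merge on every fourth.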

Once RAM is full, all of those segments are merged into a single
on-disk segment.  Once enough on-disk segments accumulate they are
periodically merged (based on flushedMergeFactor) as well.  Finally
when it's time to commit a real segment I merge all RAM segments and
flushed segments into a real Lucene segment.
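
The flush and commit flow above might look something like the sketch
below.  Again this is not the patch's code, and all names are
hypothetical.  In particular it assumes flushedMergeFactor is a count
of flushed segments that triggers a merge once reached; the email does
not spell out the exact rule.  Segments are tracked by size in bytes
only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the flush/commit flow; not patch code.
class FlushPipelineSketch {
    final long ramBufferBytes;
    final int flushedMergeFactor; // assumed: number of flushed segments
    long ramBytes = 0;
    final List<Long> flushedSegments = new ArrayList<>(); // sizes in bytes
    final List<Long> realSegments = new ArrayList<>();    // sizes in bytes

    FlushPipelineSketch(long ramBufferBytes, int flushedMergeFactor) {
        this.ramBufferBytes = ramBufferBytes;
        this.flushedMergeFactor = flushedMergeFactor;
    }

    void addDocument(long docBytes) {
        ramBytes += docBytes;
        if (ramBytes >= ramBufferBytes) {
            flushRam();
        }
    }

    // RAM is full: write all RAM segments as a single flushed segment,
    // then merge the flushed segments if enough have accumulated.
    private void flushRam() {
        flushedSegments.add(ramBytes);
        ramBytes = 0;
        if (flushedSegments.size() >= flushedMergeFactor) {
            long merged = sum(flushedSegments);
            flushedSegments.clear();
            flushedSegments.add(merged);
        }
    }

    // Commit: merge RAM segments and flushed segments into one real segment.
    void commit() {
        long total = ramBytes + sum(flushedSegments);
        ramBytes = 0;
        flushedSegments.clear();
        if (total > 0) {
            realSegments.add(total);
        }
    }

    private static long sum(List<Long> sizes) {
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        return total;
    }
}
```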

I haven't done much testing to find the sweet spot for these merge
settings just yet.  Still plenty to do!

Mike

