"Ning Li" <[EMAIL PROTECTED]> wrote:
> On 4/3/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote:
> >
> >   * With term vectors and/or stored fields, the new patch has
> >     substantially better RAM efficiency.
>
> Impressive numbers! The new patch improves RAM efficiency quite a bit
> even with no term vectors nor stored fields, because of the periodic
> in-RAM merges of posting lists & term infos etc. The frequency of the
> in-RAM merges is controlled by flushedMergeFactor, which measures in
> doc count, right? How sensitive is performance to the value of
> flushedMergeFactor?
Right, the in-RAM merges seem to help *a lot*: you get great compression of the terms dictionary, and also some compression of the freq postings since the docIDs are delta encoded. You also waste less space at the end of each buffer (buffers are fixed sizes) when you merge together into a larger segment.

The in-RAM merges are triggered by the number of bytes used relative to the RAM buffer size. Each doc is indexed into its own RAM segment; once these level 0 segments use more than 1/Nth of the RAM buffer size, I merge them into level 1. Then once level 1 segments are using more than 1/Mth of the RAM buffer size, I merge them into level 2. I don't do any merges beyond that. Right now N = 14 and M = 7, but I haven't really tuned them yet ...

Once RAM is full, all of those segments are merged into a single on-disk segment. Once enough on-disk segments accumulate, they are periodically merged as well (based on flushedMergeFactor). Finally, when it's time to commit a real segment, I merge all RAM segments and flushed segments into a real Lucene segment.

I haven't done much testing to find the sweet spot for these merge settings just yet. Still plenty to do!

Mike
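The merge triggers described above can be illustrated with a small simulation. This is a minimal sketch, not Lucene code: `Segment`, `merge`, `TieredRamMerger`, `add_document` and `commit` are hypothetical names. It assumes merged sizes simply add up (the real merges compress the terms dictionary and delta-encoded postings, so they would shrink). It also assumes "periodically merged (based on flushedMergeFactor)" means the flushed segments are merged into one once that many have accumulated.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    num_bytes: int  # bytes this segment occupies
    doc_count: int  # number of documents it holds


def merge(segments):
    # Hypothetical merge: sizes just add up. Real merges would compress
    # the terms dictionary and postings, so the result would be smaller.
    return Segment(sum(s.num_bytes for s in segments),
                   sum(s.doc_count for s in segments))


class TieredRamMerger:
    """Hypothetical simulation of the level 0/1/2 in-RAM merge triggers."""

    def __init__(self, ram_buffer_bytes, n=14, m=7, flushed_merge_factor=10):
        self.ram_buffer_bytes = ram_buffer_bytes
        self.n = n  # level 0 -> 1 when level 0 uses > 1/N of the buffer
        self.m = m  # level 1 -> 2 when level 1 uses > 1/M of the buffer
        self.flushed_merge_factor = flushed_merge_factor
        self.levels = [[], [], []]  # RAM segments at levels 0, 1, 2
        self.flushed = []  # segments already flushed to disk

    def ram_used(self):
        return sum(s.num_bytes for level in self.levels for s in level)

    def _level_bytes(self, i):
        return sum(s.num_bytes for s in self.levels[i])

    def _promote(self, i):
        # Merge every segment at level i into one segment at level i + 1.
        self.levels[i + 1].append(merge(self.levels[i]))
        self.levels[i] = []

    def add_document(self, doc_bytes):
        # Each document is first indexed into its own level 0 RAM segment.
        self.levels[0].append(Segment(doc_bytes, 1))
        if self._level_bytes(0) > self.ram_buffer_bytes / self.n:
            self._promote(0)
        if self._level_bytes(1) > self.ram_buffer_bytes / self.m:
            self._promote(1)
        # Level 2 segments are never merged further while in RAM.
        if self.ram_used() >= self.ram_buffer_bytes:
            self._flush()

    def _flush(self):
        # RAM is full: merge all RAM segments into one on-disk segment.
        self.flushed.append(merge([s for level in self.levels for s in level]))
        self.levels = [[], [], []]
        # Assumed rule: merge flushed segments once enough accumulate.
        if len(self.flushed) >= self.flushed_merge_factor:
            self.flushed = [merge(self.flushed)]

    def commit(self):
        # Merge all RAM and flushed segments into one "real" segment.
        everything = self.flushed + [s for level in self.levels for s in level]
        self.flushed, self.levels = [], [[], [], []]
        return merge(everything)
```

For example, with `TieredRamMerger(ram_buffer_bytes=1400)` and 30-byte documents, level 0 is promoted every 4 documents (120 > 1400/14) and level 1 every 8 documents (240 > 1400/7). Because the triggers are byte-based, the doc count per flush falls out of document size rather than being configured directly.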