On 3/22/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Right, I'm calling a newly created segment (i.e. flushed from RAM) level 0, and then a level 1 segment is created when you merge 10 level 0 segments, level 2 is created when you merge 10 level 1 segments, etc.
That is not how the current merge policy works. There are two orthogonal aspects to this problem:

  1. the measurement of a segment's size
  2. the merge behaviour given that measurement

In the current code:

  1. The size of a segment is measured by the document count in the segment, not by its actual RAM or file size. Levels are defined according to this measurement.

  2. The behaviour is described by the following two invariants, which hold as long as mergeFactor (M) does not change and no segment's doc count reaches maxMergeDocs. With B for maxBufferedDocs and f(n) defined as ceil(log_M(ceil(n/B))):

     1) If i (left*) and i+1 (right*) are two consecutive segments of doc counts x and y, then f(x) >= f(y).
     2) The number of committed segments on the same level (f(n)) <= M.

The document counts are an approximation of segment sizes, and thus an approximation of merge cost. Sometimes, however, they do not correctly reflect segment sizes. So it is probably a good idea to use RAM or file size as the measurement of a segment's size, as Mike suggested. But the behaviour does not have to change: the two invariants can still be guaranteed, with the definitions of sizes and levels modified according to the new measurement.

Ning
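
To make the level function and the two invariants concrete, here is a minimal sketch in Python. It is only an illustration of the definitions in this message, not the Lucene code; the names `level` and `check_invariants` are made up for the example, and the segment list is assumed to be ordered left to right as in invariant 1.

```python
from collections import Counter


def level(doc_count, merge_factor, max_buffered_docs):
    """f(n) = ceil(log_M(ceil(n / B))), computed with integers only
    so that exact powers of M are not misplaced by float rounding."""
    if doc_count < 1 or merge_factor < 2 or max_buffered_docs < 1:
        raise ValueError("expected doc_count >= 1, M >= 2, B >= 1")
    units = -(-doc_count // max_buffered_docs)  # ceil(n / B)
    lvl, capacity = 0, 1
    while capacity < units:  # smallest lvl with M**lvl >= units
        capacity *= merge_factor
        lvl += 1
    return lvl


def check_invariants(doc_counts, merge_factor, max_buffered_docs):
    """Return (invariant_1_holds, invariant_2_holds) for segments
    listed from left to right by their doc counts."""
    levels = [level(n, merge_factor, max_buffered_docs) for n in doc_counts]
    # Invariant 1: levels never increase from left to right.
    non_increasing = all(a >= b for a, b in zip(levels, levels[1:]))
    # Invariant 2: at most M segments share any one level.
    bounded = all(c <= merge_factor for c in Counter(levels).values())
    return non_increasing, bounded
```

With M = 10 and B = 10, segments of up to 10 docs are level 0, 11 to 100 docs are level 1, and 101 to 1000 docs are level 2. Switching the measurement to RAM or file size would only change what is passed in as the size and as B; the two checks stay the same.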