What about an invariant that says the number of main index segments with the same level f(n) must be less than M?
That is exactly what the second property says: "Less than M number of segments whose doc count n satisfies B*(M^c) <= n < B*(M^(c+1)) for any c >= 0." In other words, there are fewer than M segments with the same f(n).
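For concreteness, here is a minimal Java sketch of one way to compute the level f(n) implied by that property. The name and signature are hypothetical, not from the patch:

    // Hypothetical helper: returns the level c such that
    // B*(M^c) <= n < B*(M^(c+1)), or -1 for segments with
    // fewer than B docs, which sit below level 0.
    static int level(int n, int B, int M) {
        if (n < B) {
            return -1;
        }
        int c = 0;
        long upper = (long) B * M;  // upper bound of level 0 is B*M
        while (n >= upper) {
            c++;
            upper *= M;
        }
        return c;
    }

With B=1000 and M=10, a 3-doc segment sits below level 0 (returns -1) while a 1000-doc segment is at level 0.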
I am concerned about corner cases producing tons of segments, slowing search or causing errors through file descriptor exhaustion. When deciding whether to merge, maybe we should count the number of segments at a particular index level f(n), rather than summing their doc counts. In the presence of deletions, this should lead to faster indexing (due to less frequent merges), I think.
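To illustrate the count-based alternative, a rough sketch reusing the level() helper from the sketch above (the method name and the int[] representation of segment doc counts are assumptions for illustration, not Lucene APIs):

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: decide whether a merge is needed by counting
    // segments per level f(n), instead of summing their doc counts.
    static boolean levelNeedsMerge(int[] segmentDocCounts, int B, int M) {
        Map<Integer, Integer> countByLevel = new HashMap<>();
        for (int n : segmentDocCounts) {
            countByLevel.merge(level(n, B, M), 1, Integer::sum);
        }
        // Merge as soon as any single level accumulates M segments.
        // Deleted docs can shrink n but never inflate a per-level count,
        // so merges fire no more often than with doc-count sums.
        for (int count : countByLevel.values()) {
            if (count >= M) {
                return true;
            }
        }
        return false;
    }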
Given M, B, and an index that has L (0 < L < M) segments each with fewer than B docs, how many RAM docs should be accumulated before a merge is triggered? B is not good. B minus the total doc count of those L segments is the old strategy, which has problems. So something between that and B? Once there are M segments with fewer than B docs, they'll be merged. But what if L = 0? Should B RAM docs be accumulated before flushing in that case? In any case, if flushing the RAM docs brings the number of segments with fewer than B docs up to M in close(), a merge of those segments should be triggered.
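A rough sketch of that close()-time rule, under the assumptions above (numRamDocs, flushRamDocs(), countSegmentsSmallerThan(), and mergeSubBSegments() are all hypothetical names, not actual IndexWriter internals):

    // Hypothetical sketch of the close-time rule: flush buffered docs,
    // then merge if the number of sub-B segments has reached M.
    void close() {
        if (numRamDocs > 0) {
            flushRamDocs();           // writes one new segment with < B docs
        }
        if (countSegmentsSmallerThan(B) >= M) {
            mergeSubBSegments();      // collapse the M small segments into one
        }
    }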
What is the behavior of your patch under the following scenario (M=10, B=1000)?

  open writer, add 3 docs, close writer
  open writer, add 1000 docs, close writer

Do you avoid ending up with segments of 3 and 1000 docs (so that f(n) increases as the segment number increases... a no-no)?
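For reference, the scenario as code against the classic three-arg IndexWriter constructor; dir, analyzer, and the makeDoc() helper are assumed to be set up elsewhere:

    // M=10, B=1000: does this leave a 3-doc segment followed by a
    // 1000-doc one, i.e. f(n) increasing with segment number?
    IndexWriter writer = new IndexWriter(dir, analyzer, true);   // create
    for (int i = 0; i < 3; i++) {
        writer.addDocument(makeDoc(i));
    }
    writer.close();   // flushes a 3-doc segment

    writer = new IndexWriter(dir, analyzer, false);              // append
    for (int i = 0; i < 1000; i++) {
        writer.addDocument(makeDoc(i));
    }
    writer.close();   // flushes a 1000-doc segment after the 3-doc one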
Currently, it does result in segments with docs=3 and 1000. I'll modify the patch so that it completely complies with all the index invariants once an agreement is reached.

Ning