What about an invariant that says the number of main index segments
with the same level f(n) must be less than M?

That is exactly what the second property says:
"Less than M number of segments whose doc count n satisfies B*(M^c) <=
n < B*(M^(c+1)) for any c >= 0."

In other words, there are fewer than M segments with the same level f(n).
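For concreteness, here is a minimal sketch (not from the patch; the
names are mine) of how a segment's level could be computed from its
doc count n, given B and M:

    // Sketch only: compute the level c such that B*M^c <= n < B*M^(c+1).
    // Segments with fewer than B docs fall below level 0; reporting
    // them as -1 is an assumption on my part.
    static int level(int n, int B, int M) {
        if (n < B) return -1;
        long upper = (long) B * M;   // B * M^(c+1), starting at c = 0
        int c = 0;
        while (n >= upper) {
            upper *= M;
            c++;
        }
        return c;
    }

With B=1000 and M=10, doc counts 1000..9999 map to level 0,
10000..99999 to level 1, and so on; the invariant then says each level
holds fewer than M segments.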


I am concerned about corner cases producing a large number of
segments, slowing search or causing errors due to file descriptor
exhaustion.

When deciding whether to merge, maybe we should count the number of
segments at a particular index level f(n) rather than add up their
document counts.  In the presence of deletions, I think this would
lead to faster indexing, since merges would be triggered less often.
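A sketch of what that check might look like (hypothetical names, and
it reuses the level() helper sketched earlier; deletions are ignored
for brevity):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: return true when some level already holds M segments,
    // counting segments per level instead of summing doc counts.
    static boolean shouldMerge(int[] segmentDocCounts, int B, int M) {
        Map<Integer, Integer> perLevel = new HashMap<Integer, Integer>();
        for (int n : segmentDocCounts) {
            int lvl = level(n, B, M);
            Integer count = perLevel.get(lvl);
            count = (count == null) ? 1 : count + 1;
            perLevel.put(lvl, count);
            if (count >= M) {
                return true;  // M segments share a level: merge them
            }
        }
        return false;
    }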

Given M, B, and an index which has L (0 < L < M) segments each with
fewer than B docs, how many RAM docs should be accumulated before a
merge is triggered? B is not good. B - sum(L), where sum(L) is the
total doc count of those L segments, is the old strategy, which has
problems. So something between B - sum(L) and B? Once there are M
segments with fewer than B docs, they'll be merged. But what if
L = 0? Should B RAM docs be accumulated before flushing in that case?

In any case, if flushing RAM docs in close() causes the number of
segments with fewer than B docs to reach M, a merge of those segments
should be triggered.
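In sketch form (flushRamDocs() and mergeSegments() are placeholders,
not the actual patch API; docCount is the public field on the old
SegmentInfo class):

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the close-time rule above: after the RAM docs are
    // flushed as a new segment, merge the sub-B segments if there
    // are now M of them.
    void flushAndMaybeMergeOnClose(List<SegmentInfo> segments, int B, int M) {
        flushRamDocs(segments);                   // hypothetical flush
        List<SegmentInfo> small = new ArrayList<SegmentInfo>();
        for (SegmentInfo si : segments) {
            if (si.docCount < B) {
                small.add(si);
            }
        }
        if (small.size() >= M) {
            mergeSegments(small);                 // hypothetical merge
        }
    }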


What is the behavior of your patch under the following scenario?

M=10, B=1000
open writer, add 3 docs, close writer
open writer, add 1000 docs, close writer
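In code, the scenario is roughly this (a sketch against the Lucene
2.x IndexWriter API, assuming mergeFactor and maxBufferedDocs
correspond to M and B; the field contents are illustrative):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class MergeScenario {
        public static void main(String[] args) throws Exception {
            Directory dir = new RAMDirectory();

            // open writer, add 3 docs, close writer
            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
            writer.setMergeFactor(10);        // M
            writer.setMaxBufferedDocs(1000);  // B
            addDocs(writer, 3);
            writer.close();

            // open writer, add 1000 docs, close writer
            writer = new IndexWriter(dir, new StandardAnalyzer(), false);
            writer.setMergeFactor(10);
            writer.setMaxBufferedDocs(1000);
            addDocs(writer, 1000);
            writer.close();
            // Does the index now hold segments of 3 and 1000 docs?
        }

        private static void addDocs(IndexWriter writer, int count) throws Exception {
            for (int i = 0; i < count; i++) {
                Document doc = new Document();
                doc.add(new Field("content", "doc " + i,
                                  Field.Store.NO, Field.Index.TOKENIZED));
                writer.addDocument(doc);
            }
        }
    }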

Do you avoid the situation of having segments with docs=3 and
docs=1000, in that order (so f(n) increases as segment position
increases, a no-no)?

Currently, it does result in segments with docs=3 and 1000. I'll
modify the patch so that it completely complies with all the index
invariants once an agreement is reached.

Ning
