> "Less than M number of segments whose doc count n satisfies B*(M^c) <=
> n < B*(M^(c+1)) for any c >= 0."
> In other words, less than M number of segments with the same f(n).

Ah, I had missed that.  But I don't believe that Lucene currently
obeys this in all cases.

I think it does hold for n >= B, i.e. c >= 0. But not for n < B.
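To make the quoted invariant concrete, here is a minimal sketch (not Lucene code; the names `level` and `invariant_holds` are mine) of f(n) as the largest c with B*M^c <= n, clamped to -1 for the n < B regime where, as noted above, the invariant need not hold:

```python
from collections import Counter

def level(n, B, M):
    # f(n): the largest c >= 0 with B * M**c <= n < B * M**(c+1).
    # For n < B the formula has no c >= 0; mark those segments -1
    # (assumption: this is the sub-B case discussed above).
    if n < B:
        return -1
    c = 0
    while n >= B * M ** (c + 1):
        c += 1
    return c

def invariant_holds(segment_sizes, B, M):
    # Invariant: fewer than M segments share any one level f(n).
    counts = Counter(level(n, B, M) for n in segment_sizes)
    return all(cnt < M for cnt in counts.values())
```

For example, with B=10 and M=10, nine segments of 10 docs each satisfy the invariant, but a tenth segment at the same level violates it and would trigger a merge.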


The new IndexWriter changes add an additional constraint: to delete
documents efficiently, the first merge must be on buffered documents
only, to ensure that ids don't change.  We should also explore changing
the index invariants to accommodate this.

Do you have any ideas in this area?  Is a monotonically decreasing
segment level (your f(n)) really required?

Currently, the first merge always starts on buffered documents. Do you
want this constraint to be reflected in the index invariants, or do
you want to remove this constraint?

In any case, a monotonically decreasing f(n) is definitely a good
thing. Otherwise, cases like a sandwich (segments with small f(n)
sandwiched by two segments with large f(n)) make it even harder to
come up with a robust merge policy.
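To make the sandwich case concrete, a minimal sketch (my own helper name, not Lucene API) checking that segment levels f(n), oldest first, are monotonically non-increasing:

```python
def levels_monotone(levels):
    # levels: f(n) per segment, ordered oldest to newest.
    # Monotonically decreasing here means non-increasing left to right.
    return all(a >= b for a, b in zip(levels, levels[1:]))

# A "sandwich": a small-level segment between two large-level ones.
# [3, 1, 3] violates monotonicity, while [3, 2, 2, 1] satisfies it.
```

A policy that preserves monotonicity never has to reason about merging across a low-level segment wedged between high-level neighbors.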


> So between B-sum(L) and B? Once there are M segments with
> docs less than B, they'll be merged. But what if L=0? Should B RAM
> docs be accumulated before being flushed in that case?

It seems like it.  Examples are sometimes easier to visualize... do
you have an example where this wouldn't be advisable?

I'm ok with it. I simply wish there were one strategy that would work
for both cases.
