> EG if you set maxBufferedDocs to say 10000 but then it turns out based
> on RAM usage you actually flush every 300 docs then the merge policy
> will incorrectly merge a level 1 segment (with 3000 docs) in with the
> level 0 segments (with 300 docs). This is because the merge policy
> looks at the current value of maxBufferedDocs to compute the levels
> so a 3000 doc segment and a 300 doc segment all look like "level 0".
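If I'm reading that right, the level computation you're describing works
out to something like this (a rough sketch of my understanding, not the
actual merge policy code; the class and method names are made up):

    // Sketch only: bucket a segment into a level from its doc count,
    // maxBufferedDocs and mergeFactor.
    public class LevelSketch {
        static int level(int docCount, int maxBufferedDocs, int mergeFactor) {
            int level = 0;
            long upperBound = maxBufferedDocs;   // level 0 holds segments below maxBufferedDocs
            while (docCount >= upperBound) {
                level++;
                upperBound *= mergeFactor;       // each higher level spans another factor of mergeFactor
            }
            return level;
        }

        public static void main(String[] args) {
            // maxBufferedDocs=10000, mergeFactor=10, but flushes actually happen at ~300 docs:
            System.out.println(level(300, 10000, 10));    // 0 -- a freshly flushed segment
            System.out.println(level(3000, 10000, 10));   // 0 -- the merged 3000 doc segment looks the same
        }
    }

So with maxBufferedDocs=10000 both the 300 doc and the 3000 doc segments
fall into the same bucket, as you say.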
Are you calling the 3K segment a level 1 segment because it was created
from level 0 segments? Based on size, it is a level 0 segment, right?
With the current merge policy, you can merge level n segments and get
another level n segment. Deletes will do this, as will other things like
changing merge policy parameters and combining indexes.

That leads to the question of what counts as "over merging". The current
merge policy doesn't consider the size of the result; it simply counts
the number of segments at a level. Do you think this qualifies as over
merging? It should still only merge when there are mergeFactor segments
at a level, so you shouldn't be doing too much merging. And you have to
be careful not to do less, right? By bounding the number of segments at
each level, you ensure that your file descriptor usage grows only
logarithmically.
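Back-of-the-envelope, that bound looks something like this (my own
arithmetic, just to illustrate the logarithmic growth; not taken from the
merge policy source):

    // Sketch only: if each level keeps at most mergeFactor-1 segments
    // before a merge fires, the segment count -- and the file descriptors
    // behind it -- grows with the log of the index size, not linearly.
    public class SegmentBoundSketch {
        public static void main(String[] args) {
            int mergeFactor = 10;
            int maxBufferedDocs = 10000;
            long totalDocs = 100000000L;   // 100M docs

            // count how many levels are needed to cover totalDocs
            int levels = 1;
            long levelSpan = maxBufferedDocs;
            while (levelSpan < totalDocs) {
                levelSpan *= mergeFactor;
                levels++;
            }

            int worstCaseSegments = (mergeFactor - 1) * levels;
            System.out.println(levels + " levels, at most " + worstCaseSegments + " segments");
            // prints "5 levels, at most 45 segments" for 100M docs, versus
            // ~10,000 segments if the 10K doc flushes were never merged
        }
    }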
