[ https://issues.apache.org/jira/browse/LUCENE-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485722 ]

Michael McCandless commented on LUCENE-845:
-------------------------------------------

Just recapping some of the follow-on discussion from java-dev ...

The current merge policy can be thought of logically as two different
steps:

  1. How to determine the "level" of each segment in the index.

  2. How & when to merge level N segments into a single level N+1
     segment (see the sketch just after this list).
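
As a very rough sketch, step 2 today boils down to: once mergeFactor
segments have accumulated at a level, merge them into one segment at
the next level.  (The names below, maybeMergeLevel and mergeSegments,
are hypothetical stand-ins, not the actual IndexWriter internals.)

    // Hypothetical sketch of step 2: once mergeFactor segments have
    // accumulated at level N, merge them into a single level N+1 segment.
    void maybeMergeLevel(List sameLevelSegments, int mergeFactor)
        throws IOException {
      if (sameLevelSegments.size() >= mergeFactor) {
        // mergeSegments is a stand-in for the real merge machinery
        mergeSegments(sameLevelSegments.subList(0, mergeFactor));
      }
    }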

The current policy determines a segment's level by looking at the doc
count in the segment as well as the current maxBufferedDocs, which is
very problematic when you "flush by RAM usage" instead.  This Jira
issue, then, proposes to look instead at the overall byte size of a
segment to determine its level, while keeping step 2 above.
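
To illustrate what a size-based step 1 might look like, here is a
hypothetical sketch (inferLevel and baseSegmentBytes are made-up names;
baseSegmentBytes stands in for the typical byte size of a freshly
flushed segment):

    // Hypothetical: infer a segment's level from its size in bytes,
    // instead of from its doc count and the current maxBufferedDocs.
    int inferLevel(long segmentSizeBytes, long baseSegmentBytes,
                   int mergeFactor) {
      int level = 0;
      long levelCeiling = baseSegmentBytes;
      while (segmentSizeBytes > levelCeiling) {
        level++;
        levelCeiling *= mergeFactor;
      }
      return level;
    }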

However, I would propose we also fix LUCENE-854 (which addresses step
2 above, not step 1) at the same time, as a single merge policy, and
maybe at some point in the future make that new policy the default.


> If you "flush by RAM usage" then IndexWriter may over-merge
> -----------------------------------------------------------
>
>                 Key: LUCENE-845
>                 URL: https://issues.apache.org/jira/browse/LUCENE-845
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>         Assigned To: Michael McCandless
>            Priority: Minor
>
> I think a good way to maximize performance of Lucene's indexing for a
> given amount of RAM is to flush (writer.flush()) the added documents
> whenever the RAM usage (writer.ramSizeInBytes()) has crossed the max
> RAM you can afford.
> But, this can confuse the merge policy and cause over-merging, unless
> you set maxBufferedDocs properly.
> This is because the merge policy looks at the current maxBufferedDocs
> to figure out which segments are level 0 (first flushed) or level 1
> (merged from <mergeFactor> level 0 segments).
> I'm not sure how to fix this.  Maybe we can look at net size (bytes)
> of a segment and "infer" level from this?  Still we would have to be
> resilient to the application suddenly increasing the RAM allowed.
> The good news is to workaround this bug I think you just need to
> ensure that your maxBufferedDocs is less than mergeFactor *
> typical-number-of-docs-flushed.
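
For concreteness, the "flush by RAM usage" pattern described above
looks roughly like the sketch below (the 16 MB budget and the docs
array are illustrative placeholders; writer.ramSizeInBytes() and
writer.flush() are the calls named in the description, and the usual
org.apache.lucene.index/document imports are assumed):

    // Flush whenever buffered RAM crosses the budget, instead of
    // relying on maxBufferedDocs.
    void addAll(IndexWriter writer, Document[] docs) throws IOException {
      long maxRamBytes = 16 * 1024 * 1024;   // illustrative RAM budget
      for (int i = 0; i < docs.length; i++) {
        writer.addDocument(docs[i]);
        if (writer.ramSizeInBytes() >= maxRamBytes) {
          writer.flush();
        }
      }
    }

The workaround in the last paragraph is just arithmetic: if, say, a
flush typically buffers around 10,000 docs and mergeFactor is 10, then
keeping maxBufferedDocs below 100,000 (10 * 10,000) should avoid the
over-merging described here.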
