[ https://issues.apache.org/jira/browse/LUCENE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733498#action_12733498 ]

Shai Erera commented on LUCENE-1750:
------------------------------------

bq.we could add an optimize(long maxSegmentSize)

I think this would be useful anyway, and kind of required if we introduce the 
proposed merge policy. Otherwise, if someone's code calls optimize (with or 
without a num-segments limit), those large segments will be merged down as well.
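To make that concrete, here is a minimal sketch (a hypothetical helper, not 
the actual IndexWriter API) of the selection an optimize(maxSegmentSize) 
could do: segments already at or over the cap are simply excluded from the 
optimize merges:

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the real IndexWriter API: given per-segment
// byte sizes, decide which segments optimize(maxSegmentSize) may still
// merge. Segments at or over the cap are left untouched.
public class OptimizeSizeCap {
  static List<Integer> eligibleForOptimize(long[] segmentBytes, long maxSegmentSize) {
    List<Integer> eligible = new ArrayList<Integer>();
    for (int i = 0; i < segmentBytes.length; i++) {
      if (segmentBytes[i] < maxSegmentSize) {
        eligible.add(i);  // under the cap: optimize may merge it
      }
    }
    return eligible;      // over-limit segments never appear here
  }

  public static void main(String[] args) {
    long gb = 1L << 30;
    long[] sizes = {4 * gb, 4 * gb, 4 * gb, 200L << 20}; // A, B, C, D
    // prints [3]: only the small trailing segment D gets merged
    System.out.println(eligibleForOptimize(sizes, 4 * gb));
  }
}
{code}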

bq. except if it accumulates too many deletes (as a percentage of docs) then it 
can be compacted and new segments merged into it?

If one calls expungeDeletes and that segment drops below the max size, it 
becomes eligible for merging again, right? But I have a question here, and it 
may be that I'm missing something in the merge process. Say I have the 
following segments, each at 4 GB (the limit), except D:
A (docs 0-99), B (docs 100-230), C (docs 231-450) and D (docs 451-470). Then A 
accumulates 50 deletes. On one hand we'd want A to be merged, but to merge it 
with D we would have to merge B and C as well, right? We cannot merge A with D 
alone, because doc IDs need to stay in increasing order and retain the order in 
which the documents were added to the index.

So will the merge policy detect that? I think it should, and the way to 
handle it is to ensure that the first segment that falls below the limit 
triggers the merge of all following segments (in doc-ID order), regardless of 
their size.
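A minimal sketch of that rule, applied to per-segment byte sizes 
(hypothetical; I don't know whether the patch implements it this way):

{code}
import java.util.ArrayList;
import java.util.List;

// Sketch of the rule above: walk segments in doc-ID order; the first
// segment that falls below the size limit starts a cascade, and every
// following segment joins the same merge regardless of its own size,
// so adjacency (and therefore doc-ID order) is preserved.
public class CascadeFromFirstSmall {
  static List<Integer> mergeCandidates(long[] segmentBytes, long maxSegmentSize) {
    List<Integer> toMerge = new ArrayList<Integer>();
    for (int i = 0; i < segmentBytes.length; i++) {
      if (!toMerge.isEmpty() || segmentBytes[i] < maxSegmentSize) {
        toMerge.add(i);  // everything after the first small segment cascades in
      }
    }
    return toMerge;
  }

  public static void main(String[] args) {
    long gb = 1L << 30;
    // A shrank below 4 GB after its deletes were expunged; B and C are
    // at the cap; D is small. Prints [0, 1, 2, 3]: all four are merged.
    long[] sizes = {3 * gb, 4 * gb, 4 * gb, 100L << 20};
    System.out.println(mergeCandidates(sizes, 4 * gb));
  }
}
{code}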

I don't know if your patch already takes care of this case, or whether my 
understanding is even correct, so if you already handle it that way (or some 
other way), then that's fine.

> Create a MergePolicy that limits the maximum size of its segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1750
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1750
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-1750.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Basically I'm trying to create largish 2-4GB shards using
> LogByteSizeMergePolicy; however, the attached unit test shows
> segments that exceed maxMergeMB.
> The goal is for segments to be merged up to 2GB, then for all
> merging into that segment to stop, and for another 2GB segment to
> be created. This helps when replicating in Solr, where if a single
> optimized 60GB segment is created, the machine stops working due
> to IO and CPU starvation.
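
For reference, a minimal configuration sketch of the knob the description 
mentions (API from the 2.4.x line; constructor details vary across versions). 
Note that maxMergeMB only bounds which segments are eligible as merge 
*inputs*, not the size of the merged result, which is consistent with the 
over-limit segments the unit test observes:

{code}
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.LogByteSizeMergePolicy;

// Configuration sketch: cap merge *inputs* at ~2 GB. Merging several
// near-cap segments can still produce an output well above the cap,
// which is the behavior this issue wants a new merge policy to prevent.
class ShardSizedMergeConfig {
  static void configure(IndexWriter writer) {
    LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
    mp.setMaxMergeMB(2048.0);   // skip segments already over ~2 GB
    writer.setMergePolicy(mp);
  }
}
{code}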
