MichelLiu commented on pull request #92:
URL: https://github.com/apache/lucene/pull/92#issuecomment-825347763


   I ran into a problem with TieredMergePolicy. After continuously updating a 
batch of data over a long period, I ended up with many 4.9G segments whose 
segDelPct was already greater than deletesPctAllowed, yet TieredMergePolicy 
would not merge them.
   Then I found the code here and figured out the reason:
   ```java
   if (segSizeDocs.sizeInBytes > maxMergedSegmentBytes / 2
       && (totalDelPct <= deletesPctAllowed || segDelPct <= deletesPctAllowed)) {
     iter.remove();
     tooBigCount++; // Just for reporting purposes.
     totIndexBytes -= segSizeDocs.sizeInBytes;
     allowedDelCount -= segSizeDocs.delCount;
   }
   ```
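   To make the effect concrete, here is a minimal standalone sketch of that predicate (the class name and the numbers are hypothetical, not taken from my index): a segment larger than half of maxMergedSegmentBytes is removed from the merge candidates whenever the whole-index delete percentage is still under deletesPctAllowed, even if that segment's own segDelPct is far above it.

   ```java
   // Hypothetical standalone sketch of TieredMergePolicy's "too big" filter.
   public class TooBigFilterSketch {
     // Returns true when the segment is dropped from the merge candidates.
     static boolean skippedAsTooBig(long sizeInBytes, long maxMergedSegmentBytes,
                                    double totalDelPct, double segDelPct,
                                    double deletesPctAllowed) {
       return sizeInBytes > maxMergedSegmentBytes / 2
           && (totalDelPct <= deletesPctAllowed || segDelPct <= deletesPctAllowed);
     }

     public static void main(String[] args) {
       long maxMerged = 5L * 1024 * 1024 * 1024; // assume a 5GB max merged segment
       // A 4.9GB segment with 30% of its own docs deleted is still skipped,
       // because the index as a whole sits at only 15% deletes:
       System.out.println(skippedAsTooBig(4_900_000_000L, maxMerged, 15.0, 30.0, 20.0)); // true
       // Only when BOTH percentages exceed deletesPctAllowed does it stay mergeable:
       System.out.println(skippedAsTooBig(4_900_000_000L, maxMerged, 25.0, 30.0, 20.0)); // false
     }
   }
   ```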
   
   Here are the segments I saw (Elasticsearch `_cat/segments` output; the columns are index, shard, prirep, ip, segment, generation, docs.count, docs.deleted, size, size.memory, committed, searchable, version, compound):
   
   ```
   1613741580098 0 p 10.10.112.123 _2h       89 1224440 569330   4.9gb 4905832 true true 8.4.0 false
   1613741580098 0 p 10.10.112.123 _4v      175 2383463 425919   4.9gb 5636245 true true 8.4.0 false
   1613741580098 0 p 10.10.112.123 _6n      239 2891298 380212   4.9gb 5617940 true true 8.4.0 false
   1613741580098 0 p 10.10.112.123 _1lwc  75036  468350 364104   4.3gb 3718611 true true 8.4.0 false
   1613741580098 0 p 10.10.112.123 _1xh2  90038  678187 252779   3.6gb 3453739 true true 8.4.0 false
   1613741580098 0 p 10.10.112.123 _25u8 100880  482795 237275   4.1gb 3370799 true true 8.4.0 false
   1613741580098 0 p 10.10.112.123 _2fld 113521  721503 225160   4.1gb 3776954 true true 8.4.0 false
   1613741580098 0 p 10.10.112.123 _2m9h 122165  831574 127572   4.2gb 3812013 true true 8.4.0 false
   1613741580098 0 p 10.10.112.123 _2n01 123121   34000  27437 345.3mb  543426 true true 8.4.0 true
   1613741580098 0 p 10.10.112.123 _2nq6 124062   36985  19838 319.2mb  515882 true true 8.4.0 true
   1613741580098 0 p 10.10.112.123 _2o7d 124681   52725  40581 556.3mb  632128 true true 8.4.0 true
   1613741580098 0 p 10.10.112.123 _2ouj 125515   11158   6330   114mb  235396 true true 8.4.0 true
   ```
   
   
   I also had a 564G index that grew to 1400G after a month of bulk updates. 
That wasted a significant amount of disk and pushed search latency up to 450ms, 
so for now we have to reindex it every month.
   
   My current workaround is to merge the large segments as infrequently as possible.
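
   For anyone hitting the same wall before a code change lands, one possible mitigation sketch (using the public TieredMergePolicy setters; the analyzer choice here is just a placeholder) is to lower deletesPctAllowed so the whole-index delete percentage crosses the threshold sooner, letting the over-size segments back into the candidate set:

   ```java
   import org.apache.lucene.analysis.standard.StandardAnalyzer;
   import org.apache.lucene.index.IndexWriterConfig;
   import org.apache.lucene.index.TieredMergePolicy;

   // Sketch only: make delete reclamation kick in earlier.
   TieredMergePolicy tmp = new TieredMergePolicy();
   tmp.setDeletesPctAllowed(20.0);      // 8.x accepts 20..50; the default is 33
   tmp.setMaxMergedSegmentMB(5 * 1024); // default max merged segment is 5GB

   IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
   iwc.setMergePolicy(tmp);
   ```

   This does not change the OR condition above, but it narrows the window in which large segments with many deletes sit unmergeable.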


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


