MichelLiu commented on pull request #92: URL: https://github.com/apache/lucene/pull/92#issuecomment-825347763
I ran into a problem with the tiered merge policy. I was continuously updating a batch of data over a long period, and ended up with many segments of about 4.9G whose segDelPct was already greater than deletesPctAllowed but which the tiered merge policy would not merge. I then found the code here and figured out the reason:

    if (segSizeDocs.sizeInBytes > maxMergedSegmentBytes / 2
        && (totalDelPct <= deletesPctAllowed || segDelPct <= deletesPctAllowed)) {
      iter.remove();
      tooBigCount++; // Just for reporting purposes.
      totIndexBytes -= segSizeDocs.sizeInBytes;
      allowedDelCount -= segSizeDocs.delCount;
    }

Here are the segments I had at the time:

    1613741580098 0 p 10.10.112.123 _2h   89     1224440 569330 4.9gb   4905832 true true 8.4.0 false
    1613741580098 0 p 10.10.112.123 _4v   175    2383463 425919 4.9gb   5636245 true true 8.4.0 false
    1613741580098 0 p 10.10.112.123 _6n   239    2891298 380212 4.9gb   5617940 true true 8.4.0 false
    1613741580098 0 p 10.10.112.123 _1lwc 75036  468350  364104 4.3gb   3718611 true true 8.4.0 false
    1613741580098 0 p 10.10.112.123 _1xh2 90038  678187  252779 3.6gb   3453739 true true 8.4.0 false
    1613741580098 0 p 10.10.112.123 _25u8 100880 482795  237275 4.1gb   3370799 true true 8.4.0 false
    1613741580098 0 p 10.10.112.123 _2fld 113521 721503  225160 4.1gb   3776954 true true 8.4.0 false
    1613741580098 0 p 10.10.112.123 _2m9h 122165 831574  127572 4.2gb   3812013 true true 8.4.0 false
    1613741580098 0 p 10.10.112.123 _2n01 123121 34000   27437  345.3mb 543426  true true 8.4.0 true
    1613741580098 0 p 10.10.112.123 _2nq6 124062 36985   19838  319.2mb 515882  true true 8.4.0 true
    1613741580098 0 p 10.10.112.123 _2o7d 124681 52725   40581  556.3mb 632128  true true 8.4.0 true
    1613741580098 0 p 10.10.112.123 _2ouj 125515 11158   6330   114mb   235396  true true 8.4.0 true

I also had an index of 564G that grew to 1400G after a month of bulk updates. That wasted a significant amount of disk space and also raised search latency to 450ms, so we now have to reindex the index every month.
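To make the effect of the condition quoted above easier to see, here is a minimal standalone sketch of the same boolean check. It is not Lucene code: the class `LargeSegmentCheck` and its methods `delPct` and `isSkipped` are hypothetical names, and the sizes and percentages in `main` are illustrative values, not measurements from my index. It also does not try to reproduce how the merge policy computes `sizeInBytes` or `totalDelPct`; those are simply passed in.

```java
/** Standalone sketch of the large-segment check quoted above; not Lucene code. */
final class LargeSegmentCheck {

    /** Percentage of deleted documents, given live and deleted document counts. */
    static double delPct(long liveDocs, long deletedDocs) {
        long maxDoc = liveDocs + deletedDocs;
        return maxDoc == 0 ? 0.0 : 100.0 * deletedDocs / maxDoc;
    }

    /**
     * Mirrors the quoted condition: a segment larger than half the maximum
     * merged size is dropped from the merge candidates when either the
     * index-wide or the per-segment delete percentage is within the allowance.
     */
    static boolean isSkipped(long sizeInBytes, long maxMergedSegmentBytes,
                             double totalDelPct, double segDelPct,
                             double deletesPctAllowed) {
        return sizeInBytes > maxMergedSegmentBytes / 2
            && (totalDelPct <= deletesPctAllowed || segDelPct <= deletesPctAllowed);
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // Illustrative values only: a 3 GB segment, 5 GB max merged size, 33% allowance.
        // The segment is 45% deleted, but the index as a whole is only 21% deleted.
        System.out.println(isSkipped(3 * gb, 5 * gb, 21.0, 45.0, 33.0));
        // Same segment once the index-wide percentage also exceeds the allowance.
        System.out.println(isSkipped(3 * gb, 5 * gb, 40.0, 45.0, 33.0));
    }
}
```

Because the two percentage tests are joined with `||`, a large segment whose own segDelPct is over the allowance is still skipped as long as totalDelPct stays at or under it. In the sketch, the first call returns true and the second returns false.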
My solution is to merge the large segments as well, but as infrequently as possible.