cgejian opened a new issue, #14254:
URL: https://github.com/apache/lucene/issues/14254
**Background**: On Elasticsearch 7.6.0, an external client is continuously
executing update_by_query against an index.
**Phenomenon**: While this is running, /_cat/segments shows that the
docs.count and docs.deleted of many existing segments in the index keep
changing.
For example, the segment information is as follows:
```
segment generation docs.count docs.deleted size     size.memory committed searchable version compound
_mqg    29464      2683624    802282       2.5gb    2119594     true      true       8.4.0   false
_oxd    32305      1250591    632447       1.3gb    1138511     true      true       8.4.0   false
_oo5    31973      1434271    472773       1.3gb    1158523     true      true       8.4.0   false
_v6k    40412      1209509    320023       1.1gb    891519      true      true       8.4.0   false
_1w5    2453       539240     284964       629.9mb  584760      true      true       8.4.0   false
_v0q    40202      982360     266487       929.5mb  733621      true      true       8.4.0   false
_20b    2603       1367294    225623       1.1gb    1019214     true      true       8.4.0   false
_bu7    15343      1144383    210547       1007.5mb 846393      true      true       8.4.0   false
_733    9183       1875493    166523       1.5gb    1323250     true      true       8.4.0   false
```
After a few seconds, the segment information is as follows
```
segment generation docs.count docs.deleted size     size.memory committed searchable version compound
_mqg    29464      2683135    802771       2.5gb    2119594     true      true       8.4.0   false
_oxd    32305      1250591    632447       1.3gb    1138511     true      true       8.4.0   false
_oo5    31973      1434271    472773       1.3gb    1158523     true      true       8.4.0   false
_v6k    40412      1208615    320917       1.1gb    891519      true      true       8.4.0   false
_1w5    2453       537834     286370       629.9mb  584760      true      true       8.4.0   false
_v0q    40202      973870     274977       929.5mb  733621      true      true       8.4.0   false
_20b    2603       1361957    230960       1.1gb    1019214     true      true       8.4.0   false
_bu7    15343      1144383    210547       1007.5mb 846393      true      true       8.4.0   false
_733    9183       1870996    171020       1.5gb    1323250     true      true       8.4.0   false
```
The docs.count and docs.deleted of segments such as _mqg, _v6k, _1w5, etc.
have changed.
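A quick arithmetic check on the two snapshots, as a minimal sketch: for each
changed segment, docs.count drops by exactly the amount that docs.deleted
grows, so their sum stays constant. The class and method names below are
hypothetical helpers for this check only; the numbers are copied from the
tables above. Reading the constant sum as "documents are being marked deleted
inside existing segments" is my interpretation, not something the output
states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SegmentSnapshotDiff {

    /** Live documents plus documents marked deleted in one segment. */
    static long total(long docsCount, long docsDeleted) {
        return docsCount + docsDeleted;
    }

    /** How many more documents are marked deleted in the later snapshot. */
    static long newlyDeleted(long deletedBefore, long deletedAfter) {
        return deletedAfter - deletedBefore;
    }

    public static void main(String[] args) {
        // {docs.count before, docs.deleted before, docs.count after, docs.deleted after}
        Map<String, long[]> segments = new LinkedHashMap<>();
        segments.put("_mqg", new long[] {2683624, 802282, 2683135, 802771});
        segments.put("_v6k", new long[] {1209509, 320023, 1208615, 320917});
        segments.put("_1w5", new long[] {539240, 284964, 537834, 286370});

        for (Map.Entry<String, long[]> e : segments.entrySet()) {
            long[] c = e.getValue();
            System.out.printf(
                "%s total before=%d after=%d newly deleted=%d%n",
                e.getKey(), total(c[0], c[1]), total(c[2], c[3]), newlyDeleted(c[1], c[3]));
        }
    }
}
```

For _mqg this gives a total of 3485906 in both snapshots, with 489 additional
deletes in the second one.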
**Question**: The behavior above suggests that **the docs.deleted of a
SegmentCommitInfo generated by flush may change over time**.
numDeletesToMerge() in CachingMergeContext works as follows: if the cache
already holds an entry for the given SegmentCommitInfo, it returns the cached
number of deleted documents; otherwise it asks the wrapped MergeContext for
the number, stores it in the cache, and returns it.
The code is as follows (#12339):
```java
/**
 * a wrapper of IndexWriter MergeContext. Try to cache the {@link
 * #numDeletesToMerge(SegmentCommitInfo)} result in merge phase, to avoid duplicate calculation
 */
class CachingMergeContext implements MergePolicy.MergeContext {
  final MergePolicy.MergeContext mergeContext;
  final HashMap<SegmentCommitInfo, Integer> cachedNumDeletesToMerge = new HashMap<>();

  CachingMergeContext(MergePolicy.MergeContext mergeContext) {
    this.mergeContext = mergeContext;
  }

  @Override
  public final int numDeletesToMerge(SegmentCommitInfo info) throws IOException {
    Integer numDeletesToMerge = cachedNumDeletesToMerge.get(info);
    if (numDeletesToMerge != null) {
      return numDeletesToMerge;
    }
    numDeletesToMerge = mergeContext.numDeletesToMerge(info);
    cachedNumDeletesToMerge.put(info, numDeletesToMerge);
    return numDeletesToMerge;
  }

  @Override
  public final int numDeletedDocs(SegmentCommitInfo info) {
    return mergeContext.numDeletedDocs(info);
  }

  @Override
  public final InfoStream getInfoStream() {
    return mergeContext.getInfoStream();
  }

  @Override
  public final Set<SegmentCommitInfo> getMergingSegments() {
    return mergeContext.getMergingSegments();
  }
}
```
When docs.deleted is constantly changing, **could the number of deleted
documents returned by CachingMergeContext.numDeletesToMerge() be stale and
therefore incorrect?**
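To make the concern concrete, here is a toy model of the same get-or-compute
caching pattern. It does not use Lucene classes: FakeSegment and
CachingCounter are hypothetical stand-ins, and the delete counts are the _mqg
values from the snapshots above. It only shows that a cached value does not
follow later changes for as long as the same cache instance is reused.
Whether that matters in practice depends on how long a CachingMergeContext
instance lives, which I have not confirmed.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleCacheSketch {

    /** Hypothetical stand-in for a segment whose delete count can grow. */
    static final class FakeSegment {
        int deleted;

        FakeSegment(int deleted) {
            this.deleted = deleted;
        }
    }

    /** Get-or-compute cache keyed by segment, mirroring the pattern in question. */
    static final class CachingCounter {
        private final Map<FakeSegment, Integer> cache = new HashMap<>();

        int numDeletes(FakeSegment segment) {
            Integer cached = cache.get(segment);
            if (cached != null) {
                return cached; // served from cache, never refreshed
            }
            int fresh = segment.deleted; // read the current value once
            cache.put(segment, fresh);
            return fresh;
        }
    }

    public static void main(String[] args) {
        FakeSegment mqg = new FakeSegment(802282);
        CachingCounter counter = new CachingCounter();

        System.out.println(counter.numDeletes(mqg)); // 802282, computed and cached

        mqg.deleted = 802771; // more documents get marked deleted
        System.out.println(counter.numDeletes(mqg)); // 802282, stale cached value

        // A new cache instance reads the current value again.
        System.out.println(new CachingCounter().numDeletes(mqg)); // 802771
    }
}
```

In this model the stale value is only visible within the lifetime of one
CachingCounter; a cache that is created for a single pass and then discarded
would see at most one snapshot per segment.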