Hi Michael,

Indeed, only MatchAllDocsQuery knows how to produce a count when there are deletes.
Your idea sounds good to me. Do you actually need a sidecar iterator for
deletes, or could you use a nextClearBit() operation on the bit set?

I don't think we can fold it into Weight#count, since there is an
expectation that its cost is negligible compared with the cost of a naive
count, but we may be able to do it in IndexSearcher#count or on the
OpenSearch side.

On Fri, Feb 2, 2024, 23:50, Michael Froh <msf...@gmail.com> wrote:

> Hi,
>
> On OpenSearch, we've been taking advantage of the various O(1)
> Weight#count() implementations to quickly compute various aggregations
> without needing to iterate over all the matching documents (at least when
> the top-level query is functionally a match-all at the segment level). Of
> course, from what I've seen, every clever Weight#count()
> implementation falls apart (returns -1) in the face of deletes.
>
> I was thinking that we could still handle small numbers of deletes
> efficiently if only we could get a DocIdSetIterator for deleted docs.
>
> Like suppose you're doing a date histogram aggregation, you could get the
> counts for each bucket from the points tree (ignoring deletes), then
> iterate through the deleted docs and decrement their contribution from the
> relevant bucket (determined based on a docvalues lookup). Assuming the
> number of deleted docs is small, it should be cheap, right?
>
> The current LiveDocs implementation is just a FixedBitSet, so AFAIK it's
> not great for iteration. I'm imagining adding a supplementary "deleted docs
> iterator" that could sit next to the FixedBitSet if and only if the number
> of deletes is "small". Is there a better way that I should be thinking
> about this?
>
> Thanks,
> Froh
>
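
To make the nextClearBit() suggestion concrete, here is a minimal sketch of
the bucket correction described in the quoted message. It uses
java.util.BitSet as a stand-in for the live-docs bit set (it does not assume
that Lucene's FixedBitSet exposes the same method), and the class name
DeleteAwareCounts, the method correctCounts, and the bucketOf lookup are all
hypothetical names chosen for illustration. bucketOf stands in for the
doc-values lookup that maps a doc ID to its bucket.

```java
import java.util.BitSet;
import java.util.function.IntUnaryOperator;

// Hypothetical helper, for illustration only; not part of Lucene or OpenSearch.
final class DeleteAwareCounts {

    /**
     * Adjusts per-bucket counts that were computed while ignoring deletes.
     *
     * @param rawCounts counts per bucket, deleted docs included
     * @param liveDocs  set bit = live doc; clear bit below maxDoc = deleted doc
     * @param maxDoc    number of doc IDs in the segment (exclusive upper bound)
     * @param bucketOf  assumed stand-in for a doc-values lookup: doc ID -> bucket index
     * @return a new array with each deleted doc subtracted from its bucket
     */
    static long[] correctCounts(long[] rawCounts, BitSet liveDocs, int maxDoc,
                                IntUnaryOperator bucketOf) {
        long[] corrected = rawCounts.clone();
        // Visit only the clear bits, i.e. the deleted docs, in increasing order.
        for (int doc = liveDocs.nextClearBit(0);
             doc < maxDoc;
             doc = liveDocs.nextClearBit(doc + 1)) {
            corrected[bucketOf.applyAsInt(doc)]--;
        }
        return corrected;
    }
}
```

With this shape, the per-document work (the bucket lookup and decrement)
happens only for deleted docs, while the bit set itself is still scanned from
start to end. Whether that scan is cheap enough for segments with few deletes
is the open question in this thread.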