Re: Computing weight.count() cheaply in the face of deletes?

Adrien Grand Fri, 02 Feb 2024 23:38:46 -0800

Hi Michael,

Indeed, only MatchAllDocsQuery knows how to produce a count when there are
deletes.


Your idea sounds good to me, do you actually need a side car iterator for
deletes, or could you use a nextClearBit() operation on the bit set?

I don't think we can fold it into Weight#count since there is an
expectation that it is negligible compared with the cost of a naive count,
but we may be able to do it in IndexSearcher#count or on the OpenSearch
side.

Le ven. 2 févr. 2024, 23:50, Michael Froh <msf...@gmail.com> a écrit :

> Hi,
>
> On OpenSearch, we've been taking advantage of the various O(1)
> Weight#count() implementations to quickly compute various aggregations
> without needing to iterate over all the matching documents (at least when
> the top-level query is functionally a match-all at the segment level). Of
> course, from what I've seen, every clever Weight#count()
> implementation falls apart (returns -1) in the face of deletes.
>
> I was thinking that we could still handle small numbers of deletes
> efficiently if only we could get a DocIdSetIterator for deleted docs.
>
> Like suppose you're doing a date histogram aggregation, you could get the
> counts for each bucket from the points tree (ignoring deletes), then
> iterate through the deleted docs and decrement their contribution from the
> relevant bucket (determined based on a docvalues lookup). Assuming the
> number of deleted docs is small, it should be cheap, right?
>
> The current LiveDocs implementation is just a FixedBitSet, so AFAIK it's
> not great for iteration. I'm imagining adding a supplementary "deleted docs
> iterator" that could sit next to the FixedBitSet if and only if the number
> of deletes is "small". Is there a better way that I should be thinking
> about this?
>
> Thanks,
> Froh
>

Re: Computing weight.count() cheaply in the face of deletes?

Reply via email to