I opened an issue to track this idea: https://github.com/apache/lucene/issues/11915
On Mon, Nov 7, 2022 at 1:21 PM LuXugang <xugan...@icloud.com.invalid> wrote:

> +1
>
> If we had a new BulkAdder that could detect long runs of set bits, it
> could also be used, at least, in LRUQueryCache to cache partially dense
> doc sets instead of always building a huge BitSet sized by maxDoc.
>
> Xugang
> https://www.amazingkoala.com.cn
>
> > On Nov 4, 2022, at 08:15, Michael Froh <msf...@gmail.com> wrote:
> >
> > Hi,
> >
> > I was recently poking around in the createWeight implementation for
> > MultiTermQueryConstantScoreWrapper to get to the bottom of some slow
> > queries, and I realized that the worst-case performance could be pretty
> > bad, but (maybe) possible to optimize.
> >
> > Imagine a segment with N docs where our MultiTermQuery expands to M
> > terms, each of which matches N-1 docs. (If a term matched all N docs,
> > Greg Miller's recent optimization that replaces the MultiTermQuery with
> > a TermQuery would kick in.) In this case, my understanding is that we
> > iterate through all the terms and pass each one's postings to a
> > DocIdSetBuilder, which iterates through the postings to set bits. The
> > whole thing is O(MN), I think.
> >
> > It would be cool if the DocIdSetBuilder could detect long runs of set
> > bits and advance() each DISI to skip over them, since those docs are
> > guaranteed not to contribute anything new to the union. In the worst
> > case described above, I think that would make the whole thing
> > O(M log N) (assuming advance() takes log time).
> >
> > At the risk of overcomplicating things, maybe DocIdSetBuilder could use
> > a third ("dense") BulkAdder implementation that kicks in once enough
> > bits are set, to efficiently implement the "or" operation by skipping
> > known (sufficiently long) runs of set bits.
> >
> > Would something like that be useful? Is the "dense union of doc IDs"
> > case common enough to warrant it?
> >
> > Thanks,
> > Froh

--
Adrien
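To make the thread's idea concrete, here is a minimal sketch of the "skip long runs of set bits" union. This is NOT Lucene's actual DocIdSetBuilder/BulkAdder API; the `Disi` interface, `ArrayDisi`, and `orInto` are hypothetical stand-ins (using java.util.BitSet) that only illustrate the technique: when the current doc lands inside a run of already-set bits, find the end of the run with nextClearBit() and advance() the iterator past it, since none of those docs can add anything to the union.

```java
import java.util.BitSet;

// Hypothetical sketch, not Lucene code: OR a doc-ID iterator into a bit set
// while skipping sufficiently long runs of bits that are already set.
public class DenseUnionSketch {

    /** Minimal stand-in for Lucene's DocIdSetIterator. */
    interface Disi {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int nextDoc();            // next matching doc ID, or NO_MORE_DOCS
        int advance(int target);  // first matching doc ID >= target
    }

    /** Sorted-array-backed iterator, analogous to a postings list. */
    static final class ArrayDisi implements Disi {
        private final int[] docs;
        private int idx = -1;

        ArrayDisi(int[] docs) { this.docs = docs; }

        public int nextDoc() {
            return ++idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        public int advance(int target) {
            // Linear scan for clarity; real postings use skip data so that
            // advance() runs in roughly logarithmic time, which is what the
            // O(M log N) estimate in the thread assumes.
            while (++idx < docs.length) {
                if (docs[idx] >= target) return docs[idx];
            }
            return NO_MORE_DOCS;
        }
    }

    /** OR the iterator into {@code bits}, jumping over long runs of set bits. */
    static void orInto(BitSet bits, Disi it, int minRunLength) {
        int doc = it.nextDoc();
        while (doc != Disi.NO_MORE_DOCS) {
            if (bits.get(doc)) {
                // Doc already present: measure the run of set bits starting
                // here, and if it is long enough, skip the iterator past it.
                int runEnd = bits.nextClearBit(doc);
                if (runEnd - doc >= minRunLength) {
                    doc = it.advance(runEnd);
                    continue;
                }
            }
            bits.set(doc);
            doc = it.nextDoc();
        }
    }
}
```

With a real log-time advance(), each term's postings cost roughly one advance() per run boundary instead of one step per doc, which is where the worst case drops from O(MN) toward O(M log N). The `minRunLength` threshold (arbitrary here) keeps the extra nextClearBit() bookkeeping from costing more than it saves on short runs.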