It sounds like a lot of complexity to handle an unusual edge case, but ... I guess this actually happened? Can you give any sense of the end-user behavior that caused it?
On Fri, Nov 4, 2022 at 2:26 AM Patrick Zhai <zhai7...@gmail.com> wrote: > > Hi Froh, > > The idea sounds reasonable to me, altho I wonder whether using > CONSTANT_SCORE_BOOLEAN_REWRITE would help with your case since that dense > union case should be already handled by disjunction query I suppose? > But that boolean rewrite is subject to max clause limit so it may have some > other problems depending on the use case I guess. > > Best > Patrick > > > On Thu, Nov 3, 2022 at 5:15 PM Michael Froh <msf...@gmail.com> wrote: >> >> Hi, >> >> I was recently poking around in the createWeight implementation for >> MultiTermQueryConstantScoreWrapper to get to the bottom of some slow >> queries, and I realized that the worst-case performance could be pretty bad, >> but (maybe) possible to optimize for. >> >> Imagine if we have a segment with N docs and our MultiTermQuery expands to >> hit M terms, where each of the M terms matches N-1 docs. (If we matched all >> N docs, then Greg Miller's recent optimization to replace the MultiTermQuery >> with a TermQuery would kick in.) In this case, my understanding is that we >> would iterate through all the terms and pass each one's postings to a >> DocIdSetBuilder, which will iterate through the postings to set bits. This >> whole thing would be O(MN), I think. >> >> I was thinking that it would be cool if the DocIdSetBuilder could detect >> long runs of set bits and advance() each DISI to skip over them (since >> they're guaranteed not to contribute anything new to the union). In the >> worst case that I described above, I think it would make the whole thing O(M >> log N) (assuming advance() takes log time). >> >> At the risk of overcomplicating things, maybe DocIdSetBuilder could use a >> third ("dense") BulkAdder implementation that kicks in once enough bits are >> set, to efficiently implement the "or" operation to skip over known >> (sufficiently long) runs of set bits? >> >> Would something like that be useful? Is the "dense union of doc IDs" case >> common enough to warrant it? >> >> Thanks, >> Froh --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org