I've solved this sort of thing in the past by indexing boundary tokens between the values and wrapping the queries with the equivalent of Intervals.notContaining(query, boundary-query), so that any match spanning a boundary is rejected. You could also set a very large position increment gap and use a width filter, but that's more error-prone if the individual field entries could conceivably contain a lot of text.
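
To make the boundary-token idea concrete, here is a toy model in plain Java. It is only a sketch and does not use Lucene: the class name, the BOUNDARY marker and both helper methods are made up for illustration. It flattens the values into one position stream with a boundary token between them, then accepts a match only if all query terms fall between two boundaries. That is roughly what I'd expect Intervals.notContaining(Intervals.unordered(Intervals.term("foo"), Intervals.term("bar")), Intervals.term(boundary)) to give you on a field indexed that way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BoundaryTokenSketch {
    // Hypothetical marker; any term that cannot occur in real text would do.
    static final String BOUNDARY = "\u0000boundary";

    // Flatten the values of a multi-valued field into one token stream,
    // inserting a boundary token between consecutive values.
    static List<String> flatten(List<List<String>> values) {
        List<String> stream = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                stream.add(BOUNDARY);
            }
            stream.addAll(values.get(i));
        }
        return stream;
    }

    // True if all terms occur in a stretch of the stream that contains no
    // boundary token, i.e. within a single field value.
    static boolean matchesWithinOneValue(List<String> stream, Set<String> terms) {
        Set<String> seen = new HashSet<>();
        for (String token : stream) {
            if (token.equals(BOUNDARY)) {
                seen.clear(); // a match may not span a boundary
            } else if (terms.contains(token)) {
                seen.add(token);
                if (seen.size() == terms.size()) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("foo", "bar"));

        // doc: field=["foo", "bar"], the terms sit in different values
        List<String> split = flatten(Arrays.asList(
                Arrays.asList("foo"), Arrays.asList("bar")));
        System.out.println(matchesWithinOneValue(split, query)); // false

        // doc: field=["foo bar", "baz"], both terms sit in one value
        List<String> together = flatten(Arrays.asList(
                Arrays.asList("foo", "bar"), Arrays.asList("baz")));
        System.out.println(matchesWithinOneValue(together, query)); // true
    }
}
```

In a real index the boundary token would have to be emitted by the analysis chain or added by whatever builds the document, and it needs to be a term that no query can match by accident.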

> On 10 Sep 2020, at 10:38, Dawid Weiss <dawid.we...@gmail.com> wrote:
>
> Hi Alan,
>
> You're the expert here so I thought I'd ask before I jump in deep. Do
> you think it's feasible to solve the following multivalued-field
> problem:
>
> doc: field=["foo", "bar"]
> query: field:(foo AND bar)
>
> I'd like the above to return zero hits (no single value contains both
> foo and bar), but since multi-valued fields are logically indexed as a
> single field, it returns doc. I recognize this as a well known problem
> but subdocuments are not fun to deal with so I'd like to avoid them at
> all costs.
>
> Would it be possible to solve the above with intervals? Say, something
> like this:
>
> Intervals.containing(valuePositionRanges(), query).
>
> I assume the containment relationship would get rid of false-positives
> crossing value boundary here. The problem is in how to construct those
> value position ranges... Store them at index-construction time
> somehow? Compute them on the fly for anything that has a chance to
> match query? Your thoughts would be very appreciated.
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org