Yup - similar to what Alan suggested. I'd have to rewrite the (general text-to-query) query parser to only use intervals though. Still thinking about possible approaches to this.
D.

On Thu, Sep 10, 2020 at 3:58 PM jim ferenczi <jim.feren...@gmail.com> wrote:
>
> You could set a very high position increment gap for multi-valued fields
> (Analyzer#getPositionIncrementGap) and perform something
> like Intervals.maxWidth(Intervals.unordered(...), pos_gap-1) ?
>
>
> On Thu, Sep 10, 2020 at 12:32 PM, Dawid Weiss <dawid.we...@gmail.com> wrote:
>>
>> Yeah... I was thinking about adding synthetic boundaries but this
>> seems... impure. :) Another quick reflection is that I'd have to
>> somehow translate the original query (which can be arbitrarily
>> complex) into an interval query. Tough.
>>
>> D.
>>
>> On Thu, Sep 10, 2020 at 12:22 PM Alan Woodward <romseyg...@gmail.com> wrote:
>> >
>> > I’ve solved this sort of thing in the past by indexing boundary tokens,
>> > and wrapping the queries with the equivalent of
>> > Intervals.notContaining(query, boundary-query); you could also put a very
>> > large position increment gap and use a width filter, but that’s a bit more
>> > error prone if you could conceivably have lots of text in the individual
>> > field entries.
>> >
>> > > On 10 Sep 2020, at 10:38, Dawid Weiss <dawid.we...@gmail.com> wrote:
>> > >
>> > > Hi Alan,
>> > >
>> > > You're the expert here so I thought I'd ask before I jump in deep. Do
>> > > you think it's feasible to solve the following multivalued-field
>> > > problem:
>> > >
>> > > doc: field=["foo", "bar"]
>> > > query: field:(foo AND bar)
>> > >
>> > > I'd like the above to return zero hits (no single value contains both
>> > > foo and bar), but since multi-valued fields are logically indexed as a
>> > > single field, it returns doc. I recognize this as a well known problem
>> > > but subdocuments are not fun to deal with so I'd like to avoid them at
>> > > all costs.
>> > >
>> > > Would it be possible to solve the above with intervals? Say, something
>> > > like this:
>> > >
>> > > Intervals.containing(valuePositionRanges(), query).
>> > >
>> > > I assume the containment relationship would get rid of false-positives
>> > > crossing value boundary here. The problem is in how to construct those
>> > > value position ranges... Store them at index-construction time
>> > > somehow? Compute them on the fly for anything that has a chance to
>> > > match query? Your thoughts would be very appreciated.
>> > >
>> > > Dawid
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > > For additional commands, e-mail: dev-h...@lucene.apache.org
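
The position-increment-gap plus width-filter idea from the thread can be illustrated with a toy position model. This is only a sketch in plain Java and does not call Lucene: the class PositionGapSketch and its methods positionsOf and matchesWithin are hypothetical names, whitespace splitting stands in for a real analyzer, and "width" is assumed here to mean last position minus first position plus one.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of token positions in a multi-valued field (hypothetical, not Lucene code). */
public class PositionGapSketch {

    /**
     * Returns the positions at which `term` occurs. Tokens within a value get
     * consecutive positions; `gap` extra positions are skipped between values.
     */
    public static List<Integer> positionsOf(String term, String[] values, int gap) {
        List<Integer> positions = new ArrayList<>();
        int pos = -1;
        for (int v = 0; v < values.length; v++) {
            if (v > 0) {
                pos += gap; // assumed model of a position increment gap between values
            }
            for (String token : values[v].trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // skip the empty token produced by an empty value
                }
                pos++;
                if (token.equals(term)) {
                    positions.add(pos);
                }
            }
        }
        return positions;
    }

    /** True if some occurrence of `a` and some occurrence of `b` span at most `maxWidth` positions. */
    public static boolean matchesWithin(String a, String b, String[] values, int gap, int maxWidth) {
        for (int pa : positionsOf(a, values, gap)) {
            for (int pb : positionsOf(b, values, gap)) {
                int width = Math.abs(pa - pb) + 1;
                if (width <= maxWidth) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int gap = 1000;
        String[] separate = {"foo", "bar"};     // terms in different values
        String[] together = {"foo bar", "baz"}; // terms in the same value

        System.out.println(matchesWithin("foo", "bar", separate, gap, gap - 1)); // false
        System.out.println(matchesWithin("foo", "bar", together, gap, gap - 1)); // true
    }
}
```

In this model, a gap of 0 with no width limit reproduces the original false positive for field=["foo", "bar"]. It also shows the caveat raised in the thread: if a single value holds more tokens than the gap, two terms in that same value can be further apart than gap - 1, so the width filter drops a legitimate match.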