Re: Avoiding false-positives in multivalued field search with intervals?

Alan Woodward Thu, 10 Sep 2020 03:22:24 -0700

I’ve solved this sort of thing in the past by indexing boundary tokens and 
wrapping the queries with the equivalent of Intervals.notContaining(query, 
boundary-query). You could also set a very large position increment gap and 
use a width filter, but that’s a bit more error-prone if you could conceivably 
have lots of text in the individual field entries.
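
To make the boundary-token idea concrete, here is a minimal plain-Java sketch 
that models the position stream of a multivalued field rather than calling 
Lucene itself. The class name, the method names and the "__boundary__" marker 
are all hypothetical, chosen only for illustration. The check roughly 
corresponds to what Intervals.notContaining(Intervals.unordered(...), 
Intervals.term(boundary)) would express, under the assumption that one boundary 
token is indexed between consecutive values.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: this is not Lucene code and not the author's implementation.
final class BoundaryTokenSketch {
    // Hypothetical marker token assumed to be indexed between consecutive values.
    static final String BOUNDARY = "__boundary__";

    // Flatten the values into a single token stream, the way a multivalued
    // field is logically indexed, with a boundary token between values.
    static List<String> flatten(List<String> values) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                tokens.add(BOUNDARY);
            }
            for (String t : values.get(i).trim().split("\\s+")) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if some interval spanning one occurrence of a and one occurrence
    // of b contains no boundary token, i.e. both terms sit in the same value.
    static boolean matchesWithinOneValue(List<String> tokens, String a, String b) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals(b)) {
                    continue;
                }
                int lo = Math.min(i, j);
                int hi = Math.max(i, j);
                if (!tokens.subList(lo, hi + 1).contains(BOUNDARY)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With values ["foo", "bar"] the only interval covering both terms crosses the 
boundary token, so the document is rejected; with ["foo bar", "baz"] the 
interval stays inside the first value and the document matches.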


> On 10 Sep 2020, at 10:38, Dawid Weiss <dawid.we...@gmail.com> wrote:
> 
> Hi Alan,
> 
> You're the expert here so I thought I'd ask before I jump in deep. Do
> you think it's feasible to solve the following multivalued-field
> problem:
> 
> doc: field=["foo", "bar"]
> query: field:(foo AND bar)
> 
> I'd like the above to return zero hits (no single value contains both
> foo and bar), but since multi-valued fields are logically indexed as a
> single field, it returns the doc. I recognize this as a well-known problem,
> but subdocuments are not fun to deal with, so I'd like to avoid them at
> all costs.
> 
> Would it be possible to solve the above with intervals? Say, something
> like this:
> 
> Intervals.containing(valuePositionRanges(), query).
> 
> I assume the containment relationship would get rid of false-positives
> crossing value boundary here. The problem is in how to construct those
> value position ranges... Store them at index-construction time
> somehow? Compute them on the fly for anything that has a chance to
> match the query? Your thoughts would be much appreciated.
> 
> Dawid
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 

