Re: Avoiding false-positives in multivalued field search with intervals?

Dawid Weiss Thu, 10 Sep 2020 07:16:25 -0700

Yup - similar to what Alan suggested. I'd have to rewrite the (general
text-to-query) query parser to only use intervals though. Still
thinking about possible approaches to this.


D.
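
The position-increment-gap suggestion quoted below (a large `Analyzer#getPositionIncrementGap` plus a width filter over `Intervals.unordered(...)`) comes down to simple position arithmetic. The following is only a toy model in plain Java with no Lucene dependency. The class and method names are hypothetical, and it assumes that the first token of each later value sits at the previous value's last position + gap + 1, and that interval width is end - start + 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Lucene code) of token positions in a multi-valued field. */
final class PositionGapModel {

    /** Positions at which {@code term} occurs once all values are laid out with {@code gap}. */
    static List<Integer> positionsOf(String term, List<String> values, int gap) {
        List<Integer> positions = new ArrayList<>();
        int position = -1; // position of the last token seen so far
        for (String value : values) {
            String trimmed = value.trim();
            if (trimmed.isEmpty()) {
                continue; // simplification of this model: blank values add no gap
            }
            for (String token : trimmed.split("\\s+")) {
                position++; // each token advances the position by one
                if (token.equals(term)) {
                    positions.add(position);
                }
            }
            position += gap; // assumed: the gap is added after every value
        }
        return positions;
    }

    /** Width of the narrowest interval covering both (distinct) terms, or -1 if either is absent. */
    static int minUnorderedWidth(String a, String b, List<String> values, int gap) {
        int best = -1;
        for (int pa : positionsOf(a, values, gap)) {
            for (int pb : positionsOf(b, values, gap)) {
                int width = Math.abs(pa - pb) + 1;
                if (best < 0 || width < best) {
                    best = width;
                }
            }
        }
        return best;
    }

    /** Mirrors the "max width of gap - 1" filter: accept only intervals narrower than the gap. */
    static boolean matchesWithinOneValue(String a, String b, List<String> values, int gap) {
        int width = minUnorderedWidth(a, b, values, gap);
        return width > 0 && width <= gap - 1;
    }
}
```

With a gap of 1000, `["foo", "bar"]` places foo at 0 and bar at 1001, so the narrowest interval is wider than 999 and is rejected, while `["foo bar", "baz"]` is accepted. The model also shows the caveat raised further down the thread: a single value with more tokens than the gap can be rejected even though both terms are in the same value.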

On Thu, Sep 10, 2020 at 3:58 PM jim ferenczi <jim.feren...@gmail.com> wrote:
>
> You could set a very high position increment gap for multi-valued fields
> (Analyzer#getPositionIncrementGap) and perform something
> like Intervals.maxwidth(pos_gap-1, Intervals.unordered(...))?
>
>
> On Thu, Sep 10, 2020 at 12:32 PM Dawid Weiss <dawid.we...@gmail.com> wrote:
>>
>> Yeah... I was thinking about adding synthetic boundaries but this
>> seems... impure. :) Another quick reflection is that I'd have to
>> somehow translate the original query (which can be arbitrarily
>> complex) into an interval query. Tough.
>>
>> D.
>>
>> On Thu, Sep 10, 2020 at 12:22 PM Alan Woodward <romseyg...@gmail.com> wrote:
>> >
>> > I’ve solved this sort of thing in the past by indexing boundary tokens, 
>> > and wrapping the queries with the equivalent of 
>> > Intervals.notContaining(query, boundary-query); you could also put a very 
>> > large position increment gap and use a width filter, but that’s a bit more 
>> > error prone if you could conceivably have lots of text in the individual 
>> > field entries.
>> >
>> > > On 10 Sep 2020, at 10:38, Dawid Weiss <dawid.we...@gmail.com> wrote:
>> > >
>> > > Hi Alan,
>> > >
>> > > You're the expert here so I thought I'd ask before I jump in deep. Do
>> > > you think it's feasible to solve the following multivalued-field
>> > > problem:
>> > >
>> > > doc: field=["foo", "bar"]
>> > > query: field:(foo AND bar)
>> > >
>> > > I'd like the above to return zero hits (no single value contains both
>> > > foo and bar), but since multi-valued fields are logically indexed as a
>> > > single field, it returns the doc. I recognize this as a well-known problem
>> > > but subdocuments are not fun to deal with so I'd like to avoid them at
>> > > all costs.
>> > >
>> > > Would it be possible to solve the above with intervals? Say, something
>> > > like this:
>> > >
>> > > Intervals.containing(valuePositionRanges(), query).
>> > >
>> > > I assume the containment relationship would get rid of false-positives
>> > > crossing value boundary here. The problem is in how to construct those
>> > > value position ranges... Store them at index-construction time
>> > > somehow? Compute them on the fly for anything that has a chance to
>> > > match the query? Your thoughts would be much appreciated.
>> > >
>> > > Dawid
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > > For additional commands, e-mail: dev-h...@lucene.apache.org
>> > >
>>
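
The boundary-token idea quoted above (index a marker between values and wrap the query in the equivalent of `Intervals.notContaining(query, boundary-query)`) can be modelled the same way. This is again a plain-Java sketch rather than Lucene code. The `BOUNDARY` marker string and all class and method names are hypothetical, and the model assumes the marker never occurs in real text.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Lucene code) of boundary tokens indexed between field values. */
final class BoundaryTokenModel {

    /** Hypothetical marker; any term that cannot occur in real text would do. */
    static final String BOUNDARY = "__BOUNDARY__";

    /** Flattens the values into one token stream, emitting a marker after each value. */
    static List<String> tokenStream(List<String> values) {
        List<String> tokens = new ArrayList<>();
        for (String value : values) {
            for (String token : value.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
            tokens.add(BOUNDARY);
        }
        return tokens;
    }

    /** True if some interval covering both terms contains no boundary marker. */
    static boolean matchesWithinOneValue(String a, String b, List<String> values) {
        List<String> tokens = tokenStream(values);
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals(b)) {
                    continue;
                }
                int start = Math.min(i, j);
                int end = Math.max(i, j);
                // Reject any candidate interval that spans a value boundary.
                if (!tokens.subList(start, end + 1).contains(BOUNDARY)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Unlike a width filter, this check does not depend on how long an individual value is, which is the trade-off mentioned in the thread. The cost is the extra synthetic token in the index.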

