I am fine with the boundary token suggestion, actually. What I don't
see at the moment is how to marry it with the output of a general
query parser (which can return any Query). I could try to process the
query node tree from the standard query parser (which we're using at
the moment anyway), but if the tree becomes complex there is no
guarantee I can extract subtrees that can be converted into
IntervalsSources (and then, in turn, into an IntervalQuery).

Dawid
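A minimal, dependency-free Java sketch of that conversion problem, offered as an illustration only. It walks a toy query tree and renders the equivalent Lucene Intervals expression as a string, wrapped in Intervals.notContaining(..., boundary) as Alan suggested. The Node/Term/And/Or/Not classes and the "__boundary__" token are hypothetical stand-ins, not Lucene's Query or query-node classes. The sketch assumes AND maps to Intervals.unordered and OR to Intervals.or, and it simply gives up on NOT, which is the "no guarantee" case above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Sketch only: renders the *text* of an Intervals expression from a toy query tree.
final class QueryToIntervalsSketch {
    // Hypothetical reserved token indexed between the values of a multi-valued field.
    static final String BOUNDARY = "__boundary__";

    // Hypothetical toy query tree (not Lucene's Query classes).
    interface Node {}

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class And implements Node {
        final List<Node> clauses;
        And(Node... clauses) { this.clauses = List.of(clauses); }
    }

    static final class Or implements Node {
        final List<Node> clauses;
        Or(Node... clauses) { this.clauses = List.of(clauses); }
    }

    static final class Not implements Node {
        final Node clause;
        Not(Node clause) { this.clause = clause; }
    }

    // Returns the rendered expression, or empty if any subtree has no interval
    // counterpart in this sketch (here: Not).
    static Optional<String> convert(Node node) {
        if (node instanceof Term) {
            return Optional.of("Intervals.term(\"" + ((Term) node).text + "\")");
        }
        if (node instanceof And) {
            // Assumed mapping: all clauses must occur, in any order.
            return convertAll("Intervals.unordered", ((And) node).clauses);
        }
        if (node instanceof Or) {
            return convertAll("Intervals.or", ((Or) node).clauses);
        }
        return Optional.empty();
    }

    private static Optional<String> convertAll(String function, List<Node> clauses) {
        List<String> parts = new ArrayList<>();
        for (Node clause : clauses) {
            Optional<String> part = convert(clause);
            if (!part.isPresent()) {
                return Optional.empty(); // one unsupported subtree sinks the whole conversion
            }
            parts.add(part.get());
        }
        return Optional.of(function + "(" + String.join(", ", parts) + ")");
    }

    // Wraps the converted query so that no match may span a boundary token.
    static Optional<String> withinSingleValue(Node node) {
        return convert(node).map(
            q -> "Intervals.notContaining(" + q + ", Intervals.term(\"" + BOUNDARY + "\"))");
    }

    public static void main(String[] args) {
        Node ok = new And(new Term("foo"), new Term("bar"));
        Node notOk = new And(new Term("foo"), new Not(new Term("bar")));
        System.out.println(withinSingleValue(ok).orElse("<not convertible>"));
        System.out.println(withinSingleValue(notOk).orElse("<not convertible>"));
    }
}
```

Any node type without an interval counterpart makes the whole conversion fail, so a real converter would need a fallback (or a per-node decision) for such subtrees.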

On Thu, Sep 10, 2020 at 4:28 PM jim ferenczi <[email protected]> wrote:
>
> Right, I misunderstood Alan's answer. The boundary option is not "impure" in
> my opinion. It solves this issue nicely, but maybe it needs something more
> packaged that adds the boundaries and makes building queries easy.
>
> On Thu, Sep 10, 2020 at 4:16 PM Dawid Weiss <[email protected]> wrote:
>>
>> Yup - similar to what Alan suggested. I'd have to rewrite the (general
>> text-to-query) query parser to only use intervals though. Still
>> thinking about possible approaches to this.
>>
>> D.
>>
>> On Thu, Sep 10, 2020 at 3:58 PM jim ferenczi <[email protected]> wrote:
>> >
>> > You could set a very high position increment gap for multi-valued fields
>> > (Analyzer#getPositionIncrementGap) and build something
>> > like Intervals.maxwidth(pos_gap - 1, Intervals.unordered(...))?
>> >
>> >
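A small model of what this suggestion does, as a sketch under two assumptions: the indexer skips `gap` extra positions between values (so with a gap of 100, "bar" in the second value lands at position 101 after "foo" at 0), and the width filter keeps an interval only if end - start + 1 is at most the limit. It is plain Java over token lists rather than Lucene code, and GapWidthModel and its methods are made-up names for illustration; the real query would be along the lines of Intervals.maxwidth(gap - 1, Intervals.unordered(...)).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Dependency-free model of "large position increment gap + width filter".
final class GapWidthModel {
    // Assumed position assignment: tokens within a value are 1 apart, and `gap`
    // extra positions are skipped before each value after the first.
    static Map<String, List<Integer>> positions(List<String> values, int gap) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int pos = -1;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                pos += gap;
            }
            for (String token : values.get(v).split("\\s+")) {
                pos++;
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(pos);
            }
        }
        return postings;
    }

    // Width (end - start + 1) of the narrowest interval covering one occurrence
    // of each term; Integer.MAX_VALUE if either term is absent.
    static int minWidth(Map<String, List<Integer>> postings, String a, String b) {
        int best = Integer.MAX_VALUE;
        for (int pa : postings.getOrDefault(a, Collections.emptyList())) {
            for (int pb : postings.getOrDefault(b, Collections.emptyList())) {
                best = Math.min(best, Math.abs(pa - pb) + 1);
            }
        }
        return best;
    }

    // Models maxwidth(gap - 1, unordered(a, b)).
    static boolean matches(List<String> values, int gap, String a, String b) {
        return minWidth(positions(values, gap), a, b) <= gap - 1;
    }

    public static void main(String[] args) {
        int gap = 100;
        String longValue = "foo " + String.join(" ", Collections.nCopies(150, "x")) + " bar";
        System.out.println(matches(List.of("foo", "bar"), gap, "foo", "bar"));     // false: spans two values
        System.out.println(matches(List.of("foo bar", "baz"), gap, "foo", "bar")); // true
        System.out.println(matches(List.of(longValue), gap, "foo", "bar"));        // false: a false negative
    }
}
```

The third case shows the weakness Alan mentions further down: a single value with more text than the gap is filtered out as well.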
>> > On Thu, Sep 10, 2020 at 12:32 PM Dawid Weiss <[email protected]> wrote:
>> >>
>> >> Yeah... I was thinking about adding synthetic boundaries but this
>> >> seems... impure. :) Another quick reflection is that I'd have to
>> >> somehow translate the original query (which can be arbitrarily
>> >> complex) into an interval query. Tough.
>> >>
>> >> D.
>> >>
>> >> On Thu, Sep 10, 2020 at 12:22 PM Alan Woodward <[email protected]> wrote:
>> >> >
>> >> > I’ve solved this sort of thing in the past by indexing boundary tokens
>> >> > and wrapping the queries with the equivalent of
>> >> > Intervals.notContaining(query, boundary-query). You could also set a
>> >> > very large position increment gap and use a width filter, but that’s a
>> >> > bit more error-prone if you could conceivably have lots of text in the
>> >> > individual field entries.
>> >> >
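A plain-Java model of the boundary-token approach, as a sketch under stated assumptions: a hypothetical reserved token "__boundary__" is indexed between values (no extra position gap), and a candidate match is rejected if the interval covering both terms contains that token, mimicking Intervals.notContaining(Intervals.unordered(a, b), Intervals.term(boundary)). BoundaryTokenModel and its methods are invented for illustration; this is not Lucene code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Dependency-free model of "index boundary tokens + notContaining".
final class BoundaryTokenModel {
    // Hypothetical reserved token that never occurs in real text.
    static final String BOUNDARY = "__boundary__";

    // Concatenates the values into one token stream with a boundary token
    // between them; a token's position is its index in the returned list.
    static List<String> tokens(List<String> values) {
        List<String> out = new ArrayList<>();
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                out.add(BOUNDARY);
            }
            out.addAll(Arrays.asList(values.get(v).split("\\s+")));
        }
        return out;
    }

    // True if some interval covering an occurrence of a and an occurrence of b
    // contains no boundary token.
    static boolean matches(List<String> values, String a, String b) {
        List<String> toks = tokens(values);
        for (int i = 0; i < toks.size(); i++) {
            if (!toks.get(i).equals(a)) continue;
            for (int j = 0; j < toks.size(); j++) {
                if (!toks.get(j).equals(b)) continue;
                boolean crossesBoundary = false;
                for (int k = Math.min(i, j); k <= Math.max(i, j); k++) {
                    if (toks.get(k).equals(BOUNDARY)) {
                        crossesBoundary = true;
                        break;
                    }
                }
                if (!crossesBoundary) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String longValue = "foo " + String.join(" ", Collections.nCopies(150, "x")) + " bar";
        System.out.println(matches(List.of("foo", "bar"), "foo", "bar"));     // false
        System.out.println(matches(List.of("foo bar", "baz"), "foo", "bar")); // true
        System.out.println(matches(List.of(longValue), "foo", "bar"));        // true
    }
}
```

Unlike a width filter, the length of an individual value plays no role here, which is the advantage described in the message above.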
>> >> > > On 10 Sep 2020, at 10:38, Dawid Weiss <[email protected]> wrote:
>> >> > >
>> >> > > Hi Alan,
>> >> > >
>> >> > > You're the expert here so I thought I'd ask before I jump in deep. Do
>> >> > > you think it's feasible to solve the following multivalued-field
>> >> > > problem:
>> >> > >
>> >> > > doc: field=["foo", "bar"]
>> >> > > query: field:(foo AND bar)
>> >> > >
>> >> > > I'd like the above to return zero hits (no single value contains both
>> >> > > foo and bar), but since multi-valued fields are logically indexed as a
>> >> > > single field, it returns the doc. I recognize this as a well-known
>> >> > > problem, but subdocuments are not fun to deal with, so I'd like to
>> >> > > avoid them at all costs.
>> >> > >
>> >> > > Would it be possible to solve the above with intervals? Say, something
>> >> > > like this:
>> >> > >
>> >> > > Intervals.containing(valuePositionRanges(), query).
>> >> > >
>> >> > > I assume the containment relationship would get rid of false positives
>> >> > > that cross a value boundary here. The problem is how to construct those
>> >> > > value position ranges... Store them at index-construction time
>> >> > > somehow? Compute them on the fly for anything that has a chance to
>> >> > > match the query? Your thoughts would be much appreciated.
>> >> > >
>> >> > > Dawid
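To make the containment idea concrete, a dependency-free sketch that simply assumes the per-value position ranges have somehow been recorded at index time (how to get them is exactly the open question above). ValueRangeModel, its valueRanges() stand-in for valuePositionRanges(), and the other method names are hypothetical. naiveMatches models today's behaviour on the flattened field; matches models containing(valueRanges, unordered(a, b)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Dependency-free model of containing(valuePositionRanges(), query); not Lucene code.
final class ValueRangeModel {
    // Values indexed back to back; a token's position is its index in this list.
    static List<String> tokens(List<String> values) {
        List<String> toks = new ArrayList<>();
        for (String value : values) {
            toks.addAll(Arrays.asList(value.split("\\s+")));
        }
        return toks;
    }

    // One inclusive [start, end] position range per value (hypothetical index-time data).
    static List<int[]> valueRanges(List<String> values) {
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        for (String value : values) {
            int length = value.split("\\s+").length;
            ranges.add(new int[] {start, start + length - 1});
            start += length;
        }
        return ranges;
    }

    static boolean occursIn(List<String> toks, String term, int from, int to) {
        for (int p = from; p <= to; p++) {
            if (toks.get(p).equals(term)) return true;
        }
        return false;
    }

    // Flattened-field behaviour: both terms anywhere in the document.
    static boolean naiveMatches(List<String> values, String a, String b) {
        List<String> toks = tokens(values);
        return toks.contains(a) && toks.contains(b);
    }

    // A single value range must contain an occurrence of both terms.
    static boolean matches(List<String> values, String a, String b) {
        List<String> toks = tokens(values);
        for (int[] r : valueRanges(values)) {
            if (occursIn(toks, a, r[0], r[1]) && occursIn(toks, b, r[0], r[1])) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("foo", "bar");
        System.out.println(naiveMatches(doc, "foo", "bar")); // true: the false positive
        System.out.println(matches(doc, "foo", "bar"));      // false
    }
}
```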
>> >> > >
>> >> > > ---------------------------------------------------------------------
>> >> > > To unsubscribe, e-mail: [email protected]
>> >> > > For additional commands, e-mail: [email protected]
>> >> > >
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>>
>>
