OK, so the more general question is whether we need an interval query parser.

On Thu, Sep 10, 2020 at 5:28 PM Dawid Weiss <[email protected]> wrote:

> I am fine with the boundary token suggestion, actually. What I don't
> see at the moment is how I can marry it with the output of a general
> query parser (which returns any Query). I could attempt to process
> the query node tree from the standard query parser (which we're
> using at the moment anyway), but if the tree becomes complex there is
> no guarantee I can extract subtrees that can be parsed into
> IntervalSources (and then in turn into IntervalQuery).
>
> Dawid
>
> On Thu, Sep 10, 2020 at 4:28 PM jim ferenczi <[email protected]>
> wrote:
> >
> > Right, I misunderstood Alan's answer. The boundary option is not
> > "impure" in my opinion. It solves this issue nicely but maybe it needs
> > something more packaged to add the boundaries and build queries easily.
> >
> > On Thu, Sep 10, 2020 at 4:16 PM Dawid Weiss <[email protected]> wrote:
> >>
> >> Yup - similar to what Alan suggested. I'd have to rewrite the (general
> >> text-to-query) query parser to only use intervals though. Still
> >> thinking about possible approaches to this.
> >>
> >> D.
> >>
> >> On Thu, Sep 10, 2020 at 3:58 PM jim ferenczi <[email protected]> wrote:
> >> >
> >> > You could set a very high position increment gap for multi-valued
> >> > fields (Analyzer#getPositionIncrementGap) and perform something
> >> > like Intervals.maxWidth(Intervals.unordered(...), pos_gap-1)?
> >> >
> >> >
> >> > On Thu, Sep 10, 2020 at 12:32 PM Dawid Weiss <[email protected]> wrote:
> >> >>
> >> >> Yeah... I was thinking about adding synthetic boundaries but this
> >> >> seems... impure. :) Another quick reflection is that I'd have to
> >> >> somehow translate the original query (which can be arbitrarily
> >> >> complex) into an interval query. Tough.
> >> >>
> >> >> D.
> >> >>
> >> >> On Thu, Sep 10, 2020 at 12:22 PM Alan Woodward <[email protected]> wrote:
> >> >> >
> >> >> > I’ve solved this sort of thing in the past by indexing boundary
> >> >> > tokens, and wrapping the queries with the equivalent of
> >> >> > Intervals.notContaining(query, boundary-query); you could also put a very
> >> >> > large position increment gap and use a width filter, but that’s a bit more
> >> >> > error prone if you could conceivably have lots of text in the individual
> >> >> > field entries.
> >> >> >
> >> >> > > On 10 Sep 2020, at 10:38, Dawid Weiss <[email protected]> wrote:
> >> >> > >
> >> >> > > Hi Alan,
> >> >> > >
> >> >> > > You're the expert here so I thought I'd ask before I jump in deep. Do
> >> >> > > you think it's feasible to solve the following multivalued-field
> >> >> > > problem:
> >> >> > >
> >> >> > > doc: field=["foo", "bar"]
> >> >> > > query: field:(foo AND bar)
> >> >> > >
> >> >> > > I'd like the above to return zero hits (no single value contains both
> >> >> > > foo and bar), but since multi-valued fields are logically indexed as a
> >> >> > > single field, it returns doc. I recognize this as a well known problem
> >> >> > > but subdocuments are not fun to deal with so I'd like to avoid them at
> >> >> > > all costs.
> >> >> > >
> >> >> > > Would it be possible to solve the above with intervals? Say, something
> >> >> > > like this:
> >> >> > >
> >> >> > > Intervals.containing(valuePositionRanges(), query).
> >> >> > >
> >> >> > > I assume the containment relationship would get rid of false-positives
> >> >> > > crossing value boundary here. The problem is in how to construct those
> >> >> > > value position ranges... Store them at index-construction time
> >> >> > > somehow? Compute them on the fly for anything that has a chance to
> >> >> > > match query? Your thoughts would be very appreciated.
> >> >> > >
> >> >> > > Dawid
> >> >> > >
> >> >> > >
> >> >> > > ---------------------------------------------------------------------
> >> >> > > To unsubscribe, e-mail: [email protected]
> >> >> > > For additional commands, e-mail: [email protected]
> >> >> > >
>
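
The two ideas discussed in the thread (a large position increment gap combined with a width filter, and boundary tokens combined with a not-containing filter) can be illustrated with a small toy model. This is only a sketch in plain Java: it does not use Lucene's Intervals API, and the class name, the method names and the BOUNDARY marker are all hypothetical. It also assumes that the position counter is advanced by the gap after each value, so the first term of the next value lands gap + 1 positions after the last term of the previous one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of term positions in a multi-valued field. Plain Java, no Lucene classes. */
public class MultiValuedFieldSketch {

    // Hypothetical marker term; a real index would need a token the analyzer never emits.
    static final String BOUNDARY = "#boundary#";

    /**
     * Assigns a position to every whitespace-separated term of every value.
     * After each value the position counter is advanced by gap (assumption of
     * this model); if addBoundary is set, a BOUNDARY term is indexed after each value.
     */
    static Map<String, List<Integer>> index(List<String> values, int gap, boolean addBoundary) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int position = -1;
        for (String value : values) {
            for (String term : value.trim().split("\\s+")) {
                if (term.isEmpty()) continue;
                position++;
                postings.computeIfAbsent(term, t -> new ArrayList<>()).add(position);
            }
            if (addBoundary) {
                position++;
                postings.computeIfAbsent(BOUNDARY, t -> new ArrayList<>()).add(position);
            }
            position += gap;
        }
        return postings;
    }

    static List<Integer> positions(Map<String, List<Integer>> postings, String term) {
        return postings.getOrDefault(term, Collections.emptyList());
    }

    /** Gap approach: some occurrence of a and b fits in a span of at most maxWidth positions. */
    static boolean withinWidth(Map<String, List<Integer>> postings, String a, String b, int maxWidth) {
        for (int pa : positions(postings, a)) {
            for (int pb : positions(postings, b)) {
                if (Math.abs(pa - pb) + 1 <= maxWidth) return true;
            }
        }
        return false;
    }

    /** Boundary approach: some occurrence of a and b has no BOUNDARY term between them. */
    static boolean withoutBoundaryBetween(Map<String, List<Integer>> postings, String a, String b) {
        for (int pa : positions(postings, a)) {
            for (int pb : positions(postings, b)) {
                int lo = Math.min(pa, pb);
                int hi = Math.max(pa, pb);
                boolean crosses = false;
                for (int boundary : positions(postings, BOUNDARY)) {
                    if (boundary > lo && boundary < hi) crosses = true;
                }
                if (!crosses) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int gap = 100;
        List<String> split = Arrays.asList("foo", "bar");               // field=["foo", "bar"]
        List<String> joined = Collections.singletonList("foo bar");     // field=["foo bar"]

        // Gap plus width filter, with maxWidth = gap - 1 as suggested in the thread.
        System.out.println(withinWidth(index(split, gap, false), "foo", "bar", gap - 1));  // false
        System.out.println(withinWidth(index(joined, gap, false), "foo", "bar", gap - 1)); // true

        // Boundary tokens, no gap needed.
        System.out.println(withoutBoundaryBetween(index(split, 0, true), "foo", "bar"));   // false
        System.out.println(withoutBoundaryBetween(index(joined, 0, true), "foo", "bar"));  // true
    }
}
```

In this model the width filter would also reject a genuine match inside a single value whose terms are more than gap - 1 positions apart, which is the "lots of text" caveat raised above; the boundary variant has no such limit.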
