OK, so the more general question is whether we need an interval query parser.

On Thu, Sep 10, 2020 at 17:28, Dawid Weiss <[email protected]> wrote:

> I am fine with the boundary token suggestion, actually. What I don't
> see at the moment is how I can marry it with an output of a general
> query parser (which returns any Query). I could give an attempt to
> process the query node tree from standard query parser (which we're
> using at the moment anyway) but if the tree becomes complex there is
> no guarantee I can extract subtrees that can be parsed into
> IntervalSources (and then in turn into IntervalQuery).
>
> Dawid
>
> On Thu, Sep 10, 2020 at 4:28 PM jim ferenczi <[email protected]> wrote:
> >
> > Right, I misunderstood Alan's answer. The boundary option is not
> > "impure" in my opinion. It solves this issue nicely but maybe it needs
> > something more packaged to add the boundaries and build queries easily.
> >
> > On Thu, Sep 10, 2020 at 16:16, Dawid Weiss <[email protected]> wrote:
> >>
> >> Yup - similar to what Alan suggested. I'd have to rewrite the (general
> >> text-to-query) query parser to only use intervals though. Still
> >> thinking about possible approaches to this.
> >>
> >> D.
> >>
> >> On Thu, Sep 10, 2020 at 3:58 PM jim ferenczi <[email protected]> wrote:
> >> >
> >> > You could set a very high position increment gap for multi-valued
> >> > fields (Analyzer#getPositionIncrementGap) and perform something
> >> > like Intervals.maxWidth(Intervals.unordered(...), pos_gap-1) ?
> >> >
> >> > On Thu, Sep 10, 2020 at 12:32, Dawid Weiss <[email protected]> wrote:
> >> >>
> >> >> Yeah... I was thinking about adding synthetic boundaries but this
> >> >> seems... impure. :) Another quick reflection is that I'd have to
> >> >> somehow translate the original query (which can be arbitrarily
> >> >> complex) into an interval query. Tough.
> >> >>
> >> >> D.
> >> >>
> >> >> On Thu, Sep 10, 2020 at 12:22 PM Alan Woodward <[email protected]> wrote:
> >> >> >
> >> >> > I’ve solved this sort of thing in the past by indexing boundary
> >> >> > tokens, and wrapping the queries with the equivalent of
> >> >> > Intervals.notContaining(query, boundary-query); you could also
> >> >> > put a very large position increment gap and use a width filter,
> >> >> > but that’s a bit more error prone if you could conceivably have
> >> >> > lots of text in the individual field entries.
> >> >> >
> >> >> > > On 10 Sep 2020, at 10:38, Dawid Weiss <[email protected]> wrote:
> >> >> > >
> >> >> > > Hi Alan,
> >> >> > >
> >> >> > > You're the expert here so I thought I'd ask before I jump in deep. Do
> >> >> > > you think it's feasible to solve the following multivalued-field
> >> >> > > problem:
> >> >> > >
> >> >> > > doc: field=["foo", "bar"]
> >> >> > > query: field:(foo AND bar)
> >> >> > >
> >> >> > > I'd like the above to return zero hits (no single value contains both
> >> >> > > foo and bar), but since multi-valued fields are logically indexed as a
> >> >> > > single field, it returns doc. I recognize this as a well known problem
> >> >> > > but subdocuments are not fun to deal with so I'd like to avoid them at
> >> >> > > all costs.
> >> >> > >
> >> >> > > Would it be possible to solve the above with intervals? Say, something
> >> >> > > like this:
> >> >> > >
> >> >> > > Intervals.containing(valuePositionRanges(), query).
> >> >> > >
> >> >> > > I assume the containment relationship would get rid of false-positives
> >> >> > > crossing value boundary here. The problem is in how to construct those
> >> >> > > value position ranges... Store them at index-construction time
> >> >> > > somehow? Compute them on the fly for anything that has a chance to
> >> >> > > match query? Your thoughts would be very appreciated.
> >> >> > >
> >> >> > > Dawid
> >> >> > >
> >> >> > > ---------------------------------------------------------------------
> >> >> > > To unsubscribe, e-mail: [email protected]
> >> >> > > For additional commands, e-mail: [email protected]
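
For what it's worth, here is a toy model of the position arithmetic behind
the position-increment-gap suggestion quoted above. It is plain Java and
uses no Lucene classes. The class PositionGapSketch and the method
bothWithinWidth are hypothetical names made up for this illustration. The
model assumes every token advances the position by 1, that the gap is added
once between consecutive non-empty values, and that an interval's width is
end - start + 1; treat all three as assumptions and not as a statement of
what the indexer or the width filter actually does. The gap of 100 is
arbitrary.

```java
// Toy model only: no Lucene involved. All names here are hypothetical.
final class PositionGapSketch {

    // Returns true if some occurrence of `a` and some occurrence of `b`
    // span an interval no wider than `maxWidth`, in either order.
    // Assumption: width is counted as (end - start + 1).
    static boolean bothWithinWidth(String[][] values, int gap,
                                   String a, String b, int maxWidth) {
        int total = 0;
        for (String[] value : values) {
            total += value.length;
        }
        String[] tokens = new String[total];
        int[] positions = new int[total];

        // Lay the tokens out the way a multi-valued field is modelled here:
        // +1 per token, plus `gap` extra positions between values.
        int pos = -1;
        int i = 0;
        for (int v = 0; v < values.length; v++) {
            if (v > 0) {
                pos += gap;
            }
            for (String token : values[v]) {
                pos++;
                tokens[i] = token;
                positions[i] = pos;
                i++;
            }
        }

        // Brute-force check of every (a, b) pair of occurrences.
        for (int x = 0; x < total; x++) {
            if (!tokens[x].equals(a)) {
                continue;
            }
            for (int y = 0; y < total; y++) {
                if (!tokens[y].equals(b)) {
                    continue;
                }
                int width = Math.abs(positions[x] - positions[y]) + 1;
                if (width <= maxWidth) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[][] twoValues = { {"foo"}, {"bar"} };   // field=["foo", "bar"]
        String[][] oneValue = { {"foo", "bar"} };      // field=["foo bar"]
        int gap = 100;

        // Terms in different values: positions 0 and 101, width 102.
        System.out.println(bothWithinWidth(twoValues, gap, "foo", "bar", gap - 1)); // false
        // Terms in the same value: positions 0 and 1, width 2.
        System.out.println(bothWithinWidth(oneValue, gap, "foo", "bar", gap - 1));  // true
        // No gap at all: the cross-value false positive from the original question.
        System.out.println(bothWithinWidth(twoValues, 0, "foo", "bar", gap - 1));   // true
    }
}
```

The same arithmetic shows the weakness Alan points out: under this model, a
single value with foo and bar more than gap - 1 positions apart would be
rejected even though both terms sit in the same value, so the gap has to be
larger than the longest value you expect to index.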
