Hi Alan,

At the moment I'm using Payloads (exposed via TermSpans) to store positionLength in the index per-position <https://michaelgibney.net/2018/09/lucene-graph-queries-2/#we-can-do-something-useful-with-positionlength-now> (definitely in Uwe's category of "[because] payloads are stored together with the positions in the postings"). I'm using the positionLength for precise SpanNearQuery phrase matching with index-time synonyms/token-graphs.
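For illustration only, here is a minimal sketch of how a positionLength could be packed into a payload's bytes and read back. It is not the implementation from the linked post: the class name PositionLengthPayload and the 7-bits-per-byte encoding are assumptions made for this example. In Lucene, the resulting bytes would presumably be wrapped in a BytesRef and set on the token's PayloadAttribute at index time; that wiring is left out so the sketch stays self-contained.

```java
import java.util.Arrays;

/** Hypothetical helper (not a Lucene class): packs a positionLength into payload bytes. */
final class PositionLengthPayload {

    /** Encodes a positionLength (>= 1) as a variable-length byte array, 7 bits per byte. */
    static byte[] encode(int positionLength) {
        if (positionLength < 1) {
            throw new IllegalArgumentException("positionLength must be >= 1");
        }
        byte[] buf = new byte[5]; // an int needs at most 5 groups of 7 bits
        int len = 0;
        int v = positionLength;
        while ((v & ~0x7F) != 0) {
            buf[len++] = (byte) ((v & 0x7F) | 0x80); // low 7 bits, continuation flag set
            v >>>= 7;
        }
        buf[len++] = (byte) v; // final group, continuation flag clear
        return Arrays.copyOf(buf, len);
    }

    /** Decodes bytes written by encode(); a missing payload is treated as positionLength 1. */
    static int decode(byte[] payload) {
        if (payload == null || payload.length == 0) {
            return 1; // assumption: no payload means an ordinary single-position token
        }
        int value = 0;
        int shift = 0;
        for (byte b : payload) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // continuation flag clear: last byte of the value
            }
            shift += 7;
        }
        return value;
    }
}
```

With this encoding, typical positionLength values fit in a single byte, and a token with no payload can be treated as spanning one position, which keeps the per-position overhead small for tokens that are not part of a multi-position path.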
I'm not sure how directly relevant positionLength would be to IntervalSource. But more generally, I can say that I really appreciate having access to Payloads as a generic framework for implementing experimental features that rely on per-position indexed attributes.

Michael

On Wed, Feb 13, 2019 at 3:27 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> I think the main reason why there are Payload implementations inside
> Spans is the fact that the payloads are stored together with the
> positions in the postings. For performance reasons, back at that time,
> the processing of payloads was put into the span query series, because
> then you can score by payload and do position-based stuff in a single
> pass.
>
> I agree that adding that to the IntervalSource API is hard, because
> IntervalSource does not know anything about payloads, so a combination
> of different queries won't work. And as you said, the scoring is
> separated.
>
> Payloads are mostly used for scoring, but I don't remember any use case
> I had in the last 5 years that made use of this - it was just too slow.
> And term-level boosts are seldom used. In most cases people stick with
> document-level boosts (docvalues). Nowadays I'd also recommend
> FeatureField for term/keyword/category-level scoring.
>
> One thing that payloads were used for is NLP features like word type
> annotations and filtering based on that, which (of course) requires
> support in spans. But in most cases the better way to do this is to add
> the annotation into the term text and do simple term queries (like
> terms called "lucene#propernoun").
>
> IMHO, adding a PayloadTermQuery-like type to change the term frequency
> based on a function of payload is fine, but can easily be modelled with
> FeatureField, too.
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Alan Woodward <romseyg...@gmail.com>
> > Sent: Thursday, February 7, 2019 10:27 AM
> > To: dev@lucene.apache.org
> > Subject: Use of Payloads
> >
> > Hi all,
> >
> > The new intervals queries are now nearly at feature parity with Spans;
> > the implementations still outstanding are all to do with using
> > payloads. Currently, span queries allow you to filter out spans based
> > on the payloads of the matching terms, and also allow you to modify
> > the score of the query as a whole based on those payloads. I’d like to
> > get some idea of how people are actually using these functions.
> >
> > In terms of filtering, adding an IntervalSource that wraps a simple
> > term and filters it out based on the payload will be simple enough.
> > Adding this for compound intervals is more complicated, and I think
> > trickier to reason about, so I’d like to try and avoid doing this if
> > possible - feedback on actual use-cases would be helpful here.
> >
> > For scoring, intervals use a completely different scoring mechanism to
> > Spans, just returning a scaled score between 0 and [boost]. To include
> > term weighting as well, users should combine the Intervals query with
> > a boolean query consisting of all terms used in the IntervalsSource.
> > This doesn’t mix so well with payloads, but an alternative option here
> > could be to add a PayloadTermQuery that can adjust the term frequency
> > of a term on a particular document via a payload function.
> >
> > What do people think? Are there cases that I’ve missed, or other
> > possible uses here?
> >
> > - Alan
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org