I've seen payloads used in an interesting use case: storing different values in the _same_ term's payload for A/B testing. E.g. I might index a-123|b-234|c-456 and store it as a single blob. Then, when I attach "&experiment=a" to a query, custom code extracts "123" from the payload and uses it in the scoring calculation.
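
A rough sketch of the extraction step, assuming the payload bytes are just the UTF-8 text of the blob above. The names ExperimentPayload and valueFor are made up for illustration and are not Lucene APIs; how the extracted value then feeds into scoring is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

public class ExperimentPayload {

    /**
     * Looks up the value stored for one experiment key in a payload blob
     * of the form "a-123|b-234|c-456" (assumed UTF-8 encoded).
     * Returns an empty OptionalInt if the key is absent or the value is not an integer.
     */
    public static OptionalInt valueFor(byte[] payload, String experiment) {
        String blob = new String(payload, StandardCharsets.UTF_8);
        for (String entry : blob.split("\\|")) {
            // Split each entry at the first dash: key on the left, value on the right.
            int dash = entry.indexOf('-');
            if (dash > 0 && entry.substring(0, dash).equals(experiment)) {
                try {
                    return OptionalInt.of(Integer.parseInt(entry.substring(dash + 1)));
                } catch (NumberFormatException e) {
                    return OptionalInt.empty();
                }
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        byte[] payload = "a-123|b-234|c-456".getBytes(StandardCharsets.UTF_8);
        // Corresponds to a query carrying "&experiment=a".
        System.out.println(valueFor(payload, "a"));
    }
}
```

In a real setup the byte[] would presumably come from the term's payload at the matching position, and a missing key would fall back to some default score factor.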

FWIW,
Erick

On Wed, Feb 13, 2019 at 8:59 AM Alan Woodward <romseyg...@gmail.com> wrote:
>
> Hey Michael, that’s a really interesting use case.  This should be possible
> using intervals as well - you could write an IntervalsSource that reads
> payloads and overrides end() to increment it by the encoded length.
>
> For the filtering cases, it should be easy to add a new factory method that
> takes a Term and a Predicate<BytesRef> that allows you to filter out
> particular terms based on their payloads.
>
> I’m still interested in seeing how people are using it for scoring as well,
> so please keep replying to the thread.
>
> On 13 Feb 2019, at 15:21, Michael Gibney <mich...@michaelgibney.net> wrote:
>
> Hi Alan,
>
> At the moment I'm using Payloads (exposed via TermSpans) to store
> positionLength in the index per-position (definitely in Uwe's category of
> "[because] payloads are stored together with the positions in the postings").
> I'm using the positionLength for precise SpanNearQuery phrase matching with
> index-time synonyms/token-graphs.
>
> I'm not sure how directly relevant positionLength would be to IntervalsSource.
> But more generally, I can say that I really appreciate having access to
> Payloads as a generic framework for implementation of experimental features
> that rely on per-position indexed attributes.
>
> Michael
>
> On Wed, Feb 13, 2019 at 3:27 AM Uwe Schindler <u...@thetaphi.de> wrote:
>>
>> Hi,
>>
>> I think the main reason why there are Payload implementations inside Spans
>> is the fact that the payloads are stored together with the positions in the
>> postings. For performance reasons, back at that time, the processing of
>> payloads was put into the span query series, because then you can score by
>> payload and do position-based stuff in a single pass.
>>
>> I agree that adding that to the IntervalsSource API is hard, because
>> IntervalsSource does not know anything about payloads, so a combination of
>> different queries won't work. And as you said, the scoring is separated.
>>
>> Payloads are mostly used for scoring, but I don't remember any use case I
>> had in the last 5 years that made use of this - it was just too slow. And
>> term-level boosts are seldom used. In most cases people stick with
>> document-level boosts (docvalues). Nowadays I'd also recommend FeatureField
>> for term/keyword/category-level scoring.
>>
>> One thing that payloads were used for is NLP features like word type
>> annotations and filtering based on that, which requires (of course) support
>> in spans. But in most cases the better way to do this is to add the
>> annotation into the term text and do simple term queries (like terms called
>> "lucene#propernoun").
>>
>> IMHO, adding a PayloadTermQuery-like type to change the term frequency based
>> on a function of payload is fine, but can easily be modelled with
>> FeatureField, too.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>> > -----Original Message-----
>> > From: Alan Woodward <romseyg...@gmail.com>
>> > Sent: Thursday, February 7, 2019 10:27 AM
>> > To: dev@lucene.apache.org
>> > Subject: Use of Payloads
>> >
>> > Hi all,
>> >
>> > The new intervals queries are now nearly at feature parity with Spans; the
>> > implementations still outstanding are all to do with using payloads.
>> > Currently, span queries allow you to filter out spans based on the payloads
>> > of the matching terms, and also allow you to modify the score of the query
>> > as a whole based on those payloads.  I’d like to get some idea of how people
>> > are actually using these functions.
>> >
>> > In terms of filtering, adding an IntervalsSource that wraps a simple term and
>> > filters it out based on the payload will be simple enough.  Adding this for
>> > compound intervals is more complicated, and I think trickier to reason about,
>> > so I’d like to try and avoid doing this if possible - feedback on actual
>> > use-cases would be helpful here.
>> >
>> > For scoring, intervals use a completely different scoring mechanism to Spans,
>> > just returning a scaled score between 0 and [boost].  To include term
>> > weighting as well, users should combine the Intervals query with a boolean
>> > query consisting of all terms used in the IntervalsSource.  This doesn’t mix
>> > so well with payloads, but an alternative option here could be to add a
>> > PayloadTermQuery that can adjust the term frequency of a term on a
>> > particular document via a payload function.
>> >
>> > What do people think?  Are there cases that I’ve missed, or other possible
>> > uses here?
>> >
>> > - Alan
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: dev-h...@lucene.apache.org