Re: Use of Payloads

Erick Erickson Wed, 13 Feb 2019 09:46:17 -0800

I've seen payloads used in an interesting use case: storing different
values in the _same_ term's payload for A/B testing. E.g. I
might index
a-123|b-234|c-456
and then store it as a blob. Now when I attach "&experiment=a" to a
query, custom code extracts "123" from the payload and uses that in
scoring calculations.
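Here is a minimal sketch of the extraction step described above. It is plain Java so it runs without Lucene. The "experiment-value|experiment-value" layout comes from the example; the class and method names are hypothetical. In real Lucene code the payload would arrive as a BytesRef, and its utf8ToString() could replace the byte[] decoding here.

```java
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

/**
 * Hypothetical helper: a term's payload holds "experiment-value" pairs separated
 * by '|', e.g. "a-123|b-234|c-456". Given the experiment id from the request
 * (e.g. &experiment=a), return that experiment's value for use in scoring.
 */
final class ExperimentPayloads {

  private ExperimentPayloads() {}

  /** Encodes the pairs as UTF-8 bytes, i.e. what would be stored as the payload. */
  static byte[] encode(String pairs) {
    return pairs.getBytes(StandardCharsets.UTF_8);
  }

  /** Returns the value stored for {@code experiment}, or empty if it is absent. */
  static OptionalInt valueFor(byte[] payload, String experiment) {
    String text = new String(payload, StandardCharsets.UTF_8);
    for (String entry : text.split("\\|")) {
      int dash = entry.indexOf('-');
      if (dash <= 0) {
        continue; // malformed entry: no experiment id before the '-'
      }
      if (entry.substring(0, dash).equals(experiment)) {
        return OptionalInt.of(Integer.parseInt(entry.substring(dash + 1)));
      }
    }
    return OptionalInt.empty();
  }

  public static void main(String[] args) {
    byte[] payload = encode("a-123|b-234|c-456");
    System.out.println(valueFor(payload, "a")); // OptionalInt[123]
    System.out.println(valueFor(payload, "z")); // OptionalInt.empty
  }
}
```

Parsing text on every hit is simple but not cheap. A compact binary layout would be faster if this sits in a hot scoring loop.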


FWIW,
Erick

On Wed, Feb 13, 2019 at 8:59 AM Alan Woodward wrote:
>
> Hey Michael, that’s a really interesting use case. This should be possible
> using intervals as well: you could write an IntervalsSource that reads
> payloads and overrides end() to increment the end position by the encoded
> length.
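This sketch shows the arithmetic behind that suggestion. It assumes, hypothetically, that each position's payload holds the token's positionLength as a variable-length int. It also assumes that a match's end is the last position it covers, i.e. start + positionLength - 1. It models only the payload decoding and the end-position calculation in plain Java. It does not implement Lucene's IntervalsSource or IntervalIterator API, and all names are illustrative.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical sketch: the payload stores positionLength as a vInt
 * (7 bits per byte, high bit set means "more bytes follow"), and a
 * single-term match ends at start + positionLength - 1 instead of at start.
 */
final class PositionLengthPayload {

  private PositionLengthPayload() {}

  /** Encodes a positionLength (>= 1) as a vInt payload. */
  static byte[] encode(int positionLength) {
    if (positionLength < 1) {
      throw new IllegalArgumentException("positionLength must be >= 1");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int v = positionLength;
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80); // low 7 bits plus continuation flag
      v >>>= 7;
    }
    out.write(v); // final byte, continuation flag clear
    return out.toByteArray();
  }

  /** Decodes a vInt payload written by {@link #encode}. */
  static int decode(byte[] payload) {
    int value = 0;
    int shift = 0;
    for (byte b : payload) {
      value |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return value;
      }
      shift += 7;
    }
    throw new IllegalArgumentException("truncated vInt payload");
  }

  /** Last position covered by a token at {@code start}; no payload means length 1. */
  static int intervalEnd(int start, byte[] payload) {
    int positionLength = (payload == null || payload.length == 0) ? 1 : decode(payload);
    return start + positionLength - 1;
  }

  public static void main(String[] args) {
    // "ny" indexed at position 0 as a synonym spanning "new york" (positions 0-1)
    System.out.println(intervalEnd(0, encode(2))); // 1
    System.out.println(intervalEnd(5, null));      // 5
  }
}
```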
>
> For the filtering cases, it should be easy to add a new factory method that 
> takes a Term and a Predicate<BytesRef> that allows you to filter out 
> particular terms based on their payloads.
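Below is a plain-Java model of that proposed filter. It assumes a term's postings in one document are available as (position, payload) pairs. A byte[] stands in for BytesRef. Posting, matchingPositions() and the NOUN/VERB tags are hypothetical, and this is not Lucene's intervals API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model: keep only the occurrences of a term whose payload passes a predicate. */
final class PayloadFilteredTerm {

  /** One occurrence of the term: its position and the payload stored with it. */
  static final class Posting {
    final int position;
    final byte[] payload;

    Posting(int position, byte[] payload) {
      this.position = position;
      this.payload = payload;
    }
  }

  private PayloadFilteredTerm() {}

  /** Returns the positions (within one document) whose payload matches the filter. */
  static List<Integer> matchingPositions(List<Posting> postings, Predicate<byte[]> filter) {
    List<Integer> out = new ArrayList<>();
    for (Posting p : postings) {
      if (filter.test(p.payload)) {
        out.add(p.position);
      }
    }
    return out;
  }

  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // The term "bank" at three positions, each annotated with a part-of-speech payload.
    List<Posting> bank = List.of(
        new Posting(3, utf8("NOUN")), new Posting(9, utf8("VERB")), new Posting(14, utf8("NOUN")));
    Predicate<byte[]> isNoun =
        payload -> payload != null && "NOUN".equals(new String(payload, StandardCharsets.UTF_8));
    System.out.println(matchingPositions(bank, isNoun)); // [3, 14]
  }
}
```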
>
> I’m still interested in seeing how people are using it for scoring as well, 
> so please keep replying to the thread.
>
> On 13 Feb 2019, at 15:21, Michael Gibney wrote:
>
> Hi Alan,
>
> At the moment I'm using Payloads (exposed via TermSpans) to store 
> positionLength in the index per-position (definitely in Uwe's category of 
> "[because] payloads are stored together with the positions in the postings"). 
> I'm using the positionLength for precise SpanNearQuery phrase matching with 
> index-time synonyms/token-graphs.
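The following self-contained example shows why positionLength matters for phrase matching over an index-time token graph. It assumes "ny" was injected as a synonym for "new york", so it sits at position 0 with positionLength 2. Matching that ignores positionLength treats every token as length 1, which is roughly what position-only matching does. That mode both misses "ny city" and wrongly accepts "ny york". Names are illustrative.

```java
import java.util.List;

/** Illustrative sketch: phrase matching over a token graph, with and without positionLength. */
final class TokenGraphPhrase {

  /** One indexed token: its term, start position and how many positions it spans. */
  static final class Token {
    final String term;
    final int position;
    final int positionLength;

    Token(String term, int position, int positionLength) {
      this.term = term;
      this.position = position;
      this.positionLength = positionLength;
    }
  }

  private TokenGraphPhrase() {}

  /** True if some chain of tokens spells the phrase, each starting right after the previous ends. */
  static boolean matchesPhrase(List<Token> tokens, List<String> phrase, boolean usePositionLength) {
    if (phrase.isEmpty()) {
      return false;
    }
    for (Token first : tokens) {
      if (first.term.equals(phrase.get(0)) && chain(tokens, phrase, 1, first, usePositionLength)) {
        return true;
      }
    }
    return false;
  }

  private static boolean chain(
      List<Token> tokens, List<String> phrase, int i, Token prev, boolean usePositionLength) {
    if (i == phrase.size()) {
      return true; // every phrase term has been matched
    }
    // Without positionLength every token is treated as spanning a single position.
    int nextStart = prev.position + (usePositionLength ? prev.positionLength : 1);
    for (Token t : tokens) {
      if (t.position == nextStart && t.term.equals(phrase.get(i))
          && chain(tokens, phrase, i + 1, t, usePositionLength)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // "new york city" with the synonym "ny" (spanning positions 0-1) injected at index time.
    List<Token> tokens = List.of(
        new Token("new", 0, 1), new Token("york", 1, 1),
        new Token("ny", 0, 2), new Token("city", 2, 1));
    System.out.println(matchesPhrase(tokens, List.of("ny", "city"), true));  // true
    System.out.println(matchesPhrase(tokens, List.of("ny", "city"), false)); // false: missed match
    System.out.println(matchesPhrase(tokens, List.of("ny", "york"), true));  // false
    System.out.println(matchesPhrase(tokens, List.of("ny", "york"), false)); // true: false positive
  }
}
```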
>
> I'm not sure how directly relevant positionLength would be to IntervalsSource.
> More generally, though, I really appreciate having access to Payloads as a
> generic framework for implementing experimental features that rely on
> per-position indexed attributes.
>
> Michael
>
> On Wed, Feb 13, 2019 at 3:27 AM Uwe Schindler wrote:
>>
>> Hi,
>>
>> I think the main reason there are payload implementations inside Spans is
>> that payloads are stored together with the positions in the postings. For
>> performance reasons, payload processing was put into the span query family
>> back then, because that lets you score by payload and do position-based
>> matching in a single pass.
>>
>> I agree that adding that to the IntervalsSource API is hard, because
>> IntervalsSource does not know anything about payloads, so combining
>> different queries won't work. And, as you said, the scoring is separate.
>>
>> Payloads are mostly used for scoring, but I don't remember any use case I
>> had in the last 5 years that made use of this; it was just too slow. And
>> term-level boosts are seldom used. In most cases people stick with
>> document-level boosts (docvalues). Nowadays I'd also recommend FeatureField
>> for term/keyword/category-level scoring.
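This sketch shows the saturation function described in the FeatureField documentation for its saturation query: score = weight * S / (S + pivot), where S is the feature value stored for the document. It is plain Java rather than a call into Lucene. In Lucene one would index something like new FeatureField("features", "category_sports", value) and query with FeatureField.newSaturationQuery. The field and feature names here are made up.

```java
/**
 * Sketch of saturation scoring: score = weight * S / (S + pivot).
 * The score grows with the feature value S but never exceeds weight;
 * at S == pivot it is exactly weight / 2.
 */
final class SaturationScore {

  private SaturationScore() {}

  static double saturation(double featureValue, double weight, double pivot) {
    if (featureValue < 0 || pivot <= 0) {
      throw new IllegalArgumentException("need featureValue >= 0 and pivot > 0");
    }
    return weight * featureValue / (featureValue + pivot);
  }

  public static void main(String[] args) {
    double weight = 2.0;
    double pivot = 4.0;
    // Per-document strength of a keyword/category feature, stored once per document.
    for (double s : new double[] {1, 4, 16, 64}) {
      System.out.printf("S=%.0f -> %.3f%n", s, saturation(s, weight, pivot));
    }
  }
}
```

Unlike a payload, the feature value is read once per matching document rather than once per position. That is presumably part of why it scales better for term- or category-level scoring.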
>>
>> One thing payloads were used for is NLP features like word-type
>> annotations, and filtering based on them, which (of course) requires
>> support in spans. But in most cases the better way to do this is to add the
>> annotation into the term text and use simple term queries (e.g. terms like
>> "lucene#propernoun").
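Here is one way Uwe's suggestion could look, as a hypothetical plain-Java sketch. Each word is emitted both as itself and as word#tag. A normal term query still finds the word, while a term query on "lucene#propernoun" finds only the annotated occurrences. The '#' separator follows the example above; lowercasing and indexing both forms are assumptions. In a real analyzer this would typically be a TokenFilter emitting the annotated term at the same position (position increment 0).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: encode an NLP annotation into the term text instead of a payload. */
final class AnnotatedTerms {

  private AnnotatedTerms() {}

  /** Emits each word plus a "word#tag" variant, e.g. "lucene" and "lucene#propernoun". */
  static List<String> indexTerms(List<String> words, List<String> tags) {
    if (words.size() != tags.size()) {
      throw new IllegalArgumentException("one tag per word expected");
    }
    List<String> terms = new ArrayList<>();
    for (int i = 0; i < words.size(); i++) {
      String word = words.get(i).toLowerCase(Locale.ROOT);
      terms.add(word);                                               // plain term
      terms.add(word + "#" + tags.get(i).toLowerCase(Locale.ROOT)); // annotated term
    }
    return terms;
  }

  public static void main(String[] args) {
    List<String> terms = indexTerms(
        List.of("Lucene", "indexes", "text"), List.of("PROPERNOUN", "VERB", "NOUN"));
    System.out.println(terms);
    // Filtering by annotation becomes a plain exact-term lookup.
    System.out.println(terms.contains("lucene#propernoun")); // true
  }
}
```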
>>
>> IMHO, adding a PayloadTermQuery-like query that changes the term frequency
>> based on a function of the payload is fine, but this can easily be modelled
>> with FeatureField, too.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> http://www.thetaphi.de
>>
>> > -----Original Message-----
>> > From: Alan Woodward
>> > Sent: Thursday, February 7, 2019 10:27 AM
>> > To: [email protected]
>> > Subject: Use of Payloads
>> >
>> > Hi all,
>> >
>> > The new intervals queries are now nearly at feature parity with Spans; the
>> > implementations still outstanding are all to do with using payloads.
>> > Currently, span queries allow you to filter out spans based on the payloads
>> > of the matching terms, and also allow you to modify the score of the query
>> > as a whole based on those payloads. I’d like to get some idea of how people
>> > are actually using these functions.
>> >
>> > In terms of filtering, adding an IntervalsSource that wraps a simple term
>> > and filters it out based on the payload will be simple enough. Adding this
>> > for compound intervals is more complicated, and I think trickier to reason
>> > about, so I’d like to try and avoid doing this if possible; feedback on
>> > actual use cases would be helpful here.
>> >
>> > For scoring, intervals use a completely different scoring mechanism to
>> > Spans, just returning a scaled score between 0 and [boost]. To include term
>> > weighting as well, users should combine the Intervals query with a boolean
>> > query consisting of all terms used in the IntervalsSource. This doesn’t mix
>> > so well with payloads, but an alternative option here could be to add a
>> > PayloadTermQuery that can adjust the term frequency of a term on a
>> > particular document via a payload function.
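This is a hypothetical sketch of what "adjust the term frequency via a payload function" could mean. Each occurrence of the term in a document contributes f(payload) instead of 1, and occurrences without a payload contribute 1. The 4-byte big-endian float encoding and all names are assumptions. The resulting value would then take the place of the raw frequency in the similarity.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: a per-document term frequency adjusted by a function of each payload. */
final class PayloadTermFreq {

  private PayloadTermFreq() {}

  /** Sums f(payload) over the term's positions; a missing payload counts as 1. */
  static double adjustedFreq(List<byte[]> payloadsPerPosition, ToDoubleFunction<byte[]> payloadFunction) {
    double freq = 0;
    for (byte[] payload : payloadsPerPosition) {
      freq += (payload == null || payload.length == 0) ? 1.0 : payloadFunction.applyAsDouble(payload);
    }
    return freq;
  }

  /** Assumed payload encoding: a 4-byte big-endian float. */
  static byte[] encodeFloat(float value) {
    return ByteBuffer.allocate(Float.BYTES).putFloat(value).array();
  }

  static double decodeFloat(byte[] payload) {
    return ByteBuffer.wrap(payload).getFloat();
  }

  public static void main(String[] args) {
    // Three occurrences in one document: boosted (2.5), down-weighted (0.5), no payload.
    List<byte[]> payloads = Arrays.asList(encodeFloat(2.5f), encodeFloat(0.5f), null);
    System.out.println(adjustedFreq(payloads, PayloadTermFreq::decodeFloat)); // 4.0
  }
}
```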
>> >
>> > What do people think?  Are there cases that I’ve missed, or other possible
>> > uses here?
>> >
>> > - Alan
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [email protected]
>> > For additional commands, e-mail: [email protected]
>>
>
