RE: Use of Payloads

Uwe Schindler Wed, 13 Feb 2019 00:28:02 -0800

Hi,

I think the main reason why there are payload implementations inside Spans is 
the fact that the payloads are stored together with the positions in the 
postings. For performance reasons, back at that time, the processing of 
payloads was put into the span query series, because then you can score by 
payload and do position-based stuff in a single pass.


I agree that adding that to the IntervalsSource API is hard, because 
IntervalsSource does not know anything about payloads, so a combination of 
different queries won't work. And as you said, the scoring is separated.

Payloads are mostly used for scoring, but I don't remember any use case I had 
in the last 5 years that made use of this - it was just too slow. And 
term-level boosts are seldom used. In most cases people stick with 
document-level boosts (docvalues). Nowadays I'd also recommend FeatureField for 
term/keyword/category-level scoring.
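
To make the FeatureField suggestion a bit more concrete: as far as I recall, 
its saturation query (FeatureField.newSaturationQuery) scores a feature value 
S as weight * S / (S + pivot). The following is only a plain-Java sketch of 
that function, not Lucene code; the class and method names are made up for 
illustration.

```java
// Plain-Java sketch of a saturation function: weight * S / (S + pivot).
// SaturationSketch and saturation() are hypothetical names, not Lucene API.
final class SaturationSketch {

    // Maps a non-negative feature value to a score in [0, weight).
    // The score is weight / 2 when featureValue equals pivot.
    static float saturation(float weight, float pivot, float featureValue) {
        if (pivot <= 0f) {
            throw new IllegalArgumentException("pivot must be positive");
        }
        if (featureValue < 0f) {
            throw new IllegalArgumentException("featureValue must not be negative");
        }
        return weight * featureValue / (featureValue + pivot);
    }

    public static void main(String[] args) {
        // Larger feature values give larger scores, but the gain flattens out.
        for (float s : new float[] {0f, 1f, 2f, 10f, 100f}) {
            System.out.println(s + " -> " + saturation(1f, 2f, s));
        }
    }
}
```

The point is that a per-term weight stored as a feature gets a bounded, 
monotonic contribution to the score, which is roughly what people wanted from 
payload-based term boosts.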

One thing that payloads were used for is NLP features like word type 
annotations and filtering based on that, which (of course) requires support 
in spans. But in most cases the better way to do this is to add the annotation 
into the term text and do simple term queries (like terms called 
"lucene#propernoun").
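
A minimal sketch of that idea, independent of the Lucene API: the word type 
is appended to the token, so a plain term lookup doubles as the word-type 
filter. The AnnotatedTerms class, its methods and the choice of '#' as 
separator are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: folds a word-type annotation
// into the term text instead of storing it as a payload.
final class AnnotatedTerms {

    static final char SEPARATOR = '#';

    // Builds the indexed term, e.g. ("Lucene", "propernoun") -> "lucene#propernoun".
    static String annotate(String token, String wordType) {
        return token.toLowerCase(Locale.ROOT) + SEPARATOR + wordType.toLowerCase(Locale.ROOT);
    }

    // Annotates a token sequence given parallel lists of tokens and word types.
    static List<String> annotateAll(List<String> tokens, List<String> wordTypes) {
        if (tokens.size() != wordTypes.size()) {
            throw new IllegalArgumentException("tokens and wordTypes must have the same length");
        }
        List<String> terms = new ArrayList<>(tokens.size());
        for (int i = 0; i < tokens.size(); i++) {
            terms.add(annotate(tokens.get(i), wordTypes.get(i)));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> terms = annotateAll(
                List.of("Lucene", "indexes", "text"),
                List.of("propernoun", "verb", "noun"));
        System.out.println(terms);
        // Matching the annotated term checks the word and its type at once.
        System.out.println(terms.contains(annotate("lucene", "propernoun")));
    }
}
```

In a real analysis chain the annotated term would probably be indexed next to 
the plain term, so that queries without annotations keep working.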

IMHO, adding a PayloadTermQuery-like type to change the term frequency based on 
a function of the payload is fine, but this can easily be modelled with 
FeatureField, too.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Alan Woodward <romseyg...@gmail.com>
> Sent: Thursday, February 7, 2019 10:27 AM
> To: dev@lucene.apache.org
> Subject: Use of Payloads
> 
> Hi all,
> 
> The new intervals queries are now nearly at feature parity with Spans; the
> implementations still outstanding are all to do with using payloads.
> Currently, span queries allow you to filter out spans based on the payloads
> of the matching terms, and also allow you to modify the score of the query
> as a whole based on those payloads.  I’d like to get some idea of how people
> are actually using these functions.
> 
> In terms of filtering, adding an IntervalSource that wraps a simple term and
> filters it out based on the payload will be simple enough.  Adding this for
> compound intervals is more complicated, and I think trickier to reason about,
> so I’d like to try and avoid doing this if possible - feedback on actual
> use-cases would be helpful here.
> 
> For scoring, intervals use a completely different scoring mechanism to Spans,
> just returning a scaled score between 0 and [boost].  To include term
> weighting as well, users should combine the Intervals query with a boolean
> query consisting of all terms used in the IntervalsSource.  This doesn’t mix
> so well with payloads, but an alternative option here could be to add a
> PayloadTermQuery that can adjust the term frequency of a term on a
> particular document via a payload function.
> 
> What do people think?  Are there cases that I’ve missed, or other possible
> uses here?
> 
> - Alan
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org

