Re: Absolute term position in scoring

Michael McCandless Mon, 26 Jan 2015 06:08:58 -0800

A custom query could improve on the situation by not pulling multiple
docs/positions enums for a single term.  E.g. the patch on
https://issues.apache.org/jira/browse/LUCENE-5288 (which never got
committed: too controversial) has such a query, letting you customize
how positions are scored for boolean term query matches.  Maybe you
could start from it and see how performance compares vs the
SpanFirstQuery approach...
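
To make the trade-off concrete, here is a toy sketch in plain Java (not
Lucene code, and not the LUCENE-5288 patch). The class and method names, the
step boost, and the 1 / (1 + p / scale) decay are all assumptions chosen for
illustration. It contrasts a step boost, roughly what an optional boosted
SpanFirstQuery clause adds, with a smooth weight that depends on the position
itself, which is the kind of thing a custom query could compute:

```java
/** Toy model of position-sensitive scoring. Hypothetical names; not Lucene API. */
final class PositionScoringSketch {

    /**
     * Step boost: adds a fixed boost when the first occurrence lies before
     * {@code end}. This loosely mimics a plain term clause plus an optional
     * boosted "first N positions" clause. A negative position means no match.
     */
    static double stepBoostScore(double baseScore, int firstPosition, int end, double boost) {
        boolean nearTop = firstPosition >= 0 && firstPosition < end;
        return nearTop ? baseScore + boost : baseScore;
    }

    /**
     * Smooth decay: every occurrence at position p contributes
     * 1 / (1 + p / scale), so earlier occurrences count more.
     * The decay formula is an assumption made for this sketch.
     */
    static double positionDecayScore(int[] positions, double scale) {
        double score = 0.0;
        for (int p : positions) {
            score += 1.0 / (1.0 + p / scale);
        }
        return score;
    }

    public static void main(String[] args) {
        // Two hypothetical documents: one mentions the term early, one late.
        int[] earlyDoc = {2};
        int[] lateDoc = {20};

        System.out.println("step, early:  " + stepBoostScore(1.0, earlyDoc[0], 10, 2.0));
        System.out.println("step, late:   " + stepBoostScore(1.0, lateDoc[0], 10, 2.0));
        System.out.println("decay, early: " + positionDecayScore(earlyDoc, 10.0));
        System.out.println("decay, late:  " + positionDecayScore(lateDoc, 10.0));
    }
}
```

The step version only distinguishes "before end" from "after end", while the
decay version ranks every earlier match above every later one. Real Lucene
scores also involve the similarity, so treat the numbers as illustrative only.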


Mike McCandless

http://blog.mikemccandless.com


On Mon, Jan 26, 2015 at 6:14 AM, Uwe Schindler <[email protected]> wrote:
> Hi,
>
> It depends on the query structure. In fact, SpanFirstQuery is slow (all span
> queries are slow because they use positions; this may improve in the future).
>
> Your question was about using multiple fields, that is, querying for the same
> terms on multiple fields and/or with different query types. This is the standard
> approach to tuning relevance, but it always has a cost. In most cases you
> will not see a large difference (unless you use phrase or span queries). A
> very good explanation of what can be done with this approach is in the
> Elasticsearch Guide:
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-field-search.html
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
>
>> -----Original Message-----
>> From: Alexey Morozov [mailto:[email protected]]
>> Sent: Monday, January 26, 2015 11:49 AM
>> To: [email protected]
>> Subject: Re: Absolute term position in scoring
>>
>> Hello!
>>
>> I'd like to ask whether this approach (constructing a complex query from a
>> boosted "specialized" part and an "ordinary" part with no boost)
>> [necessarily] causes a significant performance degradation compared to a
>> "custom query" specialized for a particular need.
>>
>> Thanks in advance,
>> Alexey Morozov
>>
>> 26.01.2015 14:57, Michael McCandless wrote:
>> > Well you could have ordinary term queries, and then a SHOULD
>> > SpanFirstQuery clause with a boost, to give higher scores to those
>> > docs that also had the term(s) close to the start of the document.
>> >
>> >
>> > Mike McCandless
>> >
>> > http://blog.mikemccandless.com
>> >
>> > On Sun, Jan 25, 2015 at 5:44 PM, Luis A Lastras <[email protected]> wrote:
>> >
>> >> Thanks I didn't know about SpanFirstQuery. I can likely get something
>> >> going with that. I was still hoping that we could affect the scoring
>> >> formula with the position itself, but maybe this is not feasible.
>> >>
>> >> Luis
>> >>
>> >>
>> >> Luis A Lastras, Ph.D.
>> >> Research Staff Member & Manager, Concept Analytics, IBM Watson
>> >> Member of the IBM Academy of Technology
>> >> IBM Master Inventor
>> >> email: [email protected] | Tel: 914-945-3613 | Cell: 914-382-1879
>> >> address: 1101 Kitchawan Rd, Office 28-132, Yorktown Heights, NY, 10598
>> >>
>> >> From: Michael McCandless <[email protected]>
>> >> To: Lucene Users <[email protected]>
>> >> Date: 01/25/2015 08:12 AM
>> >> Subject: Re: Absolute term position in scoring
>> >> ------------------------------
>> >>
>> >>
>> >>
>> >> Maybe SpanFirstQuery?
>> >>
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> On Sat, Jan 24, 2015 at 9:34 PM, Luis A Lastras <[email protected]> wrote:
>> >>
>> >>> Is it possible to incorporate in Lucene's scoring function the
>> >>> position of a matching term (say, as measured from the top of the
>> >>> document)? The scenario is: if the documents tend to talk about the
>> >>> most important stuff at the beginning, then we would like to give
>> >>> preference to documents that mention a term close to the top.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Luis
>> >>>
>> >>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
