You can plug a different RewriteMethod into a MultiTermQuery (see the default
one used by FuzzyQuery). The default converts the FuzzyQuery into a
BooleanQuery on rewrite. To get what you want, create a subclass of
RewriteMethod that collects the clauses in a DisjunctionMaxQuery instead of a
BooleanQuery:

Subclass this abstract one: 
https://lucene.apache.org/core/8_6_0/core/org/apache/lucene/search/TopTermsRewrite.html

...and set it as the RewriteMethod on the FuzzyQuery. Use one of the existing
subclasses as an example and adapt it for DisjunctionMaxQuery.
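
To make the intended effect concrete, here is a small plain-Java sketch of the
arithmetic only. It does not use Lucene and is not how Lucene computes scores:
it assumes "Bstn~2" expands to the two terms "boston" and "basin", and it uses
the raw occurrence count of each term as its score. The class and method names
(SumVsMaxScoring, matchCounts, sumScore, maxScore) are made up for this
illustration. Summing the per-term scores models what a BooleanQuery does with
its optional clauses, and taking the maximum models a DisjunctionMaxQuery with
a tie-breaker of 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SumVsMaxScoring {

    /** Counts how often each expanded term occurs in a whitespace-tokenized document. */
    static Map<String, Integer> matchCounts(String doc, String... expandedTerms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : doc.toLowerCase().split("\\s+")) {
            for (String term : expandedTerms) {
                if (term.equals(token)) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Models summing the clause scores (BooleanQuery-style). */
    static int sumScore(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }

    /** Models keeping only the best clause score (DisjunctionMaxQuery-style, tie-breaker 0). */
    static int maxScore(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    public static void main(String[] args) {
        String[] docs = {"Boston", "Basin", "Boston Basin", "Boston Boston Basin"};
        for (String doc : docs) {
            // Assumed expansion of "Bstn~2" for this toy example.
            Map<String, Integer> counts = matchCounts(doc, "boston", "basin");
            System.out.println(doc + ": sum=" + sumScore(counts) + ", max=" + maxScore(counts));
        }
    }
}
```

For the four documents from the original question this gives sums of 1, 1, 2, 3
and maxima of 1, 1, 1, 2. Real Lucene scores also depend on the similarity and
on the per-term boosts assigned during the fuzzy rewrite, so the actual numbers
will differ.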

Uwe

On September 23, 2020 5:58:29 PM UTC, "Eastlack, Kainoa"
<keastl...@novetta.com> wrote:
>When performing a fuzzy search inside a BooleanQuery, it looks like the
>default behavior is to score all fuzzy matches separately and then sum them
>up to get an aggregate score. However, I need it to instead score based on
>the maximum of each distinct match it might find, rather than the sum of
>them, to avoid overly inflated scores in some circumstances.
>
>For example, consider a query for "Bstn~2" and four documents containing
>"Boston", "Basin", "Boston Basin", and "Boston Boston Basin". The query
>might respectively score them as 1, 1, 2, and 3 (or something like that,
>depending on the scorer used, of course). However, I need it to instead
>score them as 1, 1, 1, and 2, since that's the count of just the most
>frequent unique fuzzy match in each document.
>
>Ideally I'd like to use a built in mechanism for achieving this, but if
>it's not available, a way to extend the BooleanQuery, BooleanWeight, and/or
>BooleanScorer classes to have slightly different scoring logic but
>otherwise function exactly the same would also work, but all of those are
>either final classes or have no public constructor, effectively making it
>impossible to reuse their logic directly, as near as I can tell.
>
>If anyone has any ideas of how to approach this, it would be very helpful.
>
>Thanks,
>Kainoa

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
