[ 
https://issues.apache.org/jira/browse/LUCENE-7498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974452#comment-15974452
 ] 

Robert Muir commented on LUCENE-7498:
-------------------------------------

Because this class is doing query expansion, I think before changing its 
algorithm, it should be measured with a relevance test. There is a lot more 
going on here than "switching from tf/idf to bm25", e.g. length normalization 
was never involved here in any way before.
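
To illustrate the length normalization point, here is a minimal sketch (not 
the MLT code and not the patch) comparing a classic-style tf * idf term weight 
with a textbook BM25 weight. The class name, method names, and sample numbers 
are hypothetical, and the formulas are the commonly cited ones, assumed rather 
than copied from Lucene's classes.

```java
public class TermWeightSketch {

    // Classic-style weight: raw term frequency times idf.
    // Document length does not appear anywhere in this formula.
    static double tfIdf(int tf, long docFreq, long numDocs) {
        double idf = Math.log((numDocs + 1.0) / (docFreq + 1.0)) + 1.0;
        return tf * idf;
    }

    // Textbook BM25 weight: tf saturates (controlled by k1) and is
    // normalized by document length relative to the average (controlled by b).
    static double bm25(int tf, long docFreq, long numDocs,
                       double docLen, double avgDocLen,
                       double k1, double b) {
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        double lengthNorm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * (tf * (k1 + 1.0)) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        // Hypothetical statistics: same term, same tf, two document lengths.
        int tf = 3;
        long docFreq = 10;
        long numDocs = 1000;
        double avgDocLen = 100.0;

        System.out.println("tf*idf (any length): " + tfIdf(tf, docFreq, numDocs));
        System.out.println("BM25, short doc:     "
                + bm25(tf, docFreq, numDocs, 50.0, avgDocLen, 1.2, 0.75));
        System.out.println("BM25, long doc:      "
                + bm25(tf, docFreq, numDocs, 400.0, avgDocLen, 1.2, 0.75));
    }
}
```

With the same term statistics, the tf * idf weight is identical for both 
documents, while the BM25 weight is lower for the longer one. So term 
selection could change for reasons unrelated to idf, which is why a relevance 
measurement seems warranted.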

Separately, this change makes things in the guts of the core Lucene scoring 
system public that IMO should remain private. It also adds new public methods 
to these classes just for the purpose of MoreLikeThis. I think the formulas 
used here are usually different beasts than core scoring systems, and we don't 
need to tie them together at all. E.g. MLT is doing something different than 
TFIDFSimilarity today: it's a different formula.

Finally, I think it needs to be split up. The change is thousands of lines, 
with a ton of stuff happening at once that should maybe be considered 
separately:
* MLT internals broken into many, many classes
* Changing the default scoring algorithm to BM25
* Modification of core scoring systems to allow MLT to interact with them in a 
different way
* Changes to Solr cloud classes, classification module, etc. (maybe because of 
API changes).
 


> More Like This to Use BM25
> --------------------------
>
>                 Key: LUCENE-7498
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7498
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/other
>            Reporter: Alessandro Benedetti
>            Assignee: Tommaso Teofili
>
> BM25 is now the default similarity, but More Like This is still using the 
> old TF/IDF.
>  
> This issue is to move to BM25 and refactor the MLT to be more organised, 
> extensible and maintainable.
> A few extensions will follow later, but the focus of this issue will be:
> - BM25
> - code refactor + tests


