[
https://issues.apache.org/jira/browse/LUCENE-7498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972833#comment-15972833
]
ASF GitHub Bot commented on LUCENE-7498:
----------------------------------------
GitHub user alessandrobenedetti opened a pull request:
https://github.com/apache/lucene-solr/pull/191
Lucene-7498
This pull request relates to JIRA issue LUCENE-7498.
It introduces a major refactor of the More Like This module
and introduces BM25 similarity.
It is not meant to be a final patch, but to lay the groundwork for a
substantial improvement of the More Like This code base.
Any feedback is welcome.
**Summary**
MoreLikeThis becomes a facade that only exposes the main More Like This
functionality.
Responsibilities are now split across:
- Interesting terms retriever (from a docId in the index or from a Lucene
Document passed as input)
- Scorer (to identify how interesting a term is: BM25 and TF-IDF are
supported)
- MLT query builder (to build the query from the interesting terms)
Every component is specifically tested.
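To illustrate the scorer's role, here is a minimal, self-contained Java sketch that contrasts a textbook TF-IDF term weight with a textbook BM25 term weight. It is not the code in this pull request and does not use any Lucene classes: the class name `TermScoringSketch`, the method names, and the sample statistics are all hypothetical, and `k1 = 1.2`, `b = 0.75` are assumed here as the commonly cited BM25 defaults.

```java
// Hypothetical sketch, not taken from the patch: compares two ways of
// deciding how "interesting" a term is, given simple collection statistics.
final class TermScoringSketch {

    // Textbook TF-IDF: raw term frequency times a smoothed inverse document frequency.
    static double tfIdf(int tf, long docFreq, long numDocs) {
        double idf = 1.0 + Math.log((numDocs + 1.0) / (docFreq + 1.0));
        return tf * idf;
    }

    // Textbook BM25: the term-frequency part saturates, and the document
    // length is normalised against the average document length.
    static double bm25(int tf, long docFreq, long numDocs,
                       double docLen, double avgDocLen, double k1, double b) {
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        double lengthNorm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * (tf * (k1 + 1.0)) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        long numDocs = 1000;   // assumed collection size
        long docFreq = 10;     // assumed number of documents containing the term
        double k1 = 1.2;       // assumed BM25 saturation parameter
        double b = 0.75;       // assumed BM25 length-normalisation parameter
        for (int tf : new int[] {1, 5, 25}) {
            System.out.printf("tf=%d tfidf=%.3f bm25=%.3f%n",
                    tf,
                    tfIdf(tf, docFreq, numDocs),
                    bm25(tf, docFreq, numDocs, 100.0, 100.0, k1, b));
        }
    }
}
```

In this sketch the BM25 weight flattens as the term frequency grows (it is bounded by `idf * (k1 + 1)`), whereas the raw TF-IDF weight keeps growing linearly, which is one reason the choice of scorer can change which terms are considered interesting.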
As a side effect, the change affects:
**Classification**
KNN classifiers now use the refactored More Like This
The KNN query in Lucene will be slightly different
**Single Solr Instance**
Solr's use of MLT has been refactored
**SolrCloud**
SolrCloud's use of MLT has been refactored
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/alessandrobenedetti/lucene-solr lucene-7498
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/lucene-solr/pull/191.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #191
----
commit 5c2648aff8258472105fd1e85df806f4871d8c98
Author: Alessandro Benedetti <[email protected]>
Date: 2017-02-06T22:48:41Z
[LUCENE-7498] initial patch
commit 562fb48acfe3cbf5df62c3818b89ab7904aa52a9
Author: Alessandro Benedetti <[email protected]>
Date: 2017-02-06T23:09:57Z
[LUCENE-7498] minor fix in field names with boost analysis
commit 061ca863a9f2fadd0ba996c9041cc720128a127b
Author: Alessandro Benedetti <[email protected]>
Date: 2017-02-06T23:32:56Z
[LUCENE-7498] original test was not correct, fixed
----
> More Like This to Use BM25
> --------------------------
>
> Key: LUCENE-7498
> URL: https://issues.apache.org/jira/browse/LUCENE-7498
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/other
> Reporter: Alessandro Benedetti
>
> BM25 is now the default similarity, but More Like This still uses the
> old TF/IDF.
>
> This issue is to move to BM25 and refactor the MLT to be more organised,
> extensible and maintainable.
> A few extensions will follow later, but the focus of this issue will be:
> - BM25
> - code refactor + tests