pminkov opened a new pull request, #940: URL: https://github.com/apache/lucene/pull/940
### Description

MoreLikeThis picks terms by their TF-IDF score. The TF part of that score was computed from the raw term frequency, without applying the square root that `ClassicSimilarity.tf()` applies. As a result, how often a term occurs in the input can carry too much weight in whether it is selected as a search term. One negative effect is that more stop words can make their way into the final query.

### Tests

Ran the MoreLikeThis tests with:

```commandline
./gradlew -p lucene/queries test --tests TestMoreLikeThis
```
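To illustrate the effect described in the description, here is a minimal plain-Java sketch that compares a raw-frequency TF-IDF score with a square-root-damped one. It is not the code from this pull request: the class `TfIdfSketch`, its method names, and all the frequencies and document counts are hypothetical. The `tf` and `idf` helpers are written to mirror the formulas of `ClassicSimilarity`, which is an assumption of this sketch rather than a call into Lucene.

```java
// Illustrative sketch only; no Lucene dependency. All names and numbers are hypothetical.
final class TfIdfSketch {

    // Square-root damping of term frequency, as ClassicSimilarity.tf() does.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency, assumed to match ClassicSimilarity.idf():
    // log((docCount + 1) / (docFreq + 1)) + 1.
    static float idf(long docFreq, long docCount) {
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    // Score using the raw term frequency (the behavior the description criticizes).
    static float rawScore(int freq, long docFreq, long docCount) {
        return freq * idf(docFreq, docCount);
    }

    // Score using the damped term frequency.
    static float dampedScore(int freq, long docFreq, long docCount) {
        return tf(freq) * idf(docFreq, docCount);
    }

    public static void main(String[] args) {
        long docCount = 10_000;

        // A made-up stop word: very frequent in the input and in the index.
        int stopFreq = 100;
        long stopDocFreq = 9_000;

        // A made-up topical term: less frequent in the input, rare in the index.
        int rareFreq = 9;
        long rareDocFreq = 50;

        System.out.println("raw    stop word: " + rawScore(stopFreq, stopDocFreq, docCount));
        System.out.println("raw    rare term: " + rawScore(rareFreq, rareDocFreq, docCount));
        System.out.println("damped stop word: " + dampedScore(stopFreq, stopDocFreq, docCount));
        System.out.println("damped rare term: " + dampedScore(rareFreq, rareDocFreq, docCount));
    }
}
```

With these made-up numbers, the stop word outranks the rarer term under raw scoring, while the rarer term outranks the stop word once the square root is applied. Real rankings depend on the actual index statistics and on the other MoreLikeThis settings.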