jenkins-bot has submitted this change and it was merged.

Change subject: Remove support for the all field in morelike
......................................................................

Remove support for the all field in morelike

This was never used in production. It needs another pass to fetch the data.
It also helps to make Keywordfeatures construction more homogeneous.

Change-Id: I055816d8f8ff115fe499dbce64fd38a0e70a4ee5
---
M CirrusSearch.php
M i18n/en.json
M includes/Hooks.php
M includes/Query/MoreLikeFeature.php
M includes/Searcher.php
M tests/browser/features/more_like_this_options.feature
M tests/unit/Query/MoreLikeFeatureTest.php
7 files changed, 4 insertions(+), 73 deletions(-)

Approvals:
  EBernhardson: Looks good to me, approved
  jenkins-bot: Verified

Objections:
  Cindy-the-browser-test-bot: There's a problem with this change, please improve

diff --git a/CirrusSearch.php b/CirrusSearch.php
index a108f29..873f960 100644
--- a/CirrusSearch.php
+++ b/CirrusSearch.php
@@ -514,17 +514,7 @@
 	'auxiliary_text',
 	'opening_text',
 	'headings',
-	'all'
 ];
-
-// When set to false cirrus will use the text content to build the query
-// and search on the field listed in $wgCirrusSearchMoreLikeThisFields
-// Set to true if you want to use field data as input text to build the initial
-// query.
-// Note that if the all field is used then this setting will be forced to true.
-// This is because the all field is not part of the _source and its content cannot
-// be retrieved by elasticsearch.
-$wgCirrusSearchMoreLikeThisUseFields = false;
 
 // This allows redirecting queries to a separate cluster configured
 // in $wgCirrusSearchClusters. Note that queries can use multiple features, in
diff --git a/i18n/en.json b/i18n/en.json
index f84b64f..54bd983 100644
--- a/i18n/en.json
+++ b/i18n/en.json
@@ -22,7 +22,7 @@
 	"apihelp-cirrus-settings-dump-example": "Get a dump of CirrusSearch settings for this wiki.",
 	"apierror-cirrus-requesttoolong": "Prefix search request was longer than the maximum allowed length. ($1 > $2)",
 	"cirrussearch-give-feedback": "Give us your feedback",
-	"cirrussearch-morelikethis-settings": " #<!-- leave this line exactly as it is --> <pre>\n# This message lets you configure the settings of the \"more like this\" feature.\n# Changes to this take effect immediately.\n# The syntax is as follows:\n# * Everything from a \"#\" character to the end of the line is a comment.\n# * Every non-blank line is the setting name followed by a \":\" character followed by the setting value\n# The settings are:\n# * min_doc_freq (integer): Minimum number of documents (per shard) that need a term for it to be considered.\n# * max_doc_freq (integer): Maximum number of documents (per shard) that have a term for it to be considered.\n# High frequency terms are generally \"stop words\".\n# * max_query_terms (integer): Maximum number of terms to be considered. This value is limited to $wgCirrusSearchMoreLikeThisMaxQueryTermsLimit (100).\n# * min_term_freq (integer): Minimum number of times the term appears in the input to doc to be considered. For small fields (title) this value should be 1.\n# * minimum_should_match (percentage -100% to 100%, or integer number of terms): The percentage of terms to match on. Defaults to 30%.\n# * min_word_len (integer): Minimal length of a term to be considered. Defaults to 0.\n# * max_word_len (integer): The maximum word length above which words will be ignored. Defaults to unbounded (0).\n# * fields (comma separated list of values): These are the fields to use. Allowed fields are title, text, auxiliary_text, opening_text, headings and all.\n# * use_fields (true|false) : Tell the \"more like this\" query to use only the field data. Defaults to false: the system will extract the content of the text field to build the query.\n# Examples of good lines:\n# min_doc_freq:2\n# max_doc_freq:20000\n# max_query_terms:25\n# min_term_freq:2\n# minimum_should_match:30%\n# min_word_len:2\n# max_word_len:40\n# fields:text,opening_text\n# use_fields:true\n# </pre> <!-- leave this line exactly as it is -->",
+	"cirrussearch-morelikethis-settings": " #<!-- leave this line exactly as it is --> <pre>\n# This message lets you configure the settings of the \"more like this\" feature.\n# Changes to this take effect immediately.\n# The syntax is as follows:\n# * Everything from a \"#\" character to the end of the line is a comment.\n# * Every non-blank line is the setting name followed by a \":\" character followed by the setting value\n# The settings are:\n# * min_doc_freq (integer): Minimum number of documents (per shard) that need a term for it to be considered.\n# * max_doc_freq (integer): Maximum number of documents (per shard) that have a term for it to be considered.\n# High frequency terms are generally \"stop words\".\n# * max_query_terms (integer): Maximum number of terms to be considered. This value is limited to $wgCirrusSearchMoreLikeThisMaxQueryTermsLimit (100).\n# * min_term_freq (integer): Minimum number of times the term appears in the input to doc to be considered. For small fields (title) this value should be 1.\n# * minimum_should_match (percentage -100% to 100%, or integer number of terms): The percentage of terms to match on. Defaults to 30%.\n# * min_word_len (integer): Minimal length of a term to be considered. Defaults to 0.\n# * max_word_len (integer): The maximum word length above which words will be ignored. Defaults to unbounded (0).\n# * fields (comma separated list of values): These are the fields to use. Allowed fields are title, text, auxiliary_text, opening_text, headings.\n# Examples of good lines:\n# min_doc_freq:2\n# max_doc_freq:20000\n# max_query_terms:25\n# min_term_freq:2\n# minimum_should_match:30%\n# min_word_len:2\n# max_word_len:40\n# fields:text,opening_text\n# </pre> <!-- leave this line exactly as it is -->",
 	"cirrussearch-didyoumean-settings": " #<!-- leave this line exactly as it is --> <pre>\n# This message lets you configure the settings of the \"Did you mean\" suggestions.\n# See also https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html\n# Changes to this take effect immediately.\n# The syntax is as follows:\n# * Everything from a \"#\" character to the end of the line is a comment.\n# * Every non-blank line is the setting name followed by a \":\" character followed by the setting value\n# The settings are :\n# * max_errors (integer): the maximum number of terms that will be considered misspelled in order to be corrected. 1 or 2.\n# * confidence (float): The confidence level defines a factor applied to the input phrases score which is used as a threshold for other suggestion candidates. Only candidates that score higher than the threshold will be included in the result. For instance a confidence level of 1.0 will only return suggestions that score higher than the input phrase. If set to 0.0 the best candidate are returned.\n# * min_doc_freq (float 0 to 1): The minimal threshold in number of documents a suggestion should appear in.\n# High frequency terms are generally \"stop words\".\n# * max_term_freq (float 0 to 1): The maximum threshold in number of documents in which a term can exist in order to be included.\n# * prefix_length (integer): The minimal number of prefix characters that must match a term in order to be a suggestion.\n# * suggest_mode (missing, popular, always): The suggest mode controls the way suggestions are included.\n# Examples of good lines:\n# max_errors:2\n# confidence:2.0\n# max_term_freq:0.5\n# min_doc_freq:0.01\n# prefix_length:2\n# suggest_mode:always\n#\n# </pre> <!-- leave this line exactly as it is -->",
 	"cirrussearch-query-too-long": "Search request is longer than the maximum allowed length. ($1 > $2)",
 	"cirrussearch-completionsuggester-pref": "Completion suggester",
diff --git a/includes/Hooks.php b/includes/Hooks.php
index b7cda74..ad60b0a 100644
--- a/includes/Hooks.php
+++ b/includes/Hooks.php
@@ -204,7 +204,6 @@
 	 */
 	private static function overrideMoreLikeThisOptionsFromMessage() {
 		global $wgCirrusSearchMoreLikeThisConfig,
-			$wgCirrusSearchMoreLikeThisUseFields,
 			$wgCirrusSearchMoreLikeThisAllowedFields,
 			$wgCirrusSearchMoreLikeThisMaxQueryTermsLimit,
 			$wgCirrusSearchMoreLikeThisFields;
@@ -261,13 +260,6 @@
 						array_map( 'trim', explode( ',', $v ) ),
 						$wgCirrusSearchMoreLikeThisAllowedFields );
 					break;
-				case 'use_fields':
-					if ( $v === 'true' ) {
-						$wgCirrusSearchMoreLikeThisUseFields = true;
-					} elseif ( $v === 'false' ) {
-						$wgCirrusSearchMoreLikeThisUseFields = false;
-					}
-					break;
 				}
 				if ( $wgCirrusSearchMoreLikeThisConfig['max_query_terms'] > $wgCirrusSearchMoreLikeThisMaxQueryTermsLimit ) {
 					$wgCirrusSearchMoreLikeThisConfig['max_query_terms'] = $wgCirrusSearchMoreLikeThisMaxQueryTermsLimit;
@@ -303,7 +295,6 @@
 	 */
 	private static function overrideMoreLikeThisOptions( WebRequest $request ) {
 		global $wgCirrusSearchMoreLikeThisConfig,
-			$wgCirrusSearchMoreLikeThisUseFields,
 			$wgCirrusSearchMoreLikeThisAllowedFields,
 			$wgCirrusSearchMoreLikeThisMaxQueryTermsLimit,
 			$wgCirrusSearchMoreLikeThisFields;
@@ -316,7 +307,6 @@
 		self::overrideMinimumShouldMatch( $wgCirrusSearchMoreLikeThisConfig['minimum_should_match'], $request, 'cirrusMltMinimumShouldMatch' );
 		self::overrideNumeric( $wgCirrusSearchMoreLikeThisConfig['min_word_len'], $request, 'cirrusMltMinWordLength' );
 		self::overrideNumeric( $wgCirrusSearchMoreLikeThisConfig['max_word_len'], $request, 'cirrusMltMaxWordLength' );
-		self::overrideYesNo( $wgCirrusSearchMoreLikeThisUseFields, $request, 'cirrusMltUseFields' );
 		$fields = $request->getVal( 'cirrusMltFields' );
 		if( isset( $fields ) ) {
 			$wgCirrusSearchMoreLikeThisFields = array_intersect(
diff --git a/includes/Query/MoreLikeFeature.php b/includes/Query/MoreLikeFeature.php
index 08b2b41..01837b7 100644
--- a/includes/Query/MoreLikeFeature.php
+++ b/includes/Query/MoreLikeFeature.php
@@ -20,19 +20,10 @@
 	private $config;
 
 	/**
-	 * @var callable Callable for fetching page from elasticsearch. See
-	 * Searcher::get.
-	 */
-	private $getCallable;
-
-	/**
 	 * @param SearchConfig $config
-	 * @param callable $getCallable Callable for fetching page from
-	 * elasticsearch. See Searcher::get.
 	 */
-	public function __construct( SearchConfig $config, $getCallable ) {
+	public function __construct( SearchConfig $config ) {
 		$this->config = $config;
-		$this->getCallable = $getCallable;
 	}
 
 	/**
@@ -192,36 +183,10 @@
 		}
 
 		$moreLikeThisFields = $this->config->get( 'CirrusSearchMoreLikeThisFields' );
-		$moreLikeThisUseFields = $this->config->get( 'CirrusSearchMoreLikeThisUseFields' );
 		sort( $moreLikeThisFields );
 		$query = new \Elastica\Query\MoreLikeThis();
 		$query->setParams( $this->config->get( 'CirrusSearchMoreLikeThisConfig' ) );
 		$query->setFields( $moreLikeThisFields );
-
-		// The 'all' field cannot be retrieved from _source
-		// We have to extract the text content before.
-		if ( in_array( 'all', $moreLikeThisFields ) ) {
-			$moreLikeThisUseFields = false;
-		}
-
-		if ( !$moreLikeThisUseFields && $moreLikeThisFields !== ['text'] ) {
-			// Run a first pass to extract the text field content because we
-			// want to compare it against other fields.
-			$text = [];
-			$found = call_user_func( $this->getCallable, $docIds, ['text'] );
-			if ( !$found->isOK() ) {
-				return null;
-			}
-			$found = $found->getValue();
-			if ( !count( $found ) ) {
-				return null;
-			}
-			foreach ( $found as $foundArticle ) {
-				$text[] = $foundArticle->text;
-			}
-			sort( $text, SORT_STRING );
-			$likeDocs = array_merge( $likeDocs, $text );
-		}
 
 		/** @suppress PhanTypeMismatchArgument library is mis-annotated */
 		$query->setLike( $likeDocs );
diff --git a/includes/Searcher.php b/includes/Searcher.php
index c44b4dd..9044304 100644
--- a/includes/Searcher.php
+++ b/includes/Searcher.php
@@ -299,7 +299,7 @@
 			// Handle morelike keyword (greedy). This needs to be the
 			// very first item until combining with other queries
 			// is worked out.
-			new Query\MoreLikeFeature( $this->config, [$this, 'get'] ),
+			new Query\MoreLikeFeature( $this->config ),
 			// Handle title prefix notation (greedy)
 			new Query\PrefixFeature(),
 			// Handle prefer-recent keyword
diff --git a/tests/browser/features/more_like_this_options.feature b/tests/browser/features/more_like_this_options.feature
index f41ab68..d957684 100644
--- a/tests/browser/features/more_like_this_options.feature
+++ b/tests/browser/features/more_like_this_options.feature
@@ -31,11 +31,3 @@
   Scenario: Searching for morelike:<page> with the title field and settings with poor precision
     When I set More Like This Options to title field, word length to 2 and I search for morelike:More Like Me 1
     Then ChangeMe is in the search results
-
-  Scenario: Searching for morelike:<page> with the all field works even if cirrusMtlUseFields is set to yes
-    When I set More Like This Options to all field, word length to 4 and I search for morelike:More Like Me 1
-    Then More Like Me 2 is in the search results
-      And More Like Me 3 is in the search results
-      And More Like Me 4 is in the search results
-      And More Like Me 5 is in the search results
-      But ChangeMe is not in the search results
diff --git a/tests/unit/Query/MoreLikeFeatureTest.php b/tests/unit/Query/MoreLikeFeatureTest.php
index 04ccee0..da9333c 100644
--- a/tests/unit/Query/MoreLikeFeatureTest.php
+++ b/tests/unit/Query/MoreLikeFeatureTest.php
@@ -128,14 +128,8 @@
 
 		$context = new SearchContext( $config );
 
-		// This is only used for the 'all' feature which is currently
-		// untested, and is planned to be removed.
-		$getCallback = function ( array $docIds, array $fields ) {
-			throw new \RuntimeException( 'No requests should be made to elasticsearch' );
-		};
-
 		// Finally run the test
-		$feature = new MoreLikeFeature( $config, $getCallback );
+		$feature = new MoreLikeFeature( $config );
 
 		$result = $feature->apply( $context, $term );
 

-- 
To view, visit https://gerrit.wikimedia.org/r/323860
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I055816d8f8ff115fe499dbce64fd38a0e70a4ee5
Gerrit-PatchSet: 2
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: DCausse <dcau...@wikimedia.org>
Gerrit-Reviewer: Cindy-the-browser-test-bot <bernhardsone...@gmail.com>
Gerrit-Reviewer: EBernhardson <ebernhard...@wikimedia.org>
Gerrit-Reviewer: Gehel <gleder...@wikimedia.org>
Gerrit-Reviewer: Manybubbles <never...@wikimedia.org>
Gerrit-Reviewer: Siebrand <siebr...@kitano.nl>
Gerrit-Reviewer: Smalyshev <smalys...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>