Manybubbles has uploaded a new change for review.
https://gerrit.wikimedia.org/r/118763
Change subject: Lots of small fixes
......................................................................
Lots of small fixes
1. Fix Bug: 62625 - exclamation points were finding results but Elasticsearch
was whining about them. Now we check for the whining in integration tests.
2. Fix Bug: 62626 - if a search term started with an asterisk, had some non-
asterisk characters, then contained another asterisk then the escaping that we
did for Elasticsearch wasn't working. This just removes the first asterisk.
3. There was an error in the store norms code that was causing it to not
work quite right - add some more parrens to fix it. This is part of the commit
because it came up when testing the rest.
Change-Id: Idc120275172b339d4c3fe55ffc72c6e91ca648c4
---
M includes/MappingConfigBuilder.php
M includes/Searcher.php
M maintenance/updateOneSearchIndexConfig.php
M tests/browser/features/bad_syntax.feature
M tests/browser/features/step_definitions/search_steps.rb
M tests/browser/features/support/pages/search_results_page.rb
6 files changed, 56 insertions(+), 26 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/CirrusSearch
refs/changes/63/118763/1
diff --git a/includes/MappingConfigBuilder.php
b/includes/MappingConfigBuilder.php
index 94cbe89..3751a08 100644
--- a/includes/MappingConfigBuilder.php
+++ b/includes/MappingConfigBuilder.php
@@ -157,7 +157,7 @@
),
)
);
- $disableNorms = $options & MappingConfigBuilder::ENABLE_NORMS
=== 0;
+ $disableNorms = ( $options & MappingConfigBuilder::ENABLE_NORMS
) === 0;
if ( $disableNorms ) {
$disableNorms = array( 'norms' => array( 'enabled' =>
false ) );
$field = array_merge( $field, $disableNorms );
diff --git a/includes/Searcher.php b/includes/Searcher.php
index 25a86e8..e464bf4 100644
--- a/includes/Searcher.php
+++ b/includes/Searcher.php
@@ -423,7 +423,7 @@
// Find prefix matches and force them to only match against the
plain analyzed fields. This
// prevents prefix matches from getting confused by stemming.
Users really don't expect stemming
// in prefix queries.
- $query = self::replaceAllPartsOfQuery( $query,
'/\w*\*(?:\w*\*?)*/',
+ $query = self::replaceAllPartsOfQuery( $query,
'/\w+\*(?:\w*\*?)*/',
function ( $matches ) use ( $searcher ) {
$term = $searcher->fixupQueryStringPart(
$matches[ 0 ][ 0 ] );
return array( 'escaped' =>
$searcher->switchSearchToExact( $term ) );
@@ -983,10 +983,9 @@
* Make sure the the query string part is well formed by escaping some
syntax that we don't
* want users to get direct access to and making sure quotes are
balanced.
* These special characters _aren't_ escaped:
- * *: Do a prefix or postfix search against the stemmed text which
isn't strictly a good
+ * * and ?: Do a wildcard search against the stemmed text which isn't
strictly a good
* idea but this is so rarely used that adding extra code to flip
prefix searches into
- * real prefix searches isn't really worth it. The same goes for
postfix searches but
- * doubly because we don't have a postfix index (backwards ngram.)
+ * real prefix searches isn't really worth it.
* ~: Do a fuzzy match against the stemmed text which isn't strictly a
good idea but it
* gets the job done and fuzzy matches are a really rarely used feature
to be creating an
* extra index for.
@@ -1051,9 +1050,8 @@
$string = preg_replace_callback( '/(?<![\w"])~/',
'CirrusSearch\Searcher::escapeBadSyntax', $string );
- // Escape ? and * that don't follow a term. These are slow so
we turned them off.
- $string = preg_replace_callback( '/(?<![\w])[?*]/',
- 'CirrusSearch\Searcher::escapeBadSyntax', $string );
+ // Remove ? and * that don't follow a term. These are slow so
we turned them off and escaping isn't working....
+ $string = preg_replace( '/(?<![\w])([?*])/', '', $string );
// Reduce token ranges to bare tokens without the < or >
$string = preg_replace( '/(?:<|>)([^\s])/', '$1', $string );
@@ -1087,7 +1085,7 @@
// acceptable use is "foo -bar" and "-bar foo".
$string = preg_replace_callback( '/[+\-!]+(?!\w)/',
'CirrusSearch\Searcher::escapeBadSyntax', $string );
- $string = preg_replace_callback( '/(?<!^| )[+\-!]+/',
+ $string = preg_replace_callback( '/(?<!^|[ \\\\])[+\-!]+/',
'CirrusSearch\Searcher::escapeBadSyntax', $string );
// Escape || when not between terms
diff --git a/maintenance/updateOneSearchIndexConfig.php
b/maintenance/updateOneSearchIndexConfig.php
index 3e4772f..ae8fb18 100644
--- a/maintenance/updateOneSearchIndexConfig.php
+++ b/maintenance/updateOneSearchIndexConfig.php
@@ -226,7 +226,7 @@
$this->output( "Not supported!\n" );
$this->error( "Only Elasticsearch 1.x is supported.
Your version: $result.", 1 );
} else {
- $this->output( "OK\n" );
+ $this->output( "ok\n" );
}
}
diff --git a/tests/browser/features/bad_syntax.feature
b/tests/browser/features/bad_syntax.feature
index 2112016..61859b4 100644
--- a/tests/browser/features/bad_syntax.feature
+++ b/tests/browser/features/bad_syntax.feature
@@ -6,25 +6,29 @@
@bad_syntax @setup_main
Scenario: Searching for <text>~<text> treats the tilde like a space (finding
a result if the term is correct)
When I search for ffnonesenseword~pickles
- Then Two Words is the first search result
+ Then there is no warning
+ And Two Words is the first search result
And there is a link to create a new page from the search result
@bad_syntax @setup_main
Scenario: Searching for <text>~<text> treats the tilde like a space (not
finding any results if a fuzzy search was needed)
When I search for ffnonesensewor~pickles
- Then there are no search results
+ Then there is no warning
+ And there are no search results
And there is a link to create a new page from the search result
@bad_syntax @exact_quotes @setup_main
Scenario: Searching for "<word> <word>"~<not a numer> treats the ~ as a space
When I search for "ffnonesenseword catapult"~anotherword
- Then Two Words is the first search result
+ Then there is no warning
+ And Two Words is the first search result
And there is no link to create a new page from the search result
@bad_syntax @balance_quotes
Scenario Outline: Searching for for a phrase with a hanging quote adds the
quote automatically
When I search for <term>
- Then Two Words is the first search result
+ Then there is no warning
+ And Two Words is the first search result
And there is no link to create a new page from the search result
Examples:
| term |
@@ -37,7 +41,8 @@
Scenario Outline: Searching for a phrase containing /, :, and \" find the
page as expected
Given a page named <title> exists
When I search for <term>
- Then <title> is the first search result
+ Then there is no warning
+ And <title> is the first search result
And there is no link to create a new page from the search result
Examples:
| term |
title |
@@ -50,7 +55,8 @@
@bad_syntax @boolean_operators
Scenario Outline: boolean operators in bad positions in the query are
ignored so you get the option to create a new page
When I search for <query>
- Then Catapult is in the first search result
+ Then there is no warning
+ And Catapult is in the first search result
And there is a link to create a new page from the search result
Examples:
| query |
@@ -82,11 +88,14 @@
| catapult ~/ |
| catapult ~/ |
| catapult~◆~catapult |
+ | ******* catapult |
+
@bad_syntax @boolean_operators
Scenario Outline: boolean operators in bad positions in the query are
ignored but if there are other valid operators then you don't get the option to
create a new page
When I search for <query>
- Then Catapult is in the first search result
+ Then there is no warning
+ And Catapult is in the first search result
And there is no link to create a new page from the search result
Examples:
| query |
@@ -98,11 +107,15 @@
| T:8~=~¥9:77:7:57;7;76;6346- OR catapult |
| catapult OR T:8~=~¥9:77:7:57;7;76;6346- |
| --- AND catapult |
+ | *catapult* |
+ | ***catapult* |
+ | ****** catapult* |
@bad_syntax @boolean_operators
Scenario Outline: boolean operators in bad positions in the query are
ignored and if the title isn't a valid article title then you don't get the
option to create a new page
When I search for <query>
- Then Catapult is in the first search result
+ Then there is no warning
+ And Catapult is in the first search result
And there is no link to create a new page from the search result
Examples:
| query |
@@ -116,12 +129,14 @@
@bad_syntax
Scenario: searching for NOT something will not crash (technically it should
bring up the most linked document, but this isn't worth checking)
When I search for NOT catapult
- Then there is a search result
+ Then there is no warning
+ And there is a search result
And there is no link to create a new page from the search result
Scenario Outline: searching for less than and greater than doesn't find tons
and tons of tokens
When I search for <query>
- Then there are no search results
+ Then there is no warning
+ And there are no search results
And there is no link to create a new page from the search result
Examples:
| query |
@@ -137,7 +152,8 @@
@bad_syntax @filters
Scenario Outline: Empty filters work like terms but aren't in test data so
aren't found
When I search for <term>
- Then there are no search results
+ Then there is no warning
+ And there are no search results
Examples:
| term |
| intitle:"" catapult |
@@ -152,9 +168,10 @@
@wildcard
Scenario Outline: Wildcards can't start a term but they aren't valid titles
so you still don't get the link to create an article
When I search for <wildcard>ickle
- Then there are no search results
- And there is <link> to create a new page from the search result
+ Then there is no warning
+ And there are no search results
+ And there is a link to create a new page from the search result
Examples:
- | wildcard | link |
- | * | no link |
- | ? | a link |
+ | wildcard |
+ | * |
+ | ? |
diff --git a/tests/browser/features/step_definitions/search_steps.rb
b/tests/browser/features/step_definitions/search_steps.rb
index 37af0f4..d9aba09 100644
--- a/tests/browser/features/step_definitions/search_steps.rb
+++ b/tests/browser/features/step_definitions/search_steps.rb
@@ -212,6 +212,10 @@
end
end
+Then(/^there is no warning$/) do
+ on(SearchResultsPage).warning.should == ''
+end
+
def within(seconds)
end_time = Time.new + Integer(seconds)
begin
diff --git a/tests/browser/features/support/pages/search_results_page.rb
b/tests/browser/features/support/pages/search_results_page.rb
index d7a6840..9702689 100644
--- a/tests/browser/features/support/pages/search_results_page.rb
+++ b/tests/browser/features/support/pages/search_results_page.rb
@@ -29,6 +29,7 @@
div(:suggestion_wrapper, class: "searchdidyoumean")
div(:error_report, class: "error")
paragraph(:create_page, :class => "mw-search-createlink")
+
def suggestion_element
suggestion_wrapper_element.link_element
end
@@ -54,6 +55,16 @@
get_highlighted_text(first_result_alttitle_wrapper_element)
end
end
+ # Note that this is really only useful if Warnings are being echod to the
page. In testing environments they usually are.
+ def warning
+ text = @browser.text
+ if text.start_with?("Warning: ") then
+ return text.slice("Warning: ".length, text.index("\n"))
+ else
+ return ""
+ end
+ end
+
private
def get_highlighted_text(element)
--
To view, visit https://gerrit.wikimedia.org/r/118763
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idc120275172b339d4c3fe55ffc72c6e91ca648c4
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Manybubbles <[email protected]>
_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits