Manybubbles has uploaded a new change for review. https://gerrit.wikimedia.org/r/98966
Change subject: Make incategory: and intitle: negatable ...................................................................... Make incategory: and intitle: negatable not-incategory: and not-intitle: negate incategory: and not-incategory: respectively. Rework the tests for these filters as well. Move them to their own file because there are a bunch of them. Give them a tag. Break them into a a bunch of scenarios so they each get their own name. Change-Id: Ie3d338a5335e590016112936b106bb11083b265b --- M includes/CirrusSearchSearcher.php M tests/browser/features/full_text.feature A tests/browser/features/full_text_filters.feature M tests/browser/features/support/hooks.rb 4 files changed, 143 insertions(+), 53 deletions(-) git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/CirrusSearch refs/changes/66/98966/1 diff --git a/includes/CirrusSearchSearcher.php b/includes/CirrusSearchSearcher.php index 39ae283..f915796 100644 --- a/includes/CirrusSearchSearcher.php +++ b/includes/CirrusSearchSearcher.php @@ -57,7 +57,14 @@ * @var \Elastica\Query\AbstractQuery|null main query. null defaults to \Elastica\Query\MatchAll */ private $query = null; + /** + * @var array(\Elastica\Filter\AbstractFilter) filters that MUST hold true of all results + */ private $filters = array(); + /** + * @var array(\Elastica\Filter\AbstractFilter) filters that MUST NOT hold true of all results + */ + private $notFilters = array(); private $suggest = null; /** * @var null|array of rescore configuration as used by elasticsearch. The query needs to be an Elastica query. @@ -188,23 +195,35 @@ //Handle other filters wfProfileIn( __METHOD__ . '-other-filters' ); $filters = $this->filters; + $notFilters = $this->notFilters; + // Match filters that look like foobar:thing or foobar:"thing thing" + // The {7,14} keeps this from having horrible performance on big strings $term = preg_replace_callback( - '/(?<key>[^ ]{6,10}):(?<value>(?:"[^"]+")|(?:[^ "]+)) ?/', - function ( $matches ) use ( &$filters ) { + '/(?<key>[a-z\\-]{7,14}):(?<value>(?:"[^"]+")|(?:[^ "]+)) ?/', + function ( $matches ) use ( &$filters, &$notFilters ) { $key = $matches['key']; $value = $matches['value']; // Note that if the user supplied quotes they are not removed + $filterDestination = &$filters; + $keepText = true; switch ( $key ) { + case 'not-incategory': + $filterDestination = &$notFilters; + // Intentional fall through case 'incategory': $match = new \Elastica\Query\Match(); $match->setFieldQuery( 'category', trim( $value, '"' ) ); - $filters[] = new \Elastica\Filter\Query( $match ); + $filterDestination[] = new \Elastica\Filter\Query( $match ); return ''; + case 'not-intitle': + $filterDestination = &$notFilters; + $keepText = false; + // Intentional fall through case 'intitle': - $filters[] = new \Elastica\Filter\Query( new \Elastica\Query\Field( 'title', + $filterDestination[] = new \Elastica\Filter\Query( new \Elastica\Query\Field( 'title', CirrusSearchSearcher::fixupWholeQueryString( CirrusSearchSearcher::fixupQueryStringPart( $value ) ) ) ); - return "$value "; + return $keepText ? "$value " : ''; default: return $matches[0]; } @@ -212,6 +231,7 @@ $term ); $this->filters = $filters; + $this->notFilters = $notFilters; wfProfileOut( __METHOD__ . '-other-filters' ); wfProfileIn( __METHOD__ . '-switch-phrase-queries-to-plain' ); // Match quoted phrases including those containing escaped quotes @@ -444,12 +464,16 @@ // Wrap $this->query in a filtered query if there are filters. $filterCount = count( $this->filters ); - if ( $filterCount > 0 ) { - if ( $filterCount > 1 ) { + $notFilterCount = count( $this->notFilters ); + if ( $filterCount > 0 || $notFilterCount > 0 ) { + if ( $filterCount > 1 || $notFilterCount > 0 ) { $filter = new \Elastica\Filter\Bool(); foreach ( $this->filters as $must ) { $filter->addMust( $must ); } + foreach ( $this->notFilters as $mustNot ) { + $filter->addMustNot( $mustNot ); + } } else { $filter = $this->filters[ 0 ]; } diff --git a/tests/browser/features/full_text.feature b/tests/browser/features/full_text.feature index ce3b2a4..4d67e73 100644 --- a/tests/browser/features/full_text.feature +++ b/tests/browser/features/full_text.feature @@ -13,15 +13,6 @@ | catapult | Catapult is in | in | | | pickles | Two Words is | in | | | rdir | Two Words is | not in | | - | intitle:catapult | Catapult is in | not in | | - | intitle:catapult amazing | Amazing Catapult is | not in | | - | incategory:weaponry | Catapult is in | not in | | - | incategory:weaponry amazing | Amazing Catapult is | not in | | - | incategory:weaponry intitle:catapult | Catapult is in | not in | | - | incategory:alpha incategory:beta | AlphaBeta is | not in | | - | incategory:twowords catapult | Two Words is | in | | - | incategory:twowords intitle:catapult | none is | not in | | - | incategory:templatetagged two words | Two Words is | in | | | talk:catapult | Talk:Two Words is | not in | | | talk:intitle:words | Talk:Two Words is | not in | | | template:pickles | Template:Template Test is | not in | | @@ -39,8 +30,6 @@ | "3.1 Conquest of Persian empire" | none is | not in | | # You can't search for the [edit] tokens that users can click to edit sections | "Succession of Umar edit" | none is | not in | | - | intitle:"" catapult | none is | not in | | - | incategory:"" catapult | none is | not in | | @setup_main Scenario Outline: Searching for empty-string like values @@ -52,10 +41,6 @@ | term | title | | the empty string | Search | | ♙ | Search results | - | intitle: | Search results | - | intitle:"" | Search results | - | incategory: | Search results | - | incategory:"" | Search results | | %{exact: } | Search results | | %{exact: } | Search results | | %{exact: } | Search results | @@ -103,21 +88,6 @@ Scenario: Ignored headings aren't searched so text with the same word is wins When I search for incategory:HeadingsTest References Then HasReferencesInText is the first search result - - @setup_main - Scenario: Searching for a quoted category that doesn't exist finds nothing even though there is a category that matches one of the words - When I search for incategory:"Dontfindme Weaponry" - Then there are no search results - - @setup_main - Scenario: Searching for a single word category doesn't find a two word category that contains that word - When I search for incategory:ASpace - Then there are no search results - - @setup_main - Scenario: Searching for multiword category finds it - When I search for incategory:"CategoryWith ASpace" - Then IHaveATwoWordCategory is the first search result @setup_more_like_this Scenario: Searching for morelike:<page that doesn't exist> returns no results @@ -443,16 +413,6 @@ Scenario: searching with a single wildcard finds expected results When I search for catapul* Then Catapult is the first search result - - @wildcards @setup_main - Scenario: searching for intitle: with a wildcard find expected results - When I search for intitle:catapul* - Then Catapult is the first search result - - @wildcards @setup_main - Scenario: searching for intitle: with a wildcard and a regular wildcard find expected results - When I search for intitle:catapul* amaz* - Then Amazing Catapult is the first search result @wildcards @setup_main Scenario: wildcards match plain matches diff --git a/tests/browser/features/full_text_filters.feature b/tests/browser/features/full_text_filters.feature new file mode 100644 index 0000000..2efdcd9 --- /dev/null +++ b/tests/browser/features/full_text_filters.feature @@ -0,0 +1,98 @@ +Feature: Full text search with filters + Background: + Given I am at a random page + + @filters + Scenario: intitle: only includes pages with the title + When I search for intitle:catapult + Then Catapult is in the search results + And Amazing Catapult is in the search results + But Two Words is not in the search results + + @filters + Scenario: intitle: can be combined with other text + When I search for intitle:catapult amazing + Then Amazing Catapult is the first search result + And Two Words is not in the search results + + @filters + Scenario: not-intitle: excludes pages with part of the title + When I search for not-intitle:amazing intitle:catapult + Then Catapult is the first search result + And Amazing Catapult is not in the search results + + @filters + Scenario: not-intitle: doesn't highlight excluded title + When I search for not-intitle:catapult two words + Then Two Words is the first search result + And ffnonesenseword catapult pickles anotherword is the highlighted text of the first search result + + @wildcards @filters + Scenario: intitle: can take a wildcard + When I search for intitle:catapul* + Then Catapult is the first search result + + @wildcards @setup_main + Scenario: intitle: can take a wildcard and combine it with a regular wildcard + When I search for intitle:catapul* amaz* + Then Amazing Catapult is the first search result + + @filters + Scenario: incategory: only includes pages with the category + When I search for incategory:weaponry + Then Catapult is in the search results + And Amazing Catapult is in the search results + But Two Words is not in the search results + + @filters + Scenario: incategory: can be combined with other text + When I search for incategory:weaponry amazing + Then Amazing Catapult is the first search result + + @filters + Scenario: not-incategory: excludes pages with the category + When I search for not-incategory:weaponry incategory:twowords + Then Two Words is the first search result + + @filters + Scenario: incategory: works on categories from templates + When I search for incategory:templatetagged incategory:twowords + Then Two Words is the first search result + + @filters + Scenario: incategory: when passed a quoted category that doesn't exist finds nothing even though there is a category that matches one of the words + When I search for incategory:"Dontfindme Weaponry" + Then there are no search results + + @filters + Scenario: incategory when passed a single word category doesn't find a two word category that contains that word + When I search for incategory:ASpace + Then there are no search results + + @filters + Scenario: incategory: finds a multiword category when it is surrounded by quotes + When I search for incategory:"CategoryWith ASpace" + Then IHaveATwoWordCategory is the first search result + + @filters + Scenario Outline: Filters can be combined + When I search for <term> + Then <first_result> is the first search result + Examples: + | term | first_result | + | incategory:twowords intitle:catapult | none | + | incategory:twowords intitle:"Two Words" | Two Words | + | incategory:alpha incategory:beta | AlphaBeta | + + @filters + Scenario Outline: Empty filters work like terms but aren't in test data so aren't found + When I search for <term> + Then there are no search results + Examples: + | term | + | intitle:"" catapult | + | incategory:"" catapult | + | intitle: | + | intitle:"" | + | incategory: | + | incategory:"" | diff --git a/tests/browser/features/support/hooks.rb b/tests/browser/features/support/hooks.rb index b2b3684..093422a 100644 --- a/tests/browser/features/support/hooks.rb +++ b/tests/browser/features/support/hooks.rb @@ -1,6 +1,6 @@ # encoding: utf-8 -Before("@setup_main") do +Before('@setup_main, @filters') do if !$setup_main steps %Q{ Given a page named Template:Template Test exists with contents pickles [[Category:TemplateTagged]] @@ -9,18 +9,26 @@ And a page named Catapult exists with contents ♙ asdf [[Category:Weaponry]] And a page named Amazing Catapult exists with contents test [[Catapult]] [[Category:Weaponry]] And a page named Two Words exists with contents ffnonesenseword catapult {{Template_Test}} anotherword [[Category:TwoWords]] - And a page named África exists with contents for testing - And a page named Rdir exists with contents #REDIRECT [[Two Words]] And a page named AlphaBeta exists with contents [[Category:Alpha]] [[Category:Beta]] - And a file named File:Savepage-greyed.png exists with contents Savepage-greyed.png and description Screenshot, for test purposes, associated with https://bugzilla.wikimedia.org/show_bug.cgi?id=52908 . - And a page named IHaveAVideo exists with contents [[File:How to Edit Article in Arabic Wikipedia.ogg|thumb|267x267px]] - And a page named IHaveASound exists with contents [[File:Serenade for Strings -mvt-1- Elgar.ogg]] And a page named IHaveATwoWordCategory exists with contents [[Category:CategoryWith ASpace]] } $setup_main = true end end +Before('@setup_main') do + if !$setup_main2 + steps %Q{ + Given a page named África exists with contents for testing + And a page named Rdir exists with contents #REDIRECT [[Two Words]] + And a file named File:Savepage-greyed.png exists with contents Savepage-greyed.png and description Screenshot, for test purposes, associated with https://bugzilla.wikimedia.org/show_bug.cgi?id=52908 . + And a page named IHaveAVideo exists with contents [[File:How to Edit Article in Arabic Wikipedia.ogg|thumb|267x267px]] + And a page named IHaveASound exists with contents [[File:Serenade for Strings -mvt-1- Elgar.ogg]] + } + $setup_main2 = true + end +end + Before("@setup_weight") do if !$setup_weight steps %Q{ -- To view, visit https://gerrit.wikimedia.org/r/98966 To unsubscribe, visit https://gerrit.wikimedia.org/r/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie3d338a5335e590016112936b106bb11083b265b Gerrit-PatchSet: 1 Gerrit-Project: mediawiki/extensions/CirrusSearch Gerrit-Branch: master Gerrit-Owner: Manybubbles <never...@wikimedia.org> _______________________________________________ MediaWiki-commits mailing list MediaWiki-commits@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits