Manybubbles has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/98966


Change subject: Make incategory: and intitle: negatable
......................................................................

Make incategory: and intitle: negatable

not-incategory: and not-intitle: negate incategory: and not-incategory:
respectively.

Rework the tests for these filters as well.  Move them to their own file
because there are a bunch of them.  Give them a tag.  Break them into a
a bunch of scenarios so they each get their own name.

Change-Id: Ie3d338a5335e590016112936b106bb11083b265b
---
M includes/CirrusSearchSearcher.php
M tests/browser/features/full_text.feature
A tests/browser/features/full_text_filters.feature
M tests/browser/features/support/hooks.rb
4 files changed, 143 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/CirrusSearch 
refs/changes/66/98966/1

diff --git a/includes/CirrusSearchSearcher.php 
b/includes/CirrusSearchSearcher.php
index 39ae283..f915796 100644
--- a/includes/CirrusSearchSearcher.php
+++ b/includes/CirrusSearchSearcher.php
@@ -57,7 +57,14 @@
         * @var \Elastica\Query\AbstractQuery|null main query.  null defaults 
to \Elastica\Query\MatchAll
         */
        private $query = null;
+       /**
+        * @var array(\Elastica\Filter\AbstractFilter) filters that MUST hold 
true of all results
+        */
        private $filters = array();
+       /**
+        * @var array(\Elastica\Filter\AbstractFilter) filters that MUST NOT 
hold true of all results
+        */
+       private $notFilters = array();
        private $suggest = null;
        /**
         * @var null|array of rescore configuration as used by elasticsearch.  
The query needs to be an Elastica query.
@@ -188,23 +195,35 @@
                //Handle other filters
                wfProfileIn( __METHOD__ . '-other-filters' );
                $filters = $this->filters;
+               $notFilters = $this->notFilters;
+               // Match filters that look like foobar:thing or foobar:"thing 
thing"
+               // The {7,14} keeps this from having horrible performance on 
big strings
                $term = preg_replace_callback(
-                       '/(?<key>[^ ]{6,10}):(?<value>(?:"[^"]+")|(?:[^ "]+)) 
?/',
-                       function ( $matches ) use ( &$filters ) {
+                       '/(?<key>[a-z\\-]{7,14}):(?<value>(?:"[^"]+")|(?:[^ 
"]+)) ?/',
+                       function ( $matches ) use ( &$filters, &$notFilters ) {
                                $key = $matches['key'];
                                $value = $matches['value'];  // Note that if 
the user supplied quotes they are not removed
+                               $filterDestination = &$filters;
+                               $keepText = true;
                                switch ( $key ) {
+                                       case 'not-incategory':
+                                               $filterDestination = 
&$notFilters;
+                                               // Intentional fall through
                                        case 'incategory':
                                                $match = new 
\Elastica\Query\Match();
                                                $match->setFieldQuery( 
'category', trim( $value, '"' ) );
-                                               $filters[] = new 
\Elastica\Filter\Query( $match );
+                                               $filterDestination[] = new 
\Elastica\Filter\Query( $match );
                                                return '';
+                                       case 'not-intitle':
+                                               $filterDestination = 
&$notFilters;
+                                               $keepText = false;
+                                               // Intentional fall through
                                        case 'intitle':
-                                               $filters[] = new 
\Elastica\Filter\Query( new \Elastica\Query\Field( 'title',
+                                               $filterDestination[] = new 
\Elastica\Filter\Query( new \Elastica\Query\Field( 'title',
                                                        
CirrusSearchSearcher::fixupWholeQueryString(
                                                                
CirrusSearchSearcher::fixupQueryStringPart( $value )
                                                        ) ) );
-                                               return "$value ";
+                                               return $keepText ? "$value " : 
'';
                                        default:
                                                return $matches[0];
                                }
@@ -212,6 +231,7 @@
                        $term
                );
                $this->filters = $filters;
+               $this->notFilters = $notFilters;
                wfProfileOut( __METHOD__ . '-other-filters' );
                wfProfileIn( __METHOD__ . '-switch-phrase-queries-to-plain' );
                // Match quoted phrases including those containing escaped 
quotes
@@ -444,12 +464,16 @@
 
                // Wrap $this->query in a filtered query if there are filters.
                $filterCount = count( $this->filters );
-               if ( $filterCount > 0 ) {
-                       if ( $filterCount > 1 ) {
+               $notFilterCount = count( $this->notFilters );
+               if ( $filterCount > 0 || $notFilterCount > 0 ) {
+                       if ( $filterCount > 1 || $notFilterCount > 0 ) {
                                $filter = new \Elastica\Filter\Bool();
                                foreach ( $this->filters as $must ) {
                                        $filter->addMust( $must );
                                }
+                               foreach ( $this->notFilters as $mustNot ) {
+                                       $filter->addMustNot( $mustNot );
+                               }
                        } else {
                                $filter = $this->filters[ 0 ];
                        }
diff --git a/tests/browser/features/full_text.feature 
b/tests/browser/features/full_text.feature
index ce3b2a4..4d67e73 100644
--- a/tests/browser/features/full_text.feature
+++ b/tests/browser/features/full_text.feature
@@ -13,15 +13,6 @@
     | catapult                             | Catapult is in                    
| in              |        |
     | pickles                              | Two Words is                      
| in              |        |
     | rdir                                 | Two Words is                      
| not in          |        |
-    | intitle:catapult                     | Catapult is in                    
| not in          |        |
-    | intitle:catapult amazing             | Amazing Catapult is               
| not in          |        |
-    | incategory:weaponry                  | Catapult is in                    
| not in          |        |
-    | incategory:weaponry amazing          | Amazing Catapult is               
| not in          |        |
-    | incategory:weaponry intitle:catapult | Catapult is in                    
| not in          |        |
-    | incategory:alpha incategory:beta     | AlphaBeta is                      
| not in          |        |
-    | incategory:twowords catapult         | Two Words is                      
| in              |        |
-    | incategory:twowords intitle:catapult | none is                           
| not in          |        |
-    | incategory:templatetagged two words  | Two Words is                      
| in              |        |
     | talk:catapult                        | Talk:Two Words is                 
| not in          |        |
     | talk:intitle:words                   | Talk:Two Words is                 
| not in          |        |
     | template:pickles                     | Template:Template Test is         
| not in          |        |
@@ -39,8 +30,6 @@
     | "3.1 Conquest of Persian empire"     | none is                           
| not in          |        |
     # You can't search for the [edit] tokens that users can click to edit 
sections
     | "Succession of Umar edit"            | none is                           
| not in          |        |
-    | intitle:"" catapult                  | none is                           
| not in          |        |
-    | incategory:"" catapult               | none is                           
| not in          |        |
 
   @setup_main
   Scenario Outline: Searching for empty-string like values
@@ -52,10 +41,6 @@
     | term                    | title          |
     | the empty string        | Search         |
     | ♙                       | Search results |
-    | intitle:                | Search results |
-    | intitle:""              | Search results |
-    | incategory:             | Search results |
-    | incategory:""           | Search results |
     | %{exact: }              | Search results |
     | %{exact:      }         | Search results |
     | %{exact:              } | Search results |
@@ -103,21 +88,6 @@
   Scenario: Ignored headings aren't searched so text with the same word is wins
     When I search for incategory:HeadingsTest References
     Then HasReferencesInText is the first search result
-
-  @setup_main
-  Scenario: Searching for a quoted category that doesn't exist finds nothing 
even though there is a category that matches one of the words
-    When I search for incategory:"Dontfindme Weaponry"
-    Then there are no search results
-
-  @setup_main
-  Scenario: Searching for a single word category doesn't find a two word 
category that contains that word
-    When I search for incategory:ASpace
-    Then there are no search results
-
-  @setup_main
-  Scenario: Searching for multiword category finds it
-    When I search for incategory:"CategoryWith ASpace"
-    Then IHaveATwoWordCategory is the first search result
 
   @setup_more_like_this
   Scenario: Searching for morelike:<page that doesn't exist> returns no results
@@ -443,16 +413,6 @@
   Scenario: searching with a single wildcard finds expected results
     When I search for catapul*
     Then Catapult is the first search result
-
-  @wildcards @setup_main
-  Scenario: searching for intitle: with a wildcard find expected results
-    When I search for intitle:catapul*
-    Then Catapult is the first search result
-
-  @wildcards @setup_main
-  Scenario: searching for intitle: with a wildcard and a regular wildcard find 
expected results
-    When I search for intitle:catapul* amaz*
-    Then Amazing Catapult is the first search result
 
   @wildcards @setup_main
   Scenario: wildcards match plain matches
diff --git a/tests/browser/features/full_text_filters.feature 
b/tests/browser/features/full_text_filters.feature
new file mode 100644
index 0000000..2efdcd9
--- /dev/null
+++ b/tests/browser/features/full_text_filters.feature
@@ -0,0 +1,98 @@
+Feature: Full text search with filters
+  Background:
+    Given I am at a random page
+
+  @filters
+  Scenario: intitle: only includes pages with the title
+    When I search for intitle:catapult
+    Then Catapult is in the search results
+    And Amazing Catapult is in the search results
+    But Two Words is not in the search results
+
+  @filters
+  Scenario: intitle: can be combined with other text
+    When I search for intitle:catapult amazing
+    Then Amazing Catapult is the first search result
+    And Two Words is not in the search results
+
+  @filters
+  Scenario: not-intitle: excludes pages with part of the title
+    When I search for not-intitle:amazing intitle:catapult
+    Then Catapult is the first search result
+    And Amazing Catapult is not in the search results
+
+  @filters
+  Scenario: not-intitle: doesn't highlight excluded title
+    When I search for not-intitle:catapult two words
+    Then Two Words is the first search result
+    And ffnonesenseword catapult pickles anotherword is the highlighted text 
of the first search result
+
+  @wildcards @filters
+  Scenario: intitle: can take a wildcard
+    When I search for intitle:catapul*
+    Then Catapult is the first search result
+
+  @wildcards @setup_main
+  Scenario: intitle: can take a wildcard and combine it with a regular wildcard
+    When I search for intitle:catapul* amaz*
+    Then Amazing Catapult is the first search result
+
+  @filters
+  Scenario: incategory: only includes pages with the category
+    When I search for incategory:weaponry
+    Then Catapult is in the search results
+    And Amazing Catapult is in the search results
+    But Two Words is not in the search results
+
+  @filters
+  Scenario: incategory: can be combined with other text
+    When I search for incategory:weaponry amazing
+    Then Amazing Catapult is the first search result
+
+  @filters
+  Scenario: not-incategory: excludes pages with the category
+    When I search for not-incategory:weaponry incategory:twowords
+    Then Two Words is the first search result
+
+  @filters
+  Scenario: incategory: works on categories from templates
+    When I search for incategory:templatetagged incategory:twowords
+    Then Two Words is the first search result
+
+  @filters
+  Scenario: incategory: when passed a quoted category that doesn't exist finds 
nothing even though there is a category that matches one of the words
+    When I search for incategory:"Dontfindme Weaponry"
+    Then there are no search results
+
+  @filters
+  Scenario: incategory when passed a single word category doesn't find a two 
word category that contains that word
+    When I search for incategory:ASpace
+    Then there are no search results
+
+  @filters
+  Scenario: incategory: finds a multiword category when it is surrounded by 
quotes
+    When I search for incategory:"CategoryWith ASpace"
+    Then IHaveATwoWordCategory is the first search result
+
+  @filters
+  Scenario Outline: Filters can be combined
+    When I search for <term>
+    Then <first_result> is the first search result
+  Examples:
+    |                  term                   | first_result |
+    | incategory:twowords intitle:catapult    | none         |
+    | incategory:twowords intitle:"Two Words" | Two Words    |
+    | incategory:alpha incategory:beta        | AlphaBeta    |
+
+  @filters
+  Scenario Outline: Empty filters work like terms but aren't in test data so 
aren't found
+    When I search for <term>
+    Then there are no search results
+  Examples:
+    |         term           |
+    | intitle:"" catapult    |
+    | incategory:"" catapult |
+    | intitle:               |
+    | intitle:""             |
+    | incategory:            |
+    | incategory:""          |
diff --git a/tests/browser/features/support/hooks.rb 
b/tests/browser/features/support/hooks.rb
index b2b3684..093422a 100644
--- a/tests/browser/features/support/hooks.rb
+++ b/tests/browser/features/support/hooks.rb
@@ -1,6 +1,6 @@
 # encoding: utf-8
 
-Before("@setup_main") do
+Before('@setup_main, @filters') do
   if !$setup_main
     steps %Q{
       Given a page named Template:Template Test exists with contents pickles 
[[Category:TemplateTagged]]
@@ -9,18 +9,26 @@
       And a page named Catapult exists with contents ♙ asdf 
[[Category:Weaponry]]
       And a page named Amazing Catapult exists with contents test [[Catapult]] 
[[Category:Weaponry]]
       And a page named Two Words exists with contents ffnonesenseword catapult 
{{Template_Test}} anotherword [[Category:TwoWords]]
-      And a page named África exists with contents for testing
-      And a page named Rdir exists with contents #REDIRECT [[Two Words]]
       And a page named AlphaBeta exists with contents [[Category:Alpha]] 
[[Category:Beta]]
-      And a file named File:Savepage-greyed.png exists with contents 
Savepage-greyed.png and description Screenshot, for test purposes, associated 
with https://bugzilla.wikimedia.org/show_bug.cgi?id=52908 .
-      And a page named IHaveAVideo exists with contents [[File:How to Edit 
Article in Arabic Wikipedia.ogg|thumb|267x267px]]
-      And a page named IHaveASound exists with contents [[File:Serenade for 
Strings -mvt-1- Elgar.ogg]]
       And a page named IHaveATwoWordCategory exists with contents 
[[Category:CategoryWith ASpace]]
     }
     $setup_main = true
   end
 end
 
+Before('@setup_main') do
+  if !$setup_main2
+    steps %Q{
+      Given a page named África exists with contents for testing
+      And a page named Rdir exists with contents #REDIRECT [[Two Words]]
+      And a file named File:Savepage-greyed.png exists with contents 
Savepage-greyed.png and description Screenshot, for test purposes, associated 
with https://bugzilla.wikimedia.org/show_bug.cgi?id=52908 .
+      And a page named IHaveAVideo exists with contents [[File:How to Edit 
Article in Arabic Wikipedia.ogg|thumb|267x267px]]
+      And a page named IHaveASound exists with contents [[File:Serenade for 
Strings -mvt-1- Elgar.ogg]]
+    }
+    $setup_main2 = true
+  end
+end
+
 Before("@setup_weight") do
   if !$setup_weight
     steps %Q{

-- 
To view, visit https://gerrit.wikimedia.org/r/98966
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie3d338a5335e590016112936b106bb11083b265b
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Manybubbles <never...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits

Reply via email to