[MediaWiki-commits] [Gerrit] Shrink the weighted all field's positions - change (mediawiki...CirrusSearch)

jenkins-bot (Code Review) Mon, 08 Sep 2014 14:32:10 -0700

jenkins-bot has submitted this change and it was merged.

Change subject: Shrink the weighted all field's positions
......................................................................



Shrink the weighted all field's positions

We need to store position information for the weighted all fields, and
right now we store position information for the entire article text
twice, which is big.  This change stores it once, but at a cost: file
text and aux text are now worth the same as article text.  We still
highlight it later, which is good.

The disk size reduction is pretty good, but where this _should_ really
shine is in using less disk IO.  The reduced disk IO should be worth the
loss of precision.
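
A minimal Ruby sketch of the trade-off described above. The weight values
are taken from the CirrusSearch.php hunk of this change. The rule that a
field is copied into the all field once per unit of weight, rounded up, is
an assumption made purely for illustration (this message does not state how
copies are derived), and `copies_in_all_field` is a hypothetical helper.

```ruby
# Weights before and after this change (from the CirrusSearch.php hunk).
OLD_WEIGHTS = {
  'title' => 40, 'redirect' => 30, 'category' => 16, 'heading' => 10,
  'opening_text' => 6, 'text' => 2, 'auxiliary_text' => 1, 'file_text' => 1
}
NEW_WEIGHTS = {
  'title' => 20, 'redirect' => 15, 'category' => 8, 'heading' => 5,
  'opening_text' => 3, 'text' => 1, 'auxiliary_text' => 0.5, 'file_text' => 0.5
}

# ASSUMPTION (not confirmed by the source): a field is stored in the all
# field ceil(weight) times, so a fractional weight still yields one copy.
def copies_in_all_field(weights)
  weights.map { |field, weight| [field, weight.ceil] }.to_h
end

old_copies = copies_in_all_field(OLD_WEIGHTS)
new_copies = copies_in_all_field(NEW_WEIGHTS)

puts "text copies:     #{old_copies['text']} -> #{new_copies['text']}"
puts "aux text copies: #{old_copies['auxiliary_text']} -> #{new_copies['auxiliary_text']}"
```

Under that assumption the article text goes from two copies to one, while
aux text and file text stay at one copy each, which would match the stated
cost of those fields being worth the same as article text.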

Change-Id: I84612b829b00aa4790d9c53ce44dc944ea67ea1f
---
M CirrusSearch.php
M tests/browser/features/full_text.feature
M tests/browser/features/relevancy.feature
M tests/browser/features/step_definitions/search_steps.rb
M tests/browser/features/support/hooks.rb
5 files changed, 28 insertions(+), 20 deletions(-)

Approvals:
  Chad: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/CirrusSearch.php b/CirrusSearch.php
index 243bb0f..c8b7c11 100644
--- a/CirrusSearch.php
+++ b/CirrusSearch.php
@@ -213,14 +213,14 @@
 // is false this can be changed on the fly.  If it is true then changes to this require
 // an in place reindex to take effect.
 $wgCirrusSearchWeights = array(
-       'title' => 40,
-       'redirect' => 30,
-       'category' => 16,
-       'heading' => 10,
-       'opening_text' => 6,
-       'text' => 2,
-       'auxiliary_text' => 1,
-       'file_text' => 1,
+       'title' => 20,
+       'redirect' => 15,
+       'category' => 8,
+       'heading' => 5,
+       'opening_text' => 3,
+       'text' => 1,
+       'auxiliary_text' => 0.5,
+       'file_text' => 0.5,
 );
 
 // Enable building and using of "all" fields that contain multiple copies of other fields
diff --git a/tests/browser/features/full_text.feature b/tests/browser/features/full_text.feature
index 2a6e7e6..5920cdc 100644
--- a/tests/browser/features/full_text.feature
+++ b/tests/browser/features/full_text.feature
@@ -74,7 +74,7 @@
   @setup_phrase_rescore
   Scenario: Searching for an unquoted phrase finds the phrase first
     When I search for Rescore Test Words
-    Then Rescore Test Words is the first search result
+    Then Rescore Test Words Chaff is the first search result
 
   @setup_phrase_rescore
   Scenario: Searching for a quoted phrase finds higher scored matches before the whole query interpreted as a phrase
diff --git a/tests/browser/features/relevancy.feature b/tests/browser/features/relevancy.feature
index 29943d3..c0b9e74 100644
--- a/tests/browser/features/relevancy.feature
+++ b/tests/browser/features/relevancy.feature
@@ -11,8 +11,8 @@
     And Relevancytestviacategory is the third search result
     And Relevancytestviaheading is the fourth search result
     And Relevancytestviaopening is the fifth search result
-    And Relevancytestviatext is the sixth search result
-    And Relevancytestviaauxtext is the seventh search result
+    And Relevancytestviatext is the sixth or seventh search result
+    And Relevancytestviaauxtext is the sixth or seventh search result
 
   Scenario: Results are sorted based on what part of the page matches: title, redirect, category, etc
     When I search for "Relevancytestphrase phrase"
@@ -22,8 +22,8 @@
     And Relevancytestphraseviacategory is the third search result
     And Relevancytestphraseviaheading is the fourth search result
     And Relevancytestphraseviaopening is the fifth search result
-    And Relevancytestphraseviatext is the sixth search result
-    And Relevancytestphraseviaauxtext is the seventh search result
+    And Relevancytestphraseviatext is the sixth or seventh search result
+    And Relevancytestphraseviaauxtext is the sixth or seventh search result
 
   Scenario: Words in order are worth more then words out of order
     When I search for Relevancytwo Wordtest
diff --git a/tests/browser/features/step_definitions/search_steps.rb b/tests/browser/features/step_definitions/search_steps.rb
index 3fb6c6f..6374737 100644
--- a/tests/browser/features/step_definitions/search_steps.rb
+++ b/tests/browser/features/step_definitions/search_steps.rb
@@ -115,13 +115,21 @@
 Then(/^there is a search result$/) do
   on(SearchResultsPage).first_result_element.should exist
 end
-Then(/^(.+) is( in)? the ([^ ]+) search result$/) do |title, in_ok, index|
+Then(/^(.+) is( in)? the ((?:[^ ])+(?: or (?:[^ ])+)*) search result$/) do |title, in_ok, indexes|
   on(SearchResultsPage) do |page|
-    check_search_result(
-      page.send("#{index}_result_wrapper_element"),
-      page.send("#{index}_result_element"),
-      title,
-      in_ok)
+    found = indexes.split(/ or /).any? { |index|
+      begin
+        check_search_result(
+          page.send("#{index}_result_wrapper_element"),
+          page.send("#{index}_result_element"),
+          title,
+          in_ok)
+        true
+      rescue
+        false
+      end
+    }
+    found.should == true
   end
 end
 Then(/^(.*) is( in)? the first search imageresult$/) do |title, in_ok|
diff --git a/tests/browser/features/support/hooks.rb b/tests/browser/features/support/hooks.rb
index b91b251..e8ca05f 100644
--- a/tests/browser/features/support/hooks.rb
+++ b/tests/browser/features/support/hooks.rb
@@ -211,7 +211,7 @@
 Before("@setup_phrase_rescore") do
   unless phrase_rescore
     steps %(
-      Given a page named Rescore Test Words exists
+      Given a page named Rescore Test Words Chaff exists
       And a page named Test Words Rescore Rescore Test Words exists
       And a page named Rescore Test TextContent exists with contents Chaff
       And a page named Rescore Test HasTextContent exists with contents Rescore Test TextContent

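The new step definition in search_steps.rb accepts several positions joined
by " or " and passes if any one of them matches. The stand-alone Ruby sketch
below shows how that pattern splits a step into its parts. The regexp is
copied from the diff; `parse_step`, `any_index_passes?` and the `RESULTS`
table are hypothetical names introduced for illustration and are not part
of the change.

```ruby
# Regexp copied from the updated step definition in search_steps.rb.
STEP_PATTERN = /^(.+) is( in)? the ((?:[^ ])+(?: or (?:[^ ])+)*) search result$/

# Hypothetical helper: split a step into title, "in" flag, and positions.
def parse_step(text)
  match = STEP_PATTERN.match(text)
  return nil unless match
  {
    title: match[1],
    in_ok: !match[2].nil?,
    indexes: match[3].split(/ or /)
  }
end

# Hypothetical helper: true when the block does not raise for at least one
# index, mirroring the any?/rescue structure of the step definition.
def any_index_passes?(indexes)
  indexes.any? do |index|
    begin
      yield index
      true
    rescue StandardError
      false
    end
  end
end

single = parse_step('Rescore Test Words Chaff is the first search result')
either = parse_step('Relevancytestviatext is the sixth or seventh search result')

# Made-up result ordering, used only to exercise the check.
RESULTS = {
  'sixth' => 'Relevancytestviaauxtext',
  'seventh' => 'Relevancytestviatext'
}

found = any_index_passes?(either[:indexes]) do |index|
  raise 'title mismatch' unless RESULTS[index] == either[:title]
end

puts single[:indexes].inspect
puts either[:indexes].inspect
puts found
```

Accepting either position keeps the relevancy scenarios stable now that
article text and aux text can score the same and their relative order is
no longer guaranteed.
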
-- 
To view, visit https://gerrit.wikimedia.org/r/159079
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I84612b829b00aa4790d9c53ce44dc944ea67ea1f
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Manybubbles <[email protected]>
Gerrit-Reviewer: Chad <[email protected]>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
