jenkins-bot has submitted this change and it was merged.
Change subject: Shrink the weighted all field's positions
......................................................................
Shrink the weighted all field's positions
We need to store position information for the weighted all fields, and
right now we store position information for the entire article text twice,
which is big! This change stores it once, but at a cost: file text and aux
text are now worth the same as article text. We still highlight it later,
which is good.
The disk size reduction is pretty good, but where this _should_ really
shine is in using less disk IO. The reduced disk IO should be worth the
loss of precision.
Change-Id: I84612b829b00aa4790d9c53ce44dc944ea67ea1f
---
M CirrusSearch.php
M tests/browser/features/full_text.feature
M tests/browser/features/relevancy.feature
M tests/browser/features/step_definitions/search_steps.rb
M tests/browser/features/support/hooks.rb
5 files changed, 28 insertions(+), 20 deletions(-)
Approvals:
Chad: Looks good to me, approved
jenkins-bot: Verified
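As a quick sanity check on the CirrusSearch.php hunk, the sketch below
compares the old and new values of $wgCirrusSearchWeights as they appear in
this change. The constant names (OLD_WEIGHTS, NEW_WEIGHTS, RATIOS) are
made up for illustration and are not part of CirrusSearch. How these
weights translate into copies stored in the all field is not shown in this
diff, so the sketch only checks the arithmetic.

```ruby
# Values copied from the removed (-) and added (+) lines of the
# $wgCirrusSearchWeights hunk. Constant names are hypothetical.
OLD_WEIGHTS = {
  'title' => 40, 'redirect' => 30, 'category' => 16, 'heading' => 10,
  'opening_text' => 6, 'text' => 2, 'auxiliary_text' => 1, 'file_text' => 1
}.freeze

NEW_WEIGHTS = {
  'title' => 20, 'redirect' => 15, 'category' => 8, 'heading' => 5,
  'opening_text' => 3, 'text' => 1, 'auxiliary_text' => 0.5, 'file_text' => 0.5
}.freeze

# Ratio of new weight to old weight for each field.
RATIOS = OLD_WEIGHTS.map do |field, old|
  [field, NEW_WEIGHTS.fetch(field) / old.to_f]
end.to_h

RATIOS.each { |field, ratio| puts format('%-15s %.2f', field, ratio) }
```

Every field comes out at the same ratio, so the relative ordering of the
configured weights is unchanged and 'text' ends up with a weight of 1.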
diff --git a/CirrusSearch.php b/CirrusSearch.php
index 243bb0f..c8b7c11 100644
--- a/CirrusSearch.php
+++ b/CirrusSearch.php
@@ -213,14 +213,14 @@
// is false this can be changed on the fly. If it is true then changes to this require
// an in place reindex to take effect.
$wgCirrusSearchWeights = array(
- 'title' => 40,
- 'redirect' => 30,
- 'category' => 16,
- 'heading' => 10,
- 'opening_text' => 6,
- 'text' => 2,
- 'auxiliary_text' => 1,
- 'file_text' => 1,
+ 'title' => 20,
+ 'redirect' => 15,
+ 'category' => 8,
+ 'heading' => 5,
+ 'opening_text' => 3,
+ 'text' => 1,
+ 'auxiliary_text' => 0.5,
+ 'file_text' => 0.5,
);
// Enable building and using of "all" fields that contain multiple copies of other fields
diff --git a/tests/browser/features/full_text.feature b/tests/browser/features/full_text.feature
index 2a6e7e6..5920cdc 100644
--- a/tests/browser/features/full_text.feature
+++ b/tests/browser/features/full_text.feature
@@ -74,7 +74,7 @@
@setup_phrase_rescore
Scenario: Searching for an unquoted phrase finds the phrase first
When I search for Rescore Test Words
- Then Rescore Test Words is the first search result
+ Then Rescore Test Words Chaff is the first search result
@setup_phrase_rescore
Scenario: Searching for a quoted phrase finds higher scored matches before the whole query interpreted as a phrase
diff --git a/tests/browser/features/relevancy.feature b/tests/browser/features/relevancy.feature
index 29943d3..c0b9e74 100644
--- a/tests/browser/features/relevancy.feature
+++ b/tests/browser/features/relevancy.feature
@@ -11,8 +11,8 @@
And Relevancytestviacategory is the third search result
And Relevancytestviaheading is the fourth search result
And Relevancytestviaopening is the fifth search result
- And Relevancytestviatext is the sixth search result
- And Relevancytestviaauxtext is the seventh search result
+ And Relevancytestviatext is the sixth or seventh search result
+ And Relevancytestviaauxtext is the sixth or seventh search result
Scenario: Results are sorted based on what part of the page matches: title, redirect, category, etc
When I search for "Relevancytestphrase phrase"
@@ -22,8 +22,8 @@
And Relevancytestphraseviacategory is the third search result
And Relevancytestphraseviaheading is the fourth search result
And Relevancytestphraseviaopening is the fifth search result
- And Relevancytestphraseviatext is the sixth search result
- And Relevancytestphraseviaauxtext is the seventh search result
+ And Relevancytestphraseviatext is the sixth or seventh search result
+ And Relevancytestphraseviaauxtext is the sixth or seventh search result
Scenario: Words in order are worth more then words out of order
When I search for Relevancytwo Wordtest
diff --git a/tests/browser/features/step_definitions/search_steps.rb b/tests/browser/features/step_definitions/search_steps.rb
index 3fb6c6f..6374737 100644
--- a/tests/browser/features/step_definitions/search_steps.rb
+++ b/tests/browser/features/step_definitions/search_steps.rb
@@ -115,13 +115,21 @@
Then(/^there is a search result$/) do
on(SearchResultsPage).first_result_element.should exist
end
-Then(/^(.+) is( in)? the ([^ ]+) search result$/) do |title, in_ok, index|
+Then(/^(.+) is( in)? the ((?:[^ ])+(?: or (?:[^ ])+)*) search result$/) do |title, in_ok, indexes|
on(SearchResultsPage) do |page|
- check_search_result(
- page.send("#{index}_result_wrapper_element"),
- page.send("#{index}_result_element"),
- title,
- in_ok)
+ found = indexes.split(/ or /).any? { |index|
+ begin
+ check_search_result(
+ page.send("#{index}_result_wrapper_element"),
+ page.send("#{index}_result_element"),
+ title,
+ in_ok)
+ true
+ rescue
+ false
+ end
+ }
+ found.should == true
end
end
Then(/^(.*) is( in)? the first search imageresult$/) do |title, in_ok|
diff --git a/tests/browser/features/support/hooks.rb b/tests/browser/features/support/hooks.rb
index b91b251..e8ca05f 100644
--- a/tests/browser/features/support/hooks.rb
+++ b/tests/browser/features/support/hooks.rb
@@ -211,7 +211,7 @@
Before("@setup_phrase_rescore") do
unless phrase_rescore
steps %(
- Given a page named Rescore Test Words exists
+ Given a page named Rescore Test Words Chaff exists
And a page named Test Words Rescore Rescore Test Words exists
And a page named Rescore Test TextContent exists with contents Chaff
And a page named Rescore Test HasTextContent exists with contents Rescore Test TextContent
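
The new step pattern in search_steps.rb accepts several candidate positions
joined by " or ". The sketch below runs that same regular expression
outside Cucumber to show what it captures. STEP_PATTERN and parse_step are
hypothetical names used only for this illustration; the page-object lookups
from the real step are left out.

```ruby
# The regular expression is copied from the added Then(...) line in this change.
STEP_PATTERN = /^(.+) is( in)? the ((?:[^ ])+(?: or (?:[^ ])+)*) search result$/

# Hypothetical helper: returns the captured pieces of a step, or nil if
# the text does not match the pattern.
def parse_step(text)
  match = STEP_PATTERN.match(text)
  return nil unless match

  {
    title: match[1],
    in_ok: !match[2].nil?,            # true when the optional " in" is present
    indexes: match[3].split(/ or /)   # same split the step definition uses
  }
end

p parse_step('Relevancytestviatext is the sixth or seventh search result')
p parse_step('Rescore Test Words Chaff is the first search result')
```

In the step definition each captured index is tried in turn with any?, so
the step passes as soon as one position holds the expected title.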
--
To view, visit https://gerrit.wikimedia.org/r/159079
Gerrit-MessageType: merged
Gerrit-Change-Id: I84612b829b00aa4790d9c53ce44dc944ea67ea1f
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Manybubbles <[email protected]>
Gerrit-Reviewer: Chad <[email protected]>
Gerrit-Reviewer: jenkins-bot <>