Manybubbles has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/129678

Change subject: Improved experimental highlighter settings
......................................................................

Improved experimental highlighter settings

These settings will improve quality of snippets.  The tests will only pass
with the new 0.0.5 release of the highlighter though.

Bug: 64259
Change-Id: If970dea018eb631af13f561e33f11bbf1d2395b2
(cherry picked from commit ab93e0f2984f94d805c51eb04c89b2acd3a6ee1f)
---
M includes/ResultsType.php
M tests/browser/features/highlighting.feature
2 files changed, 19 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/CirrusSearch refs/changes/78/129678/1

diff --git a/includes/ResultsType.php b/includes/ResultsType.php
index 7e0b8ff..e8df16e 100644
--- a/includes/ResultsType.php
+++ b/includes/ResultsType.php
@@ -188,20 +188,20 @@
                                        'top_scoring' => true,
                                        'boost_before' => array(
                                                // Note these values are super arbitrary right now.
-                                               '20' => 8,
-                                               '50' => 7,
-                                               '200' => 4,
-                                               '1000' => 2,
+                                               '20' => 2,
+                                               '50' => 1.8,
+                                               '200' => 1.5,
+                                               '1000' => 1.2,
                                        ),
-                                       // Since the best fragments are typically at the beginning of the article
-                                       // any way we can relatively safely stop searching for matches after 50
-                                       // fragments.  This should help with really crazy documents, say 10MB of
-                                       // "d d".  Without this we'll scan out all the "d"s on a search for "d".
-                                       // With it, only the first 50.
-                                       'max_fragments_scored' => 50,
+                                       // We should set a limit on the number of fragments we try because if we
+                                       // don't then we'll hit really crazy documents, say 10MB of "d d".  This'll
+                                       // keep us from scanning more then the first couple thousand of them.
+                                       // Setting this too low (like 50) can bury good snippets if the search
+                                       // contains common words.
+                                       'max_fragments_scored' => 5000,
                                ),
                        );
-                       // If there isn't a match just return some of the the first few sentences .
+                       // If there isn't a match just return some of the the first few characters.
                        $text = $singleFragment;
                        $text[ 'no_match_size' ] = 100;
                } else {
diff --git a/tests/browser/features/highlighting.feature b/tests/browser/features/highlighting.feature
index d9c2d15..6b22b98 100644
--- a/tests/browser/features/highlighting.feature
+++ b/tests/browser/features/highlighting.feature
@@ -18,12 +18,9 @@
     | template:test pickle       | Template:Template *Test* | *pickles*                                        |
     # Verify highlighting the presence of accent squashing
     | Africa test                | *África*                 | for *testing*                                    |
-    # Verify highlighting on large pages (Bug 52680).
-    # Bug 64259
-    # | "discuss problems of social and cultural importance" | Rashidun Caliphate | the faithful gathered to *discuss problems of social and cultural importance*. During the caliphate of |
-    # | "discuss problems of social and cultural importance"~ | Rashidun Caliphate | the faithful gathered to *discuss problems of social and cultural importance*. During the caliphate of |
-    | "discuss problems of social and cultural importance" | Rashidun Caliphate | first Khalifa Rasul Allah (Successor *of* the Messenger *of* God), *and* embarked on campaigns to propagate |
-    | "discuss problems of social and cultural importance"~ | Rashidun Caliphate | first Khalifa Rasul Allah (Successor *of* the Messenger *of* God), *and* embarked on campaigns to propagate |
+    # Verify highlighting on large pages.
+    | "discuss problems of social and cultural importance" | Rashidun Caliphate | gathered to *discuss* *problems* *of* *social* *and* *cultural* *importance*. During the caliphate *of* Umar as many |
+    | "discuss problems of social and cultural importance"~ | Rashidun Caliphate | gathered to *discuss* *problems* *of* *social* *and* *cultural* *importance*. During the caliphate *of* Umar as many |
     # Auxiliary text
     | tallest alborz             | Rashidun Caliphate       | Mount Damavand, Iran's *tallest* mountain is located in *Alborz* mountain range. |
 
@@ -33,9 +30,7 @@
 
   Scenario: Found words are highlighted even if found by different analyzers
     When I search for "threatening the unity" community
-    # Bug 64259
-    # Then *threatening the unity* and stability of the new *community* is in the highlighted text of the first search result
-    Then *the* is in the highlighted text of the first search result
+    Then *threatening* *the* *unity* and stability of *the* new *community* is in the highlighted text of the first search result
 
   @headings
   Scenario: Found words are highlighted in headings
@@ -48,15 +43,11 @@
 
   Scenario: Found words are highlighted in text even in large documents
     When I search for Allowance to non-Muslims
-    # Bug 64259
-    # Then *Allowance to non-Muslims* is in the highlighted text of the first search result
-    Then *to* is in the highlighted text of the first search result
+    Then *Allowance* *to* *non*-*Muslims* is in the highlighted text of the first search result
 
   Scenario: Found words are highlighted in text even in large documents
     When I search for "Allowance to non-Muslims"
-    # Bug 64259
-    # Then *Allowance to non-Muslims* is in the highlighted text of the first search result
-    Then *to* is in the highlighted text of the first search result
+    Then *Allowance* *to* *non*-*Muslims* is in the highlighted text of the first search result
 
   Scenario: Words are not found in image captions
     When I search for The Rose Trellis Egg
@@ -79,9 +70,7 @@
 
   Scenario: Found words are highlighted in headings even in large documents when searching in a non-strict phrase
     When I search for "Allowance to non-Muslims"~
-    # Bug 64259
-    # Then *Allowance to non-Muslims* is in the highlighted text of the first search result
-    Then *to* is in the highlighted text of the first search result
+    Then *Allowance* *to* *non*-*Muslims* is in the highlighted text of the first search result
 
   @headings
   Scenario: The highest scoring heading is highlighted AND it doesn't contain html even if the heading on the page does
@@ -109,7 +98,7 @@
     When I search for Rashidun Caliphate
     Then Template:History of the Arab States The *Rashidun* *Caliphate* (Template:lang-ar al-khilafat ar-Rāshidīyah) is the highlighted text of the first search result
     When I search for caliphs
-    Then al-khilafat ar-Rāshidīyah), comprising the first four *caliphs* in Islam's history, was founded after Muhammad's is the highlighted text of the first search result
+    Then The first four *caliphs* are called the Rashidun, meaning the Rightly Guided *Caliphs*, because they are is the highlighted text of the first search result
 
   @references
   Scenario: References don't appear in highlighted section titles
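
Note on the ResultsType.php hunk above: the following is a rough Python
model of how the two changed settings might interact. It is only a sketch
under assumptions, not the highlighter's actual code. It assumes that
boost_before multiplies a fragment's score by the boost attached to the
smallest offset threshold greater than the fragment's start offset, and that
max_fragments_scored caps how many fragments are scored at all. The names
boost_for and best_fragment, and the sample fragments, are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Values copied from the new side of the ResultsType.php hunk.
BOOST_BEFORE: Dict[int, float] = {20: 2.0, 50: 1.8, 200: 1.5, 1000: 1.2}
MAX_FRAGMENTS_SCORED = 5000

# A fragment is modelled as (start_offset, base_score). Hypothetical shape.
Fragment = Tuple[int, float]


def boost_for(start_offset: int, boost_before: Dict[int, float]) -> float:
    """Return the multiplier for a fragment starting at start_offset.

    Assumed rule: use the boost of the smallest threshold that is greater
    than the start offset; fragments past every threshold get 1.0.
    """
    for threshold in sorted(boost_before):
        if start_offset < threshold:
            return boost_before[threshold]
    return 1.0


def best_fragment(
    fragments: Iterable[Fragment],
    boost_before: Dict[int, float] = BOOST_BEFORE,
    max_scored: int = MAX_FRAGMENTS_SCORED,
) -> Optional[Fragment]:
    """Pick the top-scoring fragment among the first max_scored fragments."""
    best: Optional[Fragment] = None
    best_score = float("-inf")
    for index, (start, base) in enumerate(fragments):
        if index >= max_scored:
            break  # stop scanning; later fragments are never considered
        score = base * boost_for(start, boost_before)
        if score > best_score:  # ties keep the earlier fragment
            best, best_score = (start, base), score
    return best


if __name__ == "__main__":
    old_boosts = {20: 8, 50: 7, 200: 4, 1000: 2}
    fragments = [(10, 1.0), (3000, 3.0)]  # weak early match, strong late match
    print(best_fragment(fragments, old_boosts))
    print(best_fragment(fragments))
```

Under this model, the old boosts let a weak fragment at offset 10 beat a
fragment three times stronger at offset 3000, while the new, flatter boosts
let the stronger fragment win. Likewise, a cap of 50 stops the scan before a
good fragment that happens to be the 61st candidate, which matches the
"bury good snippets" concern in the new code comment.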

-- 
To view, visit https://gerrit.wikimedia.org/r/129678
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If970dea018eb631af13f561e33f11bbf1d2395b2
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: wmf/1.24wmf2
Gerrit-Owner: Manybubbles <never...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
