Manybubbles has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/127629

Change subject: Limit number of fragments scored if possible
......................................................................

Limit number of fragments scored if possible

Since the best fragments are typically at the beginning of the article
anyway, we can relatively safely stop searching for matches after 50
fragments.  This should help with really crazy documents, say 10MB of
"d d".  Without this we'll scan out all the "d"s on a search for "d".
With it, only the first 50.

Only works for experimental highlighter because, well, only the experimental
highlighter has this feature.
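
A rough illustration of the idea, not the highlighter's actual code: the
sketch below cuts a document into fixed-size windows, scores each window
that contains the term, and gives up once a cap is reached.  The function
scoreFragments() and its parameters are hypothetical names made up for this
example; the only setting taken from this change is the cap of 50, which
corresponds to 'max_fragments_scored'.

```php
<?php
/**
 * Hypothetical sketch of capped fragment scoring.
 *
 * Walks $text in fixed windows of $fragmentSize bytes, scores every window
 * containing $term by counting occurrences, and stops as soon as
 * $maxFragmentsScored windows have been scored.
 *
 * @param string $text document text
 * @param string $term search term
 * @param int $fragmentSize window size in bytes (must be positive)
 * @param int $maxFragmentsScored cap on scored windows
 * @return array list of array( 'start' => int, 'score' => int )
 */
function scoreFragments( $text, $term, $fragmentSize, $maxFragmentsScored ) {
	$fragments = array();
	if ( $term === '' || $fragmentSize < 1 ) {
		// Nothing sensible to search for.
		return $fragments;
	}
	$length = strlen( $text );
	$offset = 0;
	while ( count( $fragments ) < $maxFragmentsScored && $offset < $length ) {
		$pos = strpos( $text, $term, $offset );
		if ( $pos === false ) {
			// No more matches anywhere in the rest of the document.
			break;
		}
		// Snap the match to the start of the window that contains it.
		$start = $pos - ( $pos % $fragmentSize );
		$fragment = substr( $text, $start, $fragmentSize );
		$fragments[] = array(
			'start' => $start,
			'score' => substr_count( $fragment, $term ),
		);
		// Resume the scan in the next window.
		$offset = $start + $fragmentSize;
	}
	return $fragments;
}
```

With a document made of repeated "d " and a cap of 50, the loop looks at
the first 50 windows and never touches the rest of the text, which is the
behavior the limit is meant to buy us on pathological documents.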

Change-Id: I09b9718ee84fb4cf30178b9b2949f55513b3f448
---
M includes/ResultsType.php
1 file changed, 7 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/CirrusSearch refs/changes/29/127629/1

diff --git a/includes/ResultsType.php b/includes/ResultsType.php
index 9cd8179..12c3db7 100644
--- a/includes/ResultsType.php
+++ b/includes/ResultsType.php
@@ -4,7 +4,7 @@
 use \Title;
 
 /**
- * Lightweight classes to describe specific result types we can return
+ * Lightweight classes to describe specific result types we can return.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -181,7 +181,6 @@
                                'fragmenter' => 'scan',
                                'fragment_size' => 100,
                                'options' => array(
-                                       'locale' => wfGetLangObj()->getCode(),
                                        'top_scoring' => true,
                                        'boost_before' => array(
                                                // Note these values are super arbitrary right now.
@@ -190,6 +189,12 @@
                                                '200' => 4,
                                                '1000' => 2,
                                        ),
+                                       // Since the best fragments are typically at the beginning of the article
+                                       // anyway, we can relatively safely stop searching for matches after 50
+                                       // fragments.  This should help with really crazy documents, say 10MB of
+                                       // "d d".  Without this we'll scan out all the "d"s on a search for "d".
+                                       // With it, only the first 50.
+                                       'max_fragments_scored' => 50,
                                ),
                        );
                        // If there isn't a match just return some of the the first few sentences .

-- 
To view, visit https://gerrit.wikimedia.org/r/127629
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I09b9718ee84fb4cf30178b9b2949f55513b3f448
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Manybubbles <never...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
