jenkins-bot has submitted this change and it was merged.

Change subject: Trim text on the way into elasticsearch
......................................................................


Trim text on the way into elasticsearch

Most article text seems to come up with a handful of trailing spaces.
Trim it to save a tiny bit of space and time.

Change-Id: If42b751257b9727869f5b9d7b18a5608e3ca421a
---
M includes/CirrusSearchUpdater.php
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Chad: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/includes/CirrusSearchUpdater.php b/includes/CirrusSearchUpdater.php
index ef4d18d..3737a12 100644
--- a/includes/CirrusSearchUpdater.php
+++ b/includes/CirrusSearchUpdater.php
@@ -253,6 +253,7 @@
                        $parserOutput = $page->getParserOutput( new ParserOptions(), $page->getRevision()->getId() );
                        $text = Sanitizer::stripAllTags( SearchEngine::create( 'CirrusSearch' )
                                ->getTextFromContent( $title, $page->getContent(), $parserOutput ) );
+                       $text = trim( $text ); // No need to store the trailing spaces in Elasticsearch....
                        $doc->add( 'text', $text );
                        $doc->add( 'text_bytes', strlen( $text ) );
                        $doc->add( 'text_words', str_word_count( $text ) ); // It would be better if we could let ES calculate it
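
For illustration only, here is a minimal standalone sketch of what the added
trim() call changes for the fields indexed above. The sample string is
hypothetical and not taken from any real article. Note that PHP's trim()
strips whitespace from both ends of the string, not only trailing spaces.

```php
<?php
// Hypothetical sample text standing in for the output of getTextFromContent().
$text = "Example article text.  \n  ";
$trimmed = trim( $text );

// 'text_bytes' is computed with strlen(), so it shrinks by the amount of
// whitespace removed from the ends of the string.
printf( "bytes before: %d, after: %d\n", strlen( $text ), strlen( $trimmed ) );

// 'text_words' is computed with str_word_count(), which ignores surrounding
// whitespace, so the word count is the same before and after trimming.
printf( "words before: %d, after: %d\n", str_word_count( $text ), str_word_count( $trimmed ) );
```

In this sketch only the byte count changes, which matches the commit's stated
goal of saving a small amount of space without altering the indexed words.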

-- 
To view, visit https://gerrit.wikimedia.org/r/95071
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If42b751257b9727869f5b9d7b18a5608e3ca421a
Gerrit-PatchSet: 2
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Manybubbles <never...@wikimedia.org>
Gerrit-Reviewer: Chad <ch...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
