Benjavalero has uploaded a new change for review.
https://gerrit.wikimedia.org/r/181360
Change subject: textlib: Improve replace algorithm
......................................................................
textlib: Improve replace algorithm
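The current replace algorithm runs every exception regular expression
against the text again each time a match is found. This patch runs
each exception regex only once, collecting all of its matches up front,
and then checks whether each match lies entirely inside one of them.
This reduces the number of regex searches and improves performance on
pages with long text.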
Change-Id: I6f4fb9e757c1be55bb3217c57106e2b33a87f910
---
M pywikibot/textlib.py
1 file changed, 15 insertions(+), 13 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/pywikibot/core refs/changes/60/181360/1
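The following is a minimal standalone Python sketch of the approach described in the commit message: find all exception matches once, then test each match for containment. `replace_except_sketch` is a hypothetical name, not pywikibot's `textlib.replaceExcept`, and it omits that function's other options (marker handling, template exceptions, `allowoverlap`, case-insensitive string patterns). As in the patch, a match is skipped only when it lies entirely inside a single exception match.

```python
from __future__ import annotations

import bisect
import re
from typing import Callable, Iterable, Union


def replace_except_sketch(text: str,
                          old: re.Pattern,
                          new: Union[str, Callable[[re.Match], str]],
                          exceptions: Iterable[re.Pattern]) -> str:
    """Replace matches of `old` that are not inside any exception match.

    Hypothetical helper for illustration only; not pywikibot's API.
    """
    # Run each exception regex once over the whole text and keep the
    # (start, end) spans, sorted by start position.
    spans = sorted(m.span() for regex in exceptions
                   for m in regex.finditer(text))
    starts = [start for start, _ in spans]
    # max_end[i] is the largest end offset among spans[0..i], so one
    # binary search tells us whether any span starting at or before a
    # match also ends at or after it (i.e. fully contains it).
    max_end = []
    for _, end in spans:
        max_end.append(max(end, max_end[-1]) if max_end else end)

    def inside_exception(match: re.Match) -> bool:
        i = bisect.bisect_right(starts, match.start())
        return i > 0 and max_end[i - 1] >= match.end()

    # Spans refer to offsets in the original text, so the result is
    # assembled in a separate list instead of editing `text` in place.
    pieces = []
    last = 0
    for match in old.finditer(text):
        if inside_exception(match):
            continue
        pieces.append(text[last:match.start()])
        pieces.append(new(match) if callable(new) else match.expand(new))
        last = match.end()
    pieces.append(text[last:])
    return ''.join(pieces)


if __name__ == '__main__':
    comment = re.compile(r'<!--.*?-->', re.S)
    print(replace_except_sketch('foo <!-- foo --> foo',
                                re.compile('foo'), 'bar', [comment]))
    # prints: bar <!-- foo --> bar
```

Two choices in this sketch go beyond a literal loop over the list of exception matches. Keeping the spans sorted with a running maximum of their end offsets turns each containment test into a binary search instead of a scan of every exception match. Building the output separately keeps the pre-computed spans valid, since offsets computed once on the original text no longer line up with it after a replacement changes its length.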
diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index d37f654..5961b3d 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -216,26 +216,28 @@
inside[count] = item
index = 0
markerpos = len(text)
+
+ # Pre-calculate all the exception matches
+ exceptionMatches = []
+ for dontTouchR in dontTouchRegexes:
+ for exceptionMatch in dontTouchR.finditer(text):
+ exceptionMatches.append(exceptionMatch)
+
while True:
match = old.search(text, index)
if not match:
# nothing left to replace
break
- # check which exception will occur next.
- nextExceptionMatch = None
- for dontTouchR in dontTouchRegexes:
- excMatch = dontTouchR.search(text, index)
- if excMatch and (
- nextExceptionMatch is None or
- excMatch.start() < nextExceptionMatch.start()):
- nextExceptionMatch = excMatch
+ # Check if the match is included in any exception
+ matchInException = False
+ for excMatch in exceptionMatches:
+ if (excMatch.start() <= match.start() and excMatch.end() >= match.end()):
+ matchInException = True
+ break
- if nextExceptionMatch is not None \
- and nextExceptionMatch.start() <= match.start():
- # an HTML comment or text in nowiki tags stands before the next
- # valid match. Skip.
- index = nextExceptionMatch.end()
+ if matchInException:
+ index = match.end()
else:
# We found a valid match. Replace it.
if callable(new):
--
To view, visit https://gerrit.wikimedia.org/r/181360
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6f4fb9e757c1be55bb3217c57106e2b33a87f910
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Benjavalero