Matěj Suchánek has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/329761 )

Change subject: Fix and improve default regexes
......................................................................

Fix and improve default regexes

- Remove unnecessary flags.
- Clean up 'header' using the multiline flag.
- Expand 'pre' to support HTML attributes (mostly 'style').
- Update 'property' to support parameters (currently, it supports
  "|from=" but it might support more in the future).
- Localize 'property' and 'invoke' using magic words.
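
To illustrate the 'header' and 'pre' items above, here is a minimal sketch that runs the old and new patterns (copied from the diff below) against made-up sample text. The sample strings and variable names are assumptions for illustration only and are not part of the change.

```python
import re

# Patterns copied from the diff; only the sample text below is invented.
OLD_HEADER = re.compile(r'\r?\n=+.+=+ *\r?\n')
NEW_HEADER = re.compile(r'(?m)^=+.+=+ *$')
OLD_PRE = re.compile(r'(?ism)<pre>.*?</pre>')
NEW_PRE = re.compile(r'(?is)<pre[ >].*?</pre>')

# Hypothetical wikitext: a header on the very first line, then two
# headers on adjacent lines.
SAMPLE = '== Intro ==\ntext\n== A ==\n=== B ===\nmore text\n'

# The old pattern needs a newline before the header and consumes the
# newline after it, so it skips the first-line header and the header
# directly following another one.
old_headers = [m.group().strip() for m in OLD_HEADER.finditer(SAMPLE)]
# The new pattern anchors on line boundaries without consuming them.
new_headers = [m.group() for m in NEW_HEADER.finditer(SAMPLE)]

print(old_headers)  # ['== A ==']
print(new_headers)  # ['== Intro ==', '== A ==', '=== B ===']

# Hypothetical <pre> element carrying an HTML attribute.
STYLED_PRE = '<pre style="color: red">x</pre>'

print(OLD_PRE.search(STYLED_PRE))          # None
print(NEW_PRE.search(STYLED_PRE).group())  # the whole element
```

The (?m) flag dropped from 'pre', 'score' and 'ref' only affects ^ and $, which those patterns do not use, so removing it should not change what they match.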

Change-Id: Ib805bf70cb1cc99711138d7d6c7e40971f31b602
---
M pywikibot/textlib.py
1 file changed, 9 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/pywikibot/core refs/changes/61/329761/2

diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index 9f7782e..2908ee1 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -221,13 +221,13 @@
     _regex_cache.update({
         'comment':      re.compile(r'(?s)<!--.*?-->'),
         # section headers
-        'header':       re.compile(r'\r?\n=+.+=+ *\r?\n'),
+        'header':       re.compile(r'(?m)^=+.+=+ *$'),
         # preformatted text
-        'pre':          re.compile(r'(?ism)<pre>.*?</pre>'),
+        'pre':          re.compile(r'(?is)<pre[ >].*?</pre>'),
         'source':       re.compile(r'(?is)<source .*?</source>'),
-        'score':        re.compile(r'(?ism)<score[ >].*?</score>'),
+        'score':        re.compile(r'(?is)<score[ >].*?</score>'),
         # inline references
-        'ref':          re.compile(r'(?ism)<ref[ >].*?</ref>'),
+        'ref':          re.compile(r'(?is)<ref[ >].*?</ref>'),
         'template':     NESTED_TEMPLATE_REGEX,
         # lines that start with a space are shown in a monospace font and
         # have whitespace preserved.
@@ -247,11 +247,13 @@
                              site.validLanguageLinks() +
                              list(site.family.obsolete.keys()))),
         # Wikibase property inclusions
-        'property':     re.compile(r'(?i)\{\{\s*#property:\s*p\d+\s*\}\}'),
+        'property':     (r'(?i)\{\{\s*#(%s):\s*p\d+.*?\}\}',
+                         lambda site: '|'.join(site.getmagicwords('property'))),
         # Module invocations (currently only Lua)
-        'invoke':       re.compile(r'(?i)\{\{\s*#invoke:.*?}\}'),
+        'invoke':       (r'(?i)\{\{\s*#(%s):.*?\}\}',
+                         lambda site: '|'.join(site.getmagicwords('invoke'))),
         # categories
-        'category':     ('\[\[ *(?:%s)\s*:.*?\]\]',
+        'category':     (r'\[\[ *(?:%s)\s*:.*?\]\]',
                          lambda site: '|'.join(site.namespaces[14])),
         # files
         'file':         (FILE_LINK_REGEX,
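
The (template, callable) pairs for 'property' and 'invoke' in the diff above can be sketched in isolation as follows. This assumes the pair is resolved by filling the %s placeholder with the callable's result and compiling it for a given site. FakeSite, its alias lists and compile_for_site are hypothetical names invented for this sketch; only the two templates and the old 'property' pattern are taken from the diff.

```python
import re


class FakeSite:
    """Hypothetical stand-in for a pywikibot site object."""

    def getmagicwords(self, word):
        # Made-up aliases for illustration; a real site reports its own.
        aliases = {
            'property': ['property', 'vlastnost'],
            'invoke': ['invoke', 'volat'],
        }
        return aliases[word]


# Old 'property' pattern from the diff, kept for comparison.
OLD_PROPERTY = re.compile(r'(?i)\{\{\s*#property:\s*p\d+\s*\}\}')

# (template, callable) pairs as introduced by the diff.
LAZY_REGEXES = {
    'property': (r'(?i)\{\{\s*#(%s):\s*p\d+.*?\}\}',
                 lambda site: '|'.join(site.getmagicwords('property'))),
    'invoke': (r'(?i)\{\{\s*#(%s):.*?\}\}',
               lambda site: '|'.join(site.getmagicwords('invoke'))),
}


def compile_for_site(key, site):
    """Fill the %s placeholder with the site's aliases and compile."""
    template, get_names = LAZY_REGEXES[key]
    return re.compile(template % get_names(site))


site = FakeSite()
property_re = compile_for_site('property', site)
invoke_re = compile_for_site('invoke', site)

WITH_PARAM = '{{#property:P31|from=Q42}}'
print(OLD_PROPERTY.search(WITH_PARAM))                  # None
print(property_re.search(WITH_PARAM).group())           # whole call
print(property_re.search('{{ #Vlastnost: p17 }}').group())
print(invoke_re.search('{{#volat:Wikidata|getValue}}').group())
```

Deferring compilation to a callable is what lets the alias list depend on the site. The sketch joins the aliases without escaping them, as the diff does, so it relies on the aliases containing no regex metacharacters.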

-- 
To view, visit https://gerrit.wikimedia.org/r/329761
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib805bf70cb1cc99711138d7d6c7e40971f31b602
Gerrit-PatchSet: 2
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Matěj Suchánek <matejsuchane...@gmail.com>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
