Merlijn van Deen has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/78525


Change subject: Change title whitelist to title blacklist
......................................................................

Change title whitelist to title blacklist

Titles containing characters outside the BMP [1] (> \uFFFF) are no longer
detected as illegal. See this thread: [2]

[1] https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane
[2] http://thread.gmane.org/gmane.comp.python.pywikipediabot.general/13197/

The list of blacklisted characters was generated by matching the old
regular expression against every code point below 0x80:

import re
m = re.compile(u'''[^ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\u0080-\uFFFF+]''')
for x in range(0, 0x80):
    if m.match(unichr(x)):
        print "%x" % x,

0 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 23 3c 3e 5b 5d 7b 7c 7d 7f
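
The enumeration above can be reproduced on Python 3 with the sketch below.
It is not part of the change: the names old_pattern, new_class,
rejected_by_old and rejected_by_new are made up for illustration, and
chr() stands in for Python 2's unichr(). The sketch also compares the
result with the new character class from the diff and checks one non-BMP
character (U+1F600, chosen arbitrarily).

```python
import re

# Old whitelist-style class, copied from the commit message.
# Anything NOT in this class was treated as illegal.
old_pattern = re.compile(
    '''[^ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\u0080-\uFFFF+]''')

# New blacklist-style class, copied from the diff (first alternative only).
new_class = re.compile(r'[\x00-\x1f\x23\x3c\x3e\x5b\x5d\x7b-\x7f]')

# Enumerate the ASCII range and collect the code points each class rejects.
rejected_by_old = [cp for cp in range(0x80) if old_pattern.match(chr(cp))]
rejected_by_new = [cp for cp in range(0x80) if new_class.match(chr(cp))]

print(' '.join('%x' % cp for cp in rejected_by_old))

# Code points the two classes disagree on within ASCII.
difference = sorted(set(rejected_by_old) ^ set(rejected_by_new))
print('differs at:', ['%x' % cp for cp in difference])

# A character outside the BMP: flagged by the old class, not by the new one.
non_bmp = '\U0001F600'
print('old flags non-BMP:', bool(old_pattern.search(non_bmp)))
print('new flags non-BMP:', bool(new_class.search(non_bmp)))
```

On Python 3 the first printed line should match the list in this commit
message. The two classes differ at one ASCII code point: the range
\x7b-\x7f in the new class also spans 0x7e (~), which the old expression
allowed.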

Change-Id: I02c26be9ad814ce11d9adf2f997d3d1e05764fd1
---
M pywikibot/page.py
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/pywikibot/core refs/changes/25/78525/1

diff --git a/pywikibot/page.py b/pywikibot/page.py
index e51977c..2c346b5 100644
--- a/pywikibot/page.py
+++ b/pywikibot/page.py
@@ -2853,8 +2853,8 @@
 
     """
     illegal_titles_pattern = re.compile(
-        # Matching titles will be held as illegal.
-            u'''[^ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\u0080-\uFFFF+]'''
+            # Matching titles will be held as illegal.
+            ur'''[\x00-\x1f\x23\x3c\x3e\x5b\x5d\x7b-\x7f]'''
             # URL percent encoding sequences interfere with the ability
             # to round-trip titles -- you can't link to them consistently.
             u'|%[0-9A-Fa-f]{2}'

-- 
To view, visit https://gerrit.wikimedia.org/r/78525
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I02c26be9ad814ce11d9adf2f997d3d1e05764fd1
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Merlijn van Deen <valhall...@arctus.nl>
