Merlijn van Deen has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/78525

Change subject: Change title whitelist to title blacklist
......................................................................

Change title whitelist to title blacklist

Titles with characters outside the BMP [1] (>\uFFFF) are now
no longer detected as illegal. See this thread: [2]

[1] https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane
[2] http://thread.gmane.org/gmane.comp.python.pywikipediabot.general/13197/

This list of characters was generated by using the old re
and by enumerating characters:

import re
m = re.compile(u'''[^ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\u0080-\uFFFF+]''')
for x in range(0,0x80):
    if m.match(unichr(x)):
        print "%x" % x,

0 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 23 3c 3e 5b 5d 7b 7c 7d 7f

Change-Id: I02c26be9ad814ce11d9adf2f997d3d1e05764fd1
---
M pywikibot/page.py
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/pywikibot/core refs/changes/25/78525/1

diff --git a/pywikibot/page.py b/pywikibot/page.py
index e51977c..2c346b5 100644
--- a/pywikibot/page.py
+++ b/pywikibot/page.py
@@ -2853,8 +2853,8 @@
     """
     illegal_titles_pattern = re.compile(
-        # Matching titles will be held as illegal.
-        u'''[^ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\u0080-\uFFFF+]'''
+        # Matching titles will be held as illegal.
+        ur'''[\x00-\x1f\x23\x3c\x3e\x5b\x5d\x7b-\x7f]'''
         # URL percent encoding sequences interfere with the ability
         # to round-trip titles -- you can't link to them consistently.
         u'|%[0-9A-Fa-f]{2}'

-- 
To view, visit https://gerrit.wikimedia.org/r/78525
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I02c26be9ad814ce11d9adf2f997d3d1e05764fd1
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Merlijn van Deen <valhall...@arctus.nl>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
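
Review aid (not part of the change): the enumeration in the commit message can be re-run against both character classes to check that the new blacklist matches what the old whitelist rejected. This is a minimal sketch, assuming Python 3 (the patch itself is Python 2, hence `unichr`, `ur''` and the print statement there). The names OLD, NEW and illegal_ascii are hypothetical and exist only for this comparison, and the percent-encoding alternative `|%[0-9A-Fa-f]{2}` is left out because the change does not touch it.

```python
import re

# Old whitelist class, copied from the removed line of the diff:
# any character NOT listed in the class is treated as illegal.
OLD = re.compile('''[^ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\u0080-\uFFFF+]''')

# New blacklist class, copied from the added line of the diff
# (r'' instead of ur'', which Python 3 does not accept).
NEW = re.compile(r'[\x00-\x1f\x23\x3c\x3e\x5b\x5d\x7b-\x7f]')


def illegal_ascii(pattern):
    """Return the code points below 0x80 that the pattern flags as illegal."""
    return [cp for cp in range(0x80) if pattern.match(chr(cp))]


old_illegal = illegal_ascii(OLD)
new_illegal = illegal_ascii(NEW)

# Same listing format as in the commit message.
print(' '.join('%x' % cp for cp in old_illegal))

# Code points flagged by only one of the two classes.
only_new = sorted(set(new_illegal) - set(old_illegal))
only_old = sorted(set(old_illegal) - set(new_illegal))
print([hex(cp) for cp in only_new], [hex(cp) for cp in only_old])

# A character outside the BMP: flagged by the old class, not by the new one.
astral = '\U0001F600'
print(OLD.match(astral) is not None, NEW.match(astral) is not None)
```

Run this way, the old class reproduces the list in the commit message, and the non-BMP character is no longer flagged by the new class, as intended. The comparison also shows one difference below 0x80: the range `\x7b-\x7f` covers 0x7e (`~`), which the old whitelist allowed and which is absent from the enumerated list (7b 7c 7d 7f). If that is unintended, spelling out `\x7b\x7c\x7d\x7f` instead of the range would keep `~` legal.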