[issue10703] Regex 0.1.20101210
Steve Moran s...@uw.edu added the comment:

(Forehead slap.)

On Tue, 14 Dec 2010, Matthew Barnett wrote:

> Matthew Barnett pyt...@mrabarnett.plus.com added the comment:
>
> The regex module is intended to replace the re module, so its default behaviour is the same: in Python 2, regexes default to matching ASCII, and in Python 3, they default to matching Unicode.
>
> If you want to use a regex on a Unicode string in Python 2 then you need to set the Unicode flag, either by providing the UNICODE flag or by putting (?u) in the regex itself.

--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10703
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10703] Regex 0.1.20101210
New submission from Steve Moran s...@uw.edu:

The regex package doesn't seem to correctly implement the single grapheme match \X (\P{M}\p{M}*) in Python versions before 3. I'm using the string íi-te (i, U+0301, i, -, t, e, where U+0301 is the Unicode COMBINING ACUTE ACCENT). I'm reading it in from a file to bypass Unicode cp issues in the older IDLEs.

s...@x$ python3.1
Python 3.1.2 (r312:79147, May 19 2010, 11:50:28)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import regex
>>> file = open("test_data", "rt", encoding="utf-8")
>>> g = regex.compile(r"\X")
>>> s = file.readline()
>>> print (s)
íi-te
>>> print (g.findall(s))
['í', 'i', '-', 't', 'e']

* Correct in 3.1: i+U+0301 is considered one grapheme.

s...@x$ python2.7
Python 2.7 (r27:82500, Oct 4 2010, 14:49:53)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> import regex
>>> file = codecs.open("test_data", "r", "utf-8")
>>> g = regex.compile(r"\X")
>>> s = file.readline()
>>> s
u'i\u0301i-te'
>>> print s.encode("utf-8")
íi-te
>>> print g.findall(s)
[u'i', u'\u0301', u'i', u'-', u't', u'e']

* Not correct: the accent is treated as a separate character.

Thanks.

--
components: Regular Expressions
messages: 123961
nosy: stiv
priority: normal
severity: normal
status: open
title: Regex 0.1.20101210
type: behavior
versions: Python 2.7
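The simplified \X rule quoted in the report, \P{M}\p{M}*, can be illustrated without the regex module. What follows is a minimal pure-Python 3 sketch, not the regex module's implementation: the helper name split_graphemes is hypothetical, and it relies only on unicodedata.category to decide whether a code point is a combining mark (general categories Mn, Mc and Me, which together make up \p{M}). On the report's test string it groups i+U+0301 into one cluster, matching what the 3.1 session shows. Keeping a mark that appears at the very start of the string as its own cluster is an assumption of this sketch.

```python
import unicodedata


def split_graphemes(text):
    r"""Split text with the simplified rule \P{M}\p{M}*.

    Each cluster is one non-mark code point followed by any number of
    combining marks (general categories Mn, Mc and Me, i.e. \p{M}).
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            # Attach the combining mark to the cluster before it.
            clusters[-1] += ch
        else:
            # A non-mark starts a new cluster. A mark at the very start of
            # the string also gets its own cluster here (an assumption:
            # the literal pattern \P{M}\p{M}* would not match it at all).
            clusters.append(ch)
    return clusters


# The report's test string: i, U+0301 COMBINING ACUTE ACCENT, i, -, t, e
s = "i\u0301i-te"
print(split_graphemes(s))
print(len(split_graphemes(s)))
```

Full Unicode grapheme clusters as defined in UAX #29 have additional rules (for example for Hangul syllables and emoji sequences), so this covers only the simplified definition given in the report.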
[issue10703] Regex 0.1.20101210
Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

Regex 0.1.20101210 is not part of the standard Python distribution, so this bug report is invalid.

--
nosy: +belopolsky
resolution:  -> invalid
status: open -> closed
superseder:  -> Regexp 2.7 (modifications to current re 2.2.2)
[issue10703] Regex 0.1.20101210
Changes by R. David Murray rdmur...@bitdance.com:

--
assignee:  -> mark.dickinson
nosy: +mark.dickinson, mrabarnett
[issue10703] Regex 0.1.20101210
Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

The regex module is intended to replace the re module, so its default behaviour is the same: in Python 2, regexes default to matching ASCII, and in Python 3, they default to matching Unicode.

If you want to use a regex on a Unicode string in Python 2 then you need to set the Unicode flag, either by providing the UNICODE flag or by putting (?u) in the regex itself.
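The ASCII-versus-Unicode distinction described here can be seen with the standard re module in Python 3, where Unicode matching is the default. This is a minimal sketch using stdlib re rather than the regex module, and the sample text is made up for illustration: re.ASCII (inline (?a)) gives the ASCII-only \w behaviour that Python 2 used by default, while (?u) is still accepted for str patterns in Python 3 but changes nothing. For the report's Python 2 case, the corresponding fix would be compiling \X with the regex.UNICODE flag or writing the pattern as (?u)\X.

```python
import re

# "naïve café" written with escapes so the precomposed letters are explicit.
text = "na\u00efve caf\u00e9"

# Python 3 str patterns are Unicode-aware by default: \w matches ï and é.
print(re.findall(r"\w+", text))

# re.ASCII, or the inline (?a) flag, restricts \w to [a-zA-Z0-9_], which is
# what Python 2 did unless re.UNICODE or (?u) was given.
print(re.findall(r"\w+", text, re.ASCII))
print(re.findall(r"(?a)\w+", text))

# (?u) is still accepted for str patterns in Python 3, but it is redundant.
print(re.findall(r"(?u)\w+", text))
```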
[issue10703] Regex 0.1.20101210
Changes by R. David Murray rdmur...@bitdance.com:

--
assignee: mark.dickinson ->
nosy: -mark.dickinson