New submission from Serhiy Storchaka: Currently regular expressions support on '\n' as line boundary. To meet Unicode standard requirement RL1.6 [1] all Unicode line separators should be supported: '\n', '\r', '\v', '\f', '\x85', '\u2028', '\u2029' and two-character '\r\n'. Also it is recommended that '.' in "dotall" mode matches '\r\n'. Also strongly recommended to support the '\R' pattern which matches all line separators (equivalent to '(?:\\r\n|(?!\r\n)[\n\v\f\r\x85\u2028\u2029]').
>>> [m.start() for m in re.finditer('$', '\r\n\n\r', re.M)] [1, 2, 4] # should be [0, 2, 3, 4] >>> [m.start() for m in re.finditer('^', '\r\n\n\r', re.M)] [0, 2, 3] # should be [0, 2, 3, 4] >>> [m.group() for m in re.finditer('.', '\r\n\n\r', re.M|re.S)] ['\r', '\n', '\n', '\r'] # should be ['\r\n', '\n', '\r'] >>> [m.group() for m in re.finditer(r'\R', '\r\n\n\r')] [] # should be ['\r\n', '\n', '\r'] [1] http://www.unicode.org/reports/tr18/#RL1.6 ---------- components: Extension Modules, Regular Expressions messages: 227508 nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka priority: normal severity: normal stage: needs patch status: open title: Support Unicode line boundaries in regular expression type: enhancement versions: Python 3.5 _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue22491> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com