New submission from Serhiy Storchaka:

Currently regular expressions support on '\n' as line boundary. To meet Unicode 
standard requirement RL1.6 [1] all Unicode line separators should be supported: 
'\n', '\r', '\v', '\f', '\x85', '\u2028', '\u2029' and two-character '\r\n'. 
Also it is recommended that '.' in "dotall" mode matches '\r\n'. Also strongly 
recommended to support the '\R' pattern which matches all line separators 
(equivalent to '(?:\\r\n|(?!\r\n)[\n\v\f\r\x85\u2028\u2029]').

>>> [m.start() for m in re.finditer('$', '\r\n\n\r', re.M)]
[1, 2, 4]  # should be [0, 2, 3, 4]
>>> [m.start() for m in re.finditer('^', '\r\n\n\r', re.M)]
[0, 2, 3]  # should be [0, 2, 3, 4]
>>> [m.group() for m in re.finditer('.', '\r\n\n\r', re.M|re.S)]
['\r', '\n', '\n', '\r']  # should be ['\r\n', '\n', '\r']
>>> [m.group() for m in re.finditer(r'\R', '\r\n\n\r')]
[]  # should be ['\r\n', '\n', '\r']

[1] http://www.unicode.org/reports/tr18/#RL1.6

----------
components: Extension Modules, Regular Expressions
messages: 227508
nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
priority: normal
severity: normal
stage: needs patch
status: open
title: Support Unicode line boundaries in regular expression
type: enhancement
versions: Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22491>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to