[issue22491] Support Unicode line boundaries in regular expression

2019-10-27 Thread Lewis Gaul
Lewis Gaul added the comment: Hi there, I'm running 'EnHackathon' in a couple of weeks, and was wondering if this could be a good issue for a small team of first-time contributors with experience in C to work on. Would anyone be able to offer any guidance for where to start in

[issue22491] Support Unicode line boundaries in regular expression

2019-07-22 Thread Zackery Spytz
Zackery Spytz added the comment:
> To meet Unicode standard requirement RL1.6 [1] all Unicode line separators
> should be supported:
It seems that large portions of Modules/_sre.c would have to be rewritten in order to do this.
-- nosy: +ZackerySpytz

[issue22491] Support Unicode line boundaries in regular expression

2014-09-25 Thread Serhiy Storchaka
New submission from Serhiy Storchaka: Currently regular expressions support only '\n' as a line boundary. To meet Unicode standard requirement RL1.6 [1] all Unicode line separators should be supported: '\n', '\r', '\v', '\f', '\x85', '\u2028', '\u2029' and two-character '\r\n'. Also it is
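
For illustration, here is a minimal sketch of the gap the submission describes, using only the standard `re` module. The names `UNICODE_LINE_SEP`, `split_unicode_lines` and `multiline_matches` are hypothetical helpers made up for this example and are not part of `re`. The pattern lists exactly the separators named above, with '\r\n' tried first so that it counts as a single boundary. This is a workaround sketch, not the change proposed in the issue.

```python
import re

# Hypothetical pattern (an assumption for this sketch, not an re feature):
# the separators listed in the submission; '\r\n' comes first so the
# two-character sequence is consumed as one boundary.
UNICODE_LINE_SEP = re.compile('\r\n|[\n\r\v\f\x85\u2028\u2029]')


def split_unicode_lines(text):
    """Split text on any of the separators listed in the submission."""
    return UNICODE_LINE_SEP.split(text)


def multiline_matches(text):
    """Show what re.MULTILINE treats as a line: only '\\n' ends one."""
    return re.findall(r'^.*$', text, re.MULTILINE)


sample = 'one\ntwo\rthree\u2028four\r\nfive'

# With re.MULTILINE, '^' and '$' anchor only around '\n', so '\r' and
# '\u2028' stay inside the second "line".
print(multiline_matches(sample))

# The explicit separator pattern yields five separate lines.
print(split_unicode_lines(sample))
```

`str.splitlines()` also splits on these separators, though it additionally treats '\x1c', '\x1d' and '\x1e' as line boundaries, so it is not an exact match for the list above.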

[issue22491] Support Unicode line boundaries in regular expression

2014-09-25 Thread Matthew Barnett
Matthew Barnett added the comment: For reference, the regex module normally considers the line ending to be '\n', but it has a WORD flag ('(?w)') that turns on the Unicode definition of a 'word' character as well as Unicode line separators.