Lewis Gaul added the comment:
Hi there, I'm running 'EnHackathon' in a couple of weeks, and was wondering if
this could be a good issue for a small team of first-time contributors with
experience in C to work on.
Would anyone be able to offer any guidance for where to start in
Zackery Spytz added the comment:
> To meet Unicode standard requirement RL1.6 [1] all Unicode line separators
> should be supported:
It seems that large portions of Modules/_sre.c would have to be rewritten in
order to do this.
--
nosy: +ZackerySpytz
New submission from Serhiy Storchaka:
Currently regular expressions support only '\n' as a line boundary. To meet Unicode
standard requirement RL1.6 [1] all Unicode line separators should be supported:
'\n', '\r', '\v', '\f', '\x85', '\u2028', '\u2029' and two-character '\r\n'.
Also it is
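
For illustration, a minimal sketch of the behaviour described above, assuming
the re module as it currently stands. The helper name multiline_end_matches is
made up for this example and is not part of any API. str.splitlines() is
included only as a contrast, since it already recognizes these separators.

```python
import re

# Single-character line separators listed in the report (RL1.6).
SEPARATORS = ['\n', '\r', '\v', '\f', '\x85', '\u2028', '\u2029']


def multiline_end_matches(sep):
    """Return True if '$' under re.MULTILINE matches right before `sep`.

    Hypothetical helper, used only for this demonstration.
    """
    text = 'abc' + sep + 'def'
    return re.search(r'abc$', text, re.MULTILINE) is not None


# Which separators does '$' treat as a line boundary?
results = {sep: multiline_end_matches(sep) for sep in SEPARATORS}

# Which separators does str.splitlines() treat as a line boundary?
splitlines_results = {
    sep: ('abc' + sep + 'def').splitlines() == ['abc', 'def']
    for sep in SEPARATORS
}

for sep in SEPARATORS:
    print(repr(sep), 're:', results[sep], 'splitlines:', splitlines_results[sep])
```

With the current re module, only '\n' should report a match for '$', while
str.splitlines() splits on every separator in the list.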
Matthew Barnett added the comment:
For reference, the regex module normally considers the line ending to be '\n',
but it has a WORD flag ('(?w)') that turns on the Unicode definition of a
'word' character as well as Unicode line separators.