On Wed, 17 Aug 2022 19:23:02 +0100
MRAB <pyt...@mrabarnett.plus.com> wrote:
> >> 
> >> I do not like introducing escapes which are not supported in other RE
> >> implementations. There is a chance of future conflicts.
> >> 
> >> Java broke compatibility in Java 8 by redefining \v from a single
> >> vertical tab character to the vertical whitespace class. I am not sure
> >> that it is a good example to follow, because different semantics of \v
> >> in raw and non-raw strings are a potential source of bugs. But with a
> >> special flag that controls the meaning of \v it may be safer.
> >> 
> >> Horizontal whitespace can be matched by
> >> [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000] in re or [\t\p{Zs}]
> >> in regex. Vertical whitespace can be matched by
> >> [\n\x0b\f\r\x85\u2028\u2029]. Note that there is a dedicated Unicode
> >> category for horizontal whitespace (excluding the tab itself), but not
> >> for vertical whitespace, which means that vertical whitespace is less
> >> important.
> >> 
> >> In any case it is simple to introduce special Unicode categories and use
> >> \p{ht} and \p{vt} for horizontal and vertical whitespaces.  
> >
> > It's not just Java. Perl supports all 4 of \h, \H, \v and \V. That might
> > be why Java 8 changed.
>
> I've found that Perl has \p{HorizSpace} and \p{VertSpace}, so I'm going
> with that.

+1 for special Unicode categories rather than retargeting existing
escapes for something else.

(also, matching horizontal/vertical whitespace sounds rather unusual)
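
For reference, the explicit character classes quoted above can be tried
with the stdlib re module as it is today, without any new escapes. This is
only a rough sketch: the names HORIZ, VERT, collapse_horizontal and
split_lines are made up for illustration, and the leading space in the
horizontal class is included on the assumption that it belongs to the class.

```python
import re

# Character classes taken from the quoted message. The leading space in
# HORIZ is an assumption (a plain space is horizontal whitespace too).
HORIZ = r"[ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]"
VERT = r"[\n\x0b\f\r\x85\u2028\u2029]"

# Hypothetical helper names, used only for this illustration.
_horiz_run = re.compile(HORIZ + "+")
_vert_char = re.compile(VERT)


def collapse_horizontal(text):
    """Replace each run of horizontal whitespace with a single space."""
    return _horiz_run.sub(" ", text)


def split_lines(text):
    """Split on every single vertical whitespace character."""
    return _vert_char.split(text)
```

Note that split_lines treats "\r\n" as two separators, so it yields an
empty string between them; str.splitlines() may be a better fit when that
matters.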

Regards

Antoine.


_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7XN73YFKX4CGMSZBP7D4D3GOQOQVH5NM/
Code of Conduct: http://python.org/psf/codeofconduct/
