On Wed, 17 Aug 2022 19:23:02 +0100
MRAB <pyt...@mrabarnett.plus.com> wrote:
> >>
> >> I do not like introducing escapes which are not supported in other RE
> >> implementations. There is a chance of future conflicts.
> >>
> >> Java broke compatibility in Java 8 by redefining \v from a single
> >> vertical tab character to the vertical whitespace class. I am not sure
> >> that it is a good example that we should follow, because different
> >> semantic of \v in raw and non-raw strings is a potential source of bugs.
> >> But with special flag which controls the meaning of \v it may be more safe.
> >>
> >> Horizontal whitespace can be matched by
> >> [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000] in re or [\t\p{Zs}]
> >> in regex. Vertical whitespace can be matched by
> >> [\n\x0b\f\r\x85\u2028\u2029]. Note that there is a dedicated Unicode
> >> category for horizontal whitespaces (excluding the tab itself), but not
> >> for vertical whitespaces, it means that vertical whitespaces are less
> >> important.
> >>
> >> In any case it is simple to introduce special Unicode categories and use
> >> \p{ht} and \p{vt} for horizontal and vertical whitespaces.
> >
> > It's not just Java. Perl supports all 4 of \h, \H, \v and \V. That might
> > be why Java 8 changed.
>
> I've found that Perl has \p{HorizSpace} and \p{VertSpace}, so I'm going
> with that.
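
For reference, the two explicit character classes quoted above already
work with the stdlib re module today, without any new escape. A minimal
sketch follows; the helper names collapse_horizontal and split_vertical
are made up for illustration, and the classes are copied as quoted
(including U+180E, which recent Unicode versions no longer treat as a
space separator, so adjust to taste).

```python
import re

# Character classes copied from the quoted message, for the stdlib re module.
# The raw strings leave the \t, \xNN and \uNNNN escapes for re to interpret.
HORIZ = r"[ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]"
VERT = r"[\n\x0b\f\r\x85\u2028\u2029]"

horiz_re = re.compile(HORIZ)
vert_re = re.compile(VERT)


def collapse_horizontal(text):
    """Replace each run of horizontal whitespace with a single space.

    Hypothetical helper: vertical whitespace (line breaks) is left alone.
    """
    return re.sub(HORIZ + "+", " ", text)


def split_vertical(text):
    """Split text at every single vertical whitespace character.

    Hypothetical helper: a "\r\n" pair counts as two separators here,
    so it yields an empty string between them.
    """
    return vert_re.split(text)


if __name__ == "__main__":
    print(repr(collapse_horizontal("a\t \xa0b\nc")))
    print(split_vertical("a\x0bb\u2028c\nd"))
```

With the third-party regex module the horizontal class could instead be
written as [\t\p{Zs}], as noted in the quoted text.
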

+1 for special Unicode categories rather than retargeting existing
escapes for something else.

(also, matching horizontal/vertical whitespace sounds rather unusual)

Regards

Antoine.

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7XN73YFKX4CGMSZBP7D4D3GOQOQVH5NM/
Code of Conduct: http://python.org/psf/codeofconduct/