I put the actual space characters here so you can see them in a non-proportional font (which I assume most Python programmers use).
https://gist.github.com/stephanh42/7c1c122154fd3f2cccc6d864233a40d8

The control characters aren't rendered at all (Vim renders them as
^\ ^] ^^ ^_, respectively). Most of the other spaces are rendered exactly
like the normal space.

The only ones which render differently are

U+1680 | | OGHAM SPACE MARK
U+3000 | | IDEOGRAPHIC SPACE

I understand Ogham has recently (since 6th century CE) seen a decline in
popularity. However, I think Python should totally adopt U+3000 as a new
whitespace character and start promoting it as the One True Way to indent
code, so as to finally end the age-old spaces vs tabs conflict.

[That was supposed to be a joke.]

Stephan

2017-11-17 16:38 GMT+01:00 Victor Stinner <victor.stin...@gmail.com>:

> I don't think that we need more than space (U+0020) and Unix newline
> (U+000A) ;-)
>
> Victor
>
> 2017-11-16 11:23 GMT+01:00 Serhiy Storchaka <storch...@gmail.com>:
> > Currently the re module ignores only 6 ASCII whitespaces in the
> > re.VERBOSE mode:
> >
> > U+0009 CHARACTER TABULATION
> > U+000A LINE FEED
> > U+000B LINE TABULATION
> > U+000C FORM FEED
> > U+000D CARRIAGE RETURN
> > U+0020 SPACE
> >
> > Perl ignores characters that Unicode calls "Pattern White Space" in the
> > /x mode. It ignores additional 5 non-ASCII characters.
> >
> > U+0085 NEXT LINE
> > U+200E LEFT-TO-RIGHT MARK
> > U+200F RIGHT-TO-LEFT MARK
> > U+2028 LINE SEPARATOR
> > U+2029 PARAGRAPH SEPARATOR
> >
> > The regex module just ignores characters for which str.isspace() returns
> > True. It ignores additional 20 non-ASCII whitespace characters, including
> > characters U+001C..001F whose classification as whitespaces is
> > questionable, but doesn't ignore LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK.
> >
> > U+001C [FILE SEPARATOR]
> > U+001D [GROUP SEPARATOR]
> > U+001E [RECORD SEPARATOR]
> > U+001F [UNIT SEPARATOR]
> > U+00A0 NO-BREAK SPACE
> > U+1680 OGHAM SPACE MARK
> > U+2000 EN QUAD
> > U+2001 EM QUAD
> > U+2002 EN SPACE
> > U+2003 EM SPACE
> > U+2004 THREE-PER-EM SPACE
> > U+2005 FOUR-PER-EM SPACE
> > U+2006 SIX-PER-EM SPACE
> > U+2007 FIGURE SPACE
> > U+2008 PUNCTUATION SPACE
> > U+2009 THIN SPACE
> > U+200A HAIR SPACE
> > U+202F NARROW NO-BREAK SPACE
> > U+205F MEDIUM MATHEMATICAL SPACE
> > U+3000 IDEOGRAPHIC SPACE
> >
> > Is it worth to extend the set of ignored whitespaces to "Pattern
> > Whitespaces"? Would it add any benefit? Or add confusion? Should this
> > depend on the re.ASCII mode? Should the byte b'\x85' be ignorable in
> > verbose bytes patterns?
> >
> > And there is a similar question about the Python parser. If Python uses
> > Unicode definition for identifier, shouldn't it accept non-ASCII "Pattern
> > Whitespaces" as whitespaces? There will be technical problems with
> > supporting this, but are there any benefits?
> >
> > https://perldoc.perl.org/perlre.html
> > https://www.unicode.org/reports/tr31/tr31-4.html#Pattern_Syntax
> > https://unicode.org/L2/L2005/05012r-pattern.html
> >
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas@python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/
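
For anyone who wants to check the lists quoted above on their own interpreter, here is a minimal sketch. The names RE_VERBOSE_WS, PATTERN_WS, ISSPACE_WS and verbose_ignores are made up for illustration and are not part of the re module. The first two sets are simply copied from the quoted message, and the str.isspace() results depend on the Unicode database shipped with your Python version, so treat the output as a spot check rather than a specification.

```python
import re
import sys
import unicodedata

# The six ASCII characters listed above as ignored in re.VERBOSE mode.
RE_VERBOSE_WS = set("\t\n\x0b\x0c\r ")

# Unicode "Pattern White Space": the six above plus the five extra
# characters listed above for Perl's /x mode.
PATTERN_WS = RE_VERBOSE_WS | set("\x85\u200e\u200f\u2028\u2029")

# Every code point for which str.isspace() returns True on this interpreter.
ISSPACE_WS = {chr(cp) for cp in range(sys.maxunicode + 1) if chr(cp).isspace()}


def verbose_ignores(ch):
    """Return True if re.VERBOSE drops `ch` when it sits between two literals."""
    return re.compile("a" + ch + "b", re.VERBOSE).fullmatch("ab") is not None


if __name__ == "__main__":
    # Characters that str.isspace() accepts but re.VERBOSE does not skip.
    for ch in sorted(ISSPACE_WS - RE_VERBOSE_WS):
        # Control characters have no Unicode name, hence the default.
        name = unicodedata.name(ch, "<unnamed control>")
        print(f"U+{ord(ch):04X} {name} verbose_ignores={verbose_ignores(ch)}")

    # Pattern White Space characters that str.isspace() rejects.
    for ch in sorted(PATTERN_WS - ISSPACE_WS):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} is not str.isspace()")
```

Building the pattern by string concatenation keeps the test character literal, so the check reflects what the pattern parser does with it rather than what an escape sequence would do.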