On Tue, Jun 30, 2020 at 09:04:15PM +0300, Mikhail V wrote:

> > Counter-proposal: hex escapes allow optional curly brackets, similar to
> > unicode name escapes. You could even allow spaces within the braces, for
> > grouping:
> >
> >     # Proposed enhancement:
> >     "\x{2b}2c"  # '+2c'
> >     "\x{2b2c}"  # '+,'
> >     "\x{DEAD BEEF}"  # "\xDE\xAD\xBE\xEF"
>
> Nice. But I am not sure about the data type and interpretation depending
> on string type. E.g. the second example:
>
>     "\x{2b2c}"  # '+,'
>
> In my example I was showing hex codepoints, e.g. U+2b2c is ⬬ (Black
> Horizontal Ellipse)
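For reference, here is how the escapes in the quoted exchange behave in Python as it stands today, with none of the proposed braced syntax involved. The snippet only uses the standard `unicodedata` module to confirm which character U+2B2C is; the variable names are illustrative.

```python
import unicodedata

# \x takes exactly two hex digits, so each escape yields one character
# in the range U+0000..U+00FF.
pair = '\x2b\x2c'
print(pair)                         # +,
print(len(pair))                    # 2

# \u takes exactly four hex digits and yields a single code point.
ellipse = '\u2b2c'
print(len(ellipse))                 # 1
print(unicodedata.name(ellipse))    # BLACK HORIZONTAL ELLIPSE

# Two \x escapes do not combine into one higher code point.
print(pair == ellipse)              # False
print(ellipse == '\N{BLACK HORIZONTAL ELLIPSE}' == '\U00002b2c')  # True

# Bytes literals accept the same \x escapes.
print(b'\x2b\x2c\x2a' == b'+,*')    # True
```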
Your example used the `\x` escape, which takes exactly two hex digits, i.e. a value between 0 and 255 inclusive (`\x00` to `\xFF`), and returns a single Unicode character between `\u0000` and `\u00FF`. You cannot use x escapes to build up higher Unicode code points in a string:

    '\x2b\x2c' != '\u2b2c'

So I assumed that you wanted a way to include multiple such escapes in a sequence.

If you want the horizontal ellipse, don't use an `\x` escape; it is the wrong one! Use `\u2b2c`.

I have no interest in making `\x{2b2c}` an alternative way of writing `\u2b2c`. Just use the u (or U) escape instead of x.

I have no objection to adding the same braces to Unicode u and U escapes. Inside the braces, spaces and underscores can simply be ignored (they are there for visual grouping).

(1) Byte strings support optional braces, spaces and underscores for grouping in hex escapes:

    b'\x{2b 2c_2a}' == b'\x2b\x2c\x2a' == b'+,*'

The spaces/underscores can appear anywhere within the braces, in any order. "Consenting adults" applies:

    # Valid, but don't do this.
    b'\x{ 2 ___ _ ___ b }'

Style guides and linters can warn against writing ugly strings :-)

(2) Unicode strings support the same, with the equivalent semantics:

    '\x{2b 2c_2a}' == '\x2b\x2c\x2a' == '+,*'

(3) Similarly, Unicode strings support optional braces and grouping for u and U escapes:

    '\u{2b 2c}' == '\u2b2c' == '\N{BLACK HORIZONTAL ELLIPSE}'
    '\U{0000 2b2c}' == '\U00002b2c' == '\N{BLACK HORIZONTAL ELLIPSE}'

Likewise, any combination of spaces and underscores, in any order, is valid. We can write hideous strings if we want :-)

    # Valid but don't do this.
    '\U{ __ 0 __0__ 0 0 2_b 2 ___c___ }'

Unlike x escapes, I don't think we should support multiple code points within the u and U braces:

    # Not part of the proposal
    '\u{221a221e}' == '\N{SQUARE ROOT}\N{INFINITY}'

My reasoning is that the leading `\x` is proportionally very "heavy" for hex escapes: it makes up fifty percent of each escape (2 of 4 characters in `\x2b`), versus just 33% for u escapes (2 of 6) and 20% for U escapes (2 of 10). So there is much less benefit to grouping multiple u and U escapes in a single set of braces.

The other reason grouping u and U escapes is less useful is that we can often just include the literal Unicode character in the string:

    '√∞'

whereas you cannot do so for control characters.

So my argument is to make the conservative change and only allow multiple escape codes inside braces for x escapes. (We can relax the restriction later if there is demand for it, but we cannot tighten it if we change our mind.)

Likewise, I would prefer the conservative approach of still requiring leading zeroes in u and U escapes.

(4) Lastly, f-strings support the same rules as Unicode strings.

-- 
Steven
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/473YKBKZMOH2FNMNDUOMD263VEJ3HH66/
Code of Conduct: http://python.org/psf/codeofconduct/
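One way to pin down the semantics of rules (1) to (3) is a small source-level rewriter. The sketch below is hypothetical: the function `expand_braced_escapes` and its regex are illustrations only, not anything in CPython or in the proposal itself. It assumes a non-raw str or bytes literal (quotes included), rewrites the proposed braces into the `\xHH`, `\uXXXX` and `\UXXXXXXXX` escapes that Python accepts today, and then lets `ast.literal_eval` evaluate the result. It allows several byte pairs inside `\x{...}` but exactly one code point, with leading zeroes, inside `\u{...}` and `\U{...}`, following the conservative restriction argued for in the message.

```python
import ast
import re

# HYPOTHETICAL: the braced escape syntax is only a proposal and does not
# exist in Python. This helper rewrites it into escapes valid today.
# Match either an escaped backslash (kept as-is) or a braced x/u/U escape.
_ESCAPE = re.compile(r'\\\\|\\([xuU])\{([^}]*)\}')
_HEX_DIGITS = set('0123456789abcdefABCDEF')


def expand_braced_escapes(literal: str) -> str:
    """Rewrite proposed braced escapes in the source of a string literal.

    Assumes a non-raw str or bytes literal, quotes included, e.g. "b'...'".
    """
    prefix = re.match(r'[A-Za-z]*', literal).group(0).lower()
    is_bytes = 'b' in prefix

    def replace(m: re.Match) -> str:
        kind, body = m.group(1), m.group(2)
        if kind is None:
            return m.group(0)          # escaped backslash: leave it alone
        if is_bytes and kind in 'uU':
            return m.group(0)          # bytes literals have no u/U escapes
        # Spaces and underscores are visual grouping only.
        digits = body.replace(' ', '').replace('_', '')
        if not digits or not set(digits) <= _HEX_DIGITS:
            raise ValueError(f'invalid hex digits in {m.group(0)!r}')
        if kind == 'x':
            # Any even number of digits: one \xHH escape per pair.
            if len(digits) % 2:
                raise ValueError(f'odd number of hex digits in {m.group(0)!r}')
            return ''.join('\\x' + digits[i:i + 2]
                           for i in range(0, len(digits), 2))
        # u/U: exactly one code point, leading zeroes still required.
        width = 4 if kind == 'u' else 8
        if len(digits) != width:
            raise ValueError(f'{m.group(0)!r} needs exactly {width} hex digits')
        return '\\' + kind + digits

    return _ESCAPE.sub(replace, literal)


print(ast.literal_eval(expand_braced_escapes(r"b'\x{2b 2c_2a}'")))  # b'+,*'
print(ast.literal_eval(expand_braced_escapes(r"'\x{2b}2c'")))        # +2c
print(ast.literal_eval(expand_braced_escapes(r"b'\x{DEAD BEEF}'")))  # b'\xde\xad\xbe\xef'
```

Under this sketch, `r"'\u{221a221e}'"` raises `ValueError`, matching the rule that u and U braces hold a single code point. Raw literals and f-strings (rule 4) are not handled.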