On 3/31/2018 9:48 PM, Steven D'Aprano wrote:
> On Sun, Apr 01, 2018 at 02:20:16AM +0100, Rob Cliffe via Python-ideas wrote:
>
>>> New unordered 'd' and 'D' prefixes, for 'dedent', applied to multiline
>>> strings only, would multiply the number of alternatives by about 5 and
>>> would require another rewrite of all code (Python or not) that parses
>>> Python code (such as in syntax colorizers).
>>
>> I think you're exaggerating the difficulty somewhat.  Multiplying the
>> number of alternatives by 5 is not the same thing as increasing the
>> complexity of code to parse it by 5.
>
> Terry didn't say that it would increase the complexity of the code by a
> factor of five. He said it would multiply the number of alternatives by
> "about 5". There would be a significant increase in the complexity of
> the code too, but I wouldn't want to guess how much.
>
> Starting with the r and f prefixes, in both upper and lower case, we have:
>
> 4 single-letter prefixes
> (plus 2 more, u and U, that don't combine with others)
> 8 double-letter prefixes
>
> making 14 in total. Adding one more prefix, d|D, increases it to:
>
> 6 single-letter prefixes
> (plus 2 more, u and U)
> 24 double-letter prefixes
> 48 triple-letter prefixes
>
> making 80 prefixes in total. Terry actually underestimated the explosion
> in prefixes: the multiplier is closer to six than to five (but who is
> counting? apart from me *wink*)
>
> [Aside: if we add a fourth prefix, the total becomes 634.]
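
Steven's arithmetic is easy to double-check mechanically. Here's a quick itertools sketch that reproduces those counts (the 'x' below is just a stand-in for some hypothetical fourth prefix letter):

    import itertools

    def count_prefixes(letters):
        # All orderings of 1..len(letters) distinct letters, with each
        # letter independently upper or lower case, plus the standalone
        # 'u'/'U' prefixes, which don't combine with anything else.
        total = 2  # 'u' and 'U'
        for n in range(1, len(letters) + 1):
            orderings = len(list(itertools.permutations(letters, n)))
            total += orderings * 2 ** n
        return total

    print(count_prefixes('rf'))    # 14: today's r/f prefixes
    print(count_prefixes('rfd'))   # 80: with d added
    print(count_prefixes('rfdx'))  # 634: with a hypothetical fourth letter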

Not that it really matters, but there's some code I use whenever I feel like playing with adding string prefixes. It usually encourages me not to do that!

Lib/tokenize.py:_all_string_prefixes exists just for calculating string prefixes. Since it's not what is actually used by the tokenizer, I don't claim it's perfect (but I don't know of any errors in it).

According to it, and ignoring the empty string, there are currently 24 prefixes: {'B', 'BR', 'Br', 'F', 'FR', 'Fr', 'R', 'RB', 'RF', 'Rb', 'Rf', 'U', 'b', 'bR', 'br', 'f', 'fR', 'fr', 'r', 'rB', 'rF', 'rb', 'rf', 'u'}
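
Since _all_string_prefixes is a private helper, the details could change from version to version, but on 3.6 you can reproduce that count with something like:

    from tokenize import _all_string_prefixes

    # The function's result includes the empty string, so drop it
    # before counting.
    prefixes = _all_string_prefixes() - {''}
    print(len(prefixes))     # 24
    print(sorted(prefixes))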

And if you add 'd', assuming it can't combine with 'b' or 'u', I count 90:
{'rdf', 'FR', 'dRF', 'rD', 'FrD', 'DFr', 'frd', 'RDf', 'u', 'DF', 'd', 'Frd', 'frD', 'dFr', 'rDF', 'fD', 'rB', 'dFR', 'FD', 'dr', 'Fr', 'DfR', 'fdR', 'Rb', 'dfr', 'rdF', 'rf', 'Drf', 'R', 'RB', 'BR', 'FdR', 'bR', 'DFR', 'RdF', 'dF', 'F', 'fd', 'Br', 'Dfr', 'Dr', 'r', 'rfd', 'RFd', 'Fdr', 'dfR', 'rb', 'fDr', 'rFD', 'fRd', 'Rfd', 'RDF', 'rFd', 'Rdf', 'rF', 'FDr', 'drF', 'dR', 'D', 'br', 'fr', 'drf', 'DrF', 'rd', 'DRF', 'DR', 'RFD', 'Rf', 'fR', 'RfD', 'Df', 'rDf', 'U', 'f', 'df', 'DRf', 'fdr', 'B', 'FRD', 'RF', 'Fd', 'Rd', 'fRD', 'FRd', 'b', 'dRf', 'FDR', 'RD', 'fDR', 'rfD'}

I guess it's debatable whether prefixes that contain 'b' should count as string prefixes, but the tokenizer thinks they do. If you leave them out (there are ten in each set), you come up with the 14 and 80 that Steven mentions.
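
For reference, here's roughly the expansion that _all_string_prefixes performs, with hypothetical 'd' entries bolted on (assuming 'd' combines with 'r' and 'f' but not with 'b' or 'u'); it reproduces the 90 above:

    import itertools

    # One canonical spelling per combination; all orderings and case
    # variants are generated below.
    valid = ['b', 'r', 'u', 'f', 'br', 'fr',   # what 3.6 accepts today
             'd', 'dr', 'df', 'dfr']           # hypothetical 'd' additions

    result = set()
    for prefix in valid:
        for perm in itertools.permutations(prefix):
            for cased in itertools.product(*[(c, c.upper()) for c in perm]):
                result.add(''.join(cased))

    print(len(result))  # 90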

I agree with Terry that adding 'F' was a mistake, but since the upper-case versions of 'r' and 'b' already existed, it was included.

Interestingly, in 2.7 'ur' is a valid prefix, but not in 3.6. I don't recall whether that was deliberate. And 'ru' isn't valid in either version.
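
A quick way to check what any given interpreter accepts (this runs unchanged under 2.7 and 3.6; 2.7 reports 'ur' as valid, 3.6 rejects both):

    for prefix in ('ur', 'ru'):
        try:
            compile(prefix + "'x'", '<test>', 'eval')
            print(prefix + ' is valid')
        except SyntaxError:
            print(prefix + ' is a SyntaxError')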

Eric
