[issue40027] re.sub inconsistency beginning with 3.7
Wayne Davison <4way...@gmail.com> added the comment:

Can this bug please be reopened and fixed? This is an anchored substitution, and so should never match more than once.

--
nosy: +4wayned

___
Python tracker <https://bugs.python.org/issue40027>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40027] re.sub inconsistency beginning with 3.7
Wayne Davison added the comment:

Another argument in favor of this being a bug: the following does not exhibit the same doubling:

    txt = ' test'
    txt = re.sub(r'^\s*', '^', txt)

That always substitutes once.
[issue40027] re.sub inconsistency beginning with 3.7
Wayne Davison added the comment:

This is not the same thing because the match is anchored, so it is not adjacent to the prior match; it is the same match. I think that r'\s*\Z' should behave the same way as r'\s*x' due to the anchor point. The current behavior is matching the same \Z twice.
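To make the comparison above concrete, here is a minimal sketch that lists the spans the regex engine reports for each pattern. The helper name `spans` and the sample strings are illustrative assumptions, not part of the attached test program. The outputs in the comments are what Python 3.7 and later report.

```python
import re

def spans(pattern, text):
    """Return the (start, end) span of every match re.finditer reports."""
    return [m.span() for m in re.finditer(pattern, text)]

# Anchored at the end: a non-empty match, then an empty match at the same \Z.
print(spans(r'\s*\Z', 'foo  '))   # [(3, 5), (5, 5)]

# Terminated by a literal character: a single match.
print(spans(r'\s*x', 'foo  x'))   # [(3, 6)]

# Anchored at the start: a single match.
print(spans(r'^\s*', ' test'))    # [(0, 1)]
```

The second span for r'\s*\Z' is the empty match at the end of the string, which is the one re.sub() replaces a second time.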
[issue40027] re.sub inconsistency beginning with 3.7
New submission from Wayne Davison:

There is an inconsistency in re.sub() when substituting at the end of a string using a prior match with a '*' quantifier: the substitution now occurs twice. For example:

    txt = re.sub(r'\s*\Z', "\n", txt)

This should work like txt.rstrip() + "\n", but beginning in 3.7, the re.sub version matches twice and changes any non-empty trailing whitespace into "\n\n" instead of "\n". (If there is no trailing whitespace it only matches once.) The bug is the same if '$' is used instead of '\Z', but it does not happen if an actual character is specified (e.g. a substitution of r'\s*x' does not substitute twice if x has preceding whitespace).

I tested 2.7.17, 3.6.9, 3.7.7, 3.8.2, and 3.9.0a4, and it starts to fail in 3.7.7 and beyond.

Attached is a test program.

------
components: Regular Expressions
files: sub-bug.py
messages: 364688
nosy: Wayne Davison, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.sub inconsistency beginning with 3.7
type: behavior
versions: Python 3.7, Python 3.8, Python 3.9
Added file: https://bugs.python.org/file48990/sub-bug.py
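A minimal reproduction of the report, plus one possible workaround, might look like the sketch below. It is not the attached sub-bug.py; the function name `normalize_trailing` and the sample strings are assumptions made for illustration. Passing count=1 limits re.sub() to the leftmost match, so the result no longer depends on how a later empty match is treated.

```python
import re
import sys

def normalize_trailing(txt):
    """Replace trailing whitespace (if any) with a single newline.

    count=1 stops re.sub() after the first (leftmost) match, so a
    following empty match at the end of the string is never replaced.
    """
    return re.sub(r'\s*\Z', '\n', txt, count=1)

# The reported behavior: two substitutions on 3.7+, one on earlier versions.
doubled = re.sub(r'\s*\Z', '\n', 'foo  ')
if sys.version_info >= (3, 7):
    print(repr(doubled))                   # 'foo\n\n'

# With count=1 the result matches txt.rstrip() + "\n" for these inputs.
for sample in ('foo  ', 'foo', 'a b \t\n', ''):
    print(repr(normalize_trailing(sample)))
```

For plain trailing-whitespace cleanup, txt.rstrip() + "\n" avoids the regular expression altogether.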