Re: Possible re bug when using ".*"
On 2022-12-28 19:07:06 +, MRAB wrote:
> On 2022-12-28 18:42, Alexander Richert - NOAA Affiliate via Python-list
> wrote:
> > print(re.sub(".*", "replacement", "pattern"))
> > yields the output "replacementreplacement".
> [...]
> It's not a bug, it's a change in behaviour to bring it more into line with
> other regex implementations in other languages.

Interesting. Perl does indeed behave that way, too. Never noticed that in
28 years of using it.

        hp

--
Peter J. Holzer | h...@hjp.at | http://www.hjp.at/
"Story must make more sense than reality."
    -- Charles Stross, "Creative writing challenge!"

--
https://mail.python.org/mailman/listinfo/python-list
Re: Possible re bug when using ".*"
On 12/28/22 11:07, MRAB wrote:
> On 2022-12-28 18:42, Alexander Richert - NOAA Affiliate via Python-list
> wrote:
> > In a couple recent versions of Python (including 3.8 and 3.10), the
> > following code:
> >
> >     import re
> >     print(re.sub(".*", "replacement", "pattern"))
> >
> > yields the output "replacementreplacement".
> >
> > This behavior does not occur in 3.6.
> >
> > Which behavior is the desired one? Perhaps relatedly, I noticed that
> > even in 3.6, the code
> >
> >     print(re.findall(".*","pattern"))
> >
> > yields ['pattern',''] which is not what I was expecting.
>
> It's not a bug, it's a change in behaviour to bring it more into line with
> other regex implementations in other languages.

The new behavior makes no sense to me, but better to be consistent with the
other regex engines than not -- I still get thrown off by vim's regex.

--
~Ethan~
Re: Possible re bug when using ".*"
On 2022-12-28 18:42, Alexander Richert - NOAA Affiliate via Python-list wrote:
> In a couple recent versions of Python (including 3.8 and 3.10), the
> following code:
>
>     import re
>     print(re.sub(".*", "replacement", "pattern"))
>
> yields the output "replacementreplacement".
>
> This behavior does not occur in 3.6.
>
> Which behavior is the desired one? Perhaps relatedly, I noticed that
> even in 3.6, the code
>
>     print(re.findall(".*","pattern"))
>
> yields ['pattern',''] which is not what I was expecting.

It's not a bug, it's a change in behaviour to bring it more into line with
other regex implementations in other languages.
Re: Possible re bug when using ".*"
Roel Schroeven wrote on 28/12/2022 at 19:59:
> Alexander Richert - NOAA Affiliate via Python-list wrote on 28/12/2022
> at 19:42:
> > In a couple recent versions of Python (including 3.8 and 3.10), the
> > following code:
> >
> >     import re
> >     print(re.sub(".*", "replacement", "pattern"))
> >
> > yields the output "replacementreplacement".
> >
> > This behavior does not occur in 3.6.
> >
> > Which behavior is the desired one? Perhaps relatedly, I noticed that
> > even in 3.6, the code
> >
> >     print(re.findall(".*","pattern"))
> >
> > yields ['pattern',''] which is not what I was expecting.
>
> The documentation for re.sub() and re.findall() has these notes:
> "Changed in version 3.7: Empty matches for the pattern are replaced when
> adjacent to a previous non-empty match." and "Changed in version 3.7:
> Non-empty matches can now start just after a previous empty match."
>
> That probably describes the behavior you're seeing. ".*" first matches
> "pattern", which is a non-empty match; then it matches the empty string
> at the end, which is an empty match but is replaced because it is
> adjacent to a non-empty match.
>
> Seems somewhat counter-intuitive to me, but AFAICS it's the intended
> behavior.

For what it's worth, there's some discussion about this in this GitHub
issue: https://github.com/python/cpython/issues/76489

--
"I disapprove of what you say, but I will defend to the death your right
to say it."
    -- Attributed to Voltaire
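The explanation quoted above can be checked directly. This is only a small sketch, assuming Python 3.7 or later; the variable names (text, spans, found, result) are illustrative and do not come from the thread. It lists where each match of ".*" starts and ends, which makes the trailing empty match visible.

```python
import re

text = "pattern"

# Record each match of ".*" as (start, end, matched text).
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(".*", text)]
print(spans)   # [(0, 7, 'pattern'), (7, 7, '')]

# findall reports the same two matches.
found = re.findall(".*", text)
print(found)   # ['pattern', '']

# On 3.7 and later, sub replaces both the non-empty match and the
# empty match that directly follows it.
result = re.sub(".*", "replacement", text)
print(result)  # replacementreplacement
```

The second tuple, (7, 7, ''), is the empty match at the end of the string that the 3.7 documentation note refers to.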
Re: Possible re bug when using ".*"
Alexander Richert - NOAA Affiliate via Python-list wrote on 28/12/2022 at 19:42:
> In a couple recent versions of Python (including 3.8 and 3.10), the
> following code:
>
>     import re
>     print(re.sub(".*", "replacement", "pattern"))
>
> yields the output "replacementreplacement".
>
> This behavior does not occur in 3.6.
>
> Which behavior is the desired one? Perhaps relatedly, I noticed that
> even in 3.6, the code
>
>     print(re.findall(".*","pattern"))
>
> yields ['pattern',''] which is not what I was expecting.

The documentation for re.sub() and re.findall() has these notes: "Changed
in version 3.7: Empty matches for the pattern are replaced when adjacent
to a previous non-empty match." and "Changed in version 3.7: Non-empty
matches can now start just after a previous empty match."

That probably describes the behavior you're seeing. ".*" first matches
"pattern", which is a non-empty match; then it matches the empty string at
the end, which is an empty match but is replaced because it is adjacent to
a non-empty match.

Seems somewhat counter-intuitive to me, but AFAICS it's the intended
behavior.

--
"Programming today is a race between software engineers striving to build
bigger and better idiot-proof programs, and the Universe trying to produce
bigger and better idiots. So far, the Universe is winning."
    -- Douglas Adams
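If a single replacement is what is wanted, a few alternatives may be worth trying. None of these were proposed in the thread; they are a hedged sketch assuming Python 3.7 or later and a single-line input string, and the variable names are illustrative only.

```python
import re

text = "pattern"

# Require at least one character, so the empty match at the end of the
# string is never produced. Note: unlike ".*", this leaves an empty
# input string unchanged.
with_plus = re.sub(".+", "replacement", text)

# Keep ".*" but stop after the first substitution.
with_count = re.sub(".*", "replacement", text, count=1)

# Anchor at the start. Without re.MULTILINE, "^" matches only at
# position 0, so the empty match at the end cannot occur.
anchored = re.sub("^.*", "replacement", text)

print(with_plus, with_count, anchored)
```

Which variant fits depends on the input: "." does not match a newline by default, so multi-line strings behave differently from the single-line case shown here.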
Possible re bug when using ".*"
In a couple recent versions of Python (including 3.8 and 3.10), the
following code:

    import re
    print(re.sub(".*", "replacement", "pattern"))

yields the output "replacementreplacement".

This behavior does not occur in 3.6.

Which behavior is the desired one? Perhaps relatedly, I noticed that even
in 3.6, the code

    print(re.findall(".*","pattern"))

yields ['pattern',''] which is not what I was expecting.

Thanks,
Alex Richert

--
Alexander Richert, PhD
*RedLine Performance Systems*