Re: Possible re bug when using ".*"

2023-01-01 Thread Peter J. Holzer
On 2022-12-28 19:07:06, MRAB wrote:
> On 2022-12-28 18:42, Alexander Richert - NOAA Affiliate via Python-list
> wrote:
> > print(re.sub(".*", "replacement", "pattern"))
> > yields the output "replacementreplacement".
[...]
> It's not a bug, it's a change in behaviour to bring it more into line with
> other regex implementations in other languages.

Interesting. Perl does indeed behave that way, too. Never noticed that
in 28 years of using it.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Possible re bug when using ".*"

2022-12-28 Thread Ethan Furman

On 12/28/22 11:07, MRAB wrote:
> On 2022-12-28 18:42, Alexander Richert - NOAA Affiliate via Python-list wrote:
> > In a couple recent versions of Python (including 3.8 and 3.10), the
> > following code:
> > import re
> > print(re.sub(".*", "replacement", "pattern"))
> > yields the output "replacementreplacement".
> >
> > This behavior does not occur in 3.6.
> >
> > Which behavior is the desired one? Perhaps relatedly, I noticed that even
> > in 3.6, the code
> > print(re.findall(".*","pattern"))
> > yields ['pattern',''] which is not what I was expecting.
>
> It's not a bug, it's a change in behaviour to bring it more into line with
> other regex implementations in other languages.


The new behavior makes no sense to me, but better to be consistent with the other regex engines than not -- I still get 
thrown off by vim's regex.


--
~Ethan~


Re: Possible re bug when using ".*"

2022-12-28 Thread MRAB
On 2022-12-28 18:42, Alexander Richert - NOAA Affiliate via Python-list wrote:
> In a couple recent versions of Python (including 3.8 and 3.10), the
> following code:
> import re
> print(re.sub(".*", "replacement", "pattern"))
> yields the output "replacementreplacement".
>
> This behavior does not occur in 3.6.
>
> Which behavior is the desired one? Perhaps relatedly, I noticed that even
> in 3.6, the code
> print(re.findall(".*","pattern"))
> yields ['pattern',''] which is not what I was expecting.

It's not a bug, it's a change in behaviour to bring it more into line 
with other regex implementations in other languages.



Re: Possible re bug when using ".*"

2022-12-28 Thread Roel Schroeven

Roel Schroeven wrote on 28/12/2022 at 19:59:
> Alexander Richert - NOAA Affiliate via Python-list wrote on 28/12/2022 at 19:42:
> > In a couple recent versions of Python (including 3.8 and 3.10), the
> > following code:
> > import re
> > print(re.sub(".*", "replacement", "pattern"))
> > yields the output "replacementreplacement".
> >
> > This behavior does not occur in 3.6.
> >
> > Which behavior is the desired one? Perhaps relatedly, I noticed that even
> > in 3.6, the code
> > print(re.findall(".*","pattern"))
> > yields ['pattern',''] which is not what I was expecting.
>
> The documentation for re.sub() and re.findall() has these notes:
> "Changed in version 3.7: Empty matches for the pattern are replaced
> when adjacent to a previous non-empty match." and "Changed in version
> 3.7: Non-empty matches can now start just after a previous empty match."
> That probably describes the behavior you're seeing. ".*" first
> matches "pattern", which is a non-empty match; then it matches the
> empty string at the end, which is an empty match but is replaced
> because it is adjacent to a non-empty match.
>
> Seems somewhat counter-intuitive to me, but AFAICS it's the intended
> behavior.

For what it's worth, there's some discussion about this in this GitHub
issue: https://github.com/python/cpython/issues/76489
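
If the goal is simply to replace the whole string once, a few ordinary re
features avoid the trailing empty match. This is only a sketch under that
assumption; none of these variants are proposed in the thread or in the
linked issue, and the variable names are made up for illustration:

```python
import re

text = "pattern"

# ".+" needs at least one character, so it cannot match the empty
# string left over at the end of the input.
with_plus = re.sub(".+", "replacement", text)

# count=1 stops after the first match, which is the non-empty one here.
with_count = re.sub(".*", "replacement", text, count=1)

# Without re.MULTILINE, "^" only matches at position 0, which rules out
# the empty match at the end of a non-empty string.
with_anchor = re.sub("^.*", "replacement", text)

print(with_plus, with_count, with_anchor)
```

The variants differ on empty input: ".+" leaves "" unchanged, while the
other two still turn "" into "replacement".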


--
"Je ne suis pas d’accord avec ce que vous dites, mais je me battrai jusqu’à
la mort pour que vous ayez le droit de le dire."
-- Attribué à Voltaire
"I disapprove of what you say, but I will defend to the death your right to
say it."
-- Attributed to Voltaire
"Ik ben het niet eens met wat je zegt, maar ik zal je recht om het te zeggen
tot de dood toe verdedigen"
-- Toegeschreven aan Voltaire


Re: Possible re bug when using ".*"

2022-12-28 Thread Roel Schroeven
Alexander Richert - NOAA Affiliate via Python-list wrote on 28/12/2022 at 19:42:
> In a couple recent versions of Python (including 3.8 and 3.10), the
> following code:
> import re
> print(re.sub(".*", "replacement", "pattern"))
> yields the output "replacementreplacement".
>
> This behavior does not occur in 3.6.
>
> Which behavior is the desired one? Perhaps relatedly, I noticed that even
> in 3.6, the code
> print(re.findall(".*","pattern"))
> yields ['pattern',''] which is not what I was expecting.

The documentation for re.sub() and re.findall() has these notes:
"Changed in version 3.7: Empty matches for the pattern are replaced when
adjacent to a previous non-empty match." and "Changed in version 3.7:
Non-empty matches can now start just after a previous empty match."
That probably describes the behavior you're seeing. ".*" first matches
"pattern", which is a non-empty match; then it matches the empty string
at the end, which is an empty match but is replaced because it is
adjacent to a non-empty match.
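
The two matches can be made visible with re.finditer(), which reports the
span of each one. A minimal sketch, assuming Python 3.7 or later for the
re.sub() result (the variable names are only for illustration):

```python
import re

text = "pattern"

# Collect every match of ".*" together with its (start, end) span.
matches = [(m.group(), m.span()) for m in re.finditer(".*", text)]
print(matches)  # [('pattern', (0, 7)), ('', (7, 7))]

# re.sub() replaces each match separately, so on 3.7+ the two matches
# above produce two copies of the replacement text.
result = re.sub(".*", "replacement", text)
print(result)  # replacementreplacement (on Python 3.7+)
```

The second match starts and ends at position 7, i.e. it is the empty string
after the final "n".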


Seems somewhat counter-intuitive to me, but AFAICS it's the intended 
behavior.


--
"Programming today is a race between software engineers striving to build bigger
and better idiot-proof programs, and the Universe trying to produce bigger and
better idiots. So far, the Universe is winning."
-- Douglas Adams


Possible re bug when using ".*"

2022-12-28 Thread Alexander Richert - NOAA Affiliate via Python-list
 In a couple recent versions of Python (including 3.8 and 3.10), the
following code:
import re
print(re.sub(".*", "replacement", "pattern"))
yields the output "replacementreplacement".

This behavior does not occur in 3.6.

Which behavior is the desired one? Perhaps relatedly, I noticed that even
in 3.6, the code
print(re.findall(".*","pattern"))
yields ['pattern',''] which is not what I was expecting.

Thanks,
Alex Richert

-- 
Alexander Richert, PhD
*RedLine Performance Systems*