On 5/8/2016 12:32 PM, Sergio Spina wrote:
Il giorno domenica 8 maggio 2016 18:16:56 UTC+2, Peter Otten ha scritto:
Sergio Spina wrote:
In the following ipython session:
Python 3.5.1+ (default, Feb 24 2016, 11:28:57)
Type "copyright", "credits" or "license" for more information.
IPython 2.3.0 -- An enhanced Interactive Python.
In [1]: import re
In [2]: patt = r""" # the match pattern is:
...: .+ # one or more characters
...: [ ] # followed by a space
...: (?=[@#D]:) # that is followed by one of the
...: # chars "@#D" and a colon ":"
...: """
In [3]: pattern = re.compile(patt, re.VERBOSE)
In [4]: m = pattern.match("Jun@i Bun#i @:Janji")
In [5]: m.group()
Out[5]: 'Jun@i Bun#i '
In [6]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji")
In [7]: m.group()
Out[7]: 'Jun@i Bun#i @:Janji '
In [8]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji #:Junji")
In [9]: m.group()
Out[9]: 'Jun@i Bun#i @:Janji D:Banji '
Why the regex engine stops the search at last piece of string?
Why not at the first match of the group "@:"?
What can it be a regex pattern with the following result?
In [1]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji #:Junji")
In [2]: m.group()
Out[2]: 'Jun@i Bun#i '
Compare:
re.compile("a+").match("aaaa").group()
'aaaa'
re.compile("a+?").match("aaaa").group()
'a'
By default pattern matching is "greedy" -- the ".+" part of your regex
matches as many characters as possible. Adding a ? like in ".+?" triggers
non-greedy matching.
In [2]: patt = r""" # the match pattern is:
...: .+ # one or more characters
Peter meant that you should replace '.+' with '.+?' to get the
non-greedy match.
...: [ ] # followed by a space
...: (?=[@#D]:) # ONLY IF is followed by one of the <<< please note
...: # chars "@#D" and a colon ":"
...: """
From the python documentation
(?=...)
Matches if ... matches next, but doesn't consume any of the string.
This is called a lookahead assertion. For example,
Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.
I know about greedy and not-greedy, but the problem remains.
Greedy '.+' matches the whole string. The matcher then back up to find
a space -- initially the last space. It then, and only then, checks the
lookahead assertion. If that failed, it would back up again. In your
examples, it succeeds, and the matcher stops.
--
Terry Jan Reedy
--
https://mail.python.org/mailman/listinfo/python-list