On 5/8/2016 12:32 PM, Sergio Spina wrote:
On Sunday, 8 May 2016 at 18:16:56 UTC+2, Peter Otten wrote:
Sergio Spina wrote:

In the following ipython session:

Python 3.5.1+ (default, Feb 24 2016, 11:28:57)
Type "copyright", "credits" or "license" for more information.

IPython 2.3.0 -- An enhanced Interactive Python.

In [1]: import re

In [2]: patt = r"""  # the match pattern is:
...:     .+          # one or more characters
...:     [ ]         # followed by a space
...:     (?=[@#D]:)  # that is followed by one of the
...:                 # chars "@#D" and a colon ":"
...:    """

In [3]: pattern = re.compile(patt, re.VERBOSE)

In [4]: m = pattern.match("Jun@i Bun#i @:Janji")

In [5]: m.group()
Out[5]: 'Jun@i Bun#i '

In [6]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji")

In [7]: m.group()
Out[7]: 'Jun@i Bun#i @:Janji '

In [8]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji #:Junji")

In [9]: m.group()
Out[9]: 'Jun@i Bun#i @:Janji D:Banji '

Why does the regex engine stop the search at the last piece of the string?
Why not at the first match of the group "@:"?
What regex pattern would give the following result?

In [1]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji #:Junji")

In [2]: m.group()
Out[2]: 'Jun@i Bun#i '

Compare:

re.compile("a+").match("aaaa").group()
'aaaa'
re.compile("a+?").match("aaaa").group()
'a'

By default pattern matching is "greedy" -- the ".+" part of your regex
matches as many characters as possible. Adding a ? like in ".+?" triggers
non-greedy matching.
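
Applied to the pattern from the session, the change might look like the sketch below. The names lazy_pattern, text and result are only illustrative; the pattern is the original one with '.+' replaced by '.+?'.

```python
import re

# Same verbose pattern as in the session, with '.+' changed to '.+?'.
lazy_pattern = re.compile(r"""
    .+?         # one or more characters, as few as possible
    [ ]         # followed by a space
    (?=[@#D]:)  # only if followed by one of "@#D" and a colon
    """, re.VERBOSE)

text = "Jun@i Bun#i @:Janji D:Banji #:Junji"
result = lazy_pattern.match(text).group()
print(repr(result))  # 'Jun@i Bun#i '
```

The first space (after 'Jun@i') is skipped because it is followed by 'Bu', not by one of "@#D" and a colon, so the match ends at the space before '@:Janji'.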

 In [2]: patt = r"""  # the match pattern is:
 ...:     .+          # one or more characters

Peter meant that you should replace '.+' with '.+?' to get the non-greedy match.

 ...:     [ ]         # followed by a space
 ...:     (?=[@#D]:)  # ONLY IF it is followed by one of the <<< please note
 ...:                 # chars "@#D" and a colon ":"
 ...:    """

From the Python documentation:

 (?=...)
     Matches if ... matches next, but doesn't consume any of the string.
     This is called a lookahead assertion. For example,
     Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.
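
The documentation example can be tried directly. This is just a small sketch; the variable names are mine, and 'Isaac Newton' is an extra test string that is not in the docs.

```python
import re

lookahead = re.compile(r"Isaac (?=Asimov)")

m = lookahead.match("Isaac Asimov")
matched = m.group()   # the lookahead text is not part of the match
end = m.end()         # the match stops right after the space

no_match = lookahead.match("Isaac Newton")

print(repr(matched))  # 'Isaac '
print(end)            # 6
print(no_match)       # None
```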

I know about greedy and non-greedy, but the problem remains.

Greedy '.+' matches the whole string. The matcher then backs up to find a space -- initially the last space. It then, and only then, checks the lookahead assertion. If that failed, it would back up again. In your examples, it succeeds, and the matcher stops.
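
One way to see this is to list every space where the lookahead succeeds and compare it with where the greedy and non-greedy versions stop. This is only a rough illustration, not a trace of the real engine; the names candidates, greedy_end and lazy_end are made up for the example.

```python
import re

text = "Jun@i Bun#i @:Janji D:Banji #:Junji"

# Every space at which the lookahead assertion succeeds.
candidates = [m.start() for m in re.finditer(r"[ ](?=[@#D]:)", text)]

# End offsets of the overall match for the greedy and non-greedy patterns.
greedy_end = re.match(r".+[ ](?=[@#D]:)", text).end()
lazy_end = re.match(r".+?[ ](?=[@#D]:)", text).end()

print(candidates)                        # [11, 19, 27]
print(greedy_end - 1 == candidates[-1])  # True: greedy stops at the last one
print(lazy_end - 1 == candidates[0])     # True: non-greedy stops at the first
```

Backing up from the end of the string, the greedy matcher reaches the space at offset 27 first and the assertion holds there, so it never looks at the earlier candidates.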

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
