On 2018-10-28 18:51, Karsten Hilbert wrote:
Dear list members,

I cannot figure out why my regular expression does not work as I expect it to:

#---------------------------
#!/usr/bin/python

from __future__ import print_function
import re as regex

# raw strings keep the backslashes intact for the regex engine
rx_works = r'\$<[^<:]+?::.*?::\d*?>\$|\$<[^<:]+?::.*?::\d+-\d+>\$'
# it fails if switched around:
rx_fails = r'\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$'
line = 'junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk'

print ('')
print ('line:', line)
print ('expected: $<match_A::options A::4>$')
print ('expected: $<match_B::options B::4-5>$')

print ('')
placeholders_in_line = regex.findall(rx_works, line, regex.IGNORECASE)
print('found (works):')
for ph in placeholders_in_line:
        print (ph)

print ('')
placeholders_in_line = regex.findall(rx_fails, line, regex.IGNORECASE)
print('found (fails):')
for ph in placeholders_in_line:
        print (ph)

#---------------------------

I am sure I am simply not seeing the problem?

Here are some of the steps the regex engine takes while matching the second regex (rx_fails). View this in a monospaced font.


1:
junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk
      ^

\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
^


2:
junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk
                 ^

\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
            ^


3:
The .*? matches as few characters as possible: initially none, then one more at a time until the "::" that follows it can match.

junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk
                          ^

\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
               ^


4:
junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk
                             ^

\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
                    ^

At this point the "-" in the pattern can't match the ">" in the text, so it backtracks.


5:
The .*? matches more characters, including the ":".

After more matching it's like the following.

junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk
                                                ^

\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
               ^


6:
junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk
                                                  ^

\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
                 ^

Again it can't match (\d+ needs a digit but finds the "o" of "options B"), so it backtracks.


7:
The .*? matches more characters, including the ":".

After more matching it's like the following.

junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk
                                                           ^

\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
               ^

8:
junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk
                                                                  ^

\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
                           ^

Success!

The first alternative has matched the following, which spans both placeholders, so findall() returns one long string instead of two:

$<match_A::options A::4>$  junk  $<match_B::options B::4-5>$
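
The walkthrough above can be reproduced in a few lines. The sketch below also tries one possible way to stop the overshoot: replacing ".*?" with "[^>]*?" so the middle field cannot run past the ">" that closes its own placeholder. The name rx_bounded is made up for illustration, and the variant assumes the options field never contains a ">" itself; if it can, a different delimiter class would be needed.

```python
import re

line = 'junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk'

# The failing pattern from the question, written as a raw string.
rx_fails = r'\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$'

# Hypothetical variant (assumption: the options field never contains ">").
# "[^>]*?" cannot cross the ">" that ends the placeholder, so the first
# alternative fails on match_A and the second alternative gets its turn.
rx_bounded = r'\$<[^<:]+?::[^>]*?::\d+-\d+>\$|\$<[^<:]+?::[^>]*?::\d*?>\$'

overshoot = re.findall(rx_fails, line)
bounded = re.findall(rx_bounded, line)

print('rx_fails  :', overshoot)
print('rx_bounded:', bounded)
```

With rx_fails the list holds the single long match shown above; with the bounded variant each placeholder comes back as its own item, and the order of the two alternatives no longer matters for this input.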