On 06/08/2021 at 02:57, Jach Feng wrote:
On Thursday, 5 August 2021 at 11:29:15 PM [UTC+8], ast wrote:
On 05/08/2021 at 17:11, ast wrote:
On 05/08/2021 at 11:40, Jach Feng wrote:

import regex

# regex is more powerful than re
text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
regex.findall(r'ch \d++(?!\.)', text)

['ch 4', 'ch 56']

## ++ means "possessive": no backtracking is allowed

Can someone explain how the difference arises? I just can't figure it out :-(


+, *, and ? are greedy, meaning they try to match as many characters
as possible. But if the overall match then fails, they give back
characters one at a time and the rest of the pattern is tried again.
That is backtracking.
With ++, backtracking is not allowed. This works with the regex module;
it was not implemented in the re module at the time of writing
(re gained possessive quantifiers in Python 3.11).
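
To make the "give back one character at a time" idea concrete, here is a minimal sketch using only the standard re module. The pattern and input string are made up for illustration and are not from the thread.

```python
import re

# Greedy (\d+) first grabs all four digits "1234".
# The trailing (\d) then has nothing left to match, so the engine
# backtracks: (\d+) gives back one digit and the match is retried.
m = re.match(r"(\d+)(\d)", "1234")
print(m.groups())  # ('123', '4')
```

The first group ends up one digit shorter than its greedy first attempt, which is exactly the release-and-retry behaviour described above.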

With string = "ch 23." and pattern = r"ch \d+(?!\.)":

On the first attempt, \d+ matches 23,
but the overall match fails because the next character is . and the lookahead (?!\.) forbids a . at that position.

So a backtrack happens:

\d+ matches only 2,
and the overall match succeeds because the next character, 3, is not a .
But this is not what we want.

With ++ there is no backtracking, so there is no match:
"ch 23." is rejected.
This is what we wanted.
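
The unwanted partial match can be reproduced with the standard re module alone. This is a small sketch reusing the text string from the original post; the variable names single and greedy are only illustrative.

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# Greedy \d+ backtracks from "23" to "2"; the lookahead then sees "3",
# which is not ".", so the shortened match is accepted.
single = re.search(r"ch \d+(?!\.)", "ch 23.")
print(single.group(0))  # ch 2

# On the full text the same backtracking yields a spurious 'ch 2'.
greedy = re.findall(r"ch \d+(?!\.)", text)
print(greedy)  # ['ch 2', 'ch 4', 'ch 56']
```

Note that "ch 1." is still rejected here: \d+ needs at least one digit, so there is nothing it can give back.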


Using re only, the best way is probably

re.findall(r"ch \d+(?![.0-9])", text)
['ch 4', 'ch 56']
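
As a cross-check, here is a sketch that runs the suggestion above next to a second re-only variant. The second pattern is not from this thread: it is a commonly used emulation of an atomic group, relying on the assumption that the engine does not backtrack into a lookahead once it has succeeded, so (?=(\d+))\1 behaves like a possessive \d++.

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# Variant from the post: forbid both "." and a further digit after \d+,
# so giving back a digit can never rescue the match.
with_lookahead = re.findall(r"ch \d+(?![.0-9])", text)
print(with_lookahead)  # ['ch 4', 'ch 56']

# Emulated possessive quantifier (an assumption, see above): the lookahead
# captures the full digit run and \1 consumes exactly that run.
# finditer is used because findall would return the captured group only.
emulated = [m.group(0)
            for m in re.finditer(r"ch (?=(\d+))\1(?!\.)", text)]
print(emulated)  # ['ch 4', 'ch 56']
```

Both variants give the same result as the regex version with \d++ on this input.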
--
https://mail.python.org/mailman/listinfo/python-list
