On 2018-07-18 22:40, Larry Martell wrote:
On Tue, Jul 17, 2018 at 11:43 AM, Neil Cerutti <ne...@norwich.edu> wrote:
On 2018-07-16, Larry Martell <larry.mart...@gmail.com> wrote:
I had some code that did this:

meas_regex = r'_M\d+_'
meas_re = re.compile(meas_regex)

if meas_re.search(filename):
    stuff1()
else:
    stuff2()

I then had to change it to this:

if meas_re.search(filename):
    if 'MeasDisplay' in filename:
        stuff1a()
    else:
        stuff1()
else:
    if 'PatternFov' in filename:
        stuff2a()
    else:
        stuff2()

This code needs to process many tens of thousands of files, and it
runs often, so it needs to run very fast. Needless to say, my
change has made it take 2x as long. Can anyone see a way to
improve that?

Can you expand/improve the regex pattern so you don't have to rescan
the string to check for the presence of MeasDisplay and
PatternFov? In other words, since you're already using the giant,
Swiss Army sledgehammer of the re module, go ahead and use enough
features to cover your use case.

Yeah, that was my first thought, but I haven't been able to come up
with a regex that works.

There are 4 cases I need to detect:

case1 = 'spam_M123_eggs_MeasDisplay_sausage'
case2 = 'spam_M123_eggs_sausage_and_spam'
case3 = 'spam_spam_spam_PatternFov_eggs_sausage_and_spam'
case4 = 'spam_spam_spam_eggs_sausage_and_spam'

I thought this regex would work:

'(_M\d+_){0,1}.*?(MeasDisplay|PatternFOV){0,1}'

And then I could look at the match objects and see which of the 4
cases it was. But try as I might, I could not get it to work. Any
regex gurus want to tell me what I am doing wrong here?
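
A likely explanation for why that pattern never captures anything: at position 0 the optional `(_M\d+_)` group matches zero times, the lazy `.*?` is satisfied with zero characters, and the second optional group also matches zero times, so the whole match is empty and both groups are None. Separately, the pattern spells `PatternFOV` while the sample filenames contain `PatternFov`. A small check, reusing the four sample names above (the variable names `attempt`, `cases` and `results` are only for illustration):

```python
import re

# The pattern from the message above, written as a raw string.
attempt = re.compile(r'(_M\d+_){0,1}.*?(MeasDisplay|PatternFOV){0,1}')

cases = [
    'spam_M123_eggs_MeasDisplay_sausage',
    'spam_M123_eggs_sausage_and_spam',
    'spam_spam_spam_PatternFov_eggs_sausage_and_spam',
    'spam_spam_spam_eggs_sausage_and_spam',
]

# search() succeeds immediately at position 0 with an empty match,
# so neither optional group ever gets a chance to capture.
results = [attempt.search(name).groups() for name in cases]

for name, groups in zip(cases, results):
    print(name, groups)
```

Every name yields the same pair of empty groups, which is why the match objects cannot be used to tell the four cases apart.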

The trick to capturing both parts when both are optional is to put each one in a lookahead and make the lookahead itself optional:

r'(?=.*?(_M\d+_))?(?=.*?(MeasDisplay|PatternFov))?'
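
A minimal sketch of how that pattern might be used to tell the four cases apart with a single `match` call. The `classify` helper and the branch names it returns are hypothetical, introduced only for illustration; they mirror the stuff1/stuff1a/stuff2/stuff2a branches in the original code. Whether this is actually faster than the two `in` checks would have to be measured on real filenames.

```python
import re

# Pattern from the reply: two optional lookaheads, each holding a capture group.
combined = re.compile(r'(?=.*?(_M\d+_))?(?=.*?(MeasDisplay|PatternFov))?')


def classify(filename):
    """Hypothetical helper: name the branch the original if/else would take."""
    # match() anchors both lookaheads at position 0. The match itself is
    # always empty; only the two groups carry information.
    meas, keyword = combined.match(filename).groups()
    if meas is not None:
        return 'stuff1a' if keyword == 'MeasDisplay' else 'stuff1'
    return 'stuff2a' if keyword == 'PatternFov' else 'stuff2'


for name in (
    'spam_M123_eggs_MeasDisplay_sausage',
    'spam_M123_eggs_sausage_and_spam',
    'spam_spam_spam_PatternFov_eggs_sausage_and_spam',
    'spam_spam_spam_eggs_sausage_and_spam',
):
    print(name, classify(name))
```

One caveat on this sketch: the second group holds whichever keyword appears first in the name, so a filename containing both `MeasDisplay` and `PatternFov` may be classified differently from the original nested `in` checks.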
