RE: How to escape strings for re.finditer?

avi.e.gross Tue, 28 Feb 2023 12:28:11 -0800

Jen,


I had no doubt the code you ran was indented properly or it would not work.

 

I am merely letting you know that somewhere in the process of copying the code, 
or in the transition between mailers, the copy I received got messed up. It 
happens to be easy for me to fix, but I sometimes see garbled code that I then 
simply ignore.

 

At times it may help to leave blank lines between lines of code. Python ignores 
them, and they also keep any line rearrangement by a mailer to a minimum.

 

On to your real question.

 

In my OPINION, there are many interesting questions that can get in the way of 
just getting a working solution. Some approaches may be better in some abstract 
way, but except for big projects it often hardly matters.

 

So regex is one thing, or more a cluster of things, and a list comp is something 
completely different. They are both tools you can use and abuse, or lose.

 

The problem I believe we started with was how to find a fixed string inside 
another fixed string, in as many places as needed, and perhaps return offset 
info. This can be solved in too many ways using only the side of Python focused 
on plain text. As discussed, solutions can include explicit loops such as “for” 
and “while”, and their syntactic-sugar cousin, the list comp. Not mentioned yet 
are other techniques, like a recursive function that finds the first occurrence 
and passes the rest of the string to itself to find the others, or various 
functional programming techniques that hide the loop. YOU DO NOT NEED ALL OF 
THEM, but it can be interesting to learn.
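
For comparison, here is a minimal sketch of two of the non-regex approaches mentioned above, a list comp and a recursive function, using only plain string methods. The function names are hypothetical, not from anyone in the thread. One behavioral difference worth knowing: the list comp tests every starting position, so it also reports overlapping occurrences, while the recursive version (like re.finditer) does not.

```python
def find_all_listcomp(text, sub):
    # Test every possible starting position; this also reports overlapping matches.
    return [(i, i + len(sub)) for i in range(len(text) - len(sub) + 1)
            if text.startswith(sub, i)]


def find_all_recursive(text, sub, offset=0):
    # Find the first occurrence, then hand the rest of the string to ourselves.
    i = text.find(sub)
    if i < 0 or not sub:
        return []
    start = offset + i
    end = start + len(sub)
    return [(start, end)] + find_all_recursive(text[i + len(sub):], sub, end)


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
print(find_all_listcomp(example, 'abc_degree + 1'))   # [(4, 18), (26, 40)]
print(find_all_recursive(example, 'abc_degree + 1'))  # [(4, 18), (26, 40)]
```

The recursive version is mostly of curiosity value: each occurrence adds a level of recursion and copies the remainder of the string, so text with very many matches can hit Python's default recursion limit.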

 

Regex is a completely different universe that offers a lot MORE. If I ask you 
for a ride to the grocery store, I might expect you to show up with a car and 
not a James Bond vehicle that is also a boat, submarine, airplane, and maybe 
spaceship. Well, regex is the latter. And in your case, it is this complexity 
that meant you had to escape your text, so that characters it considers 
commands or hints (such as "+") are taken literally.
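
To make the escaping concrete, a small sketch using the example string from this thread: unescaped, "+" means "one or more of the preceding character", so the pattern abc_degree + 1 looks for abc_degree, one or more spaces, then a space and 1, and it never matches a literal plus sign.

```python
import re

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
literal = 'abc_degree + 1'

# Unescaped: '+' quantifies the preceding space instead of matching a plus sign.
raw_hits = [m.span() for m in re.finditer(literal, example)]

# Escaped: re.escape backslashes the special characters, so '+' is literal.
escaped_hits = [m.span() for m in re.finditer(re.escape(literal), example)]

print(raw_hits)      # []
print(escaped_hits)  # [(4, 18), (26, 40)]

# The unescaped pattern does match two or more spaces before the '1'.
print(re.findall(literal, 'abc_degree  1'))  # ['abc_degree  1']
```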

 

In normal use, put a bit too simply, it wants a carefully crafted pattern 
spelled out, and it compiles that into an often complex matching algorithm that 
represents its understanding of what you asked for. The simplest pattern is to 
match EXACTLY THIS. That is your case.
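
Since the pattern gets compiled anyway, one option is to compile the escaped literal once with re.compile and reuse it across many strings. This is a sketch; the helper name literal_spans and the sample lines are hypothetical. The module-level re functions also cache recently used patterns, so for a handful of calls the difference is usually small.

```python
import re


def literal_spans(pattern, text):
    # pattern is a compiled regex; return (start, end) of each non-overlapping match.
    return [m.span() for m in pattern.finditer(text)]


# Compile once; re.escape turns the text into an "EXACTLY THIS" pattern.
needle = re.compile(re.escape('abc_degree + 1'))

lines = [
    'X - abc_degree + 1 + qq + abc_degree + 1',
    'no match here',
]
for line in lines:
    print(literal_spans(needle, line))
```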

 

A more complex pattern may say to match Boston OR Chicago, followed by any 
amount of whitespace, then between 3 and 5 digits, and then not followed by 
something specific. Oh, and by the way, save selected parts in parentheses to be 
accessed as \1 or \2, so I can ask you to do things like match a word followed 
by itself. It goes on and on.
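
A sketch of both patterns in Python's re syntax. The text leaves "not followed by something specific" open, so this assumes "not followed by another digit", written as the negative look-ahead (?!\d); the sample strings are made up.

```python
import re

# Boston OR Chicago, any amount of whitespace, 3 to 5 digits, no further digit.
city_code = re.compile(r'(?:Boston|Chicago)\s*\d{3,5}(?!\d)')
print(city_code.findall('Boston 123 Chicago  45678 Boston 123456'))
# ['Boston 123', 'Chicago  45678']

# A word followed by itself: (\w+) saves group 1, and \1 must repeat it.
repeated = re.compile(r'\b(\w+)\s+\1\b')
print(repeated.findall('this is is a test and and so so on'))
# ['is', 'and', 'so']
```

Note that findall returns the saved group when the pattern has one, which is why the second call yields the words rather than the doubled pairs.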

 

Be warned that regular expressions are now implemented all over the place, 
including well outside their usual UNIX roots, and there are somewhat different 
dialects. For your need, it does not matter.

 

The compiled monstrosity, though, can be fairly fast, and it would be a tad hard 
for you to write by yourself as a bunch of nested if statements weirdly matching 
various patterns with some look-ahead or look-behind.
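
For a taste of look-ahead and look-behind, a small sketch with a made-up sample string: both assertions check the neighboring text without including it in the match.

```python
import re

text = 'Pay $120 now, then 15% off, then $45 later.'

# Look-behind (?<=\$): digits preceded by a dollar sign; the '$' is not captured.
print(re.findall(r'(?<=\$)\d+', text))  # ['120', '45']

# Look-ahead (?=%): digits followed by a percent sign; the '%' is not captured.
print(re.findall(r'\d+(?=%)', text))    # ['15']
```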

 

What you are being told is that, despite this being way more than you asked 
for, it not only works but is fairly fast when doing the simple thing you asked 
for. That may be why the plain-string version you are looking for is hard to 
find.

 

I am not clear what exactly the rest of your project is about, but my guess is 
your first priority is completing it decently, not trying umpteen methods and 
comparing them. Not today. Of course, if the working version is slow and you 
profile it and find this part seems to be holding it back, it may be worth 
examining.
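
If it ever does come to measuring, a rough sketch with the standard timeit module could compare re.finditer against a plain str.find loop of the kind Cameron suggested in this thread. The function names and the repeated test string are assumptions for illustration; timings depend on machine, Python version, and data, so no numbers are claimed here.

```python
import re
import timeit


def spans_regex(text, sub):
    # Regex version: escape the literal so '+' and friends are not operators.
    return [m.span() for m in re.finditer(re.escape(sub), text)]


def spans_find(text, sub):
    # Plain-string version: call str.find again from the end of the last match.
    if not sub:
        # An empty substring would be "found" at the same spot forever.
        raise ValueError('substring must not be empty')
    spans, pos = [], 0
    while True:
        found = text.find(sub, pos)
        if found < 0:
            return spans
        spans.append((found, found + len(sub)))
        pos = found + len(sub)


example = 'X - abc_degree + 1 + qq + abc_degree + 1' * 1000
sub = 'abc_degree + 1'

# Make sure both approaches agree before bothering to time them.
assert spans_regex(example, sub) == spans_find(example, sub)

for func in (spans_regex, spans_find):
    seconds = timeit.timeit(lambda: func(example, sub), number=100)
    print(f'{func.__name__}: {seconds:.4f} s for 100 runs')
```

Checking that both versions return the same spans first keeps the comparison honest; a faster wrong answer is not a win.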

 

 

From: Jen Kris <jenk...@tutanota.com> 
Sent: Tuesday, February 28, 2023 12:58 PM
To: avi.e.gr...@gmail.com
Cc: 'Python List' <python-list@python.org>
Subject: RE: How to escape strings for re.finditer?

 

The code I sent is correct, and it runs here.  Maybe you received it with a 
carriage return removed, but on my copy after posting, it is correct:

 

import re

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

 

One question: several people have made suggestions other than regex (not your 
terser example with regex you showed below). Is there a reason why regex is not 
preferred to, for example, a list comp? Performance? Reliability?

 

 

 

  

 

 

Feb 27, 2023, 18:16 by avi.e.gr...@gmail.com:

Jen,

 

Can you see what SOME OF US see as ASCII text? We can help you better if we get 
code that can be copied and run as-is.

 

What you sent is not terse. It is wrong. It will not run on any Python 
interpreter because you somehow lost a carriage return and an indent.

 

This is what you sent:

 

example = 'X - abc_degree + 1 + qq + abc_degree + 1'

find_string = re.escape('abc_degree + 1') for match in re.finditer(find_string, 
example):

print(match.start(), match.end())

 

This is the code indented properly:

import re

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

 

Of course I am sure you wrote and ran code more like the latter version but 
somewhere in your copy/paste process, ....

 

And, just for fun, since there is nothing wrong with your code, this minor 
change is terser:

>>> import re
>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1'), example):
...     print(match.start(), match.end())
...
4 18
26 40

 

But note that once you use regular expressions in earnest (not in your case), 
you might match things that are far from identical, such as two repeated words 
of any kind and in any case, including "and and" and "so so", or words that 
have multiple doubled letters, as in the stereotypical bookkeeper. In those 
cases, you may want more than offsets: the exact text that matched, or even 
some characters before and/or after it for context.
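
A rough sketch of both ideas on a made-up sentence: it prints the offsets of each repeated word, the exact text matched, and a few characters of context on either side, then lists the words containing at least two doubled letters. The five-character context window is an arbitrary choice.

```python
import re

text = 'It was so so hot, And and the bookkeeper sat down.'

# A whole word, whitespace, then the same word again; IGNORECASE lets "And and" match.
doubled = re.compile(r'\b(\w+)\s+\1\b', re.IGNORECASE)

for m in doubled.finditer(text):
    start, end = m.span()
    context = text[max(0, start - 5):end + 5]  # up to 5 characters on each side
    print(start, end, repr(m.group(0)), repr(context))

# Words with at least two doubled letters, like "bookkeeper" (oo, kk, ee).
multi_doubled = [w for w in re.findall(r'\w+', text)
                 if len(re.findall(r'(\w)\1', w)) >= 2]
print(multi_doubled)  # ['bookkeeper']
```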

 

 

-----Original Message-----

From: Python-list <python-list-bounces+avi.e.gross=gmail....@python.org> On 
Behalf Of Jen Kris via Python-list

Sent: Monday, February 27, 2023 8:36 PM

To: Cameron Simpson <c...@cskk.id.au>

Cc: Python List <python-list@python.org>

Subject: Re: How to escape strings for re.finditer?

 

 

I haven't tested it either but it looks like it would work. But for this case I 
prefer the relative simplicity of:

 

import re

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

 

4 18

26 40

 

I don't insist on terseness for its own sake, but it's cleaner this way. 

 

Jen

 

 

Feb 27, 2023, 16:55 by c...@cskk.id.au:

On 28Feb2023 01:13, Jen Kris <jenk...@tutanota.com> wrote:

I went to the re module because the specified string may appear more than once 
in the string (in the code I'm writing).

 

Sure, but writing a `finditer` for plain `str` is pretty easy (untested):

 

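s = 'X - abc_degree + 1 + qq + abc_degree + 1'
substring = 'abc_degree + 1'
pos = 0
while True:
    found = s.find(substring, pos)
    if found < 0:
        break  # no more occurrences
    start = found
    end = found + len(substring)
    print(start, end)  # ... do whatever with start and end ...
    pos = end  # resume searching after this match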

 

Many people go straight to the `re` module whenever they're looking for 
strings. It is often cryptic, error-prone overkill. Just something to keep in 
mind.

 

Cheers,

Cameron Simpson <c...@cskk.id.au>

--

https://mail.python.org/mailman/listinfo/python-list

 
