subject:"Regular Expressions"

Re: issue with regular expressions

2019-10-22 Thread joseph pareti

Ok, thanks. It works for me.
regards,

On Tue, 22 Oct 2019 at 11:29, Matt Wheeler wrote:

>
>
> On Tue, 22 Oct 2019, 09:44 joseph pareti wrote:
>
>> the following code ends in an exception:
>>
>> import re
>> pattern = 'Sottoscrizione unica soluzione'
>> mylines = []  # Declare an empty list.
>> with open('tmp.txt', 'rt') as myfile:  # Open tmp.txt for reading text.
>>     for myline in myfile:  # For each line in the file,
>>         mylines.append(myline.rstrip('\n'))  # strip newline and add to list.
>> for element in mylines:  # For each element in the list,
>>     # print(element)
>>     match = re.search(pattern, element)
>>     s = match.start()
>>     e = match.end()
>>     print(element[s:e])
>>
>> F:\October20-2019-RECOVERY\Unicredit_recovery\tmp_re_search>c:\Users\joepareti\Miniconda3\pkgs\python-3.7.1-h8c8aaf0_6\python.exe search_0.py
>> Traceback (most recent call last):
>>   File "search_0.py", line 10, in <module>
>>     s = match.start()
>> AttributeError: 'NoneType' object has no attribute 'start'
>>
>> any help? Thanks
>>
>
> Check over the docs for re.search again, you'll see it returns either a
> Match object (which is always truthy), or None.
>
> So a simple solution is to wrap your attempts to use the Match object in
>
> ```
> if match:
>     ...
> ```
>
>>

-- 
Regards,
Joseph Pareti - Artificial Intelligence consultant
Joseph Pareti's AI Consulting Services
https://www.joepareti54-ai.com/
cell +49 1520 1600 209
cell +39 339 797 0644
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: issue with regular expressions

2019-10-22 Thread Matt Wheeler

On Tue, 22 Oct 2019, 09:44 joseph pareti wrote:

> the following code ends in an exception:
>
> import re
> pattern = 'Sottoscrizione unica soluzione'
> mylines = []  # Declare an empty list.
> with open('tmp.txt', 'rt') as myfile:  # Open tmp.txt for reading text.
>     for myline in myfile:  # For each line in the file,
>         mylines.append(myline.rstrip('\n'))  # strip newline and add to list.
> for element in mylines:  # For each element in the list,
>     # print(element)
>     match = re.search(pattern, element)
>     s = match.start()
>     e = match.end()
>     print(element[s:e])
>
> F:\October20-2019-RECOVERY\Unicredit_recovery\tmp_re_search>c:\Users\joepareti\Miniconda3\pkgs\python-3.7.1-h8c8aaf0_6\python.exe search_0.py
> Traceback (most recent call last):
>   File "search_0.py", line 10, in <module>
>     s = match.start()
> AttributeError: 'NoneType' object has no attribute 'start'
>
> any help? Thanks
>

Check over the docs for re.search again, you'll see it returns either a
Match object (which is always truthy), or None.

So a simple solution is to wrap your attempts to use the Match object in

```
if match:
    ...
```
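
As a minimal sketch of that guard applied to the loop from the question (the helper name `matching_fragments` and the sample lines are illustrative assumptions, not part of the original script):

```python
import re

PATTERN = 'Sottoscrizione unica soluzione'

def matching_fragments(lines, pattern=PATTERN):
    """Return the matched text from every line that contains the pattern."""
    fragments = []
    for line in lines:
        match = re.search(pattern, line)
        if match:  # re.search returns None when the line does not match
            fragments.append(line[match.start():match.end()])
    return fragments

# Hypothetical sample data standing in for the contents of tmp.txt.
sample = ['header', 'xx Sottoscrizione unica soluzione yy', '']
print(matching_fragments(sample))
```

Lines without a match are simply skipped, so the AttributeError on `None` can no longer occur.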


issue with regular expressions

2019-10-22 Thread joseph pareti

the following code ends in an exception:

import re
pattern = 'Sottoscrizione unica soluzione'
mylines = []  # Declare an empty list.
with open('tmp.txt', 'rt') as myfile:  # Open tmp.txt for reading text.
    for myline in myfile:  # For each line in the file,
        mylines.append(myline.rstrip('\n'))  # strip newline and add to list.
for element in mylines:  # For each element in the list,
    # print(element)
    match = re.search(pattern, element)
    s = match.start()
    e = match.end()
    print(element[s:e])


F:\October20-2019-RECOVERY\Unicredit_recovery\tmp_re_search>c:\Users\joepareti\Miniconda3\pkgs\python-3.7.1-h8c8aaf0_6\python.exe search_0.py
Traceback (most recent call last):
  File "search_0.py", line 10, in <module>
    s = match.start()
AttributeError: 'NoneType' object has no attribute 'start'

any help? Thanks
-- 
Regards,
Joseph Pareti - Artificial Intelligence consultant
Joseph Pareti's AI Consulting Services
https://www.joepareti54-ai.com/
cell +49 1520 1600 209
cell +39 339 797 0644

Re: regular expressions help

2019-09-20 Thread Barry Scott

When I'm debugging a regex I make the regex shorter and shorter to figure out
what the problem is.

Try starting with re.compile(r'm') and then add the chars one by one seeing
what happens as the string gets longer.
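
That shrinking-and-growing approach can be sketched in code. This is only an illustration: the helper name `first_failing_prefix` is made up here, and prefixes that are not valid patterns on their own (such as one ending in a lone backslash) are skipped.

```python
import re

def first_failing_prefix(pattern, text):
    """Return the shortest prefix of `pattern` that compiles but does not
    match `text`, or None if every compilable prefix matches."""
    for end in range(1, len(pattern) + 1):
        prefix = pattern[:end]
        try:
            compiled = re.compile(prefix)
        except re.error:
            continue  # incomplete fragment, e.g. a trailing backslash
        if compiled.search(text) is None:
            return prefix
    return None

print(first_failing_prefix(r'^my\-dog$', "where is my-dog"))
print(first_failing_prefix(r'my\-dog$', "where is my-dog"))
```

The first call stops at `^m`, which points at the `^` anchor as the culprit; the second returns None because every prefix matches.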

Barry


> On 19 Sep 2019, at 09:41, Pradeep Patra wrote:
>
> I am using python 2.7.6 but I also tried on python 3.7.3.
>
> On Thursday, September 19, 2019, Pradeep Patra wrote:
>
>> Beginning of the string. But I tried removing that as well and it still
>> could not find it. When I tested at www.regex101.com and it matched
>> successfully whereas I may be wrong. Could you please help here?
>>
>> On Thursday, September 19, 2019, David wrote:
>>
>>> On Thu, 19 Sep 2019 at 17:51, Pradeep Patra wrote:
>>>>
>>>> pattern=re.compile(r'^my\-dog$')
>>>> matches = re.search(mystr)
>>>>
>>>> In the above example both cases(match/not match) the matches returns "None"
>>>
>>> Hi, do you know what the '^' character does in your pattern?


Re: regular expressions help

2019-09-19 Thread Chris Angelico

On Fri, Sep 20, 2019 at 1:07 AM Pradeep Patra  wrote:
>
> Thanks  David /Anthony for your help. I figured out the issue myself. I
> dont need any ^, $ etc to the regex pattern and the plain string (for exp
> my-dog) works fine. I am looking at creating a generic method so that
> instead of passing my-dog i can pass my-cat or blah blah. I am thinking of
> creating a list of probable combinations to search from the list. Anybody
> have better ideas?
>

If you just want to find a string in another string, don't use regular
expressions at all! Just ask Python directly:

>>> print("my-cat" in "This is where you can find my-cat, look, see!")
True
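
If the position is needed as well (the original question asked for the index), `str.find` gives it without a regex. A small sketch; the helper name `find_all` is made up for illustration:

```python
def find_all(needle, haystack):
    """Return the start index of every occurrence of needle in haystack."""
    positions = []
    start = haystack.find(needle)
    while start != -1:  # str.find returns -1 when nothing more is found
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions

print(find_all("my-dog", "where is my-dog"))
print(find_all("my-cat", "where is my-dog"))
```

The first call prints `[9]` and the second prints `[]`.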

ChrisA

Re: regular expressions help

2019-09-19 Thread Pradeep Patra

Thanks David/Anthony for your help. I figured out the issue myself. I
don't need any ^, $ etc. in the regex pattern, and the plain string (for
example, my-dog) works fine. I am looking at creating a generic method so that
instead of passing my-dog I can pass my-cat or anything else. I am thinking of
creating a list of probable combinations to search for. Anybody
have better ideas?
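
One possible shape for such a generic method, as a sketch only: the function name `find_any` is hypothetical, and it assumes a non-empty list of literal words. `re.escape` turns each word into a literal so characters like `-` need no manual escaping.

```python
import re

def find_any(words, text):
    """Return (word, start, end) for each occurrence of any word in text.

    Assumes `words` is a non-empty list of literal strings.
    """
    pattern = re.compile("|".join(re.escape(w) for w in words))
    return [(m.group(0), m.start(), m.end()) for m in pattern.finditer(text)]

print(find_any(["my-dog", "my-cat"], "where is my-dog and my-cat"))
```

This prints `[('my-dog', 9, 15), ('my-cat', 20, 26)]`, which gives both the matched word and its index in one pass.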

On Thu, Sep 19, 2019 at 3:46 PM David  wrote:

> On Thu, 19 Sep 2019 at 19:34, Pradeep Patra 
> wrote:
>
> > Thanks David for your quick help. Appreciate it. When I tried on python
> 2.7.3 the same thing you did below I got the error after matches.group(0)
> as follows:
> >
> > AttributeError: NoneType object has no attribute 'group'.
> >
> > I tried to check 'None' for no match for re.search as the documentation
> says but it's not working.
> >
> > Unfortunately I cannot update the python version now to 2.7.13 as other
> programs are using this version and need to test all and it requires more
> testing. Any idea how I can fix this ? I am ok to use any other re
> method(not only tied to re.search) as long as it works.
>
> Hi again Pradeep,
>
> We are now on email number seven, so I am
> going to try to give you some good advice ...
>
> When you ask on a forum like this for help, it is very
> important to show people exactly what you did.
> Everything that you did. In the shortest possible
> way that demonstrates whatever issue you are
> facing.
>
> It is best to give us a recipe that we can follow
> exactly that shows every step that you do when
> you have the problem that you need help with.
>
> And the best way to do that is for you to learn
> how to cut and paste between where you run
> your problem code, and where you send your
> email message to us.
>
> Please observe the way that I communicated with
> you last time. I sent you an exact cut and paste
> from my terminal, to help you by allowing you to
> duplicate exactly every step that I made.
>
> You should communicate with us in the same
> way. Because when you write something like
> your most recent message
>
> > I got the error after matches.group(0) as follows:
> > AttributeError: NoneType object has no attribute 'group'.
>
> this tells us nothing useful!! Because we cannot
> see everything you did leading up to that, so we
> cannot reproduce your problem.
>
> For us to help you, you need to show all the steps,
> the same way I did.
>
> Now, to help you, I found the same old version of
> Python 2 that you have, to prove to you that it works
> on your version.
>
> So you talking about updating Python is not going
> to help. Instead, you need to work out what you
> are doing that is causing your problem.
>
> Again, I cut and paste my whole session to show
> you, see below. Notice that the top lines show that
> it is the same version that you have.
>
> If you cut and paste my commands into
> your Python then it should work the same way
> for you too.
>
> If it does not work for you, then SHOW US THE
> WHOLE SESSION, EVERY STEP, so that we can
> reproduce your problem. Run your python in a terminal,
> and copy and paste the output you get into your message.
>
> $ python
> Python 2.7.3 (default, Jun 20 2016, 16:18:47)
> [GCC 4.7.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> mystr = "where is my-dog"
> >>> pattern = re.compile(r'my-dog$')
> >>> matches = re.search(pattern, mystr)
> >>> matches.group(0)
> 'my-dog'
> >>>
>
> I hope you realise that the re module has been used
> by thousands of programmers, for many years.
> So it's extremely unlikely that it "doesn't work" in a way that
> gets discovered by someone who hardly knows how to use it.

Re: regular expressions help

2019-09-19 Thread David

On Thu, 19 Sep 2019 at 19:34, Pradeep Patra  wrote:

> Thanks David for your quick help. Appreciate it. When I tried on python 2.7.3 
> the same thing you did below I got the error after matches.group(0) as 
> follows:
>
> AttributeError: NoneType object has no attribute 'group'.
>
> I tried to check 'None' for no match for re.search as the documentation says 
> but it's not working.
>
> Unfortunately I cannot update the python version now to 2.7.13 as other 
> programs are using this version and need to test all and it requires more 
> testing. Any idea how I can fix this ? I am ok to use any other re method(not 
> only tied to re.search) as long as it works.

Hi again Pradeep,

We are now on email number seven, so I am
going to try to give you some good advice ...

When you ask on a forum like this for help, it is very
important to show people exactly what you did.
Everything that you did. In the shortest possible
way that demonstrates whatever issue you are
facing.

It is best to give us a recipe that we can follow
exactly that shows every step that you do when
you have the problem that you need help with.

And the best way to do that is for you to learn
how to cut and paste between where you run
your problem code, and where you send your
email message to us.

Please observe the way that I communicated with
you last time. I sent you an exact cut and paste
from my terminal, to help you by allowing you to
duplicate exactly every step that I made.

You should communicate with us in the same
way. Because when you write something like
your most recent message

> I got the error after matches.group(0) as follows:
> AttributeError: NoneType object has no attribute 'group'.

this tells us nothing useful!! Because we cannot
see everything you did leading up to that, so we
cannot reproduce your problem.

For us to help you, you need to show all the steps,
the same way I did.

Now, to help you, I found the same old version of
Python 2 that you have, to prove to you that it works
on your version.

So you talking about updating Python is not going
to help. Instead, you need to work out what you
are doing that is causing your problem.

Again, I cut and paste my whole session to show
you, see below. Notice that the top lines show that
it is the same version that you have.

If you cut and paste my commands into
your Python then it should work the same way
for you too.

If it does not work for you, then SHOW US THE
WHOLE SESSION, EVERY STEP, so that we can
reproduce your problem. Run your python in a terminal,
and copy and paste the output you get into your message.

$ python
Python 2.7.3 (default, Jun 20 2016, 16:18:47)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> mystr = "where is my-dog"
>>> pattern = re.compile(r'my-dog$')
>>> matches = re.search(pattern, mystr)
>>> matches.group(0)
'my-dog'
>>>

I hope you realise that the re module has been used
by thousands of programmers, for many years.
So it's extremely unlikely that it "doesn't work" in a way that
gets discovered by someone who hardly knows how to use it.

Re: regular expressions help

2019-09-19 Thread Pradeep Patra

Thanks David for your quick help. Appreciate it. When I tried the same thing
you did below on Python 2.7.3, I got the following error after
matches.group(0):

AttributeError: NoneType object has no attribute 'group'.

I tried to check for 'None' (no match) from re.search, as the documentation
says, but it's not working.

Unfortunately I cannot update the Python version to 2.7.13 now, as other
programs are using this version and I would need to retest all of them.
Any idea how I can fix this? I am OK with using any other re
method (not only re.search) as long as it works.

On Thursday, September 19, 2019, David  wrote:

> On Thu, 19 Sep 2019 at 18:41, Pradeep Patra 
> wrote:
> > On Thursday, September 19, 2019, Pradeep Patra 
> wrote:
> >> On Thursday, September 19, 2019, David  wrote:
> >>> On Thu, 19 Sep 2019 at 17:51, Pradeep Patra 
> wrote:
>
> >>> > pattern=re.compile(r'^my\-dog$')
> >>> > matches = re.search(mystr)
>
> >>> > In the above example both cases(match/not match) the matches returns
> "None"
>
> >>> Hi, do you know what the '^' character does in your pattern?
>
> >> Beginning of the string. But I tried removing that as well and it still
> could not find it. When I tested at www.regex101.com and it matched
> successfully whereas I may be wrong. Could you please help here?
>
> > I am using python 2.7.6 but I also tried on python 3.7.3.
>
> $ python2
> Python 2.7.13 (default, Sep 26 2018, 18:42:22)
> [GCC 6.3.0 20170516] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> mystr= "where is my-dog"
> >>> pattern=re.compile(r'my-dog$')
> >>> matches = re.search(mystr)  # this is an error, but it is what you showed above
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: search() takes at least 2 arguments (1 given)
> >>> matches = re.search(pattern, mystr)
> >>> matches.group(0)
> 'my-dog'
> >>>

Re: regular expressions help

2019-09-19 Thread David

On Thu, 19 Sep 2019 at 18:41, Pradeep Patra  wrote:
> On Thursday, September 19, 2019, Pradeep Patra  
> wrote:
>> On Thursday, September 19, 2019, David  wrote:
>>> On Thu, 19 Sep 2019 at 17:51, Pradeep Patra  
>>> wrote:

>>> > pattern=re.compile(r'^my\-dog$')
>>> > matches = re.search(mystr)

>>> > In the above example both cases(match/not match) the matches returns 
>>> > "None"

>>> Hi, do you know what the '^' character does in your pattern?

>> Beginning of the string. But I tried removing that as well and it still 
>> could not find it. When I tested at www.regex101.com and it matched 
>> successfully whereas I may be wrong. Could you please help here?

> I am using python 2.7.6 but I also tried on python 3.7.3.

$ python2
Python 2.7.13 (default, Sep 26 2018, 18:42:22)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> mystr= "where is my-dog"
>>> pattern=re.compile(r'my-dog$')
>>> matches = re.search(mystr)  # this is an error, but it is what you showed above
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: search() takes at least 2 arguments (1 given)
>>> matches = re.search(pattern, mystr)
>>> matches.group(0)
'my-dog'
>>>
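
For reference, a short sketch of the two equivalent ways to call search, which is where the one-argument call in the question went wrong. The variable names follow the session above; the try/except is added here only to show the failure without stopping the script.

```python
import re

mystr = "where is my-dog"
pattern = re.compile(r"my-dog$")

# Module-level function: pattern first, then the string to scan.
m1 = re.search(pattern, mystr)
# Method on the compiled pattern: only the string is needed.
m2 = pattern.search(mystr)
print(m1.group(0), m1.start(), m2.span())

# Passing only the string to the module-level function raises TypeError.
try:
    re.search(mystr)
    raised = False
except TypeError:
    raised = True
print(raised)
```

This prints `my-dog 9 (9, 15)` followed by `True`.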

Re: regular expressions help

2019-09-19 Thread Pradeep Patra

I am using python 2.7.6 but I also tried on python 3.7.3.

On Thursday, September 19, 2019, Pradeep Patra 
wrote:

> Beginning of the string. But I tried removing that as well and it still
> could not find it. When I tested at www.regex101.com and it matched
> successfully whereas I may be wrong. Could you please help here?
>
> On Thursday, September 19, 2019, David  wrote:
>
>> On Thu, 19 Sep 2019 at 17:51, Pradeep Patra 
>> wrote:
>> >
>> > pattern=re.compile(r'^my\-dog$')
>> > matches = re.search(mystr)
>> >
>> > In the above example both cases(match/not match) the matches returns
>> "None"
>>
>> Hi, do you know what the '^' character does in your pattern?
>>
>

Re: regular expressions help

2019-09-19 Thread David

On Thu, 19 Sep 2019 at 17:51, Pradeep Patra  wrote:
>
> pattern=re.compile(r'^my\-dog$')
> matches = re.search(mystr)
>
> In the above example both cases(match/not match) the matches returns "None"

Hi, do you know what the '^' character does in your pattern?
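
A quick way to see the effect of the anchors is to try the pattern with and without them. This is a small sketch using the string from the question; the variable names are illustrative.

```python
import re

text = "where is my-dog"

anchored_both = re.search(r"^my-dog$", text)   # must span the whole string
anchored_end = re.search(r"my-dog$", text)     # only has to end the string
whole_string = re.search(r"^my-dog$", "my-dog")

print(anchored_both)          # None
print(anchored_end.span())    # (9, 15)
print(whole_string.group(0))  # my-dog
```

With `^` in place, the pattern can only match when "my-dog" starts at the very beginning of the string.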

regular expressions help

2019-09-19 Thread Pradeep Patra

Hi all,

I was playing around with regular expressions and testing a simple
regular expression, and it's not working for some reason.

I want to search for "my-dog" anywhere in a string and return the
index, but it's not working. I tried both Python 3.7.3 and 2.7.x. Can
anyone please help?
I tried re.search, re.finditer and re.findall, and none of them is working
for me.

import re

mystr= "where is my-dog"

pattern=re.compile(r'^my\-dog$')
matches = re.search(mystr)

print(matches)

In the above example, in both cases (match/not match) matches is "None".

I tried re.finditer() and then a loop to find all the occurrences of the
pattern in the string; there is no error, but I could not find
the match.

Can anyone help me in this regard?

Regards
Pradeep

Re: Regular Expressions, Speed, Python, and NFA

2017-04-17 Thread breamoreboy

On Friday, April 14, 2017 at 4:12:27 PM UTC+1, Malik Rumi wrote:
> I am running some tests using the site regex101 to figure out the correct 
> regexs to use for a project. I was surprised at how slow it was, constantly 
> needing to increase the timeouts. I went Googling for a reason, and solution, 
> and found Russ Cox’s article from 2007: 
> https://swtch.com/~rsc/regexp/regexp1.html . I couldn’t understand why, if 
> this was even remotely correct, we don’t use NFA in Python, which led me here:
> 
> https://groups.google.com/forum/#!msg/comp.lang.python/L1ZFI_R2hAo/C12Nf3patWIJ;context-place=forum/comp.lang.python
>  where all of these issues were addressed. Unfortunately, this is also from 
> 2007. 
> 
> BTW, John Machin in one of his replies cites Navarro’s paper, but that link 
> is broken. Navarro’s work can now be found at 
> http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.21.3112&rep=rep1&type=pdf
>  But be forewarned, it is 68 pages of dense reading. I am not a computer 
> science major. I am not new to Python, but I don’t think I’m qualified to 
> take on the idea of creating a new NFA module for Python.  
> 
> I am not a computer science major. I am not new to Python, but I don’t think 
> I’m qualified to take on the idea of creating a new NFA module for Python.  
> Nor am I entirely sure I want to try something new (to me) like TRE. 
> 
> Most threads related to this topic are older than 2007. I did find this 
> https://groups.google.com/forum/#!searchin/comp.lang.python/regex$20speed%7Csort:relevance/comp.lang.python/O7rUwVoD2t0/NYAQM0mUX7sJ
>  from 2011 but I did not do an exhaustive search. 
> 
> The bottom line is I wanted to know if anything has changed since 2007, and 
> if there is a) any hope for improving regex speeds in Python, or b) some 3rd 
> party module/library that is already out there and solves this problem? Or 
> should I just take this advice?
> 
> 
> Thanks.

Check out https://pypi.python.org/pypi/regex and for a little light background 
reading please see http://bugs.python.org/issue2636

Kindest regards.

Mark Lawrence.

Re: Regular Expressions, Speed, Python, and NFA

2017-04-14 Thread Rob Gaddi

On 04/14/2017 08:12 AM, Malik Rumi wrote:

> I am running some tests using the site regex101 to figure out the correct
> regexs to use for a project. I was surprised at how slow it was, constantly
> needing to increase the timeouts. I went Googling for a reason, and solution,
> and found Russ Cox’s article from 2007:
> https://swtch.com/~rsc/regexp/regexp1.html . I couldn’t understand why, if this
> was even remotely correct, we don’t use NFA in Python, which led me here:
>
> https://groups.google.com/forum/#!msg/comp.lang.python/L1ZFI_R2hAo/C12Nf3patWIJ;context-place=forum/comp.lang.python
> where all of these issues were addressed. Unfortunately, this is also from
> 2007.
>
> BTW, John Machin in one of his replies cites Navarro’s paper, but that link is broken.
> Navarro’s work can now be found at
> http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.21.3112&rep=rep1&type=pdf
> But be forewarned, it is 68 pages of dense reading. I am not a computer science major.
> I am not new to Python, but I don’t think I’m qualified to take on the idea of creating
> a new NFA module for Python.
>
>> Getting back to the "It would be nice ..." bit: yes, it would be nice
>> to have even more smarts in re, but who's going to do it? It's not a
>> "rainy Sunday afternoon" job :-)
>> Cheers,
>> John
>
>> Well, just as an idea, there is a portable C library for this at
>> http://laurikari.net/tre/ released under LGPL. If one is willing to
>> give up PCRE extensions for speed, it might be worth the work to
>> wrap this library using SWIG.
>> Kirk Sluder
>
> (BTW, this link is also old. TRE is now at https://github.com/laurikari/tre/ )
>
> I am not a computer science major. I am not new to Python, but I don’t think
> I’m qualified to take on the idea of creating a new NFA module for Python. Nor
> am I entirely sure I want to try something new (to me) like TRE.
>
> Most threads related to this topic are older than 2007. I did find this
> https://groups.google.com/forum/#!searchin/comp.lang.python/regex$20speed%7Csort:relevance/comp.lang.python/O7rUwVoD2t0/NYAQM0mUX7sJ
> from 2011 but I did not do an exhaustive search.
>
> The bottom line is I wanted to know if anything has changed since 2007, and if
> there is a) any hope for improving regex speeds in Python, or b) some 3rd party
> module/library that is already out there and solves this problem? Or should I
> just take this advice?
>
>> The cheap way in terms of programmer time is to pipe out to grep or
>> awk on this one.
>> Kirk Sluder
>
> Thanks.

I'll also throw in the obligatory quote from Jamie Zawinski, "Some
people, when confronted with a problem, think 'I know, I'll use regular
expressions.' Now they have two problems."

It's not that regexes are the wrong tool for any job; I personally use
them all the time. But they're, tautologically, the wrong tool for any
job that can be done better with a different one. In Python, you've got
"in", .startswith, .endswith, and .split to handle simple parsing tasks.
On the other end, you've got lxml and the like to handle complex tasks
that provably cannot be done with regexes at all, let alone efficiently.
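
To make the "simple parsing" end of that range concrete, here is a small sketch using only string methods. The log line and the variable names are invented for the example.

```python
# Hypothetical input line; any "LEVEL DATE MESSAGE" text would do.
line = "ERROR 2017-04-14 disk full on /dev/sda1"

is_error = line.startswith("ERROR")
ends_with_device = line.endswith("/dev/sda1")
mentions_disk = "disk" in line

# Split on the first two spaces only, so the message keeps its own spaces.
level, date, message = line.split(" ", 2)
print(is_error, ends_with_device, mentions_disk)
print(level, date, message, sep=" | ")
```

None of these checks needs a pattern language, and each reads as exactly what it tests.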

This leaves them in a fairly bounded middle ground: tasks complex enough
to warrant something as difficult to read as a regex, but still simple
enough to be solved by one. That ground is definitely occupied, but it is
not vast. So is it worth the time to try to write a more efficient regex
parser for Python? Your time if you want it to be, but not mine.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.

Re: Regular Expressions, Speed, Python, and NFA

2017-04-14 Thread Peter Otten

Malik Rumi wrote:

> I am running some tests using the site regex101 to figure out the correct
> regexs to use for a project. I was surprised at how slow it was,
> constantly needing to increase the timeouts. I went Googling for a reason,
> and solution, and found Russ Cox’s article from 2007:
> https://swtch.com/~rsc/regexp/regexp1.html . I couldn’t understand why, if
> this was even remotely correct, we don’t use NFA in Python, which led me
> here:

You might try

https://en.wikipedia.org/wiki/RE2_(software)

for which Python wrappers are available. However, 

"RE2 does not support back-references, which cannot be implemented 
efficiently."
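
For anyone unsure what a back-reference is, here is a minimal sketch with the standard `re` module (not RE2): `\1` has to match exactly the text that group 1 captured, which is the feature RE2 leaves out. The pattern and sample strings are illustrative.

```python
import re

# \1 must match exactly the text that group 1 captured earlier in the same match.
doubled_word = re.compile(r"\b(\w+) \1\b")

match = doubled_word.search("this is is a test")
print(match.group(0))
print(doubled_word.search("this is a test"))
```

The first search finds the doubled word `is is`; the second prints `None`. If a project's patterns never use `\1`-style references, that limitation may not matter.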


RE: Regular Expressions, Speed, Python, and NFA

2017-04-14 Thread Joseph L. Casale

On Friday, April 14, 2017 9:12 AM, Malik Rumi wrote:

> I am running some tests using the site regex101 to figure out the correct
> regexs to use for a project. I was surprised at how slow it was, constantly
> needing to increase the timeouts. I went Googling for a reason, and solution,
> and found Russ Cox’s article from 2007:
> https://swtch.com/~rsc/regexp/regexp1.html . I couldn’t understand why, if
> this was even remotely correct, we don’t use NFA in Python, which led me
> here:

Do you have any sample data you can share?

Re: Regular Expressions, Speed, Python, and NFA

2017-04-14 Thread Steve D'Aprano

On Sat, 15 Apr 2017 01:12 am, Malik Rumi wrote:

> I couldn’t understand why, if this was even remotely correct, 
> we don’t use NFA in Python
[...]

> I don’t think I’m qualified to take on the idea of creating 
> a new NFA module for Python. 

If not you, then who should do it?

Python is open source and written by volunteers. If you want something done,
there are only four possibilities:

- you do it yourself;

- you wait as long as it takes for somebody else to do it;

- you pay somebody to do it;

- or it doesn't get done.

If you're not willing or able to do the work yourself, does that help you
understand why we don't use NFA?

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.


Regular Expressions, Speed, Python, and NFA

2017-04-14 Thread Malik Rumi

I am running some tests using the site regex101 to figure out the correct 
regexs to use for a project. I was surprised at how slow it was, constantly 
needing to increase the timeouts. I went Googling for a reason, and solution, 
and found Russ Cox’s article from 2007: 
https://swtch.com/~rsc/regexp/regexp1.html . I couldn’t understand why, if this 
was even remotely correct, we don’t use NFA in Python, which led me here:

https://groups.google.com/forum/#!msg/comp.lang.python/L1ZFI_R2hAo/C12Nf3patWIJ;context-place=forum/comp.lang.python
 where all of these issues were addressed. Unfortunately, this is also from 
2007. 
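
For readers who have not seen the article, its pathological case can be sketched roughly as follows: the pattern `a?` repeated n times followed by `a` repeated n times, matched against a string of n `a` characters. The helper name `time_pathological` is made up, the values of n are kept small on purpose, and actual timings will vary by machine.

```python
import re
import time

def time_pathological(n):
    """Time matching (a? * n)(a * n) against 'a' * n with the stdlib re module."""
    pattern = re.compile("a?" * n + "a" * n)
    text = "a" * n
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start

for n in (5, 10, 15):
    matched, seconds = time_pathological(n)
    print(n, matched, f"{seconds:.6f}s")
```

With a backtracking matcher the time should grow quickly as n increases, so larger values are best tried cautiously.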

BTW, John Machin in one of his replies cites Navarro’s paper, but that link is 
broken. Navarro’s work can now be found at 
http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.21.3112&rep=rep1&type=pdf
 But be forewarned, it is 68 pages of dense reading. I am not a computer 
science major. I am not new to Python, but I don’t think I’m qualified to take 
on the idea of creating a new NFA module for Python.  

> Getting back to the "It would be nice ..." bit: yes, it would be nice
> to have even more smarts in re, but who's going to do it? It's not a
> "rainy Sunday afternoon" job :-)
> Cheers,
> John

> Well, just as an idea, there is a portable C library for this at
> http://laurikari.net/tre/ released under LGPL.  If one is willing to
> give up PCRE extensions for speed, it might be worth the work to
> wrap this library using SWIG.
> Kirk Sluder

(BTW, this link is also old. TRE is now at https://github.com/laurikari/tre/ )

I am not a computer science major. I am not new to Python, but I don’t think 
I’m qualified to take on the idea of creating a new NFA module for Python.  Nor 
am I entirely sure I want to try something new (to me) like TRE. 

Most threads related to this topic are older than 2007. I did find this 
https://groups.google.com/forum/#!searchin/comp.lang.python/regex$20speed%7Csort:relevance/comp.lang.python/O7rUwVoD2t0/NYAQM0mUX7sJ
 from 2011 but I did not do an exhaustive search. 

The bottom line is I wanted to know if anything has changed since 2007, and if 
there is a) any hope for improving regex speeds in Python, or b) some 3rd party 
module/library that is already out there and solves this problem? Or should I 
just take this advice?

> The cheap way in terms of programmer time is to pipe out to grep or
> awk on this one.
> Kirk Sluder

Thanks. 