Re: issue with regular expressions

2019-10-22 Thread joseph pareti
Ok, thanks. It works for me.
regards,

On Tue, 22 Oct 2019 at 11:29, Matt Wheeler wrote:

>
>
> On Tue, 22 Oct 2019, 09:44 joseph pareti,  wrote:
>
>> the following code ends in an exception:
>>
>> import re
>> pattern = 'Sottoscrizione unica soluzione'
>> mylines = []  # Declare an empty list.
>> with open('tmp.txt', 'rt') as myfile:  # Open tmp.txt for reading text.
>>     for myline in myfile:  # For each line in the file,
>>         mylines.append(myline.rstrip('\n'))  # strip newline and add to list.
>> for element in mylines:  # For each element in the list,
>>     #print(element)
>>     match = re.search(pattern, element)
>>     s = match.start()
>>     e = match.end()
>>     print(element[s:e])
>>
>>
>>
>> F:\October20-2019-RECOVERY\Unicredit_recovery\tmp_re_search>c:\Users\joepareti\Miniconda3\pkgs\python-3.7.1-h8c8aaf0_6\python.exe
>> search_0.py
>> Traceback (most recent call last):
>>   File "search_0.py", line 10, in <module>
>>     s = match.start()
>> AttributeError: 'NoneType' object has no attribute 'start'
>>
>> any help? Thanks
>>
>
> Check over the docs for re.search again; you'll see it returns either a
> Match object (which is always truthy) or None.
>
> So a simple solution is to wrap your attempts to use the Match object in
>
> ```
> if match:
>     ...
> ```
>
>>

-- 
Regards,
Joseph Pareti - Artificial Intelligence consultant
Joseph Pareti's AI Consulting Services
https://www.joepareti54-ai.com/
cell +49 1520 1600 209
cell +39 339 797 0644
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: issue with regular expressions

2019-10-22 Thread Matt Wheeler
On Tue, 22 Oct 2019, 09:44 joseph pareti,  wrote:

> the following code ends in an exception:
>
> import re
> pattern = 'Sottoscrizione unica soluzione'
> mylines = []  # Declare an empty list.
> with open('tmp.txt', 'rt') as myfile:  # Open tmp.txt for reading text.
>     for myline in myfile:  # For each line in the file,
>         mylines.append(myline.rstrip('\n'))  # strip newline and add to list.
> for element in mylines:  # For each element in the list,
>     #print(element)
>     match = re.search(pattern, element)
>     s = match.start()
>     e = match.end()
>     print(element[s:e])
>
>
>
> F:\October20-2019-RECOVERY\Unicredit_recovery\tmp_re_search>c:\Users\joepareti\Miniconda3\pkgs\python-3.7.1-h8c8aaf0_6\python.exe
> search_0.py
> Traceback (most recent call last):
>   File "search_0.py", line 10, in <module>
>     s = match.start()
> AttributeError: 'NoneType' object has no attribute 'start'
>
> any help? Thanks
>

Check over the docs for re.search again; you'll see it returns either a
Match object (which is always truthy) or None.

So a simple solution is to wrap your attempts to use the Match object in

```
if match:
    ...
```
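
A minimal sketch of that guard applied to the loop from the original post. The sample lines are made up for illustration (the real script reads them from tmp.txt), and collecting the hits in a list named `found` is an assumption, not part of the original code.

```python
import re

pattern = 'Sottoscrizione unica soluzione'
# Hypothetical sample data standing in for the contents of tmp.txt.
mylines = ['intestazione', 'xx Sottoscrizione unica soluzione yy', '']

found = []
for element in mylines:
    match = re.search(pattern, element)
    if match:  # re.search returns None when the line does not match
        found.append(element[match.start():match.end()])

print(found)
```

Lines that do not contain the pattern are simply skipped, so match.start() is never called on None.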



issue with regular expressions

2019-10-22 Thread joseph pareti
the following code ends in an exception:

import re
pattern = 'Sottoscrizione unica soluzione'
mylines = []  # Declare an empty list.
with open('tmp.txt', 'rt') as myfile:  # Open tmp.txt for reading text.
    for myline in myfile:  # For each line in the file,
        mylines.append(myline.rstrip('\n'))  # strip newline and add to list.
for element in mylines:  # For each element in the list,
    #print(element)
    match = re.search(pattern, element)
    s = match.start()
    e = match.end()
    print(element[s:e])


F:\October20-2019-RECOVERY\Unicredit_recovery\tmp_re_search>c:\Users\joepareti\Miniconda3\pkgs\python-3.7.1-h8c8aaf0_6\python.exe
search_0.py
Traceback (most recent call last):
  File "search_0.py", line 10, in <module>
    s = match.start()
AttributeError: 'NoneType' object has no attribute 'start'

any help? Thanks
-- 
Regards,
Joseph Pareti - Artificial Intelligence consultant
Joseph Pareti's AI Consulting Services
https://www.joepareti54-ai.com/
cell +49 1520 1600 209
cell +39 339 797 0644


Re: regular expressions help

2019-09-20 Thread Barry Scott
When I'm debugging a regex I make the regex shorter and shorter to figure out
what the problem is.

Try starting with re.compile(r'm') and then add the chars one by one seeing
what happens as the string gets longer.
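
A rough sketch of that shrinking approach, using the pattern and string quoted further down in this message. The loop and the `results` list are only an illustration; cutting a pattern at arbitrary positions can leave an invalid regex, so re.error is caught and recorded as None.

```python
import re

mystr = "where is my-dog"
full_pattern = r'^my\-dog$'  # the pattern from the quoted question

# Try ever longer prefixes of the pattern and record whether each one matches.
results = []
for end in range(1, len(full_pattern) + 1):
    prefix = full_pattern[:end]
    try:
        hit = re.search(prefix, mystr) is not None
    except re.error:
        hit = None  # this prefix is not a valid regex on its own
    results.append((prefix, hit))
    print(repr(prefix), hit)
```

The first prefix that stops matching points at the part of the pattern that causes the trouble.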

Barry


> On 19 Sep 2019, at 09:41, Pradeep Patra  wrote:
> 
> I am using python 2.7.6 but I also tried on python 3.7.3.
> 
> On Thursday, September 19, 2019, Pradeep Patra 
> wrote:
> 
>> Beginning of the string. But I tried removing that as well and it still
>> could not find it. When I tested at www.regex101.com and it matched
>> successfully whereas I may be wrong. Could you please help here?
>> 
>> On Thursday, September 19, 2019, David  wrote:
>> 
>>> On Thu, 19 Sep 2019 at 17:51, Pradeep Patra 
>>> wrote:
 
>>>> pattern=re.compile(r'^my\-dog$')
>>>> matches = re.search(mystr)
>>>>
>>>> In the above example both cases(match/not match) the matches returns
>>>> "None"
>>> 
>>> Hi, do you know what the '^' character does in your pattern?
>>> 
>> 


Re: regular expressions help

2019-09-19 Thread Chris Angelico
On Fri, Sep 20, 2019 at 1:07 AM Pradeep Patra  wrote:
>
> Thanks David/Anthony for your help. I figured out the issue myself. I
> don't need any ^, $ etc. in the regex pattern; the plain string (for example
> my-dog) works fine. I am looking at creating a generic method so that
> instead of passing my-dog I can pass my-cat or anything else. I am thinking of
> creating a list of probable combinations to search for in the string. Does
> anybody have better ideas?
>

If you just want to find a string in another string, don't use regular
expressions at all! Just ask Python directly:

>>> print("my-cat" in "This is where you can find my-cat, look, see!")
True
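
Since the original question also asked for the index, here is a small sketch using str.find, which returns the position of the first occurrence or -1 if the substring is absent. The variable names are just for illustration.

```python
text = "This is where you can find my-cat, look, see!"

present = "my-cat" in text        # membership test, as in the example above
cat_index = text.find("my-cat")   # index of the first occurrence
dog_index = text.find("my-dog")   # -1 because the substring is absent

print(present, cat_index, dog_index)
```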

ChrisA


Re: regular expressions help

2019-09-19 Thread Pradeep Patra
Thanks David/Anthony for your help. I figured out the issue myself. I
don't need any ^, $ etc. in the regex pattern; the plain string (for example
my-dog) works fine. I am looking at creating a generic method so that
instead of passing my-dog I can pass my-cat or anything else. I am thinking of
creating a list of probable combinations to search for in the string. Does
anybody have better ideas?
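
One possible shape for such a generic method, as a sketch only. The helper name `find_all` and the sample text are hypothetical; re.escape is used so that characters like '-' or '.' in the search word are matched literally.

```python
import re

def find_all(needle, haystack):
    """Return the start index of every occurrence of the literal needle."""
    # re.escape makes special characters in needle match literally.
    return [m.start() for m in re.finditer(re.escape(needle), haystack)]

text = "my-dog chased my-cat, then my-dog slept"
for word in ["my-dog", "my-cat", "my-fish"]:
    print(word, find_all(word, text))
```

An empty list means the word was not found, so no None check is needed by the caller.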

On Thu, Sep 19, 2019 at 3:46 PM David  wrote:

> On Thu, 19 Sep 2019 at 19:34, Pradeep Patra 
> wrote:
>
> > Thanks David for your quick help. Appreciate it. When I tried on python
> 2.7.3 the same thing you did below I got the error after matches.group(0)
> as follows:
> >
> > AttributeError: NoneType object has no attribute 'group'.
> >
> > I tried to check 'None' for no match for re.search as the documentation
> says but it's not working.
> >
> > Unfortunately I cannot update the python version now to 2.7.13 as other
> programs are using this version and need to test all and it requires more
> testing. Any idea how I can fix this ? I am ok to use any other re
> method(not only tied to re.search) as long as it works.
>
> Hi again Pradeep,
>
> We are now on email number seven, so I am
> going to try to give you some good advice ...
>
> When you ask on a forum like this for help, it is very
> important to show people exactly what you did.
> Everything that you did. In the shortest possible
> way that demonstrates whatever issue you are
> facing.
>
> It is best to give us a recipe that we can follow
> exactly that shows every step that you do when
> you have the problem that you need help with.
>
> And the best way to do that is for you to learn
> how to cut and paste between where you run
> your problem code, and where you send your
> email message to us.
>
> Please observe the way that I communicated with
> you last time. I sent you an exact cut and paste
> from my terminal, to help you by allowing you to
> duplicate exactly every step that I made.
>
> You should communicate with us in the same
> way. Because when you write something like
> your most recent message
>
> > I got the error after matches.group(0) as follows:
> > AttributeError: NoneType object has no attribute 'group'.
>
> this tells us nothing useful!! Because we cannot
> see everything you did leading up to that, so we
> cannot reproduce your problem.
>
> For us to help you, you need to show all the steps,
> the same way I did.
>
> Now, to help you, I found the same old version of
> Python 2 that you have, to prove to you that it works
> on your version.
>
> So you talking about updating Python is not going
> to help. Instead, you need to work out what you
> are doing that is causing your problem.
>
> Again, I cut and paste my whole session to show
> you, see below. Notice that the top lines show that
> it is the same version that you have.
>
> If you cut and paste my commands into
> your Python then it should work the same way
> for you too.
>
> If it does not work for you, then SHOW US THE
> WHOLE SESSION, EVERY STEP, so that we can
> reproduce your problem. Run your python in a terminal,
> and copy and paste the output you get into your message.
>
> $ python
> Python 2.7.3 (default, Jun 20 2016, 16:18:47)
> [GCC 4.7.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> mystr = "where is my-dog"
> >>> pattern = re.compile(r'my-dog$')
> >>> matches = re.search(pattern, mystr)
> >>> matches.group(0)
> 'my-dog'
> >>>
>
> I hope you realise that the re module has been used
> by thousands of programmers, for many years.
> So it's extremely unlikely that it "doesn't work" in a way that
> gets discovered by someone who hardly knows how to use it.


Re: regular expressions help

2019-09-19 Thread David
On Thu, 19 Sep 2019 at 19:34, Pradeep Patra  wrote:

> Thanks David for your quick help. Appreciate it. When I tried on python 2.7.3 
> the same thing you did below I got the error after matches.group(0) as 
> follows:
>
> AttributeError: NoneType object has no attribute 'group'.
>
> I tried to check 'None' for no match for re.search as the documentation says 
> but it's not working.
>
> Unfortunately I cannot update the python version now to 2.7.13 as other 
> programs are using this version and need to test all and it requires more 
> testing. Any idea how I can fix this ? I am ok to use any other re method(not 
> only tied to re.search) as long as it works.

Hi again Pradeep,

We are now on email number seven, so I am
going to try to give you some good advice ...

When you ask on a forum like this for help, it is very
important to show people exactly what you did.
Everything that you did. In the shortest possible
way that demonstrates whatever issue you are
facing.

It is best to give us a recipe that we can follow
exactly that shows every step that you do when
you have the problem that you need help with.

And the best way to do that is for you to learn
how to cut and paste between where you run
your problem code, and where you send your
email message to us.

Please observe the way that I communicated with
you last time. I sent you an exact cut and paste
from my terminal, to help you by allowing you to
duplicate exactly every step that I made.

You should communicate with us in the same
way. Because when you write something like
your most recent message

> I got the error after matches.group(0) as follows:
> AttributeError: NoneType object has no attribute 'group'.

this tells us nothing useful!! Because we cannot
see everything you did leading up to that, so we
cannot reproduce your problem.

For us to help you, you need to show all the steps,
the same way I did.

Now, to help you, I found the same old version of
Python 2 that you have, to prove to you that it works
on your version.

So you talking about updating Python is not going
to help. Instead, you need to work out what you
are doing that is causing your problem.

Again, I cut and paste my whole session to show
you, see below. Notice that the top lines show that
it is the same version that you have.

If you cut and paste my commands into
your Python then it should work the same way
for you too.

If it does not work for you, then SHOW US THE
WHOLE SESSION, EVERY STEP, so that we can
reproduce your problem. Run your python in a terminal,
and copy and paste the output you get into your message.

$ python
Python 2.7.3 (default, Jun 20 2016, 16:18:47)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> mystr = "where is my-dog"
>>> pattern = re.compile(r'my-dog$')
>>> matches = re.search(pattern, mystr)
>>> matches.group(0)
'my-dog'
>>>

I hope you realise that the re module has been used
by thousands of programmers, for many years.
So it's extremely unlikely that it "doesn't work" in a way that
gets discovered by someone who hardly knows how to use it.


Re: regular expressions help

2019-09-19 Thread Pradeep Patra
Thanks David for your quick help, I appreciate it. When I tried the same
thing you did below on Python 2.7.3, I got the following error after
matches.group(0):

AttributeError: NoneType object has no attribute 'group'.

I tried to check for 'None' (no match) from re.search as the documentation
says, but it's not working.

Unfortunately I cannot update the Python version to 2.7.13 now, as other
programs are using this version and everything would need retesting.
Any idea how I can fix this? I am OK with using any other re
method (not only re.search) as long as it works.

On Thursday, September 19, 2019, David  wrote:

> On Thu, 19 Sep 2019 at 18:41, Pradeep Patra 
> wrote:
> > On Thursday, September 19, 2019, Pradeep Patra 
> wrote:
> >> On Thursday, September 19, 2019, David  wrote:
> >>> On Thu, 19 Sep 2019 at 17:51, Pradeep Patra 
> wrote:
>
> >>> > pattern=re.compile(r'^my\-dog$')
> >>> > matches = re.search(mystr)
>
> >>> > In the above example both cases(match/not match) the matches returns
> "None"
>
> >>> Hi, do you know what the '^' character does in your pattern?
>
> >> Beginning of the string. But I tried removing that as well and it still
> could not find it. When I tested at www.regex101.com and it matched
> successfully whereas I may be wrong. Could you please help here?
>
> > I am using python 2.7.6 but I also tried on python 3.7.3.
>
> $ python2
> Python 2.7.13 (default, Sep 26 2018, 18:42:22)
> [GCC 6.3.0 20170516] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> mystr= "where is my-dog"
> >>> pattern=re.compile(r'my-dog$')
> >>> matches = re.search(mystr)  # this raises an error (the pattern argument is missing), but it is what you showed above
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: search() takes at least 2 arguments (1 given)
> >>> matches = re.search(pattern, mystr)
> >>> matches.group(0)
> 'my-dog'
> >>>


Re: regular expressions help

2019-09-19 Thread David
On Thu, 19 Sep 2019 at 18:41, Pradeep Patra  wrote:
> On Thursday, September 19, 2019, Pradeep Patra  
> wrote:
>> On Thursday, September 19, 2019, David  wrote:
>>> On Thu, 19 Sep 2019 at 17:51, Pradeep Patra  
>>> wrote:

>>> > pattern=re.compile(r'^my\-dog$')
>>> > matches = re.search(mystr)

>>> > In the above example both cases(match/not match) the matches returns 
>>> > "None"

>>> Hi, do you know what the '^' character does in your pattern?

>> Beginning of the string. But I tried removing that as well and it still 
>> could not find it. When I tested at www.regex101.com and it matched 
>> successfully whereas I may be wrong. Could you please help here?

> I am using python 2.7.6 but I also tried on python 3.7.3.

$ python2
Python 2.7.13 (default, Sep 26 2018, 18:42:22)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> mystr= "where is my-dog"
>>> pattern=re.compile(r'my-dog$')
>>> matches = re.search(mystr)  # this raises an error (the pattern argument is missing), but it is what you showed above
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: search() takes at least 2 arguments (1 given)
>>> matches = re.search(pattern, mystr)
>>> matches.group(0)
'my-dog'
>>>
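
For reference, a short sketch of the two equivalent call forms in a script rather than at the prompt. The names `m1` and `m2` are only for illustration; the point is that the module-level function needs the pattern and the string, while the method on a compiled pattern needs only the string.

```python
import re

mystr = "where is my-dog"
pattern = re.compile(r'my-dog$')

m1 = re.search(pattern, mystr)  # module-level function: pattern first, then string
m2 = pattern.search(mystr)      # method on the compiled pattern: string only

print(m1.group(0), m1.span())
print(m2.group(0), m2.span())
```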


Re: regular expressions help

2019-09-19 Thread Pradeep Patra
I am using python 2.7.6 but I also tried on python 3.7.3.

On Thursday, September 19, 2019, Pradeep Patra 
wrote:

> Beginning of the string. But I tried removing that as well and it still
> could not find it. When I tested at www.regex101.com and it matched
> successfully whereas I may be wrong. Could you please help here?
>
> On Thursday, September 19, 2019, David  wrote:
>
>> On Thu, 19 Sep 2019 at 17:51, Pradeep Patra 
>> wrote:
>> >
>> > pattern=re.compile(r'^my\-dog$')
>> > matches = re.search(mystr)
>> >
>> > In the above example both cases(match/not match) the matches returns
>> "None"
>>
>> Hi, do you know what the '^' character does in your pattern?
>>
>


Re: regular expressions help

2019-09-19 Thread David
On Thu, 19 Sep 2019 at 17:51, Pradeep Patra  wrote:
>
> pattern=re.compile(r'^my\-dog$')
> matches = re.search(mystr)
>
> In the above example both cases(match/not match) the matches returns "None"

Hi, do you know what the '^' character does in your pattern?


regular expressions help

2019-09-19 Thread Pradeep Patra
Hi all,

I was playing around with regular expressions and testing a simple
regular expression, and it's not working for some reason.

I want to search for "my-dog" anywhere in a string and return the
index, but it's not working. I tried both Python 3.7.3 and 2.7.x. Can
anyone please help?
I tried re.search, re.finditer and re.findall, and none of them is working
for me.
import re

mystr= "where is my-dog"

pattern=re.compile(r'^my\-dog$')
matches = re.search(mystr)

print(matches)

In the above example, in both cases (match/no match), matches is "None".

I tried re.finditer() and then a loop to find all the occurrences of the
pattern in the string; there is no error, but I could not find
the match.
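
A sketch of what the anchors do with this string, for readers following the thread. With '^' and '$' the pattern only matches a string that is exactly "my-dog"; without them it can match anywhere. The names `anchored`, `unanchored` and `starts` are illustrative.

```python
import re

mystr = "where is my-dog"

anchored = re.compile(r'^my\-dog$')   # whole string must be exactly "my-dog"
unanchored = re.compile(r'my\-dog')   # may occur anywhere in the string

print(anchored.search(mystr))         # no match for this string
print(anchored.search("my-dog"))      # matches the exact string
starts = [m.start() for m in unanchored.finditer(mystr)]
print(starts)
```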

Can anyone help me in this regard?

Regards
Pradeep


Re: Regular Expressions, Speed, Python, and NFA

2017-04-17 Thread breamoreboy
On Friday, April 14, 2017 at 4:12:27 PM UTC+1, Malik Rumi wrote:
> I am running some tests using the site regex101 to figure out the correct 
> regexs to use for a project. I was surprised at how slow it was, constantly 
> needing to increase the timeouts. I went Googling for a reason, and solution, 
> and found Russ Cox’s article from 2007: 
> https://swtch.com/~rsc/regexp/regexp1.html . I couldn’t understand why, if 
> this was even remotely correct, we don’t use NFA in Python, which led me here:
> 
> https://groups.google.com/forum/#!msg/comp.lang.python/L1ZFI_R2hAo/C12Nf3patWIJ;context-place=forum/comp.lang.python
>  where all of these issues were addressed. Unfortunately, this is also from 
> 2007. 
> 
> BTW, John Machin in one of his replies cites Navarro’s paper, but that link 
> is broken. Navarro’s work can now be found at 
> http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.21.3112&rep=rep1&type=pdf
>  But be forewarned, it is 68 pages of dense reading. I am not a computer 
> science major. I am not new to Python, but I don’t think I’m qualified to 
> take on the idea of creating a new NFA module for Python.  
> 
> I am not a computer science major. I am not new to Python, but I don’t think 
> I’m qualified to take on the idea of creating a new NFA module for Python.  
> Nor am I entirely sure I want to try something new (to me) like TRE. 
> 
> Most threads related to this topic are older than 2007. I did find this 
> https://groups.google.com/forum/#!searchin/comp.lang.python/regex$20speed%7Csort:relevance/comp.lang.python/O7rUwVoD2t0/NYAQM0mUX7sJ
>  from 2011 but I did not do an exhaustive search. 
> 
> The bottom line is I wanted to know if anything has changed since 2007, and 
> if there is a) any hope for improving regex speeds in Python, or b) some 3rd 
> party module/library that is already out there and solves this problem? Or 
> should I just take this advice?
> 
> 
> Thanks.

Check out https://pypi.python.org/pypi/regex and for a little light background 
reading please see http://bugs.python.org/issue2636

Kindest regards.

Mark Lawrence.


Re: Regular Expressions, Speed, Python, and NFA

2017-04-14 Thread Rob Gaddi

On 04/14/2017 08:12 AM, Malik Rumi wrote:

> I am running some tests using the site regex101 to figure out the correct
> regexs to use for a project. I was surprised at how slow it was, constantly
> needing to increase the timeouts. I went Googling for a reason, and solution,
> and found Russ Cox’s article from 2007:
> https://swtch.com/~rsc/regexp/regexp1.html . I couldn’t understand why, if this
> was even remotely correct, we don’t use NFA in Python, which led me here:
>
> https://groups.google.com/forum/#!msg/comp.lang.python/L1ZFI_R2hAo/C12Nf3patWIJ;context-place=forum/comp.lang.python
> where all of these issues were addressed. Unfortunately, this is also from
> 2007.
>
> BTW, John Machin in one of his replies cites Navarro’s paper, but that link is broken.
> Navarro’s work can now be found at
> http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.21.3112&rep=rep1&type=pdf
> But be forewarned, it is 68 pages of dense reading. I am not a computer science major.
> I am not new to Python, but I don’t think I’m qualified to take on the idea of creating
> a new NFA module for Python.
>
>> Getting back to the "It would be nice ..." bit: yes, it would be nice
>> to have even more smarts in re, but who's going to do it? It's not a
>> "rainy Sunday afternoon" job :-)
>> Cheers,
>> John
>
> -
>
>> Well, just as an idea, there is a portable C library for this at
>> http://laurikari.net/tre/ released under LGPL.  If one is willing to
>> give up PCRE extensions for speed, it might be worth the work to
>> wrap this library using SWIG.
>> Kirk Sluder
>
> (BTW, this link is also old. TRE is now at https://github.com/laurikari/tre/ )
>
> I am not a computer science major. I am not new to Python, but I don’t think
> I’m qualified to take on the idea of creating a new NFA module for Python.  Nor
> am I entirely sure I want to try something new (to me) like TRE.
>
> Most threads related to this topic are older than 2007. I did find this
> https://groups.google.com/forum/#!searchin/comp.lang.python/regex$20speed%7Csort:relevance/comp.lang.python/O7rUwVoD2t0/NYAQM0mUX7sJ
> from 2011 but I did not do an exhaustive search.
>
> The bottom line is I wanted to know if anything has changed since 2007, and if
> there is a) any hope for improving regex speeds in Python, or b) some 3rd party
> module/library that is already out there and solves this problem? Or should I
> just take this advice?
>
>> The cheap way in terms of programmer time is to pipe out to grep or
>> awk on this one.
>> Kirk Sluder
>
> Thanks.



I'll also throw in the obligatory quote from Jamie Zawinski, "Some 
people, when confronted with a problem, think 'I know, I'll use regular 
expressions.'   Now they have two problems."


It's not that regexes are the wrong tool for any job; I personally use 
them all the time.  But they're, tautologically, the wrong tool for any 
job that can be done better with a different one.  In Python, you've got 
"in", .startswith, .endswith, and .split to handle simple parsing tasks. 
 On the other end, you've got lxml and the like to handle complex tasks 
that provably cannot be done with regexes at all, let alone efficiently.
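
As a small illustration of those simple tools, here is a sketch that pulls fields out of one line without any regex. The sample line is made up, and the variable names are only for this example.

```python
# Made-up sample line in the style of a mail header.
line = "From: alice@example.com  Sat Jan  5 09:14:16 2008"

has_at = "@" in line                  # substring test
is_from = line.startswith("From:")    # prefix test
ends_year = line.endswith("2008")     # suffix test
fields = line.split()                 # split on runs of whitespace

print(has_at, is_from, ends_year)
print(fields[1], fields[2])
```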


This leaves them in a fairly bounded middle ground: tasks complex enough 
to warrant something as difficult to read as a regex, but still simple 
enough to be solved by one.  That ground is definitely occupied, but it 
is not vast.  So is it worth the time to try to write a more efficient 
regex parser for Python?  Yours if you want it to be, but not mine.



--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.


Re: Regular Expressions, Speed, Python, and NFA

2017-04-14 Thread Peter Otten
Malik Rumi wrote:

> I am running some tests using the site regex101 to figure out the correct
> regexs to use for a project. I was surprised at how slow it was,
> constantly needing to increase the timeouts. I went Googling for a reason,
> and solution, and found Russ Cox’s article from 2007:
> https://swtch.com/~rsc/regexp/regexp1.html . I couldn’t understand why, if
> this was even remotely correct, we don’t use NFA in Python, which led me
> here:

You might try

https://en.wikipedia.org/wiki/RE2_(software)

for which Python wrappers are available. However, 

"RE2 does not support back-references, which cannot be implemented 
efficiently."
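
For readers unsure what a back-reference is, a short sketch using the standard library re module (not RE2): `\1` must match the same text that group 1 captured, which is what lets this pattern find a doubled word. The sample sentence is made up.

```python
import re

# \1 must match the same text that group 1 just captured.
doubled = re.compile(r'\b(\w+) \1\b')

m = doubled.search("this is is a test")
print(m.group(0) if m else None)
```

Patterns like this one are the kind that, per the quote above, RE2 does not accept.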



RE: Regular Expressions, Speed, Python, and NFA

2017-04-14 Thread Joseph L. Casale
-Original Message-
From: Python-list [mailto:python-list-
bounces+jcasale=activenetwerx@python.org] On Behalf Of Malik Rumi
Sent: Friday, April 14, 2017 9:12 AM
To: python-list@python.org
Subject: Regular Expressions, Speed, Python, and NFA

> I am running some tests using the site regex101 to figure out the correct
> regexs to use for a project. I was surprised at how slow it was, constantly
> needing to increase the timeouts. I went Googling for a reason, and solution,
> and found Russ Cox’s article from 2007:
> https://swtch.com/~rsc/regexp/regexp1.html . I couldn’t understand why, if
> this was even remotely correct, we don’t use NFA in Python, which led me
> here:

Do you have any sample data you can share?


Re: Regular Expressions, Speed, Python, and NFA

2017-04-14 Thread Steve D'Aprano
On Sat, 15 Apr 2017 01:12 am, Malik Rumi wrote:

> I couldn’t understand why, if this was even remotely correct, 
> we don’t use NFA in Python
[...]

> I don’t think I’m qualified to take on the idea of creating 
> a new NFA module for Python. 

If not you, then who should do it?

Python is open source and written by volunteers. If you want something done,
there are only four possibilities:

- you do it yourself;

- you wait as long as it takes for somebody else to do it;

- you pay somebody to do it;

- or it doesn't get done.


If you're not willing or able to do the work yourself, does that help you
understand why we don't use NFA?



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Regular Expressions, Speed, Python, and NFA

2017-04-14 Thread Malik Rumi
I am running some tests using the site regex101 to figure out the correct 
regexes to use for a project. I was surprised at how slow it was, constantly 
needing to increase the timeouts. I went Googling for a reason, and solution, 
and found Russ Cox’s article from 2007: 
https://swtch.com/~rsc/regexp/regexp1.html . I couldn’t understand why, if this 
was even remotely correct, we don’t use NFA in Python, which led me here:

https://groups.google.com/forum/#!msg/comp.lang.python/L1ZFI_R2hAo/C12Nf3patWIJ;context-place=forum/comp.lang.python
 where all of these issues were addressed. Unfortunately, this is also from 
2007. 
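
A rough way to see the effect the article describes with Python's own re module: the article's test case matches the pattern a?^n a^n (n optional a's followed by n required a's) against a string of n a's. The helper name `time_match` and the chosen values of n are assumptions for this sketch; the timings will vary by machine, and on a backtracking engine they should grow quickly as n increases.

```python
import re
import time

def time_match(n):
    """Time matching 'a?' * n + 'a' * n against 'a' * n with the stdlib re module."""
    pattern = re.compile("a?" * n + "a" * n)
    text = "a" * n
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start

for n in (5, 10, 15, 18):
    matched, seconds = time_match(n)
    print(n, matched, round(seconds, 4))
```

Keep n small when trying this; each extra step can roughly double the running time.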

BTW, John Machin in one of his replies cites Navarro’s paper, but that link is 
broken. Navarro’s work can now be found at 
http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.21.3112&rep=rep1&type=pdf
 But be forewarned, it is 68 pages of dense reading.

>Getting back to the "It would be nice ..." bit: yes, it would be nice
>to have even more smarts in re, but who's going to do it? It's not a
>"rainy Sunday afternoon" job :-)
>Cheers,
>John
-
>Well, just as an idea, there is a portable C library for this at 
>http://laurikari.net/tre/ released under LGPL.  If one is willing to 
>give up PCRE extensions for speed, it might be worth the work to 
>wrap this library using SWIG.
>Kirk Sluder

(BTW, this link is also old. TRE is now at https://github.com/laurikari/tre/ )

I am not a computer science major. I am not new to Python, but I don’t think 
I’m qualified to take on the idea of creating a new NFA module for Python.  Nor 
am I entirely sure I want to try something new (to me) like TRE. 

Most threads related to this topic are older than 2007. I did find this 
https://groups.google.com/forum/#!searchin/comp.lang.python/regex$20speed%7Csort:relevance/comp.lang.python/O7rUwVoD2t0/NYAQM0mUX7sJ
 from 2011 but I did not do an exhaustive search. 

The bottom line is I wanted to know if anything has changed since 2007, and if 
there is a) any hope for improving regex speeds in Python, or b) some 3rd party 
module/library that is already out there and solves this problem? Or should I 
just take this advice?

>The cheap way in terms of programmer time is to pipe out to grep or
>awk on this one.
>Kirk Sluder

Thanks. 


Re: Regular expressions

2015-11-06 Thread rurpy--- via Python-list
On Thursday, November 5, 2015 at 8:12:22 AM UTC-7, Seymore4Head wrote:
> On Thu, 05 Nov 2015 11:54:20 +1100, Steven D'Aprano  
> wrote:
> >On Thu, 5 Nov 2015 10:02 am, Seymore4Head wrote:
> >> So far the only use I have for regex is to replace slicing, but I
> >> think it is an improvement.
> >
> >I don't understand this. This is like saying "so far the only use I have for
> >a sandwich press is to replace my coffee pot". Regular expressions and
> >slicing do very different things.
> >[...]
> 
> Here is an example of the text we are slicing apart.
> 
>[...email headers...]
>
> The practice problems are something like pull out all the email
> addresses or pull out the days of the week and give the most common.

Yes, that is a perfectly appropriate use of regexes.

As Steven mentioned though, the term "slicing" is also used with a 
very specific and different meaning in Python, specifically referring
to a part of a list using a syntax like "alist[a:b]".  I can't seem
to get to python.org at the moment but if you look in the Python
docs index under "slicing" you'll find more info.
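
A small sketch of the difference, for illustration: slicing picks characters by position, while a regex picks text by shape wherever it occurs. The header line is made up, and the email pattern is a deliberately simple approximation, not a full address validator.

```python
import re

line = "From: bob@example.org Thu Nov  5 08:12:22 2015"  # made-up header line

# Slicing: pick characters by position.
first_five = line[0:5]

# Regex: pick text by shape, wherever it occurs in the line.
addresses = re.findall(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+', line)

print(first_five)
print(addresses)
```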
 


Re: Regular expressions

2015-11-06 Thread Larry Martell
On Fri, Nov 6, 2015 at 3:36 PM, Christian Gollwitzer  wrote:
> On 06.11.15 at 20:52, ru...@yahoo.com wrote:
>>
>> I have always thought lexing
>> and parsing solutions for Python were a weak spot in the Python eco-
>> system and I was about to write that I would love to see a PEG parser
>> for python when I saw this:
>>
>> http://fdik.org/pyPEG/
>>
>> Unfortunately it suffers from the same problem that Pyparsing, Ply
>> and the rest suffer from: they use Python syntax to express the
>> parsing rules rather than using a dedicated problem-specific syntax
>> such as you used to illustrate peg parsing:
>>
>>> pattern <- phone_number name phone_number
>
>>> phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
>>> name <-  [[:alpha:]]+
>
> That is actually real syntax of a parser generator used by me for another
> language (Tcl). A calculator example using this package can be found here:
> http://wiki.tcl.tk/39011
> (actually it is a retargetable compiler in a few lines - very impressive)

Ah, Tcl - I wrote many a Tcl script back in the 80s to login to BBSs.


Re: Regular expressions

2015-11-06 Thread Christian Gollwitzer

Am 06.11.15 um 20:52 schrieb ru...@yahoo.com:

I have always thought lexing
and parsing solutions for Python were a weak spot in the Python eco-
system and I was about to write that I would love to see a PEG parser
for python when I saw this:

http://fdik.org/pyPEG/

Unfortunately it suffers from the same problem that Pyparsing, Ply
and the rest suffer from: they use Python syntax to express the
parsing rules rather than using a dedicated problem-specific syntax
such as you used to illustrate peg parsing:


pattern <- phone_number name phone_number
phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
name <-  [[:alpha:]]+

That is actually real syntax of a parser generator used by me for 
another language (Tcl). A calculator example using this package can be 
found here: http://wiki.tcl.tk/39011

(actually it is a retargetable compiler in a few lines - very impressive)

And exactly as you say, it is working well exactly because it doesn't 
try to abuse function composition in the frontend to construct the parser.


Looking through the parser generators listed at 
http://bford.info/packrat/ it seems that waxeye could be interesting 
http://waxeye.org/manual.html#_using_waxeye - however I'm not sure the 
Python backend works with Python 3, maybe there will be unicode issues. 
Another bonus would be a compilable backend, like Cython or similar. The 
pt package mentioned above can generate a C module with an 
interface for Tcl. Compiled parsers are approximately 100x faster. I 
would expect a similar speedup for Python parsers.



Some here have complained about excessive brevity of regexs but I
much prefer using problem-specific syntax like "(a*)" to having to
express a pattern using python with something like

star = RegexMatchAny()
a_group = RegexGroup('a' + star)
...


Yeah that is nonsense. Mechanical verbosity never leads to clarity (XML 
anyone?)



I think in many cases those most hostile to regexes are also
those who use them (or need to use them) the least. While my use
of regexes is limited to fairly simple ones they are complicated
enough that I'm sure it would take orders of magnitude longer
to get the same effect in python.


That's also my impression. The "two problems" quote was already lame 
the first time. If you are satisfied with simple string functions, then 
either you do not have problems where you need regexps/other formal 
parsing tools, or you are very masochistic.


Christian
--
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-06 Thread rurpy--- via Python-list
On 11/05/2015 01:18 AM, Christian Gollwitzer wrote:
> Am 05.11.15 um 06:59 schrieb rurpy:
>>> Can you call yourself a well-rounded programmer without at least
>>> a basic understanding of some regex library? Well, probably not.
>>> But that's part of the problem with regexes. They have, to some
>>> degree, driven out potentially better -- or at least differently
>>> bad -- pattern matching solutions, such as (E)BNF grammars,
>>> SNOBOL pattern matching, or lowly globbing patterns. Or even
>>> alternative idioms, like Hypercard's "chunking" idioms.
>> 
>> Hmm, very good point.  I wonder why all those "potentially better" 
>> solutions have not been more widely adopted?  A conspiracy by a 
>> secret regex cabal?
> 
> I'm mostly on the pro-side of the regex discussion, but this IS a
> valid point. regexes are not always a good way to express a pattern,
> even if the pattern is regular. The point is, that you can't build
> them up easily piece-by-piece. Say, you want a regex like "first an
> international phone number, then a name, then a second phone number"
> - you will have to *repeat* the pattern for phone number twice. In
> more complex cases this can become a nightmare, like the monster that
> was mentioned before to validate an email.
> 
> A better alternative, then, is PEG for example. You can easily write
> [...]

That is the solution adopted by Perl 6. I have always thought lexing
and parsing solutions for Python were a weak spot in the Python eco-
system and I was about to write that I would love to see a PEG parser
for python when I saw this:

http://fdik.org/pyPEG/

Unfortunately it suffers from the same problem that Pyparsing, Ply
and the rest suffer from: they use Python syntax to express the
parsing rules rather than using a dedicated problem-specific syntax
such as you used to illustrate peg parsing:

> pattern <- phone_number name phone_number
> phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
> name <-  [[:alpha:]]+

Some here have complained about excessive brevity of regexs but I
much prefer using problem-specific syntax like "(a*)" to having to
express a pattern using python with something like

star = RegexMatchAny()
a_group = RegexGroup('a' + star)
...

and I don't want to have to do something similar with PEG (or Ply
or Pyparsing) to formulate their rules.

>[...]
> As a 12 year old, not knowing anything about pattern recognition, but
> thinking I was the king, as is usual for boys in that age, I sat down
> and manually constructed a recursive descent parser in a BASIC like
> language. It had 1000 lines and took me a few weeks to get it
> correct. Finally the solution was accepted as working, but my
> participation was rejected because the solutions lacked
> documentation. 16 years later I used the problem for a course on
> string processing (that's what the PDF is for), and asked the
> students to solve it using regexes. My own solution consists of 67
> characters, and it took me 5 minutes to write it down.
> 
> Admittedly, this problem is constructed, but solving similar tasks by
> regexes is still something that I need to do on a daily basis, when I
> get data from other scientists in odd formats and I need to
> preprocess them. I know people who use a spreadsheet and copy/paste
> millions of datapoints manually because they lack the knowledge of
> using such tools.

I think in many cases those most hostile to regexes are also
those who use them (or need to use them) the least. While my use
of regexes is limited to fairly simple ones they are complicated
enough that I'm sure it would take orders of magnitude longer
to get the same effect in python.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-05 Thread Seymore4Head
On Thu, 05 Nov 2015 11:54:20 +1100, Steven D'Aprano
 wrote:

>On Thu, 5 Nov 2015 10:02 am, Seymore4Head wrote:
>
>> So far the only use I have for regex is to replace slicing, but I
>> think it is an improvement.
>
>I don't understand this. This is like saying "so far the only use I have for
>a sandwich press is to replace my coffee pot". Regular expressions and
>slicing do very different things.
>
>Slicing extracts substrings, given known starting and ending positions:
>
>
>py> the_str = "Now is the time for all good men..."
>py> the_str[7:12]
>'the t'
>
>
>Regular expressions don't extract substrings with known start/end positions.
>They *find* matching text, giving a search string with metacharacters. (If
>there are no metacharacters in your search string, you shouldn't use a
>regex. str.find will be significantly faster and more convenient.)
>
>Slicing is not about finding text, it is about extracting text once you've
>already found it. So they are complementary, not alternatives.
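
To make "complementary" concrete, here is a minimal sketch that finds text with a regex and then extracts it with a slice. The pattern r"g\w+" is an arbitrary choice for illustration, not something from the thread.

```python
import re

the_str = "Now is the time for all good men..."

# Finding: the regex locates text whose position is not known in advance.
m = re.search(r"g\w+", the_str)

# Extracting: once the positions are known, a plain slice does the rest.
if m:
    start, end = m.span()
    found = the_str[start:end]
    print(start, end, found)
```

In practice m.group() returns the same substring directly; the slice is spelled out only to show where each tool fits.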

Here is an example of the text we are slicing apart.

>From stephen.marqu...@uct.ac.za Sat Jan  5 09:14:16 2008
Return-Path: 
Received: from murder (mail.umich.edu [141.211.14.90])
 by frankenstein.mail.umich.edu (Cyrus v2.3.8) with LMTPA;
 Sat, 05 Jan 2008 09:14:16 -0500
X-Sieve: CMU Sieve 2.3
Received: from murder ([unix socket])
 by mail.umich.edu (Cyrus v2.2.12) with LMTPA;
 Sat, 05 Jan 2008 09:14:16 -0500
Received: from holes.mr.itd.umich.edu (holes.mr.itd.umich.edu
[141.211.14.79])
by flawless.mail.umich.edu () with ESMTP id m05EEFR1013674;
Sat, 5 Jan 2008 09:14:15 -0500
Received: FROM paploo.uhi.ac.uk (app1.prod.collab.uhi.ac.uk
[194.35.219.184])
BY holes.mr.itd.umich.edu ID 477F90B0.2DB2F.12494 ; 
 5 Jan 2008 09:14:10 -0500
Received: from paploo.uhi.ac.uk (localhost [127.0.0.1])
by paploo.uhi.ac.uk (Postfix) with ESMTP id 5F919BC2F2;
Sat,  5 Jan 2008 14:10:05 + (GMT)
Message-ID: <200801051412.m05eciah010...@nakamura.uits.iupui.edu>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Received: from prod.collab.uhi.ac.uk ([194.35.219.182])
  by paploo.uhi.ac.uk (JAMES SMTP Server 2.1.3) with SMTP ID
899
  for ;
  Sat, 5 Jan 2008 14:09:50 + (GMT)
Received: from nakamura.uits.iupui.edu (nakamura.uits.iupui.edu
[134.68.220.122])
by shmi.uhi.ac.uk (Postfix) with ESMTP id A215243002
for ; Sat,  5 Jan 2008
14:13:33 + (GMT)
Received: from nakamura.uits.iupui.edu (localhost [127.0.0.1])
by nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11) with
ESMTP id m05ECJVp010329
for ; Sat, 5 Jan 2008 09:12:19
-0500
Received: (from apache@localhost)
by nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11/Submit)
id m05ECIaH010327
for sou...@collab.sakaiproject.org; Sat, 5 Jan 2008 09:12:18
-0500
Date: Sat, 5 Jan 2008 09:12:18 -0500
X-Authentication-Warning: nakamura.uits.iupui.edu: apache set sender
to stephen.marqu...@uct.ac.za using -f
To: sou...@collab.sakaiproject.org
From: stephen.marqu...@uct.ac.za

The practice problems are something like pull out all the email
addresses or pull out the days of the week and give the most common.
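
A rough sketch of both practice problems, assuming a small made-up sample in the same shape as the headers above (the addresses are hypothetical). The address pattern is deliberately loose; real address syntax is far more involved.

```python
import re
from collections import Counter

# Hypothetical sample lines shaped like the mbox headers above.
sample = """\
From alice@example.org Sat Jan  5 09:14:16 2008
Date: Sat, 5 Jan 2008 09:12:18 -0500
To: list@example.org
From: alice@example.org
From bob@example.com Fri Jan  4 18:10:48 2008
From carol@example.net Sat Jan  5 11:02:33 2008
"""

# Loose pattern: something@host.tld, good enough for this exercise only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
addresses = EMAIL.findall(sample)

# Take the day of the week only from the "From address Day ..." separator lines.
DAY = re.compile(r"^From \S+ (Mon|Tue|Wed|Thu|Fri|Sat|Sun)\b", re.MULTILINE)
days = Counter(DAY.findall(sample))
most_common_day = days.most_common(1)[0][0]

print(sorted(set(addresses)))
print(most_common_day)
```

Anchoring the day pattern to the separator line avoids counting the same message twice via its Date: header.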

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-05 Thread Tim Chase
On 2015-11-05 23:05, Steven D'Aprano wrote:
> Oh the shame, I knew that. Somehow I tangled myself in a knot,
> thinking that it had to be 1 *followed by* zero or more characters.
> But of course it's not a glob, it's a regex.

But that's a good reminder of fnmatch/glob modules too.  Sometimes
all you need is to express a simple glob, in which case using a
regexp can cloud the clarity.

The overarching principle is to go for clarity & simplicity, rather
than favoring built-ins/glob/regex/parser modules all the time.

Want to test for presence in a string?  Just use the builtin "a in b"
test.  At the beginning/end?  Use .startswith()/.endswith() for
clarity.  Need to check if a string is purely
digits/alpha/alphanumerics/etc?  Use the
.is{alnum,alpha,decimal,digit,identifier,lower,numeric,printable,space,title,upper}
methods on the string.

For simple wild-carding, use the fnmatch module to do simple
globbing.

For more complex pattern matching, you've got regexps.

Finally, for occasions when you're searching for repeated/nested
structures, using an add-on module like pyparsing will give you
clearer code.
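
The ladder above, sketched on one made-up file name (the name and the patterns are assumptions for illustration). Each check uses the simplest tool that can express it.

```python
import fnmatch
import re

name = "2015-11-05_AB123.txt"   # hypothetical file name, for illustration only

checks = {
    # presence: the "in" operator
    "substring":  "AB123" in name,
    # beginning/end: str.startswith() / str.endswith()
    "prefix":     name.startswith("2015-"),
    "suffix":     name.endswith(".txt"),
    # character-class test: one of the str.is* methods
    "all digits": name[:4].isdigit(),
    # simple wildcards: fnmatch (shell-style globbing, not regex syntax)
    "glob":       fnmatch.fnmatchcase(name, "2015-*_AB*.txt"),
    # anything more structured: re
    "regex":      re.fullmatch(r"\d{4}-\d{1,2}-\d{1,2}_[A-Z]{1,3}\d+\.txt", name) is not None,
}

for label, ok in checks.items():
    print(f"{label:10} {ok}")
```

fnmatchcase is used rather than fnmatch so the result does not depend on the platform's case rules.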

Oh, and with regexps, people should be less afraid of verbose
multi-line strings with commenting

  r = re.compile(r"""
    # NOTE: the group names (year, month, day, account) are assumed
    ^                       # start of the string
    (?P<year>\d{4})         # capture 4 digits
    -                       # a literal dash
    (?P<month>\d{1,2})      # capture 1-2 digits
    -                       # another literal dash
    (?P<day>\d{1,2})        # capture 1-2 digits
    _                       # a literal underscore
    (?P<account>            # capture the account-number
      [A-Z]{1,3}            # 1-3 letters
      \d+                   # followed by 1+ digits
      )
    \.txt                   # the extension of the file (ignored)
    $                       # the end of the string
    """, re.VERBOSE)

They are a LOT easier to come back to if you haven't touched the code
for a year.
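
A small usage sketch for a commented pattern of this kind. The group names (year, month, day, account) and the file name are assumptions made for the example.

```python
import re

# Group names are hypothetical; re.VERBOSE ignores the whitespace and
# treats "#" to end of line as a comment inside the pattern.
FILENAME = re.compile(r"""
    ^
    (?P<year>\d{4})             # 4 digits
    -
    (?P<month>\d{1,2})          # 1-2 digits
    -
    (?P<day>\d{1,2})            # 1-2 digits
    _
    (?P<account>[A-Z]{1,3}\d+)  # 1-3 letters followed by 1+ digits
    \.txt                       # the extension (not captured)
    $
""", re.VERBOSE)

m = FILENAME.match("2015-11-05_AB123.txt")   # hypothetical file name
if m:
    print(m.group("year"), m.group("account"))
    print(m.groupdict())
```

Named groups pair well with the verbose form: the call sites read m.group("account") instead of a positional index that shifts whenever the pattern changes.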

-tkc



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-05 Thread Albert van der Horst
Steven D'Aprano  writes:

>On Wed, 4 Nov 2015 07:57 pm, Peter Otten wrote:

>> I tried Tim's example
>>
>> $ seq 5 | grep '1*'
>> 1
>> 2
>> 3
>> 4
>> 5
>> $

>I don't understand this. What on earth is grep matching? How does "4"
>match "1*"?


>> which surprised me because I remembered that there usually weren't any
>> matching lines when I invoked grep instead of egrep by mistake. So I tried
>> another one
>>
>> $ seq 5 | grep '[1-3]+'
>> $
>>
>> and then headed for the man page. Apparently there is a subset called
>> "basic regular expressions":
>>
>> """
>>   Basic vs Extended Regular Expressions
>>In basic regular expressions the meta-characters ?, +, {, |, (,
>>and ) lose their special meaning; instead use  the  backslashed
>>versions \?, \+, \{, \|, \(, and \).
>> """

>None of this appears relevant, as the metacharacter * is not listed. So
>what's going on?

* is so fundamental that it never loses its special meaning.
Same for [ .

* means zero or more of the preceding char.
This makes + superfluous (a mere convenience) as
[1-3]+
can be expressed as
[1-3][1-3]*

Note that [1-3]* matches the empty string. This happens a lot.
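
The same two points checked with Python's re module, as a quick sketch (the sample strings are arbitrary):

```python
import re

samples = ["", "1", "123", "4", "x22y"]
plus_form = re.compile(r"[1-3]+")
star_form = re.compile(r"[1-3][1-3]*")

# Both spellings find the same text (or nothing) in every sample.
results = []
for s in samples:
    a, b = plus_form.search(s), star_form.search(s)
    results.append((a.group() if a else None, b.group() if b else None))

# [1-3]* alone succeeds even on "4": it matches the empty string at position 0.
empty = re.search(r"[1-3]*", "4")

print(results)
print(repr(empty.group()), empty.span())
```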

Groetjes Albert




>--
>Steven
-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-05 Thread Steven D'Aprano
On Thu, 5 Nov 2015 07:33 pm, Peter Otten wrote:

> Steven D'Aprano wrote:
> 
>> On Wed, 4 Nov 2015 07:57 pm, Peter Otten wrote:
>> 
>>> I tried Tim's example
>>> 
>>> $ seq 5 | grep '1*'
>>> 1
>>> 2
>>> 3
>>> 4
>>> 5
>>> $
>> 
>> I don't understand this. What on earth is grep matching? How does "4"
>> match "1*"?
> 
> Look for zero or more "1".

Doh!

Oh the shame, I knew that. Somehow I tangled myself in a knot, thinking that
it had to be 1 *followed by* zero or more characters. But of course it's
not a glob, it's a regex.
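
The two readings of "1*" side by side, as a small sketch using the standard fnmatch and re modules (the input lines are made up):

```python
import fnmatch
import re

lines = ["1", "2", "10", "4"]

# As a glob, "1*" means: a "1" followed by any characters.
glob_hits = [s for s in lines if fnmatch.fnmatchcase(s, "1*")]

# As a regex, "1*" means: zero or more "1" characters, so the empty
# match succeeds in every line.
regex_hits = [s for s in lines if re.search("1*", s)]

print(glob_hits)
print(regex_hits)
```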




-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-05 Thread Antoon Pardon
Op 05-11-15 om 01:33 schreef Chris Angelico:
> "I want to swim from Sydney to Los Angeles, but my gloves keep wearing
> out half way across the Pacific. How can I make my gloves strong
> enough to get me to LA?"
>
> Response 1: "If you use industrial-strength gloves and go via Papua
> New Guinea, you can double up the gloves and swim to LA."
>
> Response 2: "Swimming across the Pacific is a bad idea. Have you
> considered taking a boat or plane instead?"
>
> Which is the more helpful response? You can go ahead and assume the OP
> always knows best; I'm going to at least offer some alternatives.

What I see often enough doesn't look like offering an alternative but
more like strong argumentation against the direction the OP is going.

I have nothing against offering an alternative. There is the possibility
that there are better methods to solve the original problem and there
is nothing wrong with suggesting this possibility.

But there is also the possibility that the direction the OP is heading
is the correct one, even if you can't see it. Take the original question
on how to recognize a line that ends with a '*' with a regular expression.

What almost no one seems to have considered is that the real problem might
have been more involved and an excellent example of a problem you can
solve with regular expressions but that there was this subproblem of recognizing
a '*' at the end of the line that was troublesome for the OP.

This is a possibility that is all too often ignored by the members on this
list. We advise people here to show just the barest code that still
shows the problem, yet we ignore that this affects the part of the problem we
get to see and often enough people then insist on a better alternative
to deal with the problem totally ignoring that this better alternative
may be totally useless in the original context.

-- 
Antoon Pardon

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-05 Thread Peter Otten
Steven D'Aprano wrote:

> On Wed, 4 Nov 2015 07:57 pm, Peter Otten wrote:
> 
>> I tried Tim's example
>> 
>> $ seq 5 | grep '1*'
>> 1
>> 2
>> 3
>> 4
>> 5
>> $
> 
> I don't understand this. What on earth is grep matching? How does "4"
> match "1*"?

Look for zero or more "1". Written in Python:

import re
import sys

for line in sys.stdin:
    if re.compile("1*").search(line):
        print(line, end="")
 
>> which surprised me because I remembered that there usually weren't any
>> matching lines when I invoked grep instead of egrep by mistake. So I
>> tried another one
>> 
>> $ seq 5 | grep '[1-3]+'
>> $
>> 
>> and then headed for the man page. Apparently there is a subset called
>> "basic regular expressions":
>> 
>> """
>>   Basic vs Extended Regular Expressions
>>In basic regular expressions the meta-characters ?, +, {, |, (,
>>and ) lose their special meaning; instead use  the  backslashed
>>versions \?, \+, \{, \|, \(, and \).
>> """
> 
> None of this appears relevant, as the metacharacter * is not listed. 

That's the very point. 

> So what's going on?

Most special characters are not working with grep, but * is. The quote 
explains why many regular expressions like "[1-3]+" that you may know from 
Python's re don't work, but a small subset including the ominous "1*" do.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-05 Thread Christian Gollwitzer

Am 05.11.15 um 06:59 schrieb ru...@yahoo.com:

Can you call yourself a well-rounded programmer without at least a basic
understanding of some regex library? Well, probably not. But that's part of
the problem with regexes. They have, to some degree, driven out potentially
better -- or at least differently bad -- pattern matching solutions, such
as (E)BNF grammars, SNOBOL pattern matching, or lowly globbing patterns. Or
even alternative idioms, like Hypercard's "chunking" idioms.


Hmm, very good point.  I wonder why all those "potentially better"
solutions have not been more widely adopted?  A conspiracy by a
secret regex cabal?


I'm mostly on the pro-side of the regex discussion, but this IS a valid 
point. regexes are not always a good way to express a pattern, even if 
the pattern is regular. The point is, that you can't build them up 
easily piece-by-piece. Say, you want a regex like "first an 
international phone number, then a name, then a second phone number" - 
you will have to *repeat* the pattern for phone number twice. In more 
complex cases this can become a nightmare, like the monster that was 
mentioned before to validate an email.


A better alternative, then, is PEG for example. You can easily write

pattern <- phone_number name phone_number
phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
name <-  [[:alpha:]]+

or something similar using a PEG parser. It has almost the same 
quantifiers as a Regex, is much more readable, runs in linear time over 
all inputs and can parse languages with approximately the same 
complexity as the Knuth style parsers (LR(k) etc.), but without 
ambiguity. I'm really astonished that PEG parsing is not better 
supported in the world of computing; instead most people choose to stick 
to the lexer+scanner combination.
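
Within plain Python regexes, the repetition can at least be factored out by building the pattern from named string pieces. This is only a sketch under assumptions: the piece names, the group names, and the whitespace between fields are invented here, and [A-Za-z] stands in for [[:alpha:]], which Python's re module does not support. It gives none of PEG's recursion.

```python
import re

# Pieces mirroring the grammar sketched above; names are illustrative only.
PHONE = r"\+[0-9]+(?:-[0-9]+)*"
NAME = r"[A-Za-z]+"          # ASCII stand-in for [[:alpha:]]

# Assumption: fields are separated by whitespace (the grammar leaves this open).
PATTERN = re.compile(
    rf"(?P<first>{PHONE})\s+(?P<name>{NAME})\s+(?P<second>{PHONE})"
)

m = PATTERN.fullmatch("+49-30-1234 Alice +1-555-0100")   # made-up input
if m:
    print(m.group("first"), m.group("name"), m.group("second"))
```

The phone-number piece is written once and reused, but it is still string pasting: nothing checks the pieces until the final pattern is compiled.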


Finally, an anecdote from my "early" life of computing. In 1990, when I 
was 12 years old, I participated in an annual competition of computer 
science for high school students. I was learning how to program without 
formal training, and solved one problem where a grammar was depicted as 
a flowchart and the task was to write a parser for it, to check the 
validity of input strings. The grammar is depicted here (problem 1):


http://www.auriocus.de/StringKurs/RegEx/uebungen1.pdf

As a 12 year old, not knowing anything about pattern recognition, but 
thinking I was the king, as is usual for boys in that age, I sat down 
and manually constructed a recursive descent parser in a BASIC like 
language. It had 1000 lines and took me a few weeks to get it correct. 
Finally the solution was accepted as working, but my participation was 
rejected because the solutions lacked documentation. 16 years later I 
used the problem for a course on string processing (that's what the PDF 
is for), and asked the students to solve it using regexes. My own 
solution consists of 67 characters, and it took me 5 minutes to write it 
down.


Admittedly, this problem is constructed, but solving similar tasks by 
regexes is still something that I need to do on a daily basis, when I 
get data from other scientists in odd formats and I need to preprocess 
them. I know people who use a spreadsheet and copy/paste millions of 
datapoints manually because they lack the knowledge of using such tools.


Christian

--
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-05 Thread Chris Angelico
On Thu, Nov 5, 2015 at 6:55 PM, Gregory Ewing
 wrote:
> Tim Chase wrote:
>
>> You get even crazier when you start adding zgrep/zegrep/zfgrep.
>
>
> It's fitting somehow that we should need an RE
> to describe all the possible names of the grep
> command.

Regex engine golf: Find the shortest regex that matches the names of
all GNU commands which accept regular expressions, and no other
commands!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-05 Thread Gregory Ewing

Tim Chase wrote:


You get even crazier when you start adding zgrep/zegrep/zfgrep.


It's fitting somehow that we should need an RE
to describe all the possible names of the grep
command.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-04 Thread rurpy--- via Python-list
On Wednesday, November 4, 2015 at 7:46:24 PM UTC-7, Chris Angelico wrote:
> On Thu, Nov 5, 2015 at 11:24 AM, rurpy wrote:

> The "take away" that I recommend is: Rurpy loves to argue in favour of
> regular expressions,

No, I don't love it, I quite dislike it.

> but as you can see from the other posts, there
> are alternatives, which are often FAR superior.

No, not FAR superior, just preferable and just in the simple cases,
regexes generally being better in anything beyond simple. 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-04 Thread rurpy--- via Python-list
On 11/04/2015 07:24 PM, Steven D'Aprano wrote:
> On Thu, 5 Nov 2015 11:24 am, wrote:
>
>> You will find they are an indispensable tool, not just in Python
>> programming but in many aspects of computer use.
>
> You will find them a useful tool, but not indispensable by any means.
>
> Hint:
>
> - How many languages make arithmetic a built-in part of the language? Almost
> all of them. I don't know of any language that doesn't let you express
> something like "1 + 1" using built-in functions or syntax. Arithmetic is
> much closer to indispensable.

By my count there are 2377.  That's counting rpn languages where it is
1 1 +.  If you don't count them it is 2250.

> - How many languages make regular expressions a built-in part of the
> language? Almost none of them. There's Perl, obviously, and its
> predecessors sed and awk, and probably a few others, but most languages
> relegate regular expressions to a library.

Yes, like python relegates io to a library.  
Clearly useful but not indispensable, after all who *really* needs 
anything beyond print() and input().  And that stuff in math like sin()
and exp().  How many programs use that geeky trig stuff?  Definitely not 
indispensable.  In fact, now that you pointed it out to me, clearly all
that stdlib stuff is dispensable, all one really needs to write 
"real programmer" programs is just core python.  Who the hell needs "sys"!

> - How many useful programs can be written with regexes? Clearly there are
> many. Some of them would even be quite difficult without regexes. (In
> effect, you would have to invent your own pattern-matching code.)

Lucky for me then that there are regexes.

> - How many useful programs can be written without regexes? Clearly there are
> also many. Every time you write a Python program and fail to import re,
> you've written one.

By golly, you're right.  Not every program I write uses regexes.
Who would have thought?!  However, you failed to establish that 
the programs I write without re are useful.

> Can you call yourself a well-rounded programmer without at least a basic
> understanding of some regex library? Well, probably not. But that's part of
> the problem with regexes. They have, to some degree, driven out potentially
> better -- or at least differently bad -- pattern matching solutions, such
> as (E)BNF grammars, SNOBOL pattern matching, or lowly globbing patterns. Or
> even alternative idioms, like Hypercard's "chunking" idioms.

Hmm, very good point.  I wonder why all those "potentially better" 
solutions have not been more widely adopted?  A conspiracy by a 
secret regex cabal? 

> When all you have is a hammer, everything looks like a nail.

Lucky for us then, that we have more than just hammers!

Sorry for the flippant response (well, not really) but I find your 
arguments pedantic beyond the point of absurdity.  For me, regular 
expressions are indispensable in that if they were not available in 
Python I would not use Python.  The same is true of a number of other 
stdlib modules.  I don't give a rat's ass whether they are in a 
"library" that has to be explicitly requested with import or a 
"library" that is automatically loaded at startup.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-04 Thread rurpy--- via Python-list
On Wednesday, November 4, 2015 at 7:31:34 PM UTC-7, Steven D'Aprano wrote:
> On Thu, 5 Nov 2015 11:13 am, rurpy wrote:
> 
> > There would be far fewer computer languages, and they would be much
> > more primitive if regular expressions (and the fundamental concepts
> > that they express) did not exist.
> 
> Well, that's certainly true. But only because contra-factual statements can
> imply the truth of anything. If squares had seven sides, then Joseph Stalin
> would have been the first woman to go to the moon on horseback.

Yes, thank you for that profoundly insightful comment.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-04 Thread rurpy--- via Python-list
On 11/04/2015 05:33 PM, Chris Angelico wrote:
> On Thu, Nov 5, 2015 at 11:13 AM, rurpy--- via Python-list
>  wrote:
>> On 11/04/2015 07:52 AM, Chris Angelico wrote:
>>> On Thu, Nov 5, 2015 at 1:38 AM, rurpy wrote:
>>>> I'm afraid you are making a category error but perhaps that's in
>>>> part because I wasn't clear.  I was not talking about computer
>>>> science.  I was talking about human beings learning about computers.
>>>> Most people I know consider programming to be a higher level activity
>>>> than "using" a computer: editing, sending email etc.  Many computer
>>>> users (not programmers) learn to use regular expressions as part
>>>> of using a computer without knowing anything about programming.
>>>> It was on that basis I called them more fundamental -- something
>>>> learned earlier which is expanded on and added to later.  But you
>>>> have a bit of a point, perhaps "fundamental" was not the best choice
>>>> of word to communicate that.
>>>
>>> The "fundamentals" of something are its most basic functions, not its
>>> most basic uses. The most common use of a computer might be to browse
>>> the web, but the fundamental functionality is arithmetic and logic.
>>
>> If one accepted that then one would have to reject the term "fundamental
>> use" as meaningless.  A quick trip to google shows that's not true.
>
> A quick trip to Google showed me that there are a number of uses of
> the phrase, mostly in scientific papers and such. I've no idea how
> that helps your argument.

I was showing that your objection to my use of "fundamental" on the 
grounds it does not apply to "use" is patently silly.  From Google:

   interferes with B's more fundamental use because
   fundamental use of english
   The fundamental use of testing
   Fundamental Use of the Michigan Terminal System
   negotiate a fundamental use and exchange of power
   the most fundamental use of pointers
   makes fundamental use of statistical theory

This is what I meant in a recent post when I referred to the Alice-
in-Wonderland nature of this group.  I'm afraid I don't have the 
time or interest to discuss basic english with you.  If you want 
to maintain that "fundamental" does not apply to "use" please go right
ahead, it's your credibility at risk.

>> But string matching *is* a fundamental problem that arises frequently
>> in many aspects of CS, programming and, as I mentioned, day-to-day
>> computer use.  Saying its "only" for pattern matching is like saying
>> floating point numbers are "only" for doing non-integer arithmetic,
>> or unicode is "only" for representing text.  (Neither of those is a
>> good analogy because both lack the important theoretical underpinnings
>> that regular expressions have [*]).
>
> String matching does happen a lot. How often do you actually need
> pattern matching? Most of the time, you're doing equality checks - or
> prefix/suffix checks, at best.
>
>> There would be far fewer computer languages, and they would be much
>> more primitive if regular expressions (and the fundamental concepts
>> that they express) did not exist.
>
> So? There would also be far fewer computer languages if braces didn't
> exist, because we wouldn't have the interminable arguments about
> whether they're good or not.

Sorry, that makes no sense to me.  

>> To be sure, I did gloss over Michael Torries' point that there are
>> other concepts that are more basic in the context of learning
>> programming, he was correct about that.
>>
>> But that does not negate the fact that regexes are important and
>> fundamental.  They are both very useful in a practical sense (they
>> are even available in Microsoft Excel) and important in a theoretical
>> sense.  You are not well rounded as a programmer if you decline to
>> learn about regular expressions because "they are too cryptic", or
>> "I can do in code anything they do".
>
> You've proven that they are important, but in no way have you proven
> them fundamental. A regular expression library is the ideal solution
> to the problem "I want to let my users search for patterns of their
> own choosing". That's great, but it's only one specific class of
> problem.

If you think that is the sole use of pattern matching or even the most
important use, I can understand why you find regexes fairly useless.
Lexing (tokenization) and simple parsing are often done with regular
expressions.  Many do

Re: Regular expressions

2015-11-04 Thread Ben Finney
Steven D'Aprano  writes:

> Yes yes, I know that regexes aren't the only tool in my tool box, but
> *right now* I want to learn how to use regexes.

I'll gently suggest this isn't a particularly good forum to do so.

Learn them with a tool like <http://www.regexr.com/> and a tutorial
<http://www.usna.edu/Users/cs/wcbrown/regexp/RegexpTutorial.html> or
something longer.

-- 
 \“Fascism is capitalism plus murder.” —Upton Sinclair |
  `\   |
_o__)  |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions

2015-11-04 Thread Ben Finney
Steven D'Aprano  writes:

> On Wed, 4 Nov 2015 07:57 pm, Peter Otten wrote:
>
> > I tried Tim's example
> > 
> > $ seq 5 | grep '1*'
> > 1
> > 2
> > 3
> > 4
> > 5
> > $
>
> I don't understand this. What on earth is grep matching? How does "4"
> match "1*"?

You can experiment with regular expressions to find out. Here's a link
to the RegExr tool for the above pattern <http://regexr.com/3c4ot>.

Matching patterns can include specifications meaning “match some number
of the preceding segment”, with the ‘{n,m}’ notation. That means “match
at least n, and at most m, occurrences of the preceding segment”. Either
‘n’ or ‘m’ can be omitted, meaning “at least 0” and “no maximum”
respectively.

Those are quite useful, so there are shortcuts for the most common
cases: ‘?’ is a short cut for ‘{0,1}’, ‘*’ is a short cut for ‘{0,}’,
and ‘+’ is a short cut for ‘{1,}’.

In this case, ‘*’ is a short cut for ‘{0,}’ meaning “match 0 or more
occurrences of the preceding segment”. The segment here is the atom ‘1’.
Since ‘1*’ is the entirety of the pattern, the pattern can match zero
characters, anywhere within any string. So, it matches every possible
string.

To match (some atom) 1 or more times, ‘+’ is a short cut for ‘{1,}’
meaning “match 1 or more occurrences of the preceding segment”.
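
A minimal sketch of the same behaviour in Python, assuming the standard
`re` module as a stand-in for grep (the two differ in details, but `*`,
`+` and the counted `{n,}` forms behave alike for this pattern):

```python
import re

# '1*' is '1{0,}': zero or more '1's, so it matches the empty string in "4".
m = re.search(r"1*", "4")
print(repr(m.group()))      # the match is empty, but it is still a match

# '1+' is '1{1,}': at least one '1' is required, so "4" no longer matches.
print(re.search(r"1+", "4"))

# The explicit counted forms behave the same as the shortcuts.
star_hits = [s for s in "12345" if re.search(r"1{0,}", s)]
plus_hits = [s for s in "12345" if re.search(r"1{1,}", s)]
print(star_hits, plus_hits)
```

The empty match is the whole story: a pattern that is allowed to match
nothing will succeed on every line.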

-- 
 \學而不思則罔,思而不學則殆。 (To study and not think is a waste. |
  `\ To think and not study is dangerous.) |
_o__)—孔夫子 Confucius (551 BCE – 479 BCE) |
Ben Finney



Re: Regular expressions

2015-11-04 Thread Tim Chase
On 2015-11-05 13:28, Steven D'Aprano wrote:
> > I tried Tim's example
> > 
> > $ seq 5 | grep '1*'
> > 1
> > 2
> > 3
> > 4
> > 5
> > $  
> 
> I don't understand this. What on earth is grep matching? How does
> "4" match "1*"?

The line with "4" matches "zero or more 1s".  If it was searching for
a literal "1*" (as would happen with fgrep or "grep -F"), it would
return no results:

  $ seq 5 | fgrep '1*'
  $
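
The same contrast can be reproduced in Python. This is only a sketch:
`re.search` stands in for grep, the `in` operator stands in for fgrep's
fixed-string search, and the list of strings stands in for `seq 5`.

```python
import re

lines = [str(n) for n in range(1, 6)]   # stand-in for the output of `seq 5`

# Regex search: "zero or more 1s" succeeds on every line.
regex_hits = [line for line in lines if re.search(r"1*", line)]

# Fixed-string search: no line contains the two characters "1*".
literal_hits = [line for line in lines if "1*" in line]

print(regex_hits)
print(literal_hits)
```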

-tkc





Re: Regular expressions

2015-11-04 Thread Chris Angelico
On Thu, Nov 5, 2015 at 11:24 AM, rurpy--- via Python-list
 wrote:
> On Wednesday, November 4, 2015 at 4:05:06 PM UTC-7, Seymore4Head wrote:
>>[...]
>> I am still here, but I have to admit I am not picking up too much.
>
> The "take away" I recommend is: the folks here are often way
> overly negative regarding regular expressions and that you not
> ignore them, but take them with a BIG grain of salt and continue
> learning about and using regexs.
>
> You will find they are an indispensable tool, not just in Python
> programming but in many aspects of computer use.

The "take away" that I recommend is: Rurpy loves to argue in favour of
regular expressions, but as you can see from the other posts, there
are alternatives, which are often FAR superior.

ChrisA


Re: Regular expressions

2015-11-04 Thread Steven D'Aprano
On Thu, 5 Nov 2015 11:13 am, ru...@yahoo.com wrote:

> There would be far fewer computer languages, and they would be much
> more primitive if regular expressions (and the fundamental concepts
> that they express) did not exist.

Well, that's certainly true. But only because contra-factual statements can
imply the truth of anything. If squares had seven sides, then Joseph Stalin
would have been the first woman to go to the moon on horseback.

I can't imagine a world where pattern matching doesn't exist. That's like
trying to imagine a world where arithmetic doesn't exist. But I think we
can safely say that, had nobody thought of the idea of searching for
patterns ('find me all the lines with "green" in them'), there would be far
fewer regex libraries in existence. I doubt that there would be "far fewer"
programming languages. With the possible exception of Perl, sed and awk,
I'm not aware of any languages which were specifically inspired by, and
exist primarily to apply, regular expressions, nor any languages which
*require* regexes in their implementation. Most languages are built on
parsers, not regular expressions.


> But I really wish every mention of regexes here wasn't reflexively 
> greeted with a barrage of negative comments and that lame "two problems"
> quote, especially without an answer to the poster's regex question.

I don't disagree with this. Certainly we should accept questions from people
who are simply trying to learn how to use regexes without bombarding them
with admonitions to do something different. Yes yes, I know that regexes
aren't the only tool in my tool box, but *right now* I want to learn how to
use regexes.



-- 
Steven



Re: Regular expressions

2015-11-04 Thread Steven D'Aprano
On Wed, 4 Nov 2015 07:57 pm, Peter Otten wrote:

> I tried Tim's example
> 
> $ seq 5 | grep '1*'
> 1
> 2
> 3
> 4
> 5
> $

I don't understand this. What on earth is grep matching? How does "4"
match "1*"?


> which surprised me because I remembered that there usually weren't any
> matching lines when I invoked grep instead of egrep by mistake. So I tried
> another one
> 
> $ seq 5 | grep '[1-3]+'
> $
> 
> and then headed for the man page. Apparently there is a subset called
> "basic regular expressions":
> 
> """
>   Basic vs Extended Regular Expressions
>In basic regular expressions the meta-characters ?, +, {, |, (,
>and ) lose their special meaning; instead use  the  backslashed
>versions \?, \+, \{, \|, \(, and \).
> """

None of this appears relevant, as the metacharacter * is not listed. So
what's going on?




-- 
Steven



Re: Regular expressions

2015-11-04 Thread Steven D'Aprano
On Thu, 5 Nov 2015 11:24 am, ru...@yahoo.com wrote:

> You will find they are an indispensable tool, not just in Python
> programming but in many aspects of computer use.

You will find them a useful tool, but not indispensable by any means.

Hint:

- How many languages make arithmetic a built-in part of the language? Almost
all of them. I don't know of any language that doesn't let you express
something like "1 + 1" using built-in functions or syntax. Arithmetic is
much closer to indispensable.

- How many languages make regular expressions a built-in part of the
language? Almost none of them. There's Perl, obviously, and its
predecessors sed and awk, and probably a few others, but most languages
relegate regular expressions to a library.

- How many useful programs can be written with regexes? Clearly there are
many. Some of them would even be quite difficult without regexes. (In
effect, you would have to invent your own pattern-matching code.)

- How many useful programs can be written without regexes? Clearly there are
also many. Every time you write a Python program and fail to import re,
you've written one.

Can you call yourself a well-rounded programmer without at least a basic
understanding of some regex library? Well, probably not. But that's part of
the problem with regexes. They have, to some degree, driven out potentially
better -- or at least differently bad -- pattern matching solutions, such
as (E)BNF grammars, SNOBOL pattern matching, or lowly globbing patterns. Or
even alternative idioms, like Hypercard's "chunking" idioms.
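
As a small illustration of the globbing alternative, here is a sketch
using Python's standard `fnmatch` module. The file names are made up for
the example; `fnmatch.translate` is shown only to make the point that a
glob is a simpler notation for a subset of what a regex can say.

```python
import fnmatch
import re

names = ["notes.txt", "report.txt", "image.png"]

# A glob pattern: '*' means "any run of characters", with no preceding atom.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# fnmatch.translate() converts the glob into an equivalent regex string.
regex = re.compile(fnmatch.translate("*.txt"))
regex_hits = [n for n in names if regex.match(n)]

print(glob_hits, regex_hits)
```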

When all you have is a hammer, everything looks like a nail.





-- 
Steven



Re: Regular expressions

2015-11-04 Thread Steven D'Aprano
On Thu, 5 Nov 2015 10:02 am, Seymore4Head wrote:

> So far the only use I have for regex is to replace slicing, but I
> think it is an improvement.

I don't understand this. This is like saying "so far the only use I have for
a sandwich press is to replace my coffee pot". Regular expressions and
slicing do very different things.

Slicing extracts substrings, given known starting and ending positions:


py> the_str = "Now is the time for all good men..."
py> the_str[7:12]
'the t'


Regular expressions don't extract substrings with known start/end positions.
They *find* matching text, giving a search string with metacharacters. (If
there are no metacharacters in your search string, you shouldn't use a
regex. str.find will be significantly faster and more convenient.)

Slicing is not about finding text, it is about extracting text once you've
already found it. So they are complementary, not alternatives.
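
A short sketch of how the two fit together, assuming the same example
string as above. The pattern is just one I picked for illustration: the
regex finds the text, and slicing extracts it from the positions the
search reports.

```python
import re

the_str = "Now is the time for all good men..."

# Finding: a regex locates a four-letter word starting with 't'.
m = re.search(r"\bt\w{3}\b", the_str)
start, end = m.start(), m.end()

# Extracting: slicing uses the positions the search reported.
word = the_str[start:end]
print(start, end, word)

# With no metacharacters, str.find locates literal text without a regex.
pos = the_str.find("time")
print(pos)
```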



-- 
Steven



Re: Regular expressions

2015-11-04 Thread Chris Angelico
On Thu, Nov 5, 2015 at 11:13 AM, rurpy--- via Python-list
 wrote:
> On 11/04/2015 07:52 AM, Chris Angelico wrote:
>> On Thu, Nov 5, 2015 at 1:38 AM, rurpy wrote:
>>> I'm afraid you are making a category error but perhaps that's in
>>> part because I wasn't clear.  I was not talking about computer
>>> science.  I was talking about human beings learning about computers.
>>> Most people I know consider programming to be a higher level activity
>>> than "using" a computer: editing, sending email etc.  Many computer
>>> users (not programmers) learn to use regular expressions as part
>>> of using a computer without knowing anything about programming.
>>> It was on that basis I called them more fundamental -- something
>>> learned earlier which is expanded on and added to later.  But you
>>> have a bit of a point, perhaps "fundamental" was not the best choice
>>> of word to communicate that.
>>
>> The "fundamentals" of something are its most basic functions, not its
>> most basic uses. The most common use of a computer might be to browse
>> the web, but the fundamental functionality is arithmetic and logic.
>
> If one accepted that then one would have to reject the term "fundamental
> use" as meaningless.  A quick trip to google shows that's not true.

A quick trip to Google showed me that there are a number of uses of
the phrase, mostly in scientific papers and such. I've no idea how
that helps your argument.

> But string matching *is* a fundamental problem that arises frequently
> in many aspects of CS, programming and, as I mentioned, day-to-day
> computer use.  Saying it's "only" for pattern matching is like saying
> floating point numbers are "only" for doing non-integer arithmetic,
> or unicode is "only" for representing text.  (Neither of those is a
> good analogy because both lack the important theoretical underpinnings
> that regular expressions have [*]).

String matching does happen a lot. How often do you actually need
pattern matching? Most of the time, you're doing equality checks - or
prefix/suffix checks, at best.
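
For what it's worth, here is a sketch of those common checks using only
string methods. The file name is invented for the example.

```python
filename = "report_2015.txt"

# Equality, prefix and suffix checks need no pattern language at all.
is_exact = filename == "report_2015.txt"
has_prefix = filename.startswith("report_")
has_suffix = filename.endswith((".txt", ".csv"))   # a tuple tests several suffixes

# Substring containment is likewise a plain operator.
has_year = "2015" in filename

print(is_exact, has_prefix, has_suffix, has_year)
```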

> There would be far fewer computer languages, and they would be much
> more primitive if regular expressions (and the fundamental concepts
> that they express) did not exist.

So? There would also be far fewer computer languages if braces didn't
exist, because we wouldn't have the interminable arguments about
whether they're good or not.

> To be sure, I did gloss over Michael Torrie's point that there are
> other concepts that are more basic in the context of learning
> programming, he was correct about that.
>
> But that does not negate the fact that regexes are important and
> fundamental.  They are both very useful in a practical sense (they
> are even available in Microsoft Excel) and important in a theoretical
> sense.  You are not well rounded as a programmer if you decline to
> learn about regular expressions because "they are too cryptic", or
> "I can do in code anything they do".

You've proven that they are important, but in no way have you proven
them fundamental. A regular expression library is the ideal solution
to the problem "I want to let my users search for patterns of their
own choosing". That's great, but it's only one specific class of
problem.
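
A minimal sketch of that use case, assuming the pattern arrives as a
plain string typed by the user. `find_matches` and the sample lines are
hypothetical names for illustration; the one real design point is that
a user-supplied pattern may be invalid, so compilation is guarded.

```python
import re

def find_matches(user_pattern, lines):
    """Return the lines matching a user-supplied pattern, or None if it is invalid."""
    try:
        compiled = re.compile(user_pattern)
    except re.error:
        return None          # the user typed something that is not a valid regex
    return [line for line in lines if compiled.search(line)]

sample = ["green apple", "red cherry", "greenish pear"]
print(find_matches(r"gre+n", sample))
print(find_matches(r"(unclosed", sample))
```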

> I think the constant negative reception the posters receive here when
> they ask about regexes does them a great disservice.
>
> By all means point out that python offers a number of functions that
> can avoid the need for using regexes in simple cases.  Even point out
> that you (the plural you) don't like them and prefer other solutions
> (like writing code that does the same thing in a more half-assed bug
> ridden way, the posts in this thread being a case in point.)
>
> But I really wish every mention of regexes here wasn't reflexively
> greeted with a barrage of negative comments and that lame "two problems"
> quote, especially without an answer to the poster's regex question.

When has that happened? Usually there'll be at least two answers - one
that uses a regex and one that doesn't - and people get to read both.

>> Sure, you can
>> abuse that into a primality check and other forms of crazy arithmetic,
>> but it's not what they truly do. I also would not teach regexes to
>> people as part of an "introduction to computing" course, any more than
>> I would teach the use of Microsoft Excel, which some such courses have
>> been known to do. (And no, it's not because of the Microsoftness. I
>> wouldn't teach LibreOffi

Re: Regular expressions

2015-11-04 Thread rurpy--- via Python-list
On Wednesday, November 4, 2015 at 4:05:06 PM UTC-7, Seymore4Head wrote:
>[...]
> I am still here, but I have to admit I am not picking up too much.

The "take away" I recommend is: the folks here are often way 
overly negative regarding regular expressions and that you not
ignore them, but take them with a BIG grain of salt and continue 
learning about and using regexs.

You will find they are an indispensable tool, not just in Python 
programming but in many aspects of computer use.


Re: Regular expressions

2015-11-04 Thread rurpy--- via Python-list
On 11/04/2015 07:52 AM, Chris Angelico wrote:
> On Thu, Nov 5, 2015 at 1:38 AM, rurpy wrote:
>> I'm afraid you are making a category error but perhaps that's in
>> part because I wasn't clear.  I was not talking about computer
>> science.  I was talking about human beings learning about computers.
>> Most people I know consider programming to be a higher level activity
>> than "using" a computer: editing, sending email etc.  Many computer
>> users (not programmers) learn to use regular expressions as part
>> of using a computer without knowing anything about programming.
>> It was on that basis I called them more fundamental -- something
>> learned earlier which is expanded on and added to later.  But you
>> have a bit of a point, perhaps "fundamental" was not the best choice
>> of word to communicate that.
>
> The "fundamentals" of something are its most basic functions, not its
> most basic uses. The most common use of a computer might be to browse
> the web, but the fundamental functionality is arithmetic and logic.

If one accepted that then one would have to reject the term "fundamental 
use" as meaningless.  A quick trip to google shows that's not true.

> Setting aside the choice of word, though, I still don't think regular
> expressions are a more basic use of computing than loops and
> conditionals. A regex can't be used for anything other than string
> matching; they exist for one purpose, and one purpose only: to answer
> the question "Does this string match this pattern?". 

But string matching *is* a fundamental problem that arises frequently
in many aspects of CS, programming and, as I mentioned, day-to-day
computer use.  Saying it's "only" for pattern matching is like saying
floating point numbers are "only" for doing non-integer arithmetic,
or unicode is "only" for representing text.  (Neither of those is a 
good analogy because both lack the important theoretical underpinnings 
that regular expressions have [*]).
There would be far fewer computer languages, and they would be much
more primitive if regular expressions (and the fundamental concepts
that they express) did not exist.

To be sure, I did gloss over Michael Torrie's point that there are
other concepts that are more basic in the context of learning 
programming, he was correct about that. 

But that does not negate the fact that regexes are important and 
fundamental.  They are both very useful in a practical sense (they 
are even available in Microsoft Excel) and important in a theoretical 
sense.  You are not well rounded as a programmer if you decline to 
learn about regular expressions because "they are too cryptic", or 
"I can do in code anything they do".  

I think the constant negative reception the posters receive here when
they ask about regexes does them a great disservice.

By all means point out that python offers a number of functions that 
can avoid the need for using regexes in simple cases.  Even point out 
that you (the plural you) don't like them and prefer other solutions
(like writing code that does the same thing in a more half-assed bug
ridden way, the posts in this thread being a case in point.)

But I really wish every mention of regexes here wasn't reflexively 
greeted with a barrage of negative comments and that lame "two problems"
quote, especially without an answer to the poster's regex question.

> Sure, you can
> abuse that into a primality check and other forms of crazy arithmetic,
> but it's not what they truly do. I also would not teach regexes to
> people as part of an "introduction to computing" course, any more than
> I would teach the use of Microsoft Excel, which some such courses have
> been known to do. (And no, it's not because of the Microsoftness. I
> wouldn't teach LibreOffice Calc either.) You don't need to know how to
> work a spreadsheet as part of the basics of computer usage, and you
> definitely don't need an advanced form of text search.

Seems to me that clearly depends on the intent of the class, the students'
goals, what they'll be studying after the class, what their current
level of knowledge is, etc.  Your scenario seems way too under-specified
to say anything definitive.  And further, the pedagogy of CS (or of any 
subject of education) is not "settled science" and that kind of question
almost never has a clear right/wrong answer.

This list is not a class.  If someone comes here with a question about 
Python's regexes they deserve an answer and not be bombarded with reasons
why they shouldn't be using regexes beyond mentioning some of the alternatives
in a "oh, by the way" way.  (And yes, I recognize in this case the OP did 
get a good answer from MRAB early on.)


[*] yes, I know there is a lot of CS theory underlying floating point.
I don't think it is as deep or as important as that underlying regexes,
automata and language.


Re: Regular expressions

2015-11-04 Thread Seymore4Head
On Wed, 4 Nov 2015 18:08:51 -0500, Terry Reedy 
wrote:

>On 11/3/2015 10:23 PM, Steven D'Aprano wrote:
>
>> I don't even know what grep stands for.
>
>Get Regular Expression & Print

Thanks,  I may get around to that eventually.


Re: Regular expressions

2015-11-04 Thread Seymore4Head
On Wed, 04 Nov 2015 14:48:21 +1100, Steven D'Aprano
 wrote:

>On Wednesday 04 November 2015 11:33, ru...@yahoo.com wrote:
>
>>> Not quite.  Core language concepts like ifs, loops, functions,
>>> variables, slicing, etc are the socket wrenches of the programmer's
>>> toolbox.  Regexs are like an electric impact socket wrench.  You can do
>>> the same work without it, but in many cases it's slower. But you have to
>>> learn the other hand tools first in order to really use the electric
>>> driver properly (understanding torques, direction of threads, etc), lest
>>> you wonder why you're breaking off so many bolts with the torque of the
>>> impact drive.
>> 
>> I consider regexs more fundemental
>
>I'm sure that there are people who consider the International Space Station 
>more fundamental than the lever, the wedge and the hammer, but they would be 
>wrong too.
>
>Given primitives for branching, loops and variables, you can build support 
>for regexes. Given regexes, how would you build support for variables?
>
>Of course, you could easily prove me wrong. All you would need to do to 
>demonstrate that regexes are more fundamental than branching, loops and 
>variables would be to demonstrate that the primitive operations available in 
>commonly used CPUs are regular expressions, and that (for example) C's for 
>loop and if...else are implemented in machine code as regular expressions, 
>rather than the other way around.

So far the only use I have for regex is to replace slicing, but I
think it is an improvement.


Re: Regular expressions

2015-11-04 Thread Terry Reedy

On 11/3/2015 10:23 PM, Steven D'Aprano wrote:


I don't even know what grep stands for.


Get Regular Expression & Print

--
Terry Jan Reedy



Re: Regular expressions

2015-11-04 Thread Seymore4Head
On Wed, 04 Nov 2015 08:13:51 -0700, Michael Torrie 
wrote:

>On 11/04/2015 01:57 AM, Peter Otten wrote:
>> and then headed for the man page. Apparently there is a subset
>> called "basic regular expressions":
>>
>> """
>>   Basic vs Extended Regular Expressions
>>In basic regular expressions the meta-characters ?, +, {, |, (,
>>and ) lose their special meaning; instead use  the  backslashed
>>versions \?, \+, \{, \|, \(, and \).
>> """
>
>Good catch. I think this must have been what my brain was thinking when
>I commented about grep and regular expressions earlier. I checked the
>man page but didn't read down far enough.
>
>I was still technically wrong though.
>
>It's neat to learn so much on these tangents that the python list goes
>on frequently. Hope the OP is still lurking, reading all these comments,
>though I suspect he's not.
>
I am still here, but I have to admit I am not picking up too much.


Re: What does “grep” stand for? (was: Regular expressions)

2015-11-04 Thread Tim Chase
On 2015-11-05 05:24, Ben Finney wrote:
> A very common command to issue, then, is “actually show me the line
> of text I just specified”; the ‘p’ (for “print”) command.
> 
> Another very common command is “find the text matching this pattern
> and perform these commands on it”, which is ‘g’ (for “global”). The
> ‘g’ command addresses text matching a regular expression pattern,
> delimited by slashes ‘/’.
> 
> So, for users with feeble human brains incapable of remembering
> perfectly the entire content of the text while it changes and
> therefore not always knowing exactly which lines they wanted to
> operate on without seeing them all the time, a very frequent
> combination command is:
> 
> g/RE/p

Though since the default action for g/ is to print the line, I've
always wondered why the utility wasn't named just "gre"

   $ ed myfile.txt
   g/re
   [matching lines follow]
   q
   $

-tkc
(the goofball behind https://twitter.com/ed1conf )






What does “grep” stand for? (was: Regular expressions)

2015-11-04 Thread Ben Finney
Steven D'Aprano  writes:

> On Wednesday 04 November 2015 13:55, Dan Sommers wrote:
>
> > Its very name indicates that its default mode most certainly is
> > regular expressions.
>
> I don't even know what grep stands for. 

“grep” stands for ‘g/RE/p’.

The name is a mnemonic for a compound command in ‘ed’ [0], a text editor
that pre-dates extravagant luxuries like “presenting a full screen of
text at one time”.


In an ‘ed’ session, the user is obliged to keep mental track of the
current line in the text buffer, and even what that text contains during
the session.

Single-letter commands, with various terse parameters such as the range
of lines or some text to insert, are issued at a command prompt one
after another.

For these reasons, the manual page describes ‘ed’ as a “line-oriented
text editor”. Everything is done by specifying lines, blindly, to
commands which then operate on those lines.

The name of the ‘vi’ editor means “visual interface (to a text editor)”,
to proudly declare the innovation of a full screen of text that updates
content during the editing session. That was not available for users of
‘ed’.


A very common command to issue, then, is “actually show me the line of
text I just specified”; the ‘p’ (for “print”) command.

Another very common command is “find the text matching this pattern and
perform these commands on it”, which is ‘g’ (for “global”). The ‘g’
command addresses text matching a regular expression pattern, delimited
by slashes ‘/’.

So, for users with feeble human brains incapable of remembering
perfectly the entire content of the text while it changes and therefore
not always knowing exactly which lines they wanted to operate on without
seeing them all the time, a very frequent combination command is:

g/RE/p

meaning “find lines forward from here that match the regular expression
pattern “RE”, and do nothing to those lines except print them to
standard output”.
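
For readers who think in Python, the idea behind ‘g/RE/p’ can be
sketched in a few lines. This is an illustration only, not how ‘ed’ or
‘grep’ are implemented; the function name `g_re_p` and the sample
buffer are invented for the example.

```python
import re

def g_re_p(pattern, lines):
    """Yield every line that matches `pattern`, like ed's g/RE/p."""
    compiled = re.compile(pattern)
    for line in lines:
        if compiled.search(line):
            yield line

buffer = ["the green door", "a red door", "evergreen trees"]
for line in g_re_p(r"green", buffer):
    print(line)
```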


Wikipedia has useful pages on both ‘grep’ and ‘ed’
<https://en.wikipedia.org/wiki/Grep>
<https://en.wikipedia.org/wiki/Ed_%28text_editor%29>.

You can see a full specification of how the ‘ed’ interface is to behave
as part of the “Open Group Base Specifications Issue 7”, which is the
specification for Unix.

<http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ed.html>

See the manual for GNU ed which includes an example session to
appreciate just how far things have come.


<https://www.gnu.org/software/ed/manual/ed_manual.html#Introduction-to-line-editing>

Of course, if you yearn for the days of minimalist purity, nothing beats
Ed, man! !man ed


[0] The standard text editor.
<https://www.gnu.org/fun/jokes/ed-msg.txt>

-- 
 \ “If you can't annoy somebody there is little point in writing.” |
  `\—Kingsley Amis |
_o__)  |
Ben Finney



Re: Regular expressions

2015-11-04 Thread Tim Chase
On 2015-11-04 09:57, Peter Otten wrote:
> Well, I didn't know that grep uses regular expressions by default.

It doesn't help that grep(1) comes in multiple flavors:

grep:  should use BRE (Basic REs)
fgrep:  same as "grep -F"; uses fixed strings, no REs
egrep:  same as "grep -E"; uses ERE (Extended REs)
grep -P: a GNUism to use PCREs (Perl Compatible REs)

there's also an "rgrep" which is just "grep -r" which I find kinda
silly/redundant. Though frankly I feel the same way about fgrep/egrep
since they just activate a command-line switch.

You get even crazier when you start adding zgrep/zegrep/zfgrep.
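
For comparison with Python: as far as I can tell, the `re` module
behaves like the extended flavor where `+` is concerned, so the BRE
behaviour has to be imitated by escaping. A rough sketch, with a list
standing in for `seq 5`:

```python
import re

lines = [str(n) for n in range(1, 6)]   # stand-in for `seq 5`

# Like `grep -E '[1-3]+'`: '+' is a quantifier.
ere_like = [line for line in lines if re.search(r"[1-3]+", line)]

# Like plain `grep '[1-3]+'` under BRE rules: '+' is an ordinary character,
# which in Python has to be written as an escaped '\+'.
bre_like = [line for line in lines if re.search(r"[1-3]\+", line)]

# Like `fgrep`: re.escape() turns the whole pattern into literal text.
fixed = [line for line in lines if re.search(re.escape("[1-3]+"), line)]

print(ere_like, bre_like, fixed)
```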

-tkc




Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-04 Thread Tim Chase
On 2015-11-04 14:39, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 03:56, Tim Chase wrote:
>> Or even more valuable to me:
>> 
>>   with open(..., newline="strip") as f:
>>       assert all(not line.endswith(("\n", "\r")) for line in f)
> 
> # Works only on Windows text files.
> def chomp(lines):
>     for line in lines:
>         yield line.rstrip('\r\n')

.rstrip() takes a string that is a set of characters, so it will
remove any \r or \n at the end of the string (so it works with
both Windows & *nix line-endings) whereas just using .rstrip()
without a parameter can throw away data you might want:

  >>> "hello \r\n\r\r\n\n\n".rstrip("\r\n")
  'hello '
  >>> "hello \r\n\r\r\n\n\n".rstrip()
  'hello'
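
Putting that together, a runnable sketch of the quoted `chomp`
generator, fed a made-up list of lines with mixed endings:

```python
def chomp(lines):
    """Yield each line with trailing carriage returns and newlines removed."""
    for line in lines:
        # The argument is a set of characters, not a suffix: any mix of
        # '\r' and '\n' at the end is stripped, other whitespace is kept.
        yield line.rstrip("\r\n")

mixed = ["unix line \n", "windows line\r\n", "old mac line\r", "no newline"]
cleaned = list(chomp(mixed))
print(cleaned)
```

Note that the trailing space on the first line survives, which is the
data a bare `.rstrip()` would have thrown away.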

-tkc






Re: Regular expressions

2015-11-04 Thread Michael Torrie
On 11/04/2015 01:57 AM, Peter Otten wrote:
> and then headed for the man page. Apparently there is a subset
> called "basic regular expressions":
>
> """
>   Basic vs Extended Regular Expressions
>In basic regular expressions the meta-characters ?, +, {, |, (,
>and ) lose their special meaning; instead use  the  backslashed
>versions \?, \+, \{, \|, \(, and \).
> """

Good catch. I think this must have been what my brain was thinking when
I commented about grep and regular expressions earlier. I checked the
man page but didn't read down far enough.

I was still technically wrong though.

It's neat to learn so much on these tangents that the python list goes
on frequently. Hope the OP is still lurking, reading all these comments,
though I suspect he's not.




Re: Regular expressions

2015-11-04 Thread Chris Angelico
On Thu, Nov 5, 2015 at 1:38 AM, rurpy--- via Python-list
 wrote:
> I'm afraid you are making a category error but perhaps that's in
> part because I wasn't clear.  I was not talking about computer
> science.  I was talking about human beings learning about computers.
> Most people I know consider programming to be a higher level activity
> than "using" a computer: editing, sending email etc.  Many computer
> users (not programmers) learn to use regular expressions as part
> of using a computer without knowing anything about programming.
> It was on that basis I called them more fundamental -- something
> learned earlier which is expanded on and added to later.  But you
> have a bit of a point, perhaps "fundamental" was not the best choice
> of word to communicate that.

The "fundamentals" of something are its most basic functions, not its
most basic uses. The most common use of a computer might be to browse
the web, but the fundamental functionality is arithmetic and logic.

Setting aside the choice of word, though, I still don't think regular
expressions are a more basic use of computing than loops and
conditionals. A regex can't be used for anything other than string
matching; they exist for one purpose, and one purpose only: to answer
the question "Does this string match this pattern?". Sure, you can
abuse that into a primality check and other forms of crazy arithmetic,
but it's not what they truly do. I also would not teach regexes to
people as part of an "introduction to computing" course, any more than
I would teach the use of Microsoft Excel, which some such courses have
been known to do. (And no, it's not because of the Microsoftness. I
wouldn't teach LibreOffice Calc either.) You don't need to know how to
work a spreadsheet as part of the basics of computer usage, and you
definitely don't need an advanced form of text search.
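
For the curious, the primality abuse looks roughly like this in Python.
It is a sketch of the well-known unary trick, relies on backreferences
(so it is not a "regular" expression in the formal sense), and is far
too slow for real use; the function name is my own.

```python
import re

def is_prime_by_regex(n):
    """Test primality by matching a string of n '1' characters (unary notation)."""
    unary = "1" * n
    # '1?' covers 0 and 1; '(11+?)\1+' matches when the string splits into
    # two or more equal groups of at least two '1's, i.e. n is composite.
    return re.fullmatch(r"1?|(11+?)\1+", unary) is None

primes = [n for n in range(30) if is_prime_by_regex(n)]
print(primes)
```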

ChrisA


Re: Regular expressions

2015-11-04 Thread rurpy--- via Python-list
On Wednesday, November 4, 2015 at 1:52:31 AM UTC-7, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 18:21, Christian Gollwitzer wrote:
> 
> > What rurpy meant, was that regexes can surface to a computer user
> > earlier than variables and branches; a user who does not go into the
> > depth to actually program the machine, might still encounter them in a
> > text editor or database engine. Even some web forms allow some limited
> > form, like e.g. the DVD rental here or Google.
> [...]
> What *I* think that Rurpy means is that one can construct a mathematical 
> system based on pattern matching which is Turing complete, and therefore in 
> principle any problem you can solve using a program written in (say) Python, 
> C, Lisp, Smalltalk, etc, or execute on a CPU (or simulate in your head!) 
> could be written as a sufficiently complex regular expression.

No, Christian was correct.


Re: Regular expressions

2015-11-04 Thread rurpy--- via Python-list
On 11/03/2015 08:48 PM, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 11:33, rurpy wrote:
>
>>> Not quite.  Core language concepts like ifs, loops, functions,
>>> variables, slicing, etc are the socket wrenches of the programmer's
>>> toolbox.  Regexs are like an electric impact socket wrench.  You can do
>>> the same work without it, but in many cases it's slower. But you have to
>>> learn the other hand tools first in order to really use the electric
>>> driver properly (understanding torques, direction of threads, etc), lest
>>> you wonder why you're breaking off so many bolts with the torque of the
>>> impact drive.
>>
>> I consider regexs more fundemental
>
> I'm sure that there are people who consider the International Space Station 
> more fundamental than the lever, the wedge and the hammer, but they would be 
> wrong too.
>
> Given primitives for branching, loops and variables, you can build support 
> for regexes. Given regexes, how would you build support for variables?
>
> Of course, you could easily prove me wrong. All you would need to do to 
> demonstrate that regexes are more fundamental than branching, loops and 
> variables would be to demonstrate that the primitive operations available in 
> commonly used CPUs are regular expressions, and that (for example) C's for 
> loop and if...else are implemented in machine code as regular expressions, 
> rather than the other way around.
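
To make the quoted claim concrete (regex support can be built from
branching, loops and variables), here is a minimal sketch of a
backtracking matcher in the spirit of classic small matchers. It
supports only literals, `.`, `*`, `^` and `$`; all function names are
my own, and it is nothing like how Python's `re` is implemented.

```python
def match_here(pat, text):
    """Return True if pat matches at the very start of text."""
    if not pat:
        return True
    if pat == "$":
        return text == ""
    if len(pat) >= 2 and pat[1] == "*":
        return match_star(pat[0], pat[2:], text)
    if text and (pat[0] == "." or pat[0] == text[0]):
        return match_here(pat[1:], text[1:])
    return False

def match_star(ch, pat, text):
    """Match zero or more of ch, then the rest of the pattern."""
    i = 0
    while True:
        if match_here(pat, text[i:]):
            return True
        if i < len(text) and (ch == "." or ch == text[i]):
            i += 1
        else:
            return False

def search(pat, text):
    """Return True if pat matches anywhere in text."""
    if pat.startswith("^"):
        return match_here(pat[1:], text)
    for start in range(len(text) + 1):
        if match_here(pat, text[start:]):
            return True
    return False

print(search("1*", "4"), search("^1*$", "4"), search("gre*n", "green"))
```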

I'm afraid you are making a category error but perhaps that's in 
part because I wasn't clear.  I was not talking about computer 
science.  I was talking about human beings learning about computers.  
Most people I know consider programming to be a higher level activity 
than "using" a computer: editing, sending email etc.  Many computer
users (not programmers) learn to use regular expressions as part
of using a computer without knowing anything about programming.
It was on that basis I called them more fundamental -- something
learned earlier which is expanded on and added to later.  But you
have a bit of a point, perhaps "fundamental" was not the best choice
of word to communicate that.

Here is what I wrote:

> I consider regexs more fundamental.  One need not even be a programmer
> to use them: consider grep, sed, a zillion editors, database query 
> languages, etc.

I thought the context, which you removed even to the point of cutting 
text from the very same line you quoted, made that clear, but perhaps
not.

Indeed it is quite eye-opening when one does learn a little CS and 
discovers these things that were just a useful "feature" actually have 
a deep and profound theoretical basis.


Re: Regular expressions

2015-11-04 Thread Grant Edwards
On 2015-11-04, Michael Torrie  wrote:
> On 11/03/2015 08:23 PM, Steven D'Aprano wrote:
>>
>>>> Grep can use regular expressions (and I do so with it regularly), but
>>>> its default mode is certainly not regular expressions ...
>>>
>>> Its very name indicates that its default mode most certainly is
>>> regular expressions.
>> 
>> I don't even know what grep stands for. 

General Regular Expression Parser (or somesuch)

>> But I think what Michael may mean is that if you "grep foo", no regex
>> magic takes place since "foo" contains no metacharacters.
>
> More likely I just don't know what I'm talking about.  I must have
> been thinking about something else (shell globbing perhaps).
>
> Certainly most of the times I've seen grep used, it's to look for a
> word with no special metacharacters, as you say. Still a valid RE of
> course. But I have learned tonight I don't need to resort to grep -e
> to use regular expressions.

The -e turns on "enhanced" regexes which add a few more features to
the regex language it parses.  I've never been entirely sure if the -e
regex language is backwards compatible with the default one or not...

> At least with GNU grep, that's the default.

Grep has always, by default, treated its first command-line argument as
a regex (at least since v7 and Sys5).  If you didn't want it treated as
a regex, you had to specify -F.

-- 
Grant Edwards   grant.b.edwards at gmail.com
Yow! I like your SNOOPY POSTER!!


Re: Regular expressions

2015-11-04 Thread Antoon Pardon
On 04-11-15 at 04:35, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 03:20, Chris Angelico wrote:
>
>> (So,
>> too, are all the comments about using [-1] or string methods. But we
>> weren't to know that.)
> If MRAB could understand what he wanted, I'm sure most others could have 
> too.

Yes, they were just too busy trying to push the OP in another direction
to answer the question.

-- 
Antoon Pardon



Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-04 Thread Oscar Benjamin
On 4 November 2015 at 03:39, Steven D'Aprano
 wrote:
>
> Better would be this:
>
> def chomp(lines):
>     for line in lines:
>         yield line.rstrip()  # remove all trailing whitespace
>
>
> with open(...) as f:
>     for line in chomp(f): ...

with open(...) as f:
    for line in map(str.rstrip, f): ...
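
A minimal sketch checking that the two spellings give the same result. It uses an in-memory io.StringIO in place of a real file, and the sample text is made up for illustration. Note that both versions strip all trailing whitespace, not only the line ending.

```python
import io


def chomp(lines):
    for line in lines:
        yield line.rstrip()  # remove all trailing whitespace


# Hypothetical sample data; the last line has no trailing newline.
SAMPLE = "one\ntwo  \nthree"

with io.StringIO(SAMPLE) as f:
    via_generator = list(chomp(f))

with io.StringIO(SAMPLE) as f:
    via_map = list(map(str.rstrip, f))

print(via_generator == via_map)
print(via_map)
```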

--
Oscar


Re: Regular expressions

2015-11-04 Thread Peter Otten
Michael Torrie wrote:

> On 11/03/2015 08:23 PM, Steven D'Aprano wrote:
>>>> Grep can use regular expressions (and I do so with it regularly), but
>>>> its default mode is certainly not regular expressions ...
>>>
>>> Its very name indicates that its default mode most certainly is regular
>>> expressions.
>> 
>> I don't even know what grep stands for.
>> 
>> But I think what Michael may mean is that if you "grep foo", no regex
>> magic takes place since "foo" contains no metacharacters.
> 
> More likely I just don't know what I'm talking about.  I must have been
> thinking about something else (shell globbing perhaps).
> 
> Certainly most of the times I've seen grep used, it's to look for a word
> with no special metacharacters, as you say. Still a valid RE of course.
>  But I have learned tonight I don't need to resort to grep -e to use
> regular expressions.  At least with GNU grep, that's the default.

Well, I didn't know that grep uses regular expressions by default.

I tried Tim's example

$ seq 5 | grep '1*'
1
2
3
4
5
$ 

which surprised me because I remembered that there usually weren't any 
matching lines when I invoked grep instead of egrep by mistake. So I tried 
another one

$ seq 5 | grep '[1-3]+'
$ 

and then headed for the man page. Apparently there is a subset called "basic 
regular expressions":

"""
  Basic vs Extended Regular Expressions
   In basic regular expressions the meta-characters ?, +, {, |, (,
   and ) lose their special meaning; instead use  the  backslashed
   versions \?, \+, \{, \|, \(, and \).
"""

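For comparison, a small sketch of how the same two patterns behave in Python's re module, where an unescaped + is a quantifier (as in extended regular expressions) and the backslashed form is the literal character. The list of lines stands in for the output of `seq 5`; the variable names are only for illustration.

```python
import re

lines = [str(n) for n in range(1, 6)]  # same input as `seq 5`

# In Python, '+' is a quantifier, so this behaves like egrep '[1-3]+'.
as_quantifier = [s for s in lines if re.search(r'[1-3]+', s)]

# Escaped, '+' is a literal plus sign, which is how plain grep
# treated the unescaped '+' in the example above.
as_literal = [s for s in lines if re.search(r'[1-3]\+', s)]

print(as_quantifier)
print(as_literal)
print(re.search(r'[1-3]\+', '3+') is not None)
```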



Re: Regular expressions

2015-11-04 Thread Steven D'Aprano
On Wednesday 04 November 2015 18:21, Christian Gollwitzer wrote:

> What rurpy meant, was that regexes can surface to a computer user
> earlier than variables and branches; a user who does not go into the
> depth to actually program the machine, might still encounter them in a
> text editor or database engine. Even some web forms allow some limited
> form, like e.g. the DVD rental here or Google.

What Rurpy meant, only Rurpy can say, but I doubt that is what he is talking 
about. By that logic, a full-screen high-def 3D first-person shooter game 
with an advanced AI is "more fundamental" than an assembly language branch 
operation, because there are people who play computer games without doing 
assembly programming.

In context, Michael suggested that programmers should learn the basic 
fundamentals of their chosen language, such as variables, for-loops and 
branching, before regexes -- which Rurpy then disagreed with, claiming that 
regexes are more fundamental than those basic operations.

What *I* think that Rurpy means is that one can construct a mathematical 
system based on pattern matching which is Turing complete, and therefore in 
principle any problem you can solve using a program written in (say) Python, 
C, Lisp, Smalltalk, etc, or execute on a CPU (or simulate in your head!) 
could be written as a sufficiently complex regular expression.

I think he is *technically wrong*, if by "regex" we mean actual regular 
expressions. Perl, and Python, regexes are strictly more powerful than 
regular expressions (despite the name). I know that Perl regexes are Turing 
complete (mainly because they can call out to the Perl interpreter), I'm not 
sure about Python regexes.

But I also think that Rurpy is *not even wrong* if he means Perl or Python 
regexes. The (entirely theoretical) ability to solve a problem like "What is 
pi to the power of the first prime number larger than 97531000?" using a 
regex doesn't make regexes more fundamental than variables, branches and 
loops. It just makes them an alternative computing paradigm -- one which is 
*exponentially* more difficult to use than the standard paradigms of 
functional, procedural, OOP, etc. for anything except the limited subset of 
pattern matching problems they were created for.



-- 
Steve



Re: Regular expressions

2015-11-03 Thread Christian Gollwitzer

On 04.11.15 at 04:48, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 11:33, ru...@yahoo.com wrote:
>
>>> Not quite.  Core language concepts like ifs, loops, functions,
>>> variables, slicing, etc are the socket wrenches of the programmer's
>>> toolbox.  Regexs are like an electric impact socket wrench.  You can do
>>> the same work without it, but in many cases it's slower. But you have to
>>> learn the other hand tools first in order to really use the electric
>>> driver properly (understanding torques, direction of threads, etc), lest
>>> you wonder why you're breaking off so many bolts with the torque of the
>>> impact drive.
>>
>> I consider regexs more fundamental
>
> I'm sure that there are people who consider the International Space Station
> more fundamental than the lever, the wedge and the hammer, but they would be
> wrong too.
>
> Given primitives for branching, loops and variables, you can build support
> for regexes. Given regexes, how would you build support for variables?
>
> Of course, you could easily prove me wrong.


You *know* that they are not equivalent, I assume? Regexes are 
equivalent to finite state machines, which are less powerful than Turing 
machines, and even less powerful than stack machines. You can't even 
construct a regexp which validates whether parentheses are balanced.
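
To make the contrast concrete, here is a minimal sketch of the balanced-parentheses check written with ordinary loops and a counter. The counter can grow without bound, which is exactly the kind of state a finite state machine does not have. The function name is only illustrative.

```python
def balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:      # a ')' with no '(' before it
                return False
    return depth == 0          # no '(' left unclosed


print(balanced('(a(b)c)'))
print(balanced('(()'))
print(balanced(')('))
```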


What rurpy meant was that regexes can surface to a computer user 
earlier than variables and branches; a user who does not go into enough 
depth to actually program the machine might still encounter them in a 
text editor or database engine. Even some web forms allow a limited 
form of them, e.g. the DVD rental here or Google.


Christian


Re: Regular expressions

2015-11-03 Thread Nobody
On Wed, 04 Nov 2015 14:23:04 +1100, Steven D'Aprano wrote:

>> Its very name indicates that its default mode most certainly is regular
>> expressions.
> 
> I don't even know what grep stands for.

From the ed command "g /re/p" (where "re" is a placeholder for an
arbitrary regular expression). Tests all lines ("g" for global) against
the specified regexp and prints ("p") any which match.

> But I think what Michael may mean is that if you "grep foo", no regex
> magic takes place since "foo" contains no metacharacters.

At least the GNU version will treat the input as a regexp regardless of
whether it contains only literal characters. I.e. "grep foo" and
"grep [f][o][o]" will both construct the same state machine then process
the input with it.

You need to actually use -F to change the matching algorithm.



Re: Regular expressions

2015-11-03 Thread Steven D'Aprano
On Wednesday 04 November 2015 11:33, ru...@yahoo.com wrote:

>> Not quite.  Core language concepts like ifs, loops, functions,
>> variables, slicing, etc are the socket wrenches of the programmer's
>> toolbox.  Regexs are like an electric impact socket wrench.  You can do
>> the same work without it, but in many cases it's slower. But you have to
>> learn the other hand tools first in order to really use the electric
>> driver properly (understanding torques, direction of threads, etc), lest
>> you wonder why you're breaking off so many bolts with the torque of the
>> impact drive.
> 
> I consider regexs more fundamental

I'm sure that there are people who consider the International Space Station 
more fundamental than the lever, the wedge and the hammer, but they would be 
wrong too.

Given primitives for branching, loops and variables, you can build support 
for regexes. Given regexes, how would you build support for variables?

Of course, you could easily prove me wrong. All you would need to do to 
demonstrate that regexes are more fundamental than branching, loops and 
variables would be to demonstrate that the primitive operations available in 
commonly used CPUs are regular expressions, and that (for example) C's for 
loop and if...else are implemented in machine code as regular expressions, 
rather than the other way around.



-- 
Steve



Re: Regular expressions

2015-11-03 Thread Michael Torrie
On 11/03/2015 08:23 PM, Steven D'Aprano wrote:
>>> Grep can use regular expressions (and I do so with it regularly), but
>>> its default mode is certainly not regular expressions ...
>>
>> Its very name indicates that its default mode most certainly is regular
>> expressions.
> 
> I don't even know what grep stands for. 
> 
> But I think what Michael may mean is that if you "grep foo", no regex magic 
> takes place since "foo" contains no metacharacters.

More likely I just don't know what I'm talking about.  I must have been
thinking about something else (shell globbing perhaps).

Certainly most of the times I've seen grep used, it's to look for a word
with no special metacharacters, as you say. Still a valid RE of course.
But I have learned tonight I don't need to resort to grep -e to use
regular expressions.  At least with GNU grep, that's the default.


Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Steven D'Aprano
On Wednesday 04 November 2015 03:56, Tim Chase wrote:

> Or even more valuable to me:
> 
>   with open(..., newline="strip") as f:
>     assert all(not line.endswith(("\n", "\r")) for line in f)

# Works only on Windows text files.
def chomp(lines):
    for line in lines:
        yield line.rstrip('\r\n')


Better would be this:

def chomp(lines):
    for line in lines:
        yield line.rstrip()  # remove all trailing whitespace


with open(...) as f:
    for line in chomp(f): ...
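
The two versions are not interchangeable, so here is a short sketch of where they differ. The sample lines are made up; the second one has trailing spaces and a tab before its line ending.

```python
# Hypothetical lines with mixed endings; the last has no newline at all.
raw = ['one\n', 'two \t\r\n', 'three']

# rstrip('\r\n') removes only trailing CR and LF characters.
only_newlines = [line.rstrip('\r\n') for line in raw]

# rstrip() with no argument removes every kind of trailing whitespace.
all_whitespace = [line.rstrip() for line in raw]

print(only_newlines)
print(all_whitespace)
```

Which one is "better" depends on whether trailing spaces and tabs carry meaning in the data.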


-- 
Steve



Re: Regular expressions

2015-11-03 Thread Steven D'Aprano
On Wednesday 04 November 2015 03:20, Chris Angelico wrote:

> On Wed, Nov 4, 2015 at 3:10 AM, Seymore4Head
>  wrote:
>> Yes I knew that -1 represents the end character.  It is not a question
>> of trying to accomplish anything.  I was just practicing with regex
>> and wasn't sure how to express a * since it was one of the
>> instructions.
> 
> In that case, it's nothing to do with ending a string. 

Seymore never said anything about ending a string.


> What you really
> want to know is: How do you match a '*' using a regular expression?

He may want to know that too, but that's not what he asked for. He asked how 
to match an asterisk at the end of the line.


> Which is what MRAB answered, courtesy of a working crystal ball: You
> use '\*'. Everything about the end of the string is irrelevant.

Not at all -- matching "\*" will find lines *beginning* with an asterisk if 
you use re.match, and lines containing an asterisk *anywhere* in the line if 
you use re.search.

I say "line" because the most common use for re.match and re.search is to 
match against a single line of text, but of course regexes can operate on 
multiline blocks of text, with or without multiline mode turned on.

And calling it "a working crystal ball" is somewhat of an exaggeration. The 
plain English meaning of Seymore's plain English question is easily 
understood: he wants to know how to match an asterisk at the end of the 
line, just like he said :-P
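
A minimal sketch of the difference, using made-up sample lines. The last variant shows the plain string-method alternative for the same question.

```python
import re

lines = ['ends with star*', '*starts with star', 'star * inside', 'no star']

# re.match only looks at the beginning of the string.
at_start = [s for s in lines if re.match(r'\*', s)]

# re.search finds the asterisk anywhere.
anywhere = [s for s in lines if re.search(r'\*', s)]

# Anchoring with $ restricts the match to the end of the line.
at_end = [s for s in lines if re.search(r'\*$', s)]

# The same test without a regex.
at_end_no_regex = [s for s in lines if s.endswith('*')]

print(at_start)
print(anywhere)
print(at_end)
```

One caveat: `$` also matches just before a trailing newline, so `re.search(r'\*$', 'x*\n')` succeeds while `'x*\n'.endswith('*')` does not.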


> (So,
> too, are all the comments about using [-1] or string methods. But we
> weren't to know that.)

If MRAB could understand what he wanted, I'm sure most others could have 
too.



-- 
Steve



Re: Regular expressions

2015-11-03 Thread Steven D'Aprano
On Wednesday 04 November 2015 13:55, Dan Sommers wrote:

> On Tue, 03 Nov 2015 19:04:23 -0700, Michael Torrie wrote:
> 
>> On 11/03/2015 05:33 PM, rurpy--- via Python-list wrote:
>>> I consider regexs more fundamental.  One need not even be a programmer
>>> to use them: consider grep, sed, a zillion editors, database query
>>> languages, etc.
>> 
>> Grep can use regular expressions (and I do so with it regularly), but
>> its default mode is certainly not regular expressions ...
> 
> Its very name indicates that its default mode most certainly is regular
> expressions.

I don't even know what grep stands for. 

But I think what Michael may mean is that if you "grep foo", no regex magic 
takes place since "foo" contains no metacharacters.




-- 
Steven



Re: Regular expressions

2015-11-03 Thread Chris Angelico
On Wed, Nov 4, 2015 at 2:12 PM, Tim Chase  wrote:
> It's not as helpful as one might hope because you're stuck using a
> fixed regexp rather than an arbitrary regexp, but if you have a
> particular regexp you search for frequently, you can index it.
> Otherwise, you'd be doing full table-scans (or at least a full scan
> of whatever subset the active non-regexp'ed index yields) which can
> be pretty killer on performance.

If the regex anchors the start of the string, you can generally use an
index to save at least some effort. Otherwise, you're relying on some
kind of alternate indexing style, such as:

http://www.postgresql.org/docs/current/static/pgtrgm.html

which specifically mentions regex searches as being indexable.

Some more info, including 'explain' results:

http://www.depesz.com/2013/04/10/waiting-for-9-3-support-indexing-of-regular-expression-searches-in-contribpg_trgm/

But this kind of thing isn't widely supported across databases.

ChrisA


Re: Regular expressions

2015-11-03 Thread Tim Chase
On 2015-11-03 19:04, Michael Torrie wrote:
> Grep can use regular expressions (and I do so with it regularly),
> but its default mode is certainly not regular expressions, and it
> is still very powerful.

I suspect you're thinking of `fgrep` (AKA "grep -F") which uses fixed
strings rather than regular expressions.  By default, `grep` certainly
does use regular expressions:

  tim@linux$ seq 5 | grep "1*"
  tim@bsd$ jot 5 | grep "1*"

will output the entire input, not just lines containing a "1"
followed by an asterisk.
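
A rough Python analogy, as a sketch only: re.search plays the part of grep, and the `in` operator plays the part of fgrep. The list stands in for the output of `seq 5`.

```python
import re

lines = [str(n) for n in range(1, 6)]  # same input as `seq 5`

# As a regex, '1*' means "zero or more 1s", which matches every line
# (possibly with an empty match).
as_regex = [s for s in lines if re.search('1*', s)]

# As a fixed string, '1*' means a '1' followed by an asterisk.
as_fixed = [s for s in lines if '1*' in s]

print(as_regex)
print(as_fixed)
```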

> I've never used regular expressions in a database query language;
> until this moment I didn't know any supported such things in their
> queries.  Good to know.  How you would index for regular
> expressions in queries I don't know.

At least PostgreSQL allows for creating indexes on a particular
regular expression.  E.g. (shooting from the hip so I might have
missed something):

  CREATE TABLE contacts (
   -- ...
   phonenumber VARCHAR(15),
   -- ...
   );
  CREATE INDEX contacts_just_phone_digits_idx
   ON contacts((regexp_replace(phonenumber, '[^0-9]', '', 'g')));

  INSERT INTO contacts(..., phonenumber, ...)
   VALUES (..., '800-555-1212', ...);

  SELECT *
  FROM contacts
  WHERE -- should use contacts_just_phone_digits_idx
   -- the 'g' flag replaces every non-digit, not just the first
   regexp_replace(phonenumber, '[^0-9]', '', 'g') = '8005551212';

It's not as helpful as one might hope because you're stuck using a
fixed regexp rather than an arbitrary regexp, but if you have a
particular regexp you search for frequently, you can index it.
Otherwise, you'd be doing full table-scans (or at least a full scan
of whatever subset the active non-regexp'ed index yields) which can
be pretty killer on performance.

You'd have to research other DB engines.

-tkc






Re: Regular expressions

2015-11-03 Thread Dan Sommers
On Tue, 03 Nov 2015 19:04:23 -0700, Michael Torrie wrote:

> On 11/03/2015 05:33 PM, rurpy--- via Python-list wrote:
>> I consider regexs more fundamental.  One need not even be a programmer
>> to use them: consider grep, sed, a zillion editors, database query 
>> languages, etc.
> 
> Grep can use regular expressions (and I do so with it regularly), but
> its default mode is certainly not regular expressions ...

Its very name indicates that its default mode most certainly is regular
expressions.

Dan


Re: Regular expressions

2015-11-03 Thread Michael Torrie
On 11/03/2015 05:33 PM, rurpy--- via Python-list wrote:
> I consider regexs more fundamental.  One need not even be a programmer
> to use them: consider grep, sed, a zillion editors, database query 
> languages, etc.

Grep can use regular expressions (and I do so with it regularly), but
its default mode is certainly not regular expressions, and it is still
very powerful.  I've never used regular expressions in a database query
language; until this moment I didn't know any supported such things in
their queries.  Good to know.  How you would index for regular
expressions in queries I don't know.

> When there is a mini-language explicitly developed for describing
> string patterns, why, except in very simple cases, would one not
> take advantage of it?  

Mainly because the programming language itself often can do it just as
cleanly and just as fast (slicing, string methods, etc).  I certainly
programmed for many years without needing regular expressions in my
small projects.  In fact, REs are a bit of a pain to use in, say, C or
C++, requiring a library.  With Python they are much more readily
accessible so I use them much more.

But honestly it wasn't until college when I learned about finite state
automata that I really grasped what regular expressions were and how to
use them.

> Beyond trivial operations a regex, although
> terse (overly perhaps), is still likely to be more understandable and
> more maintainable than a bunch of ad-hoc code.  And the relative ease 
> of expressing complex patterns means one is more likely to create
> more specific patterns, resulting in detecting unexpected input 
> earlier than with ad-hoc code. 

Maybe, maybe not.  Using Python string class methods is probably more
clear when such methods are sufficient.
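
As an illustration of that trade-off, a small sketch that splits a made-up "key = value" line both ways. The regex is only roughly equivalent: unlike the string-method version, it also rejects lines that contain no "=".

```python
import re

line = 'timeout = 30'  # hypothetical config line

# String methods: split on the first '=' and trim both sides.
key, sep, value = line.partition('=')
by_methods = (key.strip(), value.strip())

# A roughly equivalent regex.
m = re.fullmatch(r'\s*([^=]*?)\s*=\s*(.*?)\s*', line)
by_regex = m.groups() if m else None

print(by_methods)
print(by_regex)
```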



Re: Regular expressions

2015-11-03 Thread rurpy--- via Python-list
On Monday, November 2, 2015 at 9:38:24 PM UTC-7, Michael Torrie wrote:
> On 11/02/2015 09:23 PM, rurpy--- via Python-list wrote:
> >> My completely unsolicited advice is that regular expressions shouldn't be
> >> very high on the list of things to learn.  They are very useful, and very
> >> tricky and prone to many problems that can and should be learned to be
> >> resolved with much simpler methods.  If you really want to learn regular
> >> expressions, that's great but the problem you posed is not one for which
> >> they are the best solution.  Remember simpler is better than complex.
> > 
> > Regular expressions should be learned by every programmer or by anyone
> > who wants to use computers as a tool.  They are a fundamental part of
> > computer science and are used in all sorts of matching and searching 
> > from compilers down to your work-a-day text editor.
> > 
> > Not knowing how to use them is like an auto mechanic not knowing how to 
> > use a socket wrench.
> 
> Not quite.  Core language concepts like ifs, loops, functions,
> variables, slicing, etc are the socket wrenches of the programmer's
> toolbox.  Regexs are like an electric impact socket wrench.  You can do
> the same work without it, but in many cases it's slower. But you have to
> learn the other hand tools first in order to really use the electric
> driver properly (understanding torques, direction of threads, etc), lest
> you wonder why you're breaking off so many bolts with the torque of the
> impact drive.

I consider regexs more fundamental.  One need not even be a programmer
to use them: consider grep, sed, a zillion editors, database query 
languages, etc.

When there is a mini-language explicitly developed for describing
string patterns, why, except in very simple cases, would one not
take advantage of it?  Beyond trivial operations a regex, although
terse (overly perhaps), is still likely to be more understandable and
more maintainable than a bunch of ad-hoc code.  And the relative ease 
of expressing complex patterns means one is more likely to create
more specific patterns, resulting in detecting unexpected input 
earlier than with ad-hoc code. 


Re: Regular expressions

2015-11-03 Thread rurpy--- via Python-list
On 11/03/2015 12:15 AM, Steven D'Aprano wrote:
> On Tue, 3 Nov 2015 03:23 pm, rurpy wrote:
> 
>> Regular expressions should be learned by every programmer or by anyone
>> who wants to use computers as a tool.  They are a fundamental part of
>> computer science and are used in all sorts of matching and searching
>> from compilers down to your work-a-day text editor.
> 
> You are absolutely right.
> 
> If only regular expressions weren't such an overly-terse, cryptic
> mini-language, with all but no debugging capabilities, they would be great.
> 
> If only there wasn't an extensive culture of regular expression abuse within
> programming communities, they would be fine.
> 
> All technologies are open to abuse. But we don't say:
> 
>   Some people, when confronted with a problem, think "I know, I'll use
>   arithmetic." Now they have two problems.
> 
> because abuse of arithmetic is rare. It's hard to misuse it, and while
> arithmetic can be complicated, it's rare for programmers to abuse it. But
> the same cannot be said for regexes -- they are regularly misused, abused,
> and down-right hard to use right even when you have a good reason for using
> them:
> 
> http://www.thedailywtf.com/articles/Irregular_Expression
> 
> http://blog.codinghorror.com/regex-use-vs-regex-abuse/
> 
> http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html

Thanks for pointing out three cases of misuse of regexes out of the
approximately 37500 [*] uses of regexes in the wild. I hope you're
not dumb enough to think that constitutes significant evidence.

Even worse, of the three only one was a real example. One of the others
was machine-generated code, the other was a "look what you can do with
regexes" example, not serious code.

Here is an example of "abusing" python

  https://benkurtovic.com/2014/06/01/obfuscating-hello-world.html

I wouldn't use this as evidence that Python is to be avoided.

> If there is one person who has done more to create a regex culture, it is
> Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
> overused and their syntax is harmful, and he has recreated them for Perl 6:
> 
> http://www.perl.com/pub/2002/06/04/apo5.html

You really should have read beyond the first paragraph. He proposes
fixing regexes by adding even more special character combinations and
making regexes even *more* powerful. (He turned them into full-blown
parsers.)

Nowhere does he advocate not using, or avoiding if possible, regexes
as is the mantra in this list.

Here is Larry's "recreation" that you are touting:

  http://design.perl6.org/S05.html

Please explain to us how you think this "fix" addresses the complaints
you and other Python anti-regexers have about regexes.

I hope you also noted Larry's tongue-in-cheek writing style. Right after
pointing out that some claim Perl is hard to read due largely to regex
syntax, he writes:

  "Funny that other languages have been borrowing Perl's regular
  expressions as fast as they can..."

So I don't think you can claim Larry Wall as a supporter of this list's
anti-regex attitude beyond some superficial verbiage taken out of context.

> Oh, and the icing on the cake, regexes can be a security vulnerability too:
> https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS

And here is a list of CVEs involving Python. There are (at time of
writing) 190 of them.

  http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=python

So if a security vulnerability is reason not to use regexes, we should
all be *running* from Python. I'm sure you'll point out that most have
been fixed.

But you failed to point out that the same is true of regex engines. From
your source:

  "Notice, that not all algorithms are naïve, and actually Regex
  algorithms can be written in an efficient way."

And in fact, again, had you looked beyond a headline that suited your
purpose, you could have tried the "Evil Regexes" noted in that source
and discovered none of them are a DoS in Python.

Even were that not true, normal practice applies: if the input is
untrusted then sanitize it, or mitigate the threat by imposing a timeout,
etc. Not exactly a problem or solution unique to regexes. And common
sense should tell you that since there are a lot of "try a regex" web
sites, this is not a problem without a solution.
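
One narrow form of sanitizing, sketched under the assumption that the untrusted text is only ever meant to be matched literally: pass it through re.escape before it reaches the regex engine. This does not cover the case where users are supposed to supply real patterns. The sample strings are made up.

```python
import re

user_text = '1*'                        # hypothetical untrusted input
lines = ['1', '11', 'cost: 1*2']

# Used directly, the text is interpreted as a pattern.
as_pattern = [s for s in lines if re.search(user_text, s)]

# Escaped first, every character is matched literally.
as_literal = [s for s in lines if re.search(re.escape(user_text), s)]

print(as_pattern)
print(as_literal)
```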

And *certainly* not a reason not to use them in the *far* more common
case when they *are* trusted because you are in control of them,

Finally, preemptively, I'll repeat that I acknowledge regexs are not
the optimum solution in every case where they could be used. But they
are very useful when one passes the border of the trivial; and they are
nowhere near as bad as routinely portrayed here.


[*] Yes, I made that number up.


Re: Regular expressions

2015-11-03 Thread Robin Koch

On 03.11.2015 at 05:23, ru...@yahoo.com wrote:

> Of course there are people who misuse regexes.


/^1?$|^(11+?)\1+$/

There are? 0:-)
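
For readers who have not met it, the pattern above is a well-known trick: applied to a string of n ones, it matches exactly when n is not prime. A minimal sketch in Python follows; the helper name is only illustrative, and the backreference `\1` is what takes the pattern beyond regular expressions in the formal sense.

```python
import re

# Matches '' or '1', or any run of ones that splits into two or more
# equal groups of at least two ones (i.e. a composite length).
NOT_PRIME = re.compile(r'^1?$|^(11+?)\1+$')


def is_prime(n):
    """Primality test by regex on the unary representation of n."""
    return NOT_PRIME.match('1' * n) is None


print([n for n in range(20) if is_prime(n)])
```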

--
Robin Koch


Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Grant Edwards
On 2015-11-03, Tim Chase  wrote:

[re. iterating over lines in a file]

> I can't think of more than 1-2 times in my last 10+ years of
> Pythoning that I've actually had potential use for the newlines,

If you can think of 1-2 times when you've been iterating over the
lines in a file and wanted to see the EOL markers, then that's 1-2
times more than I've ever wanted to see them since I started using
Python 16 years ago...

-- 
Grant Edwards   grant.b.edwards at gmail.com
Yow! !  Up ahead!  It's a DONUT HUT!!


Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Tim Chase
On 2015-11-03 11:39, Ian Kelly wrote:
> >> because I have countless loops that look something like
> >>
> >>   with open(...) as f:
> >>     for line in f:
> >>       line = line.rstrip('\r\n')
> >>       process(line)
> >
> > What would happen if you read a file opened like this without
> > iterating over lines?  
> 
> I think I'd go with this:
> 
> >>> def strip_newlines(iterable):
> ...     for line in iterable:
> ...         yield line.rstrip('\r\n')
> ...

Behind the scenes, this is what I usually end up doing, but the
effective logic is the same.  I just like the notion of being able to
tell open() that I want iteration to happen over the *content* of
the lines, ignoring the new-line delimiters.

I can't think of more than 1-2 times in my last 10+ years of
Pythoning that I've actually had potential use for the newlines,
usually on account of simply feeding the entire line back into some
filelike.write() method where I wanted the newlines in the resulting
file. But even in those cases, I seem to recall stripping off the
arbitrary newlines (LF vs. CR/LF) and then adding my own known line
delimiter.

-tkc





Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Ian Kelly
On Tue, Nov 3, 2015 at 11:33 AM, Ian Kelly  wrote:
> On Tue, Nov 3, 2015 at 9:56 AM, Tim Chase  
> wrote:
>> Or even more valuable to me:
>>
>>   with open(..., newline="strip") as f:
>>     assert all(not line.endswith(("\n", "\r")) for line in f)
>>
>> because I have countless loops that look something like
>>
>>   with open(...) as f:
>>     for line in f:
>>       line = line.rstrip('\r\n')
>>       process(line)
>
> What would happen if you read a file opened like this without
> iterating over lines?

I think I'd go with this:

>>> def strip_newlines(iterable):
...     for line in iterable:
...         yield line.rstrip('\r\n')
...
>>> list(strip_newlines(['one\n', 'two\r', 'three']))
['one', 'two', 'three']

Or if I care about optimizing the for loop (but we're talking about
file I/O, so probably not), this might be faster:

>>> import operator
>>> def strip_newlines(iterable):
...     return map(operator.methodcaller('rstrip', '\r\n'), iterable)
...
>>> list(strip_newlines(['one\n', 'two\r', 'three']))
['one', 'two', 'three']

Then the iteration is just:
for line in strip_newlines(f):


Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Ian Kelly
On Tue, Nov 3, 2015 at 9:56 AM, Tim Chase  wrote:
> On 2015-11-03 16:35, Peter Otten wrote:
>> I wish there were a way to prohibit such files. Maybe a special
>> value
>>
>> with open(..., newline="normalize") as f:
>>     assert all(line.endswith("\n") for line in f)
>>
>> to ensure that all lines end with "\n"?
>
> Or even more valuable to me:
>
>   with open(..., newline="strip") as f:
>     assert all(not line.endswith(("\n", "\r")) for line in f)
>
> because I have countless loops that look something like
>
>   with open(...) as f:
>     for line in f:
>       line = line.rstrip('\r\n')
>       process(line)

What would happen if you read a file opened like this without
iterating over lines?


Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Peter Otten
Tim Chase wrote:

> On 2015-11-03 16:35, Peter Otten wrote:
>> I wish there were a way to prohibit such files. Maybe a special
>> value
>> 
>> with open(..., newline="normalize") as f:
>>     assert all(line.endswith("\n") for line in f)
>> 
>> to ensure that all lines end with "\n"?
> 
> Or even more valuable to me:
> 
>   with open(..., newline="strip") as f:
>     assert all(not line.endswith(("\n", "\r")) for line in f)
> 
> because I have countless loops that look something like
> 
>   with open(...) as f:
>     for line in f:
>       line = line.rstrip('\r\n')
>       process(line)

Indeed. It's obvious now you're saying it...



Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Tim Chase
On 2015-11-03 16:35, Peter Otten wrote:
> I wish there were a way to prohibit such files. Maybe a special
> value
> 
> with open(..., newline="normalize") as f:
>     assert all(line.endswith("\n") for line in f)
> 
> to ensure that all lines end with "\n"?

Or even more valuable to me:

  with open(..., newline="strip") as f:
    assert all(not line.endswith(("\n", "\r")) for line in f)

because I have countless loops that look something like

  with open(...) as f:
    for line in f:
      line = line.rstrip('\r\n')
      process(line)

-tkc






Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Jussi Piitulainen
Peter Otten writes:
> Jussi Piitulainen wrote:
>> Peter Otten writes:
>> 
>>> If a "line" is defined as a string that ends with a newline
>>>
>>> def ends_in_asterisk(line):
>>>     return False
>>>
>>> would also satisfy the requirement. Lies, damned lies, and specs ;)
>> 
>> Even if a "line" is defined as a string that comes from reading
>> something like a file with default options, a line may end in
>> an asterisk.
>  
> Note that the last line from the file is not a line as defined by me
> in the above post ;)

Noted.

>> >>> [ line.endswith('*') for line in StringIO('rivi*\nrivi*\nrivi*') ]
>> [False, False, True]
>
> I wish there were a way to prohibit such files. Maybe a special value
>
> with open(..., newline="normalize") as f:
>     assert all(line.endswith("\n") for line in f)
>
> to ensure that all lines end with "\n"?

I'd like that. It should be the default.


Re: Regular expressions

2015-11-03 Thread Chris Angelico
On Wed, Nov 4, 2015 at 3:10 AM, Seymore4Head
 wrote:
> Yes I knew that -1 represents the end character.  It is not a question
> of trying to accomplish anything.  I was just practicing with regex
> and wasn't sure how to express a * since it was one of the
> instructions.

In that case, it's nothing to do with ending a string. What you really
want to know is: How do you match a '*' using a regular expression?
Which is what MRAB answered, courtesy of a working crystal ball: You
use '\*'. Everything about the end of the string is irrelevant. (So,
too, are all the comments about using [-1] or string methods. But we
weren't to know that.)
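
A minimal sketch of that point (the variable names are only for illustration): a bare '*' is a repetition operator, so it has to be escaped, either by hand or with re.escape().

```python
import re

# A bare "*" has nothing to repeat, so compiling it on its own is an error.
try:
    re.compile("*")
    bare_star_compiles = True
except re.error:
    bare_star_compiles = False

# Escape it by hand (the raw string keeps the backslash) or via re.escape().
by_hand = re.compile(r"\*")
by_escape = re.compile(re.escape("*"))

print(bare_star_compiles)                  # False
print(bool(by_hand.search("2 * 3")))       # True
print(bool(by_escape.search("no star")))   # False
```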

ChrisA


Re: Regular expressions

2015-11-03 Thread Seymore4Head
On Tue, 3 Nov 2015 10:34:12 -0500, Joel Goldstick
 wrote:

>On Mon, Nov 2, 2015 at 10:17 PM, Seymore4Head 
>wrote:
>
>> On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
>>  wrote:
>>
>> >On 2015-11-02 20:09, Seymore4Head wrote:
>> >> How do I make a regular expression that returns true if the end of
>> >> the line is an asterisk
>> >
>> >Why use a regular expression?
>> >
>> >  if line[-1] == '*':
>> >    yep(line)
>> >  else:
>> >    nope(line)
>> >
>> >-tkc
>> >
>> >
>> Because that is the part of Python I am trying to learn at the moment.
>>
>
>Are we to infer that you were aware of doing the   if line[-1] == '*': ...
>, but just wanted to learn how to do the same thing with regex? Or that you
>heard about regexes and thought that would be the way to solve your puzzle?
>
>> Thanks
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
Yes I knew that -1 represents the end character.  It is not a question
of trying to accomplish anything.  I was just practicing with regex
and wasn't sure how to express a * since it was one of the
instructions.



Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Peter Otten
Jussi Piitulainen wrote:

> Peter Otten writes:
> 
>> If a "line" is defined as a string that ends with a newline
>>
>> def ends_in_asterisk(line):
>>     return False
>>
>> would also satisfy the requirement. Lies, damned lies, and specs ;)
> 
> Even if a "line" is defined as a string that comes from reading
> something like a file with default options, a line may end in
> an asterisk.
 
Note that the last line from the file is not a line as defined by me in the 
above post ;)

> >>> [ line.endswith('*') for line in StringIO('rivi*\nrivi*\nrivi*') ]
> [False, False, True]

I wish there were a way to prohibit such files. Maybe a special value

with open(..., newline="normalize") as f:
    assert all(line.endswith("\n") for line in f)

to ensure that all lines end with "\n"?
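
open() accepts no such value today (newline has to be None, '', '\n', '\r' or '\r\n'), so for now this would have to live in a wrapper. A rough sketch, where normalized_lines is a made-up name and not an existing API:

```python
import io

def normalized_lines(lines):
    # Hypothetical helper: yield every line with exactly one trailing "\n",
    # whatever terminator (or none) the input line came with.
    for line in lines:
        yield line.rstrip("\r\n") + "\n"

f = io.StringIO("rivi*\nrivi*\nrivi*")   # the last line has no newline
assert all(line.endswith("\n") for line in normalized_lines(f))
```

This papers over the irregular last line rather than prohibiting such files, which may or may not be what is wanted.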




Re: Regular expressions

2015-11-03 Thread Joel Goldstick
On Mon, Nov 2, 2015 at 10:17 PM, Seymore4Head 
wrote:

> On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
>  wrote:
>
> >On 2015-11-02 20:09, Seymore4Head wrote:
> >> How do I make a regular expression that returns true if the end of
> >> the line is an asterisk
> >
> >Why use a regular expression?
> >
> >  if line[-1] == '*':
> >    yep(line)
> >  else:
> >    nope(line)
> >
> >-tkc
> >
> >
> Because that is the part of Python I am trying to learn at the moment.
>

Are we to infer that you were aware of doing the   if line[-1] == '*': ...
, but just wanted to learn how to do the same thing with regex? Or that you
heard about regexes and thought that would be the way to solve your puzzle?

> Thanks
> --
> https://mail.python.org/mailman/listinfo/python-list
>



-- 
Joel Goldstick
http://joelgoldstick.com/stats/birthdays


Re: Regular expressions

2015-11-03 Thread Jussi Piitulainen
Peter Otten writes:

> If a "line" is defined as a string that ends with a newline
>
> def ends_in_asterisk(line):
>     return False
>
> would also satisfy the requirement. Lies, damned lies, and specs ;)

Even if a "line" is defined as a string that comes from reading
something like a file with default options, a line may end in
an asterisk.

>>> [ line.endswith('*') for line in StringIO('rivi*\nrivi*\nrivi*') ]
[False, False, True]


Re: Regular expressions

2015-11-03 Thread Grant Edwards
On 2015-11-03, Tim Chase  wrote:
> On 2015-11-02 20:09, Seymore4Head wrote:
>> How do I make a regular expression that returns true if the end of
>> the line is an asterisk
>
> Why use a regular expression?
>
>   if line[-1] == '*':

Why use a negative index and then a compare?

if line.endswith('*'):

If you want to know if a string ends with something, just ask it!

;)

-- 
Grant Edwards               grant.b.edwards        Yow! RELATIVES!!
                                  at
                              gmail.com


Re: Regular expressions

2015-11-03 Thread Tim Chase
On 2015-11-02 22:17, Seymore4Head wrote:
> On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
>  wrote:
> 
> >On 2015-11-02 20:09, Seymore4Head wrote:
> >> How do I make a regular expression that returns true if the end
> >> of the line is an asterisk
> >
> >Why use a regular expression?
> >
> Because that is the part of Python I am trying to learn at the
> moment. Thanks

Ah, well that's an entirely different problem-space, so then you
would want to use MRAB's answer

  r = re.compile(r"\*$")
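
To spell out how that would be used (a sketch; the function name and sample strings are made up): call search() rather than match(), since match() only tries at the start of the string. Note that $ also matches just before a trailing newline, so lines straight from a file work too.

```python
import re

r = re.compile(r"\*$")

def ends_in_asterisk(line):
    # search() scans the whole string; match() would only try position 0.
    return r.search(line) is not None

print(ends_in_asterisk("foo*"))     # True
print(ends_in_asterisk("foo*\n"))   # True: $ also matches before a final newline
print(ends_in_asterisk("foo"))      # False
print(ends_in_asterisk(""))         # False
print(r.match("foo*"))              # None
```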

-tkc





Re: Regular expressions

2015-11-03 Thread Peter Otten
Tim Chase wrote:

> On 2015-11-03 10:25, Peter Otten wrote:
>> >>> How do I make a regular expression that returns true if the end
>> >>> of the line is an asterisk
>> >> 
>> >> Why use a regular expression?
>> >> 
>> >>   if line[-1] == '*':
>> >>     yep(line)
>> >>   else:
>> >>     nope(line)
>> 
>> Incidentally the code example has two "problems", too.
>> 
>> - What about the empty string?
> 
> Good catch: .endswith() works better.
> 
>> - What about lines with a trailing "\n", i. e. as they are usually
>> delivered when iterating over a file?
> 
> Then your string *doesn't* end with a "*", but rather with a
> newline. ;-)
> 
> Though according to the OP's specs, the following function would work
> too:
> 
>   def ends_in_asterisk(s):
>     return True
> 
> It *does* return True if the line ends in an asterisk (no requirement
> to make the function return False under any other conditions).

If a "line" is defined as a string that ends with a newline

def ends_in_asterisk(line):
    return False

would also satisfy the requirement. Lies, damned lies, and specs ;)



Re: Regular expressions

2015-11-03 Thread Denis McMahon
On Mon, 02 Nov 2015 22:17:49 -0500, Seymore4Head wrote:

> On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
>  wrote:
> 
>>On 2015-11-02 20:09, Seymore4Head wrote:

>>> How do I make a regular expression that returns true if the end of the
>>> line is an asterisk

>>Why use a regular expression?

> Because that is the part of Python I am trying to learn at the moment.

The most important thing to learn about regular expressions is when to 
use them and when not to use them.

Returning true if the last character in a string is an asterisk is almost 
certainly a brilliant example of when not to use a regular expression. 
Here are some timings I tested:

#!/usr/bin/python

import re

import timeit

patt = re.compile("\*$")

start_time = timeit.default_timer()
for i in range(100):
    x = re.match("\*$", "test 1")
elapsed = timeit.default_timer() - start_time
print "re, false", elapsed

start_time = timeit.default_timer()
for i in range(100):
    x = re.match("\*$", "test *")
elapsed = timeit.default_timer() - start_time
print "re, true", elapsed

start_time = timeit.default_timer()
for i in range(100):
    x = patt.match("test 1")
elapsed = timeit.default_timer() - start_time
print "compiled re, false", elapsed

start_time = timeit.default_timer()
for i in range(100):
    x = patt.match("test *")
elapsed = timeit.default_timer() - start_time
print "compiled re, true", elapsed

start_time = timeit.default_timer()
for i in range(100):
    x = "test 1"[-1] == "*"
elapsed = timeit.default_timer() - start_time
print "char compare, false", elapsed

start_time = timeit.default_timer()
for i in range(100):
    x = "test *"[-1] == "*"
elapsed = timeit.default_timer() - start_time
print "char compare, true", elapsed

RESULTS:

re, false 2.4701731205
re, true 2.42048001289
compiled re, false 0.875837087631
compiled re, true 0.876382112503
char compare, false 0.26283121109
char compare, true 0.263465881348

The compiled re is about 3 times as fast as the uncompiled re. The 
character comparison is about 3 times as fast as the compiled re.
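
One caveat for anyone re-running this: re.match() only matches at the beginning of the string, so "\*$" never matches "test *", and the "true" regex cases above actually time a failed match as well. A rough Python 3 sketch of the same comparison using search() and timeit.timeit() follows; the helper name bench and the iteration count are assumptions, and the timings will differ from machine to machine.

```python
import re
import timeit

patt = re.compile(r"\*$")

def bench(label, func, number=100000):
    # timeit.timeit() calls func `number` times and returns total seconds.
    elapsed = timeit.timeit(func, number=number)
    print(label, elapsed)
    return elapsed

assert re.match(r"\*$", "test *") is None        # match() anchors at the start
assert re.search(r"\*$", "test *") is not None   # search() finds the final "*"

bench("re.search", lambda: re.search(r"\*$", "test *"))
bench("compiled search", lambda: patt.search("test *"))
bench("endswith", lambda: "test *".endswith("*"))
```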

-- 
Denis McMahon, denismfmcma...@gmail.com


Re: Regular expressions

2015-11-03 Thread Tim Chase
On 2015-11-03 10:25, Peter Otten wrote:
> >>> How do I make a regular expression that returns true if the end
> >>> of the line is an asterisk
> >> 
> >> Why use a regular expression?
> >> 
> >>   if line[-1] == '*':
> >>     yep(line)
> >>   else:
> >>     nope(line)
> 
> Incidentally the code example has two "problems", too.
> 
> - What about the empty string?

Good catch: .endswith() works better.
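
A quick sketch of the difference on the empty string (the function names are made up for illustration):

```python
def last_char_is_star(line):
    # Indexing needs at least one character; "" raises IndexError.
    return line[-1] == "*"

def ends_with_star(line):
    # endswith() simply answers False for the empty string.
    return line.endswith("*")

print(ends_with_star(""))       # False
print(ends_with_star("foo*"))   # True
try:
    last_char_is_star("")
except IndexError:
    print("line[-1] raised IndexError on the empty string")
```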

> - What about lines with a trailing "\n", i. e. as they are usually
> delivered when iterating over a file?

Then your string *doesn't* end with a "*", but rather with a
newline. ;-)

Though according to the OP's specs, the following function would work
too:

  def ends_in_asterisk(s):
    return True

It *does* return True if the line ends in an asterisk (no requirement
to make the function return False under any other conditions).

-tkc







Re: Regular expressions

2015-11-03 Thread Peter Otten
Michael Torrie wrote:

> On 11/02/2015 07:42 PM, Tim Chase wrote:
>> On 2015-11-02 20:09, Seymore4Head wrote:
>>> How do I make a regular expression that returns true if the end of
>>> the line is an asterisk
>> 
>> Why use a regular expression?
>> 
>>   if line[-1] == '*':
>>     yep(line)
>>   else:
>>     nope(line)
> 
> Indeed, Jamie Zawinski's quip is often quite appropriate:
> 
> Some people, when confronted with a problem, think "I know, I'll use
> regular expressions." Now they have two problems.

Incidentally the code example has two "problems", too.

- What about the empty string?
- What about lines with a trailing "\n", i. e. as they are usually delivered
  when iterating over a file?

Below is a comparison of some of your options. The "one obvious way" 
line.rstrip("\n").endswith("*") is not included ;)

$ cat starry_table.py 
import re


def show_table(data, header):
    rows = [header]
    rows.extend([str(c) for c in row] for row in data)
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    template = "  ".join("{:%d}" % w for w in widths)
    for row in rows:
        print(template.format(*row))


def compare(sample_lines):
    for line in sample_lines:
        got_re = bool(re.compile(r"\*$").search(line))
        got_re_M = bool(re.compile(r"\*$", re.M).search(line))
        got_endswith = line.endswith("*")
        got_endswith2 = line.endswith(("*", "*\n"))
        got_substring = line[-1:] == "*"
        try:
            got_char = line[-1] == "*"
        except IndexError:
            got_char = "#exception"
        results = (
            got_re, got_re_M,
            got_endswith, got_endswith2,
            got_substring, got_char)
        yield (
            ["", "X"][len(set(results)) > 1],
            repr(line)) + results


SAMPLE = ["", "\n", "foo\n", "*\n", "*", "foo*", "foo*\n", "foo*\nbar"]
HEADER = [
    "", "line", "regex", "re.M",
    "endswith", 'endswith(("*", "*\\n"))',
    "substring", "char"]

if __name__ == "__main__":
    show_table(compare(SAMPLE), HEADER)


$ python3 starry_table.py 
   line         regex  re.M   endswith  endswith(("*", "*\n"))  substring  char
X  ''           False  False  False     False                   False      #exception
   '\n'         False  False  False     False                   False      False
   'foo\n'      False  False  False     False                   False      False
X  '*\n'        True   True   False     True                    False      False
   '*'          True   True   True      True                    True       True
   'foo*'       True   True   True      True                    True       True
X  'foo*\n'     True   True   False     True                    False      False
X  'foo*\nbar'  False  True   False     False                   False      False
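
For completeness, a sketch of the "one obvious way" that the table leaves out, run over the same SAMPLE list (the function name is made up here):

```python
SAMPLE = ["", "\n", "foo\n", "*\n", "*", "foo*", "foo*\n", "foo*\nbar"]

def ends_in_asterisk(line):
    # Drop the line terminator, if any, then ask the string directly.
    return line.rstrip("\n").endswith("*")

for line in SAMPLE:
    print(repr(line), ends_in_asterisk(line))
```

It reports True for '*\n', '*', 'foo*' and 'foo*\n', and False for the rest, with no exception on the empty string.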




Re: Regular expressions

2015-11-03 Thread Nick Sarbicki
On Tue, Nov 3, 2015 at 7:15 AM, Steven D'Aprano  wrote:

> On Tue, 3 Nov 2015 03:23 pm, ru...@yahoo.com wrote:
>
> > Regular expressions should be learned by every programmer or by anyone
> > who wants to use computers as a tool.  They are a fundamental part of
> > computer science and are used in all sorts of matching and searching
> > from compilers down to your work-a-day text editor.
>
> You are absolutely right.
>
> If only regular expressions weren't such an overly-terse, cryptic
> mini-language, with all but no debugging capabilities, they would be great.
>
> If only there wasn't an extensive culture of regular expression abuse
> within
> programming communities, they would be fine.
>
> All technologies are open to abuse. But we don't say:
>
>   Some people, when confronted with a problem, think "I know, I'll use
>   arithmetic." Now they have two problems.
>
> because abuse of arithmetic is rare. It's hard to misuse it, and while
> arithmetic can be complicated, it's rare for programmers to abuse it. But
> the same cannot be said for regexes -- they are regularly misused, abused,
> and down-right hard to use right even when you have a good reason for using
> them:
>
> http://www.thedailywtf.com/articles/Irregular_Expression
>
> http://blog.codinghorror.com/regex-use-vs-regex-abuse/
>
>
> http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html
>
>
> If there is one person who has done more to create a regex culture, it is
> Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
> overused and their syntax is harmful, and he has recreated them for Perl 6:
>
> http://www.perl.com/pub/2002/06/04/apo5.html
>
> Oh, and the icing on the cake, regexes can be a security vulnerability too:
>
>
> https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
>
>
>
> --
> Steven
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>


+1

I agree that regex is an entirely necessary part of a programmer's toolkit,
but dear god some people need to be taught restraint. The majority of
people I talk to about regex have no idea when and where it shouldn't be
used.

As an example, part of my job is bringing our legacy Python code into the
modern day, and one of the largest roadblocks is the amount of regex used.

Some is necessary.

Some can be replaced by an `if word in str` or something similarly basic.

Some spans hundreds of lines and causes acute alopecia.
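
To make the `if word in str` case concrete, a sketch of the kind of replacement meant (the strings are invented for illustration):

```python
import re

text = "the quick brown fox"
word = "brown"

# Regex version: works, but the word has to be escaped in case it
# contains metacharacters, and the intent is harder to read.
found_re = re.search(re.escape(word), text) is not None

# Plain substring test: same answer for a literal word.
found_in = word in text

print(found_re, found_in)   # True True
```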

Just yesterday I found a colleague trying to parse HTML with regex.

So yes, teach regex, but teach it after the basics, and please emphasise
when it is appropriate to use it.

Yes I am bitter.

- Nick.


Re: Regular expressions

2015-11-02 Thread Steven D'Aprano
On Tue, 3 Nov 2015 03:23 pm, ru...@yahoo.com wrote:

> Regular expressions should be learned by every programmer or by anyone
> who wants to use computers as a tool.  They are a fundamental part of
> computer science and are used in all sorts of matching and searching
> from compilers down to your work-a-day text editor.

You are absolutely right.

If only regular expressions weren't such an overly-terse, cryptic
mini-language, with all but no debugging capabilities, they would be great.

If only there wasn't an extensive culture of regular expression abuse within
programming communities, they would be fine.

All technologies are open to abuse. But we don't say:

  Some people, when confronted with a problem, think "I know, I'll use
  arithmetic." Now they have two problems.

because abuse of arithmetic is rare. It's hard to misuse it, and while
arithmetic can be complicated, it's rare for programmers to abuse it. But
the same cannot be said for regexes -- they are regularly misused, abused,
and down-right hard to use right even when you have a good reason for using
them:

http://www.thedailywtf.com/articles/Irregular_Expression

http://blog.codinghorror.com/regex-use-vs-regex-abuse/

http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html


If there is one person who has done more to create a regex culture, it is
Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
overused and their syntax is harmful, and he has recreated them for Perl 6:

http://www.perl.com/pub/2002/06/04/apo5.html

Oh, and the icing on the cake, regexes can be a security vulnerability too:

https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
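
A small illustration of the problem, assuming CPython's backtracking re engine. The pattern is a classic textbook example rather than one taken from the links above, and n is kept small so the script finishes quickly:

```python
import re
import time

# Nested quantifiers: the engine tries many ways to split the run of
# "a"s between the inner and outer "+" before it can give up.
evil = re.compile(r"(a+)+$")

def time_failed_match(n):
    subject = "a" * n + "b"          # the final "b" forces a failure
    start = time.perf_counter()
    result = evil.match(subject)
    return result, time.perf_counter() - start

for n in (10, 14, 18):
    result, elapsed = time_failed_match(n)
    print(n, result, elapsed)
```

The elapsed time should grow very quickly with n even though the input only gets a few characters longer, which is what makes such patterns dangerous on untrusted input.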



-- 
Steven



Re: Regular expressions

2015-11-02 Thread Michael Torrie
On 11/02/2015 09:23 PM, rurpy--- via Python-list wrote:
>> My completely unsolicited advice is that regular expressions shouldn't be
>> very high on the list of things to learn.  They are very useful, but also
>> very tricky, and many problems that people attack with them can and should
>> be solved with much simpler methods.  If you really want to learn regular
>> expressions, that's great, but the problem you posed is not one for which
>> they are the best solution.  Remember simple is better than complex.
> 
> Regular expressions should be learned by every programmer or by anyone
> who wants to use computers as a tool.  They are a fundamental part of
> computer science and are used in all sorts of matching and searching 
> from compilers down to your work-a-day text editor.
> 
> Not knowing how to use them is like an auto mechanic not knowing how to 
> use a socket wrench.

Not quite.  Core language concepts like ifs, loops, functions,
variables, slicing, etc are the socket wrenches of the programmer's
toolbox.  Regexs are like an electric impact socket wrench.  You can do
the same work without it, but in many cases it's slower. But you have to
learn the other hand tools first in order to really use the electric
driver properly (understanding torques, direction of threads, etc), lest
you wonder why you're breaking off so many bolts with the torque of the
impact drive.




Re: Regular expressions

2015-11-02 Thread Michael Torrie
On 11/02/2015 09:23 PM, rurpy--- via Python-list wrote:
> On 11/02/2015 08:51 PM, Michael Torrie wrote:
>> [...]
>> Indeed, Jamie Zawinski's quip is often quite appropriate:
>>
>> Some people, when confronted with a problem, think "I know, I'll use
>> regular expressions." Now they have two problems.
> 
> Or its sometimes heard paraphrase:
>   Some people, when confronted with a problem, think "I know, I'll use
>   Python." Now they have two problems
> The point being it's a cute and memorable aphorism but not very meaningful
> because it can be applied to anything one wishes to denigrate.
> 
> Of course there are people who misuse regexes. But I am quite sure,
> especially in the Python community, there are just as many who fail to
> use them when they are appropriate which is just as bad.

Judging by a few posts on the list lately, I'd say it is highly relevant
to Python itself.  Too many people have only a vague notion of a problem
they'd like to solve and although they don't really understand the
problem, they've heard Python is a good language to learn, so they ask
how they can solve that problem with Python.

Now, this certainly can work for a person who's already experienced in
several languages and who already understands the problem.  For others,
it's very much now two intractable problems.



Re: Regular expressions

2015-11-02 Thread rurpy--- via Python-list
On Monday, November 2, 2015 at 8:58:45 PM UTC-7, Joel Goldstick wrote:
> On Mon, Nov 2, 2015 at 10:17 PM, Seymore4Head 
> wrote:
> 
> > On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
> >  wrote:
> >
> > >On 2015-11-02 20:09, Seymore4Head wrote:
> > >> How do I make a regular expression that returns true if the end of
> > >> the line is an asterisk
> > >
> > >Why use a regular expression?
> > >
> > >  if line[-1] == '*':
> > >    yep(line)
> > >  else:
> > >    nope(line)
> > >
> > >-tkc
> > >
> > >
> > Because that is the part of Python I am trying to learn at the moment.
> > Thanks
> > --
> > https://mail.python.org/mailman/listinfo/python-list
> >
> 
> My completely unsolicited advice is that regular expressions shouldn't be
> very high on the list of things to learn.  They are very useful, but also
> very tricky, and many problems that people attack with them can and should
> be solved with much simpler methods.  If you really want to learn regular
> expressions, that's great, but the problem you posed is not one for which
> they are the best solution.  Remember simple is better than complex.

Regular expressions should be learned by every programmer or by anyone
who wants to use computers as a tool.  They are a fundamental part of
computer science and are used in all sorts of matching and searching 
from compilers down to your work-a-day text editor.

Not knowing how to use them is like an auto mechanic not knowing how to 
use a socket wrench.

