Re: [Tutor] Help with re in Python 3

2011-11-05 Thread Andreas Perstinger

On 2011-11-04 20:59, Albert-Jan Roskam wrote:

It seems that you are not opening the file properly. You could do
f = file('///Users/joebatt/Desktop/python3.txt','r')
or:
with file('///Users/joebatt/Desktop/python3.txt','r') as f:


OP is using Python 3, where the built-in file() has been removed. Thus, you have to use open():

f = open('...')

with open('...') as f:
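
A minimal runnable sketch of the second form, in case it helps. The temporary file and its contents are made up for illustration; only open() and the with statement come from the advice above.

```python
import os
import tempfile

# Create a small throwaway file so the sketch runs anywhere.
# The file name and contents are made up for illustration.
path = os.path.join(tempfile.mkdtemp(), "python3.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

# Python 3: open() replaces the removed file() built-in.
# The with statement closes the file even if an error occurs.
with open(path, "r") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)
```

With the with form there is no need to call close() yourself.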

Bye, Andreas
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Help with re in Python 3

2011-11-04 Thread Joel Goldstick
On Fri, Nov 4, 2011 at 3:42 PM, Joe Batt joeb...@hotmail.co.uk wrote:

  Hi all,
 Still trying with Python and programming in general….

 I am trying to get a grip with re. I am writing a program to open a text
 file and scan it for exactly 3 uppercase letters in a row followed by a
 lowercase followed by exactly 3 uppercase letters. ( i.e.  oooXXXoXXXooo )
 If possible could you explain why I am getting EOL while scanning string
 literal when I try running the following program in Python 3.

 My program:

 import re

 regexp=re.compile(r"[a-z]"r"[A-Z]"r"[A-Z]"r"[A-Z]"r"[a-z]"r"[A-Z]"r"[A-Z]"r"[A-Z]"r"[a-z])

 file=('///Users/joebatt/Desktop/python3.txt','r')
 for line in file.readlines():
     if regexp.search(line):
         print("Found value 3 caps followed by lower case followed by 3 caps")
 file.close()

 If possible could you explain why I am getting EOL while scanning string
 literal when I try running my program in Python 3.

 Thanks for your help

 Joe





You should read a little more about regular expressions to simplify yours,
but I believe your problem is that you have no closing "
after this: r"[a-z])

change it to r"[a-z]")
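
To illustrate both points (the missing quote and the simplification), here is a rough sketch. It assumes the goal is the pattern described in the question (lowercase, 3 caps, lowercase, 3 caps, lowercase); the helper name has_pattern is made up for the example.

```python
import re

# Adjacent string literals are concatenated by the parser, so the
# nine small raw strings in the original build one pattern.
long_form = r"[a-z]" r"[A-Z]" r"[A-Z]" r"[A-Z]" r"[a-z]" r"[A-Z]" r"[A-Z]" r"[A-Z]" r"[a-z]"

# The same pattern written with a repetition count.
short_form = r"[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]"

regexp = re.compile(short_form)


def has_pattern(text):
    """Return True if text contains lower, 3 caps, lower, 3 caps, lower."""
    return regexp.search(text) is not None


print(long_form == "[a-z][A-Z][A-Z][A-Z][a-z][A-Z][A-Z][A-Z][a-z]")
print(has_pattern("oooXXXoXXXooo"))
print(has_pattern("oooXXXXoXXXooo"))
```

The last call shows why the surrounding lowercase letters matter: a run of four capitals is not "exactly 3", so it is rejected.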


-- 
Joel Goldstick


Re: [Tutor] Help with re in Python 3

2011-11-04 Thread Albert-Jan Roskam
It seems that you are not opening the file properly. You could do
f = file('///Users/joebatt/Desktop/python3.txt','r')
or:
with file('///Users/joebatt/Desktop/python3.txt','r') as f:
  for line in f:
    m = re.search("[A-Z]{3}[a-z][A-Z]{3}", line)
    if m:
      print("Pattern found")
      print(m.group(0))
 
Cheers!!
Albert-Jan


~~
All right, but apart from the sanitation, the medicine, education, wine, public 
order, irrigation, roads, a fresh water system, and public health, what have 
the Romans ever done for us?
~~





Re: [Tutor] Help with re in Python 3

2011-11-04 Thread Prasad, Ramit
m = re.search("[A-Z]{3}[a-z][A-Z]{3}", line)

That is the expression I would suggest, except it is still more efficient to 
use a compiled regular expression like the original version.

Ramit




Re: [Tutor] Help with re in Python 3

2011-11-04 Thread Steven D'Aprano

Prasad, Ramit wrote:

m = re.search("[A-Z]{3}[a-z][A-Z]{3}", line)


That is the expression I would suggest, except it is still more
efficient to use a compiled regular expression like the original
version.


Not necessarily. Python's re module caches recently used patterns, so the
module-level functions avoid re-compiling them when possible.


However there is no guarantee on how many regexes are kept in the cache, 
so if you care, it is safer to keep your own compiled version.
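
A small sketch of "keeping your own compiled version", for what it is worth. The names PATTERN and find_matches and the sample lines are made up for illustration.

```python
import re

# Compile once, outside the loop, and reuse the pattern object.
PATTERN = re.compile(r"[A-Z]{3}[a-z][A-Z]{3}")


def find_matches(lines):
    """Return the matched text for every line that contains the pattern."""
    found = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            found.append(m.group(0))
    return found


sample = ["oooXXXoXXXooo", "nothing here", "abcDEFgHIJk"]
print(find_matches(sample))
```

This way the pattern object is yours and does not depend on whatever the module's internal cache happens to hold.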




--
Steven


Re: [Tutor] help with re module and parsing data

2011-03-07 Thread Kushal Kumaran
On Mon, Mar 7, 2011 at 1:24 PM, vineeth vineethrak...@gmail.com wrote:
 Hello all I am doing some analysis on my trace file. I am finding the lines
 Recvd-Content and Published-Content. I am able to find those lines but the
 re module as predicted just gives the word that is being searched. But I
 require the entire  line similar to a grep in unix. Can some one tell me how
 to do this. I am doing the following way.

 import re
 file = open('file.txt','r')
 file2 = open('newfile.txt','w')

 LineFile = ' '

 for line in file:
    LineFile += line

 StripRcvdCnt = re.compile('(P\w+\S\Content|Re\w+\S\Content)')

 FindRcvdCnt = re.findall(StripRcvdCnt, LineFile)

 for SrcStr in FindRcvdCnt:
    file2.write(SrcStr)


Is there any particular reason why you're using regular expressions
for this?  You are already iterating over the lines in your first for
loop.  You can just make the tests you need there.

for line in file:
  if 'Recvd-Content' in line or 'Published-Content' in line:
    do something with the line

Your regular expression seems like it will match a lot more strings
than the two you mentioned earlier.

Also, 'file' is a python built-in.  It will be best to use a different
name for your variable.
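
Putting those points together, a grep-like filter might look roughly like this. The function name matching_lines and the sample trace lines are invented for the example; only the two search strings come from the question.

```python
def matching_lines(lines, needles=("Recvd-Content", "Published-Content")):
    """Yield every line that contains at least one of the needles."""
    for line in lines:
        if any(needle in line for needle in needles):
            yield line


# Made-up trace lines, only for illustration.
trace = [
    "10:01 Recvd-Content id=1\n",
    "10:02 Heartbeat\n",
    "10:03 Published-Content id=1\n",
]
kept = list(matching_lines(trace))
print("".join(kept), end="")
```

Because a file object iterates line by line, the same function should work on an open file, e.g. passing its result to writelines() on the output file.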

-- 
regards,
kushal


Re: [Tutor] help with re module and parsing data

2011-03-07 Thread wesley chun
i have a few suggestions as well:

1) class names should be titlecased, not ordinary variables, so
LineFile should be linefile, line_file, or lineFile.

2) you don't need to read in the file one line at-a-time. you can just
do linefile = f.read() ... this reads the entire file in as one
massive string.

3) you don't need to compile your regex (unless you will be using this
pattern over and over within one execution of this script). you can
just call findall() directly: findrcvdcnt =
re.findall('(P\w+\S\Content|Re\w+\S\Content)', LineFile)
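
If you do go the read-it-all-then-findall() route and still want whole lines like grep, one possible sketch follows. The pattern here is my own guess at what is wanted (lines containing Recvd-Content or Published-Content), not the original pattern, and the sample text is made up.

```python
import re

# Made-up file contents standing in for f.read().
text = (
    "10:01 Recvd-Content id=1\n"
    "10:02 Heartbeat\n"
    "10:03 Published-Content id=1\n"
)

# With re.MULTILINE, ^ and $ match at line boundaries, and "." never
# crosses a newline, so each match is one complete line.
whole_lines = re.findall(r"^.*(?:Recvd|Published)-Content.*$", text, re.MULTILINE)
print(whole_lines)
```

The (?:...) group is non-capturing, so findall() returns the full match rather than just the group.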

hope this helps!
-- wesley


Re: [Tutor] help with re module and parsing data

2011-03-07 Thread Steven D'Aprano
On Mon, 7 Mar 2011 06:54:30 pm vineeth wrote:
 Hello all I am doing some analysis on my trace file. I am finding the
 lines Recvd-Content and Published-Content. I am able to find those
 lines but the re module as predicted just gives the word that is
 being searched. But I require the entire  line similar to a grep in
 unix. Can some one tell me how to do this. I am doing the following
 way.

If you want to match *lines*, then you need to process each line 
individually, not the whole file at once. Something like this:

for line in open('file.txt'):
    if "Recvd-Content" in line or "Published-Content" in line:
        process_match(line)

A simple substring test should be enough, that will be *really* fast. 
But if you need a more heavy-duty test, you can use a regex, but 
remember that regexes are usually slow.

pattern = 'whatever...'
for line in open('file.txt'):
    if re.search(pattern, line):
        process_match(line)


Some further comments below:


 import re
 file = open('file.txt','r')
 file2 = open('newfile.txt','w')

 LineFile = ' '

Why do you initialise LineFile to a single space, instead of the empty 
string?


 for line in file:
  LineFile += line

Don't do that! Seriously, that is completely the wrong way.

What this does is something like this:

Set LineFile to " ".
Read one line from the file.
Make a copy of LineFile plus line 1.
Assign that new string to LineFile.
Delete the old contents of LineFile.
Read the second line from the file.
Make a copy of LineFile plus line 2.
Assign that new string to LineFile.
Delete the old contents of LineFile.
Read the third line from the file.
Make a copy of LineFile plus line 3.
and so on... 

Can you see how much copying of data is being done? If there are 1000 
lines in the file, the first line gets copied 1000 times, the second 
line 999 times, the third 998 times... See this essay for more about 
why this is s-l-o-w:

http://www.joelonsoftware.com/articles/fog000319.html

Now, it turns out that *some* versions of Python have a clever 
optimization which, *sometimes*, can speed that up. But you shouldn't 
rely on it. The better way to add many strings is:

accumulator = []
for s in some_strings:
    accumulator.append(s)
result = ''.join(accumulator)
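
For comparison, here are the two approaches side by side as a runnable sketch. The function names and the generated test strings are made up; both functions return the same string, they differ only in how much copying may happen along the way.

```python
def concat_slow(strings):
    """Repeated += may copy the growing result on every iteration."""
    result = ""
    for s in strings:
        result += s
    return result


def concat_fast(strings):
    """Collect the pieces in a list and join them once at the end."""
    accumulator = []
    for s in strings:
        accumulator.append(s)
    return "".join(accumulator)


pieces = ["line %d\n" % i for i in range(1000)]
print(concat_slow(pieces) == concat_fast(pieces))
```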

But in your case, when reading from a file, an even better way is to 
just read from the file in one chunk!

LineFile = open('file.txt','r').read()



-- 
Steven D'Aprano


Re: [Tutor] Help on RE

2011-01-23 Thread Japhy Bartlett
it's a bug in your regex - you want something like -?\d+

- japhy

On Sat, Jan 22, 2011 at 7:38 PM, tee chwee liong tc...@hotmail.com wrote:
 hi,

 i have a set of data and using re to extract it into array. however i only
 get positive value, how to extract the whole value including the -ve sign?
 For eg:

 Platform: PC
 Tempt : 25
 TAP0 :0
 TAP1 :1
 +
 Port Chnl Lane EyVt EyHt
 +
 0  1  1  75  55
 0  1  2  10 35
 0  1  3  25 35
 0  1  4  35 25
 0  1  5  10 -1
 +
 Time: 20s

 When i run my code, i get 1 instead of -1 in the last line. here is my code.
 pls advise. i'm using Python 2.5 and Win XP. tq
 ##code###
 import re
 file = open("C:/Python25/myscript/plot/sampledata.txt", "r")
 x1 = []
 y1 = []
 y2 = []
 for line in file:
     numbers = re.findall("\d+", line)
     print numbers



Re: [Tutor] Help on RE

2011-01-23 Thread Albert-Jan Roskam


http://imgs.xkcd.com/comics/regular_expressions.png

;-)
 
Cheers!!
Albert-Jan


~~
All right, but apart from the sanitation, the medicine, education, wine, public 
order, irrigation, roads, a fresh water system, and public health, what have 
the Romans ever done for us?
~~







Re: [Tutor] Help on RE

2011-01-22 Thread tee chwee liong

thanks it works!! :) 
 


Re: [Tutor] Help on RE

2011-01-22 Thread tee chwee liong

thanks for making me understand more on re. re is a confusing topic as i'm 
starting on python. 
 


Re: [Tutor] Help on RE

2011-01-22 Thread Steve Willoughby
On Sun, Jan 23, 2011 at 12:38:10AM +, tee chwee liong wrote:
 i have a set of data and using re to extract it into array. however i only 
 get positive value, how to extract the whole value including the -ve sign? 
 numbers = re.findall("\d+", line)

The \d matches a digit character.  \d+ matches one or more digit characters. 
Nothing in your regex matches a sign character.  You might want something like 
   [-+]\d+
which would require either a - or + followed by digits.  If you want the sign
to be optional, maybe this would work:
   [-+]?\d+
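
A quick way to see the difference between the three patterns, using the last table row from the question as input. The variable names are made up for the example.

```python
import re

line = "0  1  5  10 -1"

unsigned = re.findall(r"\d+", line)        # digits only, the sign is lost
optional = re.findall(r"[-+]?\d+", line)   # sign kept when present
required = re.findall(r"[-+]\d+", line)    # only numbers that carry a sign

print(unsigned)
print(optional)
print(required)
```

Note that findall() returns strings, so you would still need int() to do arithmetic on the results.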




-- 
Steve Willoughby|  Using billion-dollar satellites
st...@alchemy.com   |  to hunt for Tupperware.


Re: [Tutor] Help on RE

2011-01-22 Thread Steven D'Aprano

tee chwee liong wrote:
thanks for making me understand more on re. re is a confusing topic as i'm starting on python. 


I quote the great Jamie Zawinski, a world-class programmer and hacker:

"Some people, when confronted with a problem, think 'I know, I'll
use regular expressions.' Now they have two problems."


Zawinski doesn't mean that you should never use regexes. But they should 
be used only when necessary, for problems that are difficult enough to 
require a dedicated domain-specific language for solving search problems.


Because that's what regexes are: they're a programming language for text 
searching. They're not a full-featured programming language like Python 
(technically, they are not Turing Complete) but nevertheless they are a 
programming language. A programming language with a complicated, 
obscure, hideously ugly syntax (and people complain about Forth!). Even 
the creator of Perl, Larry Wall, has complained about regex syntax and 
gives 19 serious faults with regular expressions:


http://dev.perl.org/perl6/doc/design/apo/A05.html

Most people turn to regexes much too quickly, using them to solve 
problems that are either too small to need regexes, or too large. Using 
regexes for solving your problem is like using a chainsaw for peeling an 
orange.


Your data is very simple, and doesn't need regexes. It looks like this:


Platform: PC
Tempt : 25
TAP0 :0
TAP1 :1
+
Port Chnl Lane EyVt EyHt
+
0  1  1  75  55
0  1  2  10 35
0  1  3  25 35
0  1  4  35 25
0  1  5  10 -1
+
Time: 20s


The part you care about is the table of numbers, each line looks like this:

0  1  5  10 -1

The easiest way to parse this line is this:

numbers = [int(word) for word in line.split()]

All you need then is a way of telling whether you have a line in the 
table, or a header. That's easy -- just catch the exception and ignore it.


template = "Port=%d, Channel=%d, Lane=%d, EyVT=%d, EyHT=%d"
for line in lines:
    try:
        numbers = [int(word) for word in line.split()]
    except ValueError:
        continue
    print(template % tuple(numbers))
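
The same idea can fill the x1, y1 and y2 lists from the original script. This is only a sketch: which column goes into which list is my assumption (Lane, EyVt, EyHt), and the length check is an extra guard I added so that blank or short lines are skipped rather than unpacked.

```python
# The sample data from the original question.
data = """\
Platform: PC
Tempt : 25
TAP0 :0
TAP1 :1
+
Port Chnl Lane EyVt EyHt
+
0  1  1  75  55
0  1  2  10 35
0  1  3  25 35
0  1  4  35 25
0  1  5  10 -1
+
Time: 20s
"""

x1, y1, y2 = [], [], []   # list names taken from the original script
for line in data.splitlines():
    try:
        numbers = [int(word) for word in line.split()]
    except ValueError:
        continue          # header, separator or other non-table line
    if len(numbers) != 5:
        continue          # extra guard (an addition): skips blank or short lines
    port, chnl, lane, eyvt, eyht = numbers
    x1.append(lane)       # assumed mapping: x1 = Lane
    y1.append(eyvt)       # assumed mapping: y1 = EyVt
    y2.append(eyht)       # assumed mapping: y2 = EyHt

print(x1, y1, y2)
```

int() handles the leading minus sign by itself, so -1 comes through as a negative number.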


Too easy. Adding regexes just makes it slow, fragile, and difficult.


My advice is, any time you think you might need regexes, you probably don't.


--
Steven



Re: [Tutor] Help on RE

2011-01-22 Thread tee chwee liong

elegant. :) 
simple yet elegant. 
 

 