Re: [Tutor] Help with re in Python 3
On 2011-11-04 20:59, Albert-Jan Roskam wrote:
> It seems that you are not opening the file properly. You could do
> f = file('///Users/joebatt/Desktop/python3.txt','r')
> or:
> with file('///Users/joebatt/Desktop/python3.txt','r') as f:

The OP is using Python 3, where file has been removed. Thus, you have to use open:

    f = open('...')

    with open('...') as f:

Bye, Andreas
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
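As a minimal sketch of the point above: in Python 3 the built-in open() takes the place of file(). The temporary file and its contents below are made up so the example can run anywhere; they are not the OP's data.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
# (the path and contents are illustrative assumptions).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("first line\nsecond line\n")

# Python 3: use open(); the with block closes the file automatically.
with open(path, "r") as f:
    lines = [line.rstrip("\n") for line in f]

os.remove(path)
print(lines)
```

Using the with form means no explicit close() call is needed, even if an exception is raised while reading.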
Re: [Tutor] Help with re in Python 3
On Fri, Nov 4, 2011 at 3:42 PM, Joe Batt joeb...@hotmail.co.uk wrote:
> Hi all,
> Still trying with Python and programming in general….
> I am trying to get a grip with re. I am writing a program to open a text file and scan it for exactly 3 uppercase letters in a row followed by a lowercase followed by exactly 3 uppercase letters. ( i.e. oooXXXoXXXooo )
> If possible could you explain why I am getting "EOL while scanning string literal" when I try running the following program in Python 3.
>
> My program:
>
> import re
> regexp=re.compile(r"[a-z]"r"[A-Z]"r"[A-Z]"r"[A-Z]"r"[a-z]"r"[A-Z]"r"[A-Z]"r"[A-Z]"r"[a-z])
> file=('///Users/joebatt/Desktop/python3.txt','r')
> for line in file.readlines():
>     if regexp.search(line):
>         print("Found value 3 caps followed by lower case followed by 3 caps")
> file.close()
>
> Thanks for your help
> Joe

You should read a little more about regular expressions to simplify yours, but I believe your problem is that you have no closing " after this:

    r"[a-z])

Change it to:

    r"[a-z]")

-- Joel Goldstick
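For comparison, here is a sketch of how the search might look once the string literal is closed and the pattern is written as a single expression. This is one possible simplification, not necessarily what the OP ended up with. The lookbehind and lookahead (which enforce "exactly 3"), the helper name find_matches and the sample lines are all assumptions added for illustration.

```python
import re

# Exactly three capitals, one lowercase letter, exactly three capitals.
# The lookbehind/lookahead reject a fourth capital on either side.
PATTERN = re.compile(r"(?<![A-Z])[A-Z]{3}[a-z][A-Z]{3}(?![A-Z])")

def find_matches(lines):
    """Return the matched text for every line that contains the pattern."""
    found = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            found.append(m.group(0))
    return found

# Sample lines are made up; with a real file, iterate over
# `with open(path) as f:` instead of this list.
sample = ["oooXXXoXXXooo\n", "ooXXXXoXXXoo\n", "nothing here\n"]
print(find_matches(sample))
```

The second sample line has four capitals in a row, so it is deliberately rejected; without the lookarounds it would match.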
Re: [Tutor] Help with re in Python 3
It seems that you are not opening the file properly. You could do

    f = file('///Users/joebatt/Desktop/python3.txt','r')

or:

    with file('///Users/joebatt/Desktop/python3.txt','r') as f:
        for line in f:
            m = re.search("[A-Z]{3}[a-z][A-Z]{3}", line)
            if m:
                print("Pattern found")
                print(m.group(0))

Cheers!!
Albert-Jan

~~ All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us? ~~

From: Joe Batt joeb...@hotmail.co.uk
To: tutor@python.org
Sent: Friday, November 4, 2011 8:42 PM
Subject: [Tutor] Help with re in Python 3

> I am writing a program to open a text file and scan it for exactly 3 uppercase letters in a row followed by a lowercase followed by exactly 3 uppercase letters. ( i.e. oooXXXoXXXooo )
> If possible could you explain why I am getting "EOL while scanning string literal" when I try running the following program in Python 3.
Re: [Tutor] Help with re in Python 3
> m = re.search("[A-Z]{3}[a-z][A-Z]{3}", line)

That is the expression I would suggest, except it is still more efficient to use a compiled regular expression like the original version.

Ramit

Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
Re: [Tutor] Help with re in Python 3
Prasad, Ramit wrote:
>> m = re.search("[A-Z]{3}[a-z][A-Z]{3}", line)
> That is the expression I would suggest, except it is still more efficient to use a compiled regular expression like the original version.

Not necessarily. Python's re module caches recently used regex strings, avoiding re-compiling them when possible. However, there is no guarantee on how many regexes are kept in the cache, so if you care, it is safer to keep your own compiled version.

-- Steven
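The two styles discussed above can be sketched side by side. Both give the same matches; the only difference is who holds on to the compiled pattern. The input lines are made up, and nothing here measures or assumes anything about the size of the internal cache.

```python
import re

PATTERN_TEXT = r"[A-Z]{3}[a-z][A-Z]{3}"

# Option 1: compile once and keep your own reference to the pattern object.
compiled = re.compile(PATTERN_TEXT)

# Made-up input lines for illustration.
lines = ["abcDEFgHIJklm", "no match here", "xxQRSaTUVxx"]

hits_compiled = [m.group(0) for m in map(compiled.search, lines) if m]

# Option 2: call re.search() with the pattern string each time and let
# the re module decide whether it can reuse a cached compiled pattern.
hits_module = []
for line in lines:
    m = re.search(PATTERN_TEXT, line)
    if m:
        hits_module.append(m.group(0))

print(hits_compiled == hits_module)
```

Keeping the compiled object also gives the pattern a name, which tends to help readability when it is used in more than one place.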
Re: [Tutor] help with re module and parsing data
On Mon, Mar 7, 2011 at 1:24 PM, vineeth vineethrak...@gmail.com wrote:
> Hello all I am doing some analysis on my trace file. I am finding the lines Recvd-Content and Published-Content. I am able to find those lines but the re module as predicted just gives the word that is being searched. But I require the entire line similar to a grep in unix. Can some one tell me how to do this. I am doing the following way.
>
> import re
> file = open('file.txt','r')
> file2 = open('newfile.txt','w')
> LineFile = ' '
> for line in file:
>     LineFile += line
> StripRcvdCnt = re.compile('(P\w+\S\Content|Re\w+\S\Content)')
> FindRcvdCnt = re.findall(StripRcvdCnt, LineFile)
> for SrcStr in FindRcvdCnt:
>     file2.write(SrcStr)

Is there any particular reason why you're using regular expressions for this? You are already iterating over the lines in your first for loop. You can just make the tests you need there.

    for line in file:
        if 'Recvd-Content' in line or 'Published-Content' in line:
            pass  # do something with the line here

Your regular expression seems like it will match a lot more strings than the two you mentioned earlier.

Also, 'file' is a Python built-in. It will be best to use a different name for your variable.

-- regards, kushal
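A small grep-like sketch of the substring test suggested above. The function name grep_lines and the trace lines are invented for illustration, since the thread does not show the real trace format.

```python
def grep_lines(lines, needles):
    """Return every line that contains at least one of the substrings."""
    return [line for line in lines if any(n in line for n in needles)]

# Made-up trace lines; the real trace file format is not shown in the thread.
trace = [
    "10:00:01 Recvd-Content id=1\n",
    "10:00:02 Heartbeat\n",
    "10:00:03 Published-Content id=1\n",
]
matches = grep_lines(trace, ("Recvd-Content", "Published-Content"))
print("".join(matches), end="")
```

With a real file, passing the open file object as lines should work the same way, and each returned line can be written straight to the output file.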
Re: [Tutor] help with re module and parsing data
> import re
> file = open('file.txt','r')
> file2 = open('newfile.txt','w')
> LineFile = ' '
> for line in file:
>     LineFile += line
> StripRcvdCnt = re.compile('(P\w+\S\Content|Re\w+\S\Content)')
> FindRcvdCnt = re.findall(StripRcvdCnt, LineFile)
> for SrcStr in FindRcvdCnt:
>     file2.write(SrcStr)

i have a few suggestions as well:

1) only class names should be titlecased, not ordinary variables, so LineFile should be linefile, line_file, or lineFile.

2) you don't need to read in the file one line at-a-time. you can just do linefile = f.read() ... this reads the entire file in as one massive string.

3) you don't need to compile your regex (unless you will be using this pattern over and over within one execution of this script). you can just call findall() directly:

    findrcvdcnt = re.findall('(P\w+\S\Content|Re\w+\S\Content)', LineFile)

hope this helps!
-- wesley

Core Python, Prentice Hall, (c)2007,2001
Python Fundamentals, Prentice Hall, (c)2009
http://corepython.com
Re: [Tutor] help with re module and parsing data
On Mon, 7 Mar 2011 06:54:30 pm vineeth wrote:
> Hello all I am doing some analysis on my trace file. I am finding the lines Recvd-Content and Published-Content. I am able to find those lines but the re module as predicted just gives the word that is being searched. But I require the entire line similar to a grep in unix. Can some one tell me how to do this. I am doing the following way.

If you want to match *lines*, then you need to process each line individually, not the whole file at once. Something like this:

    for line in open('file.txt'):
        if "Recvd-Content" in line or "Published-Content" in line:
            process_match(line)

A simple substring test should be enough, and it will be *really* fast. If you need a more heavy-duty test you can use a regex, but remember that regexes are usually slower than a plain substring test.

    pattern = 'whatever...'
    for line in open('file.txt'):
        if re.search(pattern, line):
            process_match(line)

Some further comments below:

> import re
> file = open('file.txt','r')
> file2 = open('newfile.txt','w')
> LineFile = ' '

Why do you initialise LineFile to a single space, instead of the empty string?

> for line in file:
>     LineFile += line

Don't do that! Seriously, that is completely the wrong way. What this does is something like this:

    Set LineFile to " ".
    Read one line from the file.
    Make a copy of LineFile plus line 1.
    Assign that new string to LineFile.
    Delete the old contents of LineFile.
    Read the second line from the file.
    Make a copy of LineFile plus line 2.
    Assign that new string to LineFile.
    Delete the old contents of LineFile.
    Read the third line from the file.
    Make a copy of LineFile plus line 3.
    and so on...

Can you see how much copying of data is being done? If there are 1000 lines in the file, the first line gets copied 1000 times, the second line 999 times, the third 998 times...

See this essay for more about why this is s-l-o-w:

http://www.joelonsoftware.com/articles/fog000319.html

Now, it turns out that *some* versions of Python have a clever optimization which, *sometimes*, can speed that up. But you shouldn't rely on it. The better way to add many strings is:

    accumulator = []
    for s in some_strings:
        accumulator.append(s)
    result = ''.join(accumulator)

But in your case, when reading from a file, an even better way is to just read from the file in one chunk!

    LineFile = open('file.txt','r').read()

-- Steven D'Aprano
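The two ways of building a large string described above can be put next to each other as follows. The function names and the generated pieces are illustrative only, and no timings are claimed here, since the result depends on the Python version and the optimization mentioned in the message.

```python
def concat_slow(strings):
    """Repeated += : each step may copy everything accumulated so far."""
    result = ""
    for s in strings:
        result += s
    return result

def concat_join(strings):
    """Collect the pieces, then build the final string once with join()."""
    accumulator = []
    for s in strings:
        accumulator.append(s)
    return "".join(accumulator)

# 1000 made-up lines, mirroring the 1000-line file in the explanation.
pieces = ["line %d\n" % i for i in range(1000)]
same_result = concat_slow(pieces) == concat_join(pieces)
print(same_result)
```

Both functions return the same string; the difference is only in how much intermediate copying may happen along the way.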
Re: [Tutor] Help on RE
it's a bug in your regex - you want something like

    -?\d+

- japhy

On Sat, Jan 22, 2011 at 7:38 PM, tee chwee liong tc...@hotmail.com wrote:
> hi,
> i have a set of data and using re to extract it into array. however i only get positive value, how to extract the whole value including the -ve sign?
>
> For eg:
>
> Platform: PC
> Tempt : 25
> TAP0 :0
> TAP1 :1
> +
> Port Chnl Lane EyVt EyHt
> +
> 0 1 1 75 55
> 0 1 2 10 35
> 0 1 3 25 35
> 0 1 4 35 25
> 0 1 5 10 -1
> +
> Time: 20s
>
> When i run my code, i get 1 instead of -1 in the last line. here is my code. pls advise. i'm using Python 2.5 and Win XP. tq
>
> ##code###
> import re
> file = open("C:/Python25/myscript/plot/sampledata.txt", "r")
> x1 = []
> y1 = []
> y2 = []
> for line in file:
>     numbers = re.findall("\d+", line)
>     print numbers
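To make the difference concrete, here is a short sketch applying both patterns to the last table row from the sample data. It is written for Python 3 (the OP was on Python 2.5, where print is a statement), and the variable names are illustrative.

```python
import re

line = "0 1 5 10 -1"  # last table row from the sample data

unsigned = re.findall(r"\d+", line)   # the minus sign is not part of the match
signed = re.findall(r"-?\d+", line)   # optional leading minus sign

print(unsigned)
print(signed)

# findall() returns strings; convert if numeric values are needed.
numbers = [int(s) for s in signed]
```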
Re: [Tutor] Help on RE
http://imgs.xkcd.com/comics/regular_expressions.png ;-)

Cheers!!
Albert-Jan

~~ All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us? ~~

From: Steven D'Aprano st...@pearwood.info
To: tutor@python.org
Sent: Sun, January 23, 2011 4:10:35 AM
Subject: Re: [Tutor] Help on RE

> I quote the great Jamie Zawinski, a world-class programmer and hacker: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
Re: [Tutor] Help on RE
thanks it works!! :)

Date: Sat, 22 Jan 2011 19:51:35 -0500
Subject: Re: [Tutor] Help on RE
From: ja...@pearachute.com
To: tc...@hotmail.com
CC: tutor@python.org

> it's a bug in your regex - you want something like -?\d+
> - japhy
Re: [Tutor] Help on RE
thanks for making me understand more on re. re is a confusing topic as i'm starting on python.

Date: Sat, 22 Jan 2011 16:55:37 -0800
From: st...@alchemy.com
To: tc...@hotmail.com
CC: tutor@python.org
Subject: Re: [Tutor] Help on RE

> Nothing in your regex matches a sign character. You might want something like [-+]\d+ which would require either a - or + followed by digits. If you want the sign to be optional, maybe this would work: [-+]?\d+
Re: [Tutor] Help on RE
On Sun, Jan 23, 2011 at 12:38:10AM +, tee chwee liong wrote:
> i have a set of data and using re to extract it into array. however i only get positive value, how to extract the whole value including the -ve sign?
>
> numbers = re.findall("\d+", line)

The \d matches a digit character. \d+ matches one or more digit characters. Nothing in your regex matches a sign character. You might want something like

    [-+]\d+

which would require either a - or + followed by digits. If you want the sign to be optional, maybe this would work:

    [-+]?\d+

--
Steve Willoughby | Using billion-dollar satellites
st...@alchemy.com | to hunt for Tupperware.
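The difference between the required-sign and optional-sign patterns above can be seen on a small made-up input (the text below is not from the OP's data):

```python
import re

text = "10 -1 +7"  # made-up input with unsigned, negative and explicit-plus values

# Sign required: unsigned numbers such as "10" are skipped entirely.
required_sign = re.findall(r"[-+]\d+", text)

# Sign optional: every run of digits is captured, with its sign if present.
optional_sign = re.findall(r"[-+]?\d+", text)

print(required_sign)
print(optional_sign)

# int() accepts a leading "+" or "-", so the strings convert directly.
values = [int(s) for s in optional_sign]
```

One caveat worth keeping in mind: both patterns treat the minus in something like 5-3 as a sign, so this approach assumes the numbers are separated by whitespace, as they are in the sample table.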
Re: [Tutor] Help on RE
tee chwee liong wrote:
> thanks for making me understand more on re. re is a confusing topic as i'm starting on python.

I quote the great Jamie Zawinski, a world-class programmer and hacker:

    Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems.

Zawinski doesn't mean that you should never use regexes. But they should be used only when necessary, for problems that are difficult enough to require a dedicated domain-specific language for solving search problems.

Because that's what regexes are: they're a programming language for text searching. They're not a full-featured programming language like Python (technically, they are not Turing Complete) but nevertheless they are a programming language. A programming language with a complicated, obscure, hideously ugly syntax (and people complain about Forth!). Even the creator of Perl, Larry Wall, has complained about regex syntax and gives 19 serious faults with regular expressions:

http://dev.perl.org/perl6/doc/design/apo/A05.html

Most people turn to regexes much too quickly, using them to solve problems that are either too small to need regexes, or too large. Using regexes for solving your problem is like using a chainsaw for peeling an orange.

Your data is very simple, and doesn't need regexes. It looks like this:

    Platform: PC
    Tempt : 25
    TAP0 :0
    TAP1 :1
    +
    Port Chnl Lane EyVt EyHt
    +
    0 1 1 75 55
    0 1 2 10 35
    0 1 3 25 35
    0 1 4 35 25
    0 1 5 10 -1
    +
    Time: 20s

The part you care about is the table of numbers, each line looks like this:

    0 1 5 10 -1

The easiest way to parse this line is this:

    numbers = [int(word) for word in line.split()]

All you need then is a way of telling whether you have a line in the table, or a header. That's easy -- just catch the exception and ignore it.

    template = "Port=%d, Channel=%d, Lane=%d, EyVT=%d, EyHT=%d"
    for line in lines:
        try:
            numbers = [int(word) for word in line.split()]
        except ValueError:
            continue
        print(template % tuple(numbers))

Too easy. Adding regexes just makes it slow, fragile, and difficult.

My advice is, any time you think you might need regexes, you probably don't.

-- Steven
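A self-contained sketch of the split-and-convert approach above, run against the sample data from the thread. The function name parse_rows and the width check are additions for illustration: the check guards against blank lines, which split into an empty list and would otherwise reach the template with too few values.

```python
SAMPLE = """\
Platform: PC
Tempt : 25
TAP0 :0
TAP1 :1
+
Port Chnl Lane EyVt EyHt
+
0 1 1 75 55
0 1 2 10 35
0 1 3 25 35
0 1 4 35 25
0 1 5 10 -1
+
Time: 20s
"""

def parse_rows(lines, width=5):
    """Return the table rows as lists of ints, skipping everything else."""
    rows = []
    for line in lines:
        try:
            numbers = [int(word) for word in line.split()]
        except ValueError:
            continue  # header, separator, or other non-numeric line
        if len(numbers) == width:  # also skips blank lines
            rows.append(numbers)
    return rows

rows = parse_rows(SAMPLE.splitlines())
template = "Port=%d, Channel=%d, Lane=%d, EyVT=%d, EyHT=%d"
for row in rows:
    print(template % tuple(row))
```

Because int() understands a leading minus sign, the -1 in the last row comes through without any special handling.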
Re: [Tutor] Help on RE
elegant. :) simple yet elegant.

Date: Sun, 23 Jan 2011 14:10:35 +1100
From: st...@pearwood.info
To: tutor@python.org
Subject: Re: [Tutor] Help on RE

> The easiest way to parse this line is this:
> numbers = [int(word) for word in line.split()]