On May 29, 1:26 pm, jared.s.ba...@gmail.com wrote:
> Hello,
>
> I'm new to python and I'm having problems with a regular expression. I
> use textmate as my editor and when I run the regex in textmate it
> works fine, but when I run it as part of the script it freezes. Could
> anyone help me figure out why this is happening and how to fix it.
> Here is the script:
>
> ======================================================
> # regular expression search and replace
> import sys, os, re, string, csv
>
> #Open the file and taking its data
> myfile=open('Steve_query3.csv') #Steve_query_test.csv
> #create an error flag to loop the script twice
> #store all file's data in the string object 'text'
> myfile.seek(0)
> text = myfile.read()
>
> for i in range(2):
>     #def textParse(text, reRun):
>     print 'how many times is this getting executed', i
>
>     #Now to create the newfile 'test' and write our 'text'
>     newfile = open('Steve_query3_out.csv', 'w')
>     #open the new file and set it with 'w' for "write"
>     #loop trough 'text' clean them up and write them into the 'newfile'
>     #sub( pattern, repl, string[, count])
>     #"sub("(?i)b+", "x", "bbbb BBBB")" returns 'x x'.
>     text = re.sub('(\<(/?[^\>]+)\>)', "", text)#remove the HTML
>     text = re.sub('/<!--(.|\s)*?-->/', "", text) #remove comments <!--[^ \-]+-->
>     text = re.sub('\/\*(.|\s)*?;}', "", text) #remove css formatting
>     #remove a bunch of word formatting yuck
>     text = re.sub("&nbsp;", " ", text)
>     text = re.sub("&lt;", "<", text)
>     text = re.sub("&gt;", ">", text)
>     text = re.sub("&quot;|&rquot;|&ldquo;", "\'", text)
>     #===================================
>     #The two following lines are the ones giving me the problems
>     text = re.sub("w:(.|\s)*?\n", "", text)
>     text = re.sub("UnhideWhenUsed=(.|\s)*?\n", "", text)
>     #===========================================
>     text = re.sub(re.compile('^\r?\n?$', re.MULTILINE), '', text) #remove the extra whitespace
>     #now write out the new file and close it
>     newfile.write(text)
>     newfile.close()
>
>     #open the newfile and run the script again
>     #Open the file and taking its data
>
>     myfile=open('Steve_query3_out.csv') #Steve_query_test.csv
>     #store all file's data in the string object 'text'
>     myfile.seek(0)
>     text = myfile.read()
>
> Thanks for the help,
>
> -Jared

Can you give a string that you would expect the regex to match, and what the expected result would be?

Currently, the interesting part of the regex, (.|\s)*?, matches a run of any characters of any length, newlines included, taking as few as it can get away with. There is some redundancy in it that makes it more confusing than it needs to be: . already matches everything that \s matches except a newline (unless re.DOTALL is set), so the two alternatives overlap on spaces and tabs. Or maybe you just need to escape the . because you meant it to be a literal.
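
To make the overlap concrete, here is a minimal sketch. It assumes the intent of the two problem lines is to delete everything from "w:" up to the end of that line; I have not seen the input file, so that is a guess. The sample string and the names SLOW and FAST are made up for illustration.

```python
import re

# Pattern from the script: "." and "\s" both match spaces and tabs,
# so the engine has two ways to consume each of those characters.
SLOW = re.compile(r"w:(.|\s)*?\n")

# Assumed intent: drop everything from "w:" to the end of that line.
# "[^\n]*" can only match each character one way, so nothing overlaps.
FAST = re.compile(r"w:[^\n]*\n")

# Hypothetical sample input, not taken from the original CSV file.
sample = "keep this\nw:WordDocument junk here\nkeep that\n"

print(repr(SLOW.sub("", sample)))
print(repr(FAST.sub("", sample)))
```

On an input like this both patterns remove the same line. The difference should show up where "w:" is followed by a long run of spaces and no newline at all, for instance at the very end of a file: there the first pattern can retry every space two ways before giving up, which is the kind of thing that looks like a freeze. If that is what is happening, the same change would apply to the UnhideWhenUsed= line.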