On Feb 20, 4:15 pm, "John Machin" <[EMAIL PROTECTED]> wrote:
> What is an "exclusionary set"? It would help enormously if you were to
> tell us what the regex actually is. Feel free to obfuscate any
> proprietary constant strings, of course.

My apologies. I don't have specifics right now, but it's something
along the lines of this:

    import re

    error_list = re.compile(r"error|miss|issing|inval|nvalid|math")
    exclusion_list = re.compile(r"No Errors Found|Premature EOF, stopping translate")
    for test_text in test_file:
        # search(), not match(): the patterns can occur anywhere in the line
        if error_list.search(test_text) and not exclusion_list.search(test_text):
            pass  # Process test_text

Yes, I know, these are plain strings rather than real regular
expressions, but the requirements for the script specified that the
error list be capable of accepting regular expressions, since these
lists are configurable.

> I presume you mean you didn't read the whole file into memory;
> correct? 2 million lines doesn't sound like much to me; what is the
> average line length and what is the spec for the machine you are
> running it on?

You are correct. The individual files can be anywhere from a few bytes
to 2 GB. The average is around 1 GB, and there are a number of files to
be iterated over (4 on average). I do not know the machine specs,
though I can safely say it is a single-core machine, sub-2.5 GHz, with
2 GB of RAM, running Linux.

> map is a built-in function, not a trick. What "tricks"?

I'm using the term "tricks" for cases where I may be obfuscating the
code in an effort to make it run faster. In the case of map, that means
getting rid of the interpreted for-loop overhead in favor of the
implied C loop offered by map.

> What system calls? Do you mean running grep as a subprocess?

Yes. While this may not seem evil in and of itself, we are trying to
get our company to adopt Python into more widespread use. I'm guessing
the limiting factor isn't Python, but us Python newbies missing an
obvious way to speed up the process.

> To help you, we need either (a) basic information or (b) crystal
> balls.
> Is it possible for you to copy & paste your code into a web
> browser or e-mail/news client? Telling us which version of Python you
> are running might be a good idea too.

Can't copy and paste code (corp policy and all that), no crystal balls
for sale, though I hope the above information helps. Also, running a
trace on the program indicated that Python was spending a lot of time
looping around lines, checking for each element of the expression in
sequence. And this is Python 2.5.2.

Thanks!
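
For reference, here is a minimal sketch of the filtering described above, written so the lists stay configurable. All names in it (ERROR_PATTERNS, EXCLUSION_PATTERNS, build_filter, error_lines) are hypothetical and not taken from the real script. It joins each list into a single compiled alternation and uses search(), since match() only tests at the start of a line. Whether this is any faster on the real data is untested.

```python
import re

# Hypothetical stand-ins for the configurable lists; not the real ones.
ERROR_PATTERNS = ["error", "miss", "issing", "inval", "nvalid", "math"]
EXCLUSION_PATTERNS = ["No Errors Found", "Premature EOF, stopping translate"]


def build_filter(error_patterns, exclusion_patterns):
    """Compile each list into one alternation and return a line predicate."""
    # Wrap every entry in a non-capturing group so that an entry which
    # itself contains "|" cannot bleed into its neighbours.
    error_search = re.compile(
        "|".join("(?:%s)" % p for p in error_patterns)).search
    if exclusion_patterns:
        exclusion_search = re.compile(
            "|".join("(?:%s)" % p for p in exclusion_patterns)).search
    else:
        # An empty alternation would match every line, so exclude nothing.
        exclusion_search = lambda line: None

    def is_error(line):
        # search() scans the whole line; match() would only test its start.
        return (error_search(line) is not None
                and exclusion_search(line) is None)

    return is_error


def error_lines(lines, is_error):
    """Lazily yield matching lines without holding the whole input in memory."""
    for line in lines:
        if is_error(line):
            yield line
```

To run it over a real file, pass the open file object as `lines`; a file object iterates one line at a time, so even a 2 GB file is never read into memory in one go.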