On Feb 21, 8:29 am, [EMAIL PROTECTED] wrote:
> While creating a log parser for fairly large logs, we have run into an
> issue where the time to process was relatively unacceptable (upwards
> of 5 minutes for 1-2 million lines of logs). In contrast, using the
> Linux tool grep would complete the same search in a matter of seconds.
>
> The search we used was a regex of 6 elements "or"ed together, with an
> exclusionary set of ~3 elements.
What is an "exclusionary set"? It would help enormously if you were to
tell us what the regex actually is. Feel free to obfuscate any
proprietary constant strings, of course.

> Due to the size of the files, we
> decided to run these line by line,

I presume you mean that you didn't read the whole file into memory;
correct? 2 million lines doesn't sound like much to me; what is the
average line length, and what is the spec of the machine you are
running it on?

> and due to the need of regex
> expressions, we could not use more traditional string find methods.
>
> We did pre-compile the regular expressions, and attempted tricks such
> as map to remove as much overhead as possible.

map is a built-in function, not a trick. What "tricks"?

>
> With the known limitations of not being able to slurp the entire log
> file into memory, and the need to use regular expressions, do you have
> any ideas on how we might speed this up without resorting to system
> calls (our current "solution")?

What system calls? Do you mean running grep as a subprocess?

To help you, we need either (a) basic information or (b) crystal
balls. Is it possible for you to copy & paste your code into a web
browser or e-mail/news client? Telling us which version of Python you
are running might be a good idea too.

Cheers,
John
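
P.S. In case it helps while you gather that information: if by
"exclusionary set" you mean "matches any of the six but none of the
three", the usual shape is one compiled alternation for the wanted
strings and a second one for the unwanted, applied per line. A rough
sketch (the pattern strings and the scan() name are invented, since we
haven't seen your code):

    import re

    # Stand-in patterns; the real ones weren't posted.
    include = re.compile(r"ERROR|WARN|FATAL|timeout|refused|denied")
    exclude = re.compile(r"heartbeat|keepalive|selftest")

    def scan(path):
        hits = []
        append = hits.append       # hoist the attribute lookups out of
        isearch = include.search   # the loop; over 2 million iterations
        xsearch = exclude.search   # they are measurable
        f = open(path)
        for line in f:             # iterates line by line, buffered;
            if isearch(line) and not xsearch(line):  # no need to slurp
                append(line)
        f.close()
        return hits

A loop like that typically gets through a couple of million lines in
seconds to tens of seconds, not minutes. If yours doesn't, the prime
suspect is catastrophic backtracking inside the pattern itself, which
is exactly why seeing the actual regex matters.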