While creating a log parser for fairly large logs, we have run into an issue where the processing time is unacceptable (upwards of 5 minutes for 1-2 million lines of logs). In contrast, the Linux tool grep completes the same search in a matter of seconds.
The search we used was a regex of 6 elements "or"ed together, with an exclusionary set of ~3 elements. Due to the size of the files, we decided to run these line by line, and because we need regular expressions, we could not use more traditional string find methods. We did pre-compile the regular expressions, and attempted tricks such as map to remove as much overhead as possible. Given the known limitations of not being able to slurp the entire log file into memory, and the need to use regular expressions, do you have any ideas on how we might speed this up without resorting to system calls (our current "solution")?
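
For reference, here is a minimal sketch of the kind of line-by-line filter described above. The actual patterns are not given in this post, so the six inclusion terms and three exclusion terms below are placeholders, and the names INCLUDE, EXCLUDE, matching_lines and scan_file are hypothetical. It assumes one compiled alternation for the inclusion set and one for the exclusion set, which may differ from how the real parser is arranged.

```python
import re

# Placeholder patterns (assumptions): the real inclusion and exclusion
# terms are not shown in the post.
INCLUDE = re.compile(r"error|warning|timeout|refused|denied|failed")
EXCLUDE = re.compile(r"debug|heartbeat|healthcheck")


def matching_lines(lines):
    """Yield lines that match INCLUDE and do not match EXCLUDE."""
    # Bind the bound methods to locals to avoid repeated attribute lookups
    # inside the loop.
    include = INCLUDE.search
    exclude = EXCLUDE.search
    for line in lines:
        if include(line) and not exclude(line):
            yield line


def scan_file(path):
    """Stream a log file line by line, so it is never fully held in memory."""
    with open(path, "r", errors="replace") as handle:
        yield from matching_lines(handle)
```

Iterating over the file object reads it incrementally, so memory use stays bounded regardless of log size; whether this shape gets close to grep's speed would need to be measured on the real logs.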