On Feb 21, 11:40 am, [EMAIL PROTECTED] wrote: > On Feb 20, 4:15 pm, "John Machin" <[EMAIL PROTECTED]> wrote: > > > What is an "exclusionary set"? It would help enormously if you were to > > tell us what the regex actually is. Feel free to obfuscate any > > proprietary constant strings, of course. > > My apologies. I don't have specifics right now, but it's something > along the line of this: > > error_list = re.compile(r"error|miss|issing|inval|nvalid|math") > exclusion_list = re.complie(r"No Errors Found|Premature EOF, stopping > translate") > > for test_text in test_file: > if error_list.match(test_text) and not > exclusion_list.match(test_text): > #Process test_text > > Yes, I know, these are not re expressions, but the requirements for > the script specified that the error list be capable of accepting > regular expressions, since these lists are configurable.
You could do a quick check on the list of patterns; if they are simple strings (no ()+*?[] etc), then you could use Danny Yoo's ahocorasick module (from http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/): >>> import ahocorasick as ac >>> gadget = ac.KeywordTree() >>> for text in "error|miss|issing|inval|nvalid|math".split('|'): ... gadget.add(text) ... >>> gadget.make() >>> def showall(machine, line): ... startpos = 0 ... while 1: ... result = machine.search(line, startpos) ... if not result: return ... beg, end = result ... print beg, end, repr(line[beg:end]) ... startpos = beg + 1 ... >>> showall(gadget, 'the hissing misses invalidated erroneous mathematics') 5 11 'issing' 12 16 'miss' 19 24 'inval' 20 26 'nvalid' 41 45 'math' >>> showall(gadget, 'the hissing misses invalidated terror mathematics') 5 11 'issing' 12 16 'miss' 19 24 'inval' 20 26 'nvalid' 32 37 'error' 38 42 'math' >>> showall(gadget, 'these are not the droids that you are looking for') >>> But of course you would just use a simple search() per line -- I'm just showing you that it works :-) > > > What system calls? Do you mean running grep as a subprocess? > > Yes. While this may not seem evil in and of itself, we are trying to > get our company to adopt Python into more widespread use. I'm guessing > the limiting factor isn't python, but us python newbies missing an > obvious way to speed up the process. You (can't/don't have to) write everything in Python or any other single language. One of Pythons's very good points is that you can connect it to anything -- or if you can't, just ask in this newsgroup :-) Look, boss, yesterday we got it running grep; today we're calling a module written in C; not bad for a bunch of newbies, eh? > And python 2.5.2. 2.5.2? Who needs crystal balls when you've got a time machine? Or did you mean 2.5? Or 1.5.2 -- say it ain't so, Joe! Cheers, John -- http://mail.python.org/mailman/listinfo/python-list