On May 17, 6:12 pm, John Machin <[EMAIL PROTECTED]> wrote:
>
> Note: "must not be *part of* any match" [my emphasis]
>

Ooops, my bad. See this version:

from pyparsing import Regex,ParseException,col,lineno,getTokensEndLoc

# fake (and inefficient) version of any if not yet upgraded to Py2.5
any = lambda lst : sum(list(lst)) > 0

def guardedSearch(pattern, text, forbidden_offsets):
    def offsetValidator(strng,locn,tokens):
        start,end = locn,getTokensEndLoc()-1
        if any( start <= i <= end for i in forbidden_offsets ):
            raise ParseException, "can't match at offset %d" % locn
    regex = Regex(pattern).setParseAction(offsetValidator)
    return [ (tokStart,toks[0]) for toks,tokStart,tokEnd in
             regex.scanString(text) ]

print guardedSearch(ur"o\S", u"Hollo how are you", [8,])

def guardedSearchByColumn(pattern, text, forbidden_columns):
    def offsetValidator(strng,locn,tokens):
        start,end = col(locn,strng), col(getTokensEndLoc(),strng)-1
        if any( start <= i <= end for i in forbidden_columns ):
            raise ParseException, "can't match at col %d" % start
    regex = Regex(pattern).setParseAction(offsetValidator)
    return [ (lineno(tokStart,text),col(tokStart,text),toks[0])
             for toks,tokStart,tokEnd in regex.scanString(text) ]

text = """\
alksjdflasjf;sa
a;sljflsjlaj
;asjflasfja;sf
aslfj;asfj;dsf
aslf;lajdf;ajsf
aslfj;afsj;sd
"""
print guardedSearchByColumn("[fa];", text, [4,12,13,])

Prints:
[(1, 'ol'), (15, 'ou')]
[(2, 1, 'a;'), (5, 10, 'f;')]

>
> While we're waiting for clarification from the OP, there's a chicken-
> and-egg thought that's been nagging me: if the OP knows so much about
> the searched string that he can specify offsets which search patterns
> should not span, why does he still need to search it?
>

I suspect that this is column/tabular data (a log file perhaps?), and
some columns are not interesting, but produce many false hits for the
search pattern.

-- Paul
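
For readers without pyparsing installed, the offset-guarded search can probably be approximated with the standard re module alone. This is only a rough sketch, not the code posted above: the name guarded_search is made up for illustration, it is written for Python 3, and it assumes that a rejected match should be retried one character further along (which is what the scanning loop does here, so an overlapping later match is not lost).

```python
import re

def guarded_search(pattern, text, forbidden_offsets):
    """Return (start, matched_text) pairs for matches covering no forbidden offset.

    Hypothetical helper for illustration; offsets are 0-based.
    """
    regex = re.compile(pattern)
    forbidden = set(forbidden_offsets)
    results = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        start, end = m.start(), m.end()  # end is exclusive
        if any(i in forbidden for i in range(start, end)):
            # Match touches a forbidden offset: reject it and retry
            # from the next character, so overlapping candidates survive.
            pos = start + 1
        else:
            results.append((start, m.group()))
            # Always advance, even after a zero-length match.
            pos = max(end, start + 1)
    return results

print(guarded_search(r"o\S", "Hollo how are you", [8]))
```

With the same pattern, text and forbidden offset as the first example in the post, this should print [(1, 'ol'), (15, 'ou')]. A plain re.finditer loop with a filter would be shorter, but it resumes after the end of a rejected match and so can skip a valid match that overlaps it.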