John Machin wrote:
> Devan L wrote:
>
>> John Machin wrote:
>>
>>> Aahz wrote:
>>>
>>>> In article <[EMAIL PROTECTED]>,
>>>> John Machin <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Search for r'^something' can never be better/faster than match for
>>>>> r'something', and with a dopey implementation of search [which
>>>>> Python's re is NOT] it could be much worse. So please don't tell
>>>>> newbies to search for r'^something'.
>>>>
>>>> You're somehow getting mixed up in thinking that "^" is some kind of
>>>> "not" operator -- it's the start of line anchor in this context.
>>>
>>> I can't imagine where you got that idea from.
>>>
>>> If I change "[which Python's re is NOT]" to "[Python's re's search() is
>>> not dopey]", does that help you?
>>>
>>> The point was made in a context where the OP appeared to be reading a
>>> line at a time and parsing it, and re.compile(r'something').match()
>>> would do the job; re.compile(r'^something').search() will do the job too
>>> -- BECAUSE ^ means start of line anchor -- but somewhat redundantly, and
>>> very inefficiently in the failing case with dopey implementations of
>>> search() (which apply match() at offsets 0, 1, 2, .....).
>>
>> I don't see much difference.
>
> and I didn't expect that you would -- like I wrote above: "Python's re's
> search() is not dopey".

*ahem*

C:\junk>python
Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> t1 = timeit.Timer('re.search("^\w"," will not work")','import re')
>>> t2 = timeit.Timer('re.match("\w"," will not work")','import re')
>>> t3 = timeit.Timer('obj(" will not work")','import re;obj=re.compile("^\w").search')
>>> t4 = timeit.Timer('obj(" will not work")','import re;obj=re.compile("\w").match')
>>> t5 = timeit.Timer('obj(" will not work qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq")','import re;obj=re.compile("^\w").search')
>>> t6 = timeit.Timer('obj(" will not work qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq")','import re;obj=re.compile("\w").match')
>>> ["%.3f" % t.timeit() for t in t1, t2, t3, t4]
['5.510', '4.835', '1.588', '1.178']
>>> ["%.3f" % t.timeit() for t in t1, t2, t3, t4]
['5.512', '4.808', '1.584', '1.170']

Observation: factoring out the compile step makes the difference much
more apparent.

>>> ["%.3f" % t.timeit() for t in t3, t4, t5, t6]
['1.578', '1.175', '2.283', '1.174']
>>> ["%.3f" % t.timeit() for t in t3, t4, t5, t6]
['1.582', '1.179', '2.284', '1.172']
>>>

Conclusion: search time depends on length of searched string.

Meta-conclusion: Either I have to retract my
based-on-hope-rather-than-on-experimentation assertion, or redefine "not
dopey" to mean "surely nobody would search for ^x when match x would do,
so it would be dopey to optimise re for that" :-)

So, back to the original point:

If re.match("something") does the job you want, don't use
re.search("^something") instead.
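
For anyone who wants to repeat the experiment on a current interpreter, here is a minimal sketch in Python 3 syntax. The helper names (same_result, time_call), the length of the padded string and the repeat count are my own choices for illustration, not anything from the session above. No timings are shown because they depend on the Python version and the machine; the regex engine may well behave differently from 2.4.1.

```python
import re
import timeit

# Precompiled patterns, mirroring t3..t6 in the session above.
anchored = re.compile(r"^\w")   # intended for .search()
plain = re.compile(r"\w")       # intended for .match()

# Failing inputs: both start with a space, so neither pattern matches.
SHORT = " will not work"
LONG = SHORT + " " + "q" * 34   # arbitrary padding; only the extra length matters


def same_result(text):
    """Return True if anchored.search() and plain.match() agree on text."""
    a = anchored.search(text)
    b = plain.match(text)
    if a is None or b is None:
        return a is None and b is None
    return a.span() == b.span()


def time_call(func, text, number=200_000):
    """Return the seconds taken by `number` calls of func(text)."""
    return timeit.timeit(lambda: func(text), number=number)


if __name__ == "__main__":
    for label, text in (("short", SHORT), ("long", LONG)):
        assert same_result(text)
        print("%-5s  search('^\\w'): %.3f   match('\\w'): %.3f" % (
            label,
            time_call(anchored.search, text),
            time_call(plain.match, text),
        ))
```

If the search() column grows with the longer string while the match() column stays flat, you are seeing the same effect as in the figures above. Note that the equivalence only holds without re.MULTILINE; with that flag, ^ also matches after each newline and search() can succeed where match() fails.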