On Thu, 14 Aug 2008 01:54:55 +0000, Steven D'Aprano wrote:

> In full knowledge that Python is relatively hard to guess what is fast
> compared to what is slow, I'll make my guess of fastest to slowest:
>
> 1. repeated replace
> 2. repeated use of the form
>    "if ch in my_string: my_string = my_string.replace(ch, "")
> 3. re.sub with literal replacement
> 4. re.sub with callback (lambda m: "")

I added an extra test, which I expected to be fastest of all: using the
string.translate() function. Here are my results, as generated with the
timeit module under Python 2.5. Each figure is the best of three runs,
in seconds per million calls (the timeit defaults):

$ python delchars.py
Replacing 72 chars from a string of length 216
[(5.3256440162658691, 'delchars5'), (10.688904047012329, 'delchars2'),
(10.85448694229126, 'delchars1'), (67.739475965499878, 'delchars3'),
(120.5037829875946, 'delchars4')]

Based on these results, the techniques rank from fastest to slowest as:

1. string translate (delchars5)
2. repeated replace with a test (delchars2)
3. repeated replace without a test (delchars1)
4. re.sub with literal replacement (delchars3)
5. re.sub with callback (delchars4)

However, the two versions using replace are quite close, and the
difference between them is possibly not significant. I imagine that it
would be easy to find test cases where they came out in the opposite
order.

While I'm gratified that my prediction was so close to the results I
found, I welcome any suggestions for better/faster/more efficient code.

Test code follows:

==================================================

import re, string

# 1. repeated replace, unconditionally
def delchars1(s, chars):
    for c in chars:
        s = s.replace(c, '')
    return s

# 2. repeated replace, but only if the character is present
def delchars2(s, chars):
    for c in chars:
        if c in s:
            s = s.replace(c, '')
    return s

# 3. re.sub with a literal (empty string) replacement
def delchars3(s, chars):
    chars = re.escape(chars)
    x = re.compile(r'[%s]' % chars)
    return x.sub('', s)

# 4. re.sub with a callback that returns the empty string
def delchars4(s, chars):
    chars = re.escape(chars)
    x = re.compile(r'[%s]' % chars)
    return x.sub(lambda m: '', s)

# 5. string.translate with an identity table and a set of chars to delete
def delchars5(s, chars):
    return string.translate(s, string.maketrans('', ''), chars)

funcs = [delchars1, delchars2, delchars3, delchars4, delchars5]

# check that every function gives the known result
def test_same(s, chars, known_result):
    results = [f(s, chars) for f in funcs]
    for i in range(len(results)):
        if results[i] != known_result:
            msg = "function %s incorrectly gives %s" \
                % (funcs[i], results[i])
            raise AssertionError(msg)

s = "abcd.abcd-abcd/abcd"
chars = ".-/?"
test_same(s, chars, "abcd"*4)

# try something a little bigger
s = s*2 + "abcd..--//" + "a.b.c.d.a-b-c-d-a/b/c/d/"
s *= 3
test_same(s, chars, "abcd"*36)

# now do the timing tests
from timeit import Timer
t1 = Timer("delchars1(s, chars)",
    "from __main__ import delchars1, s, chars")
t2 = Timer("delchars2(s, chars)",
    "from __main__ import delchars2, s, chars")
t3 = Timer("delchars3(s, chars)",
    "from __main__ import delchars3, s, chars")
t4 = Timer("delchars4(s, chars)",
    "from __main__ import delchars4, s, chars")
t5 = Timer("delchars5(s, chars)",
    "from __main__ import delchars5, s, chars")

# best of the default three repeats for each function
times = [min(t.repeat()) for t in (t1, t2, t3, t4, t5)]
results = zip(times, [f.__name__ for f in funcs])
results.sort()
n = sum(s.count(c) for c in chars)
print "Replacing %d chars from a string of length %d" % (n, len(s))
print results

==================================================

-- 
Steven
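
A note for anyone trying the translate approach under Python 3, where
string.translate() and string.maketrans() no longer exist. The sketch
below shows the same delete-by-translate idea using str.maketrans() with
a third argument (the characters to delete). The function names
delchars_translate and delchars_replace are hypothetical and are not part
of the test script above. The timings reported above come from
Python 2.5 and should not be assumed to carry over.

```python
def delchars_translate(s, chars):
    # str.maketrans('', '', chars) builds a table that maps every
    # character in chars to None; str.translate drops those characters.
    return s.translate(str.maketrans('', '', chars))


def delchars_replace(s, chars):
    # Reference version: the same loop as delchars1 in the test script.
    for c in chars:
        s = s.replace(c, '')
    return s


if __name__ == "__main__":
    s = "abcd.abcd-abcd/abcd"
    chars = ".-/?"
    # Both approaches should agree on the result.
    assert delchars_translate(s, chars) == delchars_replace(s, chars)
    print(delchars_translate(s, chars))
```

Building the table once outside the function and reusing it would avoid
rebuilding it on every call. Whether that matters in practice is
something to measure with timeit rather than guess.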