Tim Chase:
> In practice, however, for such small strings as the given
> whitelist, the underlying find() operation likely doesn't put a
> blip on the radar. If your whitelist were some huge document
> that you were searching repeatedly, it could have worse
> performance. Additionally, the find() in the underlying C code
> is likely about as bare-metal as it gets, whereas the set
> membership aspect of things may go through some more convoluted
> setup/teardown/hashing and spend a lot more time further from the
> processor's op-codes.

With this specific test (half good, half bad), on Py2.5, on my PC,
sets start to be faster than the string search when the string "good"
is about 5-6 chars long (which suggests sets are quite fast, I presume).

from random import choice, seed
from time import clock

def main(choice=choice):
    seed(1)
    n = 100000
    for good in ("ab", "abc", "abcdef", "abcdefgh",
                 "abcdefghijklmnopqrstuvwxyz"):
        # Upper-case letters are never in "good", so half the lookups miss.
        poss = good + good.upper()
        data = [choice(poss) for _ in xrange(n)] * 10
        print "len(good) = ", len(good)

        # Time membership tests against the string.
        t = clock()
        for c in data:
            c in good
        print round(clock()-t, 2)

        # Time membership tests against a set (set creation included).
        t = clock()
        sgood = set(good)
        for c in data:
            c in sgood
        print round(clock()-t, 2), "\n"

main()

Bye,
bearophile
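
For anyone rerunning this on Python 3, here is a rough port of the same test. It is only a sketch: the helper names time_membership and run are made up for illustration, time.perf_counter stands in for time.clock (which no longer exists in current Python 3), and the set is built before the timer starts rather than inside the timed region as in the original. Timings will differ by machine and interpreter version, so none are shown.

```python
from random import choice, seed
from time import perf_counter


def time_membership(container, data):
    """Return the seconds spent evaluating `c in container` for each c in data."""
    start = perf_counter()
    for c in data:
        c in container
    return perf_counter() - start


def run(n=100000, repeat=10):
    """Return a list of (len(good), string_seconds, set_seconds) tuples."""
    seed(1)
    results = []
    for good in ("ab", "abc", "abcdef", "abcdefgh",
                 "abcdefghijklmnopqrstuvwxyz"):
        # Upper-case letters are never in "good", so about half the lookups miss.
        poss = good + good.upper()
        data = [choice(poss) for _ in range(n)] * repeat
        t_str = time_membership(good, data)
        t_set = time_membership(set(good), data)  # set built outside the timer
        results.append((len(good), t_str, t_set))
    return results


if __name__ == "__main__":
    for length, t_str, t_set in run():
        print("len(good) = %d: str %.2fs, set %.2fs" % (length, t_str, t_set))
```

Calling run(n=1000, repeat=1) gives a quick smoke test before the full-size run.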