Tim Chase:
> In practice, however, for such small strings as the given
> whitelist, the underlying find() operation likely doesn't put a
> blip on the radar.  If your whitelist were some huge document
> that you were searching repeatedly, it could have worse
> performance.  Additionally, the find() in the underlying C code
> is likely about as bare-metal as it gets, whereas the set
> membership aspect of things may go through some more convoluted
> setup/teardown/hashing and spend a lot more time further from the
> processor's op-codes.

With this specific test (half the characters in the whitelist, half not), on Py2.5, on my PC, sets start to be faster than the string search when the string "good" is about 5-6 chars long (which suggests sets are quite fast, I presume).

from random import choice, seed
from time import clock

def main(choice=choice):  # bind choice locally to avoid a global lookup per call
    seed(1)
    n = 100000

    for good in ("ab", "abc", "abcdef", "abcdefgh",
                 "abcdefghijklmnopqrstuvwxyz"):
        # Half the test characters hit the lowercase whitelist, half miss it.
        poss = good + good.upper()
        data = [choice(poss) for _ in xrange(n)] * 10
        print "len(good) =", len(good)

        # Time membership via substring search on the string itself.
        t = clock()
        for c in data:
            c in good
        print round(clock()-t, 2)

        # Time membership via a set built from the same characters.
        t = clock()
        sgood = set(good)
        for c in data:
            c in sgood
        print round(clock()-t, 2), "\n"

main()
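For anyone on a current interpreter, here is a rough Python 3 adaptation of the same half-hit/half-miss comparison using the timeit module instead of time.clock (which was removed in Py3.8). The function name compare and its parameters are mine, not from the original post:

```python
from timeit import timeit
from random import choice, seed

def compare(good, n=100000):
    """Time n membership tests against the string itself vs. a set of its chars."""
    seed(1)
    poss = good + good.upper()                  # half the chars miss the lowercase whitelist
    data = [choice(poss) for _ in range(n)]
    sgood = set(good)
    t_str = timeit(lambda: [c in good for c in data], number=1)
    t_set = timeit(lambda: [c in sgood for c in data], number=1)
    return t_str, t_set

if __name__ == "__main__":
    for good in ("ab", "abcdef", "abcdefghijklmnopqrstuvwxyz"):
        t_str, t_set = compare(good)
        print("len(good) = %2d  str: %.4fs  set: %.4fs" % (len(good), t_str, t_set))
```

The crossover point will vary by machine and interpreter version, so the absolute numbers matter less than the trend as the whitelist grows.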


Bye,
bearophile

-- 
http://mail.python.org/mailman/listinfo/python-list