Hello, x = x.union(set([i]) is the problem. You are creating two new data structures (set[i], and x.union) at every iteration, and I suspect union is a slow operation.
Just change this to "x.add(i)" and the script will run 20x faster than your list version! $ ./settest2.py done in 0.026 x = set() for i in range(20000): x.add(i) enjoy! --nickg On May 30, 2005, at 10:44 AM, Yair Benita wrote: > Hi All, > My research involves genomic research and the use of sets (recently > introduced in version 2.4) makes my life easier in a lot of ways. > However, I > noticed that working with large set slows the script to unbearable > speed. > > Below I compare two simple scripts, one makes use of a list and the > other > makes use of a set. The difference of 20 seconds may not be much, > but let me > tell you that this difference grows exponentially. When my sets > reach more > that 100,000 elements, a union command is painfully slow. True, a > union > command of a set may be much more complicated than using a list but > the time > difference is simply too big for me. > > Any thoughts/suggestions? > > I suppose that the obvious solution is to work with lists most of > the time > and use the sets only when I really need them. But then, what's the > point of > having those set? > > > Consider the following example just for making my point: > (sets were run using MacPython 2.4.1 on a PowerMac G4 dual 1.25) > ############################ > from time import time > > x = set([]) > prev_time = time() > initial_time = time() > for i in range(20000): > x = x.union(set([i])) > if i % 1000 == 0: > print i, "done. seconds since last: %.3f" % (time()-prev_time) > prev_time = time() > print "done in %.3f" %(time() - initial_time) > > ############################ > The output of this script is: > 0 done. seconds since last: 0.002 > 1000 done. seconds since last: 0.103 > 2000 done. seconds since last: 0.354 > 3000 done. seconds since last: 0.784 > 4000 done. seconds since last: 0.861 > 5000 done. seconds since last: 1.501 > 6000 done. seconds since last: 1.707 > 7000 done. seconds since last: 1.751 > 8000 done. seconds since last: 1.894 > 9000 done. seconds since last: 2.907 > 10000 done. seconds since last: 3.155 > 11000 done. seconds since last: 3.278 > 12000 done. seconds since last: 3.374 > 13000 done. seconds since last: 3.559 > 14000 done. seconds since last: 3.824 > 15000 done. seconds since last: 3.905 > 16000 done. seconds since last: 4.040 > 17000 done. seconds since last: 5.853 > 18000 done. seconds since last: 6.928 > 19000 done. seconds since last: 7.038 > done in 64.414 > > > Lets run the same script using a list: > ############################ > from time import time > > x = [] > prev_time = time() > initial_time = time() > for i in range(20000): > if i not in x: > x.append(i) > > if i % 1000 == 0: > print i, "done. seconds since last: %.3f" % (time()-prev_time) > prev_time = time() > > print "done in %.3f" %(time() - initial_time) > ############################ > > The output of this script is: > 0 done. seconds since last: 0.002 > 1000 done. seconds since last: 0.102 > 2000 done. seconds since last: 0.323 > 3000 done. seconds since last: 0.506 > 4000 done. seconds since last: 0.676 > 5000 done. seconds since last: 0.871 > 6000 done. seconds since last: 1.051 > 7000 done. seconds since last: 1.242 > 8000 done. seconds since last: 1.365 > 9000 done. seconds since last: 1.635 > 10000 done. seconds since last: 1.760 > 11000 done. seconds since last: 1.929 > 12000 done. seconds since last: 2.133 > 13000 done. seconds since last: 2.294 > 14000 done. seconds since last: 2.659 > 15000 done. seconds since last: 3.015 > 16000 done. seconds since last: 3.182 > 17000 done. seconds since last: 3.332 > 18000 done. seconds since last: 3.552 > 19000 done. seconds since last: 3.690 > done in 39.197 > > > > _______________________________________________ > Pythonmac-SIG maillist - Pythonmac-SIG@python.org > http://mail.python.org/mailman/listinfo/pythonmac-sig > _______________________________________________ Pythonmac-SIG maillist - Pythonmac-SIG@python.org http://mail.python.org/mailman/listinfo/pythonmac-sig