Andrew Bennetts <s...@users.sourceforge.net> added the comment:

Regarding memory, good question... but this patch turns out to be an 
improvement there too.

This optimisation only applies when len(x) > len(y) * 4, so the result is guaranteed to contain more than 3/4 of the elements of x (and may well be a full copy of x anyway).

So, if you like, this optimisation simply takes advantage of the fact that we're going to be copying almost all of these elements anyway.  We could make it less aggressive, but large sets are tuned to be between 1/2 and 1/3 empty internally anyway, so at most 1/4 wasted space seems reasonable.

Also, because this code sizes the result set roughly correctly up front, rather than growing it one element at a time, the memory consumption is actually *better*.  I'll attach a script that demonstrates this; for me it shows that large_set.difference(small_set) [where large_set has 4M elements and small_set has 100] peaks at 50MB of memory without my patch, but only 18MB with it (after discounting the memory required for large_set itself, etc.).
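The attached script isn't reproduced here; as a rough stand-in, something along these lines (using tracemalloc, Python 3.4+) measures the same kind of peak, and since tracing starts after large_set is built, its own storage is already excluded, matching the "after discounting" caveat above.  Exact numbers will of course vary by platform and build:

    import tracemalloc

    large_set = set(range(4000000))
    small_set = set(range(100))

    # Only allocations made during the difference() call are traced.
    tracemalloc.start()
    result = large_set.difference(small_set)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    print("peak allocation during difference(): %.1f MB" % (peak / 1e6))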

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8685>
_______________________________________