Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
The problem is apparently that small_set -= large_set iterates over the
large set, removing entries from small_set, while the more efficient
small_set - large_set builds a new set by iterating over the small set
and adding the items that are not in the large set. Since lookups are
O(1) on average, it is more efficient to iterate over the small set
while looking up items in the large set than the other way around. I am
attaching a minimalist patch that gives up the in-place update for
s1 -= s2 when len(s1) < len(s2), effectively replacing it with
s1 = s1 - s2. The code could be further optimized by replicating the
set_difference logic in set_isub instead of relying on the fallback
behavior, but the big question here is whether it is acceptable to give
up preserving set identity in the in-place subtract to gain performance.
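The asymmetry described above can be sketched in pure Python (the
function names here are illustrative, not the actual CPython internals):

```python
def difference_update_inplace(left, right):
    """s1 -= s2 as implemented in-place: iterate the RIGHT operand,
    discarding each of its elements from the left set. Cost is
    O(len(right)), which is bad when right is the larger set."""
    for item in right:        # one pass over the right-hand set
        left.discard(item)    # O(1) average lookup/delete
    return left               # same object, mutated in place

def difference_new(left, right):
    """s1 - s2: iterate the LEFT operand, keeping elements absent
    from the right set. Cost is O(len(left)), so it wins when the
    left set is the smaller one -- but it builds a NEW set."""
    return {item for item in left if item not in right}
```

With a small left set and a huge right set, the first function does a
huge amount of work to remove almost nothing, while the second only
walks the small set.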
----------
keywords: +patch
Added file: http://bugs.python.org/file17282/issue8425.diff
_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8425>
_______________________________________
Index: Objects/setobject.c
===================================================================
--- Objects/setobject.c (revision 81048)
+++ Objects/setobject.c (working copy)
@@ -1612,7 +1612,10 @@
static PyObject *
set_isub(PySetObject *so, PyObject *other)
{
- if (!PyAnySet_Check(other)) {
+ if (!PyAnySet_Check(other) ||
+ /* Fall back to s = s - o if len(s) < len(o) to
+       avoid iteration over a large set. */
+ PySet_GET_SIZE(so) < PySet_GET_SIZE(other)) {
Py_INCREF(Py_NotImplemented);
return Py_NotImplemented;
}
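The identity question the patch raises can be shown without the patch:
returning Py_NotImplemented from set_isub makes the interpreter fall
back to s1 = s1 - s2, which rebinds s1 to a new object, so other
references to the original set stop tracking the result:

```python
# True in-place update: every reference to the set sees the change.
s1 = {1, 2, 3}
alias = s1
s1 -= {2}
assert alias == {1, 3} and s1 is alias

# The fallback form rebinds s1; the alias still holds the old set.
s1 = {1, 2, 3}
alias = s1
s1 = s1 - {2}
assert alias == {1, 2, 3} and s1 is not alias
```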