Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
The problem is apparently that small_set -= large_set iterates over the
large set, removing entries from small_set, while the more efficient
small_set - large_set builds a new set by iterating over the small set
and adding the items that are not in the large set. Since lookups are
O(1) on average, it is more efficient to iterate over the small set
while looking up items in the large set than the other way around. I am
attaching a minimalist patch that gives up the in-place update for
s1 -= s2 when len(s1) < len(s2), effectively replacing it with
s1 = s1 - s2. The code could be further optimized by replicating the
set_difference logic in set_isub instead of relying on the fallback
behavior, but the big question here is whether it is acceptable to give
up preserving set identity in the in-place subtract to gain performance.
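The asymmetry described above can be sketched in pure Python (the
function names here are illustrative, not the actual CPython internals):

```python
def difference_update_inplace(left, right):
    """s1 -= s2 as implemented in-place: iterate the RIGHT operand,
    discarding each of its elements from the left set. Cost is
    O(len(right)), which is bad when right is the larger set."""
    for item in right:        # one pass over the right-hand set
        left.discard(item)    # O(1) average lookup/delete
    return left               # same object, mutated in place

def difference_new(left, right):
    """s1 - s2: iterate the LEFT operand, keeping elements absent
    from the right set. Cost is O(len(left)), so it wins when the
    left set is the smaller one -- but it builds a NEW set."""
    return {item for item in left if item not in right}
```

With a small left set and a huge right set, the first function does a
huge amount of work to remove almost nothing, while the second only
walks the small set.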
----------
keywords: +patch
Added file: http://bugs.python.org/file17282/issue8425.diff
_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8425>
_______________________________________
Index: Objects/setobject.c
===================================================================
--- Objects/setobject.c (revision 81048)
+++ Objects/setobject.c (working copy)
@@ -1612,7 +1612,10 @@
static PyObject *
set_isub(PySetObject *so, PyObject *other)
{
- if (!PyAnySet_Check(other)) {
+ if (!PyAnySet_Check(other) ||
+ /* Fall back to s = s - o if len(s) < len(o) to
+       avoid iteration over a large set. */
+ PySet_GET_SIZE(so) < PySet_GET_SIZE(other)) {
Py_INCREF(Py_NotImplemented);
return Py_NotImplemented;
}
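The identity question the patch raises can be shown without the patch:
returning Py_NotImplemented from set_isub makes the interpreter fall
back to s1 = s1 - s2, which rebinds s1 to a new object, so other
references to the original set stop tracking the result:

```python
# True in-place update: every reference to the set sees the change.
s1 = {1, 2, 3}
alias = s1
s1 -= {2}
assert alias == {1, 3} and s1 is alias

# The fallback form rebinds s1; the alias still holds the old set.
s1 = {1, 2, 3}
alias = s1
s1 = s1 - {2}
assert alias == {1, 2, 3} and s1 is not alias
```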